CytoFA: Automated gating of mass cytometry data via robust skew factor analyzers
Cytometry plays an important role in clinical diagnosis and monitoring of lymphomas, leukaemia, and AIDS. However, analysis of modern-day cytometric data is challenging. Besides its high-throughput nature and high dimensionality, these data typically exhibit complex characteristics such as multimodality, asymmetry, heavy-tailness and other non-normal characteristics. This paper presents cytoFA, a novel data mining approach capable of clustering and performing dimensionality reduction of high-dimensional cytometry data. Our approach is also robust against non-normal features including heterogeneity, skewness, and outliers (dead cells) that are typical in flow and mass cytometry data. Based on a statistical approach with well-studied properties, cytoFA adopts a mixtures of factor analyzers (MFA) to learn latent nonlinear low-dimensional representations of the data and to provide an automatic segmentation of the data into its comprising cell populations. We also introduce a double trimming approach to help identify atypical observations and to reduce computation time. The effectiveness of our approach is demonstrated on two large mass cytometry data, outperforming existing benchmark algorithms. We note that while the approach is motivated by cytometric data analysis, it is applicable and useful for modelling data from other fields.
Lee, S.X. "CytoFA: Automated gating of mass cytometry data via robust skew factor analyzers" Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science (2019): 514-525