SAFE-clustering: Single-cell Aggregated (From Ensemble) Clustering for Single-cell RNA-seq Data
Speaker: Dr. Yun Li
Department of Genetics, Department of Biostatistics,
University of North Carolina at Chapel Hill
Time: 10:00-11:00am, Tuesday, March 20, 2018
Location: Room 311, Wang Ke-Zhen Building, Peking University
Accurately clustering cell types from a mass of heterogeneous cells is a crucial first step in the analysis of single-cell RNA-seq (scRNA-seq) data. Although several methods have recently been developed, they exploit different characteristics of the data and yield varying results in both the number of clusters and the actual cluster assignments. Here, we present SAFE-clustering, Single-cell Aggregated (From Ensemble) clustering, a flexible, accurate and robust method for clustering scRNA-seq data. SAFE-clustering takes the results of multiple clustering methods as input and builds one consensus solution. It currently embeds four state-of-the-art methods (SC3, CIDR, Seurat and t-SNE + k-means) and combines their solutions using three hypergraph-based partitioning algorithms. Extensive assessment across 14 datasets, with the number of clusters ranging from 3 to 14 and the number of single cells ranging from 49 to 32,695, showcases the advantages of SAFE-clustering in terms of both cluster number (19.0% and 46.0% reduction in absolute deviation from the truth) and cluster assignment (on average 27.3% improvement, and up to 34.5%, over the best of the four methods, as measured by the adjusted Rand index). Moreover, SAFE-clustering is computationally efficient enough to accommodate large datasets, taking less than 10 minutes to process 28,733 cells.
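The ensemble idea in the abstract can be illustrated with a small sketch. It is not the SAFE-clustering implementation: the three hypergraph-based partitioning algorithms are replaced here by a much simpler majority-vote rule on co-clustered cell pairs, and the function names (`hyperedges`, `consensus_labels`, `adjusted_rand_index`) and the toy label vectors are assumptions made up for illustration, not data from the study. The sketch shows how each cluster from each method can be encoded as a hyperedge, how a consensus labeling might be formed, and how the adjusted Rand index scores a labeling against the truth.

```python
from collections import Counter
from math import comb


def hyperedges(labelings):
    """Encode every cluster of every labeling as one hyperedge (a set of cell indices)."""
    edges = []
    for labels in labelings:
        for cluster in sorted(set(labels)):
            edges.append({i for i, lab in enumerate(labels) if lab == cluster})
    return edges


def consensus_labels(labelings, threshold=0.5):
    """Simplified stand-in for hypergraph partitioning (an assumption, not the
    SAFE-clustering algorithm): link two cells when more than `threshold` of the
    methods co-cluster them, then return the connected components as clusters."""
    n, m = len(labelings[0]), len(labelings)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(labels[i] == labels[j] for labels in labelings)
            if agree / m > threshold:
                parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same cells."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy example (made-up labels): six cells, two true types, four imperfect methods.
truth = [0, 0, 0, 1, 1, 1]
methods = [
    [0, 0, 0, 1, 1, 1],  # method 1: agrees with the truth
    [1, 1, 0, 0, 0, 0],  # method 2: misplaces cell 2
    [0, 0, 0, 0, 1, 1],  # method 3: misplaces cell 3
    [2, 2, 2, 0, 0, 1],  # method 4: splits the second type
]
consensus = consensus_labels(methods)
print(len(hyperedges(methods)), "hyperedges")
print("consensus:", consensus)
print("ARI per method:", [round(adjusted_rand_index(truth, m), 3) for m in methods])
print("ARI consensus:", adjusted_rand_index(truth, consensus))
```

In this toy case the individual methods disagree on both the number of clusters and the assignments, while the majority-vote consensus recovers the two true groups. A real analysis would use the hypergraph partitioning algorithms described in the talk rather than this pairwise rule, which does not scale well to tens of thousands of cells.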