scCorr: an R package for a graph-based k-partitioning approach for single-cell analysis

1. Installation

library('devtools')
install_github("CBIIT-CGBB/scCorr")

2. Motivation

One of the challenges in single-cell RNA-sequencing (scRNA-seq) analysis is the abundance of zero values, which biases the estimation of gene-gene correlations in downstream analyses. Here, we present a novel graph-based k-partitioning method that merges “homology” cells to reduce the number of zero values. The method is robust and reliable for detecting correlated gene pairs, which is fundamental for network construction, gene-gene interaction, and cellular -omic analyses. The associated publication is "A novel graph-based k-partitioning approach improves the detection of gene-gene correlations by single-cell RNA sequencing", BMC Genomics, 2022.

Data analysis workflow

Example R code is provided for each step: t-SNE, k-partitioning, merging clusters, cluster ID renaming, and correlation analysis. More examples are at the end of this page and are linked as "R codes" in each section (please download the data for them).

3. Data and the zero count distributions

R codes

A total of 21,430 genes have zero values in at least one cell (A), and more than 95% of the 15,973 cells show zero values in at least one gene (B). Among a set of 347 genes from KEGG, all genes have a zero value in at least one cell (C), and 95% of the 15,973 cells contain a zero value in at least one gene (D).

Panels E-G show the reduction of zero values in merged cells. The percentage of zero values across the 21,430 genes is remarkably reduced in the merged cells. The reduction is approximately 50% when 50 cells are merged (E). Similarly, zero values of the 347 genes selected from KEGG are reduced in merged cells (F). The reduction of zero values in merged cells is consistently observed across six cell sets of different sizes (G).
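The effect of merging on zero values can be illustrated with a minimal sketch in base R. It uses a simulated Poisson count matrix and an arbitrary grouping of consecutive cells, both of which are assumptions for illustration only; it does not use the scCorr functions or the dataset described above, where cells are grouped by their proximity on the t-SNE plot.

```r
# Simulated sparse count matrix (genes x cells); NOT the scCorr dataset.
set.seed(1)
n_genes <- 200
n_cells <- 1000
counts <- matrix(rpois(n_genes * n_cells, lambda = 0.3),
                 nrow = n_genes, ncol = n_cells)

# Hypothetical grouping: 20 groups of 50 consecutive cells each.
group <- rep(seq_len(n_cells / 50), each = 50)

# Sum counts within each group. rowsum() groups rows, so the matrix is
# transposed to cells x genes first and transposed back afterwards.
merged <- t(rowsum(t(counts), group))

# Fraction of zero entries before and after merging.
zero_fraction <- function(m) mean(m == 0)
zero_before <- zero_fraction(counts)
zero_after  <- zero_fraction(merged)

cat("zero fraction, single cells:", zero_before, "\n")
cat("zero fraction, merged cells:", zero_after, "\n")
```

Summing counts over a group can only turn zeros into non-zeros, so the zero fraction of the merged matrix never exceeds that of the single-cell matrix.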

4. Graph-based k-partitioning approach

R codes

H-L present the workflow and features of the scCorr method. First, data dimensionality reduction and cell classification are performed by t-SNE, and cell types are identified using a marker gene approach (H). Second, cells are partitioned on the t-SNE plot by scCorr with different numbers of clusters (I: k=100; J: k=1,000). The average number of cells per cluster is shown in K.
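As a rough illustration of partitioning cells on a two-dimensional embedding and summarizing cluster sizes, the sketch below uses base R `kmeans()` as a plainly named stand-in. It is not the graph-based k-partitioning implemented in scCorr, and the simulated coordinates are an assumption in place of real t-SNE output.

```r
# Simulated 2-D coordinates standing in for a t-SNE embedding (assumption).
set.seed(3)
n_cells <- 2000
coords <- cbind(x = rnorm(n_cells), y = rnorm(n_cells))

# Partition the cells into k groups. kmeans() is a stand-in here,
# NOT the scCorr graph-based k-partitioning algorithm.
k <- 20
fit <- kmeans(coords, centers = k, iter.max = 50)

# Number of cells per cluster and the average cluster size.
cluster_size <- as.vector(table(fit$cluster))
mean_cells_per_cluster <- mean(cluster_size)

print(summary(cluster_size))
cat("average cells per cluster:", mean_cells_per_cluster, "\n")
```

Whatever the partitioning method, the average number of cells per cluster equals the number of cells divided by k, so increasing k trades larger merged profiles for more data points in the downstream correlation.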

5. Cluster tree visualization

R codes and one full example (from clustering to tree plotting R codes)

scCorr makes it possible to trace the evolutionary process of each partitioned cluster (L).

6. Correlation method comparison

R codes R codes

Correlated genes are shown by -log10 p value (A) and r value (B). Gene-gene correlations from the two methods are in the same direction in some cases (C) and in opposite directions in other cases (D).
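The difference between correlating single cells and correlating merged clusters can be sketched with simulated data. Everything here is an assumption for illustration: two hypothetical genes driven by one shared latent activity, sparse Poisson counts, and clusters built directly from the known latent value (an idealization, since scCorr derives clusters from the t-SNE layout).

```r
# Two hypothetical genes share one latent activity per cell (assumption).
set.seed(2)
n_cells <- 2000
latent <- runif(n_cells, 0, 1)
gene_a <- rpois(n_cells, lambda = 0.5 * latent)  # sparse counts, mostly 0
gene_b <- rpois(n_cells, lambda = 0.5 * latent)

# Idealized clusters: 40 bins of cells with similar latent activity.
cluster_id <- cut(latent, breaks = 40, labels = FALSE)

# Cluster-level expression: mean count per cluster.
a_merged <- tapply(gene_a, cluster_id, mean)
b_merged <- tapply(gene_b, cluster_id, mean)

# Pearson correlation at the single-cell and at the cluster level.
test_cell    <- cor.test(gene_a, gene_b)
test_cluster <- cor.test(a_merged, b_merged)
r_cell    <- unname(test_cell$estimate)
r_cluster <- unname(test_cluster$estimate)

cat("single cells: r =", r_cell,
    " -log10 p =", -log10(test_cell$p.value), "\n")
cat("clusters:     r =", r_cluster,
    " -log10 p =", -log10(test_cluster$p.value), "\n")
```

In this simulation the sampling noise of individual cells hides most of the shared signal, while averaging within clusters recovers it, so the cluster-level r is expected to be much larger than the single-cell r.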

7. The correlation method validation

R codes R codes

E and F show the top 10 correlated genes across different numbers of clusters partitioned by scCorr among CD4 T cells, evaluated by -log10 p value (E) and r value (F). The performance of scCorr for cell type identification of CD4 T cells is shown in G (k=117) and H (k=10). The area under the curve (AUC) was greater with scCorr (AUC: 0.97 and 0.96) than with unclustered single cells (AUC=0.55).

8. Distributions of zero values in gene expression

R codes

Distributions of zero-value expression in four simulated datasets (A) and in the scRNA-seq dataset with 21,430 genes and 15,973 cells (B).

9. More t-SNE plot-based k-partitioning cluster examples

R codes

t-SNE plot-based k-partitioning clusters. All cells are clustered into 50, 100, and 1,000 groups (A). The same clusters are shown in dot-plot views (B), where each dot represents a cluster and the dot size is proportional to the cluster size.

10. Tree-based visualization of cell clusters

R codes R codes

Tree-based visualization of cell clusters by the k-partitioning algorithm (A: ladder clusters, N=20-40; B: circle clusters, N=20-40; C: circle clusters, N=100-1,000). The size of each dot is proportional to the number of cells in the cluster. A line connects each pair of closest clusters.

11. Correlation by non-clustered method and by scCorr clustered method

R codes

Correlation of two co-expressed gene pairs, the MAPK1 pair and the DUSP2 pair, by the non-clustered correlation method (A) and by the scCorr clustered method (B).

12. Correlation of top 10 co-expressed gene pairs in different number of partitioned clusters

R codes

Correlation of the top 10 co-expressed gene pairs from cluster 40 across different numbers of partitioned clusters, evaluated by p values and correlation coefficients. In the plot titles, the number after n is the threshold for cluster merging: if a cluster contains fewer cells than the threshold, it is merged into the adjacent cluster.
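The threshold-based merging described above can be sketched in base R. The function name `merge_small_clusters` and its arguments are hypothetical, and "adjacent" is interpreted here as the cluster with the nearest centroid, which is an assumption; this is not the package's `merge_list` implementation.

```r
# Hypothetical helper: merge clusters smaller than min_cells into the
# cluster with the nearest centroid (one reading of "adjacent").
merge_small_clusters <- function(coords, cluster_id, min_cells) {
  cluster_id <- as.character(cluster_id)
  repeat {
    sizes <- table(cluster_id)
    # Stop when one cluster is left or all clusters reach the threshold.
    if (length(sizes) < 2 || min(sizes) >= min_cells) break
    target <- names(sizes)[which.min(sizes)]   # smallest cluster first
    # Centroid of every cluster: coordinate sums divided by cluster sizes.
    sums <- rowsum(coords, cluster_id)
    centroids <- sums / as.vector(sizes[rownames(sums)])
    # Euclidean distance from the target centroid to all centroids.
    d <- sqrt(colSums((t(centroids) - centroids[target, ])^2))
    d[target] <- Inf
    nearest <- names(d)[which.min(d)]
    cluster_id[cluster_id == target] <- nearest
  }
  cluster_id
}

# Toy coordinates: two clusters of 10 cells and one cluster of 2 cells.
coords <- rbind(
  cbind((0:9) / 10, 0),        # cluster 1, near the origin
  cbind(10 + (0:9) / 10, 10),  # cluster 2, far away
  cbind(c(1.0, 1.1), 0.2)      # cluster 3, small and close to cluster 1
)
cluster_id <- c(rep(1, 10), rep(2, 10), rep(3, 2))

merged_id <- merge_small_clusters(coords, cluster_id, min_cells = 5)
print(table(merged_id))
```

With a threshold of 5, the two-cell cluster is absorbed by the nearby cluster 1, leaving clusters of 12 and 10 cells. Merging the smallest cluster first and recomputing centroids after each merge keeps the result independent of the order of the cluster labels.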

13. Estimation of Computation Time

R codes

The xy.coordinate defines the region used for scaling. For example, if xy.coordinate is 50, the scaling region runs from -50 to 50. We suggest an xy.coordinate of 300 or 400 for datasets of about 5,000 to 15,000 single cells. The xy.coordinate can be increased if you have more single cells.
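To make the scaling region concrete, the sketch below linearly rescales each axis of a two-dimensional embedding into the interval from -xy.coordinate to xy.coordinate. The helper `rescale_symmetric` is hypothetical and is not the package's `scale_v`; whether scCorr scales each axis separately in this way is an assumption.

```r
# Hypothetical helper: linearly map a numeric vector onto
# [-half_width, half_width]. NOT the scCorr scale_v implementation.
rescale_symmetric <- function(x, half_width) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1]) * 2 * half_width - half_width
}

# Simulated embedding standing in for t-SNE coordinates (assumption).
set.seed(4)
coords <- cbind(x = rnorm(5000, sd = 20), y = rnorm(5000, sd = 35))

# Scale both axes to the region -300..300 (xy.coordinate = 300).
xy_coordinate <- 300
scaled <- apply(coords, 2, rescale_symmetric, half_width = xy_coordinate)

print(apply(scaled, 2, range))
```

A larger region spreads the same cells over more coordinate units, which is consistent with the suggestion to increase the xy.coordinate for larger datasets.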

14. Examples of the functions of the package

c_list : A graph-based k-partitioning method with scaling

d_list : Merging homologous single cells along one coordinate with a density method

GCluster : Graph-based clustering

get_value : Converting a single-cell-based matrix to a cluster-based matrix

m_list : Merging homologous single cells along one coordinate by window sizes

merge_list : Merging a cluster into the adjacent cluster if its number of single cells is less than a cutoff

mgGCLuster : Merging clusters given the merged cluster IDs

scale_v : Scaling function

tj_list : Merging homologous single cells by trajectory analysis

tjGCluster : Trajectory analysis function for tj_list

tjGCluster2 : Trajectory analysis function II for tj_list

r_c : Rotating coordinates
