Dataset information
This report has been verified by Polly as per framework v1.0.
| Dataset information | Value |
|---|---|
| Dataset ID | GSE144735_GPL24676_raw_polly_processed |
| Abstract | Immunotherapy for metastatic colorectal cancer is effective only for mismatch repair-deficient tumors with high microsatellite instability that demonstrate immune infiltration, suggesting that tumor cells can determine their immune microenvironment. To understand this cross-talk, we analyzed the transcriptome of 91,103 unsorted single cells from 23 Korean and 6 Belgian patients. Cancer cells displayed transcriptional features reminiscent of normal differentiation programs, and genetic alterations that apparently fostered immunosuppressive microenvironments directed by regulatory T cells, myofibroblasts and myeloid cells. Intercellular network reconstruction supported the association between cancer cell signatures and specific stromal or immune cell populations. Our collective view of the cellular landscape and intercellular interactions in colorectal cancer provide mechanistic information for the design of efficient immuno-oncology treatment strategies. |
| Description | Single cell 3' RNA sequencing of 6 Belgian colorectal cancer patients |
| Number of cells | 21321 |
| Number of genes | 23855 |
| Number of samples | 18 |
| Organism | Homo sapiens |
| Tissue | Cecum, Colonic Mucosa, Colon Sigmoideum, Rectum, Colon Ascendens |
| Disease | Adenocarcinoma, Mucinous, Colorectal Neoplasms, Adenocarcinoma |
| Cell Lines | None |
| Cell Type | Epithelial Cell, Stromal Cell, B Cell, T Cell, Myeloid Cell, Mast Cell |
| Drug | None |
| Marker genes for cell type are available | True |
| Doublet detection method | scrublet |
| Normalization method | log1p: true; target_sum: none; scaling_applied: true; max_value: none; zero_center: false |
| Remove gene groups | none |
| Batch correction method and key | batch_removal_method: harmony; batch_key: sample |
| Regress covariates | none |
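The parameter names in the Normalization method row (`target_sum`, `log1p`, `max_value`, `zero_center`) match those of scanpy's `sc.pp.normalize_total`, `sc.pp.log1p` and `sc.pp.scale`. Assuming that mapping (the report does not confirm it), the sketch below shows what these settings imply on a toy matrix: with `target_sum: none` each cell is rescaled to the median total count, values are log1p-transformed, and each gene is then divided by its standard deviation without subtracting the mean (`zero_center: false`) and without clipping (`max_value: none`). The function name is hypothetical and the code is plain Python rather than scanpy.

```python
import math
import statistics


def normalize_log_scale(counts):
    """Sketch of median-target normalization, log1p, and unit-variance scaling.

    counts: list of cells, each a list of raw gene counts in the same gene order.
    Assumes target_sum=None means "rescale each cell to the median total count".
    """
    totals = [sum(cell) for cell in counts]
    positive = [t for t in totals if t > 0]  # empty cells are excluded from the median here
    target = statistics.median(positive) if positive else 0.0
    # Library-size normalization: every non-empty cell ends up with the same total.
    normalized = [[x * target / t if t > 0 else 0.0 for x in cell]
                  for cell, t in zip(counts, totals)]
    # log1p transform (natural log of 1 + x).
    logged = [[math.log1p(x) for x in cell] for cell in normalized]
    # Scale each gene to unit variance WITHOUT mean-centering (zero_center: false)
    # and without clipping (max_value: none).
    scaled = [row[:] for row in logged]
    for g in range(len(logged[0])):
        column = [row[g] for row in logged]
        sd = statistics.stdev(column) if len(column) > 1 else 0.0
        if sd > 0:  # constant genes are left unchanged to avoid division by zero
            for row in scaled:
                row[g] /= sd
    return scaled
```

Because the mean is not subtracted, the scaled values stay non-negative, which keeps sparse zero entries at zero.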
1. Metadata information
| Metadata information | Value |
|---|---|
| Polly curated metadata fields are present at dataset level | Pass |
| Polly curated metadata fields are present at sample level | Pass |
| Polly curated metadata fields are present in output file | Pass |
| Custom fields are present in output file | Pass |
| Publication Link is provided | Pass |
| Publication Link is valid | Pass |
| Dataset-Level vs. Sample-Level Metadata: concordance check | Pass |
| Accuracy of raw counts availability tag | Pass |
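The report does not describe how the dataset-level vs. sample-level concordance check is computed. One plausible sketch, assuming each curated field is stored as a set of terms at the dataset level and as a single term per sample (the field names below are hypothetical, not Polly's schema): every dataset-level term should occur in at least one sample, and every sample-level term should be listed at the dataset level.

```python
def metadata_concordance(dataset_meta, sample_meta):
    """Return {field: (only_in_dataset, only_in_samples)} for discordant fields.

    dataset_meta: dict mapping a (hypothetical) field name -> set of curated terms.
    sample_meta: list of dicts, one per sample, mapping field name -> term.
    An empty result means every field is concordant in this sketch.
    """
    problems = {}
    for field, dataset_terms in dataset_meta.items():
        sample_terms = {s[field] for s in sample_meta if field in s}
        only_dataset = dataset_terms - sample_terms
        only_samples = sample_terms - dataset_terms
        if only_dataset or only_samples:
            problems[field] = (only_dataset, only_samples)
    return problems


dataset = {"tissue": {"Cecum", "Rectum"}}
samples = [{"tissue": "Cecum"}, {"tissue": "Rectum"}]
print(metadata_concordance(dataset, samples))  # {}
```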
2. Data Matrix
| Data Matrix | Value |
|---|---|
| Unique Cell Barcodes | Pass |
| Unique Gene Identifiers | Pass |
| Embeddings are available | Pass |
| Gene Identifier Format | Pass |
| Raw counts are available in output file | Pass |
| Raw vs Processed Counts are different | Pass |
| Valid Raw Counts | Pass |
| Concordance of number of cells in raw and processed counts matrices in output file | Pass |
| Valid Columns | Pass |
| Highly Variable Genes is available | Pass |
| Valid Processed Counts | Pass |
| UMAP/tSNE Projections are available | Both present |
| QC Metrics are available | Pass |
| Reproducibility of Gene Counts | Pass |
| Reproducibility of UMI Counts | Pass |
| Cluster information is available | Pass |
| Number of Clusters | 16 |
| Minimum genes per cell threshold | 500 |
| Minimum cells per gene threshold | 2 |
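Several Data Matrix checks and the two filtering thresholds can be expressed directly. The sketch below assumes a dense cells × genes list of raw counts; it checks barcode and gene-identifier uniqueness, treats "Valid Raw Counts" as "non-negative integers" (an assumption, since the report does not define it), and applies the minimum-genes-per-cell (500) and minimum-cells-per-gene (2) thresholds. It filters cells before genes; the report does not state the order used. The function name is hypothetical.

```python
def qc_and_filter(counts, barcodes, gene_ids, min_genes=500, min_cells=2):
    """Check matrix validity, then drop low-quality cells and rarely detected genes.

    counts: list of cells, each a list of raw counts aligned with gene_ids.
    Returns (kept_barcodes, kept_gene_ids, filtered_counts).
    """
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("cell barcodes are not unique")
    if len(set(gene_ids)) != len(gene_ids):
        raise ValueError("gene identifiers are not unique")
    for cell in counts:
        for x in cell:
            if x < 0 or x != int(x):
                raise ValueError("raw counts must be non-negative integers")
    # Keep cells in which at least `min_genes` genes are detected.
    keep_cells = [i for i, cell in enumerate(counts)
                  if sum(1 for x in cell if x > 0) >= min_genes]
    # Among the kept cells, keep genes detected in at least `min_cells` cells.
    keep_genes = [g for g in range(len(gene_ids))
                  if sum(1 for i in keep_cells if counts[i][g] > 0) >= min_cells]
    filtered = [[counts[i][g] for g in keep_genes] for i in keep_cells]
    return ([barcodes[i] for i in keep_cells],
            [gene_ids[g] for g in keep_genes],
            filtered)


# Toy example with lowered thresholds so the effect is visible.
print(qc_and_filter([[1, 0, 3], [0, 0, 1], [2, 1, 0]],
                    ["AAA", "CCC", "GGG"], ["G1", "G2", "G3"],
                    min_genes=2, min_cells=2))
# (['AAA', 'GGG'], ['G1'], [[1], [2]])
```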
3. Cell Clusters in UMAP Embeddings Colored by Samples: Re-Processed and Polly Datasets
Figure 1a: Sample-level distribution of cell clusters in the UMAP embedding of the existing Polly data.
Figure 1b: Sample-level distribution of cell clusters in the UMAP embedding of the re-processed data, used to validate the reproducibility of the results.
The plot visualizes the distribution of samples across clusters. For both the Polly and the re-processed datasets, the plots should look very similar.
Additionally, the plot for the Polly dataset can be used to check for batch effects:
Sample Clustering: If cells from each sample are spread across many clusters rather than grouped together, this suggests that there are no strong sample-level batch effects.
Batch Effect Evidence: Conversely, if cells from the same sample cluster together, there may be batch effects between samples.
Biological Variation Check: Before attributing sample-specific clustering to batch effects, make sure it is not caused by inherent biological differences between samples.
Distribution Visualization: The plot also shows how the cells of each sample are spread across clusters.
Limitation of Re-processed Dataset: The UMAP/tSNE plot of the re-processed dataset may not be a valid basis for assessing batch effects, because the re-processing is intended primarily as a reproducibility check.
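The visual batch-effect check can be complemented by a single number per cluster. The sketch below is not part of the report's method: it computes the Shannon entropy of sample labels within each cluster, normalized by the log of the number of samples. Values near 1 mean the cluster mixes cells from many samples; values near 0 mean it is dominated by one sample. As noted above, low values may reflect biology (for example, patient-specific tumor cells) rather than batch. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_sample_entropy(clusters, samples):
    """Normalized Shannon entropy of sample labels within each cluster.

    clusters, samples: equal-length sequences giving each cell's cluster and sample.
    Returns {cluster: entropy in [0, 1]}; 1 = evenly mixed across all samples.
    """
    n_samples = len(set(samples))
    by_cluster = defaultdict(Counter)
    for c, s in zip(clusters, samples):
        by_cluster[c][s] += 1
    result = {}
    for c, counts in by_cluster.items():
        total = sum(counts.values())
        h = -sum((n / total) * math.log(n / total) for n in counts.values())
        result[c] = h / math.log(n_samples) if n_samples > 1 else 0.0
    return result
```

With 18 samples, a cluster made entirely of one sample's cells scores 0 regardless of its size, so the score is best read alongside the cluster's cell count.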
4. Cell Clusters in UMAP Embeddings Colored by 'Author Cell Types': Comparison Between Polly and Re-Processed Datasets
Figure 2a: Distribution of author-defined cell types across clusters in the UMAP embedding of the existing Polly data.
Figure 2b: Distribution of author-defined cell types across clusters in the UMAP embedding of the re-processed data, used to validate the reproducibility of the results.
Cell Type Distribution (author-defined): The plot visualizes how author-defined cell types are distributed across clusters. As a quality check, the plots for the Polly and re-processed datasets should look very similar.
Cell Type Similarity: The UMAP plot also indicates how similar different cell types are. If cells of types A and B lie close together, their gene expression profiles tend to be similar, which may reflect biological similarity between the two cell types.
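"Very similar" can also be quantified. One common choice, suggested here rather than taken from the report, is the adjusted Rand index (ARI) between the cluster assignments of the Polly run and the re-processed run, with cells matched by barcode. ARI is 1.0 for identical partitions, around 0 for random agreement, and does not depend on how the clusters are numbered. A minimal pure-Python sketch:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of cells placed together in both labelings, and in each labeling separately.
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Same partition with different cluster names gives 1.0.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], ["x", "x", "y", "y", "z", "z"]))
```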
5. Cell Clusters in UMAP Embeddings Colored by 'Curated Cell Types': Comparison Between Polly and Re-Processed Datasets
Figure 3a: Distribution of curated cell types across clusters in the UMAP embedding of the existing Polly data.
Figure 3b: Distribution of curated cell types across clusters in the UMAP embedding of the re-processed data, used to validate the reproducibility of the results.
Cell Type Distribution by Elucidata (Curation Experts): The plot visualizes how curated cell types are distributed across clusters. As a quality check, the plots for the Polly and re-processed datasets should look very similar.
Cell Type Relationships: The plot also shows how close different cell types lie to each other within the clusters. If cell types A and B cluster closely, their gene expression patterns are likely similar, which may indicate biological similarity between the two cell types.
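The correspondence between clusters and curated cell types can also be summarized numerically. A sketch, not the report's method: for each cluster, report the most common curated cell type and the fraction of the cluster's cells carrying it. Comparing these summaries between the Polly and re-processed runs gives a coarse check that they agree. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(clusters, cell_types):
    """For each cluster, return (majority cell type, fraction of cells carrying it)."""
    by_cluster = defaultdict(Counter)
    for c, t in zip(clusters, cell_types):
        by_cluster[c][t] += 1
    summary = {}
    for c, counts in by_cluster.items():
        label, n = counts.most_common(1)[0]
        summary[c] = (label, n / sum(counts.values()))
    return summary


print(cluster_purity(["0", "0", "0", "1", "1"],
                     ["T cell", "T cell", "B cell", "myeloid cell", "myeloid cell"]))
```

A cluster with low purity is not necessarily an error; it may contain closely related cell types that the embedding does not separate.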
6. Violin Plot Visualization of Doublets
Figure 4: Sanity check of detected doublets
To assess the validity of the doublet predictions, the plot shows the distribution of the number of detected genes per cell in predicted doublets versus singlets for each sample (heterotypic doublets are typically expected to have more detected genes per cell). If doublets have been removed, the plot shows only the distribution for singlets.
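The comparison behind the violin plot can be sketched numerically. Assuming per-cell doublet calls (for example, the boolean predictions produced by scrublet, the method listed above) and the number of detected genes per cell are already available, the hypothetical helper below returns, per sample, the median genes per cell for singlets and for predicted doublets. If the calls are plausible, the doublet median should generally exceed the singlet median.

```python
import statistics
from collections import defaultdict


def doublet_gene_medians(samples, n_genes, predicted_doublet):
    """Median detected genes per cell for singlets vs. predicted doublets, per sample.

    Returns {sample: (singlet_median, doublet_median)}; a median is None when the
    group is empty (e.g. no doublets remain after doublet removal).
    """
    groups = defaultdict(lambda: ([], []))  # sample -> (singlet values, doublet values)
    for s, g, is_doublet in zip(samples, n_genes, predicted_doublet):
        groups[s][1 if is_doublet else 0].append(g)
    result = {}
    for s, (singlets, doublets) in groups.items():
        result[s] = (statistics.median(singlets) if singlets else None,
                     statistics.median(doublets) if doublets else None)
    return result


print(doublet_gene_medians(["S1"] * 4 + ["S2"] * 2,
                           [800, 900, 2100, 1000, 700, 750],
                           [False, False, True, False, False, False]))
# {'S1': (900, 2100), 'S2': (725.0, None)}
```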
7. Cell Type Frequency Distribution
| | Cell type (reported in publication) | Cell type (Polly curated) | Number of cells |
|---|---|---|---|
| 0 | ["B cells"] | ["B cell"] | 4232 |
| 1 | ["Epithelial cells"] | ["epithelial cell"] | 3062 |
| 2 | ["Mast cells"] | ["mast cell"] | 195 |
| 3 | ["Myeloids"] | ["myeloid cell"] | 2311 |
| 4 | ["Stromal cells"] | ["stromal cell"] | 6160 |
| 5 | ["T cells"] | ["T cell"] | 5361 |
Table 2: Author-reported cell types, Polly-curated cell types, and the number of cells of each cell type
Authors frequently supply cell type labels that do not follow ontology standards, or that use abbreviations or marker gene names. Polly replaces these with standard ontology terms, and the table shows how closely the curated ontology terms align with the labels provided by the authors.
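The counts in Table 2 can be cross-checked against the dataset summary: they sum to 21,321, the reported number of cells. The sketch below copies the table into a dictionary (the variable and function names are hypothetical) and aggregates counts by curated label. In this dataset the mapping from author labels to curated terms is one-to-one, so aggregation changes nothing; it matters only when several author labels map to the same curated term.

```python
# Author label -> (Polly-curated label, number of cells), copied from Table 2.
CELL_TYPES = {
    "B cells": ("B cell", 4232),
    "Epithelial cells": ("epithelial cell", 3062),
    "Mast cells": ("mast cell", 195),
    "Myeloids": ("myeloid cell", 2311),
    "Stromal cells": ("stromal cell", 6160),
    "T cells": ("T cell", 5361),
}


def curated_frequencies(table):
    """Aggregate cell counts by curated label and return (totals, percentages)."""
    totals = {}
    for curated, n in table.values():
        totals[curated] = totals.get(curated, 0) + n
    grand_total = sum(totals.values())
    percentages = {label: 100 * n / grand_total for label, n in totals.items()}
    return totals, percentages


totals, pct = curated_frequencies(CELL_TYPES)
print(sum(totals.values()))  # 21321, matching "Number of cells" in the dataset table
```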
1. Expression of Marker Genes Across Cell Types
Figure 1: The dot plot shows, for specific marker genes across different cell types, the expression level (typically encoded by dot color intensity) and the prevalence, i.e. the fraction of cells expressing the gene (typically encoded by dot size).
Marker genes that are predominantly expressed in specific cell types validate the identified cell populations and help characterize and annotate them.
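The two quantities a dot plot such as scanpy's `sc.pl.dotplot` typically encodes can be computed directly: the mean expression of each gene within a group (dot color) and the fraction of the group's cells with nonzero expression (dot size). The sketch below uses a hypothetical helper and toy values; CD3E and MS4A1 are used only as familiar T-cell and B-cell markers. Passing cluster labels instead of cell types gives the per-cluster version.

```python
from collections import defaultdict


def dotplot_stats(expr, groups, genes):
    """Per (group, gene): mean expression and fraction of cells expressing the gene.

    expr: list of cells, each a list of (normalized) expression values aligned with genes.
    groups: each cell's cell type or cluster label.
    Returns {(group, gene): (mean_expression, fraction_expressing)}.
    """
    rows_by_group = defaultdict(list)
    for row, grp in zip(expr, groups):
        rows_by_group[grp].append(row)
    stats = {}
    for grp, rows in rows_by_group.items():
        for g, gene in enumerate(genes):
            values = [r[g] for r in rows]
            mean_expr = sum(values) / len(values)                      # dot color
            frac = sum(1 for v in values if v > 0) / len(values)       # dot size
            stats[(grp, gene)] = (mean_expr, frac)
    return stats


stats = dotplot_stats([[2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 1.0]],
                      ["T cell", "T cell", "B cell", "B cell"],
                      ["CD3E", "MS4A1"])
print(stats[("T cell", "CD3E")])  # (1.5, 1.0)
```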
2. Expression of Marker Genes Across Clusters
Figure 2: The dot plot shows, for specific marker genes across different clusters, the expression level (typically encoded by dot color intensity) and the prevalence, i.e. the fraction of cells expressing the gene (typically encoded by dot size).
This visualization aids in understanding the heterogeneity within the dataset and can hint at different cellular states or subtypes within a cell type.