Dataset information

This report has been verified by Polly as per framework v1.0.

Dataset ID: GSE144735_GPL24676_raw_polly_processed
Abstract: Immunotherapy for metastatic colorectal cancer is effective only for mismatch repair-deficient tumors with high microsatellite instability that demonstrate immune infiltration, suggesting that tumor cells can determine their immune microenvironment. To understand this cross-talk, we analyzed the transcriptome of 91,103 unsorted single cells from 23 Korean and 6 Belgian patients. Cancer cells displayed transcriptional features reminiscent of normal differentiation programs, and genetic alterations that apparently fostered immunosuppressive microenvironments directed by regulatory T cells, myofibroblasts and myeloid cells. Intercellular network reconstruction supported the association between cancer cell signatures and specific stromal or immune cell populations. Our collective view of the cellular landscape and intercellular interactions in colorectal cancer provides mechanistic information for the design of efficient immuno-oncology treatment strategies.
Description: Single cell 3' RNA sequencing of 6 Belgian colorectal cancer patients
Number of cells: 21321
Number of genes: 23855
Number of samples: 18
Organism: Homo sapiens
Tissue: Cecum, Colonic Mucosa, Colon Sigmoideum, Rectum, Colon Ascendens
Disease: Adenocarcinoma, Mucinous; Colorectal Neoplasms; Adenocarcinoma
Cell Lines: None
Cell Type: Epithelial Cell, Stromal Cell, B Cell, T Cell, Myeloid Cell, Mast Cell
Drug: None
Marker genes for cell type are available: True
Doublet detection method: scrublet
Normalization method: log1p: true; target_sum: none; scaling_applied: true; max_value: none; zero_center: false
Remove gene groups: none
Batch correction method and key: batch_removal_method: harmony; batch_key: sample
Regress covariates: none
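The normalization settings listed above can be read as a three-step pipeline. The following is a minimal, stdlib-only Python sketch under stated assumptions, not Polly's actual implementation: it assumes `target_sum: none` means each cell is rescaled to the median total count (the documented behavior of scanpy's `normalize_total` when `target_sum=None`), that `log1p` is the natural log of 1 + x, and that scaling with `zero_center: false` and `max_value: none` divides each gene by its standard deviation without centering or clipping. The function name `normalize_log_scale` is hypothetical.

```python
import math
import statistics


def normalize_log_scale(counts):
    """Hypothetical sketch: median-total normalization, log1p, unit-variance scaling.

    counts: list of cells, each a list of raw gene counts (cells x genes).
    """
    totals = [sum(cell) for cell in counts]
    # Assumption: target_sum: none -> rescale every cell to the median total
    # count of cells that have non-zero counts.
    target = statistics.median([t for t in totals if t > 0])
    normed = [[x * target / t if t > 0 else 0.0 for x in cell]
              for cell, t in zip(counts, totals)]
    # log1p: true -> natural log of (1 + x)
    logged = [[math.log1p(x) for x in cell] for cell in normed]
    # scaling_applied: true, zero_center: false, max_value: none ->
    # divide each gene by its sample standard deviation; no centering, no clipping.
    scaled = [row[:] for row in logged]
    for g in range(len(logged[0])):
        sd = statistics.stdev([row[g] for row in logged])
        if sd > 0:  # constant genes are left unscaled
            for row in scaled:
                row[g] /= sd
    return scaled
```

Because the data are not zero-centered, the scaled matrix stays non-negative, which keeps it compatible with sparse storage.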
1. Distribution of Key Quality Control Metrics

Figure 1: These violin plots display the distribution of quality control metrics for each cell. Metrics include the number of genes detected, total transcript counts, and the percentage of mitochondrial transcripts.

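A good-quality dataset would typically have a reasonable number of genes detected per cell and a moderate total transcript count. Cells with very few detected genes may be empty droplets or damaged cells, while cells with unusually high gene or transcript counts may be doublets. A high percentage of mitochondrial transcripts can indicate low-quality or dying cells. Please note: some datasets do not include mitochondrial genes (gene names prefixed with MT-), so the plot of mitochondrial transcript percentage may be empty.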
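The three per-cell metrics in Figure 1 can be computed as in the minimal sketch below, which assumes each cell is given as a hypothetical `{gene_name: count}` mapping and that mitochondrial genes are identified by the `MT-` prefix. The filtering thresholds (200 genes, 20% mitochondrial) are illustrative assumptions only, not the cutoffs used for this report.

```python
def qc_metrics(cell_counts, mito_prefix="MT-"):
    """Per-cell QC metrics from a hypothetical {gene_name: count} mapping."""
    n_genes = sum(1 for c in cell_counts.values() if c > 0)  # genes detected
    total = sum(cell_counts.values())  # total transcript (UMI) count
    mito = sum(c for g, c in cell_counts.items()
               if g.upper().startswith(mito_prefix.upper()))
    # With no MT- genes present, the percentage is simply 0.
    pct_mt = 100.0 * mito / total if total > 0 else 0.0
    return {"n_genes": n_genes, "total_counts": total, "pct_counts_mt": pct_mt}


def flag_low_quality(metrics, min_genes=200, max_pct_mt=20.0):
    """Flag a cell using hypothetical thresholds; tune these per dataset."""
    return metrics["n_genes"] < min_genes or metrics["pct_counts_mt"] > max_pct_mt
```

For example, a cell with 30 of its 100 counts on `MT-CO1` gets `pct_counts_mt` of 30.0 and would be flagged under these illustrative thresholds.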


2. UMAP visualization of cells colored by sample

Figure 2: UMAP embedding of all cells, colored by sample of origin, showing how the clustering pattern is distributed across samples.

If cells from the same sample cluster together, separately from cells of other samples, this may indicate batch effects. Ideally, cells from different samples should intermix and group by their biological characteristics rather than by their sample of origin, indicating that the data are free of significant batch effects and that the samples are comparable.
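One simple way to put a number on the sample mixing seen in a UMAP plot is to check, for each cell, what fraction of its nearest neighbors come from the same sample, and compare that with the fraction expected if samples were perfectly mixed. The sketch below is a simplified k-nearest-neighbor check written for illustration (it is not kBET, LISI, or Polly's own metric); the function name and the default `k` are assumptions, and brute-force distances make it suitable only for small examples.

```python
import math
from collections import Counter


def same_sample_fraction(coords, samples, k=5):
    """Return (observed, expected) mean fraction of k nearest neighbors sharing a cell's sample.

    coords: list of (x, y) UMAP coordinates; samples: sample label per cell.
    """
    n = len(coords)
    sizes = Counter(samples)
    observed = 0.0
    expected = 0.0
    for i in range(n):
        # Brute-force nearest neighbors, excluding the cell itself.
        dists = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        observed += sum(samples[j] == samples[i] for j in neighbours) / len(neighbours)
        # Under random mixing, a neighbor shares the sample with probability (size - 1) / (n - 1).
        expected += (sizes[samples[i]] - 1) / (n - 1)
    return observed / n, expected / n
```

An observed fraction well above the expected one suggests cells are grouping by sample, which is the visual pattern described above.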


3. Stacked barplot of cell types distributed across samples

Figure 3: The bar plot shows the distribution and abundance of cell types within each sample. Each color in a bar represents a different cell type, and the height of each colored segment indicates the number of cells of that type in the sample.

A uniform distribution of cell types across samples may suggest that the sample preparation and preprocessing methods were effective and that there was minimal bias or variation in the processing steps. However, if the experimental design deliberately enriches for a cell type in certain samples, a non-uniform distribution is also valid.
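The stacked bars in Figure 3 correspond to a sample-by-cell-type count table. The minimal sketch below builds that table as per-sample proportions and reports the largest gap between any sample's proportion and the pooled proportion, as one crude way to judge uniformity. Both function names are hypothetical, and the inputs are assumed to be parallel lists with one sample label and one cell-type label per cell.

```python
from collections import Counter, defaultdict


def composition_table(samples, cell_types):
    """Per-sample cell-type proportions, i.e. each bar rescaled to sum to 1."""
    counts = defaultdict(Counter)
    for sample, cell_type in zip(samples, cell_types):
        counts[sample][cell_type] += 1
    table = {}
    for sample, c in counts.items():
        n_cells = sum(c.values())
        table[sample] = {ct: n / n_cells for ct, n in c.items()}
    return table


def max_deviation(samples, cell_types):
    """Largest gap between any sample's cell-type proportion and the pooled proportion."""
    pooled = Counter(cell_types)
    total = len(cell_types)
    worst = 0.0
    for props in composition_table(samples, cell_types).values():
        for cell_type, n in pooled.items():
            worst = max(worst, abs(props.get(cell_type, 0.0) - n / total))
    return worst
```

A value near 0 means every sample has roughly the pooled composition; a large value is expected when the design enriches a cell type in some samples.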


4. Stacked barplot of clusters distributed across samples

Figure 4: The bar plot shows the distribution and abundance of clusters within each sample. Each color in a bar represents a different cluster, and the height of each colored segment indicates the number of cells from that cluster in the sample.

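Generally, a uniform distribution of clusters across samples suggests that there was minimal bias or variation in the processing steps. A cluster made up almost entirely of cells from a single sample may reflect either a batch effect or genuinely sample-specific biology.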
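Clusters dominated by one sample can be spotted with the normalized Shannon entropy of each cluster's sample composition: 1.0 means the cluster's cells are spread evenly over all samples, and 0.0 means they all come from one sample. This is a minimal sketch with a hypothetical function name; it assumes parallel lists of cluster and sample labels and ignores differences in sample size, so with unequal samples even perfect mixing scores below 1.0.

```python
import math
from collections import Counter, defaultdict


def cluster_sample_entropy(clusters, samples):
    """Normalized Shannon entropy of sample composition for each cluster."""
    n_samples = len(set(samples))
    by_cluster = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        by_cluster[cluster][sample] += 1
    result = {}
    for cluster, c in by_cluster.items():
        total = sum(c.values())
        h = -sum((n / total) * math.log(n / total) for n in c.values())
        # Divide by the maximum possible entropy, log(number of samples).
        result[cluster] = h / math.log(n_samples) if n_samples > 1 else 0.0
    return result
```

Clusters with entropy close to 0 are the ones worth inspecting for batch effects or sample-specific cell states.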


5. Stacked barplot of cell types distributed across clusters

Figure 5: The bar plot shows the distribution and abundance of cell types within each cluster. Each color in a bar represents a different cell type, and the height of each colored segment indicates the number of cells of that type in the cluster.

Generally, each cluster should contain only one cell type, which indicates accurate cell-type annotation. Corner cases arise when the authors have provided only a cell ID to cell-type mapping and no marker genes; these need to be rectified manually.
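The "one cell type per cluster" expectation can be checked with cluster purity: the fraction of a cluster's cells that carry its most common cell-type label. The minimal sketch below flags clusters whose purity falls below a cutoff; the function name and the 0.9 default are hypothetical choices for illustration, and the inputs are assumed to be parallel lists of cluster and cell-type labels.

```python
from collections import Counter, defaultdict


def cluster_purity(clusters, cell_types, min_purity=0.9):
    """Map each cluster to (majority cell type, purity, flagged_for_review)."""
    by_cluster = defaultdict(Counter)
    for cluster, cell_type in zip(clusters, cell_types):
        by_cluster[cluster][cell_type] += 1
    report = {}
    for cluster, c in by_cluster.items():
        top_type, top_n = c.most_common(1)[0]
        purity = top_n / sum(c.values())
        # Low-purity clusters are candidates for manual annotation review.
        report[cluster] = (top_type, purity, purity < min_purity)
    return report
```

Flagged clusters are the natural starting point for the manual rectification described above.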


6. Distribution of (a) Cell Counts, (b) Median Gene Counts, and (c) Median Mitochondrial Genes across Samples

Figure 6a: The bar plot visualizes the total count of cells detected in each sample. Each bar corresponds to a different sample, with its height representing the number of cells.

This plot shows how cells are distributed across samples. Wide variation in cell numbers across samples might indicate inconsistencies in cell isolation, sample preparation, or sequencing depth, whereas consistent cell counts across samples suggest a more uniform sampling process.
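The spread of cell counts in Figure 6a can be summarized with a single number such as the coefficient of variation (CV), the standard deviation of per-sample counts divided by their mean. The sketch below is a minimal illustration with a hypothetical function name, assuming one sample label per cell; it is not a threshold-based test used by this report.

```python
import statistics
from collections import Counter


def cell_count_summary(samples):
    """Cells per sample plus the coefficient of variation (CV) of those counts."""
    counts = Counter(samples)
    values = list(counts.values())
    mean = statistics.mean(values)
    # Population standard deviation over samples, relative to the mean count.
    cv = statistics.pstdev(values) / mean
    return dict(counts), cv
```

For instance, two samples with 50 and 150 cells give a CV of 0.5, whereas equal counts give 0.0.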