ssGSEA for one sample

Abdullah Naveed

Nov 11, 2021, 12:10:10 PM

to gsea-help

Dear GSEA Team,
I had a question and was hoping to get some help.

I have gene sets from 2 clusters of fibroblasts that were identified from single-cell RNA sequencing. We then tried sorting these 2 fibroblast populations in vitro and sent them for bulk RNA sequencing. We need to determine whether our sorting technique works, so I want to use the gene signature of each cluster and check its enrichment score in each of our bulk RNA samples individually.

I know I need to use ssGSEA for this, but it also requires 2 samples. We are not sure which cluster our bulk RNA aligns with, so how can I go about answering this question?

Would appreciate any advice or a link to a tutorial if this question has already been answered.

Thank you!

Anthony Castanza

Nov 11, 2021, 1:37:05 PM

to gsea...@googlegroups.com

Hello,

For this, I would probably take the gene-level TPM values for each of the bulk sequencing samples and format them into a GCT file. If you have multiple sorting runs resulting in multiple bulk samples, you would add each one as its own column in the GCT file.
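As an illustration of the GCT step, here is a minimal Python sketch that writes a gene-by-sample TPM table in the GCT v1.2 layout (a `#1.2` line, a row/column count line, a header row, then one row per gene). The function name `write_gct`, the gene symbols, the sample names, and the TPM values are all hypothetical placeholders, not anything from the actual dataset.

```python
import io

def write_gct(tpm, samples, handle):
    """Write a gene-by-sample TPM table in GCT v1.2 layout.

    tpm: dict mapping gene symbol -> list of TPM values, one per sample.
    samples: list of sample names, in the same order as the values.
    """
    handle.write("#1.2\n")
    # Second line: number of genes (rows), then number of samples (columns).
    handle.write(f"{len(tpm)}\t{len(samples)}\n")
    handle.write("\t".join(["Name", "Description"] + list(samples)) + "\n")
    for gene, values in tpm.items():
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        # The Description column is required; "na" is used as a filler here.
        row = [gene, "na"] + [str(v) for v in values]
        handle.write("\t".join(row) + "\n")

# Hypothetical toy data: three genes, two sorted bulk samples.
example_tpm = {
    "COL1A1": [850.2, 12.5],
    "PDGFRA": [40.0, 310.7],
    "ACTB": [1200.0, 1150.0],
}
example_samples = ["sort_A_rep1", "sort_B_rep1"]

buffer = io.StringIO()
write_gct(example_tpm, example_samples, buffer)
gct_text = buffer.getvalue()
print(gct_text)
```

To produce a real file, pass an open file handle (for example `open("bulk_tpm.gct", "w")`, a hypothetical file name) instead of the `StringIO` buffer.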

I would take your signatures from the single cell clusters and format them into a GMX or GMT file.
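A GMT file is the simpler of the two gene set formats: one tab-separated line per set, holding the set name, a description, and then the member genes. The sketch below writes two signatures that way; `write_gmt`, the set names, and the gene symbols are hypothetical and stand in for the real cluster markers.

```python
import io

def write_gmt(gene_sets, handle, description="na"):
    """Write gene sets as GMT: name, description, then genes, tab-separated."""
    for name, genes in gene_sets.items():
        # Drop duplicate symbols while keeping the original order.
        unique_genes = list(dict.fromkeys(genes))
        if not unique_genes:
            raise ValueError(f"gene set {name!r} is empty")
        handle.write("\t".join([name, description] + unique_genes) + "\n")

# Hypothetical signatures; replace with the marker genes of each cluster.
example_sets = {
    "CLUSTER1_SIGNATURE": ["COL1A1", "COL3A1", "DCN", "COL1A1"],
    "CLUSTER2_SIGNATURE": ["PDGFRA", "PI16", "DPP4"],
}

buffer = io.StringIO()
write_gmt(example_sets, buffer)
gmt_text = buffer.getvalue()
print(gmt_text)
```

The gene identifiers in the signatures need to match the identifiers used in the Name column of the GCT file, otherwise no genes will overlap.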

You would then run ssGSEA on the GCT file using the signatures in the GMT/X file.

This would give you a score for each set for each sample (i.e. a Cluster 1 score and a Cluster 2 score for each bulk sample).
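Assuming the ssGSEA result comes back as a GCT file with one row per gene set and one column per sample, a small reader like the one below turns it into a nested dictionary for further work. The function name `read_gct` and the scores in the example are made up for illustration.

```python
def read_gct(text):
    """Parse GCT v1.2 text into {row_name: {sample_name: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#1.2"):
        raise ValueError("not a GCT v1.2 file")
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])
    header = lines[2].split("\t")
    samples = header[2:2 + n_cols]  # skip the Name and Description columns
    table = {}
    for ln in lines[3:3 + n_rows]:
        fields = ln.split("\t")
        values = [float(x) for x in fields[2:2 + n_cols]]
        table[fields[0]] = dict(zip(samples, values))
    return table

# Hypothetical ssGSEA output: two signatures scored in three bulk samples.
example_result = "\n".join([
    "#1.2",
    "2\t3",
    "Name\tDescription\tsort_A_rep1\tsort_A_rep2\tsort_B_rep1",
    "CLUSTER1_SIGNATURE\tna\t5200.5\t4980.1\t-310.2",
    "CLUSTER2_SIGNATURE\tna\t-150.0\t220.4\t4700.9",
])

scores = read_gct(example_result)
for set_name, per_sample in scores.items():
    print(set_name, per_sample)
```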

The expectation would be that the Cluster 1 sorted cells score highly for the Cluster 1 signature and not the cluster 2 signature and vice versa.

If you have replicates, you could then do something like a Wilcoxon test on the scores to get a p-value for each signature for the sorting.
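To make the replicate comparison concrete, here is a sketch of an exact two-sided Wilcoxon rank-sum test in plain Python, comparing one signature's scores between the two sorted populations. It enumerates every possible group assignment, so it is only practical for small replicate counts. The function names and the scores are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_exact(group_a, group_b):
    """Exact two-sided rank-sum test; returns (U for group_a, p-value)."""
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a = len(group_a)
    expected = n_a * (len(pooled) + 1) / 2  # mean rank sum under the null
    observed = sum(ranks[:n_a])
    extreme = total = 0
    # Count label assignments whose rank sum is at least as far from the mean.
    for idx in combinations(range(len(pooled)), n_a):
        total += 1
        rank_sum = sum(ranks[i] for i in idx)
        if abs(rank_sum - expected) >= abs(observed - expected) - 1e-9:
            extreme += 1
    u_stat = observed - n_a * (n_a + 1) / 2
    return u_stat, extreme / total

# Hypothetical Cluster 1 signature scores in each sorted population.
cluster1_sorted = [5200.5, 4980.1, 5105.0]
cluster2_sorted = [-310.2, 150.7, 90.3]

u_stat, p_value = rank_sum_exact(cluster1_sorted, cluster2_sorted)
print(u_stat, p_value)
```

With three replicates per group, the smallest two-sided p-value this test can return is 0.1 (2 of the 20 possible assignments), so more replicates are needed before it can reach conventional significance thresholds.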

The specifications for the file types I mentioned above are here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats

Let me know if you have any other questions

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego


Abdullah Naveed

Nov 11, 2021, 1:46:28 PM

to gsea-help

Appreciate the help. I will let you know if I have more questions. Thank you so much.

Abdullah Naveed

Nov 16, 2021, 12:00:19 AM

to gsea-help

Hey,
Thank you for your help. I was able to run ssGSEA on my samples. I have 3 questions and would really appreciate any help.

1. How do I interpret these values? I know they are enrichment scores, but is there a further guide on this? Also, I kept the normalization method set to none in GenePattern when asked. I assume this was the right thing to do since I already had TPM values in my GCT file.
2. How do I check whether one sample was statistically enriched while others weren't? I know you mentioned the Wilcoxon test, but do you know of a guide I could use to run it? I have 20 samples and 5 different gene markers to test.
3. Is there a graphical way to show these results like we can for GSEA? I basically want to show these results to my lab. What would be a good way to do this?

Thank you again for your help. I know these are basic questions, but I am new to all this. Appreciate it a lot.

Anthony Castanza

Nov 16, 2021, 4:28:20 PM

to gsea-help

Hello,

Yes, leaving normalization to none was the correct option for TPM data.

The best option we have for any kind of statistical analysis of ssGSEA results is probably the ssGSEA_ROC module, which takes an ssGSEA result together with a binary phenotype assignment and computes ROC metrics (i.e. AUC and Matthews correlation coefficients) as well as both permutation-based (if there are enough samples) and Wilcoxon p-values for each gene set for a given binary comparison.
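For anyone unfamiliar with those metrics, the sketch below shows what AUC and the Matthews correlation coefficient measure when a single gene set's scores are compared against a binary phenotype. It is not a reimplementation of the ssGSEA_ROC module; the function names, scores, labels, and the cutoff of 1000 are all hypothetical.

```python
from math import sqrt

def roc_auc(scores, labels):
    """AUC: chance that a positive sample outscores a negative one (ties 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both phenotype classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def matthews_cc(scores, labels, threshold):
    """Matthews correlation when samples scoring >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
        else:
            tn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator

# Hypothetical Cluster 1 signature scores; label 1 = sorted as Cluster 1.
example_scores = [5200.5, 4980.1, 120.0, 150.7, -310.2, 90.3]
example_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(example_scores, example_labels)
mcc = matthews_cc(example_scores, example_labels, threshold=1000.0)
print(auc, mcc)
```

An AUC near 1 means the signature separates the two sorted populations almost perfectly, while an AUC near 0.5 means it does no better than chance.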

This ssGSEA_ROC module is available through GenePattern. We developed the technique for this publication: https://pubmed.ncbi.nlm.nih.gov/33428749/ to determine whether certain gene sets were good classifiers for glioblastoma subtypes. I think it might be a good fit here. Otherwise, there are options in GenePattern for constructing hierarchically clustered heatmaps using ssGSEA scores (the HierarchicalClustering and HierarchicalClusteringViewer modules), but those don't do any statistics; they'd just allow you to show how your samples cluster on the basis of their enrichment scores.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
