ssGSEA Nanostring input expression data


Hanyin Wang

Feb 11, 2021, 6:48:34 AM
to GenePattern Help Forum
Hello,

I would like to use ssGSEA for my Nanostring project (800 targeted genes, 13 samples), and am wondering what would be the appropriate input expression data.

I am currently using the DESeq2-based workflow as suggested in this article, and am wondering if I could use normalized counts generated by DESeq2 as input data? I understand there has been discussion on appropriate input data for RNAseq, but I figured Nanostring would be different as it uses housekeeping genes for normalization, and the concept of TPM or FPKM does not apply to Nanostring per my understanding.

Many thanks.
Hanyin

Anthony Castanza

Feb 11, 2021, 12:55:16 PM
to genepatt...@googlegroups.com

Hi Hanyin,

 

The issue here is not the normalization but the number of genes. Measuring only ~800 genes does not sample the expression data well enough to run any kind of pathway-level enrichment test (like GSEA) accurately. GSEA in particular is designed to run on the entire universe of expressed genes. So unfortunately the answer is no.

 

That said, setting aside the number of genes, I’m not sure whether Nanostring data presents an appropriate quantification for ssGSEA. The important thing with ssGSEA is that the magnitude of a feature’s expression is not correlated with the length of the feature. In short-read sequencing, longer genes produce more fragments; I don’t think that’s the case for Nanostring data, which is, I believe, a more absolute count. So, if you had some way to recover the information for the genes not assayed by Nanostring (like an imputation method, which is unfortunately not something we offer guidance on), I think it could probably work.

 

Sorry we couldn’t be of more help here,

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/

--
You received this message because you are subscribed to the Google Groups "GenePattern Help Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genepattern-he...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genepattern-help/71bc18c9-a2d8-4a99-83f1-98000be17896n%40googlegroups.com.

Hanyin Wang

Feb 11, 2021, 8:59:30 PM
to genepatt...@googlegroups.com
Hey Anthony,

Thanks for the kind reply. This is tremendously helpful.

I previously had a discussion with Nanostring's bioinformatician about the best method to perform pathway analysis (if any). The bottom line is that any approach to pathway analysis for Nanostring needs to take into account that only a targeted gene panel was tested. Therefore GSEA (like you sharply pointed out) and hypergeometric-based tests (like g:Profiler), which are based on the entire universe of expressed genes, are not appropriate for Nanostring.

Per my understanding, ssGSEA will only calculate an enrichment score from genes that are within a gene set, therefore effectively screening out gene sets that do not include any targeted genes. When using the GSVA package in R to calculate ssGSEA, you can restrict the analysis to gene sets containing at least a minimum number of measured genes. Nanostring's own nSolver software used a similar approach to calculate pathway scores. I was debating between GSVA and ssGSEA, and after discussion with the author of GSVA it appears ssGSEA has better performance with a smaller sample size. There have been good publications analyzing Nanostring data using ssGSEA.
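
To make the minimum-size idea concrete, here is a minimal Python sketch of that kind of filter. It is not the GSVA or nSolver implementation; the function name `filter_gene_sets`, the threshold of 3, and the toy gene sets are all assumptions made up for illustration.

```python
def filter_gene_sets(gene_sets, panel_genes, min_overlap=3):
    """Keep gene sets with at least `min_overlap` genes on the panel.

    Hypothetical helper: each kept set is reduced to the genes that were
    actually measured, and sets below the threshold are dropped.
    """
    panel = set(panel_genes)
    kept = {}
    for name, genes in gene_sets.items():
        measured = sorted(set(genes) & panel)
        if len(measured) >= min_overlap:
            kept[name] = measured
    return kept


# Toy data (invented for illustration, not a real panel or MSigDB sets).
panel = ["CD3E", "CD4", "CD8A", "FOXP3", "IFNG"]
gene_sets = {
    "SET_T_CELL": ["CD3E", "CD4", "CD8A", "GENE_X"],
    "SET_SPARSE": ["IFNG", "GENE_Y", "GENE_Z"],
}

kept = filter_gene_sets(gene_sets, panel)
print(kept)
```

Only `SET_T_CELL` survives here, because `SET_SPARSE` has a single gene on the toy panel.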

Hope the above thoughts are reasonable. Thanks for guiding me on appropriate input data here.

Many thanks.
Hanyin



Anthony Castanza

Feb 11, 2021, 10:25:45 PM
to genepatt...@googlegroups.com

Hi Hanyin,

 

That’s correct. When running ssGSEA, the application performs a subsetting of the gene set such that the genes in the gene set that are not in the sample are not considered in the analysis. It’s a difficult statistical question whether or not the remaining genes are a truly representative sampling of a given gene set that would likely have to be addressed on a case-by-case basis.
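
As a rough illustration of the subsetting described above, the sketch below computes a simplified rank-weighted enrichment statistic for one sample, in the spirit of ssGSEA. It is not the GenePattern module's code: the function name `ssgsea_like_score`, the weighting exponent `alpha=0.25`, the lack of any score normalization, and the toy expression values are all assumptions.

```python
def ssgsea_like_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment statistic (illustrative only).

    expression: dict mapping gene -> expression value for ONE sample.
    gene_set:   iterable of gene names; genes absent from the sample
                are dropped before scoring.
    """
    # Subset the gene set to genes that were actually measured.
    members = set(gene_set) & set(expression)
    n = len(expression)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the sample without covering it")

    # Order genes from highest to lowest expression; the top gene gets rank n.
    ordered = sorted(expression, key=expression.get, reverse=True)
    weights = [(n - i) ** alpha if g in members else 0.0
               for i, g in enumerate(ordered)]
    total_weight = sum(weights)
    miss_step = 1.0 / (n - len(members))

    # Walk down the ranking and accumulate the gap between the weighted
    # "in set" step function and the unweighted "not in set" step function.
    hit_cdf = miss_cdf = score = 0.0
    for gene, weight in zip(ordered, weights):
        if gene in members:
            hit_cdf += weight / total_weight
        else:
            miss_cdf += miss_step
        score += hit_cdf - miss_cdf
    return score


# Toy sample (invented values); "NOT_ON_PANEL" is ignored by the subsetting.
sample = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 1.0, "E": 0.5, "F": 0.1}
high = ssgsea_like_score(sample, ["A", "B", "NOT_ON_PANEL"])
low = ssgsea_like_score(sample, ["E", "F"])
print(high > 0, low < 0)
```

The score only ever sees the measured members, so two very different gene sets that happen to share the same few panel genes get identical scores, which is the representativeness problem in a nutshell.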

 

Hypothetically, let’s say that of an originally 500-member gene set only 20 genes were analyzed on your panel (that’s the same percentage as the share of the genome assessed by the panel, assuming a 20,000-gene genome). Is testing 4% of a set really enough to infer altered regulation of a pathway? Maybe, if a large proportion of that 4% is extremely perturbed, but it’s going to get really tricky really fast.
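
The arithmetic behind that hypothetical can be checked in a few lines of Python. The numbers are the ones from the example; the helper name `coverage` and the assumption that panel genes are drawn at random from the genome are for illustration only.

```python
def coverage(measured: int, total: int) -> float:
    """Fraction of `total` genes that were actually measured."""
    return measured / total


# Numbers from the hypothetical above.
set_coverage = coverage(20, 500)        # gene set members on the panel
panel_coverage = coverage(800, 20_000)  # panel size vs. assumed genome size

# Members of a 500-gene set expected on the panel if the panel genes
# were a random draw from the genome (an assumption, not a given).
expected_members = 500 * panel_coverage

print(f"set coverage:     {set_coverage:.1%}")
print(f"panel coverage:   {panel_coverage:.1%}")
print(f"expected members: {expected_members:.0f}")
```

A targeted panel is not a random draw, so real overlaps may be much higher for sets related to the panel's theme and much lower for everything else.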

 

GSEA is a robust algorithm, but 800 genes is a pretty small sampling, and it’s going to be difficult to draw conclusions from it. Datasets like this are not really an area where we’ve done any suitability testing. There’s also a concern about biases in the genes selected for profiling in the panel.

If you’re going to try it, that publication seems like a reasonable guideline, but I would urge caution when interpreting the results.

Hanyin Wang

Feb 12, 2021, 12:12:10 AM
to genepatt...@googlegroups.com
Hey Anthony,

Thanks for the kind reply again. I hear your points. Very valid.

In my project, as the 800 genes in the targeted panel are selected immune-related genes, I am using the MSigDB C7 immunologic signature collection, which has 4800 gene sets. After requiring a minimum of 3 overlapping genes (from my input data) in each gene set, there are still 3800 gene sets available for analysis. So hopefully this would be less of a concern, but your points are very well taken. I will make sure to fully disclose the limitation in the publication.

Many thanks.
Hanyin
