Hi Hanyin,
The issue here is not the normalization but the number of genes. Measuring only ~800 genes is not a sufficient sample of the expression data to run any kind of pathway-level enrichment test (like GSEA) accurately. GSEA in particular is designed to run on the entire universe of expressed genes. So, unfortunately, the answer is no.
That said, setting aside the number of genes, I'm not sure NanoString quantification by itself would be a problem for ssGSEA. The important thing with ssGSEA is that the magnitude of a feature's expression is not correlated with the length of the feature. In short-read sequencing, longer genes produce more fragments; I don't think that's the case for NanoString data, which is, I believe, a more absolute count. So if you had some way to recover the information for the genes not assayed by NanoString (such as an imputation method, which unfortunately is not something we offer guidance on), I think it could probably work.
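To make the length point concrete, here is a toy illustration with made-up numbers. It assumes, as suggested above, that fragment-based (short-read) counts scale with abundance times feature length, while probe-based (NanoString-like) counts carry no length term. Because ssGSEA ranks genes within each sample, a length term alone can reorder genes. `rank_desc` and all gene names and values are hypothetical.

```python
# Toy illustration (hypothetical numbers): with fragment-based counting,
# observed counts scale with both transcript abundance and feature length,
# so within-sample ranks can reflect length rather than expression.

def rank_desc(values: dict[str, float]) -> list[str]:
    """Return gene names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)

# Hypothetical molecules per sample and feature lengths (bp).
abundance = {"GENE_A": 100.0, "GENE_B": 60.0, "GENE_C": 30.0}
length_bp = {"GENE_A": 1_000, "GENE_B": 4_000, "GENE_C": 2_000}

# Short-read-like counts: fragments roughly proportional to abundance * length.
fragment_counts = {g: abundance[g] * length_bp[g] for g in abundance}

# Probe-based (NanoString-like) counts: assumed here to be roughly
# proportional to abundance alone, with no length term.
probe_counts = dict(abundance)

print(rank_desc(probe_counts))     # ['GENE_A', 'GENE_B', 'GENE_C']
print(rank_desc(fragment_counts))  # ['GENE_B', 'GENE_A', 'GENE_C']
```

In this toy model GENE_B overtakes GENE_A purely because it is four times longer, which is the kind of length-driven reordering the reply is concerned about.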
Sorry we couldn’t be of more help here,
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "GenePattern Help Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
genepattern-he...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/genepattern-help/71bc18c9-a2d8-4a99-83f1-98000be17896n%40googlegroups.com.
Hi Hanyin,
That's correct. When running ssGSEA, the application subsets each gene set so that gene-set members that are not present in the sample are excluded from the analysis. Whether the remaining genes are a truly representative sample of a given gene set is a difficult statistical question, and one that would likely have to be addressed on a case-by-case basis.
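For intuition only, here is a simplified sketch of a rank-based single-sample enrichment score, modeled on the published ssGSEA definition as we understand it: genes are ranked within one sample, and the score sums the running difference between a rank-weighted cumulative distribution of gene-set members and an unweighted one of non-members. The weight exponent `alpha=0.25` matches the value the GSVA package uses for its ssGSEA method; the GenePattern ssGSEA module's normalization and defaults may differ, and this is not its code. The function name `ssgsea_like_score` and the gene names are hypothetical. Note the line that drops gene-set members missing from the data, which is the subsetting step described above.

```python
def ssgsea_like_score(expression: dict[str, float],
                      gene_set: list[str],
                      alpha: float = 0.25) -> float:
    """Unnormalized single-sample enrichment score (simplified sketch)."""
    # Rank genes within this one sample, highest value first.
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    # Subsetting: gene-set members not measured in the sample are dropped.
    members = set(gene_set) & set(genes)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the data but not cover all of it")

    # Rank-based weights: the top-ranked gene gets n**alpha, the last gets 1.
    weight = {g: (n - i) ** alpha for i, g in enumerate(genes)}
    total_in = sum(weight[g] for g in members)
    n_out = n - len(members)

    hit = miss = score = 0.0
    for g in genes:
        if g in members:
            hit += weight[g] / total_in   # weighted cumulative share of members
        else:
            miss += 1.0 / n_out           # cumulative share of non-members
        score += hit - miss               # sum the running difference
    return score


# Hypothetical 6-gene "panel"; GENE_X belongs to the gene set but was not measured.
expr = {"G1": 9.0, "G2": 8.0, "G3": 5.0, "G4": 3.0, "G5": 2.0, "G6": 1.0}
print(ssgsea_like_score(expr, ["G1", "G2", "GENE_X"]))  # positive: set sits near the top
print(ssgsea_like_score(expr, ["G5", "G6"]))            # negative: set sits near the bottom
```

Because GENE_X is silently dropped, the first call scores exactly the same as scoring `["G1", "G2"]`; nothing in the result signals that a third of the set was never measured, which is why coverage deserves a separate look.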
Hypothetically, say that an originally 500-member gene set had only 20 of its genes measured on your panel (the same 4% coverage that an 800-gene panel gives of a ~20,000-gene genome). Is testing 4% of a set really enough to infer altered regulation of a pathway? Maybe, if a large proportion of that 4% is extremely perturbed, but it's going to get really tricky really fast.
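The coverage arithmetic in this hypothetical, assuming a ~20,000-gene genome, works out as:

```latex
\frac{20}{500} = 0.04 = 4\% \qquad\qquad \frac{800}{20\,000} = 0.04 = 4\%
```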
GSEA is a robust algorithm, but 800 genes is a pretty small sample, and it's going to be difficult to draw conclusions from it. Datasets like this are not an area where we've done any suitability testing. There's also a concern about bias in which genes were selected for profiling on the panel.
If you’re going to try it, that publication seems like a reasonable guideline, but I would urge caution when interpreting the results.