Hello,
GSEA can show a (typically) small run-to-run variance in normalized scores and significance. This variance comes from the random number seed used to generate the permuted matrix that underlies the null distribution.
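To illustrate where this seed dependence comes from, here is a minimal toy sketch. It is not GSEA itself: a simple difference in group means stands in for the enrichment score, and the function name `perm_pvalue` and the example values are hypothetical. The permutation p-value is an estimate from a finite number of random shuffles, so changing only the seed shifts it slightly, while reusing the same seed reproduces it exactly.

```python
import random
import statistics


def perm_pvalue(group_a, group_b, n_perm, seed):
    """Two-sided permutation p-value for a difference in means.

    Hypothetical toy stand-in for a GSEA score; only the permutation logic is illustrated.
    """
    rng = random.Random(seed)  # the seed fixes which permutations are drawn
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign phenotype labels
        stat = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene, 7 samples per phenotype
a = [5.1, 4.8, 5.6, 5.9, 5.3, 6.0, 5.7]
b = [4.9, 4.4, 5.0, 4.7, 5.2, 4.6, 4.5]

for seed in (1, 2, 3):
    print(seed, round(perm_pvalue(a, b, n_perm=1000, seed=seed), 4))
```

With a well-behaved dataset the differences between seeds are small, which matches the "typically small" variance described above.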
This generally has only a small impact on scores when the dataset fits the expectations of the GSEA algorithm. The variance can be exacerbated by issues such as running phenotype permutation with fewer than 7 samples per phenotype, or running on a highly restricted dataset from which many of the genes that were not differentially expressed have been removed. Datasets should be normalized, all expressed genes should be provided to GSEA, and for datasets with fewer than 7 samples per phenotype, gene set permutation should be used instead of phenotype permutation.
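One way to see why small groups are a problem for phenotype permutation is to count how many distinct label assignments exist at all. The sketch below (the function name `distinct_labelings` is hypothetical) uses the binomial coefficient C(n_a + n_b, n_a). With a symmetric statistic, half of these assignments are mirror images of the other half, so the number of genuinely different null values is smaller still.

```python
from math import comb


def distinct_labelings(n_a, n_b):
    """Number of ways to assign phenotype A to n_a of the n_a + n_b samples."""
    return comb(n_a + n_b, n_a)


for n in range(3, 8):
    print(f"{n} vs {n} samples: {distinct_labelings(n, n)} distinct labelings")
```

This prints 20, 70, 252, 924 and 3432 for 3 through 7 samples per group. When a run requests more permutations than there are distinct labelings (1000 is a common choice), the same labelings are simply drawn repeatedly, so the null distribution stays coarse and the results become more sensitive to which labelings the seed happens to select.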
Can you tell us a little more about the dataset (number of genes and number of samples) and settings that you used?
With regard to normalization: yes, changing the normalization can have a large impact on the scores produced by GSEA. We generally recommend the normalized counts output from DESeq2, which is what the pipelines we've put together for the GenePattern platform use; however, I don't recall whether we did direct comparisons with the vst method.
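For reference, DESeq2's normalized counts are, by default, raw counts divided by per-sample size factors estimated with the median-of-ratios method. The following is a minimal sketch of that idea under simplifying assumptions (genes with a zero in any sample are simply skipped, and the function names are hypothetical). It is not DESeq2 itself; for real data, `counts(dds, normalized=TRUE)` in DESeq2 is the appropriate source.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Skip genes with a zero in any sample: their log geometric mean is undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's size factor is the median of its ratios to the per-gene geometric means.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's raw counts by its size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Hypothetical example: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [5, 10]]
print(size_factors(raw))
print(normalize(raw))
```

Here the two size factors differ by a factor of 2, and after normalization each gene has equal values in both samples, which is the depth correction the method is meant to provide.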
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
gsea-help+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/gsea-help/7402bc55-4454-4544-88f3-166cd20c607en%40googlegroups.com.