Hi Irina,
All of GSEA’s calculations are empirical: we calculate a null distribution from the input data and the input gene sets and base all calculations on that. The FDR is calculated across all permutations of all gene sets tested, so it will change substantially depending on whether it is computed against just one gene set’s own permutation null distribution or against a pool of many other gene sets and their permutations. We generally recommend running GSEA using the lowest-level applicable collection or sub-collection and using the values from that analysis, rather than running individual gene sets.
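To make the difference concrete, here is a minimal sketch of a GSEA-style FDR q-value for a positive score. It is a simplification and not GSEA's actual code: the function name `fdr_q` and all the numbers are made up for illustration, and the scores are assumed to be already normalized. The q-value is the fraction of pooled null scores at or above the target, divided by the fraction of observed scores at or above it.

```python
def fdr_q(observed, null, target):
    """Simplified GSEA-style FDR q-value for a positive score `target`.

    observed: observed normalized scores, one per gene set tested
    null:     permutation scores pooled over every gene set tested
    Assumes `target` is positive and is one of the observed scores.
    """
    # Only the positive side of each distribution is used for a positive score.
    null_pos = [s for s in null if s >= 0]
    obs_pos = [s for s in observed if s >= 0]
    null_frac = sum(s >= target for s in null_pos) / len(null_pos)
    obs_frac = sum(s >= target for s in obs_pos) / len(obs_pos)
    return min(1.0, null_frac / obs_frac)


# Hypothetical pooled permutation scores (illustrative values only).
null_scores = [0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.2, -0.5, -1.0]

# Single-set run: the only observed score is the set's own, so the
# denominator is 1 and q reduces to a tail fraction of the null.
q_single = fdr_q([1.9], null_scores, 1.9)

# Multi-set run: the same score is now judged against other observed sets.
# (In a real run the pooled null would also grow; it is reused here for brevity.)
q_multi = fdr_q([1.9, 0.4, 0.7, 1.1, -0.3], null_scores, 1.9)

print(q_single, q_multi)
```

With a single set the observed fraction is always 1, so the q-value carries essentially the same information as that set's nominal p-value; with several sets the same score can receive a very different q-value.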
We’re actually planning to disable FDR reporting when a single gene set is run, as it doesn’t provide any added value over the more directly applicable NOM p-value calculation.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
Hi Irina,
You can test single sets, and this is common in cases such as yours where the sets come from papers; you should just disregard the FDR value, as it isn’t meaningful in single-set mode. When running in multi-set mode the FDR is a global calculation, so you can run your set alongside however many other sets you want. You just need to keep in mind that the value given is calculated against the null distribution learned from those gene sets. If you’re looking to calculate a better “global” FDR, you could include your set with MSigDB’s C2:CP, our collection of curated gene sets from publications; something like Reactome’s gene sets would also give very good coverage of possible pathways to build a robust null distribution. However, MSigDB’s versions would only work for you if your data is human; otherwise you’d also need to orthology-convert the gene set you’ve found.
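As a rough sketch of what "including your set with a collection" could look like in practice, the snippet below appends one custom gene set to the text of a GMT file (tab-delimited: set name, description, then gene symbols). The helper names, set names, and genes are all hypothetical placeholders, and a real collection would be read from a downloaded .gmt file rather than an inline string.

```python
def gmt_line(name, description, genes):
    """Format one gene set as a GMT row: name, description, then genes."""
    return "\t".join([name, description, *genes])


def add_set_to_collection(collection_gmt_text, name, description, genes):
    """Return the GMT text with one extra gene set appended as the last row."""
    lines = [ln for ln in collection_gmt_text.splitlines() if ln.strip()]
    lines.append(gmt_line(name, description, genes))
    return "\n".join(lines) + "\n"


# Stand-in for the contents of a downloaded collection file (illustrative only).
collection = "SET_A\tna\tTP53\tMYC\nSET_B\tna\tEGFR\tKRAS\n"

combined = add_set_to_collection(
    collection,
    "MY_PAPER_SET",       # hypothetical name for the set taken from a paper
    "from publication",   # free-text description field
    ["BRCA1", "ATM"],     # placeholder gene symbols
)
print(combined)
```

The combined text can then be saved with a .gmt extension and supplied as the gene set database, so the custom set's FDR is computed alongside the rest of the collection.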
Yes, each p-value is calculated for a single set relative to its permutation distribution.
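As an illustration only, a simplified version of that per-set calculation might look like the following. The function name `nominal_p` and the scores are made up, and this is a sketch of the idea rather than GSEA's implementation: the observed enrichment score is compared with the same-sign portion of that set's own permutation scores.

```python
def nominal_p(observed_es, perm_es):
    """Fraction of same-sign permutation scores at least as extreme as observed.

    Assumes at least one permutation score shares the sign of `observed_es`.
    """
    # Keep only the side of the null that matches the observed score's sign.
    same_sign = [es for es in perm_es if (es >= 0) == (observed_es >= 0)]
    extreme = sum(abs(es) >= abs(observed_es) for es in same_sign)
    return extreme / len(same_sign)


# Hypothetical permutation scores for one gene set (illustrative values only).
perm_scores = [0.1, 0.2, 0.3, 0.4, 0.5, -0.2, -0.4, 0.6, 0.35, 0.25]

p_value = nominal_p(0.45, perm_scores)
print(p_value)
```

No other gene set enters this calculation, which is why the nominal p-value stays the same whether the set is run alone or as part of a collection (for the same permutations).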
-Anthony