GSEA with multiple vs. single gene sets: Different FDR values

Irina Heggli

May 3, 2021, 7:42:02 AM

to gsea-help

Hi all,

I ran GSEA on my dataset and noticed that the FDR for a gene set varies quite a lot depending on whether I test that gene set on its own or together with multiple other gene sets. Which FDR value should I trust more?

Thank you very much for your help.

Irina

Anthony Castanza

May 3, 2021, 1:15:14 PM

to gsea...@googlegroups.com

Hi Irina,

All of GSEA’s calculations are empirical: we calculate a null distribution from the input data and the input gene sets and base all calculations on that. The FDR is calculated across all permutations of all gene sets tested, so this number will change substantially depending on whether it is estimating a false discovery rate against just one set’s null distribution permutations or against a bunch of other gene sets and their permutations. We generally recommend running GSEA using the lowest-level applicable collection or sub-collection, and using the values from that analysis rather than running individual gene sets.

We’re actually planning to disable reporting of the FDR when single gene sets are run as it doesn’t provide any added value over the more directly applicable NOM pValue calculation.
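As a rough illustration of why the two runs disagree, here is a minimal Python sketch of a pooled-null FDR for a positive normalized enrichment score (NES). It is a simplification under stated assumptions, not GSEA's actual code: the helper name `fdr_q` and the toy score lists are invented for the example, and the q-value is taken as the null tail fraction divided by the observed tail fraction.

```python
def fdr_q(nes_star, observed_nes, pooled_null_nes):
    """Simplified FDR q-value for a positive NES (hypothetical helper).

    q = (fraction of positive null scores >= nes_star)
        / (fraction of positive observed scores >= nes_star)
    """
    null_pos = [x for x in pooled_null_nes if x >= 0]
    obs_pos = [x for x in observed_nes if x >= 0]
    if not null_pos or not obs_pos:
        return float("nan")
    null_frac = sum(x >= nes_star for x in null_pos) / len(null_pos)
    obs_frac = sum(x >= nes_star for x in obs_pos) / len(obs_pos)
    if obs_frac == 0:
        return float("nan")
    return min(1.0, null_frac / obs_frac)


# Toy numbers, invented for illustration only.
set_null = [-1.0, 0.5, 1.0, 2.0]                # permutation scores of one set
pooled_null = set_null + [-0.8, 0.3, 0.7, 0.9]  # plus permutations of other sets

# Single-set run: the only observed score is the set's own.
single = fdr_q(1.8, [1.8], set_null)
# Multi-set run: observed scores and null permutations of all sets are pooled.
multi = fdr_q(1.8, [2.0, 1.9, 1.8, 0.1], pooled_null)
print(single, multi)
```

In this simplified model a single-set run always has a denominator of 1, so the value depends only on that set's own permutations. In a multi-set run both the pooled null and the observed scores of the other sets enter the ratio, which is why the same set can get a different FDR.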

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/

Irina Heggli

May 4, 2021, 6:24:18 AM

to gsea-help

Dear Anthony,

Thank you very much for your answer. How many gene sets would you recommend including in a run? You said to use the lowest-level applicable collection or sub-collection. However, I found an interesting gene set in a paper, so how many other gene sets would I need to combine it with?

Thank you.

Best,

Irina

Anthony Castanza

May 4, 2021, 1:00:29 PM

to gsea...@googlegroups.com

Hi Irina,

You can test single sets, and this is common for cases such as yours where they’re from papers; you should just disregard the FDR value, as it isn’t meaningful in single-set mode. When running in multi-set mode the FDR is a global calculation, so you can run it with however many other sets you want; you just need to keep in mind that the value given is calculated against the learned null distribution those gene sets produce. If you’re looking to calculate a better “global” FDR, you could include your set with MSigDB’s C2:CP, which is our collection of curated gene sets from publications, or with something like Reactome’s gene sets, which would give very good coverage of possible pathways to build a robust null distribution. However, MSigDB’s versions would only work for you if your data is human; otherwise you’d also need to orthology-convert the gene set you’ve found.
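One way to combine a set from a paper with an existing collection is to append it to the collection's GMT file. A minimal sketch, assuming the collection is a standard tab-separated GMT file (one set per line: name, description, then gene symbols); the function names, the default description, and any file names are hypothetical:

```python
from pathlib import Path


def read_gmt(path):
    """Parse a GMT file into {set_name: (description, [genes])}."""
    gene_sets = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, genes = fields[0], fields[1], fields[2:]
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


def write_gmt(gene_sets, path):
    """Write {set_name: (description, [genes])} back out as GMT."""
    lines = ["\t".join([name, desc, *genes])
             for name, (desc, genes) in gene_sets.items()]
    Path(path).write_text("\n".join(lines) + "\n")


def add_custom_set(collection_path, out_path, name, genes,
                   description="custom set from a paper"):
    """Append one custom gene set to a collection; return the new set count."""
    gene_sets = read_gmt(collection_path)
    if name in gene_sets:
        raise ValueError(f"gene set name already present: {name}")
    # Drop duplicate gene symbols while keeping their original order.
    gene_sets[name] = (description, list(dict.fromkeys(genes)))
    write_gmt(gene_sets, out_path)
    return len(gene_sets)
```

A call such as `add_custom_set("collection.gmt", "combined.gmt", "PAPER_SET", ["IL6", "TNF"])` (hypothetical file and set names) would write a combined file that can then be supplied as the gene set database for the run. The gene symbols in the custom set need to be in the same namespace and species as the collection.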

Irina Heggli

May 4, 2021, 3:03:37 PM

to gsea-help

Hi Anthony,

Perfect, thank you. One last question: can the p-value still be trusted when running a single gene set?

Best,

Irina

Anthony Castanza

May 4, 2021, 3:04:47 PM

to gsea...@googlegroups.com

Yes, each p-value is calculated for a single set relative to its permutation distribution.
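For illustration, a per-set empirical p-value of this kind can be sketched in a few lines of Python. This is a simplified sketch rather than GSEA's implementation: the function name `nominal_p` and the toy scores are invented, and it assumes only the side of the permutation distribution with the same sign as the observed score is used.

```python
def nominal_p(observed_es, null_es):
    """Empirical p-value of one set against its own permutation scores.

    Only null scores with the same sign as the observed score are used.
    """
    if observed_es >= 0:
        same_sign = [x for x in null_es if x >= 0]
        extreme = sum(x >= observed_es for x in same_sign)
    else:
        same_sign = [x for x in null_es if x < 0]
        extreme = sum(x <= observed_es for x in same_sign)
    if not same_sign:
        return float("nan")
    return extreme / len(same_sign)


null_scores = [-0.6, -0.4, 0.2, 0.3, 0.5, 0.8]  # invented permutation scores
print(nominal_p(0.5, null_scores))
print(nominal_p(-0.5, null_scores))
```

Nothing in this calculation refers to any other gene set, so the value does not change when more sets are added to the run.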

-Anthony
