Leading edge analysis after preranked GSEA


Johann Shane Tian

May 6, 2019, 2:18:54 AM
to gsea...@googlegroups.com
Hi all,

I have a question about the topic in the subject line. I ran a preranked GSEA for a colleague, and the results had FDR > 0.25. According to her, a further analysis, the leading edge analysis, can check whether hits with FDR > 0.25 can still be considered reliable. Her qRT-PCR results were consistent with her hypothesis, but the preranked GSEA results were not. She would therefore like to "salvage" the information and include her GSEA results in some form.

As such, can someone guide me on (1) what the leading edge analysis is roughly about, (2) how I should conduct it after the preranked GSEA (I vaguely remember that it is only available for standard GSEA, not preranked), and (3) if leading edge analysis is not the way to go, whether there is any other way to justify including hits with FDR > 0.25?

Thank you.

Regards,
Johann Shane Tian


Anthony Castanza

May 6, 2019, 10:36:22 AM
to gsea-help
Hello Johann,

Leading edge analysis examines the overlap between the "leading edge" subsets of the enriched gene sets: how much the sets share the genes that drive their enrichment, and which genes contribute to the enrichment of multiple sets. Running GSEA in preranked mode should not (I believe) affect your ability to run a leading edge analysis in the GSEA application.
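To make the overlap idea concrete, here is a minimal Python sketch, not the GSEA application's own code. It assumes the leading-edge genes of each set have already been exported into a dictionary; the set names, gene lists, and function names are hypothetical, and the Jaccard index is used here as one common way to score overlap between two sets.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leading_edge_overlap(leading_edges):
    """Return pairwise set overlap and how many sets each gene appears in.

    leading_edges maps a gene set name to its leading-edge genes
    (hypothetical input, e.g. parsed from a GSEA report).
    """
    pairwise = {
        (s1, s2): jaccard(leading_edges[s1], leading_edges[s2])
        for s1, s2 in combinations(sorted(leading_edges), 2)
    }
    gene_counts = Counter(
        gene for genes in leading_edges.values() for gene in set(genes)
    )
    return pairwise, gene_counts


# Hypothetical leading-edge subsets for three enriched gene sets.
leading_edges = {
    "SET_A": ["TP53", "MYC", "CDK1"],
    "SET_B": ["MYC", "CDK1", "CCNB1"],
    "SET_C": ["EGFR"],
}

pairwise, gene_counts = leading_edge_overlap(leading_edges)
for pair, score in pairwise.items():
    print(pair, round(score, 2))
print(gene_counts.most_common(3))
```

Genes with a high count are the ones contributing to several sets at once, which is the kind of pattern the leading edge analysis is meant to surface.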

That said, GSEA-preranked results are in general less reliable than standard GSEA results. This stems from the need to use gene-set-based permutation testing rather than phenotype-based permutation testing to generate the null distribution and to correct for multiple hypothesis testing. As such, we recommend the FDR < 0.25 threshold only with standard GSEA's phenotype permutation mode, and a more typical < 0.05 threshold when running in other modes. This, along with more explanation of the leading edge analysis, is described in our user guide: http://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html (see "Interpreting GSEA Results").

If possible, assuming a sufficient number of samples, I would recommend that your colleague re-run her GSEA using the underlying gene expression matrix in standard GSEA mode with phenotype-based permutation testing. If there are questions about how to go about this, we'd be happy to assist.

-Anthony

Anthony S. Castanza, PhD
Curator, MSigDB
Mesirov Lab, Department of Medicine
University of California, San Diego

Johann Shane Tian

May 7, 2019, 9:26:26 AM
to gsea...@googlegroups.com
Hi Dr Anthony,

Thank you for the mail. So a leading edge analysis is another, independent test that shows overlapping enriched genes across "selected" hits? What I mean is: can hits with a high FDR still have several enriched genes that overlap between them? In such a case, is it possible to state that these enriched genes are important for these gene set hits?

I used preranked mode because I am not familiar with the standard GSEA method. If I am not mistaken, standard GSEA requires a phenotype input, which should come from microarray data? My colleague has no data that would lend itself to the standard analysis (I believe).

I also noticed that the preranked analysis sets the FDR threshold to < 0.25 by default; I do not remember reading that preranked analysis should use < 0.05. If < 0.05 is the recommendation for preranked analysis, is it still possible to place confidence in hits with FDR > 0.05?

Thank you for your guidance.

Regards,
Johann


Anthony Castanza

May 13, 2019, 11:04:30 AM
to gsea-help
Hi Johann, 

Apologies for the delay getting back to you.
Yes, genes can overlap between sets with significant and non-significant FDRs. In fact, this overlap can help explain why a set may have a significant P-value but not a significant FDR.

The standard GSEA method requires a phenotype input of some type (e.g. Knockout vs. Control). For RNA-seq data, for example, this can be generated manually according to the .cls file format documentation: http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#Phenotype_Data_Formats
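As a rough illustration of generating such a file by hand, the Python sketch below builds the text of a categorical .cls file from a list of per-sample labels. It assumes the three-line categorical layout from the data formats page (a header with the sample count, class count, and a literal 1; a "#" line naming the classes; then one label per sample in the same order as the expression matrix columns). The function name, sample labels, and output file name are hypothetical.

```python
def make_cls(sample_labels):
    """Build categorical .cls text from one phenotype label per sample.

    sample_labels must follow the column order of the expression matrix.
    """
    # Keep classes in order of first appearance, so the first class named
    # on line 2 matches the first label used on line 3.
    classes = list(dict.fromkeys(sample_labels))
    if len(classes) < 2:
        raise ValueError("need at least two phenotype classes")
    for label in classes:
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"label must be non-empty without spaces: {label!r}")

    header = f"{len(sample_labels)} {len(classes)} 1"
    class_line = "# " + " ".join(classes)
    label_line = " ".join(sample_labels)
    return "\n".join([header, class_line, label_line]) + "\n"


# Hypothetical design: three knockout and three control samples.
labels = ["Knockout"] * 3 + ["Control"] * 3
cls_text = make_cls(labels)
print(cls_text, end="")

# Writing it out is left to the caller, for example:
# with open("phenotypes.cls", "w") as handle:
#     handle.write(cls_text)
```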

I don't know what type of data you're working with, or how many samples you have in each group, so I can't really say whether this is appropriate for you. Generally, though, the results from standard mode are superior to preranked if you have enough data to run phenotype permutation testing rather than gene set permutation testing.

There is nothing inherently wrong with looking at potential hits that have a significant P-value but an FDR between 0.05 and 0.25, but remember that these "hits" are likely better explained by other gene sets in the enrichment, and confidence in their specific relevance is low. Remember that GSEA, like any kind of pathway or gene set analysis, should be considered a hypothesis-generating tool and that the biological relevance of the hits should be validated with independent lines of evidence.
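One way to keep these tiers apart when reading a preranked report is sketched below in Python. This is only an illustration: the rows are made-up values, the column names "NAME", "NOM p-val", and "FDR q-val" are assumed to match the exported report, the 0.05 P-value cutoff is an assumption, and the tier names are hypothetical.

```python
def classify_hit(nom_p, fdr_q, p_cut=0.05, fdr_strict=0.05, fdr_loose=0.25):
    """Sort a gene set into a tier based on nominal P-value and FDR.

    confident:   FDR below the strict cutoff.
    exploratory: nominal P-value is significant and FDR lies between the
                 strict and loose cutoffs; worth a look, low confidence.
    unsupported: everything else.
    """
    if fdr_q < fdr_strict:
        return "confident"
    if nom_p < p_cut and fdr_q < fdr_loose:
        return "exploratory"
    return "unsupported"


# Hypothetical report rows; real values would come from the GSEA output.
report = [
    {"NAME": "SET_A", "NOM p-val": 0.001, "FDR q-val": 0.01},
    {"NAME": "SET_B", "NOM p-val": 0.02, "FDR q-val": 0.18},
    {"NAME": "SET_C", "NOM p-val": 0.20, "FDR q-val": 0.60},
]

labels = {
    row["NAME"]: classify_hit(row["NOM p-val"], row["FDR q-val"])
    for row in report
}
for name, tier in labels.items():
    print(name, tier)
```

Sets in the exploratory tier are candidates for follow-up with independent evidence, not results to report on their own.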

-Anthony

Anthony S. Castanza, PhD
Curator, MSigDB
Mesirov Lab, Department of Medicine
University of California, San Diego
