GSEA rank file


sarah awan

Aug 22, 2022, 12:23:10 AM
to gsea-help
Hi team,

I would like to know which method is correct for making the rnk file and then analysing it in pre-ranked GSEA. I usually make the rnk file using the formula (=SIGN(logFC)- (Pvalue)), but I have also found that some people rank their genes based on thresholds such as log fold change. When I ran the analysis with the rnk formula and with the threshold, the results were ordered differently. Now I am confused about which method to use, as I want to select one of the pathways for my downstream analysis.

Can you please help me out?

Thanks,
Sarah

Anthony Castanza

Aug 22, 2022, 3:48:55 PM
to gsea...@googlegroups.com

Hi Sarah,

The default metric that GSEA uses when you provide it a complete dataset to rank is the signal-to-noise ratio: the difference of group means divided by the sum of the group standard deviations. This metric is effectively the change scaled by the coherence of the change for each gene. When preranking your data, we don’t really give a specific recommendation for how to do so, as it isn’t something we can reasonably test.
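As a rough illustration of the signal-to-noise definition above (difference of group means over the sum of group standard deviations), a minimal Python sketch could look like the following. The function name and the expression values are made up for illustration, and GSEA's own implementation may adjust the standard deviations in ways this sketch does not attempt to reproduce.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference of group means divided by the sum of the group
    (sample) standard deviations, for a single gene."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


# Hypothetical expression values for one gene in two phenotypes.
control = [4.0, 6.0, 8.0]
coherent_treated = [10.0, 12.0, 14.0]  # same mean shift, tight replicates
noisy_treated = [2.0, 12.0, 22.0]      # same mean shift, scattered replicates

coherent_score = signal_to_noise(coherent_treated, control)
noisy_score = signal_to_noise(noisy_treated, control)
```

Both treated groups have the same mean shift relative to the control, but the scattered replicates yield a smaller score, which is the "change scaled by coherence" behaviour described above.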

Signing the -log pValue with the direction of the fold change will give you an enrichment result that is computed with respect to how confident you are in the changes for the genes in the set (i.e. sets with higher-confidence compounded changes will have larger enrichment scores). Computing on the basis of log fold change will give you an enrichment result that is computed with respect to the magnitude of change (i.e. sets made up of genes with larger expression shifts will have larger enrichment scores). Ideally, what you would want is a metric where sets with genes having larger expression changes are more enriched, but you would want to down-weight the sets where the change is a result of a low-confidence effect.
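To make the difference between the two rankings concrete, here is a small Python sketch. The gene names, fold changes, and p-values are invented, and `signed_significance` is a hypothetical helper name; it assumes every p-value is strictly greater than 0 and uses a base-10 logarithm.

```python
import math


def signed_significance(log2fc, pvalue):
    """-log10(p) carrying the sign of the fold change.
    Assumes 0 < pvalue <= 1."""
    return math.copysign(-math.log10(pvalue), log2fc)


# Hypothetical differential-expression results: gene -> (log2FC, p-value).
genes = {
    "GENE_A": (3.0, 0.04),   # large change, modest confidence
    "GENE_B": (0.8, 1e-8),   # small change, very high confidence
    "GENE_C": (-2.0, 0.001), # down-regulated
}

# Rank by magnitude of change.
by_log2fc = sorted(genes, key=lambda g: genes[g][0], reverse=True)

# Rank by confidence, signed by direction.
by_signed_p = sorted(genes, key=lambda g: signed_significance(*genes[g]), reverse=True)
```

With these made-up values, GENE_A leads the log2FC ranking while GENE_B leads the signed-significance ranking, which is why the two rank files can produce differently ordered enrichment results.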

Either way, you wouldn’t want to apply a threshold. You would want to include both genes with non-significant p-values and genes that aren’t above some log2FC threshold; you would just want them to contribute minimally to the score.
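A minimal sketch of writing an unfiltered rank file, assuming the usual two-column, tab-delimited .rnk layout (gene identifier, then score). The `write_rnk` helper and the gene names are hypothetical; the point is only that every gene is written, including those with scores near zero.

```python
import csv
import io


def write_rnk(scores, handle):
    """Write every gene and its score, highest score first.
    No significance or fold-change cutoff is applied."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for gene, score in ordered:
        writer.writerow([gene, f"{score:.6g}"])


# Hypothetical scores; GENE_B barely changes but is still kept.
scores = {"GENE_A": 2.5, "GENE_B": 0.01, "GENE_C": -1.75}

buffer = io.StringIO()  # stand-in for open("ranked_genes.rnk", "w", newline="")
write_rnk(scores, buffer)
rnk_text = buffer.getvalue()
```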

Some people have suggested using a metric like log2(FC)*-log10(pValue), to get that “best of both worlds” metric when it isn’t possible to use GSEA’s internal signal-to-noise ratio ranking, but this isn’t something we’ve rigorously tested.
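The combined metric mentioned above can be sketched in Python as follows. This is only an illustration of the formula log2(FC)*-log10(pValue), not a tested recommendation; `combined_score` and its `min_p` floor (used here to avoid taking the log of a p-value reported as 0) are assumptions added for this example.

```python
import math


def combined_score(log2fc, pvalue, min_p=1e-300):
    """log2(FC) * -log10(p). The sign comes from the fold change, and
    low-confidence genes are shrunk towards zero.
    min_p is an assumed floor for p-values reported as exactly 0."""
    return log2fc * -math.log10(max(pvalue, min_p))


# Hypothetical genes: (log2FC, p-value).
large_but_uncertain = combined_score(3.0, 0.5)
small_but_confident = combined_score(0.8, 1e-8)
down_regulated = combined_score(-2.0, 0.001)
```

With these invented inputs, the gene with a large but poorly supported change ends up with a smaller score than the gene with a modest but well-supported change, and the down-regulated gene keeps a negative score.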

Does that make sense?

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/eecf5ce7-f7ec-4538-982e-2f736b0beae6n%40googlegroups.com.

sarah awan

Aug 22, 2022, 7:15:32 PM
to gsea...@googlegroups.com
Hi Anthony,

Thanks for your clarification. Could you tell me, in simple terms, which method is more acceptable for the analysis? Can I use a rank file with log fold change, or one made with the formula (logfoldchange-pvalue)?
I am confused and can't choose the pathway for my next analysis. I have discussed this with many people, but they all have different opinions. If you could specify which method is most reliable, I will stick with that method for my future analyses.

Thanks,
Sarah

sarah awan

Aug 23, 2022, 3:00:37 AM
to gsea...@googlegroups.com
Hi Anthony,

I would also like to add that I am using RNA-seq data with “no_collapse”, and I am using the GSEAPreranked method. In this case, which method is appropriate for making the rank file? Should I go for log fold change or “logfold-pvalue”? Kindly give me your feedback.

Thanks,
Sarah

Anthony Castanza

Aug 23, 2022, 5:32:17 PM
to gsea...@googlegroups.com
Hi Sarah,

We recommend always using the collapse method with one of our provided chip files to ensure that the symbols used in the dataset match the symbols used in MSigDB (these can change across time and MSigDB's files are designed to ensure compatible symbols so no genes get omitted when they shouldn't be).

As to ranking, we don't give specific recommendations for how to rank genes for GSEA Preranked. It is generally considered an advanced mode, and we direct users towards the built-in ranking implementation that has been rigorously tested with GSEA.

Of the two, however (ranking by Log2(FC) or ranking by signed significance), I would generally say that Log2(FC) is closer to the intent of the GSEA algorithm. However, without adjusting for coherence, as in the signal-to-noise metric, Log2FC can be rather noisy, with some arbitrarily large changes as a result of outliers. Log2FC should also not be used as a threshold (i.e. no cutoffs should be applied on its basis), just as a ranking metric. The same goes for signed significance (all genes should be provided, not just those with, for example, p<0.05).

The best thing to do might actually be to perform GSEA both ways (log2FC and signed pvalue), then load both sets of results into EnrichmentMap (https://apps.cytoscape.org/apps/enrichmentmap), and identify sets selected as significant by both metrics. EnrichmentMap supports a mode where two datasets from GSEA can be loaded in simultaneously and co-displayed. That would probably be the most rigorous way to do it.
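EnrichmentMap handles this comparison visually, but the underlying idea of keeping only the sets that both rankings agree on can be sketched in plain Python. Everything below is hypothetical: the set names, the (NES, FDR) values, the `shared_significant_sets` helper, and the 0.25 FDR cutoff, which should be replaced with whatever threshold the analysis actually uses.

```python
def shared_significant_sets(run_a, run_b, fdr_cutoff=0.25):
    """Return gene sets that pass the FDR cutoff in both runs and are
    enriched in the same direction (same sign of NES).
    Each run maps set name -> (NES, FDR q-value)."""
    shared = []
    for name in run_a.keys() & run_b.keys():
        nes_a, fdr_a = run_a[name]
        nes_b, fdr_b = run_b[name]
        same_direction = (nes_a > 0) == (nes_b > 0)
        if fdr_a < fdr_cutoff and fdr_b < fdr_cutoff and same_direction:
            shared.append(name)
    return sorted(shared)


# Hypothetical results from the two GSEA Preranked runs.
log2fc_run = {"SET_X": (1.9, 0.01), "SET_Y": (1.5, 0.20), "SET_Z": (-1.7, 0.03)}
signed_p_run = {"SET_X": (2.1, 0.02), "SET_Y": (1.1, 0.60), "SET_Z": (1.6, 0.04)}

consensus = shared_significant_sets(log2fc_run, signed_p_run)
```

In this invented example only SET_X survives: SET_Y misses the cutoff in one run, and SET_Z is significant in both runs but in opposite directions.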

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

sarah awan

Aug 24, 2022, 10:28:56 PM
to gsea...@googlegroups.com
Thanks Anthony for your guidance.

Cheers,
Sarah
