GSEA Pre-Ranked


Giorgia Silvestrini

Jan 31, 2022, 5:44:14 AM
to gsea-help
Hi everyone!!
I have a question about the PreRanked analysis. In the .rnk file that I provide as input to GSEA, should the value next to each gene name be the log2FC taken from the DESeq2 output, or a score computed as SIGN(log2FC) * -log10(adjusted p-value)? Would the two give different results?
Thanks,
Giorgia

Anthony Castanza

Jan 31, 2022, 7:09:27 PM
to gsea-help
Hi Giorgia,

There would likely be differences between these two ranking methods; however, we can't say which is "best" because we haven't tested the options extensively.
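To make the difference between the two candidate rankings concrete, here is a minimal sketch that scores the same genes both ways and formats each result as a two-column, tab-separated .rnk listing. The gene names and values are hypothetical, not real DESeq2 output. Dropping genes with a missing adjusted p-value and flooring adjusted p-values of exactly 0 are assumptions made here so the score stays finite; they are not GSEA or DESeq2 recommendations.

```python
import math

def signed_log_p(log2fc, padj, floor=1e-300):
    """sign(log2FC) * -log10(padj).

    The floor (an assumption of this sketch) avoids log10(0) when padj is 0.
    """
    sign = (log2fc > 0) - (log2fc < 0)
    return sign * -math.log10(max(padj, floor))

# Hypothetical DESeq2-style rows: (gene, log2FoldChange, padj)
rows = [
    ("GENE_A", 2.0, 1e-10),
    ("GENE_B", 3.5, 0.04),
    ("GENE_C", -0.5, 1e-6),
    ("GENE_D", -4.0, 0.2),
    ("GENE_E", 1.0, None),  # missing padj: dropped below (assumption)
]
usable = [r for r in rows if r[2] is not None]

def rank(rows, score):
    """Score every gene and sort from most positive to most negative."""
    scored = [(gene, score(l2fc, padj)) for gene, l2fc, padj in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

by_log2fc = rank(usable, lambda l2fc, padj: l2fc)
by_signed_p = rank(usable, signed_log_p)

def to_rnk(ranked):
    """Format as .rnk text: gene name, a tab, then the ranking score."""
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ranked)

print(to_rnk(by_log2fc))
print(to_rnk(by_signed_p))
```

In this toy data the two orderings disagree: GENE_B has the largest fold change but only a modest adjusted p-value, so it leads the log2FC ranking and drops to second in the signed -log10 ranking, while GENE_C and GENE_D swap places at the bottom.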
The theory behind the latter ranking method you've suggested is that it gives a degree of "over-weighting" to significantly differentially expressed genes, which might improve GSEA's sensitivity and specificity towards gene sets that are driven by changes in those genes. However, we can't say whether these potential benefits result in any real improvement over running with the "standard" log2FC ranking. It's worth noting, however, that GSEA in its non-preranked mode uses the signal-to-noise ratio, which scales the difference in group means by the standard deviations of the groups, essentially up-weighting genes with tighter standard deviations. The idea there is similar to the idea behind up-weighting significant genes: more robust differences are given more importance in the calculation.
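As a rough illustration of the signal-to-noise idea, the sketch below divides the difference in group means by the sum of the two group standard deviations. This is a simplified form: using the sample standard deviation is an assumption of the sketch, any adjustments GSEA itself applies to the standard deviations are left out, and the expression values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(group_a, group_b):
    """(mean_A - mean_B) / (sd_A + sd_B), using sample standard deviations."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))

# Two hypothetical genes with the same difference in means (2.0)
# but very different within-group spread.
tight = signal_to_noise([10.0, 10.2, 9.8], [8.0, 8.1, 7.9])
noisy = signal_to_noise([10.0, 13.0, 7.0], [8.0, 10.0, 6.0])

print(f"tight gene: {tight:.3f}")
print(f"noisy gene: {noisy:.3f}")
```

Both genes differ by the same amount between groups, but the gene with the tighter replicates gets a much larger score, which is the up-weighting of robust differences described above.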

Sorry I can't give you a concrete answer, but hopefully this helps some.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Giorgia Silvestrini

Feb 2, 2022, 4:52:52 AM
to gsea...@googlegroups.com
Hi Anthony,
So if I understand correctly, for the PreRanked analysis I can use either a .rnk file with the gene names and, next to each, the log2FC value taken from DESeq2 (first screenshot), or a .rnk file with the gene names and, next to each, the score calculated as SIGN(log2FC) * -log10(pAdjusted) (second screenshot), and both methods are correct? See attached images.
One more small clarification. When I click on "Detail" next to the name of a pathway in the analysis output, I get a list of X genes, and next to some of them the word "YES" appears under the heading "Core enrichment". Does this mean that these are the genes that fit the pathway best? Do I only need to consider these?
Giorgia

Screenshot (182).png
Screenshot (181).png

Giorgia Silvestrini

Feb 2, 2022, 4:55:52 AM
to gsea...@googlegroups.com
Hi Anthony,
So if I understand correctly, for the PreRanked analysis I can use either a .rnk file with the gene names and, next to each, the log2FC value taken from DESeq2 (the screenshot with the log2fc entry in the second column), or a .rnk file with the gene names and, next to each, the score calculated as SIGN(log2FC) * -log10(pAdjusted) (the other screenshot), and both methods are correct? See attached images.
One more small clarification. When I click on "Detail" next to the name of a pathway in the analysis output, I get a list of X genes, and next to some of them the word "YES" appears under the heading "Core enrichment". Does this mean that these are the genes that fit the pathway best? Do I only need to consider these?
Giorgia 
On Tue, Feb 1, 2022 at 01:09, Anthony Castanza <acas...@cloud.ucsd.edu> wrote:
--
Screenshot (181).png
Screenshot (182).png

Anthony Castanza

Feb 2, 2022, 5:36:21 PM
to gsea-help
Hi Giorgia,

As I said before, we don't have a stance on which ranking method is superior here. I've heard from users who found that one or the other performed better on their dataset. We always advise treating GSEA as a discovery tool for uncovering potential pathway effects, and experimentally validating its findings.

As to the "Core Enrichment" genes: these are also called the "Leading Edge" of the gene set, that is, the genes found in the ranked list before the gene set reaches its peak (this peak, the maximum deviation from zero, is the set's Enrichment Score). These core enrichment genes are the genes driving the enrichment score. I wouldn't say that you should consider only them, but they are the genes most likely to be responsible for driving that pathway.
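The relationship between the running sum, its peak, and the core enrichment genes can be sketched in a few lines. This is an illustrative toy version of a weighted running-sum statistic, not GSEA's actual implementation: the function names, the weighting exponent `p`, and the ranked list are all hypothetical. For a negative peak, the sketch takes the set members at or after the trough, which mirrors the "before the peak" rule for sets enriched at the bottom of the list.

```python
def running_es(ranked, gene_set, p=1.0):
    """Running sum over a ranked list of (gene, score) pairs.

    Set members ("hits") step up in proportion to |score|**p;
    all other genes ("misses") step down by an equal amount.
    """
    hit_total = sum(abs(s) ** p for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    running, total = [], 0.0
    for gene, score in ranked:
        if gene in gene_set:
            total += abs(score) ** p / hit_total
        else:
            total -= 1.0 / n_miss
        running.append(total)
    return running

def leading_edge(ranked, gene_set, p=1.0):
    """Return (enrichment score, core enrichment genes)."""
    running = running_es(ranked, gene_set, p)
    # The enrichment score is the maximum deviation from zero.
    peak = max(range(len(running)), key=lambda i: abs(running[i]))
    es = running[peak]
    if es >= 0:
        core = [g for g, _ in ranked[: peak + 1] if g in gene_set]
    else:
        core = [g for g, _ in ranked[peak:] if g in gene_set]
    return es, core

# Hypothetical ranked list, sorted from highest to lowest score.
ranked = [("G1", 5.0), ("G2", 4.0), ("G3", 3.0), ("G4", 2.0),
          ("G5", 1.0), ("G6", -1.0), ("G7", -2.0), ("G8", -3.0)]
gene_set = {"G1", "G2", "G4", "G7"}

es, core = leading_edge(ranked, gene_set)
print(f"ES = {es:.4f}, core enrichment genes = {core}")
```

Here the running sum peaks right after G2, so G1 and G2 would be marked as core enrichment, while G4 and G7 still belong to the gene set but fall after the peak.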

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego