GSEA-Preranked or Standard GSEA?

Mahtab Dastpak

Mar 25, 2021, 2:12:31 PM

to gsea-help

Hello,

I am unsure whether to use GSEA Preranked or standard GSEA (with normalized count files) for RNA-seq data.

We used both techniques and the results are different, so it is hard to decide which method gives the correct picture. Would you please help me decide which result to rely on?

Thanks,

Mahtab

Anthony Castanza

Mar 25, 2021, 2:27:23 PM

to gsea...@googlegroups.com

Standard GSEA and GSEA Preranked give different results because the genes are ranked differently and because, by default, the two modes use different permutation methods. How did you rank your genes for GSEA Preranked? If you used log2(FC), then to compare results you can select log2_ratio_of_classes in standard GSEA, and the rankings should be broadly similar. However, we recommend using the default signal-to-noise ratio if you have more than three samples. The signal-to-noise ratio incorporates both the magnitude of change and the standard deviation of the sample groups, which gives an improved result over log2(FC) in our hands.
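As a rough illustration of why the two metrics can rank genes differently, here is a minimal sketch that scores one steady and one noisy gene both ways. The expression values are made up, the function names are hypothetical, and the use of the sample standard deviation is an assumption; GSEA's own implementation also applies correction factors that are left out here.

```python
import math
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b), with sample standard deviations (assumption)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def log2_ratio_of_classes(class_a, class_b):
    """log2 of the ratio of class means; ignores within-class variability."""
    return math.log2(mean(class_a) / mean(class_b))

# Hypothetical normalized counts: (class A samples, class B samples) per gene.
steady = ([100, 102, 98, 100], [50, 51, 49, 50])
noisy = ([100, 180, 20, 100], [50, 90, 10, 50])

for name, (a, b) in {"steady": steady, "noisy": noisy}.items():
    print(name, log2_ratio_of_classes(a, b), round(signal_to_noise(a, b), 2))
```

Both genes have the same log2 ratio of class means, but the steady gene gets a much larger signal-to-noise score because its within-group spread is small.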

Because GSEA Preranked doesn't have access to sample-level information, it has to run in gene_set permutation mode for p-value and FDR calculation. Standard GSEA runs in phenotype permutation mode by default, but if you have fewer than 7 samples per group we recommend changing this to gene_set permutation mode as well, because it is not possible to generate 1000 distinct permutations from smaller experiments.
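The sample-size threshold can be checked with a short calculation. This sketch counts the distinct ways to split the samples into two phenotype groups of fixed size; treating that count as the number of distinct phenotype permutations is an assumption about how the limit arises, and the function name is hypothetical.

```python
from math import comb

def distinct_label_assignments(n_a, n_b):
    """Number of distinct ways to assign n_a of (n_a + n_b) samples to class A."""
    return comb(n_a + n_b, n_a)

# Balanced designs with 3 to 8 samples per group.
for n in range(3, 9):
    print(n, distinct_label_assignments(n, n))
```

With 6 samples per group there are 924 distinct assignments, which is below 1000; with 7 per group there are 3432.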

If you can run GSEA in standard mode, you should, and if you have enough samples to run phenotype permutation, you should do that too. If you can't run phenotype permutation, run standard GSEA with gene_set permutation. If you don't have enough samples to use the signal2noise ratio, then it's best to use Preranked with your own ranking metric. Log2(FC) isn't really ideal; sign(log2(FC))*-log10(pValue) seems to work better in users' hands, or if you're using DESeq2 you could try the Wald statistic.
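A minimal sketch of the sign(log2(FC))*-log10(pValue) metric, producing tab-separated gene/score lines sorted from most positive to most negative as one would want for a ranked list. The gene names and values are invented, the function names are hypothetical, and the `min_p` floor for p-values reported as exactly 0 is an assumption rather than anything prescribed in this thread.

```python
import math

def signed_log_p(log2_fc, p_value, min_p=1e-300):
    """sign(log2FC) * -log10(p); p is floored at min_p to avoid log10(0)."""
    p = max(p_value, min_p)
    sign = (log2_fc > 0) - (log2_fc < 0)  # +1, 0, or -1
    return sign * -math.log10(p)

def to_rnk_lines(rows):
    """rows: iterable of (gene, log2_fc, p_value) -> sorted 'gene<TAB>score' lines."""
    scored = [(gene, signed_log_p(lfc, p)) for gene, lfc, p in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [f"{gene}\t{score:.6g}" for gene, score in scored]

# Hypothetical differential expression results.
results = [
    ("GENE_A", 2.0, 0.001),
    ("GENE_B", -1.5, 1e-6),
    ("GENE_C", 0.3, 0.5),
]
print("\n".join(to_rnk_lines(results)))
```

Note that GENE_B has the smallest p-value but lands at the bottom of the list because it is down-regulated; the sign term is what separates the two ends of the ranking.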

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/


Mahtab Dastpak

Mar 25, 2021, 3:29:55 PM

to gsea...@googlegroups.com

Thanks very much.

It is really helpful.

Best,

Mahtab

--

Ph.D. in Cell & Molecular Biology,
Stem Cell and Regenerative Medicine Research Department,
ACECR-Khorasan Razavi Branch,
Mashhad. Iran.
P.O.Box: 9177949367

Harish K

Jan 17, 2024, 12:28:08 PM

to gsea-help

Sorry for necro-bumping the thread, but we had some questions from our users as to when to use GSEA pre-ranked or when to use the normalized counts profiles, and if there were any publications/blogs that compared the difference between the two.

Any pointers as to why the recommendations currently prefer the counts profiles would be useful!

Regards,

Harish

Anthony Castanza

Jan 17, 2024, 2:01:09 PM

to gsea...@googlegroups.com

Hello,

In the future I'd ask that you please create a new thread to ask your own questions.

That said, the recommendation is to always use standard GSEA when you have access to the underlying count matrix. This lets you use the validated ranking metrics that we provide and the statistically superior phenotype permutation method (if your data has enough samples for it), and it lets GSEA handle divide-by-zero/NaN computations through built-in correction factors.

GSEA Preranked should be considered a fallback method, used when the count matrix is not available or when the differential expression was computed by a specialized pipeline that can't be captured by GSEA's basic two-phenotype comparison.
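The recommendations in this thread can be condensed into a small decision helper. This is only a sketch: the function name, argument names, and returned labels are hypothetical, the threshold of 7 samples per group comes from the earlier reply, and reading "enough samples for signal2noise" as at least 3 per group is an assumption.

```python
def recommend_gsea_setup(has_count_matrix, samples_per_group,
                         custom_de_pipeline=False,
                         min_for_phenotype_perm=7,   # from the earlier reply
                         min_for_signal2noise=3):    # assumed reading of the thread
    """Return a dict describing which GSEA mode, permutation type, and metric to try."""
    preranked = {"mode": "preranked", "permutation": "gene_set",
                 "metric": "user-supplied ranking"}
    if not has_count_matrix or custom_de_pipeline:
        return preranked  # fallback case
    if samples_per_group >= min_for_phenotype_perm:
        return {"mode": "standard", "permutation": "phenotype",
                "metric": "signal2noise"}
    if samples_per_group >= min_for_signal2noise:
        return {"mode": "standard", "permutation": "gene_set",
                "metric": "signal2noise"}
    return preranked  # too few samples for signal2noise

print(recommend_gsea_setup(True, 4))
```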

Unfortunately, I don't have any publications on hand that directly benchmark the standard method against the preranked method. However, if you manually compute the signal-to-noise ratio with GSEA's documented correction factors, the Preranked method should produce results identical to a standard GSEA run in gene set permutation mode.
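A sketch of what such a manual computation might look like. The correction shown here, a floor on each class's standard deviation of 0.2 times the absolute class mean with a mean of 0 treated as 1, is an assumption based on one reading of the GSEA User Guide and should be checked against the current documentation. The use of the sample standard deviation is also an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev

def floored_sd(values, min_fraction=0.2):
    """Standard deviation with an assumed floor of min_fraction * |mean|."""
    mu = mean(values)
    if mu == 0:
        mu = 1.0  # assumed adjustment so the floor is never zero
    return max(stdev(values), min_fraction * abs(mu))

def signal_to_noise_floored(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b) using the floored standard deviations."""
    return (mean(class_a) - mean(class_b)) / (floored_sd(class_a) + floored_sd(class_b))

# A gene with zero variance in both classes would otherwise divide by zero.
print(signal_to_noise_floored([10, 10, 10], [5, 5, 5]))
```

The floor is what keeps genes with no within-class variance from producing divide-by-zero or NaN scores in the ranked list.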

Also, while we don't have specific validated recommendations for GSEA Preranked, many users have found that ranking by Log2(FC)*-log10(pValue) is a decent method that provides a robust gene ranking in the absence of the count matrix.

Sorry I couldn't be of more help here

-Anthony
