Looking for Advice on GSEA Set-Up with Unique Experimental Design

Alexander Pessell

Jun 25, 2025, 12:55:10 PM

to gsea-help

Hi all,

Before finding this help forum, I asked for advice on the r/bioinformatics subreddit (crosspost link: https://www.reddit.com/r/bioinformatics/comments/1lk6m8a/looking_for_advice_on_gsea_setup_with_unique/). I have continued working on my sequencing analysis pipeline after DESeq2 analysis for differential gene expression and am now focusing on gene set enrichment analysis. For reference, here are the replicates I have in the normalized counts file (.gct, taken directly from DESeq2):

0% stenosis - x6 replicates (x3 from upstream in the blood vessel, x3 from downstream)
70% stenosis - x6 replicates (x3 from upstream in the blood vessel, x3 from downstream)
90% stenosis - x6 replicates (x3 from upstream in the blood vessel, x3 from downstream)
100% occlusion - x6 replicates (x3 from upstream in the blood vessel, x3 from downstream)

Main question to address for now: How does stenosis/occlusion alone affect these vessels?

The issue I am having is that the upstream and downstream samples are neither technical replicates nor biological replicates of each other (due to their regional differences). In DESeq2, this was not a problem, because I set up my design as follows to analyze changes due to stenosis while accounting for regional effects:

~region + stenosis

But for GSEA, I need to choose two groups to compare. What is the best way to do this? From what I gather, and from the advice I received on Reddit, the answer is to use GSEA Preranked. In the future, I might be interested in comparing regional differences, but for right now, I am only interested in the differences due purely to the effect of stenosis.

In the past, I always used standard GSEA, but I now believe my only option is GSEA Preranked (where I will undoubtedly have to choose a ranking metric, which I know is a common question and one I am unsure how to approach). I understand that using lfcShrink() in this pipeline helps when ranking genes by log2(fold-change) alone, but I am open to other perspectives on that.

Thanks!

Alex

Anthony Castanza

Jun 25, 2025, 7:16:28 PM

to gsea-help

Hi Alex,

Unfortunately yes, the best option in the case of a complex experiment is GSEA Preranked. This would allow you to use a gene ranking produced directly by DESeq2 which should have been computed in a confounding-variable aware way.
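As a rough illustration of what feeding a DESeq2-derived ranking into GSEA Preranked could look like, here is a minimal Python sketch that turns per-gene scores into the two-column, tab-delimited lines of a ranked list (.rnk) file. The function name `to_rnk_lines` and the gene names are hypothetical, and the sketch assumes the scores have already been exported from DESeq2 into a plain dictionary. It drops genes with missing or non-finite scores, since DESeq2 can report NA for some genes.

```python
import math


def to_rnk_lines(scores):
    """Return 'gene<TAB>score' lines sorted from highest to lowest score.

    `scores` is assumed to map a gene identifier to a numeric ranking
    score (or None / NaN where DESeq2 reported NA for that gene).
    """
    # Keep only genes with a usable, finite score.
    kept = [
        (gene, score)
        for gene, score in scores.items()
        if score is not None and math.isfinite(score)
    ]
    # Order the list from most positive to most negative score.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [f"{gene}\t{score:.6g}" for gene, score in kept]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {"GENE_A": 1.5, "GENE_B": float("nan"), "GENE_C": -2.0}
    print("\n".join(to_rnk_lines(example)))
```

The resulting lines can be written to a text file with a .rnk extension. Gene identifiers would still need to match the namespace of the chosen gene set collection.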

The best metric DESeq2 produces for use with GSEA is probably the test statistic in the "stat" column. This represents the log2FoldChange divided by lfcSE, and is a pretty reasonable substitute for the signal2noise ratio we use by default. Alternatively, people commonly use metrics like log2(fc)*-log10(pValue) or simply sign(log(fc))*-log10(pValue). These metrics generally work well enough and are pretty widely accepted, but we do generally shy away from giving firm recommendations here, as it is typically better to consult with a bioinformatician local to your institution for these more complicated scenarios.
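To make the three metrics mentioned above concrete, here is a small Python sketch that computes each of them for a single gene from its log2FoldChange, lfcSE, and p-value. The function name `ranking_metrics`, the dictionary keys, and the `min_p` floor are assumptions made for illustration, not part of DESeq2 or GSEA. The floor is only there because -log10 is undefined for a p-value of exactly 0.

```python
import math


def ranking_metrics(log2fc, lfc_se, pvalue, min_p=1e-300):
    """Compute three candidate ranking scores for one gene.

    min_p is an assumed floor that keeps -log10(p) finite when a
    p-value underflows to 0.
    """
    neg_log_p = -math.log10(max(pvalue, min_p))
    return {
        # log2FoldChange / lfcSE, as described for the "stat" column
        "stat": log2fc / lfc_se,
        # log2(fc) * -log10(pValue)
        "lfc_x_neglogp": log2fc * neg_log_p,
        # sign(log(fc)) * -log10(pValue); a fold change of exactly 0
        # is treated as positive by copysign
        "signed_neglogp": math.copysign(1.0, log2fc) * neg_log_p,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(ranking_metrics(log2fc=-2.0, lfc_se=0.5, pvalue=0.001))
```

All three scores keep the direction of the fold change, so down-regulated genes end up at the negative end of the ranked list. They differ in how strongly effect size and significance are weighted.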

Sorry I couldn't be of more assistance here, but do still let me know if you have any additional questions.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

Alexander Pessell

Jun 26, 2025, 8:43:10 AM

to gsea-help

Hi Dr. Castanza,

I appreciate your feedback! The main priority was to ensure that I had the correct intuition of performing GSEA preranked over the standard. Thank you for clarifying that! I know there is a lot of debate about which ranking metric to use, and I appreciate your quick reply.