Large discrepancy in NES and FDR with classic versus weighted

Luis

May 16, 2018, 2:37:30 PM

to gsea-help

We are using GSEA on an RNA-seq dataset. We performed differential expression analysis with DESeq2 and generated an RNK file using a metric that we consider appropriate and that has been discussed in other posts (signed -log10 of the adjusted p-values). Following the recommendations of the GSEA FAQ page, we then used GSEAPreranked. We ran two separate analyses, changing only the Enrichment statistic: one with "classic" and one with "weighted". All other parameters were left untouched and were as follows:

Gene sets database: c5.bp.v6.1.symbols.gmt (from MSigDB)

Number of permutations: 1000

Max size: 500

Min size: 10

Normalization mode: meandiv

Seed for permutation: 149
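For reference, the ranking metric described above could be built roughly like this. It is a minimal sketch that assumes the sign comes from DESeq2's `log2FoldChange` column and the p-value from its `padj` column. The function names, the `floor` guard for padj = 0, and the choice to drop genes with a missing padj are all hypothetical choices for illustration, not a GSEA or DESeq2 API.

```python
import math


def signed_log10_padj(log2_fold_change, padj, floor=1e-300):
    """Return sign(log2FoldChange) * -log10(padj), or None if either value is missing.

    `floor` is a hypothetical guard: padj can be reported as 0, and log10(0) is undefined.
    """
    if log2_fold_change is None or padj is None:
        return None
    if log2_fold_change > 0:
        sign = 1.0
    elif log2_fold_change < 0:
        sign = -1.0
    else:
        sign = 0.0
    return sign * -math.log10(max(padj, floor))


def rank_genes(rows):
    """rows: iterable of (gene_symbol, log2FoldChange, padj); returns (gene, metric) sorted high to low."""
    scored = []
    for gene, lfc, padj in rows:
        metric = signed_log10_padj(lfc, padj)
        if metric is not None:  # genes with NA padj (e.g. removed by independent filtering) are dropped
            scored.append((gene, metric))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def write_rnk(scored, path):
    """Write a two-column, tab-delimited RNK file (gene symbol, metric) with no header line."""
    with open(path, "w") as handle:
        for gene, metric in scored:
            handle.write(f"{gene}\t{metric:.6g}\n")
```

For example, `write_rnk(rank_genes(rows), "signed_log10_padj.rnk")`, where `rows` holds the exported DESeq2 results and the file name is only a placeholder. Dropping genes with a missing padj, rather than assigning them 0, keeps them from piling up as ties in the middle of the ranking.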

Although the enrichment scores were generally much higher for the "weighted" option, the normalized enrichment scores were much lower (and the FDR values much higher). Below is an example with results for the same gene set. My specific question is: is there an explanation for this large discrepancy in NES (-4.7 vs -1.5) and FDR (0 vs 0.49), given that the only thing we changed was the Enrichment statistic? Thank you in advance for your clarification.
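To see where the two settings diverge, here is a minimal sketch of the running-sum enrichment score described in the original GSEA paper (Subramanian et al., 2005). In this formulation, "classic" corresponds to weight exponent p = 0 and "weighted" to p = 1. The sketch adds a gene-set-permutation null and a meandiv-style normalization, where the observed ES is divided by the mean magnitude of the permutation scores that share its sign. It is an illustration under simplifying assumptions, not GSEA's Java implementation: the function names and the toy data are hypothetical, the seed 149 drives Python's `random` module and so will not reproduce GSEA's permutations, and the FDR step is omitted.

```python
import random


def enrichment_score(ranked_metric, in_set, p):
    """Running-sum enrichment score (ES).

    ranked_metric: metric values already sorted from highest to lowest.
    in_set: booleans, True where the gene at that rank belongs to the gene set.
    p: weight exponent; p = 0 behaves like "classic", p = 1 like "weighted".
    """
    n_total = len(ranked_metric)
    n_hit = sum(in_set)
    # Each hit steps up by |r|^p / sum(|r|^p over hits); with p = 0 every hit steps up by 1/n_hit.
    hit_norm = sum(abs(r) ** p for r, hit in zip(ranked_metric, in_set) if hit)
    if n_hit == 0 or n_hit == n_total or hit_norm == 0:
        raise ValueError("gene set must be a non-empty proper subset with a nonzero weight")
    miss_step = 1.0 / (n_total - n_hit)  # every non-member steps down by the same amount
    running = best = 0.0
    for r, hit in zip(ranked_metric, in_set):
        running += abs(r) ** p / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running  # ES is the signed maximum deviation from zero
    return best


def normalized_es(ranked_metric, in_set, p, n_perm=1000, seed=149):
    """Observed ES and a meandiv-style NES from a gene-set permutation null (sketch)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_metric, in_set, p)
    n_total, size = len(ranked_metric), sum(in_set)
    null = []
    for _ in range(n_perm):
        # Draw a random gene set of the same size; the ranking itself is left untouched.
        members = set(rng.sample(range(n_total), size))
        null.append(enrichment_score(ranked_metric, [i in members for i in range(n_total)], p))
    # Divide by the mean magnitude of the null scores that share the observed sign.
    same_sign = [abs(es) for es in null if (es >= 0) == (observed >= 0)]
    if not same_sign:
        raise ValueError("no permutation scores share the observed sign; increase n_perm")
    return observed, observed / (sum(same_sign) / len(same_sign))


if __name__ == "__main__":
    # Hypothetical toy ranking: 1,000 genes and a gene set of roughly 100 enriched toward the bottom.
    rng = random.Random(0)
    metric = sorted((rng.gauss(0, 2) for _ in range(1000)), reverse=True)
    in_set = [rng.random() < (0.2 if i > 700 else 0.05) for i in range(1000)]
    for label, p in (("classic", 0), ("weighted", 1)):
        es, nes = normalized_es(metric, in_set, p, n_perm=200)
        print(f"{label:8s} ES={es:+.3f} NES={nes:+.2f}")
```

Because the observed ES and the permutation null are both computed with the same p, changing the statistic moves the numerator and the denominator of the NES at the same time. Replacing the toy metric with a heavy-tailed one, such as signed -log10(padj), is one way to explore how p = 1 lets a handful of extreme genes dominate both running sums.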

ptamayo

Jun 10, 2018, 1:29:29 PM

to gsea-help

Luis,

I believe that behavior is partly a consequence of the gene set being so large. Large gene sets contain many genes whose metric values are taken into account by the weighting in the enrichment score calculation, so the results can be more sensitive to the input parameters. Despite this, I think it is fine to report either of those results. One alternative would be to consider a smaller gene set that represents the same biology and perhaps behaves more typically.