Can I use z-scores as a ranking metric for GSEA Preranked?

Patrick Tamukong

Oct 25, 2021, 9:48:23 PM
to gsea-help
Dear GSEA Help Team, 

I have microarray data in which I am comparing 2 conditions. The data is preprocessed (normalized, batch-corrected and log2-transformed). I computed log2FC and used it as the ranking metric for GSEA Preranked, but I was not entirely satisfied with the results. So I calculated z-scores [z-score = {(mean_tx_Group - total_mean)*sqrt(n)}/sd_total] and used those as the ranking metric instead.
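
For readers who want to reproduce this ranking, the z-score formula above can be sketched as follows. This is a minimal illustration, not the poster's actual script: it assumes `n` is the number of treatment samples, that `sd_total` is the sample standard deviation (ddof=1) across all samples, and that the expression matrix is genes x samples. The function names `zscore_rank` and `rnk_lines` are hypothetical.

```python
import numpy as np

def zscore_rank(expr, is_treatment):
    """Per-gene z = (mean_tx_Group - total_mean) * sqrt(n) / sd_total.

    expr: genes x samples array of log2 expression values.
    is_treatment: boolean mask over samples (True = treatment group).
    Assumption: n is the treatment group size and sd_total uses ddof=1.
    """
    expr = np.asarray(expr, dtype=float)
    is_treatment = np.asarray(is_treatment, dtype=bool)
    n = is_treatment.sum()
    mean_tx = expr[:, is_treatment].mean(axis=1)
    total_mean = expr.mean(axis=1)
    sd_total = expr.std(axis=1, ddof=1)  # zero for constant genes; filter those first
    return (mean_tx - total_mean) * np.sqrt(n) / sd_total

def rnk_lines(genes, scores):
    """Format tab-separated 'gene<TAB>score' lines, highest score first."""
    order = np.argsort(-np.asarray(scores))
    return [f"{genes[i]}\t{scores[i]:.6g}" for i in order]

if __name__ == "__main__":
    expr = [[1.0, 2.0, 3.0, 4.0],   # higher in treatment
            [4.0, 3.0, 2.0, 1.0]]   # lower in treatment
    mask = [False, False, True, True]
    z = zscore_rank(expr, mask)
    print("\n".join(rnk_lines(["GENE_A", "GENE_B"], z)))
```

The lines returned by `rnk_lines` can be written to a two-column, tab-delimited `.rnk` file for GSEA Preranked.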

Could you clarify whether the z-score can be used as a ranking metric for GSEA Preranked? If so, would you consider the z-score a better ranking statistic than log2FC, since it incorporates dispersion (the standard deviation)? Under what circumstances would the z-score not be useful as a ranking metric, for example when dealing with other data types such as RT-qPCR or RNA-Seq?

Thanks so much for your time and help.

Sincerely

Patrick T. 

Anthony Castanza

Oct 26, 2021, 12:18:02 AM
to gsea...@googlegroups.com

Hi Patrick,

I think z-score could be a perfectly reasonable metric to try for GSEA Preranked. In fact, another method in the GSEA family, single-sample GSEA (ssGSEA), internally ranks genes using z-scores. That said, how many samples do you have per group? Is there a particular reason you’re using GSEA Preranked rather than standard GSEA with the Signal2Noise ranking metric (the standard GSEA ranking metric, which also accounts for sample standard deviations)?
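
For comparison with the z-score, a signal-to-noise statistic of the kind mentioned above can be sketched as the difference of class means divided by the sum of class standard deviations. This is a simplified illustration under stated assumptions: the function name `signal_to_noise` is hypothetical, the standard deviations use ddof=1, and any adjustments the GSEA software applies to the standard deviations are omitted.

```python
import numpy as np

def signal_to_noise(expr, is_class_a):
    """Per-gene (mean_A - mean_B) / (sd_A + sd_B).

    expr: genes x samples array of expression values.
    is_class_a: boolean mask over samples (True = class A, False = class B).
    Assumption: plain sample standard deviations, with no adjustment
    for very small values.
    """
    expr = np.asarray(expr, dtype=float)
    is_class_a = np.asarray(is_class_a, dtype=bool)
    a, b = expr[:, is_class_a], expr[:, ~is_class_a]
    mu_a, mu_b = a.mean(axis=1), b.mean(axis=1)
    sd_a, sd_b = a.std(axis=1, ddof=1), b.std(axis=1, ddof=1)
    return (mu_a - mu_b) / (sd_a + sd_b)

if __name__ == "__main__":
    # One gene, two samples per class.
    print(signal_to_noise([[1.0, 3.0, 5.0, 9.0]], [True, True, False, False]))
```

Unlike a plain difference of means, this statistic shrinks toward zero for genes whose within-class variability is large.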

We tend to shy away from giving specific recommendations on when and how to rank genes for GSEA Preranked, since we often don’t know enough about the specifics of the experiment to say whether a given choice is a good one.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Patrick Tamukong

Oct 26, 2021, 10:57:25 AM
to gsea-help
Thanks so much Anthony. 

In your response to a previous question, I understood you to say that GSEA doesn't work well with log2-transformed data. I have 305 samples in my treatment group and 238 samples in my control group, but again, the data is log2-transformed. Because of that, when I ran GSEA I chose Diff-of-Classes as the ranking metric. Can I still use the default signal-to-noise metric for my data type?
Thanks for letting me know that I'm fine using z-scores in GSEA Preranked. I do have another study with only 23 samples per group, and I think that would be a perfect scenario for using z-scores to rank the genes.

Sincerely, 

Patrick T. 

Anthony Castanza

Oct 26, 2021, 12:53:30 PM
to gsea...@googlegroups.com

Hi Patrick,

I might suggest un-logging the data; the official recommendation is not to use log-scaled data with the default metrics.
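
Un-logging log2-transformed values amounts to raising 2 to the power of each value. A minimal sketch follows; the helper name `unlog2` and its `pseudocount` parameter are hypothetical, and the pseudocount only applies if the data were originally transformed as log2(x + pseudocount).

```python
import numpy as np

def unlog2(log2_expr, pseudocount=0.0):
    """Invert a log2 transform: returns 2**x - pseudocount.

    Assumption: the data were transformed as log2(x + pseudocount);
    leave pseudocount at 0.0 if a plain log2(x) was used.
    """
    return np.power(2.0, np.asarray(log2_expr, dtype=float)) - pseudocount

if __name__ == "__main__":
    print(unlog2([0.0, 1.0, 3.0]))                    # plain log2(x)
    print(unlog2([0.0, 1.0, 3.0], pseudocount=1.0))   # log2(x + 1)
```

Note that batch correction and normalization performed on the log scale are carried through this inversion unchanged; only the scale of the values is restored.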

Also, with more than 7 samples per group, there is an additional reason to un-log your data and run standard GSEA rather than GSEA Preranked: you'd be able to easily use standard GSEA's phenotype permutation mode, which has advantages over the gene set permutation mode used by default in GSEA Preranked.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Patrick Tamukong

Oct 27, 2021, 10:49:11 AM
to gsea-help
Many thanks Anthony. 

Have a great day. 

Patrick T. 
