how to interpret enrichment plot


Dong

Sep 13, 2021, 6:28:25 PM
to gsea-help
Hi,  

I am a little confused by the enrichment plot and need some help.  In the CLS file, I assign "1" to samples with high expression of my target gene and "0" to samples with low expression.  In the GSEA analysis, I choose High vs. Low as the phenotype label.  The plot is attached.  As you can see, the gene set is enriched (upregulated) in the "low" group.  However, the blue text says "negatively correlated", not "positively correlated".  My interpretation is that "negatively correlated" means the genes in the blue area are negatively correlated with high expression of my target gene.  It does not mean that the genes in the blue area are negatively correlated with low expression of my target gene.  My description may itself be confusing, and I believe many other people get confused by this too.  I have noticed that some papers present the plot differently to avoid the confusion.  Here is one example: https://www.science.org/doi/10.1126/scitranslmed.aaz5683 (Fig. 1C).

I wonder whether you have a better way to deal with this seemingly confusing issue.  Also, what do you think of the example I mentioned?

Thanks.
Dong  
enplot_GSE9650_EFFECTOR_VS_EXHAUSTED_CD8_TCELL_UP_3.png

Anthony Castanza

Sep 13, 2021, 6:45:47 PM
to gsea...@googlegroups.com

Hi Dong,

 

GSEA is optimized for the "general use case", and in that case it describes results relative to the positive enrichment phenotype. The language the chart uses is probably not ideal.

In this case, since you've defined high expression of the gene of interest as the positive phenotype, GSEA is saying that enrichment of this gene set is anticorrelated with that high expression, i.e., samples with high expression of this gene have low enrichment of the genes in this gene set. I think that matches what you're describing?
If you output SVG plots, it would be relatively easy to scrub this potentially confusing text from the image.
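
As a rough illustration of that kind of scrubbing, here is a minimal Python sketch. It assumes the labels are stored in the SVG as ordinary `<text>` elements and that the unwanted labels contain the word "correlated"; both are assumptions, and if the exporter writes text as drawn paths this approach will not find anything. The function name `scrub_text` and the sample SVG are made up for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def scrub_text(svg_string, phrase="correlated"):
    """Remove every <text> element whose content contains `phrase`.

    Returns the cleaned SVG as a string and the number of elements removed.
    Assumes labels are real <text> nodes (hypothetical layout, not verified
    against actual GSEA output).
    """
    # Keep the default SVG namespace unprefixed when serializing.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_string)
    removed = 0
    # Snapshot the element list so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_text = child.tag == f"{{{SVG_NS}}}text"
            if is_text and phrase in "".join(child.itertext()):
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Toy SVG standing in for an exported enrichment plot.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg"><g>'
    "<text>'low' (negatively correlated)</text>"
    "<text>Rank in Ordered Dataset</text>"
    "</g></svg>"
)
cleaned, n_removed = scrub_text(sample)
print(n_removed)
print(cleaned)
```

For a real file, you would read the SVG into a string, pass it through the function, and write the result back out, then check the image by eye, since other labels containing the same word would be removed as well.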

 

Just as a note, if expression of a target gene is what you want to use as the sample profile, GSEA supports a mode in which you directly select a gene of interest. GSEA then uses the actual expression level of that gene instead of a binary CLS file, and uses the Pearson correlation between that gene's expression and the other expressed genes as the ranking metric. So if "Gene A" is your gene of interest, GSEA calculates the Pearson correlation between Gene A and all other genes, then walks down the ranked list. If, say, Genes C, F, and H are in the gene set, those that are positively correlated with the expression of Gene A increase the score and those that are negatively correlated with Gene A decrease it. Perhaps that mode would be of interest to you here?
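
To make the idea concrete, here is a small Python sketch of ranking genes by Pearson correlation with a gene of interest and then computing a weighted running-sum enrichment score over that ranking. It is an illustration only, not GSEA's own code: the toy expression values, the gene names, and the function names (`pearson`, `rank_by_correlation`, `enrichment_score`) are all invented for the example, and no permutation testing or normalization is done.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_correlation(expr, target):
    """Rank all genes except `target` by correlation with it, highest first."""
    profile = expr[target]
    scores = {g: pearson(profile, v) for g, v in expr.items() if g != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """Running sum: step up at gene-set members, down at all other genes.

    Returns the running-sum value with the largest absolute deviation
    from zero (positive or negative).
    """
    hit_total = sum(abs(r) ** weight for g, r in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running, best = 0.0, 0.0
    for gene, r in ranked:
        if gene in gene_set:
            running += abs(r) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy expression matrix: gene -> expression across six samples (made up).
expr = {
    "GeneA": [1, 2, 3, 4, 5, 6],               # gene of interest
    "GeneB": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],   # tracks GeneA
    "GeneC": [2, 3, 3, 5, 5, 7],               # tracks GeneA
    "GeneD": [3, 1, 4, 1, 5, 2],               # weakly related
    "GeneE": [6, 5, 4, 3, 2, 1],               # opposite to GeneA
    "GeneF": [5, 6, 3, 4, 1, 2],               # opposite to GeneA
}
ranked = rank_by_correlation(expr, "GeneA")
es_up = enrichment_score(ranked, {"GeneB", "GeneC"})
es_down = enrichment_score(ranked, {"GeneE", "GeneF"})
print(ranked)
print(es_up, es_down)
```

In this toy case the set whose members sit at the positively correlated end of the list gets a positive score, and the set whose members sit at the negatively correlated end gets a negative one, which is the sign you would then read as "positively" or "negatively correlated with Gene A".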

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego


Dong

Sep 13, 2021, 8:11:47 PM
to gsea-help
Hi Anthony, 

Yes, that is exactly what I was saying.  I think scrubbing the potentially confusing text from the image may be a good option.  The plot shown in Fig. 1C of the STM paper https://www.science.org/doi/10.1126/scitranslmed.aaz5683 could indeed also confuse people to some extent.

So, you are suggesting running the analysis with the gene of interest as a continuous variable?  I actually tried both ways.  The other way uses the gene of interest as a categorical variable, with the median as the cutoff.  Which would you prefer?

Thanks.
Dong

Dong

Sep 13, 2021, 10:05:16 PM
to gsea-help
Hi Anthony, 

I have one more question about the analysis using the gene of interest as a continuous variable: if I get many enriched gene sets and want to make a box plot showing the enriched gene sets of interest, what should the phenotype label be?  For the categorical variable, I can use "High vs. Low" as the phenotype label.  But I don't know the appropriate phenotype label for a continuous variable.

Thanks.
Dong

Anthony Castanza

Sep 13, 2021, 11:00:34 PM
to gsea-help
With regard to your question about preference, I don't know that there is a consensus on which method is best. The binary metric with the standard enrichment is certainly the more widely used one, although I suspect the continuous correlation might be able to detect more subtle signals.

In continuous mode, the enrichments would be positive (Pearson) correlation between the gene of interest and the genes in the gene set, or negative (Pearson) correlation. I think the correct labels in that case would be "positively correlated with gene of interest" and "negatively correlated with gene of interest".


-Anthony


Dong

Sep 13, 2021, 11:28:40 PM
to gsea-help
Hi Anthony, 

Thank you so much for your help!

Best,
Dong

Dong

Sep 14, 2021, 8:05:04 AM
to gsea-help
Hi Anthony, 

I have one more question about the analysis using a continuous variable.  According to the GSEA guidelines, if each group has more than 7 samples, it is better to use phenotype rather than gene set as the permutation type, with an FDR cutoff of 0.25.

What about an analysis based on a continuous variable?  Should I use phenotype or gene set permutation?
Should we look at the FDR, the nominal p-value, or both?
And what cutoff value should be used?
Also, if the custom gene set database contains only 28 gene sets, is it reasonable to relax the cutoff?  For example, raising the FDR cutoff from 0.05 to 0.25?  If so, how should the appropriate cutoff be determined?

Thanks.
Dong 

rakesh s nair

Dec 9, 2021, 2:54:21 PM
to gsea-help
thanks anthony...your explanation is a great help...