GSEA analysis plots

Sarah Rashid

Apr 20, 2022, 10:13:10 AM
to gsea-help
Hi everyone,

First of all, I appreciate this amazing software, which is really useful for data analysis. I have a question about the GSEA plots. I have RNA-seq data from leukaemia mouse cell lines: two groups with four samples each. I performed the differential expression analysis in RStudio using edgeR, and I have an Excel file of mouse gene symbols with the logFC, p-value, and logCPM for each gene. I then ran the pathway analysis in the GSEA software using "pre-ranked GSEA": I uploaded the rnk file and got the results. The plots I got show only the positive phenotype; there is no negative phenotype. Usually the plot has a red line on the left-hand side and a blue line on the right-hand side, but my plots show only the red line. When I checked the negative phenotype, there was no information in it; I only got information in the positive phenotype file.
So what should I do in this case? Should I run the standard GSEA analysis instead?
My second question: is it appropriate to put these graphs in a publication? I am working on a paper.
Kindly help me out if you have any suggestions for making the plots properly, or anything else I can do with my data.
I am also attaching the plots so that you can get an idea of what I have generated through GSEA.

Screenshot 2022-04-19 120206.png

Anthony Castanza

Apr 20, 2022, 2:00:26 PM
to gsea-help
Hello,

What was in the rnk file you uploaded? If it was the Log2FC, it should have included all genes, with both positive and negative values. If it was the p-values, these generally need to be transformed to fit the data shape expected by GSEA (i.e. taking -log10(pValue)*sign(FC) or some such metric).

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Sarah Rashid

Apr 22, 2022, 10:18:35 AM
to gsea-help
Hi Anthony,

Thanks for your response. I ran the analysis on an rnk file that I created using the Excel formula "=SIGN(logfc)-log10(Pvalue)".
I now understand that making the rnk file this way means I cannot get the negative phenotype results.
My question is: can I run the analysis using the logFC values for all genes?
OR
Do I need to first filter the genes by a logFC cutoff and then run the analysis in GSEA?

I am a bit confused about what I should do.

Kindly help me out.

Thanks,
Sarah

Anthony Castanza

Apr 22, 2022, 12:33:49 PM
to gsea...@googlegroups.com

Hi Sarah,

Yes, you should be able to run GSEA with Log2FC values in preranked mode instead, and no, you do not need to apply filtering by the log2FC.

That said, the =SIGN(log2fc)*(-LOG(pvalue,10)) formula in Excel should provide both positive and negative values. The formula I provided is slightly different from the one you included, so it might just be a slight issue with the Excel syntax.
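
For readers who prefer to build the ranking outside Excel, here is a minimal Python sketch of the same metric, sign(logFC) * -log10(p-value). The function names, the `min_p` floor, and the example genes and values are all hypothetical and only for illustration; this is not part of GSEA or edgeR. The second function shows how the formula quoted earlier in the thread would evaluate if it were read as a subtraction, which is one possible reason for getting mostly positive scores.

```python
import math


def rank_metric(log_fc: float, p_value: float, min_p: float = 1e-300) -> float:
    """Signed significance score: sign(logFC) * -log10(p-value).

    min_p is an assumed floor that avoids log10(0) for p-values reported as 0.
    """
    sign = (log_fc > 0) - (log_fc < 0)  # -1, 0, or 1
    return sign * -math.log10(max(p_value, min_p))


def subtracted_form(log_fc: float, p_value: float) -> float:
    """What "SIGN(logfc)-log10(Pvalue)" yields if evaluated as a subtraction."""
    sign = (log_fc > 0) - (log_fc < 0)
    return sign - math.log10(p_value)


def build_rnk(rows):
    """rows: iterable of (gene, logFC, p-value). Returns (gene, score) sorted descending."""
    ranked = [(gene, rank_metric(lfc, p)) for gene, lfc, p in rows]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up example rows, not real data.
example = [("GeneA", 2.1, 1e-6), ("GeneB", -1.4, 1e-4), ("GeneC", 0.3, 0.5)]
ranked = build_rnk(example)

# One tab-separated line per gene, as in a two-column .rnk file.
rnk_lines = [f"{gene}\t{score:.6g}" for gene, score in ranked]
```

With the multiplied form, the down-regulated GeneB gets a negative score and lands at the bottom of the list. With the subtracted form, the same gene gets a positive score, because -log10(p) exceeds 1 whenever p < 0.1.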

 

-Anthony

 


Sarah Rashid

Apr 22, 2022, 10:29:22 PM
to gsea-help
Hi Anthony,

Thanks for your feedback. Kindly check the attached file below, which I am going to convert into a .rnk file by saving it as "Tab delimited .txt". If there is any issue in the file, please let me know.

I want to confirm one more thing: can we run "Collapse" in pre-ranked GSEA? I read in the user guide that we can't use it. I have mouse RNA-seq data with mouse gene symbols. When I simply ran the analysis using "no collapse", it always showed me the error "after pruning none of the genes pass the threshold". So I reckon that GSEA contains human gene symbols while I have mouse gene symbols. For this reason, I used the collapse function, and for the chip file I am using "MOUSE Gene Symbols remapping human orthologs". After doing this I got my results, but I am not sure whether I am on the right track with my analysis. Kindly give me your feedback on this matter.

Thanks,
Sarah
logfc screen shot.png

Anthony Castanza

Apr 23, 2022, 3:28:04 AM
to gsea-help
Yes, you do have to use collapse with Preranked for mouse genes; the information in the user guide should be updated. We recently added a new default collapse mode to Preranked which mitigates some of the more serious issues with using it with Preranked datasets.
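
To illustrate what collapsing a ranked list onto human orthologs involves, here is a rough Python sketch. It is not GSEA's implementation: the mapping dictionary stands in for a chip file, the placeholder gene names are hypothetical, and resolving many-to-one mappings by keeping the score with the largest absolute value is an assumption made for this example.

```python
def collapse_to_human(ranked, mouse_to_human):
    """Map mouse symbols to human symbols in a ranked list.

    ranked: list of (mouse_symbol, score).
    mouse_to_human: dict of mouse symbol -> human symbol (stand-in for a chip file).
    Genes without a mapping are dropped. If several mouse genes map to the same
    human symbol, the score with the largest absolute value is kept (an assumption).
    """
    collapsed = {}
    for gene, score in ranked:
        human = mouse_to_human.get(gene)
        if human is None:
            continue  # no ortholog in the mapping
        if human not in collapsed or abs(score) > abs(collapsed[human]):
            collapsed[human] = score
    return sorted(collapsed.items(), key=lambda item: item[1], reverse=True)


# Illustrative input: GeneX1/GeneX2/GENEX and Unmapped1 are hypothetical placeholders.
mouse_ranked = [("Trp53", -3.2), ("GeneX1", 1.0), ("GeneX2", -2.5), ("Unmapped1", 4.0)]
mapping = {"Trp53": "TP53", "GeneX1": "GENEX", "GeneX2": "GENEX"}
human_ranked = collapse_to_human(mouse_ranked, mapping)
```

Without a step like this, mouse symbols such as Trp53 do not match the human symbols in the gene sets, which is consistent with the pruning error described above.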

I don't see anything obviously incorrect with the attachment you sent although I can't be 100% sure from a screenshot. How many total genes were in the file?
Assuming it's >10,000 everything seems fine to me here.
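
A quick sanity check along these lines can be sketched in Python. The 10,000-gene figure comes from the message above; the function names and the duplicate-symbol check are assumptions added for illustration.

```python
def summarize_rnk(ranked):
    """Basic counts for a ranked list of (gene, score) pairs."""
    genes = [gene for gene, _ in ranked]
    scores = [score for _, score in ranked]
    return {
        "n_genes": len(genes),
        "n_positive": sum(score > 0 for score in scores),
        "n_negative": sum(score < 0 for score in scores),
        "n_duplicates": len(genes) - len(set(genes)),
    }


def looks_reasonable(summary, min_genes=10_000):
    """True if the list is large enough, has both signs, and has no repeated genes."""
    return (
        summary["n_genes"] > min_genes
        and summary["n_positive"] > 0
        and summary["n_negative"] > 0
        and summary["n_duplicates"] == 0
    )


# Tiny made-up list, just to show the shape of the output.
toy_summary = summarize_rnk([("A", 1.0), ("B", -2.0), ("C", 0.5)])
```

A list with no negative scores at all fails this check, which matches the symptom of plots showing only the positive phenotype.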


-Anthony


Sarah Rashid

Apr 23, 2022, 6:50:41 AM
to gsea-help
Hi Anthony,

Thanks for your reply. I have a list of 10317 genes.
I want to know what I should do if I don't get FDR < 0.05 in my pathway analysis. Can I rely on the p-value instead, as I have found pathways with p-value < 0.05? I am working on the manuscript, so I want to be sure about my analysis.

Kindly give me your suggestion.

Thanks,
Sarah

Anthony Castanza

Apr 25, 2022, 1:08:33 PM
to gsea...@googlegroups.com

You had no sets with FDRs less than 0.05? That is quite unusual. What collections from MSigDB were you running? Generally we recommend running the lowest-level subcollection that contains your sets of interest (i.e. C5.GO.BP will generally give better results than C5.ALL).

 

That said, 10317 is kind of a small number of genes for a mouse transcriptomic study; I would generally expect almost 3x that number of expressed genes. Was this data pre-filtered in some way? That might be affecting your ability to robustly detect pathway dysregulation if too many genes were removed.

 

Generally in preranked mode, which uses the gene set permutation method, an FDR of <0.05 is considered a good threshold for statistical significance. In other modes of GSEA we sometimes use an FDR threshold of 0.25; however, we generally only recommend this threshold when running standard GSEA in phenotype permutation mode, not gene set permutation mode. Unfortunately, phenotype permutation mode is only available when using the normalized counts matrix as the GSEA input, not a preranked list.
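
The threshold rule above can be written down as a small Python sketch. The dictionary keys, field names, and example results are hypothetical and do not come from any GSEA output file; only the 0.05 and 0.25 cutoffs are taken from the explanation above.

```python
# FDR cutoffs by permutation mode, as described above.
FDR_CUTOFF = {
    "gene_set": 0.05,   # preranked mode and other gene set permutation runs
    "phenotype": 0.25,  # standard GSEA with phenotype permutation
}


def significant_sets(results, permutation_mode):
    """Return names of gene sets whose FDR q-value is below the cutoff for the mode.

    results: list of dicts with hypothetical keys "name" and "fdr_q".
    """
    cutoff = FDR_CUTOFF[permutation_mode]
    return [row["name"] for row in results if row["fdr_q"] < cutoff]


# Made-up results for illustration only.
example_results = [
    {"name": "SET_A", "fdr_q": 0.01},
    {"name": "SET_B", "fdr_q": 0.20},
    {"name": "SET_C", "fdr_q": 0.60},
]
preranked_hits = significant_sets(example_results, "gene_set")
phenotype_hits = significant_sets(example_results, "phenotype")
```

In this toy example, SET_B would pass only under the 0.25 cutoff, which applies to phenotype permutation and therefore not to a preranked run.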

sarah awan

Apr 27, 2022, 1:29:54 AM
to gsea...@googlegroups.com
Hi Anthony,

Thanks for your response. I am using C2 KEGG (curated) for pathway analysis. Unfortunately, I did not get any significant results if I stick with the FDR q-value. The NOM p-values are significant, so I am not sure whether I can report NOM p-values in my results or whether I should use the FDR q-value. However, all the C5 collections (BP, MF, CC) have significant FDR q-values.

Any suggestions on the C2 KEGG analysis?

Thanks in advance,

Sarah

