Running gene set enrichment for a specific pathway


Sana Majid

Mar 14, 2023, 8:30:37 PM
to gsea-help
Hi,

I have RNA-seq results where the predictor is a continuous variable. I want to run gene set enrichment for only one pathway or a small set of pathways (for example, the insulin receptor signaling pathway and the insulin secretion pathway) to see whether those specific pathways are enriched in our results. How do I go about doing this?

Thank you!
Sana

Anthony Castanza

Mar 15, 2023, 1:44:43 PM
to gsea...@googlegroups.com
Hi Sana,

If you're asking how to run GSEA with a continuous variable as the phenotype, this is fairly straightforward with a continuous-format CLS file; see the specification here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#CLS:_Continuous_.28e.g_time-series_or_gene_profile.29_file_format_.28.2A.cls.29. Once you have a CLS file in that format, you'll need to change the "metric for ranking genes" parameter to either the Pearson or the Spearman correlation option. This makes the enrichment score a function of the correlation of each gene's expression with your continuous phenotype.
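For reference, a continuous CLS file can be generated from a list of phenotype values in a few lines of Python. This is a minimal sketch assuming the single-profile layout described on the Data Formats page: a `#numeric` line, a `#` line carrying the phenotype name, then one value per sample in the same order as the sample columns of the dataset file. The function name and the `BMI` phenotype are hypothetical. Values are written tab-separated, which is what resolved the error later in this thread.

```python
from pathlib import Path


def write_continuous_cls(path, phenotype_name, values):
    """Write a single-profile continuous CLS file (hypothetical helper).

    Assumed layout:
      line 1: #numeric
      line 2: #<phenotype name>
      line 3: one value per sample, tab-separated, in the same
              order as the sample columns of the GCT file
    """
    if not values:
        raise ValueError("need at least one phenotype value")
    lines = [
        "#numeric",
        f"#{phenotype_name}",
        "\t".join(str(float(v)) for v in values),
    ]
    Path(path).write_text("\n".join(lines) + "\n")


# Usage: three samples with a hypothetical BMI phenotype.
# write_continuous_cls("pheno.cls", "BMI", [22.1, 30.4, 27.8])
```

The number of values must match the number of sample columns in the dataset exactly, so it is worth generating the values from the same sample table used to build the GCT file.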

If you're asking how to run just a specific gene set or a handful of gene sets, this can be done as well. You would need to prepare your gene sets of interest in one of our gene set database formats: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#Gene_Set_Database_Formats
These files can be loaded into GSEA like any other data file and, assuming they are formatted correctly, will be made available in the "Local" tab of the gene set selection dialog. I should caution, however, that if you only run a single gene set, the FDR that GSEA produces is not meaningful and should be discarded (although the p-value remains sound).
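As one example of those formats, a GMT file holds one gene set per line: the set name, a description, then the member genes, all tab-separated. The sketch below writes such a file from a Python dictionary; the function name is hypothetical and the gene symbols are placeholders, not real pathway members. It assumes the gene identifiers match those used in the dataset (for example, gene symbols).

```python
def write_gmt(path, gene_sets, description="na"):
    """Write {set_name: [genes]} to a GMT file (hypothetical helper).

    Each line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    with open(path, "w") as fh:
        for name, genes in gene_sets.items():
            # Drop duplicate genes while preserving their original order.
            unique_genes = list(dict.fromkeys(genes))
            fh.write("\t".join([name, description, *unique_genes]) + "\n")


# Usage with placeholder sets; replace with your curated gene lists.
# write_gmt("my_sets.gmt", {"SET_A": ["GENE1", "GENE2"],
#                           "SET_B": ["GENE3", "GENE4"]})
```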

Let me know if you have any other questions

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/0fedfdd8-90e4-4cc0-bba3-2de5af681953n%40googlegroups.com.

Sana Majid

Mar 22, 2023, 7:08:24 PM
to gsea-help
Hi Anthony,

Thank you! 
I have the properly formatted CLS file with the continuous variable, the GCT counts file (I converted the identifiers to gene symbols rather than keeping the ENSEMBL IDs), and the gene sets I created as a GMX file. I set the "metric for ranking genes" parameter to Spearman correlation as you suggested. However, I keep getting the following error 1006: "Too few samples in the dataset to use this metric".

Would you know what the issue might be?

Best,
Sana

Castanza, Anthony

Mar 22, 2023, 7:17:26 PM
to gsea...@googlegroups.com

Hi Sana,

How many samples are there in your dataset?
It is possible that there is a formatting issue with either the dataset file or the CLS file. It would be difficult to say specifically what might be going on without seeing the data files themselves.

If you like, you can send your dataset and CLS file (confidentially) to gsea...@broadinstitute.org, and we can take a look. Otherwise, screenshots of the files opened in a plain text editor (preferably one that shows whitespace characters) or in Excel would help us figure out what might be going wrong.

-Anthony


Sana Majid

Mar 22, 2023, 7:33:34 PM
to gsea...@googlegroups.com
Hi Anthony,

There are 433 samples and more than 15,000 genes.
I can send screenshots of the counts file and the other files to the email address you provided.

Thanks!
Sana



Sana Majid

Mar 23, 2023, 1:55:56 PM
to gsea-help
Hi Anthony,

As you suspected, the CLS file was the problem! Once I saved it as tab-separated, it worked.
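Since the culprit here was the delimiter in the CLS file, a quick pre-flight check can catch this before launching GSEA. The sketch below is a hypothetical helper, not part of GSEA, and does not reproduce GSEA's own parser. It assumes the standard GCT layout (a `#1.2` version line, a dimensions line, then a `NAME`/`Description` header followed by sample names) and checks that the third line of a continuous CLS file holds one tab-separated numeric value per sample.

```python
def _is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_continuous_cls(cls_path, gct_path):
    """Return a list of detected problems; an empty list means none were found."""
    problems = []

    # GCT: version line, dimensions line, then NAME <TAB> Description <TAB> samples...
    with open(gct_path) as fh:
        fh.readline()
        dims = fh.readline().rstrip("\r\n").split("\t")
        header = fh.readline().rstrip("\r\n").split("\t")
    samples = header[2:]
    if len(dims) >= 2 and dims[1].strip().isdigit() and int(dims[1]) != len(samples):
        problems.append(
            f"GCT declares {dims[1].strip()} samples but the header lists {len(samples)}"
        )

    # Continuous CLS: "#numeric", "#<phenotype name>", then the values.
    with open(cls_path) as fh:
        lines = [line.rstrip("\r\n") for line in fh if line.strip()]
    if not lines or lines[0].strip() != "#numeric":
        problems.append("first CLS line should be '#numeric'")
    if len(lines) < 3 or not lines[1].startswith("#"):
        problems.append("expected a '#<phenotype name>' line followed by a line of values")
        return problems

    values = lines[2].split("\t")
    if len(values) != len(samples):
        # A space-separated line shows up here as a single "value".
        problems.append(
            f"CLS has {len(values)} tab-separated value(s) but the GCT has {len(samples)} samples"
        )
    non_numeric = [v for v in values if not _is_number(v)]
    if non_numeric:
        problems.append(f"non-numeric CLS value(s): {non_numeric[:3]}")
    return problems
```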

Thanks very much!
Sana

Sana Majid

May 3, 2023, 11:20:29 AM
to gsea-help
Hello,

I have another question: I want to run gene set enrichment for specific pathways on my DEG output, which has the fold changes and p-values (since we ultimately adjusted for certain covariates). How can I do this? I saw something about using GSEAPreranked, but I am not sure whether that is the right approach and, if it is, whether I would run a calculation to generate the enrichment score.

Last time I loaded a normalized counts file, but that doesn't take the covariate adjustments into account.

Thank you!
Sana

Castanza, Anthony

May 3, 2023, 12:41:30 PM
to gsea...@googlegroups.com

Hi Sana,

GSEA Preranked would be the way to run this data; however, it only supports ranking by a single metric. If your statistical software produced another statistic, such as a Wald test statistic, I would recommend using that. If not, you could try a combined score such as -log(pValue)*logFC.

Also you’ll still want to include information for all genes, not just significant differentially expressed genes.
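A minimal sketch of building a preranked (`.rnk`) file with the fallback combined score mentioned above, assuming the DE results are available as (gene, logFC, p-value) tuples covering every tested gene, and assuming the RNK layout of two tab-separated columns (gene, score). Function names are hypothetical; if your software reports a Wald statistic, ranking by it directly is the preferable option.

```python
import math


def combined_score(log_fc, p_value):
    """-log10(p) * logFC: strongly positive for up-regulated genes,
    strongly negative for down-regulated genes."""
    # Guard against p == 0, which would make -log10(p) infinite.
    p = max(p_value, 1e-300)
    return -math.log10(p) * log_fc


def write_rnk(path, rows):
    """rows: iterable of (gene, logFC, pValue) for ALL tested genes.

    Assumes every gene has a numeric logFC and p-value; genes with
    missing values must be handled before calling this.
    """
    scored = [(gene, combined_score(fc, p)) for gene, fc, p in rows]
    # Sort from most positive to most negative score.
    scored.sort(key=lambda item: item[1], reverse=True)
    with open(path, "w") as fh:
        for gene, score in scored:
            fh.write(f"{gene}\t{score}\n")
    return scored
```

The log base does not affect the result: switching between natural log and log10 multiplies every score by the same positive constant, so the gene ranking that Preranked uses is unchanged.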

Let me know if you have any more questions!
