First time GSEA

Ramon Garcia Areas

Sep 23, 2021, 3:26:01 PM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well. I was wondering if there is a video tutorial for running the GSEA analysis software. I downloaded it, but I am unsure how to set it up and what format the gene expression data should be in. Any insight would be greatly appreciated.

Best regards,

Ramón

Anthony Castanza

Sep 23, 2021, 3:28:15 PM
to gsea...@googlegroups.com

Hi Ramón,

 

We generally direct users to the (text based) user guide: https://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html

But also this video (that was produced by a 3rd party) is generally pretty good: https://www.youtube.com/watch?v=KY6SS4vRchY

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego


Ramon Garcia Areas

Sep 23, 2021, 3:30:37 PM
to gsea...@googlegroups.com
Hi Anthony,

Thank you for your speedy reply. I will try those two resources out. 

Best regards,

Ramón 

Ramon Garcia Areas

Sep 25, 2021, 6:40:23 PM
to gsea...@googlegroups.com
Hi Anthony,

The files I got back from the RNAseq analyses contain the gene name, fold change and p values. Is it possible to run a GSEA analysis using a file with fold changes? Thank you for your insight.

Best,

Ramon 

Anthony Castanza

Sep 26, 2021, 5:46:09 PM
to gsea...@googlegroups.com

Hi Ramon,

 

You can run this data in GSEA Preranked mode if you take the full list of genes and their Log2(FC) values and format them as a .rnk file; see the "Data formats" page of the GeneSetEnrichmentAnalysisWiki (broadinstitute.org).

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

Ramon Garcia Areas

Sep 28, 2021, 10:53:59 PM
to gsea...@googlegroups.com
Hi Anthony, 

Thank you so much, I'll give it a try.

Best,

Ramon

Ramon Garcia Areas

Oct 3, 2021, 6:16:44 PM
to gsea...@googlegroups.com
Hi Anthony,

How does one create a RANK file? Can it be done in Excel? Thank you for your insight.

Best,

Ramón

Anthony Castanza

Oct 4, 2021, 1:36:46 PM
to gsea-help
Hi Ramón,

The RNK file is a simple two-column format (see the link I sent you previously). Yes, it can be made in Excel; just be careful about Excel's propensity for messing up some gene names by converting them to dates. The best way to prevent this is to use Excel's import tool (File > Import) and set all column types to Text (not General) when importing your dataset.
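
For anyone who would rather skip the spreadsheet step, a minimal Python sketch of the same idea follows. It assumes a tab-delimited results table with a header row; the column names `gene` and `log2FoldChange` and the demo rows are hypothetical, so substitute whatever your RNA-seq output uses. Gene names are read and written as plain text, which keeps symbols such as MARCH1 or SEPT2 from being turned into dates, and rows without a usable numeric score are skipped. The rows are sorted from highest to lowest score only for readability.

```python
import csv
import io
import math


def read_gene_scores(src, gene_col, score_col):
    """Collect (gene, score) pairs from a tab-delimited table, keeping both as text."""
    pairs = []
    for row in csv.DictReader(src, delimiter="\t"):
        gene = (row.get(gene_col) or "").strip()
        score = (row.get(score_col) or "").strip()
        try:
            usable = bool(gene) and math.isfinite(float(score))
        except ValueError:
            usable = False  # e.g. "NA" or an empty cell
        if usable:
            pairs.append((gene, score))
    return pairs


def write_rnk(pairs, out):
    """Write the pairs as two tab-separated columns, highest score first."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for gene, score in sorted(pairs, key=lambda p: float(p[1]), reverse=True):
        writer.writerow([gene, score])


# Hypothetical demo table; a real run would open the results file instead.
demo = (
    "gene\tlog2FoldChange\tpvalue\n"
    "SEPT2\t-2.0\t0.03\n"
    "MARCH1\t1.5\t0.01\n"
    "TP53\tNA\t0.5\n"
)
pairs = read_gene_scores(io.StringIO(demo), "gene", "log2FoldChange")
buf = io.StringIO()
write_rnk(pairs, buf)
print(buf.getvalue())
```

To produce a file rather than a string, pass an open file handle (for example `open("ranked_genes.rnk", "w", newline="")`, where the file name is again only an example) as `out`.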

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Ramon Garcia Areas

Oct 5, 2021, 10:42:24 PM
to gsea...@googlegroups.com
Hi Anthony,

I was able to upload the list in the Load Data tab (photo 1), but when I went to run GSEA Preranked, nothing came up in the ranked list. Where should I be loading the file? Thank you so much for your help.

Best,

Ramon

Capture2.JPG
Capture1.JPG

Anthony Castanza

Oct 5, 2021, 10:57:42 PM
to gsea-help
The file you loaded has the file extension .txt; it needs to have the file extension .rnk or GSEA won't interpret it correctly as a RNK file.
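
If renaming by hand is awkward (some file managers hide extensions), the swap can be sketched in Python. The file name `ranked_genes.txt` is a made-up example, and only the new name is computed here; nothing on disk is touched.

```python
from pathlib import Path


def rnk_name(path):
    """Return the same path with its final extension replaced by .rnk."""
    return Path(path).with_suffix(".rnk")


new_path = rnk_name("ranked_genes.txt")
print(new_path.name)
```

Calling `Path("ranked_genes.txt").rename(new_path)` would then perform the actual rename.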


-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Ramon Garcia Areas

Oct 13, 2021, 11:02:19 PM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well. I tried running the RNK file and got the attached error message. Maybe I'm not setting up the file correctly? Thank you for any insight.

Best,

Ramón

Capture1.PNG

Anthony Castanza

Oct 13, 2021, 11:24:39 PM
to gsea...@googlegroups.com

Hi Ramón,

 

From this error message it’s looking like the CHIP file you’ve selected isn’t actually the right one for your gene identifiers.

The one you picked only works with unversioned Human Ensembl Gene IDs. Those IDs look like ENSG00000012345 or such. If the ID ends with a .version number in your data (like ENSG00000012345.6) that suffix would have to be removed. If your genes were converted to their gene symbol already then you’d want to use the Human_Gene_Symbol_with_Remapping_ chip. There are corresponding chips for Mouse or Rat data as well.
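
As an illustration of the version-suffix point, here is a small Python sketch that trims a trailing `.N` from Ensembl-style gene IDs and leaves everything else alone. The regular expression is an assumption about what such IDs look like (ENSG… for human, with a species infix such as ENSMUSG… for mouse); it is not taken from GSEA itself.

```python
import re

# Assumed shape of a versioned Ensembl gene ID: ENS + optional species
# letters + G + digits, followed by ".<version>".
_VERSIONED_ID = re.compile(r"^(ENS[A-Z]*G\d+)\.\d+$")


def strip_ensembl_version(gene_id):
    """Drop a trailing .version from an Ensembl gene ID; return other IDs unchanged."""
    cleaned = gene_id.strip()
    match = _VERSIONED_ID.match(cleaned)
    return match.group(1) if match else cleaned


for example in ["ENSG00000012345.6", "ENSG00000012345", "TP53"]:
    print(example, "->", strip_ensembl_version(example))
```

Gene symbols pass through untouched, so the function is safe to run over a mixed column.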

 

If you’re not sure exactly what chip you need, if you send a screenshot of your RNK file’s contents I can probably tell you which one it is.

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

 


Anthony Castanza

Oct 14, 2021, 1:09:18 PM
to gsea...@googlegroups.com

Hi Ramón,

 

Based on the error message, that doesn't appear to be the right chip file for your dataset. Could you send me a sample, or a screenshot, of what the gene identifiers in your ranked list look like?

Ramon Garcia Areas

Oct 14, 2021, 11:35:23 PM
to gsea...@googlegroups.com
Hi Anthony, 

Thank you for your reply. I figured out which chip set to use and I was able to generate graphs using a preranked RNK file that lists genes with positive and negative log fold changes. However, I am unsure if one mountain graph includes genes with positive and negative fold changes because in the analysis page it lists two different phenotypes. Does the software separate the genes with negative values from the ones with positive values? Thank you for your insight.

Best,

Ramón

Anthony Castanza

Oct 15, 2021, 3:07:40 PM
to gsea...@googlegroups.com

Hi Ramón,

 

GSEA works its way down the entire ranked gene list, computing whether the genes in each gene set are, on balance, upregulated or downregulated. The gene sets that are upregulated on balance are presented in the positive enrichment phenotype (na_pos for Preranked), and the gene sets that are downregulated on balance are presented in the negative enrichment phenotype (na_neg for Preranked). It is possible for gene sets that are enriched in one direction to contain some genes that responded in the opposite direction; this just means that the signal of those genes was not enough to overbalance the signal of the other genes in the set. GSEA doesn't consider the up and down halves of the ranked list separately; it looks at the dataset as a whole and tries to evaluate each set in its global context.
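
To make the "on balance" idea concrete, here is a simplified Python sketch of a weighted running-sum enrichment score of the kind described in the GSEA publication: walking down the ranked list, the sum steps up at each gene in the set (in proportion to the size of its score) and steps down at each gene outside it, and the score is the largest deviation from zero. This is an illustration only, not GSEA's own code; it leaves out normalization and permutation testing, and the gene names and scores are invented.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Largest signed deviation of the running sum over a ranked list.

    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # nothing to score against
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        # Step up for a gene in the set, down for a gene outside it.
        running += abs(score) ** weight / hit_total if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Invented example: six genes ranked by log2 fold change.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
up_set = enrichment_score(ranked, {"A", "B"})
down_set = enrichment_score(ranked, {"E", "F"})
mixed_set = enrichment_score(ranked, {"A", "B", "F"})
print(up_set, down_set, mixed_set)
```

The mixed set contains one strongly downregulated gene yet still scores positive, because the two upregulated genes at the top of the list dominate the running sum.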

 

Does that make sense?

Ramon Garcia Areas

Oct 19, 2021, 11:18:13 PM
to gsea...@googlegroups.com
Hi Anthony,

Thank you, your explanation was very helpful.

Best,

Ramón

Ramon Garcia Areas

Nov 7, 2021, 8:03:08 PM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well. I'd like to ask whether there is a way to know if the GSEA software has identified all the genes in the lists I submit. Thank you for any insight.

Kind regards,

Ramón

Anthony Castanza

Nov 8, 2021, 4:11:17 PM
to gsea-help
Hi Ramón,

If you're using the collapse dataset function there should be an output called "Symbol_to_probe_set_mapping_details.tsv" in the results directory that lists all the identified gene ids in the input data and what the chip file mapped them to.
We also produce a "gene_set_sizes.tsv" output that would tell you if genes were filtered out of a gene set because they were not detected in the input dataset, but I don't think we explicitly write out which genes were filtered out because they were not detected.
We do produce a gene_sets.gmt in the edb directory of the result output that contains the gene sets used after GSEA's filtering has been applied. It should be possible to get the genes that were dropped from these gene sets by comparing the original GMT against this filtered GMT.
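
The GMT comparison described above could be sketched in Python roughly as follows. It assumes the usual GMT layout (set name, description, then genes, all tab-separated) and that set names are spelled identically in both files; sets missing from the filtered file entirely are skipped rather than reported. The set names and genes in the demo are invented.

```python
def parse_gmt(lines):
    """Map each gene set name to its set of genes (columns 3 onward of a GMT line)."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets


def dropped_genes(original, filtered):
    """For sets present in both, list genes in the original but not the filtered set."""
    dropped = {}
    for name, genes in original.items():
        if name in filtered:
            missing = sorted(genes - filtered[name])
            if missing:
                dropped[name] = missing
    return dropped


# Invented demo data; a real run would read the two .gmt files line by line.
original = parse_gmt(["SET_A\tdesc\tTP53\tMYC\tEGFR\n", "SET_B\tdesc\tBRCA1\tBRCA2\n"])
filtered = parse_gmt(["SET_A\tdesc\tTP53\tMYC\n"])
report = dropped_genes(original, filtered)
print(report)
```

With real files, `parse_gmt(open(path))` works directly, since an open text file yields its lines.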

Hope this helps,

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Ramon Garcia Areas

Nov 8, 2021, 4:14:11 PM
to gsea...@googlegroups.com
Hi Anthony,

I'm analyzing a preranked list, not collapse. Does that change anything? Thank you for your insight. 

Kind regards,

Ramon 

Anthony Castanza

Nov 8, 2021, 4:20:37 PM
to gsea...@googlegroups.com

If you're not using the collapse functions, then there will be no "Symbol_to_probe_set_mapping_details.tsv" and GSEA will use all the symbols as-is. It will still use these symbols to filter the gene sets and produce the filtered "gene_sets.gmt" and "gene_set_sizes.tsv", which should tell you if there were genes in the gene sets that were not detected in your dataset.

Ramon Garcia Areas

Nov 8, 2021, 4:24:38 PM
to gsea...@googlegroups.com
Thanks Anthony!  Greatly appreciate all your help.  

Best regards,

Ramón

Ramon Garcia Areas

Nov 21, 2021, 7:51:39 PM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well. I am analyzing some data I generated with GSEA software and would like to ask about the statistical value of a result. Are gene sets with a NOM p-val greater than 0.05 considered not significant? Thank you for your insight.

Best regards,

Ramón


Anthony Castanza

Nov 21, 2021, 9:57:45 PM
to gsea-help
Hi Ramón,

A gene set with a NOM p-value > 0.05 would generally not be considered significant if GSEA is run in gene_set permutation mode. If GSEA was run in phenotype permutation mode, which is quite strict, we'd generally recommend relaxing this threshold to < 0.25.


-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Ramon Garcia Areas

Nov 21, 2021, 10:36:00 PM
to gsea...@googlegroups.com
Hi Anthony,

Thank you very much for your reply. How would I know if I ran it with gene_set permutation or phenotype permutation? I uploaded preranked FC lists and got pos phenotype gene sets and neg phenotype gene sets. Thank you for your insight.

Best,

Ramón 





Anthony Castanza

Nov 21, 2021, 10:56:56 PM
to gsea-help
GSEA Preranked always uses gene set permutation since per-sample phenotype information is unavailable when using an externally ranked list.


-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Ramon Garcia Areas

Nov 21, 2021, 10:58:23 PM
to gsea...@googlegroups.com

Ramon Garcia Areas

Nov 24, 2021, 10:58:05 PM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well. I have a quick formatting question. When I convert my Excel file to a txt.rnk, the columns are not well aligned (photo attached). Might this affect how the software reads the file? Thank you for your insight.

Best,

Ramon

Test.jpg

Anthony Castanza

Nov 24, 2021, 11:20:38 PM
to gsea-help
Hi Ramon,

This is a frequent visual glitch with plain text editors.
As long as there is only one tab between the two columns on each line, it should be fine.
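
One way to confirm this without trusting how the editor draws the columns is a quick Python check like the sketch below. It flags any line that does not contain exactly one tab or whose second column is not a number. Skipping lines that start with `#` is an assumption about comment or header lines; a header row without `#` will be reported as having a non-numeric score. The demo lines are invented.

```python
def check_rnk_lines(lines):
    """Return (line_number, problem) pairs for lines that are not 'gene<TAB>number'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue  # assumed: blank and '#' lines carry no data
        fields = text.split("\t")
        if len(fields) != 2:
            problems.append((number, f"expected 1 tab, found {len(fields) - 1}"))
            continue
        try:
            float(fields[1])
        except ValueError:
            problems.append((number, "score is not a number"))
    return problems


# Invented demo: line 2 uses spaces, line 3 has a trailing tab, line 4 has a text score.
demo_lines = ["TP53\t1.2\n", "MYC  -0.5\n", "EGFR\t0.3\t\n", "KRAS\thigh\n"]
found = check_rnk_lines(demo_lines)
for number, problem in found:
    print(number, problem)
```

An empty result means every data line has the two-column, single-tab layout, however ragged it looks on screen.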

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Ramon Garcia Areas

Nov 24, 2021, 11:28:00 PM
to gsea...@googlegroups.com
Hi Anthony,

Thank you for your speedy reply. I hope you have a great Thanksgiving! 

Best,

Ramón

Ramon Garcia Areas

May 1, 2022, 1:06:08 AM
to gsea...@googlegroups.com
Hi Anthony,

I hope this email finds you well.

I downloaded the new version of GSEA software and got the attached error message when I uploaded a file I had previously analyzed. Is something wrong with the file? Thank you for your insight.

Best,

Ramón
Screenshot 2022-04-30 230301.jpg

David Eby

May 2, 2022, 2:53:01 PM
to gsea...@googlegroups.com
Hi Ramón,

This is a relatively new error message added with GSEA 4.2.0 but the meaning is pretty clear, or at least should be unless there's something strange about the file.  Would you mind sharing it with us over on the gsea-team private address?

Do you happen to remember the former version of GSEA you used before?

Thanks,
David
