David Eby www.gsea-msigdb.org igv.org
genepattern.org
--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/75d1c28f-fb0e-4ad5-b2cb-320520fbec59%40googlegroups.com.
Hi Prit,
790 genes is over the default GSEA maximum gene set size threshold.
You’ll need to expand the “Basic Fields” section and change the “Max size: exclude larger sets” parameter from the default 500 to something above your gene set size.
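As a rough illustration of the size check described above, the following Python sketch counts the genes in each set of a GMT-formatted file and flags sets above a maximum size. The function names (`parse_gmt`, `oversized_sets`) and the inline example data are hypothetical, and the sketch only compares raw set sizes; GSEA itself applies the threshold after restricting each set to genes present in the dataset.

```python
import io


def parse_gmt(handle):
    """Parse GMT lines: set name, description, then one gene per tab-separated field."""
    sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        sets[fields[0]] = [gene for gene in fields[2:] if gene]
    return sets


def oversized_sets(gene_sets, max_size=500):
    """Return {set name: size} for sets larger than max_size (500 is the default cited above)."""
    return {name: len(genes) for name, genes in gene_sets.items() if len(genes) > max_size}


# Hypothetical example data: one small set and one 790-gene set.
example = (
    "SMALL_SET\tna\tA\tB\tC\n"
    + "BIG_SET\tna\t" + "\t".join(f"G{i}" for i in range(790)) + "\n"
)
gene_sets = parse_gmt(io.StringIO(example))
print(oversized_sets(gene_sets))
```

Any set reported here would be excluded unless the "Max size" parameter is raised above its size.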
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
Hello,
The error message you’ve included is unrelated to the previous issues in this thread.
In your case, it appears that you’ve selected a combination of Gene Set Database files that includes both a subcollection and at least one of its parent collections. The MSigDB gene set files are hierarchical (e.g., you can run C5:GO:BP and C5:GO:MF together, but you can’t run both C5:GO:BP and the higher-level C5:GO in the same run, because the sets in C5:GO:BP are also contained in C5:GO and the repeats trigger this error). In this case, I believe that the duplicates are in one of the C2 levels. If you share what you used in the gene sets database input box, I can tell you more specifically where they’re coming from.
That’s assuming you’re using MSigDB gene sets. If you’re using your own custom file, you’d need to manually check it for duplicates.
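For a custom file, or to see which selected files overlap, the manual duplicate check could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the file labels, and the set names are hypothetical, and the input is assumed to be GMT-formatted text with the set name in the first tab-separated field.

```python
from collections import defaultdict


def find_duplicate_set_names(named_gmt_texts):
    """Map each repeated gene set name to the labels of the sources it appears in.

    named_gmt_texts: dict of {label: GMT-formatted text} (hypothetical input layout).
    """
    seen = defaultdict(list)
    for label, text in named_gmt_texts.items():
        for line in text.splitlines():
            if not line.strip():
                continue  # ignore blank lines
            set_name = line.split("\t", 1)[0]
            seen[set_name].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


# Hypothetical contents: the "child" file repeats a set already in the "parent" file.
parent = "SET_X\tna\tA\tB\nSET_Y\tna\tC\tD\n"
child = "SET_X\tna\tA\tB\n"
dups = find_duplicate_set_names({"parent.gmt": parent, "child.gmt": child})
print(dups)
```

A name listed twice under the same label would indicate a duplicate within a single file.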
-Anthony
When using the annotate page to perform an overlap statistic test, GSEA internally uses the Human_Gene_Symbol_with_Remapping and Mouse_Gene_Symbol_with_Remapping_to_Human_orthologs CHIP files to ensure that all symbols are harmonized into the MSigDB namespace (a few other CHIPs are also used to pick up other commonly used namespaces).
Running GSEA in the desktop app is different than running an overlap statistic test. When running an overlap statistic test, you need to select just significant genes or genes of interest and usually these are run as separate lists of up and downregulated genes (but not always). When running GSEA you actually need the *entire* list of all expressed genes both significant and non-significant along with either the full expression information (in GSEA's regular mode) or with a precomputed ranking metric (in GSEA Preranked mode). Including the non-significant genes allows GSEA to compute the full ranking distribution when testing for overrepresentation. If you don’t have the full expression information, you're likely to get the "none of the gene sets passed size thresholds" error as GSEA strips genes from gene sets if they aren't in the underlying expression dataset.
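To make the last point concrete, here is a minimal Python sketch of how restricting gene sets to the genes present in a dataset can leave nothing within the size thresholds. It is not GSEA's actual implementation; the function name and data are hypothetical, the maximum of 500 is the default mentioned earlier in this thread, and the minimum of 15 is an assumption about the default setting.

```python
def sets_passing_size_filter(gene_sets, dataset_genes, min_size=15, max_size=500):
    """Return {set name: retained size} for sets within the thresholds after filtering.

    Genes absent from the dataset are dropped from each set before its size is checked.
    min_size=15 is an assumed default; max_size=500 is the default cited in the thread.
    """
    dataset = set(dataset_genes)
    passing = {}
    for name, genes in gene_sets.items():
        kept = [gene for gene in genes if gene in dataset]
        if min_size <= len(kept) <= max_size:
            passing[name] = len(kept)
    return passing


# Hypothetical data: a 40-gene set tested against a full dataset and a short "significant" list.
gene_sets = {"SET_A": [f"G{i}" for i in range(40)]}
full_dataset = [f"G{i}" for i in range(10000)]
significant_only = [f"G{i}" for i in range(10)]

print(sets_passing_size_filter(gene_sets, full_dataset))
print(sets_passing_size_filter(gene_sets, significant_only))
```

With the full dataset the set keeps all 40 genes and passes; with only 10 genes available it falls below the minimum and is excluded, which is the situation behind the "none of the gene sets passed size thresholds" error.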
If your dataset already contains the full expression information for all genes and you are still experiencing these errors, then yes, sending your dataset (this can be done confidentially via gsea...@broadinstitute.org if you want to keep it off the public help forum), as well as any other input files you used for GSEA, will help us debug the error messages.
-Anthony


Hello,
So, what you're going to want to do here is copy the Ensembl gene IDs from the Description column to the Name column, removing their version suffixes (so ENSMUSG00000045515.5 becomes ENSMUSG00000045515, and similarly for all the genes). Then select the Mouse_ENSEMBL_Gene_ID_Human_Orthologs_MSigDB.v7.4.chip file from the dropdown for the Chip platform field.
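The ID cleanup described above could be scripted along these lines. This is a minimal sketch: the function names are hypothetical, and the rows are assumed to be dictionaries keyed by column headers spelled "NAME" and "DESCRIPTION", which may not match the exact headers in a given file.

```python
import re

# Matches a trailing ".<digits>" version suffix such as ".5".
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_ensembl_version(gene_id):
    """Remove surrounding whitespace and a trailing version suffix from an Ensembl ID."""
    return _VERSION_SUFFIX.sub("", gene_id.strip())


def use_description_ids_as_name(rows):
    """Return copies of the rows whose NAME holds the unversioned ID from DESCRIPTION."""
    return [{**row, "NAME": strip_ensembl_version(row["DESCRIPTION"])} for row in rows]


# Hypothetical row; the original NAME value and the sample column are placeholders.
rows = [{"NAME": "gene_1", "DESCRIPTION": "ENSMUSG00000045515.5", "sample1": "12.3"}]
cleaned = use_description_ids_as_name(rows)
print(cleaned[0]["NAME"])  # ENSMUSG00000045515
```

IDs that carry no version suffix pass through unchanged, so the function is safe to apply to every row.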
-Anthony
From: gsea...@googlegroups.com <gsea...@googlegroups.com> on behalf of Eunjoo Kim <7wan...@gmail.com>
Date: Tuesday, November 16, 2021 at 3:21 PM
To: gsea-help <gsea...@googlegroups.com>
Subject: Re: [gsea-help]
The number of genes is 21898, and it is a mouse sample.


Hello,
It looks like GSEA is still not able to match the IDs. Did you ensure both that the IDs were moved to the first column (the column would still need to be called NAME) and that the decimal version suffixes were stripped from all genes?
If you could provide a screenshot of the modified file opened in a plain text editor I can give suggestions on what might've gone wrong.

Okay, I'm not actually seeing anything wrong with that data here. I suppose it's possible there could be some hidden spaces or something that the parser is choking on though.
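One way to look for the kind of hidden characters mentioned above is a small Python check like the following. It is a sketch only: the function name is hypothetical, and it is an assumption that stray whitespace or invisible characters are the cause of the mismatch.

```python
import unicodedata


def find_suspicious_ids(ids):
    """Return {raw ID: list of issues} for IDs containing whitespace or invisible characters."""
    problems = {}
    for raw in ids:
        issues = []
        if raw != raw.strip():
            issues.append("leading/trailing whitespace")
        # Cc = control, Cf = format (e.g. byte order mark, zero-width space), Zs = space separator
        if any(unicodedata.category(ch) in ("Cc", "Cf", "Zs") for ch in raw.strip()):
            issues.append("hidden or internal space/control character")
        if issues:
            problems[raw] = issues
    return problems


# Hypothetical IDs: one clean, one with a trailing space, one with a byte order mark.
ids = ["ENSMUSG00000045515", "ENSMUSG00000045515 ", "\ufeffENSMUSG00000045515"]
result = find_suspicious_ids(ids)
for raw, issues in result.items():
    print(repr(raw), issues)
```

Printing with `repr` makes otherwise invisible characters show up as escape sequences.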
Would you possibly be willing to send this dataset (confidentially) to us so I can take a closer look and debug the issue? We have a private email address gsea...@broadinstitute.org that can be used to send confidential data.
-Anthony
From: gsea...@googlegroups.com <gsea...@googlegroups.com> on behalf of Eunjoo Kim <7wan...@gmail.com>
Date: Wednesday, November 17, 2021 at 12:25 PM
To: gsea-help <gsea...@googlegroups.com>
Subject: Re: [gsea-help]

Hi Owaisa,
In the future please create a new thread to discuss your specific error message.
That said, GSEA expects ranking information for all expressed genes, not highly filtered subsets (i.e. just “significant” genes) like you appear to have provided. GSEA needs the additional information that these non-differentially expressed genes provide in order to properly compute the enrichment scores.
Please try again with the full dataset and let me know if you continue to encounter errors.
-Anthony
From: Owaisa Haider
Sent: Sunday, February 26, 2023 9:02 AM
To: gsea-help
Subject: Re: [gsea-help]
I have 16 genes in my dataset. I am using it for the first time, and it gives the threshold error.
On Tuesday, 11 October 2022 at 18:10:25 UTC+5 Anthony Castanza wrote:
Hi William,
In the future, we ask that a new issue be opened to prevent unwanted replies to the original posters.
These appear to be mouse genes in the Riken gene ID format. Are all the IDs Riken IDs? Or is this a mix of gene symbols and Riken IDs? I don't think we fully support the Riken database IDs, just those that are accepted as interim gene symbols.
If these are gene symbols and not just Riken IDs then are you using GSEA's collapse option with the Mouse gene symbols chip?
-Anthony
On Tue, Oct 11, 2022, 4:24 AM William Chao <asq257...@gmail.com> wrote:
Hi Anthony,
I got the same error when running GSEA, and my dataset is attached below. I also checked the details on the website, but that did not solve the problem. My input file contains 12752 mouse genes, and I still don't know how to solve it. Thanks a lot.