How to solve a warning: Mixed MSigDB versions detected


Maria Villa

Jul 12, 2023, 8:13:14 AM
to gsea-help
Dear Team, I am running an analysis of RNA-seq data whose genes are identified by gene symbols. I am analysing the enrichment of a custom signature (also with gene symbols as identifiers), but the software repeatedly warns me that:
Mixed MSigDB versions detected. The selected CHIP does not match the version of the MSigDB collection selected. Some gene identifiers may not be mapped.
What should I do to avoid this? When I set the parameters, in Chip Platform I selected Human_HGNC_ID. Should I maybe select another chip? Or change any other parameter?
Thank you very much in advance.
Kind regards,
María.

David Eby

Jul 12, 2023, 1:53:44 PM
to gsea...@googlegroups.com
Hi Maria,

This warning was added when we introduced the new Mouse MSigDB collections to warn users if they accidentally choose a Human GMT and a Mouse CHIP (or vice versa) for their analysis.  It is also meant to detect when the versions of those two files differ, like using MSigDB 7.5 and 7.4 together.

Make sure that the version is the same in each of these files to avoid this warning. So, for the latest MSigDB version, make sure that both files are '2023.1.Hs' if you are targeting the Human collections or '2023.1.Mm' if you are targeting the Mouse collections.
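
One quick way to check this before loading files is to compare the version tags embedded in the two file names. The sketch below is not part of GSEA. It assumes the tag follows the ".v<version>." pattern seen in MSigDB file names, and the helper names msigdb_version and versions_match are hypothetical. The mouse file name in the example is illustrative only.

```python
import re

# Assumption: the version tag sits between ".v" and the next "." in the
# file name, e.g. ".v2023.1.Hs." or ".v7.5.1.".
VERSION_TAG = re.compile(r"\.v(\d+(?:\.\d+)*(?:\.(?:Hs|Mm))?)\.")


def msigdb_version(filename):
    """Return the version tag embedded in a file name, or None if absent."""
    match = VERSION_TAG.search(filename)
    return match.group(1) if match else None


def versions_match(gene_set_file, chip_file):
    """True only when both names carry a tag and the tags are identical."""
    gene_set_tag = msigdb_version(gene_set_file)
    chip_tag = msigdb_version(chip_file)
    return gene_set_tag is not None and gene_set_tag == chip_tag


if __name__ == "__main__":
    chip = "Human_Ensembl_Gene_ID_MSigDB.v2023.1.Hs.chip"
    human_set = "REACTOME_SIGNALING_BY_TGF_BETA_RECEPTOR_COMPLEX.v2023.1.Hs.grp"
    mouse_set = "m2.all.v2023.1.Mm.symbols.gmt"  # illustrative name
    print(versions_match(human_set, chip))  # True
    print(versions_match(mouse_set, chip))  # False
```

A file without a tag, such as a custom signature, returns None and is reported as a non-match. That only means the name cannot be checked, not that the contents are wrong.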

We've received a number of questions on this, so it's definitely possible that we have a bug in the warning code.  Let me know if you're still having trouble since it would be helpful in tracking down any issues.

Thanks,
David

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/b8446473-172e-4c94-a14a-b864609a777bn%40googlegroups.com.

Anthony Castanza

Jul 17, 2023, 12:42:16 PM
to gsea-help
Hi Maria,

As an additional issue here, Human_HGNC_ID is unlikely to be the correct CHIP for your dataset. HGNC IDs have the format "HGNC:12345" and are not the HGNC-approved gene symbols, which are what the Human_Gene_Symbol_with_Remapping chip files contain.
That said, if you are running your analysis with a custom gene signature that you created, it is not necessary to use our chip files. The critical step is to make sure that the genes in your signature use the same symbols as the genes in your expression dataset, since gene symbols can change over time. Our chip files are designed to make the necessary conversions for running various datasets against our gene sets. It can still be helpful to convert your custom gene set using our mapping chip file (although we don't offer a specific tool for this), as it gives you a "harmonized" result should you choose to run an analysis with any of our gene sets down the road.
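
Both checks can be done outside GSEA. The sketch below is only an illustration: the function names id_kind and signature_overlap are hypothetical, the patterns assume the "HGNC:12345" format described above and the usual "ENSG" plus 11 digits form of human Ensembl gene IDs, and the gene lists are made up for the example.

```python
import re

HGNC_ID = re.compile(r"^HGNC:\d+$")
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}(\.\d+)?$")  # optional version suffix


def id_kind(identifier):
    """Roughly classify an identifier by its shape."""
    if HGNC_ID.match(identifier):
        return "hgnc_id"
    if ENSEMBL_GENE_ID.match(identifier):
        return "ensembl_gene_id"
    return "other"  # most likely a gene symbol


def signature_overlap(signature, dataset_genes):
    """Split a signature into genes found in, and missing from, the dataset."""
    dataset = set(dataset_genes)
    found = [gene for gene in signature if gene in dataset]
    missing = [gene for gene in signature if gene not in dataset]
    return found, missing


if __name__ == "__main__":
    # Made-up example: SEPT2 is an older symbol for the gene now named SEPTIN2.
    signature = ["TGFB1", "SMAD2", "SEPT2"]
    dataset_genes = ["TGFB1", "SMAD2", "SEPTIN2"]
    found, missing = signature_overlap(signature, dataset_genes)
    print(id_kind(dataset_genes[0]))  # other
    print(missing)                    # ['SEPT2']
```

Any gene reported as missing is worth checking by hand for a renamed symbol before running the analysis.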

Let us know if you have any additional questions

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

Ben G.

Aug 24, 2023, 4:47:45 PM
to gsea-help
Hello Team,

I am having the same issue, but I can't see where the version discrepancy is. I am using the REACTOME_SIGNALING_BY_TGF_BETA_RECEPTOR_COMPLEX.v2023.1.Hs.grp gene set and the Human_Ensembl_Gene_ID_MSigDB.v2023.1.Hs.chip CHIP platform on RNA-seq data from TCGA. I have trimmed the version numbers from the Ensembl IDs, but I keep getting the same error and GSEA collapses the dataset until it is empty. I've previously run similar GSEA on these data, so I'm confused. Any assistance would be appreciated.
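
For reference, the trimming step mentioned above can be sketched as follows. This is a minimal illustration, not the exact code used on this dataset; the function name strip_ensembl_version is hypothetical, and it assumes the version is a purely numeric suffix after the first ".".

```python
def strip_ensembl_version(identifier):
    """Drop a trailing numeric ".<version>" suffix from an Ensembl ID.

    Identifiers without such a suffix are returned unchanged.
    """
    base, separator, suffix = identifier.partition(".")
    if separator and suffix.isdigit():
        return base
    return identifier


if __name__ == "__main__":
    print(strip_ensembl_version("ENSG00000280583.1"))  # ENSG00000280583
    print(strip_ensembl_version("ENSG00000280583"))    # ENSG00000280583
```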

Thank you very much,
Ben Greulich

Castanza, Anthony

Aug 25, 2023, 2:39:28 PM
to gsea...@googlegroups.com

Hi Ben,

 

Could you send a sampling of the IDs from your dataset file?

Also, please confirm that the “Collapse/Remap to Gene Symbols” parameter was set to “Collapse”, otherwise any specified chip file will not actually be used by the software.

 

-Anthony

 


 


 

David Eby

Aug 26, 2023, 6:52:22 PM
to gsea...@googlegroups.com
Hi Ben,

It's possible there are some bugs in this new check for matching versions and species. It's a relatively new feature for MSigDB v2023.1 and we've already needed to fix it several times.

Note that if you are downloading a gene set GRP / GMT / GMX from our web site and using the Load Data screen to bring it into GSEA, then this check is just a warning that there might be a mismatch. There's no (easy) way for the GSEA Desktop to know that files coming in through Load Data haven't been modified by the user. If you haven't modified them and you know that the "2023.1.Hs" part of the file names matches, then you can ignore this warning.

We should probably revisit this check.  I'll bring this up to the team.

Thanks,
David

Ben G.

Aug 28, 2023, 11:32:26 AM
to gsea-help
Here are a few IDs from the dataset. And yes, I had the parameter set to "Collapse".

ENSG00000280583
ENSG00000280586
ENSG00000280587
ENSG00000280588
ENSG00000280589
ENSG00000280594
ENSG00000280595
ENSG00000280598
ENSG00000280599



Ben

Castanza, Anthony

Aug 28, 2023, 1:44:54 PM
to gsea...@googlegroups.com

So, most of the IDs from your sampling appear to have been removed from Ensembl after version 84 with no successor genes annotated.
Our Ensembl chip file does contain information for mapping old Ensembl IDs to new Ensembl IDs when the retired ID has a successor gene annotated, but it seems like (at least for your sampling) this isn’t possible.

Is this a full expression dataset? How many genes are present in total before the collapse fails?


The gene set you mentioned contains 94 genes, so there should be quite a bit of wiggle room for Collapse to restrict the gene universe before it becomes too small to assess.
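
To get a rough idea of how much of a dataset survives the collapse, the IDs can be checked against the chip file directly. The sketch below is an approximation under stated assumptions, not how GSEA implements Collapse: it assumes the chip file is a tab-delimited table with "Probe Set ID" and "Gene Symbol" columns, the function names load_chip and collapse_coverage are hypothetical, and the chip rows and symbols in the example are placeholders rather than real annotations.

```python
import csv
import io


def load_chip(handle):
    """Read a tab-delimited chip table into {probe_id: gene_symbol}.

    Assumes header columns named "Probe Set ID" and "Gene Symbol".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}


def collapse_coverage(dataset_ids, chip, gene_set):
    """Count mapped symbols, unmapped IDs, and gene set members still present."""
    mapped_symbols = {chip[i] for i in dataset_ids if i in chip}
    unmapped_ids = [i for i in dataset_ids if i not in chip]
    return {
        "mapped_symbols": len(mapped_symbols),
        "unmapped_ids": len(unmapped_ids),
        "gene_set_hits": len(mapped_symbols & set(gene_set)),
    }


if __name__ == "__main__":
    # Placeholder chip content; a real run would open the .chip file instead.
    chip_text = (
        "Probe Set ID\tGene Symbol\tGene Title\n"
        "ENSG99999999901\tGENE_A\tplaceholder\n"
        "ENSG99999999902\tGENE_B\tplaceholder\n"
    )
    chip = load_chip(io.StringIO(chip_text))
    dataset_ids = ["ENSG99999999901", "ENSG99999999902", "ENSG00000280583"]
    print(collapse_coverage(dataset_ids, chip, gene_set=["GENE_A", "GENE_C"]))
    # {'mapped_symbols': 2, 'unmapped_ids': 1, 'gene_set_hits': 1}
```

If gene_set_hits falls below the minimum gene set size configured for the analysis, the set would be filtered out regardless of the version warning.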

 

Would you be willing to share this dataset with us confidentially by sending it to gsea...@broadinstitute.org?

 

-Anthony

 

Anthony S. Castanza, PhD

Department of Medicine

Ben G.

Aug 29, 2023, 8:11:58 AM
to gsea-help
Yes, this was RNA-seq data from TCGA. I am sending the data to the email provided, thank you!

Ben
