Comparing Mouse data with human data genesets - getting error "After Pruning, none of the genesets passed size threshold"


Nupur

Aug 2, 2022, 11:18:49 AM
to gsea-help

Hi, 

I am a beginner in GSEA. 
I am comparing my mouse expression rank file with a human geneset that I prepared (.gmx file). 

I am constantly getting error 1001: "After Pruning, none of the genesets passed size threshold".

My question is: can I compare human and mouse data in GSEA as-is, or do I need to change the gene nomenclature first?

I have tried a few things, 
- my expression data rank file works well with any MsigDB geneset.
- I made a dummy geneset from my expression dataset rank file and that seems to work well.
- all formatting for creating .gmx file is correct and I have followed the instructions on GSEA wiki page.

Could it be that the mouse expression rank file and the human geneset .gmx file use different gene nomenclature?

I tried to manually search for a few genes from my human geneset on the mouse rank file and couldn't find any.

I would really appreciate some help with this!

Thanks,
Nupur

Anthony Castanza

Aug 2, 2022, 4:37:04 PM
to gsea...@googlegroups.com
Hi Nupur,

Yes, the nomenclature is different. The MSigDB gene sets are (currently) made available exclusively in human gene symbols. If your dataset is mouse and you're using the CHIP files we provide with the Collapse Dataset function, then GSEA's internal tools use these CHIP files to convert the dataset to orthologous human symbols so that the gene sets we provide can be used. If you're creating your own gene sets and they are not in human gene symbols, then you would not want to use the CHIPs, or the Collapse Dataset functionality, that we provide.

Mouse and human gene symbols have different formats (e.g. PTEN for human and Pten for mouse). However, it is not generally advised to "convert" symbols between the two spaces strictly by adjusting the letter case. Proper orthology conversion using gene conservation data is always recommended, as closely related genes might not have a strict 1:1 correspondence.
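To illustrate the difference between case-adjusting and orthology conversion, here is a minimal Python sketch. The lookup table `MOUSE_TO_HUMAN`, the function `to_human_symbols`, the placeholder symbol `Gm12345`, and the scores are all hypothetical and made up for the example; a real table would come from an orthology resource such as the CHIP files mentioned above. Keeping the score with the largest absolute value when several mouse genes map to one human symbol is an arbitrary choice for this sketch, not a description of what GSEA does.

```python
# Illustrative ortholog table (mouse symbol -> human symbol).
# In practice this would be loaded from an orthology resource, not typed by hand.
MOUSE_TO_HUMAN = {
    "Pten": "PTEN",
    "Trp53": "TP53",  # uppercasing gives "TRP53", which is not the human symbol
}


def to_human_symbols(ranked, mapping):
    """Convert (mouse_symbol, score) pairs to a {human_symbol: score} dict.

    Genes with no entry in the mapping are collected in `unmapped` instead of
    being guessed. If several mouse genes map to the same human symbol, the
    score with the largest absolute value is kept (an assumption of this sketch).
    """
    converted = {}
    unmapped = []
    for symbol, score in ranked:
        human = mapping.get(symbol)
        if human is None:
            unmapped.append(symbol)
            continue
        if human not in converted or abs(score) > abs(converted[human]):
            converted[human] = score
    return converted, unmapped


# Hypothetical ranked list; "Gm12345" is a placeholder with no mapped ortholog.
ranked_mouse = [("Pten", 2.1), ("Trp53", -1.4), ("Gm12345", 0.3)]

converted, unmapped = to_human_symbols(ranked_mouse, MOUSE_TO_HUMAN)

# Naive case conversion, shown only for comparison.
naive = {symbol.upper() for symbol, _ in ranked_mouse}

print(converted)
print(unmapped)
print("TP53" in naive)
```

In this example the naive uppercase set contains TRP53 rather than TP53, so a human gene set listing TP53 would silently miss that gene.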

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Rolando Milian

Aug 3, 2022, 8:01:47 AM
to gsea...@googlegroups.com

Hi Anthony and Nupur,

You can find gene sets for mouse and rat from the Bader Lab here: http://download.baderlab.org/EM_Genesets/current_release/

Best,

Rolando

Rolando Garcia-Milian, MLS, AHIP

Research and Education Support for Bioinformatics | Lecturer in Epidemiology

Cushing/Whitney Medical Library | Environmental Health Sciences Department | Yale/NIDA Neuroproteomics Center

Yale School of Public Health | Yale School of Medicine

Office 203.785.6194

Email rolando...@yale.edu

Anthony Castanza

Aug 3, 2022, 11:37:53 AM
to gsea-help
Hello Rolando,

Yes, I am aware of this resource. It is not officially provided by the GSEA-MSigDB team and we can't vouch for the methodology it uses to convert the gene sets provided into the mouse symbol space.


-Anthony


Nupur

Aug 8, 2022, 2:22:50 PM
to gsea-help
Hi Anthony, 

Thanks for your response. 

I have prepared my own gene set file for GSEA. It is made from human genes and is small, containing just 2 gene sets.

My dataset is generated from Mouse samples RNA-Seq. 

As far as the formatting goes, both the human geneset and the mouse dataset have their genes written in UPPERCASE letters.

However, every time I perform a GSEA analysis, I keep getting Error 1001.

Any suggestions as to what could be going wrong in my GSEA comparison?

- Nupur

Anthony Castanza

Aug 8, 2022, 3:01:40 PM
to gsea...@googlegroups.com
Hi Nupur,

How many genes are in each of your custom gene sets? GSEA has built-in minimum (15) and maximum (500) size thresholds for gene sets, and sets that violate those default thresholds will be discarded (unless the thresholds are adjusted).
That said, it is not correct to uppercase mouse gene symbols to get them to match human symbols. The mouse symbols should be left in their canonical format as approved by MGI (generally first letter uppercase, rest lowercase), and the MSigDB-provided CHIP files, or some other method of orthology conversion, should be used to convert them to match the human symbol space.
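As a rough illustration of how the size thresholds interact with the dataset, here is a minimal Python sketch. It assumes, as the "After Pruning" wording of the error suggests, that each gene set is first restricted to genes present in the dataset and that the 15/500 defaults are then applied to what remains. The function `prune_gene_sets`, the set name `MY_SET`, and the `GENE…` symbols are hypothetical and exist only for this example; this is not GSEA's actual code.

```python
MIN_SIZE = 15   # default minimum gene set size mentioned above
MAX_SIZE = 500  # default maximum gene set size mentioned above


def prune_gene_sets(gene_sets, dataset_genes, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Restrict each gene set to genes found in the dataset, then size-filter.

    Returns (kept, dropped): `kept` maps set name -> surviving genes,
    `dropped` maps set name -> size after pruning (for diagnosis).
    """
    dataset = set(dataset_genes)
    kept, dropped = {}, {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & dataset
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
        else:
            dropped[name] = len(overlap)
    return kept, dropped


# Hypothetical data: one 100-gene set, a 500-gene dataset, and a full-size dataset.
gene_sets = {"MY_SET": [f"GENE{i}" for i in range(100)]}
small_dataset = [f"GENE{i}" for i in range(90, 590)]  # shares only 10 genes with MY_SET
full_dataset = [f"GENE{i}" for i in range(20000)]     # contains all 100 genes

kept_small, dropped_small = prune_gene_sets(gene_sets, small_dataset)
kept_full, dropped_full = prune_gene_sets(gene_sets, full_dataset)

print(dropped_small)            # sizes of sets that failed the thresholds
print(len(kept_full["MY_SET"])) # genes surviving against the full dataset
```

If every set ends up in `dropped`, nothing is left to test, which is the situation the error describes. Symbols that do not match at all (for example mouse versus human nomenclature) shrink the overlap in the same way.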

-Anthony


Nupur

Aug 8, 2022, 4:26:23 PM
to gsea-help
Hi Anthony, 

I have about 100 genes in each of the gene sets.

I also noticed that the geneset works fine with a dummy dataset that has more genes. My 'test' dataset has fewer genes in it (~500). Do you think the comparison is too restricted because of the very small number of genes in both the geneset and the dataset?

Thanks,
Nupur

Anthony Castanza

Aug 8, 2022, 5:02:55 PM
to gsea-help
You cannot run GSEA with such restricted datasets.
GSEA needs information for all expressed genes. We don't support running it in any other way, and trying to do so will cause these sorts of errors.


-Anthony
