mouse gene symbols to human homologs


joyce

Dec 16, 2020, 3:24:50 PM
to gsea-help
Hi GSEA Team,

I work with mouse data and have been using the Mouse_Gene_Symbol_Remapping_Human_Orthologs_MSigDB.v7.2.chip file for GSEA. I understand that sometimes multiple mouse gene symbols map to the same human ortholog. Could you please tell me what the GSEA algorithm does in this case?

I am also wondering about the cases where a given mouse gene symbol corresponds to several human homologs. How is the choice of the human ortholog made in the chip file in such cases?

Thank you very much,
Joyce

Anthony Castanza

Dec 16, 2020, 3:52:43 PM
to gsea...@googlegroups.com

Hi Joyce,

 

This is a somewhat complicated, and not necessarily great, process. At the root level, we use the data from https://www.alliancegenome.org/ to build our orthology chips. When multiple orthologs exist for a gene, we keep the single ortholog with the best reciprocal match according to this data. This substantially cuts down on the number of multiple mappings; however, there are some genes for which it is impossible to pick a best match this way. For these, we take a relatively unsophisticated approach: we map them to all of their possible orthologs and then assign a single match based on the age of each ortholog’s annotation in NCBI, where the longer a gene has been annotated by NCBI, the higher its priority for being assigned as the ortholog of the mouse gene. I’m hoping to incorporate some additional data before this step in the next release, which should cut down the number of genes that reach this assignment step. The reverse direction, where several mouse symbols map to the same human ortholog, is handled by the collapse settings specified by the user in the GSEA software.
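As a rough illustration of the two-step selection described above, here is a minimal Python sketch. It is not the actual MSigDB build code. The `Candidate` fields (`best_reciprocal`, `first_annotated`), the gene symbols, and the dates are all hypothetical placeholders, and the alphabetical tie-break is an assumption added only to keep the result deterministic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One possible human ortholog for a mouse gene (hypothetical record)."""
    human_symbol: str
    best_reciprocal: bool   # assumed flag: marked as best reciprocal match in the orthology data
    first_annotated: date   # assumed stand-in for "how long NCBI has annotated this gene"


def pick_ortholog(candidates: Sequence[Candidate]) -> Optional[str]:
    """Return a single human symbol for a mouse gene, or None if there are no candidates."""
    if not candidates:
        return None
    # Step 1: keep the single best reciprocal match when exactly one exists.
    reciprocal = [c for c in candidates if c.best_reciprocal]
    if len(reciprocal) == 1:
        return reciprocal[0].human_symbol
    # Step 2: otherwise consider every candidate and prefer the oldest annotation.
    # The symbol is only a tie-breaker (an assumption, not from the original post).
    oldest = min(candidates, key=lambda c: (c.first_annotated, c.human_symbol))
    return oldest.human_symbol


# Made-up examples: one clear reciprocal match, one ambiguous case.
clear_case = [
    Candidate("HUMAN_A", True, date(2005, 1, 1)),
    Candidate("HUMAN_B", False, date(1999, 1, 1)),
]
ambiguous_case = [
    Candidate("HUMAN_C", False, date(2010, 6, 1)),
    Candidate("HUMAN_D", False, date(2001, 3, 15)),
]
clear_choice = pick_ortholog(clear_case)
ambiguous_choice = pick_ortholog(ambiguous_case)
```

In the first case the reciprocal match wins even though the other candidate has the older annotation. In the second case no best match exists, so the candidate with the longer annotation history is chosen.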

 

This process is generally unfortunate, but it prevents the computational explosion caused by assigning all:all matches. It’s important to note that this is done consistently both when building the CHIP files and when building mouse-derived gene sets in MSigDB, so if a set contains one of these unfortunate genes, it will be mapped the same way in both.
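To make the consistency point concrete, here is a small Python sketch that applies one mouse-to-human mapping to both an expression table and a mouse-derived gene set. It also shows the case where several mouse symbols map to one human symbol, resolved here by taking the maximum value. This is only an illustrative stand-in for a user-selected collapse setting, not GSEA's implementation, and every symbol and value below is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def collapse_max(expression: Dict[str, float], chip: Dict[str, str]) -> Dict[str, float]:
    """Map mouse symbols to human symbols, keeping the maximum value per human symbol."""
    grouped = defaultdict(list)
    for mouse_symbol, value in expression.items():
        human_symbol = chip.get(mouse_symbol)
        if human_symbol is not None:  # symbols missing from the mapping are dropped
            grouped[human_symbol].append(value)
    return {human: max(values) for human, values in grouped.items()}


def remap_gene_set(mouse_genes: Iterable[str], chip: Dict[str, str]) -> Set[str]:
    """Convert a mouse-derived gene set with the same mapping used for the data."""
    return {chip[gene] for gene in mouse_genes if gene in chip}


# Hypothetical mapping: two mouse symbols share one human ortholog.
chip = {"MouseA1": "HUMAN_A", "MouseA2": "HUMAN_A", "MouseB": "HUMAN_B"}
expression = {"MouseA1": 2.0, "MouseA2": 5.0, "MouseB": 1.0, "MouseC": 3.0}

collapsed = collapse_max(expression, chip)
human_set = remap_gene_set({"MouseA2", "MouseB"}, chip)
```

Because both the data and the gene set pass through the same mapping, every member of the converted gene set refers to a symbol that can appear in the collapsed data.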

We’re continually working on improvements to this process, and, in addition, are working on resources specifically for analyzing mouse data without orthology conversion that we hope to release relatively soon.

 

Happy to address any additional questions you might have,

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/


I-Hsiu Lee

Dec 16, 2020, 4:49:46 PM
to gsea...@googlegroups.com
Thank you, Anthony, for your reply. It's great to hear that you are working on resources specifically for analyzing mouse data without orthology conversion. Your team's hard work is greatly appreciated!
