Hello all! Can anyone please help me convert mouse gene symbols to HUGO symbols? I have a list of 9171 mouse genes obtained from RNA-Seq, and I want to analyze them with GSEA. Before I can do that, I have to convert their official mouse gene symbols to HUGO notation. The GSEA documentation (http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/RNA-Seq_Data_and_Ensembl_CHIP_files) recommends first converting all genes to their ENSEMBL IDs using BioMart (for example, Trp53 -> ENSMUSG00000059552), and then converting the ENSEMBL IDs to HUGO symbols using the dictionary available on the GSEA webpage (ENSEMBL_mouse_gene.chip, ftp://ftp.broadinstitute.org/pub/gsea/annotations/) (for example, ENSMUSG00000059552 -> TP53).
However, not all genes could be converted this way:
1. Only 8335 genes out of 9171 had a unique HUGO symbol.
2. 373 genes are present in the ENSEMBL dictionary of mouse gene symbols provided by BioMart (http://useast.ensembl.org/biomart/martview/1b29a6e5676193a70708c8674600ceb5), but are absent from the dictionary of HUGO symbols provided by GSEA (ENSEMBL_mouse_gene.chip).
3. 462 gene symbols were absent from the BioMart dictionary, so I could not even find an ENSEMBL ID for them.
4. Finally, one gene (Snora16a) maps to two HUGO symbols (SNORA16A and SNORA16B).
So I need to know how to convert the remaining 373 + 462 + 1 = 836 gene symbols to HUGO notation so that I can analyze them properly with GSEA. That is, for each mouse gene symbol I need the corresponding HUGO symbol used in the GSEA gene sets (if a mouse gene symbol corresponds to several synonymous HUGO symbols, I need all of them).
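To make the bookkeeping above concrete, here is a minimal Python sketch of the two-step lookup (mouse symbol -> ENSEMBL ID -> HUGO symbol) and of how genes fall into the four groups listed. It is only an illustration: the two tables are tiny hand-written stand-ins rather than the real BioMart export or ENSEMBL_mouse_gene.chip, the IDs marked PLACEHOLDER and the symbols GeneX and GeneY are invented, and the function names are hypothetical.

```python
from collections import defaultdict


def build_multimap(pairs):
    """Collect (key, value) pairs into key -> sorted list of distinct values."""
    table = defaultdict(set)
    for key, value in pairs:
        if key and value:  # skip rows with an empty field
            table[key].add(value)
    return {key: sorted(values) for key, values in table.items()}


def classify(mouse_symbols, symbol_to_ensembl, ensembl_to_hugo):
    """Sort mouse symbols into the four groups described in the post."""
    report = {"unique": {}, "ambiguous": {}, "no_hugo": [], "no_ensembl": []}
    for symbol in mouse_symbols:
        ensembl_ids = symbol_to_ensembl.get(symbol, [])
        if not ensembl_ids:
            # Group 3: symbol not found in the BioMart table.
            report["no_ensembl"].append(symbol)
            continue
        hugo = sorted(
            {h for eid in ensembl_ids for h in ensembl_to_hugo.get(eid, [])}
        )
        if not hugo:
            # Group 2: ENSEMBL ID found, but no entry in the HUGO table.
            report["no_hugo"].append(symbol)
        elif len(hugo) == 1:
            # Group 1: exactly one HUGO symbol.
            report["unique"][symbol] = hugo[0]
        else:
            # Group 4: several HUGO symbols; keep all of them.
            report["ambiguous"][symbol] = hugo
    return report


# Toy stand-in for the BioMart export (mouse symbol, ENSEMBL ID).
# Only the Trp53 row is taken from the post; the other IDs are placeholders.
symbol_to_ensembl = build_multimap([
    ("Trp53", "ENSMUSG00000059552"),
    ("Snora16a", "PLACEHOLDER_ID_1"),
    ("GeneX", "PLACEHOLDER_ID_2"),
])

# Toy stand-in for the GSEA table (ENSEMBL ID, HUGO symbol).
ensembl_to_hugo = build_multimap([
    ("ENSMUSG00000059552", "TP53"),
    ("PLACEHOLDER_ID_1", "SNORA16A"),
    ("PLACEHOLDER_ID_1", "SNORA16B"),
])

genes = ["Trp53", "Snora16a", "GeneX", "GeneY"]
report = classify(genes, symbol_to_ensembl, ensembl_to_hugo)
for group, members in report.items():
    print(group, len(members), members)
```

With the real tables loaded in place of the toy lists, the "no_hugo" and "no_ensembl" groups would be the 373 and 462 symbols I still need to resolve.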
I would appreciate any help from a GSEA administrator or any other experienced person.
Thanks
Alex Surnov
Saint Louis University