Re: Assistance with obtaining gene names for archived ENS IDs at the UCSC Genome Browser


Mohammad Goodarzi

Jan 23, 2018, 3:46:10 PM
to Cath Tyner, gen...@soe.ucsc.edu
Hello,

I have a set of Ensembl gene IDs that I am trying to map to gene names. I have tried mapping them against all the databases; some IDs are missing and some have changed.
Can you please help me overcome this problem?
Please find the data in the attachment.

Thanks
Mohammad 

On Tue, Jan 23, 2018 at 11:13 AM, Cath Tyner <ca...@ucsc.edu> wrote:
Hello Mohammad,

I saw your request on the Ensembl dev forum - I work in User Support for the UCSC Genome Browser. You may want to see if our support team can provide a solution for you.

If you are interested in this, please email the UCSC Genome Browser support team public help forum with a detailed request:


Please also include about 10 of your ENS IDs so that we can use them in example output for you.

> On 22 Jan 2018, at 13:55, Mohammad Goodarzi <mohammad...@gmail.com> wrote:
>
> Hello,
>
> Thank you for your reply.
> Is it possible to guide me how to use one of your archive with Biomart or any other programming language ?
> When it comes to 3000 genes , it is very difficult to do them one by one .
>
> Thanks
> Mohammad

Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute


Genenames.xlsx

Cath Tyner

Jan 23, 2018, 6:36:33 PM
to Mohammad Goodarzi, UCSC Genome Browser Public Help Forum
Hello Mohammad,

Thanks for contacting the UCSC Genome Browser support team. 

I took a quick look at your file with 60,488 rows, and it seems that most ENS IDs have at least some columns with gene names. If the suggestions below aren't sufficient to accomplish your goal, please reply to this forum with about 10 or so IDs that you are trying to match gene names to. In other words, please send a sample of only the ENS IDs that you are having problems with, and let us know exactly the goal you wish to accomplish, with examples. This way, we can use your example ENS IDs to help you.

Below are some related previously answered mailing list questions that may help, and please also feel free to search our forum archive for related posts: https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome

Getting the ENS version number from Ensembl's Biomart tool:

Mapping UCSC IDs to ENS IDs

Mapping old-to-new UCSC IDs

Getting old gene IDs from hg19

Using MySQL to query for ENS IDs:

Using the Table Browser tool to get gene symbols

As you move forward, please feel free to respond to this forum at any time if our support team can provide further assistance, and please always feel free to search our mailing list archives for related posts.

Thank you for contacting the UCSC Genome Browser support team. 
Please send new and follow-up questions to one of our mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute


Mohammad Goodarzi

Jan 23, 2018, 7:30:04 PM
to Cath Tyner, UCSC Genome Browser Public Help Forum
Hello, 

Can you please annotate these genes?


ensembl_gene_id
ENSGR0000002586
ENSGR0000124333
ENSGR0000124334
ENSGR0000167393
ENSGR0000168939
ENSGR0000169084
ENSGR0000169093
ENSGR0000169100
ENSGR0000178605
ENSGR0000182162
ENSGR0000182378
ENSGR0000182484
ENSGR0000185203
ENSGR0000185291
ENSGR0000185960
ENSGR0000196433
ENSGR0000197976
ENSGR0000198223
ENSGR0000205755
ENSGR0000214717
ENSGR0000223274
ENSGR0000223484
ENSGR0000223511
ENSGR0000223571
ENSGR0000223773
ENSGR0000225661
ENSGR0000226179
ENSGR0000227159
ENSGR0000228410
ENSGR0000228572
ENSGR0000229232
ENSGR0000230542
ENSGR0000234622
ENSGR0000234958
ENSGR0000236017
ENSGR0000236871
ENSGR0000237040
ENSGR0000237531
ENSGR0000237801
ENSGR0000263835
ENSGR0000263980
ENSGR0000264510
ENSGR0000264819
ENSGR0000265658
ENSGR0000270726
ENSGR0000275287
ENSGR0000276543
ENSGR0000277120
ENSGR0000280767
ENSGR0000281849


Thanks
Mohammad

Cath Tyner

Jan 23, 2018, 9:31:11 PM
to Mohammad Goodarzi, UCSC Genome Browser Public Help Forum
Hello again Mohammad,

Thank you for the refined list of example IDs! I'm assuming these are in regard to human genes? If they are for a different organism, please respond to this forum and let us know.

I've found the following GENCODE resource which I believe may apply to your example cases:


Why do some gene and transcript ids start with ENSGR or ENSTR in the GTF/GFF3?

Since the GTF convention dictates that feature ids have to be unique for different genome regions, we slightly modify the Ensembl feature id by replacing the first zero with an "R". Thus, "ENSG00000182378.10" in chromosome X becomes "ENSGR0000182378.10" in chromosome Y.

If we convert your list to replace the "R" with a "0", e.g.,

sed 's/R/0/g' yourENSGRexamples.txt

ENSG00000002586
ENSG00000124333
ENSG00000124334

...and then use the converted list, I can find matches for these ENSG IDs in both the UCSC Genome Browser and Ensembl.
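For anyone who prefers a script to sed, here is a minimal Python sketch of the same conversion. It assumes, per the GENCODE FAQ quoted above, that only the "R" directly after the ENSG/ENST prefix needs to be restored to "0"; the function name `par_id_to_ensembl` is hypothetical and not part of any UCSC or Ensembl tool.

```python
import re

# Matches GENCODE-style IDs such as ENSGR0000182378 or ENSTR...,
# with an optional version suffix such as ".10".
_PAR_ID = re.compile(r"^(ENS[GT])R(\d+)((?:\.\d+)?)$")


def par_id_to_ensembl(identifier: str) -> str:
    """Restore the leading zero that GENCODE replaced with 'R'.

    Anything that does not look like an ENSGR/ENSTR ID (for example a
    column header) is returned unchanged, apart from whitespace trimming.
    """
    cleaned = identifier.strip()
    match = _PAR_ID.match(cleaned)
    if match is None:
        return cleaned
    prefix, digits, version = match.groups()
    return f"{prefix}0{digits}{version}"


if __name__ == "__main__":
    # Sample values taken from this thread, plus the header line.
    for raw in ["ensembl_gene_id", "ENSGR0000002586", "ENSGR0000182378.10"]:
        print(par_id_to_ensembl(raw))
```

Unlike a global replace of every "R", the anchored pattern only touches the position right after the prefix, so headers and any other text in the file are left alone.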

There are ways to obtain gene names and other annotations for these ENSG IDs in the UCSC Genome Browser, but if you just want gene names, it's probably most efficient to use Ensembl's BioMart tool.
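If you would rather script the BioMart lookup than paste IDs into the web form, a rough Python sketch of building the query follows. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the attribute names `ensembl_gene_id` and `external_gene_name` are assumptions based on the current public BioMart; an archive site will use a different host name, so please check these against the release you need. The function names are hypothetical. The sketch only builds the request and does not contact the server.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed public BioMart endpoint; Ensembl archive sites use another host.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(gene_ids):
    """Return BioMart query XML asking for the gene name of each ID."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(
        query, "Dataset",
        {"name": "hsapiens_gene_ensembl", "interface": "default"},
    )
    # BioMart accepts a comma-separated list as the filter value.
    ET.SubElement(
        dataset, "Filter",
        {"name": "ensembl_gene_id", "value": ",".join(gene_ids)},
    )
    ET.SubElement(dataset, "Attribute", {"name": "ensembl_gene_id"})
    ET.SubElement(dataset, "Attribute", {"name": "external_gene_name"})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_biomart_url(gene_ids):
    """Return a GET URL carrying the query; suited to short ID lists only."""
    params = urllib.parse.urlencode({"query": build_biomart_query(gene_ids)})
    return MART_URL + "?" + params


if __name__ == "__main__":
    print(build_biomart_url(["ENSG00000002586", "ENSG00000124333"]))
```

For a list of several thousand IDs the URL would get very long, so sending the same XML as a POST form field named `query`, in batches, is probably the safer route.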

Doing so, I got 45 (out of your 50) results:

Gene stable ID Gene name
ENSG00000002586 CD99
ENSG00000124333 VAMP7
ENSG00000124334 IL9R
ENSG00000167393 PPP2R3B
ENSG00000168939 SPRY3
ENSG00000169084 DHRSX
ENSG00000169093 ASMTL
ENSG00000169100 SLC25A6
ENSG00000178605 GTPBP6
ENSG00000182162 P2RY8
ENSG00000182378 PLCXD1
ENSG00000182484 WASH6P
ENSG00000185203 WASIR1
ENSG00000185291 IL3RA
ENSG00000185960 SHOX
ENSG00000196433 ASMT
ENSG00000197976 AKAP17A
ENSG00000198223 CSF2RA
ENSG00000205755 CRLF2
ENSG00000214717 ZBED1
ENSG00000223274 RNA5SP498
ENSG00000223484 TRPC6P
ENSG00000223511 AL683807.1
ENSG00000223571 DHRSX-IT1
ENSG00000223773 CD99P1
ENSG00000225661 RPL14P5
ENSG00000226179 LINC00685
ENSG00000227159 DDX11L16
ENSG00000228410 ELOCP24
ENSG00000228572 AL954722.1
ENSG00000229232 KRT18P53
ENSG00000230542 LINC00102
ENSG00000234622 AL683807.2
ENSG00000234958 FABP5P13
ENSG00000236017 ASMTL-AS1
ENSG00000236871 LINC00106
ENSG00000237040 DPH3P2
ENSG00000237531 AL672277.1
ENSG00000237801 AMD1P2
ENSG00000265658 MIR3690
ENSG00000270726 AJ271736.1
ENSG00000275287 Metazoa_SRP
ENSG00000277120 MIR6089
ENSG00000280767 AL732314.4
ENSG00000281849 AL732314.6
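Since 45 of the 50 IDs came back, five returned no gene name. A small Python sketch for listing which IDs are missing from a result table like the one above is shown here; the function names are hypothetical, and the parser assumes one "ID name" pair per line separated by whitespace, with any line not starting with "ENS" treated as a header.

```python
def parse_id_name_table(text):
    """Parse 'ID name' lines into a dict, skipping header or blank lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("ENS"):
            mapping[parts[0]] = parts[1].strip()
    return mapping


def find_unmatched(requested_ids, id_to_name):
    """Return the requested IDs that have no gene name, in input order."""
    return [gene_id for gene_id in requested_ids if gene_id not in id_to_name]


if __name__ == "__main__":
    # Three converted IDs from the request and two rows from the results.
    requested = ["ENSG00000002586", "ENSG00000263835", "ENSG00000265658"]
    results = (
        "Gene stable ID Gene name\n"
        "ENSG00000002586 CD99\n"
        "ENSG00000265658 MIR3690\n"
    )
    print(find_unmatched(requested, parse_id_name_table(results)))
```

The IDs this reports are the ones worth sending to Ensembl for a check against their archives.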

At this point, you may want to double-check with Ensembl to make sure we're on the right track here, but if your list is derived from a GTF/GFF file, then this may be the direction to move forward with. If this is the case, then it looks like you may already have at least some of these ENSG IDs in your Genenames.xlsx file.

As you move forward, please feel free to respond to this forum at any time if our support team can provide further assistance, and please always feel free to search our mailing list archives for related posts.


Join us on social media: Facebook, Twitter, WordPress Blog, YouTube
UCSC Genome Browser Announcements List (for new data & software)
Request on-site training & workshops at your institution

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute

