Amino acid to Ensembl transcript coordinates

Kiana Mohajeri

Mar 25, 2022, 9:18:34 PM
to gen...@soe.ucsc.edu
Hi there,

We have a list of amino acid positions for a set of genes and wanted to return the Ensembl transcript coordinates associated with these amino acids - we have ~100 amino acid sites we wanted to do this query for. Is this something that could be done through the table browser?

Thanks so much for your help,

Kiana

Luis Nassar

Mar 25, 2022, 9:44:39 PM
to Kiana Mohajeri, gen...@soe.ucsc.edu

Hello Kiana,

Thank you for your interest in the Genome Browser.

There are two ways to go about this. I will briefly cover both, as they have different strengths.

Since you have a relatively small number of regions, you can use the Table Browser "define regions" button to paste in the ~100 sites. For example, I'll paste in these:

chrX   151073054   151173000
chrX   151183000   151190000 
chrX   151283000   151290000
chrX   151383000   151390000
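
If the sites are kept in a script or spreadsheet export, the lines to paste can be generated rather than typed by hand. The following is only a minimal sketch: the `format_regions` helper and the tuple layout are hypothetical names chosen for illustration, not part of any UCSC tool.

```python
def format_regions(regions):
    """Return one 'chrom<TAB>start<TAB>end' line per region, ready to paste."""
    lines = []
    for chrom, start, end in regions:
        # Reject obviously malformed regions before pasting them anywhere.
        if start >= end:
            raise ValueError(f"start must be smaller than end: {chrom} {start} {end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines)


# Example regions taken from the list above.
regions = [
    ("chrX", 151073054, 151173000),
    ("chrX", 151183000, 151190000),
]
pasted = format_regions(regions)
print(pasted)
```

The printed text can then be copied into the define regions box.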

You will then want to select the track with the Ensembl transcript IDs. For hg38, that can be:

Genes and gene predictions -> GENCODE V29 -> knownGene

And for hg19, that can be:

Genes and gene predictions -> GENCODE V39lift37 -> Basic

Then you can click "get output" for a list of the transcripts that are in those regions. If you would like a shorter output, you can change the output format to "selected fields from primary and related tables". Here is an example where I chose selected fields and selected only chrom, chromStart, chromEnd, and name, using the regions I shared above:

#chrom    chromStart    chromEnd    name
chrX    151168221    151168326    ENST00000579077.1
chrX    151303383    151397142    ENST00000668689.1
chrX    151303430    151397018    ENST00000664935.1
chrX    151304625    151396195    ENST00000664896.1
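
Output in this shape is straightforward to post-process once saved or copied. The sketch below parses it into records; `parse_selected_fields` is a hypothetical helper, and it assumes the column names shown above (chromStart, chromEnd), which may differ for other tracks or assemblies.

```python
def parse_selected_fields(text):
    """Parse '#'-headed, whitespace-separated output into a list of dicts."""
    header = None
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The header line names the columns, e.g. "#chrom chromStart ...".
            header = line.lstrip("#").split()
            continue
        if header is None:
            raise ValueError("expected a '#'-prefixed header line before data")
        record = dict(zip(header, line.split()))
        # Assumption: coordinates are in columns named chromStart and chromEnd.
        record["chromStart"] = int(record["chromStart"])
        record["chromEnd"] = int(record["chromEnd"])
        records.append(record)
    return records


output = """\
#chrom    chromStart    chromEnd    name
chrX    151168221    151168326    ENST00000579077.1
chrX    151303383    151397142    ENST00000668689.1
"""

records = parse_selected_fields(output)
for rec in records:
    print(rec["name"], rec["chrom"], rec["chromStart"], rec["chromEnd"])
```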

If you followed the steps for hg38, the order of the columns may be a bit different, because the data is stored in a different format.

The second way to accomplish this is the Table Browser intersection feature, which would require you to first create a custom track (http://genome.ucsc.edu/cgi-bin/hgCustom) of your regions. The advantage here is that the custom track can be any size, or it can be dynamically created as the output of the Table Browser. Lastly, if you create a custom track with additional columns you would like to keep, say your data looks like this:

chrX   151073054   151173000   specialValue1   extraAnnotation1
chrX   151183000   151190000   specialValue2   extraAnnotation2
chrX   151283000   151290000   specialValue3   extraAnnotation3
chrX   151383000   151390000   specialValue4   extraAnnotation4

and you would like to match it with the intersecting transcripts while retaining the custom annotations, you could use the Data Integrator (http://genome.ucsc.edu/cgi-bin/hgIntegrator).
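
To make the idea of matching regions to intersecting transcripts while keeping the extra columns concrete, here is a small local sketch in Python. It does not call the Data Integrator and is not how that tool is implemented; the `overlaps` and `annotate` helpers are hypothetical, and the overlap test assumes half-open, BED-style intervals.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate(regions, transcripts):
    """Pair each region with every overlapping transcript, keeping extra columns."""
    rows = []
    for chrom, start, end, *extra in regions:
        for t_chrom, t_start, t_end, name in transcripts:
            if chrom == t_chrom and overlaps(start, end, t_start, t_end):
                rows.append((chrom, start, end, *extra, name))
    return rows


# Regions with custom columns, copied from the example above.
regions = [
    ("chrX", 151073054, 151173000, "specialValue1", "extraAnnotation1"),
    ("chrX", 151183000, 151190000, "specialValue2", "extraAnnotation2"),
    ("chrX", 151283000, 151290000, "specialValue3", "extraAnnotation3"),
    ("chrX", 151383000, 151390000, "specialValue4", "extraAnnotation4"),
]

# Transcripts copied from the Table Browser output shown earlier.
transcripts = [
    ("chrX", 151168221, 151168326, "ENST00000579077.1"),
    ("chrX", 151303383, 151397142, "ENST00000668689.1"),
    ("chrX", 151303430, 151397018, "ENST00000664935.1"),
    ("chrX", 151304625, 151396195, "ENST00000664896.1"),
]

rows = annotate(regions, transcripts)
for row in rows:
    print("\t".join(str(field) for field in row))
```

The nested loop is fine for ~100 regions; for much larger inputs a sorted sweep or an interval tree would be a better fit.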

You can take a look at the following archived questions for additional information:

https://groups.google.com/a/soe.ucsc.edu/g/genome/c/XHp6o2AZLSE/m/BbAiHAV1CwAJ
https://groups.google.com/a/soe.ucsc.edu/g/genome/c/wac5It1Qlh4/m/P2ONUrEeBAAJ
https://groups.google.com/a/soe.ucsc.edu/g/genome/c/n7kT8Ctg2a8/m/RTaK-8OSBQAJ

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

