Determining Exon Rank in Transcript Using Table Browser


Dhir, Apoorv

Jul 8, 2014, 5:24:26 PM
to gen...@soe.ucsc.edu
Hi,

I am a researcher at the University of Pittsburgh and have been using the UCSC Table Browser to generate a list of exon boundaries for the whole exome. Is there any way to rank these exons within their respective transcripts? I have found this function useful in Ensembl and would like to attach similar rankings to data from UCSC. Alternatively, is there a way to assign an Ensembl Exon ID to exons generated by UCSC? I have been able to assign Ensembl Transcript IDs, but cannot figure out how to assign Exon IDs. Thank you very much for your time and help. I look forward to hearing from you.

Best,

Apoorv Dhir
Research Fellow
Cancer Genomics Facility
dhi...@upmc.edu

Matthew Speir

Jul 14, 2014, 11:59:38 AM
to Dhir, Apoorv, gen...@soe.ucsc.edu
Hi Apoorv,

Thank you for your question about the UCSC Table Browser. Could you
provide me with some sample output from Ensembl? This would help me
suggest settings that you could use to get similar output from the UCSC
Table Browser.

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Matthew Speir

Jul 15, 2014, 3:03:37 PM
to Dhir, Apoorv, gen...@soe.ucsc.edu
Hi Apoorv,

Thank you for the sample output. Unfortunately, it's not possible to get
the exact output you're requesting, as we don't have the Ensembl exon
identifiers stored in any of our databases. You can, however, get
similar output that includes Ensembl transcript identifiers plus the
exon number within that transcript from the UCSC Table Browser. For
example, if you were to use the steps I describe below to get output for
the region chr21:33031597-33041570 (which includes the SOD1 gene), the
output would look like this:

chr21 33025905 33027740 ENST00000449339_exon_0_0_chr21_33025906_r 0 -
chr21 33030246 33030540 ENST00000449339_exon_1_0_chr21_33030247_r 0 -
chr21 33031709 33031813 ENST00000449339_exon_2_0_chr21_33031710_r 0 -
chr21 33031934 33032154 ENST00000270142_exon_0_0_chr21_33031935_f 0 +
chr21 33036102 33036199 ENST00000270142_exon_1_0_chr21_33036103_f 0 +
...

The exons are numbered starting at 'exon_0'. You can use the following
settings to get this style of output from the Table Browser:

1. Navigate to the Table Browser,
http://genome.ucsc.edu/cgi-bin/hgTables.

2. Select the following options:
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: Ensembl Genes
table: ensGene
region: Select "genome" for the entire genome, define a single
position in the 'position' box, or paste in a list of specific regions
by clicking the 'define regions' button.
output format: BED - browser extensible data
output file: enter a file name to save your results to a file, or
leave blank to display results in the browser.

3. Click 'get output'.

4. Under the 'Create one BED record per' section, select 'Exons
plus [0] bases at each end'.

5. Click 'get BED'.
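
Once the BED file is saved, the transcript ID and exon number can be pulled out of the name column with a short script. The following is a minimal Python sketch, not an official UCSC tool. The name pattern is inferred only from the sample names shown above (for example ENST00000449339_exon_0_0_chr21_33025906_r), and the names ExonRecord and parse_bed_line are made up for illustration. The second number in the name is captured but not interpreted, since its meaning is not stated in this thread.

```python
import re
from typing import NamedTuple

# Pattern inferred from the sample names in this thread (an assumption),
# e.g. ENST00000449339_exon_0_0_chr21_33025906_r
NAME_RE = re.compile(
    r"^(?P<transcript>.+)_exon_(?P<index>\d+)_(?P<extra>\d+)"
    r"_(?P<chrom>.+)_(?P<start1>\d+)_(?P<dir>[fr])$"
)


class ExonRecord(NamedTuple):
    """One exon line from the Table Browser BED output (hypothetical helper type)."""
    chrom: str
    start: int        # 0-based BED start
    end: int          # BED end (exclusive)
    transcript: str   # e.g. ENST00000449339
    exon_index: int   # 0-based number taken from the name field
    strand: str       # "+" or "-"


def parse_bed_line(line: str) -> ExonRecord:
    """Split a 6-column BED line and decode the exon name field."""
    chrom, start, end, name, _score, strand = line.split()[:6]
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected name field: {name!r}")
    return ExonRecord(
        chrom=chrom,
        start=int(start),
        end=int(end),
        transcript=match.group("transcript"),
        exon_index=int(match.group("index")),
        strand=strand,
    )


if __name__ == "__main__":
    sample = [
        "chr21 33025905 33027740 ENST00000449339_exon_0_0_chr21_33025906_r 0 -",
        "chr21 33031934 33032154 ENST00000270142_exon_0_0_chr21_33031935_f 0 +",
    ]
    for text in sample:
        record = parse_bed_line(text)
        # Print transcript, 0-based exon number, and strand as tab-separated text.
        print(record.transcript, record.exon_index, record.strand, sep="\t")
```

The script splits on any whitespace, so it should accept both tab-separated files and the space-separated sample shown in this message. Track or browser header lines, if present in the file, would need to be skipped before parsing.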

Matthew Speir
UCSC Genome Bioinformatics Group


On 7/14/14, 12:00 PM, Dhir, Apoorv wrote:
> Hi Matthew,
>
> Thanks for your response! I have attached a file containing the sample Ensembl output you asked for. I am interested in associating the information in the column named "Exon Rank in Transcript" with exon lists created using the UCSC Table Browser. My guess was that linking the two datasets using the Ensembl Exon ID might be the easiest. Any assistance you can provide is greatly appreciated!
>
> Best,
> Apoorv

Dhir, Apoorv

Jul 17, 2014, 10:46:07 AM
to Matthew Speir, gen...@soe.ucsc.edu
Hi Matthew,

Thanks again for all your help! I am able to generate this output from UCSC, but I am still a little confused. What is the biological meaning of an exon numbered 0? Is this essentially referring to the first exon in a transcript? Also, what is the meaning of having two exons with the same number in one gene? Is this the result of alternative splicing? Sorry for all the questions, just want to make sure I am properly interpreting the information. I appreciate your help.

Matthew Speir

Jul 21, 2014, 7:38:59 PM
to Dhir, Apoorv, gen...@soe.ucsc.edu
Hi Apoorv,

Yes, we use exon 0 to refer to the first exon within a transcript. As
for your second question, it is possible that a gene may have multiple
transcripts associated with it. For example, you can see in the
following session that there are 4 transcripts in the Ensembl genes
track associated with the SOD1 gene:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mspeir&hgS_otherUserSessionName=hg19_sod1EnsTrans.
When getting output from the Table Browser as I described before, the
exons would be numbered according to their position within the transcript.
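
One point worth checking against your own data: in the sample output earlier in this thread, the minus-strand transcript ENST00000449339 has exon_0 at the lowest coordinate and exon_2 at the highest, which suggests the numbers follow ascending genomic position. If you need a 1-based rank that counts from the 5' end of the transcript (which is how Ensembl's "Exon Rank in Transcript" is assumed to work here), a rough Python sketch like the one below could derive it from the BED coordinates and strand. The function name rank_exons and the tuple layout are made up for illustration, and the sketch assumes every exon of each transcript is present in the input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (transcript ID, strand, 0-based BED start) for one exon
Exon = Tuple[str, str, int]


def rank_exons(exons: Iterable[Exon]) -> Dict[Tuple[str, int], int]:
    """Return a 1-based 5'-to-3' rank for each (transcript, start) pair.

    Plus-strand transcripts are ranked by ascending start; minus-strand
    transcripts by descending start, so rank 1 is always the 5'-most exon.
    """
    starts_by_transcript = defaultdict(list)
    for transcript, strand, start in exons:
        starts_by_transcript[(transcript, strand)].append(start)

    ranks: Dict[Tuple[str, int], int] = {}
    for (transcript, strand), starts in starts_by_transcript.items():
        ordered = sorted(starts, reverse=(strand == "-"))
        for rank, start in enumerate(ordered, start=1):
            ranks[(transcript, start)] = rank
    return ranks


if __name__ == "__main__":
    # Coordinates copied from the sample output in this thread.
    sample = [
        ("ENST00000449339", "-", 33025905),
        ("ENST00000449339", "-", 33030246),
        ("ENST00000449339", "-", 33031709),
        ("ENST00000270142", "+", 33031934),
        ("ENST00000270142", "+", 33036102),
    ]
    for (transcript, start), rank in sorted(rank_exons(sample).items()):
        print(transcript, start, rank, sep="\t")
```

Ranking from coordinates and strand, rather than from the number embedded in the exon name, means the result does not depend on how the Table Browser assigns that number.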

Matthew Speir
UCSC Genome Bioinformatics Group

