Exon number for Genomic coordinates


Hersh

Dec 22, 2015, 11:12:05 AM
to gen...@soe.ucsc.edu
Hi,

I have a list of genomic coordinates, and I want to know which exon number of a specific gene each coordinate falls in (hg19 assembly). Is there a quick way to get this?

This is what I have. 
------------------
chr1 43814978
chr1 43814980
chr1 43815007
chr3 178916853
chr3 178916875
-------------------

Regards
Hersh

Cath Tyner

Dec 23, 2015, 6:13:25 PM
to Hersh, gen...@soe.ucsc.edu
Dear Hersh,

Thank you for using the UCSC Genome Browser and for submitting your question regarding a fast way to find exon numbering based on a list of coordinates.

One way to view exon numbering is to use the UCSC Genome Browser graphical interface to look up each coordinate, one at a time. Using your first coordinate as an example, navigate to the hg19 assembly Gateway Page, enter the coordinate in the correct format, with a colon directly after "chr1" (for example, "chr1:43814978"), and press the submit button.



The browser now zooms in on chromosome 1 to display the single base at your coordinate. Hover over the exon block of the gene MPL to see "Exon (10/10)". Please note that there are multiple transcripts for the gene MPL.

Here is a link to that position; you can edit the URL to reach your other positions:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1%3A43814978-43814978
(for example, swap in chr3 and 178916853; %3A is the URL-encoded colon).
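For a long list, editing the link by hand gets tedious. A minimal sketch, assuming a whitespace-separated "chrom position" file like the one in your message: the hypothetical shell function make_links below prints one hg19 link per line in the same format as the link above. It only builds the URLs; you would still read the exon number off each page.

```shell
# Hypothetical helper: print one hg19 Genome Browser link per "chrom position" line.
# In the awk printf format, %% produces a literal %, so %%3A becomes the encoded colon %3A.
make_links() {
  awk 'NF >= 2 {
    printf "http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=%s%%3A%s-%s\n", $1, $2, $2
  }' "$@"
}

# Example with two coordinates from this thread; the first line printed is
# http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1%3A43814978-43814978
printf 'chr1 43814978\nchr3 178916853\n' | make_links
```

Lines with fewer than two columns (such as blank lines) are skipped.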

At each side of the viewer, you can also hover your mouse pointer over the directional arrows of each transcript and click them to jump to the edges of the next and previous exons (you may wish to zoom out first).

From the UCSC Genes track, you can also click on any transcript's gene symbol name to find further exon information, such as the total number of exons (and the total number of coding exons) in any particular transcript.

If you have a long list of coordinates, please reply to this mailing list, and I can describe another way to output a list of exon numbers for each coordinate.

References:
Video: How do I identify exon numbers with the UCSC Genome Browser?
https://www.youtube.com/watch?v=gK8B6sjzhmM&list=UUQnUJepyNOw0p8s2otX4RYQ

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Enjoy,
Cath
. . .
Cath Tyner
UC Santa Cruz Genomics Institute



Cath Tyner

Dec 27, 2015, 5:43:40 PM
to Hersh, gen...@soe.ucsc.edu
Hello again Hersh,

Thank you for clarifying that you would like instructions for listing exon numbers for a large file of coordinates. Please try the steps below to generate these data.


Step 1. Create a custom track of all exons, in all genes, across the entire human genome (hg19). This track will be a superset that includes exon numbering information. In a later step, we will filter it down to the subset of coordinates that you're interested in.

1.1. UCSC Genome Browser > Reset All User Settings

1.2. UCSC Genome Browser > Tools > Table Browser

1.3. Table Browser Settings

Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Predictions
Track: UCSC Genes
Table: knownGene
Region: genome
Output format: custom track

1.4. Click the "get output" button.

1.5. On the "Output knownGene as Custom Track" page, change the BED record setting:

Create one BED record per: Exons plus

1.6. Click the button "get custom track in genome browser".

The custom BED "superset" track (all exons, all genes, entire genome) is now loaded in the browser for hg19. Below are example rows; note the exon index (exon_1, exon_2) in the name column:

chr3    178916537    178916965    uc003fjk.3_exon_1_0_chr3_178916538_f    0    +
chr3    178917477    178917687    uc003fjk.3_exon_2_0_chr3_178917478_f    0    +
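Each row of this superset is one exon of one transcript. As a rough illustration of where such rows come from, the hypothetical shell function below expands tab-separated knownGene rows (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) into per-exon BED6 rows whose names follow the pattern visible in the example rows. It is a sketch, not the Table Browser's own code: the "_0_" field is copied literally, the "r" flag for minus-strand transcripts is an assumption, and exons are indexed in ascending genomic order, which the example rows in this thread only show for plus-strand transcripts.

```shell
# Hypothetical helper: expand tab-separated knownGene rows into one BED6 row per exon.
# Assumed column order: name chrom strand txStart txEnd cdsStart cdsEnd exonCount
# exonStarts exonEnds (any further columns are ignored).
expand_exons() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                       # skip a header line such as "#name ..."
    {
      split($9, starts, ",")            # exonStarts: 0-based, comma-terminated list
      split($10, ends, ",")             # exonEnds: exclusive ends, same order
      flag = ($3 == "+") ? "f" : "r"    # assumption: "r" marks minus-strand rows
      for (i = 1; i <= $8; i++) {
        # 0-based index in ascending genomic order, then the 1-based exon start,
        # mimicking names such as uc003fjk.3_exon_1_0_chr3_178916538_f
        name = $1 "_exon_" (i - 1) "_0_" $2 "_" (starts[i] + 1) "_" flag
        print $2, starts[i], ends[i], name, 0, $3
      }
    }' "$@"
}
```

For example, on a tab-separated knownGene dump saved as knownGene.txt (a hypothetical file name): expand_exons knownGene.txt > exons.bed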

Step 2. Create a BED3 file of your coordinates. A BED3 file has 3 columns: 1) chromosome name, 2) start position, and 3) end position. Since your examples are single positions rather than ranges, each line will use the same value for the start and end positions.

2.1. Let's format your raw data. For these instructions, I'll put your example data in a file called temp.

chr1 43814978 
chr1 43814980 
chr1 43815007
chr3 178916853
chr3 178916875

2.2. Use awk to create a BED3 file:

From the command line, issue the command:
awk '{print $1, $2, $2}' temp

chr1 43814978 43814978
chr1 43814980 43814980
chr1 43815007 43815007
chr3 178916853 178916853
chr3 178916875 178916875
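A note on coordinates: BED start positions are 0-based and end positions are exclusive, while positions typed into the browser (such as chr1:43814978) are 1-based. Strictly, a single base at 1-based position P is therefore written as start P-1, end P. If you prefer that convention to the start = end shortcut used in this walkthrough, a minimal sketch (the function name to_bed3 is hypothetical) is:

```shell
# Hypothetical helper: convert 1-based "chrom position" lines into strict BED3,
# where one base at position P is the half-open interval [P-1, P).
to_bed3() {
  awk 'BEGIN { OFS = "\t" } NF >= 2 { print $1, $2 - 1, $2 }' "$@"
}

# Example with two coordinates from this thread
printf 'chr1 43814978\nchr3 178916853\n' | to_bed3
```

The output is tab-separated, and the start column is one less than the position you supplied.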

2.3. Copy the BED3 output so that you can paste these data in a later step. 

Step 3. Create a second, "subset" custom track from the BED3 version of your coordinates, which you created in Step 2.2.

3.1. From the genome browser, click on the button, "manage custom tracks."


3.2. From the page "Manage Custom Tracks," select the button, "add custom tracks."

3.3. Make sure your Custom Track Settings are as follows, and then paste in (or upload) your BED3 data that was created in Step 2.2 above.

Clade: Mammal 
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)

3.4. Click the "submit" button. You now have 2 custom tracks loaded:

Track 1: "User_Track" Your subset (example data, N=5)
Track 2: "tb_knownGene" Your superset of all exons, all genes, entire genome (hg19). 

Step 4. Use the Data Integrator tool to filter the "superset" track so that only coordinate points from your "subset" track are displayed.

4.1. From the "Manage Custom Tracks" page, go to Tools > Data Integrator.

4.2. Change "region to annotate" to "genome"

4.3. First, add your "subset" track: "User_Track"

4.4. Second, add your "superset" track: "tb_knownGene"

You now have your 2 tracks loaded into the Data Integrator. Make sure "User Track" appears first in the section "Configure Data Sources." If needed, you can reorder the tracks by clicking and dragging the small vertical double-headed arrows to the left of each track name.

4.5. Customize the output fields by clicking the "Choose fields" button under "Output Options," and then click "Done."
 
4.6. From "Output Options," click "Get output." Below is the output you should see. This is output from the "superset" data, filtered to show only the coordinates you gave as example data. Note that multiple transcripts are listed for some coordinates (blank lines have been added between your 5 coordinate regions for readability), and exon numbering may differ between transcript variants.

chr1 43814978 uc009vwr.3_exon_9_0_chr1_43814934_f
chr1 43814978 uc001ciw.3_exon_9_0_chr1_43814934_f
chr1 43814978 uc001civ.3_exon_9_0_chr1_43814934_f

chr1 43814980 uc009vwr.3_exon_9_0_chr1_43814934_f
chr1 43814980 uc001ciw.3_exon_9_0_chr1_43814934_f
chr1 43814980 uc001civ.3_exon_9_0_chr1_43814934_f

chr1 43815007 uc009vwr.3_exon_9_0_chr1_43814934_f
chr1 43815007 uc001ciw.3_exon_9_0_chr1_43814934_f
chr1 43815007 uc001civ.3_exon_9_0_chr1_43814934_f

chr3 178916853 uc003fjk.3_exon_1_0_chr3_178916538_f

chr3 178916875 uc003fjk.3_exon_1_0_chr3_178916538_f
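For very long lists, the same filtering can also be done locally once the superset exons are saved as a BED file. The hypothetical shell function below is a simple, unoptimized sketch of what the Data Integrator does here: it reports every exon that contains each 1-based position, using the BED rule that an interval [start, end) covers 1-based positions start+1 through end. It compares every point with every exon on the same chromosome, so a dedicated tool such as bedtools intersect will be much faster on genome-wide files.

```shell
# Hypothetical helper: report exons that contain each point.
# $1 = points file ("chrom position", 1-based); $2 = exon BED file.
# Prints "chrom position exon_name" once per containing exon.
points_in_exons() {
  awk '
    # First file (points): store each position under its chromosome.
    FNR == NR { if (NF >= 2) { n[$1]++; pos[$1, n[$1]] = $2 + 0 }; next }
    # Second file (exons): [start, end) covers 1-based positions start+1..end.
    {
      for (i = 1; i <= n[$1]; i++) {
        p = pos[$1, i]
        if (p > $2 + 0 && p <= $3 + 0) print $1, p, $4
      }
    }
  ' "$1" "$2"
}
```

Assuming the superset exons were saved as exons.bed (a hypothetical file name), you could run it on the temp file from Step 2.1 and sort the result by position: points_in_exons temp exons.bed | sort -k1,1 -k2,2n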

Step 5. Understanding the output: 0-based exon numbering.

The last step is to account for 0-based numbering in the output. The exon index in each name field starts at 0 (the first exon of a transcript is exon_0), in the same way that BED start coordinates count the first base of a chromosome as 0, so you must add 1 to the index to get the conventional exon number. Taking the first row in the above output as an example:

chr1 43814978 uc009vwr.3_exon_9_0_chr1_43814934_f

The output shows that this coordinate (chr1 43814978) falls in the exon with index 9. Because the index starts at 0, add 1: for this transcript, the 0-based index 9 becomes exon number 10, which agrees with the "Exon (10/10)" label shown for MPL in the browser view.

chr1 43814978 uc009vwr.3_exon_10
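To apply this correction to every row at once, the hypothetical shell function below rewrites the name column of output lines like the ones above, replacing the 0-based index with the 1-based exon number and dropping the rest of the name. It assumes the three whitespace-separated columns shown above (chromosome, position, name); lines without an "_exon_<number>_" tag, including blank lines, are passed through unchanged.

```shell
# Hypothetical helper: turn "<transcript>_exon_<i>_..." into "<transcript>_exon_<i+1>".
number_exons() {
  awk '{
    # Find the "_exon_<index>_" tag; RSTART and RLENGTH describe the match.
    if (match($3, /_exon_[0-9]+_/)) {
      tx  = substr($3, 1, RSTART - 1)            # transcript ID before the tag
      idx = substr($3, RSTART + 6, RLENGTH - 7)  # digits between "_exon_" and "_"
      print $1, $2, tx "_exon_" (idx + 1)        # 0-based index -> 1-based number
    } else {
      print                                      # leave other lines unchanged
    }
  }' "$@"
}

# Prints: chr1 43814978 uc009vwr.3_exon_10
printf 'chr1 43814978 uc009vwr.3_exon_9_0_chr1_43814934_f\n' | number_exons
```

Matching the "_exon_" tag, rather than splitting the whole name on underscores, keeps transcript IDs that themselves contain underscores intact.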



Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Cath
. . .
Cath Tyner
UC Santa Cruz Genomics Institute



Hersh

Dec 28, 2015, 12:23:24 PM
to Cath Tyner, gen...@soe.ucsc.edu
Hi Cath,

Thanks for your response. 

Yes, I do have a long list of targets (over 1,000) for which I need to find the exon number. If you can let me know a workaround for this, it will be very helpful.

Regards
Hersh