coordinates for coding regions for specific genes

Watt, Jennifer Eunyoung

Jul 5, 2013, 2:06:31 PM

to gen...@soe.ucsc.edu

I am trying to find the coordinates of the coding regions of specific genes, relative to the human reference hg19.

Where should I look on the UCSC website? I have not been able to find this in a reasonable amount of time.

Your response will be appreciated.

The output should look something like this, where the first coordinate is the start and the second is the end of each exon.

Chr20:54963028-54963128

Chr20:54963228-54963328

Chr20:54963428-54963528

Chr20:54963628-54963728

Chr20:54963828-54963728

Chr20:54963628-54963928

Thanks.

Jennifer Watt

Jonathan Casper

Jul 5, 2013, 6:54:22 PM

to Watt, Jennifer Eunyoung, gen...@soe.ucsc.edu

Hi Jennifer,

Thanks for your question about finding exon coordinates. You can obtain this information using the Table Browser.

Select the hg19 assembly and the "Genes and Gene Prediction Tracks" group. If there is a specific track of genes you wish to use, you can select it from the "Track" dropdown menu. For this example we'll use the "UCSC Genes" track, with "knownGene" for the table.

For region of interest, select "genome", then click "paste list" for identifiers. You can now enter one or more gene identifiers and click "submit". It was not clear from your question whether you have a list of gene names (e.g., BARD1, SIRT1) or the accession numbers of specific splice variants (e.g., uc0021to.2, NM_000899). Either type of identifier will work, but they will give slightly different results. If you provide a specific accession number, you will only receive coordinates for that variant. If you provide a gene name, you will receive coordinates for all splice variants of that gene that are found in the track.

To get just the start and stop position of the coding region for each gene, choose "selected fields from primary and related tables" for your output format. Click "get output". On the next page, put a check by "chrom", "cdsStart", and "cdsEnd". The output will look like this:

#chrom cdsStart cdsEnd
chr1 99150452 99225687
chr1 99127287 99225687
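
If you save this output to a file, a short script can turn each row into the chr:start-end form from your question. The following is only a minimal Python sketch and is not part of the Table Browser. The function name cds_rows_to_regions is made up for illustration, and it assumes the whitespace-separated three-column layout shown above. It also assumes that cdsStart is 0-based and cdsEnd is the 1-based end (the usual convention for coordinates in UCSC tables), so it adds 1 to the start to produce a 1-based region.

```python
def cds_rows_to_regions(text):
    """Convert 'chrom cdsStart cdsEnd' rows into 1-based 'chrom:start-end' strings.

    Hypothetical helper for illustration; not a UCSC tool.
    """
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#chrom ...' header line.
        if not line or line.startswith("#"):
            continue
        chrom, cds_start, cds_end = line.split()[:3]
        # Assumption: cdsStart is 0-based, so add 1 for a 1-based start.
        regions.append(f"{chrom}:{int(cds_start) + 1}-{int(cds_end)}")
    return regions


example = """#chrom cdsStart cdsEnd
chr1 99150452 99225687
chr1 99127287 99225687
"""

for region in cds_rows_to_regions(example):
    print(region)
```

If you would rather keep the coordinates exactly as they appear in the table, drop the "+ 1".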

If instead you'd like the start and stop positions of each coding exon within the gene, you'll need to use a different output format. Select "BED - browser extensible data" and click "get output". You will now be taken to a second page where you can configure your BED output. Choose to create BED records for "Coding Exons", and click "get BED". The output will look like this:

chr2 215593399 215593732 uc002veu.2_cds_0_0_chr2_215593400_r 0 -
chr2 215595134 215595232 uc002veu.2_cds_1_0_chr2_215595135_r 0 -
chr2 215609790 215609883 uc002veu.2_cds_2_0_chr2_215609791_r 0 -

The first three fields of each line are chromosome, exon start, and exon end. The fourth field contains information including which splice variant is being listed (in this case, uc002veu.2) and which exon the entry is for (the first number after "cds", starting from 0). If there are multiple variants in the output, the exons will be grouped by variant.
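
If you want the exons grouped by variant in a script, the field layout described above can be parsed along these lines. This is a rough Python sketch under a few assumptions: the function name bed_to_exon_regions is hypothetical, the name field is assumed to always follow the "<variant>_cds_<exon number>_..." pattern seen in the sample output, and the BED start is treated as 0-based, so 1 is added to get a 1-based start (this matches the position embedded in the name field, e.g. 215593400 for a start of 215593399).

```python
from collections import defaultdict


def bed_to_exon_regions(text):
    """Group coding-exon BED lines by splice variant.

    Returns {variant: [(exon_index, 'chrom:start-end'), ...]} with 1-based starts.
    Hypothetical helper for illustration; not a UCSC tool.
    """
    by_variant = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank, comment, and 'track'/'browser' header lines.
        if len(fields) < 4 or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = fields[:4]
        # Assumption: the variant accession is everything before '_cds_'.
        variant, sep, rest = name.partition("_cds_")
        if not sep:
            continue  # name field does not follow the assumed pattern
        exon_index = int(rest.split("_")[0])
        # BED starts are 0-based, so add 1 for a 1-based start.
        region = f"{chrom}:{int(start) + 1}-{int(end)}"
        by_variant[variant].append((exon_index, region))
    return dict(by_variant)


example_bed = """chr2 215593399 215593732 uc002veu.2_cds_0_0_chr2_215593400_r 0 -
chr2 215595134 215595232 uc002veu.2_cds_1_0_chr2_215595135_r 0 -
chr2 215609790 215609883 uc002veu.2_cds_2_0_chr2_215609791_r 0 -
"""

for variant, exons in bed_to_exon_regions(example_bed).items():
    for exon_index, region in exons:
        print(variant, exon_index, region)
```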

Similar questions have been answered in our mailing list archives, so you may wish to look there for more information (see in particular this question). You can reach the archives by visiting our home page and clicking on "Contact Us" (http://genome.ucsc.edu/contacts.html).

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group
