Finding exon-only length of a gene


Erwin, Kristine

Mar 14, 2016, 3:52:16 PM
to gen...@soe.ucsc.edu
Good afternoon,
 
Is it possible to find the length of a gene counting its exons only?
 
At the top of each entry (for example: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr3%3A48554073-48605302&hgsid=481841181_8Q2AD5r8vKVxi27hyb8hnLruvqqv), I can see the total length of the gene is 51,230 bp. Is it possible to find the length of the coding sequence only?
 
I would very much appreciate it if you could direct me to this information.
 
Thank you in advance,
 
Kristine
 
 

Christopher Lee

Mar 15, 2016, 2:24:08 PM
to Erwin, Kristine, gen...@soe.ucsc.edu

Hi Kristine,

Thank you for your question about finding the exon-only length
of a gene. It is unclear from the link you shared exactly what
you want to find the length of, but the 51,230 number you are
seeing is the size (in base pairs) of the genomic region you are
currently viewing in the Genome Browser window, not the length
of an individual gene.
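As a quick check of where that figure comes from, here is a minimal Python sketch (the function name region_size is hypothetical). It assumes the position string uses the browser's 1-based, fully closed convention, in which both endpoints are counted, so the size is end - start + 1.

```python
def region_size(position: str) -> int:
    """Return the number of bases in a position such as "chr3:48554073-48605302".

    Assumes 1-based, fully closed coordinates as typed into the browser's
    position box: both endpoints are included, so size = end - start + 1.
    """
    _chrom, span = position.rsplit(":", 1)
    # Allow thousands separators, e.g. "chr3:48,554,073-48,605,302".
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return end - start + 1


print(region_size("chr3:48554073-48605302"))  # 51230
```

This is the size of the window being viewed, which is why it does not correspond to any one gene's exon length.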

That said, you can still use the Genome Browser to find the total
length of the exons in an individual gene using the Table Browser
tool: http://genome.ucsc.edu/cgi-bin/hgTables.

From the Table Browser page, make the following selections:

clade: Mammal
genome: Human
assembly: hg38
group: Genes and Gene prediction Tracks
track: GENCODE V22
table: knownGene
identifiers: paste or upload a list of gene names or IDs
output format: selected fields from primary and related tables

Then click "get output". On the next page, in the section
"Select fields from hg38.knownGene", choose exonStarts and
exonEnds. In the "Linked Tables" section, check hg38.kgXref and
click "allow selections from checked tables". Then, in the
"hg38.kgXref fields" section, choose "geneSymbol" and click
"get output".

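The exonStarts and exonEnds fields are comma-separated lists with
one value per exon. For each exonStarts/exonEnds pair, take the
difference exonEnds - exonStarts, then sum these differences to get
the total exon length. Because the tables store coordinates as
0-based, half-open intervals, no +1 is needed. Note that this
total includes the untranslated regions (UTRs) as well as the
coding sequence.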
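The summing step can be sketched in Python as follows. This is an illustration under stated assumptions, not a UCSC tool: it assumes the Table Browser output is tab-separated with a single header line starting with "#" and column names of the form db.table.field (as in the hg38.knownGene.name example shown later in this thread), and that exonStarts, exonEnds and geneSymbol were selected. The function names and the sample row are hypothetical.

```python
from typing import List, Tuple


def exon_length(exon_starts: str, exon_ends: str) -> int:
    """Sum exon lengths from comma-separated exonStarts/exonEnds fields.

    Coordinates are 0-based, half-open, so each exon's length is end - start.
    """
    # The fields usually end with a trailing comma ("100,200,"), so skip empty pieces.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different exon counts")
    return sum(end - start for start, end in zip(starts, ends))


def exon_lengths_from_output(text: str) -> List[Tuple[str, int]]:
    """Return (geneSymbol, total exon length) for each row of the output.

    Assumes a tab-separated table whose first non-empty line is a '#' header.
    Columns are found by name suffix, so the field order does not matter.
    Genes with several transcripts appear once per transcript.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")

    def column(suffix: str) -> int:
        for i, name in enumerate(header):
            if name.endswith(suffix):
                return i
        raise ValueError(f"no column ending in {suffix!r}")

    i_start = column(".exonStarts")
    i_end = column(".exonEnds")
    i_sym = column(".geneSymbol")
    results = []
    for line in lines[1:]:
        fields = line.split("\t")
        results.append((fields[i_sym], exon_length(fields[i_start], fields[i_end])))
    return results


# Hypothetical two-exon transcript: (150 - 100) + (260 - 200) = 110.
print(exon_length("100,200,", "150,260,"))  # 110
```

The resulting list can be written out and opened in a spreadsheet; keeping one row per transcript leaves the choice of isoform to the reader.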

Here is a similar previously answered mailing list question with an
example: https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/l4Roei7mdxg/ljAaFlsXcR0J

For more information on the table browser please see the Table
Browser User's Guide at:
https://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

You can also search our archived mailing list at
https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome
to see similar questions that have been asked previously.

Thank you again for your inquiry and using the UCSC Genome Browser.
If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible
forum. If your question includes sensitive data, you may send it instead
to genom...@soe.ucsc.edu.


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.

Erwin, Kristine

Mar 17, 2016, 12:37:19 PM
to Christopher Lee, gen...@soe.ucsc.edu
Hi Christopher,
 
Thank you very much for your response. I apologize that my question was not clear. I have found the data I was looking for, which is the length of the mRNA sequence for the most prevalent isoform of the gene I am interested in. For example, this page shows the length as 7972 bp for the most prevalent isoform of CEP290 (please correct me if I am wrong!): http://genome.ucsc.edu/cgi-bin/hgGene?hgsid=481828683_WsYDP0dikI1qD0Z7DTuLaWnM2yAP&hgg_do_getMrnaSeq=1&hgg_gene=uc001tar.3
 
I have a list of over 5000 genes in Excel that I have been manually typing into your database, clicking through each one to find the mRNA sequence length. Is there a function on your website that could automate this process? Perhaps I could input all the genes I am interested in and have the data pulled automatically, or download the mRNA length information for human genes and search it with an Excel function?
 
Please let me know if there is any faster way of mining this information.
 
Thank you very much,
 
Kristine

Christopher Lee

Mar 18, 2016, 6:17:53 PM
to Erwin, Kristine, gen...@soe.ucsc.edu

Hi Kristine,

Thank you for your question about finding the mRNA sequence lengths for a list of 5000 genes.
If you have a list of UCSC Gene IDs like the one in your example (uc001tar.3), then
you can use the Table Browser to retrieve the mRNA length from the kgTargetAli table.
Note that if you have a mix of IDs, some from hg19 and some from hg38, then you
will have to perform two separate queries:

1. Head to the table browser http://genome.ucsc.edu/cgi-bin/hgTables

2. Make the following selections:
clade: Mammal
genome: Human
assembly: hg38 or hg19
group: All Tables
table: knownGene
identifiers: paste or upload your list of 5000 IDs
output format: selected fields from primary and related tables

3. Click "get output".

4. In the "Select Fields from db.knownGene" section, where db is either hg38 or hg19, check the checkbox "name".

5. In the "Linked Tables" section, check the checkbox in the row containing "db", "kgTargetAli", "Summary info about a patSpace alignment", where db is hg38 or hg19.

6. Click "allow selections from checked tables".

7. In the "db.kgTargetAli fields" section, check the qSize checkbox.

8. Click "get output".

You will now have a list of the form:

#hg38.knownGene.name    hg38.kgTargetAli.qSize
uc001xjt.4     5351
uc059cqu.1    565
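If the goal is a spreadsheet column, the output above can be matched against the original ID list with a short script. This is a minimal Python sketch, assuming the output has a "#" header line followed by whitespace-separated name/qSize pairs as in the example; the function names are hypothetical, and IDs missing from the output are left blank rather than guessed.

```python
import csv
import io
from typing import Dict, List


def parse_qsize_output(text: str) -> Dict[str, int]:
    """Map each knownGene name to its kgTargetAli qSize.

    Skips blank lines and the '#' header. If a name appears more than once,
    the last row wins.
    """
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, qsize = line.split()[:2]
        sizes[name] = int(qsize)
    return sizes


def to_csv(ids: List[str], sizes: Dict[str, int]) -> str:
    """Return CSV text with one row per requested ID, in the original order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "mRNA_length"])
    for gene_id in ids:
        size = sizes.get(gene_id)
        writer.writerow([gene_id, "" if size is None else size])
    return buf.getvalue()


output = "#hg38.knownGene.name\thg38.kgTargetAli.qSize\nuc001xjt.4\t5351\nuc059cqu.1\t565\n"
print(to_csv(["uc059cqu.1", "uc001tar.3"], parse_qsize_output(output)), end="")
# id,mRNA_length
# uc059cqu.1,565
# uc001tar.3,
```

If the IDs were split between an hg19 and an hg38 query, parsing both outputs and merging them with dict.update before calling to_csv gives a single file that Excel can open directly.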

The qSize field is the size of the mRNA, minus the poly-A tail, so it will
be slightly smaller than the number you get when you click through to see
the mRNA from the browser. For more information on how we trim poly-A tails
see this archived mailing list question:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/Lsf-VTWAUW0/OZzh87H0GAAJ

Thank you again for your inquiry and using the UCSC Genome Browser.
If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible
forum. If your question includes sensitive data, you may send it instead
to genom...@soe.ucsc.edu.
