question about transcriptome


Lisse, Thomas

Aug 19, 2015, 11:40:24 AM
to gen...@soe.ucsc.edu

Dear UCSC Genome Browser,

 

I was wondering how I can determine the transcriptome size, in base pairs, for various organisms using the Table Browser on your site.

I need this value to estimate depth of coverage for various RNA-seq experiments. For example, the estimated size of the human transcriptome is 60 Mb, but how can I find this out using your Genome Browser?

I read a blog post on SEQweb that said this can be done with the UCSC browser, but it gave no details.

I appreciate your help.

Sincerely,

Thomas Lisse

Jonathan Casper

Aug 20, 2015, 7:08:46 PM
to Lisse, Thomas, gen...@soe.ucsc.edu

Hello Thomas,

Thank you for your question about obtaining the transcriptome size of a genome. I suspect the feature described on the blog is the summary data provided by the UCSC Table Browser. The "summary/statistics" button near the bottom of the Table Browser page will give you information about the coverage from our tracks. Please note, however, that the value will change depending on which data set you choose to represent the transcriptome. Here are the steps to use the "summary/statistics" button.

1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables
2. Select your clade, genome, and assembly of choice. As an example, you might select "clade: Mammal", "genome: Human", and "assembly: Dec. 2013 (GRCh38/hg38)".
3. Select the track group and track that you wish to use to represent the transcriptome. One option would be to use a track from the "Genes and Gene Predictions" group like UCSC Genes, RefSeq Genes, or one of the GENCODE Genes tracks. Another option would be to select a track from the "mRNA and EST" group such as "Human mRNAs" or "Human ESTs".
4. Select the region "genome" to ensure your results will cover the full assembly.
5. Click the "summary/statistics" button.

On the resulting page, you should see a series of statistics. "item bases" is unlikely to be relevant to your interests because it includes intron regions. I suspect the value you are looking for is "block bases", which tells you how much of the genome is covered by individual blocks (i.e., exons, including UTRs). "block total" is a count of how many bases would be covered if all the blocks were laid end-to-end. The "block bases" value is smaller than "block total" because it counts each base only once, even where entries overlap; "block total" counts a base multiple times if it is covered by multiple transcript records.
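To make the difference between "block bases" and "block total" concrete, here is a minimal Python sketch. It is not how the Table Browser computes its statistics internally; the function names and the toy intervals are made up for illustration, and coordinates are assumed to be 0-based, half-open (chrom, start, end) tuples.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of one block (exon): (chrom, start, end),
# assuming 0-based, half-open coordinates.
Block = Tuple[str, int, int]


def block_total(blocks: Iterable[Block]) -> int:
    """Sum of all block lengths; overlapping bases are counted repeatedly."""
    return sum(end - start for _, start, end in blocks)


def block_bases(blocks: Iterable[Block]) -> int:
    """Number of distinct bases covered; overlapping bases are counted once."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in blocks:
        by_chrom.setdefault(chrom, []).append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current merged interval and start a new one.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


# Toy data: the first two chr1 blocks overlap by 50 bases.
example_blocks: List[Block] = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 400, 450),
    ("chr2", 100, 200),
]

print("block total:", block_total(example_blocks))  # 350
print("block bases:", block_bases(example_blocks))  # 300
```

In this toy case the 50 overlapping bases on chr1 are counted twice in "block total" and once in "block bases", which is the same relationship you will see between the two values on the summary page.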

For example, the hg38 gene track "GENCODE v22" (and table knownGene) has a "block bases" of 124,946,431 (4.10% of the assembly). The "Human ESTs" track gives a result of 371,496,959 (12.18% of the assembly).
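Once you have chosen a "block bases" value, a rough average depth of coverage can be sketched as total sequenced bases divided by transcriptome size. The snippet below uses the GENCODE v22 figure quoted above; the read count and read length are hypothetical placeholders, and the calculation assumes every read maps to the transcriptome and ignores the fact that expression levels are far from uniform across transcripts.

```python
def mean_depth(num_reads: int, read_length: int, transcriptome_bases: int) -> float:
    """Rough average depth: total sequenced bases / transcriptome size."""
    if transcriptome_bases <= 0:
        raise ValueError("transcriptome_bases must be positive")
    return num_reads * read_length / transcriptome_bases


# "block bases" for GENCODE v22 on hg38, as reported above.
GENCODE_V22_BLOCK_BASES = 124_946_431

# Hypothetical experiment: 30 million single-end reads of 100 bp each.
depth = mean_depth(
    num_reads=30_000_000,
    read_length=100,
    transcriptome_bases=GENCODE_V22_BLOCK_BASES,
)
print(f"approximate mean depth: {depth:.1f}x")
```

Substituting the "block bases" value from a different track (for example "Human ESTs") will change the estimate accordingly, so it is worth noting which track you used.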

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

