Canonical Transcript

Patidar, Rajesh (NIH/NCI) [C]

Mar 20, 2017, 10:52:52 AM

to gen...@soe.ucsc.edu

Hi,

I am a little confused about the canonical transcript for TP53.

I downloaded knownToRefSeq.txt, which says the following is the canonical transcript for TP53:

uc002gij.3 NM_001276760

But this transcript does not produce the longest protein. The longest protein is produced by NM_000546 and NM_001126112.

Could you please share the rules used to define canonical transcripts?

Thanks,

Rajesh

Matthew Speir

Mar 24, 2017, 4:29:14 PM

to Patidar, Rajesh (NIH/NCI) [C], gen...@soe.ucsc.edu

Hi Rajesh,

Thank you for your question about canonical transcripts in the UCSC Genes track.

The UCSC Genes track scores each transcript in a "gene cluster" and then selects the transcript with the highest CDS score as the "canonical" transcript for that cluster. These CDS scores are produced by the txCdsPredict program and are based on a number of things, including ORF length, other ORF features, and whether or not the transcript is present in RefSeq.

In the case of the TP53 gene in the hg19 genome, the txCdsPredict program gives two transcripts, uc002gim.3 and uc002gij.3, the same score. The UCSC Genes construction process chooses uc002gij.3 because it comes before uc002gim.3 alphabetically.
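
To make the tie-break concrete, here is a minimal Python sketch of the selection rule as described above. It is only an illustration, not the actual UCSC Genes build code: the function name pick_canonical, the numeric scores, and the third transcript ID are all made up. Only the tie between uc002gim.3 and uc002gij.3 comes from the TP53 case.

```python
from typing import Dict


def pick_canonical(cds_scores: Dict[str, float]) -> str:
    """Return the transcript ID with the highest CDS score.

    Ties are broken by taking the alphabetically first ID, which is
    the behaviour described for the TP53 cluster.
    """
    if not cds_scores:
        raise ValueError("cluster has no transcripts")
    # Negating the score makes min() prefer the highest score;
    # the ID itself is the secondary key, so ties sort alphabetically.
    return min(cds_scores, key=lambda tx: (-cds_scores[tx], tx))


# Hypothetical scores (NOT real txCdsPredict output).
tp53_cluster = {
    "uc002gim.3": 1000.0,
    "uc002gij.3": 1000.0,
    "uc_hypothetical.1": 800.0,  # made-up ID for illustration
}
print(pick_canonical(tp53_cluster))
```

With equal top scores, the sketch returns uc002gij.3, since "j" sorts before "m".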

You may also be interested in the use of APPRIS tags in the GENCODE Genes track, https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeGencodeV19. Items in the GENCODE Genes tracks are given various tags, http://www.gencodegenes.org/gencode_tags.html, some of which are APPRIS tags. The APPRIS group attempts to annotate alternatively spliced transcripts using various methods. For almost every gene locus, they have attempted to select a single CDS variant as the "PRINCIPAL" isoform, though there are still some loci that don't have a "PRINCIPAL" tag. So, if you are looking to create a set of canonical genes, it may be best to use the Table Browser with a set of filters to extract the APPRIS "principal" tagged transcripts from the GENCODE transcript set. If you're interested in this information, I can come up with some steps to extract it from the Table Browser.
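
If you would rather filter a downloaded tag table yourself, a rough Python sketch follows. It assumes a two-column, tab-separated dump (transcript ID, then tag), one row per transcript/tag pair, and it assumes that the tags of interest begin with "appris_principal"; please check both against the table schema and the tag list linked above. The function name and the transcript IDs in the sample are placeholders, not real annotations.

```python
import csv
from typing import Iterable, Set


def principal_transcripts(rows: Iterable[str]) -> Set[str]:
    """Collect transcript IDs that carry an APPRIS principal tag.

    Assumes each row is "transcriptId<TAB>tag"; header/comment lines
    starting with '#' and malformed rows are skipped.
    """
    keep: Set[str] = set()
    for row in csv.reader(rows, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        transcript_id, tag = row[0], row[1]
        # Assumed tag prefix; verify against the GENCODE tag list.
        if tag.lower().startswith("appris_principal"):
            keep.add(transcript_id)
    return keep


# Placeholder rows, not real GENCODE data.
sample = [
    "#transcriptId\ttag",
    "ENST_PLACEHOLDER_1\tbasic",
    "ENST_PLACEHOLDER_1\tappris_principal",
    "ENST_PLACEHOLDER_2\tappris_candidate",
]
print(sorted(principal_transcripts(sample)))
```

To run it on a real file, pass an open file handle instead of the sample list, for example principal_transcripts(open("tags.txt")), where tags.txt is whatever name you saved the table under.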

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group