A question about CDS completeness


Alexis ALLOT

Nov 5, 2013, 9:58:44 AM
to gen...@soe.ucsc.edu
Hello,

When I look up the transcript ENST00000566928 in the UCSC Genome Browser:

http://genome-euro.ucsc.edu/cgi-bin/hgc?hgsid=194570952&c=chr15&o=41145393&t=41149024&g=ensGene&i=ENST00000566928

It says that both CDS start and end are 'complete'.

But when I look up the same transcript on Ensembl (same database version, 73), it says that both the 5' and 3' ends are 'CDS incomplete':

http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000166145;r=15:41145394-41149024;t=ENST00000566928

Could you explain why the CDS status differs between the UCSC Ensembl track and Ensembl itself?

Thanks,
Alexis.

Emily Pritchard

Nov 5, 2013, 12:47:12 PM
to gen...@soe.ucsc.edu
Hi Alexis

In Ensembl we have this transcript listed as 5' and 3' incomplete because it lacks both a start codon and a stop codon, as you can see in this cDNA view:
http://www.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000166145;r=15:41145394-41149024;t=ENST00000566928

Hopefully someone at UCSC will be able to let you know why this was changed in their import.

All the best

Emily
Ensembl helpdesk

Hiram Clawson

Nov 5, 2013, 1:01:59 PM
to Emily Pritchard, gen...@soe.ucsc.edu

Good Morning Emily:

We find start/stop codons in the GTF definition file from Ensembl:

15 protein_coding start_codon 41145394 41145395 . + 0 gene_id "ENSG00000166145"; transcript_id "ENST00000566928"; exon_number "1"; gene_name "SPINT1"; gene_biotype "protein_coding"; transcript_name "SPINT1-007";
15 protein_coding stop_codon 41149022 41149024 . + 0 gene_id "ENSG00000166145"; transcript_id "ENST00000566928"; exon_number "8"; gene_name "SPINT1"; gene_biotype "protein_coding"; transcript_name "SPINT1-007";

As found in the file:
ftp://ftp.ensembl.org/pub/release-73/gtf/homo_sapiens/Homo_sapiens.GRCh37.73.gtf.gz

I'm assuming this is why our conversion system called it complete?
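
For anyone who wants to reproduce the lookup, here is a minimal Python sketch that pulls the start_codon and stop_codon records for each transcript out of GTF lines. It is only an illustration, not the actual UCSC conversion code. It assumes the nine standard tab-separated GTF columns (with strand and frame as separate fields), and the names parse_gtf_line and codon_features are made up for this example.

```python
import re


def parse_gtf_line(line):
    """Split one GTF record into selected columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = cols
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        # Attributes look like: key "value"; key "value"; ...
        "attributes": dict(re.findall(r'(\S+) "([^"]*)";', attrs)),
    }


def codon_features(lines):
    """Map transcript_id -> {feature: span in bases} for codon records."""
    found = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        rec = parse_gtf_line(line)
        if rec["feature"] in ("start_codon", "stop_codon"):
            tid = rec["attributes"]["transcript_id"]
            span = rec["end"] - rec["start"] + 1  # GTF coordinates are 1-based, inclusive
            found.setdefault(tid, {})[rec["feature"]] = span
    return found


# The two records quoted in this message, rebuilt with explicit tabs.
ATTRS = ('gene_id "ENSG00000166145"; transcript_id "ENST00000566928"; '
         'exon_number "{exon}"; gene_name "SPINT1"; '
         'gene_biotype "protein_coding"; transcript_name "SPINT1-007";')
EXAMPLE = [
    "\t".join(["15", "protein_coding", "start_codon", "41145394", "41145395",
               ".", "+", "0", ATTRS.format(exon=1)]),
    "\t".join(["15", "protein_coding", "stop_codon", "41149022", "41149024",
               ".", "+", "0", ATTRS.format(exon=8)]),
]

if __name__ == "__main__":
    print(codon_features(EXAMPLE))
```

Going by the quoted coordinates alone, this reports a 2-base span for the start_codon record and a 3-base span for the stop_codon record. A check that only looks for the presence of both feature types would treat the transcript as complete.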

--Hiram

Emily Pritchard

Nov 5, 2013, 1:13:40 PM
to Hiram Clawson, gen...@soe.ucsc.edu
Good Evening Hiram

I think that your import is misinterpreting our GTF file. The
"start_codon" and "stop_codon" features in the files mark the start and
end of the protein-coding region of that transcript, not an ATG and a
nonsense (stop) codon. If you look at the sequence, you can see that it
lacks these codons, which are what would make the CDS complete.
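
As a rough illustration of that sequence-level check, here is a small Python sketch that tests whether a coding sequence begins with ATG and ends with one of the three stop codons of the standard genetic code. The function name cds_completeness and the short sequences are invented for the example. They are not the real ENST00000566928 sequence, and this is not how Ensembl itself assigns the flags.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_completeness(cds):
    """Return (has_start, has_stop) for a coding sequence given 5' to 3'."""
    seq = cds.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    return has_start, has_stop


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    for name, seq in [("complete", "ATGGCCTAA"), ("incomplete", "GCCGGTCTC")]:
        start_ok, stop_ok = cds_completeness(seq)
        print(name, "start codon:", start_ok, "stop codon:", stop_ok)
```

A check like this looks at the bases themselves rather than at which feature types appear in the GTF file, which is where the two readings of this transcript diverge.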

Emily