Homo_sapiens.GRCh37.70.gtf vs UCSC hg19


Anna Battenhouse

Aug 4, 2015, 6:20:09 PM
to gen...@soe.ucsc.edu, Vishy Iyer
Greetings gentle genome browser folk -

In the process of annotation wrangling I ran into a peculiarity where the coordinates of an Ensembl transcript in the UCSC Genome Browser do not match those in the Ensembl gtf (specifically, Homo_sapiens.GRCh37.70.gtf, which I believe is their final GRCh37 annotation set).

transcript_id ENST00000375867 (RP11-1E11.1), exon 1
UCSC hg19: chr9:90795588-90796362
Ensembl Homo_sapiens.GRCh37.70.gtf:  9:90795584-90796324

The starts are off by 4, ends by 38. The UCSC coordinates also match gencode.v19.annotation.gtf coordinates for both the transcript and the specific exon.
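
For anyone wanting to reproduce the arithmetic, here is a minimal sketch in Python. The helper names `parse_region` and `offsets` are hypothetical, and the only normalization assumed is stripping a leading "chr" so that UCSC-style and Ensembl-style chromosome names can be compared.

```python
def parse_region(region):
    """Split 'chr9:100-200' or '9:100-200' into (chrom, start, end)."""
    chrom, span = region.strip().split(":")
    start, end = span.split("-")
    # Assumption: dropping a leading "chr" is enough to reconcile naming here.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(start), int(end)


def offsets(region_a, region_b):
    """Return (start_a - start_b, end_a - end_b) for two regions on one chromosome."""
    chrom_a, start_a, end_a = parse_region(region_a)
    chrom_b, start_b, end_b = parse_region(region_b)
    if chrom_a != chrom_b:
        raise ValueError(f"different chromosomes: {chrom_a} vs {chrom_b}")
    return start_a - start_b, end_a - end_b


# Exon 1 coordinates as quoted above (UCSC hg19 vs Ensembl release 70).
print(offsets("chr9:90795588-90796362", "9:90795584-90796324"))  # (4, 38)
```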

Can you explain the discrepancy to me?

Thank you,

--
Anna Battenhouse | Associate Research Scientist | Iyer Lab | Institute for Cellular & Molecular Biology | Center for Systems and Synthetic Biology | The University of Texas at Austin | 2500 Speedway, MBB 3.106 | Austin, TX 78712-0159 | (512) 232-2632 (o) | (512) 587-4159 (m) | abatte...@utexas.edu

Luvina Guruvadoo

Aug 4, 2015, 7:05:45 PM
to Anna Battenhouse, gen...@soe.ucsc.edu, Vishy Iyer
Hello Anna,

Thank you for your question. The Ensembl Genes track on our Browser is based on Ensembl version 75, whereas the gtf file you mentioned was released for version 70. If you search for the transcript ID (ENST00000375867) in the Ensembl gtf for version 75, you will find that the coordinates match what's represented in the genome browser for hg19: http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
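
One way to run that search is sketched below in Python. It assumes the standard nine-column, tab-separated GTF layout with attributes in the last column; the function name `exons_for_transcript` is hypothetical, and the sample line uses placeholder values for every field except the transcript ID and the release 70 exon 1 coordinates quoted earlier in this thread.

```python
import re


def exons_for_transcript(gtf_lines, transcript_id):
    """Yield (seqname, start, end, exon_number) for each exon of transcript_id."""
    wanted = f'transcript_id "{transcript_id}"'
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        attrs = fields[8]
        if wanted not in attrs:
            continue
        # exon_number may or may not be quoted, depending on the GTF source.
        m = re.search(r'exon_number "?(\d+)"?', attrs)
        exon_number = int(m.group(1)) if m else None
        yield fields[0], int(fields[3]), int(fields[4]), exon_number


# Illustrative line only: source, strand, and gene_id are placeholders.
sample = (
    "9\tplaceholder_source\texon\t90795584\t90796324\t.\t.\t.\t"
    'gene_id "GENE_PLACEHOLDER"; transcript_id "ENST00000375867"; exon_number "1";\n'
)

for exon in exons_for_transcript([sample], "ENST00000375867"):
    print(exon)
```

The function accepts any iterable of lines, so a downloaded file can be passed in directly, for example via `gzip.open(path, "rt")` where `path` is wherever the release 75 GTF was saved. Running it against the release 70 and release 75 files in turn would show the two coordinate sets side by side.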

I hope this helps. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group


Anna Battenhouse

Aug 5, 2015, 12:42:56 PM
to Luvina Guruvadoo, Anna Battenhouse, gen...@soe.ucsc.edu, Vishy Iyer
Ah yes, I now see that your Ensembl genes track does say it is based on version 75. I will download that version and use it.

I must say, I find the Ensembl downloads a bit challenging to navigate. It seems unnecessarily hard to figure out which GRCh build a release corresponds to.

Do you update the UCSC consolidated genes track (knownGenes & friends) whenever one of the sources is updated? I noticed there is a very recent refFlat (August 3, 2015), but it does not look like that has been incorporated in UCSC genes.


Thank you,
Anna

Luvina Guruvadoo

Aug 6, 2015, 6:32:06 PM
to Anna Battenhouse, gen...@soe.ucsc.edu, Vishy Iyer
Hello Anna,

We do not have plans to update the UCSC Genes track on our hg19 assembly at this time. Note for hg38, we recently switched the default gene track from UCSC Genes to Gencode Genes. You can read more about this transition here: http://genome.ucsc.edu/goldenPath/newsarch.html#062915


- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
