Human Reference Genome Annotation GTF download


Pankaj Agarwal

Apr 10, 2014, 2:06:33 PM
to gen...@soe.ucsc.edu
Hi,

I am looking to download the UCSC version of the human reference annotation file (which I believe is in GTF format) from the UCSC Genome Browser website, but I cannot readily find it. The closest I found was linked from

http://hgdownload.cse.ucsc.edu/downloads.html#human

to

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/

But there are many files in the link above, and I am not sure which file represents the complete set of annotations.

For reference, Ensembl provides its version of the annotation file at the following location:

http://www.ensembl.org/info/data/ftp/index.html
ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens

I am doing RNA-seq data analysis, for which I need the annotation file.

I would appreciate your help in locating it.

Sincerely,

- Pankaj

--------------------------------------

Pankaj Agarwal, M.S

Bioinformatician

Bioinformatics Shared Resource

Duke Cancer Institute

Duke University

919-681-6573

p.ag...@duke.edu

 


Matthew Speir

Apr 11, 2014, 3:59:03 PM
to gen...@soe.ucsc.edu
Hi Pankaj,

Thank you for your question about getting a gene annotation file from the UCSC Genome Browser. Unfortunately, we do not provide a download for the UCSC Genes annotation track in GTF format. You can, however, generate this file yourself using the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables, with the following steps:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
    clade: Mammal
    genome: Human
    assembly: Feb. 2009 (GRCh37/hg19)
    group: Genes and Gene Predictions
    track: UCSC Genes
    table: knownGene
    region: Select "genome" for the entire genome.
    output format: GTF - gene transfer format
    output file: enter a file name to save your results to a file, or leave blank to display results in the browser

3. Click 'get output'.
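
Once the file is saved, a quick sanity check can confirm that the export is complete and well-formed before it goes into an RNA-seq pipeline. The following is a minimal sketch, not part of the Table Browser itself: it assumes a standard 9-column, tab-separated GTF whose last column holds `key "value";` attributes, and the sample rows, source name, and IDs in it are made up for illustration. Comparing the number of distinct gene_id and transcript_id values shows whether the file carries gene-level grouping, which some quantification tools rely on.

```python
import io
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(text):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_PATTERN.findall(text))


def summarize_gtf(handle):
    """Count feature types and distinct gene/transcript IDs in a GTF stream."""
    features = Counter()
    gene_ids = set()
    transcript_ids = set()
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(
                f"line {line_number}: expected 9 columns, got {len(fields)}"
            )
        features[fields[2]] += 1  # column 3 is the feature type
        attributes = parse_attributes(fields[8])
        if "gene_id" in attributes:
            gene_ids.add(attributes["gene_id"])
        if "transcript_id" in attributes:
            transcript_ids.add(attributes["transcript_id"])
    return features, len(gene_ids), len(transcript_ids)


if __name__ == "__main__":
    # Hypothetical rows for illustration only; real exports are much larger.
    sample = (
        'chr1\texample_source\texon\t100\t200\t.\t+\t.\t'
        'gene_id "tx_example_1"; transcript_id "tx_example_1";\n'
        'chr1\texample_source\tCDS\t120\t180\t.\t+\t0\t'
        'gene_id "tx_example_1"; transcript_id "tx_example_1";\n'
    )
    features, n_genes, n_transcripts = summarize_gtf(io.StringIO(sample))
    print(dict(features), n_genes, n_transcripts)
```

To run it on a real export, open the saved file (for example `with open("knownGene.gtf") as handle:`, where the file name is whatever was entered in the "output file" box) and pass the handle to `summarize_gtf`.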

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--




Pankaj Agarwal

Apr 12, 2014, 12:42:11 PM
to gen...@soe.ucsc.edu

Hi,

Thank you for your clear instructions. I have a quick follow-up question.

Could you please describe the basic differences between the annotations provided by Ensembl and UCSC: in general, how each is produced and, in particular, how they differ in content and data structure?

Thank you,

- Pankaj

 


Luvina Guruvadoo

Apr 14, 2014, 1:11:41 PM
to Pankaj Agarwal, gen...@soe.ucsc.edu
Hi Pankaj,

Please see the following track description pages for details:

UCSC Genes
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene

Ensembl
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=ensGene

In general, to see the description for any track, you can click on the gray bar to the left of the track in the main display, or click on the track title above its configuration pull-down menu.

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
--

