How to download RefSeq annotation GTF file

Aritra Deb

Jun 3, 2024, 12:52:07 PM
to gen...@soe.ucsc.edu
Hello,

I'm looking for the RefSeq genome reference .gtf files (both GRCh37 and GRCh38) for bulk RNA-seq data analysis. Are they available from the UCSC Genome Browser database? I previously downloaded and used the file 'GCF_000001405.40_GRCh38.p14_genomic_fixed.gtf' from NCBI, but StringTie failed on it with the error [Error: no valid ID found for GFF record]. Please let me know where and how to download the correct file. Thanks.

Aritra Deb
Research Fellow
The Wellcome-Wolfson Institute for Experimental Medicine
School of Medicine, Dentistry and Biomedical Sciences
Queen's University Belfast
97 Lisburn Road
Belfast
BT9 7BL

Jairo Navarro Gonzalez

Jun 5, 2024, 7:19:14 PM
to Aritra Deb, gen...@soe.ucsc.edu

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

You can find the NCBI RefSeq GTF files for hg38 and hg19 on the hgdownload server.

hg38:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz

hg19:

https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ncbiRefSeq.gtf.gz
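For readers who prefer the command line, the two links above can be fetched with a small shell helper. This is only a sketch: the function names refseq_gtf_url and fetch_refseq_gtf are made up for this example, and it assumes curl and gunzip are installed.

```shell
#!/usr/bin/env bash
# Sketch: fetch the UCSC NCBI RefSeq GTF for an assembly (hg38 or hg19).
# Both function names are hypothetical helpers, not part of any UCSC tool.

# Build the download URL from the assembly name, following the two links above.
refseq_gtf_url() {
    local db="$1"
    printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/bigZips/genes/%s.ncbiRefSeq.gtf.gz\n' "$db" "$db"
}

# Download the file and decompress it, leaving <db>.ncbiRefSeq.gtf
# in the current directory.
fetch_refseq_gtf() {
    local db="$1"
    curl -fSL -O "$(refseq_gtf_url "$db")" && gunzip -f "${db}.ncbiRefSeq.gtf.gz"
}

# Usage (needs network access):
#   fetch_refseq_gtf hg38
#   fetch_refseq_gtf hg19
```

The download calls are left as comments so that sourcing the script does not touch the network.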

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser


Hong-Seok Ha

Nov 1, 2024, 12:52:32 PM
to UCSC Genome Browser Public Support, Jairo Navarro Gonzalez, gen...@soe.ucsc.edu, Aritra Deb

Hello,

I would like to inquire about obtaining RefSeq GTF or BED12 files for all versions in hg19. For example, for an entry like NM_xxxxxx.4, I need files that include all versions, such as NM_xxxxxx.1, NM_xxxxxx.2, NM_xxxxxx.3, and NM_xxxxxx.4. When I search RefSeq Genes data, I can see information on previous versions, but they are not available for download via the Table Browser. Could you please let me know if there is a way to obtain files that contain all versions of the RefSeq data?

Thank you very much.
Hongseok Ha

Jairo Navarro Gonzalez

Nov 1, 2024, 8:11:14 PM
to Hong-Seok Ha, UCSC Genome Browser Public Support, Aritra Deb

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

Most tracks have a "Data Access" section on their description page that should answer most data-access questions. For example, see the NCBI RefSeq track for hg19:

https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&c=chr7&g=refSeqComposite

On that page, there is a link to the archives for this track:

https://hgdownload.soe.ucsc.edu/goldenPath/archive/hg19/ncbiRefSeq/

Jairo Navarro
UCSC Genome Browser

Aritra Deb

Dec 10, 2024, 4:44:30 PM
to gen...@soe.ucsc.edu, Jairo Navarro Gonzalez
Hello,

Thank you very much! Your previous email helped me a lot with the differential analysis of RNA-seq data using RefSeq hg19 as the reference (hg19.ncbiRefSeq.gtf). My question now is where or how I can get a GFF3 file corresponding to 'hg19.ncbiRefSeq.gtf'. If there is a repository from which I can download the GFF3 file, please let me know. Alternatively, if there is a tool that can create a GFF3 file from a GTF file, please let me know how to use it. Thanks.

Dr. Aritra Deb
Research Fellow
Centre for Public Health
School of Medicine, Dentistry and Biomedical Sciences
Institute of Clinical Science B
Queen's University Belfast
Royal Victoria Hospital
97 Lisburn Road
Belfast
BT12 6BJ

Matthew Speir

Dec 13, 2024, 6:04:16 PM
to Aritra Deb, gen...@soe.ucsc.edu, Jairo Navarro Gonzalez
Hello, Aritra.

Two of our engineers share that many pipelines they've encountered are compatible with both GTF and GFF3, since the two formats describe the same kinds of features even though their attribute syntax differs. Could you share why you need GFF3 specifically instead of GTF?

We only provide GTF files and we don't have any tools for converting files into GFF3. But there are several tools available online. One is Galaxy (https://usegalaxy.org/) which has an interface for gffread that can be used for converting other formats into GFF3.

---

Matthew Speir

UCSC Genome Browser, User Support


Aritra Deb

Dec 26, 2024, 12:54:58 PM
to Matthew Speir, gen...@soe.ucsc.edu, Jairo Navarro Gonzalez
Hello,

Apologies for the delayed reply; I somehow missed your email. The reason I'm looking for GFF3 along with GTF is that I'm trying to perform a differential splicing analysis of RNA-seq data. For this purpose I'm using the MAJIQ/VOILA method (https://biociphers.bitbucket.io/majiq-docs/index.html). I have used this method with the GENCODE GRCh38 reference, where both GTF and GFF3 are available, and that worked fine. Now I'm trying to run the same with the RefSeq hg19 reference, as this is required for further analysis. I tried to convert the GTF to GFF3 but it didn't work. So I'm looking for a correct GFF3 or a proper tool to convert from GTF to GFF3. I hope this explains the situation. Thanks.

Aritra

Matthew Speir

Dec 29, 2024, 3:40:03 PM
to Aritra Deb, gen...@soe.ucsc.edu, Jairo Navarro Gonzalez
Hello, Aritra.

NCBI provides annotation files in both GTF and GFF format on their website: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/. If you need more details about the specific files, there is a README in that directory.
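The annotation files in that directory appear to follow NCBI's usual naming pattern, <assembly directory name>_genomic.gff.gz and <assembly directory name>_genomic.gtf.gz. Treat that pattern as an assumption and confirm it against the README or the directory listing. A sketch of building the file URL from the directory link (the function name ncbi_annotation_url is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: derive an annotation file URL from an NCBI assembly directory URL.
# Assumes files are named <directory name>_genomic.<ext>.gz; verify this
# against the README in the directory before relying on it.
ncbi_annotation_url() {
    local dir_url="${1%/}"   # drop a trailing slash, if any
    local ext="$2"           # "gff" or "gtf"
    printf '%s/%s_genomic.%s.gz\n' "$dir_url" "${dir_url##*/}" "$ext"
}

# Usage (needs network access):
#   curl -fSL -O "$(ncbi_annotation_url \
#     https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/ gff)"
```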

---

Matthew Speir

UCSC Genome Browser, User Support

Maximilian Haeussler

Dec 30, 2024, 8:53:09 AM
to Matthew Speir, Aritra Deb, gen...@soe.ucsc.edu, Jairo Navarro Gonzalez
Hi Aritra,
> I tried to convert the GTF to GFF3 but it didn't work.

As Matt mentioned, gffread should produce valid GFF3 from most files that vaguely look like GFF or GTF, and it can fix up most common GFF problems:
$ gffread input.gtf -o output.gff3
If its output is not accepted by your pipeline, you may contact the pipeline's author or the gffread author.

I've also seen the GenomeTools "gtf_to_gff3" tool mentioned online, but have never tried it.
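Before converting, it may be worth checking the GTF itself. One possible cause of conversion or StringTie failures (an assumption here, not something confirmed in this thread) is records whose transcript_id attribute is missing or empty. A minimal sketch that counts such records with awk; the function name count_missing_transcript_id is made up for this example:

```shell
#!/usr/bin/env bash
# Sketch: count GTF records whose transcript_id attribute is missing or empty.
# The function name is a hypothetical helper, not part of gffread or StringTie.
count_missing_transcript_id() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }                  # skip comments and short lines
        $9 !~ /transcript_id "[^"]+"/ { n++ }    # attribute absent or set to ""
        END { print n + 0 }                      # print 0 when nothing matched
    ' "$1"
}

# Usage:
#   count_missing_transcript_id hg19.ncbiRefSeq.gtf
#   gffread hg19.ncbiRefSeq.gtf -o hg19.ncbiRefSeq.gff3
```

A count of 0 does not guarantee that a downstream pipeline will accept the converted file, but a nonzero count points at records worth inspecting first.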

best
Max

