Dear UCSC team: Hi! We want to use the RefSeq transcripts as the reference set in our RNA-seq analysis, so I downloaded two RefSeq files. We found that refGene.txt.gz from UCSC (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/) differs from ref_GRCh37.p13_top_level.gff3.gz from NCBI (ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/) in the number of transcripts. What standard and method is refGene.txt.gz based on, and why do the RefSeq files at UCSC and NCBI differ so much? Is it right to use the refGene.txt.gz from your website as our reference sequences? We look forward to your answer. Thank you!
--
Hello,
The answer to your question is that you used the wrong RefSeq file. If you open the containing directory at ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ and look at the timestamp, you will see that the file ref_GRCh37.p13_top_level.gff3.gz was last updated in 2013. That file is close to two years old, and many more transcripts have been released since that time.
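If you want to see exactly which transcripts differ between the two files, rather than only comparing totals, something like the following rough sketch may help. It assumes that the second column of refGene.txt holds the transcript accession (the "name" field of the table) and that the GFF3 attribute column carries a transcript_id key; please check both against your copies of the files. Version suffixes such as ".2" are stripped before comparing. The function names are only illustrative.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def refgene_accessions(lines):
    """Collect accessions from refGene.txt rows (assumes column 2 is 'name')."""
    accessions = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            accessions.add(fields[1].split(".")[0])  # drop any version suffix
    return accessions


def gff3_accessions(lines):
    """Collect accessions from GFF3 rows (assumes a transcript_id attribute)."""
    accessions = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        for attribute in fields[8].split(";"):
            key, _, value = attribute.partition("=")
            if key == "transcript_id":
                accessions.add(value.split(".")[0])
    return accessions


def compare(ucsc, ncbi):
    """Count accessions unique to each set and shared by both."""
    return {
        "ucsc_only": len(ucsc - ncbi),
        "ncbi_only": len(ncbi - ucsc),
        "shared": len(ucsc & ncbi),
    }
```

To run it on the downloaded files, pass open_text("refGene.txt.gz") and open_text("ref_GRCh37.p13_top_level.gff3.gz") to the two collector functions and hand the resulting sets to compare().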
The directories that we use to obtain RefSeq transcripts are the release files at ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/ combined with the daily updates from ftp://ftp.ncbi.nih.gov/refseq/daily/. Unfortunately, there is no one file that we can point you to, as we are continually incorporating new data from the daily RefSeq updates. Please note that this process also involves data for more than just the human hg19 genome assembly, so you may need to apply some further filters to the files in those directories to find the data that you want.
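As an illustration of the kind of filtering mentioned above, here is a minimal sketch that keeps only human records from a multi-species FASTA file. It assumes you have downloaded an RNA FASTA file from the release directory and that each header line contains the organism name ("Homo sapiens"); the function name and the keyword match are assumptions for illustration, not a description of our own pipeline.

```python
def filter_fasta_by_header(lines, keyword="Homo sapiens"):
    """Yield the lines of every FASTA record whose header contains keyword."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; decide whether to keep it.
            keep = keyword in line
        if keep:
            yield line
```

The function accepts any iterable of lines, so it can be applied directly to a file handle opened with gzip.open(path, "rt"), and the result can be written out with writelines().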
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--