Differences in RefSeq annotation between UCSC and NCBI


雪儿

Apr 22, 2015, 12:05:43 PM
to genome
Dear UCSC team,
Hi!
We want to use the RefSeq transcripts as the reference set in our RNA-seq analysis, so I downloaded two RefSeq files. We found that refGene.txt.gz from UCSC (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/) and ref_GRCh37.p13_top_level.gff3.gz from NCBI (ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/) contain different numbers of transcripts. What standard and method is refGene.txt.gz based on, and why do the RefSeq files from UCSC and NCBI differ so much? Is it appropriate to use refGene.txt.gz from your website as reference sequences?
We look forward to your answer. Thank you!
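A minimal sketch for making the comparison described above concrete, assuming the hg19 refGene table layout (bin, name, chrom, strand, ...), so that the accession is the second column, and assuming that transcript features in the NCBI GFF3 carry a `transcript_id=` key in column 9. All function names here are hypothetical. Version suffixes are stripped from the GFF3 IDs on the assumption that the hg19 refGene names have none. The sketch reports refGene rows and distinct accessions separately, because a single mRNA can be placed at more than one genomic location.

```python
import gzip
from collections import Counter


def refgene_names(lines):
    """Return (row_count, set_of_accessions) for UCSC refGene.txt lines.

    Assumes the hg19 refGene column layout: bin, name, chrom, strand, ...
    so the RefSeq accession sits in the second column.
    """
    rows = 0
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows += 1
        names.add(fields[1])
    return rows, names


def gff3_transcript_ids(lines):
    """Return the set of versionless transcript_id values in GFF3 lines.

    Assumption: transcript features (and their exons) carry a
    'transcript_id=' key in column 9. The version suffix (".2") is dropped
    so IDs can be compared with refGene names that lack versions.
    """
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        for attr in fields[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "transcript_id":
                ids.add(value.split(".")[0])
    return ids


def prefix_counts(accessions):
    """Count accessions by their three-character prefix (NM_, NR_, XM_, ...)."""
    return Counter(acc[:3] for acc in accessions)


def open_text(path):
    """Open a plain or gzip-compressed file for reading text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

In use, one would pass `open_text("refGene.txt.gz")` to `refgene_names` and `open_text("ref_GRCh37.p13_top_level.gff3.gz")` to `gff3_transcript_ids`, then compare the set sizes, the set differences, and `prefix_counts` of each. The prefix breakdown would show, for example, whether model accessions (XM_, XR_) appear in one file but not the other.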

Luvina Guruvadoo

Apr 24, 2015, 4:15:26 PM
to 雪儿, genome
Hello,

Thanks for your question. When placing RefSeq alignments on the browser, we apply several filtering criteria. These are outlined on the RefSeq Genes track description page: "RefSeq mRNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single mRNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept." You can read more about the RefSeq Genes track here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene. Also, a "reference" set of genes could refer to any set; you would have to decide on your own criteria for selecting a reference.
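The quoted filtering rules can be sketched as a small function applied to all placements of one mRNA. This is an illustration under stated assumptions, not UCSC's actual pipeline code. The `Alignment` record and `filter_placements` are hypothetical. "Alignment of less than 15%" is read as less than 15% of the mRNA covered, "within 0.1% of the best" as within 0.1 percentage points, and the best identity is taken after the coverage filter.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One genomic placement of a RefSeq mRNA (hypothetical record type)."""
    accession: str
    chrom: str
    start: int
    coverage: float  # % of the mRNA included in the alignment (assumed meaning)
    identity: float  # % base identity with the genomic sequence


def filter_placements(alignments, min_coverage=15.0, near_best=0.1,
                      min_identity=96.0):
    """Keep the placements of ONE mRNA that pass the quoted criteria.

    1. Drop alignments covering less than min_coverage percent.
    2. Find the best base identity among the remaining alignments.
    3. Keep alignments within near_best percentage points of that best
       AND with at least min_identity percent identity.
    """
    covered = [a for a in alignments if a.coverage >= min_coverage]
    if not covered:
        return []
    best = max(a.identity for a in covered)
    eps = 1e-9  # tolerance for floating-point rounding at the cutoff
    return [
        a for a in covered
        if a.identity >= best - near_best - eps and a.identity >= min_identity
    ]
```

In this sketch, more than one placement can survive when several alignments are nearly tied for best identity. That is one way a single accession can appear on several rows of refGene.txt.gz. An mRNA whose best alignment falls below 96% identity is dropped entirely, which is one way a RefSeq transcript could be absent from the table.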

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group

