Request to check refGene data.


Park JinSil

Jun 3, 2020, 11:59:04 AM
to gen...@soe.ucsc.edu

Hello, I'm JinSil Park from Macrogen in Korea.

 

While looking at the rat refGene file, I found an error in refGene.txt.gz (01-Mar-2020 version).

In refGene.txt.gz, Vom2r4 is placed on chr10, but in the Genome Browser it appears on both chr1 and chr10.

chr1 seems to be correct, but could you please check which one is right as soon as possible?

 

Best regards.

from JinSil Park

 

 

 

 

Daniel Schmelter

Jun 10, 2020, 2:59:41 PM
to Park JinSil, gen...@soe.ucsc.edu

Hello JinSil Park,

Thank you for writing to the UCSC Genome Browser with your question about gene duplication and transcript finding. We apologize for the delayed response.

The download files and Genome Browser image appear consistent, showing the data from refGene.txt.gz and ncbiRefSeq.txt.gz as annotated. NCBI's RefSeq track annotates the Vom2r4 gene on chr10 and UCSC's RefGene annotates that same transcript on chr1. Your image highlights annotations on the two different tracks.

It may help to explain how these two tracks differ. The NCBI RefSeq track uses the coordinates exactly as provided by NCBI; NCBI supplying coordinates is a somewhat recent addition. In the past, NCBI only gave FASTA sequences and relied on bioinformatics groups to align them to the genome, which is how RefGene was created. Naturally, alignment pipelines vary, especially in regions like this one, which contain segmental duplications (as can be seen via BLAT). This leads to some ambiguity inherent in gene annotations: NCBI RefSeq, Ensembl, and UCSC may each place the gene in a different "right" location.

In these situations, we generally recommend that folks use the most up-to-date version of the RefSeq data and, if necessary, double-check against other gene sets, understanding that a reliable annotation might not yet exist. See NCBI's comment on that transcript:

COMMENT     INFERRED REFSEQ: This record is predicted by genome sequence
            analysis and is not yet supported by experimental evidence. The
            reference sequence was derived from AABR03001580.1.
            On or before Jul 26, 2007 this sequence version replaced
            XM_218106.3, XM_001067385.1.
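
For anyone who wants to double-check placements between the two download files, here is a minimal Python sketch. It assumes both refGene.txt.gz and ncbiRefSeq.txt.gz are tab-separated genePred-extended tables with a leading bin column, so that the transcript accession is column 2, the chromosome is column 3, and the gene symbol (name2) is column 13. The function names are hypothetical helpers written for this illustration, not part of any UCSC tool.

```python
import gzip
from collections import defaultdict


def chroms_by_gene(lines):
    """Map gene symbol -> {chromosome: set of transcript accessions}."""
    placements = defaultdict(lambda: defaultdict(set))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        # Assumed layout: bin, name, chrom, ..., name2 (13th column)
        accession, chrom, gene = fields[1], fields[2], fields[12]
        placements[gene][chrom].add(accession)
    return placements


def load_table(path):
    """Read a gzipped table such as refGene.txt.gz from a local path."""
    with gzip.open(path, "rt") as handle:
        return chroms_by_gene(handle)


def compare_placements(gene, table_a, table_b):
    """Return the sorted chromosomes on which each table places the gene."""
    return sorted(table_a.get(gene, {})), sorted(table_b.get(gene, {}))
```

With both files downloaded to the working directory (an assumption), calling compare_placements("Vom2r4", load_table("refGene.txt.gz"), load_table("ncbiRefSeq.txt.gz")) would list the chromosomes each file reports for the gene, making any disagreement between the two tracks easy to spot.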

I hope this was helpful. If you have any more questions, please reply-all to gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.
All the best,

Daniel Schmelter
UCSC Genome Browser




Daniel Schmelter

Jun 11, 2020, 4:51:46 PM
to Park JinSil, gen...@soe.ucsc.edu
Hello JinSil Park,

I just wanted to share an additional resource with you that provides more information about the NCBI RefSeq vs UCSC RefSeq (RefGene) datasets:
http://genome.ucsc.edu/FAQ/FAQgenes.html#ncbiRefseq
 
All the best,
Daniel Schmelter
UCSC Genome Browser 