Discrepancies - UCSC Genome Browser (and refGene) vs NCBI Gene (RefSeq)


Dag Are Hov

Jan 29, 2013, 7:02:37 AM
to gen...@soe.ucsc.edu
Hi

We are working on a project whose current focus is to annotate a list of human SNV mutations as accurately as possible to the correct positions in genes and, further, in proteins. The SNV input consists of genome coordinates (chromosome number; position; reference allele; alternate allele).

As we are evaluating the Annovar package, we want to verify the source of the genome coordinates it provides. We are currently testing the refGene table from UCSC and wanted to compare it to its stated source, RefSeq at NCBI. We see differences between UCSC refGene and NCBI RefSeq in where various genes are placed on the genome.

I will use CDK11A and CDK11B as examples of such differences. Using refGene in Annovar, we see that in many cases the same SNV position is annotated to both CDK11A and CDK11B. To investigate further, we checked a position in the UCSC Genome Browser. To reproduce this, go to http://genome.ucsc.edu/cgi-bin/hgGateway, enter the position chr1:1640000 and click submit. The display has two sections of annotated genes, "UCSC Genes" and "RefSeq Genes", and the RefSeq Genes section shows both CDK11A and CDK11B at this position.
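A minimal Python sketch of the kind of overlap check that leads to one position being annotated to two genes. It is not Annovar's implementation; `RefGeneRow` and `genes_at` are hypothetical names, and the rows use toy coordinates, not real CDK11A/CDK11B data. It assumes the UCSC table convention of a 0-based `txStart` and an exclusive `txEnd`, whereas a position typed into the browser, such as chr1:1640000, is 1-based.

```python
from dataclasses import dataclass

@dataclass
class RefGeneRow:
    """Hypothetical subset of a UCSC refGene row."""
    name: str      # transcript accession
    name2: str     # gene symbol
    chrom: str
    tx_start: int  # 0-based start, as stored in UCSC tables
    tx_end: int    # exclusive end (0-based, half-open interval)

def genes_at(rows, chrom, pos_1based):
    """Return sorted gene symbols whose transcript span covers a 1-based position."""
    hits = set()
    for r in rows:
        # 1-based position p is 0-based base p-1, which lies in
        # [tx_start, tx_end) exactly when tx_start < p <= tx_end.
        if r.chrom == chrom and r.tx_start < pos_1based <= r.tx_end:
            hits.add(r.name2)
    return sorted(hits)

# Toy rows (hypothetical values) with overlapping transcript spans.
rows = [
    RefGeneRow("NM_TOY1", "GENE_A", "chr1", 100, 200),
    RefGeneRow("NM_TOY2", "GENE_B", "chr1", 150, 300),
]
print(genes_at(rows, "chr1", 160))  # ['GENE_A', 'GENE_B']
```

If two rows for different genes have overlapping spans in one source but not in the other, the same SNV will be called in both genes only when annotating against the first source.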

On your Genome Browser web page, you state that you use the 2009 human reference sequence GRCh37, and you link to NCBI. Following the link to NCBI, one can read in the revision history that there are various assembly names for the human genome and that the current name is GRCh37.p10.

Since we can access RefSeq gene annotations through NCBI Gene, RefSeq's genomic positions for CDK11A and CDK11B can be found via the two links below, in the sections "Genomic context" and "Genomic regions, transcripts, and products". According to RefSeq, the two genes do not overlap on the genome, and the position mentioned earlier, chr1:1640000, overlaps only CDK11A.

CDK11A:
http://www.ncbi.nlm.nih.gov/gene/728642

CDK11B:
http://www.ncbi.nlm.nih.gov/gene/984
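To compare the two sources on this specific point (whether the two genes overlap), a check along these lines could be run against each source's gene spans. This is only a sketch: `overlaps` and `pair_overlap_by_source` are hypothetical helpers, the spans are toy values rather than the coordinates shown on the NCBI pages, and all spans are assumed to be 1-based closed intervals (UCSC table starts would need 1 added first).

```python
def overlaps(a, b):
    """True if two (chrom, start, end) spans, 1-based and closed, share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def pair_overlap_by_source(sources, gene1, gene2):
    """For each source, report whether gene1 and gene2 overlap there."""
    return {src: overlaps(spans[gene1], spans[gene2]) for src, spans in sources.items()}

# Toy spans (hypothetical): the sources disagree on where GENE_B lies.
sources = {
    "source_1": {"GENE_A": ("chr1", 101, 200), "GENE_B": ("chr1", 151, 300)},
    "source_2": {"GENE_A": ("chr1", 101, 200), "GENE_B": ("chr1", 401, 550)},
}
print(pair_overlap_by_source(sources, "GENE_A", "GENE_B"))
# {'source_1': True, 'source_2': False}
```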

The web pages at the two NCBI links above also have a section "NCBI Reference Sequences (RefSeq)", which states that the RefSeq annotation version used is "... Homo sapiens Annotation Release 104".

From NCBI's mapview (using Annotation release 104) you can also see the clear genomic separation between CDK11A and CDK11B:
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?TAXID=9606&CHR=1&MAPS=regions,ugHs,modelrna,ensgenes,rna,genes[1631466.25%3A1658493.75]-r&QSTR=728642[gene_id]&QUERY=uid%28-2121845465,-2121845464,-2146427376%29&zoom=0.1

Question:
Are the RefSeq Genes shown in the UCSC Genome Browser and stored in the refGene table based on the human genome version GRCh37.p10 and RefSeq Annotation Release 104?

If so, why do we see these differences?

Best regards,
Dag Are Hov

Brian Lee

Jan 30, 2013, 5:01:29 PM
to daga...@gmail.com
Dear Dag Are Hov,

Thank you for using the UCSC Genome Browser and for your question about
discrepancies between the UCSC Genome Browser (refGene) and NCBI Gene
(RefSeq), specifically whether the RefSeq Genes found using the UCSC Genome
Browser and the refGene table are based on the human genome version GRCh37.p10.

Your observations are correct: our hg19 assembly does not include any
of the assembly patches; it is just GRCh37, not GRCh37.p10. We do our
own alignments of RefSeq Genes using BLAT, and the track is
automatically updated with whatever is currently in RefSeq. As a
result, our RefSeq coordinates sometimes differ from NCBI's RefSeq
coordinates; the differences are mainly due to our use of BLAT.

You may be pleased to learn that, since NCBI does its own alignments
and these are sometimes not the same as ours, we are considering a new
feature that would display a warning when our alignments are in a
different place than NCBI's, and possibly include NCBI's alignments as
well.
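As a rough sketch of what such a warning could check, each accession's placement in the two sets of alignments can be compared directly. `placement_differences` is a hypothetical helper, not UCSC or NCBI code; it assumes one placement per accession (real alignments can place a transcript at several locations) and that both inputs already use the same coordinate convention. The accessions and coordinates are toy values.

```python
def placement_differences(ucsc, ncbi, tolerance=0):
    """List (accession, reason) pairs where two placement maps disagree.

    Both inputs map accession -> (chrom, start, end) in the same coordinate
    convention. Start/end differences up to `tolerance` bases are ignored.
    """
    flagged = []
    for acc in sorted(set(ucsc) | set(ncbi)):
        a, b = ucsc.get(acc), ncbi.get(acc)
        if a is None or b is None:
            flagged.append((acc, "missing in one source"))
        elif a[0] != b[0]:
            flagged.append((acc, "different chromosome"))
        elif abs(a[1] - b[1]) > tolerance or abs(a[2] - b[2]) > tolerance:
            flagged.append((acc, "different coordinates"))
    return flagged

# Toy placements (hypothetical accessions and coordinates).
ucsc = {"NM_TOY1": ("chr1", 101, 200), "NM_TOY2": ("chr1", 151, 300), "NM_TOY3": ("chr2", 10, 20)}
ncbi = {"NM_TOY1": ("chr1", 101, 200), "NM_TOY2": ("chr1", 401, 550), "NM_TOY4": ("chr3", 5, 9)}
for acc, reason in placement_differences(ucsc, ncbi):
    print(acc, reason)
```

A small nonzero `tolerance` would keep minor boundary differences between aligners from being reported alongside genuine relocations.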

Also, you may find it useful to know that our SNP tracks use dbSNP's
functional annotations for coloring and filtering. dbSNP uses NCBI's
RefSeq, including both the curated NM_* and the predicted XM_*
transcripts; UCSC doesn't include the predicted XM_* transcripts.

Thank you again for your inquiry and for using the UCSC Genome
Browser. If you have further questions, please feel free to contact the
mailing list again at gen...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group