Hello Emilie,
Thank you for your question about differences in exon counts between
RefSeq and the UCSC Genome Browser.
These differences seem to stem from two different causes.
The first is based on how we create the "RefSeq Genes" track in the
UCSC Genome Browser and the second is due to assembly differences.
Often these differences in exon counts are due to how the RefSeq
Genes track in the UCSC Genome Browser is created. In short, the
track is created by mapping the RefSeq mRNAs to the genome using
BLAT. You can read more about how the track is produced on the track
description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene. This
mapping step may produce differences between the annotations in
RefSeq and what we display in the UCSC Genome Browser.
For two of the annotations you've highlighted, NM_014249 and
NM_018474, these differences are likely due to looking at these
annotations on hg19. If you look at these annotations on hg38, you
will see that these two match the exon count from RefSeq.
The reason these annotations differ is related to the first cause I
described. To create the RefSeq Genes track, we take the mRNAs and
remap them to the most recent version of the human genome as well as
all previous versions, while RefSeq creates these mRNAs based on the
most recent assembly, hg38. Assemblies change over time, so mRNAs
created for the most recent assembly may not map as well to previous
versions. In the links to the two transcripts above, I've also
included the "Hg19 Diff" track that highlights those regions of the
hg38 assembly that are different from the previous hg19 assembly.
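If you would like to check exon counts across assemblies yourself, one
option is to download the refGene table for each assembly (for example
from the Table Browser) and compare the rows directly. Below is a
minimal Python sketch of that idea. It assumes tab-separated rows in
the refGene table layout (genePred fields preceded by a bin column).
The function names are hypothetical, and the two sample rows use a
placeholder accession, NM_EXAMPLE, with made-up coordinates rather
than real annotation data.

```python
# Column order of the refGene table (genePred with a leading "bin" column).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_refgene_line(line):
    """Return (accession, [(exonStart, exonEnd), ...]) for one table row."""
    fields = line.rstrip("\n").split("\t")
    row = dict(zip(REFGENE_COLUMNS, fields))
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(s) for s in row["exonStarts"].split(",") if s]
    ends = [int(e) for e in row["exonEnds"].split(",") if e]
    return row["name"], list(zip(starts, ends))


def exon_counts(lines):
    """Map each accession to a sorted list of exon counts, one per alignment."""
    counts = {}
    for line in lines:
        if line.strip():
            name, exons = parse_refgene_line(line)
            counts.setdefault(name, []).append(len(exons))
    return {name: sorted(values) for name, values in counts.items()}


def compare_exon_counts(old_lines, new_lines):
    """Return accessions whose exon counts differ between two assemblies."""
    old, new = exon_counts(old_lines), exon_counts(new_lines)
    shared = sorted(old.keys() & new.keys())
    return {name: (old[name], new[name]) for name in shared if old[name] != new[name]}


# Made-up placeholder rows (NOT real annotations), one per assembly.
hg19_rows = [
    "0\tNM_EXAMPLE\tchr1\t+\t100\t900\t150\t850\t3\t"
    "100,400,700,\t200,500,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,2,",
]
hg38_rows = [
    "0\tNM_EXAMPLE\tchr1\t+\t100\t900\t150\t850\t2\t"
    "100,400,\t200,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,",
]

print(compare_exon_counts(hg19_rows, hg38_rows))
```

The counts are kept as lists because a single mRNA can align to more
than one place in an assembly, in which case the table holds several
rows for the same accession.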
We are currently working with NCBI to create a track that includes
annotations directly from their database rather than relying on our
current method of re-mapping their mRNAs. Unfortunately, I can't
give you a time estimate of when this track may be available.
I hope this is helpful. If you have any further questions, please
reply to gen...@soe.ucsc.edu. All messages sent to that address are
archived on a publicly accessible Google Groups forum. If your
question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group