Dear Sir/Madam,
We use the resources of the UCSC Genome Browser a lot (and are very happy with them). However, we occasionally find small mistakes, particularly in the refGene annotation. I was wondering where we can send such information, and whether it is possible to have it corrected.
Our latest finding concerns the annotation of the gene TNNI3 (NM_000363) according to RefSeq. The only transcript according to RefSeq has 7 exons and does not encode a valid protein.
However, with some puzzling we figured out that the more likely gene model has a shorter first exon and an additional small exon (8 exons in total), which seems to be supported by other data resources (we used Alamut).
To be precise, the first exon should start at chr19:55,668,947 rather than 55,668,935, and an additional exon should be introduced at position 55,668,664-55,668,676.
Please let me know whether you can use this information to make improvements.
Kind regards,
Christian Gilissen PhD
Department of Human Genetics (855)
Radboud University Medical Center
Geert Grooteplein 10
6525 GA Nijmegen, the Netherlands
Email. christian...@radboudumc.nl
Tel. +31 24 36 68940
The Radboud university medical center is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.
Hello, Christian.
Thank you for reporting this error. We perform our own alignments as described in the Methods section of the RefSeq Genes description page at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene. In rare cases, mistakes are made such as the one you just illustrated.
According to the GenBank record at http://www.ncbi.nlm.nih.gov/nuccore/NM_000363?report=GenBank, there are indeed 8 exons in TNNI3.
If you examine the UCSC Genes, RefSeq Genes and GENCODE tracks side-by-side, you will see that UCSC Genes and GENCODE both contain exon 2 which occurs at chr19:55,668,664-55,668,676. The problem in the RefSeq Genes track stems from an apparent issue with exon 1. If you view chr19:55,668,933-55,668,960, you will see that exon 1 properly ends at 55,668,947 in both UCSC Genes and GENCODE. In RefSeq Genes, a stop codon correctly appears, but the exon erroneously continues beyond it. This is why the size of exon 1 is improperly reported and why the missing exon 2 never appears in the RefSeq Genes track.
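For anyone checking these coordinates by hand, the following is a minimal Python sketch, not part of any UCSC tool, that parses a position string as typed into the browser and reports how many bases it spans. The function names `parse_position` and `span_length` are made up for illustration, and the sketch assumes that positions shown in the browser's position box are 1-based and inclusive at both ends.

```python
import re


def parse_position(pos):
    """Parse a string such as 'chr19:55,668,664-55,668,676'.

    Returns (chrom, start, end). The coordinates are assumed to be
    1-based and inclusive, as displayed in the browser position box.
    """
    m = re.fullmatch(r"\s*(\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*", pos)
    if m is None:
        raise ValueError("unrecognized position string: %r" % pos)
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start is greater than end in %r" % pos)
    return chrom, start, end


def span_length(pos):
    """Number of bases covered by a 1-based, inclusive position."""
    _, start, end = parse_position(pos)
    return end - start + 1


# Exon 2 as reported in this thread: 13 bases.
print(span_length("chr19:55,668,664-55,668,676"))
```

Under the same assumption, the two candidate exon 1 boundaries mentioned in this thread, 55,668,947 and 55,668,935, lie 12 bases apart.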
Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
--
Hi Steve,
Thanks very much for your rapid response.
Three further questions regarding this:
1. What is the procedure now? Will you fix this, and if so, within what time frame? (i.e. can we wait for you to fix this?)
2. What would you recommend for resolving this issue? Should we base our proteins on UCSC Genes instead?
3. We calculate this for all RefSeq genes, and we only looked into this one because we’re interested in this gene. Would it be useful if we sent you a complete list of all transcripts that do not encode a valid protein? (As you said, there are not so many.)
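To illustrate the kind of screen meant in question 3, here is a minimal Python sketch, which is not our actual pipeline. It only flags transcripts whose coding length is not a multiple of three, which is a necessary but not a sufficient condition for a valid protein (it does not look at start codons, stop codons or the sequence itself). It assumes tab-separated rows in the refGene table layout (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) with 0-based, half-open coordinates. The rows in `TOY_ROWS` and all function names are hypothetical.

```python
def parse_coords(field):
    """Turn a comma-separated field such as '100,300,' into [100, 300]."""
    return [int(x) for x in field.split(",") if x]


def cds_length(cds_start, cds_end, exon_starts, exon_ends):
    """Sum the exonic bases that fall inside [cds_start, cds_end)."""
    total = 0
    for start, end in zip(exon_starts, exon_ends):
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            total += overlap
    return total


def flag_out_of_frame(rows):
    """Return (name, cds_length) for coding rows whose CDS length % 3 != 0."""
    flagged = []
    for row in rows:
        f = row.rstrip("\n").split("\t")
        name = f[1]
        cds_start, cds_end = int(f[6]), int(f[7])
        if cds_start == cds_end:
            continue  # no coding region annotated, so skip the row
        length = cds_length(cds_start, cds_end,
                            parse_coords(f[9]), parse_coords(f[10]))
        if length % 3 != 0:
            flagged.append((name, length))
    return flagged


# Hypothetical toy rows, not real annotations.
TOY_ROWS = [
    "0\tTOY_OK\tchr1\t+\t100\t400\t110\t351\t2\t100,300,\t200,400,",
    "0\tTOY_BAD\tchr1\t+\t100\t400\t110\t350\t2\t100,300,\t200,400,",
    "0\tTOY_NC\tchr1\t+\t100\t400\t400\t400\t2\t100,300,\t200,400,",
]

print(flag_out_of_frame(TOY_ROWS))
```

On the toy rows only `TOY_BAD` is reported: its coding length is 140 bases, while `TOY_OK` has 141 bases and `TOY_NC` has no coding region.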
Kind regards,
Christian
--