Hello Reda,
Thank you for your interest in the Genome Browser.
After you search for an accession, such as NP_000652, you will get results on two tracks:
RefSeq Genes is our in-house track (called UCSC RefSeq), which we build by mapping RefSeq mRNA sequences to the genome with BLAT. The second track, NCBI RefSeq genes..., contains NCBI's own alignments imported from their site. This answer covers only the in-house track, although in this case the coordinates are nearly identical between the two. For more information, see our genes FAQ (http://genome.ucsc.edu/FAQ/FAQgenes.html#ncbiRefseq).
You are correct that BLATing the protein sequence for NP_000652 yields better hits on chrX and chr15. However, what we align are the mRNA sequences, not the protein sequences. If you BLAT the sequence for NM_000661.5, you will see that the top hit is the annotated chr4 location.
As you have said, the top hit for the protein sequence nearly always matches the actual annotation. In this rare case, the chrX and chr15 matches are annotated by GENCODE as pseudogenes: http://genome.ucsc.edu/s/Lou/ML25597. Some additional filtering steps go into choosing the alignment for the track (and ties are extremely unlikely); however, the top-scoring match when BLATing the mRNA sequence should agree with the annotation in almost all cases.
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.
Lou Nassar
UCSC Genomics Institute
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/56C1B329-A8B1-4F51-B330-1C01DBBDEB13%40crick.ac.uk.
Hello Reda,
Unfortunately, we do not map proteins to the genome because the hits are not unique. However, you could use the public MySQL server to get the locus for a protein ID.
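The query that accompanied this reply is not preserved here. As a rough sketch of the kind of lookup described, the Python below builds a `mysql` command line against the public server (`genome-mysql.soe.ucsc.edu`, user `genome`). The table and column names (`refGene`, `hgFixed.refLink`, `mrnaAcc`, `protAcc`) and the idea that accessions are stored without a version suffix are assumptions based on the UCSC schema, so please check them in the Table Browser before relying on the result. The helper names `build_query` and `build_command` are hypothetical.

```python
import re
import shlex

# Public UCSC MySQL server settings (no password required).
HOST = "genome-mysql.soe.ucsc.edu"
USER = "genome"


def build_query(protein_acc):
    """Return SQL that looks up the UCSC RefSeq locus for a protein accession.

    Assumption: hgFixed.refLink links mRNA and protein accessions, and both
    are stored without the ".N" version suffix.
    """
    base = protein_acc.strip().split(".")[0]  # drop any version, e.g. ".2"
    # Validate before interpolating the value into the SQL text.
    if not re.fullmatch(r"[NXY]P_\d+", base):
        raise ValueError(f"not a RefSeq protein accession: {protein_acc!r}")
    return (
        "SELECT g.chrom, g.txStart, g.txEnd, g.strand, g.name, l.protAcc "
        "FROM refGene g JOIN hgFixed.refLink l ON g.name = l.mrnaAcc "
        f"WHERE l.protAcc = '{base}'"
    )


def build_command(protein_acc, db="hg38"):
    """Return the argument list for the mysql command-line client."""
    return [
        "mysql",
        f"--user={USER}",
        f"--host={HOST}",
        "-A",  # skip auto-rehash for faster startup
        "-N",  # omit column headers from the output
        "-e", build_query(protein_acc),
        db,
    ]


# Print a copy-pasteable shell command; nothing is sent to the server here.
print(shlex.join(build_command("NP_000652")))
```

To actually run the lookup, paste the printed command into a shell or pass the list to `subprocess.run(cmd, capture_output=True, text=True)`; either way this needs the `mysql` client installed and network access to the public server.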
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser