
Dear Zhaokunli,
Thank you for using the UCSC Genome Browser and for your question about SHANK3.
One of our most frequently asked questions concerns how UCSC aligns RefSeq transcripts and why the resulting UCSC RefSeq coordinates can differ from the RefSeq coordinates found at NCBI (more below). SHANK3 has an additional problem: no aligner can produce the correct alignment and gene model, because part of the gene's sequence is missing from the unpatched human reference assembly.
Here is a link to a session that displays a track called "Differences between NCBI RefSeq Transcripts and the Reference Genome": http://genome.ucsc.edu/s/dschmelt/SHANK3Exon12
Note that the grey item in this RefSeq Differences track marks a double gap, where a run of transcript bases is skipped in the alignment; these are the missing 5' bases you mentioned.
If you click the SHANK3 item in the bottom track, "UCSC annotations of RefSeq RNAs", you will reach the details page for that item. There you will find a link titled "View details of parts of alignment within browser window." On the resulting page you will see a stretch of transcript bases that do not align:
CAGcccgagc gggcccggcg gccccggccc cgcgcccggc cccggCCCCG 1600
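If you want to measure that unaligned stretch rather than eyeball it, a line like the one above can be parsed. The Python sketch below assumes UCSC's alignment-display convention that aligned bases are shown in upper case and unaligned bases in lower case, and that the trailing number is the 1-based transcript position of the line's last base. `unaligned_runs` is a hypothetical helper written for illustration, not part of any UCSC tool, and since only the single quoted line is used, the unaligned region may extend beyond it.

```python
import re
from typing import List, Tuple


def unaligned_runs(line: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) transcript positions of lower-case runs.

    Assumption: aligned bases are upper case, unaligned bases lower case, and the
    trailing number is the 1-based position of the last base on the line.
    """
    *chunks, last = line.split()       # 10-base chunks, then the position counter
    seq = "".join(chunks)
    end_pos = int(last)
    first_pos = end_pos - len(seq) + 1  # position of the first base on this line
    runs = []
    for m in re.finditer(r"[acgtn]+", seq):
        # m.end() is exclusive, so subtract 1 to get an inclusive end position
        runs.append((first_pos + m.start(), first_pos + m.end() - 1))
    return runs


line = "CAGcccgagc gggcccggcg gccccggccc cgcgcccggc cccggCCCCG 1600"
for start, end in unaligned_runs(line):
    print(f"unaligned transcript bases {start}-{end} ({end - start + 1} nt)")
```

Under the stated assumptions, this reports transcript bases 1554-1595 (42 nt) as unaligned on that line.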
You may also find our FAQ entry helpful; it discusses this topic further and mentions SHANK3 specifically: http://genome.ucsc.edu/FAQ/FAQgenes.html#ncbiRefseq
Here are details from that section:
In some rare cases, the NCBI and UCSC exon boundaries differ. Activating both the RefSeq and UCSC RefSeq tracks helps you investigate the differences. Activating the RefSeq Alignments track shows NCBI's Splign alignments in more detail, including double lines where both transcript and genomic sequence are skipped in the alignment. When available, the RefSeq Diffs subtrack may be helpful too. The upcoming MANE gene set will contain a set of high-quality transcripts that are 100% alignable to the genome and are part of both RefSeq and Ensembl/GENCODE, but at the time of writing, this project is at an early stage.
Rare, anecdotal examples are SHANK2 and SHANK3 in hg19. It is impossible for either NCBI or BLAT to get the correct alignment and gene model because the genome sequence is missing for part of the gene. NCBI and BLAT find slightly different exon boundaries at the edge of the problematic region. NCBI's aligner tries very hard to find exons that align to any transcript sequence, so it calls a few small dubious "exons" in the affected genomic region. GENCODE V19 also used an aligner that tried very hard to find exons, but it found small dubious "exons" in different places than NCBI. The RefSeq Alignments subtrack makes the problematic region very clear, with double lines indicating unalignable transcript sequence.
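To make the "double lines" idea concrete, here is a small sketch assuming PSL-format alignments, the format UCSC uses for its RefSeq alignment tracks, where each alignment is stored as blocks described by `blockSizes`, `qStarts` and `tStarts`. The helper `classify_gaps` is hypothetical: it compares consecutive blocks, and a gap that skips both transcript (query) and genome (target) bases corresponds to the double-line case described above, while a genome-only gap is typically an intron. The block values are invented for illustration and are not SHANK3's real alignment.

```python
from typing import List, Tuple


def classify_gaps(block_sizes: List[int], q_starts: List[int],
                  t_starts: List[int]) -> List[Tuple[str, int, int]]:
    """Classify the gap between each pair of consecutive PSL alignment blocks.

    Returns (kind, query_gap, target_gap) for each gap, where the gaps are the
    numbers of bases skipped in the transcript (query) and the genome (target).
    """
    gaps = []
    for i in range(len(block_sizes) - 1):
        q_gap = q_starts[i + 1] - (q_starts[i] + block_sizes[i])
        t_gap = t_starts[i + 1] - (t_starts[i] + block_sizes[i])
        if q_gap > 0 and t_gap > 0:
            kind = "double gap"           # both transcript and genome bases skipped
        elif t_gap > 0:
            kind = "genome-only gap"      # typically an intron
        elif q_gap > 0:
            kind = "transcript-only gap"  # transcript bases with no genomic counterpart
        else:
            kind = "adjacent"             # blocks abut in both sequences
        gaps.append((kind, q_gap, t_gap))
    return gaps


# Hypothetical three-block alignment (0-based starts, as in PSL).
sizes = [100, 80, 120]
q_start = [0, 100, 222]
t_start = [1000, 5000, 5110]
for kind, q_gap, t_gap in classify_gaps(sizes, q_start, t_start):
    print(f"{kind}: skips {q_gap} transcript bases, {t_gap} genomic bases")
```

With these made-up blocks, the first gap is a genome-only gap (an intron-like skip of 3900 genomic bases) and the second is a double gap skipping 42 transcript bases and 30 genomic bases.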
Also note that the session above includes a yellow item named chr22_KQ759762v1_fix. Clicking it opens the details page for "Reference Assembly Fix Patch Sequence Alignments (chr22_KQ759762v1_fix)." On that page, the link "View corresponding position range on chr22_KQ759762v1_fix" takes you to the patched human assembly sequence for this region. At the patched location, SHANK3 is displayed differently because the alignment succeeds there. Using our multi-region view, which lets you slice the genome to view different sections side by side, this session shows the chr22_KQ759762v1_fix region (left) next to the original region (right): http://genome.ucsc.edu/s/brianlee/Patch
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/5ee8ea36.1c69fb81.e2386.af89SMTPIN_ADDED_BROKEN%40mx.google.com.