IUPAC nucleotide codes
Hello,
Does anyone know how the web version and the standalone version of BLAT handle different IUPAC nucleotide codes? For example, this query ‘AATAAAGTCTAAKTTAAAATCTGGAGCTGCCTTGGAGGAGAAAAGT’ contains a ‘K’ for G or T. When aligned against hg19 with the web version of BLAT, the highest-scoring alignment is against chromosome 3, and with the standalone version the highest-scoring alignment is against chromosome 9. When looking at the details from the web version, it appears the ‘K’ is dropped:
cDNA YourSeq
AATAAAGTCt AATTAAAATC TGGAGCTGCC TTGGAGGAGA AAAGT
tcctgacttc aggtgatccg cccgcctcag gctcccaaag tgctgggatt 149377395
acaggcatga gccaccgcgc ccagcctgcc ttaatatttt tacagggtaa 149377445
AATAAAGTCg AAgTTAAAAT CTGGAGCTGC CTTGGAGGAG AAAAGTttaa 149377495
ggaaaagaca aggccactca tagttttgcc tcggaaaagg tagaattttg 149377545
gggccactcc ctgaatggct gcatccatat ccaaaacaga accacc
I am trying to figure it out myself, but if anyone knows the answer I would appreciate any help!
Jonah
Hello Jonah,
Thank you for your question about using IUPAC nucleotide codes with the BLAT tools. I have passed along your question about the different treatments of "K", but it sounds like BLAT, along with the rest of the kent tools, does not support IUPAC codes. One of our engineers warns that using too many IUPAC codes may cause the web version of BLAT to treat your query as a protein sequence unless you change the query type from "BLAT's guess" to "DNA".
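Since BLAT does not appear to support IUPAC codes, one possible workaround is to expand the ambiguous positions yourself and submit each unambiguous variant as a separate query. Here is a minimal Python sketch of that idea. It is not part of BLAT or the kent tools; the helper names `ambiguous_positions` and `expand_iupac` and the `max_variants` cap are made up for illustration. The lookup table is the standard IUPAC nucleotide ambiguity table.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (0-based index, code) for every character that is not A, C, G or T."""
    return [(i, c) for i, c in enumerate(seq.upper()) if c not in "ACGT"]


def expand_iupac(seq, max_variants=64):
    """Return every unambiguous sequence the query could represent.

    max_variants is a hypothetical safety cap: the number of variants
    multiplies with each ambiguous position.
    """
    choices = [IUPAC[c] for c in seq.upper()]  # KeyError on non-IUPAC characters
    total = 1
    for bases in choices:
        total *= len(bases)
    if total > max_variants:
        raise ValueError(f"{total} variants exceeds the cap of {max_variants}")
    return ["".join(p) for p in product(*choices)]


query = "AATAAAGTCTAAKTTAAAATCTGGAGCTGCCTTGGAGGAGAAAAGT"
print(ambiguous_positions(query))

# Write each variant as its own FASTA record.
for n, variant in enumerate(expand_iupac(query), 1):
    print(f">query_variant{n}\n{variant}")
```

For your sequence this reports the single "K" at 0-based position 12 and prints two FASTA records, one with G and one with T at that position. Each record can then be searched on its own, with the query type set to "DNA" in the web version.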
My standalone BLAT searches for your sequence in the hg19 genome assembly found the same region of chromosome 3 as the web version - I wasn't able to find any hits on chromosome 9. Perhaps this is due to some additional parameters you have passed to the standalone version? More information on how to tune the command-line version of BLAT to produce similar results to the web version is available at http://genome.ucsc.edu/FAQ/FAQblat.html#blat5.
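To make a standalone run easier to compare with the web version, it may help to build the command in a small script so that the parameters are explicit. The sketch below only assembles and prints the command line. The option values are the ones I understand the FAQ linked above to suggest for approximating the web settings, so please check them against that page; the function name `blat_command` and the file names `hg19.2bit`, `query.fa` and `output.psl` are placeholders for illustration.

```python
import shlex


def blat_command(database, query, output):
    """Assemble a standalone BLAT command line as an argument list.

    The option values are assumptions taken from the BLAT FAQ section on
    replicating web-based results; verify them there before relying on them.
    """
    return [
        "blat",
        "-stepSize=5",
        "-repMatch=2253",
        "-minScore=20",
        "-minIdentity=0",
        database,
        query,
        output,
    ]


# Placeholder file names; substitute your own paths.
cmd = blat_command("hg19.2bit", "query.fa", "output.psl")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` if `blat` is on your PATH.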
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group