Hi,
I now have some Ion Torrent data and am trying to run it through
QIIME. I am not denoising it as I believe current denoising algorithms
will need to be modified to handle the error profile of the PGM
compared to 454. However, Acacia from the University of Queensland looks
like it should be able to handle Ion Torrent data in the "near" future:
http://www.nature.com/nmeth/journal/v9/n5/full/nmeth.1990.html
I used a Galaxy workflow to create FASTA and qual files from the
original SFF files (if you use the Perl script fasta_convert.pl, the
code will need to be modified for the Ion Torrent encoding).
So now I have a single FASTA file and a single qual file containing 3
barcoded samples.
First I want to use split_libraries.py to tag the sequences for
downstream analysis. I have a mapping file with the IonSet1 barcodes,
the primer and a description. I also have a reverse primer, which is an
equimolar pool of four primers (how would I enter that into the
mapping file?):
1046r CGACAGCCATGCANCACCT
1046r-PP CGACAACCATGCANCACCT
1046r-AQ1 CGACGGCCATGCANCACCT
1046r-AQ2 CGACGACCATGCANCACCT
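
One possible approach to the pooled reverse primer: the four 1046r
variants differ only at positions 5 and 6, so they can be collapsed into
a single IUPAC-degenerate sequence. The Python sketch below does that.
The function names are made up for illustration, and whether the result
can go straight into a ReversePrimer column of the mapping file is an
assumption that should be checked against the QIIME documentation.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus_primer(primers):
    """Collapse equal-length primers into one IUPAC-degenerate primer."""
    primers = [p.upper() for p in primers]
    if len({len(p) for p in primers}) != 1:
        raise ValueError("primers must all have the same length")
    out = []
    for column in zip(*primers):
        # Union of every base allowed at this position by any primer.
        bases = frozenset("".join(IUPAC[c] for c in column))
        out.append(BASES_TO_CODE[bases])
    return "".join(out)


def covers_exactly(primers):
    """True if the pool is the full combination of its per-position variants,
    i.e. the consensus stands for no sequence outside the pool.
    Assumes positions that vary contain plain A/C/G/T only."""
    unique = {p.upper() for p in primers}
    combos = 1
    for column in zip(*unique):
        combos *= len(set(column))
    return combos == len(unique)


# The four reverse primers from the pool above.
POOL = [
    "CGACAGCCATGCANCACCT",  # 1046r
    "CGACAACCATGCANCACCT",  # 1046r-PP
    "CGACGGCCATGCANCACCT",  # 1046r-AQ1
    "CGACGACCATGCANCACCT",  # 1046r-AQ2
]

print(consensus_primer(POOL))
print(covers_exactly(POOL))
```

For the four primers listed this gives CGACRRCCATGCANCACCT (R = A or G),
and the check confirms that the degenerate form stands for exactly those
four sequences and no others.
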
IonSet1 also comes with an adapter sequence.
So I have tried:

    split_libraries.py -m map.txt -f data.fasta -q data.qual -o split_out -b 11 -l 50

(My read lengths are around 100 bp from a 314 chip.)
However, I did not get any assigned reads, with the majority of reads
falling under [Num mismatches in primer exceeds limit of 0:].
I have used grep on the FASTA file to confirm that the forward primer
is present (I have excluded the reverse primer for the moment). I also
noted that the barcodes are present. There is also the four-base tcag
tag at the beginning of each sequence, which I believe should be removed.
In between the barcode and the primer is the adapter. Adding the
adapter to the start of the primer in the mapping file does not
resolve the issue.
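
If the tcag tag does turn out to need removing before
split_libraries.py, a minimal Python sketch for doing it one read at a
time is below. It assumes the tag is always the first four bases, that
the qual file holds one score per base in the same order, and that
reading and writing the FASTA/qual files is handled elsewhere.
strip_key is a hypothetical helper, not part of QIIME.

```python
def strip_key(seq, quals, key="TCAG"):
    """Remove a leading key from a read and drop the matching quality scores.

    seq   -- the read as a string (the key may be lower or upper case)
    quals -- list of per-base quality scores, same length as seq
    Reads that do not start with the key are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if seq[:len(key)].upper() == key.upper():
        return seq[len(key):], quals[len(key):]
    return seq, quals


# Invented example read: tcag key followed by four bases.
seq, quals = strip_key("tcagACGT", [30, 30, 30, 30, 20, 21, 22, 23])
print(seq, quals)
```

Trimming the sequence and the scores together keeps the FASTA and qual
records the same length, which downstream quality filtering relies on.
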
However, I have noted that the adapter is not universally consistent
and that many adapter sequences have insertions or mismatches of
bases. I am checking with Ion Torrent about this.
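
Because the adapter varies in length and content, matching it exactly
looks fragile. One alternative, sketched here under assumptions, is to
skip the tcag tag and barcode by length and then search for the forward
primer with a small mismatch allowance, treating whatever lies in
between as adapter. The sequences in the example are invented
placeholders, not the real IonSet1 barcode, adapter or primer, and the
helper names are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_primer(read, primer, start=0, max_mismatches=2):
    """Return the first offset >= start where primer matches the read with
    at most max_mismatches substitutions, or -1 if there is none.
    Indels inside the primer and IUPAC codes are not handled here."""
    read, primer = read.upper(), primer.upper()
    for i in range(start, len(read) - len(primer) + 1):
        if hamming(read[i:i + len(primer)], primer) <= max_mismatches:
            return i
    return -1


def dissect(read, primer, key_len=4, barcode_len=11, max_mismatches=2):
    """Split a read into key, barcode, adapter and insert.

    The adapter is whatever sits between the barcode and the primer,
    so its length may differ from read to read."""
    body = key_len + barcode_len
    pos = find_primer(read, primer, start=body, max_mismatches=max_mismatches)
    if pos < 0:
        return None
    return {
        "key": read[:key_len],
        "barcode": read[key_len:body],
        "adapter": read[body:pos],
        "insert": read[pos + len(primer):],
    }


# Placeholder sequences for illustration only.
BARCODE = "AAACCCGGGTT"
PRIMER = "CCTACGGGAG"
clean = "tcag" + BARCODE + "GAT" + PRIMER + "TTTT"
with_insertion = "tcag" + BARCODE + "GAAT" + PRIMER + "TTTT"

print(dissect(clean, PRIMER))
print(dissect(with_insertion, PRIMER))
```

Tallying the "adapter" field over all reads would show how often the
adapter deviates from the expected sequence, and the "insert" field is
what would remain after trimming everything up to and including the
primer.
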
It would be good to get a handle on where I might be going wrong:
should I remove the tcag bases from the start of the sequences first,
does the adapter need to be included with the primer, and how should I
handle the pooled reverse primers?
Regards,
Matt