Working with Ion Torrent data - issues with QIIME

Matt Wade

Jun 7, 2012, 5:28:04 AM

to Qiime Forum

Hi,

I now have some Ion Torrent data and am trying to run it through
QIIME. I am not denoising it as I believe current denoising algorithms
will need to be modified to handle the error profile of the PGM
compared to 454. However, Acacia from University of Queensland looks
like it should be able to handle Ion Torrent data in the "near" future
- http://www.nature.com/nmeth/journal/v9/n5/full/nmeth.1990.html.

I used a Galaxy workflow to create the fasta and qual files from the
original SFF files (if you use the Perl script fasta_convert.pl instead,
the code will need to be modified for the Ion Torrent encoding).

So now I have a single fasta file and a single qual file containing 3
barcoded samples.

So, first I want to use split_libraries.py to tag the sequences for
downstream analysis. I have a mapping file with the IonSet1 barcodes,
the Primer and description. I also have a reverse primer which is an
equimolar pool of four primers (how would I enter that into the
mapping file?):

1046r CGACAGCCATGCANCACCT
1046r-PP CGACAACCATGCANCACCT
1046r-AQ1 CGACGGCCATGCANCACCT
1046r-AQ2 CGACGACCATGCANCACCT

IonSet1 also comes with an adapter sequence.

So I have tried:

split_libraries.py -m map.txt -f data.fasta -q data.qual -o split_out -b 11 -l 50

(My read lengths are around 100 bp from a 314 chip.)

However, I did not get any assigned reads, with the majority of reads
being captured by [Num mismatches in primer exceeds limit of 0:]

I have used grep on the fasta file to confirm that the forward primer
is present (I am leaving out the reverse primer for the moment). I also
noted that the barcodes were present. There is also the four-base tcag
key at the beginning of each sequence, which I believe should be removed.

In between the barcode and the primer is the adapter. Adding the
adapter to the start of the primer in the mapping file does not
resolve the issue.

However, I have noted that the adapter is not universally consistent
and that many adapter sequences have insertions or mismatches of
bases. I am checking with Ion Torrent about this.

It would be good to get a handle on where I might be going wrong:
should I remove the tcag bases from the start of the sequences first,
does the adapter need to be included with the primer, and how should I
handle the pooled reverse primers?

Regards,
Matt

Tony Walters

Jun 7, 2012, 1:48:12 PM

to qiime...@googlegroups.com

Hello Matt,

It looks like there are a number of issues to address.

One problem is the indels: split_libraries.py doesn't handle those in the barcode or forward primer, as it expects the first X bases (where X is the length of the barcode) to be followed by Y bases (where Y is the length of the linker-primer sequence).

Normally, in 454 processing, sfftools (or similar software) removes the initial sequencing adapter that ends in the tcag sequencing key. Hopefully Ion Torrent has software that can remove that initial adapter sequence; otherwise you will need a custom parser to find the first instance of "tcag" in each sequence and strip everything up to and including the key.
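A minimal sketch of such a custom parser, in Python. The function names are hypothetical (they are not part of QIIME), and the sketch assumes each FASTA record holds its sequence on a single line:

```python
def strip_through_key(seq, key="TCAG"):
    """Return seq with everything up to and including the first key removed.

    The search is case-insensitive. If the key is absent, the sequence is
    returned unchanged.
    """
    pos = seq.upper().find(key.upper())
    if pos == -1:
        return seq
    return seq[pos + len(key):]


def strip_fasta(lines, key="TCAG"):
    """Yield FASTA lines with the key stripped from each sequence line.

    Assumption: one sequence line per record (no wrapped sequences).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") or not line:
            yield line  # header or blank line: pass through untouched
        else:
            yield strip_through_key(line, key)
```

Two caveats: the matching qual file would need the same number of leading scores removed from each record, and a read that has already lost its key could be cut at a later, coincidental "tcag" inside the barcode or amplicon.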

The final sequences should look like this (may not read all the way through the reverseprimer or adapter in all cases):

barcode-linkerprimer-targeted amplicon read-reverseprimer-adapter

Once this is resolved, then the reverse primers could be addressed.

Because the reads are short, you could generally just skip the reverse primer removal, but I'm guessing you're using the V6 primer set, which has a small amplicon. I would just use this sequence for the ReversePrimer column: CGACRRCCATGCANCACCT

which will cover the primer possibilities. You may have to increase the reverse primer mismatches from the default 0 with the --reverse_primer_mismatches parameter.
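To see where the RR comes from, the four pooled primers can be collapsed column by column into IUPAC ambiguity codes. This is only an illustrative sketch; the `consensus` helper is hypothetical and not a QIIME function:

```python
# Map each set of bases to its IUPAC ambiguity code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Reverse lookup: code -> set of bases it stands for.
EXPAND = {code: bases for bases, code in IUPAC.items()}


def consensus(primers):
    """Collapse equal-length primers into one degenerate IUPAC sequence."""
    if len({len(p) for p in primers}) != 1:
        raise ValueError("primers must all have the same length")
    out = []
    for column in zip(*primers):
        bases = frozenset()
        for symbol in column:
            bases |= EXPAND[symbol.upper()]  # input may itself be degenerate
        out.append(IUPAC[bases])
    return "".join(out)


pool = [
    "CGACAGCCATGCANCACCT",  # 1046r
    "CGACAACCATGCANCACCT",  # 1046r-PP
    "CGACGGCCATGCANCACCT",  # 1046r-AQ1
    "CGACGACCATGCANCACCT",  # 1046r-AQ2
]
print(consensus(pool))
```

The pool differs only at positions 5 and 6, where each primer has A or G, so both collapse to R. Since RR expands to exactly AA, AG, GA and GG, the degenerate sequence matches the four primers and nothing else.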

I hope this helps,

Tony

Matt Wade

Jun 7, 2012, 2:09:18 PM

to Qiime Forum

Thanks Tony,

I have resolved the issue by using the Ion Torrent re-analysis option
and removing the barcode set name from the experimental set-up
section. This has the effect of retaining the barcode in the FastQ
file (but removes the key tcag).

I then processed the FastQ file using the Perl script
(fasta_convert.pl with a modified encoding) and got the fasta and qual
files.

I used the ReversePrimer with the RR inserted as you suggested, BUT I
also included the adapter at the start of the Primer, as the adapters
were not removed by Ion Torrent.

Doing this (before your reply) seemed to work (the data is poor, so I
am only doing this as an initial "test").

The alternative would be to add the previously removed
barcode-linker-primer to each corresponding fasta sequence (I have my
own script for this, although it is cumbersome to do when it could be
avoided).

Thanks again,
Matt

Tony Walters

Jun 7, 2012, 2:15:04 PM

to qiime...@googlegroups.com

Hello Matt,

You shouldn't need to include the adapter following the reverse primer. Reverse primer removal works a bit differently from forward primer removal: it does a local alignment for the primer, and if it finds a match within the allowed number of mismatches, it removes the primer and any sequence following it, so the adapter should be removed automatically in that case.
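The truncation behaviour can be illustrated with a rough Python sketch. This is not QIIME's code: it swaps the local alignment for a simpler sliding-window comparison (so indels are not handled), the function names are hypothetical, and it assumes the primer is supplied in the orientation in which it appears in the read:

```python
# Bases accepted by each IUPAC code in the primer.
IUPAC_MATCH = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, window):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC_MATCH[code]
               for code, base in zip(primer, window))


def truncate_at_primer(read, primer, max_mismatches=0):
    """Cut the read at the first window matching the primer.

    Everything from the start of the match onward (primer plus any
    trailing adapter) is dropped. If no window is within max_mismatches,
    the read is returned unchanged.
    """
    read_upper = read.upper()
    primer = primer.upper()
    for start in range(len(read_upper) - len(primer) + 1):
        window = read_upper[start:start + len(primer)]
        if count_mismatches(primer, window) <= max_mismatches:
            return read[:start]
    return read
```

For example, `truncate_at_primer("ACGTACGTAC" + "CGACAGCCATGCATCACCT" + "GGGTTT", "CGACRRCCATGCANCACCT")` keeps only the leading `ACGTACGTAC`, which is why nothing downstream of the primer has to be listed in the mapping file.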

-Tony
