ION Torrent analysis

satis...@gmail.com

Sep 17, 2012, 1:13:08 PM

to qiime...@googlegroups.com

Hi All,

Can you guys shed some light on how to approach the analysis of Ion Torrent data?

I have tried the normal pipeline that we use for 454 analysis (since both platforms use the same SFF file format). Any help on how to approach this is highly appreciated.

Satish

Daniel McDonald

Sep 17, 2012, 1:15:21 PM

to qiime...@googlegroups.com

Hey Satish,

There was a previous post about Ion Torrent
(https://groups.google.com/forum/?fromgroups=#!topic/qiime-forum/QzFLMECIyQQ)

Let me know if this helps,
Daniel


satis...@gmail.com

Sep 17, 2012, 1:32:02 PM

to qiime...@googlegroups.com

Hi Daniel,

I looked at the post, but my questions are:

1) I have SFF files. Should I convert them to FASTA and qual files?

2) I also had FASTA files for the same sample, and I saw that a few sequences had the tag "tcga". Does QIIME have any scripts to preprocess the data?

3) Once I get the right FASTA and qual files, do you think a normal analysis would be fine?

Satish

Jose Carlos Clemente

Sep 17, 2012, 5:12:09 PM

to qiime...@googlegroups.com

Satish,

I have no experience with Ion Torrent, but I would suggest generating
the fasta and qual files using process_sff.py, and then trying the
default pipeline for the analysis. Check the log files carefully at
each step for signs that something is failing, e.g. many sequences
failing to pass split_libraries.py, not enough sequences hitting the
reference during OTU picking, way too many or too few OTUs, etc.

If others in the forum have previously analyzed ION Torrent data, we'd
like to hear your experience and what things might need to be
modified/added to the current pipeline.

Jose


Matt Wade

Sep 18, 2012, 3:59:43 AM

to qiime...@googlegroups.com

Hi,

I only have experience so far with quite a poor dataset (i.e. low read length and number of reads).

Below is a post I sent "offline" to Greg regarding my initial experiences (also check out this publication by Andrew Whiteley)

My samples are pretty similar to Tony's: 3 samples from an anaerobic digester, with a very low median read length (58) and a mean of ~100 bp. A lot of reads were unassigned with RDP, but the major archaea were identified (this was using universal bacterial primers; the archaeal primer runs failed completely). My colleague who performed the sequencing thinks she has identified the problem (the PCR step). The platform certainly seems a lot more sensitive than Ion Torrent would have you believe, but I reckon we can at least get to a decent level of sequencing performance once these issues have been resolved.
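
Read-length figures like the median and mean quoted above can be computed straight from a FASTA file before running anything else. The following is only a minimal sketch in plain Python; the function names are made up for illustration and are not part of QIIME. Since split_libraries.py filters on read length, a summary like this may help explain why many reads are dropped.

```python
from statistics import mean, median


def read_lengths(fasta_lines):
    """Return the length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Summarize a non-empty list of read lengths."""
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
    }
```

To use it on a real file, pass an open file handle, e.g. `summarize(read_lengths(open("seqs.fna")))`, where `seqs.fna` is a placeholder file name.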

Regarding the data analysis:

- The Ion Torrent Suite produces two output files, SFF and FASTQ
- The SFF is the raw data and has key, barcode, adapter and primer retained with each sequence
- The FASTQ has those elements removed

- I have used a Galaxy workflow to create fasta and qual files from the SFF. However, using split_libraries.py with these files results in no reads being assigned because the primer cannot be located (possibly an adapter issue)
- I used fasta_convert.pl to create fasta and qual files from the FASTQ file. However, this script requires some amendment to get the right quality encoding for Ion Torrent. Additionally, barcodes and primer are missing. I have my own script for reintroducing those, sample by sample.
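
The FASTQ to fasta/qual conversion described above can be sketched in a few lines of Python. This is not the amended fasta_convert.pl script, just an illustration of the idea. It assumes four-line FASTQ records and a Phred+33 quality offset; the offset is an assumption that should be checked against your own files, which is why it is a parameter. All function names here are hypothetical.

```python
def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Yield (read_id, sequence, scores) from four-line FASTQ records.

    `offset` is the ASCII offset of the quality encoding (assumed to be 33).
    """
    lines = [ln.rstrip("\n") for ln in fastq_lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %s" % header)
        scores = [ord(ch) - offset for ch in qual]
        if scores and min(scores) < 0:
            raise ValueError("negative score: wrong offset for %s?" % header)
        # Keep only the identifier, dropping any description after whitespace.
        yield header[1:].split()[0], seq, scores


def format_fasta(read_id, seq):
    """Format one record for the .fna file."""
    return ">%s\n%s\n" % (read_id, seq)


def format_qual(read_id, scores):
    """Format one record for the .qual file (space-separated integers)."""
    return ">%s\n%s\n" % (read_id, " ".join(str(s) for s in scores))
```

If the scores come out negative or implausibly high, the offset is probably wrong for that file, which is the kind of encoding problem mentioned above.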

Better option:

- In Ion Torrent administration, I removed the barcode identifier from the experiment and re-ran the analysis. This produced a FASTQ file with barcode-adapter-primer intact (key removed).
- I used the amended fasta_convert.pl script to create fasta and qual files.
- My mapping file contained barcode & adapter-primer sequence (and reverse primer).

This seemed to work ok.

It would be good to test this properly with a mock community with known identities and with a good Ion Torrent run.

Similarly, for denoising, there is a possibility that Chris Quince, ourselves and Liverpool University are going to work on adapting ampliconnoise for Ion Torrent. I believe this should involve determining the noise profile for the instrument (likelihoods etc). I understand that PCR error may not be an issue with Ion Torrent.

For denoiser, I have not looked into the algorithm or its mechanism, so I cannot comment on what would be required, but I guess Rob Knight and others could easily investigate given access to Ion Torrent data.

As mentioned in my post, Acacia has been released by University of Queensland (Hugenholtz's group). At present it doesn't seem to work with Ion Torrent data (I have just been playing, nothing serious), but Lauren Bragg says she is aiming to work on the Ion Torrent option.

Regards,

Matt

Jose Carlos Clemente

Sep 18, 2012, 10:10:39 AM

to qiime...@googlegroups.com

Thanks Matt, this is very useful.

Jose


satis...@gmail.com

Sep 21, 2012, 1:22:41 PM

to qiime...@googlegroups.com

Hi Matt,

Thanks for your detailed reply.

I saw that sff_extract.py (with the Ion Torrent issue fixed) does split the SFF file into FASTA and qual files.

However, that FASTA file has the layout tag - barcode - linker primer - sequence.

Do you have a script handy to strip out the tag (barcode identifier)?
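
For reference, stripping a fixed leading tag from each read is simple enough to sketch in Python. This is only an illustration with a made-up function name, not an existing QIIME script. It assumes the tag sits at the very start of the read and that the quality scores are held as a list of the same length as the sequence, so both get trimmed together.

```python
def strip_leading_tag(seq, quals, tag):
    """Remove `tag` from the 5' end of a read if present (case-insensitive).

    `quals` is the list of per-base quality scores for `seq`. Reads that do
    not start with the tag are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if tag and seq.upper().startswith(tag.upper()):
        n = len(tag)
        return seq[n:], quals[n:]
    return seq, quals
```

A call looks like `strip_leading_tag(seq, quals, "TCAG")`, where the tag value is only a placeholder; pass whatever your reads actually start with.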

Satish

satis...@gmail.com

Sep 24, 2012, 11:45:14 AM

to qiime...@googlegroups.com

Hi All,

My linker primer is given like this:

5' nnn ncc tac ggg agg cag cag 3'

where the primer sequence varies from one sequence to another. Any idea how to deal with this kind of linker primer?
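
One way to think about the leading "n" positions is as IUPAC wildcards that match any base. The Python sketch below turns a degenerate primer into a regular expression and looks for it in a read. It is only an illustration with hypothetical function names, not what split_libraries.py does internally, and it does exact matching with no mismatches allowed.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer (spaces allowed) into a regex."""
    cleaned = "".join(primer.split()).upper()
    return re.compile("".join(IUPAC[base] for base in cleaned))


def find_primer(read, primer):
    """Return (start, end) of the first primer match in `read`, or None."""
    match = primer_to_regex(primer).search(read.upper())
    return (match.start(), match.end()) if match else None
```

With the primer above, `find_primer(read, "nnn ncc tac ggg agg cag cag")` returns the span covering the four variable bases plus the fixed part, so everything up to `end` can be trimmed off the read.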

Satish
