Tony,
Thanks for the thoughtful reply, and I am sorry it has taken me so long to respond. I am still very interested in using QIIME with my PGM data, so to continue this thread:
OK, let's drop the .SFF file approach and focus on joining the QIIME analysis process midstream. The Ion Torrent PGM provides a FASTQ file, so the answer to your question about quality information is yes, we have it. The PGM automatically trims the sequencing adaptor from the start of the reads in the FASTQ file. Thus, the sequences in the FASTQ file have the following "structure":
barcode----16S-specific-forward-primer----16S-sequence-of-interest----16S-specific-reverse-primer----sequencing-adaptor-P1
I have other software that I can use to do any or all of the following:
1) Take the FASTQ file containing the barcoded reads and convert it into individual FASTA files for each barcode.
2) Use the quality information in the FASTQ file to trim and quality filter the reads that make it into the barcode-separated FASTA files.
3) Remove the sequencing adaptor(s) and the 16S-specific primer sequences used to generate the amplicons.
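As a rough illustration of the three steps above (this is not the software I actually use), a minimal Python sketch might look like the following. The barcode table, the primer sequences, and the quality cutoff are made-up placeholders, not values from my run. It also assumes Phred+33 quality encoding, exact barcode and primer matches, and a simple mean-quality filter rather than real quality trimming.

```python
import io

# Hypothetical placeholders for illustration only, not values from a real run.
BARCODES = {"ACGTAC": "sample1", "TGCATG": "sample2"}
FORWARD_PRIMER = "GATTACA"
REVERSE_PRIMER = "CCTTAGG"   # written as it would appear in the read
MIN_MEAN_QUALITY = 20
PHRED_OFFSET = 33            # assumed FASTQ quality encoding


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        # Keep only the first whitespace-separated token as the read ID.
        yield header[1:].split()[0], seq, qual


def mean_quality(qual):
    """Mean Phred score of a quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def demultiplex(handle):
    """Return {sample: [(read_id, trimmed_sequence), ...]}."""
    by_sample = {}
    for read_id, seq, qual in read_fastq(handle):
        # Step 2: drop reads whose mean quality is below the cutoff.
        if not qual or mean_quality(qual) < MIN_MEAN_QUALITY:
            continue
        # Step 1: assign the read to a sample by its leading barcode.
        for barcode, sample in BARCODES.items():
            prefix = barcode + FORWARD_PRIMER
            if seq.startswith(prefix):
                # Step 3: remove barcode + forward primer, then cut at the
                # reverse primer (which also removes the trailing adaptor).
                end = seq.find(REVERSE_PRIMER, len(prefix))
                insert = seq[len(prefix):end] if end != -1 else seq[len(prefix):]
                by_sample.setdefault(sample, []).append((read_id, insert))
                break
    return by_sample


def write_fasta(records, handle):
    """Write (read_id, sequence) pairs as FASTA."""
    for read_id, seq in records:
        handle.write(">%s\n%s\n" % (read_id, seq))
```

Calling demultiplex() on an open FASTQ file and then write_fasta() once per sample would give one FASTA file per barcode, which is the kind of output I am describing.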
Thus, I can generate FASTA files (one for each barcode used) containing quality-filtered sequences. From reading the tutorial, it would seem that these files are ready for generating OTUs using pick_otus_through_otu_table.py, which requires only a .fna file for input. Looking at the Fasting_Example.fna file provided in the QIIME tutorial, it appears that a .fna file is simply a FASTA file in which the header has a specific format consisting of 5 pieces of information.
Example header from Fasting_Example.fna:
>FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
I think:
1) FLP3FBN01ELBSX is the unique read identifier.
2) length is the length of the read.
3) xy is a unique location specific to the region of the 454 PTP plate.
4) region is the region of the PTP plate.
5) run is a unique identifier for the 454 run.
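To make my reading of the header concrete, here is a small Python sketch that splits it into the identifier and the key=value fields. It only reflects my interpretation above (whitespace-separated fields, with the first one being the read identifier), and parse_fna_header is a name I made up, not a QIIME function.

```python
def parse_fna_header(line):
    """Split a FASTA header into (read_id, {key: value}).

    Assumes whitespace-separated fields, the first being the read
    identifier and the rest being key=value pairs.
    """
    parts = line.lstrip(">").split()
    read_id = parts[0]
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return read_id, fields


header = ">FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_"
read_id, fields = parse_fna_header(header)
print(read_id)           # the unique read identifier
print(sorted(fields))    # the names of the remaining fields
```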
I am guessing that pick_otus_through_otu_table.py really only needs the first part of the header (the unique sequence identifier) and the associated sequence to work. Is this correct? If so, all I need to do is generate my FASTA files using my other software and then feed them into pick_otus_through_otu_table.py.
Thanks in advance for your reply.
Regards,
George