Preparing a data set from PGM (Ion Torrent) runs for use in Qiime - trimming primers


Gbiota

Oct 9, 2013, 10:15:34 AM
to qiime...@googlegroups.com
Hello Qiime team, 

I am using Qiime 1.7 in a VirtualBox virtual machine, installed following the instructions on your website.

I have sequenced the V5-V6 region of the 16S rRNA gene bidirectionally on an Ion Torrent PGM. I read in the Qiime forum that PGM runs cannot easily be demultiplexed or filtered with the Qiime pipeline (through the split_libraries.py script). To get around this, I demultiplexed my data on the Ion PGM server and exported FASTQ files (one sample per file) containing the sequences without barcodes or adapter sequences. For quality filtering, I used the Galaxy platform to filter the data set by minimum/maximum length and a minimum quality score of 25. Finally, I converted the FASTQ files into FASTA files with Galaxy's conversion tools.

At this stage the data set is ready to enter the Qiime pipeline; however, I have some doubts about the primers used to amplify the 16S rRNA gene, which are still present in the reads.

In the split_libraries.py script, when the mapping file has "LinkerPrimerSequence" and "ReversePrimer" columns, the script automatically trims the AdapterA/BarcodeSequence/ForwardPrimer and (with the -z option) the ReversePrimer/AdapterB from the data set. In my case the forward and reverse primers are still present in the data set, and I am not sure whether they should be trimmed.

Do you think it is necessary in this case to trim the forward and reverse primers before performing downstream analysis in Qiime (pick_otus, align_seqs, etc.)?

If so, do you have any clue how to do it?

I have tried using the truncate_reverse_primer.py script (with the "truncate only" option) to remove the reverse primer, and to remove the forward primer by putting the forward primer sequence in the "ReversePrimer" column of the mapping file. However, since I used bidirectional sequencing, the reads that start with the reverse primer (which are in reverse-complement orientation) are trimmed out completely.

As far as I understand, this script trims the primer and everything after it, which the Qiime pipeline normally treats as barcode or adapter sequence; in my case, however, the sequence that follows the primer is my actual sequence data.
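The trouble described above comes from truncating at a primer without first checking which strand a read came from. A minimal plain-Python sketch of an orientation-aware alternative follows; it is not part of Qiime and is only an illustration under stated assumptions. Reads that begin with the reverse primer are reverse-complemented first, then the forward primer is stripped from the 5' end and the read is cut where the reverse complement of the reverse primer appears. The primer sequences, the one-mismatch tolerance, and all function names are hypothetical placeholders, not the primers used in this study.

```python
# Sketch: orientation-aware primer trimming for bidirectionally sequenced amplicons.
# FWD_PRIMER and REV_PRIMER are HYPOTHETICAL placeholders; substitute your own primers.
FWD_PRIMER = "CCTACGNGAG"
REV_PRIMER = "GTTRCCAGTA"

# IUPAC code -> bases that code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, segment):
    """Count read bases not allowed by the (possibly degenerate) primer base."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, segment))


def starts_with(seq, primer, max_mm):
    """True if seq begins with primer, allowing up to max_mm mismatches."""
    return len(seq) >= len(primer) and mismatches(primer, seq[:len(primer)]) <= max_mm


def find_primer(seq, primer, max_mm):
    """Leftmost index where primer matches seq with <= max_mm mismatches, else -1."""
    for i in range(len(seq) - len(primer) + 1):
        if mismatches(primer, seq[i:i + len(primer)]) <= max_mm:
            return i
    return -1


def trim_amplicon(seq, fwd=FWD_PRIMER, rev=REV_PRIMER, max_mm=1):
    """Return the read in forward orientation with both primers removed.

    Returns None when neither primer is found at the 5' end of the read.
    """
    seq = seq.upper()
    if starts_with(seq, fwd, max_mm):
        oriented = seq
    elif starts_with(seq, rev, max_mm):
        # Reverse-strand read: flip it so it reads like a forward read.
        oriented = revcomp(seq)
    else:
        return None
    # Strip the forward primer if the oriented read begins with it.
    if starts_with(oriented, fwd, max_mm):
        oriented = oriented[len(fwd):]
    # Cut at the reverse primer, which appears reverse-complemented on this strand.
    cut = find_primer(oriented, revcomp(rev), max_mm)
    return oriented if cut == -1 else oriented[:cut]
```

Because every read comes out in the forward orientation, the reverse-strand reads are kept rather than trimmed away. Reads with neither primer at the 5' end come back as None and would simply be discarded in this sketch.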


Could you help me with this?
Thank you very much for your time and work,
Gbiota.

Jai Ram Rideout

Oct 9, 2013, 12:28:59 PM
to qiime...@googlegroups.com
Hi Gbiota,

It's a good idea to remove the forward and reverse primers from your sequences. Take a look at these forum threads about processing Ion Torrent data in QIIME and see if they help. I don't have any personal experience with processing Ion Torrent data in QIIME, but it seems like there are a number of people who have successfully performed analyses with this type of data, so hopefully these threads will give you an idea of techniques they used and how most people are importing this type of data into QIIME. Try also searching the forum for keywords like 'pgm' or 'ion torrent', as there are likely other threads out there that may be useful.

Good luck with your analysis!

-Jai


--
 
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Gregg Iceton

Oct 11, 2013, 12:38:36 PM
to qiime...@googlegroups.com
Hi. It's a bit late now, but you can most definitely demultiplex Ion Torrent data in QIIME. When your run analysis is complete in the Torrent Suite, run the FastQ Creator plugin to get a FASTQ file with the barcodes still intact. Then run the QIIME script convert_fastaqual_fastq.py to get FASTA and QUAL files, and run split_libraries.py on those files. The quality data in the Ion Torrent FASTQ file is perfectly usable; it's just the SFF file that is different.
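For anyone curious about what the FASTQ-to-FASTA/QUAL step involves, here is a minimal plain-Python sketch. It is not QIIME's implementation (use convert_fastaqual_fastq.py in practice), and the function names are hypothetical. It assumes simple four-line FASTQ records with no wrapped sequence lines and Sanger (Phred+33) quality encoding, which is what Torrent Suite writes. Sequences are copied unchanged, so the barcodes stay in place for split_libraries.py.

```python
# Sketch of a FASTQ -> FASTA + QUAL split (NOT QIIME's own code).
# Assumes unwrapped 4-line FASTQ records with Phred+33 (Sanger) qualities.


def read_fastq(lines):
    """Yield (name, sequence, quality_string) for each 4-line FASTQ record."""
    rows = [line.rstrip("\r\n") for line in lines]
    while rows and not rows[-1]:  # tolerate trailing blank lines
        rows.pop()
    if len(rows) % 4:
        raise ValueError("input is not a whole number of 4-line FASTQ records")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta_qual(lines, offset=33):
    """Return (fasta_text, qual_text); QUAL holds space-separated integer scores."""
    fasta, qual = [], []
    for name, seq, q in read_fastq(lines):
        fasta.append(f">{name}\n{seq}\n")  # sequence copied as-is, barcode included
        scores = " ".join(str(ord(c) - offset) for c in q)
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)
```

An open file handle can be passed directly as `lines`, since iterating a text file yields one line at a time.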

Jai Ram Rideout

Oct 11, 2013, 3:15:45 PM
to qiime...@googlegroups.com
Thanks Gregg!



G biota

Oct 17, 2013, 5:44:42 AM
to qiime...@googlegroups.com
Hi again,

Thank you very much for the interesting suggestions.
I am going to try demultiplexing my data set using Qiime, as Gregg suggested.

Thank you!
Gbiota.

JenB

Jun 19, 2014, 12:46:40 PM
to qiime...@googlegroups.com
Hello Gbiota, 
Could you share how you did the quality filtering of your Ion Torrent FASTQ files on the Galaxy platform?

I am about to start processing my Ion Torrent data and need to get past the first step, quality filtering.

Any guidance would be great.
thank you,
Jennifer

G biota

Jun 23, 2014, 9:11:38 AM
to qiime...@googlegroups.com

Hello Jennifer,


Sorry for the late reply.


I use the Galaxy platform at https://usegalaxy.org/.

To perform quality filtering, I follow these steps:


  1. Click on “Get Data” and upload your FASTQ file;

  2. Then click on “NGS: QC and Manipulation”;

  3. Click on “FastQC” and run it with your FASTQ file as input; this tool reports the distribution of the lengths and quality scores of your sequences;

  4. Click on “FASTQ Groomer converter”;

    1. Choose your FASTQ file as input, then for the “Input FASTQ quality scores” option choose Sanger.

  5. Now go to the “Filter FASTQ” tool and set the minimum and maximum length for your sequences, as well as the minimum quality score you want to filter on;

  6. Finally, convert your FASTQ file to a FASTA file using the “FASTQ to FASTA converter” tool;

  7. Download your FASTA files and do the downstream analysis in QIIME.
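Steps 5 and 6 can also be approximated outside Galaxy. The sketch below is a hypothetical plain-Python stand-in, not Galaxy's code: it keeps reads whose length falls within [min_len, max_len] and that have at most max_low_bases bases scoring below min_q, then writes the survivors as FASTA. It assumes unwrapped four-line FASTQ records with Sanger (Phred+33) qualities, the encoding chosen in step 4. Galaxy's Filter FASTQ tool has more options, so results may not match exactly.

```python
# Hypothetical stand-in for Galaxy's "Filter FASTQ" + "FASTQ to FASTA" steps.
# Assumes unwrapped 4-line FASTQ records with Phred+33 (Sanger) qualities.


def filter_fastq_to_fasta(lines, min_len, max_len, min_q, max_low_bases=0, offset=33):
    """Return FASTA text for reads passing the length and per-base quality filters."""
    rows = [line.rstrip("\r\n") for line in lines]
    while rows and not rows[-1]:  # tolerate trailing blank lines
        rows.pop()
    if len(rows) % 4:
        raise ValueError("input is not a whole number of 4-line FASTQ records")
    out = []
    for i in range(0, len(rows), 4):
        header, seq, _plus, qual = rows[i:i + 4]
        if not min_len <= len(seq) <= max_len:
            continue  # outside the length window
        # Count bases whose decoded Phred score falls below the threshold.
        low = sum(1 for c in qual if ord(c) - offset < min_q)
        if low <= max_low_bases:
            out.append(f">{header[1:]}\n{seq}\n")
    return "".join(out)
```

With max_low_bases=0, min_q acts as a strict per-base floor; raising it, or filtering on mean read quality instead, is a more lenient choice.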


Good luck with the quality filtering,

Gbiota.


Gregg Iceton

Jul 10, 2014, 3:49:28 AM
to qiime...@googlegroups.com
Just FYI: the suggestion is that Ion Torrent quality scores are consistently under-called by around 5, so Q20 in Ion Torrent is equivalent to Q25 in Illumina/454. Somewhat controversial, I know, particularly since this suggestion comes from Ion Torrent themselves, though purportedly it was an independent sequencing centre that told them this.
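For context on what a five-point shift means: a Phred score Q corresponds to an error probability P = 10^(-Q/10), so Q20 is a 1% chance of a wrong base call and Q25 about 0.32%. The small sketch below just does that arithmetic. The offset of 5 is the unverified claim from the post above, not an established calibration, and the function names are illustrative.

```python
import math

# The +5 offset is the UNVERIFIED under-calling claim from this thread.
CLAIMED_UNDERCALL = 5


def phred_to_error(q):
    """Error probability for Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def adjusted_error(reported_q, offset=CLAIMED_UNDERCALL):
    """Error probability if a reported Ion Torrent score under-calls by `offset`."""
    return phred_to_error(reported_q + offset)


if __name__ == "__main__":
    for q in (20, 25):
        print(f"reported Q{q}: nominal {phred_to_error(q):.2%}, "
              f"with claimed offset {adjusted_error(q):.2%}")
```

If the claim holds, a Galaxy cutoff of 25 on Ion Torrent data would be roughly as strict as a cutoff of 30 on Illumina or 454 data.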