Quality filtering Ion Torrent data and/or a workflow for Ion Torrent data?


JenB

Jun 19, 2014, 12:57:09 PM
to qiime...@googlegroups.com
Hello All,
I have been reading post after post about using Qiime with Ion Torrent data and I am wondering if anyone has a working pipeline for their Ion Torrent data at this point?

I would like to begin running my data through Qiime but I am reading that I first need to quality filter the data.  Does anyone have suggestions on what tool is best to do that on Ion Torrent data?

My data is demultiplexed, one fastq file per sample (oral microbiome samples).
I have primers but no barcodes on my sequences.
Median read length is 200 bp.

It seems as though many people are using Qiime for Ion Torrent data, which is great, but each person has their own specific way of processing.  Posts seem to start around 2009 and run to earlier this year, and I think Qiime has made some progress with adding new scripts.  Does anyone have a workflow they are willing to share for Ion Torrent data?

Thanks so much,
Jen

Lisa

Jun 20, 2014, 3:22:51 PM
to qiime...@googlegroups.com

Hi Jen,

There are obviously many different ways, but I use pretty much the same pipeline as the Brazilian Microbiome Project (http://www.brmicrobiome.org/%23!16sprofilingpipeline/cuhd#!16sprofilingpipeline/cuhd).  It will make your life a lot easier to get a non-demultiplexed fastq file from the Torrent Suite.  If you have barcodes that are all the same length, you could follow the entire BMP protocol.  I stupidly decided to use the IonXpress barcodes, which are variable length, so I need to do the demultiplexing step in qiime.  If you need help with that part, let me know. 

I’m interested in hearing what others are doing. 

Lisa

Jennifer Barb

Jun 20, 2014, 3:55:26 PM
to qiime...@googlegroups.com
Hi Lisa,
Thank you so much for the reply.  I saw the BMP pipeline last week and emailed the authors asking about it.  So you have used that workflow with success?  They let me know that the pipeline is being reviewed now for publication.

All of my barcodes are the same length.  Can I follow the BMP workflow with already-demultiplexed data, or do you highly recommend I go back to my sequencing provider to discuss getting non-demultiplexed data from them?

Thank you so much for the reply!
Jen


--

---
You received this message because you are subscribed to a topic in the Google Groups "Qiime Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/qiime-forum/b7DrGWrXbuU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Lisa

Jun 21, 2014, 10:24:34 PM
to qiime...@googlegroups.com
It'll be much easier if you ask them for the non-demultiplexed data, especially if you have a lot of samples. Just ask them to rerun the analysis without the barcode option selected. If you don't want to wait, you'll have to rename the reads and then join all the files together.

Yes! I am getting great results with that pipeline. I started using uparse last year bc the otu picking options in qiime weren't working well with my pgm data. I optimized uparse with some mock communities and then recently saw that the BMP is using the same parameters.

Jennifer Barb

Jun 23, 2014, 1:54:47 PM
to qiime...@googlegroups.com
Hi Lisa,
Again, thank you for the dialogue regarding this.  I am eager to try out the BMP protocol for this data; however, I contacted my sequencing lab and they informed me that with their current Ion Torrent software (FileExporter 4.0), the option to keep the barcodes is not there.  Do you happen to know what version of software this might be?  Or should I contact Life Tech regarding this?  I also do not know whether this would be a lot of work for our sequencing lab, in case they cannot do it.  You mentioned renaming my reads if I can't get the data non-demultiplexed.  Can you expand on that?  Meaning, would I need to edit my fastq reads to add the barcodes back in and rename them, then rejoin all the files so that this pipeline sees the data in the format it expects?

Thanks again so much,
Jen





Lisa Mattei

Jun 23, 2014, 3:00:58 PM
to qiime...@googlegroups.com
No problem. The barcode option is in Torrent Suite, not in the FileExporter plugin. They should be able to open your run report and click reanalyze. The barcode type is one of the options. They need to change it to the blank option and then click reanalyze. When it's done, run the FileExporter plugin to get the fastq file. I'm not at the lab today, but I can send you screenshots tomorrow if they're still having problems. 

Jennifer Barb

Jun 24, 2014, 8:04:16 AM
to qiime...@googlegroups.com
Lisa, that would be so wonderful if you could send me some screen shots.  I will then take it over to our sequencing lab to run it by them.
This would be super helpful!  I so appreciate your correspondence with me on this.  I had been investigating using Mothur for quite some time but it seemed to be very difficult to use my data there as well.  I am so glad to have found the Brazilian Microbiome Project pipeline for this.
Sincerely,
Jen

Lisa

Jun 24, 2014, 9:40:25 AM
to qiime...@googlegroups.com
Here it is.  Good luck!

reanalysis no barcodes.pdf

Jennifer Barb

Jun 24, 2014, 10:56:05 AM
to qiime...@googlegroups.com
Thank you so much for this, Lisa.
I am going to work with our sequencing guy to see if we can get this done!
You have been a great help!
Jen



Jennifer Barb

Jun 26, 2014, 2:40:43 PM
to qiime...@googlegroups.com
Hi Again Lisa, 
So I got the data back from our lab with barcodes intact, thanks to your help!!

I started the pipeline but I think that I might be filtering out too much data, as I am getting a small number of sequences back after running the fastq_filter step.  What value do you use for the fastq_maxee argument?  I used the default on the BMP pipeline of .5 and I am getting 30% converted.  Not sure if this seems OK?

Also, at the abundance sort and discard singletons step, it seems as though I am losing a lot of my unique sequences.

I need to go through and thoroughly read what each of these scripts is doing, but after I get down to the chimera filtering step, I am getting 98 non-chimeras.  Doesn't that seem quite small when starting with 254,000 sequences?  Or does this seem accurate?

The BMP pipeline has been great so far but there seems to be a step missing just before step 8 on the website.  He specifies a reads_uparse.fa file as input but does not list the step to create it.  Do you happen to know it?

Thank you again!
Jen



Lisa

Jun 27, 2014, 9:15:35 AM
to qiime...@googlegroups.com
Hi Jen,

> I started the pipeline but I think that I might be filtering out too much data, as I am getting a small number of sequences back after running the fastq_filter step.  What value do you use for the fastq_maxee argument?  I used the default on the BMP pipeline of .5 and I am getting 30% converted.  Not sure if this seems OK?

What primers are you using?  I use the 515-806 V4 primers with 200bp sequencing chemistry and typically 60-70% of my reads are converted with those parameters.  If you're using the same primers, having 30% of the reads converted would indicate a very poor sequencing run.  If you're using different primers, you may have to adjust the truncation length.  Are you losing most of your reads b/c of short length or poor quality? 

> Also, at the abundance sort and discard singletons step, it seems as though I am losing a lot of my unique sequences.

You can turn off the discard singletons by using -minsize 1 and see if this changes your results.

> I need to go through and thoroughly read what each of these scripts is doing, but after I get down to the chimera filtering step, I am getting 98 non-chimeras.  Doesn't that seem quite small when starting with 254,000 sequences?  Or does this seem accurate?

I've never worked with oral samples, so I don't know if this is reasonable.

> The BMP pipeline has been great so far but there seems to be a step missing just before step 8 on the website.  He specifies a reads_uparse.fa file as input but does not list the step to create it.  Do you happen to know it?

I don't see a file named reads_uparse.fa.  Are you looking at the Illumina protocol?  The reads.fa file is from step #2 on the PGM protocol.

Best,

Lisa

Jennifer Barb

Jun 27, 2014, 12:36:29 PM
to qiime...@googlegroups.com
HI Lisa,
I am going to rerun some steps to see if I get different results.

Btw, the reads_uparse.fa was fixed by the BMP author.  I emailed him and he fixed it to say reads.fa for step 8.

I am going to see if I can get some better results and may post again.

Thank you so much for helping me through this!
Jen




Colin Brislawn

Jun 29, 2014, 12:36:41 AM
to qiime...@googlegroups.com
Hello Jen,
Our lab uses this pipeline on Illumina data from similar samples. That's a little different, but here is what our data looks like throughout this pipeline.


On Thursday, June 26, 2014 2:40:43 PM UTC-4, JenB wrote:
> I started the pipeline but I think that I might be filtering out too much data, as I am getting a small number of sequences back after running the fastq_filter step.  What value do you use for the fastq_maxee argument?  I used the default on the BMP pipeline of .5 and I am getting 30% converted.  Not sure if this seems OK?
We typically use a max ee of 1 or .5, depending on our data quality. If you only keep 30% when using .5, try moving up to 1. A max ee of 1 is still acceptable and should preserve a lot more of your data.
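To make the trade-off concrete: "max ee" is USEARCH's expected-errors statistic, the sum of the per-base error probabilities implied by the Phred scores. A minimal Python sketch of the idea (illustrative only, not the usearch implementation):

```python
def phred_from_ascii(qual_line, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into scores."""
    return [ord(c) - offset for c in qual_line]

def expected_errors(quals):
    """Expected number of errors in a read: sum of 10^(-Q/10) per base."""
    return sum(10 ** (-q / 10.0) for q in quals)

# A 200 bp read at a uniform Q20 (1% error per base) has EE ~ 2.0,
# so it fails -fastq_maxee 0.5; at a uniform Q30 it has EE ~ 0.2 and passes.
ee_q20 = expected_errors([20] * 200)
ee_q30 = expected_errors([30] * 200)
```

So a maxee of 0.5 over 200 bp demands an average per-base error below 0.25% (roughly Q26); relaxing to 1 halves that bar, which is why so many more reads survive.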

> Also, at the abundance sort and discard singletons step, it seems as though I am losing a lot of my unique sequences.
We often lose more than 50% of our dereplicated sequences when discarding singletons. In the UPARSE paper, Robert Edgar makes the argument that sequences that only appear once are likely due to sequencing errors or chimeras from the PCR. If you have adequate sequencing depth, you should have more than one read from your major microbes.
I totally agree with Lisa that you could try including singletons to see how this impacts your OTU counts.
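The dereplicate / discard-singletons pair from the BMP steps can be mimicked in a few lines (a sketch of the idea, not of usearch itself):

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse identical reads, keeping abundance (like -derep_fulllength -sizeout)."""
    return Counter(seqs)

def sort_and_discard(counts, minsize=2):
    """Abundance-sort and drop clusters below minsize (like -sortbysize -minsize 2)."""
    return sorted(((seq, n) for seq, n in counts.items() if n >= minsize),
                  key=lambda item: -item[1])

reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTTT"]
kept = sort_and_discard(dereplicate(reads))  # [("ACGT", 3)] -- both singletons dropped
```

Setting minsize=1 keeps everything, which is the -minsize 1 experiment Lisa suggested.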

> I need to go through and thoroughly read what each of these scripts is doing, but after I get down to the chimera filtering step, I am getting 98 non-chimeras.  Doesn't that seem quite small when starting with 254,000 sequences?  Or does this seem accurate?
I'm not exactly sure which step you are on, but I'm guessing that is after -cluster_otus, right? In that case, each of those sequences is a centroid from an OTU. Having 98 OTUs from the human mouth is expected when using uparse, but only having 98 IonTorrent reads is not. Figure for typical OTU counts: http://www.nature.com/nmeth/journal/v10/n10/fig_tab/nmeth.2604_F2.html
 
> The BMP pipeline has been great so far but there seems to be a step missing just before step 8 on the website.  He specifies a reads_uparse.fa file as input but does not list the step to create it.  Do you happen to know it?
We use uparse as described here if you need another reference: https://groups.google.com/forum/#!topic/qiime-forum/zqmvpnZe26g

Let me know what works well for you!
Colin
 

Jennifer Barb

Jun 30, 2014, 2:44:30 PM
to qiime...@googlegroups.com
Colin and Lisa,
Thank you both for these helpful suggestions.  I changed my max ee parameter to 1 and sure enough I got 57% converted sequences, as compared to 30% when using max ee .5.
In addition, when I change my -minsize parameter to 1 instead of 2, I retain 15658 sequences compared to 6097.  I don't know yet whether changing these steps is a good idea, but I appreciate the suggestions and I would like to make a comparison once I get further through the pipeline, which leads me to my next question.

While moving forward with the BMP pipeline I got to step 11.  The assign_taxonomy.py step requires two files:

assign_taxonomy.py -i $PWD/otus.fa -o output -r $PWD/rep_set/97_otus.fasta -t $PWD/taxonomy/97_otu_taxonomy.txt


97_otus.fasta and 97_otu_taxonomy.txt

Do you all know where these two files are located?  

I went to the greengenes website as instructed on the qiime website but these filenames are not in the download area.  Am I missing something?

Thank you,
Jen



Lisa Mattei

Jun 30, 2014, 3:11:17 PM
to qiime...@googlegroups.com
Hi Jen,

You should have them already and qiime will use them by default if you don't specify (see http://qiime.org/scripts/assign_taxonomy.htm).  I can send you the files if you are still having trouble.

Lisa

Jennifer Barb

Jun 30, 2014, 3:15:57 PM
to qiime...@googlegroups.com
Oh!!! OK. I will try it without specifying.  I did not realize it will use them by default.

Thank you!




Jennifer Barb

Jul 9, 2014, 12:35:48 PM
to qiime...@googlegroups.com
Hi Again,
Thank you all for your help and guidance with this.  I am now stuck on another step.  I have gotten to the point where I have my table of OTUs.  I am trying to run the add-metadata step using biom (step 12 on the BMP pipeline).

It isn't clear where the file otus_rep_set_tax.txt for the --observation-metadata-fp argument comes from; however, I think it might be the file output from step 11, first_otus_tax_assignments.txt, in the output directory that I indicated (step 11 is "Assign taxonomy to OTUs using uclust method on QIIME").  Does that seem right to you?

I am getting an error on this step though:

/home/barbj/bin/biom: line 22: exec: pyqi: not found

Have you all seen this?

Once I get through to step 17, I am pretty much done, right?

What other analyses do you all run after this?

Thank you!  Your help has been invaluable throughout this process for me!
Jen

Jennifer Barb

Jul 9, 2014, 1:30:30 PM
to qiime...@googlegroups.com
Hi Lisa,
What sample source are you using, and did you use gold.fa as your reference database?

Do you know what that database contains?  I notice that the sequences are roughly 540 bp in that database, which does not seem to be the full 16S.

I can't find a lot of info about that database.

Jen




Colin Brislawn

Jul 9, 2014, 4:47:37 PM
to qiime...@googlegroups.com
Hi Jen,

The gold database is part of the UCHIME pipeline.
"This is a FASTA file containing the ChimeraSlayer reference database in the Broad Microbiome Utilities (see above), version microbiomeutil-r20110519. This contains all sequences in both orientations (forward and reverse-complemented)."
From: http://drive5.com/uchime/uchime_download.html

So the gold database is only used for chimera checking. It's not for clustering or assigning tax to 16S sequences (it's just for removing the chimeric ones).

Colin

Lisa

Jul 11, 2014, 11:46:57 AM
to qiime...@googlegroups.com
Hi Jen,

You're right about using the file output from step 11 in step 12.  I'm not sure about the error you're getting.  I would try starting a new thread.  

Lisa

Jennifer Barb

Jul 15, 2014, 11:33:18 AM
to qiime...@googlegroups.com
Hi Lisa and Colin,
I finally got the pipeline to work through the whole process, so thank you so much for your help with it.  I have started reviewing my results.

I just had a thought, however.  I did a grep on my fastq files with the reverse primer string and found some of the reads matching.  So the question is: in the first step of the pipeline, where you specify the forward primer, do you all suggest that I run an additional pass with my reverse primer as the primer string in order to catch those reads?

How do you all deal with forward and reverse primers with your PGM data?
Thanks,
Jen



Colin Brislawn

Jul 15, 2014, 1:16:23 PM
to qiime...@googlegroups.com
Hello Jen,

I've only worked with Illumina data, but we have a similar problem with reverse reads.

Our pipeline begins with usearch7 -fastq_mergepairs. This script tells us how many reads are 'backwards.' We find that a very small fraction of our data, maybe less than 0.1%, is reversed. So far, we just ignore these reads.

You could, however, go back to your quality filtered reads and run "split_libraries_fastq.py --rev_comp_barcode --rev_comp." This will switch your barcodes around, allowing you to pull out the backwards reads. (I think that will align all the sequences the right way.) You could then merge that with your original split_libraries file and repick OTUs.

We have not done that, but you could probably squeeze out some more data that way.

Colin

Lisa Mattei

Jul 15, 2014, 1:40:27 PM
to qiime...@googlegroups.com
Hi Jen,

I'm confused because you shouldn't have reverse reads with the PGM.  Did you sequence from both directions or do you mean that the reverse primer is at the end of your read?  

Lisa
Lisa Mattei, PhD
Postdoctoral Associate
Laboratory of Dr. Greg Howe

Yale University School of Medicine
Department of Laboratory Medicine
330 Cedar St., CB 407
New Haven, CT 06520

Jennifer Barb

Jul 15, 2014, 1:50:43 PM
to qiime...@googlegroups.com
Hi Lisa and Colin,
Thanks for getting back to me again.  I am confused too!  The reverse primer is at the front of my reads.
For example, let me show you a result of my grep.  My reverse primer is: GGA CTA CHV GGG TWT CTA AT


Grep:

grep "GGACTAC**GGGT*TCTAAT" ./ames/pilotDataV4/fastq_bc/first.fastq > revprimer.fq


Now I will pull out one sample by barcode: CTAAGGTAAC

The adapter sequence is GAT...

So these are just a couple of the reads that I pulled out.  I am trying to get more info from our sequencing lab, but do you have any idea why I am seeing this?



>

CTAAGGTAACGATGGACTACCCGGGTTTCTAATCCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCCTTCTGCACTCAAGTTTGACAGTTTTCCAAAGCGAACTATGGTTGAGCCACAGCCTTTTAACTTCAGACTTATCAAACCTGCCTGCGCTCGCTTTACGCCC

>

CTAAGGTAACGATGGACTACCCGGGTTTCTAATCCTGTTTGCTCCCCACGCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATTCCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCTCCTAAGTACTCTAGCGACCCAGTACT
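As an aside on the grep pattern above: plain grep's `*` repeats the previous character rather than matching any single base, so a degenerate primer is more reliably matched by expanding the IUPAC codes (H, V, W here) into regex character classes. A small Python sketch of that translation, run against the first read above:

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

def primer_regex(primer):
    """Turn a degenerate primer into a regex, e.g. H -> [ACT]."""
    return "".join(IUPAC[base] for base in primer.upper())

rev_primer = re.compile(primer_regex("GGACTACHVGGGTWTCTAAT"))
hit = rev_primer.search("CTAAGGTAACGATGGACTACCCGGGTTTCTAATCCTGTT")
# hit.group(0) == "GGACTACCCGGGTTTCTAAT"
```

The same class syntax works with `grep -E`, e.g. `GGACTAC[ACT][ACG]GGGT[AT]TCTAAT`.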



Jennifer Barb

Jul 15, 2014, 2:15:21 PM
to qiime...@googlegroups.com
Hi Colin,
So here is what I am thinking: since I am using this Uparse/Usearch and qiime pipeline that I found on the BMP website (http://www.brmicrobiome.org/#!16sprofilingpipeline/cuhd),
I am wondering if it would work for me to specify the reverse primer as the primer in the argument for the barcode-stripping step below.

Then I found that the reads with the reverse primers are on average about 25-50 bp shorter than those with the forward primers?  I don't understand that either, but that is a whole other topic.

At any rate, I was thinking about running the reverse set all the way up through the abundance sort and discard singletons steps.

1 - Strip barcodes ("Ex" is a prefix for the read labels, can be anything you like) <<<USING USEARCH 7>>>

python $PWD/fastq_strip_barcode_relabel2.py $PWD/reads.fastq ATTACCGCGGCTGCTGG $PWD/barcodes.fa Ex > reads2.fastq

2 - Quality filtering, length truncate, and convert to FASTA <<<USING USEARCH 7>>>

$u -fastq_filter $PWD/reads2.fastq -fastq_maxee 0.5 -fastq_trunclen 200 -fastaout reads.fa

3 - Dereplication <<<USING USEARCH 7>>>

$u -derep_fulllength $PWD/reads.fa -output derep.fa -sizeout

4 - Abundance sort and discard singletons <<<USING USEARCH 7>>>

$u -sortbysize $PWD/derep.fa -output sorted.fa -minsize 2



Then I can concatenate my fasta files for the forward and reverse results at this point and proceed with the rest of the pipeline.


There is a step further down the pipeline that asks you to specify the strand, usearch_global:

http://drive5.com/usearch/manual/usearch_global.html

where maybe I can change it to -strand both,


and then just keep moving forward.


What do you think about this?  Might it work?


Jen


Lisa Mattei

Jul 15, 2014, 2:39:45 PM
to qiime...@googlegroups.com
OK, they must have sequenced from both directions by doing two PCRs for each sample and then mixing the pools.  This is how it's written in the Ion Amplification Library Preparation (Fusion Method) protocol (see pg. 3), but sequencing from one direction is fine (I do forward).  Are ~50% of your reads from the reverse primer?  These are NOT paired-end reads, so I don't think you can use the Illumina scripts to process them.  

I think you could do the first two steps you suggest below, adjust the orientation of reads.fa using adjust_seq_orientation.py, and then concatenate with the forward reads.fa file.  That way you cluster with the reads all in the same orientation.  

Did you ever figure out what the pyqi: not found error was all about?
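The reorientation step suggested here (adjust_seq_orientation.py) boils down to reverse-complementing each read; the core operation is simple (a generic sketch, not QIIME's code):

```python
# Translation table for DNA complement, including N for ambiguous bases.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMP)[::-1]

# Reverse-primer reads flipped this way end up in the same orientation
# as the forward reads before concatenation and dereplication.
flipped = revcomp("GGACTAC")  # -> "GTAGTCC"
```

Flipping before concatenating means the downstream dereplication and clustering see all reads on the same strand.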

Colin Brislawn

Jul 15, 2014, 2:47:45 PM
to qiime...@googlegroups.com
Hey Jen,

This is super interesting. Thanks for talking through your strategy with me.

The scope of my experience is limited to MiSeq and HiSeq, so my advice must be limited too: we are both new at IonTorrent.

I'm not sure how the script fastq_strip_barcode_relabel2.py works because our lab does not use it. Our Illumina data plays nice with split_libraries_fastq.py, so we use that.

I think your pipeline could work well. The only change I would make is to combine the two splits just after quality filtering. This lets dereplication work on ALL reads and will make sure the combined abundances are given to -cluster_otus.


I've got two other thoughts, although they are not very helpful.
1) I have the worst time making sure my reads are in the right direction. When I have played with alternate ways of parsing fastq data, I have to be extra careful to keep them all pointing the right way.
2) How are these reads happening, anyway? Lisa mentioned a dual PCR, which is different from Illumina's. On our sequencing platform, we are more willing to throw away 'backwards' data because the error that made a read backwards could also introduce sequencing errors even after we align it properly.


Thanks for keeping me in the loop!
Colin

Jennifer Barb

Jul 15, 2014, 3:07:01 PM
to qiime...@googlegroups.com
Great!  I will try that script adjust_seq_orientation as you suggest, Lisa. Thanks!  I think this might work, then I will concatenate and press forward with the pipeline.

Oh, and that error was apparently caused by something I had written wrong on the command line, because when I tried it again the next day, after making sure I had all of the files and the correct paths, it worked!

I got through the pipeline down to the point where I run core_diversity_analyses and have been looking at the PDF bar charts and area plots.

I also have been analyzing my OTU count tables with my own software.  

Really have thoroughly appreciated both, yours and Colin's help!

Jen



Lisa Mattei

Jul 15, 2014, 3:15:20 PM
to qiime...@googlegroups.com
Hi Colin,

To answer your question #2 to Jen, they just swap the sequencing adapter and barcode from the forward primer to the reverse primer.  So they'd do one PCR with primers [A - barcode - linker - forward] and [trp1 - reverse] and the second PCR with primers [trp1 - forward] and [A - barcode - linker - reverse].  The two PCRs are mixed and sequenced together.  It's not a mistake to get the backwards reads, so they will have the same error profile as the forward reads. 

Lisa


Jennifer Barb

Jul 17, 2014, 11:39:01 AM
to qiime...@googlegroups.com
HI Guys,
I got some feedback from our sequencing lab: they did not use the Fusion Method that Lisa pointed out earlier; they used the "Ion Plus Fragment Library Kit".  That said, I am still not sure if I should proceed in my pipeline taking the reads with reverse primers.  I am getting about equal numbers of reads with forward primers and with reverse primers:
fwd: 242588 reads
rev: 253989 reads

I went through the whole process up to my OTU table taking the reverse reads into account and I got 123 OTUs, compared to 98 OTUs when using only the forward reads.

I honestly do not know if I should be taking the reverse reads, reverse complementing them, and then concatenating with my forward reads.  I wonder if it is bringing in duplicate reads.  I would not even have realized this issue if I had not done a grep with the reverse primer provided by the sequencing lab.

Do you all have any thoughts on this?
Jen

MAN0006846_RevA_UB_3March2014_Kim.pdf

Colin Brislawn

Jul 17, 2014, 1:11:40 PM
to qiime...@googlegroups.com
Hey Jen,

Lisa would know more about the various IonTorrent kits than I would. It sounds like those reverse reads are supposed to be there if they account for half of the total data. I would try to include them, if possible.

It sounds like you have been able to do this! Just to verify: you took the reverse complement of the reverse-primer reads, combined them with the forward reads, dereplicated, sorted, and clustered. Is that correct? (That seems OK to me.) If you combine before dereplicating, you do not have to worry about duplicate reads being fed into the clustering algorithm.

UPARSE is really good at suppressing false positives when it comes to new OTUs. When you included your reverse reads, you doubled your sequencing depth. Finding 25 new OTUs after doubling your sequencing depth seems perfectly reasonable to me. I say you keep those new OTUs. They could be cool.


I'm still a little surprised that your reverse primer appears in your reverse reads after the barcode. Maybe Lisa knows if that is normal for IonTorrent.

Thanks for keeping me in the loop,
Colin

Jennifer Barb

Jul 17, 2014, 1:59:31 PM
to qiime...@googlegroups.com
Hi,
Yes, that is why I am thinking about keeping them: there are so many.  That is a lot of reads to toss away.  I did exactly what you wrote, in that order, and it seems to be working so far. 

I don't know what the orientation is supposed to be on other platforms, but mine is:
barcode-adapter-primer-TARGET

That is great to know about UPARSE... My results are looking decent so far.

Thanks to you both!
Jen


Lisa Mattei

Jul 17, 2014, 2:22:27 PM
to qiime...@googlegroups.com
Hi Jen,

I use a similar protocol when I use the PGM for tumor profiling.  Instead of having the barcode and sequencing adapters as part of the primers, they are ligated to the amplicons after the PCR.  The adapters are randomly ligated to the 5' and 3' ends of the amplicons so there will be a mix of A-amplicon-P1, A-amplicon-A, P1-amplicon-A, P1-amplicon-P1.  The amplicons that have the same adapters on both ends aren't efficiently amplified during the next step, so there will be an enrichment of A-amplicon-P1 and P1-amplicon-A for templating and sequencing. This is a very expensive way to do the library prep because you need to buy Ion Torrent's kits ($500 for 10 samples, plus the barcode set at $7,200).  If you're planning on doing this frequently (and want to stick with the PGM), you should invest in a set of fusion PCR primers.

Anyway, I'm also not sure how you should proceed.  Are you getting similar results if you process the forward and reverse reads separately all the way through to the end?  You mentioned that the reverse reads are shorter than the forward reads.  You might be getting more OTUs when you process the forward and reverse reads together if the sequences aren't completely overlapping.

Lisa

Gregg Iceton

Jul 18, 2014, 7:00:16 AM
to qiime...@googlegroups.com
Just wanted to add a thumbs up for fusion primers.  We use them unidirectionally for environmental samples with great success.

Jennifer Barb

Jul 18, 2014, 9:08:34 AM
to qiime...@googlegroups.com
Thanks, Lisa and Gregg.  I will pass on the info about fusion primers to our sequencing lab and to the PI of the project I am working on.

By any chance, do you all know of a good fastq quality-checking tool for this kind of data?  I have used FastQC in the past with RNA-seq data, but I'm not sure it is appropriate for 16S DNA sequencing.

Jen



Lisa Mattei

Jul 18, 2014, 9:16:36 AM
to qiime...@googlegroups.com
You can use the -fastq_stats command in usearch.  For example: usearch -fastq_stats reads2.fq -log seqs.stats.log -fastq_qmax 45

Jennifer Barb

Jul 18, 2014, 10:26:07 AM
to qiime...@googlegroups.com
Thanks, Lisa.  I tried it but got a fatal error: "Phred score 45 out of range 0..41".
Perhaps this can't be used with PGM data?

Lisa Mattei

Jul 18, 2014, 10:31:18 AM
to qiime...@googlegroups.com
You need to increase the range of acceptable Phred scores by passing -fastq_qmax. 45 is fine for me, but you may have to adjust according to your data. 
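For context, the error comes from quality characters that decode to scores above the default ceiling. A sketch of the decoding that triggers it (assuming Phred+33 FASTQ; this mirrors, not reproduces, usearch's check):

```python
def decode_quals(qual_line, offset=33, qmax=41):
    """Decode an ASCII quality string; complain the way usearch does when a
    score exceeds qmax (the -fastq_qmax ceiling)."""
    quals = [ord(ch) - offset for ch in qual_line]
    highest = max(quals, default=0)
    if highest > qmax:
        raise ValueError(f"Phred score {highest} out of range 0..{qmax}")
    return quals

decode_quals("IJ")          # [40, 41] -- fine at the default ceiling
decode_quals("N", qmax=45)  # [45] -- PGM scores above 41 need a higher qmax
```

PGM base callers can emit quality scores above 41, which is why raising -fastq_qmax makes the error go away.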

Jennifer Barb

Jul 18, 2014, 10:42:37 AM
to qiime...@googlegroups.com
Oh OK, got it.  Thank you.

sp

Nov 18, 2014, 1:53:41 PM
to qiime...@googlegroups.com
Hi Lisa,

Can you explain a little bit more about demultiplexing in QIIME?  I have 16S rRNA data from Ion Torrent and I intend to use QIIME for the analysis.  I also have variable-length barcodes.  I want to start by removing the bad-quality reads.  I am not able to find a command in QIIME that would let me QC the data.  Any ideas on this?

thank you
sp.

