pick_otus:enable_rev_strand_match True, paired-end Illumina MiSeq data, and chimeras

Kate Blackwell

Feb 26, 2016, 3:09:10 AM

to Qiime 1 Forum

I seem to be going around in circles trying to resolve my issue, so any advice or help would be greatly appreciated. Please let me know if any information beyond what I've provided is required.

Goal: Perform chimera detection and removal on paired-end Illumina MiSeq data

Performed:

Multiple join paired ends
Multiple split libraries

Next steps:

Detect and filter chimeras
Pick OTUs via open reference

Issues:

Can't use ChimeraSlayer or USearch 6.1 on reverse-oriented data
Trying to determine whether sequences are reverse-oriented
Use of pick_otus:enable_rev_strand_match True
Best place in the pipeline to detect and filter chimeras

Questions:

Which would be better to use on paired-end Illumina MiSeq data when targeting the rare biosphere, ChimeraSlayer or USearch 6.1? The choice determines where chimera detection and filtering occur in the pipeline, but how much of a difference does it actually make?
How do I tell if my sequences are reverse-oriented? After running with pick_otus:enable_rev_strand_match True and with pick_otus:enable_rev_strand_match False, I don't observe a failure of the sequences to cluster. This suggests to me that my sequences are not overwhelmingly reverse-oriented, so I should just leave the parameter as False. Is it even possible to have reverse-oriented sequences after the paired ends were joined? Should pick_otus:enable_rev_strand_match True be used on paired-end Illumina MiSeq data? I have run the following:

pick_open_reference_otus.py -f -a -O 6 -i seqs.fna -o pick_otus_open_reference -p pick_open_references_otus_parallel_parameters.txt

with parameters:

pick_otus:enable_rev_strand_match True
assign_taxonomy:assignment_method uclust
parallel_assign_taxonomy_uclust:reference_seqs_fp /macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta
parallel_assign_taxonomy_uclust:id_to_taxonomy_fp /macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

and detected:

Num samples: 2
Num observations: 3512
Total count: 25000
Table density (fraction of non-zero values): 0.529

Counts/sample summary:
Min: 8463.0
Max: 16537.0
Median: 12500.000
Mean: 12500.000
Std. dev.: 4037.000
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy

Counts/sample detail:
Sample1: 8463.0
Sample2: 16537.0

and then run:

pick_open_reference_otus.py -f -a -O 6 -i seqs.fna -o pick_otus_open_reference -p pick_open_references_otus_parallel_parameters.txt

with parameters:

assign_taxonomy:assignment_method uclust
parallel_assign_taxonomy_uclust:reference_seqs_fp /macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta
parallel_assign_taxonomy_uclust:id_to_taxonomy_fp /macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

and detected:

Num samples: 2
Num observations: 3533
Total count: 24123
Table density (fraction of non-zero values): 0.528

Counts/sample summary:
Min: 8354.0
Max: 15769.0
Median: 12061.500
Mean: 12061.500
Std. dev.: 3707.500
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy

Counts/sample detail:
Sample1: 8354.0
Sample2: 15769.0
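Because each table has only two samples, the two summaries are easy to compare by hand: median and mean coincide, and "Std. dev." works out to the population standard deviation, i.e. half the gap between the two samples. The short Python sketch below recomputes those figures from the per-sample counts and shows how much the retained read counts changed when reverse-strand matching was enabled. The variable and function names are my own, not QIIME's.

```python
# Per-sample counts copied from the two "Counts/sample detail" sections.
with_rev = {"Sample1": 8463, "Sample2": 16537}     # pick_otus:enable_rev_strand_match True
without_rev = {"Sample1": 8354, "Sample2": 15769}  # parameter left at its default


def summarize(counts):
    """Total, mean, and population standard deviation of per-sample counts."""
    values = list(counts.values())
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return {"total": sum(values), "mean": mean, "std": std}


for sample in with_rev:
    diff = with_rev[sample] - without_rev[sample]
    print(f"{sample}: {diff:+d} reads ({diff / without_rev[sample]:+.1%})")

total_diff = summarize(with_rev)["total"] - summarize(without_rev)["total"]
print(f"Total: {total_diff:+d} reads ({total_diff / summarize(without_rev)['total']:+.1%})")
```

With these numbers the runs differ by 877 reads in total (about 3.6%), while the number of observations barely moves (3512 vs 3533).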

Colin Brislawn

Feb 26, 2016, 12:24:10 PM

Hello Kate,

Thanks for getting in touch with us.

You are on the right track! The output of multiple split libraries should be a single seqs.fna file with QIIME-compatible sample IDs. That's the file you feed into chimera checking, followed by OTU picking, just like this: http://qiime.org/tutorials/chimera_checking.html#usearch-6-1

I would use the UCHIME algorithm, which is implemented in usearch61 and vsearch.
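As I understand the QIIME tutorial linked in this reply, usearch61 chimera checking is run with identify_chimeric_seqs.py, which writes the IDs of flagged reads to a chimeras.txt file, and filter_fasta.py then removes those reads from seqs.fna before OTU picking. The removal step itself is simple. Here is a minimal stand-in sketch in plain Python, assuming chimeras.txt lists one read ID per line and that the first whitespace-delimited word of each FASTA header is the read ID. The function names are hypothetical, not part of QIIME.

```python
from typing import Iterable, Iterator, Set


def load_chimera_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read IDs, taking the first whitespace-delimited field of each
    non-blank line (assumes a one-ID-per-line chimeras.txt)."""
    return {line.split()[0] for line in lines if line.strip()}


def filter_fasta(lines: Iterable[str], drop_ids: Set[str]) -> Iterator[str]:
    """Yield FASTA lines unchanged, skipping every record whose ID is in drop_ids."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # Header such as ">Sample1_42 ..."; the ID is the first word after ">".
            keep = line[1:].split()[0] not in drop_ids
        if keep:
            yield line
```

For real files you would pass open file handles, e.g. filter_fasta(open("seqs.fna"), load_chimera_ids(open("chimeras.txt"))), and write the yielded lines to a new FASTA file that then goes into OTU picking.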

I don't think your reads would be reverse-oriented after using join paired ends. Is there some clue that alerted you to the possibility of some of them being reversed?

Colin Brislawn

P.S. "rare biosphere" Good luck!

Kate Blackwell

Feb 26, 2016, 1:39:56 PM

Thanks!

Based on the literature I was reading, I wasn't sure whether reverse-oriented sequences would still be present after using join paired ends. I couldn't really find a clear answer.

I decided to use USearch61; however, would de novo or reference-based chimera checking make more sense? The samples I am working with are deep-sea sediment and water samples and are thus not yet well represented in the databases, especially since I am focusing on the "rare biosphere." If a reference database is used, should the same database be used when picking OTUs?

Thanks!

Colin Brislawn

Feb 26, 2016, 2:55:26 PM

I don't think many of your reads will be reversed. After all, the forward and reverse primers are unique, so reads in each direction should end up in the correct files.
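If the primer sequences are still present in the joined reads, one rough way to test this is to count how many reads contain the forward primer as written versus its reverse complement: a read in the expected orientation carries the former, a flipped read the latter. A minimal Python sketch follows. The 515F primer in it is only an illustration (substitute the forward primer actually used), and in many Illumina amplicon protocols the primers are not part of the sequenced reads at all, in which case nearly everything lands in "neither" and the check tells you nothing.

```python
import re

# Regex character class for each IUPAC nucleotide code, for degenerate primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str):
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def orientation_counts(reads, fwd_primer: str) -> dict:
    """Count reads containing the forward primer, its reverse complement, or neither."""
    fwd = primer_regex(fwd_primer)
    rev = primer_regex(revcomp(fwd_primer))
    counts = {"forward": 0, "reverse": 0, "neither": 0}
    for read in reads:
        read = read.upper()
        if fwd.search(read):
            counts["forward"] += 1
        elif rev.search(read):
            counts["reverse"] += 1
        else:
            counts["neither"] += 1
    return counts


# Example forward primer (515F); replace with the primer used in your own study.
FWD_515F = "GTGCCAGCMGCCGCGGTAA"
```

A large "reverse" count relative to "forward" would be the signal that reverse-strand matching is worth enabling.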

I'm a fan of de novo methods, including uchime-denovo, because I don't trust the completeness of the reference databases. UCHIME de novo fully sidesteps this problem. Robert Edgar, the maker of usearch, currently recommends using the same database for uchime-ref as for taxonomy assignment. Because this environment is poorly understood, I think de novo methods would be a good fit for many parts of this project.
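Why can de novo checking work without a reference? Roughly, it looks for a chimera's parents among the more abundant sequences in the sample itself, on the reasoning that a chimera forms from two real templates during PCR and is usually rarer than either of them. The Python below is a toy illustration of that abundance-skew idea, not UCHIME. It assumes equal-length, already aligned sequences; the 2x abundance requirement (abskew) is an assumption mirroring what I understand to be UCHIME's default; and it simply asks whether some "prefix of A + suffix of B" model matches the query clearly better than any single parent does.

```python
from itertools import permutations


def matches(a: str, b: str) -> int:
    """Number of identical positions between two equal-length, aligned sequences."""
    return sum(x == y for x, y in zip(a, b))


def is_chimera_toy(query, query_abundance, candidates, abskew=2.0, min_gain=2):
    """Toy abundance-skew chimera check (not UCHIME).

    candidates: (sequence, abundance) pairs from the same sample.
    Only sequences at least `abskew` times more abundant than the query may be parents.
    Returns True if the best two-parent model (prefix of A + suffix of B) matches the
    query at `min_gain` or more positions more than the best single parent does.
    """
    parents = [seq for seq, n in candidates
               if n >= abskew * query_abundance and len(seq) == len(query)]
    if len(parents) < 2:
        return False
    best_single = max(matches(query, p) for p in parents)
    best_pair = 0
    for a, b in permutations(parents, 2):
        for cut in range(1, len(query)):
            best_pair = max(best_pair, matches(query, a[:cut] + b[cut:]))
    return best_pair - best_single >= min_gain
```

Real implementations score proper alignments, tolerate differences in both segments, and apply further thresholds, so treat this only as a picture of the logic: no reference database is consulted, only the sample's own abundant sequences.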

Great questions!

Keep in touch,

Colin Brislawn

Kate Blackwell

Feb 26, 2016, 4:01:48 PM

Thanks once again for all of your advice! Is there a way to tell how complete a reference database may be for specific samples? Is it as simple as comparing the resulting chimeras from de novo and a reference database?

Kate Blackwell

Feb 26, 2016, 4:30:40 PM

To clarify, I am using USearch61 when I pick OTUs and BLAST when I assign taxonomy. Is USearch61 the same as UCHIME, and does using BLAST instead of USearch61 to assign taxonomy present an issue? I would still be using the same reference database when checking for chimeras, picking OTUs, and assigning taxonomy. I read on the QIIME website that USearch61 should be used to pick OTUs, but I did not see any such requirement for assigning taxonomy.

Thanks!

Colin Brislawn

Feb 26, 2016, 4:49:36 PM

Hello Kate,

One thing you could do is align your reads to your reference database and see what percentage align at a given threshold. For example, choose a threshold of 97% (a common minimum for OTU picking) and see what percentage of your reads are within 97% identity of anything in your database. In a well-characterized community, maybe 70-90% of the reads will be within 97%. In less-studied communities, maybe 20-50% of reads will closely match the database.

The log files from open-reference OTU picking will tell you this: they show how many reads matched the reference during the first (closed-reference) step and how many did not match and were subsequently clustered de novo in the second step.
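If you run the alignment yourself, for example with usearch or vsearch writing a BLAST-style tabular file (usearch's -blast6out or vsearch's --blast6out option), the percentage is easy to compute. A minimal Python sketch, assuming the standard tab-separated layout where column 1 is the read ID and column 3 is percent identity. Reads with no hit are absent from such a file, so the total number of reads has to be supplied separately. The function name is hypothetical.

```python
def fraction_within(hit_lines, total_reads, min_identity=97.0):
    """Fraction of all reads whose best hit has >= min_identity percent identity.

    hit_lines: tab-separated BLAST-style (outfmt 6) rows; column 1 = read ID,
    column 3 = percent identity. Reads with no hit simply do not appear.
    """
    best = {}
    for line in hit_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        read_id, identity = fields[0], float(fields[2])
        # Keep only the best identity seen for each read.
        best[read_id] = max(identity, best.get(read_id, 0.0))
    matched = sum(1 for ident in best.values() if ident >= min_identity)
    return matched / total_reads
```

Running it at a few thresholds (97, 94, 90) gives a quick sense of how far your community sits from the reference.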

Colin

Colin Brislawn

Feb 26, 2016, 5:22:49 PM

Haha, oh, I think the overlapping names of programs and algorithms are confusing us. We have to start with the guy who made every single program you just listed, except BLAST.

Meet Robert Edgar. After making the Multiple Sequence Aligner (MSA) called MUSCLE, he gets a ludicrous number of citations, declares MSA 'a dead field,' then starts a company.

http://www.drive5.com/muscle/

https://robertedgar.wordpress.com/2010/05/02/multiple-protein-alignment-is-a-dead-field/

After selling the company with enough money to retire, he returns to the field of bioinformatics. Because multiple sequence alignment is still dead, he focuses on clustering with an algorithm called 'uclust.' At this point, both the algorithm and the software are called uclust. The HMP teams up with Robert Edgar to make standardized pipelines for amplicon analysis, built around his uclust software. These pipelines eventually become QIIME, and an old version of uclust still ships with QIIME. You can find it by typing 'uclust' in any terminal with QIIME loaded.

His research expands to include search, with a program called 'usearch' that includes clustering (uclust), search (usearch and ublast), chimera checking (uchime), and much more. All these features are part of usearch 6.1. In usearch 7, he adds specialized OTU clustering (uparse). In 8.1, he adds taxonomy assignment (utax). I think he's now working on chimera checking again.

When you call pick_otus.py -m uclust you are clustering OTUs with the uclust algorithm as implemented in uclust 1.2.22q.

When you call pick_otus.py -m usearch you are clustering OTUs with the uclust algorithm as implemented in usearch 5.

When you call pick_otus.py -m usearch61 you are clustering OTUs with the uclust algorithm as implemented in usearch 6.1.
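All three commands rest on the same basic idea, greedy centroid clustering: sequences are processed in order, each one joins an existing cluster if it is within the identity threshold of that cluster's centroid, and otherwise it becomes the centroid of a new cluster. Below is a toy Python version that illustrates only that idea. It assumes equal-length, pre-aligned sequences and uses plain position-by-position identity, whereas real uclust uses k-mer indexing, proper alignments, and its own rules for which centroids are tried, so it will not reproduce uclust's OTUs. The function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Toy greedy centroid clustering in the spirit of uclust (not uclust itself).

    Each sequence joins the first existing cluster whose centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new cluster.
    """
    clusters = []  # list of (centroid, members) pairs
    for seq in seqs:
        for centroid, members in clusters:
            if len(seq) == len(centroid) and identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters
```

Because whichever sequence is seen first becomes a centroid, the result depends on input order; note also that nothing here checks for chimeras.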

None of these methods include chimera checking. You should do that in a separate first step, as shown here: http://qiime.org/tutorials/chimera_checking.html#usearch-6-1

I hope that helps!

Colin Brislawn

P.S. There now exists a clean-room implementation of usearch, called vsearch. It's non-heuristic (more precise), can handle larger data sets, and is open source.

It's great to use and the devs are nice. https://github.com/torognes/vsearch

Kate Blackwell

Apr 19, 2016, 2:51:17 PM

Thanks! Your explanation helped immensely.
