dada2 to qiime taxa info

Brittany

Jun 8, 2017, 5:42:17 PM

to Qiime 1 Forum

Hello Everyone!

I would like to use dada2 to clean and process my 16S V4 MiSeq reads. My understanding is that dada2 as implemented in QIIME cannot merge paired-end reads. So instead I will use standalone dada2, which lets me use both my forward and reverse reads and merge them. I would then like to import these results into QIIME for downstream analyses. This tutorial (https://github.com/johnchase/amplicon-pipeline/blob/master/dada2-for-qiime.md) describes well how to convert the dada2 output to biom for QIIME.
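
The tutorial's exact conversion steps may differ, but the core idea of a "rep_set.fna" equivalent is to give each dada2 sequence a short ID. As a rough sketch, assuming the column names of the dada2 sequence table (the ASV sequences themselves) have already been exported into a Python list, the FASTA text and a sequence-to-ID map could be built like this. `build_rep_set` is a hypothetical helper, not part of dada2 or QIIME, and numbering from 0 is an arbitrary choice.

```python
def build_rep_set(asv_seqs):
    """Assign a numeric ID to each distinct ASV sequence (hypothetical helper).

    Returns (fasta_text, seq_to_id): fasta_text is rep_set.fna-style FASTA
    and seq_to_id maps each sequence to the ID it was given.
    """
    seq_to_id = {}
    lines = []
    for seq in asv_seqs:
        if seq in seq_to_id:
            continue  # each distinct sequence gets exactly one ID
        otu_id = str(len(seq_to_id))  # IDs 0, 1, 2, ... in input order
        seq_to_id[seq] = otu_id
        lines.append(f">{otu_id}\n{seq}\n")
    return "".join(lines), seq_to_id
```

The FASTA text can then be written to "rep_set.fna". Keeping `seq_to_id` around is useful because a biom table built from the same dada2 output may still use the sequences themselves as OTU IDs.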

What is not clear to me is how and when it is best to add taxonomy information.

- Option 1: assign taxonomy in dada2 and then add it with "biom add-metadata". If I use this option, I am not sure what format the taxonomy information needs to be in. Should it be a single string, with ranks separated by ";" and each rank prefixed with its level designation? For example: k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus

- Option 2: assign taxonomy in QIIME using the "rep_set.fna" equivalent that I generate from the dada2 output by following the tutorial mentioned above: https://github.com/johnchase/amplicon-pipeline/blob/master/dada2-for-qiime.md
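
For Option 1, dada2's assignTaxonomy returns one column per rank (Kingdom, Phylum, Class, ...), with NA where a rank could not be assigned. A minimal sketch of turning one such row into the single-string format from the example above, assuming the rank names have already been pulled into a Python list in kingdom-first order; `to_qiime_taxonomy` and the prefix list are illustrative assumptions, not a dada2 or biom API:

```python
# Greengenes/QIIME-style rank prefixes, kingdom through species (assumed order).
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def to_qiime_taxonomy(ranks):
    """Join kingdom-first rank names into one 'k__...;p__...' string.

    None or "NA" marks an unassigned rank and is written as a bare prefix
    (e.g. 'g__'). Only as many ranks as are given are emitted.
    """
    if len(ranks) > len(RANK_PREFIXES):
        raise ValueError("more ranks than known prefixes")
    parts = []
    for prefix, name in zip(RANK_PREFIXES, ranks):
        if name is None or name == "NA":
            name = ""
        parts.append(prefix + name)
    return ";".join(parts)
```

For example, `to_qiime_taxonomy(["Bacteria", "Firmicutes", "Bacilli", "Bacillales", "Bacillaceae", "Bacillus"])` gives the exact string in the question.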

Any thoughts are greatly appreciated!

Thank you!

Brittany

Colin Brislawn

Jun 8, 2017, 6:09:22 PM

to Qiime 1 Forum

Hello Brittany,

This is a great question. Both options are valid. The question is whether you want to use the dada2 taxonomy assignment algorithm or one of the algorithms implemented in qiime (the qiime default is -m uclust, which assigns the lowest common ancestor (LCA) of the top hits from a uclust search).

The 'biom add-metadata' command is pretty flexible. You can read about the expected format here:

http://biom-format.org/documentation/adding_metadata.html#adding-sample-and-observation-metadata-to-biom-files
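
As a minimal sketch of the kind of input that page describes, the observation metadata can be a tab-separated file with one row per OTU ID and a taxonomy column. The "#OTUID" header line below is an assumption based on that documentation's examples; check the page for whether your biom version expects a header line or an explicit header option instead. `observation_metadata_tsv` is a hypothetical helper:

```python
def observation_metadata_tsv(taxonomy_by_id):
    """Build tab-separated observation metadata: '#OTUID<TAB>taxonomy' rows."""
    rows = ["#OTUID\ttaxonomy"]  # header line (assumed format, see docs)
    for otu_id, taxonomy in taxonomy_by_id.items():
        rows.append(f"{otu_id}\t{taxonomy}")
    return "\n".join(rows) + "\n"
```

The taxonomy strings here are the semicolon-separated kind, so the taxonomy column would be declared as semicolon-separated when running "biom add-metadata".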

Colin

Brittany

Jun 9, 2017, 11:23:12 AM

to qiime...@googlegroups.com

Hi Colin,

Thank you so much for the reply! The link that you sent about adding metadata to a biom table is very helpful.

I just want to clarify to make sure I understand the steps correctly if I decide to assign taxa and make the phylogenetic tree in QIIME using the converted dada2 output.

1. Following this tutorial (https://github.com/johnchase/amplicon-pipeline/blob/master/dada2-for-qiime.md), I generate a "rep_set.fna" equivalent and an OTU biom table from the dada2 output.

2. Assign Taxa

a. I can run "assign_taxonomy.py" on the "rep_set.fna" to assign taxonomy to each sequence.

b. Using "biom add-metadata", I can add the taxonomy information to the biom table I generated in step 1.

3. Create Phylogenetic Tree

a. Using the "rep_set.fna", I can run "align_seqs.py".

b. The aligned sequences are then passed to "filter_alignment.py".

c. The filtered alignment can then be used in "make_phylogeny.py" to build the tree.
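
The steps above can be sketched as a list of command lines. This only builds the commands; it does not run them. The script names come from the steps, but the output directory names and derived file names (e.g. "rep_set_aligned.fasta", "rep_set_aligned_pfiltered.fasta") are assumptions based on QIIME 1 naming defaults, and the "--observation-header" value is a guess (the uclust assignments file also carries extra columns), so verify everything against each script's -h output and the biom-format documentation. `qiime1_commands` is a hypothetical helper:

```python
import os


def qiime1_commands(rep_set="rep_set.fna", otu_table="otu_table.biom"):
    """Return the commands for steps 2a, 2b and 3a-c, in order (not run here)."""
    base = os.path.splitext(os.path.basename(rep_set))[0]  # e.g. "rep_set"
    tax = f"taxonomy/{base}_tax_assignments.txt"  # assumed output name
    aligned = f"aligned/{base}_aligned.fasta"  # assumed output name
    filtered = f"filtered/{base}_aligned_pfiltered.fasta"  # assumed output name
    return [
        # 2a. assign taxonomy (uclust is the QIIME 1 default method)
        ["assign_taxonomy.py", "-i", rep_set, "-m", "uclust", "-o", "taxonomy"],
        # 2b. attach taxonomy to the OTU table as observation metadata
        ["biom", "add-metadata", "-i", otu_table, "-o", "otu_table_w_tax.biom",
         "--observation-metadata-fp", tax, "--sc-separated", "taxonomy",
         "--observation-header", "OTUID,taxonomy"],
        # 3a. align the representative sequences
        ["align_seqs.py", "-i", rep_set, "-o", "aligned"],
        # 3b. filter gappy/variable alignment columns
        ["filter_alignment.py", "-i", aligned, "-o", "filtered"],
        # 3c. build the tree from the filtered alignment
        ["make_phylogeny.py", "-i", filtered, "-o", "rep_set.tre"],
    ]
```

In a working QIIME 1 environment, each list could be passed to `subprocess.run(cmd, check=True)` in order.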

Thanks for looking this over!

Brittany

Colin Brislawn

Jun 9, 2017, 11:58:50 AM

to Qiime 1 Forum

Good morning Brittany,

This is correct! The method you describe here is accurate and thorough.

Here is how I ran these two steps on a past project. Please see scripts 7) for taxonomy and 8) for tree building.

https://github.com/pnnl/bernstein-2016-productivity-and-diversity/tree/master/analysis/scripts

There are multiple reasonable ways to do this, so my scripts are just one example.

Let me know if you have other questions,

Colin

Brittany

Jun 9, 2017, 12:29:49 PM

to Qiime 1 Forum

Hi Colin,

Thank you for the scripts and your help! :) I will be sure to be in touch if I have any questions.

Have a great weekend!

Brittany

Jun 13, 2017, 2:54:28 PM

to Qiime 1 Forum

Hi Colin,

Thank you for your scripts and advice; bringing dada2 into QIIME is going pretty well. I have just run into one issue.

The rep_set.fna file that I create from the dada2 output by following the tutorial contains each sequence number and the sequence itself. I can then assign taxonomy with parallel_assign_taxonomy_uclust.py and get the taxonomic assignment for each sequence number.

However, the biom table I created from the dada2 output by following the same tutorial uses the actual sequence as the OTU identifier, not the sequence number. Therefore, I cannot directly add the taxonomy to the OTU table as metadata: the taxonomy assignments are keyed by sequence number, while the table's only identifier is the sequence itself.

I am working on ways to change the OTU table to list the sequence number rather than the sequence, but thought I would ask in case you have run into and solved this problem before.

Thanks!

Brittany

Jun 13, 2017, 4:32:41 PM

to Qiime 1 Forum

I just wanted to send a quick update that I found a workaround. I reformatted the rep_set.fna file using awk so that each row had the sequence number followed by the sequence itself. I then merged this with my OTU table in R and reformatted the result so that the sequence numbers became the OTU IDs. The taxonomy information could then be added to this new OTU table.
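
The same relabeling can be sketched in Python instead of awk and R. This assumes the biom table has first been converted to a classic tab-separated OTU table (for example with "biom convert ... --to-tsv") whose first column holds the sequences, that comment/header lines start with "#", and that the sequences match the rep_set.fna records exactly (same case, no trimming). The relabeled table would then be converted back to biom before adding metadata. `relabel_otu_table` is a hypothetical helper, not part of QIIME or biom:

```python
def relabel_otu_table(tsv_text, fasta_text):
    """Replace sequence OTU IDs in a tab-separated OTU table with rep_set IDs."""
    # Build {sequence: ID} from the FASTA; handles sequences wrapped over lines.
    seq_to_id, current_id, chunks = {}, None, []
    for line in fasta_text.splitlines() + [">"]:  # sentinel flushes last record
        if line.startswith(">"):
            if current_id is not None:
                seq_to_id["".join(chunks)] = current_id
            header = line[1:].strip()
            current_id = header.split()[0] if header else None
            chunks = []
        elif line.strip():
            chunks.append(line.strip())

    out = []
    for line in tsv_text.splitlines():
        if line.startswith("#") or not line.strip():
            out.append(line)  # keep header/comment lines unchanged
            continue
        seq, rest = line.split("\t", 1)
        if seq not in seq_to_id:
            raise KeyError(f"sequence not found in rep set: {seq[:20]}")
        out.append(seq_to_id[seq] + "\t" + rest)
    return "\n".join(out) + "\n"
```

Raising on an unmatched sequence, rather than silently keeping it, makes any mismatch between the two files obvious before taxonomy is attached.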

Thanks!

Britt

Colin Brislawn

Jun 13, 2017, 5:56:12 PM

to Qiime 1 Forum

Excellent! I'm glad you found this workaround and shared it so that other qiime users can make use of it.