Import UPARSE into Qiime virtual box

baoanxh2006

Dec 22, 2015, 7:00:49 PM

to Qiime 1 Forum

Dear Qiime User!

I am using the Qiime virtual box, version 1.9.1, and have also installed usearch (v5) and usearch61 to analyse my 16S (V3-V4 regions) paired-end Illumina data.

I have tried several approaches with Qiime alone, usearch and usearch61: each one produced a different number of OTUs and a different percentage of unassigned sequences (up to 12%).

I want to reduce the percentage of unassigned sequences and get a reasonable number of OTUs.

(P.S.: my sequencing center sent me a report with a reasonable number of OTUs and very few unassigned sequences (0.1%), but I want to learn the analysis myself, and I also need the OTU .biom table and a phylogenetic tree for downstream analysis. They used the UPARSE pipeline.)

Therefore, I want to try to import UPARSE into Qiime following this post (https://groups.google.com/d/msg/qiime-forum/zqmvpnZe26g/ksFmMwDHPi8J) by Mike Robeson.

1) Using usearch61, I followed that process smoothly until the step that calls a Python script:

python fasta_number.py seqs.filtered.derep.mc2.repset.nochimeras.fasta OTU_ > seqs.filtered.derep.mc2.repset.nochimeras.OTUs.fasta

It fails with this error:

python: can't open file 'fasta_number.py': [Errno 2] No such file or directory

Is it possible to run this process in the Qiime virtual box? If yes, how can I call the Python script here?

2) I have not used UPARSE before, so please advise me: (a) how do I prepare otu.map.uc? (b) can I do the step "modified the function 'GetSampleID' in the script 'uc2otutab.py' and renamed the script 'uc2otutab_mod.py'" in the Qiime virtual box, and how?

3) Most important: how can I produce a phylogenetic tree for downstream diversity analysis?

I am looking forward to hearing from you!

Thank you very much!

Kind regards,

An

Colin Brislawn

Dec 22, 2015, 7:41:33 PM

to Qiime 1 Forum

Hello An,

Which version of uparse are you using? The new versions make this whole process really easy. (Like, way easier than it was two years ago when this post was written.)

Take a look at the current UPARSE pipeline:

http://drive5.com/usearch/manual/uparse_pipeline.html

Once you get an OTU table and rep_set.fna file, we can tackle treebuilding.

Colin

baoanxh2006

Dec 22, 2015, 10:17:17 PM

to Qiime 1 Forum

Hello Colin,

Do you mean usearch version?

There are only two supported versions of usearch: usearch (v5) and usearch61 in my qiime virtual box.

I have checked the UPARSE pipeline you recommended.

Is this pipeline (http://drive5.com/usearch/manual/upp_ill_pe.html) for my illumina paired-end reads?

1) usearch -fastq_mergepairs *_R1_*.fastq -relabel @ -fastqout merged.fq
2) usearch -fastq_filter merged.fq -fastq_maxee 1.0 -relabel Filt -fastaout filtered.fa
3) usearch -derep_fulllength filtered.fa -relabel Uniq -sizeout -fastaout uniques.fa
4) usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu
5) usearch -usearch_global merged.fq -db otus.fa -strand plus -id 0.97 \
-otutabout otutab.txt -biomout otutab.json

Can I follow this pipeline using usearch (v5) or usearch61?

If yes, can I skip step 1 and instead use the reads that I have already merged and primer-trimmed in Qiime?

Is uniques.fa equal to rep_set.fna?

In step 5, what are otutab.txt and otutab.json?

If not, can I install an independent updated version of usearch in qiime virtual box to follow this pipeline?

Sorry if I misunderstood the process.

Please advise me.

Colin Brislawn

Dec 23, 2015, 1:15:00 AM

to Qiime 1 Forum

Hello An,

Ah yes, all the different versions of usearch.

The maker of USEARCH (and thus, the uparse pipeline) provides documentation for the newest version of his software, which is now 8.1. Those steps probably will not work on earlier versions.

Instead of using only uparse or only qiime, I would suggest jumping back and forth between these two pipelines. How about this:

Use qiime to join your paired ends: join_paired_ends.py
Use qiime to demultiplex and quality filter your reads: split_libraries_fastq.py. The output of this will be a seqs.fna file (equivalent to filtered.fa).
usearch -derep_fulllength
usearch -cluster_otus
usearch -usearch_global

You now have a .biom table! You can then move on to tree building and taxonomy assignment.

To answer your specific questions:

rep_set.fna (from qiime) is your OTU centroids and is equal to otus.fa (from uparse)

otutab.txt and otutab.json are two copies of your OTU table (also called a feature-abundance table sometimes). The .txt one is a flat text format and the .json one is in json format. The data is the same, but some programs prefer one to the other.

Colin

baoanxh2006

Dec 23, 2015, 4:08:18 AM

to Qiime 1 Forum

Hello Colin,

Thank you very much for your prompt and useful advice!

My plan is to follow your recommendation.

Just to be clear, I need to install the newest version of usearch to do it, right? Can I install usearch version 8.1 in the Qiime virtual box?

Many thanks,

Kind regards,

An

Colin Brislawn

Dec 23, 2015, 11:40:14 AM

to Qiime 1 Forum

Hello An,

Yep, I would install the newest version of usearch to use those commands.

Alternatively, you could try installing vsearch, an open source, 64-bit alternative to usearch. I really like using open source software for science (so I know what's going on) and have had a consistently good experience with vsearch and its devs.

https://github.com/torognes/vsearch

If you choose to use vsearch like I do, the commands would be

vsearch -derep_fulllength -minuniquesize 2

vsearch -cluster_size

vsearch -usearch_global

So basically the same.
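
To make the first of those commands a bit less of a black box, here is a minimal Python sketch of what dereplication with a minimum unique size does conceptually. It is not the vsearch or usearch code; the function name `dereplicate`, the `Uniq` prefix and the case-insensitive comparison are assumptions made for illustration.

```python
from collections import Counter


def dereplicate(seqs, min_unique_size=1, prefix="Uniq"):
    """Collapse identical sequences and record how often each one occurred.

    Returns (label, sequence) pairs sorted by decreasing abundance, with the
    abundance written as a ';size=N;' annotation in the label.
    """
    # Assumption: sequences are compared case-insensitively.
    counts = Counter(s.upper() for s in seqs)
    kept = [(seq, n) for seq, n in counts.most_common() if n >= min_unique_size]
    records = []
    for i, (seq, n) in enumerate(kept, start=1):
        records.append(("{}{};size={};".format(prefix, i, n), seq))
    return records


# Toy reads: one sequence seen three times, one twice, one singleton.
reads = ["ACGT", "ACGT", "acgt", "TTGA", "TTGA", "GGCC"]
for label, seq in dereplicate(reads, min_unique_size=2):
    print(">{}\n{}".format(label, seq))
```

With `min_unique_size=2` the singleton is dropped before clustering, which is the same idea as `-minsize 2` in the usearch commands earlier in the thread.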

Colin

baoanxh2006

Dec 23, 2015, 6:43:21 PM

to Qiime 1 Forum

Many thanks Colin,

I will try both.

Kind regards,

Hop

Colin Brislawn

Dec 23, 2015, 6:49:03 PM

to Qiime 1 Forum

Let me know how that works for you.

I should mention that vsearch will not perform chimera checking by itself (the way uparse/usearch will). So I would add chimera checking between clustering and read mapping. This will do wonders to suppress OTU inflation.

vsearch -cluster_size

vsearch -uchime_denovo

vsearch -usearch_global
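
For anyone scripting this, the order of the steps could be written down roughly as in the Python sketch below. It only builds and prints the command lines; it does not run vsearch. All file names are placeholders, the helper `vsearch_steps` is hypothetical, and the options shown should be checked against `vsearch --help` for the installed version, since older releases may not have all of them.

```python
def vsearch_steps(reads="seqs.fna", identity=0.97):
    """Return the vsearch command lines in the order they would be run."""
    ident = str(identity)
    return [
        # 1. Collapse identical reads, keep abundances, drop singletons.
        ["vsearch", "--derep_fulllength", reads, "--sizeout",
         "--minuniquesize", "2", "--output", "uniques.fa"],
        # 2. Cluster the unique reads, most abundant first.
        ["vsearch", "--cluster_size", "uniques.fa", "--id", ident,
         "--sizein", "--sizeout", "--centroids", "centroids.fa"],
        # 3. De novo chimera check on the centroids (the added step).
        ["vsearch", "--uchime_denovo", "centroids.fa",
         "--nonchimeras", "otus.fa"],
        # 4. Map all reads back to the chimera-free OTUs to build the table.
        ["vsearch", "--usearch_global", reads, "--db", "otus.fa",
         "--id", ident, "--otutabout", "otutab.txt"],
    ]


for cmd in vsearch_steps():
    print(" ".join(cmd))
```

Keeping the commands as lists makes it easy to pass each one to `subprocess.check_call` later, once the file names match your own data.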

Happy Holidays,

Colin

baoanxh2006

Dec 24, 2015, 2:19:35 AM

to Qiime 1 Forum

Hello Colin,

Thank you very much for your prompt support and kind wishes!

I just tried UPARSE with the newest version of usearch (v8.1), using the steps below.

In step 1, I relabeled the samples by renaming the directories and created seqs.fastq, with no quality filtering applied (q = 0).

Filtering is then done with fastq_filter in step 2.

1) multiple_split_libraries_fastq.py -i joined_seqs_no_primers_test/ -o usearch81/split_lib_out/ -p split_libraries_parameters.txt --read_indicator reads --include_input_dir_path --remove_filepath_in_name

2) usearch8.1 -fastq_filter split_lib_out/seqs.fastq -fastq_maxee 0.5 -fastq_trunclen 240 -fastaout filtered.fa

3) usearch8.1 -derep_fulllength filtered.fa -sizeout -fastaout uniques.fa

4) usearch8.1 -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu

5) usearch8.1 -usearch_global filtered.fa -db otus.fa -strand plus -id 0.97 -otutabout otutab.txt

6) biom convert --table-type "OTU table" -i otutab.txt -o otutab.biom --to-hdf5

7) assign_taxonomy.py -i otus.fa

8) biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp uclust_assigned_taxonomy/otus_tax_assignments.txt -i otutab.biom -o otutab_tax.biom

I ran into a problem at step 5, creating the OTU table. For example, if I label the samples D01, D02 and D03, it creates a weird OTU table (otutab1.txt) in which every sequence ID becomes its own sample column. It is too big to attach, so I shared it here (https://drive.google.com/folderview?id=0B1i31VHcJuE7STNEdDFiWjMyTDg&usp=sharing).

If I label the samples D11.0115, D12.0715 and D11.0815, it creates otutab.txt with only two samples: D11 and D12.

I have spent time looking for a solution, but without success so far.

Could you please take a look?

Merry Christmas and wish you a very happy new year!

Kind regards,

An

otutab.txt

Colin Brislawn

Dec 24, 2015, 6:51:17 PM

to Qiime 1 Forum

Hello An,

I think there is some issue with how the qiime script labels samples and how uparse expects these samples to be labeled.

uparse: http://drive5.com/usearch/manual/upp_labels_sample.html

qiime: http://qiime.org/scripts/add_qiime_labels.html

I think uparse expects Sample1.222 while qiime expects Sample1_222.

Can you help me verify this? Use this command to print the top 10 lines of your otutab1.txt file, then post those lines here:

head -n 10 otutab1.txt

Colin

baoanxh2006

Dec 26, 2015, 9:59:07 PM

to Qiime 1 Forum

Hello Colin,

I am so sorry for this late reply!

So happy to see your email! Many thanks.

When I run the command "head -n 10 otutab1.txt", it behaves strangely: a lot of data scrolls through the terminal.

I took screenshots while it was running (head n10_otutab1txt1) and at the end (head n10_otutab1txt2); they are attached.

Here is result when I run "head -n 10 otutab.txt" with otutab.txt file:

#OTU ID    D11          D12
Otu206     127          27
Otu18      3480         380
Otu17      3538         1278
Otu2       1.738e+04    7135
Otu13      4764         1772
Otu1122    240          80
Otu5       1.284e+04    6529
Otu12      6154         679
Otu26      5722         2489
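
For checking a table like this outside the terminal, a small Python sketch along these lines could load it into a dictionary. The function name `read_otu_table` is made up for this example, and it assumes that sample IDs and OTU IDs contain no whitespace.

```python
import io


def read_otu_table(handle):
    """Parse a whitespace-delimited OTU table into {otu_id: {sample: count}}."""
    header = None
    table = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#OTU ID"):
            # Everything after '#OTU ID' on the header line is a sample ID.
            header = line[len("#OTU ID"):].split()
            continue
        if header is None:
            raise ValueError("data row found before the '#OTU ID' header")
        fields = line.split()
        otu_id, values = fields[0], fields[1:]
        # Large counts may be printed in scientific notation (1.738e+04),
        # so go through float() before converting to an integer.
        table[otu_id] = {s: int(round(float(v))) for s, v in zip(header, values)}
    return table


example = io.StringIO(
    u"#OTU ID\tD11\tD12\n"
    u"Otu206\t127\t27\n"
    u"Otu2\t1.738e+04\t7135\n"
)
table = read_otu_table(example)
print(table["Otu2"])
```

Note that a value such as 1.738e+04 has already been rounded to four significant digits in the text file, so the exact read count cannot be recovered from it.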

I am looking forward to hearing from you!

Kind regards,

Hop

head n10_otutab1txt1.jpg

head n10_otutab1txt2.jpg

baoanxh2006

Dec 26, 2015, 11:37:15 PM

to Qiime 1 Forum

Hello Colin,

I checked the links you posted, particularly this sentence " If sample= is not found, the sample identifier is assumed to start at the beginning of the label and continue to the first character in the label which is not alphanumeric or an underscore".

So when I label the samples only as D01, D02 and D03, it probably cannot tell where the sample ID ends, because the read label contains nothing but alphanumeric characters and underscores.

On the other hand, when I label the samples D11.0115, D12.0715 and D11.0815, the samples D11.0115 and D11.0815 are recognized as the same sample, D11, because the period (.) is the first non-alphanumeric character.
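
The rule quoted above can be sketched in a few lines of Python to see why both labelling schemes misbehave. This is only an illustration of the quoted sentence, not usearch's code; the function name `sample_from_label` is hypothetical, and the assumption that a `sample=` annotation ends at a semicolon is mine.

```python
import re

# Assumption: an explicit annotation looks like 'sample=NAME;' in the label.
_SAMPLE_ANNOTATION = re.compile(r"sample=([^;]+)")
# Leading run of letters, digits and underscores.
_LEADING_ID = re.compile(r"^[A-Za-z0-9_]+")


def sample_from_label(label):
    """Guess the sample ID that would be read from a sequence label."""
    match = _SAMPLE_ANNOTATION.search(label)
    if match:
        return match.group(1)
    match = _LEADING_ID.match(label)
    return match.group(0) if match else ""


# QIIME-style labels are SampleID_ReadNumber.
for label in ["D01_0", "D01_1", "D11.0115_0", "D11.0815_3", "D12.0715_7"]:
    print("{} -> {}".format(label, sample_from_label(label)))
```

Under this rule `D01_0` and `D01_1` are kept whole, because the underscore counts as part of the ID, so every read becomes its own "sample". `D11.0115_0` and `D11.0815_3` are both cut at the period and collapse into `D11`.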

I will continue my work and will update the situation.

Please advise me if you see any problem with my analysis.

Thanks again!

Kind regards,

An

Colin Brislawn

Dec 27, 2015, 12:58:40 PM

to Qiime 1 Forum

Hello Hop,

The file otutab.txt looks OK. But the sample IDs may not be what you want.

"On the other hand, when I label samples by D11.0115, D12.0715, D11.0815. The samples D11.0115 and D11.0815 were recognized as the same sample D11 with (.) as first non-alphanumeric character."

Correct! Very well said.

To avoid this, sample IDs should consist only of letters and numbers. Avoid that period, and you should be fine.
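
If renaming the samples upstream is not convenient, the read labels could also be rewritten before mapping. The Python sketch below is one possible way to do that, under the assumption that QIIME labels look like `SampleID_ReadNumber` and that the mapping step wants `SampleID.ReadNumber` with a purely alphanumeric sample ID. The function name `qiime_to_usearch_label` is hypothetical.

```python
import re


def qiime_to_usearch_label(label):
    """Turn a QIIME read label such as 'D11.0115_42' into 'D110115.42'."""
    # Keep only the first word; QIIME appends extra fields after a space.
    first_word = label.split()[0]
    # The read number follows the last underscore.
    sample, _, read_number = first_word.rpartition("_")
    if not sample:
        raise ValueError("no SampleID_ReadNumber pattern in: " + label)
    # Strip everything that is not a letter or a digit from the sample ID.
    clean_sample = re.sub(r"[^A-Za-z0-9]", "", sample)
    return "{}.{}".format(clean_sample, read_number)


for label in ["D11.0115_42 extra fields", "D11.0815_7", "D01_0"]:
    print(qiime_to_usearch_label(label))
```

Two caveats: stripping characters can make two different sample IDs identical, so check for collisions, and the sample IDs in the mapping file have to be changed in the same way so they still match the OTU table.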

Happy New year!

Colin

baoanxh2006

Dec 29, 2015, 2:40:19 AM

to Qiime 1 Forum

Thanks Colin,

Wish you a very happy new year!

Kind regards,

An

baoanxh2006

Dec 30, 2015, 1:46:34 AM

to Qiime 1 Forum

Hello Colin,

Going back to the phylogenetic tree, I have tested the following commands for making the tree and then running the diversity analysis:

1) align_seqs.py -i otus.fa -o pynast_aligned/

2) filter_alignment.py -i pynast_aligned/otus_aligned.fasta -o pynast_aligned/

3) make_phylogeny.py -i pynast_aligned/otus_aligned_pfiltered.fasta -o rep_tre.tree

4) core_diversity_analyses.py -o core_div_out/ -i otutab_tax.biom -m ../mapping_no_barcodes_1in100_file.txt -t rep_tre.tree -e 110000 --recover_from_failure &

Is the procedure ok?

I am not confident, because the command descriptions in usearch are not as clear as those in Qiime, so I do not know what actually happens. For example, are the sequences in otus.fa (the output of the cluster_otus command) already aligned?

I am looking forward to your advice!

Kind regards,

An

Colin Brislawn

Dec 30, 2015, 12:33:59 PM

to Qiime 1 Forum

Hello An,

Those scripts look good!

otus.fa is your list of OTU centroid sequences created by uparse. In qiime, this file would be called rep_set.fna, but the idea is the same. This file has not undergone any sort of multiple sequence alignment (MSA), which is why we pass it to PyNAST in the first step. After alignment, we remove gappy places in the alignment (step 2) and calculate an approximately maximum-likelihood tree that tries to explain the evolutionary relationship of these organisms (step 3).

A note about step 3: your example output is rep_tre.tree. In qiime, the list of OTUs is called rep_set.fna. Because your OTUs are called otus.fa, otus.tre may be a more consistent name. (This does not matter to the computer, but clear names help me understand my files.)
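
To give a feel for what "removing gappy places" means in step 2, here is a simplified Python sketch. It is a stand-in for illustration only and not what filter_alignment.py actually does internally; the function name `drop_gappy_columns` and the 0.8 threshold are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.8):
    """Remove alignment columns in which too many sequences have a gap.

    alignment maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = []
    for col in range(length):
        # Both '-' and '.' are treated as gap characters here.
        gaps = sum(1 for s in seqs if s[col] in "-.")
        if gaps / float(len(seqs)) <= max_gap_fraction:
            keep.append(col)
    return {name: "".join(s[c] for c in keep) for name, s in zip(names, seqs)}


aligned = {"Otu1": "AC-GT-", "Otu2": "AC-GA-", "Otu3": "ACTGA-"}
for name, seq in sorted(drop_gappy_columns(aligned).items()):
    print("{} {}".format(name, seq))
```

In this toy alignment the last column is all gaps and is removed, while the third column (two gaps out of three) stays because it is under the threshold. The tree in step 3 is then built from the filtered alignment.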

Keep up the good work!
Happy New Year!

Colin

baoanxh2006

Dec 31, 2015, 12:23:50 AM

to Qiime 1 Forum

Hello Colin,

I am glad to see your reply!

Thanks kindly for your advice and best wishes.

Have a great New Year's Eve!

Cheers,

An
