Hi developers and Qiimers!
I am studying bacterial communities from environmental samples. I have received raw and quality-controlled (preprocessed) sequence data from the sequencing company in two batches: 40 samples from MiSeq and 60 samples from HiSeq. I want to obtain a combined OTU table and run downstream analyses in QIIME. I merged the quality-controlled sequences from both platforms outside QIIME and went straight to the pick_open_reference_otus.py script.
The first lines of the merged file (HX16S.fna) look like:
>Y12D01_0
TGGGGATATTGGACAATGG....
>Y12D01_1
TGGGGAATAATTGGACAA...
>Y12D01_2
TGGGGAATTTTGGA...
(12D01 is the name of the first of 100 samples)
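As a sanity check on the merged file, a short Python sketch like the one below could count reads per sample before OTU picking. It assumes every header follows the SampleID_SequenceNumber pattern shown above, so the sample ID is taken as everything before the last underscore (Y12D01 for the first header). The function names are my own for illustration and are not part of QIIME.

```python
from collections import Counter
from typing import Iterable


def sample_id_from_header(header: str) -> str:
    """Return the sample ID from a FASTA header such as '>Y12D01_0'.

    Assumption: labels look like SampleID_SequenceNumber, so the sample ID
    is the text before the LAST underscore.
    """
    parts = header[1:].split()          # drop '>' and any trailing description
    label = parts[0] if parts else ""
    sample_id, sep, _ = label.rpartition("_")
    # If there is no underscore at all, fall back to the whole label.
    return sample_id if sep else label


def count_reads_per_sample(lines: Iterable[str]) -> Counter:
    """Count header lines per sample ID in an iterable of FASTA lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[sample_id_from_header(line.strip())] += 1
    return counts


def count_reads_in_file(path: str) -> Counter:
    """Convenience wrapper that reads a FASTA file from disk."""
    with open(path) as handle:
        return count_reads_per_sample(handle)


# Small in-memory demo using headers in the same style as HX16S.fna.
demo_lines = [">Y12D01_0", "TGGGG", ">Y12D01_1", "TGGGG", ">Y12D01_2", "TGGGG"]
demo_counts = count_reads_per_sample(demo_lines)
print(len(demo_counts), "sample(s) found")
```

Calling count_reads_in_file on the merged file and checking that the result has 100 keys would confirm that all samples from both batches made it into the file, and the per-sample counts would show how uneven the MiSeq and HiSeq read depths are.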
On page 390 of Navas-Molina et al., "Advancing our understanding of the human microbiome using QIIME," Methods in Enzymology 531 (2013): 371, it says:
"QIIME can perform all the steps for generating the OTU table and the phylogenetic tree from the preprocessed data in a single command"....
and
"For open-reference (run time 27 h on 20 processors):
pick_open_reference_otus.py -o $PWD/open_ref_otus -i $PWD/slout/seqs.fna -r $PWD/gg_12_10_otus/rep_set/97_otus.fasta -a -O 20"
I ran a similar command on the combined file (HX16S.fna), assuming the QIIME defaults are uclust and gg_13_8:
pick_open_reference_otus.py -i $PWD/02_Raw/HX16S.fna -o $PWD/03_Open_ref_picked_otus
It produced 6 folders (pynast_aligned_seqs; step1_otus; step2_otus; step3_otus; step4_otus; uclust_assigned_taxonomy) and 10 files (final_otu_map.txt; final_otu_map_mc2.txt; index.html; log file; new_refseqs.fna; otu_table_mc2.biom; otu_table_mc2_w_tax.biom; otu_table_mc2_w_tax_no_pynast_failures.biom; rep_set.fna; rep_set.tree).
1. Does this workflow make sense?
2. In the single command that Navas-Molina et al. (2013) run after preprocessing, at which step are chimeras removed?
I thank you in advance.
Alain