I am using Illumina MiSeq 600-cycle (2x300) paired-end data for the 16S gene, regions V3 and V4. These are environmental samples, so I do not expect all of my reads to be assigned to the genus or species level. I am concerned, though, because when I use all of my reads, including reads that did not join, about 50% of my OTUs are unassigned at any level.

When I removed all of the unjoined reads, my taxonomy plots showed far fewer unassigned OTUs (only 5-10% unassigned), but my counts per sample dropped significantly. In my test data, for example, they dropped from 61,424 to 1,825 for sample A and from 65,243 to 1,886 for sample B (see below). When I then removed chimeric sequences from the data set with unjoined reads removed (using usearch61), there was not a huge change in reads assigned to the genus level (A: 1,825 to 1,559; B: 1,886 to 1,650; see below). On my real data, in some cases only 250 reads were used per sample, whereas the same sample had over 100,000 reads when unassigned reads were included.

I'm wondering if there is something I can change in my workflow to better accommodate my data, or if perhaps I am using something incorrectly. I had originally processed the same data using the Illumina 16S BaseSpace app, and for the same samples significantly more reads were assigned to the genus level (A: 120,000; B: 160,431, which could be unpaired data, I'm not sure).

I tried using SILVA but haven't gotten my parameter file to work properly; the script is killed every time (but I think this is a separate issue, and I'm not trying to address it here unless you think it would give me significantly better results). This is my first analysis using QIIME, so any input would be greatly appreciated.
fastq.py -i $PWD/PairedEndData/ -o $PWD/SplitLib/ --remove_filepath_in_name --include_input_dir_path
pick_de_novo_otus.py -a -O 7 -i $PWD/SplitLib/non_chimeric_seqs.fasta -o $PWD/OTUS/
summarize_taxa_through_plots.py -i $PWD/OTUS/otu_table.biom -m map.txt -o $PWD/SumTaxa
biom convert -i $PWD/OTUS/otu_table.biom -o $PWD/OTUS/otu_table_tabseparated.txt --to-tsv --header-key taxonomy --output-metadata-id "ConsensusLineage"
biom summarize-table -i $PWD/OTUS/otu_table_tabseparated.txt -o $PWD/OTUS/summarized_OTU_table.txt
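For reference, the "about 50% unassigned" figure could be checked directly from the tab-separated table with something like the sketch below. It is only a rough sketch and makes several assumptions: the helper name `unassigned_fraction` is made up, header lines start with `#`, the sample count columns sit between the OTU ID and the final ConsensusLineage column, and unassigned OTUs carry a lineage beginning with "Unassigned".

```shell
# Hypothetical helper: percentage of total counts whose lineage is "Unassigned".
# Assumes the layout written by `biom convert --to-tsv --header-key taxonomy`:
#   OTU ID <TAB> sample counts... <TAB> ConsensusLineage
unassigned_fraction() {
    awk -F'\t' '
        /^#/ { next }                              # skip comment and header lines
        {
            row = 0
            for (i = 2; i < NF; i++) row += $i     # sum the sample columns only
            total += row
            if ($NF ~ /^Unassigned/) unassigned += row
        }
        END {
            if (total > 0) printf "%.1f\n", 100 * unassigned / total
            else print "NA"
        }
    ' "$1"
}
```

It would be called as `unassigned_fraction "$PWD/OTUS/otu_table_tabseparated.txt"`. Note that it weights by read counts, not by number of OTUs, so it may not match the percentages in the taxonomy plots exactly.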
Getting rid of unjoined reads (plus steps ** for removing chimeric reads):
multiple_join_paired_ends.py -i $PWD/IlluminaOutput/ -o $PWD/PairedEndData
find $PWD/PairedEndData/ -name "fastqjoin.un*" -print -exec mv {} Remove_Unjoined/ \;
multiple_split_libraries_fastq.py -i $PWD/PairedEndData/ -o $PWD/SplitLib_RMUnjoin/ --remove_filepath_in_name --include_input_dir_path
**identify_chimeric_seqs.py -m usearch61 -i $PWD/SplitLib_RMUnjoin/seqs.fna --suppress_usearch61_ref -o $PWD/Chimeras_forRMUnJoinSplitLib/
**filter_fasta.py -f $PWD/SplitLib_RMUnjoin/seqs.fna -s $PWD/Chimeras_forRMUnJoinSplitLib/chimeras.txt -n -o
pick_de_novo_otus.py -a -O 7 -i $PWD/SplitLib_RMUnjoin/non_chimeric_seqs.fasta -o $PWD/OTUS_nonchimRMUnjoin/
summarize_taxa_through_plots.py -i $PWD/OTUS_nonchimRMUnjoin/otu_table.biom -m map.txt -o $PWD/SumTaxa_nonchimRMUnjoin
biom convert -i OTUS_nonchimRMUnjoin/otu_table.biom -o OTUS_nonchimRMUnjoin/otu_table_tabseparated.txt --to-tsv --header-key taxonomy --output-metadata-id "ConsensusLineage"
biom summarize-table -i OTUS_nonchimRMUnjoin/otu_table_tabseparated.txt -o OTUS_nonchimRMUnjoin/summarized_OTU_table.txt
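To see how many read pairs actually joined per sample, a quick count along these lines could be run right after the joining step and before the unjoined files are moved away. This is a sketch, not part of QIIME: the function names are made up, and it assumes each sample directory holds `fastqjoin.join.fastq` and `fastqjoin.un1.fastq` with standard four-line FASTQ records.

```shell
# Count reads in a FASTQ file (assumes 4 lines per record, no wrapped sequences).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: print joined vs. unjoined read counts for every
# sample directory under the given root (defaults to $PWD/PairedEndData).
report_join_counts() {
    root="${1:-$PWD/PairedEndData}"
    find "$root" -name "fastqjoin.join.fastq" | sort | while read -r joined; do
        dir=$(dirname "$joined")
        n_join=$(count_fastq_reads "$joined")
        n_un=0
        # un1 holds the forward reads of pairs that failed to join
        [ -f "$dir/fastqjoin.un1.fastq" ] && n_un=$(count_fastq_reads "$dir/fastqjoin.un1.fastq")
        printf '%s\tjoined=%s\tunjoined=%s\n' "$(basename "$dir")" "$n_join" "$n_un"
    done
}
```

Calling `report_join_counts "$PWD/PairedEndData"` prints one line per sample directory, which makes it easier to tell whether the drop in counts happens at the joining step or later.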
Here is my config info:
System information
==================
Platform: linux2
Python version: 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Python executable: /home/envs/qiime1/bin/python
QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:
https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3
Dependency versions
===================
QIIME library version: 1.9.1
QIIME script version: 1.9.1
qiime-default-reference version: 0.1.3
NumPy version: 1.10.4
SciPy version: 0.17.1
pandas version: 0.18.1
matplotlib version: 1.4.3
biom-format version: 2.1.5
h5py version: 2.6.0 (HDF5 version: 1.8.16)
qcli version: 0.1.1
pyqi version: 0.3.2
scikit-bio version: 0.2.3
PyNAST version: 1.2.2
Emperor version: 0.9.51
burrito version: 0.9.1
burrito-fillings version: 0.1.1
sortmerna version: SortMeRNA version 2.0, 29/11/2014
sumaclust version: SUMACLUST Version 1.0.00
swarm version: Swarm 1.2.19 [Mar 1 2016 23:41:10]
gdata: Installed.
QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:
http://qiime.org/install/qiime_config.html
http://qiime.org/tutorials/parallel_qiime.html
blastmat_dir: None
pick_otus_reference_seqs_fp: /home/envs/qiime1/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
sc_queue: all.q
topiaryexplorer_project_dir: None
pynast_template_alignment_fp: /home/envs/qiime1/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta
cluster_jobs_fp: start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /home/envs/qiime1/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
torque_queue: friendlyq
jobs_to_start: 7
slurm_time: None
denoiser_min_per_core: 50
assign_taxonomy_id_to_taxonomy_fp: /home/envs/qiime1/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
temp_dir: /tmp/
slurm_memory: None
slurm_queue: None
blastall_fp: blastall
seconds_to_sleep: 1
-------------------------------------------------------------------------------------------
Here is my count data from each analysis:
OTUS_with_joinPE_and_unjoined/summarized_otu_table.txt
Num samples: 2
Num observations: 106416
Total count: 126667
Table density (fraction of non-zero values): 0.506
Counts/sample summary:
Min: 61424.0
Max: 65243.0
Median: 63333.500
Mean: 63333.500
Std. dev.: 1909.500
Sample Metadata Categories: None provided
Observation Metadata Categories: ConsensusLineage
Counts/sample detail:
A: 61424.0
B: 65243.0
---------------------------------------------------------
Removed_Unjoined_denovOTU/summarized_OTU_table.txt
Num samples: 3
Num observations: 56469
Total count: 67068
Table density (fraction of non-zero values): 0.337
Counts/sample summary:
Min: 1825.0
Max: 63357.0
Median: 1886.000
Mean: 22356.000
Std. dev.: 28992.096
Sample Metadata Categories: None provided
Observation Metadata Categories: ConsensusLineage
Counts/sample detail:
A: 1825.0
B: 1886.0
UnJoin: 63357.0
---------------------------------------------------------
Removed_Unjoined_denovOTU_nonchimericseqs/summarized_OTU_table.txt
Num samples: 3
Num observations: 52084
Total count: 62026
Table density (fraction of non-zero values): 0.337
Counts/sample summary:
Min: 1559.0
Max: 58817.0
Median: 1650.000
Mean: 20675.333
Std. dev.: 26970.257
Sample Metadata Categories: None provided
Observation Metadata Categories: ConsensusLineage
Counts/sample detail:
A: 1559.0
B: 1650.0
UnJoin: 58817.0
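To put the drop in perspective, here is a small sketch that turns the per-sample counts above into the percentage of reads retained at each step. The helper name `percent_retained` is made up for illustration; the numbers are copied from the summaries above.

```shell
# Hypothetical helper: percentage of reads retained between two steps,
# rounded to one decimal place. Arguments: count before, count after.
percent_retained() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", 100 * after / before }'
}

# Removing unjoined reads (all reads -> joined reads only)
echo "A, unjoined removed: $(percent_retained 61424 1825)%"
echo "B, unjoined removed: $(percent_retained 65243 1886)%"

# Removing chimeras from the joined-only data
echo "A, chimeras removed: $(percent_retained 1825 1559)%"
echo "B, chimeras removed: $(percent_retained 1886 1650)%"
```

So only about 3% of the reads in each test sample survive the removal of unjoined reads, while the chimera filter keeps roughly 85% of what is left.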
Thank you for any input!!