Hi QIIME users,
We sequenced two samples targeting the ITS1-2 region on a 454 sequencer, and I'm trying to analyze the data with the QIIME pipeline. After processing, I noticed that more than 80% of the reads in both samples are unidentified at the phylum level:
Taxon                       s1       s2
k__Fungi;p__unidentified    85.31%   80.47%
I used the following commands to process the data:
pick_reference_otus_through_otu_table.py -p $PWD/ITS_params.txt -i $PWD/split_libarary/seqs.fna -r /home/qiime/qiime_software/its_12_11_otus/rep_set/99_otus.fasta -o $PWD/ref_otus
make_otu_table.py -i $PWD/ref_otus/uclust_ref_picked_otus/seqs_otus.txt -o out_new.biom -t ~/qiime_software/its_12_11_otus/taxonomy/otu_taxonomy.txt
summarize_taxa_through_plots.py -i out_new.biom -o taxa_plot -p ~/ITS_params.txt -m $PWD/ITS_mapping.txt -s
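Since make_otu_table.py is given the reference taxonomy file via -t, each OTU in the table carries the taxonomy string of its reference sequence. One quick check is how many reference entries are themselves labeled p__unidentified. The helper below is a hypothetical bash sketch (not part of QIIME) and assumes the taxonomy file is the usual tab-separated "OTU ID, taxonomy string" map. Note that it counts reference OTUs, not reads, so it only hints at whether the database itself is the source of the unidentified calls.

```shell
# Hypothetical helper (not a QIIME script): report how many entries in a
# tab-separated OTU-to-taxonomy map have an unidentified phylum.
count_unidentified_phyla() {
  local taxonomy_file="$1"
  awk -F'\t' '
    NF >= 2 { total++ }                              # count mapped OTUs
    NF >= 2 && $2 ~ /p__unidentified/ { unid++ }     # phylum left unidentified
    END {
      if (total == 0) { print "no entries"; exit 1 }
      printf "%d of %d reference OTUs (%.2f%%) have p__unidentified\n",
        unid + 0, total, 100 * unid / total
    }' "$taxonomy_file"
}
```

For example: count_unidentified_phyla ~/qiime_software/its_12_11_otus/taxonomy/otu_taxonomy.txt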
I'm not sure why such a large percentage of the data ends up unidentified. Is there a way to identify these reads?
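To separate reads that never matched a reference from reads that matched an OTU with an unidentified phylum, it may help to compare the number of input reads with the number of reads listed in the OTU map. This is a hypothetical bash sketch; it assumes the OTU map has one OTU per line, with the OTU ID followed by tab-separated sequence IDs, which is the format pick_otus.py writes.

```shell
# Hypothetical helper (not a QIIME script): compare reads in the input FASTA
# with the reads that were assigned to an OTU in the OTU map.
read_retention() {
  local fasta="$1" otu_map="$2"
  local n_input n_assigned
  n_input=$(grep -c '^>' "$fasta")   # one '>' header per read
  # Every field after the OTU ID is a sequence ID.
  n_assigned=$(awk -F'\t' 'NF > 1 { n += NF - 1 } END { print n + 0 }' "$otu_map")
  echo "input reads: $n_input"
  echo "reads in OTU map: $n_assigned"
  if [ "$n_input" -gt 0 ]; then
    awk -v a="$n_assigned" -v i="$n_input" \
      'BEGIN { printf "retained: %.2f%%\n", 100 * a / i }'
  fi
}
```

For example: read_retention $PWD/split_libarary/seqs.fna $PWD/ref_otus/uclust_ref_picked_otus/seqs_otus.txt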
Looking at the database files, the largest file in the new release is sh_refs_qiime_ver6_99_09.02.2014.fasta at 13M, whereas the files in the previous its_11_12 release are much larger:
36M  97_otus.fasta
59M  99_otus.fasta
I assumed an updated database would be larger, not smaller. Should I just merge the two FASTA files (old and new)?
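File size depends on header and sequence lengths as well as on the number of entries, so counting the sequences in each release may be a more direct comparison. A hypothetical bash sketch that prints the record count next to each FASTA file name:

```shell
# Hypothetical helper (not a QIIME script): print the number of sequences
# (">" header lines) in each FASTA file given on the command line.
count_fasta_records() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$(grep -c '^>' "$f")" "$f"
  done
}
```

For example: count_fasta_records sh_refs_qiime_ver6_99_09.02.2014.fasta 97_otus.fasta 99_otus.fasta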
Many thanks for your time.
Sp