Hi QIIME users,
I recently ran de novo chimera checking with identify_chimeric_seqs.py, using vsearch (renamed as usearch61). It took 48 hours to complete, and in the end I got an error message along with three files:
XXX_consensus_with_abundance.fasta
XXX_consensus_with_abundance.uc
XXX_smallmem_clustered.log
The command I used was identify_chimeric_seqs.py -m usearch61 -i SASA_TMRU_FR_check_seqs.fna -o chimeric_seqs_97_USearch61 --suppress_usearch61_ref
I chose to suppress the reference database because I was concerned that using, for example, my GreenGenes 16S reference database would effectively put me back into a closed-reference OTU picking strategy (which I have already tried, and it wipes out a huge proportion of my sequences). So firstly, I am finding it hard to locate information on the pros and cons of using a reference database during chimera checking. Does anyone have experience comparing the two approaches, any advice, or a good place to look for this information? I'd love to hear.
The error I got was:
Traceback (most recent call last):
File "/macqiime/bin/identify_chimeric_seqs.py", line 4, in <module>
__import__('pkg_resources').run_script('qiime==1.9.0', 'identify_chimeric_seqs.py')
File "/macqiime/lib/python2.7/site-packages/setuptools-12.2-py2.7.egg/pkg_resources/__init__.py", line 698, in run_script
File "/macqiime/lib/python2.7/site-packages/setuptools-12.2-py2.7.egg/pkg_resources/__init__.py", line 1616, in run_script
File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/identify_chimeric_seqs.py", line 354, in <module>
main()
File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/identify_chimeric_seqs.py", line 350, in main
threads=threads)
File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/qiime/identify_chimeric_seqs.py", line 774, in usearch61_chimera_check
log_lines, verbose, threads)
File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/qiime/identify_chimeric_seqs.py", line 894, in identify_chimeras_usearch61
parse_usearch61_clusters(open(output_consensus_uc, "U"))
File "/macqiime/lib/python2.7/site-packages/burrito_fillings-0.1.0-py2.7.egg/bfillings/usearch.py", line 2483, in parse_usearch61_clusters
KeyError: 'denovo45'
Secondly, can anyone help with how to fix this error?
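In case it helps narrow things down, here is a minimal sketch of a check that could be run on XXX_consensus_with_abundance.uc. It assumes the usual tab-separated UC layout (record type in the first column, cluster number in the second) and looks for hit (H) records that refer to a cluster number before any seed (S) record for that cluster has been seen. Whether this is what actually triggers the KeyError is only a guess; the function name find_orphan_hits and the example records are hypothetical.

```python
def find_orphan_hits(uc_lines):
    """Return cluster numbers that have an H record before any S record.

    Assumes tab-separated UC lines: column 0 is the record type
    (S = seed, H = hit, ...) and column 1 is the cluster number.
    """
    seeded = set()
    orphans = []
    for line in uc_lines:
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, cluster = fields[0], fields[1]
        if record_type == "S":
            seeded.add(cluster)
        elif record_type == "H" and cluster not in seeded:
            if cluster not in orphans:
                orphans.append(cluster)
    return orphans


# Made-up records for illustration only (not taken from my real file)
example = [
    "S\t0\t250\t*\t*\t*\t*\t*\tseqA;size=3;\t*\n",
    "H\t0\t250\t99.0\t+\t0\t0\t250M\tseqB;size=1;\tseqA;size=3;\n",
    "H\t45\t250\t98.0\t+\t0\t0\t250M\tseqC;size=1;\tseqD;size=2;\n",
    "S\t45\t250\t*\t*\t*\t*\t*\tseqD;size=2;\t*\n",
]
print(find_orphan_hits(example))
```

On a real file this would be called as find_orphan_hits(open("XXX_consensus_with_abundance.uc")). If cluster 45 shows up in the result, that would at least be consistent with the 'denovo45' key in the traceback.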
I've been trying to work with the XXX_consensus_with_abundance.fasta file: printing all the singletons from it and using that list to remove the corresponding sequences from an OTU-picked, aligned rep_set file. It later occurred to me that this still would not remove the chimeras, so I wanted to ask what I might have done wrong in the chimera checking step, especially given that it takes so long to run (6,187,414 sequences, 3.61 GB). I've also been trying to use ChimeraSlayer, but it has been running for almost a week now.
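For reference, the singleton listing described above can be sketched roughly as follows. This assumes the FASTA headers carry an abundance annotation of the form ";size=N;" (I am assuming that label format here; adjust the pattern if the headers differ). The function name singleton_ids and the example records are made up for illustration.

```python
import re

# Assumed header format: ">some_id;size=3;" (the size annotation is an assumption)
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def singleton_ids(fasta_lines):
    """Return the IDs of FASTA records whose header says size=1."""
    ids = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        match = SIZE_RE.search(header)
        if match and int(match.group(1)) == 1:
            # Keep only the part of the label before the first semicolon
            ids.append(header.split(";")[0])
    return ids


# Made-up records for illustration only
example = [
    ">seqA;size=3;\n", "ACGT\n",
    ">seqB;size=1;\n", "ACGA\n",
    ">seqC;size=1;\n", "ACGG\n",
]
print(singleton_ids(example))
```

This only lists low-abundance clusters; it does not identify chimeras, which is why I don't think it is a substitute for the chimera checking step.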
Thanks for any help or information!