Process halted with large dataset using MUSCLE method

Eng Piew Kok

Sep 20, 2016, 11:31:28 PM
to Qiime 1 Forum
Hi, I want to run a de novo alignment with the MUSCLE method in VirtualBox. It works well on a small test dataset, but the process halts when I run the actual dataset, which has ~27,000 sequences. I ran a simple command, as below:

Command: -i rep_set.fna -m muscle -a muscle 

Traceback (most recent call last):
  File "/usr/local/bin/", line 211, in <module>
  File "/usr/local/bin/", line 208, in main
  File "/usr/local/lib/python2.7/dist-packages/qiime/", line 123, in __call__
    log_path=log_path, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/", line 259, in __call__
    result = self.getResult(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/", line 117, in getResult
    result = module.align_unaligned_seqs(seqs, params=params)
  File "/usr/local/lib/python2.7/dist-packages/bfillings/", line 556, in align_unaligned_seqs
    res = app(int_map.toFasta())
  File "/usr/local/lib/python2.7/dist-packages/burrito/", line 303, in __call__
  File "/usr/local/lib/python2.7/dist-packages/burrito/", line 325, in _handle_app_result_build_failure
    raise ApplicationError("Error constructing CommandLineAppResult.")
burrito.util.ApplicationError: Error constructing CommandLineAppResult.

Could this be a memory issue? How do I solve the problem? Would it help if I increased --muscle_max_memory? Thank you.


Sep 21, 2016, 9:32:22 AM
to Qiime 1 Forum

It is possible that this is a memory issue. Try the following:
1) Increase memory with --muscle_max_memory. This might not work: by default muscle tries to allocate 80% of the available memory, so you are probably already close to the maximum available.
2) If increasing memory does not work, run half the sequences and see if it still fails. If it does, run half of those, and keep halving until you find a number of sequences that you can actually align. That will give you a sense of how "close" you are to a set of sequences you could actually align. If you are too far off from your ~27,000 seqs, then you'll need to find another machine to run the alignment. If you are somewhat close (say, you can align 20K seqs), maybe you can try to filter the input sequences to remove some, using quality filtering, similarity, etc.
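
The halving in step 2 could be scripted roughly as below. This is only a sketch in standalone Python 3, not part of QIIME: the helper names (read_fasta, write_fasta, largest_alignable_by_halving) and the try_align callback are made up for illustration, and you would supply try_align yourself.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new record starts; store the previous one, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, path):
    """Write (header, sequence) tuples to a FASTA file, one line per sequence."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">{0}\n{1}\n".format(header, seq))


def largest_alignable_by_halving(total, try_align):
    """Halve the sequence count until try_align(n) succeeds.

    try_align is a hypothetical user-supplied callable: it should attempt
    the alignment on the first n sequences and return True on success.
    Returns the first count that succeeds, or 0 if none does.
    """
    n = total // 2
    while n > 0:
        if try_align(n):
            return n
        n //= 2
    return 0
```

In practice try_align(n) would write the first n records of rep_set.fna to a temporary file with write_fasta, run the same alignment command on that file (for example through subprocess), and return whether it exited cleanly. Note that this only reports the first halved size that works, so the true limit may lie anywhere between that size and twice that size.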

Hope it helps,
