align_seqs.py process halted with large dataset using MUSCLE method


Eng Piew Kok

Sep 20, 2016, 11:31:28 PM
to Qiime 1 Forum
Hi, I want to run align_seqs.py with the MUSCLE de novo method in VirtualBox. It works well on a small test dataset, but the process halts when I run it on the actual dataset of ~27,000 sequences. I ran a simple command, shown below:

Command:
align_seqs.py -i rep_set.fna -m muscle -a muscle 

Error:
Traceback (most recent call last):
  File "/usr/local/bin/align_seqs.py", line 211, in <module>
    main()
  File "/usr/local/bin/align_seqs.py", line 208, in main
    log_path=log_path)
  File "/usr/local/lib/python2.7/dist-packages/qiime/align_seqs.py", line 123, in __call__
    log_path=log_path, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/util.py", line 259, in __call__
    result = self.getResult(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/align_seqs.py", line 117, in getResult
    result = module.align_unaligned_seqs(seqs, params=params)
  File "/usr/local/lib/python2.7/dist-packages/bfillings/muscle_v38.py", line 556, in align_unaligned_seqs
    res = app(int_map.toFasta())
  File "/usr/local/lib/python2.7/dist-packages/burrito/util.py", line 303, in __call__
    result_paths)
  File "/usr/local/lib/python2.7/dist-packages/burrito/util.py", line 325, in _handle_app_result_build_failure
    raise ApplicationError("Error constructing CommandLineAppResult.")
burrito.util.ApplicationError: Error constructing CommandLineAppResult.

Could this be a memory issue? How do I solve the problem? Would it help if I increased --muscle_max_memory? Thank you.

Jose

Sep 21, 2016, 9:32:22 AM
to Qiime 1 Forum
Hi,

It is possible that this is a memory issue. Try the following:
1) Increase the memory with --muscle_max_memory. This might not work: by default MUSCLE tries to allocate 80% of the available memory, so you are probably already close to the maximum available.
2) If increasing the memory does not work, run half of the sequences and see if it still fails. If it does, run half of those, and keep halving until you find a number of sequences that you can actually align. That will give you a sense of how "close" you are to a set of sequences you could align. If you are too far off from your ~27,000 seqs, you'll need to find another machine to run the alignment. If you are somewhat close (say, you can align 20K seqs), you can try to reduce the input, for example by quality filtering or by removing highly similar sequences.
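
The halving procedure in step 2 could be scripted roughly as follows. This is only a sketch under some assumptions: the helper names (read_fasta, write_fasta, find_alignable_size, align_first_n) and the subset file names are made up for illustration, the subset is simply the first n records of the file, and a zero exit status from align_seqs.py is taken to mean the alignment succeeded. Only the -i and -m options from the command above are used.

```python
import subprocess


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, path):
    """Write (header, sequence) tuples to a FASTA file, one line per sequence."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">%s\n%s\n" % (header, seq))


def find_alignable_size(total, try_align):
    """Halve the sequence count until try_align(n) succeeds.

    Starts at total // 2 because the full set is assumed to have failed
    already. Returns the first n that works, or 0 if none does.
    """
    n = total // 2
    while n > 0:
        if try_align(n):
            return n
        n //= 2
    return 0


def align_first_n(records, n, prefix="subset"):
    """Hypothetical helper: align the first n records with align_seqs.py.

    Assumes align_seqs.py is on the PATH and that exit status 0 means success.
    """
    fasta = "%s_%d.fna" % (prefix, n)
    write_fasta(records[:n], fasta)
    cmd = ["align_seqs.py", "-i", fasta, "-m", "muscle"]
    return subprocess.call(cmd) == 0
```

To use it, load the sequences with records = read_fasta("rep_set.fna") and call find_alignable_size(len(records), lambda n: align_first_n(records, n)). The returned number is the first halved subset size that aligned, which tells you how far you are from the full ~27,000 sequences.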

Hope it helps,
Jose
