align_seqs.py process halted with large dataset using MUSCLE method

Eng Piew Kok

Sep 20, 2016, 11:31:28 PM

to Qiime 1 Forum

Hi, I want to run align_seqs.py with the MUSCLE de novo method in VirtualBox. It works well on a small test dataset, but the process halts when I run the actual dataset, which has ~27,000 sequences. I ran a simple command, shown below:

Command:

align_seqs.py -i rep_set.fna -m muscle -a muscle

Error:

Traceback (most recent call last):
  File "/usr/local/bin/align_seqs.py", line 211, in <module>
    main()
  File "/usr/local/bin/align_seqs.py", line 208, in main
    log_path=log_path)
  File "/usr/local/lib/python2.7/dist-packages/qiime/align_seqs.py", line 123, in __call__
    log_path=log_path, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/util.py", line 259, in __call__
    result = self.getResult(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/qiime/align_seqs.py", line 117, in getResult
    result = module.align_unaligned_seqs(seqs, params=params)
  File "/usr/local/lib/python2.7/dist-packages/bfillings/muscle_v38.py", line 556, in align_unaligned_seqs
    res = app(int_map.toFasta())
  File "/usr/local/lib/python2.7/dist-packages/burrito/util.py", line 303, in __call__
    result_paths)
  File "/usr/local/lib/python2.7/dist-packages/burrito/util.py", line 325, in _handle_app_result_build_failure
    raise ApplicationError("Error constructing CommandLineAppResult.")
burrito.util.ApplicationError: Error constructing CommandLineAppResult.

Could this be a memory issue? How do I solve the problem? Would it help to increase --muscle_max_memory? Thank you.

Jose

Sep 21, 2016, 9:32:22 AM

to Qiime 1 Forum

Hi,

it is possible that this is a memory issue. Try the following:
1) Increase the memory limit with --muscle_max_memory. This might not work: by default MUSCLE tries to allocate 80% of the available memory, so you are probably already close to the maximum available.
2) If increasing memory does not work, run half the sequences and see whether it still fails. If it does, run half of those, and keep halving until you find a number of sequences that you can actually align. That will give you a sense of how close you are to being able to align the full set. If that number is far below your ~27,000 sequences, you'll need to find another machine to run the alignment. If it is somewhat close (say, you can align 20K sequences), you can try reducing the input by filtering sequences on quality, similarity, etc.
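
To make the halving in step 2 concrete, here is a minimal Python sketch for producing the smaller test files. It is only an illustration, not part of QIIME: the helper names (read_fasta, write_subset, halving_sizes) are made up for this example, and taking the first n records of the file is an assumption (a random subsample may be more representative).

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_subset(in_path, out_path, n):
    """Write the first n records of in_path to out_path; return how many were written."""
    written = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            if written >= n:
                break
            out.write(">%s\n%s\n" % (header, seq))
            written += 1
    return written


def halving_sizes(total, minimum=1):
    """Return the subset sizes to try: total/2, total/4, ... down to minimum."""
    sizes = []
    n = total // 2
    while n >= minimum:
        sizes.append(n)
        n //= 2
    return sizes
```

For example, halving_sizes(27000, minimum=1000) gives the sizes 13500, 6750, 3375 and 1687. You could call write_subset("rep_set.fna", "rep_set_13500.fna", 13500), run align_seqs.py -i rep_set_13500.fna -m muscle on the result, and move on to the next smaller size only if that run still fails.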

Hope it helps,
Jose
