out of memory


Yecheng Huang

Mar 7, 2013, 9:08:07 AM
to sate...@googlegroups.com
We are running SATe 2.2.7-2013Feb15 with Python 2.7 on a Red Hat Linux cluster.
It looks like the --max-mem-mb setting is not being passed to muscle. Any help is appreciated.

Here is the error:
"
SATe INFO: Configuration written to "/panfs/pstor.storage/home/rccstaff/yhuang/test/sate/noIR/test1_temp_sate_config.txt".

SATe INFO: Reading input sequences from 'Chl_seqs_noIR.fas'...
SATe INFO: Directory for temporary files created at /panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V
SATe INFO: Name translation information saved to /panfs/pstor.storage/home/rccstaff/yhuang/test/sate/noIR/test1_temp_name_translation.txt as safe name, original name, blank line format.
SATe INFO: Creating a starting tree for the SATe algorithm...
SATe INFO: Performing initial alignment of the entire data matrix...
SATe INFO: Performing initial tree search to get starting tree...
SATe INFO: Starting SATe algorithm on initial tree...
SATe INFO: Max subproblem set to 15
SATe INFO: Step 0. Realigning with decomposition strategy set to centroid
Worker dying.  Error in job.get_results = Traceback (most recent call last):
  File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 53, in worker
    job.get_results()
  File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 256, in get_results
    self.wait()
  File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 201, in wait
    raise self.error
Exception: SATe failed because one of the programs it tried to run failed.
The invocation that failed was: 
    "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle" "-in1" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/1.fasta" "-in2" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/2.fasta" "-out" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/out.fasta" "-quiet" "-profile"


*** OUT OF MEMORY ***
Memory allocated so far 4284.27 MB
No alignment generated


SATe ERROR: SATe is exiting because of an error:
SATe failed because one of the programs it tried to run failed.
The invocation that failed was: 
    "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle" "-in1" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/1.fasta" "-in2" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/2.fasta" "-out" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/out.fasta" "-quiet" "-profile"


*** OUT OF MEMORY ***
Memory allocated so far 4284.27 MB
No alignment generated
 
"

Our input FASTA file is 3.7 MB. The command is:
python2.7 /usr/local/sate/latest/run_sate.py -i Chl_seqs_noIR.fas -j test --auto --max-mem-mb=20000 

The config file is:
[clustalw2]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/clustalw2

[commandline]
aligned = False
auto = False
datatype = dna
input = Chl_seqs_noIR.fas
job = test
keepalignmenttemps = False
keeptemp = False
multilocus = False
raxml_search_after = False
two_phase = False
untrusted = False

[fakealigner]
path = 

[faketree]
path = 

[fasttree]
args = 
model = -gtr -gamma
options = 
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/fasttree

[mafft]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/mafft

[muscle]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle

[opal]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/opal.jar

[padaligner]
path = 

[prank]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/prank

[probalign]
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/probalign

[randtree]
path = 

[raxml]
args = 
model = 
path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/raxml

[sate]
after_blind_iter_term_limit = -1
after_blind_iter_without_imp_limit = 1
after_blind_time_term_limit = -1.0
after_blind_time_without_imp_limit = -1.0
aligner = mafft
blind_after_iter_without_imp = -1
blind_after_time_without_imp = -1.0
blind_after_total_iter = -1
blind_after_total_time = -1.0
blind_mode_is_final = True
break_strategy = centroid
iter_limit = -1
iter_without_imp_limit = -1
max_mem_mb = 20000
max_subproblem_frac = 0.5
max_subproblem_size = 15
merger = muscle
move_to_blind_on_worse_score = True
num_cpus = 8
return_final_tree_and_alignment = False
start_tree_search_from_current = True
time_limit = -1.0
time_without_imp_limit = -1.0
tree_estimator = fasttree


Jamie Oaks

Mar 7, 2013, 6:46:10 PM
to sate...@googlegroups.com, Yecheng Huang
Hi,

First, thank you very much for your detailed post; it really helps the
developers diagnose issues and we appreciate it! Ok, the "--max-mem-mb"
argument is only used when Opal is the merger tool. I have not seen
muscle run out of memory before, so congrats on being the first to
"break" it!
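
To make that concrete, here is a minimal sketch (written for Python 3's configparser, not the Python 2.7 used above) that reads a SATe config like the one posted and reports whether max_mem_mb is set while the merger is something other than Opal. The section and key names come from the posted config; the helper name ignored_max_mem is made up for illustration and is not part of SATe.

```python
import configparser


def ignored_max_mem(config_text):
    """Return True if max_mem_mb is set but the merger is not Opal.

    Hypothetical helper, not part of SATe. It only inspects the
    [sate] section of a config file like the one posted above.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    merger = parser.get("sate", "merger", fallback="").strip().lower()
    max_mem = parser.get("sate", "max_mem_mb", fallback="").strip()
    return bool(max_mem) and merger != "opal"


# Trimmed-down version of the [sate] section from the post.
example = """
[sate]
max_mem_mb = 20000
merger = muscle
"""
print(ignored_max_mem(example))  # True: the memory setting has no effect here
```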

It looks like you have 30 sequences. Is that correct? If so, your
sequences must be very long. If this is the case, are the sequences
contiguous stretches of DNA or concatenated, discontiguous stretches?
You might have to break up the alignment into separate genes (or some
other meaningful unit) and either run SATe on each gene separately or
perform a multi-locus SATe analysis.
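
If it helps to check, a rough sketch like the following counts the sequences in a FASTA file and reports how long each one is. The function name fasta_lengths is only illustrative, and it assumes a plain FASTA layout (a ">" header line followed by one or more sequence lines).

```python
def fasta_lengths(lines):
    """Map each sequence name to its length, given an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Tiny in-memory example; a real run would pass an open file handle instead.
demo = [">a", "ACGT", "AC", ">b", "GG"]
lengths = fasta_lengths(demo)
print(len(lengths), max(lengths.values()))  # 2 6
```

For a real file, something like `with open("Chl_seqs_noIR.fas") as handle: lengths = fasta_lengths(handle)` would give the same dictionary for your data.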

In general, SATe is designed to work well for datasets with a lot of
relatively short sequences, rather than few, very long sequences. If you
try to use it to align very long sequences, like chromosomes, it will
likely break one or more of the programs SATe is designed around. If you
have this kind of data and cannot partition the alignment in a
meaningful way (e.g., by gene), you probably want to use a different
software package; one that is designed for aligning whole
genomes/chromosomes.

Please let me know if you have any questions, and thanks again for your
post.

Best of luck with your work,

Jamie
> --
> --
> You received this message because you are subscribed to the Google
> Groups "SATe User" group.
> To post to this group, send email to sate...@googlegroups.com
> To unsubscribe from this group, send email to
> sate-user+...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/sate-user?hl=en
>
> ---
> You received this message because you are subscribed to the Google
> Groups "SATe User" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sate-user+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>


--
Jamie Oaks
Biodiversity Institute
Department of Ecology & Evolutionary Biology
University of Kansas
Dyche Hall, 1345 Jayhawk Blvd
Lawrence, KS 66045-7561

Office Phone: 785-864-3439
Office Fax: 785-864-5335
E-mail: joa...@ku.edu

Yecheng Huang

Mar 8, 2013, 2:23:10 PM
to sate...@googlegroups.com
The problem is caused by the muscle binary that ships with the package. It is a 32-bit binary and has a built-in virtual memory limit of 4 GB. We have muscle 3.8 compiled on our 64-bit Linux system; it is a 64-bit build with a relaxed virtual memory limit.
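
For anyone who wants to check their own copy, here is a minimal sketch that reads the ELF header of a binary and reports whether it is a 32-bit or 64-bit build. It relies only on the ELF format itself (the magic bytes followed by the EI_CLASS byte); the function name elf_class is made up for this example, and it applies to Linux ELF binaries only.

```python
import sys


def elf_class(path):
    """Return 32 or 64 for an ELF binary, or None if the file is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the header (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    return {1: 32, 2: 64}.get(header[4])


# Demonstrated on the running Python interpreter; substitute the path
# to the muscle binary you want to inspect.
print(elf_class(sys.executable))
```

Pointing it at the bundled sate-core/bin/muscle and at a locally compiled muscle should show the difference between the two builds.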

I swapped in our muscle 3.8 build and tested it on anolis.fasta (from the SATe data); the resulting trees are identical to the original ones.
Would this fix be OK in general?

Thanks, 

Jamie Oaks

Mar 8, 2013, 3:19:40 PM
to sate...@googlegroups.com, Yecheng Huang
Thanks for the update. I'm glad you were able to solve the problem. Yes, in general it is OK to replace the distributed 32-bit binaries with your own builds, as long as the new binary accepts the same command-line arguments as the original (using the same version is a good way to ensure this).

However, I am not sure if SATe is the best tool for your dataset of relatively few, very long sequences. If the long sequences do not represent contiguous stretches of DNA, you should break up the sequences into contiguous units prior to alignment. If the sequences are contiguous, it is fine to use SATe, but with so few sequences, the main advantage of the SATe algorithm (i.e., the tree decomposition) might not be helping you very much.

Best of luck!

Jamie

Yecheng Huang

Mar 8, 2013, 3:29:45 PM
to sate...@googlegroups.com
Thanks a lot for the suggestion.