out of memory

Yecheng Huang

Mar 7, 2013, 9:08:07 AM

to sate...@googlegroups.com

We are running SATe 2.2.7-2013Feb15 with Python 2.7 on a Red Hat Linux cluster.

It looks like the --max-mem-mb setting was not passed to muscle. Any help is appreciated.

Here is the error:

"

SATe INFO: Configuration written to "/panfs/pstor.storage/home/rccstaff/yhuang/test/sate/noIR/test1_temp_sate_config.txt".

SATe INFO: Reading input sequences from 'Chl_seqs_noIR.fas'...

SATe INFO: Directory for temporary files created at /panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V

SATe INFO: Name translation information saved to /panfs/pstor.storage/home/rccstaff/yhuang/test/sate/noIR/test1_temp_name_translation.txt as safe name, original name, blank line format.

SATe INFO: Creating a starting tree for the SATe algorithm...

SATe INFO: Performing initial alignment of the entire data matrix...

SATe INFO: Performing initial tree search to get starting tree...

SATe INFO: Starting SATe algorithm on initial tree...

SATe INFO: Max subproblem set to 15

SATe INFO: Step 0. Realigning with decomposition strategy set to centroid

Worker dying. Error in job.get_results = Traceback (most recent call last):

File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 53, in worker

job.get_results()

File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 256, in get_results

self.wait()

File "/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/sate/scheduler.py", line 201, in wait

raise self.error

Exception: SATe failed because one of the programs it tried to run failed.

The invocation that failed was:

"/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle" "-in1" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/1.fasta" "-in2" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/2.fasta" "-out" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/out.fasta" "-quiet" "-profile"

*** OUT OF MEMORY ***

Memory allocated so far 4284.27 MB

No alignment generated

SATe ERROR: SATe is exiting because of an error:

SATe failed because one of the programs it tried to run failed.

The invocation that failed was:

"/panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle" "-in1" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/1.fasta" "-in2" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/2.fasta" "-out" "/panfs/pstor.storage/home/rccstaff/yhuang/.sate/test/tempf1pB5V/step0/centroid/r0/d1/tempmusclevvhIPR/out.fasta" "-quiet" "-profile"

*** OUT OF MEMORY ***

Memory allocated so far 4284.27 MB

No alignment generated

"

Our input FASTA file is 3.7 MB. The command is:

python2.7 /usr/local/sate/latest/run_sate.py -i Chl_seqs_noIR.fas -j test --auto --max-mem-mb=20000

The config file is:

[clustalw2]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/clustalw2

[commandline]

aligned = False

auto = False

datatype = dna

input = Chl_seqs_noIR.fas

job = test

keepalignmenttemps = False

keeptemp = False

multilocus = False

raxml_search_after = False

two_phase = False

untrusted = False

[fakealigner]

path =

[faketree]

path =

[fasttree]

args =

model = -gtr -gamma

options =

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/fasttree

[mafft]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/mafft

[muscle]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/muscle

[opal]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/opal.jar

[padaligner]

path =

[prank]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/prank

[probalign]

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/probalign

[randtree]

path =

[raxml]

args =

model =

path = /panfs/pstor.storage/rcclocal/zcluster/sate/2.2.7-2013Feb15/sate-core/bin/raxml

[sate]

after_blind_iter_term_limit = -1

after_blind_iter_without_imp_limit = 1

after_blind_time_term_limit = -1.0

after_blind_time_without_imp_limit = -1.0

aligner = mafft

blind_after_iter_without_imp = -1

blind_after_time_without_imp = -1.0

blind_after_total_iter = -1

blind_after_total_time = -1.0

blind_mode_is_final = True

break_strategy = centroid

iter_limit = -1

iter_without_imp_limit = -1

max_mem_mb = 20000

max_subproblem_frac = 0.5

max_subproblem_size = 15

merger = muscle

move_to_blind_on_worse_score = True

num_cpus = 8

return_final_tree_and_alignment = False

start_tree_search_from_current = True

time_limit = -1.0

time_without_imp_limit = -1.0

tree_estimator = fasttree

Jamie Oaks

Mar 7, 2013, 6:46:10 PM

to sate...@googlegroups.com, Yecheng Huang

Hi,

First, thank you very much for your detailed post; it really helps the
developers diagnose issues and we appreciate it! Ok, the "--max-mem-mb"
argument is only used when Opal is the merger tool. I have not seen
muscle run out of memory before, so congrats on being the first to
"break" it!
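
To illustrate the point about "--max-mem-mb", here is a minimal sketch (Python 3, standard library only) that reads a SATe-style config and flags a memory limit that would go unused. The function name is hypothetical, and the assumption that the Opal merger is spelled "opal" in the config is mine, not something confirmed in this thread.

```python
import configparser


def unused_max_mem_warning(config_text):
    """Return a warning if max_mem_mb is set but the merger is not Opal, else None.

    Assumption: the Opal merger is written as "opal" in the [sate] section.
    """
    # interpolation=None so that any '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("sate"):
        return None
    merger = parser.get("sate", "merger", fallback="").strip().lower()
    max_mem = parser.get("sate", "max_mem_mb", fallback="").strip()
    if max_mem and merger != "opal":
        return ("max_mem_mb = %s is set, but merger = %s; "
                "the limit is only used when Opal is the merger"
                % (max_mem, merger))
    return None


# The two relevant settings from the config posted above.
example = """
[sate]
max_mem_mb = 20000
merger = muscle
"""
print(unused_max_mem_warning(example))
```

With the settings from the original post this prints a warning, which matches the behavior reported: the 20000 MB limit never reaches muscle.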

It looks like you have 30 sequences, is that correct? If so, your
sequences must be very long? If this is the case, are the sequences
contiguous stretches of DNA or concatenated, discontiguous stretches?
You might have to break up the alignment into separate genes (or some
other meaningful unit) and either run SATe on each gene separately or
perform a multi-locus SATe analysis.
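
A quick way to answer the question about sequence count and length is to summarize the FASTA file directly. The following is a rough sketch in plain Python 3; the helper names are hypothetical, and it assumes a simple FASTA layout (header lines starting with ">", sequence possibly wrapped over several lines, unique names).

```python
def fasta_lengths(lines):
    """Map each sequence header to the number of characters in its sequence."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header text as the name; duplicates would overwrite.
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths):
    """Return the sequence count plus min, max, and total length."""
    values = list(lengths.values())
    if not values:
        return {"count": 0, "min": 0, "max": 0, "total": 0}
    return {"count": len(values), "min": min(values),
            "max": max(values), "total": sum(values)}


# Toy input; seqA is wrapped over two lines.
toy = [">seqA", "ACGT", "ACG", ">seqB", "AC"]
print(summarize(fasta_lengths(toy)))
# prints {'count': 2, 'min': 2, 'max': 7, 'total': 9}
```

For the real input, passing an open file handle for Chl_seqs_noIR.fas to fasta_lengths() works the same way, since the function only iterates over lines.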

In general, SATe is designed to work well for datasets with a lot of
relatively short sequences, rather than few, very long sequences. If you
try to use it to align very long sequences, like chromosomes, it will
likely break one or more of the programs SATe is designed around. If you
have this kind of data, and cannot partition the alignment up in a
meaningful way (e.g., by gene), you probably want to use a different
software package; one that is designed for aligning whole
genomes/chromosomes.

Please let me know if you have any questions, and thanks again for your
post.

Best of luck with your work,

Jamie

> --
> --
> You received this message because you are subscribed to the Google
> Groups "SATe User" group.
> To post to this group, send email to sate...@googlegroups.com
> To unsubscribe from this group, send email to
> sate-user+...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/sate-user?hl=en
>
> ---
> You received this message because you are subscribed to the Google
> Groups "SATe User" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sate-user+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

--
Jamie Oaks
Biodiversity Institute
Department of Ecology & Evolutionary Biology
University of Kansas
Dyche Hall, 1345 Jayhawk Blvd
Lawrence, KS 66045-7561

Office Phone: 785-864-3439
Office Fax: 785-864-5335
E-mail: joa...@ku.edu

Yecheng Huang

Mar 8, 2013, 2:23:10 PM

to sate...@googlegroups.com

The problem is caused by the muscle binary that ships with the package. It is a 32-bit binary and has a built-in virtual memory limit of 4 GB. We have muscle 3.8 compiled on our 64-bit Linux system; that build is 64-bit and is not subject to this virtual memory limit.

I replaced the bundled binary with our muscle (3.8) and tested it on anolis.fasta (from the SATe data); the resulting trees are identical to the original ones.

Would this fix be OK in general?

Thanks,

Jamie Oaks

Mar 8, 2013, 3:19:40 PM

to sate...@googlegroups.com, Yecheng Huang

Thanks for the update. I'm glad you were able to solve the problem. Yes, in general it is OK to replace the distributed 32-bit binaries with your own builds, as long as the new binary accepts the same command-line arguments as the original (using the same version is a good way to ensure this).

However, I am not sure if SATe is the best tool for your dataset of relatively few, very long sequences. If the long sequences do not represent contiguous stretches of DNA, you should break up the sequences into contiguous units prior to alignment. If the sequences are contiguous, it is fine to use SATe, but with so few sequences, the main advantage of the SATe algorithm (i.e., the tree decomposition) might not be helping you very much.
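
For anyone wanting to check whether a bundled binary is a 32-bit or 64-bit build before replacing it, the ELF header on Linux records this in its fifth byte (EI_CLASS: 1 for 32-bit, 2 for 64-bit). Below is a small Python 3 sketch of that check; the function names are hypothetical, and it only recognizes ELF executables, not other formats.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # EI_CLASS values from the ELF format


def elf_class(header):
    """Classify an executable from its first bytes.

    Returns "32-bit", "64-bit", or None if the bytes are not a recognizable
    ELF header.
    """
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def elf_class_of_file(path):
    """Read the first five bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))


# Synthetic headers, so the example runs without any particular binary present.
print(elf_class(b"\x7fELF\x01"))  # prints 32-bit
print(elf_class(b"\x7fELF\x02"))  # prints 64-bit
```

Calling elf_class_of_file() on the path listed under [muscle] in the config would report which kind of build SATe is invoking.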

Best of luck!

Jamie

Yecheng Huang

Mar 8, 2013, 3:29:45 PM

to sate...@googlegroups.com

Thanks a lot for the suggestion.
