Memory Usage with MPI-SSE3


EmmaH

May 23, 2013, 6:44:11 AM
to ra...@googlegroups.com
Hi,

I'm trying to run 10830 taxa with 1326 patterns on the newest version of RAxML (7.5.3) with the SSE3 implementation (I've also tried non-SSE3, which doesn't help), requesting 100 bootstraps. My problem is that the run uses a very large amount of memory, exceeding the amount available on the node and causing the run to abort.

I'm running on one 12-core node with 2GB allocated per core (24GB total). Using the memory calculator on your website I get 1.7GB, which I understand is per tree, so once for each core; multiplied by 12, this should be 20.5GB total. Both the 1.7GB per core and the 20.5GB total should be within the memory allocations allowed.
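
For reference, the estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the calculator uses the formula given in the RAxML manual for DNA data under GAMMA, (n - 2) x m x 16 x 8 bytes for n taxa and m distinct patterns (4 states x 4 rate categories x 8-byte doubles), and that every MPI process holds its own copy. The function name is made up for illustration.

```python
def gamma_dna_vector_bytes(n_taxa: int, n_patterns: int) -> int:
    """Bytes for ancestral probability vectors, DNA under GAMMA (assumed formula)."""
    states, rate_categories, bytes_per_double = 4, 4, 8
    inner_nodes = n_taxa - 2
    return inner_nodes * n_patterns * states * rate_categories * bytes_per_double


GIB = 1024 ** 3

# Dimensions quoted in the post: 10830 taxa, 1326 distinct patterns.
per_process_gib = gamma_dna_vector_bytes(10830, 1326) / GIB
total_gib = per_process_gib * 12  # assumption: one full copy per MPI process

print(f"per process:  {per_process_gib:.2f} GiB")
print(f"12 processes: {total_gib:.1f} GiB of 24 GiB allocated")
```

Under these assumptions the numbers come out at about 1.7 GiB per process and 20.5 GiB for 12 processes, matching the figures quoted above.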

However, when the job aborts it says that the max vmem usage was 31.7GB (I assume there's a lag where it manages to get that high before it gets aborted). This happens almost immediately after the job starts running (1-2 minutes at most), so I don't believe it's caused by the switch to GAMMA that I've read about on the forum.

The program is called using:
mpirun -np 12 ./raxmlHPC-MPI-SSE3 -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s C_2010_noDR_10830.PHY -n fullMPI

Why is the memory usage so high? I cannot get more than 24GB on OpenMPI nodes, so this is an issue. I am just trying to use the rapid bootstrapping (it will time-limit-abort before it finishes the ML search), as we do the ML search separately in RAxML-MPI-Lite due to the size of the tree. (Then write the bootstraps on the ML tree.)

Strangely, the tree will run using a much older version of RAxML-MPI (7.0.4), requesting 100 BS. It reports that max vmem is only 2.15G. How can the old version use so little memory and the new one use so much? I also tried a 7.4 version, but it also used too much memory.

Thank you for any advice you can give. We are currently running it using the old RAxML but obviously we'd like to be able to use the newer versions instead.
Unfortunately, I cannot release the sequences as they are confidential patient sequences.

Thanks,
Emma Hodcroft

Alexandros Stamatakis

May 23, 2013, 6:47:36 AM
to ra...@googlegroups.com
Hi Emma,

Why don't you try running the hybrid version with 6 MPI processes and 2
threads per process, that should solve the issue.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

EmmaH

May 23, 2013, 9:53:43 AM
to ra...@googlegroups.com, Alexandros...@gmail.com
Hi Alexis,

Thank you for your advice. I've installed the Hybrid SSE3 version. I ran it with the command:

mpirun -np 6 ./raxmlHPC-HYBRID-SSE3 -T 2 -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s C_2010_noDR_10830.PHY -n C_hybrid

It ran for about 16 minutes before aborting this time. However, it still overran the memory, reaching 26.5GB before aborting. No bootstraps were written in the time it was running.

Was I calling the program correctly? Is there any way to reduce the memory overhead?

Thanks,
Emma

Alexandros Stamatakis

May 23, 2013, 1:24:39 PM
to EmmaH, ra...@googlegroups.com
Hi Emma,

This is weird, I can't see any obvious mistake.
Keep in mind though that the memory calculator only computes the memory
requirements for storing ancestral probability vectors, i.e., it is an
approximation. It may well be that RAxML needs 40% more RAM than this to
compute the tree.

The only advice I can give is to try running the hybrid version with 3
MPI processes and 4 threads per MPI process, i.e.,

mpirun -np 3 ./raxmlHPC-HYBRID-SSE3 -T 4 -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s C_2010_noDR_10830.PHY -n C_hybrid
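
To put rough numbers on the layouts discussed so far, here is a small sketch. It assumes that memory grows with the number of MPI processes while the threads inside one process share their data, takes the 1.7GB calculator figure per process, and applies the 40% allowance mentioned above as a flat factor. The function name and the `layouts` list are illustrative only.

```python
def estimated_gib(mpi_processes: int, per_process_gib: float = 1.7,
                  overhead: float = 0.40) -> float:
    """Calculator estimate per process, inflated by a flat overhead allowance."""
    return mpi_processes * per_process_gib * (1.0 + overhead)


# (MPI processes, threads per process); each layout fills the 12-core node.
layouts = [(12, 1), (6, 2), (3, 4)]

for processes, threads in layouts:
    assert processes * threads == 12
    print(f"{processes:2d} processes x {threads} threads: "
          f"~{estimated_gib(processes):.1f} GiB")
```

Under these assumptions only the pure MPI layout would exceed the 24GB limit. The 26.5GB reported for 6 processes with 2 threads is well above this estimate, so the 40% allowance alone does not account for it.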

Alexis

EmmaH

May 27, 2013, 6:56:05 AM
to ra...@googlegroups.com, EmmaH, Alexandros...@gmail.com
Hi Alexis,

I have tried your suggestion and ran on 3 processes with 4 threads each. This does run, using a max vmem of only 16GB. However, after running for 48 hours (the limit on our cluster), it had not written any bootstraps at all.
Do you have any suggestions on how to get this working? It would be really good to be able to use the newer version instead of the 7.0.4 version.

Thanks for your help,
Emma

Alexandros Stamatakis

Jun 3, 2013, 3:49:39 AM
to EmmaH, ra...@googlegroups.com
Hi Emma,

Can you send me the dataset so that I can have a look at how best to
run this? One solution might be to use GTRCAT instead of GTRGAMMA.
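
Why GTRCAT might help with memory can be sketched as follows, assuming the formulas in the RAxML manual: per site pattern, GAMMA stores 4 rate categories x 4 states, while CAT stores a single category x 4 states, so the ancestral probability vectors shrink to about a quarter. The function name is hypothetical, and the dimensions are the ones from the first post.

```python
def vector_gib(n_taxa: int, n_patterns: int, rate_categories: int) -> float:
    """GiB for ancestral probability vectors of DNA data (assumed formula)."""
    states, bytes_per_double = 4, 8
    n_bytes = ((n_taxa - 2) * n_patterns * states
               * rate_categories * bytes_per_double)
    return n_bytes / 1024 ** 3


gamma = vector_gib(10830, 1326, rate_categories=4)  # GTRGAMMA
cat = vector_gib(10830, 1326, rate_categories=1)    # GTRCAT

print(f"GTRGAMMA: {gamma:.2f} GiB per process")
print(f"GTRCAT:   {cat:.2f} GiB per process")
```

This covers only the vectors the memory calculator accounts for, so any other allocations would not shrink in the same way.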

Alexis

EmmaH

Jun 24, 2013, 7:19:02 AM
to ra...@googlegroups.com, EmmaH, Alexandros...@gmail.com
Hi Alexis,

My apologies for the delayed response - I have been out of the country for a few weeks. Unfortunately, I can't send the dataset as it contains confidential patient sequences.

However, if there are any diagnostic files or information I can send along, or if there are any other options/code I can include to have it spit out diagnostic information that might be useful, I would be more than happy to pass along this information, or anything else that might be useful.

In the meantime I am trying the same run I ran previously, but using GTRCAT instead of GTRGAMMA.

Thanks for your help,
Emma

Alexandros Stamatakis

Jun 24, 2013, 9:06:00 AM
to EmmaH, ra...@googlegroups.com
Dear Emma,

Could you please download the latest RAxML version from GitHub?

I believe that we identified the problem in the meantime.

It was associated with a non-standard memory allocation function we were
using, which we have since removed (see the group discussion with
Siegfried Schloissnig).

It should work now hopefully.

Alexis

EmmaH

Jun 24, 2013, 11:26:38 AM
to ra...@googlegroups.com, EmmaH
Dear Alexis,

Thank you very much for your help. I just downloaded the latest files from github and tried to compile them (MPI SSE3 version). However, I am getting an error, right at the beginning of the compilation:

$ make -f Makefile.SSE3.MPI.gcc
mpicc  -D_WAYNE_MPI -D__SIM_SSE3 -O2 -D_GNU_SOURCE -msse3 -fomit-frame-pointer -funroll-loops     -c -o axml.o axml.c
In file included from axml.c:75:
mem_alloc.h:5: error: redefinition of typedef 'boolean'
axml.h:333: note: previous declaration of 'boolean' was here
make: *** [axml.o] Error 1


Could you give any advice about resolving this error? My apologies if it's something simple.
Thanks,
Emma

Alexandros Stamatakis

Jun 24, 2013, 11:49:54 AM
to ra...@googlegroups.com
Oh, sorry about this, I had fixed this issue, but forgot to make the
new code available online.

If you go again to:

https://github.com/stamatak/standard-RAxML

and download the latest code it should work.

Alexis

EmmaH

Jun 25, 2013, 9:44:09 AM
to ra...@googlegroups.com
Hi Alexis,

Thanks for the new code - I've compiled it without issue and have started a run. I'll let you know if I am still having problems.

Thanks for all your help,
Emma

EmmaH

Jun 26, 2013, 9:21:46 AM
to ra...@googlegroups.com
Hi Alexis,

I just wanted to update you. I tried to run the MPI SSE3 version of the new code, but it still aborted after less than a minute, going over the 24GB memory limit by utilizing 29GB. I will try running in GTRCAT and also will try compiling the Hybrid version and running in that, to see if either solves the problem.

Thanks,
Emma

EmmaH

Jun 26, 2013, 10:08:48 AM
to ra...@googlegroups.com
Hi Alexis,

I have tried running the MPI SSE3 version with GTRCAT, but it aborts very quickly, having used 31GB of memory. The Hybrid version (6 processes, 2 threads each) runs for about 20 minutes, but also aborts, having gone over the memory limit.

It says in the output that this is version 7.6.3, released yesterday, so I'm quite sure I'm using the new version.

Would you have any further advice on the problem?


Thanks for all your help,
Emma

Alexandros Stamatakis

Jun 27, 2013, 12:20:50 PM
to ra...@googlegroups.com
Okay, since you can't share the data I will use a simulated dataset of
the same size to reproduce it. Can you please send me the dataset
dimensions, i.e., number of taxa and number of sites, as well as the
number of distinct patterns that is printed by RAxML?

please also send me the exact command line and the partition file if any.

Alexis

EmmaH

Jun 28, 2013, 10:23:36 AM
to ra...@googlegroups.com
Hi Alexis,

Thank you very much for being willing to do that. I appreciate your helping to work this out.

The dataset has 10,830 sequences with 1,365 sites. RAxML reports that it has 1,326 distinct alignment patterns. I don't know if it makes any difference, but this is the pol region of HIV-1 sequences, meaning that the trees are often hard to resolve and can be quite star-like.
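
For anyone who wants a shareable stand-in with these dimensions, a minimal sketch follows. It is an assumption-laden placeholder, not the simulation used in this thread: it writes uniformly random DNA in relaxed PHYLIP format, so there is no tree signal and nearly every column will be a distinct pattern (about 1,365 rather than 1,326). The function name, taxon labels, and file name are made up.

```python
import random


def write_random_phylip(path: str, n_taxa: int, n_sites: int,
                        seed: int = 12345) -> None:
    """Write a relaxed-PHYLIP alignment of uniformly random DNA sequences."""
    rng = random.Random(seed)
    with open(path, "w") as handle:
        # Header line: number of taxa, then number of sites.
        handle.write(f"{n_taxa} {n_sites}\n")
        for i in range(n_taxa):
            sequence = "".join(rng.choices("ACGT", k=n_sites))
            handle.write(f"taxon{i:05d} {sequence}\n")


# Small demo; the full-size call would use n_taxa=10830, n_sites=1365.
write_random_phylip("simulated_demo.phy", n_taxa=8, n_sites=40)
```

A more realistic test case would evolve the sequences along a tree, but random data of the right size should be enough to exercise memory allocation.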

There are a couple of my exact program calls in the messages above, but one for the MPI SSE3 version is (asking for 12 nodes using -pe):

mpirun -np 12 ./raxmlHPC-MPI-SSE3 -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s C_2010_noDR_10830.phylip -n C_mpi

And the one for the Hybrid version is (asking for 12 nodes using -pe):

mpirun -np 6 ./raxmlHPC-HYBRID-SSE3 -T 2 -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s C_2010_noDR_10830.phylip -n C_hybrid

I also am always careful to compile all the different 'versions' of RAxML (both version numbers and MPI/HPC/PTHREADS) in different folders, so that there shouldn't be any problems caused by that.

Thank you again for your help, please let me know if you need more information.
Emma

Alexandros Stamatakis

Aug 1, 2013, 11:01:40 AM
to ra...@googlegroups.com
Dear Emma,

I checked with a dataset of similar size (9000 taxa and 1500 site
patterns) and executed it with your command line and using the latest
RAxML version 7.7.2 on a machine with 8GB RAM.

I tested the memory utilization of the sequential version and the
Pthreads version with 4 threads.

I was not able to reproduce the problem.

The resident memory utilization (RES figure in htop) varies
slightly, between 1 and 1.7 GB.

The virtual memory utilization (VIRT figure in htop) varies
between 3 and 4 GB.
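
The same two figures can be read without htop. The sketch below assumes a Linux system, where /proc/<pid>/status exposes VmRSS (resident) and VmSize (virtual) in kB; the function name and dictionary keys are illustrative. It reports on the Python process itself, so point the path at the PID of a RAxML process to inspect that instead.

```python
from pathlib import Path


def parse_memory_kib(status_text: str) -> dict:
    """Extract resident (VmRSS) and virtual (VmSize) sizes in KiB."""
    wanted = {"VmRSS": "resident_kib", "VmSize": "virtual_kib"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The value looks like "   1048576 kB"; keep the number only.
            result[wanted[key]] = int(rest.split()[0])
    return result


status_file = Path("/proc/self/status")  # Linux only
if status_file.exists():
    for name, kib in parse_memory_kib(status_file.read_text()).items():
        print(f"{name}: {kib / 1024 ** 2:.2f} GiB")
```

The scheduler figure quoted earlier in the thread is labelled max vmem, which presumably refers to virtual memory, so the virtual figure summed over all processes is the closer comparison to the cluster limit.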

Thus, I am not able to help you with this, except by recommending that
you upgrade to the latest RAxML version, if you have not already done so.

Cheers,

Alexis