RAxML-ng--MPI version (installation and runtime)


Maggie Lau

Oct 28, 2021, 11:50:48 PM
to raxml
Hi Alexey and raxml users,

Currently, we are working on a project that deals with the phylogeny of five different proteins within the same family. We tried using RAxML-ng to build the maximum likelihood trees. With the smaller datasets (about 500-2000 protein sequences), RAxML-ng completed the runs within 2-3 weeks [using the number of cores suggested by the --parse command]. However, with the larger datasets (between 5000 and 20000 protein sequences) it was very slow. We attempted to get the RAxML-ng MPI version running on our cluster, but we are facing some technical issues with the installation.

Q1: Would the RAxML-ng MPI version speed up the runs?

Q2: I have not compiled MPI-enabled software before. Any advice on how to solve the installation problem would be appreciated.

The following log may provide you with some background:
(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ cmake --version
cmake version 3.21.0-rc3

(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ which mpicxx
/opt/intel/impi/2019.3.199/intel64/bin/mpicxx

(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ mkdir build && cd build
(base) [maglau@ln01 build]$ less ../INSTALL.txt
(base) [maglau@ln01 build]$ cmake ..
CMake Error at /usr/bin/cmake-3.21.0-rc3-linux-x86_64/share/cmake-3.21/Modules/CMakeDetermineSystem.cmake:181 (file):

  file attempted to write a file:
  /home/maglau/tools/raxml-ng-MPI_v1.0.2/CMakeFiles/CMakeOutput.log into a
  source directory.
  Call Stack (most recent call first):
  CMakeLists.txt:47 (project)

CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

Another try:

(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ CXX=/opt/intel/impi/2019.3.199/intel64/bin/mpicxx cmake .

CMake Error at /usr/bin/cmake-3.21.0-rc3-linux-x86_64/share/cmake-3.21/Modules/CMakeDetermineSystem.cmake:181 (file):

  file attempted to write a file:
  /home/maglau/tools/raxml-ng-MPI_v1.0.2/CMakeFiles/CMakeOutput.log into a
  source directory.
  Call Stack (most recent call first):
  CMakeLists.txt:47 (project)

CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

Another try:

(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ module load GCC-7.3.0 
(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ which gcc
/opt/software/gcc-7.3.0/bin/gcc

(base) [maglau@ln01 raxml-ng-MPI_v1.0.2]$ mkdir build && cd build
(base) [maglau@ln01 build]$ cmake -DUSE_MPI=ON ..

CMake Error at /usr/bin/cmake-3.21.0-rc3-linux-x86_64/share/cmake-3.21/Modules/CMakeDetermineSystem.cmake:181 (file):

  file attempted to write a file:
  /home/maglau/tools/raxml-ng-MPI_v1.0.2/CMakeFiles/CMakeOutput.log into a

  source directory.
  Call Stack (most recent call first):
  CMakeLists.txt:47 (project)

CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

Alexey Kozlov

Nov 1, 2021, 5:44:58 AM
to ra...@googlegroups.com
Hi Maggie,

> Q1: Would RAxML-ng-MPI version speed up the runs?

For a single tree search, probably not.
For multiple starting trees and/or bootstrapping (--search or --all command), probably yes.

It also depends on your hardware and dataset properties, so it would help if you posted the raxml log file
of your analysis.

Also, please read the raxml-ng parallelization manual:

https://github.com/amkozlov/raxml-ng/wiki/Parallelization

> Q2: I have not compiled MPI supported algorithms before. Any advice on how to solve the installation
> problem would be appreciated.

This looks like something cmake-related. Apparently you have used the latest release candidate
version. Could you please try with another, stable version of cmake?
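One way to spot a pre-release CMake such as the 3.21.0-rc3 in the log above is to look for an "-rc" suffix in the version string. A minimal sketch; the helper name is hypothetical, and when no argument is given it reads the version from the first line of `cmake --version` ("cmake version X"):

```shell
# Succeed if the CMake version string is a release candidate (e.g. "3.21.0-rc3").
# With no argument, take the third field of the first line of `cmake --version`.
is_prerelease_cmake() {
  local v="${1:-$(cmake --version | awk 'NR == 1 { print $3 }')}"
  case "$v" in
    *-rc*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_prerelease_cmake; then echo "pre-release cmake; try a stable release"; fi
```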

Hope this helps,
Alexey



Ramganesh Selvarajan

Nov 2, 2021, 5:55:16 AM
to raxml
Hi Alexey,

Thank you for your response. Regarding your answer to the first question: we have used the --all command in our scripts.
Two log files are attached:
First file - raxml-ng_results_serine.raxml (completed run)
Second file - raxml-ng_results_proline1.raxml (still running on our system)
The log files contain all the information about the analyses.
Please take a look and advise.
raxml-ng_results_proline1.raxml.log
raxml-ng_results_serine.raxml.log

Alexey Kozlov

Nov 2, 2021, 7:03:39 AM
to ra...@googlegroups.com
According to the log file, your CPU has 28 cores:

System: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, 28 cores, 250 GB RAM

So please use all of them; this will make inference ~3x faster. Further, you should also get roughly linear
speedup when adding more compute nodes with MPI (i.e. 2x speedup with 2 nodes, etc.).
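raxml-ng prints the core count in its log header, as quoted above. To check the number of physical cores on a compute node independently (for example, before choosing --threads), one option is to count distinct core/socket pairs from `lscpu -p=CORE,SOCKET`, which prints one line per logical CPU. A minimal sketch; the helper name is hypothetical:

```shell
# Count physical cores from `lscpu -p=CORE,SOCKET` output read on stdin.
# lscpu -p prints comment lines starting with '#', then one "core,socket" line
# per logical CPU; hyperthread siblings produce identical lines, so the number
# of unique lines equals the number of physical cores.
count_physical_cores() {
  grep -v '^#' | sort -u | awk 'END { print NR }'
}

# Usage on a compute node:
#   lscpu -p=CORE,SOCKET | count_physical_cores
```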

For even larger datasets, you can consider limiting the number of bootstrap replicates
(e.g. --bs-trees 100) and/or increasing the convergence threshold (e.g. --lh-epsilon 10).
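Putting these suggestions together, a run on one of the 28-core nodes might look like the command printed by the hypothetical helper below. Only the --bs-trees and --lh-epsilon values come from the advice above; the MSA file, model and thread count are caller-supplied, and the helper only prints the command so it can be reviewed before running:

```shell
# Print (do not run) a raxml-ng --all command with the two speed-related
# options suggested above. All three arguments are caller-supplied.
raxml_all_cmd() {
  local msa="$1" model="$2" threads="$3"
  printf 'raxml-ng --all --msa %s --model %s --threads %s --bs-trees 100 --lh-epsilon 10\n' \
    "$msa" "$model" "$threads"
}

# Example:
#   raxml_all_cmd proline_parse.raxml.rba LG+I+G 28
```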

Maggie Lau

Nov 4, 2021, 10:03:34 AM
to raxml
1. Thank you for the advice.
We used the --parse command and ran raxml-ng with the recommended number of threads.
It is good to know that you recommend using more cores!

2. I downloaded the v1.0.3 MPI version:

$ curl -O https://github.com/amkozlov/raxml-ng/releases/download/1.0.3/raxml-ng_v1.0.3_linux_x86_64_MPI.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   642  100   642    0     0   1158      0 --:--:-- --:--:-- --:--:--  1156


However, the downloaded file seems to be faulty:

$ unzip raxml-ng_v1.0.3_linux_x86_64_MPI.zip
Archive:  raxml-ng_v1.0.3_linux_x86_64_MPI.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of raxml-ng_v1.0.3_linux_x86_64_MPI.zip or
        raxml-ng_v1.0.3_linux_x86_64_MPI.zip.zip, and cannot find raxml-ng_v1.0.3_linux_x86_64_MPI.zip.ZIP, period.


3. I downloaded the v1.0.2 zip file again, and this time the error is not about cmake but about GCC:

[maglau@ln01 build]$ cmake -DUSE_MPI=ON ..
-- Compiler: GNU 4.8.5 => /usr/bin/c++
CMake Error at CMakeLists.txt:69 (message):
  GNU compiler too old! Minimum required: 5.4
-- Configuring incomplete, errors occurred!
See also "/home/maglau/tools/raxml-ng-MPI_v1.0.2/build/CMakeFiles/CMakeOutput.log".

[maglau@ln01 build]$ which cmake
/usr/bin/cmake

[maglau@ln01 build]$ which gcc
/opt/software/gcc-7.3.0/bin/gcc

[maglau@ln01 raxml-ng-v1.0.3]$ cmake --version
cmake version 3.21.0-rc3

Should I use Intel MPI?


Alexey Kozlov

Nov 4, 2021, 2:13:49 PM
to ra...@googlegroups.com
1. Your curl call did not download the correct file; you need to add the "-L" option to make it work.
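GitHub release links answer with an HTTP redirect, so without -L curl most likely saved the small redirect response (the 642-byte file above) instead of the archive. A minimal sketch of a download-and-check step; the helper names are hypothetical, and the check only looks for the standard ZIP local-file-header signature "PK\x03\x04" at the start of the file:

```shell
# Download a release asset: -L follows HTTP redirects, -f makes curl fail on
# HTTP errors instead of saving an error page, -O keeps the remote file name.
fetch_asset() {
  curl -fL -O "$1"
}

# Succeed only if the file starts with the ZIP local-file-header signature
# "PK\x03\x04" (hex 50 4b 03 04).
looks_like_zip() {
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "504b0304" ]
}

# Usage:
#   url=https://github.com/amkozlov/raxml-ng/releases/download/1.0.3/raxml-ng_v1.0.3_linux_x86_64_MPI.zip
#   fetch_asset "$url" && looks_like_zip raxml-ng_v1.0.3_linux_x86_64_MPI.zip \
#     && unzip raxml-ng_v1.0.3_linux_x86_64_MPI.zip
```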

2. cmake is using the wrong, old version of gcc (4.8.5). Please specify your gcc manually as shown here:

https://github.com/amkozlov/raxml-ng/wiki/Installation#troubleshooting
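The CMake error above says raxml-ng needs at least GNU 5.4, and CMake picked up /usr/bin/c++ (4.8.5). A minimal sketch of checking a GCC installation and then passing both the C and the C++ compiler to CMake explicitly; the helper names and the prefix argument are hypothetical, `sort -V` assumes GNU coreutils, and whether to point CMake at g++ or at an MPI wrapper such as mpicxx depends on the local MPI setup. The linked troubleshooting page remains the authoritative reference.

```shell
# True if version $1 >= version $2 (relies on GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Configure an MPI build from the build directory using the GCC installed
# under the given prefix (e.g. /opt/software/gcc-7.3.0).
configure_raxml_mpi() {
  local gcc_prefix="$1"
  local ver
  # -dumpfullversion exists since GCC 7; fall back to -dumpversion otherwise.
  ver=$("$gcc_prefix/bin/g++" -dumpfullversion 2>/dev/null || "$gcc_prefix/bin/g++" -dumpversion)
  if ! version_ge "$ver" 5.4; then
    echo "g++ $ver is older than the required 5.4" >&2
    return 1
  fi
  cmake -DUSE_MPI=ON \
        -DCMAKE_C_COMPILER="$gcc_prefix/bin/gcc" \
        -DCMAKE_CXX_COMPILER="$gcc_prefix/bin/g++" ..
}
```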

Maggie Lau

Nov 5, 2021, 4:50:49 AM
to raxml
1. Got it. That is resolved.

2. CMake still complains about the compiler. I tried:

module load GCC-7.3.0

CXX=/opt/software/gcc-7.3.0/bin/gcc /home/maglau/tools/cmake-3.20.6-linux-x86_64/bin/cmake3.20.6 ..

OR

CXX=/opt/software/gcc-7.3.0/bin/gcc cmake .. (cmake version 3.21.0)

OR

CXX=/opt/intel/impi/2019.3.199/intel64/bin/mpicxx /home/maglau/tools/cmake-3.20.6-linux-x86_64/bin/cmake3.20.6 ..


The above gave the following error:

-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is unknown
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /opt/software/gcc-7.3.0/bin/gcc
-- Check for working CXX compiler: /opt/software/gcc-7.3.0/bin/gcc - broken
CMake Error at /usr/bin/cmake-3.21.0-rc3-linux-x86_64/share/cmake-3.21/Modules/CMakeTestCXXCompiler.cmake:59 (message):
  The C++ compiler

    "/opt/software/gcc-7.3.0/bin/gcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_12548/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_12548.dir/build.make CMakeFiles/cmTC_12548.dir/build
    gmake[1]: Entering directory '/home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_12548.dir/testCXXCompiler.cxx.o
    /opt/software/gcc-7.3.0/bin/gcc    -o CMakeFiles/cmTC_12548.dir/testCXXCompiler.cxx.o -c /home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    /opt/software/gcc-7.3.0/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/cc1plus: error while loading shared libraries: libmpfr.so.6: cannot open shared object file: No such file or directory
    gmake[1]: *** [CMakeFiles/cmTC_12548.dir/build.make:78: CMakeFiles/cmTC_12548.dir/testCXXCompiler.cxx.o] Error 1
    gmake[1]: Leaving directory '/home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_12548/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:47 (project)

-- Configuring incomplete, errors occurred!
See also "/home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeOutput.log".
See also "/home/maglau/tools/raxml-ng-v1.0.3_MPI/build/CMakeFiles/CMakeError.log".


P.S. I approached the HPC engineer, but they were not able to offer help.


Alexey Kozlov

Nov 8, 2021, 10:41:26 AM
to ra...@googlegroups.com
Sorry, but this is clearly a problem with the compiler/cmake configuration on your system.

So unfortunately I can't help further; please use Google and/or ask your local HPC administrator...

Maggie Lau

Nov 10, 2021, 6:13:30 AM
to raxml

Thank you, Alexey.
The local administrator has not been able to offer a solution yet. We will see what we can do.

Maggie Lau

Jan 25, 2022, 1:54:19 AM
to raxml
I have now moved to a different computing cluster. The administrator of this cluster was able to install the MPI version, and it is up and running!

We are trying to understand how to make the runs more efficient. Please see the attached picture for the trial runs we have done.
It looks like 2 nodes/2 tasks per node/64 CPUs per node (i.e. 128 CPUs in total) is more cost-effective.

Please give us some suggestions on how to plan the runs.

P.S. This new computing cluster has 128 CPUs per node and 4 GB of memory per CPU.


Screen Shot 2022-01-25 at 2.49.18 PM.png

Alexey Kozlov

Jan 27, 2022, 1:18:04 PM
to ra...@googlegroups.com

> I now moved to a different computing cluster. The administer of this cluster is able to install the
> MPI version and it is up running!!!

congrats! :)

> We are trying to understand what might be a good way to make the run more efficient. Please see the
> attached picture for the trial runs we have done.
> It looks like 2 nodes/2 tasks per node/64 cpus per node (i.e. 128 cpus in total) is more cost-effective.

Generally, you should try to use all available cores, i.e. 256 for two nodes. With the new adaptive
parallelization, raxml-ng will automatically decide how to allocate these cores among workers. If
you remove "--force perf_threads", it will also warn you if your alignment/analysis is not large
enough to utilize all cores efficiently.

So, based on your results, 2nodes/4tasks/64threads is the best configuration (at least for 2 nodes),
since it inferred 8 trees within 10 hours, whereas with 2 tasks it inferred only 6.
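The layouts discussed in this thread follow one rule: MPI ranks per node times threads per rank should equal the cores per node. A minimal sketch that turns that rule into Slurm directives; the helper is hypothetical, --nodes, --ntasks-per-node and --cpus-per-task are standard sbatch options, and the cluster's own documentation should be checked for how it expects hybrid MPI+threads jobs to be launched:

```shell
# Print sbatch directives for a hybrid MPI + threads raxml-ng run that uses
# every core: threads per rank = cores per node / MPI ranks per node.
slurm_layout() {
  local nodes="$1" ranks_per_node="$2" cores_per_node="$3"
  if [ $((cores_per_node % ranks_per_node)) -ne 0 ]; then
    echo "error: $cores_per_node cores cannot be split evenly over $ranks_per_node ranks" >&2
    return 1
  fi
  local threads=$((cores_per_node / ranks_per_node))
  echo "#SBATCH --nodes=$nodes"
  echo "#SBATCH --ntasks-per-node=$ranks_per_node"
  echo "#SBATCH --cpus-per-task=$threads"
  echo "# mpirun -np $((nodes * ranks_per_node)) raxml-ng-mpi ... --threads $threads  (total cores: $((nodes * cores_per_node)))"
}
```

For example, `slurm_layout 2 2 128` reproduces the 2nodes/4tasks/64threads layout on the 128-core nodes, and `slurm_layout 2 1 128` gives one MPI rank per node with 128 threads each.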

Best,
Alexey


Maggie Lau

Feb 4, 2022, 4:43:58 AM
to raxml
Thank you Alexey.

Also, our technical support team would like me to bring this to your attention.

When they used 1node/2tasks/64threads, it showed 200% utilization, but
when they used 2nodes/4tasks/64threads, it showed 6400% utilization. They wonder whether this is a bug.

Alexey Kozlov

Feb 4, 2022, 5:47:12 AM
to ra...@googlegroups.com
Please tell them to check the thread placement on the cores using e.g. "htop".

There must be exactly 1 thread running on each physical CPU core.

This is often messed up with MPI. If your technical support can't fix it by changing the MPI/job
parameters, please try to change the parallelization layout and use 1 MPI rank per node
(e.g. 2nodes/2tasks/128threads).

If all cores are in use on a 128-core machine, then utilization should be 128 * 100% = 12800%.
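Since htop is interactive, a scriptable snapshot can complement it: `ps -eLo psr=,comm=` lists every thread together with the CPU it last ran on. The hypothetical helper below reads that listing and reports CPUs hosting more than one thread of a given command. It assumes the raxml-ng threads carry the executable name (here raxml-ng-mpi); because psr is only the last CPU each thread ran on, and ps counts logical rather than physical CPUs, treat the result as a quick check rather than proof of correct placement.

```shell
# Read "psr comm" pairs (one line per thread) on stdin and print each CPU
# that hosts more than one thread of the named command, with its thread count.
oversubscribed_cpus() {
  awk -v name="$1" '$2 == name { n[$1]++ }
    END { for (c in n) if (n[c] > 1) print c, n[c] }' | sort -n
}

# Usage on a compute node while the job runs:
#   ps -eLo psr=,comm= | oversubscribed_cpus raxml-ng-mpi
```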

Maggie Lau

Feb 7, 2022, 9:59:44 PM
to raxml
Noted! Passed the message to the admin.

I recently submitted a job using 4nodes/4tasks/128threads. I logged onto each node and checked that each showed 12800% utilization.

The .log file (A) shows that 20 ML trees were made, BUT the standard output file (B) shows that some trees (e.g. #19) were made multiple times. Either I am interpreting the stdout file wrongly, or there is an error in my command line:

mpirun -np 4 raxml-ng-mpi --all --msa ../proline_parse.raxml.rba --prefix proline_4N4n128c --model LG+I+G -seed 44128 --bs-trees autoMRE --thread 128 --force perf_threads


Part of the .log file (A):

parallelization: coarse-grained (auto), hybrid MPI+PTHREADS (4 ranks x 128 threads), thread pinning: ON
Parallelization scheme autoconfig: 64 worker(s) x 8 thread(s)

Starting ML tree search with 20 distinct starting trees

[05:26:55] [worker #2] ML tree search #19, logLikelihood: -332859.189348
[05:47:46] [worker #7] ML tree search #8, logLikelihood: -332863.865625
[07:02:22] [worker #14] ML tree search #15, logLikelihood: -332814.853938
[07:03:58] [worker #5] ML tree search #6, logLikelihood: -332778.467893
[07:10:45] [worker #8] ML tree search #9, logLikelihood: -332821.908123
[07:22:29] [worker #3] ML tree search #20, logLikelihood: -332847.515380
[07:30:49] [worker #6] ML tree search #7, logLikelihood: -332855.810962
[08:07:21] [worker #13] ML tree search #14, logLikelihood: -332786.696103
[08:10:00] [worker #9] ML tree search #10, logLikelihood: -332828.563535
[08:15:14] [worker #12] ML tree search #13, logLikelihood: -332829.157385
[08:40:41] [worker #10] ML tree search #11, logLikelihood: -332799.770461
[09:41:32] [worker #0] ML tree search #17, logLikelihood: -332797.135762
[09:52:28] [worker #15] ML tree search #16, logLikelihood: -332834.367563
[09:53:59] [worker #4] ML tree search #5, logLikelihood: -332790.057741
[10:06:41] [worker #1] ML tree search #18, logLikelihood: -332747.893445
[10:14:16] [worker #11] ML tree search #12, logLikelihood: -332810.441574
[10:14:29] [worker #2] ML tree search #3, logLikelihood: -332824.392888
[11:44:27] [worker #3] ML tree search #4, logLikelihood: -332797.503232
[15:56:10] [worker #1] ML tree search #2, logLikelihood: -332840.569166
[18:43:34] [worker #0] ML tree search #1, logLikelihood: -332823.646906

Part of the standard output file (B):

[04:10:32] [worker #18] ML tree search #19, logLikelihood: -332859.189348
[04:10:52] [worker #34] ML tree search #19, logLikelihood: -332859.189348
[04:16:24] [worker #50] ML tree search #19, logLikelihood: -332859.189348
[05:26:55] [worker #2] ML tree search #19, logLikelihood: -332859.189348
[05:45:08] [worker #35] ML tree search #20, logLikelihood: -332847.515380
[05:46:20] [worker #19] ML tree search #20, logLikelihood: -332847.515380
[05:47:46] [worker #7] ML tree search #8, logLikelihood: -332863.865625
[05:51:07] [worker #51] ML tree search #20, logLikelihood: -332847.515380
[07:02:22] [worker #14] ML tree search #15, logLikelihood: -332814.853938
[07:03:58] [worker #5] ML tree search #6, logLikelihood: -332778.467893
[07:10:45] [worker #8] ML tree search #9, logLikelihood: -332821.908123
[07:13:45] [worker #16] ML tree search #17, logLikelihood: -332797.135762
[07:16:28] [worker #32] ML tree search #17, logLikelihood: -332797.135762
[07:22:29] [worker #3] ML tree search #20, logLikelihood: -332847.515380
[07:23:15] [worker #48] ML tree search #17, logLikelihood: -332797.135762
[07:30:49] [worker #6] ML tree search #7, logLikelihood: -332855.810962
[08:07:12] [worker #17] ML tree search #18, logLikelihood: -332747.893445
[08:07:21] [worker #13] ML tree search #14, logLikelihood: -332786.696103
[08:09:02] [worker #33] ML tree search #18, logLikelihood: -332747.893445
[08:10:00] [worker #9] ML tree search #10, logLikelihood: -332828.563535
[08:15:14] [worker #12] ML tree search #13, logLikelihood: -332829.157385
[08:19:08] [worker #49] ML tree search #18, logLikelihood: -332747.893445
[08:40:41] [worker #10] ML tree search #11, logLikelihood: -332799.770461
[09:41:32] [worker #0] ML tree search #17, logLikelihood: -332797.135762
[09:52:28] [worker #15] ML tree search #16, logLikelihood: -332834.367563
[09:53:59] [worker #4] ML tree search #5, logLikelihood: -332790.057741
[10:06:41] [worker #1] ML tree search #18, logLikelihood: -332747.893445
[10:14:16] [worker #11] ML tree search #12, logLikelihood: -332810.441574
[10:14:29] [worker #2] ML tree search #3, logLikelihood: -332824.392888
[11:44:27] [worker #3] ML tree search #4, logLikelihood: -332797.503232
[15:56:10] [worker #1] ML tree search #2, logLikelihood: -332840.569166
[18:43:34] [worker #0] ML tree search #1, logLikelihood: -332823.646906
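To compare the two files, one can count distinct tree-search indices in each. A minimal sketch, assuming the "ML tree search #N" line format shown above; the helper name is hypothetical. Counted this way, both excerpts above contain the same 20 indices (1 to 20); the stdout excerpt simply reports some of them more than once.

```shell
# Count distinct "ML tree search #N" indices in raxml-ng output read on stdin.
count_distinct_searches() {
  grep -o 'ML tree search #[0-9]*' | sort -u | awk 'END { print NR }'
}

# Usage:
#   count_distinct_searches < proline_4N4n128c.raxml.log
```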

Maggie Lau

Feb 8, 2022, 1:33:41 AM
to raxml
One more problem to report, which might be relevant.
The job (4nodes/4tasks/128cores) failed twice. It appears that the job was killed in the middle of some task(s) in loop(s); please see the error message below. The system admin suggested modifying the script to request 4nodes/8tasks/128cores. Would that be helpful?

terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 189880 on node j1107 exited on signal 6 (Aborted).
--------------------------------------------------------------------------


Alexey Kozlov

Feb 10, 2022, 11:07:02 AM
to ra...@googlegroups.com
Hm, duplication in the stdout file is not a problem, but this error definitely is.

Please send me the full log file from this run, and ideally also the input files.

Thanks,
Alexey

Maggie Lau

Feb 11, 2022, 11:06:16 PM
to raxml
Noted the comment on duplications in stdout. Thanks.

Attached please find a zip file containing
- the slurm script
- the stdout
- the err
- the log

FYI, the admin checked that the nodes are running as expected.

Thank you in advance for your help!
With your help, we hope to resolve the problem and then run two more, even larger alignment files.
proline_4N4n128c-failed.zip

Harsh Kashyap

Feb 15, 2022, 3:18:51 AM
to ra...@googlegroups.com
Hi 
Do you know any PHP?


Alexey Kozlov

Feb 15, 2022, 8:00:45 AM
to ra...@googlegroups.com
According to the log, this run was interrupted and restarted from a checkpoint.

Could you please try to rerun the analysis from scratch (--redo)?

If this does not help, please send me the input files, and I'll try to reproduce the problem.

Maggie Lau

Feb 15, 2022, 9:33:41 PM
to raxml
The job (4nodes/4tasks/128cores) failed twice with the same error message. When I restarted the job, it continued from the checkpoint.

I will rerun the analysis from scratch using the --redo option and let you know.


Maggie Lau

Feb 16, 2022, 9:31:42 PM
to raxml
The job (with --redo) was terminated again after about 8 hours.

Attached please find a zip file containing
- the slurm script
- the stdout
- the err
- the log



proline_4N4n128c-redo-failed.zip

Alexey Kozlov

Feb 17, 2022, 7:46:21 AM
to ra...@googlegroups.com

Maggie Lau

Feb 19, 2022, 1:35:41 AM
to raxml
Of course! For some reason, I thought I had sent it some time ago. Sorry about that.
Here you go.

proline_parse.raxml.rba