raxml-ng

chris blair

Jan 25, 2018, 8:04:35 AM

to raxml

Hey Alexey,

I finally got raxml-ng up and running on our system. Thanks again for all the help. I'm currently running an analysis with 10 MPI ranks. If I understand correctly, independent ML searches should be simultaneously run on each rank. However, from the output of raxml-ng this appears to not be the case. Can you confirm?

Chris

[00:00:37] Generating parsimony starting tree(s) with 47 taxa

[01:29:04] Data distribution: partitions/thread: 1-1, patterns/thread: 203929-203930

Starting ML tree search with 50 distinct starting trees

[04:48:55] ML tree search #1, logLikelihood: -34048031.742998

[09:48:30] ML tree search #2, logLikelihood: -34048031.700193

[14:07:28] ML tree search #3, logLikelihood: -34048035.555916

Alexey Kozlov

Jan 25, 2018, 8:20:34 AM

to ra...@googlegroups.com

Hi Chris,

that's all right, raxml-ng currently implements fine-grained parallelization only, which means the alignment is split
among MPI processes, and they all work on the *same* ML search.
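
To make the fine-grained scheme concrete, here is a minimal shell sketch of how alignment patterns would be divided among workers, assuming an even split across all ranks x threads. The function name `patterns_per_thread` is hypothetical. The pattern count (1,228,455) and the two layouts are the ones reported in the raxml-ng logs posted in this thread, and the results agree with the "patterns/thread" lines in those logs.

```shell
#!/bin/bash
# Sketch: patterns per worker when an alignment is split evenly
# across (MPI ranks x threads per rank). Assumes an even split.
patterns_per_thread() {
    local patterns="$1"
    local ranks="$2"
    local threads="$3"
    local workers=$(( ranks * threads ))
    local low=$(( patterns / workers ))                    # floor
    local high=$(( (patterns + workers - 1) / workers ))   # ceiling
    echo "${low}-${high}"
}

patterns_per_thread 1228455 5 16   # hybrid run: 5 ranks x 16 threads
patterns_per_thread 1228455 1 16   # single node: 1 rank x 16 threads
```

Every worker evaluates the likelihood of its own slice for the same tree, so adding ranks shrinks the slice per thread but does not start additional independent searches.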

Best,
Alexey

> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

chris blair

Jan 25, 2018, 9:26:47 PM

to raxml

Ah, OK, thanks. I'm trying to run the hybrid MPI + Pthreads version in the hope of speeding things up, but the run actually seems slower than with plain MPI. I specified 16 threads (--threads 16), as this is the number of cores we have per node. With plain MPI it took about 1.5 hours to compute 50 MP starting trees. The hybrid version has been running for about twice that long and has still not finished. I'm happy to go back to just using MPI, but I am a bit curious.

Chris

Alexey Kozlov

Jan 26, 2018, 8:42:27 AM

to ra...@googlegroups.com

Hi Chris,

can I see how you run the hybrid version (submission script + output)? It is a bit trickier to configure than
plain MPI; common pitfalls include running e.g. 16 MPI ranks x 16 threads on each node, or pinning all threads to the
same core.

Also, please note that parsimony computation is currently not parallelized, so this step won't become any faster if you
add more cores (but it might change in the future).
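
The first pitfall above comes down to simple arithmetic, which the following sketch spells out. The function name `check_subscription` and its messages are hypothetical; the numbers are the 16-core nodes and 16 threads per rank mentioned in this thread.

```shell
#!/bin/bash
# Sketch: compare the threads started on one node with its core count.
# Arguments: cores per node, MPI ranks per node, threads per rank.
check_subscription() {
    local cores="$1"
    local ranks="$2"
    local threads="$3"
    local total=$(( ranks * threads ))
    if [ "$total" -gt "$cores" ]; then
        echo "oversubscribed: ${total} threads on ${cores} cores"
    elif [ "$total" -lt "$cores" ]; then
        echo "underused: ${total} threads on ${cores} cores"
    else
        echo "ok: ${total} threads on ${cores} cores"
    fi
}

check_subscription 16 16 16   # pitfall: 16 ranks x 16 threads on one node
check_subscription 16 1 16    # intended hybrid layout: 1 rank x 16 threads
```

This only checks the counts. Even a correct count can perform badly if the threads are all pinned to the same core, which has to be verified on the compute node itself.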

Best,
Alexey

chris blair

Jan 26, 2018, 9:14:19 AM

to raxml

Hi Alexey,

Please see attached.

Also, it appears that you guys get a lot of questions about setting up parallelized runs. It may be useful (and take some strain off of you guys) to create a webpage, wiki, etc. highlighting the similarities and differences in how your programs (e.g. RAxML, RAxML-NG, ExaML, ExaBayes, etc.) handle parallelization. Having a quick go-to reference would definitely make things easier.

All the best,

Chris

Screen Shot 2018-01-26 at 9.08.54 AM.png

Screen Shot 2018-01-26 at 9.10.21 AM.png

Alexey Kozlov

Jan 26, 2018, 9:58:59 AM

to ra...@googlegroups.com

Hi Chris,

> Please see attached.

This looks good, and given your dataset dimensions, ML search should actually scale very well.

I'd just generate parsimony starting trees in a separate run on a single node, or just reuse those that have been
generated in your previous run ('--tree oldRun.raxml.startTree'), or simply use random starting trees instead (even
better: 25 random + 25 parsimony starting trees).
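
The three options could look roughly like the command lines printed by the sketch below. It only prints the commands and does not run raxml-ng. The MSA name, model and `oldRun.raxml.startTree` come from this thread; the helper `search_cmd` is hypothetical, and the `rand{50}` and `pars{25},rand{25}` tree specifications are assumptions to verify against `raxml-ng --help` for the installed version.

```shell
#!/bin/bash
# Sketch: print candidate raxml-ng command lines for different
# starting-tree choices (dry run, nothing is executed).
msa="cornutum_25_NEW_edited.phy"
model="GTR+G"

search_cmd() {
    local tree_spec="$1"
    # The tree spec is single-quoted so the shell never touches the braces.
    printf "raxml-ng-mpi --msa %s --model %s --search --threads 16 --tree '%s'\n" \
        "$msa" "$model" "$tree_spec"
}

search_cmd "oldRun.raxml.startTree"   # reuse trees from the previous run
search_cmd "rand{50}"                 # assumed syntax: 50 random trees
search_cmd "pars{25},rand{25}"        # assumed syntax: 25 parsimony + 25 random
```

Reusing the saved starting trees avoids repeating the parsimony step, which took well over an hour in the logs above.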

> Also, it appears that you guys get a lot of questions about setting parallelized runs. It may be useful (and take some
> strain off of you guys) to create a webpage, wiki, etc. highlighting similarities and differences in how your programs
> (e.g. RAxML, RAxML-NG, ExaML, ExaBayes, etc.) handle parallelization. Having a quick go-to would definitely make things
> easier.

Well, this information is already available in the documentation for each tool (except for raxml-ng), but maybe you're
right and we should have a single overview document.

Best,
Alexey

chris blair

Jan 27, 2018, 8:03:43 PM

to raxml

Thanks Alexey. I'm playing around with different configurations to see what works well. What I'm noticing (with RAxML) is that the analysis gets slower as the number of MPI processes increases. For example, if I run a -f a search with 10 processes, RAxML takes ~2000 seconds per process for BS optimization. With 2 MPI processes this decreases to about 400 sec. This is quite odd and not what I was expecting.

Alexey Kozlov

Jan 29, 2018, 7:50:06 AM

to ra...@googlegroups.com

Hi Chris,

could you please post the respective submission scripts and output files?

Best,
Alexey

chris blair

Jan 30, 2018, 1:41:15 PM

to raxml

Hi Alexey,

I think I deleted the output from that run. Here is a RAxML script I am running:

#!/bin/bash
#PBS -N RAxML_fa
#PBS -l nodes=4:ppn=16

module load mpich-3.2
module load gcc-4.9.2

echo "Starting raxml parallel job ..."

# Number of parallel threads per MPI process.
# In this example it is set to 16, matching ppn=16 above and -T 16 below.
export OMP_NUM_THREADS=16

# You must explicitly change to the working directory in PBS
cd "$PBS_O_WORKDIR"

# Use 'mpirun' and point to the MPI parallel executable to run
echo ">>>> Begin RAXML MPI Run ..."
mpirun -np 4 -machinefile $PBS_NODEFILE /home/cblair/cornutum/RAXML/raxmlHPC-HYBRID-SSE3 -T 16 -f a -m GTRGAMMA -p 657488 -x 453655 -s cornutum_25_NEW_edited.phy -o H6 -N 100 -n cornutum_25percent > cornutum_raxml_25percent.out 2>&1
echo ">>>> End RAXML MPI Run ..."

For the sake of comparison I also set up an analysis using raxml-ng with 5 MPI processes and 16 threads. Interestingly, it took about 10 hours to compute one tree in RAxML and about 35 hours for one tree in raxml-ng. I thought raxml-ng would be faster per tree since it uses both MPI and Pthreads for a single likelihood calculation (if I'm understanding correctly).

Chris

Alexey Kozlov

Feb 1, 2018, 12:32:27 PM

to ra...@googlegroups.com

Hi Chris,

> I think I deleted the output from that run. Here is a RAxML script I am running:

this looks good, just one question: when you run RAxML with 10 MPI processes, do you also increase the number of nodes
accordingly (i.e. #PBS -l nodes=10:ppn=16)? If not, the performance degradation is expected (due to core
oversubscription). If yes, there might be some problems with assigning MPI processes to nodes and/or threads to CPU
cores. In other words, if we use 10 nodes x 16 cores, we need to have 160 threads in total (e.g., 10 MPI processes x 16
threads) *and* they have to be distributed/pinned such that *exactly* one thread is running on each CPU core on each
node. In principle, the MPI runtime / cluster job submission system should take care of this, but in practice you often
need some additional tweaks in the submission script / mpirun call to achieve proper pinning. Therefore, I'd first check
this with htop/top/ps (see also: http://www.glennklockwood.com/hpc-howtos/process-affinity.html).
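
One example of such a tweak, offered as a sketch rather than a diagnosis of this particular cluster: under PBS/Torque, $PBS_NODEFILE typically lists each node once per requested core (ppn), so a launcher given that file together with a small -np may place several multi-threaded ranks on the same node. A machinefile with one line per node avoids this for hybrid runs. The function name `unique_nodefile` and the file name `nodes.unique` are hypothetical, and how a given MPI launcher maps ranks to hosts should be checked in its own documentation.

```shell
#!/bin/bash
# Sketch: derive a one-line-per-node machinefile from a PBS-style nodefile
# and print the number of distinct nodes (one MPI rank per node).
unique_nodefile() {
    local nodefile="$1"
    local out="$2"
    sort -u "$nodefile" > "$out"
    wc -l < "$out" | tr -d ' '
}

# Possible use inside a PBS job (commented out because it needs a cluster;
# the raxml-ng arguments are the ones shown elsewhere in this thread):
# nranks=$(unique_nodefile "$PBS_NODEFILE" nodes.unique)
# mpirun -np "$nranks" -machinefile nodes.unique \
#     /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/raxml-ng-mpi \
#     --msa cornutum_25_NEW_edited.phy --threads 16 --search --model GTR+G --tree 'pars{1}'
```

With nodes=4:ppn=16 this would yield four ranks, one per node, each free to start 16 threads. Thread-to-core pinning still has to be checked on the nodes themselves.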

Please also note that the "BS model parameter optimization" step is not parallelized with MPI, that is, the runtime of this
phase will (ideally) remain constant regardless of the number of MPI processes used.

> For the sake of comparison I also set up an analysis using raxml-ng with 5 MPI processes and 16 threads. Interestingly,
> it took about 10 hours to compute one tree in RaxML and about 35 hours for one tree in raxml-ng. I thought raxml-ng
> would be faster per tree since it is using both MPI and Pthreads for single likelihood calculations (if I'm
> understanding correctly).

That's correct. Do you get the same tree with the same log-likelihood from RAxML and RAxML-NG? Can I see the output of
both runs?

Best,
Alexey

chris blair

Feb 1, 2018, 1:48:55 PM

to raxml

Hi Alexey,

Yes, if I run RAxML with 10 MPI processes I ask for 10 nodes. I do understand that only one thread should run on one core, but am a bit unsure how to make sure this happens on our system (and how to check). What should I look for with top, for example?

I actually killed the RAxML-NG run to free up space for other users. However, the RAxML analysis appeared to have finished successfully. I am attaching the log file if you would like to take a look. I think things ran correctly.

For comparison, I ran a -f a analysis of the same data in RAxML. After a certain number of rapid bootstraps, I received the following error:

=>> PBS: job killed: node 1 (node07) requested job terminate, 'EOF' (code 1099) - received SISTER_EOF attempting to communicate with sister MOM's

Not sure what happened here. I am currently trying to re-run the analysis. So far the error hasn't occurred.

Chris

cornutum_raxml_25percent_50runs.out

Alexey Kozlov

Feb 2, 2018, 8:25:26 AM

to ra...@googlegroups.com

Hi Chris,

> Yes, if I run RAxML with 10 MPI processes I ask for 10 nodes. I do understand that only one thread should run on one
> core, but am a bit unsure how to make sure this happens on our system (and how to check). What should I look for with
> top, for example?

you run "top" on each compute node where RAxML is running, then press "1" and check that *all* 16 CPUs have ~100% load.
After the initialization phase (reading alignment, parsimony starting tree generation etc.), this should always be the case.
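
Where interactive top is awkward (for example inside a batch job), a similar per-core check can be sketched from two snapshots of /proc/stat on Linux. This is an illustration, not part of RAxML or raxml-ng: the function name `cpu_busy_pct` and the snapshot file names are hypothetical, and "busy" is taken here to mean all time except idle and iowait.

```shell
#!/bin/bash
# Sketch: per-core busy percentage between two /proc/stat snapshots.
# /proc/stat columns: cpuN user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    awk '
        /^cpu[0-9]+ / {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            idle = $5 + $6                         # idle + iowait
            if (FNR == NR) {                       # first snapshot: remember
                t0[$1] = total; i0[$1] = idle
                next
            }
            dt = total - t0[$1]; di = idle - i0[$1]
            if (dt > 0) printf "%s %.0f\n", $1, 100 * (dt - di) / dt
        }
    ' "$1" "$2"
}

# Possible use on a compute node (commented out: Linux only, takes 10 s):
# cat /proc/stat > stat.before; sleep 10; cat /proc/stat > stat.after
# cpu_busy_pct stat.before stat.after
```

During the ML search each of the 16 cores should then report a value close to 100. Several cores near 0 next to a few busy ones would hint at a pinning or placement problem.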

> I actually killed the RAxML-NG run to free up space for other users. However, the RAxML analysis appeared to have
> finished successfully. I am attaching the log file if you would like to take a look. I think things ran correctly.

yes, this looks good. as for raxml-ng, do you still have an (incomplete) log output of this run?

> For comparison, I ran a -f a analysis of the same data in RAxML. After a certain number of rapid bootstraps, I received
> the following error:
>
> =>> PBS: job killed: node 1 (node07) requested job terminate, 'EOF' (code 1099) - received SISTER_EOF attempting to
> communicate with sister MOM's
>
> Not sure what happened here. I am currently trying to re-run the analysis. So far the error hasn't occurred.

this looks like a (hardware) node failure, let's see whether it will occur again...

Best,
Alexey

chris blair

Feb 4, 2018, 10:33:04 AM

to raxml

Hi Alexey,

Here is the output of raxml-ng before I killed the run.

RAxML-NG v. 0.5.1b BETA released on 01.12.2017 by The Exelixis Lab.

Authors: Alexey Kozlov, Alexandros Stamatakis, Diego Darriba, Tomas Flouri, Benoit Morel.

Latest version: https://github.com/amkozlov/raxml-ng

Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

WARNING: This is a BETA release, please use at your own risk!

RAxML-NG was called as follows:

/home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/raxml-ng-mpi --msa cornutum_25_NEW_edited.phy --threads 16 --search --model GTR+G --tree pars{50}

Analysis options:

run mode: ML tree search

start tree(s): parsimony (50)

random seed: 1517229722

tip-inner: ON

pattern compression: ON

per-rate scalers: OFF

site repeats: OFF

fast spr radius: AUTO

spr subtree cutoff: 1.000000

branch lengths: ML estimate (linked)

SIMD kernels: AVX

parallelization: hybrid MPI+PTHREADS (5 ranks x 16 threads)

[00:00:00] Reading alignment from file: cornutum_25_NEW_edited.phy

[00:00:07] Loaded alignment with 47 taxa and 10284964 sites

Alignment comprises 1 partitions and 1228455 patterns

Partition 0: noname

Model: GTR+FO+G4m

Alignment sites / patterns: 10284964 / 1228455

Gaps: 51.52 %

Invariant sites: 96.75 %

[00:01:01] Generating parsimony starting tree(s) with 47 taxa

[01:39:32] Data distribution: partitions/thread: 1-1, patterns/thread: 15355-15356

Starting ML tree search with 50 distinct starting trees

[39:37:55] ML tree search #1, logLikelihood: -16767369.083129

My RAxML analyses appeared to have worked well. I didn't receive the hardware error the second time around.

Chris

Alexey Kozlov

Feb 5, 2018, 11:02:17 AM

to ra...@googlegroups.com

Hi Chris,

> Here is the output of raxml-ng before I killed the run.

thanks, this looks rather irregular indeed (compared to the RAxML runtime).

If you want to experiment with NG a bit more to find out the reason, I'd suggest the following:

- check thread pinning with top (see above)
- run single tree inference on single node (1 MPI rank x 16 threads, '--tree pars{1}') and compare the runtime to RAxML
and to NG on 5 nodes

> My RAxML analyses appeared to have worked well. I didn't receive the hardware error the second time around.

great :)

Best,
Alexey

Christopher Blair

Feb 5, 2018, 11:23:56 AM

to ra...@googlegroups.com

Hi Alexey,

Thanks for the advice. So I typed 'top' and '1' and it only lists 8 cpus (Cpu0-Cpu7). I assume I am looking at the 'id' column for usage details? Each cpu hovers at around 100%. I thought we had 16 cores per node, but perhaps I am mistaken? We CAN start 16 processes in the PBS script (i.e. ppn: 16) so I assumed this meant we had 16 cores? This is all so confusing.

Chris

--

***********************************************

Christopher Blair, Ph.D.
Assistant Professor

Department of Biological Sciences

New York City College of Technology and

Ecology, Evolution and Behavior Program

Graduate Center

The City University of New York

300 Jay Street

Brooklyn, NY 11201

CBl...@citytech.cuny.edu; cbl...@gc.cuny.edu

Website: https://sites.google.com/site/christopherblairphd/home

Office: Pearl 410; Ph: 718-260-5342

Alexey Kozlov

Feb 5, 2018, 11:31:24 AM

to ra...@googlegroups.com

Hi Chris,

are you sure you are running top on a compute node where raxml is running and not on the login node from which you submit
your jobs?
You should see ~100% in the "us" column (which means "user" processes), and raxml-ng must also show up among the running
processes in the table below. And "id" actually stands for "idle", so if you see 100% in this column, then something went
wrong.

I personally prefer "htop" since it gives a very nice graphical representation of RAM and CPU load.

Best,
Alexey

Christopher Blair

Feb 5, 2018, 11:39:02 AM

to ra...@googlegroups.com

Hi Alexey,

Looks like we do not have htop installed on our system. You are right, I am probably running top in the wrong place. How can I specify a specific node to top? Looks like the current job is running on node14.

Chris

Alexey Kozlov

Feb 5, 2018, 11:41:00 AM

to ra...@googlegroups.com

you have to ssh to this node (if it's allowed on your cluster), and then run top from there. please ask your admins for
help if unsure.

Christopher Blair

Feb 5, 2018, 11:42:57 AM

to ra...@googlegroups.com

That's what I thought. Unfortunately I don't have permission to do this. I'll send an email to our admin to see if I can obtain permission. Will keep you posted.

Alexey Kozlov

Feb 5, 2018, 11:43:34 AM

to ra...@googlegroups.com

ok great

Christopher Blair

Feb 5, 2018, 3:05:43 PM

to ra...@googlegroups.com

Hey Alexey,

Still trying to access top on a compute node. However, the test raxml-ng analysis completed, and much faster than before (~3 vs. 35 hrs per tree). Thus, there seems to be an issue with specifying both MPI and Pthreads. See below:

RAxML-NG v. 0.5.1b BETA released on 01.12.2017 by The Exelixis Lab.

Authors: Alexey Kozlov, Alexandros Stamatakis, Diego Darriba, Tomas Flouri, Benoit Morel.

Latest version: https://github.com/amkozlov/raxml-ng

Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

WARNING: This is a BETA release, please use at your own risk!

RAxML-NG was called as follows:

/home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/raxml-ng-mpi --msa cornutum_25_NEW_edited.phy --threads 16 --search --model GTR+G --tree pars{1}

Analysis options:

run mode: ML tree search

start tree(s): parsimony

random seed: 1517895441

tip-inner: ON

pattern compression: ON

per-rate scalers: OFF

site repeats: OFF

fast spr radius: AUTO

spr subtree cutoff: 1.000000

branch lengths: ML estimate (linked)

SIMD kernels: AVX

parallelization: PTHREADS (16 threads)

[00:00:00] Reading alignment from file: cornutum_25_NEW_edited.phy

[00:00:02] Loaded alignment with 47 taxa and 10284964 sites

Alignment comprises 1 partitions and 1228455 patterns

Partition 0: noname

Model: GTR+FO+G4m

Alignment sites / patterns: 10284964 / 1228455

Gaps: 51.52 %

Invariant sites: 96.75 %

[00:00:17] Generating parsimony starting tree(s) with 47 taxa

[00:00:45] Data distribution: partitions/thread: 1-1, patterns/thread: 76778-76779

Starting ML tree search with 1 distinct starting trees

[00:00:47 -36874918.854127] Initial branch length optimization

[00:01:20 -16998034.205271] Model parameter optimization (eps = 10.000000)

[00:09:09 -16770523.144581] AUTODETECT spr round 1 (radius: 5)

[00:12:22 -16767919.411744] AUTODETECT spr round 2 (radius: 10)

[00:16:18 -16767919.384442] SPR radius for FAST iterations: 5 (autodetect)

[00:16:18 -16767919.384442] Model parameter optimization (eps = 3.000000)

[00:17:10 -16767919.290121] FAST spr round 1 (radius: 5)

[00:25:39 -16767741.137717] FAST spr round 2 (radius: 5)

[00:33:34 -16767646.287843] FAST spr round 3 (radius: 5)

[00:41:29 -16767572.907027] FAST spr round 4 (radius: 5)

[00:49:35 -16767519.234089] FAST spr round 5 (radius: 5)

[00:56:59 -16767494.311764] FAST spr round 6 (radius: 5)

[01:04:28 -16767484.388031] FAST spr round 7 (radius: 5)

[01:12:09 -16767483.790835] FAST spr round 8 (radius: 5)

[01:20:01 -16767483.777842] Model parameter optimization (eps = 1.000000)

[01:21:06 -16767483.745612] SLOW spr round 1 (radius: 5)

[01:44:54 -16767379.026932] SLOW spr round 2 (radius: 5)

[02:06:06 -16767369.032591] SLOW spr round 3 (radius: 5)

[02:26:24 -16767369.032111] SLOW spr round 4 (radius: 10)

[02:40:56 -16767369.031841] SLOW spr round 5 (radius: 15)

[02:55:25 -16767369.031631] SLOW spr round 6 (radius: 20)

[03:05:48 -16767369.031506] SLOW spr round 7 (radius: 25)

[03:15:19 -16767369.031506] Model parameter optimization (eps = 0.100000)

[03:16:13] ML tree search #1, logLikelihood: -16767369.011700

Optimized model parameters:

Partition 0: noname

Rate heterogeneity: GAMMA (4 cats, mean), alpha: 0.020139 (ML), weights&rates: (0.250000,0.000000) (0.250000,0.000000) (0.250000,0.000001) (0.250000,3.999999)

Base frequencies (ML): 0.273566 0.235296 0.227784 0.263354

Substitution rates (ML): 0.928079 4.980946 0.650144 1.036918 4.999550 1.000000

Final LogLikelihood: -16767369.011700

Best ML tree saved to: /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.bestTree

Optimized model saved to: /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.bestModel

Execution log saved to: /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.log

Analysis started: 05-Feb-2018 23:37:21 / finished: 06-Feb-2018 02:53:34

Elapsed time: 11773.234 seconds

Alexey Kozlov

Feb 5, 2018, 4:00:20 PM

to ra...@googlegroups.com

thanks for posting, this looks much better now :) and also in line with the expectations (~3x faster than RAxML)

apart from correct thread pinning, you should probably check how your cluster nodes are connected: a low-latency
interconnect such as InfiniBand is necessary to achieve good performance with the fine-grained parallelization implemented
in RAxML-NG.
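
For reference, the "~3x" figure can be reproduced from the numbers quoted in this thread: the single-node raxml-ng run reported an elapsed time of 11773.234 seconds, against roughly 10 hours per tree for RAxML and roughly 35 hours for the earlier 5-rank hybrid raxml-ng run. The helper names `hours` and `speedup` in this small sketch are hypothetical.

```shell
#!/bin/bash
# Sketch: convert the reported elapsed time to hours and compare run times.
hours()   { awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 3600 }'; }
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

ng_hours=$(hours 11773.234)   # single-node raxml-ng run, in hours
echo "$ng_hours"
speedup 10 "$ng_hours"        # RAxML (~10 h per tree) vs. single-node raxml-ng
speedup 35 "$ng_hours"        # 5-rank hybrid raxml-ng (~35 h) vs. single node
```

The second ratio shows that the 5-node hybrid run was about ten times slower than a single node, which is the opposite of what fine-grained parallelization should give and is consistent with a placement, pinning or interconnect problem.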

Christopher Blair

Feb 5, 2018, 8:28:15 PM

to ra...@googlegroups.com

Hi Alexey,

So it doesn't look like I will be able to SSH into a compute node to run top. What if I simply do qstat -n? This shows that the run is using all 16 cores per node.

How should I check how the nodes are connected?

Again, apologies for the technical questions. Our cluster admin is actually a faculty member in the physics department, so he cannot devote much time to troubleshooting.

I'm wondering how to proceed going forward. At some point do you plan to implement the same kind of coarse- and fine-grained parallelism as in RAxML? If so, I may be better off sticking with RAxML for the time being, as MPI+Pthreads appeared to work ok. I am curious though why a hybrid MPI/Pthreads run in RAxML-NG resulted in such a drastic slowdown. Using 5 versus 1 MPI processes should have resulted in a speed-up of ~5x.

Chris

--

***********************************************

Christopher Blair, Ph.D.
Assistant Professor

Department of Biological Sciences

New York City College of Technology and

Ecology, Evolution and Behavior Program

Graduate Center

The City University of New York

300 Jay Street

Brooklyn, NY 11201

CBl...@citytech.cuny.edu; cbl...@gc.cuny.edu

Alexey Kozlov

Feb 6, 2018, 4:45:24 AM

to ra...@googlegroups.com

Hi Chris,

> So it doesn't look like I will be able to SSH into a compute node to run top. What if I simply do qstat -n? This shows
> that the run is using all 16 cores per node.

no it's not enough, unfortunately.
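
The scheduler listing shows which cores were allocated to the job, not whether they are actually busy. If SSH to the compute node is not possible, one possible workaround (an assumption, not something confirmed for this cluster) is to start a small monitoring script from inside the job script itself, so that it runs on the same node as raxml-ng. A minimal sketch that compares two snapshots of the Linux `/proc/stat` counters follows; the function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 user nice system idle iowait ...";
        # the aggregate "cpu" line and non-CPU lines are skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:9]]  # user .. steal (guest is already in user)
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        counters[fields[0]] = (total - idle, total)
    return counters


def busy_fraction(before, after):
    """Fraction of time each core was busy between two snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def sample_node(interval_seconds=5.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat on the current node and compare them."""
    with open(path) as handle:
        before = parse_proc_stat(handle.read())
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = parse_proc_stat(handle.read())
    return busy_fraction(before, after)
```

Calling `sample_node()` while the search is running and printing the result should show values near 1.0 for every core that raxml-ng is really using. Several cores near 0.0 next to a few saturated ones would point to a thread pinning problem.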

> How should I check how the nodes are connected?

It should be stated in your cluster description/documentation, if there is any.
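
If no documentation is available, a rough check can be run on a compute node, for example from inside a job script. On Linux, the kernel lists RDMA-capable adapters (InfiniBand and similar) under `/sys/class/infiniband`. The sketch below only reports whether such a device is visible; it cannot tell whether the MPI library is actually configured to use it. The function name is hypothetical.

```python
import os


def infiniband_devices(sysfs_dir="/sys/class/infiniband"):
    """List RDMA/InfiniBand device names the Linux kernel exposes on this node."""
    if not os.path.isdir(sysfs_dir):
        return []  # no InfiniBand driver loaded (or not a Linux node)
    return sorted(os.listdir(sysfs_dir))


if __name__ == "__main__":
    devices = infiniband_devices()
    if devices:
        print("InfiniBand devices found:", ", ".join(devices))
    else:
        print("No InfiniBand devices visible; the nodes may be linked by Ethernet only.")
```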

> I'm wondering how to proceed going forward. At some point do you plan to implement the same kind of coarse and
> fine-grained parallelism at RAxML?

yes we do

> If so, I may be better sticking with RAxML for the time being as MPI+Pthreads
> appeared to work ok. I am curious though why running a hybrid MPI/Pthreads run in RAxML-NG resulted in such a drastic
> slowdown. Using 5 versus 1 MPI processes should have resulted in a speed-up of ~5x.

As you like. You can also start 5 individual runs, each on 1 node with 10 starting trees, but this is less convenient, of course.
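
A minimal sketch of that manual split: it builds one command line per node, each with its own seed and output prefix, and afterwards picks the run with the best final likelihood by reading the "Final LogLikelihood" line from each log. The helper names, the `run1` ... `run5` prefixes and the seed values are hypothetical. The options used (`--msa`, `--model`, `--tree pars{N}`, `--seed`, `--threads`, `--prefix`) should be checked against `raxml-ng -h` for the installed version.

```python
import re
from pathlib import Path

FINAL_LNL = re.compile(r"Final LogLikelihood:\s*(-?\d+(?:\.\d+)?)")


def build_commands(msa, model, n_runs=5, trees_per_run=10, threads=16, first_seed=1):
    """One raxml-ng command line per node, each with its own seed and output prefix."""
    commands = []
    for run in range(n_runs):
        commands.append(
            f"raxml-ng --msa {msa} --model {model} "
            f"--tree pars{{{trees_per_run}}} --seed {first_seed + run} "
            f"--threads {threads} --prefix run{run + 1}"
        )
    return commands


def best_run(log_paths):
    """Return (path, logLikelihood) of the run with the highest final likelihood."""
    results = []
    for path in log_paths:
        match = FINAL_LNL.search(Path(path).read_text())
        if match:  # unfinished runs have no "Final LogLikelihood" line yet
            results.append((float(match.group(1)), str(path)))
    if not results:
        raise ValueError("no finished runs found")
    lnl, path = max(results)
    return path, lnl
```

For example, `build_commands("cornutum_25_NEW_edited.phy", "GTR+G")` returns five command lines to submit as separate single-node jobs (the model string here is a placeholder), and `best_run(["run1.raxml.log", ..., "run5.raxml.log"])` then identifies which run's best tree to keep. Distinct seeds matter: with identical seeds the five runs would generate the same starting trees.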

Best,
Alexey

>
> Chris

>
> On Mon, Feb 5, 2018 at 4:00 PM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> thanks for posting, this looks much better now :) and also in line with the expectations (~3x faster than RAxML)
>
> apart from correct thread pinning, you should probably check how your cluster nodes are connected: low-latency
> interconnect such as Infiniband is necessary to achieve good performance with fine-grained parallelization
> implemented in RAXML-NG.
>
> On 05.02.2018 21:05, Christopher Blair wrote:
>
> Hey Alexey,
>
> Still trying to access top on a compute node. However, the test raxml-ng analysis completed, and much faster
> than before (~3 vs. 35 hrs per tree). Thus, there seems to be an issue with specifying both MPI and Pthreads.
> See below:
>
> RAxML-NG v. 0.5.1b BETA released on 01.12.2017 by The Exelixis Lab.
>
> Authors: Alexey Kozlov, Alexandros Stamatakis, Diego Darriba, Tomas Flouri, Benoit Morel.
>

> Latest version: https://github.com/amkozlov/raxml-ng

> [00:01:20 -16998034.205271] Model parameter optimization (eps = 10.000000)

>
> [00:09:09 -16770523.144581] AUTODETECT spr round 1 (radius: 5)
>
> [00:12:22 -16767919.411744] AUTODETECT spr round 2 (radius: 10)
>
> [00:16:18 -16767919.384442] SPR radius for FAST iterations: 5 (autodetect)
>
> [00:16:18 -16767919.384442] Model parameter optimization (eps = 3.000000)
>
> [00:17:10 -16767919.290121] FAST spr round 1 (radius: 5)
>
> [00:25:39 -16767741.137717] FAST spr round 2 (radius: 5)
>
> [00:33:34 -16767646.287843] FAST spr round 3 (radius: 5)
>
> [00:41:29 -16767572.907027] FAST spr round 4 (radius: 5)
>

> [00:49:35 -16767519.234089] FAST spr round 5 (radius: 5)

>
> [00:56:59 -16767494.311764] FAST spr round 6 (radius: 5)
>

> [01:04:28 -16767484.388031] FAST spr round 7 (radius: 5)

>
> [01:12:09 -16767483.790835] FAST spr round 8 (radius: 5)
>
> [01:20:01 -16767483.777842] Model parameter optimization (eps = 1.000000)
>
> [01:21:06 -16767483.745612] SLOW spr round 1 (radius: 5)
>

> [01:44:54 -16767379.026932] SLOW spr round 2 (radius: 5)

>
> [02:06:06 -16767369.032591] SLOW spr round 3 (radius: 5)
>
> [02:26:24 -16767369.032111] SLOW spr round 4 (radius: 10)
>

> [02:40:56 -16767369.031841] SLOW spr round 5 (radius: 15)
>
> [02:55:25 -16767369.031631] SLOW spr round 6 (radius: 20)

>
> [03:05:48 -16767369.031506] SLOW spr round 7 (radius: 25)
>
> [03:15:19 -16767369.031506] Model parameter optimization (eps = 0.100000)
>
>
> [03:16:13] ML tree search #1, logLikelihood: -16767369.011700
>
>
>
> Optimized model parameters:
>
>
> Partition 0: noname
>
> Rate heterogeneity: GAMMA (4 cats, mean),alpha: 0.020139 (ML),weights&rates: (0.250000,0.000000)
> (0.250000,0.000000) (0.250000,0.000001) (0.250000,3.999999)
>
> Base frequencies (ML): 0.273566 0.235296 0.227784 0.263354
>
> Substitution rates (ML): 0.928079 4.980946 0.650144 1.036918 4.999550 1.000000
>
>
> Final LogLikelihood: -16767369.011700
>
>
> Best ML tree saved to:
> /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.bestTree
>
> Optimized model saved to:
> /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.bestModel
>
>
> Execution log saved to:
> /home/cblair/cornutum/raxml-ng_v0.5.1b_linux_x86_64_MPI/bin/TESTS/cornutum_25_NEW_edited.phy.raxml.log
>
>
> Analysis started: 05-Feb-2018 23:37:21 / finished: 06-Feb-2018 02:53:34
>
>
> Elapsed time: 11773.234 seconds
>
>
>
> On Mon, Feb 5, 2018 at 11:43 AM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> ok great
>
> On 05.02.2018 17:42, Christopher Blair wrote:
>
> That's what I thought. Unfortunately I don't have permission to do this. I'll send an email to our admin to see
> if I can obtain permission. Will keep you posted.
>
> On Mon, Feb 5, 2018 at 11:40 AM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> you have to ssh to this node (if it's allowed on your cluster), and then run top from there. please ask your admins
> for help if unsure.
>
> On 05.02.2018 17:38, Christopher Blair wrote:
>
> Hi Alexey,
>
> Looks like we do not have htop installed on our system. You are right, I am probably running top in the wrong
> place. How can I specify a specific node to top? Looks like the current job is running on node14.
>
> Chris
>
> On Mon, Feb 5, 2018 at 11:31 AM, Alexey Kozlov <alexei...@gmail.com>

> CBl...@citytech.cuny.edu <mailto:CBl...@citytech.cuny.edu>; cbl...@gc.cuny.edu <mailto:cbl...@gc.cuny.edu>
> <http://individual.utoronto.ca/chrisblair/index.html>

> Website: https://sites.google.com/site/christopherblairphd/home
> Office: Pearl 410; Ph: 718-260-5342
>
