Compile for Knights Landing (AVX512)


Gabe

Apr 24, 2017, 12:41:28 AM
to raxml
Are there plans in the works to enable easy building for Xeon Phi processors (not talking about the old co-processors; I know that's a separate project)? I ask this from a user compilation standpoint, as I'm sure you're probably well aware of the tremendous performance gain possible with the new AVX512 instructions on Phi/Skylake-Xeon. :)

Are there build instructions yet? I see we will probably be needing CMake now, for instance, so things look a little different from before. I'm assuming it still works to swap in the Intel compiler (icc/icpc)?

Cheers,
Gabe

Gabe

Apr 24, 2017, 12:42:59 AM
to raxml
Ah, I should clarify I'm talking about raxml-ng (I wasn't aware the wiki link took me to the general group).

That said, of course it would also be interesting to hear about whether "traditional" raxml supports AVX512 yet as well.

Cheers!

Alexey Kozlov

Apr 24, 2017, 3:36:14 AM
to ra...@googlegroups.com
Hi Gabe,

yes, we do plan to work on KNL/AVX512 support for RAxML-NG. However, please note that simple re-compilation usually
yields quite poor performance. Therefore, we don't recommend it and don't provide any instructions. Rather, a new set of
AVX512-vectorized likelihood kernels has to be implemented in libpll.


> That said, of course it would also be interesting to hear about whether "traditional" raxml supports AVX512 yet as well.

There will be no AVX512 support for "old" RAxML, but we might release an AVX512-enabled version of ExaML at some point
(depending on how easy it will be to port the existing Xeon Phi KNC vectorization to KNL).

Best,
Alexey


Gabe Al-Ghalith

Apr 24, 2017, 7:55:36 PM
to ra...@googlegroups.com
Awesome, thanks for the response. 

Having done some AVX512 work myself, I can promise it's far easier (at least for my case) to port from AVX2 up to AVX512 than to rework old offloadable KNC code. Fundamentally, KNL is a normal processor compatible with all existing instructions (and I'm using a KNL system to write this post). My KNL system runs the current AVX2 version of RAxML just fine (quite fast, actually), but the more instructions that can be replaced with "real" AVX512, the less the "legacy" performance penalty. 

This is of course more true of the recently debuted Skylake Xeon and the mainstream Cannonlake processors coming later this year, which will also have AVX512 and will be able to use the exact same binary used for KNL (as long as you don't use the reciprocal/transcendental instructions unique to the latter, or the BW/VL extensions unique to the former). 

My experience tells me you're totally right about the bang for the buck -- just using the normal AVX2 version with an AVX512-supporting libpll will probably come pretty close, assuming most of the time is spent in the ML calculation code. 



Alexandros Stamatakis

Apr 25, 2017, 3:22:06 PM
to ra...@googlegroups.com
Dear Gabe,

> Having done some AVX512 work myself, I can promise it's far easier (at
> least for my case) to port from AVX2 up to AVX512 than to rework old
> offloadable KNC code. Fundamentally, KNL is a normal processor
> compatible with all existing instructions (and I'm using a KNL system to
> write this post). My KNL system runs the current AVX2 version of RAxML
> just fine (quite fast, actually), but the more instructions that can be
> replaced with "real" AVX512, the less the "legacy" performance penalty.

Evidently.

> This is of course more true of the recently debuted Skylake Xeon and the
> mainstream Cannonlake processors coming later this year, which will also
> have AVX512

We know :-)

> and will be able to use the exact same binary used for KNL
> (as long as you don't use the reciprocal/transcendental instructions
> unique to the latter, or the BW/VL extensions unique to the former).
>
> My experience tells me you're totally right about the bang for the buck
> -- just using the normal AVX2 version with an AVX512-supporting libpll
> will probably come pretty close, assuming most of the time is spent in
> the ML calculation code.

90-95% of the time is spent in the ML calculations. It's definitely on our list of things we want to look at; of
course, you might also want to consider contributing AVX512 code yourself.

Alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Alexey Kozlov

Apr 27, 2017, 12:48:55 PM
to ra...@googlegroups.com
Hi Gabe,

> Having done some AVX512 work myself, I can promise it's far easier (at least for my case) to port from AVX2 up to AVX512
> than to rework old offloadable KNC code. Fundamentally, KNL is a normal processor compatible with all existing
> instructions (and I'm using a KNL system to write this post). My KNL system runs the current AVX2 version of RAxML just
> fine (quite fast, actually), but the more instructions that can be replaced with "real" AVX512, the less the "legacy"
> performance penalty.

In fact, I also used native mode on KNC, since offloading was just way too inefficient in our use case. Anyway, we
recently got access to a KNL test system, and today I re-compiled my old KNC code (ExaML) there. It went surprisingly
smoothly: I just had to change the Makefile and substitute one single intrinsic which isn't there anymore! Surely, it
could be further optimized for KNL, but even a first quick benchmark shows why re-compiling AVX/AVX2 code on KNL is not
a good idea:

OMP_NUM_THREADS=1

examl-AVX:    Overall accumulated Time (in case of restarts): 144.378619
examl-AVX512: Overall accumulated Time (in case of restarts): 53.270765

OMP_NUM_THREADS=4

examl-AVX:    Overall accumulated Time (in case of restarts): 44.922629
examl-AVX512: Overall accumulated Time (in case of restarts): 25.614163

If you (or anybody else on the group) want to try it out yourself, here is the code:

https://github.com/amkozlov/ExaML

(just use Makefile.KNL.icc to compile the KNL binary).
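For reference, the build might look like the following sketch. The repository URL, the Makefile name, and the examl-KNL binary name come from this thread; the examl/ source subdirectory and the availability of the Intel compilers on the PATH are assumptions:

```shell
# Sketch: building the KNL binary of ExaML with the Intel toolchain.
# Assumes icc and an MPI compiler wrapper are installed; the examl/
# subdirectory is an assumption about the repository layout.
git clone https://github.com/amkozlov/ExaML
cd ExaML/examl
make -f Makefile.KNL.icc    # should produce the examl-KNL binary
```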

Best,
Alexey



Gabe Al-Ghalith

Apr 27, 2017, 1:00:07 PM
to ra...@googlegroups.com
Beautiful. This is fantastic news. Let me know if you need access to another (shared) KNL node (Phi 7250). 

I'm currently running your provided binary of raxml-ng on the KNL node (no recompiling). It's fast but clearly crippled by the legacy VEX instructions. AVX512 and KNC-NI share a bunch of instructions, so it's not entirely surprising some native KNC code compiles decently for KNL -- how different is the structure of the ExaML ML code compared to the libpll stuff? Copy-paste-able? ;-)

Thanks a kiloton,
Gabe



Alexandros Stamatakis

Apr 28, 2017, 6:19:50 AM
to ra...@googlegroups.com
Alexey,

Cool results :-)

> Beautiful. This is fantastic news. Let me know if you need access to
> another (shared) KNL node (Phi 7250).
>
> I'm currently running your provided binary of raxml-ng on the knl node
> (no recompiling). It's fast but clearly crippled by the legacy vex
> instructions. AVX512 and KNC-NI share a bunch of instructions, so it's
> not entirely surprising some native KNC code compiles decently for KNL
> -- how different is the ExaML ML code structured compared to the libpll
> stuff? Copy-paste-able? ;-)

It's pretty different; you can have a look at the respective GitHub repos:

https://github.com/amkozlov/ExaML

versus:

https://github.com/xflouris/libpll

Alexis


Alexey Kozlov

Apr 28, 2017, 8:22:13 AM
to ra...@googlegroups.com
> Beautiful. This is fantastic news. Let me know if you need access to another (shared) KNL node (Phi 7250).

Thank you, we have a 7210 and it's fine for development, but I might ask you to run some speed tests later on...

> I'm currently running your provided binary of raxml-ng on the knl node (no recompiling). It's fast but clearly crippled
> by the legacy vex instructions. AVX512 and KNC-NI share a bunch of instructions, so it's not entirely surprising some
> native KNC code compiles decently for KNL --

One more thing I just discovered is that KNL uses the (low-bandwidth) DDR memory by default, i.e. in order to fully
emulate the native mode of KNC and place the whole program in MCDRAM, one has to use numactl:

numactl --membind=1 ../examl/examl-KNL

(I think you're aware of this, but just in case: on a large dataset, I observed a 5x runtime improvement after switching
to MCDRAM!)
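Since NUMA node numbering can vary between systems, it may be worth confirming which node is the MCDRAM before binding; a sketch (the node number follows the example above and may differ on other machines):

```shell
# In flat mode the MCDRAM typically shows up as a NUMA node with ~16 GB
# of memory and no CPUs attached; list the nodes to confirm:
numactl --hardware

# Then bind all allocations to that node (node 1 in this example) and run:
numactl --membind=1 ../examl/examl-KNL
```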

>how different is the ExaML ML code structured compared to the libpll stuff?
> Copy-paste-able? ;-)

It's not that easy, mainly because vectorization in the ExaML kernels was tailored for GAMMA with 4 categories, whereas
libpll (and raxml-ng) are much more flexible (GAMMA/FreeRate with an arbitrary number of categories). But as I said, KNL
kernels are on our todo list.

Best,
Alexey


Gabe Al-Ghalith

Apr 28, 2017, 2:24:12 PM
to ra...@googlegroups.com
Thanks, sure, I'd be happy to test. 

I have set my rig's MCDRAM in cache configuration; yours appears to be configured in flat mode (where the MCDRAM is exposed as a separate NUMA node). I find that cache mode is almost as good as binding directly to the MCDRAM and requires no intervention or extra commands.

Another alternative is to use memkind/hbwmalloc, where you can programmatically choose which data goes into which memory. It's transparent and works on other systems (it just doesn't do anything extra if those systems don't have HBM of some sort).

Looks like "ng" truly is next generation in more ways than speed and user-friendliness! What would be the most equivalent ExaML command to NG's
./raxml-ng --msa testDNA.fa --model GTR+G --prefix MyTree --threads 32
(Because that would be awesome; what drove me to try NG was how incredibly easy it was to download and run!) I ask because there appear to be some subtleties that differ in the default options for both programs and I'd ideally like to compare the runtimes/outputs. 

(something like the 3-step: ./parse-examl -s ../testData/49 -m DNA -n binaryAlignment && raxmlHPC-AVX -y -d -m GTRCAT -p 71264 -s ../testData/49 -n RandomStartingTree && OMP_NUM_THREADS=32 ./examl-AVX -t RandomStartingTree -m GAMMA -s binaryAlignment -n T1)

Thanks!
Gabe



Alexey Kozlov

Apr 28, 2017, 4:21:39 PM
to ra...@googlegroups.com
Hi Gabe,

> I have set my rig's MCDRAM in cache configuration; yours appears to be configured in flat mode (where the MCDRAM is
> exposed as a separate NUMA module). I find that cache mode is almost as good as binding directly to the MCDRAM and
> requires no intervention or extra commands.

That's good news!

> Another alternative is to use memkind/hbwmalloc, where you can programatically choose which data goes to which memory.
> It's transparent and works on other systems (just doesn't do anything extra if those systems don't have HBM of some sort).

Yes, but this will require code modifications and also some thinking about which parts to place in HBM. And since in
raxml we don't have any obvious memory access patterns, I doubt this will work better than cache mode (at least without
investing much time in profiling and re-design).

> Looks like "ng" truly is next generation in more ways than speed and user-friendliness! What would be the most
> equivalent ExaML command to NG's
>
> ./raxml-ng --msa testDNA.fa --model GTR+G --prefix MyTree --threads 32
>
> (Because that would be awesome; what drove me to try NG was how incredibly easy it was to download and run!)

Thanks, this is exactly what I wanted raxml-ng to be :)

> I ask
> because there appear to be some subtleties that differ in the default options for both programs and I'd ideally like to
> compare the runtimes/outputs.

> (something like the 3-step: ./parse-examl -s ../testData/49 -m DNA -n binaryAlignment && raxmlHPC-AVX -y -d
> -m GTRCAT -p 71264 -s ../testData/49 -n RandomStartingTree && OMP_NUM_THREADS=32 ./examl-AVX -t RandomStartingTree
> -m GAMMA -s binaryAlignment -n T1 )

You're right, defaults are slightly different, so for the sake of comparison, I'd do the following:

1. In RAxML-NG, "GTR+G" means GTR+G+FO (ML estimate of base frequencies), which is equivalent to GTRGAMMAX / DNAX in
RAxML/ExaML. So you could either change it to "GTR+G+F" (empirical frequencies), or create a partition file containing
"DNAX, p1 = <start>-<end>" and then run:

./parse-examl -s ../testData/49 -m DNA -n binaryAlignment -q myPartitionFile.txt

(see ExaML manual for details)
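For instance, assuming a hypothetical alignment of 650 sites (substitute your own length), the partition file could be generated like this:

```shell
# Write a one-partition file for parse-examl.
# 650 is a hypothetical alignment length -- replace it with yours.
# ExaML counts sites starting at 1, so a full-length partition is 1-650.
printf 'DNAX, p1 = 1-650\n' > myPartitionFile.txt
cat myPartitionFile.txt
```

DNAX here selects ML-estimated base frequencies, matching raxml-ng's default GTR+G+FO.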

2. To make sure that both programs are given the same starting tree, please use raxml-ng starting tree
(MyTree.raxml.startTree) for ExaML

3. "examl-AVX" has MPI parallelization only, so you must run it as

mpirun -n 32 ./examl-AVX -t MyTree.raxml.startTree -m GAMMA -s binaryAlignment.binary -n T1

or alternatively compile the hybrid MPI/OpenMP version using Makefile.OMP.AVX.gcc, and then run

OMP_NUM_THREADS=32 ./examl-OMP-AVX -t MyTree.raxml.startTree -m GAMMA -s binaryAlignment.binary -n T1

Finally, examl-KNL is also hybrid and can be started without MPI.


Hope this helps,
Alexey


> On Fri, Apr 28, 2017 at 7:22 AM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> Beautiful. This is fantastic news. Let me know if you need access to another (shared) KNL node (Phi 7250).
>
>
> Thank you, we have 7210 and it's fine for development, but I might ask you to run some speed tests later on...
>
>     I'm currently running your provided binary of raxml-ng on the KNL node (no recompiling). It's fast but clearly
>     crippled by the legacy VEX instructions. AVX512 and KNC-NI share a bunch of instructions, so it's not entirely
>     surprising some native KNC code compiles decently for KNL --
>
>
> One more thing I just discovered is that KNL uses (low-bandwidth) DDR memory by default, i.e. in order to fully
> emulate the native mode of KNC and place the whole program in MCDRAM, one has to use numactl:
>
> numactl --membind=1 ../examl/examl-KNL
>
> (I think you're aware of this, but just in case - on a large dataset, I observed 5x runtime improvement after
> switching to MCDRAM!)
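A quick way to double-check which NUMA node is the MCDRAM before binding (a sketch: node 1 is common for flat-mode KNL but not guaranteed, and the examl-KNL path is the one from the command above):

```shell
# In flat mode, the MCDRAM shows up as a CPU-less NUMA node (often node 1);
# inspect the topology first rather than assuming the node number:
command -v numactl >/dev/null && numactl --hardware
# (the ~16 GB node with no CPUs listed is the MCDRAM)

# Then bind the whole run to it, as in the command above.
# RUN=echo keeps this a dry run; set RUN="" to actually launch.
RUN=echo
$RUN numactl --membind=1 ../examl/examl-KNL
```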
>
> how different is the ExaML ML code structured compared to the libpll stuff?
> Copy-paste-able? ;-)
>
>
> It's not that easy, mainly because vectorization in ExaML kernels was tailored for GAMMA with 4 categories, whereas
> libpll (and raxml-ng) are much more flexible (GAMMA/FreeRate with arbitrary number of categories). But as I said,
> KNL kernels are on our todo list.
>
> Best,
> Alexey
>
> On Thu, Apr 27, 2017 at 11:48 AM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> Hi Gabe,
>
>         Having done some AVX512 work myself, I can promise it's far easier (at least for my case) to port from
>         AVX2 up to AVX512 than to rework old offloadable KNC code. Fundamentally, KNL is a normal processor
>         compatible with all existing instructions (and I'm using a KNL system to write this post). My KNL system
>         runs the current AVX2 version of RAxML just fine (quite fast, actually), but the more instructions that
>         can be replaced with "real" AVX512, the less the "legacy" performance penalty.
>
>
> In fact, I also used native mode on KNC, since offloading was just way too inefficient in our use case.
> Anyway, we
> recently got access to a KNL test system, and today I re-compiled my old KNC code (ExaML) there. It went
> surprisingly smoothly: I just had to change the Makefile and substitute one single intrinsic which isn't there
> anymore! Surely, it could be further optimized for KNL, but even first quick benchmark shows why re-compiling
> AVX/AVX2 code on KNL is not a good idea:
>
>     OMP_NUM_THREADS=1
>
>     examl-AVX:    Overall accumulated Time (in case of restarts): 144.378619
>     examl-AVX512: Overall accumulated Time (in case of restarts): 53.270765
>
>     OMP_NUM_THREADS=4
>
>     examl-AVX:    Overall accumulated Time (in case of restarts): 44.922629
>     examl-AVX512: Overall accumulated Time (in case of restarts): 25.614163
>
> If you (or anybody else on the group) want to try it out yourself, here is the code:
>
> https://github.com/amkozlov/ExaML

Gabe Al-Ghalith

unread,
Apr 30, 2017, 7:18:44 PM
to ra...@googlegroups.com
Hello again!

A few comments so far:
- Got it to compile (I had to modify the beginning of the Makefile to append "-cc=icc" to the CC line so it would use the Intel compiler, 2017.2)
- Got the parser to eventually accept the alignment file (apparently it needs the PHYLIP version, not the fna I typically use, and it is much more restrictive than NG in that it disallows many characters such as '[' and ',', while NG didn't complain). I modified the NG starting tree so that the names were valid for ExaML, as well as the PHYLIP file produced by NG, so they matched up properly.
- I got it running with the "DNAX, p1 = <start>-<end>" trick (although it didn't recognize the placeholders "<start>" and "<end>", which I substituted with the start and end positions of my alignment minus 1; it starts counting positions at 0, correct?)
- Without specifying any OMP_NUM_THREADS environment variable, it fires up 272 threads (without mpirun) and starts churning, but it has been hanging (?) at "Memory Saving Option: DISABLED" for about an hour with no output files or status messages since. Apparently the alignment has "7793 distinct alignment patterns" and originally contained 5398 sequences (about 100 are duplicates by design, as these DNA fragments are fungal ITS regions, some of which are expected to be the same for extremely closely related species). Could this be the problem, or does ExaML just like to churn for a while before reporting any status?
- It seems the parameters are correct: "GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter" (although it seems heterogeneity is misspelled? nitpick, sorry)

Cheers,
Gabe




Gabe Al-Ghalith

unread,
Apr 30, 2017, 7:43:17 PM
to ra...@googlegroups.com
I spoke too soon; files have appeared!
1067.153153 -2099104.613066
1186.894734 -1833461.732698
1515.283911 -1609489.063102
(The first value looks like the time, and second like the current logLk at that step.)

Another thing I've noticed is that the radius seemed to stop at 25 by default on the NG build, even though logLk's were still improving substantially between 20 and 25 -- should this be tweaked by the user? For background, making trees out of fungal ITS regions is notoriously difficult, but arguably necessary given the substantial number of fungi that only have this small region sequenced. For further background, the quality of the MSA seems to matter quite a bit for the logLk scores. Making trees out of hastier MSAs seems to produce trees with a lower logLk. In your experience, how much does the MSA really matter for tree quality?

Alexey Kozlov

unread,
May 1, 2017, 10:21:49 AM
to ra...@googlegroups.com
Hi Gabe,

thanks again for testing & reporting back! Please see my answers below:

> A few comments so far:
> - Got it to compile (I had to modify the beginning of the Makefile to append "-cc=icc" to the CC line so it would
> use the intel compiler, 2017.2)

OK.

> - Got the parser eventually accept the alignment file (apparently it needs the phyllip version, not the fna I
> typically use, and it is much more restrictive than NG in that it disallows many characters such as '[' and ','
> while NG didn't complain). I modified the NG starting tree so that the names were valid for ExaML, as well as the
> phyllip file produced by NG so they matched up properly.

Well, it's just because input validation in NG is not as sophisticated yet :) I'd avoid using '[' and ',' in taxa names,
since those symbols have special meaning in Newick tree format, and thus can lead to problems at least with some
viewers/programs.

> - I got it running with the "DNAX, p1 = <start>-<end>" trick (although it didn't recognize the wildcard "<start>"
> and "<end>" which I substituted with the start and end position of my alignments, minus 1 (it starts counting
> positions at 0, correct?)

Sorry if it wasn't clear, but you were supposed to substitute "<start>-<end>" with the actual start/end values, and
counting starts from 1 (please see p.5 of the ExaML manual here: https://github.com/amkozlov/ExaML/blob/master/manual/ExaML.pdf)

> - Without specifying any OMP_NUM_THREADS environmental variable, it fires up 272 threads (without mpirun) and starts
> churning, but it has been hanging (?) at "Memory Saving Option: DISABLED" for about an hour with no output files or
> status messages since. Apparently the alignment has "7793 distinct alignment patterns" and originally contained 5398
> sequences (about 100 are duplicates by design, as these DNA fragments are fungal ITS regions, some of which are
> expected to be the same for extremely closely related species). Could this be the problem, or does ExaML just like
> to churn for awhile before reporting any status?

duplicates are not a problem. However, running a single-gene alignment with 272 threads IS problematic, since you need
at least ~1000 patterns/thread to parallelize efficiently, and if you assign <100 (as in your case), then it will most
probably lead to a slowdown (again, please read the section "How many cores shall I use?" in the ExaML manual). Moreover, in
my experience hyper-threading was never efficient with RAxML/ExaML, and in fact using those additional "fake" cores
can result in a slowdown, since you assign fewer patterns/core and the sync/communication overhead grows. So on your
KNL card, I'd use <=136 threads, even with larger alignments.
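The patterns-per-thread rule above can be turned into a quick back-of-envelope check (a sketch: the pattern count is the one from the log quoted above, and the ~1000 threshold is the manual's guidance, not a hard limit):

```shell
# Patterns-per-thread sanity check: aim for roughly >= 1000.
PATTERNS=7793                    # "distinct alignment patterns" from the log
for THREADS in 272 136 68 8; do
  echo "$THREADS threads -> $(( PATTERNS / THREADS )) patterns/thread"
done
# 272 threads leaves only ~28 patterns/thread -- far below the ~1000 target.
```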

> - It seems the parameters are correct: "GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter" (although
> it seems heterogeneity is misspelled? nitpick, sorry)

yes, thanks :)

> I spoke too soon; files have appeared!
> 1067.153153 -2099104.613066
> 1186.894734 -1833461.732698
> 1515.283911 -1609489.063102
> (The first value looks like the time, and second like the current logLk at that step.)

Exactly.

> Another thing I've noticed is that the radius seemed to stop at 25 by default on the NG build, even though LogLk's were
> decreasing substantially between 20 and 25 -- should this be tweaked by the user?

You can specify a user-defined radius with "-i" (ExaML) or "--spr-radius" (NG), respectively. If you set an even larger
value, each fast SPR round will take longer, but it is really hard to tell whether it will improve the final tree LH
and/or the overall runtime.

> For background, making trees out of
> fungal ITS regions is notoriously difficult, but arguably necessary given the substantial number of fungi that only have
> this small region sequenced. For further background, quality of MSA seems to matter quite a bit for the logLk scores.
> Making trees out of hastier MSA's seems to produce trees with a lower logLk. In your experience, how much does the MSA
> really matter for tree quality?

Well, intuitively it should, since we rely on homology to compute likelihoods, so the general "garbage in, garbage out"
rule works here as well.

Best,
Alexey




Gabe Al-Ghalith

unread,
May 1, 2017, 1:20:17 PM
to ra...@googlegroups.com
Thanks for the advice. On the single-CPU KNL desktop (even with the deleterious 4-way multithreading), it ran much more quickly than on the quad-Xeon 32-core Ivy Bridge server (with 32 threads, no HT), but reached a worse log likelihood from the same starting tree:

There's a surprising amount of insight in the time-ML logs, it seems. We can see how the two algorithms push the hilltops and expand their searches (although the radius for ExaML is not printed).

Below you can see NG reaching a comfortable divot in about 16 hours (then I killed it):
Starting ML tree search with 1 distinct starting trees

[00:02:09 -2458041.743206] Initial branch length optimization
[00:02:44 -2175452.274031] Model parameter optimization (eps = 10.000000)
[00:06:20 -2101648.301490] AUTODETECT spr round 1 (radius: 5)
[00:39:32 -1823552.755028] AUTODETECT spr round 2 (radius: 10)
[01:26:17 -1451834.252208] AUTODETECT spr round 3 (radius: 15)
[02:15:33 -1145909.795360] AUTODETECT spr round 4 (radius: 20)
[03:11:03 -1028197.793256] AUTODETECT spr round 5 (radius: 25)
[04:14:14 -984117.593924] SPR radius for FAST iterations: 25 (autodetect)
[04:14:22 -984117.593924] Model parameter optimization (eps = 3.000000)
[04:18:59 -979951.696924] FAST spr round 1 (radius: 25)
[06:27:45 -864608.615837] FAST spr round 2 (radius: 25)
[08:07:25 -856546.307481] FAST spr round 3 (radius: 25)
[09:20:12 -855263.950331] FAST spr round 4 (radius: 25)
[10:14:46 -855127.499512] FAST spr round 5 (radius: 25)
[11:05:22 -855080.070291] FAST spr round 6 (radius: 25)
[11:45:14 -855074.392028] FAST spr round 7 (radius: 25)
[12:23:54 -855068.309667] FAST spr round 8 (radius: 25)
[13:00:44 -855065.926148] FAST spr round 9 (radius: 25)
[13:37:10 -855065.925611] Model parameter optimization (eps = 1.000000)
[13:38:47 -855014.438872] SLOW spr round 1 (radius: 5)
[14:19:14 -854746.179193] SLOW spr round 2 (radius: 5)
[15:00:05 -854700.461918] SLOW spr round 3 (radius: 5)
[15:39:22 -854693.769126] SLOW spr round 4 (radius: 5)
[16:18:22 -854693.768524] SLOW spr round 5 (radius: 10)
 
And here you can see the valiant travails of ExaML, converging and finishing after 14 hours but more than 100 logLk units shy of NG:

1067.153153 -2099104.613066
1186.894734 -1833461.732698
1515.283911 -1609489.063102
2293.488008 -1428440.718219
3748.889265 -1317854.284424
6229.704409 -1239284.629117
6713.366999 -1230797.982680
9753.344549 -931286.567399
11637.858771 -865769.157542
12963.348831 -857329.547208
13734.702234 -856019.308206
14261.140395 -855877.405955
14696.852511 -855823.564272
15089.573205 -855814.177273
15443.771503 -855809.399231
15998.555624 -855462.773764
17027.979278 -854931.743235
17994.644583 -854835.746049
18927.482385 -854828.204205
19627.048645 -854828.204205
20651.235819 -854811.410789
21866.230051 -854801.653565
22905.527219 -854800.157731
23665.992272 -854800.157731
24494.689723 -854800.157731
27415.876431 -854799.068488
28491.048574 -854799.068488
29633.280122 -854799.068488
32062.795285 -854799.068488
39262.851686 -854798.328561
40380.791175 -854798.328561
41591.250137 -854798.328561
44015.472189 -854798.328561
51161.821973 -854798.328561

ExaML in its "info" file reports that it also discovered "Best rearrangement radius: 25".

From some other tests, it seems the KNL machine runs the stock NG build (up to AVX2) at less than half the speed of the Xeon server. Would it make sense from a performance-assessment standpoint for me to repeat the ExaML test with 68 OMP threads (1 thread per core), or 136 threads (2 threads per core)? Might it be wiser (given the plots in the manual) that I try 34 MPI ranks with 2-way/4-way OMP? This is all rather exciting.  
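The candidate layouts could be sketched as follows (a sketch only: the examl-KNL binary is the one from earlier in the thread, while the tree/alignment/run names are placeholders; RUN=echo keeps every line a dry run until set to ""):

```shell
# Candidate thread/rank layouts on a 68-core KNL for the hybrid examl-KNL binary.
# RUN=echo prints the launch lines instead of executing them; set RUN="" to run one.
RUN=echo

# 1 OpenMP thread per core:
$RUN env OMP_NUM_THREADS=68 ./examl-KNL -t start.tree -m GAMMA -s aln.binary -n T68

# 2 threads per core (hyper-threading):
$RUN env OMP_NUM_THREADS=136 ./examl-KNL -t start.tree -m GAMMA -s aln.binary -n T136

# 34 MPI ranks x 2 OpenMP threads each (34 * 2 = 68 hardware threads):
$RUN env OMP_NUM_THREADS=2 mpirun -n 34 ./examl-KNL -t start.tree -m GAMMA -s aln.binary -n T34x2
```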

                        To unsubscribe from this group and all its topics, send an email to
            raxml+unsubscribe@googlegroups.com <mailto:raxml%2Bunsubscribe@googlegroups.com>
                    <mailto:raxml%2Bunsubscribe@googlegroups.com <mailto:raxml%252Bunsubscribe@googlegroups.com>>
                        <mailto:raxml%2Bunsubscribe@googlegroups.com <mailto:raxml%252Bunsubscribe@googlegroups.com>

                        For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>
            <https://groups.google.com/d/optout <https://groups.google.com/d/optout>>
                    <https://groups.google.com/d/optout <https://groups.google.com/d/optout>
            <https://groups.google.com/d/optout <https://groups.google.com/d/optout>>>.


                    --
                    You received this message because you are subscribed to the Google Groups "raxml" group.
                    To unsubscribe from this group and stop receiving emails from it, send an email to
                    raxml+unsubscribe@googlegroups.com <mailto:raxml%2Bunsubscribe@googlegroups.com>
            <mailto:raxml%2Bunsubscribe@googlegroups.com <mailto:raxml%252Bunsubscribe@googlegroups.com>>
                    <mailto:raxml+unsubscribe@googlegroups.com <mailto:raxml%2Bunsubscribe@googlegroups.com>
            <mailto:raxml%2Bunsubscribe@googlegroups.com <mailto:raxml%252Bunsubscribe@googlegroups.com>>>.

                    For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>
            <https://groups.google.com/d/optout <https://groups.google.com/d/optout>>.


                --
                You received this message because you are subscribed to a topic in the Google Groups "raxml" group.
                To unsubscribe from this topic, visit https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe
            <https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe>
                <https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe
            <https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe>>.
                To unsubscribe from this group and all its topics, send an email to raxml+unsubscribe@googlegroups.com
            <mailto:raxml%2Bunsubscribe@googlegroups.com>
                <mailto:raxml%2Bunsubscribe@googlegroups.com <mailto:raxml%252Bunsubscribe@googlegroups.com>>.

                For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>
            <https://groups.google.com/d/optout <https://groups.google.com/d/optout>>.


            --
            You received this message because you are subscribed to the Google Groups "raxml" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to

            For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.


        --
        You received this message because you are subscribed to a topic in the Google Groups "raxml" group.
        To unsubscribe from this topic, visit https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe
        <https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe>.
        To unsubscribe from this group and all its topics, send an email to raxml+unsubscribe@googlegroups.com
        <mailto:raxml%2Bunsubscribe@googlegroups.com>.

        For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.



--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+unsubscribe@googlegroups.com
<mailto:raxml+unsubscribe@googlegroups.com>.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the Google Groups "raxml" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/raxml/jDP-1VB_ILk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to raxml+unsubscribe@googlegroups.com.

Alexey Kozlov

unread,
May 2, 2017, 5:43:43 AM5/2/17
to ra...@googlegroups.com
Hi Gabe,

> Thanks for the advice. On the single-cpu KNL desktop (even with the deleterious 4-way multithreading), it ran much more
> quickly than on the quad-xeon 32-core Ivy Bridge server (with 32 threads, no HT), but reached a worse log likelihood
> from the same starting tree:

i.e. ExaML-KNL and ExaML-AVX converged to different trees? unfortunately, this is possible due to round-off errors and
differences in operation order between the AVX/AVX512 kernels and/or runs with different numbers of threads. datasets like
yours (many taxa, few patterns, identical seqs) are especially susceptible, since they usually have a "rough likelihood
surface", i.e. many distinct topologies with only slight differences in likelihood scores.
Please note that the tree search hasn't necessarily converged yet; there might be further likelihood improvement with a
larger SPR radius.

> From some other tests, it seems the KNL machine runs the stock NG build (up to AVX2) at less than half the speed of the
> Xeon server. Would it make sense from a performance-assessment standpoint for me to repeat the ExaML test with 68 OMP
> threads (1 thread per core), or 136 threads (2 threads per core)? Might it be wiser (given the plots in the manual) that
> I try 34 MPI ranks with 2-way/4-way OMP? This is all rather exciting.

Yes, it'd be interesting to see the results for 68 threads, my expectation is that it will run faster. Regarding OpenMP
vs. MPI: in KNC, MPI communication was rather inefficient (both intra- and inter-node), that's why I implemented the
hybrid parallelization. In KNL, however, this issue seems to be fixed: in my preliminary tests, I didn't see any
significant difference between runs with all-OMP, all-MPI, and hybrid.

Best,
Alexey

>
> On Mon, May 1, 2017 at 9:21 AM, Alexey Kozlov <alexei...@gmail.com <mailto:alexei...@gmail.com>> wrote:
>
> Hi Gabe,
>
> thanks again for testing & reporting back! Please see my answers below:
>
> > A few comments so far:
> > - Got it to compile (I had to modify the beginning of the Makefile to append "-cc=icc" to the CC line so it would
> > use the intel compiler, 2017.2)
>
> OK.
>
> > - Got the parser eventually accept the alignment file (apparently it needs the phyllip version, not the fna I
> > typically use, and it is much more restrictive than NG in that it disallows many characters such as '[' and ','
> > while NG didn't complain). I modified the NG starting tree so that the names were valid for ExaML, as well as the
> > phyllip file produced by NG so they matched up properly.
>
> Well, it's just because input validation in NG is not as sophisticated yet :) I'd avoid using '[' and ',' in taxa
> names, since those symbols have special meaning in Newick tree format, and thus can lead to problems at least with
> some viewers/programs.
>
> > - I got it running with the "DNAX, p1 = <start>-<end>" trick (although it didn't recognize the wildcard "<start>"
> > and "<end>" which I substituted with the start and end position of my alignments, minus 1 (it starts counting
> > positions at 0, correct?)
>
> Sorry if it wasn't clear, but you were supposed to substitute "<start>-<end>" with actual start/end values, and
> counting start from 1 (please see p.5 of ExaML manual here:
> https://github.com/amkozlov/ExaML/blob/master/manual/ExaML.pdf
> <https://github.com/amkozlov/ExaML/blob/master/manual/ExaML.pdf>)

Gabe Al-Ghalith

unread,
May 2, 2017, 3:02:25 PM5/2/17
to ra...@googlegroups.com
Apologies for being unclear. I'm still comparing apples to oranges here to figure out what the best option is for my trees at the moment. My objective was to answer this: is ExaML-AVX512 on KNL comparable in speed and tree quality to RAxML-NG (AVX autodetected) on a quad-Xeon, with comparable settings and the same starting tree? So there are two different programs running on two different systems, which are not directly comparable in runtime (but should be comparable in tree quality, I'd imagine).

The answer appeared to be that ExaML (with the crazy high thread count) is much faster on the KNL than RAxML-NG on the quad-Xeon (for whatever that's worth), but converged to a (substantially?) worse tree.

Unless the two algorithms are truly identical between the two programs (rapid hill climbing, ML code, convergence criteria), it might be more than rounding error at play. If so, is it something I can address with different program parameters?

Cheers,
Gabe


Alexey Kozlov

unread,
May 2, 2017, 4:13:30 PM5/2/17
to ra...@googlegroups.com
OK, now it makes sense. Search algorithms in ExaML and NG are very similar, but not identical, and in my tests NG
oftentimes attained better likelihoods than RAxML/ExaML.

More importantly, if you go beyond performance testing and want to actually use the results for something real, you
should run multiple tree searches with distinct starting trees (with either software), and then pick the best-scoring
one. Especially with this dataset, there will be a lot of local optima... The good news is that you can run multiple
tree searches in parallel, by starting multiple ExaML instances - as long as you have enough memory. And since on your
dataset this coarse-grained parallelization will be much more efficient than distributing alignment sites across cores,
I'd use it as much as possible.
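[Editor's sketch: one hedged way to launch such independent searches on a single multi-core machine, shown with raxml-ng (whose flags appear elsewhere in this thread); alignment name, seeds, prefixes, and per-process thread count are placeholders, not values from this discussion:]

```shell
# Hypothetical sketch: 5 independent tree searches as separate processes.
# Each gets its own random seed, output prefix, and a share of the cores.
for i in 1 2 3 4 5; do
    ./raxml-ng --msa alignment.phy --model GTR+G \
               --tree rand{1} --seed $i \
               --prefix search_$i --threads 6 &
done
wait   # block until all searches finish
# Afterwards, compare the final log-likelihoods and keep the best tree.
```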

On 02.05.2017 21:01, Gabe Al-Ghalith wrote:
> Apologies for being unclear. I'm still comparing apples to oranges here to figure out what the best option is for my
> trees at the moment. My objective was to answer this: Is ExaML-AVX512 on KNL comparable in speed and tree quality to
> RaXML-ng(AVX autodetected) on a quad-Xeon, with comparable settings and same starting tree? So there are two different
> softwares running on two different systems here, so not directly comparable in runtime (but should be comparable in tree
> quality, I'd imagine).
>
> The answer appeared to be that ExaML (with the crazy high thread count) is much faster on the KNL than RaXML-ng on the
> quad-Xeon (for whatever that's worth), but converged to a (substantially?) worse tree.
>
> Unless the two algorithms are truly identical between the two programs (rapid hill climbing, ML code, convergence
> criteria), it might be more than rounding error at play. If so, is it something I can address with different program
> parameters?
>
> Cheers,
> Gabe

Gabe Al-Ghalith

unread,
May 2, 2017, 5:33:38 PM5/2/17
to ra...@googlegroups.com
Thanks for the clear advice. The manual for RaXML indeed mentions this, and there seem to be options for producing different "types" of starting trees (parsimony trees, which apparently may not be distinct sometimes and must be double-checked; and "random" trees). In my case, would you lean toward a specific recommendation (i.e. 10 starting trees, half random and half parsimony), or would my particularly "rough likelihood surface" be better suited by other combinations? 

For running with multiple starting trees, should I treat each tree-finding process as totally independent and run separate instances that don't communicate with each other? I don't know if there's a mode (like with mpi or something) where the program is aware of the other trees being run in parallel (and attempting to hash common subtrees among the various parallel trees and reduce computation time by not re-exploring identical regions among trees). 



Alexey Kozlov

unread,
May 2, 2017, 6:34:51 PM5/2/17
to ra...@googlegroups.com

> Thanks for the clear advice. The manual for RaXML indeed mentions this, and there seem to be options for producing
> different "types" of starting trees (parsimony trees, which apparently may not be distinct sometimes and must be
> double-checked; and "random" trees). In my case, would you lean toward a specific recommendation (i.e. 10 starting
> trees, half random and half parsimony), or would my particularly "rough likelihood surface" be better suited by other
> combinations?

yes, i'd recommend using both random and parsimony, maybe 5/5 for starters; then check how different the likelihoods and
topologies are, and run more if needed. on "good" alignments (# taxa << # sites) this process would quickly
converge (i.e. no further likelihood improvement after adding more starting trees), but those huge single-gene
datasets are always problematic... so in the worst case you might have to stop the process despite the lack of convergence,
and either pick the best tree or build a consensus from multiple trees (e.g. using CONSEL to find the subset of equally good
trees). I don't see a clean solution here, but maybe Alexis (or somebody in the group) can give better advice based on
his experience.
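[Editor's sketch: such a 5/5 mix can be requested in a single raxml-ng invocation using the `--tree pars{N},rand{N}` syntax; the file names and thread count below are placeholders:]

```shell
# Hypothetical example: 5 parsimony + 5 random starting trees in one run.
./raxml-ng --msa alignment.phy --model GTR+G \
           --tree pars{5},rand{5} \
           --prefix mixed_start --threads 32
```

Each of the ten searches is still independent; the program runs them one after another and reports the best-scoring tree at the end.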

> For running with multiple starting trees, should I treat each tree-finding process as totally independent and run
> separate instances that don't communicate with each other?

exactly, they are absolutely independent

> I don't know if there's a mode (like with mpi or something)
> where the program is aware of the other trees being run in parallel (and attempting to hash common subtrees among the
> various parallel trees and reduce computation time by not re-exploring identical regions among trees).

there is nothing like this implemented right now, but we discussed similar ideas, so it could be one of the future
directions...

Gabe Al-Ghalith

unread,
May 4, 2017, 12:51:38 PM5/4/17
to ra...@googlegroups.com
Thanks. I decided to start with the parsimony trees.  I've updated NG to 0.3 and run NG with the options: 
./raxml-ng --msa databases/FUNG_1000msa.msa --tree pars{5} --lh-epsilon 0.5 --spr-radius 30 --prefix NEW_E --threads 32 --model GTR+G

I get one output tree file (which is very big; perhaps the starting trees are concatenated together?), and the software on the Ivy Bridge server has been running for over 24 hours without a single line of output. No new files have been generated, but a ".ckp" file appears to have a recent timestamp. Have status updates been removed in 0.3? How can I extract the trees for use with other programs if they're concatenated? (Split on the ';'?)

Cheers,
Gabe

Alexey Kozlov

unread,
May 4, 2017, 1:13:53 PM5/4/17
to ra...@googlegroups.com
Hi Gabe,

> Thanks. I decided to start with the parsimony trees. I've updated NG to 0.3 and run NG with the options:
> ./raxml-ng --msa databases/FUNG_1000msa.msa --tree pars{5} --lh-epsilon 0.5 --spr-radius 30 --prefix NEW_E --threads 32
> --model GTR+G
>
> I get one output tree file (which is very big; perhaps the starting trees are concatenated together?),

do you mean the $PREFIX.raxml.startTree file? then yes, you have all starting trees in one file, one per line

>and the software
> on the Ivy Bridge server has been running for over 24 hours without a single line of output. No new files have been
> generated, but a ".ckp" file appears to have a recent timestamp. Have status updates been removed in 0.3?

No, it's just that if you use multiple starting trees or bootstrapping, then progress updates are off by default to
reduce console output. You can have them back by adding "--log progress" to the command line. Otherwise, you will see
updates after each ML search / bootstrap only.

>How can I
> extract the trees for use with other programs if they're concatenated? (Split on the ';'?)

either that, or just use head / tail to extract a specific tree.
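[Editor's sketch: since each tree is a single ';'-terminated line, splitting is easy to script. A small illustration in Python; file and prefix names are made up:]

```python
# Split a multi-tree Newick file (one tree per line, as in
# raxml-ng's .raxml.startTree output) into one file per tree.
def split_trees(path, out_prefix="tree"):
    """Write each ';'-terminated tree to its own file; return the count."""
    with open(path) as fh:
        text = fh.read()
    # Split on ';' and drop whitespace-only fragments.
    trees = [t.strip() for t in text.split(";") if t.strip()]
    for i, tree in enumerate(trees, start=1):
        with open(f"{out_prefix}_{i}.nwk", "w") as out:
            out.write(tree + ";\n")
    return len(trees)
```

The head/tail equivalent for pulling out, say, the second tree would be `sed -n '2p'` on the same file.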

Best,
Alexey


> On Tue, May 2, 2017 at 5:34 PM, Alexey Kozlov <alexei...@gmail.com <mailto:alexei...@gmail.com>> wrote:
>
>
> Thanks for the clear advice. The manual for RaXML indeed mentions this, and there seem to be options for producing
> different "types" of starting trees (parsimony trees, which apparently may not be distinct sometimes and must be
> double-checked; and "random" trees). In my case, would you lean toward a specific recommendation (i.e. 10 starting
> trees, half random and half parsimony), or would my particularly "rough likelihood surface" be better suited by
> other
> combinations?
>
>
> yes, i'd recommend using both random and parsimony, maybe 5/5 for starters and then check how different likelihoods
> and topologies are, and then run more if needed. on "good" alignments (# taxa << # sites) this process would quickly
> convergence (i.e. no further likelihood improvement after adding more starting trees), but those huge single-gene
> datasets are always problematic... so in the worst case you might have to stop the process despite lack of
> convergence, and pick the best tree, or build a consensus from multiple trees (e.g. using CONSEL to find the subset
> of equally good trees). I don't see a clean solution here, but maybe Alexis (or somebody on the group) can give a
> better advice based on his experience.
>
> For running with multiple starting trees, should I treat each tree-finding process as totally independent and run
> separate instances that don't communicate with each other?
>
>
> exactly, they are absolutely independent
>
> I don't know if there's a mode (like with mpi or something)
> where the program is aware of the other trees being run in parallel (and attempting to hash common subtrees
> among the
> various parallel trees and reduce computation time by not re-exploring identical regions among trees).
>
>
> there is nothing like this implemented right now, but we discussed similar ideas, so it could be one of the future
> directions...
>
Gabe Al-Ghalith

unread,
May 5, 2017, 11:02:36 PM5/5/17
to ra...@googlegroups.com
Sounds good. After about 2 days, NG has output a line saying it has finished with search #1 (and the resulting logLk). But I see no output tree -- any way to extract it from the ckp? I was hoping to compare it to the same tree produced by ExaML on the KNL node, or determine whether or not to run the remaining trees based on the first few logLk. 

Thanks!

Gabe Al-Ghalith

unread,
May 6, 2017, 12:48:10 PM5/6/17
to ra...@googlegroups.com
One more update - with 136 threads (controlled with OMP_NUM_THREADS), the runtimes for the first 3 updates on a parsimony starting tree are:
519.273238 -862987.745704
594.881304 -862977.528101
5459.730584 -855212.687798
And with 68:
432.740910 -862987.745704
496.194180 -862977.528101
4592.904922 -855210.375810
 So we are indeed able to see both phenomena -- different rounding errors (?) and speedup by limiting to the physical number of cores. 

However, I was unable to perform the multi-process parallelism, as there is no "threads" option in ExaML and setting OMP_NUM_THREADS does not work per process or per environment. That is, even if I set the variable locally to 34 in 2 separate screen sessions, the total CPU used by both ExaML processes is capped at 34 cores in total (roughly 17 each), despite each of them divvying up patterns into bins of 1/34. I doubt this is efficient at all. How can I force each ExaML process to use 34 threads of its own? 

Alexey Kozlov

unread,
May 6, 2017, 6:55:14 PM5/6/17
to ra...@googlegroups.com


> Sounds good. After about 2 days, NG has output a line saying it has finished with search #1 (and the resulting
> logLk). But I see no output tree -- any way to extract it from the ckp? I was hoping to compare it to the same tree
> produced by ExaML on the KNL node, or determine whether or not to run the remaining trees based on the first few
logLk.

all trees are written down at the very end - I agree it might be impractical for large datasets and I may change it later
on, but for now I'd suggest running multiple searches one-by-one (and in parallel)

> So we are indeed able to see both phenomena -- different rounding errors (?) and speedup by limiting to the physical
> number of cores.

OK, but I think that in your case #sites/core is the limiting factor, i.e. with larger alignments/more searches you can
efficiently use up to 136 cores. I should say that Intel made things really confusing with KNC/KNL, since here you have
e.g. 68 physical cores, each of those has 2 logical cores (but those are - unlike in Xeons - still implemented in HW!),
and on top of that you have classical "imaginary" HT cores - that's how you get a total of 68 x 2 x 2 = 272 "cores"
visible to the OS.
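Alexey's core arithmetic above (68 x 2 x 2 = 272) can be sanity-checked directly on the node with a generic POSIX command (nothing ExaML-specific):

```shell
# Count the logical CPUs the OS exposes; on the KNL node described above
# this should report 272 (68 x 2 x 2).
getconf _NPROCESSORS_ONLN
```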

> However, I was unable to perform the multi-process parallelism, as there is no "threads" option in ExaML and setting
> OMP_NUM_THREADS does not work per process or per environment. That is, even if I set the variable locally in 2 separate
> screen sessions to 34, the total CPU used by both ExaML processes is capped at 34 (roughly 17 each), despite each of
> them divvying up patterns into bins of 1/34. I doubt this is efficient at all. How can I force each ExaML process to
> each separately use 34 threads?

I guess you are mixing two things here - number of OpenMP threads and pinning of those threads to CPU cores:

- there is no problem at all to set OMP_NUM_THREADS individually per-session or even per-process (it's just a regular
environment variable). But actually it's not even necessary, since you want to run both ExaML instances with 34 threads
each, and that's exactly what's happening since

> each of them divvying up patterns into bins of 1/34.

- however, if I got you right, you observe that 68 threads are executed on 34 cores, which is of course not optimal.
you can control thread placement with KMP_AFFINITY or KMP_PLACE_THREADS variables, please see:

https://software.intel.com/en-us/articles/openmp-thread-affinity-control

Hope this helps,
Alexey

>
> On Fri, May 5, 2017 at 10:02 PM, Gabe Al-Ghalith <algh...@umn.edu> wrote:
>
>
> Thanks!
>
> On Thu, May 4, 2017 at 12:13 PM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> Hi Gabe,
>
> Thanks. I decided to start with the parsimony trees. I've updated NG to 0.3 and run NG with the options:
> ./raxml-ng --msa databases/FUNG_1000msa.msa --tree pars{5} --lh-epsilon 0.5 --spr-radius 30 --prefix NEW_E
> --threads 32
> --model GTR+G
>
> I get one output tree file (which is very big; perhaps the starting trees are concatenated together?),
>
> do you mean $PREFIX.raxml.startTree file? then yes, you have all starting trees in one file, one per line
>
> and the software
> on the Ivy Bridge server has been running for over 24 hours without a single line of output. No new files
> have been
> generated, but a ".ckp" file appears to have a recent timestamp. Have status updates been removed in 0.3?
>
>
> No, it's just that if you use multiple starting trees or bootstrapping, then progress updates are off by default
> to reduce console output. You can have them back by adding "--log progress" to the command line. Otherwise, you
> will see updates after each ML search / bootstrap only.
>
> How can I
> extract the trees for use with other programs if they're concatenated? (Split on the ';'?)
>
>
> either that, or just use head / tail to extract a specific tree.
>
> Best,
> Alexey
>
>
> On Tue, May 2, 2017 at 5:34 PM, Alexey Kozlov <alexei...@gmail.com> wrote:

Gabe Al-Ghalith

unread,
May 6, 2017, 7:53:09 PM5/6/17
to ra...@googlegroups.com
Thanks. I think the problem is more simply stated as: I give each instance of ExaML 34 cores like so:
screen
OMP_NUM_THREADS=34 ./exaML [params for tree 1]
[exit screen]
screen
OMP_NUM_THREADS=34 ./exaML [params for tree 2]

Both programs are running concurrently in their own environments, but the total CPU usage of the computer actually never exceeds 3400% in top (each ExaML process is hovering around 1700%), and the result is very slow. It does not seem to matter what affinity I set [scatter]. The OMP runtime seems to be globally throttling the number of cores to 34 in total, but each ExaML process thinks it's getting 34 individually (which isn't happening). This seems to be the worst of both worlds -- half the cores are unused, and patterns are spread out over twice the number of cores they are actually using. I am not used to controlling OMP settings and environment variables locally per process (most of the time, I see that programs exert control over their own core distribution, like raxml-ng and raxmlHPC, which have no issues with this, and as a user I just have to be careful not to oversubscribe available cores). 

The ultimate question here then is, can I run two instances of ExaML, each with 34 cores on the same machine? If so, how? Is there something obvious I'm missing?

Alexey Kozlov

unread,
May 6, 2017, 10:21:43 PM5/6/17
to ra...@googlegroups.com
OK, after some googling and trial&error I finally got it to work, here are the commands you need:

OMP_NUM_THREADS=34 OMP_PROC_BIND=true OMP_PLACES={0:34} ./examl-KNL [params for tree 1]

OMP_NUM_THREADS=34 OMP_PROC_BIND=true OMP_PLACES={34:34} ./examl-KNL [params for tree 2]

you can check thread pinning with:

ps -mo pid,tid,%cpu,psr -p `pgrep examl-KNL`

or just install htop and then you can see it nicely visualized

(it seems like Intel silently dropped "offset" parameter in KMP_PLACE_THREADS, that's why the older solution from the
link I posted doesn't work anymore)

On 07.05.2017 01:52, Gabe Al-Ghalith wrote:
> Thanks. I think the problem is more simply stated as: I give each instance of ExaML 34 cores like so:
> screen
> OMP_NUM_THREADS=34 ./exaML [params for tree 1]
> [exit screen]
> screen
> OMP_NUM_THREADS=34 ./exaML [params for tree 2]
>
> Both programs are running concurrently in their own environments, but the total CPU usage of the computer actually never
> exceeds 3400 in top (each ExaML process is hovering around 1700%), and the result is very slow. It does not seem to
> matter what affinity I set [scatter]. The OMP runtime seems to be globally throttling the number of cores to /34 in
> total/, but each ExaML process thinks its getting /34 individually/ (which isn't happening). This seems to be the worst
> of both worlds -- half the cores are unused, and patterns are spread out over twice the area they are actually using. I
> am not used to dealing with controlling OMP settings and environment variables locally per process (most of the time, I
> see that programs exert control over their own core distribution like raxml-ng and raxmlHPC which have no issues with
> this, and as a user I just have to be careful not to oversubscribe available cores).
>
> The ultimate question here then is, can I run two instances of ExaML, each with 34 cores on the same machine? If so,
> how? Is there something obvious I'm missing?

Alexey Kozlov

unread,
May 7, 2017, 11:32:49 AM5/7/17
to ra...@googlegroups.com
one more important thing: you have to make sure KMP_AFFINITY is not set (it was on my system), i.e. either run

unset KMP_AFFINITY

beforehand, or prepend each command with "KMP_AFFINITY= ", e.g.

KMP_AFFINITY= OMP_NUM_THREADS=34 OMP_PROC_BIND=true OMP_PLACES={0:34} ./examl-KNL [params for tree 1]
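The pattern above generalizes to any number of instances. A tiny helper sketch (hypothetical, not part of ExaML) that prints the pinning environment for a given instance, assuming contiguous logical core numbering as in the commands above:

```shell
# Print the env settings to pin instance $1 (0-based), each instance
# using $2 threads. Assumes logical cores are numbered contiguously,
# as in the commands above.
pin_env() {
  inst=$1; threads=$2
  start=$((inst * threads))
  printf 'KMP_AFFINITY= OMP_NUM_THREADS=%s OMP_PROC_BIND=true OMP_PLACES={%s:%s}\n' \
    "$threads" "$start" "$threads"
}

pin_env 0 34   # -> ... OMP_PLACES={0:34}
pin_env 1 34   # -> ... OMP_PLACES={34:34}
```

One would then prepend the printed variables to each examl-KNL invocation, exactly as in the two commands above.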

Gabe Al-Ghalith

unread,
May 8, 2017, 1:42:35 PM5/8/17
to ra...@googlegroups.com
Excellent, this works quite well. It also seems to work with other OpenMP programs, which is a fantastic piece of information that should be made more obvious by the OpenMP group, as I imagine this is a fairly typical situation: running several different OpenMP programs on one system, each with its own dedicated cores and thread counts. 

Indeed the strategy of running 2 instances of 34 threads was far more efficient than running one at 68. One process at 68 would take 470 seconds, but two at 34 each would take 590 seconds each (but running simultaneously, that's a big win). 
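For the record, the throughput gain implied by these numbers (two searches run one after the other at 68 threads, versus two 34-thread instances finishing side by side) works out to roughly 1.6x:

```shell
# Two searches one after another at 68 threads: 2 * 470 s.
# Two 34-thread instances finishing together:   590 s.
awk 'BEGIN { printf "%.2f\n", (2 * 470) / 590 }'   # prints 1.59
```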

Now that I think I have threading performance figured out, I'm trying to figure out the quality angle. ExaML, for whatever reason, no matter the starting tree seems to always (on 7 test trees thus far) produce final trees that are more than 100 LogLk units worse than NG despite running with "similar" parameters, same starting tree, spr radius of 30 for both, DNAX specified in the partitions file for ExaML, and so forth. But tonight I tried adding two additional options to ExaML: -f o -a (use "old" hill-climbing and "median" logLk). The final tree quality appears to be significantly better: 

Run with -f o -a (68 threads, not finished yet):
471.277203 -862320.069830
...
38452.073583 -852738.974174
   ...[more to come] 

Run without -f o -a (68 threads, completed):
428.704744 -863030.367867
...
59930.121990 -853361.230730

I realized that I did not control the random seed (but I thought that randomization only mattered in making the starting tree or with constraint files?). In all 10 runs I've seen, the lowest LogLk I've seen before this run was with NG, -853218. I think I should probably test whether it is the hill climbing or the median mode that makes the most difference, but thought this was an interesting result nonetheless. 

I should also mention the alignment I'm working from is fairly difficult and gappy: Alignment has 9027 distinct alignment patterns; Proportion of gaps and completely undetermined characters in this alignment: 95.60%. (The actual DNA used was, on average, about 500 bases long only. There were a couple of outliers at ~1500. Visually inspecting the alignment shows many columns are dominated by gaps.) I'm worried about the quality of phylogenetic signal and the "garbage in, garbage out" mantra you fittingly stated earlier, but so far the tree performs incredibly well in separating fungal communities by phylogenetic distance (UniFrac), recovering true biological variability in known samples that was impossible before the tree. I wonder if my excitement is evident from my posts?

Anyway, for clarification to others, it appears the benchmarks and performance assessments I'm running here are more relevant to noisy, small, taxa-rich, pattern-poor alignments with dubious phylogenetic signal. So basically, the marker sequence folks (bacterial 16S is also a good example). 

Alexey Kozlov

unread,
May 9, 2017, 8:53:43 AM5/9/17
to ra...@googlegroups.com
Hi Gabe,

> Excellent, this works quite well. It also seems to work with other OpenMP programs, which is a fantastic piece of
> information that should be made more obvious by the OpenMP group, as I imagine this is a fairly typical situation,
> wanting to run different OpenMP programs on a system with dedicated threads and thread counts.

Sometimes the cluster job submission system can take care of that, but in general yes, it's a tricky topic, and AFAIK it
was only recently added to the OpenMP standard; KMP_AFFINITY & co. used to be an Intel extension.

> Indeed the strategy of running 2 instances of 34 threads was far more efficient than running one at 68. One process at
> 68 would take 470 seconds, but two at 34 each would take 590 seconds each (but running simultaneously, that's a big win).

sounds awesome!

> Now that I think I have threading performance figured out, I'm trying to figure out the quality angle. ExaML, for
> whatever reason, no matter the starting tree seems to always (on 7 test trees thus far) produce final trees that are
> more than 100 LogLk units worse than NG despite running with "similar" parameters, same starting tree, spr radius of 30
> for both, DNAX specified in the partitions file for ExaML, and so forth.

Well, I'm happy it's not the other way around :) The NG algorithm has a couple of changes which could make it somewhat
more thorough. But again, on this dataset, even tiny differences in branch lengths/model parameters could make for huge
changes in the logLk score.

>But tonight I tried adding two additional
> options to ExaML: -f o -a (use "old" hill-climbing and "median" logLk). The final tree quality appears to be
> significantly better:

OK, "-f o" is a fair thing to try, but if you use median/"-a", then you get likelihood scores which are incomparable
with the ones without "-a". Of course, you can still get better trees with "-a", but in order to compare the logLks,
you'd need to evaluate all trees under the same model (either with or without "-a").

> I realized that I did not control the random seed (but I thought that randomization only mattered in making the starting
> tree or with constraint files?).

that's right

>In all 10 runs I've seen, the lowest LogLk I've seen before this run was with NG,
> -853218. I think I should probably test whether it is the hill climbing or the median mode that makes the most
> difference, but thought this was an interesting result nonetheless.

please see above

> I should also mention the alignment I'm working from is fairly difficult and gappy: Alignment has 9027 distinct
> alignment patterns; Proportion of gaps and completely undetermined characters in this alignment: 95.60%. (The actual DNA
> used was, on average, about 500 bases long only. There were a couple of outliers at ~1500. Visually inspecting the
> alignment shows many columns are dominated by gaps.) Although I'm worried about the quality of phylogenetic signal and
> the "garbage in, garbage out" mantra you fittingly stated earlier, but so far the tree performs incredibly well in
> separating fungal communities by phylogenetic distance (UniFrac), recovering true biological variability in known
> samples that was impossible before the tree. I wonder if my excitement is evident from my posts?

For sure it is :) As well as your perfect understanding of technical matters, so I really enjoy our discussion!

As for your analysis: do I see it correctly that you are building a tree of all environmental ITS sequences found in
your samples? I didn't know that it's possible to align ITS at higher taxonomic levels, but apparently you have a
solution for this. I worked with 16S alignments with similar properties (>90% gaps, few sites/lots of taxa), and they
had all the same issues. Still, we (and others) use those alignments, because for many taxa/samples 16S is all we
have... Using phylogenomic trees as a constraint could sometimes help, which reminds me I have to put this feature on my
TODO list for NG.

> Anyway, for clarification to others, it appears the benchmarks and performance assessments I'm running here are more
> relevant to noisy, small, taxa-rich, pattern-poor alignments with dubious phylogenetic signal. So basically, the marker
> sequence folks (bacterial 16S is also a good example).

I guess even 16S is not as bad, ITS is a really extreme case...

Best,
Alexey

Gabe Al-Ghalith

unread,
May 9, 2017, 5:47:25 PM5/9/17
to ra...@googlegroups.com
I turned off -f o and got the exact same LogLk in roughly 1/3 of the time. So it's definitely '-a' causing the difference in this case.
 
if you use median/"-a", then you get likelihood scores which are incomparable with the ones without "-a". Of course, you can still get better trees with "-a", but in order to compare the logLks, you'd need to evaluate all trees under the same model (either with or without "-a").
Got it. Since the final score under '-a' can't be directly compared to average mode, is there a way to invoke '-a' in NG (yet)? 
I was excited to see a LogLk of "-852682" with -a compared to "-853361" without, but I should have been a little more skeptical of a ~700 LogLk unit increase. I did notice the manual recommends turning on this option, which is interesting since it also ran in 25,000 seconds versus 60,000 without -a (same starting tree, threading, etc). 

What other factors make the LogLk values incomparable? If I used a different starting alignment (with the same species in it, same parameters, same starting tree), would the LogLk also not be comparable as a result of a different number of patterns in the alignment? I was hoping to compare the trees generated on different alignments for the same taxa to pick the "best tree+alignment mix", but it seems currently that alignments with more columns will result in lower LogLk scores for the same taxa. Does the LogLk mean anything comparatively in this case?

Finally, has there been any work assessing the strictness of the loss of comparability in interpreting the final LogLk scores? I would assume a tree with a single extra taxon (10,000 to 10,001), added to an identical alignment with one extra line, for instance, wouldn't skew the final LogLk very much, but a totally different set of organisms and number of taxa would produce a wildly different and "thoroughly" (?) incomparable LogLk. Similarly, for the same taxa, I would conclude (by gut, not by strict mathematics) that a tree of score L*10 is much worse than a tree of score L, even if run under different model parameters on those taxa. But in my case, a difference of ~700 units is probably not sufficiently different to exercise the "probably better" sentiment, whereas if it differed by >10,000 units, might we be more comfortable anecdotally saying the "way lower-scoring" tree is worse?

Still grappling with how to objectively (or subjectively, even) compare the quality of trees; it seems the most "straightforward" method to compare trees made under different model parameters etc. would be to directly test the effect of both trees on biological stats/usages (how well one tree can differentiate between fungal communities compared to the other). 

Alexey Kozlov

unread,
May 13, 2017, 8:22:02 PM5/13/17
to ra...@googlegroups.com
> I turned off -f o and got the exact same LogLk in roughly 1/3 of the time. So it's definitely '-a' causing the
> difference in this case.

OK that's expected.

> if you use median/"-a", then you get likelihood scores which are incomparable with the ones without "-a". Of course,
> you can still get better trees with "-a", but in order to compare the logLks, you'd need to evaluate all trees under
> the same model (either with or without "-a").
>
> Got it. Since the final score under '-a' can't be directly compared to average mode, is there a way to invoke '-a' in NG
> (yet)?

not yet, but I'll add it into the next release

> I was excited to see a LogLk of "-852682" with -a compared to "-853361" without, but I should have been a little more
> skeptical of a ~700 LogLk unit increase. I did notice the manual recommends turning on this option, which is interesting
> since it also ran in 25,000 seconds versus 60,000 without -a (same starting tree, threading, etc).

that's really interesting

> What other factors make the LogLk values incomparable? If I used a different starting alignment (with the same species
> in it, same parameters, same starting tree), would the LogLk also not be comparable as a result of a different number of
> patterns in the alignment? I was hoping to compare the trees generated on different alignments for the same taxa to pick
> the "best tree+alignment mix", but it seems currently that alignments with more columns will result in lower LogLk
> scores for the same taxa. Does the LogLk mean anything comparatively in this case?

Not really, logLks obtained on different alignments are not comparable. The proper way to do it would be alignment-tree
co-estimation (e.g. BaliPhy), but to my knowledge this is not feasible for datasets of your size. There were also some
attempts to build faster iterative methods (e.g.
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3101-8) which you might want to look into.

> Finally, has there been any work assessing the strictness of the loss of comparability in interpreting the final LogLk
> scores? I would assume a tree with a single extra taxon (10,000 to 10,001), added to an identical alignment with one
> extra line, for instance, wouldn't skew the final LogLk very much, but a totally different set of organisms and number
> of taxa would produce a wildly different and "thoroughly" (?) incomparable LogLk. Similarly, for the same taxa, I would
> conclude (by gut, not by strict mathematics) that a tree of score L*10 is much worse than a tree of score L, even if run
> under different model parameters on those taxa. But in my case, a difference of ~700 units is probably not sufficiently
> different to exercise the "probably better" sentiment, whereas if it differed by >10,000 units, might we be more
> comfortable anecdotally saying the "way lower-scoring" tree is worse?

Nothing I'm aware of, but if your goal is to compare trees, why not just evaluate them under the same model?
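For example (a hedged sketch: raxml-ng offers an --evaluate mode for re-scoring a fixed topology; the file names below are placeholders, and the echo keeps this a dry run):

```shell
# Re-score both candidate trees under one fixed model so that the resulting
# logLk values are directly comparable. Dry run: echo only prints the commands.
for t in examl_final.nwk ng_final.nwk; do
  echo raxml-ng --evaluate --msa aln.phy --model GTR+G --tree "$t" --prefix "eval_$t"
done
```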

> Still grappling with how to objectively (or subjectively, even) compare the quality of trees; it seems the most
> "straighforward" method to compare trees made under different model parameters etc would be to directly test the effect
> of both trees on biological stats/usages (how well can one tree differentiate between fungal communities compared to the
> other).

Maybe, but then you also introduce your subjective bias, assuming that communities ought to be well-differentiable.

Best,
Alexey

>
> On Tue, May 9, 2017 at 7:53 AM, Alexey Kozlov <alexei...@gmail.com> wrote:
>
> Hi Gabe,
>

Gabe Al-Ghalith

unread,
May 30, 2017, 12:05:38 AM5/30/17
to ra...@googlegroups.com
Thanks. Some profound insights here. Yes, like you say, I'm searching for the best tree, but as the tree is sensitive to the alignment quality, I was attempting to compare the distribution of tree scores across different alignments for the same DNA sequence data, which you mentioned is not valid by comparing logLk alone. 

The approach you linked to (PASTA + BAliPhy) at first glance seems to violate the notion that logLks obtained on different alignments are not comparable. The program used in the article (PASTA) seems to iterate and base its hunt for the best tree on the raw logLk scores produced by substantially different alignments ("re-alignments"), which is similar to what I was trying to do. An example output (emphasis mine):
PASTA INFO: current score: -881981.93, best score: -881981.93
PASTA INFO: Step 1. Realigning with decomposition strategy set to centroid
PASTA INFO: Step 1. Alignment obtained. Tree inference beginning...
PASTA INFO: realignment accepted and score improved.
PASTA INFO: current score: -878325.642, best score: -878325.642
PASTA INFO: Step 2. Realigning with decomposition strategy set to centroid
PASTA INFO: Step 2. Alignment obtained. Tree inference beginning...
PASTA INFO: realignment accepted and despite the score not improving.
PASTA INFO: current score: -880197.298, best score: -878325.642

 Assuming this approach is correct, are there certain alignment conditions that must be satisfied in order to co-infer alignment+tree while ensuring "better" trees are actually being produced from the alignments in subsequent iterations?

Apologies if this is getting too deep into the woods...

Thanks,
Gabe

Alexey Kozlov

unread,
Jun 2, 2017, 10:00:40 AM6/2/17
to ra...@googlegroups.com
Hi Gabe,

it seems like ML scores are mostly irrelevant for this method, please see discussion in the following paper:

https://academic.oup.com/sysbio/article/61/1/90/1680002/SATe-II-Very-Fast-and-Accurate-Simultaneous


"Because the sequence evolution model for ML phylogeny estimation used in SATé is GTR+Gamma with gaps treated as missing
data, SATé is not attempting to solve ML under a sequence evolution model that includes indels. This suggests that SATé
is not likely to be statistically consistent under a model in which sequences evolve with indels.

[...]

We show definitively that the reason SATé is highly accurate is not due to the use of the ML criterion to select among
the alignment/tree pairs it generates. *We show empirically that whether SATé uses ML to choose among tree/alignment
pairs or simply takes the last tree alignment pair generated after at least three to five iterations has little or no
effect on the topological accuracy of the tree produced.* "

Best,
Alexey

On 30.05.2017 06:05, Gabe Al-Ghalith wrote:
> Thanks. Some profound insights here. Yes, like you say, I'm searching for the best tree, but as the tree is sensitive to
> the alignment quality, I was attempting to compare the distribution of tree scores across different alignments for the
> same DNA sequence data, which you mentioned is not valid by comparing logLk alone.
>
> The approach you linked to (PASTA + BAliPhy) at first glance seems to violate the notion that logLks obtained on
> different alignments are not comparable. The program used in the article (PASTA) seems to iterate and base its hunt for
> the best tree on the raw logLk scores produced by substantially different alignments ("re-alignments"), which is similar
> to what I was trying to do. An example output (emphasis mine):
>
> PASTA INFO: current score: -881981.93, best score: -881981.93
> PASTA INFO: Step 1. Realigning with decomposition strategy set to centroid
> PASTA INFO: Step 1. Alignment obtained. Tree inference beginning...
> PASTA INFO: realignment accepted and score improved.
> PASTA INFO: current score: -878325.642, best score: -878325.642
> PASTA INFO: Step 2. Realigning with decomposition strategy set to centroid
> PASTA INFO: Step 2. Alignment obtained. Tree inference beginning...
> PASTA INFO: realignment accepted and *despite the score not improving.*
> PASTA INFO: current score: -880197.298, best score: -878325.642
>
>
> Assuming this approach is correct, are there certain alignment conditions that must be satisfied in order to co-infer
> alignment+tree while ensuring "better" trees are actually being produced from the alignments in subsequent iterations?
>
> Apologies if this is getting too deep into the woods...
>
> Thanks,
> Gabe

Gabe Al-Ghalith

unread,
Jul 8, 2017, 8:59:14 PM7/8/17
to ra...@googlegroups.com
I'm having an issue with the KNL version of ExaML on a ~20,000 sequence 16S (8235 pattern) dataset. Let me know what info or files to provide or what I can do to help (either myself or you!). The initial tree was made with FastTree, which is what I've always refined upon -- I can provide any/all data.

$ KMP_AFFINITY= OMP_NUM_THREADS=68 OMP_PROC_BIND=true OMP_PLACES={0:68} examl-KNL -t SparseMe.ft.tre -m GAMMA -s SparseMe.binary -n SparseMe.fast -D
... 
Likelihood problem in model optimization l1: -inf l2: -3013784.9188003716990351676940917968750000000000 tolerance: 0.0000030137849188003715740111990856187063
examl-KNL: optimizeModel.c:2958: checkTolerance: Assertion `0' failed.
Aborted

Gabe Al-Ghalith

unread,
Jul 9, 2017, 2:04:48 PM7/9/17
to ra...@googlegroups.com
Update: Looks like it's not ExaML/AVX512/tree to blame. Maybe it's the alignment itself. 

Here's what I get when I run NG (AVX2) on the raw alignment, generating a starting parsimony tree (I've also attached that starting tree and alignment):

raxml-ng --msa SparseMe.phy --model GTR+G --threads 68 --tree pars --prefix RAX
Analysis options:
  run mode: ML tree search
  start tree(s): parsimony
  random seed: 1499574239
  tip-inner: ON
  pattern compression: ON
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: ML estimate (linked)
  SIMD kernels: AVX2
  parallelization: PTHREADS (68 threads)
[00:00:00] Reading alignment from file: SparseMe.phy
[00:00:03] Loaded alignment with 18916 taxa and 14966 sites
WARNING: Fully undetermined columns found: 82
NOTE: Reduced alignment (with gap-only columns removed) was printed to:
/data/16S_tree/RAX.raxml.reduced.phy
Alignment comprises 1 partitions and 8235 patterns
Partition 0: noname
Model: GTR+FO+G4m
Alignment sites / patterns: 14966 / 8235
Gaps: 90.24 %
Invariant sites: 51.79 %

[00:01:05] Generating parsimony starting tree(s) with 18916 taxa
[11:08:28] Data distribution: partitions/thread: 1-1, patterns/thread: 121-122
Starting ML tree search with 1 distinct starting trees
[11:10:33 -inf] Initial branch length optimization
terminate called recursively
terminate called recursively
terminate called recursively
ERROR: ERROR in branch lenght optimization: wrong likelihood derivatives
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
Aborted
startTree_and_reducedPhy.zip

Alexey Kozlov

unread,
Jul 9, 2017, 2:25:28 PM7/9/17
to ra...@googlegroups.com
Hi Gabe,

it seems very much like the infamous GAMMA underflow issue, which should be fixed in NG but not in old RAxML/ExaML.
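For intuition, the underflow mechanism can be sketched in a few lines; this is a toy model of the numerics, not libpll code. The overall likelihood is a product of thousands of tiny per-site likelihoods, which underflows double precision to exactly 0.0 (whose log is -inf), whereas accumulating in log space, which is what per-site/per-rate rescaling achieves in effect, stays finite:

```python
import math

# Toy per-site likelihoods: values this small are plausible for a long,
# gappy alignment with ~20,000 taxa (assumed purely for illustration).
site_liks = [1e-30] * 300

# The naive product falls below the double range (~1e-308) and becomes 0.0 ...
naive = 1.0
for lik in site_liks:
    naive *= lik
assert naive == 0.0                # log(0) then shows up as -inf in the C code

# ... while summing logs (the effect of rescaling) remains finite.
log_lik = sum(math.log(lik) for lik in site_liks)
assert math.isfinite(log_lik)
```

This is why the problem tends to appear on large taxon counts and poor starting trees, and why enabling scalers (or dropping GAMMA) changes the outcome.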

Could you please:

1) run NG with "--rate-scalers on" switch

2) run NG with "--model GTR" (i.e. without GAMMA rate heterogeneity)

and let me know the results?

I'll have a look at your files a bit later.

Thanks!

Best,
Alexey

Gabe Al-Ghalith

unread,
Jul 9, 2017, 3:44:47 PM7/9/17
to ra...@googlegroups.com
Thanks!

I've tried with --rate-scalers on using the failed run's starting tree and it seems to have gotten farther:
[00:03:29 -3252280.786716] Initial branch length optimization
[00:05:05 -2583101.021822] Model parameter optimization (eps = 10.000000)

Should I let it run to the end over the next few days, or would we know right away if there is a problem? 

Are there some settings I can use with ExaML to avoid the underflow? If it happens only on unoptimized initial trees, maybe one could use CAT (site-specific rates) to do a search, then use the output CAT tree in a GAMMA run to avoid the underflow or something?

I only ask because NG is about an order of magnitude slower on my system than ExaML, and the difference widens further if I use '-D' with ExaML (which stops the search early once successive trees are within ~1% Robinson-Foulds distance) to something like a 30-fold difference in runtime with ~20,000 taxa and a single marker gene. Until AVX512 and early termination come to NG, I'd like to stick with ExaML for practical reasons. :)

Thanks,
Gabe


Alexandros Stamatakis

unread,
Jul 10, 2017, 9:12:24 AM7/10/17
to ra...@googlegroups.com
you could do ExaML inferences under CAT which usually is less prone to
this numerical issue and then compute the GAMMA scores of final trees
with RAxML-NG,

alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Alexey Kozlov

unread,
Jul 10, 2017, 9:17:57 AM7/10/17
to ra...@googlegroups.com

> I've tried with --rate-scalers on using the failed run's starting tree and it seems to have gotten farther:
>
> [00:03:29 -3252280.786716] Initial branch length optimization
> [00:05:05 -2583101.021822] Model parameter optimization (eps = 10.000000)
>
>
> Should I let it run to the end over the next few days, or would we know right away if there is a problem?

Great, it seems like per-rate scalers solved the issue. It'd be interesting to see if it'll run till the end, but it's
up to you whether to spend time on this...

> Are there some settings I can use with ExaML to avoid the underflow? If it happens only on unoptomized initial trees,
> maybe one could use CAT (site specific rate) to do a search, then use the output CAT tree in a GAMMA run to avoid the
> underflow or something?

In general, you can do that (use CAT for tree search and then GAMMA for final optimization). However, unfortunately

1) CAT model is not supported on KNC/KNL
2) the underflow issue can still reappear during final optimization

The latter problem can be easily solved by using NG for final GAMMA optimization.

> I only ask because NG is about an order of magnitude slower on my system than ExaML, and the difference widens further
> if I use '-D' with ExaML (which does some 1% R-F early termination) to something like a 30-fold difference in runtime
> with ~20,000 taxa and a single marker gene. Until AVX512 and early termination come to NG, I want to try to stick to
> ExaML for practical reasons.

I totally understand. We already started to work on AVX512 vectorization, but that's gonna take some time. '-D' could be
easily implemented, but again, unfortunately I can't spend much time on NG development in the next couple of months. I
can't promise anything, but I expect both features to become available in September/October.


Best,
Alexey
