Restarting RAxML from checkpoint

taua...@gmail.com

Mar 15, 2018, 4:04:09 PM
to raxml
Hi all,

I am running RAxML on a large dataset and my jobs can't run all the way to the end, so I have to restart. I saw in the manual that I can use the -j option to save checkpoint intermediate files, but how do I restart an analysis with that? I simply relaunched my script and it failed saying the output files already existed, but only a few initial files actually existed, not the final output.

I am splitting the best-tree search and each bootstrap replicate into separate jobs, to try to get the results faster.
I am using RAxML 8.2.10: HYBRID-SSE3 for the best tree, PTHREADS-SSE3 for each bootstrap.

Command for the best tree:

raxmlHPC-HYBRID-SSE3 -T $SLURM_CPUS_PER_TASK -N 10 -j -d -p 1313 \
    -m MULTIGAMMA -K GTR -n MLBEST.besttree -q partition -s RAXMLdayhoff


I learned that ExaML allows for checkpoints, but I would rather use RAxML for this one since it also seems to have a checkpoint option. Any help is much appreciated.

Tauana

Alexandros Stamatakis

Mar 16, 2018, 12:40:44 AM
to ra...@googlegroups.com
Hi Tauana,

Restarting from checkpoint with RAxML does not work for the RAxML
command you are using and it also doesn't work for the hybrid version,
unfortunately.

You can either split up those 10 ML searches into 10 individual searches
(10 jobs), but you need to make sure that each search is started with a
different random number seed (-p option).
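A minimal sketch of the split described above, written as a dry run that only prints the ten commands so each can be submitted as its own job. The model, partition and alignment options are copied from the original command; the PTHREADS binary, the base seed, the run-name pattern and the omission of -N and -j are assumptions made for illustration, not a recommendation from the RAxML authors.

```shell
#!/bin/sh
# Dry run: print one RAxML command per independent ML search.
# BASE_SEED, N_SEARCHES and the MLBEST.search<i> run names are illustrative assumptions.
BASE_SEED=1313
N_SEARCHES=10

gen_search_cmds() {
    i=1
    while [ "$i" -le "$N_SEARCHES" ]; do
        seed=$((BASE_SEED + i))   # a distinct -p seed for every search
        # \$SLURM_CPUS_PER_TASK is printed literally and expanded inside the job.
        echo "raxmlHPC-PTHREADS-SSE3 -T \$SLURM_CPUS_PER_TASK -d -p $seed -m MULTIGAMMA -K GTR -n MLBEST.search$i -q partition -s RAXMLdayhoff"
        i=$((i + 1))
    done
}

gen_search_cmds
```

Each printed line can be pasted into a separate job script. Because every search gets its own run name (-n), the output files of the ten jobs do not collide.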

Alternatively you can use ExaML (as you said) or RAxML-NG, the new
better version of RAxML:
https://github.com/amkozlov/raxml-ng

which also offers better checkpointing support.

Finally, the current version of old RAxML is v 8.2.11.

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org
Message has been deleted

Alexandros Stamatakis

Mar 20, 2018, 12:28:20 PM
to ra...@googlegroups.com
Hi Tauana,

> Thank you very much, Alexis, I was not aware of RAxML-NG. Since it seems
> to be the most recent version you have been working on, I think I will
> try that.

Yes please.

> If I understood correctly from the wiki, it automatically creates a
> checkpoint file and therefore there is nothing else I need to add to the
> command, just relaunch the program. Please let me know if I am mistaken.

That's correct.
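As a rough sketch of what "just relaunch the program" means in practice: the helper below only reports whether a relaunch with the same prefix would find a checkpoint to resume from. It assumes the checkpoint is stored as PREFIX.raxml.ckp next to the other output files; the alignment name, model, prefix, thread count and seed in the command string are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: report whether relaunching raxml-ng with this prefix would resume.
# Assumption: the checkpoint file is named <prefix>.raxml.ckp.
resume_status() {
    prefix=$1
    if [ -f "$prefix.raxml.ckp" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

# Hypothetical file names and options, for illustration only.
PREFIX=MLBEST
CMD="raxml-ng --msa alignment.phy --model GTR+G --prefix $PREFIX --threads 4 --seed 1313"

# The command is identical for the first launch and for every relaunch.
echo "$(resume_status "$PREFIX"): $CMD"
```

The point of the sketch is that the command line itself does not change between the first launch and a restart; only the presence of the checkpoint file differs.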

> I am also assuming I can still use the -m MULTIGAMMA -K GTR options for
> a dayhoff dataset, which is what I was trying with the previous RAxML.

I think that should be implemented already, but you should wait for
Alexey's answer (he is the main RAxML-NG developer) about this.

Alexis

>
> Thanks again!
> Tauana

Alexey Kozlov

Mar 20, 2018, 2:18:01 PM
to ra...@googlegroups.com
Hi Tauana,

> If I understood correctly from the wiki, it automatically creates a checkpoint file and therefore there is nothing else
> I need to add to the command, just relaunch the program. Please let me know if I am mistaken.

Yes.

> I am also assuming I can still use the -m MULTIGAMMA -K GTR options for a dayhoff dataset, which is what I was trying
> with the previous RAxML.

As of now, multistate models are only available in the development branch of RAxML-NG. So you can either compile it as
described here:

https://github.com/amkozlov/raxml-ng/wiki/Installation#building-development-branch

or wait until the next RAxML-NG release (2-4 weeks).

Also, the model specification differs from that of old RAxML; please see here:

https://github.com/amkozlov/raxml-ng/wiki/Input-data#single-model

Hope this helps,
Alexey


Andrew ODonnell

Mar 1, 2024, 3:25:29 AM
to raxml
Dear Alexis,

I am running 20 tree searches with raxml-ng on a large data set of 1,600 sequences, using 10 threads (the number suggested by the estimation), following the RAxML-NG tutorial. However, we have up to 32 cores available on our computing cluster, and I'm wondering whether the checkpointing described here (https://github.com/amkozlov/raxml-ng/wiki/Advanced-Tutorial) allows me to stop the search, increase the number of threads, and then resume from the checkpoint (i.e. can I change a parameter in how raxml-ng was called and still benefit from the checkpointing features)? My search is still running but is taking a very long time, so I wanted to ask first before stopping the run and restarting it with more threads.

I called the raxml-ng-mpi version as follows, while following along with the raxml-ng tutorial:

RAxML-NG v. 1.1.0 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 32 cores, 503 GB RAM

RAxML-NG was called at 28-Feb-2024 11:35:02 as follows:

raxml-ng-mpi --msa hits.phy --model JTT+G --prefix T3 --threads 10 --seed 2

Analysis options:
  run mode: ML tree search
  start tree(s): random (10) + parsimony (10)
  random seed: 2
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: coarse-grained (auto), PTHREADS (10 threads), thread pinning: ON

[00:00:00] Reading alignment from file: hits.phy
[00:00:00] Loaded alignment with 1637 taxa and 7894 sites

Oleksiy Kozlov

Mar 1, 2024, 8:22:43 AM
to ra...@googlegroups.com
Dear Andrew,

In general, it should be possible, but there are a couple of exceptions.

I'd recommend just copying all PREFIX.raxml.* files to a new directory, trying to restart there with a
higher number of threads, and seeing what happens :)
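The copy step suggested above could be sketched like this. The function name and the destination directory are hypothetical; the prefix T3 and the raxml-ng-mpi options in the commented example are taken from the run shown earlier in the thread, with only the thread count changed.

```shell
#!/bin/sh
# Sketch: copy all <prefix>.raxml.* files into a separate directory so a
# restart with different settings can be tried without touching the originals.
copy_run_files() {
    prefix=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Expand the pattern; if nothing matches, the literal pattern is left in $1.
    set -- "$prefix".raxml.*
    [ -e "$1" ] || { echo "no files match $prefix.raxml.*" >&2; return 1; }
    cp -p "$@" "$dest"/
}

# Example (restart_test is a hypothetical directory name):
#   copy_run_files T3 restart_test
#   cd restart_test
#   raxml-ng-mpi --msa ../hits.phy --model JTT+G --prefix T3 --threads 32 --seed 2
```

Working on a copy means that if the restart with a different thread count turns out to be one of the exceptions, the original checkpoint is still intact.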

Best,
Oleksiy


Andrew ODonnell

Mar 5, 2024, 2:37:49 AM
to ra...@googlegroups.com
Dear Oleksiy,

I would just like to report back (for anyone else who might be interested) that I was able to stop the analysis (Ctrl+C), increase the thread count, and restart the analysis with raxml-ng without problems. Increasing the thread count from 10 to 24 made each tree search moderately faster. I then stopped the analysis again and increased to the maximum of 32 threads, which in my case gave the fastest time per tree search. For a tree of over 1,600 sequences, it almost feels like lightning speed (although it will still take a couple of days to finish 20 tree searches)!

Although I had no problems, I always copied all PREFIX.raxml.* files to a backup directory before stopping each run of raxml-ng.

Thanks for your help, and good luck to all!

Best,

A


Oleksiy Kozlov

Mar 5, 2024, 6:50:27 AM
to ra...@googlegroups.com
Dear Andrew,

thanks for your feedback!

Best,
Oleksiy