raxml-ng extremely slow on HPC

Justin Bernstein

Aug 26, 2021, 1:49:19 PM
to raxml
Hello all,

I made a post on this once before for the previous version of RAxML, but was hoping to get some insight on memory allocation for running jobs on the cluster.
I am running a concatenated phylogenomic dataset (UCEs, AHEs, nuclear genes) from the Phyluce pipeline. My dataset contains 169 taxa and 2,706,714 sites, and the nodes in the computer cluster I am using have up to 50 cores (28 cores for nodes that cannot be interrupted by higher-priority job submissions). A preliminary assessment using the RAxML-NG check estimated that I should use 139 threads and 50 GB of memory.
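
For a rough sense of where an estimate of that size comes from, here is a back-of-the-envelope sketch. It assumes the rule of thumb given in the classic RAxML manual for DNA data under GAMMA, (taxa - 2) x alignment patterns x 4 states x 4 rate categories x 8 bytes, and it uses the site count as an upper bound because the number of unique patterns is not stated in this thread. The estimate printed by RAxML-NG itself remains the number to trust.

```shell
#!/usr/bin/env bash
# Rough upper bound on likelihood-vector memory for GTR+G (assumed rule of thumb).
taxa=169
sites=2706714          # used in place of unique patterns, so this overestimates
states=4               # DNA
rate_cats=4            # discrete GAMMA categories (assumed default)
bytes_per_double=8

bytes=$(( (taxa - 2) * sites * states * rate_cats * bytes_per_double ))
gib=$(( bytes / 1024 / 1024 / 1024 ))   # integer division, rounds down

echo "upper bound: ${bytes} bytes (~${gib} GiB)"
```

Because an alignment usually has fewer unique patterns than sites, the real requirement should fall below this bound, which is consistent with the 50 GB reported above.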

I have tried running a few different scripts using RAxML-NG, but whether I am running 20 or 50 starting trees, each tree search takes 6-7 hours, and I only get a 72-hour limit on the cluster. Am I perhaps allocating memory wrong? Do I need to be using thread pinning?

For reference, here is an example of the script I am running (on a node that has 25 cores allocated to my job submission and up to 182 GB of memory):
raxml-ng --msa mafft-nexus-internal-trimmed-gblocks-clean-75p-raxml.phylip --model GTR+G --prefix T25 --threads 25 --seed 2 --tree pars{25},rand{25}

So here I have made the number of threads equal to the number of cores, still using one node. Perhaps I am doing something wrong here? Any recommendations on the script or how I am allocating memory would be extremely helpful. I am happy to provide any other files or info that would be useful. 

Best,
Justin

Alexey Kozlov

Aug 26, 2021, 4:20:43 PM
to ra...@googlegroups.com
Hello Justin,

could you please show your job submission script and raxml-ng log file for this run?

Best,
Alexey
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/47263fc9-ea05-421d-8e07-f528c51fb224n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/47263fc9-ea05-421d-8e07-f528c51fb224n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Justin Bernstein

Aug 26, 2021, 6:33:44 PM
to raxml
Hi Alexey,

Sure thing! Attached is the submission and log file.

Best,
Justin

T25.raxml.log
job-submission.sh

Alexey Kozlov

Aug 26, 2021, 7:46:13 PM
to ra...@googlegroups.com
thanks, it generally looks fine, and given that your alignment is pretty large, 6 hours per tree
is quite plausible

still, you can try a couple of things:

- use 32 threads (and modify the script accordingly), since your node has 32 CPU cores
- try thread pinning: for that I would do a quick test run with "--tree pars{1}" and compare the
runtimes after the first few SPR rounds (you don't have to wait until the search has finished)
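
A minimal sketch of that comparison, as a dry run that only prints the two commands. It assumes that `--extra thread-pin` is the option that turns pinning on (please confirm with `raxml-ng -h` for your version); the prefixes `pin_off` and `pin_on` and the helper `pin_test_cmd` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Dry run: print one test command without pinning and one with pinning.
MSA="mafft-nexus-internal-trimmed-gblocks-clean-75p-raxml.phylip"
THREADS=32

# Hypothetical helper: $1 = run prefix, remaining arguments = extra options.
pin_test_cmd() {
  local prefix="$1"; shift
  echo raxml-ng --msa "$MSA" --model GTR+G --prefix "$prefix" \
       --threads "$THREADS" --seed 2 --tree "pars{1}" "$@"
}

pin_test_cmd pin_off
pin_test_cmd pin_on --extra thread-pin   # assumed pinning option
```

Each printed line can be pasted into the job script; after a few SPR rounds, the elapsed times in `pin_off.raxml.log` and `pin_on.raxml.log` can be compared and both runs stopped.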

please note that raxml-ng supports checkpointing, so the job time limit is not a problem - you can just
resubmit the job and raxml-ng will continue from where it was terminated

Justin Bernstein

Aug 26, 2021, 10:30:56 PM
to raxml
Thank you for the advice. I will run it and also check the thread pinning. For the checkpoints, as long as there is a .ckp file in the directory, will it automatically start from there if I just run the same command?

Alexey Kozlov

Aug 27, 2021, 4:41:45 AM
to ra...@googlegroups.com


> Thank you for the advice. I will run it and also check the thread pinning. For the checkpoints, as
> long as there is a .ckp file in the directory, will it automatically start from there if I just run
> the same command?

exactly.
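
In script form, resuming is just the original command again. The sketch below assumes the checkpoint is written as `<prefix>.raxml.ckp` (so `T25.raxml.ckp` for the run in this thread); `run_status` is a hypothetical helper that only reports which case applies, and the search is skipped when `raxml-ng` or the alignment file is not available.

```shell
#!/usr/bin/env bash
PREFIX="T25"
MSA="mafft-nexus-internal-trimmed-gblocks-clean-75p-raxml.phylip"

# Hypothetical helper: report whether a checkpoint exists for the given prefix.
run_status() {
  if [ -f "$1.raxml.ckp" ]; then
    echo "resuming from $1.raxml.ckp"
  else
    echo "no checkpoint found, starting fresh"
  fi
}

run_status "$PREFIX"

# Same command as the first submission; raxml-ng handles the checkpoint itself.
if command -v raxml-ng >/dev/null 2>&1 && [ -f "$MSA" ]; then
  raxml-ng --msa "$MSA" --model GTR+G --prefix "$PREFIX" \
           --threads 25 --seed 2 --tree "pars{25},rand{25}"
fi
```

Keeping the same `--prefix` and working directory across submissions is what lets the rerun find the earlier checkpoint.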