Dear,
In our lab, we conduct research that requires us to construct large trees for various virus pathogens.
In order to do this, we constructed a workflow that was based on the Bootstrapping section (Step 4) of the step-by-step documentation manual (https://sco.h-its.org/exelixis/web/software/raxml/hands_on.html).
We aim to construct trees with 1000 bootstraps, so to optimally use our HPC infrastructure, we run each bootstrap separately, using:
raxmlHPC-SSE3 -x $seed -p $trees -# 1
Where $seed is an element of a set of random integers and $trees is an element of another set of random integers.
To determine the best tree, on the other hand, we run the best-tree search as an MPI-coordinated process:
mpirun raxmlHPC-MPI -f a -x 12356 -p 123 -# 20
However, according to the explanation in the step-by-step manual, this process runs 20 distinct tree searches and exports the tree with the largest likelihood.
Therefore, we were wondering whether it would be possible to run these 20 searches as separate processes, exporting 20 distinct trees with their likelihoods, so that we can select the best tree manually. This would give us more control over our HPC's job scheduling system.
Would the following approach be sound to achieve this purpose:
- create 2 sets of random integers, s1 and s2, each of size 20
- run 20 instances of raxmlHPC-SSE3 -x $seed1 -p $seed2 -# 1, where $seed1 is an element popped from the above set s1 and $seed2 is an element popped from the above set s2
- from the 20 computed trees, select the one with the highest likelihood
Thanks in advance and kind regards,
Pieter
Hi Pieter,
In our lab, we conduct research that requires us to construct large trees for various virus pathogens.
In order to do this, we constructed a workflow that was based on the Bootstrapping section (Step 4) of the step-by-step documentation manual (https://sco.h-its.org/exelixis/web/software/raxml/hands_on.html).
We aim to construct trees with 1000 bootstraps, so to optimally use our HPC infrastructure, we run each bootstrap separately, using:
raxmlHPC-SSE3 -x $seed -p $trees -# 1
Where $seed is an element of a set of random integers and $trees is an element of another set of random integers.
Why don't you use the so-called bootstopping option (see manual) of the code? This might help to avoid running unnecessary replicates.
Also, why don't you use the AVX version of the code?
In this case it would be preferable to use the standard bootstrap (-b), for which there also exists a coarse-grained MPI parallelization.
To determine the best tree, on the other hand, we run the best-tree search as an MPI-coordinated process:
mpirun raxmlHPC-MPI -f a -x 12356 -p 123 -# 20
However, according to the explanation in the step-by-step manual, this process runs 20 distinct tree searches and exports the tree with the largest likelihood.
Therefore, we were wondering whether it would be possible to run these 20 searches as separate processes, exporting 20 distinct trees with their likelihoods, so that we can select the best tree manually. This would give us more control over our HPC's job scheduling system.
Yes, that's feasible. The only difference is that the final optimization step on the best-scoring of these 20 trees (which does not improve the likelihood much anyway) would not be executed.
Also, the above command will execute bootstraps as well. It should rather read:
mpirun raxmlHPC-MPI -p 123 -# 20
which will generate 20 trees in parallel.
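If the 20 searches are to run as separate scheduler jobs instead of under mpirun, the per-job commands could be generated roughly as follows. This is only a dry-run sketch that prints the commands: the alignment file name, the GTRGAMMA model, the ml_run_ name prefix and the seed range are placeholders that do not come from this thread, and shuf from GNU coreutils is assumed to be available.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one independent ML search command per seed.
# ALIGNMENT, MODEL and the ml_run_ prefix are placeholders, not values from the thread.
set -euo pipefail

ALIGNMENT="${ALIGNMENT:-alignment.phy}"   # hypothetical input file
MODEL="${MODEL:-GTRGAMMA}"                # assumed substitution model

# Print N commands, each with its own -p seed and without -x,
# so that every job is a plain ML search and not a bootstrap replicate.
ml_search_commands() {
    local n="$1" i=0 seed
    # shuf without -r samples without replacement, so the seeds are distinct.
    for seed in $(shuf -i 1-999999 -n "$n"); do
        i=$((i + 1))
        echo "raxmlHPC-SSE3 -p $seed -# 1 -s $ALIGNMENT -m $MODEL -n ml_run_$i"
    done
}

ml_search_commands 20
```

Each printed line can then be wrapped in whatever job script the local scheduler expects.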
Would the following approach be sound to achieve this purpose:
- create 2 sets of random integers, s1 and s2, each of size 20
- run 20 instances of raxmlHPC-SSE3 -x $seed1 -p $seed2 -# 1, where $seed1 is an element popped from the above set s1 and $seed2 is an element popped from the above set s2
You should omit -x as this will generate bootstrap trees.
- from the 20 computed trees, select the one with the highest likelihood
Yes, but be careful with -x which generates bootstrap trees.
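The final selection step could be sketched as below. It assumes the log-likelihood of each run has already been collected into a plain two-column file (run name, log-likelihood); how that value is pulled out of each run's output is deliberately left out here, and the file layout, the run names and the numbers in the example are invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the name of the run with the highest (least negative) log-likelihood.
# Input file: whitespace-separated lines of the form "<run name> <log-likelihood>".
best_run() {
    awk 'NF >= 2 && (!seen || $2 + 0 > best) { best = $2 + 0; name = $1; seen = 1 }
         END { if (seen) print name }' "$1"
}

# Example with made-up scores.
scores="$(mktemp)"
cat > "$scores" <<'EOF'
ml_run_1 -10234.51
ml_run_2 -10230.07
ml_run_3 -10241.88
EOF
best_run "$scores"   # prints ml_run_2
rm -f "$scores"
```

As noted above, a tree picked this way will not have gone through the final optimization step that a single combined run applies to the best-scoring tree.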
Also, the simplest way is to do as many BS replicates as required with the -b option (using the bootstopping criterion to check if you have already done enough bootstraps) and then do 20 independent ML searches.
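A bootstrap invocation along those lines might look like the sketch below, which only prints the command. It assumes the autoMRE bootstopping criterion from the RAxML manual and the raxmlHPC-AVX binary name; the alignment file, the model, the run name and the seeds are placeholders that do not come from this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

ALIGNMENT="${ALIGNMENT:-alignment.phy}"   # hypothetical input file
MODEL="${MODEL:-GTRGAMMA}"                # assumed substitution model

# Print a standard-bootstrap (-b) command that lets the autoMRE
# bootstopping criterion decide how many replicates to compute.
bootstrap_command() {
    local bs_seed="$1" pars_seed="$2"
    echo "raxmlHPC-AVX -b $bs_seed -p $pars_seed -# autoMRE -s $ALIGNMENT -m $MODEL -n bs"
}

bootstrap_command 12345 678
```

Which bootstopping criterion is appropriate for a given data set is left open here; the manual lists the alternatives.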
Finally, for the sake of efficiency I would strongly recommend switching to the RAxML re-design RAxML-NG:
https://github.com/amkozlov/raxml-ng
Alexis
Thanks in advance and kind regards,
Pieter
--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
www.exelixis-lab.org