Obtaining Model Parameters from a Checkpoint Tree

Kaushik Komandur

May 3, 2024, 12:55:22 AM

to raxml

Hi RAxML-NG team,

I am running a phylogenetic construction on binary data with a fair bit of missing values, using the BINGAMMA model (4 categories). It's still running with checkpoints.

I was interested in simulating input data from the current gamma model being optimized by the program. Is there any way to access the model parameters (and the rate category labels of my input data) at a checkpoint?

Thanks!

- Kaushik Komandur

Anastasis Togkousidis

May 3, 2024, 12:54:33 PM

to raxml

Dear Kaushik,

There is an unofficial version of RAxML-NG, which we currently use to systematically study the dynamics of convergence of RAxML-NG. The link is the following: https://github.com/togkousa/raxml-ng/tree/overfitting

You can download it by simply typing:

git clone --recursive -b overfitting https://github.com/togkousa/raxml-ng.git

And then compile it (manually) as usual:
cd raxml-ng
mkdir build && cd build
cmake ..
make

Some useful comments for the execution:

- The output files with the suffixes ".lastTree.TMP_{i}" and ".bestModel_{i}" contain the checkpoint tree topology and the model parameter configuration at the i-th checkpoint step. For example, if you use the prefix "myrun" for your output files and want to recalculate the log-likelihood score at the second checkpoint step, you can do so by executing the following command:

./raxml-ng --evaluate --msa {msa} --model myrun.raxml.bestModel_2 --tree myrun.raxml.lastTree.TMP_2 --opt-branches off --opt-model off

The resulting log-likelihood score should be equal to the value printed in the RAxML-NG log file.
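To repeat this for every checkpoint step rather than only the second one, something along the following lines might work. It is only a sketch: it assumes the file naming described above, the function names checkpoint_steps and evaluate_commands are made up for illustration, and the extra "--prefix" is an addition so that the individual evaluations do not write to the same output files. The commands are printed, not executed, so they can be inspected first.

```shell
# List the checkpoint step numbers found for a given output prefix,
# based on files named <prefix>.raxml.bestModel_<i>.
checkpoint_steps() {
    local prefix="$1" f
    for f in "${prefix}".raxml.bestModel_*; do
        [ -e "$f" ] || continue        # no match: the glob stays literal, skip it
        printf '%s\n' "${f##*_}"       # keep only the trailing step number
    done | sort -n
}

# Print (do not run) one --evaluate command per checkpoint step.
# $1 = alignment file, $2 = output prefix of the original run.
evaluate_commands() {
    local msa="$1" prefix="$2" i
    for i in $(checkpoint_steps "$prefix"); do
        # Skip steps for which the matching tree file is missing.
        [ -e "${prefix}.raxml.lastTree.TMP_${i}" ] || continue
        echo "./raxml-ng --evaluate --msa ${msa} --model ${prefix}.raxml.bestModel_${i} --tree ${prefix}.raxml.lastTree.TMP_${i} --opt-branches off --opt-model off --threads 1 --prefix ${prefix}_eval_${i}"
    done
}
```

Calling evaluate_commands {msa} myrun from the directory that holds the checkpoint files prints one command per step, in increasing step order. Once the printed commands look right, they can be piped to sh to run them.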

- Since this is only a benchmark version, it is not guaranteed to support all basic functionality. For example, parallel execution is not supported, or at least we never use it in our experiments. We therefore suggest using the "--threads 1" option.

- Further, this is essentially a hacked version that prints intermediate results. It uses a single global counter to number the checkpoint steps and appends that number to each checkpoint file name, so that the tool does not overwrite the same file. However, we assume that only one search is conducted per execution, and hence the code cannot distinguish checkpoint files from independent searches. So, for example, if the first ML tree inference comprises 18 steps, the files of the second ML inference will start from counter 19 and will have the suffixes ".raxml.bestModel_19" and ".raxml.lastTree.TMP_19". To work around this limitation, we suggest using separate commands and prefixes for independent tree inferences. For example:

./raxml-ng --threads 1 --msa {msa} --model {model} --tree rand{1} --seed 1 --prefix myrun1
./raxml-ng --threads 1 --msa {msa} --model {model} --tree rand{1} --seed 2 --prefix myrun2
./raxml-ng --threads 1 --msa {msa} --model {model} --tree rand{1} --seed 3 --prefix myrun3
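For more than a handful of independent searches, the same pattern could be generated with a small loop. The sketch below only prints the commands; the function name search_commands and the choice of tying the prefix to the seed ("myrun" followed by the seed) are assumptions made for illustration.

```shell
# Print one single-threaded search command per seed, each with its own
# prefix, so that every search numbers its checkpoint files from scratch.
# $1 = alignment file, $2 = model, remaining arguments = seeds.
search_commands() {
    local msa="$1" model="$2" seed
    shift 2
    for seed in "$@"; do
        echo "./raxml-ng --threads 1 --msa ${msa} --model ${model} --tree rand{1} --seed ${seed} --prefix myrun${seed}"
    done
}
```

For example, search_commands {msa} {model} 1 2 3 prints three commands with the seeds 1, 2 and 3 and the prefixes myrun1, myrun2 and myrun3.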

- The results from this version are not guaranteed to match any specific released version of RAxML-NG. I believe this GitHub branch was created somewhere between versions 1.1 and 1.2 of RAxML-NG. However, we are not sure about the default values of the parameters that changed between the two versions (e.g. epsilons, fast CLV updates in SPR rounds). If you want to reproduce specific results obtained with a particular version of RAxML-NG, please let us know.

We hope this (hacked) version will help you with your experiments. For further information please do not hesitate to contact us.
Best regards,
The RAxML-NG team

Kaushik Komandur

May 6, 2024, 9:24:08 PM

to raxml

This is super useful. Thank you for the detailed reply!

Best,

Kaushik
