Educated Bootstrap Guesser

Alexandros Stamatakis

Oct 20, 2024, 1:52:52 AM
Dear Users,

Do you want to rapidly predict bootstrap values via machine learning?
You can now use our Educated Bootstrap Guesser:

This will also be integrated into RAxML-NG next year.


Alexandros (Alexis) Stamatakis

ERA Chair, Institute of Computer Science, Foundation for Research and
Technology - Hellas
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology (Crete lab) (Heidelberg lab)

Pfeiffer, Wayne

Nov 1, 2024, 6:44:22 PM
to, Pfeiffer, Wayne
Hi Alexis,

After receiving this announcement about EBG I was eager to try it out.

I installed the package via conda and typed

ebg -h

which returns the expected output.

So far, however, I have been unable to get it to run on actual data. I found that I needed absolute paths to the input files to avoid some errors, but even after adding those paths, I still get errors.

Here is my command line:

ebg -msa /expanse/projects/ngbt/opt/benchmarks/EBG-0.12.0_expanse/218/218.fasta -tree /expanse/projects/ngbt/opt/benchmarks/EBG-0.12.0_expanse/218/218.bestTree -model /expanse/projects/ngbt/opt/benchmarks/EBG-0.12.0_expanse/218/218.bestModel -redo

and here are the error messages:

Traceback (most recent call last):
  File "/home/cipres/miniconda3/envs/ebgenv/bin/ebg", line 10, in <module>
  File "/home/cipres/miniconda3/envs/ebgenv/lib/python3.12/site-packages/EBG/", line 29, in main
    predictor = Predictor(args.msa, args.tree, args.model, args.o, args.t, args.raxmlng, args.redo)
  File "/home/cipres/miniconda3/envs/ebgenv/lib/python3.12/site-packages/EBG/Prediction/", line 64, in __init__
    self.feature_extractor = FeatureExtractor(msa_filepath, tree_filepath, model_filepath, o, raxml_ng_path, redo)
  File "/home/cipres/miniconda3/envs/ebgenv/lib/python3.12/site-packages/EBG/Features/", line 36, in __init__
    self.feature_computer = FeatureComputer(msa_file_path, tree_file_path, model_file_path, output_prefix, raxml_ng_path, redo)
  File "/home/cipres/miniconda3/envs/ebgenv/lib/python3.12/site-packages/EBG/Features/", line 80, in __init__
    tmp_folder_path = os.path.abspath(os.path.join(os.curdir, output_prefix))
  File "<frozen posixpath>", line 90, in join
  File "<frozen genericpath>", line 164, in _check_arg_types
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

* Is there someone on your team who might suggest what the problem is and how to solve it?

Thanks for whatever help you can provide.



Oleksiy Kozlov

Nov 1, 2024, 6:54:51 PM
Hi Wayne,

please try adding "-o <output_folder>"; there seems to be a missing default value for this argument.
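
For readers who hit the same traceback, here is a minimal sketch of why a missing default for "-o" leads to that TypeError. It is not EBG's actual code: the parser below is a hypothetical stand-in that models only the "-o" option, and the default name "ebg_output" is an assumption chosen for illustration. Only the os.path.join expression is taken from the traceback.

```python
import argparse
import os


def build_parser(default_output=None):
    """Hypothetical stand-in for EBG's argument parser (only -o is modeled)."""
    parser = argparse.ArgumentParser(prog="ebg-sketch")
    parser.add_argument("-o", default=default_output, help="output folder name")
    return parser


def tmp_folder_for(output_prefix):
    # Same expression as the line shown in the traceback.
    return os.path.abspath(os.path.join(os.curdir, output_prefix))


# Without -o and without a default, args.o is None and join() raises TypeError.
args = build_parser().parse_args([])
try:
    tmp_folder_for(args.o)
    failed = False
except TypeError:
    failed = True

# With a default value (the name "ebg_output" is an assumption), the call works.
args_with_default = build_parser(default_output="ebg_output").parse_args([])
folder = tmp_folder_for(args_with_default.o)
```

Until a default exists, passing "-o" explicitly sidesteps the problem, as suggested above.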


Pfeiffer, Wayne

Nov 2, 2024, 5:27:10 AM
to, Pfeiffer, Wayne
Hi Oleksiy,

Thanks for the prompt reply.

Adding the -o option allowed me to successfully analyze two small DNA data sets in 49 and 209 s :)
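
A small sketch of how the working invocation could be assembled from a script, combining the two workarounds from this thread: absolute input paths and an explicit "-o". It uses only flags that appear in the messages above; the helper name build_ebg_command and the output name "218_ebg" are assumptions made for illustration.

```python
import os
import shlex


def build_ebg_command(msa, tree, model, output_name, redo=False):
    """Assemble an ebg command line using only flags that appear in this thread."""
    cmd = [
        "ebg",
        "-msa", os.path.abspath(msa),      # absolute paths avoided some errors
        "-tree", os.path.abspath(tree),
        "-model", os.path.abspath(model),
        "-o", output_name,                 # explicit, since no default is set
    ]
    if redo:
        cmd.append("-redo")
    return cmd


# File names follow the earlier command line; "218_ebg" is a made-up output name.
cmd = build_ebg_command("218.fasta", "218.bestTree", "218.bestModel", "218_ebg", redo=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting issues with long paths.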

However, the analysis of a larger DNA data set with 45 taxa and 168,565 patterns did not finish within my specified time limit of 6 hours. Here is the final output in stderr:

FeatureComputer - INFO - Finished computing 180 from 200 parsimony bootstraps ... 
FeatureComputer - INFO - Finished computing 200 Parsimony Bootstraps
FeatureComputer - INFO - Finished computing Parsimony Bootstrap features!
FeatureExtractor - INFO - Elpased time: 2306.95 seconds
FeatureComputer - INFO - Finished computing tree split features!
FeatureExtractor - INFO - Elpased time: 0.12 seconds

So no output was generated for over 5 hours.

* Do you think this analysis would finish if run longer, or is this data set just too big for EBG?

I also tried to analyze two amino acid data sets, but both attempts failed. EBG thought that the input was for partitioned DNA data sets, even though the original RAxML-NG analyses were unpartitioned and the model files specified WAG+G4m or LG+G4m. Here is the start of the stdout file from my run with the WAG+G4m model:

ERROR: Failed to read partition file:
ERROR model initialization |(Seq140:0.314768| (LIBPLL-5001): DNA model not found: (Seq140:0.314768

I presumed that AA data sets were allowed, since the paper by Wiegert et al. says:

“We used 1496 MSAs (93% DNA and 7% Amino Acid (AA)) for training and evaluating EBG.”

* Please let me know if you would like me to send you or one of your colleagues any of my input files by a separate email thread for further investigation.

Thanks again,

Pfeiffer, Wayne

Nov 2, 2024, 7:22:57 PM
to, Pfeiffer, Wayne
Hi Oleksiy,

I resubmitted a job for the large DNA data set with a longer time limit, and it nearly finished after 13.2 hours when it ran out of memory.

I have resubmitted two more jobs requesting more memory to see whether one of them finishes successfully.

Also, EBG does not seem to accept a partition file as input.

* Does that mean that EBG cannot handle partitioned data sets?

Best regards,

Pfeiffer, Wayne

Nov 4, 2024, 3:41:02 AM
to, Pfeiffer, Wayne
Hi Oleksiy,

My EBG analysis of the DNA data set with 45 taxa and 168,565 patterns finished successfully in 12.6 hours after I increased the memory to 32 GB. The analysis was fine using only 2 GB of memory until the very end, when a final processing step became very memory intensive.

* This explosion in memory usage would be good to investigate along with why the code does not work for amino acid data sets.
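
For anyone sizing a memory request before a long run, here is a rough sketch of how the peak memory of a command can be recorded from a Python wrapper on Linux or macOS. It is a generic standard-library technique, not part of EBG; the function name is an assumption, and it reports only the overall peak, not when during the run it occurred.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd, then return (exit code, peak resident set size in MiB)."""
    completed = subprocess.run(cmd, check=False)
    # RUSAGE_CHILDREN covers all child processes this wrapper has waited for,
    # so use one wrapper process per measured command.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


# Stand-in command; replace the list with the actual ebg command line.
exit_code, peak_mib = run_and_report_peak_rss([sys.executable, "-c", "pass"])
print(f"exit code {exit_code}, peak RSS {peak_mib:.1f} MiB")
```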

Best regards,

Oleksiy Kozlov

Nov 4, 2024, 8:46:32 AM
to, Pfeiffer, Wayne
Hi Wayne,

thanks for the extensive testing!

I was not involved in this project, so I will have to discuss your questions with colleagues, and
this could take a while.

We will also discuss whether/how EBG could be integrated into raxml-ng, and hopefully we can address
some limitations of the current implementation in the process.

My best guess so far:

- partitioned alignments are not supported

- for AA data, could it be that you provided a newick file instead of .raxml.bestModel in the
"-model" option?
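
The second guess can be checked quickly before launching a run. The error text quoted earlier ("DNA model not found: (Seq140:0.314768") begins with an opening parenthesis, which is how a Newick tree starts. Below is a minimal heuristic sketch; the function names are assumptions, and the sample strings in the usage lines are made up for illustration rather than taken from real files.

```python
def looks_like_newick(text):
    """Heuristic: a Newick tree starts with "(" and ends with ";"."""
    stripped = text.strip()
    return stripped.startswith("(") and stripped.endswith(";")


def check_model_file(path):
    """Return the file content, or raise if it appears to be a tree file."""
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
    if looks_like_newick(content):
        raise ValueError(f"{path} looks like a Newick tree, not a model file")
    return content.strip()


# Made-up sample strings: a tiny tree and a model-style line.
tree_like = looks_like_newick("(Seq140:0.314768,SeqB:0.1);")
model_like = looks_like_newick("WAG+G4m, noname = 1-500")
```

Calling check_model_file on the path given to "-model" would flag a swapped tree file before EBG starts.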


Pfeiffer, Wayne

Nov 4, 2024, 10:13:16 AM
to Oleksiy Kozlov,, Pfeiffer, Wayne