Then I ran the Parse command to determine the memory requirements and thread recommendations:
raxml-ng --parse --msa gtdbtk.bac120.msa.forRAxML.fasta --model LG+G8+F --prefix T2
I then ran RAxML-NG on the RBA file created during Parse. Since I'm running on a machine with 300 GB RAM and 16 cores (Parse recommended 14 cores), I decided to increase the number of starting trees:
raxml-ng --msa T2.raxml.rba --model LG+G8+F --prefix T3 --threads 14 --seed 2 --tree pars{25},rand{25}
The program has been running for over a week at this point. Here is the log output so far:
RAxML-NG v. 0.9.0 released on 20.05.2019 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
RAxML-NG was called at 12-Jul-2021 10:34:23 as follows:
raxml-ng --msa T2.raxml.rba --model LG+G8+F --prefix T3 --threads 14 --seed 2 --tree pars{25},rand{25}
Analysis options:
run mode: ML tree search
start tree(s): random (25) + parsimony (25)
random seed: 2
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
fast spr radius: AUTO
spr subtree cutoff: 1.000000
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX2
parallelization: PTHREADS (14 threads), thread pinning: OFF
WARNING: The model you specified on the command line (LG+G8+F) will be ignored
since the binary MSA file already contains a model definition.
If you want to change the model, please re-run RAxML-NG
with the original PHYLIP/FASTA alignment and --redo option.
[00:00:00] Loading binary alignment from file: T2.raxml.rba
[00:00:02] Alignment comprises 23478 taxa, 1 partitions and 5040 patterns
Partition 0: noname
Model: LG+FC+G8m
Alignment sites / patterns: 5040 / 5040
Gaps: 11.56 %
Invariant sites: 0.00 %
NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.
[00:00:03] Generating 25 random starting tree(s) with 23478 taxa
[00:00:04] Generating 25 parsimony starting tree(s) with 23478 taxa
It looks like there are 23,478 taxa, including the 20 test taxa and all the GTDBTk reference taxa. The MSA is amino acid sequences for the 37 protein sequences GTDBTk uses for classification, concatenated for each sample in FASTA format.
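As a sanity check on that taxon count, something like the sketch below could be run against the original FASTA file. It assumes a standard FASTA layout with exactly one '>' header line per taxon; the function name count_fasta_records is made up for illustration and is not part of RAxML-NG or GTDBTk.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
# Assumes every record starts with a single line beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running `count_fasta_records gtdbtk.bac120.msa.forRAxML.fasta` should print the same number of taxa that RAxML-NG reports when it loads the alignment.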
I've never used RAxML before, so I'm not sure whether this time frame is expected. Any advice would be appreciated.
Thanks!
Kevin