2025 GPU for BEAST X and BEAST 2

Gautier Richard

Jul 14, 2025, 2:47:15 PM
to beast-users
Hello everyone,

We are setting up a small datacenter to perform some machine learning, deep learning and BEAST X computations using GPUs.

We were wondering: what are the most important GPU specs when it comes to BEAST X? We are mainly interested in analyzing influenza phylodynamics and phylogeography, as well as reassortment dynamics, with BEAST X and BEAST 2.

After a few hours of research, it seems that FP64 (double-precision) throughput is the most important spec. My conclusion was thus that A100 GPUs are probably the best option (I am not sure we have the budget for an H100 or H200), and that this would rule out the L40S and L4 GPUs. What about the RTX 5090? Currently we have an RTX 4070 in a desktop computer for the demanding models, and it is indeed faster than CPUs. We are, however, limited in the number of GPUs at the moment.

All the best and many thanks,

Gautier Richard, PhD
Project Manager in Molecular Epidemiology,
Swine Immunology Virology Unit, Ploufragan, France

Guy Baele

Jul 21, 2025, 3:46:35 PM
to beast-users
The main requirement is indeed FP64 performance; in my experience, only the (expensive) NVIDIA and AMD cards are a real option.
The problem I had last year was that I couldn't actually get my hands on A100 GPUs: I was told I would have to wait at least a year, with no guarantee of actually receiving any.
So I ended up purchasing half the number of GPUs by going for the H100 over the A100, a painful choice given that the performance benefit (for phylogenetics) doesn't warrant the price increase of an H100 over an A100 GPU.

Best I can tell, the RTX 5090 has an FP64 performance of 1.637 TFLOPS, whereas an A100 delivers 9.7 TFLOPS and an H100 34 TFLOPS.
Note, however, that these numbers don't translate directly into BEAST performance gains, so it's best to get as many modern GPUs as your budget allows.
I would indeed suggest going for the A100 (though I'm not sure you can still get older GPUs) and focusing on a decent memory size, e.g. the 80 GB version.
Nowadays, though, the AMD Instinct cards may be the better option: go for an MI210 or higher (the MI300 and up have very nice benchmark numbers).
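
As a quick sanity check on whichever card you end up with, BEAST X can report the BEAGLE resources it detects, and double precision on the GPU can be requested explicitly. A minimal sketch, assuming the standard BEAST X command-line flags (analysis.xml is a placeholder):

```
# List the BEAGLE resources (CPU, SSE, GPU) that BEAST X can see:
beast -beagle_info

# Run on the GPU in double precision, with dynamic rescaling:
beast -beagle_GPU -beagle_double -beagle_scaling dynamic analysis.xml
```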

Best regards,
Guy

On Monday, July 14, 2025 at 20:47:15 UTC+2, Gautier Richard wrote:

Pfeiffer, Wayne

Jul 23, 2025, 4:21:37 PM
to gautier....@gmail.com, beast...@googlegroups.com
Hi Richard,

I am the project leader for the CIPRES science gateway. We run tens of BEAST, BEASTX, and BEAST2 jobs every day on our Expanse cluster that has AMD cores and NVIDIA V100 and A100 GPUs. Currently 26 jobs are running: 10 are using BEAST or BEASTX, and 16 are using BEAST2. Two jobs using BEAST 1.10.4 are running on GPUs, while all of the other jobs are running on cores.

To decide how to run these jobs cost-effectively, I have done extensive benchmarking of BEAST, BEASTX, and BEAST2 on both cores and GPUs. From these benchmarks I developed rules for choosing the number of cores or GPUs for cost-effective execution. The most important parameters for the rules are

- the number of patterns,
- the number of partitions, and
- whether or not the data set has amino acid sequences.

Before each job we run a script that parses the input xml file to get the needed parameters. GPUs are generally much faster than cores on Expanse, but the usage charge for GPUs is correspondingly higher: 1 GPU hour = 20 AMD core hours. Thus we only use GPUs when their expected speedup over cores is at least 5. For really large data sets, we see speedups on GPUs over cores of ≥30 for BEASTX and ≥100 for BEAST2. We just have a few A100s, so we use them only when they are 1.3x faster than V100s.
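
For illustration, a minimal sketch of such a parser, assuming a BEAUti-generated BEAST X xml file with its usual <!-- npatterns=N --> comments and dataType attributes (a sketch only, not the production script):

```
#!/bin/bash
# Pull the scheduling parameters from a BEAST X xml file; assumes
# BEAUti's "<!-- npatterns=N -->" comments and the alignment's
# dataType attribute are present.
xml=$1
npartitions=$(grep -c 'npatterns=' "$xml")
npatterns=$(grep -o 'npatterns=[0-9]*' "$xml" | awk -F= '{s += $2} END {print s}')
if grep -q 'dataType="amino acid"' "$xml"; then datatype=AA; else datatype=DNA; fi
echo "partitions=$npartitions patterns=$npatterns data=$datatype"
```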

With that as background, the rules that we use for BEASTX and BEAST2 are appended here. Feel free to contact me if you want more information.

Best regards,
Wayne

---

* Rules for running BEASTX 10.5.0 on Expanse via the CIPRES gateway

The runs use varying numbers of cores and GPUs within a single node of Expanse
depending upon the data set.

- Ask the user for the following:

  whether the data set has amino acids (AAs),
  the number of partitions in the data set,
  the total number of patterns in the data set, and
  whether the analysis needs extra memory.

  If the data set does not contain AAs, assume that it is DNA.

- Specify the Slurm partition, threads, beagle_instances, cores, GPU type, and
  Slurm memory according to the following table. Also, use the additional BEAGLE
  parameters listed in the examples, including -beagle_scaling dynamic unless
  the user specifies otherwise.

   Data         Data     Memory      Slurm             beagle_          GPU   Slurm
partitions    patterns   needed   partition  threads instances  cores  type  memory  

DNA data

    <8          <3,000  regular    shared       3        3        3             6G
    <8    3,000-79,999  regular  gpu-shared     1                10    V100    90G
    <8        >=80,000  regular  gpu-shared     1                10    A100    90G
   >=8         <10,000  regular    shared       3        3        3             6G
   >=8        >=10,000  regular    shared       4        4        4             8G
   any           any     extra     shared       6        6        6            12G
                                                                    
AA data

   any           any    regular  gpu-shared     1                10    V100    90G
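
For instance, the DNA rows above reduce to a small amount of shell logic; a sketch, assuming $npartitions and $npatterns hold the parsed values (as in the parser sketch earlier) and ignoring the extra-memory case:

```
if [ "$npartitions" -lt 8 ]; then
  if   [ "$npatterns" -lt 3000 ];  then echo "shared: 3 threads, 3 beagle_instances, 3 cores, 6G"
  elif [ "$npatterns" -lt 80000 ]; then echo "gpu-shared: 1 thread, 10 cores, V100, 90G"
  else                                  echo "gpu-shared: 1 thread, 10 cores, A100, 90G"
  fi
else
  if [ "$npatterns" -lt 10000 ]; then echo "shared: 3 threads, 3 beagle_instances, 3 cores, 6G"
  else                                echo "shared: 4 threads, 4 beagle_instances, 4 cores, 8G"
  fi
fi
```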

---

* Rules for running BEAST2 2.x.x on Expanse via the CIPRES gateway

The runs use varying numbers of cores and GPUs within a single node of Expanse
depending upon the type of analysis and the data set.

- Ask the user for the following:

  whether the analysis uses Path Sampling, SNAPP, or both,
  whether the data set has amino acids (AAs),
  the number of partitions in the data set,
  the total number of patterns in the data set, and
  whether the analysis needs extra memory.

- Specify the Slurm partition, threads, instances, cores, GPUs, and memory according
  to the following table. Also, use the additional BEAGLE parameters listed in the
  examples, including -beagle_scaling dynamic unless the user specifies otherwise.

   Data          Data     Extra    Slurm                                         Slurm 
partitions     patterns  memory  partition  -threads -instances  cores   GPUs   memory

Any data with Path Sampling but without SNAPP

    any           any       no    shared        6         1         6             11G

Any data with SNAPP but without Path Sampling

    any           any       no    shared       24         1        24             46G     

Any data with SNAPP and Path Sampling

    any           any       no    shared       25         1        25             50G 
    any           any      yes    compute      25         1        25            243G

DNA data without Path Sampling or SNAPP

  1 to 3         <5,000     no    shared        3         3         3              6G
  1 to 3   5,000-37,999     no  gpu-shared      1         1        10    1 V100   90G
  1 to 3  38,000-99,999     no  gpu-shared      1         1        10    1 A100   90G
  1 to 3      >=100,000     no  gpu-shared      2         2        20    2 A100  180G

  4 to 17        <1,200     no    shared        1         1         1              2G
  4 to 17   1,200-4,999     no    shared        3         1         3              6G
  4 to 17  5,000-19,999     no    shared        6         2         6             12G
  4 to 17      >=20,000     no  gpu-shared      1         1        10    1 V100   90G

   >=18          <8,000     no    shared        2         1         2              4G
   >=18    8,000-13,999     no    shared        3         1         3              6G
   >=18   14,000-39,999     no    shared        6         2         6             12G
   >=18        >=40,000     no  gpu-shared      1         1        10    1 V100   90G

    any           any      yes    shared       12         1        12             24G

AA data without Path Sampling or SNAPP

     1          <12,000     no  gpu-shared      1         1        10    1 V100   90G
     1         >=12,000     no      gpu         4         4        40    4 V100  360G

  2 to 39         any       no  gpu-shared      1         1        10    1 V100   90G

   >=40           any       no    shared       24         1        24             46G
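
As a concrete illustration, two of the DNA rows above translate into run commands roughly as follows (a sketch; flag spellings follow the table and the -beagle_scaling note above, and analysis.xml is a placeholder):

```
# 1 to 3 partitions, <5,000 patterns: 3 threads, 3 instances, on cores
beast -threads 3 -instances 3 -beagle_scaling dynamic analysis.xml

# 1 to 3 partitions, 5,000-37,999 patterns: 1 thread, 1 instance, 1 V100
beast -threads 1 -instances 1 -beagle_GPU -beagle_scaling dynamic analysis.xml
```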


Martin Gunnill

Sep 3, 2025, 3:32:03 PM
to beast-users
Dear Wayne

Am I correct in assuming that 'pattern' here refers to codons?

I also notice that BEAST 2 prints the number of patterns to the screen before the MCMC run, e.g.:
```
Alignment(2023-03-01 56 WGS D8)
  56 taxa
  15678 sites
  266 patterns
```
Do you have any suggestions for estimating the number of patterns before running BEAST 2?
I am working on a pipelining tool for BEAST 2 and was wondering about the feasibility of implementing some of the decisions in your lookup table.

Yours, Martin

Pfeiffer, Wayne

Sep 3, 2025, 6:02:49 PM
to beast...@googlegroups.com, Pfeiffer, Wayne
On Sep 3, 2025, at 7:04 AM, Martin Gunnill <Martin....@phac-aspc.gc.ca> wrote:

> Dear Wayne
>
> Am I correct in assuming that 'pattern' here refers to codons?

No. The number of patterns is the number of unique columns in the multiple sequence alignment. It can be much smaller than the number of sites, as shown by your example alignment.
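
If it helps for the pipelining tool, unique columns can also be counted directly from an aligned FASTA file. A rough sketch follows; it treats every character literally, so gaps and ambiguity codes may be counted differently than BEAST2 counts them:

```
# Count unique alignment columns (patterns) in an aligned FASTA file;
# assumes all sequences have the same length.
awk '/^>/ {n++; next}
     {seq[n] = seq[n] $0}
     END {
       len = length(seq[1])
       for (i = 1; i <= len; i++) {
         col = ""
         for (j = 1; j <= n; j++) col = col substr(seq[j], i, 1)
         if (!(col in seen)) { seen[col] = 1; np++ }
       }
       print np " patterns in " len " sites"
     }' alignment.fasta
```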

> I also notice that BEAST 2 prints the number of patterns to the screen before the MCMC run, e.g.:
> ```
> Alignment(2023-03-01 56 WGS D8)
>   56 taxa
>   15678 sites
>   266 patterns
> ```
>
> Do you have any suggestions for estimating the number of patterns before running BEAST 2? I am working on a pipelining tool for BEAST 2 and was wondering about the feasibility of implementing some of the decisions in your lookup table.

BEASTX includes npatterns in the xml file :) However, BEAST2 does not :(

To schedule a BEASTX job, we first run a bash parser script that extracts npatterns from the xml file.

To get the number of patterns for a BEAST2 analysis, we run a BEAST2 pilot job just far enough to output the needed number. The pilot job uses a temporary xml file that includes

  chainLength="0"
  preBurnin="0"

The temporary xml file can be generated with the following commands:

  cp *xml temp.xml
  sed -i -e 's/chainLength="[0-9]*"/chainLength="0"/' temp.xml
  sed -i -e 's/preBurnin="[0-9]*"/preBurnin="0"/' temp.xml

The number of patterns is then extracted from the pilot job output file using a bash parser script.

Usually the pilot job finishes in less than 60 seconds, but if it doesn’t, we set the number of patterns to a default value of 1000.
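
The extraction itself can be as simple as something like this (a sketch, assuming the per-alignment "N patterns" lines shown in your example and summing them across partitions; pilot.out is a placeholder, and this is not the actual parser script):

```
# Sum the "N patterns" lines from the pilot job's screen output:
npatterns=$(grep -Eo '[0-9]+ patterns' pilot.out | awk '{s += $1} END {print s+0}')
echo "npatterns=$npatterns"
```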
