Hi Richard,
I am the project leader for the CIPRES science gateway. We run tens of BEAST, BEASTX, and BEAST2 jobs every day on our Expanse cluster, which has AMD CPU cores and NVIDIA V100 and A100 GPUs. Currently 26 jobs are running: 10 use BEAST or BEASTX and 16
use BEAST2. Two jobs using BEAST 1.10.4 are running on GPUs; all of the other jobs are running on cores.
To decide how to run these jobs cost-effectively, I have done extensive benchmarking of BEAST, BEASTX, and BEAST2 on both cores and GPUs, and from those benchmarks I developed rules for choosing the number of cores or GPUs for each job. The
most important parameters in the rules are
- the number of patterns,
- the number of partitions, and
- whether or not the data set has amino acid sequences.
Before each job we run a script that parses the input XML file to get the needed parameters. GPUs are generally much faster than cores on Expanse, but the usage charge for GPUs is correspondingly higher: 1 GPU hour = 20 AMD core hours. Thus we use
GPUs only when their expected speedup over cores is at least 5x. For really large data sets, we see GPU-over-core speedups of ≥30 for BEASTX and ≥100 for BEAST2. We have just a few A100s, so we use them only when they are at least 1.3x faster than the V100s.
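In case it is useful, here is a minimal sketch of the kind of check that script performs, assuming a BEAST2-style input in which each partition is a <data> element whose <sequence> children carry the aligned characters in a value attribute. Our production script handles more input variants than this, and the names below are illustrative rather than the script's actual ones.

  # Illustrative sketch only: count partitions and unique site patterns and
  # flag amino-acid data in a BEAST2-style XML input.
  import sys
  import xml.etree.ElementTree as ET

  # Residue letters that occur in protein data but not in IUPAC nucleotide
  # codes -- a crude heuristic for telling AA from DNA alignments.
  AA_ONLY = set("EFILPQ")

  def summarize(xml_path):
      root = ET.parse(xml_path).getroot()
      partitions = root.findall(".//data")
      total_patterns = 0
      has_aa = False
      for part in partitions:
          seqs = [s.get("value", "").upper() for s in part.findall("sequence")]
          if not seqs:
              continue
          if any(set(s) & AA_ONLY for s in seqs):
              has_aa = True
          # A site pattern is a distinct alignment column.
          total_patterns += len(set(zip(*seqs)))
      return len(partitions), total_patterns, has_aa

  if __name__ == "__main__":
      nparts, npatterns, aa = summarize(sys.argv[1])
      print(f"partitions={nparts} patterns={npatterns} amino_acids={aa}")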
With that as background, the rules that we use for BEASTX and BEAST2 are appended here. Feel free to contact me if you want more information.
Best regards,
Wayne
---
* Rules for running BEASTX 10.5.0 on Expanse via the CIPRES gateway
The runs use varying numbers of cores and GPUs within a single node of Expanse
depending upon the data set.
- Ask the user for the following:
whether the data set has amino acids (AAs),
the number of partitions in the data set,
the total number of patterns in the data set, and
whether the analysis needs extra memory.
If the data set does not contain AAs, assume that it is DNA.
- Specify the Slurm partition, threads, beagle_instances, cores, GPU type, and
Slurm memory according to the following table; an illustrative Python version
of this lookup is sketched after the table. Also, use the additional BEAGLE
parameters listed in the examples, including -beagle_scaling dynamic unless
the user specifies otherwise.
Data        Data           Memory   Slurm                beagle_           GPU   Slurm
partitions  patterns       needed   partition   threads  instances  cores  type  memory
DNA data
<8          <3,000         regular  shared      3        3          3            6G
<8          3,000-79,999   regular  gpu-shared  1                   10     V100  90G
<8          >=80,000       regular  gpu-shared  1                   10     A100  90G
>=8         <10,000        regular  shared      3        3          3            6G
>=8         >=10,000       regular  shared      4        4          4            8G
any         any            extra    shared      6        6          6            12G
AA data
any         any            regular  gpu-shared  1                   10     V100  90G
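To make the lookup concrete, here is the table above written out as Python. This is only an illustration: the field names are mine rather than our submission wrapper's, and I read the blank beagle_instances cells in the GPU rows as "not used".

  # Illustrative Python version of the BEASTX table above (my field names,
  # not the production wrapper's).  Returns the Slurm and BEAGLE settings
  # chosen for a job.
  def beastx_settings(has_aa, partitions, patterns, extra_memory):
      # AA data: the table covers only the regular-memory case.
      if has_aa:
          return dict(slurm_partition="gpu-shared", threads=1,
                      beagle_instances=None, cores=10, gpu="V100", memory="90G")
      # DNA data
      if extra_memory:
          return dict(slurm_partition="shared", threads=6,
                      beagle_instances=6, cores=6, gpu=None, memory="12G")
      if partitions < 8:
          if patterns < 3000:
              return dict(slurm_partition="shared", threads=3,
                          beagle_instances=3, cores=3, gpu=None, memory="6G")
          gpu = "V100" if patterns < 80000 else "A100"
          return dict(slurm_partition="gpu-shared", threads=1,
                      beagle_instances=None, cores=10, gpu=gpu, memory="90G")
      if patterns < 10000:
          return dict(slurm_partition="shared", threads=3,
                      beagle_instances=3, cores=3, gpu=None, memory="6G")
      return dict(slurm_partition="shared", threads=4,
                  beagle_instances=4, cores=4, gpu=None, memory="8G")

For example, beastx_settings(False, 5, 50000, False) picks the gpu-shared V100 row.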
---
* Rules for running BEAST2 2.x.x on Expanse via the CIPRES gateway
The runs use varying numbers of cores and GPUs within a single node of Expanse
depending upon the type of analysis and the data set.
- Ask the user for the following:
whether the analysis uses Path Sampling, SNAPP, or both,
whether the data set has amino acids (AAs),
the number of partitions in the data set,
the total number of patterns in the data set, and
whether the analysis needs extra memory.
- Specify the Slurm partition, threads, instances, cores, GPUs, and memory according
to the following table; an illustrative Python version of this lookup is sketched
after the table. Also, use the additional BEAGLE parameters listed in the
examples, including -beagle_scaling dynamic unless the user specifies otherwise.
Data        Data            Extra   Slurm                                            Slurm
partitions  patterns        memory  partition   -threads  -instances  cores  GPUs    memory
Any data with Path Sampling but without SNAPP
any         any             no      shared      6         1           6              11G
Any data with SNAPP but without Path Sampling
any         any             no      shared      24        1           24             46G
Any data with SNAPP and Path Sampling
any         any             no      shared      25        1           25             50G
any         any             yes     compute     25        1           25             243G
DNA data without Path Sampling or SNAPP
1 to 3      <5,000          no      shared      3         3           3              6G
1 to 3      5,000-37,999    no      gpu-shared  1         1           10     1 V100  90G
1 to 3      38,000-99,999   no      gpu-shared  1         1           10     1 A100  90G
1 to 3      >=100,000       no      gpu-shared  2         2           20     2 A100  180G
4 to 17     <1,200          no      shared      1         1           1              2G
4 to 17     1,200-4,999     no      shared      3         1           3              6G
4 to 17     5,000-19,999    no      shared      6         2           6              12G
4 to 17     >=20,000        no      gpu-shared  1         1           10     1 V100  90G
>=18        <8,000          no      shared      2         1           2              4G
>=18        8,000-13,999    no      shared      3         1           3              6G
>=18        14,000-39,999   no      shared      6         2           6              12G
>=18        >=40,000        no      gpu-shared  1         1           10     1 V100  90G
any         any             yes     shared      12        1           12             24G
AA data without Path Sampling or SNAPP
1           <12,000         no      gpu-shared  1         1           10     1 V100  90G
1           >=12,000        no      gpu         4         4           40     4 V100  360G
2 to 39     any             no      gpu-shared  1         1           10     1 V100  90G
>=40        any             no      shared      24        1           24             46G
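And, as above, an illustrative Python version of this table (again, my field names, not the wrapper's). Where the table lists only regular-memory rows for an analysis type, the sketch simply ignores the extra-memory flag.

  # Illustrative Python version of the BEAST2 table above (my field names,
  # not the production wrapper's).  Returns
  # (Slurm partition, -threads, -instances, cores, GPUs, Slurm memory).
  def beast2_settings(path_sampling, snapp, has_aa, partitions, patterns,
                      extra_memory):
      if path_sampling and not snapp:
          return ("shared", 6, 1, 6, None, "11G")
      if snapp and not path_sampling:
          return ("shared", 24, 1, 24, None, "46G")
      if snapp and path_sampling:
          if extra_memory:
              return ("compute", 25, 1, 25, None, "243G")
          return ("shared", 25, 1, 25, None, "50G")
      if has_aa:
          # AA data without Path Sampling or SNAPP
          if partitions == 1:
              if patterns < 12000:
                  return ("gpu-shared", 1, 1, 10, "1 V100", "90G")
              return ("gpu", 4, 4, 40, "4 V100", "360G")
          if partitions < 40:
              return ("gpu-shared", 1, 1, 10, "1 V100", "90G")
          return ("shared", 24, 1, 24, None, "46G")
      # DNA data without Path Sampling or SNAPP
      if extra_memory:
          return ("shared", 12, 1, 12, None, "24G")
      if partitions <= 3:
          if patterns < 5000:
              return ("shared", 3, 3, 3, None, "6G")
          if patterns < 38000:
              return ("gpu-shared", 1, 1, 10, "1 V100", "90G")
          if patterns < 100000:
              return ("gpu-shared", 1, 1, 10, "1 A100", "90G")
          return ("gpu-shared", 2, 2, 20, "2 A100", "180G")
      if partitions <= 17:
          if patterns < 1200:
              return ("shared", 1, 1, 1, None, "2G")
          if patterns < 5000:
              return ("shared", 3, 1, 3, None, "6G")
          if patterns < 20000:
              return ("shared", 6, 2, 6, None, "12G")
          return ("gpu-shared", 1, 1, 10, "1 V100", "90G")
      # >=18 partitions
      if patterns < 8000:
          return ("shared", 2, 1, 2, None, "4G")
      if patterns < 14000:
          return ("shared", 3, 1, 3, None, "6G")
      if patterns < 40000:
          return ("shared", 6, 2, 6, None, "12G")
      return ("gpu-shared", 1, 1, 10, "1 V100", "90G")

For example, beast2_settings(False, False, False, 2, 40000, False) returns the gpu-shared A100 row ("gpu-shared", 1, 1, 10, "1 A100", "90G").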