Benchmark Performance Database


Jakin Delony

Feb 4, 2021, 3:57:12 PM
to hoomd-users
I used the HOOMD benchmarks provided at https://glotzerlab.engin.umich.edu/hoomd-blue/benchmarks.html to benchmark MD performance on an RTX 3090 using v2.9.3. The database of hardware performance hasn't been updated in ~3 years (the latest GPU listed is a Tesla P100). Does your group have access to a more current database, or do you have any suggestions for how I could get a fair comparison with current hardware? For reference, the results of the single-rank, single-precision MD benchmarks for v2.9.3 on the 3090 are below:

lj liquid - Hours to complete 10e6 steps: 0.32800617353744466
triblock copolymer - Hours to complete 10e6 steps: 0.4097621371409968
microsphere - Hours to complete 10e6 steps: 5.771346368893445
quasicrystal - Hours to complete 10e6 steps: 0.7863728134300128
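
For comparison with other reports, the hours figures above can be converted to time steps per second. The following is a minimal sketch, assuming that "10e6 steps" in the benchmark output means 1e7 steps; the names `rtx3090_hours` and `steps_per_second` are illustrative and not part of HOOMD.

```python
# Convert "hours to complete 10e6 steps" into time steps per second.
# Assumption: the benchmark's "10e6 steps" is taken literally as 1e7 steps.
STEPS = 10e6

# RTX 3090, HOOMD v2.9.3, single rank, single precision (values from the post).
rtx3090_hours = {
    "lj liquid": 0.32800617353744466,
    "triblock copolymer": 0.4097621371409968,
    "microsphere": 5.771346368893445,
    "quasicrystal": 0.7863728134300128,
}


def steps_per_second(hours, steps=STEPS):
    """Return the average number of time steps completed per second."""
    return steps / (hours * 3600.0)


for name, hours in rtx3090_hours.items():
    print(f"{name:20s} {steps_per_second(hours):10.1f} steps/s")
```

Reporting steps per second makes it easier to compare against runs of a different length, since the figure does not depend on the 10e6-step convention.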

Our group is trying to determine whether to invest in several more of these GPUs for our HPC cluster, so any advice is appreciated.

Thanks,
Jakin

Joshua Anderson

Feb 4, 2021, 5:13:49 PM
to hoomd...@googlegroups.com
Jakin,

I haven't updated those results recently because I don't think they are representative of what most users actually run with HOOMD. If there is a particular benchmark case relevant to your research, post a self-contained script and I would be happy to run it on a V100. That is the fastest GPU I have access to, and it is the GPU typically used at large HPC centers such as OLCF Summit and PSC Bridges-2.
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan



Jakin Delony

Feb 4, 2021, 6:02:21 PM
to hoomd-users
Dr. Anderson,

Thanks. If you have the single-rank, single-precision data for the MD benchmarks on the V100 (LJ liquid, triblock copolymer, microsphere, and quasicrystal), that would be a great start. Our group's production-level simulations are still in v1.0.1 because of custom plugins that we haven't migrated to v2.x yet, so I don't have a compatible benchmark on hand but could have one ready next week.

Thanks,
Jakin

Joshua Anderson

Feb 5, 2021, 8:09:19 AM
to hoomd...@googlegroups.com
Jakin,

I recommend that you update your plugins for v3. v2 is in maintenance mode now (only bugfix releases). Once 3.0.0 is complete, we will stop maintaining v2.

I don't have a single precision build to use to run the benchmarks, nor do I support such a build configuration. For 3.x, we plan to implement a mixed precision mode that can give most of the performance benefits of single without loss of accuracy. Here are the double precision performance numbers on V100:

lj-liquid: Hours to complete 10e6 steps: 0.6721325771882073
triblock-copolymer: Hours to complete 10e6 steps: 0.9222339257454073
microsphere: Hours to complete 10e6 steps: 14.320493993917205
quasicrystal: Hours to complete 10e6 steps: 1.691228061921023

These are approximately 2x your single precision numbers on the RTX 3090, which is consistent with the fact that the V100 and the 3090 have approximately the same memory bandwidth and doubles are 2x the size of floats. HOOMD performance is entirely limited by the available memory bandwidth and the effectiveness of the on-chip cache.
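
The roughly 2x relationship can be checked directly from the two sets of numbers posted in this thread. The sketch below is only an illustration: the dictionary names are made up here, and the comparison assumes the simple memory-bound picture described above, in which run time scales with the number of bytes moved.

```python
import struct

# V100, double precision (values from this message).
v100_double_hours = {
    "lj liquid": 0.6721325771882073,
    "triblock copolymer": 0.9222339257454073,
    "microsphere": 14.320493993917205,
    "quasicrystal": 1.691228061921023,
}

# RTX 3090, single precision, v2.9.3 (values from the first message).
rtx3090_single_hours = {
    "lj liquid": 0.32800617353744466,
    "triblock copolymer": 0.4097621371409968,
    "microsphere": 5.771346368893445,
    "quasicrystal": 0.7863728134300128,
}

# Ratio of run times: V100 (double) over RTX 3090 (single).
ratios = {
    name: v100_double_hours[name] / rtx3090_single_hours[name]
    for name in v100_double_hours
}

# Size of a double relative to a float (8 bytes vs 4 bytes).
bytes_ratio = struct.calcsize("d") / struct.calcsize("f")

for name, ratio in ratios.items():
    print(f"{name:20s} time ratio {ratio:.2f} (bytes ratio {bytes_ratio:.1f})")
```

The measured ratios come out somewhat above 2 for every benchmark, so the byte-size argument accounts for most, though not necessarily all, of the difference.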

The V100 and the RTX 3090 are a generation apart. The datacenter version of the 3090 is the A100, but I don't have access to any A100s for testing.

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

ifh...@gmail.com

Sep 17, 2021, 2:45:58 AM
to hoomd-users
Hello,

I made a similar attempt with a new workstation running an RTX 3090 on an i5-10500. I had good experience in the past running a Titan X (Pascal) and could reproduce the lj-liquid benchmark on the Titan X. But with the RTX 3090 I am only at 3.25 hours instead of the 0.6 hours I would expect. I do not know what could cause this, as the card is utilized at ~99%. I tried the same version of HOOMD (2.9.3) and ran it via Singularity from the Docker image. With that setup the Titan X runs it as expected, but as mentioned the RTX 3090 does not. I also did a test with the 3.0 beta, and it was about 15% faster but nowhere near the 0.6 hours.

Maybe you could give me a hint as to what the bottleneck is here.

Greetings Martin