LAMMPS MBX MPI performance with OMP threads


Tim Tan

Mar 18, 2025, 1:17:10 AM
to MBX-users
Dear MBX developer team:

I have a question regarding the current MBX compilation and setup steps. I saw in an earlier post that MBX v1.2 takes better advantage of OMP threads than of MPI ranks. I just ran a test, and the performance showed barely any difference between mpi -np 32 and mpi -np 1 -omp 32. One of my guesses is that MBX is unable to use the threads.
My question is: how do we write the command line so that MBX can be assigned the threads? Do we need to edit the input file in order to do so? During the ./configure step, are there any extra steps needed to enable hybrid MPI/OpenMP acceleration?

Best regards,
Tim

Ruihan Zhou

Mar 21, 2025, 5:02:10 PM
to MBX-users
Hi Tim,

Thank you for your interest in using MBX! As long as you specified the --enable-mpi flag at the ./configure step, you should be able to use the command-line specifications for MPI/OMP hybrid acceleration.
However, MPI and OMP optimize different components of the simulation, so the best combination can vary from system to system. Would you let us know which system you ran the test with?
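
For example, a minimal sketch of a hybrid run could look like the following. This assumes an OpenMPI-style mpirun, a LAMMPS executable named lmp linked against MBX, and that MBX picks up the standard OMP_NUM_THREADS environment variable; the binary name and launcher flags will differ on your cluster:

  # Build MBX with MPI support enabled.
  ./configure --enable-mpi
  make && make install

  # Launch LAMMPS+MBX with 8 MPI ranks and 4 OpenMP threads per rank
  # (8 x 4 = 32 cores in total). The -x flag (OpenMPI syntax) exports
  # the variable to every rank.
  export OMP_NUM_THREADS=4
  mpirun -np 8 -x OMP_NUM_THREADS ./lmp -in in.lammps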


Best,
Ruihan 

Tim Tan

Mar 22, 2025, 8:11:45 PM
to MBX-users
Thanks for the response, Ruihan!
I am currently running the simulation using the initial data from MBX/examples/lammps/256h2o, with a 1 fs timestep for 50,000 timesteps.
I ran a couple more tests after posting the question, with different combinations of MPI and OMP settings. Here are the results I got. All of the runs were done on the same HPC cluster to avoid hardware differences:
32 MPI ranks x 1 OMP thread:   0.756 ns/day, 31.763 hours/ns, 8.745 timesteps/s, 6.716 katom-step/s; 57.5% CPU use
16 MPI ranks x 2 OMP threads:  0.719 ns/day, 33.397 hours/ns, 8.318 timesteps/s, 6.388 katom-step/s; 147.35% CPU use
8 MPI ranks x 4 OMP threads:   0.870 ns/day, 27.583 hours/ns, 10.071 timesteps/s, 7.734 katom-step/s; 348.8% CPU use
4 MPI ranks x 8 OMP threads:   1.041 ns/day, 23.052 hours/ns, 12.050 timesteps/s, 9.254 katom-step/s; 759.0% CPU use
2 MPI ranks x 16 OMP threads:  0.167 ns/day, 143.912 hours/ns, 1.930 timesteps/s, 1.482 katom-step/s; 89.8% CPU use
1 MPI rank x 32 OMP threads:   0.098 ns/day, 244.552 hours/ns, 1.136 timesteps/s, 872.345 atom-step/s; 92.0% CPU use
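
(For anyone reproducing this: a sweep along these lines should generate the combinations above, assuming OpenMPI's mpirun and an lmp executable; the actual batch-script syntax on the cluster will differ.)

  # Keep ranks x threads = 32 and vary the split.
  for ranks in 32 16 8 4 2 1; do
      export OMP_NUM_THREADS=$((32 / ranks))
      mpirun -np $ranks -x OMP_NUM_THREADS \
          ./lmp -in in.lammps -log log.${ranks}mpi
  done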
I found it interesting that it does seem to use OMP acceleration, but the optimal setting was not what I expected. I am curious what is happening here as well. Would I see the same trend if I applied this to a bigger system? Could you expand a little on what you mean by MPI and OMP optimizing different components of the simulation?

Best,
Tim

Ruihan Zhou

Mar 24, 2025, 6:40:22 PM
to MBX-users
Hi Tim,

Thanks for sharing the benchmark results. Different processors or HPC architectures can also result in different communication times or introduce memory-access penalties, which is why we always suggest running these quick tests before longer simulations. Going beyond the optimal balance for your hardware (in this case to 16 or 32 threads per process) can cause memory-access bottlenecks and thread-competition issues, which could explain the low CPU utilization you observed in the last two benchmarks. While the sub-100% CPU usage in those two cases could simply mean that OMP is active but very inefficient at 16 or 32 threads on your hardware, it is more likely that OMP multithreading is not active at all for those runs. Without more information it is hard to know why. Could you double-check whether there are 16 cores available on the processors you are running these simulations on?
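
For example, something along these lines can confirm the core layout and make the thread placement explicit (lscpu/nproc and the OMP_* variables are standard; the mpirun flags assume OpenMPI and the lmp binary name is a placeholder):

  # Inspect the node: sockets, cores per socket, threads per core.
  lscpu | grep -E 'Socket|Core|Thread'
  nproc

  # Pin threads so they do not migrate or pile up on one core. If the
  # MPI launcher binds each rank to a single core by default, all 16
  # OMP threads of that rank end up sharing it, which would match the
  # ~90% CPU use seen above.
  export OMP_NUM_THREADS=16
  export OMP_PLACES=cores
  export OMP_PROC_BIND=close
  mpirun -np 2 --bind-to none \
      -x OMP_NUM_THREADS -x OMP_PLACES -x OMP_PROC_BIND \
      ./lmp -in in.lammps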

For MBX with LAMMPS, MPI is mainly used to divide the simulation box into subdomains: each processor calculates the interactions within its subdomain and with the ghost particles in adjacent subdomains. The calculations associated with each subdomain are then accelerated with OMP threads. For more details you can refer to the MBX v1.2 paper: https://pubs.acs.org/doi/abs/10.1021/acs.jctc.4c01333
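
As a rough illustration of that mapping (the 2x2x2 grid and the lmp binary name are only examples; LAMMPS normally chooses the processor grid automatically, though it can be forced with the processors command in the input script):

  # 8 ranks -> the box is decomposed into 8 spatial subdomains
  # (e.g. a 2x2x2 grid, which "processors 2 2 2" in the LAMMPS input
  # would force). Each rank owns one subdomain plus the ghost atoms of
  # its neighbors, and MBX spreads that rank's work over 4 OMP threads,
  # i.e. 8 x 4 = 32 cores in total.
  export OMP_NUM_THREADS=4
  mpirun -np 8 -x OMP_NUM_THREADS ./lmp -in in.lammps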


- The MBX Team