Thanks for the response, Ruihan!
I am currently running the simulation with the initial data from MBX/examples/lammps/256h2o, using a 1 fs timestep for 50,000 steps.
After posting my questions I ran a few more tests with different combinations of MPI and OMP settings; a sketch of how each run was launched follows the numbers. Here are the results. All runs were done on the same HPC cluster to rule out hardware differences:
MPI x OMP | ns/day | hours/ns | timesteps/s | katom-step/s | CPU use
32 x  1   |  0.756 |   31.763 |       8.745 |        6.716 |   57.5%
16 x  2   |  0.719 |   33.397 |       8.318 |        6.388 | 147.35%
 8 x  4   |  0.870 |   27.583 |      10.071 |        7.734 |  348.8%
 4 x  8   |  1.041 |   23.052 |      12.050 |        9.254 |  759.0%
 2 x 16   |  0.167 |  143.912 |       1.930 |        1.482 |   89.8%
 1 x 32   |  0.098 |  244.552 |       1.136 |        0.872 |   92.0%
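For reference, each combination was launched roughly like this (a sketch only; the actual binary name, input file name, and scheduler flags on my cluster differ):

# example: 4 MPI ranks x 8 OMP threads per rank (32 cores total)
# assuming the OpenMP threading picks up OMP_NUM_THREADS
export OMP_NUM_THREADS=8
mpirun -np 4 lmp -in in.lammps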
I found it interesting that the runs do seem to benefit from OMP acceleration, but the optimal MPI/OMP balance was different from what I expected. I am also curious what is happening here, particularly the sharp slowdown once there are fewer than 4 MPI ranks. Would I see the same trend in a bigger system? And could you expand a little on what you mean by MPI and OMP optimizing different components of the simulation?
Best,
Tim