I assume that DGX-like systems are installed in many places, so scaling on them should be of general interest.
For comparison, here are the timings for the same LJ system with LAMMPS/KOKKOS (built with CUDA-aware MPI), which scales much better even though HOOMD is faster on 1 GPU:
1 GPU: 1144 sec
2 GPU: 626 sec
4 GPU: 362 sec
6 GPU: 264 sec
That is why I am wondering whether my HOOMD build or runtime options are suboptimal.
Joshua Anderson
Jan 17, 2022, 7:42:27 AM
to hoomd...@googlegroups.com
Vladimir,
As I stated on the issue, the only way to test which parallel configuration is optimal is to benchmark it.
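For illustration only, a strong-scaling check like the one above usually amounts to running the same script at several GPU counts and comparing wall-clock times; the script name lj_benchmark.py below is a placeholder, not something shipped with HOOMD-blue:

    # run the same benchmark on 1, 2, 4, and 6 GPUs (one MPI rank per GPU)
    for n in 1 2 4 6; do
        mpirun -n $n python3 lj_benchmark.py > lj_${n}gpu.log
    done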
HOOMD-blue's CMake scripts expect to use the native host compiler (gcc or clang), and nvcc as the CUDA compiler. You can control which compiler CMake uses with an environment variable: https://cmake.org/cmake/help/v3.22/envvar/CUDACXX.html
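As a minimal configure sketch, assuming CMake 3.13+ and HOOMD-blue 3.x option names (2.x releases use ENABLE_CUDA instead of ENABLE_GPU); the paths are examples only:

    # select the host and CUDA compilers before the first configure
    export CXX=g++
    export CUDACXX=/usr/local/cuda/bin/nvcc
    cmake -B build -S hoomd-blue -DENABLE_GPU=on

CMake caches the compiler choice, so these variables only take effect in a fresh build directory.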
Likewise, HOOMD-blue's CMake scripts search for MPI with the FindMPI module: https://cmake.org/cmake/help/v3.22/module/FindMPI.html - see the documentation for details on how to specify which MPI library to use.
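For example, one way to point FindMPI at a specific installation is the MPI_HOME hint or the MPI_CXX_COMPILER cache variable; the OpenMPI path below is a placeholder:

    # hint FindMPI at a particular MPI installation
    export MPI_HOME=/opt/openmpi
    cmake -B build -S hoomd-blue -DENABLE_MPI=on \
          -DMPI_CXX_COMPILER=/opt/openmpi/bin/mpicxx

On a cluster the MPI module usually puts the right mpicxx on PATH, in which case FindMPI picks it up without any hints.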
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan