Comparison of DPD system run with HOOMD and LAMMPS on GPU


Kirill Lykov

Feb 28, 2015, 3:32:14 PM
to hoomd...@googlegroups.com
Dear HOOMD users,

I'm running HOOMD simulations on a GPU to find the best possible performance and then compare it with the LAMMPS GPU implementation.
I found that HOOMD runs 2 times faster, and I would like to ask: is this result expected, or is the comparison unfair? By unfair, I mean that the two simulations might not be equivalent: maybe there are differences in parameters that are not transparent to me (like neighbour list or communication parameters in HOOMD). I would be thankful for any comments.

I use a simple DPD fluid with about 440K particles in a 48x48x48 simulation domain. I ran on one node; for HOOMD I used 1 MPI task, for LAMMPS 4 MPI tasks.
The HOOMD script is the following:
from hoomd_script import *

init.create_random(N=442368, name='A', min_dist=0.1, box=data.boxdim(L=48))
dpd = pair.dpd(r_cut=1.0, T=0.0945)
dpd.pair_coeff.set('A', 'A', A=100.0, gamma = 45.0)

integrate.mode_standard(dt=0.001)
integrate.nve(group=group.all())

run(5e4)

LAMMPS script (a bit longer):

package gpu 1 device kepler
boundary p p p

units     lj
atom_style    atomic

lattice custom 3.0 a1 1.0 0.0 0.0 a2 0.0 1.0 0.0 a3 0.0 0.0 1.0 &
     basis 0.5 0.0 0.0 basis 0.0 0.5 0.0 basis 0.0 0.0 0.5
region box block -24.0 24.0  -24.0 24.0  -24.0 24.0
create_box  1 box
create_atoms    1 random 442368 1234 box
mass        1 1.0

neighbor    0.3 bin
neigh_modify    delay 0 every 4 check yes

comm_style brick
comm_modify vel yes
pair_style  dpd/gpu 0.0945 1.0 34387
pair_coeff  1 1 100.0 45.0 1.0

thermo          10000
timestep 0.001

fix 1 all nve
run 50000
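
One quick way to check that the two inputs describe the same physical system is to compute the quantities both of them imply. The sketch below uses only the numbers from the two scripts above; the variable names are illustrative, and the noise amplitude assumes the standard DPD fluctuation-dissipation relation sigma = sqrt(2 * gamma * kT), which is taken here to hold in both codes.

```python
import math

# Values copied from the two input scripts above.
n_particles = 442368
box_length = 48.0   # HOOMD: L=48; LAMMPS: region from -24 to 24 in each direction
kT = 0.0945
gamma = 45.0
r_cut = 1.0

number_density = n_particles / box_length**3

# Standard DPD fluctuation-dissipation relation (assumed to hold in both codes).
sigma = math.sqrt(2.0 * gamma * kT)

# Average number of neighbors inside the cutoff sphere for a uniform fluid.
neighbors_in_cutoff = number_density * 4.0 / 3.0 * math.pi * r_cut**3

print(f"number density      = {number_density:.3f}")
print(f"noise amplitude     = {sigma:.3f}")
print(f"neighbors in cutoff = {neighbors_in_cutoff:.1f}")
```

If both inputs give the same density, temperature, and interaction coefficients, the remaining differences are in the neighbor list and communication settings rather than in the physics.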


Carolyn Phillips

Feb 28, 2015, 3:56:17 PM
to hoomd...@googlegroups.com
There are, of course, some algorithmic differences in the way LAMMPS and HOOMD are implemented.   Those algorithmic differences, however, should not impact the correctness of the total calculation. 

Therefore, as long as you are using the optimal tuning of each algorithm, any performance difference you see is fair.  

HOOMD has autotuners; see the HOOMD documentation for details.

Meanwhile, the obvious place to improve your LAMMPS script is these two lines:

neighbor    0.3 bin
neigh_modify    delay 0 every 4 check yes


While the choices on these two lines can make dramatic differences, nobody can really tell you what the optimal parameters here should be. I am not aware of LAMMPS having an autotuner for these parameters, so a little systematic trial and error is your best option.
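
The trial and error can be made systematic by generating one pair of neighbor-list lines per candidate setting and timing a short run with each. This is a minimal sketch: the helper name and the candidate values are placeholders rather than recommendations, and the timing run itself is left out.

```python
from itertools import product

def neighbor_variants(skins, every_values):
    """Yield (skin, every, text) for each combination of LAMMPS neighbor settings."""
    for skin, every in product(skins, every_values):
        text = (
            f"neighbor        {skin} bin\n"
            f"neigh_modify    delay 0 every {every} check yes\n"
        )
        yield skin, every, text

# Hypothetical candidate values; choose ranges that make sense for your system.
variants = list(neighbor_variants(skins=[0.2, 0.3, 0.5, 0.8], every_values=[1, 2, 4]))
for skin, every, text in variants:
    print(f"# skin={skin}, every={every}")
    print(text)
```

Each generated pair of lines can be substituted into the input script, and the timesteps per second reported by a short run compared across variants.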







--
You received this message because you are subscribed to the Google Groups "hoomd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
To post to this group, send email to hoomd...@googlegroups.com.
Visit this group at http://groups.google.com/group/hoomd-users.
For more options, visit https://groups.google.com/d/optout.

Kirill Lykov

Feb 28, 2015, 4:02:25 PM
to hoomd...@googlegroups.com
Thank you for the reply.
Regarding these neighbor options from LAMMPS, do you by chance know
how to set up the same in HOOMD?

To be completely correct with DPD, I should have updated the neighbor
list every timestep, since velocities in DPD are much higher than in MD.
But since there is no external driving force in this particular
simulation, I set it to 4.



--
Best regards,
Kirill Lykov,
personal page: http://kirilllykov.github.com/blog/about/
tel.: +41 765 27 6229

Carolyn Phillips

Feb 28, 2015, 4:37:05 PM
to hoomd...@googlegroups.com
The HOOMD documentation discusses the default r_buff and check_period values.

I believe the default HOOMD settings are then equivalent to

neighbor    0.8 bin
neigh_modify    delay 0 every 1 check yes


As I read their documentation, the default of LAMMPS is
neighbor 0.3 bin
neigh_modify delay 10 every 1 check yes

(I must admit, this seems very odd to me, as their default (delay 10) is not conservatively correct behavior as I read their documentation.  You could need, and miss, a neighbor list build.  As I understand their documentation, using a non-zero delay at all could mean missing an interaction and should be used at your own risk.  Better couple that with a generous r_buff!! 
Compare this to HOOMD
"For safety, the default check_period is 1 to ensure that the neighbor list is always updated when it needs to be. Increasing this to an appropriate value for your simulation can lead to performance gains of approximately 2 percent."
)
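
The difference between the two sets of defaults is easier to see in code. The sketch below models the rebuild decision as the LAMMPS documentation describes it: a rebuild is only considered once at least "delay" steps have passed since the last build, and then only on multiples of "every"; with "check yes" it happens only if some particle has moved more than half the skin distance. The function and its arguments are illustrative and are not taken from either code base.

```python
def should_rebuild(steps_since_build, max_displacement, skin,
                   delay=0, every=1, check=True):
    """Return True if the neighbor list would be rebuilt on this step."""
    if steps_since_build < delay:
        return False            # still inside the delay window
    if steps_since_build % every != 0:
        return False            # not a step on which a rebuild is considered
    if not check:
        return True             # rebuild unconditionally
    # Safe criterion: rebuild once any particle has moved half the skin.
    return max_displacement > 0.5 * skin

# A particle has already moved 0.2 (more than half of skin=0.3) after 5 steps.
print(should_rebuild(5, 0.2, skin=0.3, delay=0, every=1))   # conservative settings
print(should_rebuild(5, 0.2, skin=0.3, delay=10, every=1))  # delay of 10 steps
```

With delay 0 the list is rebuilt; with delay 10 the rebuild is postponed even though the displacement criterion is already violated, which is the risk described above.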

However, the point I want to emphasize is that these are likely not the optimal settings for either HOOMD or LAMMPS.  In fact, HOOMD and LAMMPS likely do not have the same optimal settings.    So while it makes some sense to try to initially compare the performance of the two codes using similar settings, there is nothing "fair" about this.  Fair is optimal tuning while maintaining correctness.  

Experience shows you likely want to play with the r_buff (e.g. the 0.3 and 0.8) and just keep "check yes" to get your best performance tuning.




Trung Nguyen

Mar 1, 2015, 12:10:09 AM
to hoomd...@googlegroups.com
Hi Kirill,

I would like to add a few comments to the discussion of the autotuner and neighbor list settings. Because you're using only one MPI task for HOOMD, the entire calculation is performed on the GPU (unless you explicitly set mode=cpu on the command line), so there is no communication overhead between MPI processes, nor any overhead from several MPI processes oversubscribing the GPU. Meanwhile, the GPU package in LAMMPS only accelerates the force compute on the device, so its expected performance gain (vs. CPU-only) is bounded by the contribution of the force compute to the simulation time (Amdahl's law) and by the host-device bandwidth. This is one of the main reasons for the difference in performance, especially when you are comparing single-GPU runs.
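
The Amdahl's law bound can be made concrete with a few lines of arithmetic. The force fraction and the GPU speedup used here are made-up illustrative numbers, not measurements from LAMMPS or HOOMD.

```python
def amdahl_speedup(accelerated_fraction, accelerator_speedup):
    """Overall speedup when only a fraction of the run time is accelerated."""
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / accelerator_speedup)

# Hypothetical: forces take 80% of the CPU run time and run 20x faster on the GPU.
print(f"{amdahl_speedup(0.80, 20.0):.2f}")

# Even an infinitely fast GPU cannot beat 1 / (1 - 0.80) = 5x in this case.
print(f"{amdahl_speedup(0.80, float('inf')):.2f}")
```

The part of the timestep that stays on the host (integration, neighbor list handling, data transfer) sets the ceiling, no matter how fast the force kernel is.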

For the system size you are simulating (440K particles), it is not surprising, in my opinion, that HOOMD with one MPI process is 2x faster than LAMMPS GPU with 4 MPI processes. You can try HOOMD with 4 MPI processes to see whether the difference between the two codes changes. For DPD, we observe that LAMMPS GPU gets closer to HOOMD as the number of particles per GPU decreases in multi-GPU runs, e.g. 2M particles on more than 64 nodes (i.e. 64 GPUs).

-Trung

Kirill Lykov

Mar 2, 2015, 4:22:21 AM
to hoomd...@googlegroups.com
Thank you, Trung and Carolyn, for shedding light on these issues. I tried with 1 GPU first and will then move on to bigger systems.

Carolyn: I have a very basic problem with HOOMD syntax: I cannot set the nlist parameters. It seems that something like
nlist.set_params(r_buff = 0.3, check_period = 4, dist_check = True)
should set them, but it returns the error "'NoneType' object has no attribute 'set_params'", so it cannot find nlist. Could you tell me how to fix it?

Joshua A. Anderson

Mar 2, 2015, 7:45:04 AM
to hoomd...@googlegroups.com, Kirill Lykov
1) 2x is in line with what we normally see for HOOMD vs LAMMPS-GPU. The new, still under development LAMMPS Kokkos module matches HOOMD performance on a single GPU.

2) As Carolyn pointed out, tuning your nlist params is crucial for DPD simulations. The defaults in HOOMD are for LJ type systems, and the defaults in LAMMPS are nonsensical.

3) I'm not sure what is non-intuitive about hoomd's nlist params. They are clearly documented. They correspond to the same values in LAMMPS, except that I have no idea what order they go in for the LAMMPS command. Critical for DPD is that you need the check period to be 1. In HOOMD, tune.r_buff can automatically find the best settings for you.

4) Odd that the nlist.set_params doesn't work. What version of HOOMD are you running? If it's not v1.0.x, then you should upgrade. If it is,
post the output of this script:
---
from hoomd_script import *
print(nlist)
---

--
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
Phone: 734-647-8244
http://www-personal.umich.edu/~joaander/

Carolyn Phillips

Mar 2, 2015, 8:14:24 AM
to hoomd...@googlegroups.com, Kirill Lykov
nlist does not exist until after a pair force has been specified.  Perhaps your problem is that you are trying to modify its parameters before it has been created.

-Carolyn
