md.nlist.cell not running on GPU in GPU mode


Remya Ann

Aug 19, 2022, 2:10:37 PM
to hoomd-users
Dear all,

I am running a simulation in which the md.nlist.cell command is a bottleneck.
I expected this step to be faster in GPU mode.

However, nvidia-smi shows that the simulation starts using the GPU only after the hoomd.run() command.

Is there a way to force the system to do the cell list building on the GPU as well?

Kind regards,
Remya

Joshua Anderson

Aug 19, 2022, 2:39:30 PM
to hoomd...@googlegroups.com
Remya,

Please share a minimal script that runs with HOOMD-blue v3.4.0 and has no dependencies other than hoomd and numpy. I can run this script and investigate where the time is spent.
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan


Remya Ann

Aug 21, 2022, 11:23:49 AM
to hoomd-users

Dear Dr. Joshua,
I have attached two scripts. The v2 script gets stuck at the cell list build stage. However, with a smaller number of particles it ran on the GPU until it failed with the following error:
#---------------
RuntimeError: scan failed on 2nd step: cudaErrorInvalidValue: invalid argument
**ERROR**: invalid argument before ../hoomd/GPUArray.h:160
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDA Error
Abort
#--------------
The v3 script takes time at the lj and expanded_lj sections. However, at the run stage, I get the same error as above.
I read in the previous documentation that it could be due to a large number of particle types. Is there a workaround for this?

Could you also guide me on how to write the simulation trajectory at constant intervals in version 3, as the dump command does in version 2? I tried several variants of hoomd.write.GSD, but I kept getting errors.

Kind regards,
Remya
script_db_sigma_hoomd3.py
script_db_sigma_hoomd2.py

Remya Ann

Aug 21, 2022, 11:43:28 AM
to hoomd-users
Dear Dr. Joshua,
I also just saw that the simulation in HOOMD version 3 (script attached to my previous email) gets killed at the lj params stage for 5000 particles or more, in CPU mode as well. Could you please check this issue too? It ran fine for 1000 particles, though.

Kind regards,
Remya

Joshua Anderson

Aug 22, 2022, 2:35:02 PM
to hoomd...@googlegroups.com
Remya,

I did not have time to run your script to completion. I killed the process after several minutes. I am certain that if this script were to continue to the point of calling `Simulation.run`, the cell list would be computed on the GPU. However, there are setup steps on the CPU that are taking a long time. I ran it with the Python profiler to determine where the time is spent: https://docs.python.org/3/library/profile.html

$ python -m cProfile -s cumtime script_db_sigma_hoomd3.py
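
The same kind of profile can also be collected from inside a script, which helps when only the setup portion is of interest. The following is a minimal sketch using only the standard library; `slow_setup` is a hypothetical stand-in for a slow parameter-setting loop and is not HOOMD-blue code.

```python
import cProfile
import io
import pstats


def slow_setup(n_items):
    """Hypothetical stand-in for a slow setup loop (not HOOMD-blue code)."""
    table = {}
    for i in range(n_items):
        value = {"sigma": 1.0, "epsilon": 1.0}
        # Mimic per-item validation work done on every assignment.
        if isinstance(value, dict):
            table[i] = value
    return table


# Profile only the section of interest instead of the whole script.
profiler = cProfile.Profile()
profiler.enable()
table = slow_setup(50_000)
profiler.disable()

# Sort by cumulative time, as `-s cumtime` does on the command line,
# and keep only the top few entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Wrapping only the setup code keeps the report short and avoids having to interrupt a long run to see where the time went.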

I've included the truncated output below. Most of the time spent to this point in the script is in typeparam.py:117(__setitem__), which is called when you set a potential parameter: e.g. lj.params[('A', 'A')] = dict(sigma=1, epsilon=1). By the time I killed the process, __setitem__ had been called more than 4 million times. Please understand that HOOMD-blue is optimized for typical coarse-grained and atomistic force fields that have at most tens of particle types. There isn't room in the fast shared memory on the GPU to store a full N by N parameter matrix, and our rich type parameter system is slow to validate millions of parameter __setitem__ calls, as shown by this profile.
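
To illustrate why the number of `__setitem__` calls grows so quickly, here is a rough sketch that counts unordered type pairs. It assumes each unordered pair of types is set exactly once per potential; that is an assumption about the script, not something the profile confirms. The helper names `pair_count` and `min_types_for_pairs` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def pair_count(n_types):
    """Number of unordered type pairs (i, j) with i <= j."""
    return n_types * (n_types + 1) // 2


def min_types_for_pairs(n_pairs):
    """Smallest number of types whose pair count reaches n_pairs."""
    # Start from the closed-form estimate, then correct upward if needed.
    n = (math.isqrt(8 * n_pairs + 1) - 1) // 2
    while pair_count(n) < n_pairs:
        n += 1
    return n


# Cross-check the formula against explicit enumeration for a small case.
small = len(list(combinations_with_replacement(range(10), 2)))

print(pair_count(10))                  # tens of types: 55 pairs
print(pair_count(3000))                # thousands of types: 4501500 pairs
print(min_types_for_pairs(4_149_771))  # assumption: one call per pair
```

With tens of types the pair table has tens to hundreds of entries, while a few thousand types already require millions of entries.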

If you would like to use HOOMD-blue, you will need to implement a custom ForceCompute class in a C++ plugin that efficiently stores the parameters for your potential. If possible, write one that stores an O(N)-length dense array and computes the relevant sigma and epsilon as needed. You will also need to implement your own neighbor list structure, as the existing NeighborList class has numerous optimizations built with the idea that there are only a small number of particle types.
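
As a sketch of the O(N) storage idea only, the following Python class keeps one sigma and one epsilon per type and derives the pair values on demand. The Lorentz-Berthelot mixing rule used here is an assumption chosen for illustration; the thread does not say how the pair values depend on the per-type values. The class name `PerTypeLJ` is hypothetical, and a real implementation would live in a C++ ForceCompute plugin rather than in Python.

```python
import math


class PerTypeLJ:
    """Hypothetical O(N) parameter store: one sigma and epsilon per type."""

    def __init__(self, sigma, epsilon):
        if len(sigma) != len(epsilon):
            raise ValueError("sigma and epsilon must have the same length")
        self.sigma = list(sigma)
        self.epsilon = list(epsilon)

    def pair(self, i, j):
        """Return (sigma_ij, epsilon_ij) computed when needed.

        Assumes Lorentz-Berthelot mixing (arithmetic mean for sigma,
        geometric mean for epsilon); substitute the rule for your system.
        """
        sigma_ij = 0.5 * (self.sigma[i] + self.sigma[j])
        epsilon_ij = math.sqrt(self.epsilon[i] * self.epsilon[j])
        return sigma_ij, epsilon_ij


# Two types need two stored entries per parameter, not a 2 x 2 matrix.
params = PerTypeLJ(sigma=[1.0, 2.0], epsilon=[1.0, 4.0])
print(params.pair(0, 1))
```

Storage grows linearly with the number of types, at the cost of a little arithmetic each time a pair is evaluated.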

Current Time = 2022-08-22 14:12:52.761693
Python 3.9.5
0
box length =  219
nunit cell  18
length of unit cell =  12.166666666666666
15000 10001 15000 15000
Current Time before initialize = 2022-08-22 14:12:52.813809
before cell list
after cell list
^C         556554179 function calls (550308880 primitive calls) in 161.202 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    233/1    0.000    0.000  161.202  161.202 {built-in method builtins.exec}
        1    7.298    7.298  161.202  161.202 script_db_sigma_hoomd3.py:1(<module>)
  4149771    1.587    0.000  149.038    0.000 typeparam.py:117(__setitem__)
  4149771    4.108    0.000  147.451    0.000 parameterdicts.py:242(__setitem__)
  4149770    4.333    0.000   72.894    0.000 parameterdicts.py:445(_single_setitem)
10374442/4149784    7.957    0.000   64.604    0.000 collections.py:562(_to_hoomd_data)
  2074885    8.040    0.000   53.136    0.000 collections.py:283(__init__)
132804565   17.844    0.000   39.349    0.000 {built-in method builtins.isinstance}
  4149771    2.514    0.000   37.604    0.000 parameterdicts.py:483(_validate_values)
  4149771    6.910    0.000   34.547    0.000 parameterdicts.py:314(_validate_values)
  8299541    3.293    0.000   32.846    0.000 parameterdicts.py:80(__call__)
  8299541    2.933    0.000   29.136    0.000 parameterdicts.py:91(raw_yield)
  4149771    1.364    0.000   25.290    0.000 parameterdicts.py:104(validate_and_split_index)
  4149771    6.204    0.000   23.926    0.000 parameterdicts.py:127(validate_and_split_len)
 74696031    9.396    0.000   21.506    0.000 abc.py:96(__instancecheck__)
  6224658    2.773    0.000   20.426    0.000 collections.py:250(_to_hoomd_data)
  2074887    0.653    0.000   14.958    0.000 typeconverter.py:312(__call__)
  4149771    2.677    0.000   14.356    0.000 parameterdicts.py:136(<listcomp>)
  2074886    4.001    0.000   14.305    0.000 typeconverter.py:557(_validate)
 74696031   10.704    0.000   12.109    0.000 {built-in method _abc._abc_instancecheck}
  8299542    1.873    0.000   11.221    0.000 parameterdicts.py:29(_is_key_iterable)
  8299572    2.596    0.000    9.348    0.000 util.py:21(_is_iterable)
  2074887    7.456    0.000    7.954    0.000 collections.py:203(_suspend_read_and_write)
............
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

Remya Ann

Aug 23, 2022, 11:06:23 AM
to hoomd-users
Dear Dr. Joshua,
Thank you for your reply!
It would be great if you could point me towards how to add and implement a C++ plugin in HOOMD. I am comfortable setting up a force compute class in C++ for my system. However, I am not sure how it can be added to HOOMD.

Any suggestions would be really helpful.

Kind regards,
Remya