mpirun on multiple gpus


jude vishnu

Oct 3, 2022, 9:57:45 AM
to hoomd-users
Dear all,
I am using HOOMD 3.3. I would like to know how to run simulations on multiple GPUs using mpirun. I read the tutorials, which cover running simulations on multiple CPUs. Can we do the same by changing the device to GPU? Can someone give me an example script?

Regards,
Jude

Russell Kajouri

Oct 3, 2022, 10:02:48 AM
to hoomd...@googlegroups.com
Hi there

To run a simulation with mpirun, you can use the command below and read the MPI tutorial. MPI is a package independent of HOOMD.


mpirun -np <number of process> <python script> --mode=gpu

Best

The message has been sent by phone

--
You received this message because you are subscribed to the Google Groups "hoomd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hoomd-users/2ba3633a-04db-4289-b959-668a9ddd5fa1n%40googlegroups.com.

jude vishnu

Oct 3, 2022, 10:33:33 AM
to hoomd-users
You are referring to version 2. I don't think hoomd 3.3 works this way.

Regards,
Jude

Joshua Anderson

Oct 3, 2022, 10:39:18 AM
to hoomd...@googlegroups.com
Jude,

Follow the steps in the tutorial and/or those appropriate to launching an MPI application on your system, but use a GPU device instead of a CPU.

And yes, the documentation states that multiple GPUs are supported:
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
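
A minimal sketch of what this advice could look like in a HOOMD v3 script, assuming HOOMD-blue was built with both MPI and GPU support. The helper names `make_simulation` and `rank_print` are hypothetical, not part of HOOMD, and the seed value is a placeholder.

```python
try:
    import hoomd  # HOOMD-blue v3; needs an MPI- and GPU-enabled build
except ImportError:  # lets the helpers below be imported without HOOMD
    hoomd = None


def make_simulation(gsd_path, use_gpu=True, seed=1):
    """Create a Simulation on a GPU (or CPU) device and load a GSD state.

    Every MPI rank runs this same code. With more than one rank, HOOMD
    decomposes the box into one domain per rank when the state is created.
    """
    if hoomd is None:
        raise RuntimeError("HOOMD-blue is not installed")
    # The only change from the CPU tutorial is the device class.
    device = hoomd.device.GPU() if use_gpu else hoomd.device.CPU()
    sim = hoomd.Simulation(device=device, seed=seed)
    sim.create_state_from_gsd(filename=gsd_path)
    return sim


def rank_print(sim, *args):
    """Print from rank 0 only, so output is not repeated once per rank."""
    if sim.device.communicator.rank == 0:
        print(*args)
```

A script built on these helpers would be launched the same way as the CPU version in the tutorial, for example `mpirun -n 2 python3 script.py`. How ranks are mapped to GPUs depends on the cluster and the MPI launcher.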

jude vishnu

Oct 3, 2022, 12:33:49 PM
to hoomd-users
Dear Joshua,
I did what you asked me to. I get the following error when I switch from the CPU to the GPU device.
The code is attached below along with the initial configuration. If you could let me know why this is happening, that would be great.

Regards,
Jude
random.gsd
lj_perf.py
error.txt

Joshua Anderson

Oct 3, 2022, 1:34:18 PM
to hoomd...@googlegroups.com
Jude,

Your script runs fine for me:

mpirun -n 2 python lj_perf.py
notice(2): Using domain decomposition: n_x = 1 n_y = 1 n_z = 2.
1861.9754815068595
1861.9754815068595

Did you build with ENABLE_MPI_CUDA=on by chance? That code path is buggy and should not be used; leave it at the default setting of off.
If it is off, share your `CMakeCache.txt` file so we can see how you compiled HOOMD.

Other than that, I can't think of anything specific. We test numerous multi-GPU executions with MPI on every commit to the repository. For example: https://github.com/glotzerlab/hoomd-blue/actions/runs/3151866395/jobs/5126702861

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
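
One way to check the build setting before sharing the whole file is to read it out of `CMakeCache.txt`, which stores entries as `NAME:TYPE=VALUE`. The sketch below is a plain-text parser, not a HOOMD tool. `parse_cmake_cache`, `read_cmake_cache`, and `is_on` are hypothetical helper names, and the location of `CMakeCache.txt` depends on where HOOMD was built.

```python
from pathlib import Path


def parse_cmake_cache(text):
    """Return {name: value} for the NAME:TYPE=VALUE entries in the text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "//" and "#" comment lines CMake writes.
        if not line or line.startswith(("//", "#")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        name = key.split(":", 1)[0]  # drop the :TYPE suffix
        options[name] = value
    return options


def read_cmake_cache(path):
    """Parse a CMakeCache.txt file from disk."""
    return parse_cmake_cache(Path(path).read_text())


def is_on(value):
    """Interpret common CMake spellings of a true value."""
    return str(value).strip().upper() in {"ON", "1", "TRUE", "YES", "Y"}
```

For example, `is_on(read_cmake_cache("build/CMakeCache.txt").get("ENABLE_MPI_CUDA", "off"))` should be `False` for a build that follows the advice above; the `build/` path is an assumption.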

jude vishnu

Oct 3, 2022, 5:15:55 PM
to hoomd...@googlegroups.com
Hi,
I installed HOOMD 3.5 with ENABLE_MPI_CUDA=OFF and the previous code worked. However, I am now trying to run a different script using MPI on GPUs, and it fails with errors that I do not fully understand. I run it with mpirun -n 3 python3 diff_sim.py.

It seems to run fine when executing with mpirun -n 2 python3 diff_sim.py. Is this a bug or is the particle density not supporting this kind of decomposition?

Regards,
Jude


--
Yours truly,
Jude Ann Vishnu
error1.txt
diff_sim.py

jude vishnu

Oct 11, 2022, 7:33:42 AM
to hoomd-users
Hi,
I just wanted to know whether someone was able to figure this out?

Regards,
Jude

Joshua Anderson

Oct 11, 2022, 8:44:21 AM
to hoomd...@googlegroups.com
Jude,

It is not possible to answer the question given the information you provided.

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

jude vishnu

Oct 11, 2022, 9:36:11 AM
to hoomd...@googlegroups.com
Hi Joshua,
Could you please tell me what additional information I can provide so that this problem can be solved?

Regards,
Jude

Joshua Anderson

Oct 11, 2022, 9:58:58 AM
to hoomd...@googlegroups.com
Jude,

You ask "Is this a bug or is the particle density not supporting this kind of decomposition?"

To evaluate whether or not this is a bug, I would need to run your script. You did not provide the file prep_config_diamond_network_diffusive_Ntot_1155108_vol_frac0.84991_densratio_1.52410_molfrac_0.60382_kT_4.3.gsd needed to run the script.

To evaluate whether the particle density is supported, I would need to know the box dimensions and number of particles.

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
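
The two quantities requested here, box dimensions and particle count, can be read from the GSD file itself. The sketch below assumes the `gsd` Python package is installed and that the box is three-dimensional and orthorhombic (tilt factors are ignored). `read_box_and_n`, `number_density`, and `slab_widths` are hypothetical helper names. The uniform split in `slab_widths` only illustrates a decomposition like the `n_x = 1 n_y = 1 n_z = 2` notice earlier in the thread; HOOMD chooses the actual decomposition.

```python
try:
    import gsd.hoomd  # GSD Python package, used here only to read the file
except ImportError:  # the pure-Python helpers below still work without it
    gsd = None


def read_box_and_n(gsd_path, frame_index=0):
    """Return (N, [Lx, Ly, Lz]) for one frame of a GSD file."""
    if gsd is None:
        raise RuntimeError("the gsd package is not installed")
    with gsd.hoomd.open(gsd_path) as trajectory:
        frame = trajectory[frame_index]
        n = int(frame.particles.N)
        # configuration.box is [Lx, Ly, Lz, xy, xz, yz]; keep the lengths.
        lengths = [float(x) for x in frame.configuration.box[:3]]
    return n, lengths


def number_density(n, lengths):
    """Particles per unit volume for an orthorhombic 3D box."""
    lx, ly, lz = lengths
    return n / (lx * ly * lz)


def slab_widths(lengths, n_x=1, n_y=1, n_z=1):
    """Width of one domain along each axis for a uniform n_x*n_y*n_z split."""
    lx, ly, lz = lengths
    return [lx / n_x, ly / n_y, lz / n_z]
```

Comparing the domain widths for 2 and 3 ranks against the interaction cutoff is a quick first check on whether the decomposition itself is plausible for the system.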

jude vishnu

Oct 11, 2022, 5:23:27 PM
to hoomd-users
Dear Joshua,
I did share the initial configuration, as I mentioned in my earlier mail. It was shared with this group through Google Drive. If you are having trouble accessing it, please let me know.
I am also posting the initial configuration here again so that you can see it. All hoomd-users group members should be able to access it.

I hope this is enough.

Regards,
Jude

Joshua Anderson

Oct 11, 2022, 6:06:00 PM
to hoomd...@googlegroups.com
Jude,

I did not see that link in your message.

Running your script, I get the same error. Setting the device's notice_level to 10 gives this additional information at the end of the output:
notice(6): nlist: (Re-)allocating neighbor list, new size 67296884036 uints 
notice(7): GPUArray: Resizing to 261624 MB

This appears to be a bug in the Tree neighbor list, as it is allocating space for far more neighbors than your system should have. I suggest you use the Cell neighbor list. With it, your script runs to completion on my system. You have a single r_cut and your system volume is reasonable so Tree offers no advantages over Cell.

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
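
A rough sketch of the two steps described above in HOOMD v3: raising the device notice level and switching from the Tree to the Cell neighbor list. The pair potential in diff_sim.py is not shown in this thread, so LJ is only a stand-in, and `r_cut`, `buffer`, `epsilon`, `sigma`, and the particle type `"A"` are placeholder assumptions. The helper names are hypothetical. `slots_per_particle` is plain arithmetic on the numbers in the log: dividing the reported allocation by the total particle count from the file name gives a conservative lower bound on slots per particle, since each rank owns only part of the system.

```python
try:
    import hoomd  # HOOMD-blue v3
except ImportError:  # the arithmetic helper below works without HOOMD
    hoomd = None


def make_verbose_gpu_device(notice_level=10):
    """Create a GPU device that prints detailed notices such as nlist sizes."""
    if hoomd is None:
        raise RuntimeError("HOOMD-blue is not installed")
    device = hoomd.device.GPU()
    device.notice_level = notice_level
    return device


def make_lj_with_cell_nlist(r_cut=2.5, buffer=0.4):
    """Build an LJ pair force on a Cell neighbor list instead of Tree.

    All numeric values and the type name "A" are placeholders, not
    values taken from diff_sim.py.
    """
    if hoomd is None:
        raise RuntimeError("HOOMD-blue is not installed")
    nlist = hoomd.md.nlist.Cell(buffer=buffer)  # was hoomd.md.nlist.Tree
    lj = hoomd.md.pair.LJ(nlist=nlist, default_r_cut=r_cut)
    lj.params[("A", "A")] = dict(epsilon=1.0, sigma=1.0)
    return lj


def slots_per_particle(n_slots, n_particles):
    """Average neighbor-list slots allocated per particle."""
    return n_slots / n_particles
```

With the figures from this thread, `slots_per_particle(67296884036, 1155108)` comes out in the tens of thousands, far beyond what a single short-range cutoff would need, which is consistent with the diagnosis of an over-allocating neighbor list.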

jude vishnu

Oct 12, 2022, 3:11:08 AM
to hoomd-users
Thanks Joshua. I will try this out.

Regards,
Jude
