How to run WESTPA on multiple GPUs


MOUTOSHI SAHA

May 27, 2020, 2:21:05 AM
to westpa-users
Hi,
I am trying to run the basic NaCl Amber tutorial on a machine with two GPUs. I used the following commands in run.sh to run the job on both GPUs:
export CUDA_VISIBLE_DEVICES="0,1"
$WEST_ROOT/bin/w_run --work-manager=processes --n-workers 2 "$@" &> west.log

However, I found that both pmemd.cuda jobs were running on a single GPU (GPU 0). How can I run them on two GPUs?
nvidia-smi shows the following:


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 44%   61C    P2   132W / 200W |   5364MiB /  8119MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
|  7%   49C    P2    44W / 200W |    201MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+


Thanks in advance for your help.
Regards-
Moutoshi Saha

Joshua Adelman

May 27, 2020, 10:02:15 AM
to westpa...@googlegroups.com, MOUTOSHI SAHA
Hi Moutoshi,

WESTPA sets a per-worker `WM_PROCESS_INDEX` environment variable that you can use to set `CUDA_VISIBLE_DEVICES`, which is what Amber uses to choose the GPU to run on.
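For example, in westpa_scripts/runseg.sh you might do something like this (a minimal sketch, not tutorial code; the fallback default is only so the snippet can be run standalone):

```shell
# WM_PROCESS_INDEX is exported by WESTPA's processes work manager (0, 1, ...).
: "${WM_PROCESS_INDEX:=0}"   # fallback only for trying this snippet by hand
# Pin this worker to the GPU matching its worker index.
export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX
echo "worker $WM_PROCESS_INDEX -> GPU $CUDA_VISIBLE_DEVICES"
```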

Josh
--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/cbb90616-d9f8-4f32-8fb1-23d96078be67%40googlegroups.com.

MOUTOSHI SAHA

May 27, 2020, 12:51:00 PM
to westpa-users
Hi Joshua,

Thank you so much for your reply. I tried the following, but it's not running.

export CUDA_VISIBLE_DEVICES="WM_PROCESS_INDEX"
source env.sh
rm -f west.log
$WEST_ROOT/bin/w_run --work-manager=processes --n-workers 2 "$@" &> west.log


I am attaching the west.log file herein that has the warnings and errors. 
I appreciate your help.
Regards-
Moutoshi
west.log

Joshua Adelman

May 27, 2020, 12:57:11 PM
to westpa...@googlegroups.com, MOUTOSHI SAHA
I think you’re going to want something like 

export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX

or, if you wanted a comma-separated list to specify multiple devices (which I don't think you want, but for completeness), you would need to escape the quotation marks:

export CUDA_VISIBLE_DEVICES="\"$WM_PROCESS_INDEX,$WM_PROCESS_INDEX\""

which, in this naive case with WM_PROCESS_INDEX=0, gives "0,0".
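You can verify the expansion in a plain shell (a quick sanity check, not part of the tutorial):

```shell
# With WM_PROCESS_INDEX=0, the single-device form expands to 0, while the
# escaped-quote form expands to "0,0" with the quote characters included.
WM_PROCESS_INDEX=0
single=$WM_PROCESS_INDEX
pair="\"$WM_PROCESS_INDEX,$WM_PROCESS_INDEX\""
echo "$single"
echo "$pair"
```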

Josh

MOUTOSHI SAHA

May 27, 2020, 1:09:01 PM
to westpa-users
Hi Joshua,

I tried both, but it's still not working. I am using Amber18. Could that be causing any issues?
Thank you. 

Regards-
Moutoshi

Kim F. Wong

Jun 5, 2020, 11:45:17 AM
to westpa...@googlegroups.com

Hi Moutoshi,


We've been working on a multi-GPU example for the westpa_tutorials.  Can you test it?  Here are the steps:


1.  git clone https://github.com/burntyellow/westpa_tutorials.git -b multi-GPU

2.  The example is basic_nacl_amber_multi-GPU

3.  The relevant files that you need to update/inspect are

    env.sh
    node.sh
    runwe_bridges.sh
    westpa_scripts/runseg.sh

(a) For env.sh, you need to expose the paths to Amber. 

(b) You don't need to make any changes to node.sh, but take note of the CUDA_VISIBLE_DEVICES_ALLOCATED line.

(c) runwe_bridges.sh is the SLURM job submission script, so depending on how the compute is set up at your institution, you will need to update it accordingly.  Focus on the SLURM directives at the top and also on the line

ssh -o StrictHostKeyChecking=no $node $PWD/node.sh $SLURM_SUBMIT_DIR $SLURM_JOBID $node $CUDA_VISIBLE_DEVICES --work-manager=zmq --n-workers=4 --zmq-mode=client --zmq-read-host-info=$SERVER_INFO --zmq-comm-mode=tcp &

towards the bottom of the file.  The assumption here is that 1 GPU will run 1 instance of Amber.  Since I requested 4 GPUs on each node in the SLURM directive

    #SBATCH --gres=gpu:4

I want to set the number of workers equal to the number of Amber instances on each node (which in this case also equals the number of GPUs).

(d) westpa_scripts/runseg.sh

The key parts here are in the lines

export CUDA_DEVICES=(`echo $CUDA_VISIBLE_DEVICES_ALLOCATED | tr , ' '`)
export CUDA_VISIBLE_DEVICES=${CUDA_DEVICES[$WM_PROCESS_INDEX]}


The first line takes all the SLURM-allocated CUDA devices and puts them into a temporary array, CUDA_DEVICES. The second line does what Josh suggested: it exposes one GPU device to the Amber execution line that follows. The exposed device is selected by $WM_PROCESS_INDEX, which ranges over the --n-workers requested in runwe_bridges.sh.
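The two lines above can be checked standalone by filling in example values (the values below are illustrative, not what SLURM would necessarily assign):

```shell
# Stand-in values: SLURM allocated GPUs 0-3 on this node; this is worker 2.
CUDA_VISIBLE_DEVICES_ALLOCATED="0,1,2,3"
WM_PROCESS_INDEX=2
# Split the comma-separated list into a bash array, then index by worker.
CUDA_DEVICES=($(echo $CUDA_VISIBLE_DEVICES_ALLOCATED | tr , ' '))
export CUDA_VISIBLE_DEVICES=${CUDA_DEVICES[$WM_PROCESS_INDEX]}
echo "worker $WM_PROCESS_INDEX -> GPU $CUDA_VISIBLE_DEVICES"
```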

This setup will also work multi-node/multi-GPU: just change #SBATCH --nodes=<desired_number_nodes> in runwe_bridges.sh.

-Kim


MOUTOSHI SAHA

Jun 5, 2020, 12:16:12 PM
to westpa-users
Hi Joshua and Kimberly,

After looking more into the WESTPA scripts, I was able to understand where I should put the CUDA_VISIBLE_DEVICES command. I added export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX to runseg.sh, and it is now running on two GPUs.

Thank you for your help.

Regards-
Moutoshi
