Multiple GPUs on multiple nodes with zmq and SLURM

Schuyler Byrn

Jun 20, 2025, 5:16:14 PM
to westpa-users
Hi all,

I am learning how to run WESTPA (with Amber) on multiple GPUs across multiple nodes using the zmq work manager. I have read several threads in this Google Group about this problem, as well as some of the additional WESTPA tutorials. In many of the examples, people ssh into each allocated node and execute node.sh; then, in runseg.sh, they use WM_PROCESS_INDEX to assign a single CUDA_VISIBLE_DEVICES value to that segment. However, I saw in some earlier threads that people have used srun to spawn node.sh tasks on each of the nodes and then, again, used WM_PROCESS_INDEX to assign CUDA_VISIBLE_DEVICES to a segment.
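For concreteness, the runseg.sh pattern I mean is roughly the following (a sketch based on my reading of those threads; the GPU count per node is just a placeholder):

# In runseg.sh: pick one GPU for this worker from the zmq process index
# (sketch; assumes one worker per GPU, and 4 GPUs per node is a placeholder)
NUM_GPUS_PER_NODE=4
export CUDA_VISIBLE_DEVICES=$(( WM_PROCESS_INDEX % NUM_GPUS_PER_NODE ))
echo "Segment $WEST_CURRENT_SEG_ID using GPU $CUDA_VISIBLE_DEVICES on $(hostname)"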

I was wondering whether one of these methods is significantly better than the other. Also, if I run srun with multiple tasks per node and one GPU per task, I don't seem to need to deal with CUDA_VISIBLE_DEVICES in my scripts at all, since srun handles it automatically. This seems less complicated than ssh'ing, so I wanted to know if it has any drawbacks. I apologize if this seems like a dumb question or has already been asked; I don't know much about using SLURM or running anything in parallel.
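Concretely, by the srun route I mean something like the line below (the task and GPU counts are just an example):

# One task per GPU: SLURM binds a single GPU to each task and sets
# CUDA_VISIBLE_DEVICES for it, so runseg.sh can simply use the GPU it sees
srun --ntasks-per-node=4 --gpus-per-task=1 --gpu-bind=single:1 node.sh ...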

Thanks!

Schuyler


Anthony Bogetti

Jun 23, 2025, 11:13:11 AM
to westpa...@googlegroups.com
Hi Schuyler,

I believe that, when considering ssh vs. srun for this kind of process management with zmq, srun is the preferred method. It should offer more fine-grained resource management, and, as you pointed out, the implementation should be cleaner. I am unsure whether there are any performance gains from using srun over ssh; there may be, but I would guess they are not significant. Do you have enough examples of using srun with zmq in WESTPA? If not, we can dig some out for you and also make them available in the user_submitted_scripts repo. Let us know if you have any additional questions.

Best,
Anthony

Victor Montal Blancafort

Jun 27, 2025, 4:05:28 PM
to westpa-users
In case it helps, this is the runner script that works on my institution's HPC, using srun and allocating one node (4 GPUs). Here, I set NUM_WORKERS to 2. Let me know if you have further questions, Schuyler, and I can try to help!

#!/bin/bash
#SBATCH --job-name=westpa_mut
#SBATCH --time=10:00:00
#SBATCH --cpus-per-task=20      # 20 CPUs per GPU task on this machine
#SBATCH --gres=gpu:4            # Request 4 GPUs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4     # 4 tasks (1 per GPU)
#SBATCH --output=logs/westpa_srun_%j.out
#SBATCH --error=logs/westpa_srun_%j.err
#SBATCH --account=XX
#SBATCH -D .
#SBATCH --qos=XX


#--
# - Modules
#--
module purge
module load cuda
module load nvidia-hpc-sdk/23.11
module load gromacs/2023.3

module load miniforge
source activate /gpfs/projects/bsc72/conda_envs/westpa2

export SLURM_CPU_BIND=none
export GMX_ENABLE_DIRECT_GPU_COMM=1
export GMX_GPU_PME_DECOMPOSITION=1

export GMX="mpirun -np 1 --bind-to none -report-bindings gmx_mpi"

#--
# - DEFAULTS
#--
set -x  # echo commands for debugging
cd $SLURM_SUBMIT_DIR ; pwd
# Number of zmq workers per node (I used 2 here, as mentioned above)
NUM_WORKERS=2
# Ensure the environment script is sourced *before* launching WESTPA components
source ./env.sh || { echo "Failed to source env.sh"; exit 1; }
cd $WEST_SIM_ROOT || { echo "Failed to cd to WEST_SIM_ROOT: $WEST_SIM_ROOT"; exit 1; }



#--
# - DEDICATED SERVER
#--
SERVER_INFO=$WEST_SIM_ROOT/west_zmq_info-$SLURM_JOBID.json
# The dedicated server (zmq master) runs with zero workers and only coordinates
# the client tasks launched below; see NOTES for further info
$WEST_ROOT/bin/w_run --debug --work-manager=zmq --n-workers=0 --zmq-mode=master --zmq-write-host-info=$SERVER_INFO --zmq-comm-mode=tcp &> west-$SLURM_JOBID.log &

# wait on host info file up to ten minutes
for ((n=0; n<60; n++)); do
    date
    if [ -e $SERVER_INFO ] ; then
        echo "== server info file $SERVER_INFO =="
        cat $SERVER_INFO
        break
    fi
    sleep 10
done

# exit if host info file doesn't appear in time
if ! [ -e $SERVER_INFO ] ; then
    echo 'server failed to start'
    kill %1
    exit 1
fi


#--
# - RUN WESTPA
#--
scontrol show hostname $SLURM_NODELIST >& SLURM_NODELIST.log

# Launch one zmq client per task (one per GPU); SLURM binds a single GPU to
# each task, so CUDA_VISIBLE_DEVICES is set automatically for every worker
srun --gpus-per-task=1 --gpu-bind=single:1 --cpus-per-task=$SLURM_CPUS_PER_TASK \
     node.sh ${SLURM_SUBMIT_DIR} \
     --work-manager=zmq \
     --n-workers=${NUM_WORKERS} \
     --zmq-mode=client \
     --zmq-read-host-info=$SERVER_INFO \
     --zmq-comm-mode=tcp
wait
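
For reference, node.sh here is just the usual thin wrapper from the WESTPA zmq examples; a minimal sketch (the exact log file naming is arbitrary):

#!/bin/bash
# node.sh: run once per srun task. The first argument is the submit directory;
# everything else is passed straight through to w_run as zmq client options.
set -x
cd "$1"; shift
source env.sh
cd "$WEST_SIM_ROOT"
# Each task only sees the single GPU that SLURM bound to it, so there is no
# CUDA_VISIBLE_DEVICES bookkeeping needed here.
$WEST_ROOT/bin/w_run "$@" &> west-client-$SLURM_JOBID-$(hostname)-$SLURM_PROCID.log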

Schuyler Byrn

Aug 25, 2025, 3:18:09 PM
to westpa...@googlegroups.com
Hi Anthony,

Thanks for the response. Everything seemed to work fine, but I'll follow up if I run into any issues in the future!

Best,
Schuyler

Schuyler Byrn

Aug 25, 2025, 3:20:36 PM
to westpa...@googlegroups.com
Also, thank you Victor for the example script!

Best,
Schuyler
