Code execution stuck during computation across nodes

Vincent

Sep 30, 2025, 1:15:42 AM
to Dedalus Users
Hi all,

I am trying to run Dedalus on a cluster. The code directory is mounted on all compute nodes from a control node via NFS, and my idea was that all the results would be written into the same directory.

At first it didn't work even when running on a single compute node: the job hung after printing the log output for the first time step. After reading this post, I added

export HDF5_USE_FILE_LOCKING='FALSE'

to the .slurm script, and it now runs fine on a single compute node. But the job still hangs after printing the log output for the first time step when running across multiple nodes.
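
In case the shell environment is not reaching every rank through srun, here is a sketch of the equivalent setting inside the script itself (my assumption being that setting os.environ before any HDF5 import behaves the same as exporting the variable in the shell):

# Hypothetical alternative: disable HDF5 file locking from inside rbc3d.py,
# before h5py/Dedalus are imported, so every MPI rank is guaranteed to see it.
import os
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

import dedalus.public as d3  # import only after the variable is set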

This is my test.slurm:

#!/bin/bash
#SBATCH -J dedalus_test_128_128
#SBATCH -N 2
#SBATCH -n 256
#SBATCH --ntasks-per-socket=64
#SBATCH --cpus-per-task=1
#SBATCH -o dedalus_test_128_128.o
#SBATCH -e dedalus_test_128_128.e

# activate env conda
eval "$(/home/user/miniforge3/bin/conda shell.bash hook)"

# activate env dedalus3
conda activate dedalus3

export HDF5_USE_FILE_LOCKING='FALSE'

srun python3 rbc3d.py
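
In case it helps with debugging, a minimal mpi4py check (hypothetical script name mpi_check.py; Dedalus already depends on mpi4py, so it is in the dedalus3 environment) could isolate whether plain MPI communication across the two nodes works at all:

# mpi_check.py -- minimal cross-node MPI sanity check
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

# A collective call: if the MPI launch or interconnect is broken across
# nodes, the job will hang here rather than completing.
total = comm.allreduce(rank, op=MPI.SUM)

print(f"rank {rank} of {size} on {name}", flush=True)
if rank == 0:
    print(f"allreduce sum = {total} (expected {size*(size-1)//2})")

Launched the same way (srun python3 mpi_check.py): if this also hangs across two nodes, the problem would be in the MPI/srun setup rather than in Dedalus or HDF5.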

————————————————————————
Any help would be appreciated. 

Thanks in advance
Vincent