Hi all,
I am trying to run a Dedalus script on a cluster. The code directory is mounted on all compute nodes from a control node via NFS, and my idea was that all the results would be written into the same directory.
At first it did not work even when running on a single compute node: it got stuck right after printing the log output for the first time step. After reading
this post, I added
export HDF5_USE_FILE_LOCKING='FALSE'
to the .slurm script, and now it runs fine on a single compute node. However, when running across multiple nodes it still gets stuck after printing the log output for the first time step.
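In case it matters, I could also set that variable directly in Python before Dedalus opens any HDF5 file, so it definitely reaches every rank. A minimal sketch of what I have in mind (the placement at the top of rbc3d.py is just my assumption):

# Set at the very top of rbc3d.py, before anything touches HDF5 on the NFS mount
import os
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

import dedalus.public as d3  # import Dedalus only after the variable is set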
This is my test.slurm:
#!/bin/bash
#SBATCH -J dedalus_test_128_128
#SBATCH -N 2
#SBATCH -n 256
#SBATCH --ntasks-per-socket=64
#SBATCH --cpus-per-task=1
#SBATCH -o dedalus_test_128_128.o
#SBATCH -e dedalus_test_128_128.e
# initialize conda
eval "$(/home/user/miniforge3/bin/conda shell.bash hook)"
# activate env dedalus3
conda activate dedalus3
export HDF5_USE_FILE_LOCKING='FALSE'
srun python3 rbc3d.py
————————————————————————
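For what it is worth, a minimal multi-node check I could launch with the same srun line, just to confirm MPI itself works across both nodes, would look something like this (mpi4py is already a Dedalus dependency; the file name mpi_check.py is only a placeholder):

# mpi_check.py: print which host each rank runs on,
# to confirm MPI communication works across the two nodes.
from mpi4py import MPI
import socket

comm = MPI.COMM_WORLD
print(f"Rank {comm.rank} of {comm.size} on {socket.gethostname()}", flush=True)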
Any help would be appreciated.
Thanks in advance
Vincent