Dear BerkeleyGW developers and users,
I have recently been trying to run parabands in BGW 4 on Perlmutter. The input WFN file is 39 GB. In my previous runs with parabands in BGW 3, I used the following configuration:
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --nodes=40
#SBATCH -C cpu
#SBATCH -t 24:00:00
#SBATCH -J para
#SBATCH -A m2651
#SBATCH -e parabands.err
#export OMP_NUM_THREADS=16
#export OMP_PLACES=threads
#export OMP_PROC_BIND=spread
BGWPATH='/global/homes/j/jywu/software/BerkeleyGW-master-cpu-elpa/bin'
srun -n 2560 --cpu_bind=cores $BGWPATH/parabands.cplx.x > parabands.out
without any problem. What puzzles me is that, with BGW 4, the same configuration always seems to end in "Segmentation fault - invalid memory reference".
This appears to be an out-of-memory error, since reducing the number of MPI tasks solves the problem. However, I find that I can now run only two MPI tasks per node on Perlmutter, which makes the run roughly 10x slower.
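To put rough numbers on the memory argument, here is a back-of-the-envelope budget per MPI task. It is only a sketch: the 512 GB per CPU node figure is an assumption (it is not measured in this job), and it ignores memory used by the OS and by MPI buffers.

```shell
#!/bin/bash
# Rough memory budget per MPI task for the two layouts discussed above.
# ASSUMPTION: 512 GB of memory per Perlmutter CPU node (not stated in the job script).
NODE_MEM_GB=512

# Old layout from the job script: 2560 tasks on 40 nodes.
NODES=40
OLD_TASKS=2560
OLD_TASKS_PER_NODE=$(( OLD_TASKS / NODES ))
OLD_MEM_PER_TASK_GB=$(( NODE_MEM_GB / OLD_TASKS_PER_NODE ))

# Layout that currently avoids the segmentation fault: 2 tasks per node.
NEW_TASKS_PER_NODE=2
NEW_MEM_PER_TASK_GB=$(( NODE_MEM_GB / NEW_TASKS_PER_NODE ))

echo "old layout: ${OLD_TASKS_PER_NODE} tasks/node, about ${OLD_MEM_PER_TASK_GB} GB per task"
echo "new layout: ${NEW_TASKS_PER_NODE} tasks/node, about ${NEW_MEM_PER_TASK_GB} GB per task"
```

If the node memory assumption holds, the gap between these two per-task budgets is the range in which the BGW 4 memory requirement per task would have to lie.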
I'm wondering whether I have missed something. What is the current best practice for parallelizing parabands in BGW 4?
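For concreteness, one layout that might recover some of the lost speed is a hybrid MPI+OpenMP run, with OpenMP threads filling the cores left idle by the two tasks per node. The sketch below only prints a candidate srun line and does not launch anything. It assumes 128 physical cores with 2 hardware threads each per CPU node, reuses the 40 nodes from the script above, and assumes that parabands in BGW 4 actually benefits from OpenMP threading, which I have not verified.

```shell
#!/bin/bash
# Sketch only: builds and prints a candidate srun command, it does not run it.
# ASSUMPTIONS (not confirmed): 128 physical cores and 2 hardware threads per
# core on each CPU node; parabands scales with OpenMP threads.
CORES_PER_NODE=128
HW_THREADS_PER_CORE=2
NODES=40            # same node count as the job script above
TASKS_PER_NODE=2    # largest count that currently avoids the segfault

TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))
CORES_PER_TASK=$(( CORES_PER_NODE / TASKS_PER_NODE ))
CPUS_PER_TASK=$(( CORES_PER_TASK * HW_THREADS_PER_CORE ))   # value for srun -c

# One OpenMP thread per physical core owned by each task.
export OMP_NUM_THREADS=$CORES_PER_TASK
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# \$BGWPATH is left unexpanded so the printed line can be pasted into the job script.
SRUN_CMD="srun -n $TOTAL_TASKS -c $CPUS_PER_TASK --cpu_bind=cores \$BGWPATH/parabands.cplx.x"
echo "$SRUN_CMD"
```

Changing TASKS_PER_NODE in this sketch makes it easy to test intermediate layouts (for example 4 or 8 tasks per node) to find where the memory error reappears.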
I have attached the input files I used, along with an example of the segmentation fault error output, to this email.
Thank you for any help you can offer!
Best,
Jinyuan