I am running medium-scale MRCC calculations on an HPC system.
When OpenMPI is used, the job creates a few thousand files under /dev/shm:
-rw------- 1 dominic dominic 8 Oct 5 14:22 cmx000000103200000546250000as
-rw------- 1 dominic dominic 16 Oct 5 14:22 cmx000000103200000546250000ay
-rw------- 1 dominic dominic 1048352 Oct 5 14:23 cmx000000103200000546250000c5
-rw------- 1 dominic dominic 8 Oct 5 14:28 cmx000000103200000546250000df
....
-rw------- 1 dominic dominic 32 Oct 5 14:21 sem.cmx00000010320000054622000045
-rw------- 1 dominic dominic 32 Oct 5 14:21 sem.cmx0000001032000005462500003X
[root@hpcnode042 shm]# ls -l |wc -l
4195
The problem is that if a job crashes, all of these files are left behind in /dev/shm, and subsequent jobs may not have sufficient shared memory. It is also very difficult for a system administrator to clear these files afterward, because the node may be shared by multiple users (or by different jobs from the same user).
I can ask Slurm to create a temporary directory under /dev/shm/<username_jobid> and remove it when the job completes or crashes. Is there a way for a user to ask GA/ARMCI to store these files under a specific directory?
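
To make the Slurm side of this concrete, here is a minimal sketch of the per-job directory idea, written as a Python wrapper. The function name per_job_shm_dir is hypothetical, and reading USER and SLURM_JOB_ID from the environment is an assumption about how the job is launched. It only handles the create/remove part; it does not make GA/ARMCI write into the directory, which is exactly the part I am asking about. It also would not run if the process is killed with SIGKILL, so a Slurm epilog would probably still be needed for that case.

```python
import os
import shutil
from contextlib import contextmanager


@contextmanager
def per_job_shm_dir(base="/dev/shm"):
    """Create <base>/<username>_<jobid> and remove it when the block exits.

    Hypothetical helper: the directory name follows the
    /dev/shm/<username_jobid> scheme described above.
    """
    # Assumption: USER and SLURM_JOB_ID are set in the job environment.
    user = os.environ.get("USER", "unknown")
    job_id = os.environ.get("SLURM_JOB_ID", "nojob")
    path = os.path.join(base, f"{user}_{job_id}")

    # Private to the owner, like the cmx* files in the listing.
    os.makedirs(path, mode=0o700, exist_ok=True)
    try:
        yield path
    finally:
        # Runs on normal completion and when the block raises an exception.
        shutil.rmtree(path, ignore_errors=True)


# Usage sketch (the command is a placeholder, not a real invocation):
#
#     with per_job_shm_dir() as shm_dir:
#         subprocess.run(["<the MRCC launch command>"], check=True)
```

Because the cleanup removes only the job's own directory, it would not touch files belonging to other users or to another job of the same user.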
Thank you!
Regards,
Dominic