Junk files left under /dev/shm that jam the node


Dominic Chien

Oct 5, 2025, 2:52:32 AM
to NWChem Forum
Hi All,

I am running medium-scale MRCC calculations on an HPC system. When OpenMPI is used, it creates a few thousand files under /dev/shm:

-rw-------  1 dominic  dominic        8 Oct  5 14:22 cmx000000103200000546250000as
-rw-------  1 dominic  dominic       16 Oct  5 14:22 cmx000000103200000546250000ay
-rw-------  1 dominic  dominic   1048352 Oct  5 14:23 cmx000000103200000546250000c5
-rw-------  1 dominic  dominic         8 Oct  5 14:28 cmx000000103200000546250000df
....
-rw-------  1 dominic  dominic       32 Oct  5 14:21 sem.cmx00000010320000054622000045
-rw-------  1 dominic  dominic        32 Oct  5 14:21 sem.cmx0000001032000005462500003X
[root@hpcnode042 shm]# ls -l |wc -l
4195

The problem is that if a job crashes, all of these files are left behind in /dev/shm, and subsequent jobs may not have sufficient shared-memory space. It is also very difficult for the system administrators to clear these files afterward, since the node may be shared by multiple users (or by different jobs from the same user).

I can ask Slurm to create a temporary directory under /dev/shm/<username_jobid> and remove that directory when the job completes or crashes. Is there a way for a user to ask GA/ARMCI to store these files under a specific directory?
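For what it's worth, a user-side stopgap along the lines of the cleanup described above can be sketched in the batch script itself: trap the script's exit and remove your own leftover segments (matching the cmx*/sem.cmx* names from the listing). This is a sketch, not NWChem machinery; SHM_DIR is a throwaway directory here so the demo does not touch the real /dev/shm, and in real use the pattern could match live files from another of your jobs on the same node, so it is safest with node-exclusive allocations.

```shell
#!/bin/bash
# Sketch of a batch-script stopgap: trap exit and remove this user's
# leftover shared-memory segments. SHM_DIR is a throwaway directory
# standing in for /dev/shm so the demo is safe to run anywhere.
SHM_DIR="$(mktemp -d)"

# Run the "job" in a subshell whose EXIT trap does the cleanup; the trap
# fires on normal completion, on error, and on TERM/INT signals alike.
(
    trap 'rm -f "$SHM_DIR"/cmx* "$SHM_DIR"/sem.cmx*' EXIT TERM INT
    # ... srun/mpirun of the real job would go here ...
    # Simulate segments left behind by a crashing job:
    touch "$SHM_DIR/cmx0000001032demo" "$SHM_DIR/sem.cmx0000001032demo"
    exit 1   # simulate a mid-job crash
)

ls -A "$SHM_DIR"   # empty: the trap removed the segments
```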

Thank you!

Regards,
Dominic 

Edoardo Aprà

Oct 7, 2025, 6:47:01 PM
to NWChem Forum
You cannot store the files in a directory other than /dev/shm, since the name syntax accepted by shm_open() does not allow it.
The system administrator could add a script to the Slurm epilog that removes the files under /dev/shm that belong to the user who submitted the Slurm job.
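The epilog idea above can be sketched as follows. The cmx*/sem.cmx* patterns come from the listing earlier in the thread; SLURM_JOB_USER is the variable Slurm sets in the epilog environment. SHM_DIR is parameterized here only so the snippet can be exercised in a throwaway directory instead of the real /dev/shm; a real epilog would hardcode /dev/shm and run as root.

```shell
#!/bin/bash
# Demo of a Slurm-epilog cleanup in a throwaway directory standing in
# for /dev/shm. In a real epilog, SHM_DIR would be /dev/shm and
# TARGET_USER would come from Slurm's SLURM_JOB_USER variable.
SHM_DIR="$(mktemp -d)"
TARGET_USER="${SLURM_JOB_USER:-$(id -un)}"

# Simulate leftover ARMCI/ComEx segments plus an unrelated file.
touch "$SHM_DIR/cmx000000103200000546250000as" \
      "$SHM_DIR/sem.cmx00000010320000054622000045" \
      "$SHM_DIR/unrelated.dat"

# The cleanup itself: delete only this user's cmx*/sem.cmx* files,
# without descending into subdirectories.
find "$SHM_DIR" -maxdepth 1 -user "$TARGET_USER" \
     \( -name 'cmx*' -o -name 'sem.cmx*' \) -delete

ls "$SHM_DIR"   # only unrelated.dat should remain
```

One caveat: if the same user has another job still running on the node, its live segments would match the pattern too, so sites with shared nodes may want to restrict this to node-exclusive jobs or check for other running jobs first.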