CCSDTQ: Too many open files


Dominic Chien

Mar 24, 2022, 10:17:41 PM
to NWChem Forum
Hi, 

I have been testing a copy of NWChem 7.0.2 that was compiled with the Intel compilers (v2000) and OpenMPI 4.1.2 with UCX on a 4-node HPC cluster (128 cores and 512 GB of RAM per node), but it runs into a "too many open files" problem with CCSDTQ/6-31G** on an H2O molecule right after the second CCSDTQ iteration.

I followed a suggestion from an earlier discussion of a similar problem and set /proc/sys/fs/file-max and ulimit -n to 5480000, but it does not solve the problem.
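For reference, raising these limits typically looks like the following (a sketch, using the value mentioned above; the per-process limit is also bounded by the hard nofile limit configured on each node):

sysctl -w fs.file-max=5480000   # system-wide file handle limit (run as root)
ulimit -n 5480000               # per-process open-file limit in the job's shell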

How can I solve this problem?

Regards,
Dominic


error message
=============================================================
 CCSDTQ iterations
 --------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall
 --------------------------------------------------------
    1   0.3242124170820  -0.1955866226348     5.9     6.4
    2   0.1395453651391  -0.2004354553176     6.0     6.5
_shm_attach: shm_open: Too many open files in system
[4] Received an Error in Communication: (-1) _shm_attach: shm_open
_shm_attach: shm_open: Too many open files in system
[5] Received an Error in Communication: (-1) _shm_attach: shm_open
_shm_attach: shm_open: Too many open files in system
=============================================================

=============================================================
mpirun   --mca orte_tmpdir_base /local   --mca pml ucx --mca osc ucx  --mca btl ^vader,tcp,openib,uct   -x UCX_NET_DEVICES=mlx5_0:1  -x MALLOC_MMAP_MAX_=0 -x MALLOC_TRIM_THRESHOLD_=-1 -x ARMCI_DEFAULT_SHMMAX=8192   ./nwchem  test.in >& test.out
=============================================================

test.in (input) 
=============================================================
echo
start h2o
memory stack 1000 mb heap 250 mb global 2000 mb

permanent_dir /home/chiensh/scratch/test1
SCRATCH_DIR   /local

charge 0
geometry units angstrom
             H       -1.958940   -0.032063    0.725554
             H       -0.607485    0.010955    0.056172
             O       -1.538963    0.004548   -0.117331
end

basis "ao basis" cartesian
  * library 6-31G**
end

scf
  vectors output h2o
  rohf
#  doublet
  thresh 1e-8
  maxiter 200
end

tce
  SCF
  CCSDTQ
  thresh 1e-6
  io  ga
  freeze atomic
  tilesize   8
end

task scf energy
task tce energy
=============================================================

build environment
=============================================================
#module load parallel_studio/2018.4.057
module load  openmpi/4.1.2_intel
module load cuda/11.2.1 cudablas/11.2.1 cudafft/11.2.1

export NWCHEM_TOP=/home/chiensh/scratch/nwchem/nwchem-7.0.2_omp
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_NOFSCHECK=y
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export USE_OPENMP=y
export DISABLE_GAMIRROR=y
export USE_GAGITHUB=y
#unset GA_STABLE
export BLAS_SIZE=8
export MKLIB=${MKLROOT}/lib/intel64


export BLASOPT="-I${MKLROOT}/include -L${MKLIB} -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl -qopenmp -qopenmp-simd"
export LAPACK_LIB=$BLASOPT
export USE_SCALAPACK=yes
export SCALAPACK_SIZE=8
#export SCALAPACK=" -I${MKLROOT}/include -L${MKLIB} -lmkl_scalapack_ilp64  -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64   -liomp5 -lpthread -lm -ldl -qopenmp -qopenmp-simd "
export SCALAPACK=" -I${MKLROOT}/include -L${MKLIB} -lmkl_scalapack_ilp64  -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_openmpi_ilp64   -liomp5 -lpthread -lm -ldl -qopenmp -qopenmp-simd "
export SCALAPACK_LIB="$SCALAPACK  $BLASOPT"
#unset SCALAPACK_LIB
#unset SCALAPACK

unset  MPI_INCLUD
unset LIBMPI

#export USE_PYTHONCONFIG=n
#export USE_PYTHON64=n
#export PYTHONVERSION=3.7
#export PYTHONHOME=
#export PYTHONLIBTYPE=so

#export NWCHEM_MODULES="all python "
export NWCHEM_MODULES="all"
#export NWCHEM_MODULES=smallqm

export CCSDTQ=y
export CCSDTLR=y
export MRCC_METHODS=TRUE
#unset CCSDTQ
#unset CCSDTLR
#unset MRCC_METHODS

export CC=icc
export FC=ifort
export F77=ifort
export F90=ifort
export CXX=icpc
#export MPICC=mpiicc
#export MPIFC=mpiifort
export MPICC=mpicc
export MPIFC=mpif90

export TCE_CUDA=Y
export CUDA_LIBS="-L /cm/shared/common_software_stack/packages/compilers/cuda/11.2.1/lib64   -lcublas -lcudart "
export CUDA_FLAGS="-arch sm_70"
export CUDA_INCLUDE="-I. -I/cm/shared/common_software_stack/packages/compilers/cuda/11.2.1/include "
export CUDA=nvcc
=============================================================

Edoardo Aprà

Mar 25, 2022, 12:46:54 PM
to NWChem Forum
How many processes per node is your run using?
What is the size of the /dev/shm file system on the compute nodes you have been using?
Did you examine the contents of /dev/shm once your run crashed?
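A quick way to gather this on a compute node, using standard Linux tools only:

df -h /dev/shm        # size and current usage of the tmpfs
ls /dev/shm | wc -l   # number of leftover segment files after the crash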

Dominic Chien

Mar 26, 2022, 10:27:37 AM
to NWChem Forum
Hi Edo,

We have 128 cores (2 x AMD EPYC 7742) and 512 GB of RAM per node, so each node has 256 GB in /dev/shm by default. I removed all content from /dev/shm and /local (local scratch) on each node before the NWChem job was started.

Thanks!

~Dominic 
On Saturday, March 26, 2022 at 12:46:54 AM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Mar 26, 2022, 11:26:36 AM
to NWChem Forum
What happens if you use a single node and just 64 processes?

Dominic Chien

Mar 27, 2022, 9:41:30 PM
to NWChem Forum
It works!

But how do I determine what core count is most appropriate for a calculation? And how can I prevent a similar problem from happening in a larger calculation?

Thanks!

On Saturday, March 26, 2022 at 11:26:36 PM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Mar 28, 2022, 7:40:45 PM
to NWChem Forum
The only way I can suggest to handle this is by trial and error.
I would be interested to see whether the case of 1 node and 128 processes works.

Dominic Chien

Mar 29, 2022, 10:22:19 AM
to NWChem Forum
It also works with 1 node and 128 cores.


On Tuesday, March 29, 2022 at 7:40:45 AM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Mar 29, 2022, 12:13:18 PM
to NWChem Forum
Does the 4 x 64 cores run fail every time? I would try to monitor the number of /dev/shm/cmx* files created.
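For example, something along these lines (standard tools only) samples that count while the job runs:

watch -n 10 "ls /dev/shm | grep -c '^cmx'"   # number of cmx* segment files, every 10 seconds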

Dominic Chien

Mar 29, 2022, 10:40:26 PM
to NWChem Forum
I cannot reproduce the "Too many open files in system" problem, even using the same number of nodes and the same core count.

It seems that this problem happens randomly, but it is somehow related to high core and node counts, and it also depends on the number of files opened by other processes on the system.

So I did another study with the same input, using 128 cores on each of 32 nodes, i.e. 4096 cores. The CCSDTQ completed, but much more slowly because of the overhead.

I killed the job at the last iteration (#16) to preserve the temp files in /dev/shm.
There are only 6994 files in /dev/shm, i.e.:
[root@hpcnode117 ~]# ls /dev/shm/sem.cmx0000001032000004* |wc -l
128
[root@hpcnode117 ~]# ls /dev/shm/cmx0000001032000004* |wc -l
6858
[root@hpcnode117 ~]# ls /dev/shm/* |grep -v cmx |wc -l
8

I also dumped the list of open files on one of the nodes during the second-to-last iteration (#15), and there were 3,899,442 files open on the system, which is quite close to the system limit that I set in /proc/sys/fs/file-max (5,480,000).
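The dump and the count were obtained roughly as follows ("temp" is the file analysed below):

lsof > temp     # snapshot of all open-file entries on the node (run as root)
wc -l < temp    # total number of entries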

I studied these open files further; about 3.7M of them are related to nwchem:
[root@hpcnode117 ~]# cat temp |grep nwchem |wc -l
3699965
[root@hpcnode117 ~]# cat temp |grep sem.cmx  |wc -l
65024
[root@hpcnode117 ~]# cat temp |grep -v sem|grep cmx   |wc -l
3483476
[root@hpcnode117 ~]# cat temp |grep -v nwchem   |wc -l
199477

There are about 190,000 files open on an idle system.
[root@hpcnode117 ~]# lsof |wc -l
192954

I hope this information will be helpful for analysing the cause of the problem.

Thanks!
On Wednesday, March 30, 2022 at 12:13:18 AM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Mar 30, 2022, 1:24:51 PM
to NWChem Forum
Let me try to understand your detailed analysis.
Are you saying that at some point you had 3.5M files on /dev/shm, but only 0.2M belonging to the nwchem process?

[root@hpcnode117 ~]# cat temp |grep -v sem|grep cmx   |wc -l
3483476
[root@hpcnode117 ~]# cat temp |grep -v nwchem   |wc -l
199477

Dominic Chien

Mar 30, 2022, 9:35:24 PM
to NWChem Forum
Hi Edo,

Not really. What I meant was that there were only ~6k files sitting in /dev/shm on one of the nodes when I terminated the job at the 15th iteration of CCSDTQ.

I also dumped the record of open files on the system (using lsof) on the same node into a file called "temp"; ~3.7M files were open during the 15th CCSDTQ iteration. Among these open files, 3,699,965 were opened by nwchem; 3,483,476 of them have "cmx" in their name, and 65,024 are "sem.cmx" files.

It might be easier for you to understand if I could attach the temp file here, but I did not keep it before rebooting the system.

I redid the calculation with only 1 node (128 cores), and I again recorded ~3M open files during the CCSDTQ iterations; the list of open files can be found at the following link (400 MB):

https://drive.google.com/file/d/1RmTa_9l8K0IuJaZE-CaiIgI_EBc0xCqA/view?usp=sharing

Thanks!


On Thursday, March 31, 2022 at 1:24:51 AM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Mar 31, 2022, 1:47:55 PM
to NWChem Forum
Could you compile the source of the current master branch of NWChem instead of 7.0.2 to see if the behavior is any different? That would help my task, too, since I am trying to reproduce your problem with the current source code. It does not make much sense to use 7.0.2 for this exercise.

On Thursday, March 31, 2022 at 9:35:09 AM UTC-7 Edoardo Aprà wrote:
Thanks for posting this material. I notice a striking difference between my runs and yours: my /dev/shm/cmx* files seem to reach a size of ~1000 MB, while your largest cmx* files are ~5 MB in size.
I need to understand why your segments are 3 orders of magnitude smaller.

Edoardo Aprà

Mar 31, 2022, 5:49:46 PM
to NWChem Forum
I eventually managed to reproduce your lsof output showing 3.5M open cmx* files. "/usr/bin/ls", on the other hand, shows a much smaller number of files, around a few thousand.

Dominic Chien

Apr 3, 2022, 9:05:44 PM
to NWChem Forum
I set /proc/sys/fs/file-max to a very large value (10,000,000) and the problem has been temporarily fixed, but it would be nice to have a better solution for this...
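If this workaround stays in place, it can be made persistent across reboots with a sysctl drop-in instead of writing to /proc by hand (a sketch; the file name is illustrative):

echo 'fs.file-max = 10000000' > /etc/sysctl.d/90-filemax.conf   # as root
sysctl --system                                                 # reload the sysctl configuration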

On Friday, April 1, 2022 at 5:49:46 AM UTC+8 Edoardo Aprà wrote:

Edoardo Aprà

Apr 4, 2022, 3:58:26 PM
to NWChem Forum
The MPI-PR port has quadratic scaling of the number of file descriptors with respect to the number of processes per node; see https://github.com/GlobalArrays/ga/issues/257

It is likely that the situation is exacerbated in cases where a large number of Global Arrays allocations and de-allocations occur (which might be the case in CCSDTQ).
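One way to watch how close a node gets to that kernel limit while a run is in progress (standard Linux, nothing NWChem-specific):

watch -n 10 cat /proc/sys/fs/file-nr   # prints: allocated handles, allocated-but-unused handles, maximum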

jeff.science

Apr 12, 2022, 3:59:43 AM
to NWChem Forum
ARMCI-MPI does not have this issue with many shm files, so you can try that as another workaround.  There is a shared memory bottleneck in ARMCI-MPI (or rather, in how MPI libraries implement RMA atomicity) that may limit performance with 128 processes per node, but it should not crash.
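For what it's worth, a minimal sketch of what that switch would look like in the build environment posted above (the install path is illustrative; ARMCI-MPI itself is available from https://github.com/pmodels/armci-mpi):

# Replace ARMCI_NETWORK=MPI-PR with the external ARMCI-MPI port.
export ARMCI_NETWORK=ARMCI
export EXTERNAL_ARMCI_PATH=$HOME/armci-mpi/install   # illustrative path to an ARMCI-MPI installation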

Jeff

Dominic Chien

Apr 12, 2022, 5:05:13 AM
to NWChem Forum
Thanks Jeff!

I will try ARMCI-MPI later. Since we have a temporary fix by setting a large file-max value, we first need to solve the convergence problem discussed in other threads; otherwise there is no point in moving large CCSDTQ jobs to NWChem...

Thanks!
On Tuesday, April 12, 2022 at 3:59:43 PM UTC+8 jeff.science wrote: