ERROR on CCSD calculation - INTEL compilers


Juan Jose Aucar

Mar 19, 2020, 9:57:37 PM
to dirac-users
Dear Dirac developers,

I've been testing several DIRAC builds, looking for the most efficient one. I get an error when running a CCSD calculation on larger systems with the version compiled with OPENMP and the INTEL compilers (ifort, icc, icpc). Calculations run with the OPENMP build (gfortran, gcc, g++) and the openblas libraries went fine (I attach the outputs).

On small systems (FCl, Cl2) everything seems fine, but it crashes when I try a bigger one (ClBr, Br2), even with a small basis (3-21G). I also tried with and without symmetry detection.

The error reported by Dirac is


 ====  below this line is the stderr stream  ====
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source            
dirac.x            0000000001D201EA  Unknown               Unknown  Unknown
libc-2.26.so       00007FE538389160  Unknown               Unknown  Unknown
dirac.x            000000000072317E  cceqn_driver_lamb         245  cceqn_driver_lambda.F
dirac.x            000000000070FC6C  ccfopr_                  3197  ccdriv.F
dirac.x            00000000007078FC  ccmain_                   909  ccdriv.F
dirac.x            00000000007068A0  pamccm_                   118  ccpam.F
dirac.x            0000000000560254  dirac_                    430  dirac.F
dirac.x            000000000055F724  MAIN__                    174  main.F90
dirac.x            000000000055F3D2  Unknown               Unknown  Unknown
libc-2.26.so       00007FE538373F4A  __libc_start_main     Unknown  Unknown
dirac.x            000000000055F2EA  Unknown               Unknown  Unknown



The table below summarizes what I've tried. All the corresponding outputs are attached (also files related to error reports and scratch reports)

System  Basis       Compilation               Library   No. of orbitals  STATUS
ClBr    3-21G       omp – intel               MKL       291              ERROR
FCl     dyall.cv2z  omp – intel               MKL       304              RUNS OK
Cl2     3-21G       omp – intel               MKL       370              RUNS OK
Br2     3-21G       omp – intel               MKL       390              ERROR
ClBr    dyall.cv2z  omp – intel               MKL       550              ERROR
ClBr    dyall.cv2z  omp – gfortran, gcc, g++  openblas  550              RUNS OK

Thanks in advance,
Juan José Aucar

P.S.: I've also tried giving it more RAM, but the problem persists.
P.S. 2: I thought it might be related to the size of the problem, but it seems it is not (based on column 5 of the table).
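
The point about column 5 can be checked directly from the table. The snippet below is only an illustrative sketch: the rows are copied from the table above, while the variable names and the "intel+MKL" / "gnu+openblas" labels are made up for this example.

```python
# Rows copied from the table in this post:
# (system, basis, build, number of orbitals, ran_ok)
runs = [
    ("ClBr", "3-21G",      "intel+MKL",    291, False),
    ("FCl",  "dyall.cv2z", "intel+MKL",    304, True),
    ("Cl2",  "3-21G",      "intel+MKL",    370, True),
    ("Br2",  "3-21G",      "intel+MKL",    390, False),
    ("ClBr", "dyall.cv2z", "intel+MKL",    550, False),
    ("ClBr", "dyall.cv2z", "gnu+openblas", 550, True),
]

# Only the Intel/MKL build shows the crash, so restrict to those rows.
intel = [r for r in runs if r[2] == "intel+MKL"]
failed = sorted(r[3] for r in intel if not r[4])
passed = sorted(r[3] for r in intel if r[4])

# If problem size were the cause, every passing run would be smaller
# than every failing run. Here the smallest failing run (291 orbitals)
# is smaller than the largest passing run (370 orbitals).
size_threshold_exists = max(passed) < min(failed)

# Observation limited to this small sample: every failing system contains Br.
failing_systems = {r[0] for r in intel if not r[4]}
all_failures_contain_br = all("Br" in s for s in failing_systems)

print("failed:", failed)
print("passed:", passed)
print("size threshold separates them:", size_threshold_exists)
print("all failing systems contain Br:", all_failures_contain_br)
```

With only five Intel runs this proves nothing about the cause; it just confirms that the number of orbitals alone does not separate the failing runs from the working ones.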

Juan Jose Aucar

Mar 19, 2020, 9:59:52 PM
to dirac-users
Sorry, I forgot to attach the files. Here they are:
CCSD_JJA.zip

Ilias Miroslav, doc. RNDr., PhD.

Mar 20, 2020, 2:28:44 AM
to dirac...@googlegroups.com
Dear Juan,

for the failing cases, would you try increasing the memory, as described in http://www.diracprogram.org/doc/master/tutorials/cc_memory_count/count_cc_memory.html ?

You have:
DIRAC serial starts by allocating 1572000000 words (  11993.41 MB -     11.712 GB) of memory
    out of the allowed maximum of 2147483648 words (  16384.00 MB -     16.000 GB)
 
so first try --gb=15.5 --ag=16, and then increase both gb and ag up to the server limit.

Miro
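
As a side note on the figures quoted above, the MB and GB values follow from the word counts if one word is taken as 8 bytes and binary units are used. The sketch below makes that assumption explicit; the helper names are invented for this example and are not part of DIRAC.

```python
# Assumption: one "word" in the DIRAC output is 8 bytes
# (this reproduces the MB/GB values printed in the output above).
BYTES_PER_WORD = 8

def words_to_mb(words):
    """Convert a word count to MB (1 MB = 1024**2 bytes)."""
    return words * BYTES_PER_WORD / 1024**2

def words_to_gb(words):
    """Convert a word count to GB (1 GB = 1024**3 bytes)."""
    return words * BYTES_PER_WORD / 1024**3

def gb_to_words(gb):
    """Convert GB to a whole number of 8-byte words."""
    return int(gb * 1024**3 // BYTES_PER_WORD)

allocated = 1_572_000_000   # words allocated at start, from the output
maximum = 2_147_483_648     # allowed maximum in words, from the output

print(f"allocated: {words_to_mb(allocated):.2f} MB = {words_to_gb(allocated):.3f} GB")
print(f"maximum:   {words_to_mb(maximum):.2f} MB = {words_to_gb(maximum):.3f} GB")
print(f"headroom:  {words_to_gb(maximum - allocated):.3f} GB")
```

Under that assumption the run started with about 11.7 GB allocated out of a 16 GB ceiling, which is why raising the first value towards the second is the natural first test.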


From: dirac...@googlegroups.com <dirac...@googlegroups.com> on behalf of Juan Jose Aucar <juan...@gmail.com>
Sent: Friday, March 20, 2020 2:57
To: dirac-users <dirac...@googlegroups.com>
Subject: [dirac-users] ERROR on CCSD calculation - INTEL compilers
 
--
You received this message because you are subscribed to the Google Groups "dirac-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dirac-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dirac-users/c6e7f507-e232-4dbf-a021-721fc366fef3%40googlegroups.com.

Juan Jose Aucar

Mar 20, 2020, 8:54:27 AM
to dirac-users
Dear Miroslav Ilias,

Thanks for the reply.
I don't see why this should be a memory allocation problem. For all the calculations that ran fine, I measured the peak memory usage with an external tool (tracejob), and they all used less than 3 GB.

I followed the suggestions at http://www.diracprogram.org/doc/master/tutorials/cc_memory_count/count_cc_memory.html and, for ClBr with the small basis (3-21G), it reports "Predicted RelCC memory demand:           0.220 GB" (file attached). I then ran the calculation with --gb=13 --ag=16 (much more than the predicted demand) and the error persists.

If I do as you suggested (trying with --gb=15.5 and --ag=16), I get the error:



and that's why, in the previous outputs, the difference between --ag and --gb is around 3 GB (to avoid this problem).

Right now I can't set the --ag parameter above 16 GB because other calculations are running; I'll do it and report back as soon as I can.


Thanks,
Juan José Aucar
ClBr-CCSD-MemCount_3-21G.out

Juan Jose Aucar

Mar 20, 2020, 9:48:46 AM
to dirac-users
Dear Miroslav Ilias,


I have now tried setting --mb=20000 and --ag=23, and the problem persists (output attached).


Juan J. Aucar
ClBr-CCSD-3-21G-19.5gb.out

juan...@gmail.com

Dec 1, 2020, 9:09:00 PM
to dirac-users
Dear Dirac developers,

For a while I've been using DIRAC built with OpenMPI and OpenBLAS, or with GNU and OpenBLAS.
I'm now very interested in compiling it with the Intel compilers and MKL libraries, but I'm getting the same error I reported here some time ago.
Does anyone know what could be causing this? I don't know how to track down the error. I recompiled with different CMake versions, on a cluster and on my personal computer (so there were also a few changes in the GNU version), but the problem persists.

Thanks in advance,
Juan J. Aucar

P.S.: I already checked whether it was a lack-of-memory problem by increasing the --mb and --ag parameters, but it seems it is not.

juan...@gmail.com

Dec 1, 2020, 9:11:10 PM
to dirac-users
P.S. 2: I attach some files with information about the compilation.

setup_command.txt
CMakeCache.txt
cmake_output.txt
build_info.h
COMPILERS_VERSIONS.txt