Running CDFT Tutorial Calculation on Cluster

Brian Day

Jul 20, 2018, 11:24:22 AM
to cp2k
Hi all,

I am trying to run the following CDFT tutorial (https://www.cp2k.org/howto:cdft) for water dimers on my university's cluster and am getting an error I do not understand. For clarity, the issue is not with submitting the file, but with CP2K itself. I have also successfully completed the tutorial locally, which is another reason I am confused by the error.

My process for running this on the cluster was as follows:
     - Duplicate the energy.bash script and modify it to generate a series of input files (one for each CP2K run)
     - Modify the slurm file to submit each input file individually (i.e., rather than running them with a single bash script as when running locally, I wanted to run each input with its own submission to the cluster)
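For context, the generation step amounted to something like the sketch below (this is not the attached cdft-buildfiles.bash; the template name, the @NAME@ placeholder, and the run list are all illustrative):

```shell
#!/bin/bash
# Hypothetical sketch of generating one CP2K input per run from a shared
# template. All file and placeholder names here are invented for illustration.
cat > energy_run.inp.template <<'EOF'
&GLOBAL
  PROJECT @NAME@
  RUN_TYPE ENERGY
&END GLOBAL
EOF

# One input file per run; each gets its own slurm submission.
for name in standard cte_noadj; do
  sed "s/@NAME@/${name}/" energy_run.inp.template > "energy_run_${name}.inp"
done
```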

I have attached the relevant files.

I am able to run the first calculation (energy_run_standard.inp) successfully, but when I try to run the second one (energy_run_cte_noadj.inp) I get the following error:

 Possible matches for unknown keyword


 MAX_LS


   keyword MAX_LS in section %__ROOT__%FORCE_EVAL%DFT%SCF%OUTER_SCF%CDFT_OPT score:  44

   keyword MAX_LS in section %__ROOT__%FORCE_EVAL%DFT%QS%CDFT%OUTER_SCF%CDFT_OPT score:  44

   keyword MAX_LS in section %__ROOT__%FORCE_EVAL%DFT%XAS%SCF%OUTER_SCF%CDFT_OPT score:  44

   keyword MAX_SCF in section %__ROOT__%FORCE_EVAL%DFT%SCF%OUTER_SCF score:  13

   keyword MAX_SCF in section %__ROOT__%FORCE_EVAL%DFT%QS%CDFT%OUTER_SCF score:  13


 *******************************************************************************

 *   ___                                                                       *

 *  /   \                                                                      *

 * [ABORT]                                                                     *

 *  \___/          found an unknown keyword MAX_LS in section OUTER_SCF        *

 *    |                                                                        *

 *  O/|                                                                        *

 * /| |                                                                        *

 * / \                                               input/input_parsing.F:246 *

 *******************************************************************************



 ===== Routine Calling Stack =====


            7 section_vals_parse

            6 section_vals_parse

            5 section_vals_parse

            4 section_vals_parse

            3 section_vals_parse

            2 section_vals_parse

            1 read_input

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0


I checked my input files, and MAX_LS is never called in either of those sections.

Any help in understanding this would be greatly appreciated!

Regards,
     Brian
cdft-buildfiles.bash
cp2k-v2.slurm
dft-common-params.inc
energy_run_cte_nosizeadj.inp
energy_run_cte_nosizeadj.out
energy_run_standard.inp
energy_run_standard.out
energy_run.inp
subsys.inc

Nico Holmberg

Jul 21, 2018, 4:39:50 AM
to cp2k
Hi,

My bad. I made some changes to the CDFT input structure between CP2K versions 5.1 and 6.1 and forgot to update the tutorial. You probably have different versions locally and on your cluster. All CDFT-related keywords were moved from the OUTER_SCF section into a new CDFT_OPT subsection. If you move the relevant keywords in the file becke_qs.inc, the calculation should run without issue.
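Schematically, the change looks like this (MAX_LS is the keyword from your error message; the value is illustrative, and any other optimizer keywords move the same way):

```
! CP2K 5.1 and earlier: CDFT optimizer keywords sat directly in OUTER_SCF
&OUTER_SCF
  MAX_LS 5        ! illustrative value
&END OUTER_SCF

! CP2K 6.1: the same keywords belong in the CDFT_OPT subsection
&OUTER_SCF
  &CDFT_OPT
    MAX_LS 5
  &END CDFT_OPT
&END OUTER_SCF
```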

I'll update the tutorial on Monday and upload new files for CP2K 6.1.


BR,

Nico

Brian Day

Jul 23, 2018, 10:25:18 AM
to cp...@googlegroups.com
Thank you! 

-Brian

Nico Holmberg

Jul 24, 2018, 1:54:52 AM
to cp2k
Hi,

Just as a heads up. I've updated the tutorial and uploaded new versions of the example input files. Changes between CP2K versions 5.1 and 6.1 are indicated.

Let me know if you encounter any further issues with CDFT calculations.


BR,

Nico


Brian Day

Jul 24, 2018, 2:33:30 PM
to cp...@googlegroups.com
Hi Nico, 

Two follow up questions:

1. When I went to run the fragment-based CDFT calculations, I got the following error:

 Reading the cube file:


 water-dimer-frag-a-pbe-energy-ELECTRON_DENSITY-1_0.cube



 Reading the cube file:


 water-dimer-frag-b-pbe-energy-ELECTRON_DENSITY-1_0.cube



 *******************************************************************************

 *   ___                                                                       *

 *  /   \                                                                      *

 * [ABORT]                                                                     *

 *  \___/        The number of electrons in the reference and interacting      *

 *    |       configurations does not match. Check your fragment cube files.   *

 *  O/|                                                                        *

 * /| |                                                                        *

 * / \                                                   qs_cdft_methods.F:958 *

 *******************************************************************************


Visually inspecting the files, there didn't seem to be any issue with either (I've attached each of them). But it may be related to the second issue I ran into.

2. When I tried to run each of the fragments for the charge transfer energy, the calculation portion ran fine, but I got a SIGSEGV error (forrtl: severe (174): SIGSEGV, segmentation fault occurred) when CP2K tried to write the electron density cube files. It would simply output a blank file and then break. I was able to work around this by passing in a blank file of the appropriate name (giving me the files above), but it seemed like an odd error. I'm not sure whether this has something to do with the way CP2K 6.1 is compiled on our cluster, or whether it is an artifact of the way the code was written. I have a ticket regarding this issue open with the computing center here, and can update if they let me know anything useful.

I also can recreate the error and send the full output file if that would be of any benefit to you.

Thanks and Regards,
     Brian
water-dimer-frag-b-pbe-energy-ELECTRON_DENSITY-1_0.cube
water-dimer-frag-a-pbe-energy-ELECTRON_DENSITY-1_0.cube

Brian Day

Jul 25, 2018, 3:59:27 PM
to cp2k
Hi Nico,

I was able to finish the tutorial by generating those electron density cube files in a separate step with an older version of CP2K and then using them to complete the tutorial. For whatever reason, v6.1 seems to have issues with generating these cube files. If this is an error you are familiar with and you have any compiling advice I can pass on to our computing center, that would be great, as they seem unsure of how to handle it. Regardless, thanks again for your help, and I appreciate your updating the tutorial for the new version.

Best,
     Brian

Nico Holmberg

Jul 27, 2018, 6:30:10 AM
to cp2k
Hi Brian,

The cube files you attached are quite a bit smaller than the ones I get when I run the tutorial (12 vs. 25 MB), which is likely the cause of the error message you received in question 1. For some reason, likely related to the error you described in question 2, the cube file is only partly written to disk. I don't understand what you mean by "It would simply output a blank file, and then break. I was able to fix this by passing in the blank file of the appropriate name (giving me the files above), but it seemed like an odd error."
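One quick way to see whether a cube file is complete is to compare the number of values in the data section against the grid size promised by the header (a Gaussian cube header carries the atom count on line 3 and the three voxel counts on lines 4-6). A minimal sketch, assuming a density cube with a positive atom count (orbital cubes carry one extra header line); the helper name check_cube is invented here:

```shell
# Count the values in the volumetric data section of a cube file and compare
# against nx*ny*nz from the header. A truncated file from a crashed run will
# report fewer values than expected.
check_cube() {
  awk 'NR==3 {na=$1} NR==4 {nx=$1} NR==5 {ny=$1} NR==6 {nz=$1}
       NR>6+na {n+=NF}
       END {printf "expected %d values, found %d\n", nx*ny*nz, n+0}' "$1"
}
# e.g.: check_cube water-dimer-frag-a-pbe-energy-ELECTRON_DENSITY-1_0.cube
```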

In any case, an MPI parallelized cube file writer was introduced in CP2K 6.1. It has been tested on different platforms and compilers (including Intel ifort), though not with this particular input file, but it is possible that some bug still remains. In order to debug this behavior, it would be helpful if you could post more details about how the CP2K 6.1 binary was compiled and how you ran the calculation. In particular, the following would be helpful
  1. Compiling environment including version numbers
  2. Number of MPI processes used in the calculation. Does the error related to printing out the cube file disappear if you use just 1 MPI process?
  3. Detailed output of the crashing simulation with debugging symbols turned on in the binary (your computing center might not provide one by default)
I have different Intel compiler versions available to me on a cluster so I can try to reproduce the error you're seeing if you can figure out the compiler version.


BR,

Nico

Brian Day

Jul 30, 2018, 4:00:41 PM
to cp2k
Hi Nico,

I use the following Intel compiler modules: module load intel/2017.3.196 intel-mpi/2017.3.196 cp2k/6.1

Summarized below are the conditions I used and the results I got:
nodes = 1, tasks = 14, executable = cp2k.popt -i *.inp -o *.out ---> Cube files generated without issue
nodes = 1, tasks = 14, executable = mpirun -np $SLURM_NTASKS cp2k.popt -i *.inp -o *.out ---> 

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Image              PC                Routine            Line        Source

cp2k.popt          000000000D730E14  Unknown               Unknown  Unknown

libpthread-2.17.s  00002ABEB44735E0  Unknown               Unknown  Unknown

libc-2.17.so   00002ABEB61E74DC  cfree                 Unknown  Unknown

cp2k.popt          000000000D767FA8  Unknown               Unknown  Unknown

cp2k.popt          000000000109F105  qs_dispersion_typ         135  qs_dispersion_types.F

cp2k.popt          0000000000B88BFF  qs_environment_ty        1476  qs_environment_types.F

cp2k.popt          0000000000B13020  force_env_types_m         232  force_env_types.F

cp2k.popt          0000000000EE66E0  f77_interface_mp_         335  f77_interface.F

cp2k.popt          000000000043BEF8  cp2k_runs_mp_run_         405  cp2k_runs.F

cp2k.popt          0000000000432814  MAIN__                    281  cp2k.F

cp2k.popt          000000000043151E  Unknown               Unknown  Unknown

libc-2.17.so   00002ABEB6188C05  __libc_start_main     Unknown  Unknown

cp2k.popt          0000000000431429  Unknown               Unknown  Unknown

Note that I had to run these simulations on a different cluster here, as the one I was using previously requires the submission file to declare a minimum of 2 nodes. I can talk to our computing center and see if they can test this themselves. 

To (hopefully) clarify my earlier message, when I ran with 2 nodes, the last line in the cp2k output file would be:

 The sum of alpha and beta density is written in cube file format to the file:


 /scratch/slurm-1244788/water-dimer-frag-b-pbe-energy-ELECTRON_DENSITY-1_0.cube

and the electron density file would appear in the submission directory, but it would be empty. If I re-ran the simulation and passed this blank file to the cluster which I was running on, it would run successfully, but when trying to open the file in another program such as Avogadro, or using it in a subsequent simulation, it would not work. Maybe this is because it is only partially writing as you pointed out. 

I will try to update this with the compiling information and a detailed debugging output. (Sorry if any of the above does not make sense; I am still fairly new to computational work.)

Thanks again.

-Brian

Nico Holmberg

Jul 31, 2018, 2:35:45 AM
to cp2k
Hi Brian,

Thanks for the information. Just to confirm, if you run
ifort --version

does the command return ifort (IFORT) 17.0.4 ? I need to use a different machine than normally to compile with that version of the Intel compiler, so I need a while to familiarize myself with the proper build process on that machine. Hopefully I'll find the time later this week.

By the way, the error message you posted is quite cryptic and does not point to anything related to writing the cube file. Any chance you could post the full output log for the crashing simulation? Are you able to reproduce the crash if you decrease the number of MPI tasks from 14 to, say, 2 or 4?


BR,

Nico

Brian Day

Aug 9, 2018, 12:49:45 PM
to cp2k
Hi Nico,

Sorry for the long-delayed reply; I had forgotten to check this thread for some time!

ifort --version returns: ifort (IFORT) 17.0.04 20170411.
Additionally, I get the same error message when I reduce the number of MPI tasks to 4 (2 per node, 2 nodes).

Best,
     Brian

Brian Day

Aug 9, 2018, 12:51:08 PM
to cp2k
Actually, the error message is slightly different, see below:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Image              PC                Routine            Line        Source

cp2k.popt          000000000D730E14  Unknown               Unknown  Unknown

libpthread-2.17.s  00002ABEBC47F5E0  Unknown               Unknown  Unknown

libmpi.so.12   00002ABEBD7DA1BA  PMPI_File_write_a     Unknown  Unknown

libmpifort.so.12.  00002ABEBCF2F1AE  pmpi_file_write_a     Unknown  Unknown

cp2k.popt          0000000002F17BCC  message_passing_m        3315  message_passing.F

cp2k.popt          0000000002A33AFC  realspace_grid_cu         698  realspace_grid_cube.F

cp2k.popt          0000000002A31F4D  realspace_grid_cu         211  realspace_grid_cube.F

cp2k.popt          0000000000A06F9D  cp_realspace_grid          64  cp_realspace_grid_cube.F

cp2k.popt          0000000000A9F32B  qs_scf_post_gpw_m        2651  qs_scf_post_gpw.F

cp2k.popt          0000000000A883B1  qs_scf_post_gpw_m        2001  qs_scf_post_gpw.F

cp2k.popt          0000000000EB6610  qs_scf_post_scf_m          70  qs_scf_post_scf.F

cp2k.popt          00000000017A267F  qs_scf_mp_scf_            285  qs_scf.F

cp2k.popt          0000000000BA7709  qs_energy_mp_qs_e          86  qs_energy.F

cp2k.popt          0000000000C52681  qs_force_mp_qs_ca         115  qs_force.F

cp2k.popt          000000000096F4AA  force_env_methods         242  force_env_methods.F

cp2k.popt          000000000043BCAC  cp2k_runs_mp_run_         323  cp2k_runs.F

cp2k.popt          0000000000432814  MAIN__                    281  cp2k.F

cp2k.popt          000000000043151E  Unknown               Unknown  Unknown

libc-2.17.so   00002ABEBE194C05  __libc_start_main     Unknown  Unknown

cp2k.popt          0000000000431429  Unknown               Unknown  Unknown


Thanks again for all your help so far!

-Brian

Nico Holmberg

Sep 11, 2018, 3:43:57 PM
to cp2k
Hi Brian,

Sorry for the long delay in replying, I had a couple of tight deadlines that required my full attention. 

I compiled CP2K using version 17.0.4 20170411 of the Intel Fortran compiler, Intel MPI, and MKL. You can find my arch file below. I ran the tutorial files with 1, 2, and 24 MPI processes and did not encounter any issues.

Looking at the stack trace you included in your last post, it seems that the calculation is crashing somewhere inside the MPI I/O routine that CP2K is calling. This looks like a library issue to me. Are you able to provide any more information about how your binary has been compiled?

By the way, if you have access to the latest development version of CP2K (dated yesterday), you can disable MPI I/O to force CP2K to use the serial versions of the cube writer/reader. This will bypass the crash without fixing the underlying issue. See the discussion in this post for more information.

# Bare bones arch file for building CP2K with the Intel compilation suite
# Tested with ifort (IFORT) + Intel MPI + MKL version 17.0.4 20170411 

# Build tools
CC       = icc
CPP      =
FC       = mpiifort
LD       = mpiifort
AR       = ar -r

# Flags and libraries
CPPFLAGS =

DFLAGS   = -D__BLACS -D__INTEL -D__MKL -D__FFTW3 -D__parallel -D__SCALAPACK  \
           -D__HAS_NO_SHARED_GLIBC

CFLAGS   = $(DFLAGS)

FCFLAGS  = $(DFLAGS) -O2 -g -traceback -fp-model precise -fp-model source -free  \
           -I$(MKLROOT)/include -I$(MKLROOT)/include/fftw

LDFLAGS  = $(FCFLAGS)

LDFLAGS_C = $(FCFLAGS) -nofor_main

LIBS     = -Wl,--start-group \
           $(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a \
           $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
           $(MKLROOT)/lib/intel64/libmkl_sequential.a \
           $(MKLROOT)/lib/intel64/libmkl_core.a \
           $(MKLROOT)/lib/intel64/libmkl_blacs_intelmpi_lp64.a \
           -Wl,--end-group \
           -lpthread -lm -ldl

# Required due to memory leak that occurs if high optimisations are used
mp2_optimize_ri_basis.o: mp2_optimize_ri_basis.F
	$(FC) -c $(subst O2,O0,$(FCFLAGS)) $<
