CP2K compiling on CRAYXC40 - cuda version using gnu compiler

Aman Jindal

Apr 12, 2016, 6:13:54 AM
to cp2k
Dear CP2K users,
I am a beginner with CP2K. I compiled cp2k-3.0 on a Cray XC40 (CUDA version) using the GNU compilers. When I run jobs with the resulting executable, cp2k.psmp, I get the following error:

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0x2AAAB4044467
#1  0x2AAAB4043660
#2  0x2AAAB4E8C91F
#3  0x1EE7C38 in __cp_units_MOD_cp_unit_create
#4  0x1A2E00E in __input_keyword_types_MOD_keyword_create
#5  0xB98758 in __eri_mme_test_MOD_create_eri_mme_test_section
#6  0x69D351 in __input_cp2k_MOD_create_cp2k_root_section
#7  0x4327C3 in MAIN__ at cp2k.F:?
#0  0x2AAAB4044467
#1  0x2AAAB4043660
#2  0x2AAAB4E8C91F
#3  0x1EE7C38 in __cp_units_MOD_cp_unit_create
#4  0x1A2E00E in __input_keyword_types_MOD_keyword_create
#5  0xB98758 in __eri_mme_test_MOD_create_eri_mme_test_section
#6  0x69D351 in __input_cp2k_MOD_create_cp2k_root_section
#7  0x4327C3 in MAIN__ at cp2k.F:?
_pmiu_daemon(SIGCHLD): [NID 00109] [c0-0c1s11n1] [Tue Apr 12 04:41:11 2016] PE RANK 0 exit signal Illegal instruction

I am sure that this error has something to do with the executable.
Can anyone please help me sort out this error? Any help would be appreciated.

Thanks in advance
Aman Jindal

Samuel Andermatt

Apr 12, 2016, 8:51:25 AM
to cp...@googlegroups.com
Which arch file did you use and which compiler version do you have?

Aman Jindal

Apr 12, 2016, 9:06:41 AM
to cp2k
Hi, I used the CRAY-XC30-gfortran-cuda.psmp arch file, and my compiler is gcc 5.2.0. The modified arch file is attached.
Thank you
CRAY-XC30-gfortran-cuda.psmp

Aman Jindal

Apr 13, 2016, 5:55:02 AM
to cp...@googlegroups.com
Hello Samuel Andermatt,
I found that this error occurs only when I compile CP2K (version 3.0) with PLUMED (version 2.2.1). Today I compiled cp2k with and without plumed using the CRAY-XC30-gfortran.psmp arch file (this time without cuda). Without plumed it works fine, but with plumed it gives the same error as mentioned earlier.

Do you have any idea how I can resolve this issue?

Thank you,
Aman

Alfio Lazzaro

Apr 13, 2016, 7:22:27 AM
to cp2k
Dear Aman,
In your arch file I see that you are using the "-mavx" flag. Could you remove it and try to recompile? Your error message suggests that something is wrong with the flags used for the compilation (I assume the vector instructions), and from your last email it seems that something is wrong with PLUMED. Do you know how it was compiled?

Alfio
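
A SIGILL typically means the binary contains instructions that the CPU running it does not support, which is why "-mavx" is a suspect. As a minimal sketch, assuming a Linux node that exposes /proc/cpuinfo, the following checks whether a given flag such as avx is advertised. The function name has_cpu_flag is hypothetical, and on a Cray the check would have to run on a compute node (through the job launcher), since login and compute nodes may have different CPUs.

```shell
#!/bin/sh
# Report whether an instruction-set flag (e.g. avx, avx2) is listed in
# cpuinfo-style text. Reads /proc/cpuinfo by default; a different file
# can be given as the second argument.
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # -w matches whole words only, so "avx" does not match "avx2".
    if grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"; then
        echo "yes"
    else
        echo "no"
    fi
}

# Example: print the result for the machine this script runs on.
for f in avx avx2; do
    echo "$f: $(has_cpu_flag "$f")"
done
```

If the compute nodes report "no" for a flag that the arch file or the PLUMED build enables, that library would be a candidate for the illegal instruction.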

Aman Jindal

Apr 14, 2016, 3:10:28 AM
to cp...@googlegroups.com
Hello Alfio,
Thanks for your valuable response. I tried compiling cp2k after removing the "-mavx" flag, but I am getting the following error:
collect2: error: ld returned 1 exit status

Plumed was compiled using the following commands:
1) ./configure CXX=CC CC=cc F77=ftn F90=ftn  --prefix=/path/for/exe --enable-mpi
2) make
3) make install

Aman

Alfio Lazzaro

Apr 14, 2016, 5:23:14 AM
to cp2k
Interesting, there must be something before the error you are mentioning. Did you run "make clean" for CP2K before recompiling it with the changed flags?

Alfio
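
The "collect2: error: ld returned 1 exit status" line is only the linker's summary; the actual cause (for example an undefined reference) is printed just before it. A minimal sketch for pulling that context out of a saved build log, assuming the make output was captured to a file (for instance with "2>&1 | tee build.log"). The function name link_error_context and the log file name are hypothetical.

```shell
#!/bin/sh
# Print the lines leading up to the linker's summary line in a build log.
# $1: path to the captured build log
# $2: number of context lines to show before the match (default 20)
link_error_context() {
    log="$1"
    n="${2:-20}"
    # -B prints N lines of leading context; -n adds log line numbers.
    grep -n -B "$n" 'collect2: error' "$log"
}

# Example: run only when a log file is passed on the command line.
if [ -n "${1:-}" ]; then
    link_error_context "$1"
fi
```

Posting those preceding lines, rather than the collect2 line alone, makes the link failure much easier to diagnose.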

Aman Jindal

Apr 14, 2016, 5:38:10 AM
to cp...@googlegroups.com
Yes, I did that. Now I am compiling plumed once more, and then cp2k (without this flag).
I hope it will work.
Thank you

DEC014

Apr 14, 2016, 1:25:06 PM
to cp...@googlegroups.com
Aman,

I have run into very similar errors before on Cray machines (most recently a few Cray XC40 and XC30 machines). Are you getting a Cholesky error along the lines of "Cholesky decompose failed: the matrix is not positive definite or ill-conditioned" in your output files (not the error files)? My problems occurred when I tried statically linking the Intel MKL libraries for BLAS, LAPACK, ScaLAPACK, and BLACS. Even when I avoided the Cray cc wrapper, to exclude any conflicts from Libsci_gnu_49, I was still getting the error.

I see you are not using the MKL libraries, but you might want to try a different version of the LibSci module. Also, are you compiling on the batch nodes or the compute nodes? I've had programs misbehave when they weren't compiled inside the batch nodes. To compile in a batch node, submit an interactive PBS job (qsub -I -A <Allocation> -q debug -l select=1:ncpus=24:mpiprocs=24 -l walltime=0:30:00 -N cp2k_build -j oe -o cp2k.oe); of course you'll have to change ncpus and mpiprocs to match your Cray's processors. When building inside batch nodes, please do not use more than 4 CPUs for make (make -j 4). These batch nodes are really the PBS MOM nodes that dole out the PBS tasks to the compute nodes, and admins can get angry when you overload them. I have attached the architecture file that worked for my Cray XC40 build; it is for POPT, so you'd have to make the necessary changes to build a hybrid executable.

Regards,
David 
Cray-XC40-gfortran-mkl-smm.popt
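
The "no more than 4 CPUs for make" advice can be sketched as a small helper that caps the requested job count before the build command is assembled. This is only an illustration: capped_jobs and build_cmd are hypothetical names, the command is printed rather than executed, and the ARCH/VERSION values are taken from the attached arch file name and would need to match the file actually placed in the arch directory.

```shell
#!/bin/sh
# Cap the number of parallel make jobs so a shared batch (MOM) node
# is not overloaded. $1: requested jobs, $2: limit (default 4).
capped_jobs() {
    requested="$1"
    limit="${2:-4}"
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Assemble (but do not run) the CP2K build command.
# $1: requested jobs, $2: arch file base name, $3: version (popt, psmp, ...)
build_cmd() {
    echo "make -j $(capped_jobs "$1") ARCH=$2 VERSION=$3"
}

# Example: asking for 24 jobs still yields "make -j 4 ...".
build_cmd 24 Cray-XC40-gfortran-mkl-smm popt
```

Inside the interactive job, the printed command would be run from the directory that holds the CP2K Makefile.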

Aman Jindal

Apr 15, 2016, 8:02:14 AM
to cp...@googlegroups.com
Thank you David, I will look at this arch file and try to compile cp2k after modifying it. I think I was getting this Cholesky error when I first tried to compile cp2k with the Intel compilers.
Will be back soon with more specific details.
Thank you once again,
Aman

Aman Jindal

Apr 16, 2016, 6:55:12 AM
to cp2k
Hello CP2K Users,

I am able to compile CP2K on the Cray XC40 machine using the CRAY-XC30-gfortran.psmp arch file, but I could not compile it with cuda (using CRAY-XC30-gfortran-cuda.psmp). In both cases I linked plumed (installed beforehand) in the arch file. For the cuda version I am getting the following error:

nvcc -c -D__FFTW3 -D__parallel -D__SCALAPACK  -D__ACC -D__DBCSR_ACC -D__PLUMED2 -D__HAS_smm_dnn -O3 -arch sm_35 /mnt/lustre/ipc2/ipcvaish/cp2k-3.0/src/dbcsr/libsmm_acc/libcusmm/libcusmm.cu
Updating archive /mnt/lustre/ipc2/ipcvaish/cp2k-3.0/lib/CRAY-XC30-gfortran-cuda/psmp/libcusmm.a
ftn -c -D__FFTW3 -D__parallel -D__SCALAPACK  -D__ACC -D__DBCSR_ACC -D__PLUMED2 -D__HAS_smm_dnn -O3 -fopenmp  -funroll-loops -ffast-math -ftree-vectorize -ffree-form -ffree-line-length-512 -D__COMPILE_ARCH="\"CRAY-XC30-gfortran-cuda\"" -D__COMPILE_DATE="\"Sat Apr 16 05:34:10 CDT 2016\"" -D__COMPILE_HOST="\"clogin72\"" -D__COMPILE_REVISION="\"svn:16458\"" -D__DATA_DIR="\"/mnt/lustre/ipc2/ipcvaish/cp2k-3.0/data\"" -D__SHORT_FILE__="\"acc/acc_device.F\"" /mnt/lustre/ipc2/ipcvaish/cp2k-3.0/src/acc/acc_device.F
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
/mnt/lustre/ipc2/ipcvaish/cp2k-3.0/src/acc/../base/base_uses.f90:4.6:
    Included at /mnt/lustre/ipc2/ipcvaish/cp2k-3.0/src/acc/acc_device.F:11:

  USE base_hooks,                      ONLY: cp__a,&
      1
Fatal Error: Cannot read module file 'base_hooks.mod' opened at (1), because it was created by a different version of GNU Fortran
make[3]: *** [acc_device.o] Error 1
make[2]: *** [all] Error 2
make[1]: *** [psmp] Error 2
make: *** [all] Error 2
 
The arch files used for both cases are attached. Can you please figure out what is going wrong? I have been scratching my head over this for more than a week.

Thank you,
Aman
CRAY-XC30-gfortran.psmp
CRAY-XC30-gfortran-cuda.psmp
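
The "created by a different version of GNU Fortran" message means a .mod file left in the build tree was written by another gfortran than the one ftn is invoking now, which usually points to stale build products or a changed compiler module between builds. As a hedged sketch, the first line of a gfortran .mod file records the module format version, so it can be inspected directly. The function name mod_header is hypothetical, and the sketch assumes the file is either plain text or gzip-compressed, as gfortran module files are.

```shell
#!/bin/sh
# Print the first line of a gfortran .mod file, which states the module
# format version it was written with. Handles both gzip-compressed and
# plain-text module files.
mod_header() {
    f="$1"
    if gzip -t "$f" 2>/dev/null; then
        gzip -dc "$f" | head -n 1
    else
        head -n 1 "$f"
    fi
}

# Example: inspect every .mod file passed on the command line.
for f in "$@"; do
    echo "$f: $(mod_header "$f")"
done
```

If the headers differ between modules in the same build directory, removing the old objects and module files and rebuilding with one consistent set of loaded compiler modules should clear the error.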