building hybrid MPI+OpenMP


Steve Schmerler

Jul 8, 2013, 11:09:46 AM
to cp2k-users
Hello

I'm trying to compile a hybrid MPI+OpenMP version with

* ifort 12.1
* mkl 10.3
* intel MPI 4.0.3
* fftw 3.3.3 threaded
* scalapack from mkl

The arch file:

-----------------------------------------------------------------------
MKL_LIB=$(MKLROOT)/lib/intel64
MKL_INC=$(MKLROOT)/include
FFTW_LIB=/home/schmerler/soft/lib/fftw/intel/3.3.3/lib
FFTW_INC=/home/schmerler/soft/lib/fftw/intel/3.3.3/include
CC = cc
CPP =
FC = mpiifort
LD = mpiifort
AR = ar -r
DFLAGS = -D__INTEL -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O2 -free -heap-arrays 64 -funroll-loops -fpp -axAVX \
-openmp -mt_mpi -I$(FFTW_INC)
FCFLAGS2 = $(DFLAGS) -O1 -free
LIBS = -lfftw3_threads -lfftw3 -liomp5 \
-lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 \
-lmkl_intel_lp64 -lmkl_core -lmkl_sequential
LDFLAGS = $(FCFLAGS) -L$(FFTW_LIB) -I$(FFTW_INC) -L$(MKL_LIB) -I$(MKL_INC) $(LIBS)

OBJECTS_ARCHITECTURE = machine_intel.o
graphcon.o: graphcon.F
$(FC) -c $(FCFLAGS2) $<
-----------------------------------------------------------------------

I see different errors, depending on which combo of MPI tasks and threads is
used:

* OMP_NUM_THREADS=1, mpirun -np 1

*****************************************************
*** ERROR in cp_fm_syevd_base (MODULE cp_fm_diag) ***
*****************************************************

*** Matrix diagonalization failed ***

*** Program stopped at line number 384 of MODULE cp_fm_diag ***

===== Routine Calling Stack =====

10 cp_fm_syevd_base
9 cp_fm_syevd
8 cp_dbcsr_syevd
7 subspace_eigenvalues_ks_dbcsr
6 prepare_preconditioner
5 init_scf_loop
4 scf_env_do_scf
3 qs_energies_scf
2 qs_forces
1 CP2K

* OMP_NUM_THREADS=1, mpirun -np 4

MKL ERROR: Parameter 4 was incorrect on entry to DLASCL
{ 1, 1}: On entry to
DSTEQR parameter number -3 had an illegal value
MKL ERROR: Parameter 5 was incorrect on entry to DLASCL
{ 0, 0}: On entry to
DSTEQR parameter number -3 had an illegal value

I have seen this one before; the reason was that the input geometry plus
the basis used produced NaNs, which were apparently passed to a
ScaLAPACK call. However, this input is OK and works with a pure-MPI
build. Therefore, I guess that the OpenMP code calculates something
wrong: in the 1-core case a serial LAPACK call then fails, while in the
parallel case a ScaLAPACK call does.

* OMP_NUM_THREADS=4, mpirun -np 1

Output hangs at "Extrapolation method: initial_guess"; only one MPI task
is running, and no threads are active.

I wanted to blame the MPI library, but Intel MPI says it supports
MPI_THREAD_FUNNELED. The same happens if I link only fftw3, not
fftw3_threads, so it's probably not FFTW either. So am I linking some
libraries wrong (in which case the problem is probably completely
trivial and I just don't see it)?
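One way to rule the MPI library in or out is to ask it at run time which thread level it actually provides, rather than trusting the documentation. A minimal standalone sketch (not CP2K code; compile with mpiicc/mpicc and run under mpirun):

```c
#include <mpi.h>
#include <stdio.h>

/* Query the thread support level the linked MPI library actually
   provides. With Intel MPI, linking without -mt_mpi may give a
   library that only provides MPI_THREAD_SINGLE. */
int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided >= MPI_THREAD_FUNNELED)
        printf("MPI_THREAD_FUNNELED (or higher) provided: %d\n", provided);
    else
        printf("insufficient thread support, provided level: %d\n", provided);
    MPI_Finalize();
    return 0;
}
```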

Thank you for your help!

best,
Steve

--
Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany

Iain Bethune

Jul 15, 2013, 4:45:06 AM
to cp...@googlegroups.com
Hi Steve,

Try rebuilding without using the -heap-arrays 64 flag, as I have found problems with this in combination with OpenMP before (but not yet got to the root cause). Let me know if it helps or not!

Thanks

- Iain

--

Iain Bethune
Project Manager, EPCC

Email: ibet...@epcc.ed.ac.uk
Twitter: @IainBethune
Web: http://www2.epcc.ed.ac.uk/~ibethune
Tel/Fax: +44 (0)131 650 5201/6555
Mob: +44 (0)7598317015
Addr: 2404 JCMB, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JZ


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Steve Schmerler

Jul 18, 2013, 9:45:17 AM
to cp...@googlegroups.com
On Jul 15 09:45 +0100, Iain Bethune wrote:
> Hi Steve,

[Sorry for the delayed response.]

> Try rebuilding without using the -heap-arrays 64 flag, as I have found problems with this in combination with OpenMP before (but not yet got to the root cause). Let me know if it helps or not!

No, unfortunately not. It only stops the output from hanging in the one
case mentioned; instead it shows the same MKL error as all the other
cases. Even with a very reduced arch file (no FFTW, no fancy compiler
flags), I get the same errors:

------------------------------------------------------------------
CC = cc
CPP =
FC = mpiifort
LD = mpiifort
AR = ar -r
DFLAGS = -D__INTEL -D__parallel -D__BLACS -D__SCALAPACK -D__FFTSG
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O2 -free -fpp -openmp
FCFLAGS2 = $(DFLAGS) -O1 -free
LIBS = -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 \
-lmkl_intel_lp64 -lmkl_core -lmkl_sequential -liomp5
------------------------------------------------------------------

I'm out of ideas right now. Luckily, the MPI-only version is currently
fast enough for our systems on the number of cores where it scales.
Thanks.

best,
Steve

--
Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany

Iain Bethune

Jul 18, 2013, 9:55:42 AM
to cp...@googlegroups.com
You also need to ensure the -openmp flag gets passed to the linker; that doesn't seem to be the case in your arch file (although it was in the earlier one). I'm afraid I can't think of anything else obviously wrong. We recently used the arch file below on another Intel system and it worked OK. The major differences are that we statically linked everything and used libfftw3_omp rather than _threads. You could try that and see if it helps.

Cheers

- Iain

CC = cc
CPP =
FC = mpiifort -openmp
LD = mpiifort -openmp
AR = ar -r

FFTW_DIR = /users/fiona/fftw3_intel_threaded
LIBXC_DIR = /users/fiona/lib/libxc-intel
LIBINT_DIR = /users/fiona/lib/libint-intel
LIBSMM_DIR = /users/fiona/lib/cp2klibs-intel
LIBGRID_DIR = /users/fiona/lib/cp2klibs-intel

CPPFLAGS =
DFLAGS = -D__INTEL -D__FFTSG -D__FFTW3 -D__LIBINT -D__LIBXC2 -D__parallel \
-D__BLACS -D__SCALAPACK -D__HAS_smm_dnn -D__HAS_LIBGRID
CFLAGS = $(DFLAGS)
# Version 13.1.0 of compiler
MKLROOT = /apps/dommic/intel/composer_xe_2013.2.146/mkl/
FCFLAGS = $(DFLAGS) -O2 -g -traceback -fpp -free \
-I$(FFTW_DIR)/include
LDFLAGS = $(FCFLAGS) -static-intel
LIBS = $(FFTW_DIR)/lib/libfftw3.a $(FFTW_DIR)/lib/libfftw3_omp.a \
$(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a \
-Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
$(MKLROOT)/lib/intel64/libmkl_sequential.a \
$(MKLROOT)/lib/intel64/libmkl_core.a \
$(MKLROOT)/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group \
-lpthread -lm \
$(LIBINT_DIR)/lib/libderiv.a $(LIBINT_DIR)/lib/libint.a -lstdc++ \
-L$(LIBXC_DIR)/lib -lxc \
$(LIBSMM_DIR)/libsmm_dnn.a $(LIBGRID_DIR)/libgrid.a

OBJECTS_ARCHITECTURE = machine_intel.o


Steve Schmerler

Jul 29, 2013, 5:27:57 PM
to cp...@googlegroups.com
On Jul 18 14:55 +0100, Iain Bethune wrote:
> You also need to ensure the -openmp flag gets passed to the linker,
> that doesn't seem to be the case in your arch file (although it was in
> the earlier one). I'm afraid I can't think of anything else obviously
> wrong. We used an arch file (below) recently on another intel system
> and it worked OK. The major differences are we static linked
> everything, and used the libfftw3_omp rather than _threads. You could
> try and see if that helps any?

Thank you for the detailed arch file, which was very helpful. However,
no luck so far. I tested static vs. dynamic linking, libfftw3_omp vs.
libfftw3_threads, ..., though without the extra libraries
lib{smm_dnn,grid,int,deriv,xc}, which are all optional, as I understand
it.

I suspect my build environment is the reason.

What version of Intel compilers and MPI was used on the machine you
mentioned? I used ifort 12.1, MPI 4.0.3, MKL 10.3

Thanks again.

best,
Steve

--
Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany

Steve Schmerler

Jul 29, 2013, 5:33:23 PM
to cp...@googlegroups.com
On Jul 29 23:27 +0200, Steve Schmerler wrote:
> What version of Intel compilers and MPI was used on the machine you
> mentioned? I used ifort 12.1, MPI 4.0.3, MKL 10.3

Sorry, it's in your arch file (ifort 13.1), so I assume MKL 11 and Intel
MPI 4.1.

Iain Bethune

Aug 5, 2013, 4:42:41 AM
to cp...@googlegroups.com
Hi Steve,

We used:
ifort 2013.2.146 (13.1.0)
impi 4.1.0.030
MKL 11.0.2

However, it's worth being aware that there are some regressions in the more recent versions of the compiler relating to OpenMP in CP2K that are currently fixed only in the 14.1 beta release, but if you use exactly these versions you should be OK!

Cheers

- Iain




Steve Schmerler

Aug 5, 2013, 8:05:41 AM
to cp...@googlegroups.com
On Aug 05 09:42 +0100, Iain Bethune wrote:
> Hi Steve,
>
> We used:
> ifort 2013.2.146 (13.1.0)
> impi 4.1.0.030
> MKL 11.0.2
>
> However, it's worth being aware that there are some regressions in the more recent versions of the compiler relating to OpenMP in CP2K that are currently fixed only in the 14.1 beta release, but if you use exactly these versions you should be OK!

Thank you. Unfortunately, the software on the Intel system I used cannot
be upgraded. I'll test on some other systems (OpenMPI-based). Most
likely I'll need to build a custom OpenMPI with thread support on almost
all of them. I'll report back when I get a version running.

best,
Steve

--
Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany