CC = mpiicc
CPP =
FC = mpiifort
LD = mpiifort
AR = xiar -r
DFLAGS = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__MKL -D__FFTW3 -D__LIBINT -D__LIBXC2
CPPFLAGS =
FCFLAGS = $(DFLAGS) $(INC) -O3 -axAVX -xSSE4.2 -heap-arrays 64 -funroll-loops -fpp -free
FCFLAGS2 = $(DFLAGS) $(INC) -O1 -axAVX -xSSE4.2 -heap-arrays 64 -fpp -free
LDFLAGS = $(FCFLAGS)
LIBS = -L$(MKL_LIB) -Wl,-rpath,$(MKL_LIB) \
-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 \
-lmkl_sequential -lmkl_core \
$(FFTW_LIB)/libfftw3xf_intel.a \
$(LIBINT_LIB)/libderiv.a $(LIBINT_LIB)/libint.a -lstdc++ \
$(LIBXC_LIB)/libxc.a \
-lpthread -lm
OBJECTS_ARCHITECTURE = machine_intel.o
graphcon.o: graphcon.F
	$(FC) -c $(FCFLAGS2) $<
# To avoid a segfault, for example with HF exchange
qs_vxc_atom.o: qs_vxc_atom.F
	$(FC) -c $(FCFLAGS2) $<
With this arch file (test-O3), the regtest summary is:
Number of FAILED tests 56
Number of WRONG tests 18
Number of CORRECT tests 2559
Number of NEW tests 16
Total number of tests 2649
GREPME 56 18 2559 16 2649 X
If I do the same with -O2 instead of -O3 (test-O2):
Number of FAILED tests 0
Number of WRONG tests 17
Number of CORRECT tests 2616
Number of NEW tests 16
Total number of tests 2649
GREPME 0 17 2616 16 2649 X
So I assume that with -O3 some more files have to be compiled with -O1 (on top of the two already at -O1), since the FAILED tests are segfaults.
And 10 errors are "unacceptable", i.e. more than one order of magnitude above the tolerance: with a tolerance of 1e-14, I consider a relative error of 1e-13 OK, but not 1e-12.
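The "one order of magnitude" rule above can be written down as a small check. This is only a sketch of the criterion as described in this post, not part of the CP2K regtest scripts; the function names, the `max_orders` parameter and the `slack` value are made up for illustration.

```python
import math


def orders_above_tolerance(rel_error, tolerance):
    """Return log10(rel_error / tolerance): how many orders of magnitude
    the relative error sits above the tolerance (negative if below)."""
    if rel_error <= 0 or tolerance <= 0:
        raise ValueError("rel_error and tolerance must be positive")
    return math.log10(rel_error / tolerance)


def is_unacceptable(rel_error, tolerance, max_orders=1.0, slack=1e-9):
    """True if the error exceeds the tolerance by more than max_orders
    orders of magnitude. The small slack absorbs floating-point rounding,
    so that exactly one order (1e-13 vs 1e-14) still counts as OK."""
    return orders_above_tolerance(rel_error, tolerance) > max_orders + slack


print(is_unacceptable(1e-13, 1e-14))  # False: one order above, considered OK
print(is_unacceptable(1e-12, 1e-14))  # True: two orders above
```

The comparison is done on the logarithm rather than on the raw ratio so that the threshold reads directly as "orders of magnitude".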
And with -O1 instead of -O3 (test-O1):
Number of FAILED tests 0
Number of WRONG tests 81
Number of CORRECT tests 2568
Number of NEW tests 0
Total number of tests 2649
GREPME 0 81 2568 0 2649 X
More WRONG tests (9 are "unacceptable", but they are not the same tests as with -O2).
I've also tried -O2 on everything and -O1 on two files, as hinted by Iain Bethune in https://groups.google.com/forum/#!searchin/cp2k/intel$20$20after$3A2014$2F01$2F01/cp2k/YZ3gVI-6Au0/uJZC8QKSzxUJ (test-IB):
Number of FAILED tests 166
Number of WRONG tests 16
Number of CORRECT tests 2467
Number of NEW tests 0
Total number of tests 2649
GREPME 166 16 2467 0 2649 X
This setup is worse than the previous -O2 and -O1 arch files. I assume it was only valid for 2.5.1, as in the post.
I also used the arch file from http://support.euforia-project.eu/phi/popt/regtest-arch, but without -D__HAS_smm_dnn -D__HAS_LIBGRID (test-EPCC):
Number of FAILED tests 159
Number of WRONG tests 38
Number of CORRECT tests 2436
Number of NEW tests 16
Total number of tests 2649
GREPME 159 38 2436 16 2649 X
Many more FAILED tests: is this the influence of LIBGRID/smm_dnn? Or maybe the files compiled with -O1 aren't shown there. Or it's because it isn't the same compiler (XE 2015 vs XE 2013).
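To compare the five builds side by side, the GREPME lines quoted above can be parsed with a few lines of Python. This is a rough sketch: it assumes the five integers follow the same order as the printed summary (FAILED, WRONG, CORRECT, NEW, total), and the helper names are invented here.

```python
# Summary lines copied from the regtest outputs quoted above.
GREPME_LINES = {
    "test-O3": "GREPME 56 18 2559 16 2649 X",
    "test-O2": "GREPME 0 17 2616 16 2649 X",
    "test-O1": "GREPME 0 81 2568 0 2649 X",
    "test-IB": "GREPME 166 16 2467 0 2649 X",
    "test-EPCC": "GREPME 159 38 2436 16 2649 X",
}

# Assumed field order, taken from the "Number of ... tests" lines.
FIELDS = ("failed", "wrong", "correct", "new", "total")


def parse_grepme(line):
    """Return a dict of counts from one 'GREPME ...' summary line."""
    tokens = line.split()
    if len(tokens) < 6 or tokens[0] != "GREPME":
        raise ValueError(f"not a GREPME line: {line!r}")
    return dict(zip(FIELDS, (int(t) for t in tokens[1:6])))


def is_consistent(row):
    """Check that the four categories add up to the total."""
    return row["failed"] + row["wrong"] + row["correct"] + row["new"] == row["total"]


def rank_builds(lines):
    """Sort build names by FAILED count first, then by WRONG count."""
    rows = {name: parse_grepme(line) for name, line in lines.items()}
    return sorted(rows, key=lambda name: (rows[name]["failed"], rows[name]["wrong"]))


for name in rank_builds(GREPME_LINES):
    row = parse_grepme(GREPME_LINES[name])
    print(f"{name:10s} failed={row['failed']:4d} wrong={row['wrong']:4d} "
          f"correct={row['correct']:5d} consistent={is_consistent(row)}")
```

Sorting on FAILED first reflects the stated goal of having no FAILED tests before looking at WRONG ones.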
So I have some questions. The first goal is no FAILED tests while keeping the best speed: -O1 is clearly slower, but maybe the difference between -O3 and -O2 is next to nothing. Our cluster is small, so we need to push it to the limit, which is why we went for -O3 first.
- Is something wrong in our arch file?
- Has anyone managed to compile with -O3 (or -O2), with some files at -O1, using the Intel compiler 2013 (14.0.x versions) and without big errors? I deduced that graphcon.F and qs_vxc_atom.F must be compiled with -O1, but maybe others need it too, or need -O2.
- -O2 vs -O3?
- What can I do to see what's wrong in the FAILED/segfault tests? I know about -traceback -g, but what do I look for? (I'm no expert!) Also, is there an easy way to know which .F files are involved in each regtest?
- I also noticed that the big errors differ between the -O3/-O2/-O1 builds (the first 3 arch files I used). Given that, can I assume there is nothing wrong with libint/libxc/mkl, and that it's just the -O flags? The errors are:
-O3 + -O1 on graphcon.F and qs_vxc_atom.F (test -O3)
NEB/regtest-1/2gly_EB-NEB.inp.out
NEB/regtest-2/2gly_DIIS-SM.inp.out
NEB/regtest-2/2gly_DIIS-DNEB.inp.out
NEB/regtest-2/2gly_DIIS-NEB.inp.out
relative error : 2e-02 > numerical tolerance = 8e-12/-11/-13
Fist/regtest-3/water_2_TS_CG.inp.out
relative error : 2.21900214e-06 > numerical tolerance = 1.0E-14
QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out
relative error : 6.54370492e-02 > numerical tolerance = 1e-04
QS/regtest-almo-2/FH-chain.inp.out
relative error : 2.00884032e-10 > numerical tolerance = 1e-13
QS/regtest-almo-1/almo-x.inp.out
QS/regtest-almo-1/almo-guess.inp.out
QS/regtest-almo-1/almo-scf.inp.out
relative error : 6e-12 > numerical tolerance = 4/7/8e-14
SE/regtest-3-4/Al2O3.inp.out
relative error : 2.51373362e-05 > numerical tolerance = 6e-14
-O2 + -O1 on graphcon.F and qs_vxc_atom.F (---> Same errors as test-O3) (test -O2)
NEB/regtest-1/2gly_EB-NEB.inp.out
NEB/regtest-2/2gly_DIIS-SM.inp.out
NEB/regtest-2/2gly_DIIS-DNEB.inp.out
NEB/regtest-2/2gly_DIIS-NEB.inp.out
relative error : 2e-02 > numerical tolerance = 8e-12/-11/-13
Fist/regtest-3/water_2_TS_CG.inp.out :
relative error : 2.21900214e-06 > numerical tolerance = 1.0E-14
QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out
relative error : 6.54370492e-02 > numerical tolerance = 1e-04
QS/regtest-almo-2/FH-chain.inp.out
relative error : 2.00884032e-10 > numerical tolerance = 1e-13
QS/regtest-almo-1/almo-x.inp.out
QS/regtest-almo-1/almo-guess.inp.out
QS/regtest-almo-1/almo-scf.inp.out
relative error : 6e-12 > numerical tolerance = 4/7/8e-14
SE/regtest-3-4/Al2O3.inp.out
relative error : 2.51373362e-05 > numerical tolerance = 6e-14
-O1 on all (---> different errors from test-O3/-O2) (test -O1)
QS/regtest-ps-implicit-1-3/Ar_mixed_planar.inp.out
relative error : 1.02615640e-09 > numerical tolerance = 1e-12
QS/regtest-ps-implicit-2-2/H2O_mixed_periodic_planar.inp.out :
relative error : 3.64315287e-07 > numerical tolerance = 1e-12
QS/regtest-ps-implicit-2-3/H2O_mixed_periodic_cylindrical.inp.out :
relative error : 3.99727816e-07 > numerical tolerance = 1e-12
QS/regtest-ps-implicit-1-2/Ar_mixed_periodic_planar.inp.out
relative error : 1.63406231e-06 > numerical tolerance = 1e-12
QS/regtest-admm-4/MD-1.inp.out
relative error : 6.79583221e-11 > numerical tolerance = 7e-13
QS/regtest-admm-4/MD-2_no_OT.inp.out
relative error : 1.05116397e-11 > numerical tolerance = 1.0E-14
Fist/regtest-3/2d_pot.inp.out
relative error : 2.56003763e-01 > numerical tolerance = 5e-06
Fist/regtest-1-2/deca_ala_reftraj.inp.out
relative error : 5.45234681e-12 > numerical tolerance = 1.0E-14
Fist/regtest-4/H2O-meta-combine.inp.out
relative error : 2.41671120e-02 > numerical tolerance = 1.0E-14
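The claim that the -O1 build fails on different tests than the -O3/-O2 builds can be checked mechanically from the lists above. A minimal sketch, with the test paths copied from this post (the ".inp.out" suffix is dropped) and illustrative helper names:

```python
from collections import Counter

# "Unacceptable" tests listed above for test-O3 and test-O2 (same list for both).
O3_O2_WRONG = [
    "NEB/regtest-1/2gly_EB-NEB",
    "NEB/regtest-2/2gly_DIIS-SM",
    "NEB/regtest-2/2gly_DIIS-DNEB",
    "NEB/regtest-2/2gly_DIIS-NEB",
    "Fist/regtest-3/water_2_TS_CG",
    "QS/regtest-ri-mp2/opt_basis_O_auto_gen",
    "QS/regtest-almo-2/FH-chain",
    "QS/regtest-almo-1/almo-x",
    "QS/regtest-almo-1/almo-guess",
    "QS/regtest-almo-1/almo-scf",
    "SE/regtest-3-4/Al2O3",
]

# "Unacceptable" tests listed above for test-O1.
O1_WRONG = [
    "QS/regtest-ps-implicit-1-3/Ar_mixed_planar",
    "QS/regtest-ps-implicit-2-2/H2O_mixed_periodic_planar",
    "QS/regtest-ps-implicit-2-3/H2O_mixed_periodic_cylindrical",
    "QS/regtest-ps-implicit-1-2/Ar_mixed_periodic_planar",
    "QS/regtest-admm-4/MD-1",
    "QS/regtest-admm-4/MD-2_no_OT",
    "Fist/regtest-3/2d_pot",
    "Fist/regtest-1-2/deca_ala_reftraj",
    "Fist/regtest-4/H2O-meta-combine",
]


def shared_tests(a, b):
    """Tests that appear in both lists."""
    return set(a) & set(b)


def by_module(paths):
    """Count tests per top-level regtest directory (NEB, Fist, QS, SE)."""
    return Counter(path.split("/", 1)[0] for path in paths)


print("shared:", sorted(shared_tests(O3_O2_WRONG, O1_WRONG)))
print("-O3/-O2:", dict(by_module(O3_O2_WRONG)))
print("-O1:", dict(by_module(O1_WRONG)))
```

Grouping by top-level directory gives a quick view of which parts of the code are affected by each optimization level.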
Any help/hint/info/experience will be well received.
We also have gcc/gfortran on the cluster. Is Intel faster for CP2K, or roughly the same as GCC?
Thank you for your time if you've read all this !
Kind regards,
Rolf David
./generate -c config/linux.intel -j 0 -t 16 tiny1
Generate master file output_linux.intel/tiny_find_1_1_1__24_24_24.f90
make: execvp: /bin/sh: Argument list too long
make: *** [output_linux.intel/tiny_find_1_1_1__24_24_24.f90] Error 127
Running it verbosely shows that the problem comes from this command:
make -j 16 -f ../Makefile.tiny_dnn_linux.intel all SI=1 EI=13824
and ifort died.
If anybody has any idea of what I'm doing wrong, I'll be glad (and relieved!)
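One possible reading of the "Argument list too long" error, offered as a guess rather than a diagnosis: EI=13824 equals 24*24*24, which matches the 1_1_1__24_24_24 range in the master file name, and make hands each recipe to /bin/sh as a single argument. If that recipe names one file per (m, n, k) triple, it could exceed Linux's per-argument limit. The sketch below only estimates the size; the per-kernel file name pattern is hypothetical, and the 128 KiB limit assumes 4 KiB pages.

```python
# Range taken from the file name tiny_find_1_1_1__24_24_24 above.
M_MAX = N_MAX = K_MAX = 24

# Number of (m, n, k) triples; this matches EI=13824 in the make command.
n_kernels = M_MAX * N_MAX * K_MAX


def hypothetical_name(m, n, k):
    """Made-up per-kernel file name, used only to estimate string length."""
    return f"output_linux.intel/tiny_find_{m}_{n}_{k}.f90"


# Bytes needed to list every name once, separated by single spaces.
total_bytes = sum(
    len(hypothetical_name(m, n, k)) + 1
    for m in range(1, M_MAX + 1)
    for n in range(1, N_MAX + 1)
    for k in range(1, K_MAX + 1)
)

# Linux limit on one argument string (32 pages), assuming 4 KiB pages.
MAX_ARG_STRLEN = 32 * 4096

print("kernels:", n_kernels)
print("estimated recipe size:", total_bytes, "bytes")
print("exceeds single-argument limit:", total_bytes > MAX_ARG_STRLEN)
```

If this is the cause, a smaller range than tiny1 covers would be expected to stay under the limit, which is easy to test.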
Kind regards,
Rolf
> cat no.wlm
batch_cmd() {
$@
}
Therefore you can use "-j 100 -w no".
Note that I have never tried such a case, so I'm not sure it will work out of the box. Let me know how it goes.
FC_comp="ifort -free -fpp"
FCFLAGS="-O3 -axAVX -xSSE4.2 -funroll-loops"
& make -j 1, alone on a node (16 cores, but only 1 used)
and then H2O-64.inp, 16 cores.
Hi Rolf,
libgrid should affect the performance in calculate_rho_elec and integrate_v_rspace. For H2O-64 on low processor counts (e.g. 24) these should take up >50% of the runtime. Larger H2O-XXX tend to mask the effect as the cubically scaling operations in OT tend to dominate.
- Iain
--
Iain Bethune
Project Manager, EPCC