Compilation with Intel (XE 2013) of CP2K trunk (2.7-dev) & regtest errors


Rolf David

Jun 10, 2015, 6:38:25 AM
to cp...@googlegroups.com
Hi all

I've encountered several problems compiling CP2K (trunk rev. 15402, popt) with the Intel compiler/MPI/MKL stack (icc/ifort 14.0.2, MPI 4.1 Update 2, MKL 11.1.2).

First my "out of the box" arch file (libint is 1.1.4, libxc 2.0.1):
 
CC       = mpiicc
CPP      =
FC       = mpiifort
LD       = mpiifort
AR       = xiar -r
DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__MKL -D__FFTW3 -D__LIBINT -D__LIBXC2
CPPFLAGS =
FCFLAGS  = $(DFLAGS) $(INC) -O3 -axAVX -xSSE4.2 -heap-arrays 64 -funroll-loops -fpp -free
FCFLAGS2 = $(DFLAGS) $(INC) -O1 -axAVX -xSSE4.2 -heap-arrays 64 -fpp -free
LDFLAGS  = $(FCFLAGS)
LIBS     = -L$(MKL_LIB) -Wl,-rpath,$(MKL_LIB) \
           -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 \
           -lmkl_sequential -lmkl_core \
           $(FFTW_LIB)/libfftw3xf_intel.a \
           $(LIBINT_LIB)/libderiv.a $(LIBINT_LIB)/libint.a -lstdc++ \
           $(LIBXC_LIB)/libxc.a \
           -lpthread -lm
OBJECTS_ARCHITECTURE = machine_intel.o

graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<

# In order to avoid segv with HF exchange, for example
qs_vxc_atom.o: qs_vxc_atom.F
        $(FC) -c $(FCFLAGS2) $<

We are calling it test-O3 
 

Number of FAILED  tests 56
Number of WRONG   tests 18
Number of CORRECT tests 2559
Number of NEW     tests 16
Total number of   tests 2649
GREPME 56 18 2559 16 2649 X

Most of the FAILED tests are in Fist (regtest-5, -12, -pol, -6, -15, -1-3, -4, -1-2, -2, -8, -9, -11), plus QS/regtest-ot/H2-BECKE-MD.inp, QMMM/SE/regtest/mol_CSVR_gen*.inp and QMMM/SE/regtest_2/water_g3x3_excl_*m.inp.


If I do the same with -O2 instead of -O3 (test-O2)


Number of FAILED  tests 0
Number of WRONG   tests 17
Number of CORRECT tests 2616
Number of NEW     tests 16
Total number of   tests 2649
GREPME 0 17 2616 16 2649 X


So I assume some files have to be compiled at lower optimisation (on top of the two already at -O1) -> otherwise they FAIL with segfaults.

And 10 errors are "unacceptable" (off by more than one order of magnitude: a relative error of 1e-13 against a tolerance of 1e-14 I consider OK, but not 1e-12).


and -O1 instead of -O3 (test-O1)


Number of FAILED  tests 0
Number of WRONG   tests 81
Number of CORRECT tests 2568
Number of NEW     tests 0
Total number of   tests 2649
GREPME 0 81 2568 0 2649 X


More WRONG results (9 are "unacceptable", but different ones from -O2).



Also, I've tried -O2 on everything and -O1 on two files, as hinted by Iain Bethune in https://groups.google.com/forum/#!searchin/cp2k/intel$20$20after$3A2014$2F01$2F01/cp2k/YZ3gVI-6Au0/uJZC8QKSzxUJ (test-IB):


Number of FAILED  tests 166
Number of WRONG   tests 16
Number of CORRECT tests 2467
Number of NEW     tests 0
Total number of   tests 2649
GREPME 166 16 2467 0 2649 X


This setup is worse than the previous -O2/-O1 combination. I assume it was only valid for 2.5.1, as in that post.


I also tried the arch file from http://support.euforia-project.eu/phi/popt/regtest-arch (but without -D__HAS_smm_dnn -D__HAS_LIBGRID) (test-EPCC):


Number of FAILED  tests 159
Number of WRONG   tests 38
Number of CORRECT tests 2436
Number of NEW     tests 16
Total number of   tests 2649
GREPME 159 38 2436 16 2649 X


Many more FAILED tests: influence of LIBGRID/smm_dnn? Or maybe the files compiled at -O1 aren't shown. Or perhaps because it isn't the same compiler (XE 2015 vs XE 2013).



So I have some questions. (The first goal is no FAILED tests while keeping the best speed: -O1 is clearly slower, but maybe the difference between -O3 and -O2 is next to nothing. Our cluster is small, so we need to push it to the limit, which is why we went for -O3 first.)


- Is something wrong in our arch file?

- Has anyone managed to compile at -O3 (or -O2), with some files at -O1 (I deduced graphcon.F and qs_vxc_atom.F must be compiled at -O1, but maybe there are others, or some at -O2), with the Intel 2013 compiler (14.0.x) and no big errors?

- -O2 vs -O3?

- What can I do to see what's wrong in the FAILED/segfaulting tests? Add -traceback -g, but then what do I look for? (I'm no expert!) Also, is there an easy way to know which 'file.F' sources are exercised by each regtest?


- Also, I noticed that the big errors differ between -O3/-O2/-O1 (the first 3 arch files I used); given that, can I assume there is nothing wrong with libint/libxc/MKL, just the -O flags? :


-O3 + -O1 on  graphcon.F and qs_vxc_atom.F (test -O3)

NEB/regtest-1/2gly_EB-NEB.inp.out 

NEB/regtest-2/2gly_DIIS-SM.inp.out 

NEB/regtest-2/2gly_DIIS-DNEB.inp.out

NEB/regtest-2/2gly_DIIS-NEB.inp.out 

relative error :   2e-02 >  numerical tolerance = 8e-12/-11/-13

Fist/regtest-3/water_2_TS_CG.inp.out 

relative error :   2.21900214e-06 >  numerical tolerance = 1.0E-14

QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out

relative error :   6.54370492e-02 >  numerical tolerance = 1e-04

QS/regtest-almo-2/FH-chain.inp.out

relative error :   2.00884032e-10 >  numerical tolerance = 1e-13

QS/regtest-almo-1/almo-x.inp.out

QS/regtest-almo-1/almo-guess.inp.out

QS/regtest-almo-1/almo-scf.inp.out 

 relative error :   6e-12 >  numerical tolerance = 4/7/8e-14

SE/regtest-3-4/Al2O3.inp.out

 relative error :   2.51373362e-05 >  numerical tolerance = 6e-14


-O2 + -O1 on  graphcon.F and qs_vxc_atom.F (---> Same errors as test-O3) (test -O2)

NEB/regtest-1/2gly_EB-NEB.inp.out 

NEB/regtest-2/2gly_DIIS-SM.inp.out 

NEB/regtest-2/2gly_DIIS-DNEB.inp.out

NEB/regtest-2/2gly_DIIS-NEB.inp.out 

relative error :   2e-02 >  numerical tolerance = 8e-12/-11/-13

Fist/regtest-3/water_2_TS_CG.inp.out :

relative error :   2.21900214e-06 >  numerical tolerance = 1.0E-14

QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out

relative error :   6.54370492e-02 >  numerical tolerance = 1e-04

QS/regtest-almo-2/FH-chain.inp.out 

relative error :   2.00884032e-10 >  numerical tolerance = 1e-13

QS/regtest-almo-1/almo-x.inp.out

QS/regtest-almo-1/almo-guess.inp.out

QS/regtest-almo-1/almo-scf.inp.out 

 relative error :   6e-12 >  numerical tolerance = 4/7/8e-14

SE/regtest-3-4/Al2O3.inp.out

 relative error :   2.51373362e-05 >  numerical tolerance = 6e-14


-O1 on all (---> Different errors from test-O3/-O2) (test-O1)

QS/regtest-ps-implicit-1-3/Ar_mixed_planar.inp.out

 relative error :   1.02615640e-09 >  numerical tolerance = 1e-12

QS/regtest-ps-implicit-2-2/H2O_mixed_periodic_planar.inp.out :

 relative error :   3.64315287e-07 >  numerical tolerance = 1e-12

QS/regtest-ps-implicit-2-3/H2O_mixed_periodic_cylindrical.inp.out :

 relative error :   3.99727816e-07 >  numerical tolerance = 1e-12

QS/regtest-ps-implicit-1-2/Ar_mixed_periodic_planar.inp.out

 relative error :   1.63406231e-06 >  numerical tolerance = 1e-12

QS/regtest-admm-4/MD-1.inp.out

 relative error :   6.79583221e-11 >  numerical tolerance = 7e-13

QS/regtest-admm-4/MD-2_no_OT.inp.out

relative error :   1.05116397e-11 >  numerical tolerance = 1.0E-14

Fist/regtest-3/2d_pot.inp.out

relative error :   2.56003763e-01 >  numerical tolerance = 5e-06

Fist/regtest-1-2/deca_ala_reftraj.inp.out

 relative error :   5.45234681e-12 >  numerical tolerance = 1.0E-14

Fist/regtest-4/H2O-meta-combine.inp.out

 relative error :   2.41671120e-02 >  numerical tolerance = 1.0E-14



Any help/hint/info/experience will be well received.


Also, we have gcc/gfortran on the cluster. Is Intel faster for CP2K, or roughly the same as GCC?


Thank you for your time if you've read all this !


Kind regards,


Rolf David


Iain Bethune

Jun 10, 2015, 11:26:18 AM
to cp...@googlegroups.com
Hi Rolf,

Let me have a go at answering some of your (many) questions!

* In some of my (not-so-recent) testing, I found that the choice of gfortran or ifort (or the precise optimisation levels of each) makes very little difference to the performance of the code. In practice, as long as you compile for the vector instruction set of your CPU, most of the runtime gains come from 5 places: a well-tuned BLAS/LAPACK/BLACS/ScaLAPACK stack (MKL is of similar quality to Cray LibSci here); FFTW3 (or MKL’s FFTW3 interface); libgrid (see cp2k/tools/autotune_grid); libsmm (see cp2k/tools/build_libsmm); and a good MPI library/interconnect.

* CP2K is very well-tested with gfortran (in fact I think we test nightly devel builds of gcc with CP2K trunk), so it’s possible to compile CP2K at -O3 with every GCC since 4.6. We also have good coverage of Intel (currently the 15.x compilers), and we do report bugs, which do get fixed, but only after the release of beta compilers, so the turnaround time is much longer and some outstanding bugs remain (relating to OpenMP only).

* It’s not clear from your email re: testing with -O2 - your FAILED count is 0, but you said that some tests fail with segfaults - from my notes I believe the minimal set of files which must be compiled at low optimisation with ifort 14.0.2 are :

external_potential_types.F -O1
qs_linres_current.F -O1
qs_vxc_atom.F -O1
mp2_optimize_ri_basis.F -O0 (I believe a code change was made that should allow this to be compiled with -O1, we are still waiting for Intel to fix the relevant compiler bug though).
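For reference, these overrides can be expressed as extra per-file rules in the arch file, in the same style as the graphcon/qs_vxc_atom rules earlier in the thread. A sketch only: FCFLAGS1 and FCFLAGS0 are assumed names for -O1/-O0 variants of the flag set (recipe lines must be tab-indented in a real makefile).

```make
# Assumed low-optimisation flag sets: identical to FCFLAGS except for -O
FCFLAGS1 = $(DFLAGS) $(INC) -O1 -axAVX -xSSE4.2 -heap-arrays 64 -fpp -free
FCFLAGS0 = $(DFLAGS) $(INC) -O0 -axAVX -xSSE4.2 -heap-arrays 64 -fpp -free

# Files that miscompile with ifort 14.0.2 at higher optimisation
external_potential_types.o: external_potential_types.F
        $(FC) -c $(FCFLAGS1) $<
qs_linres_current.o: qs_linres_current.F
        $(FC) -c $(FCFLAGS1) $<
qs_vxc_atom.o: qs_vxc_atom.F
        $(FC) -c $(FCFLAGS1) $<
mp2_optimize_ri_basis.o: mp2_optimize_ri_basis.F
        $(FC) -c $(FCFLAGS0) $<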

* Re: numerical errors, these do not mean very much - many of the tests are not converged, do not use production quality settings, or are otherwise numerically unstable (so they run quickly). Thus they will vary, sometimes by more than several orders of magnitude between different optimisation settings, compiler versions etc. Thus the ‘WRONG’ count should only really be used as a regression test when code changes are made that should not affect the numerical behaviour of the code.

* If you do want to compare against a fixed reference, then the gfortran-pdbg (-O1) is a reasonable baseline - see http://dashboard.cp2k.org/archive/mkrack-pdbg/index.html . The do_regtest script now prints the values that are compared in the test output. However, to be sure you really need to run some larger simulations and check that your results are physically sensible (which of course we should always do!).

I hope that’s helpful!

Cheers

- Iain

--

Iain Bethune
Project Manager, EPCC

Email: ibet...@epcc.ed.ac.uk
Twitter: @IainBethune
Web: http://www2.epcc.ed.ac.uk/~ibethune
Tel/Fax: +44 (0)131 650 5201/6555
Mob: +44 (0)7598317015
Addr: 2404 JCMB, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, EH9 3FD
> --
> You received this message because you are subscribed to the Google Groups "cp2k" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to cp2k+uns...@googlegroups.com.
> To post to this group, send email to cp...@googlegroups.com.
> Visit this group at http://groups.google.com/group/cp2k.
> For more options, visit https://groups.google.com/d/optout.


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Rolf David

Jun 10, 2015, 12:32:38 PM
to cp...@googlegroups.com
Hi Iain,

It really is helpful!
I'll respond/comment point by point, if you allow me.

Points 1 & 4/5 :

I intended to use the regtests as a baseline for optimising CP2K on our cluster.
The first goal was to have a working "default" build, and then to add enhancements: the two you mentioned (libgrid and libsmm), plus testing ELPA (and maybe later moving on to MPI/OpenMP, mostly for HF and the memory needed for the integrals).
So if -O2 vs -O3 is not a determining factor for speed with ifort, I will stick to -O2 plus a few files at -O1/-O0.

Point 2:

Since we have Intel processors and Intel MKL/MPI, I thought ifort was the obvious choice over gfortran. Maybe I was also duped by Intel advertising ("on an Intel machine, Intel is best"), and by the idea that if it costs money it should be better (guess I was very wrong in this case, and relieved they didn't buy it just for that). Maybe it's also because I'm a theoretical chemist and this is a new world for me (compilation/optimisation vs. use), but I'm learning!

Maybe later, for the OpenMP part, I'll go with gfortran (still with Intel MKL/MPI, because they are better than other BLAS/ScaLAPACK stacks and OpenMPI?).

Point 3: 

Yes, I was confused too when I reread it (which arch file gives which regtest result): by "So I assume some files... -> fail segfault" I meant some files have to be compiled with -O2 (for the -O3 build to work). But since there is little difference in speed (from my quick benchmarking, and from your tests too), I'll start with everything at -O2 and test with the 4 files you provided, one by one. I know that qs_vxc_atom.F is vital, so I'll start from there.

From the tests I already ran, I've found the H2-BECKE-MD.inp test needs et_coupling.F compiled at -O2 (instead of -O3) to avoid a FAILED result, but if everything is at -O2 I guess this problem goes away! (I'll also try to find the others, and whether they are dependent/independent.)

Point 4:

Ok. But I was surprised about that. 

For example, the 4 NEB tests are:
CORRECT at -O1, WRONG at -O2 (or -O3), compared to the EPCC Hydra cluster popt reference,
but
WRONG at -O1, CORRECT at -O2 (or -O3), compared to the Sheffield Iceberg cluster popt reference.

So... I was confused.

And back to point 5:

I'll test against that also. Thanks for the advice.

I have some long simulations (AIMD, good agreement with experiment) to check afterwards, but the regtests were a way of quickly catching segfaults/large errors and raising a warning: if there are no WRONG results, be careful; if there is one WRONG, be extremely careful with that type of calculation. But mostly it's to check that I don't break old things by changing libraries/optimisation, and maybe even code.


So I'll stick with ifort -O2, maybe later test gfortran -O3.

Anyway thanks for throwing some (a lot of !) light on my problems/questions.

Kind regards

Rolf

Rolf David

Jun 12, 2015, 2:26:47 PM
to cp...@googlegroups.com
Hi,

The advice worked well. I'm now error-free.

I've moved on to compiling libsmm and libgrid.

Now I've run into another problem: the generation of libsmm.

The tiny part:

./generate -c config/linux.intel -j 0 -t 16 tiny1


Generate master file output_linux.intel/tiny_find_1_1_1__24_24_24.f90
make: execvp: /bin/sh: Argument list too long
make: *** [output_linux.intel/tiny_find_1_1_1__24_24_24.f90] Error 127



Running make verbosely shows the problem comes from this command:


make -j 16 -f ../Makefile.tiny_dnn_linux.intel all SI=1 EI=13824


and ifort died 


if anobody has any idea of what I'm doing wrong, I'll be glad (and relieved !)


Kind regards


Rolf

Frank Uhlig

Jun 13, 2015, 4:18:37 AM
to cp...@googlegroups.com
Hi Rolf,

that is a system limitation and I am not completely sure about the proper workaround. The maximum length of your command line is either 128 KB or 1/4 of your stack size (if linux kernel >= 2.6.23). So you could try increasing your stack limit (e.g., ulimit -s unlimited) and then run the generate command again. (also see: http://man7.org/linux/man-pages/man2/execve.2.html)
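A quick way to inspect these limits (a sketch; the exact numbers are system-dependent):

```shell
# The kernel rejects an exec() when argv + environment exceed its limit.
# On Linux >= 2.6.23 the usable limit is roughly max(128 KiB, stack limit / 4).
getconf ARG_MAX    # maximum bytes for argv + environ
ulimit -s          # current stack limit in KiB ("unlimited" lifts the 1/4 cap)
```

If ARG_MAX is already large (or the stack is unlimited) and make still fails, the generated argument list is simply longer than even that limit allows.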

Or, you can circumvent this issue by reducing the exhaustive search in the tiny1 procedure by setting dims_tiny=`seq 1 12` in the config.in in the build_libsmm directory. Then the verbose make command that you quote will not be as long (and the suggestion in the config.in is to use something between 8 and 12).

Could you be so kind as to post your final arch file? I've been experiencing some trouble recently as well, and would be thankful for a working reference.

Cheers,

Frank


Alfio Lazzaro

Jun 13, 2015, 9:06:52 AM
to cp...@googlegroups.com
Hi Rolf,
the problem is that the list of arguments is too long (there are 13824 entries, each one with at least 10 chars) 
Could you run the test in parallel? I mean, you can use "-j #". In this case you also have to specify the workload manager with the "-w" flag (pbs/slurm, or a new one for your system). Please see the README for the parallel execution steps.
If you don't want to run on the cluster, you can still use parallel execution on the login node, but then you have to declare an "empty" wlm file under the config directory:

> cat no.wlm
batch_cmd() {
$@
}
  
Then you can use "-j 100 -w no".
Note that I have never tried such a case, so I'm not sure it will work out of the box. Let me know how it goes.
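For concreteness, a minimal sketch of this workaround (paths assumed relative to the build_libsmm directory; the final invocation is left as a comment since it needs the full tree):

```shell
# Create the no-op workload-manager file in the config directory.
# batch_cmd simply runs its arguments in the foreground instead of
# submitting them to a queue.
mkdir -p config
cat > config/no.wlm << 'EOF'
batch_cmd() {
$@
}
EOF

# Sanity check: source it and run a command through batch_cmd.
. config/no.wlm
batch_cmd echo "dispatched"     # prints "dispatched"

# Then the tiny phase can be dispatched in 100 chunks through it:
#   ./generate -c config/linux.intel -j 100 -w no -t 16 tiny1
```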

Cheers,

Alfio

Rolf David

Jun 15, 2015, 4:10:20 AM
to cp...@googlegroups.com
Hi Frank & Alfio

First I tried ulimit -s unlimited, but it was already set to unlimited.

Next I tried the dummy batch run. It's not finished yet, but I can already say it is running correctly (doing the 100 jobs sequentially). I'll keep you posted.

Thanks to both of you for the advice.

My arch file (for now: no libgrid/libsmm or ELPA yet) is attached.

I'll repost the final when all is complete and working.

Best Regards,

Rolf


Linux-x86-64-intel-avx-libint-1.1.4-libxc-2.2.2-2.7.t_15415.popt-default-ARCH

Frank Uhlig

Jun 15, 2015, 4:19:47 AM
to cp...@googlegroups.com
Hi Rolf and Alfio,

pity that the ulimit version did not work. Then it might be some other limit, but I am not sure right now.

I like Alfio's idea of the local submission. It runs sequentially, and I think the reason is the following: the for loop around line 127 of the generate.bash script always waits until the command below, in the run_make function (line 136),

${run_cmd} make -j ${ntasks} -f ../${make_file} ${target} SI=${element_start} EI=${element_end}

has finished, so nothing actually runs in parallel. This is no problem with queuing systems, because the submit command exits immediately while the job runs in the queue, but that is not the case with the current local version.

So you could change the command to run in the background; that seems to work for me right now:

batch_cmd() {
$@ &
}

Best,

Frank





Rolf David

Jun 15, 2015, 7:08:36 AM
to cp...@googlegroups.com
Hi again,

The 'workaround' proposed by Alfio

> cat no.wlm
batch_cmd() {
$@
}
  
Therefore you can use "-j 100 -w no".
Note that I have never tried such a case, so I'm not sure it will work out-of-the-box. Let me how it goes.

worked well in my case. I was able to compile and run all of tiny1, and then tiny2 in the standard way.

I'm moving on to small1.

Also, I've compiled libgrid (on a compute node) and successfully integrated it into CP2K. Is there a "benchmark" to see the effect, i.e. a file in the tests folder where people see a difference with and without libgrid?
I ran H2O-512.inp, but saw no noticeable difference. I also ran a test of my own (QM/MM with a hybrid functional) and didn't see a noticeable effect (on short tests). The README mentions "Gaussian to plane-wave transformations", so I assume a speed-up in some GPW routines (or even GAPW, no?).
Also, Iain said here (https://groups.google.com/forum/#!searchin/cp2k/libgrid/cp2k/DU3KNkwM4as/8_bO8zjWZ0sJ), and again in this thread, that it's performance-critical.

So with a "working" benchmark I could see whether I miscompiled it (there were no errors in the libgrid build log), or maybe used the wrong compiler options. Which subroutines are affected: integrate_v_rspace, for example?

Regards, 

Rolf

Iain Bethune

Jun 15, 2015, 7:56:27 AM
to cp...@googlegroups.com
Hi Rolf,

libgrid should affect the performance in calculate_rho_elec and integrate_v_rspace. For H2O-64 on low processor counts (e.g. 24) these should take up >50% of the runtime. Larger H2O-XXX tend to mask the effect as the cubically scaling operations in OT tend to dominate.
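One quick way to check is to grep the timing report at the end of the run for those two routines and compare the builds. A sketch only: the file name H2O-64.out and the report excerpt below are made-up placeholders, not real CP2K output.

```shell
# Fake excerpt of a CP2K timing report (illustrative placeholder numbers),
# standing in for a real output file.
cat > H2O-64.out << 'EOF'
 SUBROUTINE                  CALLS   ...   MAXIMUM TIME
 calculate_rho_elec             25   ...         12.3
 integrate_v_rspace             25   ...         11.8
EOF

# Pull out the two grid routines libgrid accelerates; run this on the
# outputs of the with- and without-libgrid builds and compare the times.
grep -E 'calculate_rho_elec|integrate_v_rspace' H2O-64.out
```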

- Iain




Alfio Lazzaro

Jun 15, 2015, 8:06:16 AM
to cp...@googlegroups.com
Hi Frank,
running the jobs in the background is not a good idea: they will all run on the same node, and you will clearly get wrong performance results from the tiny phase (multiple jobs on the same core).

I'm glad that my solution works (I can add it to the SVN repository).

Best regards,

Alfio

Alfio Lazzaro

Jun 15, 2015, 8:08:35 AM
to cp...@googlegroups.com
Hi Rolf,
I think for the small phase (and all the remaining checks) you don't need any trick, since there are fewer entries (2744, if I recall correctly...).

Alfio

Frank Uhlig

Jun 15, 2015, 8:23:13 AM
to cp...@googlegroups.com
Hi Alfio,

I agree that it is not the best choice. I am running it anyhow right now and will compare it to 'something more reasonable'.
What I don't see (maybe I'm not deep enough into it) is how running it in the background on a sufficient number of processors would give different results from the hypothetical

./generate -c ... -j 0 -t 40 tiny1

on, let's say, a local machine with 40 cores (if it worked).

Best,

Frank

Rolf David

Jun 15, 2015, 8:42:40 AM
to cp...@googlegroups.com
Hi, 

So my results. With libgrid:

FC_comp="ifort -free -fpp"
FCFLAGS="-O3 -axAVX -xSSE4.2 -funroll-loops"

and make -j 1, alone on a node (16 cores, but only one used),

and then H2O-64.inp on 16 cores.

I go from 59 sec to 53 sec (10%) in one case, and from 56 to 55 sec (~2%) in another. So I guess it works.
Precious time won.

Thanks again.

Rolf



Iain Bethune

Jun 15, 2015, 9:04:49 AM
to cp...@googlegroups.com
Great, that sounds reasonable. The default versions included in the main source actually work quite well for most Intel CPUs. 10% gain is pretty good.

- Iain



Alfio Lazzaro

Jun 15, 2015, 3:22:56 PM
to cp...@googlegroups.com
Hi Frank,
well, you can set a given affinity mask for a single job (using KMP_AFFINITY if you are using the Intel compiler), but that doesn't work across multiple jobs...
In any case, the tiny phase is not so important for getting the best result; that is not the case for the small phase, where you really want a single core per job.
Is there any specific reason why you cannot use parallel compilation on the cluster?

Alfio