plumed2.2-hrex compilation on Cray with K20


jan...@gmail.com

Jun 3, 2016, 6:41:12 PM
to PLUMED users
Hi guys, 

I'm running into a problem that's preventing me from compiling plumed on the TITAN supercomputer. Here is what I do:

git clone https://github.com/GiovanniBussi/plumed2.git
cd plumed2
git checkout v2.2-hrex

# the configure step is all good, no error is returned
./configure --prefix=$HOME/privateopt/plumed2 

checking whether the C++ compiler works... yes

checking for C++ compiler default output file name... a.out

checking for suffix of executables...

checking whether we are cross compiling... no

checking for suffix of object files... o

checking whether we are using the GNU C++ compiler... yes 

checking whether CC -dynamic accepts -g... yes

checking for gcc... cc -dynamic

checking whether we are using the GNU C compiler... no

checking whether cc -dynamic accepts -g... yes

checking for cc -dynamic option to accept ISO C89... none needed

checking for gfortran... gfortran

checking whether we are using the GNU Fortran compiler... yes

checking whether gfortran accepts -g... yes

configure: Initial CXX:         CC -dynamic

configure: Initial CXXFLAGS:    -O

configure: Initial CPPFLAGS:

configure: Initial CFLAGS:      -g

configure: Initial LDFLAGS:

configure: Initial LIBS:

configure: Initial STATIC_LIBS:

configure: Initial LD:          CC -dynamic

configure: Initial LDSO:        CC -dynamic

configure: Initial SOEXT:

checking whether CC -dynamic accepts -fPIC... yes

checking whether cc -dynamic accepts -fPIC... yes

checking whether CC -dynamic accepts -Wall... no

checking whether CC -dynamic accepts -pedantic... yes

checking whether CC -dynamic accepts -ansi... no

checking whether CC -dynamic supports explicit... yes

configure: Now we will check compulsory headers and libraries

checking how to run the C++ preprocessor... CC -dynamic -E

checking for grep that handles long lines and -e... /usr/bin/grep

checking for egrep... /usr/bin/grep -E

checking for ANSI C header files... yes

checking for sys/types.h... yes

checking for sys/stat.h... yes

checking for stdlib.h... yes

checking for string.h... yes

checking for memory.h... yes

checking for strings.h... yes

checking for inttypes.h... yes

checking for stdint.h... yes

checking for unistd.h... yes

checking dirent.h usability... yes

checking dirent.h presence... yes

checking for dirent.h... yes

checking for readdir... yes

checking for dgemv... no

checking for dgemv_... yes

checking for dsyevr_... yes

configure: Now we will check for optional headers and libraries

checking for molfile_dcdplugin_init in -lmolfile_plugin... no

configure: WARNING: using internal molfile_plugins, which only support dcd/xtc/trr/trj/crd files

checking for dlopen in -ldl... yes

checking mpi.h usability... yes

checking mpi.h presence... yes

checking for mpi.h... yes

checking for MPI_Init... yes

checking for CC -dynamic option to support OpenMP... -mp  

checking matheval.h usability... no

checking matheval.h presence... no

checking for matheval.h... no

configure: WARNING: cannot enable __PLUMED_HAS_MATHEVAL   

checking time.h usability... yes

checking time.h presence... yes

checking for time.h... yes

checking for clock_gettime... yes

checking sys/time.h usability... yes

checking sys/time.h presence... yes

checking for sys/time.h... yes

checking for gettimeofday... yes

checking for dirent.h... (cached) yes

checking for readdir_r... yes

checking regex.h usability... yes

checking regex.h presence... yes

checking for regex.h... yes

checking for regcomp... yes

checking dlfcn.h usability... yes

checking dlfcn.h presence... yes

checking for dlfcn.h... yes

checking for dlopen... yes

checking execinfo.h usability... yes

checking execinfo.h presence... yes

checking for execinfo.h... yes

checking for backtrace... yes

checking zlib.h usability... yes

checking zlib.h presence... yes

checking for zlib.h... yes

checking for gzopen... no

checking for gzopen in -lz... yes

checking xdrfile/xdrfile_xtc.h usability... no

checking xdrfile/xdrfile_xtc.h presence... no

checking for xdrfile/xdrfile_xtc.h... no

configure: WARNING: cannot enable __PLUMED_HAS_XDRFILE

configure: Release mode, adding -DNDEBUG

configure: *** Special settings for dynamic libraries on Linux ***

configure: Dynamic library extension is 'so'

configure: LDSO and LDFLAGS need special flags

checking whether LDFLAGS can contain -rdynamic... no

configure: Using LDSO='CC -dynamic -shared'

configure: Using LDFLAGS=''

checking whether LDSO can create dynamic libraries... yes 

checking for doxygen... found

configure: WARNING: Doxygen version is <1.8. You might have problems in generating manuals

checking for dot... no

configure: WARNING: You will not be able to see diagrams in the manual

checking for xxd... found

checking whether a program can be run on this machine... yes

checking whether a program compiled with mpi can be run on this machine... no

configure: PLUMED seems to be configured properly!

configure: **************************

checking whether C++ objects can be grouped with ld -r... yes

configure: I will now check if C++ objects can be linked by C/Fortran compilers

configure: This is relevant if you want to use plumed patch --static on a non-C++ code

checking whether C can link a C++ object... yes

checking whether FORTRAN can link a C++ object... no

checking whether FORTRAN can link a C++ object with library -lstdc++... yes

configure: **** PLUMED will be installed using the following paths:

configure: **** prefix: /ccs/home/jandom/privateopt/plumed2

configure: **** exec_prefix: ${prefix}

configure: **** bindir: ${exec_prefix}/bin

configure: **** libdir: ${exec_prefix}/lib

configure: **** includedir: ${prefix}/include

configure: **** datarootdir: ${prefix}/share

configure: **** datadir: ${datarootdir}

configure: **** docdir: ${prefix}/share/doc/plumed

configure: **** htmldir: ${docdir}

configure: **** Executable will be named plumed

configure: **** You can change paths later using options to "make install"

configure: **** e.g. with "make install prefix=/path"

configure: **** PLUMED will be compiled using MPI

configure: WARNING: plumed executable will not run on this machine

configure: WARNING: unless you invoke it as 'plumed --no-mpi'

configure: WARNING: all command line tools are thus available as 'plumed --no-mpi name-of-the-tool'

configure: WARNING: e.g. 'plumed --no-mpi driver'

configure: WARNING: to patch an MD code use 'plumed --no-mpi patch'

configure: WARNING: (notice that MPI will be available anyway in the patched code)

configure: creating ./config.status

config.status: creating Makefile.conf

config.status: creating sourceme.sh

config.status: creating stamp-h



However, make crashes and burns:

*** Compiling all directories ***

make ../analysis ../bias ../blas ../cltools ../colvar ../config ../core ../function ../generic ../lapack ../main ../mapping ../molfile ../multicolvar ../reference ../secondarystructure ../setup ../tools ../vatom ../vesselbase ../wrapper

make[5]: Entering directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/lib'

make -C ../analysis obj

make[6]: Entering directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/analysis'

compiling  Analysis.cpp

pgc++-Error-Unknown switch: -MFAnalysis.d

make[6]: *** [Analysis.o] Error 1

make[6]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/analysis'

make[5]: *** [../analysis] Error 2

make[5]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/lib'

make[4]: *** [dirs] Error 2

make[4]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/lib'

make[3]: *** [all] Error 2

make[3]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src/lib'

make[2]: *** [lib] Error 2

make[2]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2/src'

make[1]: *** [lib] Error 2

make[1]: Leaving directory `/autofs/nccs-svm1_home1/jandom/privatesrc/plumed/plumed2'

make: *** [all] Error 2


$ pgc++ -V


pgc++ 15.7-0 64-bit target on x86-64 Linux -tp istanbul 

The Portland Group - PGI Compilers and Tools

Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved. 

Giovanni Bussi

Jun 6, 2016, 11:59:07 AM
to plumed...@googlegroups.com
Hi,

I think the Portland (PGI) C++ compiler does not support the -MF flag.

Is gcc installed on that system? If so, try setting GCCDEP=g++ by hand in the Makefile.conf file.
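A minimal sketch of that manual edit, assuming Makefile.conf holds plain `NAME=value` lines. `set_make_var` is a hypothetical helper written for this example, not part of PLUMED, and the demo file contents are made up so the snippet can run anywhere.

```shell
# Sketch: override one variable in a make include file such as Makefile.conf.
# set_make_var is a hypothetical helper, not something PLUMED ships.
# Values containing '|' or '&' would need escaping for sed.
set_make_var() {
  file=$1
  var=$2
  value=$3
  if grep -q "^${var}[[:space:]]*=" "$file"; then
    # Replace the existing assignment in place, keeping a .bak backup.
    sed -i.bak "s|^${var}[[:space:]]*=.*|${var}=${value}|" "$file"
  else
    # Variable not present yet: append it.
    printf '%s=%s\n' "$var" "$value" >> "$file"
  fi
}

# Demo on a throwaway file with made-up contents (not a real Makefile.conf).
demo=$(mktemp)
printf 'CXX=CC -dynamic\nGCCDEP=CC -dynamic\n' > "$demo"
set_make_var "$demo" GCCDEP g++
grep '^GCCDEP' "$demo"
```

On the real tree the call would be `set_make_var Makefile.conf GCCDEP g++`, run after configure has generated the file and before make.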

Let us know if this solves your issue.

Ciao!

Giovanni


--
You received this message because you are subscribed to the Google Groups "PLUMED users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plumed-users...@googlegroups.com.
To post to this group, send email to plumed...@googlegroups.com.
Visit this group at https://groups.google.com/group/plumed-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/plumed-users/ce0b8903-34ff-4223-949e-5d9d67d7ae53%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jan...@gmail.com

Jun 8, 2016, 3:36:41 AM
to PLUMED users
Hi Giovanni, 

Many thanks for the speedy reply. Here is the final recipe for compiling Gromacs 5.1.2 with plumed 2.2-hrex on TITAN, the Cray/GPU supercomputer. Disclaimer: it is possibly not the most performant build, but it works. For one thing, I found it impossible to compile with the FFTW provided by TITAN via modules, so here we build FFTW from scratch. Compared to the default gromacs build on TITAN (which uses their native FFTW), this caused no performance change for my test system.

1. Compile plumed2.2-hrex

module swap PrgEnv-pgi PrgEnv-gnu

module list 

Currently Loaded Modulefiles:

  1) eswrap/1.1.0-1.020200.1231.0           8) module_msg/0.1                        15) cray-libsci/13.2.0                    22) dvs/2.5_0.9.0-1.0502.2188.1.113.gem

  2) craype-network-gemini                  9) modulator/1.2.0                       16) udreg/2.3.2-1.0502.10518.2.17.gem     23) alps/5.2.4-2.0502.9774.31.12.gem

  3) gcc/4.9.0                             10) hsi/5.0.2.p1                          17) ugni/6.0-1.0502.10863.8.28.gem        24) rca/1.0.0-2.0502.60530.1.63.gem

  4) craype/2.4.2                          11) DefApps                               18) pmi/5.0.9-1.0000.10911.175.4.gem      25) atp/1.8.3

  5) cray-mpich/7.2.5                      12) site-aprun/1.0                        19) dmapp/7.0.1-1.0502.11080.8.74.gem     26) PrgEnv-gnu/5.2.82

  6) craype-interlagos                     13) aprun-usage/1.0                       20) gni-headers/4.0-1.0502.10859.7.8.gem

  7) lustredu/1.4                          14) altd/1.0                              21) xpmem/0.1-2.0502.64982.5.3.gem


./configure --prefix=$HOME/privateopt/plumed2 CC=cc CXX=CC FC=ftn

** then update the generated Makefile.conf to set SOEXT to nothing: SOEXT=

make -j4 && make install


Copy and use the default modulefile generated by plumed. 


2. Patch and compile gromacs 


~/privateopt/plumed2/bin/plumed-patch -p<< EOF

5

EOF


module load cudatoolkit

module load cmake

module load fftw


cmake .. \

-DGMX_BUILD_MDRUN_ONLY=ON \

-DCMAKE_C_COMPILER=cc \

-DCMAKE_CXX_COMPILER=CC \

-DGMX_MPI=ON \

-DCMAKE_INSTALL_PREFIX=$HOME/privateopt/gromacs/5.1.2 \

-DBUILD_SHARED_LIBS=OFF \

-DGMX_BUILD_OWN_FFTW=ON \

-DCMAKE_SKIP_RPATH=YES \

-DGMX_GPU=ON \

-DGMX_SIMD=AVX_128_FMA \

-DGMX_BLAS_USER=/opt/cray/libsci/13.0.1/GNU/48/interlagos/lib/libsci_gnu_48_mpi_mp.a \

-DGMX_USE_RDTSCP=OFF \

-DGMX_DOUBLE=OFF


make -j4 && make install

jan...@gmail.com

Jun 8, 2016, 4:22:38 AM
to PLUMED users, jan...@gmail.com
Alas, I should point out that with either gromacs 4.6.7 or 5.1.2, I don't get exchange probabilities of 1.0 between 2 identical tpr files.

For optimal performance with a GPU nstlist (now 5) should be larger.

The optimum depends on your CPU and GPU resources.

You might want to try several nstlist values.

Changing nstlist from 5 to 40, rlist from 0.9 to 0.959


Replex is 1000, so nstlist (40) divides it without a problem. The plumed 2.2-hrex branch ends with commit 63e6983befce37beb705e2e81e7b2837bae8cc84
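The divisibility check mentioned here is plain integer arithmetic. A small sketch, where `check_replex` is a hypothetical helper and not a GROMACS or PLUMED tool:

```shell
# Sketch: check that the exchange interval is a whole multiple of nstlist.
# check_replex is a hypothetical helper, not a GROMACS or PLUMED command.
check_replex() {
  replex=$1
  nstlist=$2
  if [ $((replex % nstlist)) -eq 0 ]; then
    echo "ok: replex=$replex is $((replex / nstlist)) x nstlist=$nstlist"
  else
    echo "mismatch: replex=$replex is not a multiple of nstlist=$nstlist"
    return 1
  fi
}

check_replex 1000 40   # nstlist after mdrun's automatic change
check_replex 1000 5    # nstlist as set in the mdp
```

Both values of nstlist seen in this thread (5 and 40) pass the check for replex 1000.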

- Jan

jan...@gmail.com

Jun 8, 2016, 9:59:16 AM
to PLUMED users, jan...@gmail.com
Hi guys, 

Some additional information: setting "-nb cpu" makes the problem go away. I'm starting to think that this is a bug in the plumed/GPU code. This is a protein-membrane simulation with NPT.

- Jan

Giovanni Bussi

Jun 8, 2016, 10:06:37 AM
to plumed...@googlegroups.com
Hi Jan,

this is likely related to non-reproducibility of GROMACS calculations of total energy with GPUs. See

Notice that the error arising from this incorrect acceptance calculation is related to the fact that when energy is recomputed on the foreign replica it does not give exactly the same result. There is no way to fix this, but my feeling is that the error that you should expect in the final results is smaller than the one that you expect because of single precision. So, I think you should not worry about it.

Giovanni

jan...@gmail.com

Jun 13, 2016, 11:26:59 AM
to PLUMED users
Hi Giovanni,

Yup, I considered that as a possibility, but the behaviour is drastically different with and without the -hrex flag (with both -nb cpu and -nb auto), so that suggests a problem. -replex (1000) is a multiple of nstlist (40).

module add ~/privatemodules/gromacs/5.1.2_plumed2.2-hrex

cd $PBS_O_WORKDIR
export CRAY_CUDA_MPS=1

mpirun=`which aprun`
application=`which mdrun_mpi`

options="-nsteps 10000 -multi 2 -replex 1000 -v -plumed -maxh 24 -s tpr/topol.tpr -resethway -noconfout -pin on"
gpu_id=$(printf '0%.0s' {1..8})

$mpirun -n 16 -N 8 $application -gpu_id $gpu_id $options
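The gpu_id line in the script above relies on bash brace expansion and on printf re-applying its format to every argument ('%.0s' prints each argument as an empty string, so only the literal 0 remains). A spelled-out sketch of the same idea for a variable rank count, assuming -gpu_id takes one GPU digit per rank on the node; `make_gpu_id` is a hypothetical helper name:

```shell
# Sketch: build a -gpu_id string that maps every rank on a node to one GPU.
# make_gpu_id is a hypothetical helper; the second argument defaults to GPU 0.
make_gpu_id() {
  ranks=$1
  gpu=${2:-0}
  out=""
  i=0
  # Append one GPU digit per rank.
  while [ "$i" -lt "$ranks" ]; do
    out="$out$gpu"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

make_gpu_id 8   # same string as printf '0%.0s' {1..8}
```

With 8 ranks per node (-N 8) this produces eight zeros, so all ranks on a node share GPU 0.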

grep dE_Term md1.log 
dplumed =  0.000e+00  dE_Term =  0.000e+00 (kT)
dplumed =  0.000e+00  dE_Term =  0.000e+00 (kT)
dplumed =  0.000e+00  dE_Term =  0.000e+00 (kT)
dplumed =  0.000e+00  dE_Term =  0.000e+00 (kT)
dplumed =  0.000e+00  dE_Term =  0.000e+00 (kT)

Change options to include -hrex:

options="-nsteps 10000 -multi 2 -replex 1000 -v -plumed -maxh 24 -s tpr/topol.tpr -resethway -noconfout -pin on -hrex"

grep dE_Term md1.log 
dplumed = -2.500e-02  dE_Term = -2.500e-02 (kT)
dplumed = -7.577e-02  dE_Term = -7.577e-02 (kT)
dplumed =  6.366e-02  dE_Term =  6.366e-02 (kT)
dplumed =  6.822e-03  dE_Term =  6.822e-03 (kT)
dplumed = -2.653e-02  dE_Term = -2.653e-02 (kT)

This, at least to me, is a little unexpected, but maybe I'm doing something wrong. The simulation is NPT; the exchange rate is of course not 1.0 (but close, 0.99).
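As a rough cross-check, the five dE_Term values quoted above can be turned into an expected acceptance. This is only a sketch: it assumes an exchange is accepted with probability min(1, exp(-dE_Term)) and ignores any other contribution to the exchange criterion (for example a pressure-volume term in NPT). `mean_acceptance` is a hypothetical helper.

```shell
# Sketch: mean Metropolis acceptance from dE_Term values (in kT) in a log.
# Assumption: acceptance = min(1, exp(-dE_Term)); other terms are ignored.
mean_acceptance() {
  awk '/dE_Term/ {
         # The value sits two fields after the "dE_Term" label ("dE_Term = x").
         for (i = 1; i <= NF; i++) if ($i == "dE_Term") d = $(i + 2) + 0
         if (d <= 0) sum += 1; else sum += exp(-d)
         n++
       }
       END { if (n) printf "%.3f\n", sum / n }' "$@"
}

# The five lines from md1.log with -hrex, as quoted in this message.
mean_acceptance <<'EOF'
dplumed = -2.500e-02  dE_Term = -2.500e-02 (kT)
dplumed = -7.577e-02  dE_Term = -7.577e-02 (kT)
dplumed =  6.366e-02  dE_Term =  6.366e-02 (kT)
dplumed =  6.822e-03  dE_Term =  6.822e-03 (kT)
dplumed = -2.653e-02  dE_Term = -2.653e-02 (kT)
EOF
```

Under that assumption the five samples average to 0.986, which is roughly in line with the reported exchange rate of about 0.99. On a real run the call would be `mean_acceptance md1.log`.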

The plumed.dat and tpr files for replicas 0 and 1 are identical, and the plumed.dat files are empty:
md5sum  plumed.*.dat tpr/*
354caff7d90bf99e29cd39a9a72895ee  plumed.0.dat
354caff7d90bf99e29cd39a9a72895ee  plumed.1.dat
d106d87d8dfbbe11598f26eace40e11a  tpr/topol0.tpr
d106d87d8dfbbe11598f26eace40e11a  tpr/topol1.tpr

Giovanni Bussi

Jun 13, 2016, 11:32:14 AM
to plumed...@googlegroups.com
Hi.

I think an error of ~1e-02 kJ/mol on the total energy is acceptable and likely a consequence of gromacs not being reproducible in computing energies.

Anyway, for paranoia, you can do the following check:

- compute energy with mdrun -rerun on an existing trr, using gpus
- repeat the same rerun several times
- compare the resulting edr (potential energy). I mean: the ones obtained from the reruns

You will be surprised (I was!!) that the result is not always the same when you use gpus.
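The comparison step of this check can be sketched as follows, assuming the potential energy of each rerun has already been exported from its edr to a two-column .xvg text file (for instance with gmx energy). `max_abs_diff` is a hypothetical helper, and the demo numbers are made up so the snippet runs without GROMACS.

```shell
# Sketch: largest absolute difference between the second column of two
# .xvg time series (lines starting with # or @ are headers in that format).
# max_abs_diff is a hypothetical helper, not a GROMACS tool.
max_abs_diff() {
  awk '
    /^[#@]/ { next }                     # skip comments and plot directives
    NR == FNR { ref[$1] = $2; next }     # first file: store energy per time
    ($1 in ref) {
      d = $2 - ref[$1]
      if (d < 0) d = -d
      if (d > max) max = d
    }
    END { printf "%.6g\n", max + 0 }
  ' "$1" "$2"
}

# Demo with made-up numbers standing in for two reruns of the same trr.
a=$(mktemp)
b=$(mktemp)
printf '# rerun 1\n@ title "Potential"\n0.0 -1000.00\n1.0 -1000.50\n' > "$a"
printf '# rerun 2\n@ title "Potential"\n0.0 -1000.00\n1.0 -1000.52\n' > "$b"
max_abs_diff "$a" "$b"
```

A nonzero result for two reruns of the same trajectory would show the non-reproducibility described above; comparing a file with itself gives 0.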

Giovanni



jan...@gmail.com

Jun 13, 2016, 11:37:20 AM
to PLUMED users
Okay, wait, maybe I don't get something. How is the energy computation different with and without the -hrex flag, i.e. why is the -hrex flag making a difference when it shouldn't? Is it something like

if (hrex) sumAllPotentialEnergyTerms()
else sumOnlyTermsFromPlumedDat()

?

Otherwise agreed on the 1e-2 kJ/mol error being small and not a problem :)

Giovanni Bussi

Jun 13, 2016, 11:40:42 AM
to plumed...@googlegroups.com
If you don't use "-hrex", gromacs assumes all tpr files to be identical. Thus, if temperature and other settings are the same, dE=0.

If you use "-hrex", gromacs also accepts different tpr files. In this case, it computes the energy on the neighboring replica so as to know if it is different. Since recomputations of the energy give slightly inconsistent results, dE turns out to be slightly different from zero.
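For reference, this can be written with the textbook Hamiltonian replica exchange criterion. This is a sketch under assumptions, not taken from the GROMACS or PLUMED source: both replicas share the same inverse temperature \beta, pressure-volume and bias terms are left out, and \epsilon_i, \epsilon_j are symbols introduced here for the numerical discrepancy of a recomputed energy.

```latex
% Replicas i and j hold configurations x_i, x_j and potentials U_i, U_j.
\Delta = \beta \left[ U_i(x_j) + U_j(x_i) - U_i(x_i) - U_j(x_j) \right],
\qquad
P_{\mathrm{acc}} = \min\left(1, e^{-\Delta}\right)

% Identical tpr files: U_i = U_j = U, so \Delta = 0 in exact arithmetic.
% If each recomputed cross term carries a small error,
% \tilde{U}(x_j) = U(x_j) + \epsilon_j and \tilde{U}(x_i) = U(x_i) + \epsilon_i, then
\Delta = \beta \left[ \tilde{U}(x_j) + \tilde{U}(x_i) - U(x_i) - U(x_j) \right]
       = \beta \left( \epsilon_i + \epsilon_j \right)
```

So with identical replicas the reported dE is just the recomputation noise, which is why it is small but not exactly zero.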

Giovanni

