CP2K running slow


Tanmoy Paul

Oct 28, 2014, 10:10:44 AM
to cp...@googlegroups.com

Hello CP2K users,

I am new to this field. I have installed CP2K using "sudo apt-get install cp2k". I am trying to run a QM/MM minimization for 40000 steps with the command "mpirun -np 8 cp2k.popt -i inputfile -o outputfile". It shows no error message, but it is terribly slow (around 100 steps per day). My system is quite large (~48000 atoms) and the QM selection contains 150 atoms. I suspect that something has gone wrong with the CP2K installation or with my understanding of how it works, but I don't know what. This may be an awkward question to ask, and I am sorry about that. May I have your kind suggestions on this topic?

Regards
Tanmoy

Samuel Andermatt

Oct 28, 2014, 10:14:43 AM
to cp...@googlegroups.com
Could you post your input file? Knowing which parameters you chose (e.g. the PW cutoff and the basis set) would help.

Tanmoy Paul

Oct 28, 2014, 10:20:07 AM
to cp...@googlegroups.com
Thanks, Samuel, for the quick reply. Here is the input file:
&FORCE_EVAL
 METHOD QMMM
  &DFT
   CHARGE 0
   BASIS_SET_FILE_NAME ../BASIS_SET
   BASIS_SET_FILE_NAME ../BASIS_MOLOPT
   POTENTIAL_FILE_NAME  ../POTENTIAL
  
   &MGRID
    COMMENSURATE
    CUTOFF 250
   &END MGRID
   &QS
    EXTRAPOLATION PS
    EXTRAPOLATION_ORDER 3
    METHOD GPW
    PW_GRID NS-FULLSPACE
   &END QS
   &SCF
    SCF_GUESS ATOMIC
    MAX_SCF 300
    EPS_SCF 1.0E-06
    &OUTER_SCF
     EPS_SCF 1.0E-06
     MAX_SCF 20
    &END OUTER_SCF
    &OT
     PRECONDITIONER FULL_ALL
     ENERGY_GAP 0.001
     MINIMIZER CG
    &END OT
   &END SCF
   &POISSON               
    POISSON_SOLVER PERIODIC
    PERIODIC XYZ            
   &END POISSON            
   &XC
    &XC_GRID
     XC_SMOOTH_RHO NN10
     XC_DERIV SPLINE2_SMOOTH
    &END XC_GRID
    &XC_FUNCTIONAL BLYP
    &END XC_FUNCTIONAL
   &END XC
  &END DFT
  &MM
   &FORCEFIELD
    PARMTYPE CHM
    PARM_FILE_NAME par_all27_prot_lipid.inp
      EI_SCALE14 1.0
      VDW_SCALE14 1.0
      &SPLINE
        EMAX_SPLINE 5.00000000E-01
        RCUT_NB 15.0        
      &END SPLINE
    &END FORCEFIELD
    &POISSON
     &EWALD
      EWALD_TYPE PME
      ALPHA 0.20
      GMAX 100
      RCUT 15.0
     &END EWALD
    &END POISSON
  &END MM
  &QMMM
    &CELL
      ABC 20.0 20.0 20.0
    &END CELL
    E_COUPL GAUSS
    &INTERPOLATOR
      EPS_R 1.0e-15
      EPS_X 1.0e-15
      MAXITER 100
    &END INTERPOLATOR
    &PERIODIC
     GMAX 1.0
     &MULTIPOLE ON
      ANALYTICAL_GTERM TRUE
      NGRIDS 60 60 60
      RCUT 50.0
     &END MULTIPOLE
    &END PERIODIC
    &PRINT
      &PERIODIC_INFO
      &END PERIODIC_INFO
      &POTENTIAL
      &END POTENTIAL
    &END PRINT
    &QM_KIND H
      MM_INDEX 43 42 46 48 56 58 54 52 73 74 77 79
      MM_INDEX 82 84 86 941 942 946 947 967 969 
      MM_INDEX 971 962 963 1014 1015 1010 1009 1398
      MM_INDEX 1399 1393 1394 1427 1428 1436 1430 
      MM_INDEX 1433 1473 1464 1465 1467 1470 1820 
      MM_INDEX 1818 1816 1811 1812 3040 3038 3054
      MM_INDEX 3052 25438 25437 41848 41847 11184 
      MM_INDEX 11185 45760 45759 10410 10411 4078
      MM_INDEX 4077 4093 4092 4075 4074
    &END QM_KIND
    &QM_KIND O
      MM_INDEX 81 944 1012 1396 3039 3053 11183 
      MM_INDEX 41846 25436 10409 4076 4091 4073
      MM_INDEX 45758
    &END QM_KIND
    &QM_KIND C     
      MM_INDEX 41 44 45 49 55 57 53 51 50 72 75 76
      MM_INDEX 78 80 85 83 943 940 961 965 970 966
      MM_INDEX 1008 1011 1395 1392 1426 1431 1432 
      MM_INDEX 1435 1472 1468 1469 1815 1819 1814 
      MM_INDEX 1810 3037 3051 1463
    &END QM_KIND
    &QM_KIND N
      MM_INDEX 47 945 968 964 1013 1397 1434 1429 
      MM_INDEX 1466 1817 1813 1471
    &END QM_KIND
    &QM_KIND ZN
     MM_INDEX 4072
    &END QM_KIND
    &LINK
     MM_INDEX 39
     QM_INDEX 41
     QM_KIND H
     ALPHA_IMOMM 1.322
    &END LINK
    &LINK
     MM_INDEX 70
     QM_INDEX 72
     QM_KIND H
     ALPHA_IMOMM 1.363
    &END LINK
    &LINK
     MM_INDEX 938
     QM_INDEX 940
     QM_KIND H
     ALPHA_IMOMM 1.369
    &END LINK
    &LINK
     MM_INDEX 959
     QM_INDEX 961
     QM_KIND H
     ALPHA_IMOMM 1.423
    &END LINK
    &LINK
     MM_INDEX 1006
     QM_INDEX 1008
     QM_KIND H
     ALPHA_IMOMM 1.407
    &END LINK
    &LINK
     MM_INDEX 1389
     QM_INDEX 1392
     QM_KIND H
     ALPHA_IMOMM 1.290
    &END LINK
    &LINK
     MM_INDEX 1424
     QM_INDEX 1426
     QM_KIND H
     ALPHA_IMOMM 1.411
    &END LINK
    &LINK
     MM_INDEX 1461
     QM_INDEX 1463
     QM_KIND H
     ALPHA_IMOMM 1.286
    &END LINK
    &LINK
     MM_INDEX 1808
     QM_INDEX 1810
     QM_KIND H
     ALPHA_IMOMM 1.454
    &END LINK
    &LINK
     MM_INDEX 3041
     QM_INDEX 3037
     QM_KIND H
     ALPHA_IMOMM 1.373
    &END LINK
    &LINK
     MM_INDEX 3035
     QM_INDEX 3037
     QM_KIND H
     ALPHA_IMOMM 1.391
    &END LINK
    &LINK
     MM_INDEX 3055
     QM_INDEX 3051
     QM_KIND H
     ALPHA_IMOMM 1.378
    &END LINK
    &LINK
     MM_INDEX 3049
     QM_INDEX 3051
     QM_KIND H
     ALPHA_IMOMM 1.387
    &END LINK
  &END QMMM
  &SUBSYS
    &CELL
      ABC 77.18 74.44 85.17
      PERIODIC XYZ
    &END CELL
    &KIND H
      BASIS_SET DZVP-GTH-BLYP
      POTENTIAL GTH-BLYP-q1
    &END KIND
    &KIND O
      BASIS_SET DZVP-GTH-BLYP
      POTENTIAL GTH-BLYP-q6
    &END KIND
    &KIND C
      BASIS_SET DZVP-GTH-BLYP
      POTENTIAL GTH-BLYP-q4
    &END KIND
    &KIND N
      BASIS_SET DZVP-GTH-BLYP
      POTENTIAL GTH-BLYP-q5
    &END KIND
    &KIND ZN
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-BLYP-q12
    &END KIND
    &TOPOLOGY
     CONN_FILE_FORMAT UPSF
     CONN_FILE_NAME HCA_in_final.psf
     COORD_FILE_FORMAT PDB
     COORD_FILE_NAME HCA_no_co2_cp2k.pdb
    &END TOPOLOGY
  &END SUBSYS
&END FORCE_EVAL
&GLOBAL
  PROJECT HCA_qmmm_min_cp2k_no_co2
  PRINT_LEVEL MEDIUM
  RUN_TYPE MD
&END GLOBAL
&MOTION
  &MD
    ENSEMBLE NVE
    STEPS 50000
    TIMESTEP 0.5
    TEMPERATURE 310
   &THERMOSTAT
    REGION GLOBAL
    TYPE NOSE
    &NOSE
    LENGTH 4
    YOSHIDA 9
    TIMECON 1000
    &END NOSE
   &END THERMOSTAT
   &PRINT
      &ENERGY
        &EACH
          MD 1
        &END EACH
      &END ENERGY
   &END PRINT
  &END MD
  &PRINT
    &RESTART
      &EACH
        MD 10000
      &END EACH
    &END RESTART
    &RESTART_HISTORY OFF
    &END RESTART_HISTORY
    &TRAJECTORY SILENT
      FORMAT DCD
      &EACH
        MD 100
      &END EACH
    &END TRAJECTORY
    &VELOCITIES OFF
          LOG_PRINT_KEY T
          FORMAT XYZ
          UNIT angstrom
          &EACH
            MD 100
          &END EACH
          ADD_LAST NUMERIC
    &END VELOCITIES
    &FORCES OFF
    &END FORCES
  &END PRINT
&END MOTION

Samuel Andermatt

Oct 29, 2014, 5:45:07 AM
to cp...@googlegroups.com
Sadly, I do not know what exactly the reason for the performance issue is. One way to narrow it down would be to make a short run; at the end of the output you will see the timings, which give some information about the performance bottlenecks.

hut...@chem.uzh.ch

Oct 29, 2014, 6:02:59 AM
to cp...@googlegroups.com
Hi

yes, check the timing information at the end of a short run.
This will give some information on possible problems.

You should also change the EWALD method from PME to SPME.
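In the posted input, that is a one-keyword change in the MM &POISSON section (a sketch; the remaining Ewald parameters are kept exactly as posted, and whether GMAX 100 is still appropriate for SPME should be checked against the CP2K manual):

```
&POISSON
 &EWALD
  EWALD_TYPE SPME
  ALPHA 0.20
  GMAX 100
  RCUT 15.0
 &END EWALD
&END POISSON
```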

regards

Juerg
--------------------------------------------------------------
Juerg Hutter                         Phone : ++41 44 635 4491
Institut für Chemie C                FAX   : ++41 44 635 6838
Universität Zürich                   E-mail: hut...@chem.uzh.ch
Winterthurerstrasse 190
CH-8057 Zürich, Switzerland
---------------------------------------------------------------


Tanmoy Paul

Oct 29, 2014, 6:07:46 AM
to cp...@googlegroups.com
Thanks, Samuel and Juerg, for your inputs.

Tanmoy Paul

Oct 30, 2014, 1:30:20 AM
to cp...@googlegroups.com
Dear CP2K users,

I checked the output files and, to be frank, I understood very little of them. The last few lines of the output file say:

-------------------------------------------------------------------------------
 ----                             MULTIGRID INFO                            ----
 -------------------------------------------------------------------------------
 count for grid        1:         317951          cutoff [a.u.]          125.00
 count for grid        2:         302978          cutoff [a.u.]           31.25
 count for grid        3:          76551          cutoff [a.u.]            7.81
 count for grid        4:            684          cutoff [a.u.]            1.95
 total gridlevel count  :         698164

 -------------------------------------------------------------------------------
 -                                                                             -
 -                         MESSAGE PASSING PERFORMANCE                         -
 -                                                                             -
 -------------------------------------------------------------------------------

 ROUTINE             CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
 MP_Group                6         0.000
 MP_Bcast             1834         0.213            5713507.            49195.17
 MP_Allreduce        10132       130.663             142451.               11.05
 MP_Sync            101392         0.269
 MP_Alltoall          9590       652.514           46235404.              679.52
 MP_SendRecv         15776        87.207             294912.               53.35
 MP_ISendRecv        23730         0.566            4071645.           170706.96
 MP_Wait             24093       402.879
 MP_ISend              275         0.038             728596.             5272.74
 MP_IRecv              264         0.001             746667.           197120.00
 MP_Recv             76412         0.793               3234.              311.65
 -------------------------------------------------------------------------------


-------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 ------------------------------------------------------------------------------- 
 SUBROUTINE                     CALLS  ASD      SELF TIME           TOTAL TIME
                                            AVERAGE   MAXIMUM    AVERAGE    MAXIMUM
 CP2K                               1  1.0     1.346     1.804   9476.129   9476.130
 qs_mol_dyn_low                     1  2.0     0.095     0.314   9465.241   9465.854
 qs_forces                         11  3.9     0.027     0.040   8706.050   8706.055
 qs_energies_scf                   11  4.9     0.001     0.002   8569.698   8569.705
 scf_env_do_scf                    11  5.9     0.004     0.004   8453.922   8453.929
 scf_env_do_scf_inner_loop        639  6.5     0.218     0.273   8289.159   8289.172
 velocity_verlet                   10  3.0     0.894     3.202   5147.091   5147.116
 qs_ks_build_kohn_sham_matrix     650  8.5    12.649    13.425   4123.288   4123.607
 qs_rho_update_rho                650  7.5     0.026     0.031   4077.287   4077.571
 calculate_rho_elec              1300  8.5   592.558   649.730   4077.261   4077.552
 qs_ks_update_qs_env              651  7.5     0.014     0.027   3995.187   3995.463
 fft_wrap_pw1pw2                 9590 11.0     0.464     0.476   3630.990   3654.319
 fft_wrap_pw1pw2_130             3630 11.8   366.371   376.203   3397.033   3418.254
 density_rs2pw                   1300  9.5     0.135     0.159   3331.163   3383.539
 fft3d_ps                        9590 13.0  1685.880  1712.933   2797.864   2836.568
 qs_vxc_create                    650  9.5     0.047     0.054   1589.974   1590.199
 sum_up_and_integrate             336  9.4    46.390    55.122   1355.401   1368.971
 integrate_v_rspace               672 10.4   301.274   329.909   1309.006   1315.670
 xc_rho_set_and_dset_create       650 11.5   202.175   205.516   1171.640   1217.797
 potential_pw2rs                  672 11.4     0.450     0.575   1007.402   1032.221
 xc_vxc_pw_create                 336 10.4   118.323   123.799   1030.731   1030.957
 rs_pw_transfer                  7954 11.1     0.263     0.272    886.089    941.457
 qs_ks_ddapc                      650  9.5     0.014     0.016    682.186    683.086
 rs_pw_transfer_RS2PW_130        1311 11.5   529.773   582.666    529.773    582.666
 x_to_yz                         3696 14.5   561.963   566.914    561.963    566.914
 yz_to_x                         5894 13.7   549.928   566.657    549.928    566.657
 xc_exc_calc                      314 10.5    30.516    71.193    559.196    559.200
 xc_functional_eval              1300 12.5     0.114     0.123    501.124    543.504
 pw_nn_compose_r                 7920 12.1   443.318   502.878    443.318    502.878
 cp_dbcsr_multiply_d            18824 11.3     0.056     0.060    440.552    455.131
 dbcsr_multiply_anytype         18824 13.3     6.863     8.968    438.921    453.471
 qmmm_forces                       11  3.9     5.057     7.224    418.833    418.835
 qmmm_forces_with_gaussian         11  4.9     8.177     8.259    402.521    404.942
 qs_scf_loop_do_ot                639  7.5     0.006     0.010    397.926    398.436
 cp_ddapc_apply_CD                650 10.5    65.894    68.079    357.733    357.740
 ot_scf_mini                      639  8.5     0.129     0.142    357.246    357.589
 qmmm_force_with_gaussian_low      11  5.9     0.000     0.000    345.037    345.171
 lyp_lsd_eval                     650 13.5   314.342   343.915    314.342    343.915
 qmmm_forces_with_gaussian_LG      11  7.9   340.844   340.922    340.844    340.922
 qmmm_forces_gaussian_low_R        11  6.9     0.000     0.000    340.844    340.922
 dbcsr_mm_cannon_multiply       18824 14.3    10.537    23.151    264.680    278.647
 qmmm_el_coupling                  11  3.9     0.005     0.007    272.055    272.925
 rs_pw_transfer_PW2RS_130         683 13.3   263.671   264.213    263.671    264.213
 qmmm_elec_with_gaussian           11  4.9    10.833    11.870    259.977    262.493
 pw_scatter_p                    3696 13.5   250.884   255.081    250.884    255.081
 pw_axpy                        11933  9.6   219.576   219.933    219.576    219.933
 fft_wrap_pw1pw2_40              1972 12.2    15.362    15.579    211.444    216.042
 cp_dbcsr_mult_NS_NR             2831 11.3     0.015     0.017    202.511    214.845
 pw_zero                        12765 10.0   211.188   214.613    211.188    214.613
 qmmm_elec_with_gaussian_low       11  5.9     0.000     0.000    203.561    204.576
 pw_copy                         5587 11.3   197.645   200.647    197.645    200.647
 qmmm_elec_with_gaussian_LG        11  7.9   199.115   200.356    199.115    200.356
 qmmm_elec_gaussian_low_R          11  6.9     0.000     0.000    199.115    200.356
 pw_poisson_solve                 661  9.4   115.353   117.174    200.327    200.345
 pw_gather_p                     5894 12.7   197.867   200.180    197.867    200.180
 xb88_lsd_eval                    650 13.5   186.668   199.483    186.668    199.483
 ot_mini                          639  9.5     0.017     0.022    198.207    198.562
 -------------------------------------------------------------------------------
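[A timing table in this layout can also be sorted programmatically to spot the dominant routines. A minimal sketch in Python; the helper and sample below are illustrative, not part of CP2K, and assume the whitespace-separated columns NAME, CALLS, ASD, SELF avg/max, TOTAL avg/max shown above.]

```python
import re

# Matches one timing row: NAME CALLS ASD SELF_AVG SELF_MAX TOTAL_AVG TOTAL_MAX
ROW = re.compile(
    r"\s*(\S+)\s+(\d+)\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+[\d.]+\s*$"
)

def top_self_time(timing_text, n=5):
    """Return (routine name, average SELF TIME) for the n most expensive routines."""
    rows = []
    for line in timing_text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines do not match the pattern
            rows.append((float(m.group(3)), m.group(1)))
    rows.sort(reverse=True)
    return [(name, self_avg) for self_avg, name in rows[:n]]

# A few rows copied from the output above:
sample = """
 fft3d_ps                        9590 13.0  1685.880  1712.933   2797.864   2836.568
 calculate_rho_elec              1300  8.5   592.558   649.730   4077.261   4077.552
 CP2K                               1  1.0     1.346     1.804   9476.129   9476.130
"""
print(top_self_time(sample, 2))
# → [('fft3d_ps', 1685.88), ('calculate_rho_elec', 592.558)]
```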

  **** **** ******  **  PROGRAM ENDED AT                 2014-10-30 00:27:35.638
 ***** ** ***  *** **   PROGRAM RAN ON                                      user
 **    ****   ******    PROGRAM RAN BY                                      user
 ***** **    ** ** **   PROGRAM PROCESS ID                                 16608
  **** **  *******  **  PROGRAM STOPPED IN /home/user/cp2k_run/cp2k_input/for_co2_HCA

Any ideas? And one more question: where can I find documentation of the formats of the CP2K output files?

Regards
Tanmoy

Samuel Andermatt

Oct 30, 2014, 5:04:16 AM
to cp...@googlegroups.com
The FFTs take quite a bit of time. If your computer has an Nvidia GPU, you could offload the FFTs to it, which might help (compile with the -D__PW_CUDA flag and link in -lcufft -lcublas).
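[For a self-compiled binary, that corresponds to something along these lines in the arch file; a sketch only, since variable names and the rest of the arch file depend on the installation:]

```
# Hypothetical arch-file fragment: enable GPU FFT offload
DFLAGS += -D__PW_CUDA
LIBS   += -lcufft -lcublas
```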

Michael Banck

Oct 30, 2014, 5:59:16 AM
to cp...@googlegroups.com
On Tue, Oct 28, 2014 at 07:10:43AM -0700, Tanmoy Paul wrote:
> I am new to this field . I have installed cp2k
> using "sudo apt-get install cp2k" .

So you are using a prepackaged version. Which distribution, which
version of that distribution, and on which architecture?

The packaged versions have to run on as many configurations as possible,
so are not highly optimized. It might make sense to look into compiling
CP2K yourself if the performance is not acceptable.


Michael

Samuel Andermatt

Oct 30, 2014, 6:00:09 AM
to cp...@googlegroups.com
Another question: does the keyword COMMENSURATE in the MGRID section impact your performance (positively or negatively)?

Teodoro Laino

Oct 30, 2014, 9:05:42 AM
to cp...@googlegroups.com
COMMENSURATE is mandatory for the type of job he is running.


Teodoro Laino

Oct 30, 2014, 9:28:13 AM
to cp...@googlegroups.com
I cannot see the problem:

-) First, you are not running a minimization but simply an MD run (at least that is what I see from your input and timings).

-) You are running on 8 MPI tasks.
-) Your system contains 150 QM atoms
-) Your QM cell is 20 Ang cubic
-) The reported timing is for 10 MD steps.

For 10 MD steps you get 9476 seconds overall. A good part of that (approximately >3000 s) is spent in the first step (difficult to be sure without a more extended output).
This means that at regime you can do 1 MD step in roughly 550 seconds.

What is the problem? Does that look too slow?
Why don't you increase the number of MPI tasks? Your system is definitely not made for a workstation; you should run it on more than just 8 tasks.

Criticism:

-) a cutoff of 250 Ry with BLYP… very much on the low side.

-) This setup:

  SCF_GUESS ATOMIC
    MAX_SCF 300
    EPS_SCF 1.0E-06
    &OUTER_SCF
     EPS_SCF 1.0E-06
     MAX_SCF 20
    &END OUTER_SCF

The 300 in MAX_SCF is difficult to understand. The idea is that you do a certain number of inner steps (a few tens) and then reset with a new preconditioner.
Try decreasing 300 to something more meaningful, like 30. You will definitely notice an improvement in time to solution.
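[Concretely, the suggested change to the posted &SCF section would look like this; a sketch in which only MAX_SCF changes and the &OT subsection is left as posted:]

```
&SCF
 SCF_GUESS ATOMIC
 MAX_SCF 30          # was 300: reset the OT preconditioner more often
 EPS_SCF 1.0E-06
 &OUTER_SCF
  EPS_SCF 1.0E-06
  MAX_SCF 20
 &END OUTER_SCF
 # &OT subsection unchanged
&END SCF
```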

Other things, like the setup of the PERIODIC QMMM section, will affect performance, but they will show up in the timings only when you reach the scaling limit. 8 tasks is very far from the scaling limit for this job.

Final comment:

-) if you are a beginner, I would start with something simpler. You have combined QS, QMMM, and the periodic version of it. Even for experts, all of this requires careful tuning before going into production.

Regards,
Teo