ASE CP2K interface


Geng Sun

Dec 4, 2018, 12:08:10 PM
to cp2k
Dear CP2K users,

In the past several weeks, I have frequently run into a problem when using the CP2K-ASE interface: the calculations often get stuck partway through.

1) First, I switched on the debug=True option in the ASE CP2K calculator and found that the calculation always gets stuck at a line containing *END, right after the positions have been sent to cp2k_shell.popt (I print this information to standard error, so it is not buffered).

2) Then I modified cp2k_shell.F to print a number of labels and found that the code gets stuck at the line "READ (*,*,iostat=iostat) pos", as shown below. I always see "begin to read pos" on standard error, but never reach "LABEL-1".

           WRITE(0,*) "begin to read pos"
           CALL m_flush
(0)
           READ
(*,*,iostat=iostat) pos
           WRITE
(0,*) "LABEL-1"
           CALL m_flush
(0)




I attached the modified cp2k_shell.F and the standard output/error generated by the Slurm batch system to this post. You can see that the program gets stuck at the second optimization step, just after the positions are sent.
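
For reference, a minimal sketch of the kind of ASE driver used here (the real opt.py is attached; the launch command, input template, and test system below are placeholders, not my actual setup):

# Hypothetical minimal ASE driver in the spirit of the attached opt.py.
# It shows where debug=True is switched on and how a geometry optimization
# exercises the stdin/stdout pipe to cp2k_shell.
from ase.build import molecule
from ase.calculators.cp2k import CP2K
from ase.optimize import BFGS

# CP2K() spawns cp2k_shell and talks to it over stdin/stdout;
# the launch command below is only an example.
calc = CP2K(command='srun cp2k_shell.popt',
            debug=True)   # echo every line sent to / received from cp2k_shell

atoms = molecule('H2O')   # stand-in; the real system has 331 atoms
atoms.calc = calc

opt = BFGS(atoms)
opt.run(fmax=0.05)        # in my case this hangs at the second optimization step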

I would greatly appreciate your suggestions for fixing this confusing bug.

Thanks in advance.

Geng


cp2k_shell.F
opt.py
slurm.out

Ole Schütt

Dec 4, 2018, 2:05:43 PM
to cp...@googlegroups.com
Hi Geng,

this does indeed sound like a buffering issue. Could you try adding a flush() on the Python side?

Basically, add the following line after cp2k.py:498:

self._child.stdin.write(line + '\n')
+ self._child.stdin.flush()

This is probably quite inefficient, so if it works I'll add some logic to flush only when a recv() follows a send().
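
A minimal sketch of that lazy-flush idea (class and attribute names here are illustrative, not the actual cp2k.py code):

# Flush the child's stdin only when a reply is actually awaited,
# instead of after every single line that is sent.
class LazyFlushShell:
    def __init__(self, child):
        self._child = child              # a subprocess.Popen in text mode
        self._pending_flush = False

    def send(self, line):
        self._child.stdin.write(line + '\n')
        self._pending_flush = True       # data may still sit in the buffer

    def recv(self):
        if self._pending_flush:
            self._child.stdin.flush()    # make sure cp2k_shell has seen everything
            self._pending_flush = False
        return self._child.stdout.readline().strip()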


-Ole

Geng Sun

Dec 4, 2018, 3:27:25 PM
to cp2k
Hello Ole,

Thank you very much for your reply.
I changed the code as you suggested (below), but the problem is still present in my test.

Best

Geng
    def send(self, line):
        """Send a line to the cp2k_shell"""
        assert self._child.poll() is None  # child process still alive?
        if self._debug:
            #print('Sending: ' + line)
            sys.stderr.write("Sending: {}\n".format(line))
            sys.stderr.flush()

        if self.version < 2.1 and len(line) >= 80:
            raise Exception('Buffer overflow, upgrade CP2K to r16779 or later')
        assert(len(line) < 800)  # new input buffer size
        self.isready = False
        self._child.stdin.write(line + '\n')
        self._child.stdin.flush()




On Tuesday, December 4, 2018 at 11:05:43 AM UTC-8, Ole Schütt wrote:

Ole Schütt

Dec 4, 2018, 3:59:20 PM
to cp...@googlegroups.com
Hi Geng,

are you using MPI? If so, that is probably where the buffering happens.
Depending on which MPI implementation you are using, there might be a way to tweak its stdin/stdout forwarding.

Out of curiosity, how many atoms does your system have? Maybe the Fortran side simply tries to read too many values?

-Ole

Geng Sun

Dec 4, 2018, 4:20:19 PM
to cp2k
Hello Ole,

Yes, I am using MPI. I am running CP2K on Cori, which is a Cray system.
I am not entirely sure, but the MPI seems to be MPICH, because the cray-mpich module is loaded when I run CP2K.

I have 331 atoms in the calculation. I tried to shorten the position strings from
self._shell.send('%.18e %.18e %.18e' % tuple(pos))
to
self._shell.send('%.8e %.8e %.8e' % tuple(pos))

I assumed this might avoid a possible deadlock of the pipe, but in the end it did not help.
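
For context, a rough sketch of how the positions travel over the pipe, one "x y z" line per atom (the real logic lives in ase/calculators/cp2k.py; the function and the count line here are my approximation of the protocol):

# shell is the Cp2kShell-like object, positions an (n_atoms, 3) array.
def send_positions(shell, positions):
    shell.send('%d' % (3 * len(positions)))            # total number of coordinates
    for pos in positions:
        shell.send('%.18e %.18e %.18e' % tuple(pos))   # one line per atom
    shell.send('*END')

(331 atoms at roughly 70-75 bytes per line is about 25 kB, which could plausibly exceed the buffering of the MPI launcher's stdin forwarding and would match the hang after only ~250 atoms have been read on the Fortran side.)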

Best

Geng


On Tuesday, December 4, 2018 at 12:59:20 PM UTC-8, Ole Schütt wrote:

Geng Sun

Dec 4, 2018, 7:29:58 PM
to cp...@googlegroups.com
Hello Ole,

I made another change to cp2k_shell.F: if the code reads the positions atom by atom rather than all at once (like the code below), the program hangs at a point where only part of the positions have been read. (I have 331 atoms, of which normally only about 250-270 are read.)
           !READ (*,*,iostat=iostat) pos
           !READ (*,*,iostat=iostat) ((pos(i,j),j=1,3),i=1,n_atom)
           !READ (*,*,iostat=iostat) (pos(i),i=1,n_atom2)
           DO i=1,n_atom
              m=(i-1)*3+1
              n=(i-1)*3+3
              READ (*,*) (pos(j),j=m,n)
              WRITE(0,*) "atom=", i, pos(m), pos(m+1), pos(n)
              CALL m_flush(0)
           END DO
           WRITE(0,*) "LABEL-1"
           CALL m_flush(0)


This is the tail of the standard error output; only 252 atoms are printed, although the "Sending" lines from cp2k.py are complete:
Sending: 1.576752302496963409e+01 5.503432604574527431e+00 5.202333832524260515e+00
 atom=         243   16.662443015797379        1.3742961283890502        3.8763624242357171
Sending: 1.517292733648857705e+01 4.201499193055327375e+00 7.381004145064225419e+00
 atom=         244   16.662391280884478        1.3742459215953395       0.65239139651875000
Sending: 1.347251652964376234e+01 6.982201222299000420e+00 7.885207051660287902e-01
Sending: 1.270684522088338753e+01 5.387934046501004381e+00 7.380386770425433340e+00
 atom=         245   16.705918168029598        1.3963594659850651        6.5096981880927025
Sending: 1.121215876405216250e+01 5.455761976608218156e+00 7.885247030917259536e-01
Sending: 1.264981900140291593e+01 5.588193026760487570e+00 2.971586044180265063e+00
 atom=         246   14.298521645540390        2.7461084273618193        7.4746197668438166
Sending: 1.340077937297150790e+01 4.053634960891928429e+00 7.885301196183434058e-01
Sending: 1.020485628515282706e+01 6.913382900918708884e+00 7.383907976352638514e+00
 atom=         247   14.282063640676757        2.7485874649238795        1.6804517810327422
Sending: 1.041632697503036731e+01 6.865228814815203862e+00 2.971592361475913435e+00
 atom=         248   14.282106984202624        2.7485910114075263        4.2975637697642934
Sending: 9.521439090500260605e+00 8.245776638331120623e+00 6.218688347483974255e+00
Sending: 9.521360909499742675e+00 8.245771920855188952e+00 1.955237846516025613e+00
 atom=         249   15.019369986694146        4.0380913886689607        2.9715839946517644
Sending: 1.190174301579737914e+01 9.620070407982204586e+00 3.876362424235717086e+00
Sending: 1.190169128088448147e+01 9.620020201188493658e+00 6.523913965187499997e-01
 atom=         250   15.914380998597084        2.6575812528326686        5.2023401498199107
Sending: 1.189999109458993409e+01 9.622398083668830537e+00 6.496868766581632926e+00
 atom=         251   15.925180013305853        8.4795751127617489E-002   5.2023421993484060
Sending: 9.516001797608472756e+00 1.099443356384864323e+01 7.482068722341200129e+00
 atom=         252   18.147873024969634        1.3805454647779496        5.2023338325242605
Sending: 9.521363640676758777e+00 1.099436174451703430e+01 1.680451781032742176e+00
Sending: 9.521406984202624102e+00 1.099436529100068149e+01 4.297563769764293440e+00
Sending: 1.025866998669414798e+01 1.228386566826211634e+01 2.971583994651764371e+00
---
some lines are deleted here
---
Sending: 3.399261249766690085e+00 3.516835622123011262e+00 1.133940075852044416e+01
Sending: 5.923947421334220920e+00 4.113983946240092671e+00 1.115507609006755807e+01
Sending: *END


Geng


On Tuesday, December 4, 2018 at 1:20:19 PM UTC-8, Geng Sun wrote:

Maxime Van den Bossche

Dec 4, 2018, 7:52:26 PM
to cp...@googlegroups.com
Dear Geng,

I presume you also started an earlier thread about this, right?
As I wrote there, I had similar problems when using Intel MPI,
but not with OpenMPI (no experience with MPICH, though).

Best,
Maxime

Geng Sun

Dec 4, 2018, 10:40:36 PM
to cp2k
Hello Maxime,

Yes, I started that thread, and thank you for your earlier suggestions.
As you suggested, I have been trying to compile CP2K with different compilers and libraries. I even tried to build all the necessary libraries from scratch (starting from compiling GCC). Unfortunately, I still cannot get a working CP2K after dozens of attempts. The only way I can get a working CP2K is to use the ARCH file from the cluster administrators, which leaves me few options to change the compiler and MPI versions.
The cluster I am using is a Cray system that I am not very familiar with, which may also be getting in the way of solving this problem.

At the same time, I can easily build a working CP2K binary on other machines, even with Intel MPI. So I suspected there must be another cause, decided to dig into the source of the bug, and started this new thread.



Geng

On Tuesday, December 4, 2018 at 4:52:26 PM UTC-8, Maxime Van den Bossche wrote:

Geng Sun

Dec 7, 2018, 3:52:04 PM
to cp2k
Hello,

Finally, I found the likely cause of this problem:

It seems that MPI forwards only part of stdin, so only part of the data in the pipe is consumed.

The MPICH2 user guide says: "the redirection of large amounts of data to stdin is discouraged, and may cause unexpected results".

So I now write the coordinates to a file from cp2k.py and have cp2k_shell.F read the positions from that file instead of from stdin (code shown below).

Now everything looks correct.

Geng
        IF (para_env%mepos==para_env%source) THEN
           !READ (*,*,iostat=iostat) n_atom2
           !IF (iostat/=0) CPABORT('setpos read n_atom')
           !IF (n_atom2/=SIZE(pos)) THEN
           !   CALL my_assert(.FALSE.,'setpos invalid number of atoms',failure)
           !   DO i=1,n_atom
           !      READ(*,'(a)',iostat=iostat) cmdStr
           !      CALL compress(cmdStr,full=.TRUE.)
           !      CALL uppercase(cmdStr)
           !      IF (cmdStr=='*END') EXIT
           !   END DO
           !   GOTO 10
           !END IF
           !READ (*,*,iostat=iostat) pos
           !IF (iostat/=0) CPABORT('setpos read coord')
           INQUIRE(unit=201,opened=unitalive)
           IF (unitalive) CPABORT('UNIT 201 is being used')
           INQUIRE(file="CP2K_POSITIONS",exist=filepresence)
           IF (.NOT. filepresence) CPABORT('FILE CP2K_POSITIONS NOT EXIST')
           OPEN(201,action='READ',file="CP2K_POSITIONS",iostat=iostat,status='OLD',form='FORMATTED',ACCESS='SEQUENTIAL')
           IF (iostat/=0) CPABORT('read CP2K_POSITIONS')
           READ (201,*,iostat=iostat) n_atom2
           IF (iostat/=0) CPABORT('setpos read n_atom2')
           IF (n_atom2/=SIZE(pos)) THEN
              CALL my_assert(.FALSE.,'setpos invalid number of atoms',failure)
              DO i=1,n_atom
                 READ (201,'(a)',iostat=iostat) cmdStr
                 CALL compress(cmdStr,full=.TRUE.)
                 CALL uppercase(cmdStr)
                 IF (cmdStr=='*END') EXIT
              END DO
              GOTO 10
           END IF
           READ (201,*,iostat=iostat) pos
           CLOSE(201)
           IF (iostat/=0) CPABORT('setpos read coord')
           pos(:) = pos(:)/pos_fact
           READ (*,'(a)',iostat=iostat) cmdStr
           CALL compress(cmdStr,full=.TRUE.)
           CALL uppercase(cmdStr)
           CALL my_assert(cmdStr=='*END',' missing *END',failure)
        END IF
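
For completeness, a sketch of the matching change on the Python side (my actual edit to cp2k.py is not reproduced in this post; the function and names below are illustrative):

import os

# Write the coordinates to CP2K_POSITIONS in CP2K's working directory before
# the set-position command goes out, so cp2k_shell only has to read the short
# command line and the *END marker from stdin.
def write_positions_file(positions, directory='.'):
    path = os.path.join(directory, 'CP2K_POSITIONS')
    with open(path, 'w') as f:
        f.write('%d\n' % (3 * len(positions)))            # matches READ(201,*) n_atom2
        for pos in positions:
            f.write('%.18e %.18e %.18e\n' % tuple(pos))   # matches READ(201,*) pos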






On Tuesday, December 4, 2018 at 12:59:20 PM UTC-8, Ole Schütt wrote: