CI-NEB calculation: crashes

Jörg Saßmannshausen

27.02.2017, 17:23:09
to cp2k
Dear all,

I am trying to run a CI-NEB calculation, but after the first step the
calculation crashes with this error message:

NEB| Building initial set of coordinates. END

*******************************************************************************
BAND TYPE = CI-NEB
BAND TYPE OPTIMIZATION = SD
STEP NUMBER = 0
RMSD DISTANCE DEFINITION = T
NUMBER OF NEB REPLICA = 5
DISTANCES REP = 9.750661 9.750661 9.750661 9.750661
ENERGIES [au] = -648.476382 -647.620195 -646.701277 -647.623017 -648.424927
BAND TOTAL ENERGY [au] = -3238.84579812863058
*******************************************************************************
Trying to move ./WFN_restart.wfn.bak-1 to ./WFN_restart.wfn.bak-2.
rename returned status: -1
Problem moving file
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[44988,1],192]
Exit code: 1


The SGE error file contains this:

cp2k-4.1-avx2.popt:3555 terminated with signal 6 at PC=2ad95d2c35f7 SP=7ffe9c0dbcf8.
(I have omitted the backtrace)

I am using 256 cores and this is the relevant part of my input file:

@SET BAND_TYPE NEB
&MOTION
  &PRINT
    &VELOCITIES OFF
    &END
  &END
  &BAND
    NPROC_REP 32
    @IF ( ${BAND_TYPE} == NEB )
    BAND_TYPE CI-NEB
    K_SPRING 0.2
    ROTATE_FRAMES T
    &CI_NEB
      NSTEPS_IT 5
    &END
    @ENDIF
    @ENDIF
    NUMBER_OF_REPLICA 5
    &CONVERGENCE_CONTROL
      MAX_FORCE 0.001
      RMS_FORCE 0.0005
    &END
    &OPTIMIZE_BAND
      OPTIMIZE_END_POINTS F
      OPT_TYPE DIIS
      &DIIS
        MAX_STEPS 200
        N_DIIS 7
        NO_LS
        STEPSIZE 0.5
        MAX_STEPSIZE 1.0
      &END
    &END
    &REPLICA
      COORD_FILE_NAME files/start-A.xyz
    &END
    &REPLICA
      COORD_FILE_NAME files/final-C.xyz
    &END
    &PROGRAM_RUN_INFO
    &END
    &CONVERGENCE_INFO
    &END
  &END BAND
&END MOTION


Could anybody point me in the right direction here? I have been trying to get
these calculations done for some time now and I am still stuck. I have checked
the cluster with a different input file which I know works, so I am fairly
confident it is not a cluster problem.
Does anybody have any ideas?

Please let me know if you need more information.

All the best from London

Jörg


--
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
20 Gordon Street
London
WC1H 0AJ

email: j.sassma...@ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Matt W

28.02.2017, 06:17:50
to cp2k
Hi Jörg,

To me, this error message

Trying to move ./WFN_restart.wfn.bak-1 to ./WFN_restart.wfn.bak-2. 
 rename returned status:           -1 

looks suspicious. I would expect the wavefunction files to have some prefixes indicating which replica etc. Maybe several MPI processes are trying to get a file lock on the same file?

Have you changed the names of any restart files, output files, etc. in your input file?

Matt

Jörg Saßmannshausen

28.02.2017, 07:09:24
to cp...@googlegroups.com
Hi Matt,

thanks for the feedback.

I think that error message is a bit of a red herring. I have been running normal
geometry and Hessian calculations for some time now, and my wavefunction file is
always called WFN_restart.wfn in the input file.

Originally I suspected a problem with the cluster, but given that I could
reproduce the crash with that calculation and not with a different one, I
think the cluster is not the cause.

It is running now. All I did was remove the duplicated line

@ENDIF

in my input file. I don't really know why I had it twice, to be honest, and it
does not make much sense to me that it did not cause any problems for an SM
type of band calculation whereas it does for a CI-NEB calculation. I would
have thought that if there is a problem with the input file, the program would
crash right at the beginning and not after the first step.
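
Since the stray @ENDIF was not reported when the input was read, a small standalone check before submitting the job may help catch this kind of mistake. The following is only a minimal sketch in Python: the function name check_if_balance and the sample text are made up for illustration, it assumes each directive is the first token on its own line, and it does not try to reproduce how CP2K's own input preprocessor treats these directives.

```python
def check_if_balance(text):
    """Return (line_number, message) pairs for unbalanced @IF/@ENDIF directives.

    Assumption: a directive is the first whitespace-separated token on its line.
    """
    problems = []
    open_ifs = []  # line numbers of @IF directives still waiting for an @ENDIF
    for lineno, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.strip().split()
        if not tokens:
            continue  # skip blank lines
        directive = tokens[0].upper()
        if directive == "@IF":
            open_ifs.append(lineno)
        elif directive == "@ENDIF":
            if open_ifs:
                open_ifs.pop()
            else:
                problems.append((lineno, "@ENDIF without a matching @IF"))
    # Anything left on the stack was opened but never closed.
    for lineno in open_ifs:
        problems.append((lineno, "@IF without a matching @ENDIF"))
    return problems


# Hypothetical, shortened input that repeats the duplicated @ENDIF.
sample = """\
@SET BAND_TYPE NEB
&BAND
@IF ( ${BAND_TYPE} == NEB )
BAND_TYPE CI-NEB
@ENDIF
@ENDIF
&END BAND
"""

for lineno, message in check_if_balance(sample):
    print(f"line {lineno}: {message}")
```

For the shortened sample this reports the second @ENDIF (line 6) as having no matching @IF. To check a real input, read the file contents into a string and pass it to the function.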

So for now I think we can close that, problem sorted.

Thanks for the feedback though.

All the best from a sunny London

Jörg