Calypso error handling on X-DFT code crashing

Victor Naden Robinson

Aug 22, 2016, 1:17:41 PM
to CALYPSO
Good afternoon,

I am running calypso with CASTEP and I notice that when CASTEP crashes, my entire calypso process, as well as my SGE or PBS job, crashes along with it. This seems to coincide with:

Application 22902329 exit codes: 1
or
Application 22902329 exit codes: 132

Is there a way to make calypso ignore the crash with the program it is interfaced with?

Other errors I typically see are from CASTEP:

ERROR in cell constraints: attempt to fix a=c, but a /= c
Current trace stack:
 cell_check_cell_constraints
 cell_read
 castep

ERROR in cell constraints: attempt to fix alpha=beta, but alpha /= beta
Current trace stack:
 cell_check_cell_constraints
 cell_read
 castep

Error - symmetry related atoms do not form a group: num ops = 2
 Internal error in SYMMETRY_GENERATE. Please submit a bug report.
Current trace stack:
 cell_read
 castep

Error in pot_add with input poten1
Current trace stack:
 pot_add
 locpot_calculate
 electronic_prepare_H
 electronic_minimisation
 calculate_finite_basis_corr
 check_elec_ground_state
 castep
&
 Error - real_std_pot wrong size
Error in pot_add with input poten1
Current trace stack:
 pot_add
 locpot_calculate
 electronic_prepare_H
 electronic_minimisation
 calculate_finite_basis_corr
 check_elec_ground_state
 castep

wyc

Aug 22, 2016, 9:38:47 PM
to Calyp...@googlegroups.com
Dear Victor,

> Is there a way to make calypso ignore the crash with the program it is interfaced with?

You can change the command in submit.sh so that the crash is ignored. I have attached an example for running CALYPSO with CASTEP; please test with it.
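For readers without the attachment, here is a minimal sketch of the kind of change this could mean. It is an assumption, not the contents of 4_example.tar.gz: the helper name run_ignoring_failure is hypothetical, and all it does is discard the output of the relaxation command and hide a non-zero exit status from the calling script. It uses the portable `> /dev/null 2>&1` form, which behaves like bash's `&> /dev/null`.

```shell
#!/bin/sh
# Hypothetical helper for submit.sh (not part of CALYPSO).
# Runs the given command, throws away stdout and stderr,
# and always reports success to the caller.
run_ignoring_failure() {
    "$@" > /dev/null 2>&1 || true
}
```

In submit.sh this might be called as `run_ignoring_failure mpiexec <your MPI options> <your CASTEP command>`. Note that hiding the failure does not create the output files that later steps expect.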

> Other errors I typically see are from CASTEP

Please try again with the CASTEP input files from the example (castep.param_* and cinput_*); they may work better. If you have any problems, feel free to email us.

Sincerely

Yanchao

4_example.tar.gz

Victor Naden Robinson

Aug 22, 2016, 9:58:00 PM
to Calyp...@googlegroups.com
Ah, I see. Does calypso use standard error output as an error flag?

The cinput_* and castep.param_* files look very similar to mine, but I will give them a try as well, thanks.

Kind Regards,
Victor Naden Robinson
__________________________
University of Edinburgh, School of Physics and Astronomy
Centre for Science at Extreme Conditions, James Clerk Maxwell Building
King's Buildings
Edinburgh, EH9 3JZ

--
You received this message because you are subscribed to a topic in the Google Groups "CALYPSO" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/CalypsoCode/RNslcfrgOG4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to CalypsoCode+unsubscribe@googlegroups.com.
To post to this group, send email to Calyp...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/CalypsoCode/CE7EA78B-5F83-4DE7-912E-204888410F48%40calypso.cn.

For more options, visit https://groups.google.com/d/optout.

castep.param_1
cinput_1

Victor Naden Robinson

Aug 24, 2016, 10:55:56 AM
to CALYPSO
Good afternoon,

I am trying to run with &> /dev/null.

Calypso seems to abort because it fails to read CONTCAR, which does not exist after a CASTEP process has failed. I thought it could still use the previous castep-out.cell to create a CONTCAR with readcell.py and contcar.py?

This leads to calypso exiting with code 1 or 134:

/1/caly.log:Application 22888395 exit codes: 1
/10/caly.log:Application 22888423 exit codes: 1
/11/caly.log:Application 22887895 exit codes: 1
/12/caly.log:Application 22888462 exit codes: 1
/13/caly.log:Application 22869699 exit codes: 134
/13/caly.log:Application 22878757 exit codes: 134
/13/caly.log:Application 22886258 exit codes: 1
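A quick way to summarise logs like these is sketched below. It assumes each caly.log contains lines of the form shown above, and that codes above 128 follow the common shell convention of 128 plus a signal number, which may not hold for every job launcher. Both function names are made up for this example.

```shell
#!/bin/sh
# Count how often each exit code appears in the given log files.
# Expects lines such as: "Application 22888395 exit codes: 1"
tally_exit_codes() {
    grep -h "exit codes:" "$@" | awk '{print $NF}' | sort -n | uniq -c
}

# Describe one exit code, assuming the "128 + signal number" convention.
describe_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # kill -l N prints the name of signal number N
        echo "signal $(kill -l $((code - 128)))"
    else
        echo "exit status $code"
    fi
}
```

For example, `tally_exit_codes */caly.log` prints one count per exit code. Under the stated convention, `describe_exit_code 134` names the abort signal on Linux, which would be consistent with a runtime abort inside calypso.x.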

Victor Naden Robinson

Aug 24, 2016, 1:46:31 PM
to CALYPSO
I think I have found the problem here, and a possible solution.

Problem:

Calypso interfaced with CASTEP
  1. If the first system relaxation fails (rare) due to a CASTEP error (alpha /= beta, for example), then no CONTCAR is produced by readcell.py.
  2. calypso.x looks for CONTCAR and produces an abort signal when it cannot find the file:
     forrtl: severe (29): file not found, unit 1003, file /path/to/file/CONTCAR
  3. If there is a CONTCAR left over from a previous successful relaxation, then calypso.x reads that one (with readcell.py). It is then analyzing the previous structure, which is incorrect.
  4. Importantly for searches with a varying number of formula units: if a CASTEP run crashes, the previous CONTCAR exists but can have the wrong length, for example when calypso is expecting a CONTCAR for 4 formula units and the CONTCAR from the previous structure has only 2:
     forrtl: severe (24): end-of-file during read, unit 1003, file /path/to/file/CONTCAR

I wonder if this is because I am calling calypso with " ./calypso.x > caly.log " and no &> /dev/null?


Possible hack or solution:
  1. Modify submit.sh to create a valid CONTCAR before the relaxation, in case CASTEP crashes:
cp castep.cell castep-out.cell      # start from the unrelaxed input structure
./readcell.py                       # write a fallback CONTCAR from it
mpiexec $flags $code &> /dev/null   # run the relaxation, discarding its output

  2. Add a method of finding the relaxation-code error. For example, if castep.*.err exists:
cat castep.*.err >> error.log; rm castep.*.err

This is not perfect, but it will stop the crashes because there is now always a valid CONTCAR file ready for calypso; the structure is not relaxed, but that is OK.
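The workaround described in this post can be put together as a single step for submit.sh. The sketch below is only an illustration under stated assumptions: the function names, the READCELL variable, and the error.log file name handling are hypothetical, while castep.cell, castep-out.cell, readcell.py and castep.*.err are the file names mentioned in this thread.

```shell
#!/bin/sh
# Append any CASTEP *.err files in the current directory to error.log,
# then delete them so they are not counted again for the next structure.
collect_castep_errors() {
    for f in castep.*.err; do
        [ -e "$f" ] || continue      # the glob matched nothing
        cat "$f" >> error.log
        rm -f "$f"
    done
    return 0
}

# Run one relaxation without letting a CASTEP failure reach the caller.
# Usage: relax_once <relaxation command and its arguments>
relax_once() {
    cp castep.cell castep-out.cell   # fallback: the unrelaxed input structure
    ${READCELL:-./readcell.py}       # write a fallback CONTCAR from it
    "$@" > /dev/null 2>&1 || true    # the relaxation itself; failure is ignored
    collect_castep_errors
}
```

A call might look like `relax_once mpiexec $flags $code`, followed by whatever post-processing submit.sh already does after a relaxation. The trade-off is the one stated above: a failed structure is passed on unrelaxed rather than being skipped.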

I think this is unique to the CASTEP implementation.

Victor Naden Robinson

Apr 12, 2017, 9:10:50 PM
to CALYPSO
Dear Yanchao,

I have noticed that adding &> /dev/null seems to take care of the errors from CASTEP and allows calypso to run smoothly.

However, there is one error that still seems to cause a SIGTERM / job abort and crash calypso:

Error calculate_finite_basis : Convergence failed when doing finite basis set correction.
Current trace stack:
 calculate_finite_basis_corr
 check_elec_ground_state
 castep


That is, when CASTEP fails to find the ground state. This is an issue when the SCF cycle oscillates in some systems and never converges. Perhaps CASTEP reports this error differently? All other errors seem to be handled without problems.
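One way to spot this particular failure from a script is sketched below. It assumes the message quoted above ends up in the castep.*.err files mentioned earlier in this thread, and the function name scf_failed is hypothetical. It only detects the failure; it does not by itself stop calypso from aborting.

```shell
#!/bin/sh
# Return success if any CASTEP error file in the given directory
# (default: current directory) reports the convergence failure.
scf_failed() {
    dir=${1:-.}
    grep -q "Convergence failed" "$dir"/castep.*.err 2>/dev/null
}
```

In a job script this could be used as `if scf_failed; then echo "SCF did not converge" >> error.log; fi` to record which structures were affected.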

wyc

Apr 12, 2017, 9:27:18 PM
to Calyp...@googlegroups.com
Dear Victor,

Please pack all the files and send them to me.

Thanks a lot!

Sincerely

Yanchao

------------------------------------------------------
Dr. Yanchao Wang (王彦超)
State Key Lab of Superhard Materials
Jilin University
Changchun 130012
China
Email: w...@calypso.cn/wyc...@jlu.edu.cn
skype:wyanchao1225




Victor Naden Robinson

Apr 12, 2017, 9:39:47 PM
to Calyp...@googlegroups.com
Dear Yanchao,

I have attached all my files (search directory and job error output). Note that I am using a PBS/Torque job engine, where aprun in submit.sh is roughly equivalent to mpirun or mpiexec.

The abort happens at the exact same time as the CONTCAR is produced after the SCF minimization error.


Kind Regards,
Victor Naden Robinson
__________________________
University of Edinburgh, School of Physics and Astronomy
Centre for Science at Extreme Conditions, James Clerk Maxwell Building
King's Buildings
Edinburgh, EH9 3JZ

calypso-debug.tar

wyc

Apr 12, 2017, 10:24:39 PM
to Calyp...@googlegroups.com
Dear Victor,

I have attached input.dat.

The system contains a lot of atoms, so I suggest using the parallel mode of calypso. We have uploaded a corresponding example to the CALYPSO website: download.calypso.cn

Furthermore, the CASTEP parameters max_scf_cycles and geom_max_iter should both be set to 200.

Please try it again using these parameters. 
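To apply the suggested values to several parameter files at once, something like the following could be used. This is a sketch under assumptions: the helper name set_castep_param is hypothetical, and it relies on the usual "keyword : value" layout of CASTEP .param files, replacing an existing entry (matched case-insensitively) or appending a new one.

```shell
#!/bin/sh
# Set "key : value" in a CASTEP .param file.
# Any existing line for the keyword is dropped, then the new value is appended.
set_castep_param() {
    file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    grep -v -i "^[[:space:]]*${key}[[:space:]:=]" "$file" > "$tmp_file" || true
    printf '%s : %s\n' "$key" "$value" >> "$tmp_file"
    cat "$tmp_file" > "$file"        # keep the original file's permissions
    rm -f "$tmp_file"
}
```

For example: `for f in castep.param_*; do set_castep_param "$f" max_scf_cycles 200; set_castep_param "$f" geom_max_iter 200; done`.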

Sincerely

Yanchao
input.dat

wyc

Apr 13, 2017, 12:08:58 AM
to Calyp...@googlegroups.com
Dear Victor,

Maybe we can modify the getes_castep.py script to avoid this. When we have finished, we will send it to you as soon as possible.

Sincerely

Yanchao

------------------------------------------------------
Dr. Yanchao Wang (王彦超)
State Key Lab of Superhard Materials
Jilin University
Changchun 130012
China
Victor Naden Robinson

Apr 14, 2017, 9:20:14 AM
to Calyp...@googlegroups.com
Actually, I may have solved this by using a "wait" command in my job script, so that it does not respond to errors in the current / parent process. I will test further!
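The job script itself is not shown in the thread, so the following is only one possible reading of this approach: start the driver in the background, then use the shell's `wait` builtin to block until it finishes, and report a non-zero status instead of passing it on. The function name run_and_wait is hypothetical.

```shell
#!/bin/sh
# Start a command in the background, wait for it to finish,
# and report (rather than propagate) a non-zero exit status.
run_and_wait() {
    "$@" &
    pid=$!
    wait "$pid" || echo "driver exited with status $?" >&2
    return 0
}
```

In a job script this might be used as `run_and_wait ./calypso.x > caly.log`. Whether this prevents the scheduler from killing the job depends on how the scheduler and launcher treat failing child processes, which is not established here.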

Kind Regards,
Victor Naden Robinson
__________________________
University of Edinburgh, School of Physics and Astronomy
Centre for Science at Extreme Conditions, James Clerk Maxwell Building
King's Buildings
Edinburgh, EH9 3JZ


Victor Naden Robinson

Apr 17, 2017, 6:20:02 PM
to CALYPSO
Dear Yanchao,

This helped somewhat, but calypso is still crashing on the CASTEP errors mentioned above. This only occurs when searching with a variable number of formula units.