Errors with Vasp defect calculations

Saurabh Bajaj

Jun 9, 2015, 7:14:11 PM
to matproj...@googlegroups.com
Hi all,

I'm posting this here because I was not sure whether there is a custodian-specific group. I was running some high-throughput defect calculations on Mendel. Some jobs completed without errors, but others stopped after a few ionic steps with the following error in FW_job.error:

mpirun: killing job...

ERROR:custodian.custodian:{u'handler': VaspErrorHandler, u'errors': [u'eddrmm'], u'actions': [{u'action': {u'_set': {u'ALGO': u'Normal'}}, u'dict': u'INCAR'}, {u'action': {u'_file_delete': {u'mode': u'actual'}}, u'file': u'CHGCAR'}, {u'action': {u'_file_delete': {u'mode': u'actual'}}, u'file': u'WAVECAR'}]}
ERROR:custodian.custodian:MaxErrors
Traceback (most recent call last):
  File "/global/u1/s/sbajaj/sb_vasp/codes/fireworks/fireworks/core/rocket.py", line 202, in run
    m_action = t.run_task(my_spec)
  File "/global/u1/s/sbajaj/sb_vasp/codes/MPWorks/mpworks/examples/firetasks_ex.py", line 59, in run_task
    c.run()
  File "/global/u1/s/sbajaj/sb_vasp/codes/custodian/custodian/custodian.py", line 221, in run
    .format(self.total_errors, ex))
RuntimeError: 1 errors reached: MaxErrors. Exited...
INFO:rocket.launcher:Rocket finished
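The ERROR line records the fix custodian attempted as declarative actions: a `_set` on the INCAR and a `_file_delete` on CHGCAR and WAVECAR. Below is a minimal, hypothetical re-implementation of just those two action types (not custodian's own code). The INCAR is held as a plain Python dict, the `mode` field is ignored, and the ENCUT value in the usage is made up.

```python
import os
import tempfile


def apply_actions(actions, incar, workdir):
    """Apply custodian-style actions to an in-memory INCAR dict and to files in workdir.

    Hypothetical sketch: only the two action types seen in the log are handled,
    '_set' on the INCAR and '_file_delete' on a file. The 'mode' field is ignored.
    """
    for entry in actions:
        action = entry["action"]
        if entry.get("dict") == "INCAR" and "_set" in action:
            # '_set' adds or overwrites the listed INCAR tags.
            incar.update(action["_set"])
        elif "file" in entry and "_file_delete" in action:
            # '_file_delete' removes the named file if it exists.
            path = os.path.join(workdir, entry["file"])
            if os.path.exists(path):
                os.remove(path)
        else:
            raise ValueError(f"unsupported action: {entry!r}")
    return incar


if __name__ == "__main__":
    # The actions from the ERROR line in FW_job.error.
    actions = [
        {"dict": "INCAR", "action": {"_set": {"ALGO": "Normal"}}},
        {"file": "CHGCAR", "action": {"_file_delete": {"mode": "actual"}}},
        {"file": "WAVECAR", "action": {"_file_delete": {"mode": "actual"}}},
    ]
    with tempfile.TemporaryDirectory() as d:
        for name in ("CHGCAR", "WAVECAR", "POSCAR"):
            open(os.path.join(d, name), "w").close()
        incar = apply_actions(actions, {"ALGO": "Normal", "ENCUT": 520}, d)
        print(incar)                  # {'ALGO': 'Normal', 'ENCUT': 520}
        print(sorted(os.listdir(d)))  # ['POSCAR']
```

If ALGO was already Normal, as the poster reports, the `_set` leaves the INCAR unchanged, so the only effect of this correction is deleting CHGCAR and WAVECAR.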


Also, I noticed the following warning in the output file vasp.out:

WARNING in EDDRMM: call to ZHEGV failed, returncode =   6  3 11


I am using ALGO = Normal and have tried both vasp/5.2 and vasp/5.3.3_vtst.matgen, with the same result. Any suggestions as to what might be causing this?

Thanks
Saurabh

Anubhav Jain

Jun 9, 2015, 7:28:31 PM
to Saurabh Bajaj, matproj...@googlegroups.com
Hi Saurabh,

You can look in custodian.json to see the list of things custodian did to try to fix the job. There is usually also a tar.gz archive of each previous run, which you can inspect to see what errors occurred earlier.
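A minimal sketch for doing this inspection from Python. It assumes that custodian.json is a JSON list with one entry per job, each holding a `corrections` list whose items carry `errors` and `actions` keys (the same keys that appear in the ERROR line above), and that the backup archives match the pattern `error.*.tar.gz`. Both are assumptions to check against your own run directory.

```python
import glob
import json
import os
import tarfile


def summarize_corrections(path="custodian.json"):
    """Return (job_index, errors, actions) for every correction in a custodian log.

    Assumes a JSON list of per-job entries, each with a 'corrections' list.
    """
    with open(path) as f:
        run_log = json.load(f)
    summary = []
    for i, job_entry in enumerate(run_log):
        for corr in job_entry.get("corrections", []):
            summary.append((i, corr.get("errors", []), corr.get("actions", [])))
    return summary


def list_error_archives(workdir="."):
    """Map each error.*.tar.gz archive in workdir to the file names it contains."""
    contents = {}
    for archive in sorted(glob.glob(os.path.join(workdir, "error.*.tar.gz"))):
        with tarfile.open(archive, "r:gz") as tar:
            contents[os.path.basename(archive)] = tar.getnames()
    return contents


if __name__ == "__main__":
    if os.path.exists("custodian.json"):
        for job, errors, actions in summarize_corrections():
            print(f"job {job}: errors={errors}")
            for action in actions:
                print("    ", action)
    for name, members in list_error_archives().items():
        print(name, members)
```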

Custodian usually has a maximum number of attempts at fixing a job, and you cannot expect it to solve every problem for every job. If the job is important, the best strategy is to check the VASP forums (or simply Google the error) and work through the possible fixes. Setting ALGO = Normal is one of them, but there are others. Also make sure that the ZHEGV error is really what is causing your job to fail (i.e., that there are no other warnings).
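One quick way to follow that last suggestion is to tally every distinct warning or error line in vasp.out, so that anything besides the ZHEGV warning stands out. This is a minimal sketch that assumes plain case-insensitive matching on "warning" and "error" is enough; VASP messages worded differently would be missed.

```python
import os
import re
from collections import Counter

# Keywords treated as worth a look; this list is an assumption, not exhaustive.
FLAG = re.compile(r"warning|error", re.IGNORECASE)


def collect_warnings(path="vasp.out"):
    """Count distinct warning/error lines in VASP's standard output."""
    counts = Counter()
    with open(path, errors="replace") as f:
        for line in f:
            if FLAG.search(line):
                # Normalize whitespace so identical messages with different padding match.
                counts[" ".join(line.split())] += 1
    return counts


if __name__ == "__main__":
    if os.path.exists("vasp.out"):
        for message, n in collect_warnings().most_common():
            print(f"{n:5d}  {message}")
```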

If you think you have found a new and useful rule for fixing jobs, the best thing is to contribute it back to custodian so that the fix is automated next time.
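One possible starting point for such a contribution: the sketch below follows the check()/correct() pattern that custodian handlers use, with correct() returning a record of the same shape as the ERROR line in FW_job.error. The class name and the "zhegv" error label are hypothetical. It does not import custodian (a real handler would subclass custodian's ErrorHandler and also apply the actions before returning), and the INCAR changes are passed in because the thread does not settle on the right fix for ZHEGV failures.

```python
import os


class ZhegvWarningHandler:
    """Hypothetical stand-alone handler mirroring custodian's check()/correct() pattern.

    A real contribution would subclass custodian's ErrorHandler; this version has no
    custodian dependency so the sketch runs on its own.
    """

    def __init__(self, incar_updates, output_filename="vasp.out"):
        # INCAR tags to set when the failure is found; chosen by the caller.
        self.incar_updates = dict(incar_updates)
        self.output_filename = output_filename

    def check(self):
        """Return True if the VASP output mentions a ZHEGV failure."""
        if not os.path.exists(self.output_filename):
            return False
        with open(self.output_filename, errors="replace") as f:
            return any("ZHEGV" in line for line in f)

    def correct(self):
        """Return a correction record (errors + actions); this sketch does not apply it."""
        actions = [{"dict": "INCAR", "action": {"_set": self.incar_updates}}]
        return {"errors": ["zhegv"], "actions": actions}
```

For example, `ZhegvWarningHandler({"ALGO": "All"}).check()` reports whether vasp.out in the current directory mentions ZHEGV; the ALGO value there is a placeholder, not a recommendation.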

Best,
Anubhav





--
You received this message because you are subscribed to the Google Groups "Materials Project Development Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matproj-devel...@googlegroups.com.
To post to this group, send email to matproj...@googlegroups.com.
Visit this group at http://groups.google.com/group/matproj-develop.
