Hi all,
I'd like to ask about an exception that I have been running into.
I'm running the Quantum ESPRESSO `PwBandsWorkChain` from aiida-quantumespresso. However, I don't think the workchain itself is the problem, because I have already completed it successfully on this same material. This time I am only changing some parameters and re-running it (e.g. adding `vdw_corr` or requesting a different k-point density). Moreover, the QE calculations themselves finish successfully.
The relaxation steps complete fine, but the SCF step ends up in the Excepted state. If I log into the cluster, I can see that the QE SCF calculation finished successfully (the `out` directory is 3.5 GB). However, `verdi node show 4109` reports:
state Excepted <aiormq.exceptions.ChannelInvalidStateError: <Channel: "4"> closed>
Then, `verdi process report 4109` shows the errors in the attached file.
I haven't been able to reproduce the error reliably enough to pinpoint its cause; it has only happened a couple of times. Every time it happened, though, the daemon behaved somewhat erratically. Specifically, after finding out that the WorkChain had crashed:
- `verdi status` reported the daemon as running.
- However, anything I submitted afterwards stayed in the 'Created' state (shown with a stop emoji) in the output of `verdi process list`.
- I then stopped and started the daemon.
- Only then did the newly submitted jobs change to 'Running'.
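For reference, the workaround above roughly corresponds to the following sequence, assuming a standard AiiDA installation with the usual `verdi` subcommands (a sketch of the steps rather than a verbatim log; the output on your side will differ):

```shell
# Check what the daemon reports about itself (it claimed to be running)
verdi status
verdi daemon status

# Newly submitted processes stayed in the 'Created' state here
verdi process list

# Stop and start the daemon again
verdi daemon stop
verdi daemon start

# After the restart, the new processes moved on to 'Running'
verdi process list
```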
Two final points may be relevant. First, this workchain took quite a long time, so I put my computer to sleep a few times while it was running. (Would it be good practice to stop the daemon at the end of the working day and start it again the next morning?) Second, this particular cluster uses rotating IP addresses that are assigned randomly each time I connect. Honestly, I barely know what that means, but I think that is why I had to set the key policy to `WarningPolicy`; otherwise `verdi computer configure ssh` would fail.
That's all the information I could gather about the error; I hope some of it makes sense.
Any ideas as to what may be going on?
Thanks in advance,
Ignacio.
________________________________
Ignacio Martin Alliati
PhD student, Maths and Physics
Queen's University Belfast.