Hi all,
I have recently been running into a weird error: the code
*occasionally* fails to create a new output subdirectory via the Fortran intrinsic subroutine
EXECUTE_COMMAND_LINE. The relevant snippet is around lines 39-43 of the old(er) version
of output_amr.f90.
This has been happening since I started running jobs on a new cluster, so I strongly
suspect that it is not a bug in the code but rather has to do with something particular to
the cluster, e.g. the Open MPI version (2.1) or something else.
As mentioned above, this only happens occasionally, i.e. the code runs fine, writing output
to disk, until it suddenly fails for no apparent reason. My jobs allocate enough memory,
and there is more than enough disk space for the output to be written. I know this
because restarting the job generally fixes the issue, albeit only temporarily, i.e. the job fails again
at a later stage.
Looking for some pointers in the relevant forums (or fora?), all I could glean is that
EXECUTE_COMMAND_LINE may fail to create a new directory if there is not enough
*virtual* memory available on the system, which really does not help much.
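For what it is worth, one way to at least capture why the call fails would be to pass the optional status arguments of the intrinsic and retry a few times. The following is only a minimal sketch, not the actual code in output_amr.f90: the helper name make_output_dir, the directory name and the retry count are all hypothetical, and I am assuming the directory is created with a plain "mkdir -p".

```fortran
program mkdir_retry_sketch
  implicit none
  integer :: ierr

  ! 'output_00001' is a placeholder directory name, 3 an arbitrary retry count.
  call make_output_dir('output_00001', 3, ierr)
  if (ierr /= 0) print *, 'could not create directory, status = ', ierr

contains

  ! Hypothetical helper: run "mkdir -p" through the intrinsic, inspect both
  ! status values, and retry up to max_tries times before giving up.
  subroutine make_output_dir(dirname, max_tries, ierr)
    character(len=*), intent(in)  :: dirname
    integer,          intent(in)  :: max_tries
    integer,          intent(out) :: ierr
    integer            :: itry, exitstat, cmdstat
    character(len=256) :: cmdmsg

    exitstat = 0
    cmdstat  = 0
    ierr     = -1
    do itry = 1, max_tries
       exitstat = 0
       cmdmsg   = ''
       ! cmdstat reports whether the command could be started at all;
       ! exitstat is the exit code of the command itself.
       call execute_command_line('mkdir -p '//trim(dirname), &
            exitstat=exitstat, cmdstat=cmdstat, cmdmsg=cmdmsg)
       if (cmdstat == 0 .and. exitstat == 0) then
          ierr = 0
          return
       end if
       print *, 'mkdir attempt', itry, 'failed: cmdstat =', cmdstat, &
            ' exitstat =', exitstat, ' ', trim(cmdmsg)
    end do

    ! Report whichever status was non-zero on the last attempt.
    if (cmdstat /= 0) then
       ierr = cmdstat
    else if (exitstat /= 0) then
       ierr = exitstat
    end if
  end subroutine make_output_dir

end program mkdir_retry_sketch
```

Without cmdstat the program simply aborts when the command cannot be executed, so passing it (together with cmdmsg) should at least show whether the failure comes from launching the shell or from mkdir itself.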
So my questions are:
1) Has anyone experienced this issue before?
2) Does anyone have an idea where to start looking for a possible solution or at least a workaround?
Needless to say, I have contacted the help team responsible for the cluster (still waiting for their reply).
Cheers,
Thor.