Hi Doyel,
It is the BROKEN_FINISHED state that has to be investigated. (The rest seems to work quite well, from my point of view.)
If a job is executed by a jobserver, first of all a jobexecutor process is spawned. The jobexecutor process executes the job's run program.
The communication between jobserver and jobexecutor is done via the taskfile.
The reason for this construction is that a jobserver can be shut down and restarted without affecting running jobs. The parent process of the user's process is always the jobexecutor, never the jobserver.
On termination of the user's process, the jobexecutor will do a wait() to retrieve the exit code. If the jobexecutor dies, the exit code will be lost (swallowed by the init process ;).
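To illustrate why only the parent can retrieve the exit code, here is a minimal Python sketch of the spawn-and-wait pattern. It is not the actual jobexecutor code; the function name run_and_wait is made up for this example.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Start argv as a child process and block until it exits.

    Hypothetical helper for illustration only, not part of the scheduler.
    """
    child = subprocess.Popen(argv)
    # wait() reaps the child and returns its exit code. Only the parent
    # process can do this; if the parent dies first, the child is
    # re-parented and its exit code never reaches the original parent.
    return child.wait()


if __name__ == "__main__":
    # A stand-in "run program" that exits with code 3.
    code = run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("exit code:", code)
```

If the process calling wait() is gone by the time the child terminates, nobody is left to report that exit code back.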
Hence the health of the jobexecutor process is crucial.
If the jobexecutor terminates prematurely, the jobserver will detect this and set the state of the (still) running job to BROKEN_ACTIVE. If the user's process subsequently terminates as well, the jobserver will set the state to BROKEN_FINISHED.
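The rule above can be written down as a small decision function. This is only a sketch of the logic as described here, not the jobserver's real implementation; the function name and its two boolean parameters are assumptions made for the example.

```python
from typing import Optional


def broken_state(executor_alive: bool, user_process_alive: bool) -> Optional[str]:
    """Return the broken state the jobserver would assign, or None.

    Hypothetical helper: the real jobserver derives this information
    from the taskfile and the process table, which is not modelled here.
    """
    if executor_alive:
        # The jobexecutor is healthy, so nothing is broken.
        return None
    if user_process_alive:
        # The jobexecutor vanished while the job is still running.
        return "BROKEN_ACTIVE"
    # The jobexecutor vanished and the user's process has ended as well;
    # the exit code could not be collected.
    return "BROKEN_FINISHED"
```

In your case the last branch applies: the jobexecutor was gone (or not recognized) and the user's process has terminated.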
Since your job shows a BROKEN_FINISHED state, the jobexecutor must have vanished somehow. Either the process died, or the jobserver failed to recognize the jobexecutor.
The latter can happen if you configured something other than BOOTTIME=NONE in the jobserver's configuration. The former can happen if some administrator kills the jobexecutor, the Linux OOM killer strikes, or the program runs into some bug and crashes.
Another way to confuse the system is to store the taskfiles (see the TASKFILEPREFIX in the jobserver's configuration) on an NFS mounted device. Since both the jobexecutor and the jobserver evaluate the contents of the taskfile regularly, it is extremely important that the file is in a consistent state. Unfortunately this can't be guaranteed with NFS. The bottom line is: if you love nonsensical problems, store the taskfiles on an NFS mounted device.
The best thing to do now is to cancel your job and retry it. You can cancel the job via the GUI (RIP button, yellow), or via sdmsh (ALTER JOB 97563 WITH CANCEL;).
But before you retry, please check the BOOTTIME and the TASKFILEPREFIX configuration, as explained above.
HTH
Regards,
Ronald