Graceful shutdown of server and jobserver


Alonso Gómez

Mar 15, 2019, 8:13:17 AM
to schedulix
Hello,

I need to shut down all of schedulix gracefully. I think the order for shutdown is:
  1. Shut down the server
  2. Shut down the jobserver
  3. Shut down the web interface
Is that correct?

If it is correct, how can I check, once I have shut down the server, that the jobservers are not running any processes? Is there a command or instruction for this?

Thank you.

Ronald Jeninga

Mar 15, 2019, 8:59:39 AM
to schedulix
Hi Alonso,

Actually, the order is largely irrelevant.

The web interface can be shut down at any point in time.
Of course, if some colleague tries to save something at that very moment, (s)he's unlucky.

Jobservers can be shut down at any point in time too.
If they are currently executing some job, the job will simply continue execution.
If a job is started by a jobserver, you'll have the following process hierarchy:

Jobserver (Java)
   |
Jobexecutor (C)
   |
Your Run Program

The jobserver can be shut down, because the jobexecutor wait()s for the job to finish and will retrieve the exit code as soon as this happens.
This exit code is then written into the taskfile.
After a restart of the jobserver it will check all taskfiles and will report the changes to the scheduling server.
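
The wait-and-record pattern described above can be illustrated with a small Python sketch. This is not schedulix code: the function names and the one-line `exit_code=` file format are invented for illustration, and the real taskfile layout is not shown in this thread.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_record(command, taskfile):
    """Run command, block until it exits, then persist its exit code."""
    # subprocess.run() waits for the child, much like the executor's wait().
    completed = subprocess.run(command)
    # Hypothetical one-line format; the real taskfile layout may differ.
    Path(taskfile).write_text(f"exit_code={completed.returncode}\n")
    return completed.returncode


def read_exit_code(taskfile):
    """What a restarted supervisor could do: pick up the recorded result."""
    key, _, value = Path(taskfile).read_text().strip().partition("=")
    return int(value) if key == "exit_code" else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        taskfile = Path(tmp) / "job-1.task"
        # The child stands in for "Your Run Program" and exits with code 3.
        run_and_record([sys.executable, "-c", "import sys; sys.exit(3)"], taskfile)
        print(read_exit_code(taskfile))
```

Because the result lives in a file rather than in the supervisor's memory, the supervisor can be stopped and restarted without losing it, which is the property the jobserver relies on.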

If you kill the jobexecutor, you're not doing yourself a favour.
The Job will lose its parent process and the init process will take over.
The init process will discard the exit code as soon as the Job finishes and this information will be lost forever.
The Jobserver will find out though and report a BROKEN_FINISHED (or BROKEN_RUNNING if the job is still running, but the jobexecutor has died) state to the scheduling server.
The operator will have to decide what to do with the job.

It is also possible to shut down the scheduling server at any point in time.
Obviously, if the scheduling server is down, the jobservers can't report any changes.
But the jobservers are very patient and will periodically retry connecting. As soon as they succeed, they'll resend their messages.
Also, any attempt to work with the GUI is bound to fail while the scheduling server is down.
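
As a rough illustration of the retry-and-resend behaviour, here is a minimal Python sketch. It only shows the general pattern; `send_with_retry`, its parameters, and the 5-second interval are assumptions made for this example and say nothing about how the jobserver is actually implemented.

```python
import time


def send_with_retry(send, message, retry_interval=5.0, sleep=time.sleep):
    """Call send(message) until it succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            send(message)
            return attempts
        except ConnectionError:
            # Server unreachable: keep the message and try again later.
            sleep(retry_interval)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_send(message):
        # Simulates a scheduling server that is down for the first two attempts.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("scheduling server not reachable")
        print("delivered:", message)

    send_with_retry(flaky_send, "job finished", retry_interval=0.1)
```

The message is held until delivery succeeds, so a temporary outage of the scheduling server delays the report but does not lose it.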

The bottom line is:
I personally would reverse the proposed order, but if the components are shut down within a very short time of each other, the order of shutting down is irrelevant.


If you want to check quickly whether a jobserver is running jobs, you can do an "ls" on the taskfile directory (if you're privileged and happen to be logged on).
As an alternative, you can do a "ps" and look for jobexecutor processes.
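
The "ps" check could be scripted along the following lines. This is a sketch under assumptions: it presumes the executor shows up in the process list with "jobexecutor" in its command line, and the helper names are made up for this example.

```python
import subprocess


def find_jobexecutors(ps_output, name="jobexecutor"):
    """Return (pid, command) pairs whose command line mentions `name`."""
    matches = []
    for line in ps_output.splitlines():
        pid, _, command = line.strip().partition(" ")
        # The header line is skipped because its first field is not a number.
        if pid.isdigit() and name in command:
            matches.append((int(pid), command.strip()))
    return matches


def running_jobexecutors():
    """List jobexecutor processes on this host (needs a POSIX-style ps)."""
    # `ps -A -o pid,args` prints every process with its PID and command line.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return find_jobexecutors(result.stdout)


if __name__ == "__main__":
    executors = running_jobexecutors()
    if executors:
        for pid, command in executors:
            print(pid, command)
    else:
        print("no jobexecutor processes found")
```

An empty result suggests that no job is currently being executed on that host, provided the process name assumption holds for your installation.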
If you're not logged on, but have access to the GUI, you can use the "search running jobs".
You add a condition like

JOB.SCOPENAME == 'GLOBAL.EXAMPLES.LOCALHOST.SERVER'

or even a more complex condition like

JOB.SCOPENAME == 'GLOBAL.EXAMPLES.LOCALHOST.SERVER' or
JOB.SCOPENAME == 'GLOBAL.EXAMPLES.HOST_1.SERVER' or
JOB.SCOPENAME == 'GLOBAL.EXAMPLES.HOST_2.SERVER'

Together with a history of 0 days, this returns the jobs that are currently being run by one of the jobservers.
(Of course you can also write a "list job" command in sdmsh).
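
If there are many jobservers, typing the condition by hand gets tedious. A small helper like the one below could generate it from a list of scope names. `scope_condition` is a hypothetical name, and the sketch assumes the scope names contain no single quotes.

```python
def scope_condition(scope_names):
    """Join one JOB.SCOPENAME comparison per jobserver with 'or'."""
    clauses = [f"JOB.SCOPENAME == '{name}'" for name in scope_names]
    return " or\n".join(clauses)


if __name__ == "__main__":
    print(scope_condition([
        "GLOBAL.EXAMPLES.LOCALHOST.SERVER",
        "GLOBAL.EXAMPLES.HOST_1.SERVER",
        "GLOBAL.EXAMPLES.HOST_2.SERVER",
    ]))
```

The output can be pasted into the search condition field as is.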

Best regards,

Ronald

Alonso Gómez

Mar 15, 2019, 9:50:42 AM
to schedulix
Hi Ronald,

Thank you for the info.