submit as suspended holds up the schedule even if the job is skipped or disabled


dpo...@shaklee.com

Mar 31, 2017, 2:02:37 PM
to schedulix
Hello all,

I think I may not be understanding submit as suspended with no resume. The basic idea is that I have a schedule. At a certain point in that schedule I occasionally need to pause the schedule for a reboot. What I wanted to do was use a synchronizing named resource to not only add that hold but also notify the people who would do the reboot that the system is ready. Submit as suspended seems ideal for this: once they are done with the reboot, they can just resume the suspended job.

The problem is that when this reboot batch is skipped, the child job in the batch still holds up the schedule, since it is suspended. It looks like I have two choices: add more states to my named resource, or submit the suspended job dynamically. The dynamic route may be easier.

However, my question is: why would you submit and not resume, unless there is always a point in the schedule where you want to hold it? It would seem logical to me for the job to be skipped along with the batch that was skipped rather than hold up the schedule. After a number of tests it is obvious it does not work that way. Again, I can work around it; I just want to know why one would use this feature other than to put a hold in the schedule every time.

It looks like this group likes details:
  1. My synchronizing named resource uses a simple locking set of states.
  2. The unlocked (free) state is the normal state of the resource.
  3. I have a simple job that is used to put this resource into the locked state.
  4. The schedule checks whether this resource is in the locked state before entering the batch. If it is not locked, the batch is immediately skipped. (Basically two jobs and one batch accomplish this: the "deciding" job will either run or skip; the batch is dependent on the deciding job exiting with a "run" state; and a final "status control" job is dependent on both of the others with an OR condition. For the batch, any final exit state is enough; the deciding job may have a final state of skipped.) Because the job looking for the resource succeeds, it unlocks the resource.
  5. The batch does several things at once, but the two that matter here are: one job sends out the message, and another is submitted as suspended so that the person who does the reboot can resume it when the reboot is done.
  6. The system is very old and will not run a jobserver. There is a sort of API that I use to interact with the system that gets rebooted. I understand that normally this would be handled by a jobserver running on that machine, but that is not the case for me here.
You know, writing this out gave me one other idea: is it possible to use a job to put a jobserver into an offline state and later put it back online? I could set up a jobserver for this system that runs locally on the schedulix server, specifically for the jobs targeting this machine, plus another jobserver acting as the "controller" for it. Then I could use jobs to take the first jobserver offline at the reboot point, and have the person doing the reboot either launch another job manually or simply put the jobserver back online. That would get me the result I am looking for as well.

Dieter Stubler

Apr 3, 2017, 3:08:37 AM
to sche...@googlegroups.com
Hi,

The main idea of a job which is submitted suspended is to provide a halt in the execution of a batch, giving the user a chance to check intermediate results or other conditions before continuing.
For example, in a business intelligence environment one might manually check the plausibility of the customer base before sending the weekly message with customer data to your CEO.

I would solve your reboot use case in a generic way like this:

Create a Synchronizing Named Resource (let's call it MACHINE) with a resource state model containing the resource states UP and REBOOTING, and create an instance of this resource in every scope representing a machine you want to handle.
Every job which should be protected against a reboot while it is running gets a resource requirement on the MACHINE resource, requiring the resource state UP and locking it in shared mode.

Now you can create a reboot batch for every machine containing a WAIT_FOR_RUNNING_JOBS job which does nothing but allocate the MACHINE resource in exclusive mode, using a resource state mapping to set its state to REBOOTING.
This job will wait for any running jobs holding shared locks on MACHINE to complete, and other jobs with lower priority trying to share-lock the MACHINE resource will not start until the exclusive lock is gone and the state of the resource is UP again.

In your reboot batch, a second job (let's call it REBOOT) depends on the WAIT_FOR_RUNNING_JOBS job and also locks the MACHINE resource exclusively.
The REBOOT job is submitted suspended and does nothing but use a resource state mapping to set the resource state back to UP.
Now you can submit the reboot batch and wait until the WAIT_FOR_RUNNING_JOBS job has finished.
Now you can reboot the machine.
After the reboot, just resume the REBOOT job and all waiting jobs can start again.
Make sure the WAIT_FOR_RUNNING_JOBS and REBOOT jobs are configured to run on the machine you want to reboot.
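
The setup above could be sketched in SDMS syntax roughly like this (the profile name MACHINE_STATES and resource path are placeholders, and the exact grammar for resource state profiles should be checked against the schedulix command reference for your version):

create or alter resource state definition 'UP';
create or alter resource state definition 'REBOOTING';

create or alter resource state profile 'MACHINE_STATES'
with
	initial state = 'UP',
	states = ('UP', 'REBOOTING');

create or alter named resource RESOURCE.'MACHINE'
with
	usage = synchronizing,
	resource state profile = 'MACHINE_STATES';

Every protected job then requires RESOURCE.'MACHINE' in state UP with a shared lock, while WAIT_FOR_RUNNING_JOBS and REBOOT request it exclusively, with resource state mappings that set REBOOTING and UP, respectively.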

If you have a jobserver agent on the machine to reboot you may automate the reboot like this:

Change the REBOOT job's run command to actually reboot the machine.
The REBOOT job will fail with a BROKEN_FINISHED state because the machine goes down while the job is running.
Now use the rerun feature of the REBOOT job to rerun it after a few minutes.
In the rerun command you can check whether the reboot actually worked, or just do nothing (use 0 as the rerun command) to let the REBOOT job SUCCEED.
Use a resource state mapping in the REBOOT job to set the MACHINE resource state to UP on SUCCESS.
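
Under that automation, the relevant differences in the REBOOT job definition are its run and rerun programs; a minimal sketch, where the folder, environment, and reboot command are illustrative placeholders:

create or alter job definition SYSTEM.'REBOOT_BATCH'.'REBOOT'
with
	environment = 'SERVER@HP3000',
	run program = 'shutdown -r now',
	rerun program = '0',
	type = JOB;

The run program takes the machine down, so the job ends up BROKEN_FINISHED; the rerun program '0' simply exits successfully, and the resource state mapping then moves MACHINE back to UP.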

There are more options to solve your use case, but this should give you a hint.

Hope this helped.

Regards
Dieter
  

dpo...@shaklee.com

Apr 3, 2017, 5:21:11 PM
to schedulix
Dieter,

Thank you for the reply. While I would love to use your generic approach, and I would recommend it to most of your user base, I do not have that luxury. I am working against a much older system that tends to be sensitive. Right now they have a prescribed point in the schedule where the reboot is allowed. I would love to use submit as suspended to implement it, but I cannot do that as part of the larger schedule, since the reboot happens only twice a month while the schedule is daily. Now, if the submitted-suspended job would honor being unreachable when its parent was skipped, I could use it. However, no matter what I have tried, it holds up the schedule until it is resumed, and only then does it honor the skip or even the disable. As such I cannot include it as-is in my daily schedule.

It is an HP3000 machine, and I have not tried, nor do I expect to be able, to run a jobserver on it. Further, I do not think the machine would even allow a reboot from anywhere other than the main console.

I think I could get the same effect by using a job to mark the jobserver as not enabled. However, I have not figured out which command to use for that. I think it is alter resource, as alter server does not seem to provide what I want. I have been able to set the jobserver's static resource offline, but that did not stop the schedule.

I may still try to use a job submitted as suspended. However, it looks like it will have to be dynamically submitted to get around the problem stated above.

Dieter Stubler

Apr 4, 2017, 3:01:29 AM
to schedulix
Hello,

just do the following:

Replace the job which you would like to submit suspended by a job which does nothing but succeed (run program: 0), and define a BEFORE FINAL trigger on SUCCESS on this job, submitting a child which is suspended.
So if the batch is skipped, the trigger is not executed, because the state of the triggering job will be SKIPPED, and no suspended job will exist.
This is shown in Example 1 below.

An even better solution to your problem is to use a PENDING state for the job.
PENDING states are meant to reflect actions outside of the scheduling system (another scheduler, manual actions, ...).
A job in a PENDING state is treated like a running job; it has to be given a final state by a set state command, either manually or programmatically using the API.
Example 2 below shows such a job, which is displayed in blue while the reboot should be done.
After the reboot you do not 'resume' the job but do a 'set state' (the button with the red/green arrow) to SUCCESS to continue your batch.
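
For the programmatic route, the pending job could be given its final state through the API, e.g. from sdmsh; the job id 4711 is a placeholder here, and the exact alter job form should be verified against the schedulix command reference:

alter job 4711 with exit state = 'SUCCESS';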

This should solve your problem.

Regards
Dieter

Example 1:

create or alter folder SYSTEM.'TRIGGER_SUSPENDED'
with
	group = 'ADMIN',
	environment = none,
	inherit grants = (CREATE CONTENT, DROP, EDIT, MONITOR, OPERATE, SUBMIT, VIEW, RESOURCE),
	parameters = none;

create or alter job definition SYSTEM.'TRIGGER_SUSPENDED'.'BEFORE_FINAL'
with
	aging = none,
	min priority = none,
	children = none,
	dependency mode = all,
	environment = 'SERVER@LOCALHOST',
	errlog = '${JOBID}.log' NOTRUNC,
	group = 'ADMIN',
	inherit grants = (DROP, EDIT, MONITOR, OPERATE, SUBMIT, VIEW, RESOURCE),
	kill program = none,
	logfile = '${JOBID}.log' NOTRUNC,
	NOMASTER,
	priority = 50,
	parameters = none,
	profile = 'STANDARD',
	required = none,
	rerun program = none,
	resource = none,
	runtime = 0,
	runtime final = 0,
	run program = '0',
	NOSUSPEND,
	type = JOB,
	workdir = none;

create or alter job definition SYSTEM.'TRIGGER_SUSPENDED'.'TRIGGER_SUSPENDED'
with
	aging = none,
	min priority = none,
	children = none,
	dependency mode = all,
	environment = 'SERVER@LOCALHOST',
	errlog = '${JOBID}.log' NOTRUNC,
	group = 'ADMIN',
	inherit grants = (DROP, EDIT, MONITOR, OPERATE, SUBMIT, VIEW, RESOURCE),
	kill program = none,
	logfile = '${JOBID}.log' NOTRUNC,
	MASTER,
	priority = 50,
	parameters = none,
	profile = 'STANDARD',
	required = none,
	rerun program = none,
	resource = none,
	runtime = 0,
	runtime final = 0,
	run program = '0',
	NOSUSPEND,
	type = JOB,
	workdir = none;

create or alter trigger 'BEFORE_FINAL' on job definition SYSTEM.'TRIGGER_SUSPENDED'.'TRIGGER_SUSPENDED'
with
	active,
	condition = none,
	nomaster,
	nowarn,
	limit state = none,
	suspend,
	state = (
		'SUCCESS'
	),
	submitcount = 1,
	submit SYSTEM.'TRIGGER_SUSPENDED'.'BEFORE_FINAL',
	type = BEFORE_FINAL;

Example 2:

create or alter exit state definition 'REBOOTING';

create or alter exit state mapping 'REBOOT' with map = ('REBOOTING');

create or alter exit state profile 'REBOOT'
with
	default mapping = 'REBOOT',
	states = (
		'REBOOTING' pending,
		'SUCCESS' final,
		'SKIPPED' final unreachable
	);

create or alter job definition SYSTEM.'REBOOT'
with
	aging = none,
	min priority = none,
	children = none,
	dependency mode = all,
	environment = 'SERVER@LOCALHOST',
	errlog = '${JOBID}.log' NOTRUNC,
	group = 'ADMIN',
	inherit grants = (DROP, EDIT, MONITOR, OPERATE, SUBMIT, VIEW, RESOURCE),
	kill program = none,
	logfile = '${JOBID}.log' NOTRUNC,
	MASTER,
	priority = 50,
	parameters = none,
	profile = 'REBOOT',
	required = none,
	rerun program = none,
	resource = none,
	runtime = 0,
	runtime final = 0,
	run program = '0',
	NOSUSPEND,
	type = JOB,
	workdir = none;

dpo...@shaklee.com

Apr 4, 2017, 1:14:13 PM
to schedulix
Dieter,

That second option was exactly what I was looking for. I did not even consider that exit states could put something on hold like that! Thank you.