shell script with exit code 1 does not show in monitor batches and jobs

Vorrarit Luengwattanakij

Feb 29, 2024, 1:38:44 PM

to schedulix

Hi,

When a job fails, e.g. with a program runtime error, I don't see that job in the "Monitor Batches and Jobs" screen. So I created a simple shell script to test this.

#!/bin/sh

date +"%A, %d %B %Y"
exit 1

I can only see the log in the "Submit Batches and Jobs" screen.

I cannot see the job in the "Monitor Batches and Jobs" screen unless I click the RIP icon. After that, the job appears in the "Monitor Batches and Jobs" screen with state "CANCELLED".

When a job is executed by the scheduler, there's no way to track such an error. Could you suggest how to query jobs in the "Monitor Batches and Jobs" screen so that a scheduled job that exits with code 1 can be viewed there?

Best Regards,

Vorrarit L.

Dieter Stubler

Feb 29, 2024, 3:06:04 PM

Have you checked the search criteria for job states in your default bookmark settings?

I assume you filter on selected job states in your search criteria and are missing RESTARTABLE and possibly other job states.

Regards
Dieter

Vorrarit Luengwattanakij

Mar 3, 2024, 9:03:18 PM

The search criteria I used just select jobs from the past 10 minutes, without filtering on any job states.

I've also tried filtering on all job states, but the job with the failure exit state still didn't show up.

Best Regards,

Mid

Ronald Jeninga

Mar 4, 2024, 8:07:22 AM

Hi,

could you please post a screenshot of the search options you use, like this one:

[Screenshot: example search options]

The above filter criteria should give you the list of all active masters (i.e. masters that are not final).

If your filter criteria are similar (but I need to see them to judge that), we'll have to dig a little deeper in order to find out what's going on.

Best regards,

Ronald

Vorrarit Luengwattanakij

Mar 4, 2024, 9:46:33 PM

Hi,

These are the search options I used. All of them are default options except for "History From" and "History To".

Best Regards,

Vorrarit L.

Ronald Jeninga

Mar 5, 2024, 4:37:59 AM

Hi Vorrarit,

by specifying the History To value, you basically tell the system you want to know which jobs completed successfully (went FINAL) within the specified period.

If you only specify History From, you'll see all active jobs as well as all jobs that went FINAL within the "History From" period.

If you specify History From = 0, you'll only see active Jobs (and failed jobs are in a way active since they didn't complete successfully yet).

Hence, the History From filter is intended to answer a question like: Which Jobs did we run the day before yesterday?

You don't need that filter for monitoring the current situation. Remove it to get happy again :-)
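As a rough illustration of these filter rules, the following POSIX shell sketch models which jobs the monitor would list. It is only a model of the behaviour described in this reply, not schedulix code: the function name `monitor_shows`, describing a job as "active" or "final" plus the minutes since it went FINAL, and treating History From/To as "minutes ago" values are all assumptions.

```sh
#!/bin/sh
# Hypothetical model of the History From / History To filter in the
# "Monitor Batches and Jobs" screen, following the rules described above.
# This is not schedulix code; times are "minutes ago" values.
#
# Usage: monitor_shows STATE AGE HISTORY_FROM [HISTORY_TO]
#   STATE         active (not FINAL yet, e.g. a restartable FAILURE) or final
#   AGE           minutes since the job went FINAL (ignored for active jobs)
#   HISTORY_FROM  look back this many minutes
#   HISTORY_TO    optional end of the window, in minutes ago
monitor_shows() {
    state=$1 age=$2 from=$3 to=$4
    if [ -n "$to" ]; then
        # History To given: only jobs that went FINAL inside the window.
        # Active jobs, including failed restartable ones, are not listed.
        if [ "$state" = final ] && [ "$age" -le "$from" ] && [ "$age" -ge "$to" ]; then
            echo yes
        else
            echo no
        fi
        return 0
    fi
    # Only History From: every active job is listed ...
    if [ "$state" = active ]; then
        echo yes
    # ... plus jobs that went FINAL within the last HISTORY_FROM minutes.
    # With HISTORY_FROM = 0 only active jobs remain.
    elif [ "$from" -gt 0 ] && [ "$age" -le "$from" ]; then
        echo yes
    else
        echo no
    fi
}

# The failed CURRENT_TIME job with a 10-minute window:
monitor_shows active 0 10 0   # History From and History To set
monitor_shows active 0 10     # History From only
```

In this model the failed (still active) job is hidden as soon as History To is set, and reappears when only History From is used.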

Best regards,

Ronald

Vorrarit Luengwattanakij

Mar 5, 2024, 10:31:33 PM

Thank you, Ronald. I've removed "History To" but still cannot see the failed job that I ran (named CURRENT_TIME).

I can only see the jobs when I enter a name pattern in the query screen.

Is there a way to query recent jobs (both success and failed)?

Best Regards,

Vorrarit

Ronald Jeninga

Mar 6, 2024, 1:47:17 AM

Hello Vorrarit,

at this point I need to explain a little about the concepts of the system.

If a Job runs into a FAILURE which you have defined to be a RESTARTABLE state, this job has not yet completed successfully.

Only when it reaches a FINAL state (or is CANCELLED) does the system consider the Job done.

This means that the system expects that you'd like to do something with the failed Jobs.

(You want to get work done, which is usually not achieved by failing).

Since the system can't do anything by itself, it marks Jobs that are waiting for manual action in red.

The operator actions that are valid in this situation are:

1. RESTART the Job

You've removed the cause of the FAILURE and you want to run the program again to get work done

2. Set the Exit State of the Job (to some FINAL state)

You can't remove the cause of the FAILURE and you want to flag the Job as done

3. CANCEL the Job

You can't remove the cause of the FAILURE and want to get rid of the Job

It is important to see the difference between cancelling and setting the Exit State.

If you cancel a job, all dependent jobs will become UNREACHABLE. Another operator action will be required to decide what to do with the successors of the CANCELLED Job.

If the successors have defined some Exit State to be the UNREACHABLE state, this has no effect.

If you set the Exit State of a Job, successors that become UNREACHABLE will acquire the UNREACHABLE Exit State (like e.g. SKIPPED).
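To make the difference between the three operator actions more tangible, here is a small shell sketch of what happens to a successor of the failed Job in each case. It is a hypothetical model of the rules above, not schedulix code; the function name `successor_outcome`, the "waiting" result for a restart, and the assumption that the successor needs its predecessor to finish with SUCCESS (while "set_state" sets some other FINAL state, e.g. PURGED) are illustrative choices.

```sh
#!/bin/sh
# Hypothetical model of what happens to a successor of a failed
# (RESTARTABLE) job after each operator action. Assumes the successor
# needs its predecessor to finish with SUCCESS, and that "set_state" sets
# a different FINAL state (e.g. PURGED). Not schedulix code.
#
# Usage: successor_outcome ACTION [UNREACHABLE_EXIT_STATE]
#   ACTION                  restart | set_state | cancel
#   UNREACHABLE_EXIT_STATE  the successor's UNREACHABLE exit state, e.g. SKIPPED
successor_outcome() {
    action=$1 unreachable_state=$2
    case $action in
        restart)
            # The failed job runs again; the successor keeps waiting.
            echo waiting ;;
        set_state)
            # The successor acquires its UNREACHABLE exit state if one is
            # defined (assumption: otherwise it simply stays UNREACHABLE).
            if [ -n "$unreachable_state" ]; then
                echo "$unreachable_state"
            else
                echo UNREACHABLE
            fi ;;
        cancel)
            # Any UNREACHABLE exit state definition has no effect: the
            # successor is UNREACHABLE and needs another operator action.
            echo UNREACHABLE ;;
        *)
            echo "unknown action: $action" >&2
            return 1 ;;
    esac
}

successor_outcome cancel SKIPPED
successor_outcome set_state SKIPPED
```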

You'll have to invest some work to clean up the current situation.

The Jobs waiting for an operator decision litter your system up to a point where newer Jobs aren't visible any more.

And since those Jobs are waiting, they can't be removed from the system. This occupies loads of memory and slows down your queries.

I think you are no longer interested in those ancient jobs. And since they are all standalone Jobs, as it seems, it'll be safe to CANCEL them.

Just mark all Jobs you want to get rid of with the Checkbox at the start of the row and hit the Cancel Button (Tombstone with RIP).

Now you want to prevent this from happening again.

You can make a copy of the STANDARD Exit State Profile and give it another name (e.g. STANDARD_FF, meaning something like "STANDARD with FAILURE is FINAL").

To do so, open the Exit State Profile you want to copy, edit its name and hit the Clone Button.

Now you mark the FAILURE Exit State as a FINAL state.

Assign the new Exit State Profile to all Job Definitions that are allowed to fail.

Future Jobs will now have a FINAL Exit State after failing. You can no longer restart such Jobs, but they also won't litter your system.
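The effect of the cloned profile can be sketched in a few lines of shell. This is a hypothetical model, not schedulix code: the function names `exit_state` and `is_final` are made up, and the Exit State Mapping (exit code 0 becomes SUCCESS, any other code becomes FAILURE) is an assumption chosen to match the `exit 1` test script from the first post.

```sh
#!/bin/sh
# Hypothetical model of an Exit State Mapping plus Exit State Profile.
# Assumption: exit code 0 maps to SUCCESS, any other code to FAILURE.
exit_state() {
    if [ "$1" -eq 0 ]; then echo SUCCESS; else echo FAILURE; fi
}

# Usage: is_final PROFILE STATE  -> prints "final" or "restartable"
#   STANDARD     FAILURE is restartable (the job waits for the operator)
#   STANDARD_FF  clone of STANDARD with FAILURE marked FINAL
is_final() {
    case "$1:$2" in
        *:SUCCESS)           echo final ;;
        STANDARD_FF:FAILURE) echo final ;;
        STANDARD:FAILURE)    echo restartable ;;
        *)                   echo "unknown: $1/$2" >&2; return 1 ;;
    esac
}

# The CURRENT_TIME test script ends with "exit 1":
state=$(exit_state 1)
echo "STANDARD:    $state is $(is_final STANDARD "$state")"
echo "STANDARD_FF: $state is $(is_final STANDARD_FF "$state")"
```

Under STANDARD the failed Job stays restartable and keeps waiting for an operator; under STANDARD_FF the same exit code leads to a FINAL state, so the Job can be cleaned up automatically.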

Alternatively you can create an Exit State Definition called PURGED (or give it whatever name with a comparable meaning).

You now add this Exit State to the STANDARD Exit State Profile. I think the best place will be below SUCCESS.

You define it to be a FINAL state.

Now if Jobs fail in the future, you can choose to rerun them, or to set their Exit State to PURGED, which then documents that you've decided not to try to run the Job again.
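A minimal sketch of the extended profile, assuming it contains only SUCCESS, PURGED and FAILURE, might look like this. The `PROFILE` variable and the `state_type` function are hypothetical helpers for illustration; a real Exit State Profile is defined in the GUI, not in a shell string.

```sh
#!/bin/sh
# Hypothetical model of the STANDARD Exit State Profile after adding PURGED.
# Each entry is STATE:TYPE; the set of states is an assumption for
# illustration, with PURGED placed below SUCCESS.
PROFILE="SUCCESS:final PURGED:final FAILURE:restartable"

# Usage: state_type STATE  -> prints the type of STATE in PROFILE
state_type() {
    for entry in $PROFILE; do
        # ${entry%%:*} is the state name, ${entry#*:} is its type
        if [ "${entry%%:*}" = "$1" ]; then
            echo "${entry#*:}"
            return 0
        fi
    done
    echo "not in profile: $1" >&2
    return 1
}

# A failed job waits for the operator; setting PURGED makes it FINAL.
echo "FAILURE -> $(state_type FAILURE)"
echo "PURGED  -> $(state_type PURGED)"
```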

I hope the above makes sense to you. If not, please ask, and I'll try to explain it better.

Best regards,

Ronald

Vorrarit Luengwattanakij

Mar 17, 2024, 11:02:44 PM

Hi Ronald,

Thank you for your answer. I have tried creating the exit state PURGED. When I try to set the state to PURGED, I get this error:

"A mapping to exit state PURGED doesn't exist, use force if you really want this"

What does it mean? How can I fix this error?

I also noticed that you mentioned the sort order of exit states. Does the ordering have any special meaning? I also found a sort order option in the exit state mapping.

Best Regards,

Vorrarit

Ronald Jeninga

Mar 18, 2024, 5:46:23 AM

Hi Vorrarit,

it isn't really an error; it is more like a warning, and no change has been made.

Just specify force (you need to check the force checkbox in the GUI, or specify FORCE in your alter job command), and the change will be made.

The idea behind this is that the PURGED exit state can't arise in a natural way, which means that you leave the predefined path of the job-flow.

This is not a problem, but the system just requires an extra effort from you as a confirmation that you know what you are doing.

And yes, it might seem annoying. I have to repeat the set state operation every once in a while myself. But I can positively state that it also saved me from disasters more than once.

(Granted, I have to repeat the action far more often than I'm saved by it, but for me it is worth it.)
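The force check described above can be modelled roughly as follows. This is a hypothetical shell sketch, not the schedulix implementation or its command syntax: the function `set_exit_state`, the `force` argument and the assumption that the job's Exit State Mapping can only produce SUCCESS and FAILURE are all illustrative.

```sh
#!/bin/sh
# Hypothetical model of the "use force" check when setting an exit state.
# Assumption: the job's Exit State Mapping can only produce SUCCESS and
# FAILURE, so PURGED is never reached "naturally" from an exit code.
MAPPING_TARGETS="SUCCESS FAILURE"

# Usage: set_exit_state NEW_STATE [force]
set_exit_state() {
    new_state=$1 force=$2
    for target in $MAPPING_TARGETS; do
        if [ "$target" = "$new_state" ]; then
            echo "state set to $new_state"
            return 0
        fi
    done
    if [ "$force" = force ]; then
        echo "state set to $new_state (forced)"
        return 0
    fi
    # Nothing is changed; the operator has to confirm with force.
    echo "A mapping to exit state $new_state doesn't exist, use force if you really want this" >&2
    return 1
}

set_exit_state PURGED || echo "nothing changed"
set_exit_state PURGED force
```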