sporadic exit code none problem.


Dan Power

Jun 2, 2023, 6:22:46 PM
to schedulix
Ok, I have schedulix running a bat file. It is not changing and I even have an
echo %ERRORLEVEL%
at the end of it. According to the echo, it is 0, but the 0 does not get back to schedulix for some reason. Instead, schedulix shows an exit code of NONE. The batch has not changed. The job has not changed. But the results are sporadic: sometimes I get an exit code of 0, other times I get NONE. Do I really need to add this line to my batch file to get it to be consistent?
EXIT /B %ERRORLEVEL%
I started running another simple batch file alongside the one that is giving me trouble. The main difference is that the problem one calls an exe and nothing else, while the other just calls
EXIT /B 0
Any suggestions?

Dan Power

Jun 2, 2023, 6:26:03 PM
to schedulix
Oi, I tried adding the exit with passing the current errorlevel and it still comes back with exit code none. I can see in the logs the exit command is called. I don't understand.

Dan Power

Jun 2, 2023, 6:37:23 PM
to schedulix
I don't know if it is related, but the job running out of schedulix logs a lot more than what shows up when I manually run it on the machine itself. This suggests that I am not running it the same way schedulix is running it. I even tried some redirection methods to improve the results, but I cannot get the same results. Is it possible that the calling method is interfering with the exit code of the bat file somehow?

Ronald Jeninga

Jun 4, 2023, 5:34:34 AM
to schedulix
Hi Dan,

that all sounds pretty weird.
Apart from the obvious questions (which Windows release, which schedulix release?), I'll try to do some thinking out loud.

I don't think that the different amount of logging plays a role.
The jobexecutor isn't very active here. It just redirects stdout to some file before it starts the child process.

Well, child process..., that is a Unix way of thinking. In Windows it does a CreateProcess(), which creates (and starts) a process, but AFAIK Windows does not have a concept of parent and child processes.
In a Unix environment the parent process is kind of responsible for its children. This means that the OS keeps information about the child process (after its termination) as long as the parent process doesn't retrieve it.
I don't know if this is true in a Windows environment.

You write 
> Sometimes I get the exit code of 0 other times I get none.

This sounds a lot like a race condition.
Something like:
jobexecutor creates process
      process finishes
      process information is discarded by windows
jobexecutor tries to retrieve the exit code

versus

jobexecutor creates process
      process finishes
jobexecutor tries to retrieve the exit code
      process information is discarded by windows
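
For comparison, here is a minimal Python sketch of the launch-then-wait pattern. It is not the jobexecutor's code. It only illustrates that, as long as the launcher keeps its handle on the process it started, the exit code should stay retrievable even if the process finishes well before the launcher asks for it. The helper name run_and_collect and the deliberate delay are made up for this illustration.

```python
import subprocess
import sys
import time


def run_and_collect(argv, delay_before_wait=0.5):
    """Start a process, deliberately dawdle, then ask for its exit code."""
    proc = subprocess.Popen(argv)   # the launcher now holds a handle to the process
    time.sleep(delay_before_wait)   # a short-lived process has very likely finished by now
    return proc.wait()              # the exit code is still available through the handle


if __name__ == "__main__":
    # Stand-in for the bat file: a process that exits immediately with code 3.
    code = run_and_collect([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("exit code:", code)
```

If a launcher written this way still loses exit codes, the cause is probably somewhere other than the ordering shown in the two sequences above.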

There are at least three experiments you could try:

1. Instead of run program = 'dosomething.bat', you could try to invoke the bat file through cmd.exe, like run program = 'cmd.exe /C dosomething.bat'
2. If my hypothesis with respect to the cause of the race condition is correct, you can try to add a sleep 1 to the bat file. That'll give the jobexecutor time to start its WaitForSingleObject() call
3. If you rewrite your bat file as a powershell file, does it behave identically? (*)

(*) Invoking the powershell is pretty expensive (i.e. requires quite some resources) and is therefore pretty slow.
If my hypothesis applies, the delay caused by the start of the powershell could be sufficient to give the jobexecutor the time to invoke the WaitForSingleObject() call.
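
To judge whether one of these experiments changes anything, it helps to run the same command many times and count the results. The following is a rough Python sketch of such a tally, run outside schedulix, so it only shows whether the script itself returns a consistent exit code. The function name tally_exit_codes and the stand-in command are assumptions for illustration.

```python
import collections
import subprocess
import sys


def tally_exit_codes(argv, runs=20):
    """Run argv repeatedly and count how often each exit code occurs."""
    counts = collections.Counter()
    for _ in range(runs):
        completed = subprocess.run(argv)
        counts[completed.returncode] += 1
    return counts


if __name__ == "__main__":
    # Stand-in command. On the Windows machine this could be
    # ["cmd.exe", "/C", "dosomething.bat"], as in experiment 1.
    print(tally_exit_codes([sys.executable, "-c", "raise SystemExit(0)"], runs=5))
```

A tally with more than one key would point at the script or the exe it calls. A single key would point back at the way the job is started and monitored.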

Let's try to find the cause of the issue. And maybe we find something that we can modify in the jobexecutor to protect the system against rolling dice.

Best regards,

Ronald

Ronald Jeninga

Jun 5, 2023, 6:24:52 AM
to schedulix
Hi Dan,

as Dieter pointed out to me, it isn't actually possible to "finish a job" without specifying an exit code.
And since Jobservers aren't terribly intelligent and certainly not creative, they won't even try to finish a job without an exit code.

Hence, before we start chasing ghosts, could you post a show job output of a job with exit code NONE?
Suppose the jobid is 123456, you start sdmsh and issue the command

show job 123456;

grab the output and post it here. (You can obliterate any sensitive data like IP addresses or host names if required).
You can exit sdmsh using exit, bye, quit or disconnect (Control-D twice works as well).
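
The listing that show job prints is long, and only a few fields matter here. As a convenience, a small Python sketch like the one below could pull the scalar fields out of a saved listing. It assumes the "NAME : value" layout of the scalar fields and skips the embedded tables. The function name parse_job_fields and the sample values are illustrative, not real job data.

```python
import re

# A scalar field looks like "   STATE : BROKEN_FINISHED" (name, " : ", value).
FIELD = re.compile(r"^\s*([A-Z_]+) : (.*)$")


def parse_job_fields(text):
    """Collect the 'NAME : value' lines of a show job listing into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            # Keep the first occurrence of each field name.
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


if __name__ == "__main__":
    # Illustrative sample, not real output.
    sample = """\
                   ID : 123456
                STATE : FINISHED
            EXIT_CODE : 0
"""
    job = parse_job_fields(sample)
    print(job["STATE"], job["EXIT_CODE"])
```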

Best regards,

Ronald

Dan Power

Jun 5, 2023, 6:13:05 PM
to schedulix
So, whether it is calling the bat directly or through cmd.exe /C they are both behaving sporadically (different jobs exhibiting the same behavior that are also using a different calling method).

I took the job I can run frequently and pulled a failure with and without cmd involved. Looking above, I did not mention that my test batch, which only has an exit /B 0 in it, succeeds every time. It is the other batch, the one that also calls an EXE, that is causing me problems. I put the problem batch and the test batch in a schedulix Batch so they both run at the same time each time, and the test batch has never failed. The batch that also runs the EXE fails sporadically no matter what changes I make to the batch file. Here is a job with no changes that shows an exit code of none:

Job

                   ID : 4074880
              SE_NAME : <MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY
             SE_OWNER : THATCHER
              SE_TYPE : JOB
       SE_RUN_PROGRAM : <MASK>IncentiveEntry.bat
     SE_RERUN_PROGRAM : <null>
      SE_KILL_PROGRAM : <null>
           SE_WORKDIR : <null>
           SE_LOGFILE : ${JOBID}.log
         SE_TRUNC_LOG : false
        SE_ERRLOGFILE : ${JOBID}.log
      SE_TRUNC_ERRLOG : false
  SE_EXPECTED_RUNTIME : 0
          SE_PRIORITY : 50
  SE_SUBMIT_SUSPENDED : false
SE_MASTER_SUBMITTABLE : true
   SE_DEPENDENCY_MODE : AND
          SE_ESP_NAME : STANDARD
          SE_ESM_NAME : <default>
          SE_ENV_NAME : THATCHER
           SE_FP_NAME : <null>
            MASTER_ID : 4074879
            CHILD_TAG : <null>
           SE_VERSION : 2447162798
                OWNER : THATCHER
            PARENT_ID : 4074879
             SCOPE_ID : <MASK>.PROWESSDEV
             HTTPHOST : <MASK>
             HTTPPORT : <MASK>
            IS_STATIC : true
           MERGE_MODE : NOMERGE
                STATE : BROKEN_FINISHED
          IS_DISABLED : false
         IS_CANCELLED : false
           JOB_ESD_ID : FAILURE
         JOB_ESD_PREF : 1
         JOB_IS_FINAL : false
   JOB_IS_RESTARTABLE : true
         FINAL_ESD_ID : FAILURE
            EXIT_CODE : <null>
          COMMANDLINE : <MASK>IncentiveEntry.bat
       RR_COMMANDLINE : <null>
              WORKDIR : <MASK>
              LOGFILE : 4074880.log
           ERRLOGFILE : 4074880.log
                  PID : 15680@N0+1685998813
              EXT_PID : 11176@N0+1685998813
            ERROR_MSG : <null>
              KILL_ID : <null>
       KILL_EXIT_CODE : <null>
         IS_SUSPENDED : NOSUSPEND
   IS_SUSPENDED_LOCAL : false
             PRIORITY : 50
         RAW_PRIORITY : 5000
            NICEVALUE : 0
         NP_NICEVALUE : 0
         MIN_PRIORITY : 10
         AGING_AMOUNT : 1
           AGING_BASE : MINUTES
     DYNAMIC_PRIORITY : 36
     PARENT_SUSPENDED : 0
            SUBMIT_TS : 05 Jun 2023 21:00:12 GMT
            RESUME_TS : <null>
              SYNC_TS : 05 Jun 2023 21:00:12 GMT
          RESOURCE_TS : 05 Jun 2023 21:00:12 GMT
          RUNNABLE_TS : 05 Jun 2023 21:00:12 GMT
             START_TS : 05 Jun 2023 21:00:13 GMT
            FINISH_TS : 05 Jun 2023 21:00:13 GMT
             FINAL_TS : <null>
        CNT_SUBMITTED : 0
  CNT_DEPENDENCY_WAIT : 0
 CNT_SYNCHRONIZE_WAIT : 0
    CNT_RESOURCE_WAIT : 0
         CNT_RUNNABLE : 0
         CNT_STARTING : 0
          CNT_STARTED : 0
          CNT_RUNNING : 0
          CNT_TO_KILL : 0
           CNT_KILLED : 0
        CNT_CANCELLED : 0
         CNT_FINISHED : 0
            CNT_FINAL : 0
    CNT_BROKEN_ACTIVE : 0
  CNT_BROKEN_FINISHED : 0
            CNT_ERROR : 0
      CNT_RESTARTABLE : 0
      CNT_UNREACHABLE : 0
             CNT_WARN : 0
           WARN_COUNT : 0
            IDLE_TIME : 893
 DEPENDENCY_WAIT_TIME : 0
         SUSPEND_TIME : 0
            SYNC_TIME : 0
        RESOURCE_TIME : 0
       JOBSERVER_TIME : 0
     RESTARTABLE_TIME : 893
      CHILD_WAIT_TIME : 0
         PROCESS_TIME : 893
          ACTIVE_TIME : 0
             IDLE_PCT : 100
             CHILDREN :

CHILDID CHILDPRIVS CHILDSENAME CHILDSETYPE CHILDSEPRIVS PARENTID PARENTPRIVS PARENTSENAME PARENTSETYPE PARENTSEPRIVS IS_STATIC PRIORITY SUSPEND MERGE_MODE EST_NAME IGNORED_DEPENDENCIES
------- ---------- ----------- ----------- ------------ -------- ----------- ------------ ------------ ------------- --------- -------- ------- ---------- -------- --------------------

              PARENTS :

CHILDID CHILDPRIVS CHILDSENAME                                                            CHILDSETYPE CHILDSEPRIVS PARENTID PARENTPRIVS PARENTSENAME                                            PARENTSETYPE PARENTSEPRIVS IS_STATIC PRIORITY SUSPEND      MERGE_MODE EST_NAME IGNORED_DEPENDENCIES
------- ---------- ---------------------------------------------------------------------- ----------- ------------ -------- ----------- ------------------------------------------------------- ------------ ------------- --------- -------- ------------ ---------- -------- --------------------
4074880 KDEMOVGR   <MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY JOB         KPDEMOSVGR    4074879 KDEMOVGR    <MASK>.PARALLEL TEST.PARALLEL TEST BATCH        KPDEMOSVGR    true             0 CHILDSUSPEND NOMERGE    <null>

            PARAMETER :

ID      NAME     TYPE   VALUE
------- -------- ------ -----
4062204 RUN_MODE IMPORT DEV

        REQUIRED_JOBS :

ID DEPENDENT_ID DEPENDENT_PATH DEPENDENT_PRIVS DEPENDENT_ID_ORIG DEPENDENT_PATH_ORIG DEPENDENT_PRIVS_ORIG DEPENDENCY_OPERATION REQUIRED_ID REQUIRED_PATH REQUIRED_PRIVS STATE DD_ID DD_NAME DD_DEPENDENTNAME DD_DEPENDENTTYPE DD_DEPENDENTPRIVS DD_REQUIREDNAME DD_REQUIREDTYPE DD_REQUIREDPRIVS DD_UNRESOLVED_HANDLING DD_MODE DD_STATES JOB_STATE IS_SUSPENDED PARENT_SUSPENDED CNT_SUBMITTED CNT_DEPENDENCY_WAIT CNT_SYNCHRONIZE_WAIT CNT_RESOURCE_WAIT CNT_RUNNABLE CNT_STARTING CNT_STARTED CNT_RUNNING CNT_TO_KILL CNT_KILLED CNT_CANCELLED CNT_FINISHED CNT_FINAL CNT_BROKEN_ACTIVE CNT_BROKEN_FINISHED CNT_ERROR CNT_RESTARTABLE CNT_UNREACHABLE JOB_IS_FINAL CHILD_TAG FINAL_STATE CHILDREN IGNORE CHILD_SUSPENDED CNT_PENDING DD_CONDITION
-- ------------ -------------- --------------- ----------------- ------------------- -------------------- -------------------- ----------- ------------- -------------- ----- ----- ------- ---------------- ---------------- ----------------- --------------- --------------- ---------------- ---------------------- ------- --------- --------- ------------ ---------------- ------------- ------------------- -------------------- ----------------- ------------ ------------ ----------- ----------- ----------- ---------- ------------- ------------ --------- ----------------- ------------------- --------- --------------- --------------- ------------ --------- ----------- -------- ------ --------------- ----------- ------------

       DEPENDENT_JOBS :

ID DEPENDENT_ID DEPENDENT_PATH DEPENDENT_PRIVS DEPENDENT_ID_ORIG DEPENDENT_PATH_ORIG DEPENDENT_PRIVS_ORIG DEPENDENCY_OPERATION REQUIRED_ID REQUIRED_PATH REQUIRED_PRIVS STATE DD_ID DD_NAME DD_DEPENDENTNAME DD_DEPENDENTTYPE DD_DEPENDENTPRIVS DD_REQUIREDNAME DD_REQUIREDTYPE DD_REQUIREDPRIVS DD_UNRESOLVED_HANDLING DD_MODE DD_STATES JOB_STATE IS_SUSPENDED PARENT_SUSPENDED CNT_SUBMITTED CNT_DEPENDENCY_WAIT CNT_SYNCHRONIZE_WAIT CNT_RESOURCE_WAIT CNT_RUNNABLE CNT_STARTING CNT_STARTED CNT_RUNNING CNT_TO_KILL CNT_KILLED CNT_CANCELLED CNT_FINISHED CNT_FINAL CNT_BROKEN_ACTIVE CNT_BROKEN_FINISHED CNT_ERROR CNT_RESTARTABLE CNT_UNREACHABLE JOB_IS_FINAL CHILD_TAG FINAL_STATE CHILDREN IGNORE CHILD_SUSPENDED CNT_PENDING DD_CONDITION
-- ------------ -------------- --------------- ----------------- ------------------- -------------------- -------------------- ----------- ------------- -------------- ----- ----- ------- ---------------- ---------------- ----------------- --------------- --------------- ---------------- ---------------------- ------- --------- --------- ------------ ---------------- ------------- ------------------- -------------------- ----------------- ------------ ------------ ----------- ----------- ----------- ---------- ------------- ------------ --------- ----------------- ------------------- --------- --------------- --------------- ------------ --------- ----------- -------- ------ --------------- ----------- ------------

   REQUIRED_RESOURCES :

SCOPE_ID SCOPE_NAME SCOPE_TYPE SCOPE_PRIVS RESOURCE_ID RESOURCE_NAME RESOURCE_USAGE RESOURCE_OWNER RESOURCE_PRIVS RESOURCE_STATE RESOURCE_TIMESTAMP REQUESTABLE_AMOUNT TOTAL_AMOUNT FREE_AMOUNT REQUESTED_AMOUNT REQUESTED_LOCKMODE REQUESTED_STATES RESERVED_AMOUNT ALLOCATED_AMOUNT ALLOCATED_LOCKMODE IGNORE STICKY STICKY_NAME STICKY_PARENT STICKY_PARENT_TYPE ONLINE ALLOCATE_STATE EXPIRE EXPIRE_SIGN DEFINITION
-------- ---------- ---------- ----------- ----------- ------------- -------------- -------------- -------------- -------------- ------------------ ------------------ ------------ ----------- ---------------- ------------------ ---------------- --------------- ---------------- ------------------ ------ ------ ----------- ------------- ------------------ ------ -------------- ------ ----------- ----------

          SUBMIT_PATH : <MASK>.PARALLEL TEST.PARALLEL TEST:<MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY
          IS_REPLACED : false
       TIMEOUT_AMOUNT : <null>
         TIMEOUT_BASE : <null>
        TIMEOUT_STATE : <null>
            RERUN_SEQ : 0
          AUDIT_TRAIL :

ID USERNAME TIME TXID ACTION ORIGINID JOBID JOBNAME COMMENT INFO
-- -------- ---- ---- ------ -------- ----- ------- ------- ----

      CHILD_SUSPENDED : 0
          CNT_PENDING : 0
              CREATOR : INTERNAL
          CREATE_TIME : 05 Jun 2023 21:00:12 GMT
              CHANGER : <MASK>.PROWESSDEV
          CHANGE_TIME : 05 Jun 2023 21:00:13 GMT
                PRIVS : KDEMOVGR
             SE_PRIVS : KPDEMOSVGR
            SUBMITTAG : <null>
  UNRESOLVED_HANDLING : <null>
    DEFINED_RESOURCES :

ID RESOURCE_NAME RESOURCE_USAGE RESOURCE_OWNER RESOURCE_PRIVS RESOURCE_STATE RESOURCE_TIMESTAMP REQUESTABLE_AMOUNT TOTAL_AMOUNT FREE_AMOUNT ONLINE
-- ------------- -------------- -------------- -------------- -------------- ------------------ ------------------ ------------ ----------- ------


Job shown


Here is the same job with the command line changes and still has an exit code of none:
Job

                   ID : 4075496
              SE_NAME : <MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY
             SE_OWNER : THATCHER
              SE_TYPE : JOB
       SE_RUN_PROGRAM : cmd.exe /C <MASK>IncentiveEntry.bat
     SE_RERUN_PROGRAM : <null>
      SE_KILL_PROGRAM : <null>
           SE_WORKDIR : <null>
           SE_LOGFILE : ${JOBID}.log
         SE_TRUNC_LOG : false
        SE_ERRLOGFILE : ${JOBID}.log
      SE_TRUNC_ERRLOG : false
  SE_EXPECTED_RUNTIME : 0
          SE_PRIORITY : 50
  SE_SUBMIT_SUSPENDED : false
SE_MASTER_SUBMITTABLE : true
   SE_DEPENDENCY_MODE : AND
          SE_ESP_NAME : STANDARD
          SE_ESM_NAME : <default>
          SE_ENV_NAME : THATCHER
           SE_FP_NAME : <null>
            MASTER_ID : 4075495
            CHILD_TAG : <null>
           SE_VERSION : 2448188716
                OWNER : THATCHER
            PARENT_ID : 4075495
             SCOPE_ID : <MASK>.PROWESSDEV
             HTTPHOST : <MASK>
             HTTPPORT : <MASK>
            IS_STATIC : true
           MERGE_MODE : NOMERGE
                STATE : BROKEN_FINISHED
          IS_DISABLED : false
         IS_CANCELLED : false
           JOB_ESD_ID : FAILURE
         JOB_ESD_PREF : 1
         JOB_IS_FINAL : false
   JOB_IS_RESTARTABLE : true
         FINAL_ESD_ID : FAILURE
            EXIT_CODE : <null>
          COMMANDLINE : cmd.exe "/C" "<MASK>IncentiveEntry.bat"
       RR_COMMANDLINE : <null>
              WORKDIR : <MASK>
              LOGFILE : 4075496.log
           ERRLOGFILE : 4075496.log
                  PID : 10020@N0+1686002713
              EXT_PID : 19152@N0+1686002713
            ERROR_MSG : <null>
              KILL_ID : <null>
       KILL_EXIT_CODE : <null>
         IS_SUSPENDED : NOSUSPEND
   IS_SUSPENDED_LOCAL : false
             PRIORITY : 50
         RAW_PRIORITY : 5000
            NICEVALUE : 0
         NP_NICEVALUE : 0
         MIN_PRIORITY : 10
         AGING_AMOUNT : 1
           AGING_BASE : MINUTES
     DYNAMIC_PRIORITY : 45
     PARENT_SUSPENDED : 0
            SUBMIT_TS : 05 Jun 2023 22:05:13 GMT
            RESUME_TS : <null>
              SYNC_TS : 05 Jun 2023 22:05:13 GMT
          RESOURCE_TS : 05 Jun 2023 22:05:13 GMT
          RUNNABLE_TS : 05 Jun 2023 22:05:13 GMT
             START_TS : 05 Jun 2023 22:05:13 GMT
            FINISH_TS : 05 Jun 2023 22:05:13 GMT
             FINAL_TS : <null>
        CNT_SUBMITTED : 0
  CNT_DEPENDENCY_WAIT : 0
 CNT_SYNCHRONIZE_WAIT : 0
    CNT_RESOURCE_WAIT : 0
         CNT_RUNNABLE : 0
         CNT_STARTING : 0
          CNT_STARTED : 0
          CNT_RUNNING : 0
          CNT_TO_KILL : 0
           CNT_KILLED : 0
        CNT_CANCELLED : 0
         CNT_FINISHED : 0
            CNT_FINAL : 0
    CNT_BROKEN_ACTIVE : 0
  CNT_BROKEN_FINISHED : 0
            CNT_ERROR : 0
      CNT_RESTARTABLE : 0
      CNT_UNREACHABLE : 0
             CNT_WARN : 0
           WARN_COUNT : 0
            IDLE_TIME : 312
 DEPENDENCY_WAIT_TIME : 0
         SUSPEND_TIME : 0
            SYNC_TIME : 0
        RESOURCE_TIME : 0
       JOBSERVER_TIME : 0
     RESTARTABLE_TIME : 312
      CHILD_WAIT_TIME : 0
         PROCESS_TIME : 312
          ACTIVE_TIME : 0
             IDLE_PCT : 100
             CHILDREN :

CHILDID CHILDPRIVS CHILDSENAME CHILDSETYPE CHILDSEPRIVS PARENTID PARENTPRIVS PARENTSENAME PARENTSETYPE PARENTSEPRIVS IS_STATIC PRIORITY SUSPEND MERGE_MODE EST_NAME IGNORED_DEPENDENCIES
------- ---------- ----------- ----------- ------------ -------- ----------- ------------ ------------ ------------- --------- -------- ------- ---------- -------- --------------------

              PARENTS :

CHILDID CHILDPRIVS CHILDSENAME                                                            CHILDSETYPE CHILDSEPRIVS PARENTID PARENTPRIVS PARENTSENAME                                            PARENTSETYPE PARENTSEPRIVS IS_STATIC PRIORITY SUSPEND      MERGE_MODE EST_NAME IGNORED_DEPENDENCIES
------- ---------- ---------------------------------------------------------------------- ----------- ------------ -------- ----------- ------------------------------------------------------- ------------ ------------- --------- -------- ------------ ---------- -------- --------------------
4075496 KDEMOVGR   <MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY JOB         KPDEMOSVGR    4075495 KDEMOVGR    <MASK>.PARALLEL TEST.PARALLEL TEST BATCH        KPDEMOSVGR    true             0 CHILDSUSPEND NOMERGE    <null>

            PARAMETER :

ID      NAME     TYPE   VALUE
------- -------- ------ -----
4062204 RUN_MODE IMPORT DEV

        REQUIRED_JOBS :

ID DEPENDENT_ID DEPENDENT_PATH DEPENDENT_PRIVS DEPENDENT_ID_ORIG DEPENDENT_PATH_ORIG DEPENDENT_PRIVS_ORIG DEPENDENCY_OPERATION REQUIRED_ID REQUIRED_PATH REQUIRED_PRIVS STATE DD_ID DD_NAME DD_DEPENDENTNAME DD_DEPENDENTTYPE DD_DEPENDENTPRIVS DD_REQUIREDNAME DD_REQUIREDTYPE DD_REQUIREDPRIVS DD_UNRESOLVED_HANDLING DD_MODE DD_STATES JOB_STATE IS_SUSPENDED PARENT_SUSPENDED CNT_SUBMITTED CNT_DEPENDENCY_WAIT CNT_SYNCHRONIZE_WAIT CNT_RESOURCE_WAIT CNT_RUNNABLE CNT_STARTING CNT_STARTED CNT_RUNNING CNT_TO_KILL CNT_KILLED CNT_CANCELLED CNT_FINISHED CNT_FINAL CNT_BROKEN_ACTIVE CNT_BROKEN_FINISHED CNT_ERROR CNT_RESTARTABLE CNT_UNREACHABLE JOB_IS_FINAL CHILD_TAG FINAL_STATE CHILDREN IGNORE CHILD_SUSPENDED CNT_PENDING DD_CONDITION
-- ------------ -------------- --------------- ----------------- ------------------- -------------------- -------------------- ----------- ------------- -------------- ----- ----- ------- ---------------- ---------------- ----------------- --------------- --------------- ---------------- ---------------------- ------- --------- --------- ------------ ---------------- ------------- ------------------- -------------------- ----------------- ------------ ------------ ----------- ----------- ----------- ---------- ------------- ------------ --------- ----------------- ------------------- --------- --------------- --------------- ------------ --------- ----------- -------- ------ --------------- ----------- ------------

       DEPENDENT_JOBS :

ID DEPENDENT_ID DEPENDENT_PATH DEPENDENT_PRIVS DEPENDENT_ID_ORIG DEPENDENT_PATH_ORIG DEPENDENT_PRIVS_ORIG DEPENDENCY_OPERATION REQUIRED_ID REQUIRED_PATH REQUIRED_PRIVS STATE DD_ID DD_NAME DD_DEPENDENTNAME DD_DEPENDENTTYPE DD_DEPENDENTPRIVS DD_REQUIREDNAME DD_REQUIREDTYPE DD_REQUIREDPRIVS DD_UNRESOLVED_HANDLING DD_MODE DD_STATES JOB_STATE IS_SUSPENDED PARENT_SUSPENDED CNT_SUBMITTED CNT_DEPENDENCY_WAIT CNT_SYNCHRONIZE_WAIT CNT_RESOURCE_WAIT CNT_RUNNABLE CNT_STARTING CNT_STARTED CNT_RUNNING CNT_TO_KILL CNT_KILLED CNT_CANCELLED CNT_FINISHED CNT_FINAL CNT_BROKEN_ACTIVE CNT_BROKEN_FINISHED CNT_ERROR CNT_RESTARTABLE CNT_UNREACHABLE JOB_IS_FINAL CHILD_TAG FINAL_STATE CHILDREN IGNORE CHILD_SUSPENDED CNT_PENDING DD_CONDITION
-- ------------ -------------- --------------- ----------------- ------------------- -------------------- -------------------- ----------- ------------- -------------- ----- ----- ------- ---------------- ---------------- ----------------- --------------- --------------- ---------------- ---------------------- ------- --------- --------- ------------ ---------------- ------------- ------------------- -------------------- ----------------- ------------ ------------ ----------- ----------- ----------- ---------- ------------- ------------ --------- ----------------- ------------------- --------- --------------- --------------- ------------ --------- ----------- -------- ------ --------------- ----------- ------------

   REQUIRED_RESOURCES :

SCOPE_ID SCOPE_NAME SCOPE_TYPE SCOPE_PRIVS RESOURCE_ID RESOURCE_NAME RESOURCE_USAGE RESOURCE_OWNER RESOURCE_PRIVS RESOURCE_STATE RESOURCE_TIMESTAMP REQUESTABLE_AMOUNT TOTAL_AMOUNT FREE_AMOUNT REQUESTED_AMOUNT REQUESTED_LOCKMODE REQUESTED_STATES RESERVED_AMOUNT ALLOCATED_AMOUNT ALLOCATED_LOCKMODE IGNORE STICKY STICKY_NAME STICKY_PARENT STICKY_PARENT_TYPE ONLINE ALLOCATE_STATE EXPIRE EXPIRE_SIGN DEFINITION
-------- ---------- ---------- ----------- ----------- ------------- -------------- -------------- -------------- -------------- ------------------ ------------------ ------------ ----------- ---------------- ------------------ ---------------- --------------- ---------------- ------------------ ------ ------ ----------- ------------- ------------------ ------ -------------- ------ ----------- ----------

          SUBMIT_PATH : <MASK>.PARALLEL TEST.PARALLEL TEST:<MASK>.DAILY INCENTIVES - MULTIPLE.INCENTIVEENTRY
          IS_REPLACED : false
       TIMEOUT_AMOUNT : <null>
         TIMEOUT_BASE : <null>
        TIMEOUT_STATE : <null>
            RERUN_SEQ : 0
          AUDIT_TRAIL :

ID USERNAME TIME TXID ACTION ORIGINID JOBID JOBNAME COMMENT INFO
-- -------- ---- ---- ------ -------- ----- ------- ------- ----

      CHILD_SUSPENDED : 0
          CNT_PENDING : 0
              CREATOR : INTERNAL
          CREATE_TIME : 05 Jun 2023 22:05:13 GMT
              CHANGER : <MASK>.PROWESSDEV
          CHANGE_TIME : 05 Jun 2023 22:05:13 GMT
                PRIVS : KDEMOVGR
             SE_PRIVS : KPDEMOSVGR
            SUBMITTAG : <null>
  UNRESOLVED_HANDLING : <null>
    DEFINED_RESOURCES :

ID RESOURCE_NAME RESOURCE_USAGE RESOURCE_OWNER RESOURCE_PRIVS RESOURCE_STATE RESOURCE_TIMESTAMP REQUESTABLE_AMOUNT TOTAL_AMOUNT FREE_AMOUNT ONLINE
-- ------------- -------------- -------------- -------------- -------------- ------------------ ------------------ ------------ ----------- ------


Job shown

Ronald Jeninga

Jun 6, 2023, 2:41:28 AM
to schedulix
Hi Dan,

OK, I've spotted the problem.
The Job finishes in a state BROKEN_FINISHED.

Let me first explain that state and why it exists in the first place.
In a Unix/Linux environment PIDs are reused. Even if this takes some time, a PID is not a unique value over time and can't be used on its own to identify a process uniquely.
In normal operation this won't be a problem, but after a restart of the computer (e.g. after a power failure), a Jobserver might mistake some unrelated process for a job if only the PID is used for identification.
Hence we also record the start time of a process. Theoretically the combination of the start time and the PID could be forged as well, but that is so hard to do that we figured an administrator who succeeds in doing this should be rewarded with a slightly confused Jobserver.
(The effect on the system would be marginal anyway, so why not? ;).

If a Jobserver can't find the jobexecutor process (no process is found with both the correct PID and start time), but it finds the user process, the job is set to a state BROKEN_ACTIVE.
If a Jobserver finds neither the jobexecutor process nor the user process (and the task file still indicates they should be running), the job is set to a state BROKEN_FINISHED.
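
The two rules above can be written down as a small decision function. This Python sketch is only a restatement of those two sentences, not the Jobserver's implementation. The function name classify and its boolean parameters are hypothetical, and cases the text does not describe return None rather than a guessed state.

```python
from typing import Optional


def classify(executor_found: bool, user_process_found: bool) -> Optional[str]:
    """Map what the Jobserver finds to the two BROKEN states described above.

    "Found" means a process with both the recorded PID and the recorded
    start time exists. Only the two cases spelled out in the text are covered.
    """
    if not executor_found and user_process_found:
        return "BROKEN_ACTIVE"
    if not executor_found and not user_process_found:
        return "BROKEN_FINISHED"
    return None  # jobexecutor still present: not covered by this sketch


if __name__ == "__main__":
    print(classify(executor_found=False, user_process_found=False))
```

The sketch makes the failure mode visible: if the start time comparison wrongly reports "not found" for both processes, a perfectly healthy job ends up as BROKEN_FINISHED.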

The problem is that there is no standard method to retrieve the exact start time of a process.
But ps, and a similar tool on Windows, can tell how long a process has been running so far.
If we subtract that from the current time, we obtain the start time.
Unfortunately this method is somewhat unreliable. Errors occur especially if the system clock doesn't run smoothly.
This is often the case on VMs that have no ntp daemon running to keep the clock in sync.
Because it is unreliable, we used to have a jitter value set to 5 seconds. If a calculated start time of a process doesn't differ by more than the jitter value from the real start time, it is regarded as equal.

Unfortunately this isn't always sufficient and on some platforms it still leads to BROKEN_FINISHED jobs.
Depending on the release of the Jobserver it is possible to circumvent the problem though.
If you are using a current 2.9 or 2.10 release, you can set a Jobserver configuration value called STARTTIME_JITTER.
If you set this to zero, the start times are ignored and only the PID is used to identify processes.
The default is 5, but you can also set it to some higher value.
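
As a rough illustration of the comparison described above, the Python sketch below derives a start time from "now minus running time" and accepts it if it lies within the jitter. It is written from the description in this post, not from the Jobserver sources. The function and parameter names are made up, and times are plain seconds.

```python
def estimate_start(now: float, running_for: float) -> float:
    """Start time derived from the current time and the elapsed running time."""
    return now - running_for


def same_process(recorded_pid, recorded_start, found_pid, now, running_for, jitter=5):
    """Decide whether a process that was found is the one that was recorded.

    jitter is in seconds; 0 means the start time is ignored and only the
    PID is compared, as described for STARTTIME_JITTER.
    """
    if found_pid != recorded_pid:
        return False
    if jitter == 0:
        return True
    return abs(estimate_start(now, running_for) - recorded_start) <= jitter


if __name__ == "__main__":
    # Clock drift makes the derived start time 10 seconds off.
    print(same_process(4711, 1000.0, 4711, now=1060.0, running_for=50.0))            # default jitter
    print(same_process(4711, 1000.0, 4711, now=1060.0, running_for=50.0, jitter=0))  # PID only
```

With the default jitter the drifting clock makes the process look like a stranger, while a jitter of 0 accepts it on the PID alone.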

On modern systems the PID isn't a 16 bit value any more and it is probably safe to ignore the start time altogether (if there are issues).
I don't know if Windows uses a 32 bit or 16 bit value for its PIDs. The same would apply here.

And even on older systems, ignoring the start time would be acceptable.
In the rare case of confusion, the job would be set to BROKEN_FINISHED anyway after the "rogue" process has terminated.

If you use a release older than 2.9, the best idea would be to upgrade.

Best regards,

Ronald

Dan Power

Jun 6, 2023, 8:45:15 PM
to schedulix
I found install instructions. But, in case I need it, I am looking for upgrade instructions. Specifically from 2.7 to 2.10.

Ronald Jeninga

Jun 7, 2023, 6:43:30 AM
to schedulix
Hi Dan,

to upgrade the system, you'll have to upgrade 4 components:
* the scheduling server
* the jobservers
* the GUI
* the database schema

It is no problem to upgrade from release 2.7 to 2.10 in one step.
The only time you'll have to do something "extra" is while upgrading the database schema.
That is then done in several steps: 2.7 -> 2.8, 2.8 -> 2.9, 2.9 -> 2.10.
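
To make the stepwise schema upgrade concrete, here is a tiny Python sketch that lists the steps between two releases. The release list contains only the releases named in this thread, and the function name schema_steps is made up. Releases are ordered by their position in the list, because comparing the strings directly would sort "2.10" before "2.9".

```python
RELEASES = ["2.7", "2.8", "2.9", "2.10"]  # only the releases mentioned in this thread


def schema_steps(current, target, releases=RELEASES):
    """Return the (from, to) schema upgrade steps, in the order to apply them."""
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the current one")
    return list(zip(releases[start:end], releases[start + 1:end + 1]))


if __name__ == "__main__":
    for old, new in schema_steps("2.7", "2.10"):
        print(old, "->", new)
```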

An upgrade can be performed during operation. Obviously you'll have to restart the system components, but that doesn't affect running jobs.
On the other hand, you'll do yourself a favour if you perform the upgrade at a relatively quiet time. If something goes wrong, not too many jobs will have to wait until the problems have been resolved.

In a Unix/Linux environment an upgrade is almost trivial.
If you've installed from rpms, you upgrade the rpms and you are done.
If you've compiled the software yourself, you update your git sandbox and compile anew.
Then you create a tgz (or any other archive format) of the sandbox and unpack it on the target system next to the current installation.
The current system is then shut down, and the symbolic link pointing to the current installation is redirected to the new one.
In the sql directory you'll find "generated-upgrade*.sql" files. They reside in the *_gen directories. Which ones you need depends on the database system you use.
The sql files need to be executed in order to upgrade the database schema.
If all went without issues (which it will if you don't make mistakes), you can start the schedulix system again.
If you are unsure, you can use "server-start -protected" first. This will start the scheduling server without the active internal threads (scheduling thread, timer thread) and nothing will happen.
But it'll give you time to inspect the system before normal operation is resumed.
After restarting the system, it should run as if nothing has happened.
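
The symbolic link switch mentioned above could look roughly like the following Python sketch for a Unix-like system. The directory names and the helper switch_link are hypothetical, and in practice a plain "ln -sfn" does the same job. The sketch creates the new link under a temporary name and renames it over the old one, so the link never points nowhere.

```python
import os
import tempfile


def switch_link(link_path, new_target):
    """Point link_path at new_target, replacing an existing symlink if there is one."""
    tmp = link_path + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)              # clear a leftover from an earlier attempt
    os.symlink(new_target, tmp)     # create the new link under a temporary name
    os.replace(tmp, link_path)      # rename it over the old link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        old = os.path.join(base, "schedulix-old")   # hypothetical directory names
        new = os.path.join(base, "schedulix-new")
        os.mkdir(old)
        os.mkdir(new)
        link = os.path.join(base, "current")
        os.symlink(old, link)
        switch_link(link, new)
        print(os.readlink(link))
```

Keeping the old installation directory next to the new one makes it possible to go back by switching the link again, although the database schema upgrade is not undone that way.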

To upgrade the GUI, you open the Zope management page, rename the SDMS folder and import the new SDMS.zexp.
That should work out of the box if your system is set up correctly.

To upgrade jobservers you proceed as if you'd upgrade the scheduling server. There's no need to upgrade the database once again of course.
After the startup the new jobservers should run. If jobs were active, the new jobservers will handle them without problems.

In a Windows environment you'll have to make sure that you have the exe files (scrolllog.exe, jobexecutor.exe, winps.exe). You can compile them yourself from the sources, or you can download them from the schedulix.org web site.
Since we don't use symbolic links in Windows, you'll have to redefine the %BICSUITEHOME% environment variable instead.
But the basic procedure is identical to the Unix/Linux approach.

It might be a good idea to practice the upgrade procedure on a test system first.
The procedure is pretty simple, but still there's room for mistakes. A bit of practice will definitely make you feel more confident.

The last option is to spend a little money.
In that case you invite me to a remote session and I'll guide you through the procedure.
It'll take maybe an hour or two (a bit depending on what your current installation looks like).
Hence the costs won't kill you (or your company). Drop me a private e-mail if you think this is the best option for you.

I hope this rough description helps you. But please ask if something is unclear or if you encounter difficulties somewhere.

Best regards,

Ronald

Dan Power

Jun 9, 2023, 3:19:02 PM
to schedulix
When you said this part:

"If you've installed from rpms, you upgrade the rpms and you are done."
Does that mean the database is done too, or does that still need to be upgraded?

Dan Power

Jun 9, 2023, 6:07:13 PM
to schedulix
Like, will it run the upgrade sql files you were talking about? Or do I still need to run them manually?

Dan Power

Jun 9, 2023, 6:19:49 PM
to schedulix
I think it is some sort of MySQL, as there is a mysql folder.

Ronald Jeninga

Jun 10, 2023, 6:02:03 AM
to schedulix
Hi Dan,

yes, the rpms will upgrade the schema. Actually I think it would be pretty silly if the rpms upgraded the software and left you with a broken system.
Obviously you run either MySQL or MariaDB as the database system, which is why you have the mysql directory (and a mysql_gen).
(And if you weren't aware of that, the database system was automatically installed by installing the schedulix-server-mysql rpm.)

Anyway, the rpms are designed to make things as easy as possible for the user.
The only thing left then is to upgrade your Windows Jobserver installation.
The jobexecutor.exe and scrolllog.exe can be reused, but you'll need an executable called winps.exe as well.
(It replaces a call to WMIC which caused performance issues).

I'll upload that to our web server, but please give me an hour.
You can then download that from


After copying that exe to the %BICSUITEHOME%\bin directory, you stop the jobserver(s).
Then you copy the BICsuite.jar file from your (upgraded) Linux to the %BICSUITEHOME%\lib directory and start the jobserver(s) again.
That should complete the upgrade.

Best regards,

Ronald

Dan Power

Jul 6, 2023, 2:34:39 PM
to schedulix
thank you all for your help. Upgrade fixed the problem.