Hi Suzm,
The big mistake is to let politicians agree on time zone names:
CST = Central Standard Time = UTC-6
CST = China Standard Time = UTC+8
CST = Cuba Standard Time = UTC-5
Obviously the engine confuses one CST with another. Mixing up UTC-6 and UTC+8 gives a difference of 14 hours.
(I can't really blame the engine for this, but we'll definitely have to repair the issue).
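To make the arithmetic concrete, here is a small Python sketch. It only uses the three fixed offsets listed above; the wall-clock time and the helper name as_utc are made up for illustration and have nothing to do with how the engine works internally.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the three meanings of "CST" listed above.
CST_MEANINGS = {
    "Central Standard Time": timezone(timedelta(hours=-6)),
    "China Standard Time": timezone(timedelta(hours=8)),
    "Cuba Standard Time": timezone(timedelta(hours=-5)),
}

def as_utc(wall_clock: datetime, tz: timezone) -> datetime:
    """Interpret a naive wall-clock time in the given zone and convert it to UTC."""
    return wall_clock.replace(tzinfo=tz).astimezone(timezone.utc)

wall_clock = datetime(2017, 9, 19, 13, 0, 45)  # same digits, no zone attached
central = as_utc(wall_clock, CST_MEANINGS["Central Standard Time"])
china = as_utc(wall_clock, CST_MEANINGS["China Standard Time"])

# The same "13:00:45 CST" lands on two instants that are 14 hours apart.
shift = central - china
print(shift)  # 14:00:00
```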
Our first step now is to investigate where the data is correct and where the error occurs.
I hope you don't mind helping us.
Let's investigate the jobserver first.
If you have a standard schedulix installation, you'll have it installed in /opt/schedulix.
Below this directory there's a directory called "taskfiles" (at least on your clients).
Now create a job with a harmless run program that gives you some time to work, e.g. "sleep 60".
We'll do two things:
a. We look at the times reported by the jobexecutor
b. We look at the times reported by the jobserver to the server
To achieve the second (b), please first set the trace level to 2.
echo "alter server with trace level = 2;" | sdmsh
Now you submit that test job and switch to that taskfiles directory.
As soon as the job reaches the state RUNNING, there'll be a file in the taskfiles directory that corresponds to the job.
-bash-4.2$ ls -l
total 48
-rw-rw-r--. 1 schedulix schedulix 16384 Sep 19 13:00 localhost-GLOBAL.'EXAMPLES'.'LOCALHOST'.'SERVER'-660148
-rw-rw-r--. 1 schedulix schedulix 6318 Sep 19 13:00 starttimes.GLOBAL.'EXAMPLES'.'HOST_1'.'SERVER'
-rw-rw-r--. 1 schedulix schedulix 6318 Sep 19 13:00 starttimes.GLOBAL.'EXAMPLES'.'HOST_2'.'SERVER'
-rw-rw-r--. 1 schedulix schedulix 5148 Feb 14 2017 starttimes.GLOBAL.'EXAMPLES'.'IRGENDWAS'.'SERVER'
-rw-rw-r--. 1 schedulix schedulix 6300 Sep 19 13:00 starttimes.GLOBAL.'EXAMPLES'.'LOCALHOST'.'SERVER'
In my case it is the first file in the list.
To save the file, we make a hard link to it:
-bash-4.2$ ln localhost-GLOBAL.\'EXAMPLES\'.\'LOCALHOST\'.\'SERVER\'-660148 xxx
After the job has finished, we'll still have the taskfile's content in the file named xxx.
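If you wonder why a hard link is enough here, the following Python sketch mimics the idea in a temporary directory. It assumes that the taskfile is appended to in place and later simply removed; I have not shown how the jobserver really handles the file, and the name "taskfile" is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "taskfile")  # placeholder name
    saved = os.path.join(workdir, "xxx")

    with open(original, "w") as handle:
        handle.write("status=STARTED\n")

    os.link(original, saved)  # same effect as: ln taskfile xxx

    # Later writes through the original name are visible through the link,
    # because both names point at the same inode.
    with open(original, "a") as handle:
        handle.write("status=FINISHED\n")

    os.remove(original)  # stand-in for the cleanup after the job ends

    with open(saved) as handle:
        saved_content = handle.read()

print(saved_content, end="")
```

Had the file been replaced by a new one instead of being appended to, the link would only show the older content, so treat this as an illustration of the principle.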
We can look at the content:
-bash-4.2$ cat xxx
[19-09-2017 13:00:45 CEST] incomplete
[19-09-2017 13:00:45 CEST] id=660148
[19-09-2017 13:00:45 CEST] run=0
[19-09-2017 13:00:45 CEST] status=STARTED
[19-09-2017 13:00:45 CEST] command=SDMSpopup.sh
[19-09-2017 13:00:45 CEST] argument=SYSTEM.EXAMPLES.E0010_SINGLEJOB.SINGLEJOB
[19-09-2017 13:00:45 CEST] argument=-c
[19-09-2017 13:00:45 CEST] argument=?:1=FAILURE:0=SUCCESS
[19-09-2017 13:00:45 CEST] workdir=/opt/schedulix/tmp
[19-09-2017 13:00:45 CEST] usepath
[19-09-2017 13:00:45 CEST] verboselogs
[19-09-2017 13:00:45 CEST] logfile=660148.log
[19-09-2017 13:00:45 CEST] logfile_append
[19-09-2017 13:00:45 CEST] errlog=660148.log
[19-09-2017 13:00:45 CEST] errlog_append
[19-09-2017 13:00:45 CEST] samelogs
[19-09-2017 13:00:45 CEST] complete
[19-09-2017 13:00:45 CEST] status_tx=STARTED
[19-09-2017 11:00:45 GMT] execpid=15302@N0+1505818845
[19-09-2017 11:00:45 GMT] extpid=15303@N0+1505818845
[19-09-2017 11:00:45 GMT] status=RUNNING
[19-09-2017 13:00:46 CEST] status_tx=RUNNING
[19-09-2017 11:03:01 GMT] returncode=0
[19-09-2017 11:03:01 GMT] status=FINISHED
[19-09-2017 13:03:01 CEST] status_tx=FINISHED
You'll see two kinds of messages: those with a local timestamp (CST for you, CEST for me), which are written by the jobserver (Java) process, and those with a GMT timestamp, which are written by the jobexecutor process.
If those times look OK, this part of the system isn't responsible for the confusion. If not, we've found the culprit.
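If you'd rather not compare the times by eye, a rough Python sketch like the one below converts every taskfile line to UTC and checks that the instants don't run backwards. The offset table is my assumption and only covers the abbreviations from my example. CST is left out on purpose, because you have to decide which CST your machine means.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only these abbreviations occur; extend the map for your site.
# "CST" is deliberately missing because it is ambiguous.
OFFSETS = {"GMT": 0, "CET": 1, "CEST": 2}

LINE = re.compile(r"^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) (\w+)\] (.*)$")

def parse_line(line: str):
    """Return (utc_datetime, zone_abbreviation, payload) for one taskfile line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line: {line!r}")
    stamp, zone, payload = match.groups()
    if zone not in OFFSETS:
        raise ValueError(f"unknown or ambiguous zone: {zone}")
    local = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
    tz = timezone(timedelta(hours=OFFSETS[zone]))
    return local.replace(tzinfo=tz).astimezone(timezone.utc), zone, payload

sample = """\
[19-09-2017 13:00:45 CEST] status_tx=STARTED
[19-09-2017 11:00:45 GMT] status=RUNNING
[19-09-2017 13:00:46 CEST] status_tx=RUNNING
[19-09-2017 11:03:01 GMT] returncode=0
"""

events = [parse_line(line) for line in sample.splitlines()]
for utc, zone, payload in events:
    print(utc.strftime("%H:%M:%S UTC"), f"({zone})", payload)

# If both writers agree, the UTC instants never run backwards.
in_order = all(a[0] <= b[0] for a, b in zip(events, events[1:]))
print("monotonic:", in_order)
```

A jump of several hours between a local line and the neighbouring GMT line would point at this part of the system.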
Assuming the above was OK, we now look at the server's log file.
You can simply open it and search for the JobId (660148 in my example).
You'll find messages like
MESSAGE [1037,1001(1001)] 19 Sep 2017 11:00:45 GMT alter job 660148 with status = started, run = 0, timestamp = '19-09-2017 13:00:45 CEST';
...
MESSAGE [1037,1001(1001)] 19 Sep 2017 11:00:46 GMT alter job 660148 with status = running, run = 0, exec_pid = '15302@N0+1505818845', ext_pid = '15303@N0+1505818845', timestamp = '19-09-2017 11:00:45 GMT';
...
MESSAGE [1037,1001(1001)] 19 Sep 2017 11:03:01 GMT alter job 660148 with status = finished, run = 0, exit_code = 0, timestamp = '19-09-2017 11:03:01 GMT';
The timestamps in these commands should match the timestamps in the taskfile.
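The comparison can also be scripted. The sketch below pulls the timestamp out of each "alter job" line with a regular expression and compares it with values copied from the taskfile. The pattern is derived only from the log lines shown above, so treat it as an assumption about the log format, and the function name job_timestamps is made up.

```python
import re

ALTER = re.compile(
    r"alter job (?P<id>\d+) with status = (?P<status>\w+),.*?"
    r"timestamp = '(?P<ts>[^']+)'"
)

def job_timestamps(log_text: str, job_id: str):
    """Yield (status, timestamp_string) for every 'alter job' line of one job."""
    for line in log_text.splitlines():
        match = ALTER.search(line)
        if match and match.group("id") == job_id:
            yield match.group("status"), match.group("ts")

log = """\
MESSAGE [1037,1001(1001)] 19 Sep 2017 11:00:45 GMT alter job 660148 with status = started, run = 0, timestamp = '19-09-2017 13:00:45 CEST';
MESSAGE [1037,1001(1001)] 19 Sep 2017 11:03:01 GMT alter job 660148 with status = finished, run = 0, exit_code = 0, timestamp = '19-09-2017 11:03:01 GMT';
"""

# Timestamps copied from the taskfile lines status=STARTED and status=FINISHED.
taskfile = {
    "started": "19-09-2017 13:00:45 CEST",
    "finished": "19-09-2017 11:03:01 GMT",
}

found = dict(job_timestamps(log, "660148"))
mismatches = {s: (ts, taskfile.get(s)) for s, ts in found.items() if taskfile.get(s) != ts}
print(mismatches or "all timestamps match")
```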
And if everything is OK so far, we'll have to have a look at the server.
But let's do this first, otherwise things get confusing :-)
Regards,
Ronald