job marked as runnable but never runs


Khaled Mobarek

Jul 27, 2016, 10:03:23 PM
to schedulix
Hello all

I have a job that is part of a batch. The previous jobs it depends on have run successfully, but the job in question never runs and its state says RUNNABLE. I executed it on its own and still no good. I even removed the dependencies and executed it, with the same result. What makes a job stay runnable without ever actually running?


Dieter Stubler

Jul 28, 2016, 12:08:12 AM
to sche...@googlegroups.com
Hi,

A typical reason for a job staying in RUNNABLE state is that the jobserver that should execute it is not running, or is unresponsive for some reason.

Start the jobserver or have a look in its lockfile.

Regards
Dieter

alexand...@gmail.com

Feb 8, 2019, 5:03:30 AM
to schedulix
Hi,

I'm having the same issue. I am trying to run a bash script and it just stays in RUNNABLE state. I also tried running the SINGLEJOB job from the examples. It ran okay twice, with exit state SUCCESS and state FINAL. But when I tried running it again, it just stayed in RUNNABLE state.

The jobserver seems to be working. Where am I supposed to look for this lockfile?

Thanks in advance,
Xander

Ronald Jeninga

Feb 8, 2019, 5:15:41 AM
to schedulix
Hi Xander,

you probably mean Log File; we don't use Lock Files anywhere.
The log files for the example jobservers are located in /opt/schedulix/log.
The log file with the highest number is the latest one. (The file without a number is a named pipe; leave it where it is).
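If you want to pick the newest log file from a script instead of by eye, the rule above (highest number wins, the un-numbered named pipe is skipped) can be sketched in Python. This is a hypothetical helper, not something shipped with schedulix; it assumes the numbered files are named `<prefix>.<n>`, like the `BICserver.out.25` attached later in this thread, and it compares the numbers as integers so that `.25` beats `.9`.

```python
import re
from pathlib import Path
from typing import Iterable, Optional


def pick_latest(names: Iterable[str], prefix: str) -> Optional[str]:
    """Return the name '<prefix>.<n>' with the highest n, or None if none exists.

    The bare '<prefix>' entry (the named pipe) has no numeric suffix, so it never
    matches. Suffixes are compared as integers: 'x.out.25' wins over 'x.out.9'.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)")
    best_name: Optional[str] = None
    best_num = -1
    for name in names:
        match = pattern.fullmatch(name)
        if match and int(match.group(1)) > best_num:
            best_num = int(match.group(1))
            best_name = name
    return best_name


def latest_log(log_dir: str, prefix: str) -> Optional[Path]:
    """Hypothetical helper: locate the newest numbered log file in log_dir.

    Only directory entries are listed; no file is opened, so the named pipe
    is left alone.
    """
    directory = Path(log_dir)
    name = pick_latest((entry.name for entry in directory.iterdir()), prefix)
    return directory / name if name is not None else None
```

For example, `latest_log("/opt/schedulix/log", "BICserver.out")` would return the path of the highest-numbered `BICserver.out.<n>` file in that directory, assuming that is where your installation writes it.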

On the jobserver's properties tab you might find an error message. It is usually the first place I look before taking the trouble of logging in to the remote system.

In 99.99% of all cases there's some problem with a jobserver if a job remains in RUNNABLE state.
Of course there will be a time lag between the job reaching that state and the jobserver fetching it, but that lag should be several seconds at most, depending somewhat on the jobserver's configuration.

Best regards,

Ronald

alexand...@gmail.com

Feb 8, 2019, 6:12:54 AM
to schedulix
Hi Ronald,

I actually just copied what Dieter said. ;'D

I checked the properties tab of the job in RUNNABLE state, and this is what I have:

SYSTEM.EXAMPLES.E0010_SINGLEJOB.SINGLEJOB
Id: 28008
Type: JOB
Submit Path: SYSTEM.EXAMPLES.E0010_SINGLEJOB.SINGLEJOB
Disabled: false
Tag: NONE
Owner: PUBLIC
Submitted By: PUBLIC
Unresolved Handling: NONE
Expected Runtime [Sec]:
Nice Value [-100,100] (lower speeds up, higher slows down):
Profile Nice:
Suspended: NOSUSPEND
Cancelled: false
Suspended By Parent:
State: RUNNABLE
Error Message: NONE
Exit State: NONE
Exit State Profile: STANDARD
Master Id: 28008
Submitting Parent Id: NONE
Static: true
Merge Mode: FAILURE
Version: 172089

I've attached the log file, and it doesn't seem to show any errors either.

I will also go through the documentation again later, to hopefully get a better understanding of the system and find my way around it.

Thanks,

-Xander
BICserver.out.25

Ronald Jeninga

Feb 8, 2019, 6:30:44 AM
to schedulix
Hi Xander,

on the "Run" tab of the job there is a field that shows the environment it requires.
In your situation that uniquely maps to a specific jobserver; most likely it is SERVER@LOCALHOST that maps to jobserver GLOBAL.EXAMPLES.LOCALHOST.SERVER.

What you have to check is if this jobserver is indeed running.
Within "Jobserver and Resources" from the main menu, you'll find this server in the explorer part on the left side of the screen (you might have to do an "Expand All").
If the server is red, it is not connected.

After clicking on the server, you might see an error message on the properties tab.
The log file of interest is /opt/schedulix/localhost.out.<some number>.

Your server log looks healthy. But it might be a good idea to increase the trace level from 1 to 2.
You can do this temporarily by invoking sdmsh and issuing the command

alter server with trace level = 2;

(Don't forget the semicolon).
Or you can increase the trace level in the server.conf file and restart the server.

Best regards,

Ronald

alexand...@gmail.com

Feb 10, 2019, 10:23:07 PM
to schedulix
Hi Ronald,

I did as advised. All the jobservers from the examples were red. I remembered that I had changed the "RepoHost" value in the hosts' config files to the actual IP of my server. I switched it back to simply "localhost", and the jobservers are all connected now.

Before doing that, I had also increased the trace level to 2, although I'm a bit uncertain what that was for. Could you elaborate on that a bit?

I just tried submitting a job and it's working okay now. We will finally start testing it, and later evaluate which operational tasks in production we could use it for.

Thanks a lot as always.

Xander

Ronald Jeninga

Feb 11, 2019, 3:37:53 AM
to schedulix
Hi Xander,

well, the way I read your post everything seems to be in perfect order now.
What is remaining is your question regarding the trace level.

It isn't crucial at all, but in case of problems there is a lot more information available.
Most notably, you can see the statements executed by the server.
In many cases (in my experience) those show that in fact the "error" is in front of the computer (me myself ;). 
In other cases it is simply a valuable test case.

When the statements are written to the log file, passwords are starred out.
But of course user names are still visible. In some environments this could be interpreted as a security issue.

I am convinced that schedulix is powerful enough to fulfil all your requirements.
So please ask if you run into "unsolvable" problems.

Best regards,

Ronald
 

alexand...@gmail.com

Feb 11, 2019, 4:30:30 AM
to schedulix
Hi Ronald,

Sure, everything went fine with creating and scheduling jobs after that last one.
Will definitely post concerns here if I ever get stuck again.


Thanks as always.

Xander