job marked as runnable but never runs

Khaled Mobarek

Jul 27, 2016, 10:03:23 PM

to schedulix

Hello all

I have a job that is part of a batch. The jobs it depends on have run successfully, but the job in question never runs and its state is shown as runnable. I executed it on its own and still no luck. I even removed the dependencies and executed it, with the same result. What makes a job runnable but keeps it from actually running?

Dieter Stubler

Jul 28, 2016, 12:08:12 AM

to sche...@googlegroups.com

Hi,

The typical reason for a job staying in the runnable state is that the jobserver that should execute it is not running or is unresponsive for some reason.

Start the jobserver or have a look in its lockfile.

Regards
Dieter

alexand...@gmail.com

Feb 8, 2019, 5:03:30 AM

to schedulix

Hi,

I'm having the same issue. I am trying to run a bash script and it just stays in the runnable state. I also tried running the SINGLEJOB job from the examples. It ran okay, twice, with exit state success and status final. But when I tried running it again, it just stayed in the runnable state.

The job server seems to be working. Where am I supposed to look for this lockfile?

Thanks in advance,

Xander

Ronald Jeninga

Feb 8, 2019, 5:15:41 AM

to schedulix

Hi Xander,

you probably mean Log File; we don't use Lock Files anywhere.

The log files for the example jobservers are located in /opt/schedulix/log.

The log file with the highest number is the latest one. (The file without a number is a named pipe; leave it where it is).
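
The "highest number wins" rule is easy to get wrong with a plain alphabetical listing, where `.9` sorts after `.25`. A minimal Python sketch of picking the latest file by numeric suffix follows. The function name and the `prefix` parameter are illustrative and not part of schedulix; the sketch only assumes that numbered log files are named `<prefix>.<number>`.

```python
import re
from pathlib import Path
from typing import Optional


def latest_numbered_log(log_dir: str, prefix: str) -> Optional[Path]:
    """Return the regular file '<prefix>.<number>' with the highest number, or None."""
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)")
    best_number = -1
    best_path = None
    for entry in Path(log_dir).iterdir():
        match = pattern.fullmatch(entry.name)
        # Skip the unnumbered entry (the named pipe) and unrelated files.
        if match is None or not entry.is_file():
            continue
        number = int(match.group(1))  # compare numerically, not as text
        if number > best_number:
            best_number, best_path = number, entry
    return best_path
```

For the example installation this might be called as `latest_numbered_log("/opt/schedulix/log", "BICserver.out")`. The named pipe is never opened or modified, since it carries no numeric suffix.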

On the jobserver's properties tab you might find an error message. It is usually the first place I look before taking the trouble of logging in to the remote system.

In 99.99% of all cases there's some problem with a jobserver if a job remains in a RUNNABLE state.

Of course there will be a time lag between reaching that state and the jobserver fetching it, but that time lag should be at most several seconds, a bit depending on the jobserver's configuration.

Best regards,

Ronald

alexand...@gmail.com

Feb 8, 2019, 6:12:54 AM

to schedulix

Hi Ronald,

I actually just copied what Dieter said. ;'D

I checked the properties tab of the job in the runnable state, and this is what I have:

SYSTEM.EXAMPLES.E0010_SINGLEJOB.SINGLEJOB
Id	28008
Type	JOB
Submit Path	SYSTEM.EXAMPLES.E0010_SINGLEJOB.SINGLEJOB
Disabled	false
Tag	NONE
Owner	PUBLIC
Submitted By	PUBLIC
Unresolved Handling	NONE
Expected Runtime [Sec]	0
Nice Value	[-100,100] lower speeds up, higher slows down
Profile Nice	0
Suspended	NOSUSPEND
Cancelled	false
Suspended By Parent	0
State	RUNNABLE
Error Message	NONE
Exit State	NONE
Exit State Profile	STANDARD
Master Id	28008
Submitting Parent Id	NONE
Static	true
Merge Mode	FAILURE
Version	172089

I've attached the log file, and it doesn't seem to show any error anywhere either.

I will also look into the documentation again later, to hopefully get a better understanding of the system and find my way around it.

Thanks,

-Xander

BICserver.out.25

Ronald Jeninga

Feb 8, 2019, 6:30:44 AM

to schedulix

Hi Xander,

on the "Run" tab of the job there is a field that shows the environment it requires.

In your situation that uniquely maps to a specific jobserver; most likely it is SERVER@LOCALHOST that maps to jobserver GLOBAL.EXAMPLES.LOCALHOST.SERVER.

What you have to check is if this jobserver is indeed running.

Within "Jobserver and Resources" from the main menu, you'll find this server in the explorer part on the left side of the screen (you might have to do an "Expand All").

If the server is red, it is not connected.

After clicking on the server, you might see an error message on the properties tab.

The log file of interest is /opt/schedulix/localhost.out.<some number>.

Your server log looks healthy. But it might be a good idea to increase the trace level from 1 to 2.

You can do this temporarily by invoking sdmsh and issuing the command

alter server with trace level = 2;

(Don't forget the semicolon).

Or you can increase the trace level in the server.conf file and restart the server.

Best regards,

Ronald

alexand...@gmail.com

Feb 10, 2019, 10:23:07 PM

to schedulix

Hi Ronald,

I did as advised. All my servers from the examples were red. I remembered that I had changed the "RepoHost" value in the hosts' config files to the actual IP of my server. I switched it back to simply "localhost", and the job servers are all connected now. Before doing that, I also increased the trace level to 2, although I'm a bit uncertain what that is for. Could you elaborate on that a bit? I just tried submitting a job and it's working okay now. We will finally start testing it and later evaluate which operational work in production we could use it for.
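
When jobservers show up red after a "RepoHost" change, one quick thing to rule out is whether the configured address accepts connections at all from the jobserver's machine. The following is a minimal Python sketch of such a check, not a schedulix tool. The function name is illustrative, and the port is a parameter you have to supply yourself: use whatever port your scheduling server actually listens on.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running `can_reach("localhost", port)` and `can_reach("<the IP you configured>", port)` side by side would show whether the server accepts connections on both addresses. A successful connection only proves that something is listening there; it says nothing about authentication or the rest of the jobserver configuration.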

Thanks a lot as always.

Xander

Ronald Jeninga

Feb 11, 2019, 3:37:53 AM

to schedulix

Hi Xander,

well, the way I read your post everything seems to be in perfect order now.

What remains is your question regarding the trace level.

It isn't crucial at all, but in case of problems a higher trace level makes a lot more information available.

Most noticeably you can see the statements executed by the server.

In many cases (in my experience) those show that in fact the "error" is in front of the computer (me myself ;).

In other cases it is simply a valuable test case.

While writing the statements to the log file, passwords are starred out.

But of course user names are still visible. In some environments this could be interpreted as a security issue.

I am convinced that schedulix is powerful enough to fulfil all your requirements.

So please ask if you run into "unsolvable" problems.

Best regards,

Ronald

alexand...@gmail.com

Feb 11, 2019, 4:30:30 AM

to schedulix

Hi Ronald,

Sure, everything went fine with creating and scheduling jobs after that last one.

Will definitely post concerns here if I ever get stuck again.

Thanks as always.

Xander