Job cannot run in any scope because of resource shortage


robert....@rcom-i.co.uk

Jan 6, 2014, 10:49:04 AM
to sche...@googlegroups.com
Good Afternoon,
  I get this resource shortage error when I run a job on a server as the user root.

  I know there is another post here about the same error but I think in my case the source of the problem may be different.

  Let me try to explain my problem as well as I can:

  I have two Jobservers: ROOT@BACKUP and BACKUP@BACKUP.
  I created two because I wanted some jobs to run as the root user and others to run as the backup user on this particular Linux server.

  I have defined two Environments:  ROOT@BACKUP and BACKUP@BACKUP
  ROOT@BACKUP has the following Named Resources defined: RESOURCE.BACKUP.STATIC.NODE.ROOT & RESOURCE.BACKUP.STATIC.USER.SERVER
  BACKUP@BACKUP has the following Named Resources defined:  RESOURCE.BACKUP.STATIC.NODE.BACKUP & RESOURCE.BACKUP.STATIC.USER.SERVER
  Both Environments have Grants PUBLIC with View and Use selected.

  I can run a job on Jobserver BACKUP by setting the job Environment to BACKUP@BACKUP

  However, when I run a job on Jobserver ROOT by setting the job Environment to ROOT@BACKUP I get the resource shortage error message.

  When the job fails I open the Monitor Batches and Jobs window and select the Resources(Req) tab.  Other than the job name and the column headings, the page is empty.  The same is true when I select the Resource(Def) tab.

  What is the cause of this issue?

  One other thing that may or may not be relevant is this:

  When I open Named Resources and select BACKUP > STATIC > NODE and then open the Content tab I see BACKUP and ROOT listed.  When I click on either of those I see this type of error:
  COMMAND: show named resource RESOURCE.'BACKUP'.'STATIC'.'NODE'.'BACKUP'
  ERRORCODE: ZSI-10001
  ERRORMESSAGE: ConnectError(ConnectError)

  I get no other such errors when doing anything else in the GUI.

  Is this related?

  Regards

Rob



Ronald Jeninga

Jan 6, 2014, 12:34:05 PM
to sche...@googlegroups.com
Hi Robert,

first of all: Happy New Year!

Apart from your problem, I don't think it's a good idea to have an environment with public use pointing to a jobserver running as root (ordinary users don't have to know about it).
But it is possible that you run into some security mechanism after all. We're not giving away execute privileges for free ;)

Let me try to rephrase your situation:
There are three static Named Resources defined:
- RESOURCE.BACKUP.STATIC.NODE.ROOT
- RESOURCE.BACKUP.STATIC.USER.SERVER
- RESOURCE.BACKUP.STATIC.NODE.BACKUP

Their purpose is obvious.

You also created some instances of them (if not, the cause of the problem is obvious):
- RESOURCE.BACKUP.STATIC.NODE.ROOT in GLOBAL.SOMETHING.ROOT (your root jobserver)
- RESOURCE.BACKUP.STATIC.USER.SERVER in GLOBAL.SOMETHING.ROOT
and another two for the BACKUP server

Now I have some questions regarding ownership:
What is the group of your ROOT jobserver?
What is the group of both resources?
What is the submitting group of your job?

If all the groups are the same, the resources should be visible to the job. If not, they're not visible and therefore not allocatable, which is a possible reason for the "cannot run in any scope" error.
If the group of the jobserver doesn't match the submitting group of the job, the job can't see the jobserver at all.
This is all part of the security concept. We definitely don't want people to have uncontrolled access to privileged jobservers.
The privileged ADMIN group can do everything though. This means that a job should run, if the submitting group is ADMIN.
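
The visibility rule described above can be pictured with a small toy model. This is only a sketch of the rule as stated in this post, not schedulix code: the function names and the group names OPS and DEV are made up for illustration, and the real server also takes grants into account.

```python
# Toy model of the group-based visibility rule described above.
# All names here are hypothetical; this is not the schedulix implementation.
ADMIN = "ADMIN"


def can_see(submitting_group: str, owner_group: str) -> bool:
    """An object is visible if the groups match or the submitter is ADMIN."""
    return submitting_group == ADMIN or submitting_group == owner_group


def job_can_use(submitting_group: str, jobserver_group: str, resource_groups: list) -> bool:
    """The job must see the jobserver and every resource it needs."""
    if not can_see(submitting_group, jobserver_group):
        return False
    return all(can_see(submitting_group, group) for group in resource_groups)


# Same group everywhere: everything is visible.
same_group = job_can_use("OPS", "OPS", ["OPS", "OPS"])
# Submitting group differs from the jobserver group: the jobserver is invisible.
other_group = job_can_use("DEV", "OPS", ["OPS", "OPS"])
# ADMIN bypasses the group check.
admin_group = job_can_use(ADMIN, "OPS", ["OPS", "DEV"])
```

In this model a single mismatching group is enough to make a job unrunnable, which is why checking the three groups asked about above is a sensible first step.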

The Monitor Batches and Jobs/Resources(Req) Tab shows the requested and allocated resources. Since your job failed, there are no requests or allocations present.
The Resource(Def) Tab shouldn't be there in the first place, because it'll always be empty for a schedulix system. It does no harm though, apart from the fact that it might confuse people.
(In the BICsuite PROFESSIONAL edition it is possible to make instances of Named Resources within jobs. Those are only visible to the job itself and the job hierarchy below it, and exist only during the lifetime of the job. This is a very nice feature which eliminates the need to define a lot of resources at GLOBAL level.)

The connect error sounds like a bug. I even think I already fixed it since it sounds familiar. But I'm not 100% sure. If you can have a look at the server logfile, you'll probably find restarts of the server after some error.
If you can post the error I'll check if I already fixed the bug or fix it.

Regards,

Ronald

PS. I'm a bit busy for the next few days, but I'll definitely check for messages in this group every now and then. It might take some time until I respond though.

robert....@rcom-i.co.uk

Jan 7, 2014, 8:51:45 AM
to sche...@googlegroups.com
Hi Ronald,
  Oh yes, Happy New Year to you too.  Please pardon my manners.

  Great support as usual.


  You also created some instances of them (if not, the cause of the problem is obvious):
- RESOURCE.BACKUP.STATIC.NODE.ROOT in GLOBAL.SOMETHING.ROOT (your root jobserver)
- RESOURCE.BACKUP.STATIC.USER.SERVER in GLOBAL.SOMETHING.ROOT


  You got it there.  When I created setup_root_jobserver.sh I used setup_backup_jobserver.sh as a base and missed a single substitution: instead of creating RESOURCE.'BACKUP'.'STATIC'.'NODE'.'ROOT' in GLOBAL.'BACKUP'.'ROOT', I had defined RESOURCE.'BACKUP'.'STATIC'.'NODE'.'BACKUP' in GLOBAL.'BACKUP'.'ROOT'.
  Switching it around fixed the problem.
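
  For anyone who hits the same error, here is a rough Python model of why the wrong instance caused it. It is a sketch under assumptions, not how the server is implemented: it treats an environment as a set of required Named Resources and a scope as a set of resource instances. The scope name GLOBAL.BACKUP.BACKUP and the presence of the USER.SERVER instance in both scopes are assumptions, since neither is spelled out in this thread.

```python
# Toy model: a scope can run a job only if it holds an instance of every
# Named Resource that the job's environment requires.
# GLOBAL.BACKUP.BACKUP and the USER.SERVER instances are assumed for illustration.
NODE_ROOT = "RESOURCE.BACKUP.STATIC.NODE.ROOT"
NODE_BACKUP = "RESOURCE.BACKUP.STATIC.NODE.BACKUP"
USER_SERVER = "RESOURCE.BACKUP.STATIC.USER.SERVER"

ENVIRONMENTS = {
    "ROOT@BACKUP": {NODE_ROOT, USER_SERVER},
    "BACKUP@BACKUP": {NODE_BACKUP, USER_SERVER},
}


def eligible_scopes(environment: str, scopes: dict) -> list:
    """Return the scopes whose instances cover all required resources."""
    required = ENVIRONMENTS[environment]
    return sorted(name for name, instances in scopes.items() if required <= instances)


# The faulty setup script: NODE.BACKUP was created in the ROOT scope.
broken = {
    "GLOBAL.BACKUP.ROOT": {NODE_BACKUP, USER_SERVER},
    "GLOBAL.BACKUP.BACKUP": {NODE_BACKUP, USER_SERVER},
}
# After the substitution is corrected.
fixed = {
    "GLOBAL.BACKUP.ROOT": {NODE_ROOT, USER_SERVER},
    "GLOBAL.BACKUP.BACKUP": {NODE_BACKUP, USER_SERVER},
}

print(eligible_scopes("ROOT@BACKUP", broken))  # [] -> no scope can run the job
print(eligible_scopes("ROOT@BACKUP", fixed))   # ['GLOBAL.BACKUP.ROOT']
```

  With the faulty setup no scope offers NODE.ROOT, so a job in ROOT@BACKUP has nowhere to run, which matches the error in the subject line.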

With regard to the connect error, I'll take a look at the logs soon.

Regards

Rob