# qstat -explain a
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
al...@octopus.nci.nih.gov BIP 0/8 0.00 lx24-amd64
----------------------------------------------------------------------------
al...@pressa.nci.nih.gov BIP 0/4 -NA- lx24-amd64 au
error: no value for "np_load_avg" because execd is in unknown state
----------------------------------------------------------------------------
al...@shakespeare.nci.nih.gov BIP 0/8 4.29 lx24-amd64
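
For anyone scripting a check like this: in the listing above, the "au" in the states column marks the queue instance as being in alarm (a) and unknown (u) state, meaning the qmaster has no current load report from that host's execd. Below is a minimal Python sketch that pulls such queue instances out of pasted qstat text. The function name and the column-count heuristic are illustrative assumptions based only on the output shown here, not part of Grid Engine.

```python
# Sample text copied from the qstat output above (queue names as shown there).
SAMPLE = """\
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
al...@octopus.nci.nih.gov BIP 0/8 0.00 lx24-amd64
----------------------------------------------------------------------------
al...@pressa.nci.nih.gov BIP 0/4 -NA- lx24-amd64 au
error: no value for "np_load_avg" because execd is in unknown state
----------------------------------------------------------------------------
al...@shakespeare.nci.nih.gov BIP 0/8 4.29 lx24-amd64
"""


def queues_with_state(qstat_text, flag):
    """Return queue instance names whose states column contains `flag`.

    Assumption: a queue line has five whitespace-separated columns, plus a
    sixth (states) column only when at least one state flag is set.
    """
    hits = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Require an '@' in the first column to skip the header line.
        if len(fields) == 6 and "@" in fields[0] and flag in fields[5]:
            hits.append(fields[0])
    return hits


if __name__ == "__main__":
    for name in queues_with_state(SAMPLE, "u"):
        print("unknown state:", name)
```

Feeding it the live output of the qstat command instead of SAMPLE should work the same way, provided the column layout matches what is shown here.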
I have tried stopping and restarting the execd several times to no
avail. Upon looking into the qmaster/messages file, I see it full of
messages like:
04/14/2008 14:47:51|qmaster|shakespeare|C|denied: request for user
"Administrator" does not match credentials for connection
<pressa.nci.nih.gov,execd,1>
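
To see whether every denial comes from the same user and host, the messages file can be tallied with a short script. This is only a sketch: the regular expression is inferred from the single log line quoted above, and the names DENIED and parse_denials are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one sample message above; other message
# variants in the qmaster messages file may not match it.
DENIED = re.compile(
    r'denied: request for user "(?P<user>[^"]+)" does not match '
    r"credentials for connection "
    r"<(?P<host>[^,>]+),(?P<component>[^,>]+),(?P<id>\d+)>"
)

# The message exactly as quoted above, including the mail line wrapping.
SAMPLE = """04/14/2008 14:47:51|qmaster|shakespeare|C|denied: request for user
"Administrator" does not match credentials for connection
<pressa.nci.nih.gov,execd,1>"""


def parse_denials(log_text):
    """Return one dict (user, host, component, id) per denial message."""
    # Collapse all whitespace so a message wrapped over lines still matches.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in DENIED.finditer(flat)]


def tally(denials):
    """Count denials per (user, host, component) combination."""
    return Counter((d["user"], d["host"], d["component"]) for d in denials)


if __name__ == "__main__":
    for key, count in tally(parse_denials(SAMPLE)).items():
        print(count, key)
```

If the tally shows only the execd on pressa running as "Administrator", that would point at the daemon being started under the local account rather than at the qstat requests.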
I'm assuming that these are associated with requests like the qstat
request above. Any suggestions as to why this is occurring? The cluster
is not used much, so re-installing the execution host is not out of the
question if it will solve the problem.
Thanks,
Sean
Are there any logs for the execd itself? Check the
$SGE_ROOT/$SGE_CELL/spool/<exechost>/messages file.
Are you sure the execd process is actually running after the restarts?
--
Jesse Becker
GPG Fingerprint -- BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2
Thanks, Jesse, for the quick answer.
No, we have not gone the Windows route for anything. However, your
hypothesis of credential mismatch may be true. We have a couple of
"desktop" Linux machines in the cluster and the problematic machine is
one of them. For those machines, we do have an "Administrator"
account, which is a local, failsafe login. All other login info is
coming from LDAP. The installation was done several months ago, but
we did not use the cluster; now I am getting back to it, so it is
quite possible that something has changed on the execution machine.
In any case, any suggestions on what to do next?
> Are there any logs for the execd itself? Check the
> $SGE_ROOT/$SGE_CELL/spool/<exechost>/messages file.
There are no recent messages in here.
> Are you sure the execd process is actually running after the restarts?
Yes, it is running according to "ps".