State: Idle EState: Deferred
Creds: user:vighnesh group:vighnesh class:default qos:DEFAULT
WallTime: 00:00:00 of 99:23:59:59
SubmitTime: Sat Oct 31 15:24:36
(Time Queued Total: 00:01:04 Eligible: 00:00:01)
StartDate: -00:01:02 Sat Oct 31 15:24:38
Total Tasks: 8
Req[0] TaskCount: 8 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc:
15041, msg: 'Execution server rejected request MSG=cannot send job to mom,
state=PRERUN')
Holds: Defer (hold reason: RMFailure)
PE: 8.00 StartPriority: 1
cannot select job 3 for partition DEFAULT (job hold active)
-------------------------------------------------------------------
This appears to be some kind of RM failure.
Can anyone please help me solve this problem?
Thank you,
Regards,
Vighnesh
# pbsnodes -a | grep "state ="
# ps aux | grep maui
# ps aux | grep pbs
Bart
This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
# ps aux | grep maui
maui 5133 0.0 0.3 51484 31524 ? Ss Nov02 0:00 /opt/maui/sbin/maui
root 27040 0.0 0.0 61144 668 pts/1 S+ 12:36 0:00 grep maui
# ps aux | grep pbs
root 22086 0.0 0.0 10416 1344 ? Ss Nov02 0:00 /opt/torque/sbin/pbs_server
root 27042 0.0 0.0 61144 672 pts/1 S+ 12:36 0:00 grep pbs
---------------------------------------------------------
Regards,
Vighnesh
I'm curious though why only _one_ node responded to the "pbsnodes -a |
grep 'state ='" command. You said you had three nodes, but only one is
listed as free? Can you post the full output of "pbsnodes -a"?
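If the full output is long, something like the following might make it easier to see which nodes report a state at all. This is only a rough sketch: it assumes the usual "pbsnodes -a" layout (node name on an unindented line, attributes indented beneath it), and summarize_node_states is a made-up helper name, not a Torque command.

```shell
# Hypothetical helper: reads "pbsnodes -a" style text on stdin and prints
# one "node: state" line per node. Assumes node names are unindented and
# attribute lines look like "     state = free".
summarize_node_states() {
    awk '
        /^[^[:space:]]/            { node = $1 }            # unindented line: node name
        $1 == "state" && $2 == "=" { print node ": " $3 }   # indented state attribute
    '
}

# Usage on the head node (not run here):
#   pbsnodes -a | summarize_node_states
```

A node that is configured but missing from this list would be worth checking first.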
Also, do you get any warnings of interest from "mdiag -S -v" or "mdiag
-j JOBID" (where JOBID is the job ID of the interactive job you just
submitted)?
You might also check the pbs_mom logs on the nodes, just after you
submit the interactive job and it goes into the RMFailure state. Look
in /opt/torque/mom_logs/ on the compute nodes for the latest file, and
look at the end of it.
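As a rough sketch of that last step, the helper below prints the tail of the most recently modified file in a log directory. The name latest_log_tail and the default of 50 lines are just illustrative choices; the directory is the one mentioned above and may differ on your install.

```shell
# Hypothetical helper: print the last N lines (default 50) of the most
# recently modified file in the given directory.
latest_log_tail() {
    dir="$1"
    lines="${2:-50}"
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$dir" | head -n 1)
    [ -n "$newest" ] || { echo "no log files in $dir" >&2; return 1; }
    tail -n "$lines" "$dir/$newest"
}

# Usage on a compute node, right after the job is deferred (not run here):
#   latest_log_tail /opt/torque/mom_logs 50
```

Run it right after submitting the job, so the relevant messages are still at the end of the file.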
Bart