After a power outage, I had to reinstall most of my compute nodes. Everything looks
OK except Torque, which always leaves jobs in the "Q" state. Restarting maui and
pbs_server on the frontend, and/or pbs_mom on the nodes, doesn't help.
I then ran checkjob and diagnose and got the results below. They report that
the problem is NoResources, but in fact all my nodes are available and I'm the only
user. Running "pbsnodes -a | grep 'state ='" shows that the nodes are all in the "free"
state.
Any ideas?
zhou huiqun
@earth sciences, nanjing university, china
============================================================
# checkjob 639
checking job 639
State: Idle EState: Deferred
Creds: user:zhou_huiqun group:quantum class:default qos:DEFAULT
WallTime: 00:00:00 of 99:23:59:59
SubmitTime: Thu Dec 16 17:47:49
(Time Queued Total: 00:02:36 Eligible: 00:00:00)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
job is deferred. Reason: NoResources (exceeds available partition procs)
Holds: Defer (hold reason: NoResources)
PE: 1.00 StartPriority: 2
cannot select job 639 for partition DEFAULT (job hold active)
=============================================================
# diagnose -j 639
Name State Par Proc QOS WCLimit R Min User Group Account QueuedTime Network Opsys Arch Mem Disk Procs Class Features
639 Idle ALL 1 DEF 99:23:59:59 0 1 hqzhou quantum - 00:36:44 [NONE] [NONE] [NONE] >=0 >=0 NC0 [default:1] [NONE]
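For reference, the node check mentioned above ("pbsnodes -a | grep 'state ='") can be condensed into a per-state count, and since checkjob complains about "available partition procs", it may also help to total the "np" values that Torque reports. The following is only a sketch: count_node_states and total_np are hypothetical helper names, not Torque commands, and they show Torque's view of the nodes only, not what Maui itself sees.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Torque); both read `pbsnodes -a` output on stdin.

# Print one "<state>: <count>" line per distinct node state.
count_node_states() {
    # Keep only the indented "state = ..." lines, drop the label, count each value.
    grep '^[[:space:]]*state = ' \
        | sed 's/^[[:space:]]*state = //' \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Print the sum of all "np = N" values, i.e. the processors Torque knows about.
total_np() {
    awk '$1 == "np" && $2 == "=" { sum += $3 } END { print sum + 0 }'
}

# Usage on the frontend (requires Torque):
#   pbsnodes -a | count_node_states
#   pbsnodes -a | total_np
```

If the total is zero or lower than expected after the reinstall, that would at least be consistent with the "exceeds available partition procs" message, though it does not by itself identify the cause.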
qselect -s Q | cut -f 1 -d . | xargs -i releasehold {}
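Read step by step, the one-liner lists the queued jobs, trims each job identifier to the part before the first dot, and runs releasehold once per job. Below is a sketch of the same pipeline with the trimming step split out so it can be checked on its own. job_numbers is a hypothetical helper name, the host name in the comment is made up for illustration, and "-I {}" is used in place of the older "-i" spelling of the same xargs option.

```shell
#!/bin/sh
# job_numbers: hypothetical helper (not a Torque or Maui command).
# Keeps the text before the first dot of each line read on stdin,
# e.g. "639.frontend.local" -> "639" (host name is illustrative only).
job_numbers() {
    cut -d . -f 1
}

# Usage on the frontend (requires Torque and Maui):
#   qselect -s Q | job_numbers | xargs -I {} releasehold {}
```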
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: roy.dr...@uit.no
Thanks for your quick response.
I have tried to releasehold the jobs, but nothing changed even though
the message said "job holds adjusted". The jobs already in the
queue and newly submitted ones are still in the "Q" state.
Huiqun
----- Original Message -----
From: Roy Dragseth <roy.dr...@uit.no>
To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
Sent: Thu, 16 Dec 2010 20:07:45 +0800 (CST)
Subject: Re: [Rocks-Discuss] Jobs stalled in queue (Torque)
Thank you very much for your quick response.
I have tried running releasehold, but nothing changed. The jobs already
in the queue and newly submitted ones are still in the "Q" state.
My apologies to the list if you received two mails from me with similar content.
I can't tell whether my posts are reaching the list, because I neither
receive my own mail via the list nor find my posts in the archive
of the mailing list.
huiqun