[Rocks-Discuss] How to remove blocked jobs in torque?


Cláudio Forain

Sep 20, 2010, 8:04:08 AM
to Discussion of Rocks Clusters
I tried, unsuccessfully, to run jobs before running 'rocks sync config',
which left me with some blocked jobs:


[root@lpge-cluster ~]# showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME


0 Active Jobs 0 of 40 Processors Active (0.00%)
0 of 5 Nodes Active (0.00%)

IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

1 cforain Idle 1 00:10:00 Mon Sep 20 07:58:27
2 cforain Idle 1 00:10:00 Mon Sep 20 08:18:37
3 cforain Idle 1 00:10:00 Mon Sep 20 08:18:39
4 cforain Idle 1 00:10:00 Mon Sep 20 08:18:40
5 cforain Idle 1 00:10:00 Mon Sep 20 08:18:41
6 cforain Idle 1 00:10:00 Mon Sep 20 08:18:42
7 cforain Idle 1 00:10:00 Mon Sep 20 08:18:43
8 cforain Idle 1 00:10:00 Mon Sep 20 08:18:44
9 cforain Idle 1 00:10:00 Mon Sep 20 08:18:45
10 cforain Idle 1 00:10:00 Mon Sep 20 08:18:46

Total Jobs: 10 Active Jobs: 0 Idle Jobs: 0 Blocked Jobs: 10


How do I delete those jobs from the queue? Thanks in advance.

Cláudio Forain

Sep 20, 2010, 8:16:18 AM
to Discussion of Rocks Clusters
Actually, I just noticed that whenever I submit a job with qsub,
Torque blocks it. What am I doing wrong?


Cláudio Forain

Sep 20, 2010, 9:12:21 AM
to Discussion of Rocks Clusters
I researched and discovered the qdel command. Thanks.
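
For anyone who has many blocked jobs to clear, here is a minimal sketch of how the job IDs could be pulled out of the showq listing and handed to qdel. The helper name blocked_job_ids is made up for illustration, and it assumes the showq layout shown earlier in this thread (a BLOCKED JOBS header, rows that start with a numeric job ID, and a final Total Jobs line).

```shell
#!/bin/bash
# Sketch only: blocked_job_ids is a hypothetical helper, not part of Torque or Maui.
# It reads showq output on stdin and prints the job IDs listed under BLOCKED JOBS.
blocked_job_ids() {
    awk '
        /BLOCKED JOBS/ { in_blocked = 1; next }     # section header seen
        /^Total Jobs:/ { in_blocked = 0 }           # summary line ends the section
        in_blocked && $1 ~ /^[0-9]+$/ { print $1 }  # job rows start with a numeric ID
    '
}

# Example use on a head node with showq and qdel installed (review the IDs first):
#   showq | blocked_job_ids
#   showq | blocked_job_ids | xargs -r qdel    # -r (GNU xargs) skips qdel on empty input
```

qdel accepts several job IDs in one call, so printing the list first and only then piping it to xargs keeps the deletion step easy to check.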


Cláudio Forain

Sep 20, 2010, 9:25:12 AM
to Discussion of Rocks Clusters
Although now I have the following problem: qsub always puts every job in
the blocked queue:


[cforain@lpge-cluster ~]$ qsub teste.sh
27.lpge-cluster.ufrj.br
[cforain@lpge-cluster ~]$ showq


ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME


0 Active Jobs 0 of 40 Processors Active (0.00%)
0 of 5 Nodes Active (0.00%)

IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

27 cforain Idle 1 00:10:00 Mon Sep 20 10:13:48

Total Jobs: 1 Active Jobs: 0 Idle Jobs: 0 Blocked Jobs: 1

If I list my nodes, it gives me:


compute-0-0
state = down
np = 1
ntype = cluster

compute-0-1
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-1.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=265494,totmem=7120428kb,availmem=6986568kb,physmem=6100312kb,ncpus=8,loadave=0.00,netload=504661144,state=free,jobs=,varattr=,rectime=1284988584

compute-0-2
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-2.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=263121,totmem=7118816kb,availmem=6996500kb,physmem=6098700kb,ncpus=8,loadave=0.00,netload=487294788,state=free,jobs=,varattr=,rectime=1284988592

compute-0-3
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-3.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=266371,totmem=7118848kb,availmem=7000188kb,physmem=6098732kb,ncpus=8,loadave=0.00,netload=486211808,state=free,jobs=,varattr=,rectime=1284988593

compute-0-4
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-4.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=265942,totmem=7118848kb,availmem=6998184kb,physmem=6098732kb,ncpus=8,loadave=0.00,netload=486129000,state=free,jobs=,varattr=,rectime=1284988593

compute-0-5
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-5.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=265960,totmem=7118852kb,availmem=6997224kb,physmem=6098736kb,ncpus=8,loadave=0.02,netload=485707033,state=free,jobs=,varattr=,rectime=1284988593


That looks right. Here is the trace of the job I just submitted:


[cforain@lpge-cluster ~]$ tracejob 27
/opt/torque/server_priv/accounting/20100920: Permission denied
/opt/torque/mom_logs/20100920: No such file or directory
/opt/torque/sched_logs/20100920: No such file or directory

Job: 27.lpge-cluster.ufrj.br

09/20/2010 10:13:48 S enqueuing into default, state 1 hop 1
09/20/2010 10:13:48 S Job Queued at request of cfo...@lpge-cluster.ufrj.br, owner = cfo...@lpge-cluster.ufrj.br, job name = teste.sh, queue = default


Here are the scripts I tried to run:


[cforain@lpge-cluster ~]$ cat teste.sh
#!/bin/bash
#PBS -lwalltime=0:10:0
echo starting
sleep 10
echo ending


And an MPI one:

[cforain@lpge-cluster ~]$ cat teste-mpi.sh
#!/bin/bash
#PBS -lwalltime=0:10:0
#PBS -lnodes=4
echo starting openmpi job:
/opt/openmpi/bin/mpirun /opt/mpi-tests/bin/mpi-ring
echo ending

Cláudio Forain

Sep 20, 2010, 6:40:12 PM
to Discussion of Rocks Clusters
UPDATE:

I have resolved the issue. I just had to set the default to ACTIVE with the command:

qmgr -c "set queue batch enabled=true"

Thanks for reading, and I hope this helps anyone who needs it.
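
In case it is useful to check the queue before or after making that change: as far as I understand Torque, a queue has two separate flags, enabled (the queue accepts new submissions) and started (queued jobs are allowed to run). Below is a rough sketch that reads them from the output of qmgr -c "print queue <name>". The helper name queue_flags is made up for illustration, and the parsing assumes the usual "set queue NAME ATTR = VALUE" lines that qmgr prints.

```shell
#!/bin/bash
# Sketch only: queue_flags is a hypothetical helper, not a Torque command.
# It reads the output of: qmgr -c "print queue <name>" on stdin and prints
# the enabled and started attributes as name=value lines.
queue_flags() {
    awk '
        $1 == "set" && $2 == "queue" && ($4 == "enabled" || $4 == "started") {
            print $4 "=" tolower($6)
        }
    '
}

# Example use (the queue name batch is taken from the command above):
#   qmgr -c "print queue batch" | queue_flags
# If either flag is reported as false, it can be switched on as root with:
#   qmgr -c "set queue batch enabled=true"
#   qmgr -c "set queue batch started=true"
```

Which of the two flags was off on a given cluster may differ, so it is worth looking at both before changing anything.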

