claimed but idle?


Peter Gavin

May 23, 2012, 9:40:47 AM
to archer-us...@googlegroups.com
Hi,

It seems there are a lot of nodes that are claimed but idle. Any idea how
I can clean them up? Google tells me I should be able to do something
like condor_reconfig -daemon startd, but that doesn't seem to have any
effect.

-Pete

Renato Figueiredo

May 23, 2012, 10:39:41 AM
to archer-us...@googlegroups.com
Good question. I don't know - I suspect reconfiguring startd would need to be done on all machines, individually.
We'll look into it. 
--rf
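
The per-machine reconfigure suggested above could be scripted. The following is a dry-run sketch only: it prints one command per host and sends nothing to any daemon. It assumes the `-name` and `-daemon` options of `condor_reconfig` and that the caller has administrator authorization on each target host; the function name `reconfig_cmds` is hypothetical. Whether a reconfigure actually releases an idle claim is not established in this thread.

```shell
# reconfig_cmds: read hostnames on stdin (one per line) and print one
# condor_reconfig command per host. Dry run: nothing is executed.
# Hypothetical helper for illustration; not part of Condor.
reconfig_cmds() {
    while IFS= read -r host; do
        # Skip blank lines so stray whitespace does not yield a bad command.
        [ -n "$host" ] || continue
        printf 'condor_reconfig -name %s -daemon startd\n' "$host"
    done
}
```

After reviewing the printed commands, they could be run by piping the output to `sh`.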


--
Dr. Renato J. Figueiredo
Associate Professor
ACIS Lab - ECE - University of Florida
UF Site Director, Cloud and Autonomic Computing (CAC) Center
http://byron.acis.ufl.edu
ph: 352-392-6430

Kyungyong Lee

May 24, 2012, 10:16:31 AM
to archer-us...@googlegroups.com
Hello Peter,

As far as I can tell, all of these machines are claimed by "C208052244.ipop".
Is that the machine you are submitting your jobs from? I think it would be
worth inspecting the status of both the claimed/idle machines and the
submit VM (C208052244.ipop).

Best,
Kyungyong.

Peter Gavin

May 24, 2012, 10:38:44 AM
to archer-us...@googlegroups.com
On Thu, May 24, 2012 at 4:16 PM, Kyungyong Lee <kl...@acis.ufl.edu> wrote:
> Hello Peter,
>
> As far as I can tell, all of these machines are claimed by "C208052244.ipop".
> Is that the machine you are submitting your jobs from? I think it would be
> worth inspecting the status of both the claimed/idle machines and the
> submit VM (C208052244.ipop).
>

Yes, C208052244 is my submit machine. I just dumped a huge batch of
jobs to run, so I don't really want to reboot anything or restart
condor. I'm not exactly a condor guru, and rebooting or restarting is
what I usually do when I have problems :)

It seems the number of claimed/idle nodes has gone down from 48
yesterday to 34 now. But of the ones that remain, none of them are
machines that I admin. These are the hostnames of my machines:

C000034052.ipop
C018192220.ipop
C026205156.ipop
C033082209.ipop
C050172050.ipop
C113208164.ipop
C142212012.ipop
C152162005.ipop
C180202002.ipop
C185203074.ipop
C208052244.ipop
C212138053.ipop
C251167027.ipop

And these are the ones with claimed/idle nodes:

$ condor_status -const 'State == "Claimed" && Activity == "Idle"' \
    -f '%s\n' Machine | sort -u
C026214087.ipop
C052013004.ipop
C077078154.ipop
C108079196.ipop
C123040101.ipop
C135095238.ipop
C135096034.ipop
C153238212.ipop
C175010111.ipop
C252172134.ipop

I don't really know how to manage condor on machines I can't
personally ssh into :)

-Pete

Kyungyong Lee

May 24, 2012, 10:46:33 AM
to archer-us...@googlegroups.com
That's fine, Peter. I can help with that. If you have just launched a
bunch of jobs, let us wait until they are done and see whether the
claimed/idle state shows up again.

Kyungyong.

Peter Gavin

May 24, 2012, 10:48:12 AM
to archer-us...@googlegroups.com
On Thu, May 24, 2012 at 4:46 PM, Kyungyong Lee <iamkyu...@gmail.com> wrote:
> That's fine, Peter. I can help with that. If you have just launched a
> bunch of jobs, let us wait until they are done and see whether the
> claimed/idle state shows up again.

Ok, sounds good.

Thanks,
-Pete