central_manager file empty on worker node (Amazon EC2)


miccloud

Apr 28, 2010, 5:49:46 AM
to cloudscheduler
Hi,
I have configured two AMIs based on Ubuntu 9.10.
On one I installed a Condor central manager with Cloud Scheduler,
and on the other a Condor worker node.

I used the Cloud Scheduler init scripts for Condor and Cloud Scheduler.

The central manager starts fine.
When I submit a job, Cloud Scheduler launches a worker node based on my
AMI.
But the launched worker node doesn't join my pool. I have seen
that on the launched worker node, condor_config.local
has "CONDOR_HOST= ", i.e. it is empty. The central_manager file
on the worker node is empty as well.

I think the central manager is not updating this file on the worker node.

What can I do?

Thanks a lot.

michele

Apr 28, 2010, 9:47:49 AM
to cloudscheduler
I have also tried starting the worker node AMI "ami-fdee0094" (provided by Cloud Scheduler), but the problem is the same.
It launches, but afterwards the central_manager file on the worker node is empty and the node doesn't join my pool.

Thanks.

Patrick Armstrong

Apr 28, 2010, 12:18:18 PM
to cloudsc...@googlegroups.com
On 28-Apr-10, at 2:49 AM, miccloud wrote:
> I have configured two ami based on Ubuntu 9.10.
> So I installed in one a central manager condor with cloud scheduler
> and in the other ami condor worker node.
>
> I have used CloudScheduler init script for condor and cloud scheduler.
>
> Central manager start well.
> When I submit a job CloudScheduler launch a worker node based on my
> ami.
> But when worker node is launched it doesn't join my pool. I have seen
> that when the worker node is launched, his configuration of
> condor_config.local has "CONDOR_HOST= ". It is empty. I have seen that
> central_manager in the worker node is empty.

You need to install the ec2contexthelper script. It takes special
metadata (in the Nimbus contextualization format) sent by
Cloud Scheduler and writes it to specific files, such as
/etc/condor/central_manager.

This script is in scripts/ec2contexthelper

You also need to enable the condor_host_on_vm and condor_context_file
settings in the configuration file.
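
As a rough illustration of what a context helper has to do on the worker, here is a minimal sketch. It is not the actual ec2contexthelper code: it assumes the central manager hostname has already been extracted from the instance metadata, and the function names `write_central_manager` and `read_central_manager`, the one-line file format, and the scratch path are all assumptions made for the example. The Nimbus contextualization format itself is not reproduced here.

```python
import os
import tempfile


def write_central_manager(hostname, path):
    """Write the central manager hostname to `path` (hypothetical helper).

    Returns True if a non-empty hostname was written, False otherwise.
    A False result corresponds to the empty file described in this thread.
    """
    hostname = (hostname or "").strip()
    if not hostname:
        return False
    with open(path, "w") as handle:
        handle.write(hostname + "\n")
    return True


def read_central_manager(path):
    """Return the hostname stored in `path`, or '' if missing or empty."""
    if not os.path.exists(path):
        return ""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Use a scratch directory instead of /etc/condor so the sketch is safe to run.
    scratch = tempfile.mkdtemp()
    target = os.path.join(scratch, "central_manager")
    write_central_manager("manager.example.org", target)
    print(read_central_manager(target))
```

If nothing on the worker performs this step, the file stays empty and Condor on the worker has no central manager to contact.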

michele

Apr 28, 2010, 2:16:40 PM
to cloudsc...@googlegroups.com
Hi,
Do I have to install it only on the central manager, or also on the worker node?

Another question: in scripts/cloud_scheduler there is

export GLOBUS_LOCATION=/usr/local/nimbus/


But this directory doesn't exist on my central manager AMI, nor on the Cloud Scheduler Test Drive AMI.
Is that normal?

Thanks again.

Patrick Armstrong

Apr 28, 2010, 2:51:39 PM
to cloudsc...@googlegroups.com

On 28-Apr-10, at 11:16 AM, michele wrote:

> Hi,
> Have I to install it in the central manager only, or also in the
> worker node?

ec2contexthelper only needs to be on your worker node.

>
> Another question... in the scripts\cloud_scheduler there is
>
>
> export GLOBUS_LOCATION=/usr/local/nimbus/
Yep, this is normal. The GLOBUS_LOCATION environment variable is only
necessary for when you use a Nimbus Cloud as a resource, and even
then, only with a specific type of installation of the client. Feel
free to ignore it.

miccloud

Apr 28, 2010, 3:18:00 PM
to cloudscheduler
When I run sudo chkconfig context on, I get these warnings:
insserv: script context is not an executable regular file, skipped!
insserv: warning: script 'hwclock' missing LSB tags and overrides
insserv: warning: script 'failsafe-x' missing LSB tags and overrides
insserv: warning: script 'udevmonitor' missing LSB tags and overrides
insserv: warning: script 'procps' missing LSB tags and overrides
insserv: warning: script 'ufw' missing LSB tags and overrides
insserv: warning: script 'rsyslog' missing LSB tags and overrides
insserv: warning: script 'udev-finish' missing LSB tags and overrides
insserv: warning: script 'cron' missing LSB tags and overrides
insserv: warning: script 'module-init-tools' missing LSB tags and overrides
insserv: warning: script 'dbus' missing LSB tags and overrides
insserv: warning: script 'dmesg' missing LSB tags and overrides
insserv: warning: script 'udev' missing LSB tags and overrides
insserv: warning: script 'rsyslog-kmsg' missing LSB tags and overrides
insserv: warning: script 'apport' missing LSB tags and overrides
insserv: warning: script 'hwclock-save' missing LSB tags and overrides
insserv: warning: script 'udevtrigger' missing LSB tags and overrides
insserv: warning: script 'atd' missing LSB tags and overrides
insserv: warning: script 'hal' missing LSB tags and overrides

What do I have to do?
I am using Ubuntu 9.10.
With chkconfig, will contextualization run automatically on reboot?

Thanks.

miccloud

Apr 28, 2010, 4:07:07 PM
to cloudscheduler
Ignoring this warning, I bundled the AMI.
Then I submitted a job from the central manager.
The worker node came up, but the problem is not resolved.

So I logged in to the running instance and typed chkconfig -all. This
is the output:
apparmor on
apport off
atd off
bootlogd off
condor on
console-setup on
context off
cron off
dbus off
dmesg off
dns-clean on
ec2-init 2345
ec2-init-user-data on
failsafe-x off
grub-common on
hal off
hwclock off
hwclock-save off
keyboard-setup on
killprocs on
landscape-client on
module-init-tools off
networking 0
ondemand on
pppd-dns on
procps off
rc.local on
rcS off
rsync on
rsyslog off
rsyslog-kmsg off
screen-cleanup on
sendsigs 0
ssh on
stop-bootlogd off
stop-bootlogd-single off
udev off
udev-finish off
udevmonitor off
udevtrigger off
ufw off
umountfs 0
umountnfs.sh 0
umountroot 0
urandom 0S
wpa-ifupdown 0
x11-common on

Context is off. So I typed
python ./context start
and the shell returned this error:
ubuntu@ip-10-194-146-86:~/ec2context$ python ./context start
  File "./context", line 14
    . /etc/rc.d/init.d/functions
    ^
SyntaxError: invalid syntax

The line in question is:
# source function library
. /etc/rc.d/init.d/functions

What can I do?

Thanks in advance.

Patrick Armstrong

Apr 28, 2010, 4:20:11 PM
to cloudsc...@googlegroups.com
Hi There,

On 28-Apr-10, at 1:07 PM, miccloud wrote:
> Context is off. So I type:
> python ./context start and the shell return me this error:

Ahh, the context init script is a bash script, not a python script. To
invoke it, do:

# /etc/init.d/context start

> ubuntu@ip-10-194-146-86:~/ec2context$ python ./context start
> File "./context", line 14
> . /etc/rc.d/init.d/functions
> ^
> SyntaxError: invalid syntax
>
> The line is :
> # source function library
> . /etc/rc.d/init.d/functions
>
> What can I do?

I don't have much experience with Ubuntu so I'm not entirely sure why
the init script isn't starting at start up. On redhat, chkconfig sets
things to start on boot, but you may need to do something special on
ubuntu.

--patrick
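
One quick way to tell which interpreter a script expects is to look at its first line. The sketch below is only an illustration: the helper name `interpreter_of` is made up for this example, and the sample script text is an assumption, not the real contents of the context init script.

```python
def interpreter_of(script_text):
    """Return the interpreter named on the shebang line, or None."""
    first_line = script_text.splitlines()[0] if script_text else ""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Handle "#!/usr/bin/env python" as well as "#!/bin/bash".
    if parts[0].endswith("/env") and len(parts) > 1:
        return parts[1]
    return parts[0].rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Assumed sample text; the real script may differ.
    sample = "#!/bin/bash\n# source function library\n. /etc/rc.d/init.d/functions\n"
    print(interpreter_of(sample))
```

A script whose first line names bash has to be run through the shell (or executed directly), which is why feeding it to python produces a SyntaxError on the first shell-only construct.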

Ian Gable

Apr 28, 2010, 5:25:29 PM
to cloudsc...@googlegroups.com
Hi All,

Ubuntu is a little different. It uses the command: update-rc.d

For more info on that have a look at:
http://www.debuntu.org/how-to-manage-services-with-update-rc.d

Cheers,

Ian
--
Ian Gable
HEPnet Canada Technical Manager
iga...@uvic.ca
+1 250 7217746

miccloud

Apr 28, 2010, 5:36:13 PM
to cloudscheduler
I think I have solved it... I will write up a document afterwards.
It may be useful.

The worker node is launched and the central_manager file is written
correctly by the central manager.
So the problem is that the condor init script (scripts/condor/worker)
is not executed after the central_manager file is written.

How do I make the command
/etc/init.d/condor restart
run last?

Output of chkconfig -list:
apparmor            0:off  1:off  2:off  3:off  4:off  5:off  6:off  S:on
apport              0:off  1:off  2:off  3:off  4:off  5:off  6:off
atd                 0:off  1:off  2:off  3:off  4:off  5:off  6:off
bootlogd            0:off  1:off  2:off  3:off  4:off  5:off  6:off
condor              0:off  1:off  2:on   3:on   4:on   5:on   6:off
console-setup       0:off  1:off  2:off  3:off  4:off  5:off  6:off  S:on
context             0:off  1:off  2:on   3:on   4:on   5:on   6:off
cron                0:off  1:off  2:off  3:off  4:off  5:off  6:off
dbus                0:off  1:off  2:off  3:off  4:off  5:off  6:off
dmesg               0:off  1:off  2:off  3:off  4:off  5:off  6:off
dns-clean           0:off  1:on   2:on   3:on   4:on   5:on   6:off
ec2-init            0:off  1:off  2:on   3:on   4:on   5:on   6:off
ec2-init-user-data  0:off  1:off  2:on   3:on   4:on   5:on   6:off
failsafe-x          0:off  1:off  2:off  3:off  4:off  5:off  6:off

Thanks a lot

Patrick Armstrong

Apr 28, 2010, 5:45:19 PM
to cloudsc...@googlegroups.com

On 28-Apr-10, at 2:36 PM, miccloud wrote:

> I think that I have resolved...after I will write a document.
> It may be useful.

That would be really great. We haven't tested at all on Ubuntu, and
Cloud Scheduler is still pretty young. Thanks.

> The worker node is launched and central_manager file is writed
> correctly from central manager.
> So the problem is that the condor init script (scripts / condor /
> worker ) is not executed after the central_manager file is writed.

Yeah. We handle this on Red Hat by giving the context helper an earlier
start priority than condor (condor has priority 98 and the context helper
has priority 20), so on Red Hat the context helper starts well before the
condor init script.

I'm not sure how to do this on Ubuntu, but it looks like the insserv
man page might help.

http://manpages.ubuntu.com/manpages/karmic/en/man8/insserv.8.html

--patrick
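
To make the priority idea concrete: in a SysV-style runlevel directory, start links are conventionally named S, a two-digit number, then the service name, and they run in ascending order. The sketch below sorts such names the same way. The link names `S20context` and `S98condor` are assumptions built from the priorities mentioned above, and whether a given Ubuntu release honours these links exactly like Red Hat is not something this example verifies.

```python
import re

# "S" + two-digit priority + service name, e.g. "S20context".
LINK_PATTERN = re.compile(r"^S(\d{2})(.+)$")


def start_order(link_names):
    """Return service names in the order their S-links would be started."""
    parsed = []
    for name in link_names:
        match = LINK_PATTERN.match(name)
        if match:  # K (kill) links and unrelated files are skipped
            parsed.append((int(match.group(1)), match.group(2)))
    return [service for _, service in sorted(parsed)]


def starts_before(link_names, first, second):
    """Return True if `first` is started before `second`."""
    order = start_order(link_names)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    links = ["S98condor", "K01condor", "S20context"]
    print(start_order(links))
```

With these numbers the context helper runs first, so the central_manager file exists by the time condor starts.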

miccloud

Apr 28, 2010, 6:07:41 PM
to cloudscheduler
I have installed chkconfig on Ubuntu with sudo apt-get install
chkconfig.

So what do I have to type to view/change the priority of the
scripts?

Thanks again.

Patrick Armstrong

Apr 28, 2010, 6:40:13 PM
to cloudsc...@googlegroups.com

On 28-Apr-10, at 3:07 PM, miccloud wrote:

> I have installed chkconfig in ubuntu by typing sudo apt-get install
> chkconfig.
>
> So what I have to type for visualize/change the priority of the
> scripts?
>
> Thanks again.

I don't really know. With Red Hat, if you do "chkconfig service on"
it just magically puts the links in the right place to start things in
the right order. As far as I know, Ubuntu uses a new init system which
is pretty different, so I'm not even sure that chkconfig will work as
you'd expect.

miccloud

Apr 28, 2010, 7:14:40 PM
to cloudscheduler
Ok,
resolved... I removed the link and re-added it with the right priority.

Now... another problem. If you want, I can open
another post.
The problem is that when the job finishes, the worker node does not
shut down, and if I cat cloudscheduler.conf I see:
2010-04-28 23:01:28,951 - INFO - 63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-28 23:01:28,952 - INFO - 145 - New cluster AmazonEC2 created
2010-04-28 23:01:28,953 - INFO - 141 - Started info server on port 8111
2010-04-28 23:01:28,953 - INFO - 67 - Starting polling thread...
2010-04-28 23:01:28,953 - INFO - 86 - Starting scheduling thread...
2010-04-28 23:01:29,147 - INFO - 311 - Job ec2-184-73-80-84.compute-1.amazonaws.com#3.0#1272495614 added to unscheduled jobs list
2010-04-28 23:01:29,147 - INFO - 153 - Creating a default VM for job ec2-184-73-80-84.compute-1.amazonaws.com#3.0#1272495614 on primary resource
2010-04-28 23:04:58,678 - INFO - 508 - System exiting gracefully

"System exiting gracefully", so I think Cloud Scheduler has
aborted...?

What could the problem be?

P.S. I have set these priorities:
condor start 98 stop 0
context start 20 stop 0

Thanks.

Patrick Armstrong

Apr 28, 2010, 8:03:37 PM
to cloudsc...@googlegroups.com

On 28-Apr-10, at 4:14 PM, miccloud wrote:
> resolved...I have erased the link and re-added with right priority.

Great.

> Now................another problem......if you want I could open
> another post.

Nah, don't worry about it.

> The problem is that when the job is finished the worker node not
> shutdown, and if I cat cloudscheduler.conf I say:
> 2010-04-28 23:01:28,951 - INFO - 63 - Reading cloud resource
> configuration file /etc/cloudscheduler/cloud_resources.conf
> 2010-04-28 23:01:28,952 - INFO - 145 - New cluster AmazonEC2 created
> 2010-04-28 23:01:28,953 - INFO - 141 - Started info server on port
> 8111
> 2010-04-28 23:01:28,953 - INFO - 67 - Starting polling thread...
> 2010-04-28 23:01:28,953 - INFO - 86 - Starting scheduling thread...
> 2010-04-28 23:01:29,147 - INFO - 311 - Job
> ec2-184-73-80-84.compute-1.amazonaws.com#3.0#1272495614 added to
> unscheduled jobs list
> 2010-04-28 23:01:29,147 - INFO - 153 - Creating a default VM for job
> ec2-184-73-80-84.compute-1.amazonaws.com#3.0#1272495614 on primary
> resource
> 2010-04-28 23:04:58,678 - INFO - 508 - System exiting gracefully
>
> System exiting gracefully, so I think that CloudScheduler is
> aborted...?

Yep, that's right.

> What may be the problem?

Hmm, did you install the boto module? Also, including your entire log
might help me diagnose the problem.

>
> P.S I have set this priority:
> condor start 98 stop 0
> context start 20 stop 0

Looks good to me.


--patrick

michele

Apr 29, 2010, 3:55:27 AM
to cloudsc...@googlegroups.com
Hi,
The README says:
# wget http://boto.googlecode.com/files/boto-1.9d.tar.gz
# tar xvf boto-1.9d.tar.gz
# cd boto-1.8d
# python setup.py install

But boto-1.9d.tar.gz doesn't exist. There are boto-1.9b, 1.9a and 1.8d.
So when I installed Cloud Scheduler I installed boto-1.9b. Is that wrong?

I have attached these files to this email:
cloud_scheduler.conf
cloud_resources.conf
cloudscheduler.log
test.job (used as an example job)

The last lines of cloudscheduler.log are:
2010-04-29 07:43:02,586 - DEBUG -  240 - Scheduler - Waiting 5s
2010-04-29 07:43:07,602 - DEBUG -  244 - Clearing all un-needed VMs from the system
2010-04-29 07:43:07,602 - DEBUG -  248 - Gathering required VM types.
2010-04-29 07:43:07,602 - DEBUG -  487 - get_required_vmtypes - Required VM types: default
2010-04-29 07:43:07,602 - DEBUG -  367 - Querying condor startd with SOAP API
2010-04-29 07:43:09,512 - INFO -  508 - System exiting gracefully
2010-04-29 07:43:09,512 - DEBUG -   82 - Waiting for scheduling loop to end
2010-04-29 07:43:09,752 - DEBUG -  146 - Killing info server...

Thanks in advance.


Michael Paterson

Apr 29, 2010, 12:30:12 PM
to cloudsc...@googlegroups.com
It looks like Cloud Scheduler is hitting an unexpected error when querying
or parsing the ClassAd from the collector daemon.
A log.error should have been reported, but apparently it was not.
The parser isn't particularly smart, so most likely the problem is there.

Which version of Condor is running?

michele

Apr 29, 2010, 1:57:36 PM
to cloudsc...@googlegroups.com
I am using Condor 7.4.2

michele

Apr 29, 2010, 3:44:34 PM
to cloudsc...@googlegroups.com
Any idea?

Patrick Armstrong

Apr 29, 2010, 4:19:34 PM
to cloudsc...@googlegroups.com

On 29-Apr-10, at 12:44 PM, michele wrote:
> Any idea?

We think it's a bug with our current code. Do you think you could try
out an update mhp made earlier today? It's in the dev branch on github
at:

http://github.com/hep-gc/cloud-scheduler/tree/dev

--patrick

miccloud

Apr 29, 2010, 5:07:15 PM
to cloudscheduler
Ok, I'll try...
Should I remove the cloudscheduler directory from my AMI and install this
dev package?

Thanks.

Patrick Armstrong

Apr 29, 2010, 5:09:25 PM
to cloudsc...@googlegroups.com
On 29-Apr-10, at 2:07 PM, miccloud wrote:
> Should I remove from my ami cloudscheduler directory and install this
> dev package?

You should be able to just download the tarball (http://github.com/hep-gc/cloud-scheduler/tarball/dev)
and then install it over top of your existing installation.

miccloud

Apr 29, 2010, 5:11:55 PM
to cloudscheduler
I have tried your AMI http://wiki.github.com/hep-gc/cloud-scheduler/cloud-scheduler-test-drive.
I also launched a worker node, ami-fdee0094, but when I try to
log in over ssh as root, it asks me for a password.
Can I have it?

Thanks.

Patrick Armstrong

Apr 29, 2010, 5:27:38 PM
to cloudsc...@googlegroups.com

On 29-Apr-10, at 2:11 PM, miccloud wrote:

> I have tried your AMI http://wiki.github.com/hep-gc/cloud-scheduler/cloud-scheduler-test-drive
> .
> So I have launched also a worker node ami-fdee0094 but if I would
> enter in, by ssh with user root it answer me a password.
> Can I have it?

The machine just has a randomly generated password, so I don't know it
either. ;)

What you need to do is start the VM with ec2run's -k option, and ssh
with your keypair with the -i option.


michele

Apr 29, 2010, 5:58:59 PM
to cloudsc...@googlegroups.com
I have tried the new source.
I have the same error.
LOG:
2010-04-27 13:17:20,972 - INFO -   63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-27 13:17:20,973 - INFO -  145 - New cluster AmazonEC2 created
2010-04-27 13:17:20,974 - INFO -  141 - Started info server on port 8111
2010-04-27 13:17:20,974 - INFO -   67 - Starting polling thread...
2010-04-27 13:17:20,975 - INFO -   86 - Starting scheduling thread...
2010-04-29 21:26:32,972 - INFO -   63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-29 21:26:33,001 - INFO -  145 - New cluster AmazonEC2 created
2010-04-29 21:26:33,002 - INFO -  141 - Started info server on port 8111
2010-04-29 21:26:33,002 - INFO -   67 - Starting polling thread...
2010-04-29 21:26:33,002 - INFO -   89 - Starting scheduling thread...
2010-04-29 21:28:43,824 - INFO -  311 - Job ec2-184-73-38-225.compute-1.amazonaws.com#1.0#1272576520 added to unscheduled jobs list
2010-04-29 21:37:04,370 - ERROR -  385 - There was a problem connecting to the Condor scheduler web service (http://localhost:9618)
2010-04-29 21:37:05,530 - INFO -  490 - System exiting gracefully
2010-04-29 21:57:38,320 - INFO -   63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-29 21:57:38,321 - INFO -  145 - New cluster AmazonEC2 created
2010-04-29 21:57:38,322 - INFO -  141 - Started info server on port 8111
2010-04-29 21:57:38,322 - INFO -   67 - Starting polling thread...
2010-04-29 21:57:38,322 - INFO -   89 - Starting scheduling thread...
2010-04-29 21:57:38,533 - ERROR -  385 - There was a problem connecting to the Condor scheduler web service (http://localhost:9618)
2010-04-29 21:57:40,338 - INFO -  490 - System exiting gracefully
2010-04-29 21:58:20,835 - INFO -   63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-29 21:58:20,835 - INFO -  145 - New cluster AmazonEC2 created
2010-04-29 21:58:20,836 - INFO -  141 - Started info server on port 8111
2010-04-29 21:58:20,836 - INFO -   67 - Starting polling thread...
2010-04-29 21:58:20,837 - INFO -   89 - Starting scheduling thread...
2010-04-29 21:58:21,060 - ERROR -  385 - There was a problem connecting to the Condor scheduler web service (http://localhost:9618)
2010-04-29 21:58:21,061 - INFO -  490 - System exiting gracefully



In my $condor_config I have:
## CLOUD SCHEDULER SETTINGS
ENABLE_SOAP = TRUE
ENABLE_WEB_SERVER = TRUE
WEB_ROOT_DIR=$(RELEASE_DIR)/web
ALLOW_SOAP=localhost, 127.0.0.1
SCHEDD_ARGS = -p 8080
UPDATE_COLLECTOR_WITH_TCP=True
COLLECTOR_SOCKET_CACHE_SIZE=1000
Is that right?

michele

Apr 29, 2010, 6:19:53 PM
to cloudsc...@googlegroups.com
I remember that I installed boto 1.9b and not 1.9d, because 1.9d doesn't exist.
The howto (the README file) specifies 1.9d.

Patrick Armstrong

Apr 29, 2010, 6:47:15 PM
to cloudsc...@googlegroups.com
On 29-Apr-10, at 3:19 PM, michele wrote:

> I remember that I have installed boto1.9b and not 1.9d because it doesn't exist.
> In the howto (FILE README) is indicated 1.9d

Thanks. I fixed this in the README.

Your condor_config settings look right to me. What happens when you run "telnet localhost 9618"?

You can also check your SchedLog for HTTP requests, to make sure they're going through. This is usually in /var/log/condor/

--patrick
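
The telnet check above only asks whether something accepts TCP connections on that port. A minimal Python equivalent, assuming nothing beyond the host and port from the log message, could look like this (the function name `port_is_open` is made up for the example):

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
    connection.close()
    return True


if __name__ == "__main__":
    # 9618 is the port named in the Cloud Scheduler error message.
    print(port_is_open("localhost", 9618))
```

A successful connection only shows that a daemon is listening; it does not prove the SOAP interface is enabled, so checking the Condor logs for HTTP requests is still worthwhile.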


Patrick Armstrong

Apr 29, 2010, 8:03:25 PM
to cloudsc...@googlegroups.com
Hi Michele,

I'm pretty confused about your problem, so I'm spinning up an Ubuntu VM, and I'll try testing cloud scheduler on ubuntu.




miccloud

Apr 30, 2010, 2:54:01 AM
to cloudscheduler
Thanks a lot.
I'll wait for your news.
If you want, I can give you my AMI ID.


miccloud

Apr 30, 2010, 3:35:48 AM
to cloudscheduler
I tried again, and the log shows this error when a node is added
to the pool:

MyType        TargetType  Name

Scheduler     None        ec2-184-73-96-33.compute-1.ama
DaemonMaster  None        ec2-184-73-96-33.compute-1.ama
Negotiator    None        ec2-184-73-96-33.compute-1.ama
ubuntu@ec2-184-73-96-33:~/condor/etc$ sudo /etc/init.d/cloud_scheduler start
2010-04-30 07:28:51,936 - INFO - 63 - Reading cloud resource configuration file /etc/cloudscheduler/cloud_resources.conf
2010-04-30 07:28:51,937 - INFO - 145 - New cluster AmazonEC2 created
2010-04-30 07:28:51,938 - INFO - 141 - Started info server on port 8111
2010-04-30 07:28:51,938 - INFO - 67 - Starting polling thread...
2010-04-30 07:28:51,938 - INFO - 89 - Starting scheduling thread...

2010-04-30 07:33:43,441 - ERROR - 385 - There was a problem connecting to the Condor scheduler web service (http://localhost:9618)
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/etc/init.d/cloud_scheduler", line 231, in run
    machineList = self.resource_pool.resource_querySOAP()
  File "/usr/local/lib/python2.6/dist-packages/cloudscheduler/cloud_management.py", line 372, in resource_querySOAP
    machineList = self.convert_classad_list(machines)
  File "/usr/local/lib/python2.6/dist-packages/cloudscheduler/cloud_management.py", line 361, in convert_classad_list
    native_list.append(self.convert_classad_dict(item))
  File "/usr/local/lib/python2.6/dist-packages/cloudscheduler/cloud_management.py", line 352, in convert_classad_dict
    if attr.has_key('name') and attr.has_key('value'):
AttributeError: ClassAdStructAttr instance has no attribute 'has_key'

2010-04-30 07:33:43,442 - INFO - 490 - System exiting gracefully

michele

Apr 30, 2010, 4:08:31 AM
to cloudscheduler
In case it is useful: I opened cloud_management.py and at line 372 I added ===> print machinelist .
The output is in the file attached to this email (machines).

Patrick Armstrong

Apr 30, 2010, 12:38:57 PM
to cloudsc...@googlegroups.com
Actually, if you can give me the AMI ID of your VM, that would probably help with debugging.

Is that possible?

Michael Paterson

Apr 30, 2010, 1:28:32 PM
to cloudsc...@googlegroups.com
Looks like some difference in the way Python is resolving the types.
Are you running Python 2.6 or 3.0?
I pushed another change to dev that should fix that has_key() crash, and
hopefully fix the parsing problem too.


Michael
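
For context, the traceback earlier in the thread shows has_key() being called on a ClassAdStructAttr instance, which is an object with attributes rather than a dictionary. The sketch below shows one tolerant way to read a name/value pair from either kind of item. It is not the change that was pushed to the dev branch; `attr_name_value`, `convert_classad`, and the `FakeAttr` stand-in class are hypothetical names used only for illustration.

```python
def attr_name_value(attr):
    """Return (name, value) from a dict-like or attribute-style item, else None."""
    if isinstance(attr, dict):
        if "name" in attr and "value" in attr:
            return attr["name"], attr["value"]
        return None
    # Objects such as SOAP-generated structs expose fields as attributes.
    if hasattr(attr, "name") and hasattr(attr, "value"):
        return attr.name, attr.value
    return None


def convert_classad(items):
    """Build a plain dict from a list of name/value items, skipping bad ones."""
    result = {}
    for item in items:
        pair = attr_name_value(item)
        if pair is not None:
            result[pair[0]] = pair[1]
    return result


class FakeAttr(object):
    """Hypothetical stand-in for an attribute-style ClassAd entry."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


if __name__ == "__main__":
    mixed = [{"name": "Name", "value": "vm1"}, FakeAttr("State", "Unclaimed")]
    print(convert_classad(mixed))
```

Using the `in` operator and `hasattr` avoids depending on has_key(), which only exists on Python 2 dictionaries.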


miccloud

Apr 30, 2010, 2:08:31 PM
to cloudscheduler
I think Python 2.6.
I will post the AMI ID as soon as I can.

Where can I download your fix? Do I have to download
http://github.com/hep-gc/cloud-scheduler/tarball/dev ?

Thanks again.


miccloud

Apr 30, 2010, 2:16:35 PM
to cloudscheduler
This is the AMI: ami-45a9402c.
In /home/ubuntu there is a README file where I have written all the paths
for Condor and Cloud Scheduler.
You have to set your Amazon key in /etc/cloudscheduler/cloud_resources.conf.

Log in via ssh as the ubuntu user.

README file contents:
condor_config is in /home/ubuntu/condor/etc/condor_manager
condor_config.local is in /home/ubuntu/local
central manager is in /home/ubuntu/condor/etc
condor log are in /home/ubuntu/local/log

cloud scheduler file config ==> /etc/cloudscheduler

Patrick Armstrong

Apr 30, 2010, 2:20:18 PM
to cloudsc...@googlegroups.com

On 30-Apr-10, at 11:08 AM, miccloud wrote:

> I think python 2.6
> I will post you as soon as the ami.
>
> Where I can download you fix? I have to download
> http://github.com/hep-gc/cloud-scheduler/tarball/dev ?

Yep, that'll work.

Patrick Armstrong

Apr 30, 2010, 2:59:05 PM
to cloudsc...@googlegroups.com
I just tried the latest fix from Mike on Ubuntu 10.04, and it seems to be
working fine for me.

What about you, Michele?

miccloud

Apr 30, 2010, 3:48:41 PM
to cloudscheduler
Yes! It works!
But when does Cloud Scheduler shut down the worker node? After a job has
executed?

Thanks again.

Patrick Armstrong

Apr 30, 2010, 3:56:01 PM
to cloudsc...@googlegroups.com

On 2010-04-30, at 12:48 PM, miccloud wrote:

> Yes! It works!
> But when cloudscheduler shutdown the worker node ,after that a job is
> executed ?

The idea is that there's no point leaving resources running when there are no jobs to run on them, to save money on EC2, or to save CPU cycles on your own Eucalyptus or Nimbus cluster.

We have thought about a "grace period", where cloud scheduler would wait a period of time before shutting down running VMs, but it's not a high priority for us.

--patrick
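
The "grace period" idea mentioned above can be sketched in a few lines. This is purely illustrative and is not how Cloud Scheduler is implemented: the function name `vms_to_shut_down`, the `idle_since` mapping, and the use of plain second counts as timestamps are all assumptions.

```python
def vms_to_shut_down(idle_since, now, grace_period=0):
    """Return names of VMs idle for at least `grace_period` seconds.

    `idle_since` maps a VM name to the time (in seconds) at which it
    became idle, or to None if it is currently running a job.
    """
    expired = []
    for name, since in sorted(idle_since.items()):
        if since is not None and now - since >= grace_period:
            expired.append(name)
    return expired


if __name__ == "__main__":
    idle = {"vm-a": 100, "vm-b": None, "vm-c": 550}
    # With no grace period every idle VM is shut down immediately.
    print(vms_to_shut_down(idle, now=600))
    # With a 300 second grace period only vm-a has been idle long enough.
    print(vms_to_shut_down(idle, now=600, grace_period=300))
```

A grace period of zero reproduces the behaviour described in the reply above, where idle VMs are stopped as soon as there are no jobs for them.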

miccloud

Apr 30, 2010, 4:08:55 PM
to cloudscheduler
Ok, thanks again.
Now I will think about a way to start a new AMI only if the other
machines in the pool are busy.
So if there are free machines, I don't have to put +VMAMI in the job file.

Last question:
When I type /etc/init.d/cloud_scheduler start, the shell stays attached to
the output of cloudscheduler.conf.
How do I get out of it without stopping Cloud Scheduler?

mhp

Apr 30, 2010, 4:39:11 PM
to cloudsc...@googlegroups.com

> Last question:
> When I type /etc/init.d/cloud_scheduler start the shell is fixed on
> the output of cloudscheduler.conf.
> How do I get out of them without stopping cloudscheduler?
>

You should be able to use:

service cloud_scheduler start
service cloud_scheduler stop


Patrick Armstrong

Apr 30, 2010, 7:11:00 PM
to cloudsc...@googlegroups.com
Also, make sure you set log_stdout to false in your config file.