Re: [Testbed-admins] Failure during creating an experiment


Leigh Stoller

Aug 29, 2007, 9:47:29 AM
to Sanjay Kumar, tbadmin admins/operators
Hi Sanjay. I've redirected this to testbed-admins since this is a
good topic for other people to see.

> We have started having problems creating experiments in our Georgia
> Tech Emulab cluster. It fails with the following error.
>
> Error occured: problem starting SSL connection (client mode)
> waitforactive: RPC failed!
> *** /usr/testbed/sbin/eventsys.proxy:
> Failed to start event system for Test/Test1: 1768 65280!
> *** Failed to start the event system.
>
> I don't know what triggered this problem. This error email is
> pasted below.


So, if nothing has changed (you have not recently done an install),
the first thing to check is whether the xmlrpc server is running on
your boss. Do this on your boss:

boss> ps auxww | grep xml

and you should see at least one sslxmlrpc process, owned by root. If
not, then you can simply restart it:

boss> sudo /usr/testbed/sbin/sslxmlrpc_server.py

and see if your experiment will swap in. If the problem persists, the
next thing to do is look at /proj/$pid/exp/$eid/logs/event-sched.log to
see if there is an obvious (read: decipherable) error message. Often,
the above message is caused by the user having deleted his .ssl
directory, or by the DB on boss being out of sync with the contents of
the cert in the user's .ssl directory.
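The first two checks above (is sslxmlrpc_server.py running as root, and did event-sched.log get written) can be scripted. What follows is a minimal sketch, assuming Python 3 is available on boss. The thread's own tools run under Python 2.3, so this is not part of the Emulab distribution, and the function names are hypothetical. The ps parsing assumes the standard 11-column `ps auxww` layout (USER ... COMMAND).

```python
import os
from collections import deque
from typing import List, Optional


def sslxmlrpc_running(ps_output: str) -> bool:
    """Return True if `ps auxww` output shows sslxmlrpc_server.py owned by root."""
    for line in ps_output.splitlines():
        # `ps auxww` columns: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        user, command = fields[0], fields[10]
        if user == "root" and "sslxmlrpc_server.py" in command:
            return True
    return False


def event_sched_log_path(pid: str, eid: str, proj_root: str = "/proj") -> str:
    """Build /proj/$pid/exp/$eid/logs/event-sched.log for an experiment."""
    return os.path.join(proj_root, pid, "exp", eid, "logs", "event-sched.log")


def tail_event_sched_log(pid: str, eid: str, n: int = 20,
                         proj_root: str = "/proj") -> Optional[List[str]]:
    """Return the last n lines of the log, or None if it was never created."""
    path = event_sched_log_path(pid, eid, proj_root)
    if not os.path.exists(path):
        return None  # the scheduler never got far enough to write its log
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

To use it, pass the text from `subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout` to `sslxmlrpc_running`. A `None` from `tail_event_sched_log` matches the case where the log file is never created at all.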

If none of the above helps, let us know and we can take the next
step(s) ...

Lbs


Sanjay Kumar

Aug 29, 2007, 10:50:47 AM
to Leigh Stoller, tbadmin admins/operators
Hi Leigh,

I see that the xmlrpc server is running.

boss# ps auwwx | grep xml
root 292 0.0 0.2 6504 5792 ?? Is Mon04PM 0:00.01 /usr/local/bin/python /usr/testbed/sbin/sslxmlrpc_server.py
boss#


I checked the /proj/$pid/exp/$eid/logs/ directory but there was no
event-sched.log there.
I monitored the directory through the whole experiment creation process
and I am sure this file doesn't get created at all.

Thanks,
Sanjay


On Wed, 29 Aug 2007 09:47:29 -0400, Leigh Stoller <sto...@flux.utah.edu>
wrote:

--
--------------------
http://www.cc.gatech.edu/~ksanjay/

Leigh Stoller

Aug 29, 2007, 11:41:04 AM
to Sanjay Kumar, tbadmin admins/operators
> I see that the xmlrpc server is running.
>
> I checked the /proj/$pid/exp/$eid/logs/ directory but there was no
> event-sched.log there.
> I monitored the directory through the whole
> experiment creation process and I am sure this file doesn't get
> created at all.

Okay, try this on your ops node:

ops$ sslxmlrpc_client.py info proj=emulab-ops exp=hwdown aspect=mapping

which should tell you the experiment is not active. If this fails,
take a look at the server log file on boss,
/usr/testbed/log/sslxmlrpc.log, to see what it says.

If no progress, then I will have to log in and take a look. Send me
the details (privately) of how I log into your boss.

Lbs

Sanjay Kumar

Aug 29, 2007, 11:59:13 AM
to Leigh Stoller, tbadmin admins/operators
Hi Leigh,
I executed the command on ops and got this error:

M2Crypto.SSL.SSLError: sslv3 alert certificate expired

Is this the real problem? How do I get a new certificate?

Thanks,
Sanjay


bash-2.05b# sslxmlrpc_client.py info proj=emulab-ops exp=hwdown aspect=mapping
Traceback (most recent call last):
  File "/usr/testbed/bin/sslxmlrpc_client.py", line 237, in ?
    sys.exit(do_method(server, req_args))
  File "/usr/testbed/bin/sslxmlrpc_client.py", line 132, in do_method
    response = apply(meth, meth_args)
  File "/usr/local/lib/python2.3/xmlrpclib.py", line 1029, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/lib/python2.3/xmlrpclib.py", line 1316, in __request
    verbose=self.__verbose
  File "/usr/local/lib/python2.3/site-packages/M2Crypto/m2xmlrpclib.py", line 53, in request
    h.endheaders()
  File "/usr/local/lib/python2.3/httplib.py", line 712, in endheaders
    self._send_output()
  File "/usr/local/lib/python2.3/httplib.py", line 597, in _send_output
    self.send(msg)
  File "/usr/local/lib/python2.3/httplib.py", line 564, in send
    self.connect()
  File "/usr/local/lib/python2.3/site-packages/M2Crypto/httpslib.py", line 75, in connect
    self.sock.connect((self.host, self.port))
  File "/usr/local/lib/python2.3/site-packages/M2Crypto/SSL/Connection.py", line 103, in connect
    return self.connect_ssl()
  File "/usr/local/lib/python2.3/site-packages/M2Crypto/SSL/Connection.py", line 96, in connect_ssl
    return m2.ssl_connect(self.ssl)
M2Crypto.SSL.SSLError: sslv3 alert certificate expired
bash-2.05b#


On Wed, 29 Aug 2007 11:41:04 -0400, Leigh Stoller <sto...@flux.utah.edu>
wrote:


--
--------------------
http://www.cc.gatech.edu/~ksanjay/


Leigh Stoller

Aug 29, 2007, 1:47:38 PM
to Sanjay Kumar, tbadmin admins/operators
> Hi Leigh,
> I executed the command on OPS and I get the error
>
> M2Crypto.SSL.SSLError: sslv3 alert certificate expired
>
> Is this the real problem? How do I get a new certificate?

Yes indeed, this is your problem! It's amazing how the years fly by; I
think the certificates were set to expire after 3 years, and I have
since changed that to 5 years.
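To confirm that an expired certificate is the cause, or to check how much time a certificate has left, you can print its end date with `openssl x509 -noout -enddate -in emulab.pem`. That command prints a line of the form `notAfter=<Mon DD HH:MM:SS YYYY> GMT`. Below is a minimal Python 3 sketch that parses that line with the standard library's `ssl.cert_time_to_seconds` and compares it to the current time. The helper names are hypothetical, and the date used in the docstring is only an illustration.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Seconds until the certificate expires (zero or negative if expired).

    Accepts either the openssl output line, e.g.
    'notAfter=Aug 29 12:00:00 2007 GMT', or just the timestamp part.
    """
    value = enddate_line.strip()
    prefix = "notAfter="
    if value.startswith(prefix):
        value = value[len(prefix):]
    expires = ssl.cert_time_to_seconds(value)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return expires - now


def is_expired(enddate_line: str, now: Optional[float] = None) -> bool:
    """True once the notAfter time has been reached."""
    return seconds_until_expiry(enddate_line, now) <= 0
```

A certificate for which `is_expired` returns True would produce the "sslv3 alert certificate expired" failure seen above once a client presents it.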

Anyway, the certificates were created when you initially installed
your boss node. Generating and installing new certificates is
easy: cd to the object tree on your boss and go into the ssl
directory. Then clean, rebuild, and install:

gmake cleanX
gmake remote-site
gmake remote-site-boss-install

You may have to use sudo for some or all of the above. Then copy
emulab.pem and capture.pem to ops:/usr/testbed/etc, and copy
ctrlnode.pem to ops:/usr/testbed/etc/client.pem.
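The three copies above can be generated as a dry run. The sketch below is Python 3 and assumes the ops node is reachable as `ops` and that you run it from the ssl object directory on boss. The helper names are hypothetical. It only prints the `scp` commands so you can review them before running them by hand, and it makes the rename of ctrlnode.pem to client.pem explicit.

```python
import shlex
from typing import Dict, List

# File built in the ssl object directory on boss -> destination path on ops.
OPS_CERT_COPIES: Dict[str, str] = {
    "emulab.pem": "/usr/testbed/etc/emulab.pem",
    "capture.pem": "/usr/testbed/etc/capture.pem",
    "ctrlnode.pem": "/usr/testbed/etc/client.pem",  # renamed on ops
}


def scp_commands(ops_host: str = "ops") -> List[List[str]]:
    """Return one scp argv list per certificate that must be copied to ops."""
    return [["scp", src, f"{ops_host}:{dst}"] for src, dst in OPS_CERT_COPIES.items()]


def print_dry_run(ops_host: str = "ops") -> None:
    """Print the commands, shell-quoted, without executing anything."""
    for argv in scp_commands(ops_host):
        print(shlex.join(argv))
```

Keeping the mapping in one table avoids the easy mistake of copying ctrlnode.pem under its original name.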

Lastly, if you did *not* purchase real certs for your web servers,
you will want to copy the self-signed certs to the apache config
directories:

cp apache_cert.pem /usr/local/etc/apache/ssl.crt/www.${OURDOMAIN}.crt
cp apache_key.pem /usr/local/etc/apache/ssl.key/www.${OURDOMAIN}.key

and for ops:

scp apache-ops_cert.pem ops:/usr/local/etc/apache/ssl.crt/${USERNODE}.crt
scp apache-ops_key.pem ops:/usr/local/etc/apache/ssl.key/${USERNODE}.key

Now reboot boss and ops.

Next you need to regenerate all of the user certificates.

mysql> delete from user_sslcerts;

and then in the testbed source directory:

boss> sql/initcerts.pl

and wait for a little while ...

Lbs

Sanjay Kumar

Aug 29, 2007, 2:40:23 PM
to Leigh Stoller, tbadmin admins/operators
Hi Leigh,

Please see inline.

Thanks,
Sanjay

On Wed, 29 Aug 2007 13:47:38 -0400, Leigh Stoller <sto...@flux.utah.edu>
wrote:

>


> gmake cleanX
> gmake remote-site
> gmake remote-site-boss-install
>

Running "gmake remote-site-boss-install" gives one error:

boss# pwd
/root/tbobj/ssl
boss# gmake remote-site-boss-install
/usr/bin/install -c -m 444 usercert.cnf /usr/testbed/lib/ssl/usercert.cnf
install: usercert.cnf: No such file or directory
gmake: *** [remote-site-boss-install] Error 71
boss#

There was no usercert.cnf file in the ssl directory, so I commented out
that line in the GNUmakefile and carried on with the rest of the steps.

> and for ops:
>
> scp apache-ops_cert.pem ops:/usr/local/etc/apache/ssl.crt/${USERNODE}.crt
> scp apache-ops_key.pem ops:/usr/local/etc/apache/ssl.key/${USERNODE}.key
>

There were no apache-ops* files in the ssl directory and no
/usr/local/etc/apache/ directory on ops, so I couldn't perform this step.


> Now reboot boss and ops.
>

I rebooted the two machines.

> Next you need to regenerate all of the user certificates.
>
> mysql> delete from user_sslcerts;
>
> and then in the testbed source directory:
>
> boss> sql/initcerts.pl
>
> and wait for a little while ...
>
> Lbs
>

But the good news is that, after doing all that, creating new
experiments WORKS!! :)

Thanks for your help.



--
--------------------
http://www.cc.gatech.edu/~ksanjay/

Leigh Stoller

Aug 29, 2007, 2:57:11 PM
to Sanjay Kumar, tbadmin admins/operators
> Hi Leigh,
>
> Please see inline.

Both those errors are strange. However, no need to worry about
usercert.cnf since it probably did not change. For the apache ops files:

gmake apache-ops.pem

and then copy over the files to ops.

Otherwise, glad things are working again!

Lbs

