Having Tons of Trouble Setting up Disco

Ryan Rosario

Oct 2, 2009, 1:13:00 PM
to Disco-development
I am trying to set up Disco on OS X 10.6 and am having a lot of
trouble. I am having horrible stability issues. Sometimes when I check
the status page, both of my nodes are there, sometimes only one is
there, and sometimes nothing is there. It is really frustrating.

When I try to run the word count example, I get:

bash-3.2$ python count_words.py http://envstat-03:8989
Starting Disco job..
Go to http://envstat-03:8989 to see status of the job.
Traceback (most recent call last):
  File "count_words.py", line 21, in <module>
    reduce = fun_reduce).wait()
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 86, in new_job
    return Job(self, **kwargs)
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 220, in __init__
    self._run(**kwargs)
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 434, in _run
    reply = self.master.request("/disco/job/new", self.msg)
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 54, in request
    raise DiscoError('Got %s, make sure disco master is running at %s' % (e, self.host))
disco.error.DiscoError: Got HTTP exception (http://envstat-03:8989/disco/job/new): Downloading http://envstat-03:8989/disco/job/new failed after 10 attempts: HTTP exception (http://envstat-03:8989/disco/job/new): Invalid HTTP reply (expected 200 got 500), make sure disco master is running at http://envstat-03:8989
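
The error above can mean two different things: nothing answered at all, or the master answered but with HTTP 500. A minimal Python 3 sketch that probes the master's status page and tells the two cases apart (the helper name `master_status` is hypothetical and not part of Disco; the URL is the one used in this thread):

```python
import urllib.error
import urllib.request


def master_status(url, timeout=5.0):
    """Return the HTTP status code served at `url`, or None if nothing answered.

    Hypothetical diagnostic helper, not part of Disco.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        # HTTPError subclasses URLError, so it must be caught first.
        return err.code
    except (urllib.error.URLError, OSError):
        # Nothing answered: DNS failure, connection refused, or timeout.
        return None
```

Calling master_status("http://envstat-03:8989/") and getting None would point at the network (kunthar's firewall question), whereas a 500 means the master process is up but failing internally, which is what the traceback reports.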

I started with step 0 of the troubleshooting steps. When I run

sudo disco master stop
sudo disco master nodaemon

I get an Erlang prompt, which does not seem correct:

Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [kernel-poll:true]

Eshell V5.7.2 (abort with ^G)
(disco_4444_master@id-55-74)1>
=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"DISCO BOOTS"}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"Resultfs enabled:",disco,false}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"Event server starts"}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"DISCO SERVER STARTS"}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{'Config table update'}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"Job queue starts"}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{"OOB server starts"}

=INFO REPORT==== 2-Oct-2009::10:06:13 ===
{'SCGI SERVER STARTS'}

(disco_4444_master@id-55-74)1>

Then, when I run the count_words script, I get:

bash-3.2$ python count_words.py http://envstat-03:8989
Starting Disco job..
Go to http://envstat-03:8989 to see status of the job.
Traceback (most recent call last):
  File "count_words.py", line 21, in <module>
    reduce = fun_reduce).wait()
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 225, in g
    return f(*tuple([self.name] + list(args)), **kw)
  File "/Library/Python/2.6/site-packages/disco-0.1-py2.6.egg/disco/core.py", line 161, in wait
    raise JobException("Job status %s" % status, self.host, name)
disco.error.JobException: Job http://envstat-03:8989/wordcount@1254503511 failed: Job status dead

I would appreciate any help you can provide.

Thanks in advance,
Ryan

kunthar

Oct 2, 2009, 2:40:57 PM
to disc...@googlegroups.com
Any firewall running?

Sent from my toaster

Jared Flatow

Oct 2, 2009, 2:53:50 PM
to disc...@googlegroups.com
The output from nodaemon is correct, but make sure you don't close or
suspend that shell.

If you stop the master, check that all the Erlang processes are
stopped. Then, when you start it again, try it without sudo; you
shouldn't need it. Disco seems to be booting up normally (at least in
nodaemon mode). Did you configure the nodes using the web page? If you
are able to save your configuration, the master is probably running
fine.

jared

Ryan Rosario

Oct 2, 2009, 9:04:20 PM
to Disco-development
Thanks. I am trying to narrow it down by only using the master machine
as the worker as well.

It turns out the failure is due to the warning
"Python: No user 504."

I tried to fix that using a blog post I found, but I have another
problem that is messing everything up.
I can't do:

ssh localhost erl

I get "Command not found."

On this machine (Mac), erl is a symbolic link to
/usr/local/lib/erlang/bin/erl. I can successfully run

ssh localhost /usr/local/lib/erlang/bin/erl

How can I get the SSH path, or Disco, to recognize
/usr/local/lib/erlang/bin/erl?

Thanks,
Ryan

Jared Flatow

Oct 2, 2009, 10:52:16 PM
to disc...@googlegroups.com
On Oct 2, 2009, at 6:04 PM, Ryan Rosario wrote:
>
> Thanks. I am trying to narrow it down by only using the master machine
> as the worker as well.
>
> It turns out the failure is due to the warning
> "Python: No user 504."

What user are you running disco as?
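
The "No user 504" message suggests, though the thread does not confirm it, that a Python process could not map UID 504 to a user name. A small Python 3 sketch using the standard `pwd` module checks whether a UID resolves on a given machine (the helper name `user_for_uid` is hypothetical):

```python
import pwd
import sys


def user_for_uid(uid):
    """Return the login name for a numeric UID, or None if there is no such user."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # getpwuid() raises KeyError when the UID has no passwd entry.
        return None


if __name__ == "__main__":
    # Default to UID 504, the one from the warning in this thread.
    uid = int(sys.argv[1]) if len(sys.argv) > 1 else 504
    print(uid, "->", user_for_uid(uid))
```

Running it both in a local terminal and through ssh (for example, with the script saved under a hypothetical name check_uid.py, `ssh localhost python3 check_uid.py 504`) shows whether the lookup behaves differently in an ssh-launched process.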

>
> I tried to fix that using a blog post I found, but I have another
> problem that is messing everything up.
> I can't do:
>
> ssh localhost erl
>
> I get "Command not found."
>
> On this machine (Mac), erl is a symbolic link to /usr/local/lib/erlang/bin/erl
> I can successfully do
> ssh localhost /usr/local/lib/erlang/bin/erl
>
> How can I get the SSH path, or Disco, to recognize /usr/local/lib/erlang/bin/erl?

One way is to unset the DISCO_ERLANG variable and add
/usr/local/lib/erlang/bin to your PATH in ~/.bashrc
(export PATH=$PATH:/usr/local/lib/erlang/bin). Another is to set
DISCO_ERLANG = '/usr/libexec/StartupItemContext /usr/local/lib/erlang/bin/erl'.

jared

Ryan Rosario

Oct 2, 2009, 11:18:21 PM
to Disco-development
I am running Disco as user disco (UID 504), but in order to start the
server I have to use an admin account, otherwise

disco master start
outputs:
Only root can change DISCO_USER

I tried unsetting the variable and restarting the master, but nothing
changed. It still says "Command not found."

I also tried setting
DISCO_ERLANG = '/usr/libexec/ StartupItemContext /usr/local/lib/erlang/bin/erl'
Now when I start disco, I get

Disco encountered a fatal system error:
[Errno 13] Permission denied

What should I do?

Ryan Rosario

Oct 3, 2009, 2:34:52 PM
to Disco-development
Any suggestions? I still cannot get this to work and I'd really like
to get this set up.

Ryan

Ville Tuulos

Oct 5, 2009, 2:20:31 PM
to Disco-development

On Sat, 3 Oct 2009, Ryan Rosario wrote:

>
> Any suggestions? I still cannot get this to work and I'd really like
> to get this set up.

My interpretation of the problem is as follows:

You first tried to run Disco as $DISCO_USER but it failed because
passwordless ssh doesn't work for $DISCO_USER and/or there was the PATH
issue.

Now there's some log / pid file left by the previous attempt with
$DISCO_USER which you can't overwrite as yourself, thus "Permission
denied".

You could make a fresh installation of Disco, include
/usr/local/lib/erlang/bin in your ~/.bashrc or set $DISCO_ERLANG as
instructed by Jared below, unset $DISCO_USER, and try to run it as
yourself. At least you shouldn't see the "permission denied" error
anymore.


Ville

Jared Flatow

Oct 5, 2009, 2:24:14 PM
to disc...@googlegroups.com

On Oct 2, 2009, at 8:18 PM, Ryan Rosario wrote:

>
> I am running Disco as user disco (UID 504), but in order to start the
> server I have to use an admin account, otherwise
>
> disco master start
> outputs:
> Only root can change DISCO_USER

Right. If you want to run disco as a different user, you either need
to log in as that user or execute the start command as root. Can you
just run it as your own user, without changing DISCO_USER, until you
get it working?

>
> I tried unsetting the variable, then restarting the master, no change.
> Still says "Command not found."

Still? Isn't that a different error? Does "Command not found" refer to
lighttpd or to Erlang? The user you want to run as should have the
correct path for both, and permission to run them.

> I also tried setting
> DISCO_ERLANG = '/usr/libexec/ StartupItemContext /usr/local/lib/erlang/bin/erl'

There should be no space between '/usr/libexec/' and
'StartupItemContext'. That is the command; the argument is the path to
Erlang.
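
Assuming Disco splits the DISCO_ERLANG value into a command and its arguments the way a POSIX shell would (an assumption; the thread does not show Disco's code), a sketch with the standard `shlex.split` shows what the stray space does (the helper name `split_command` is hypothetical):

```python
import shlex

# The DISCO_ERLANG value as typed in the thread (stray space) and as intended.
WITH_SPACE = "/usr/libexec/ StartupItemContext /usr/local/lib/erlang/bin/erl"
WITHOUT_SPACE = "/usr/libexec/StartupItemContext /usr/local/lib/erlang/bin/erl"


def split_command(value):
    """Split a command string into an argv list using POSIX shell rules."""
    return shlex.split(value)


if __name__ == "__main__":
    for value in (WITH_SPACE, WITHOUT_SPACE):
        argv = split_command(value)
        print("command:", argv[0], "| arguments:", argv[1:])
```

With the space, the program to execute becomes the directory /usr/libexec/ itself, and executing a directory fails with EACCES, which Python reports as "[Errno 13] Permission denied". That may be one source of the error quoted above, alongside the leftover log/pid files.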

> Now when I start disco, I get
>
> Disco encountered a fatal system error:
> [Errno 13] Permission denied

Since you were previously running as a different user, delete the
log and run directories to make sure the log/pid files are writable by
the new user. If you haven't changed these settings, they are in
DISCO_HOME. You can run disco -p if you can't figure out where they are.
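
Before deleting anything, it can help to list which entries the new user cannot write. A minimal Python 3 sketch using `os.access` (the helper name `unwritable_entries` is hypothetical; pass it the log and run directories that disco -p reports, since their exact locations depend on your settings). Run it as the user Disco will run as: os.access treats almost everything as writable for root.

```python
import os
import sys


def unwritable_entries(directory):
    """Return paths under `directory` (including itself) that the current user cannot write."""
    if not os.path.isdir(directory):
        return []
    blocked = [] if os.access(directory, os.W_OK) else [directory]
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            # W_OK is checked against the real UID/GID of this process.
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return blocked


if __name__ == "__main__":
    for directory in sys.argv[1:]:
        for path in unwritable_entries(directory):
            print("not writable:", path)
```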

jared

Ryan Rosario

Oct 5, 2009, 4:03:03 PM
to Disco-development
Thanks for all of the suggestions. I had several issues. The space
was the cause of some of them. I also had permission issues, which is
why I had to use an admin account. I still start the master as root
and then switch to the disco user in the terminal to run scripts
(although I imagine it would also work outside the disco account,
since the user is switched anyway).

I am now able to

ssh localhost erl

This may be a quirk of Mac OS X: the PATH must be set in .bashrc
rather than .bash_profile, even though .bash_profile is the default
file for users on OS X. Now that I can run that command, I was able to
use this fix for the user issue:
http://infinite-sushi.com/2009/08/launching-deamons-over-ssh-and-snow-leopard/.
I think that resolved the user issue.

I was able to run the sample Python job. Now I am going to try to set
up a cluster!

Thanks,
Ryan

Jared Flatow

Oct 5, 2009, 4:33:05 PM
to disc...@googlegroups.com
Glad you got it working!

~/.bashrc and ~/.profile are read at different times: ~/.bashrc is
for non-login shells, and ~/.profile is for login shells. Personally, I
use a symlink from ~/.bashrc -> ~/.profile so they are always the same
for me. Anyway, this is why the test is 'ssh localhost erl' rather
than 'ssh localhost' followed by 'erl': the first runs the command in a
non-login shell (which is how Erlang does it), while the second gives
you a login shell.
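
The login versus non-login distinction can be checked directly: bash's read-only `login_shell` option is set only in login shells. A Python 3 sketch (the helper name `is_login_shell` is hypothetical) compares `bash -c`, which is how sshd runs a command passed on the ssh command line, with `bash -l -c`:

```python
import subprocess


def is_login_shell(bash_flags):
    """Run bash with the given flags and report whether it considers itself a login shell."""
    result = subprocess.run(
        ["bash", *bash_flags, "shopt -q login_shell"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # `shopt -q` exits with status 0 when the option is set.
    return result.returncode == 0


if __name__ == "__main__":
    print("bash -c    login shell:", is_login_shell(["-c"]))
    print("bash -l -c login shell:", is_login_shell(["-l", "-c"]))
```

The first case corresponds to `ssh localhost erl`, so a PATH set only in ~/.bash_profile is never read there. The bash manual also notes that bash reads ~/.bashrc when it detects it was started by sshd, which is consistent with moving the PATH setting into ~/.bashrc fixing the problem.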

jared