Setting Problem?

quasiben

May 16, 2012, 12:04:21 PM
to Disco-development
Hey Disco Dev,

I can't seem to run the first example count_words.py on my
installation.

System:
Debian (Sid) on an X200 with a Core-2 Duo processor.

I installed system-wide, following the instructions here (using the git
repo):
http://discoproject.org/doc/disco/start/install.html

I added

DISCO_MASTER_HOST = "localhost"

and changed:
DDFS_TAG_MIN_REPLICAS = 2
DDFS_TAG_REPLICAS = 2
DDFS_BLOB_REPLICAS = 2

Without DISCO_MASTER_HOST I was unable to view the status/config page
on port 8989. I started count_words.py and the output repeatedly
cycled through:
2012/05/15 22:50:47 master New job initialized!
2012/05/15 22:50:47 master Starting job
2012/05/15 22:50:47 master Starting map phase

There was nothing in error.log but console.log showed:
2012-05-16 11:24:03.403 [info] <0.137.0>@node_mon:spawn_node:35 Connection timed out to "localhost"

I went through the Troubleshooting page:
1) Disco is running:
root 4398 0.8 1.2 95984 48904 ? Sl 11:25 0:15 /usr/lib/erlang/erts-5.9.1/bin/beam.smp -K true -P 10000000 -- -root /usr/lib/erlang -progname erl -- -home /root -- -lager handlers [{lager_console_backend, info},{lager_file_backend,[{"/usr/local/var/disco/log/error.log", error, 1048576000, "$D0", 5},{"/usr/local/var/disco/log/console.log", debug, 104857600, "$D0", 5}]}] -lager crash_log "/usr/local/var/disco/log/crash.log" -rsh ssh -connect_all false -sname disco_8989_master -pa /usr/local/lib/disco/master/ebin -pa /usr/local/lib/disco/master/deps/mochiweb/ebin -pa /usr/local/lib/disco/master/deps/lager/ebin -eval application:start(disco) -noshell -noinput -heart -kernel

2) None of the nodes on the status page have black bars; mine are
red. This is the check described here:
http://discoproject.org/doc/disco/start/troubleshoot.html#are-there-any-nodes-on-the-status-page

3) ps aux | grep -o disco.*slave@
returns nothing!

"ssh localhost erl" works without a password prompt so I don't think
it's an authentication issue. "disco debug", however, doesn't seem to
work. It returns:
Erlang R15B01 (erts-5.9.1) [source] [smp:2:2] [async-threads:0] [kernel-poll:false]

*** ERROR: Shell process terminated! (^G to start new job) ***

and continues to hang. I read in a previous post to try changing
-sname to -name in disco/cli.py, but this made no difference.

So it seems like my slaves aren't getting created? I'm not sure what
else to look at. I will keep my disco session up for the remainder of
the day: http://129.79.59.26:8989/ . Any help would be greatly
appreciated.

Ryan Nowakowski

May 16, 2012, 5:46:23 PM
to disc...@googlegroups.com
I believe that you have to be able to ssh with no password from the
"disco" account.

wilma@flintstone1:~$ sudo su - disco
disco@flintstone1:~$ ssh localhost true; echo $?
0
disco@flintstone1:~$
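
If it helps, the same check can be scripted. This is only a rough sketch: the function names are mine and not part of Disco, and it assumes the OpenSSH client is installed and on the PATH. BatchMode=yes tells ssh to fail rather than prompt for a password, so a missing key shows up as a non-zero exit status instead of a hang.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes: never ask for a password or passphrase.
    # ConnectTimeout=5: give up after five seconds if the host is unreachable.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_ssh_without_password(host, run=subprocess.run):
    """Return True if `ssh host true` exits with status 0 without any prompt.

    `run` is injectable (hypothetical parameter) so the logic can be
    exercised without a real ssh connection.
    """
    try:
        result = run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ssh is not installed or could not be executed.
        return False
    return result.returncode == 0
```

Run it as the disco user, for example `print(can_ssh_without_password("localhost"))`, and repeat for every hostname the master will try to reach.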

Prashanth Mundkur

May 17, 2012, 2:56:20 AM
to disc...@googlegroups.com
On 09:04 Wed 16 May, quasiben wrote:
> Hey Disco Dev,
>
> I can't seem to run the first example count_words.py on my
> installation.
>
> System:
> Debian (Sid) on an X200 with a Core-2 Duo processor.
>
> I installed system wide -- following the instruction here (using git
> repo):
> http://discoproject.org/doc/disco/start/install.html
>
> I added
>
> DISCO_MASTER_HOST = "localhost"
>
> and changed:
> DDFS_TAG_MIN_REPLICAS = 2
> DDFS_TAG_REPLICAS = 2
> DDFS_BLOB_REPLICAS = 2
>
> Without DISCO_MASTER_HOST I was unable to view the status/config page
> on port 8989.

This is odd. DISCO_MASTER_HOST is not used by the master, but by the
client; it should not affect whether the web ui is accessible on 8989.

> I started count_words.py and the output repeatedly
> cycled through:
> 2012/05/15 22:50:47 master New job initialized!
> 2012/05/15 22:50:47 master Starting job
> 2012/05/15 22:50:47 master Starting map phase
>
> There was nothing in error.log but console.log showed:
> 2012-05-16 11:24:03.403 [info] <0.137.0>@node_mon:spawn_node:35
> Connection timed out to "localhost"
>
> I checked out the Trouble Shooting page:
> 1) Disco is running:
> root 4398 0.8 1.2 95984 48904 ? Sl 11:25 0:15 /usr/
> lib/erlang/erts-5.9.1/bin/beam.smp -K true -P 10000000 -- -root /usr/
> lib/erlang -progname erl -- -home /root -- -lager handlers
> [{lager_console_backend, info},{lager_file_backend,[{"/usr/local/var/
> disco/log/error.log", error, 1048576000, "$D0", 5},{"/usr/local/var/
> disco/log/console.log", debug, 104857600, "$D0", 5}]}] -lager
> crash_log "/usr/local/var/disco/log/crash.log" -rsh ssh -connect_all
> false -sname disco_8989_master -pa /usr/local/lib/disco/master/ebin -
> pa /usr/local/lib/disco/master/deps/mochiweb/ebin -pa /usr/local/lib/
> disco/master/deps/lager/ebin -eval application:start(disco) -noshell -
> noinput -heart -kernel

Please do _not_ run Disco as root.

>
> 2)There don't seem to be any nodes on the status page with black
> bars...mine have red?
> indicated here: http://discoproject.org/doc/disco/start/troubleshoot.html#are-there-any-nodes-on-the-status-page
>
> 3)ps aux | grep -o disco.*slave@
> returns nothing!
>
> "ssh localhost erl" works without a password prompt so I don't think
> it's an authentication issue. "disco debug", however, doesn't seem to
> work. It returns:
> Erlang R15B01 (erts-5.9.1) [source] [smp:2:2] [async-threads:0]
> [kernel-poll:false]
>
> *** ERROR: Shell process terminated! (^G to start new job) ***

You can try specifying the hostname as an argument: 'disco debug
localhost' or 'disco debug `hostname`'

>
> and continues to hang. I read in a previous post to try changing -
> sname to -name in disco/cli.py, but this had no change.
>
> So it seems like my slaves aren't getting created? I'm not sure what
> else to look at. I will keep my disco session up for the remainder of
> the day: http://129.79.59.26:8989/ . Any help would be greatly
> appreciated

Could you please retry running Disco under a non-root account?

--
prashanth

quasiben

May 17, 2012, 12:23:07 PM
to disc...@googlegroups.com
Thanks, Prashanth, for the reply.

I switched to a user named disco and removed DISCO_MASTER_HOST from /etc/disco/settings.py. I'm still having problems, though: I get the same error from disco debug, disco debug localhost, and disco debug `hostname`. count_words.py hangs and doesn't show up on the web page (which does work).

disco@23:~$ disco start
Master 23:8989 started

disco    13156  0.6  1.1  94448 47752 ?        Sl   12:13   0:01 /usr/lib/erlang/erts-5.9.1/bin/beam.smp -K true -P 10000000 -- -root /usr/lib/erlang -progname erl -- -home /home/disco -- -lager handlers [{lager_console_backend, info},{lager_file_backend,[{"/usr/local/var/disco/log/error.log", error, 1048576000, "$D0", 5},{"/usr/local/var/disco/log/console.log", debug, 104857600, "$D0", 5}]}] -lager crash_log "/usr/local/var/disco/log/crash.log" -rsh ssh -connect_all false -sname disco_8989_master -pa /usr/local/lib/disco/master/ebin -pa /usr/local/lib/disco/master/deps/mochiweb/ebin -pa /usr/local/lib/disco/master/deps/lager/ebin -eval application:start(disco) -noshell -noinput -heart -kernel

disco    13203  0.0  0.2  18596  9452 pts/7    S+   12:13   0:00 python examples/util/count_words.py

Not sure if this matters, but I had to change the group permissions on /usr/local/var/disco to disco to allow writing by user disco.

ssh localhost erl as user disco works without a password prompt:
disco@23:~$ ssh localhost erl
Eshell V5.9.1  (abort with ^G)
1> 

Lastly, there is nothing in error.log or crash.log. Below is the current output from console.log:

2012-05-17 12:19:03.868 [debug] <0.48.0> Lager installed handler {lager_file_backend,"/usr/local/var/disco/log/console.log"} into lager_event
2012-05-17 12:19:03.870 [debug] <0.50.0> Lager installed handler error_logger_lager_h into error_logger
2012-05-17 12:19:03.870 [info] <0.6.0> Application lager started on node disco_8989_master@23
2012-05-17 12:19:03.871 [info] <0.51.0>@disco_main:init:45 DISCO BOOTS
2012-05-17 12:19:03.873 [info] <0.51.0>@disco_proxy:start:46 Disco proxy disabled
2012-05-17 12:19:03.875 [info] <0.51.0>@ddfs_master:start_link:56 DDFS master starts
2012-05-17 12:19:03.877 [info] <0.51.0>@event_server:start_link:37 Event server starts
2012-05-17 12:19:03.877 [debug] <0.51.0> Supervisor disco_main started ddfs_master:start_link() at pid <0.52.0>
2012-05-17 12:19:03.878 [debug] <0.51.0> Supervisor disco_main started event_server:start_link() at pid <0.58.0>
2012-05-17 12:19:03.879 [info] <0.51.0>@disco_config:start_link:24 Disco config starts
2012-05-17 12:19:03.880 [debug] <0.51.0> Supervisor disco_main started disco_config:start_link() at pid <0.59.0>
2012-05-17 12:19:03.881 [info] <0.51.0>@disco_server:start_link:48 DISCO SERVER STARTS
2012-05-17 12:19:03.882 [info] <0.60.0>@fair_scheduler:start_link:10 Fair scheduler starts
2012-05-17 12:19:03.882 [info] <0.61.0>@fair_scheduler:init:24 Scheduler uses fair policy
2012-05-17 12:19:03.883 [info] <0.61.0>@fair_scheduler_fair_policy:start_link:18 Fair scheduler: Fair policy
2012-05-17 12:19:03.885 [info] <0.60.0>@disco_server:do_update_config_table:327 Config table updated
2012-05-17 12:19:03.885 [debug] <0.51.0> Supervisor disco_main started disco_server:start_link() at pid <0.60.0>
2012-05-17 12:19:03.886 [info] <0.51.0>@web_server:start:14 web_server starts
2012-05-17 12:19:03.889 [debug] <0.68.0> Supervisor inet_gethost_native_sup started undefined at pid <0.69.0>
2012-05-17 12:19:03.889 [debug] <0.31.0> Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.68.0>
2012-05-17 12:19:03.890 [info] <0.65.0>@node_mon:slave_start:82 Starting node at "localhost"
2012-05-17 12:19:03.895 [info] <0.51.0>@web_server:start:17 mochiweb starts
2012-05-17 12:19:03.896 [debug] <0.51.0> Supervisor disco_main started web_server:start(8989) at pid <0.70.0>
2012-05-17 12:19:03.896 [info] <0.6.0> Application disco started on node disco_8989_master@23
2012-05-17 12:19:35.897 [info] <0.65.0>@node_mon:spawn_node:35 Connection timed out to "localhost"
2012-05-17 12:19:50.899 [warning] <0.60.0>@disco_server:nodemon_exit:239 Restarting monitor for "localhost"
2012-05-17 12:19:50.901 [info] <0.96.0>@node_mon:slave_start:82 Starting node at "localhost"
2012-05-17 12:20:22.907 [info] <0.96.0>@node_mon:spawn_node:35 Connection timed out to "localhost"
2012-05-17 12:20:32.387 [info] <0.60.0>@disco_server:do_update_config_table:327 Config table updated
2012-05-17 12:20:36.129 [info] <0.60.0>@disco_server:do_update_config_table:327 Config table updated
2012-05-17 12:20:37.909 [warning] <0.60.0>@disco_server:nodemon_exit:239 Restarting monitor for "localhost"
2012-05-17 12:20:37.910 [info] <0.138.0>@node_mon:slave_start:82 Starting node at "localhost"
2012-05-17 12:20:38.198 [info] <0.60.0>@disco_server:do_update_config_table:327 Config table updated
2012-05-17 12:21:09.916 [info] <0.138.0>@node_mon:spawn_node:35 Connection timed out to "localhost"
2012-05-17 12:21:24.917 [warning] <0.60.0>@disco_server:nodemon_exit:239 Restarting monitor for "localhost"
2012-05-17 12:21:24.918 [info] <0.160.0>@node_mon:slave_start:82 Starting node at "localhost"
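
In case it is useful to anyone hitting the same loop, here is a rough sketch for counting these timeouts per node. The regular expression is inferred only from the log lines above, so treat it as my assumption about the format rather than anything documented by Disco; the function name is made up as well.

```python
import re
from collections import Counter

# Matches lines such as:
# 2012-05-17 12:19:35.897 [info] <0.65.0>@node_mon:spawn_node:35 Connection timed out to "localhost"
TIMEOUT_RE = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \[\w+\] '
    r'\S+@node_mon:spawn_node:\d+ Connection timed out to "(?P<host>[^"]+)"'
)


def count_node_timeouts(lines):
    """Count 'Connection timed out' messages from node_mon, keyed by node name."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts
```

To run it against the log path from the process listing above: `with open("/usr/local/var/disco/log/console.log") as f: print(count_node_timeouts(f))`.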



--Ben

Benjamin Zaitlen

May 17, 2012, 12:39:32 PM
to disc...@googlegroups.com
One more thing I just noticed. When I exit the Python script (count_words.py) with Ctrl-C, I get:

disco.error.CommError: Unable to access resource (http://23:8989/disco/job/new): Failed to connect to 0.0.0.23: Invalid argument (is disco master running at http://23:8989?)

When I add DISCO_MASTER_HOST = "localhost" back to /etc/disco/settings.py, the script starts and appears on the Disco web page (http://129.79.59.26:8989/). However, I'm back to the repeating sequence of:

disco@23:~/DISCO_HOME$ python examples/util/count_words.py
2012/05/17 12:35:47  master     New job initialized!
2012/05/17 12:35:47  master     Starting job
2012/05/17 12:35:47  master     Starting map phase
....

There is nothing in error.log or crash.log, and console.log still shows:
2012-05-17 12:37:26.938 [info] <0.231.0>@node_mon:spawn_node:35 Connection timed out to "localhost"
2012-05-17 12:37:41.939 [warning] <0.60.0>@disco_server:nodemon_exit:239 Restarting monitor for "localhost"
2012-05-17 12:37:41.940 [info] <0.292.0>@node_mon:slave_start:82 Starting node at "localhost"


--
You received this message because you are subscribed to the Google Groups "Disco-development" group.
To view this discussion on the web visit https://groups.google.com/d/msg/disco-dev/-/g9ORs-tNzQ4J.

Prashanth Mundkur

May 17, 2012, 11:22:14 PM
to disc...@googlegroups.com
Does 'ssh 23 erl' work? I'm guessing you'll need to change your
hostname to one with an alphabetic prefix. From your next mail, too, it
looks like '23' is not being resolved as a hostname but is being
treated as the IP address 0.0.0.23. It appears that 23 is a legal
hostname, but all-numeric names seem to trigger bugs.
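
To illustrate the suspected behaviour, here is a small sketch using Python's standard socket module. The helper names are hypothetical, and the result depends on the platform's C library: on Linux, inet_aton() accepts a single number and stores it directly as a 32-bit address, which matches the 0.0.0.23 in the error message.

```python
import socket


def is_all_numeric(hostname):
    """True if the hostname consists only of ASCII digits, like '23'."""
    return hostname.isascii() and hostname.isdigit()


def as_ipv4_literal(hostname):
    """Return the dotted quad that inet_aton() parses `hostname` as, or None."""
    try:
        return socket.inet_ntoa(socket.inet_aton(hostname))
    except OSError:
        # Not parseable as an IPv4 literal, so it would go to name resolution.
        return None


for name in ("23", "node23"):
    print(name, is_all_numeric(name), as_ipv4_literal(name))
# On Linux this prints:
# 23 True 0.0.0.23
# node23 False None
```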

--
prashanth

quasiben

May 18, 2012, 10:22:44 AM
to disc...@googlegroups.com
Yup! That was exactly the problem. After switching the hostname to alphabetic characters only and modifying /etc/hosts, everything worked smoothly.
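
For anyone who wants to sanity-check the same fix, here is a minimal sketch of how the new name could be verified after editing /etc/hosts. The function name and the injectable `resolve` parameter are my own, and "discohost" in the usage note is only a placeholder, not the name I actually used.

```python
import socket


def check_resolution(name, resolve=socket.gethostbyname):
    """Return (ok, detail) describing how `name` resolves.

    `resolve` defaults to the system resolver, which consults /etc/hosts
    on a typical Linux setup.
    """
    try:
        addr = resolve(name)
    except OSError as exc:
        # socket.gaierror is a subclass of OSError.
        return False, "does not resolve: %s" % exc
    if addr.startswith("0."):
        # An address like 0.0.0.23 suggests the name was parsed as a number.
        return False, "resolves to %s, which looks like a numeric name" % addr
    return True, addr
```

Calling `check_resolution(socket.gethostname())` on the master, or `check_resolution("discohost")` with your own node name, should return `(True, <address>)` once the hostname and /etc/hosts agree.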

Thanks again!