Can't get it working - HELP URGENT


s.p.m...@gmail.com

May 24, 2016, 8:25:54 PM
to Disco-development
Hi, I installed Disco on 4 machines (1 master and 3 worker nodes).

The control panel web page shows all four machines with a black bar on top (i.e. working and starting just fine).
However, when I submit the count_word.py job (partitioned=20), all three workers fail with the following log:

(This one is from the slave-03 machine.)
WARNING: [reduce:16] Traceback (most recent call last):
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/__init__.py", line 345, in main
    job.worker.start(task, job, **jobargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/__init__.py", line 308, in start
    self.run(task, job, **jobargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 329, in run
    getattr(self, task.stage)(task, params)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 387, in reduce
    ordered = self.reduce_input(task, params)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 381, in reduce_input
    return self.sort(SerialInput(shuffled(inputlist(inputs, label=label)),
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 283, in inputlist
    for input in inputs) if inp]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 42, in chainify
    return list(chain(*iterable))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 283, in <genexpr>
    for input in inputs) if inp]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 273, in inputexpand
    return zip(*(parse_dir(i, label=label) for i in iterify(input)))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 273, in <genexpr>
    return zip(*(parse_dir(i, label=label) for i in iterify(input)))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 243, in parse_dir
    return [url for lab, url, size in sorted(read_index(dir)) if label in (None, lab)]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 258, in read_index
    file = open_url(proxy_url(dir, to_master=False))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 104, in open_url
    return open_remote(url, *args, **kwargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 110, in open_remote
    return Connection(urlresolve(url), token)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 146, in __init__
    self.read(1)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 174, in read
    bytes = self._read_chunk(size if size > 0 else CHUNK_SIZE)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 194, in _read_chunk
    headers=headers)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 80, in request
    raise CommError(response.read(), url, status)
CommError: Unable to access resource (http://localhost:8989/disco/localhost/e2/Job@5b8:1ff9f:c1c20/.disco/map_shuffle-1-1464130977359256.results): Not found. (404)



I think this happens because "omen-disco-slave-03" should be accessing omen-disco-master:8989/disco.... but addresses it as "localhost" instead.
I have modified settings.py on all worker nodes and set DISCO_MASTER_HOST=omen-disco-master. When I run disco -v on my worker nodes, they all show that they recognize omen-disco-master as their master!

Any idea what the problem is? I've checked the SSH communication, and everything seems to work just fine.
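
One way to test the hypothesis above is to take the failing URL from the CommError and swap only its network host for the master's hostname, then request the result from a worker. The snippet below is a minimal sketch of that rewrite in Python 3 (the cluster in this thread runs Python 2.7, so run it separately); `with_host` is a hypothetical helper, not part of Disco. It deliberately leaves the `localhost` segment inside the path untouched, since the thread does not establish what that segment refers to.

```python
from urllib.parse import urlsplit, urlunsplit


def with_host(url, host):
    """Return url with its network host replaced by host, keeping any port.

    Hypothetical helper for debugging; any user:password part is dropped.
    """
    parts = urlsplit(url)
    netloc = host if parts.port is None else "{}:{}".format(host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# URL copied from the CommError in the log above.
failing = ("http://localhost:8989/disco/localhost/e2/Job@5b8:1ff9f:c1c20/"
           ".disco/map_shuffle-1-1464130977359256.results")

# Only the host before ":8989" changes; the path is preserved as-is.
print(with_host(failing, "omen-disco-master"))
```

If the rewritten URL is reachable from a worker while the original is not, that would support the idea that the address handed to the reduce task is the problem rather than the file being missing.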


Laurent Roger

May 25, 2016, 1:51:16 AM
to Disco-development
Can you check on the master, using curl on the given URL, that the job has written the expected file?
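
For anyone who prefers to script this check rather than use curl, here is a rough Python 3 equivalent using only the standard library. `http_status` is a hypothetical helper name. It returns the HTTP status code, so a 404 like the one in the log shows up as a return value instead of an exception; connection failures (host unreachable, connection refused) still raise `urllib.error.URLError`.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request to url (e.g. 200 or 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 arrive as HTTPError; report their code.
        return err.code


# Example call, to be run on the master and again on a worker:
# http_status("http://localhost:8989/disco/localhost/e2/Job@5b8:1ff9f:c1c20/"
#             ".disco/map_shuffle-1-1464130977359256.results")
```

Running the same call on the master and on a worker, and comparing the two status codes, shows whether the URL only resolves correctly on the machine that holds the file.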

s.p.m...@gmail.com

May 25, 2016, 1:02:04 PM
to Disco-development
When I check it with curl on the master, it works as expected. The problem is that the worker nodes do not address the master node by its hostname; they use localhost, which points to themselves. I have configured settings.py on the workers with the master node's hostname, but that didn't solve the problem.
Any ideas?
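
To confirm the claim that a name points back at the machine itself, the resolution can be checked directly on each node. This is a small Python 3 sketch using only the standard library; `resolves_to_loopback` is a hypothetical helper, not a Disco function.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if every address hostname resolves to is a loopback address."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry's sockaddr starts with the IP address as a string.
    addresses = {info[4][0] for info in infos}
    # Strip any IPv6 zone suffix ("%eth0") before parsing the address.
    return all(ipaddress.ip_address(addr.split("%")[0]).is_loopback
               for addr in addresses)


# Example calls, to be run on a worker node:
# resolves_to_loopback("localhost")
# resolves_to_loopback("omen-disco-master")
```

On a worker, the master's hostname would be expected not to resolve to loopback. If a node's own hostname does resolve to loopback (for example through an /etc/hosts entry), that may be worth ruling out as well.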

Laurent Roger

May 25, 2016, 4:45:59 PM
to Disco-development, s.p.m...@gmail.com