The control panel web page shows all four machines with a black bar on top (i.e., they are up and started without problems).
However, when I submit the count_word.py job with 20 partitions, all three workers fail with the following log:
WARNING: [reduce:16] Traceback (most recent call last):
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/__init__.py", line 345, in main
    job.worker.start(task, job, **jobargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/__init__.py", line 308, in start
    self.run(task, job, **jobargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 329, in run
    getattr(self, task.stage)(task, params)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 387, in reduce
    ordered = self.reduce_input(task, params)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 381, in reduce_input
    return self.sort(SerialInput(shuffled(inputlist(inputs, label=label)),
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 283, in inputlist
    for input in inputs) if inp]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 42, in chainify
    return list(chain(*iterable))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 283, in <genexpr>
    for input in inputs) if inp]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 273, in inputexpand
    return zip(*(parse_dir(i, label=label) for i in iterify(input)))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 273, in <genexpr>
    return zip(*(parse_dir(i, label=label) for i in iterify(input)))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 243, in parse_dir
    return [url for lab, url, size in sorted(read_index(dir)) if label in (None, lab)]
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/util.py", line 258, in read_index
    file = open_url(proxy_url(dir, to_master=False))
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 104, in open_url
    return open_remote(url, *args, **kwargs)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 110, in open_remote
    return Connection(urlresolve(url), token)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 146, in __init__
    self.read(1)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 174, in read
    bytes = self._read_chunk(size if size > 0 else CHUNK_SIZE)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 194, in _read_chunk
    headers=headers)
  File "/usr/var/disco/data/omen-disco-slave-03/e2/Job@5b8:1ff9f:c1c20/root/.local/lib/python2.7/site-packages/disco/comm.py", line 80, in request
    raise CommError(response.read(), url, status)
CommError: Unable to access resource (http://localhost:8989/disco/localhost/e2/Job@5b8:1ff9f:c1c20/.disco/map_shuffle-1-1464130977359256.results): Not found. (404)
I think this happens because "omen-disco-slave-03" is trying to reach omen-disco-master:8989/disco..., but it does so by addressing "localhost" instead of the master's hostname.
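To make this concrete, here is a small sketch that splits the URL from the CommError into its host, port and the node name in the path. The helper describe_url is hypothetical (not part of Disco). Treating the path segment after /disco/ as the name of the node holding the data is an assumption, based on the worker's own data directory (/usr/var/disco/data/omen-disco-slave-03/e2/Job@...) seemingly following the same layout. Both the host and that node segment come out as "localhost":

```python
# Hypothetical helper, not part of Disco: inspect the URL from the CommError
# to see which host the reduce task actually tried to contact.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used on the workers

FAILED_URL = ("http://localhost:8989/disco/localhost/e2/"
              "Job@5b8:1ff9f:c1c20/.disco/map_shuffle-1-1464130977359256.results")


def describe_url(url):
    """Return (host, port, node_segment) for a Disco-style data URL.

    Assumes the path looks like /disco/<node>/<hash>/<job>/...; node_segment
    is None if the path does not start with /disco/<something>.
    """
    parsed = urlparse(url)
    parts = parsed.path.lstrip("/").split("/")
    node = parts[1] if len(parts) > 1 and parts[0] == "disco" else None
    return parsed.hostname, parsed.port, node


print(describe_url(FAILED_URL))  # prints ('localhost', 8989, 'localhost')
```

If the path reading is right, the node that produced the map output was registered under the name "localhost", not just contacted through it.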
I have modified settings.py on all worker nodes to set DISCO_MASTER_HOST=omen-disco-master. When I run disco -v on the worker nodes, they all report omen-disco-master as their master!
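For reference, this is the relevant line in settings.py on each worker. It assumes the usual Python assignment syntax of Disco's settings file. Only the DISCO_MASTER_HOST value comes from my setup, and the rest of the file is omitted:

```python
# settings.py on each worker node (excerpt; all other settings omitted).
# Assumes the settings file uses plain Python assignments.
DISCO_MASTER_HOST = "omen-disco-master"
```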
Any idea what the problem is? I've checked SSH connectivity between the machines, and everything seems to work just fine.