Hi,
I suppose the problem is here:

if __name__ == '__main__':
    job = WordCount().run(input=["http://discoproject.org/media/text/chekhov.txt"])
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)

You should import the module itself, as in https://github.com/discoproject/disco/blob/master/examples/util/simple_innerjoin.py

Something like:

if __name__ == '__main__':
    from this_module_name import WordCount
    job = WordCount().run(input=["http://discoproject.org/media/text/chekhov.txt"])
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)

Regards,
Alex
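
A note on why the import probably helps. The traceback below fails inside dPickle.loads with "'module' object has no attribute 'WordCount'", which matches how the standard pickle module treats classes: it stores only the module name and the class name, and looks the class up again when loading. I have not checked how Disco's dPickle is implemented, so take this as a sketch of the standard-library behaviour only. The module name wordcount_demo and the helper functions are made up for the illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script's module; in the failing run the
# class lived in __main__ instead.
MODULE_NAME = "wordcount_demo"


class WordCount(object):
    """Stand-in for the job class; only its module and name matter here."""


def register(module_name, cls):
    """Expose cls as an attribute of a fresh module named module_name."""
    module = types.ModuleType(module_name)
    cls.__module__ = module_name
    cls.__qualname__ = cls.__name__  # keep the pickled name flat
    setattr(module, cls.__name__, cls)
    sys.modules[module_name] = module
    return module


def try_load(payload):
    """Unpickle payload and return (object, error message or None)."""
    try:
        return pickle.loads(payload), None
    except AttributeError as exc:
        return None, str(exc)


module = register(MODULE_NAME, WordCount)

# The payload holds the module name and the class name, not the class body.
payload = pickle.dumps(WordCount, protocol=2)

# Submitting side: the module has the class, so loading works.
loaded, error_here = try_load(payload)

# Simulated worker: a module with the same name but without the class.
delattr(module, "WordCount")
loaded_on_worker, error_on_worker = try_load(payload)

print(loaded is WordCount, error_here)
print(loaded_on_worker, error_on_worker)
```

If the same thing happens in Disco, a class defined in the script being run is recorded under __main__, and on the worker __main__ is a different program. Importing the class by its real module name gives the worker a name it can resolve.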
On Tuesday, May 14, 2013 5:26:45 AM UTC+3, j.barrett...@gmail.com wrote:

To all,
I just started with Disco today. I haven't found a coherent example that shows reading from and writing to DDFS.
I'm trying to put the pieces together from the examples. When I run the example found here: http://discoproject.org/doc/disco/howto/discodb.html

from disco.core import Job, result_iterator
from disco.util import kvgroup
from disco.worker.classic.func import discodb_stream

class WordCount(Job):
    reduce_output_stream = discodb_stream

    @staticmethod
    def map(line, params):
        for word in line.split():
            yield word, 1

    @staticmethod
    def reduce(iter, params):
        for word, counts in kvgroup(sorted(iter)):
            yield word, str(sum(counts))

if __name__ == '__main__':
    job = WordCount().run(input=["http://discoproject.org/media/text/chekhov.txt"])
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)

I get the following error:
Status: [map] 0 waiting, 1 running, 0 done, 0 failed
2013/05/13 22:24:54 master New job initialized!
2013/05/13 22:24:54 master Starting job
2013/05/13 22:24:54 master Starting map phase
2013/05/13 22:24:54 master map:0 assigned to localhost
2013/05/13 22:24:54 master ERROR: Job failed: Worker at 'localhost' died: Traceback (most recent call last):
  File "/srv/disco/data/localhost/22/WordCount@558:79a76:cfaac/usr/lib/python2.7/dist-packages/disco/worker/__init__.py", line 335, in main
    task = cls.get_task()
  File "/srv/disco/data/localhost/22/WordCount@558:79a76:cfaac/usr/lib/python2.7/dist-packages/disco/worker/__init__.py", line 385, in get_task
    return Task(**dict((str(k), v) for k, v in cls.send('TASK').items()))
  File "/srv/disco/data/localhost/22/WordCount@558:79a76:cfaac/usr/lib/python2.7/dist-packages/disco/task.py", line 67, in __init__
    self.jobobjs = dPickle.loads(self.jobpack.jobdata)
AttributeError: 'module' object has no attribute 'WordCount'
2013/05/13 22:24:54 master WARN: Job killed
Status: [map] 1 waiting, 0 running, 0 done, 1 failed
Traceback (most recent call last):
  File "/home/bearrito/Pythontools/pycharm-2.7.2/helpers/pydev/pydevd.py", line 1481, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/bearrito/Pythontools/pycharm-2.7.2/helpers/pydev/pydevd.py", line 1124, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "/home/bearrito/Git/PythonComputation/pythoncomputation/WordCount.py", line 21, in <module>
    for word, count in result_iterator(job.wait(show=True)):
  File "/usr/lib/python2.7/dist-packages/disco/core.py", line 365, in wait
    timeout, poll_interval * 1000)
  File "/usr/lib/python2.7/dist-packages/disco/core.py", line 325, in check_results
    raise JobError(Job(name=jobname, master=self), "Status {0}".format(status))
disco.error.JobError: Job WordCount@558:79a76:cfaac failed: Status dead

What could I be doing incorrectly?