How to debug mrjob when running in Hadoop cluster?


Craig Rodrigues

Jan 20, 2013, 7:06:39 PM1/20/13
to mr...@googlegroups.com
Hi,

I am new to mrjob and am taking the class "Machine Learning on BigData w. Map Reduce" ( http://www.meetup.com/HandsOnProgrammingEvents/events/96046502/ ), taught by Mike Bowles.

I am using my own three-node Hadoop cluster, running Ubuntu 12.04.1 and Cloudera 4.1.2.

I installed mrjob and have gotten some basics working, but I have questions about how to debug problems with MapReduce jobs written with mrjob.

From these slides, http://machinelearningbigdata.pbworks.com/w/file/50030744/Machine%20Learning%20on%20Big%20Data%20-%20ClassIntro.pdf ,
I took the example, which I have attached as mrjob_test1.py.

In my Hadoop cluster, I ran:

python mrjob_test1.py -r hadoop < good_data.txt

and it worked fine.


When I ran this:

python mrjob_test1.py -r hadoop < bad_data.txt

I got a Python exception:

        STDOUT: packageJobJar: [/tmp/hadoop-hadoop1/hadoop-unjar5518420872309545823/] [] /tmp/streamjob4286063445422626248.jar tmpDir=null
        Job failed with return code 1: ['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar', '-files', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/mrjob_test1.py#mrjob_test1.py', '-archives', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/mrjob.tar.gz#mrjob.tar.gz', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/STDIN', '-output', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/output', '-mapper', 'python mrjob_test1.py --step-num=0 --mapper', '-reducer', 'python mrjob_test1.py --step-num=0 --reducer']
        Scanning logs for probable cause of failure
        Traceback (most recent call last):
          File "mrjob_test1.py", line 28, in <module>
            mrMeanVar.run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 483, in run
            mr_job.execute()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 501, in execute
            super(MRJob, self).execute()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
            self.run_job()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
            runner.run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 448, in run
            self._run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 232, in _run
            self._run_job_in_hadoop()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 334, in _run_job_in_hadoop
            raise Exception(msg)
        Exception: Job failed with return code 1: ['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar', '-files', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/mrjob_test1.py#mrjob_test1.py', '-archives', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/mrjob.tar.gz#mrjob.tar.gz', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/files/STDIN', '-output', 'hdfs:///user/hadoop1/tmp/mrjob/mrjob_test1.hadoop1.20130120.235407.486390/output', '-mapper', 'python mrjob_test1.py --step-num=0 --mapper', '-reducer', 'python mrjob_test1.py --step-num=0 --reducer']
       

I couldn't figure out the source of the problem from this exception. By looking in the MapReduce logs on my Hadoop server, I found this:

        2013-01-20 23:54:58,853 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201301191948_0008_m_000000_3: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
                at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
                at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
                at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
                at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
                at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
                at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
                at org.apache.hadoop.mapred.Child.main(Child.java:262)

       
I still couldn't figure out the source of the problem.
However, when I ran the job without Hadoop (with the default inline runner), I got this error:

        Traceback (most recent call last):
          File "mrjob_test1.py", line 28, in <module>
            mrMeanVar.run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 483, in run
            mr_job.execute()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 501, in execute
            super(MRJob, self).execute()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
            self.run_job()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
            runner.run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 448, in run
            self._run()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/inline.py", line 161, in _run
            'mapper')
          File "/usr/local/lib/python2.7/dist-packages/mrjob/inline.py", line 216, in _invoke_inline_mrjob
            child_instance.execute()
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 492, in execute
            self.run_mapper(self.options.step_num)
          File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 557, in run_mapper
            for out_key, out_value in mapper(key, value) or ():
          File "mrjob_test1.py", line 9, in mapper
            num = json.loads(line)
          File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
            return _default_decoder.decode(s)
          File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
            obj, end = self.raw_decode(s, idx=_w(s, 0).end())
          File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
            raise ValueError("No JSON object could be decoded")
        ValueError: No JSON object could be decoded
       

At this point, I could see that the input data was bad.
Is there a way to get mrjob to display this
exception information when running in "Hadoop mode"?
Without this stack trace, problems are very difficult to track down.
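(For anyone hitting the same thing: one workaround is to surface the offending line yourself. This is a sketch, assuming the mapper matches the class example's `num = json.loads(line)`; Hadoop Streaming captures each task attempt's stderr in the task logs, so writing the bad line there before re-raising puts it next to the traceback:)

```python
import json
import sys

def mapper(key, line):
    # Mirrors the class example's json.loads(line); the try/except is the
    # addition. On bad input, write the exact line to stderr (which Hadoop
    # keeps in the per-attempt task logs) and then re-raise so the task
    # still fails visibly.
    try:
        num = json.loads(line)
    except ValueError:
        sys.stderr.write("bad input line: %r\n" % (line,))
        raise
    yield None, num
```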

Thanks.       

          
--
Craig Rodrigues
rod...@crodrigues.org
mrjob_test1.py.txt
good_data.txt
bad_data.txt

Thomas Arnfeld

Jan 21, 2013, 6:30:50 AM1/21/13
to mr...@googlegroups.com, rod...@crodrigues.org
Hey,

The `--strict-protocols` command-line option may help you here; it causes I/O protocol exceptions to bubble up, which may give a bit more useful information.

You can override this method (calling super as well) and add the strict-protocols option to the job arguments so that it is passed through to Hadoop: https://github.com/Yelp/mrjob/blob/2c5645e7b09f24ac1dbfa874add8b25ef2692cf1/mrjob/launch.py#L587

On another note, by the looks of it neither of your input data samples contains JSON data; it might be worth setting your input protocol to RawValueProtocol (if you don't have any keys) or RawProtocol if you do.
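(A quick way to test that theory before submitting anything to Hadoop is to run the same decode over the input file locally. This is plain Python mirroring the `json.loads(line)` call from the class example, not anything mrjob-specific, and the helper name is just for illustration:)

```python
import json

def find_bad_lines(lines):
    """Return (line_number, line) pairs that the example mapper's
    json.loads(line) call would choke on."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except ValueError:
            bad.append((lineno, line.rstrip("\n")))
    return bad

# Usage against a file: find_bad_lines(open("bad_data.txt"))
```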

Hope this helps!

Tom.