I've been fighting with a job for the last day or two and I cannot get it to finish. I'm using a remote HBase for input and output and I'm reading all of the data in one table and transforming the data for output to a different table in a different format. I should only get one row per key so I'm just using an identity mapper and then doing the data format gymnastics in the reducer. The source table has about 38M rows.
Every single one of the errors shows a broken pipe in write_string in typedbytes.py, but I'm assuming that is because the java process that called it has been killed due to the memory error. I've pasted the Last 8KB output of one of the failures below.
I'd appreciate any insight or guidance you can give. This may have nothing to do with Dumbo but since all of the errors are in typedbytes, I thought I'd ask.
stdout logs
stderr logs
INFO: inputting typed bytes
INFO: outputting typed bytes
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201303121752_0003/attempt_201303121752_0003_r_001579_0/work/translate_profiles.py", line 442, in <module>
dumbo.run(dumbo.identitymapper, Reducer, opts=opts)
File "dumbo/core.py", line 359, in run
typedbytes.PairedOutput(sys.stdout).writes(outputs)
File "typedbytes.py", line 397, in writes
self._writes(flatten(iterable))
File "typedbytes.py", line 257, in _writes
w(obj)
File "typedbytes.py", line 250, in _write
writefunc(self, obj)
File "typedbytes.py", line 321, in write_map
self._writes(flatten(map.iteritems()))
File "typedbytes.py", line 257, in _writes
w(obj)
File "typedbytes.py", line 250, in _write
writefunc(self, obj)
File "typedbytes.py", line 321, in write_map
self._writes(flatten(map.iteritems()))
File "typedbytes.py", line 257, in _writes
w(obj)
File "typedbytes.py", line 250, in _write
writefunc(self, obj)
File "typedbytes.py", line 299, in write_string
self.file.write(string)
IOError: [Errno 32] Broken pipe
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
syslog logs
2013-03-13 00:20:25,843 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,042 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,063 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,070 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,079 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,106 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,113 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,122 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,128 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,135 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,142 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,152 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,159 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,166 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 2 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,172 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,178 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201303121752_0003_r_001579_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-03-13 00:20:26,858 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): GetMapEventsThread exiting
2013-03-13 00:20:26,858 INFO org.apache.hadoop.mapred.ReduceTask (main): getMapsEventsThread joined.
2013-03-13 00:20:26,859 INFO org.apache.hadoop.mapred.ReduceTask (main): Closed ram manager
2013-03-13 00:20:26,859 INFO org.apache.hadoop.mapred.ReduceTask (main): Interleaved on-disk merge complete: 0 files left.
2013-03-13 00:20:26,859 INFO org.apache.hadoop.mapred.ReduceTask (main): In-memory merge complete: 56 files left.
2013-03-13 00:20:26,869 INFO org.apache.hadoop.mapred.Merger (main): Merging 56 sorted segments
2013-03-13 00:20:26,870 INFO org.apache.hadoop.mapred.Merger (main): Down to the last merge-pass, with 56 segments left of total size: 657025 bytes
2013-03-13 00:20:26,872 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new compressor
2013-03-13 00:20:26,905 INFO org.apache.hadoop.mapred.ReduceTask (main): Merged 56 segments, 657025 bytes to disk to satisfy reduce memory limit
2013-03-13 00:20:26,905 INFO org.apache.hadoop.mapred.ReduceTask (main): Merging 1 files, 290447 bytes from disk
2013-03-13 00:20:26,906 INFO org.apache.hadoop.mapred.ReduceTask (main): Merging 0 segments, 0 bytes from memory into reduce
2013-03-13 00:20:26,906 INFO org.apache.hadoop.mapred.Merger (main): Merging 1 sorted segments
2013-03-13 00:20:26,909 INFO org.apache.hadoop.mapred.Merger (main): Down to the last merge-pass, with 1 segments left of total size: 290443 bytes
2013-03-13 00:20:26,932 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed exec [/usr/bin/python, -m, translate_profiles, red, 0, 262144000]
2013-03-13 00:20:26,997 INFO com.strcst.hadoop.typedbytes.HBaseOutputFormat (main): Setting 'hbase.zookeeper.quorum' to '[redacted]'
2013-03-13 00:20:26,997 INFO com.strcst.hadoop.typedbytes.HBaseOutputFormat (main): Setting 'hbase.zookeeper.property.clientPort' to '2181'
2013-03-13 00:20:27,330 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2013-03-13 00:20:27,330 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2013-03-13 00:20:27,332 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2013-03-13 00:20:27,344 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-40): Records R/W=158/1
2013-03-13 00:20:27,961 WARN org.apache.hadoop.streaming.PipeMapRed (Thread-40): java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:212)
at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:418)
2013-03-13 00:20:27,989 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-38): MRErrorThread done
2013-03-13 00:20:27,989 WARN org.apache.hadoop.streaming.PipeMapRed (main): java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-03-13 00:20:27,989 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed failed!
2013-03-13 00:20:27,992 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-03-13 00:20:28,012 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-03-13 00:20:28,012 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-03-13 00:20:28,014 WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-03-13 00:20:28,017 INFO org.apache.hadoop.mapred.Task (main): Runnning cleanup for the task