Managing memory usage


Roger Hoover

Feb 20, 2014, 2:56:43 PM2/20/14
to camu...@googlegroups.com
For testing, I ran Camus with a single map task and 1,000 messages to process (each around 600 bytes), and I'm getting an OutOfMemoryError.

What's the best way to prevent this?

I can configure more map tasks, but it could still happen.

I see settings like these in the config that I might be able to use, since I know my expected ingestion rate approximately:

kafka.max.pull.minutes.per.task=-1

kafka.max.pull.hrs=1

Any other suggestions?
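For reference, here's the sort of bounded-pull configuration I have in mind (values are illustrative; the first two properties are from my Camus config above, the last two are standard Hadoop MRv1 settings, not Camus-specific):

```properties
# Cap how far back each run will pull from Kafka
kafka.max.pull.hrs=1

# Cap per-task pull time so no single mapper accumulates too much
kafka.max.pull.minutes.per.task=10

# Spread the partitions across more mappers
mapred.map.tasks=10

# Give each task JVM a larger heap (MRv1 child-task option)
mapred.child.java.opts=-Xmx1024m
```

The idea would be to size the per-task pull so that the data each mapper holds stays well under its heap.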

Thanks,

Roger

$ HADOOP_CONF_DIR=/etc/bdf/hadoop/conf hadoop jar /opt/bdf/hadoop/target/hadoop-0.0.1-SNAPSHOT.jar com.linkedin.camus.etl.kafka.CamusJob  -P /etc/bdf/hadoop/camus.properties
14/02/19 22:40:57 INFO kafka.CamusJob: Dir Destination set to: /user/bigdatafoundry/sit/ingestion/data
14/02/19 22:40:58 INFO kafka.CamusJob: removing old execution: 2014-02-03-15-20-07
14/02/19 22:40:58 INFO kafka.CamusJob: Previous execution: hdfs://nameservice1/user/bigdatafoundry/sit/ingestion/metadata/history/2014-02-19-21-50-08
14/02/19 22:40:58 INFO kafka.CamusJob: New execution temp location: /user/bigdatafoundry/sit/ingestion/metadata/2014-02-19-22-40-58
14/02/19 22:40:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/19 22:40:59 INFO mapred.EtlInputFormat: Fetching metadata from broker 13.7.140.149:9092 with client id -stream-ingester for 0 topic(s) []
14/02/19 22:41:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/02/19 22:41:00 INFO compress.CodecPool: Got brand-new compressor [.deflate]
14/02/19 22:41:00 INFO mapred.EtlInputFormat: previous offset file:hdfs://nameservice1/user/bigdatafoundry/sit/ingestion/metadata/history/2014-02-19-21-50-08/offsets-previous
14/02/19 22:41:00 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:0     offset:40     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:1     offset:40     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:2     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:3     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:4     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:5     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:6     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.raw.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:7     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:0     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:1     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:2     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:3     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:4     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:5     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:6     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.EtlInputFormat: bdf.validated.msgs     uri:tcp://13.7.140.149:9092     leader:168427565     partition:7     offset:0     latest_offset:125
14/02/19 22:41:00 INFO mapred.JobClient: Running job: job_201402042151_8376
14/02/19 22:41:01 INFO mapred.JobClient:  map 0% reduce 0%
14/02/19 22:41:14 INFO mapred.JobClient:  map 32% reduce 0%
14/02/19 22:41:17 INFO mapred.JobClient:  map 39% reduce 0%
14/02/19 22:41:23 INFO mapred.JobClient:  map 47% reduce 0%
14/02/19 22:41:25 INFO mapred.JobClient:  map 100% reduce 0%
14/02/19 22:41:25 INFO mapred.JobClient: Job complete: job_201402042151_8376
14/02/19 22:41:26 INFO mapred.JobClient: Counters: 27
14/02/19 22:41:26 INFO mapred.JobClient:   File System Counters
14/02/19 22:41:26 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/02/19 22:41:26 INFO mapred.JobClient:     FILE: Number of bytes written=195637
14/02/19 22:41:26 INFO mapred.JobClient:     FILE: Number of read operations=0
14/02/19 22:41:26 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/02/19 22:41:26 INFO mapred.JobClient:     FILE: Number of write operations=0
14/02/19 22:41:26 INFO mapred.JobClient:     HDFS: Number of bytes read=1217
14/02/19 22:41:26 INFO mapred.JobClient:     HDFS: Number of bytes written=707799
14/02/19 22:41:26 INFO mapred.JobClient:     HDFS: Number of read operations=1
14/02/19 22:41:26 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/02/19 22:41:26 INFO mapred.JobClient:     HDFS: Number of write operations=9
14/02/19 22:41:26 INFO mapred.JobClient:   Job Counters
14/02/19 22:41:26 INFO mapred.JobClient:     Launched map tasks=1
14/02/19 22:41:26 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=22230
14/02/19 22:41:26 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/02/19 22:41:26 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/19 22:41:26 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/19 22:41:26 INFO mapred.JobClient:   Map-Reduce Framework
14/02/19 22:41:26 INFO mapred.JobClient:     Map input records=912
14/02/19 22:41:26 INFO mapred.JobClient:     Map output records=924
14/02/19 22:41:26 INFO mapred.JobClient:     Input split bytes=1217
14/02/19 22:41:26 INFO mapred.JobClient:     Spilled Records=0
14/02/19 22:41:26 INFO mapred.JobClient:     CPU time spent (ms)=4890
14/02/19 22:41:26 INFO mapred.JobClient:     Physical memory (bytes) snapshot=386433024
14/02/19 22:41:26 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1631154176
14/02/19 22:41:26 INFO mapred.JobClient:     Total committed heap usage (bytes)=991821824
14/02/19 22:41:26 INFO mapred.JobClient:   total
14/02/19 22:41:26 INFO mapred.JobClient:     data-read=1096446
14/02/19 22:41:26 INFO mapred.JobClient:     decode-time(ms)=137
14/02/19 22:41:26 INFO mapred.JobClient:     event-count=1767
14/02/19 22:41:26 INFO mapred.JobClient:     request-time(ms)=18248
14/02/19 22:41:26 INFO kafka.CamusJob: Group: File System Counters
14/02/19 22:41:26 INFO kafka.CamusJob: FILE: Number of bytes read:     0
14/02/19 22:41:26 INFO kafka.CamusJob: FILE: Number of bytes written:     195637
14/02/19 22:41:26 INFO kafka.CamusJob: FILE: Number of read operations:     0
14/02/19 22:41:26 INFO kafka.CamusJob: FILE: Number of large read operations:     0
14/02/19 22:41:26 INFO kafka.CamusJob: FILE: Number of write operations:     0
14/02/19 22:41:26 INFO kafka.CamusJob: HDFS: Number of bytes read:     1217
14/02/19 22:41:26 INFO kafka.CamusJob: HDFS: Number of bytes written:     707799
14/02/19 22:41:26 INFO kafka.CamusJob: HDFS: Number of read operations:     1
14/02/19 22:41:26 INFO kafka.CamusJob: HDFS: Number of large read operations:     0
14/02/19 22:41:26 INFO kafka.CamusJob: HDFS: Number of write operations:     9
14/02/19 22:41:26 INFO kafka.CamusJob: Group: Job Counters
14/02/19 22:41:26 INFO kafka.CamusJob: Launched map tasks:     1
14/02/19 22:41:26 INFO kafka.CamusJob: Total time spent by all maps in occupied slots (ms):     22230
14/02/19 22:41:26 INFO kafka.CamusJob: Total time spent by all reduces in occupied slots (ms):     0
14/02/19 22:41:26 INFO kafka.CamusJob: Total time spent by all maps waiting after reserving slots (ms):     0
14/02/19 22:41:26 INFO kafka.CamusJob: Total time spent by all reduces waiting after reserving slots (ms):     0
14/02/19 22:41:26 INFO kafka.CamusJob: Group: Map-Reduce Framework
14/02/19 22:41:26 INFO kafka.CamusJob: Map input records:     912
14/02/19 22:41:26 INFO kafka.CamusJob: Map output records:     924
14/02/19 22:41:26 INFO kafka.CamusJob: Input split bytes:     1217
14/02/19 22:41:26 INFO kafka.CamusJob: Spilled Records:     0
14/02/19 22:41:26 INFO kafka.CamusJob: CPU time spent (ms):     4890
14/02/19 22:41:26 INFO kafka.CamusJob: Physical memory (bytes) snapshot:     386433024
14/02/19 22:41:26 INFO kafka.CamusJob: Virtual memory (bytes) snapshot:     1631154176
14/02/19 22:41:26 INFO kafka.CamusJob: Total committed heap usage (bytes):     991821824
14/02/19 22:41:26 INFO kafka.CamusJob: Group: total
14/02/19 22:41:26 INFO kafka.CamusJob: data-read:     1096446
14/02/19 22:41:26 INFO kafka.CamusJob: decode-time(ms):     137
14/02/19 22:41:26 INFO kafka.CamusJob: event-count:     1767
14/02/19 22:41:26 INFO kafka.CamusJob: request-time(ms):     18248
topic=bdf.validated.msgs partition=0 leaderId=168427565 server= service= beginOffset=0 offset=1 server= checksum=634667686 time=1392849674100
java.io.IOException: java.lang.IndexOutOfBoundsException
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.getWrappedRecord(EtlRecordReader.java:128)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:255)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:484)
     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.IndexOutOfBoundsException
     at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:163)
     at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:262)
     at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:344)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:337)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:74)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:36)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.getWrappedRecord(EtlRecordReader.java:125)
     ... 12 more
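Incidentally, the bottom frame of that trace is ByteArrayInputStream.read, which throws IndexOutOfBoundsException when asked to read more bytes than the target buffer can hold — the symptom you'd expect if the decoder reads a garbage length prefix (for example, because the payload starts at the wrong offset). A minimal sketch of just that frame (class and values are mine for illustration, not from Camus):

```java
import java.io.ByteArrayInputStream;

public class BogusLengthDemo {
    // Returns true if the out-of-range read threw IndexOutOfBoundsException.
    static boolean readWithBogusLength() {
        byte[] payload = new byte[10];        // a short message payload
        ByteArrayInputStream in = new ByteArrayInputStream(payload);
        byte[] target = new byte[64];         // buffer sized for the *expected* string
        int bogusLength = 1000;               // a garbage decoded length prefix
        try {
            // Same call shape as DirectBinaryDecoder.doReadBytes ->
            // ByteArrayInputStream.read(b, off, len), which rejects
            // len > b.length - off with IndexOutOfBoundsException.
            in.read(target, 0, bogusLength);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("threw IOOBE: " + readWithBogusLength()); // prints: threw IOOBE: true
    }
}
```

So before tuning memory, it may be worth checking that the decoder is handed the payload at the right starting offset.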

[The identical IndexOutOfBoundsException stack trace repeats for each of the remaining messages, beginOffset=1 through beginOffset=9.]
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:36)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.getWrappedRecord(EtlRecordReader.java:125)
     ... 12 more

topic=bdf.validated.msgs partition=1 leaderId=168427565 server= service= beginOffset=77 offset=78 server= checksum=2967038340 time=1392849675153
java.lang.Exception: Java heap space
     at org.apache.avro.util.Utf8.setByteLength(Utf8.java:77)
     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:260)
     at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:344)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:337)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:74)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:36)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.getWrappedRecord(EtlRecordReader.java:125)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:255)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:484)
     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.OutOfMemoryError: Java heap space
     ... 24 more

topic=bdf.validated.msgs partition=2 leaderId=168427565 server= service= beginOffset=32 offset=33 server= checksum=2297811482 time=1392849675328
java.lang.Exception: Java heap space
     at org.apache.avro.util.Utf8.setByteLength(Utf8.java:77)
     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:260)
     at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:344)
     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:337)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:74)
     at com.parc.bdf.ingestion.hadoop.CamusMessageDecoder.decode(CamusMessageDecoder.java:36)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.getWrappedRecord(EtlRecordReader.java:125)
     at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:255)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:484)
     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.OutOfMemoryError: Java heap space
     ... 24 more

14/02/19 22:41:26 INFO kafka.CamusJob: Job finished
14/02/19 22:41:26 INFO kafka.CamusJob: ***********Timing Report*************
Job time (seconds):
       pre setup    2.0 (7%)
      get splits    1.0 (3%)
      hadoop job   25.0 (86%)
          commit    0.0 (0%)
Total: 0 minutes 29 seconds

Hadoop job task times (seconds):
             min   19.0
            mean   19.0
             max   19.0
            skew   19.0/19.0 = 1.00

Task wait time (seconds):
             min    4.7
            mean    4.7
             max    4.7

Hadoop task breakdown:
           kafka 96%
          decode 1%
      map output 0%
           other 3%

  Total MB read: 1 


Ken

Feb 20, 2014, 5:57:16 PM
to Roger Hoover, camu...@googlegroups.com
The following error usually means an Avro schema mismatch.

org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.IndexOutOfBoundsException
     at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:163)
     at 

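To illustrate why a schema mismatch surfaces this way, here is a minimal sketch in Python (not the actual Avro Java library) of how Avro's binary decoder reads a string. The `read_zigzag_varint` and `read_string` helpers are simplified stand-ins for the library's internals: when the writer's and reader's schemas disagree, bytes written as one type (say, an int) get misread as a string's length prefix, and the decoder then tries to consume that many bytes.

```python
import io

def read_zigzag_varint(buf):
    """Decode one zig-zag varint, the way Avro encodes string lengths."""
    shift, acc = 0, 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("unexpected end of data")
        acc |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)

def read_string(buf):
    """Read an Avro-style length-prefixed UTF-8 string."""
    n = read_zigzag_varint(buf)
    data = buf.read(n)
    if len(data) != n:
        # Analogue of Avro's IndexOutOfBoundsException: the decoded
        # length claims more bytes than the buffer actually holds.
        raise IndexError("needed %d bytes, got %d" % (n, len(data)))
    return data.decode("utf-8")

# Bytes written as the Avro int 300 (zig-zag varint 0xD8 0x04), but
# decoded with a schema that expects a string at that position: the
# int is misread as a 300-byte string length and the read fails.
try:
    read_string(io.BytesIO(bytes([0xD8, 0x04])))
except IndexError as e:
    print("decode failed:", e)
```

The same mismatch can instead surface as the `OutOfMemoryError` in `Utf8.setByteLength` seen earlier in the thread: if the misread length is large, the decoder attempts to allocate a buffer of that size before reading, and the allocation itself blows the heap.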

-Ken

Roger Hoover

Feb 21, 2014, 5:08:19 PM
to Ken, camu...@googlegroups.com
Gaurav + Ken,

It was an Avro schema mismatch.  Once I fixed the issue, everything is working great.  Thanks!

Roger


