Runtime error [Error for EtlKey] while running Camus


Sugandha Goyal

Aug 26, 2015, 2:56:30 PM
to Camus - Kafka ETL for Hadoop
Hi all,

I am getting the error below while running the Camus job:

hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties




15/08/26 14:37:18 ERROR kafka.CamusJob: Error for EtlKey [topic=demoDVRRecTopic0825 partition=0leaderId= server= service= beginOffset=4 offset=5 msgSize=150 server= checksum=2778582805 time=1440613724034 message.size=150]: java.lang.RuntimeException: null record
        at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:295)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/26 14:37:18 ERROR kafka.CamusJob: job failed: 100.0% messages skipped due to other, maximum allowed is 0.1%
Exception in thread "main" java.lang.RuntimeException: job failed: 100.0% messages skipped due to other, maximum allowed is 0.1%
        at com.linkedin.camus.etl.kafka.CamusJob.checkIfTooManySkippedMsg(CamusJob.java:467)
        at com.linkedin.camus.etl.kafka.CamusJob.checkIfTooManySkippedMsg(CamusJob.java:453)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:372)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:235)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:691)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:646)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcces

Here is the camus.properties file:
# Needed Camus properties, more cleanup to come
#
# Almost all properties have decent default properties. When in doubt, comment out the property.
#

# The job name.
camus.job.name=Camus Job

fs.defaultFS=hdfs://
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringRecordWriterProvider
#etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.AvroRecordWriterProvider

# final top-level data output directory, sub-directory will be dynamically created for each topic pulled
etl.destination.path=/user//camus

# HDFS location where you want to keep execution files, i.e. offsets, error logs, and count files
etl.execution.base.path=/user//camus/exec

# where completed Camus job output directories are kept, usually a sub-dir in the base.path
etl.execution.history.path=/user//camus/exec/history

# Concrete implementation of the Encoder class to use (used by Kafka Audit, and thus optional for now)
#camus.message.encoder.class=com.linkedin.camus.etl.kafka.coders.DummyKafkaMessageEncoder

# Concrete implementation of the Decoder class to use.
# Out of the box options are:
#  com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder - Reads JSON events, and tries to extract timestamp.
#  com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageDecoder - Reads Avro events using a schema from a configured schema repository.
#  com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder - Same, but converts event to latest schema for current topic.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder
#camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder

# Decoder class can also be set on a per topic basis.
#camus.message.decoder.class.<topic-name>=com.your.custom.MessageDecoder

# Used by avro-based Decoders (KafkaAvroMessageDecoder and LatestSchemaKafkaAvroMessageDecoder) to use as their schema registry.
# Out of the box options are:
# com.linkedin.camus.schemaregistry.FileSchemaRegistry
# com.linkedin.camus.schemaregistry.MemorySchemaRegistry
# com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
# com.linkedin.camus.example.schemaregistry.DummySchemaRegistry
kafka.message.coder.schema.registry.class=com.linkedin.camus.example.schemaregistry.DummySchemaRegistry

# Used by JsonStringMessageDecoder when extracting the timestamp
# Choose the field that holds the time stamp (default "timestamp")
camus.message.timestamp.field=timestamp

# What format is the timestamp in? Out of the box options are:
# "unix" or "unix_seconds": The value will be read as a long containing the seconds since epoc
# "unix_milliseconds": The value will be read as a long containing the milliseconds since epoc
# "ISO-8601": Timestamps will be fed directly into org.joda.time.DateTime constructor, which reads ISO-8601
# All other values will be fed into the java.text.SimpleDateFormat constructor, which will be used to parse the timestamps
# Default is "[dd/MMM/yyyy:HH:mm:ss Z]"

#camus.message.timestamp.format=yyyy-MM-dd_HH:mm:ss
camus.message.timestamp.format=ISO-8601

# Used by the committer to arrange .avro files into a partitioned scheme. This will be the default partitioner for all
# topic that do not have a partitioner specified.
# Out of the box options are (for all options see the source for configuration options):
# com.linkedin.camus.etl.kafka.partitioner.HourlyPartitioner, groups files in hourly directories
# com.linkedin.camus.etl.kafka.partitioner.DailyPartitioner, groups files in daily directories
# com.linkedin.camus.etl.kafka.partitioner.TimeBasedPartitioner, groups files in very configurable directories
# com.linkedin.camus.etl.kafka.partitioner.DefaultPartitioner, like HourlyPartitioner but less configurable
# com.linkedin.camus.etl.kafka.partitioner.TopicGroupingPartitioner
#etl.partitioner.class=com.linkedin.camus.etl.kafka.partitioner.HourlyPartitioner

# Partitioners can also be set on a per-topic basis. (Note though that configuration is currently not per-topic.)
#etl.partitioner.class.<topic-name>=com.your.custom.CustomPartitioner

# all files in this dir will be added to the distributed cache and placed on the classpath for hadoop tasks
# hdfs.default.classpath.dir=

# max hadoop tasks to use, each task can pull multiple topic partitions
mapred.map.tasks=30
# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3
# Max minutes for each mapper to pull messages (-1 means no limit)
kafka.max.pull.minutes.per.task=-1

# if whitelist has values, only whitelisted topic are pulled. Nothing on the blacklist is pulled
#kafka.blacklist.topics=
kafka.whitelist.topics=demoDVRRecTopic0825
log4j.configuration=false

# Name of the client as seen by kafka

# The Kafka brokers to connect to, format: kafka.brokers=host1:port,host2:port,host3:port
kafka.brokers=test:6667

# Fetch request parameters:
#kafka.fetch.buffer.size=
#kafka.fetch.request.correlationid=
#kafka.fetch.request.max.wait=
#kafka.fetch.request.min.bytes=
#kafka.timeout.value=

#Stops the mapper from getting inundated with Decoder exceptions for the same topic
#Default value is set to 10
max.decoder.exceptions.to.print=5

#Controls the submitting of counts to Kafka
#Default value set to true
post.tracking.counts.to.kafka=true
#monitoring.event.class=class.that.generates.record.to.submit.counts.to.kafka

# everything below this point can be ignored for the time being, will provide more documentation down the road
##########################
#kafka.monitor.tier=
#etl.counts.path=
kafka.monitor.time.granularity=10

#etl.hourly=hourly
#etl.daily=daily

# Should we ignore events that cannot be decoded (exception thrown by MessageDecoder)?
# `false` will fail the job, `true` will silently drop the event.
etl.ignore.schema.errors=true

# configure output compression for deflate or snappy. Defaults to deflate
mapred.output.compress=false
#etl.output.codec=deflate
#etl.deflate.level=6
#etl.output.codec=snappy

etl.default.timezone=America/Los_Angeles
etl.output.file.time.partition.mins=60
etl.keep.count.files=false
etl.execution.history.max.of.quota=.8

# Configures a customer reporter which extends BaseReporter to send etl data
#etl.reporter.class

mapred.map.max.attempts=1

kafka.client.buffer.size=20971520
kafka.client.so.timeout=60000

#zookeeper.session.timeout=
#zookeeper.connection.timeout=
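
From what I understand, with camus.message.decoder.class set to JsonStringMessageDecoder, camus.message.timestamp.field=timestamp and camus.message.timestamp.format=ISO-8601, each Kafka message is expected to be a JSON object carrying an ISO-8601 timestamp in that field, for example (made-up sample; only the timestamp field name comes from the configuration above):

{"timestamp": "2015-08-26T14:35:00-07:00", "event": "dvr-recording-started"}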

I have added a StringMessageDecoder.java to the Camus jar. The MapReduce stage completes, and the error log is written to the /user/camus/exec folder.
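
In case it is relevant to the "null record" error: as I understand it, decode() has to return a non-null CamusWrapper, otherwise EtlRecordReader reports exactly this exception. A minimal sketch of a string pass-through decoder along those lines (placeholder class and package names, and assuming the MessageDecoder<byte[], String> base class that JsonStringMessageDecoder extends in this Camus version):

package com.example.camus.coders;  // placeholder package name

import com.linkedin.camus.coders.CamusWrapper;
import com.linkedin.camus.coders.MessageDecoder;

// Minimal sketch only: a pass-through decoder that wraps the raw Kafka payload
// as a String and always returns a non-null CamusWrapper. Class and package
// names are placeholders; this assumes the MessageDecoder<byte[], String>
// base class that JsonStringMessageDecoder extends in this Camus version.
public class SimpleStringMessageDecoder extends MessageDecoder<byte[], String> {

    @Override
    public CamusWrapper<String> decode(byte[] payload) {
        String record = new String(payload);
        // The raw string carries no timestamp field, so use the current time
        // as the record timestamp.
        return new CamusWrapper<String>(record, System.currentTimeMillis());
    }
}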

Please help me with the above issue.

Thanks.



