Kafka-HDFS Ingestion Error

300 views
Skip to first unread message

Lee kyo

unread,
Feb 16, 2016, 10:30:56 AM2/16/16
to gobblin-users
hello.

For my experiment, I have
--- 3 kafka brokers
--- 5 nodes for hadoop cluster
--- a Kafka topic with txt format
--- gobblin 0.6.2

My job configuration file looks like


job.name=GobblinKafkaQuickStart2
job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false

kafka.brokers=[broker1 ip]:9092,[broker2 ip]:9092,[broker3 ip]:9092

topic.whitelist=test
bootstrap.with.offset=earliest

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://dev-likehadoop001.ncl:9000
writer.fs.uri=hdfs://dev-likehadoop001.ncl:9000
#state.store.fs.uri=hdfs://dev-likehadoop001.ncl:9000

mr.job.root.dir=/user/irteam/gobblin-kafka/working
state.store.dir=/user/irteam/gobblin-kafka/state-store
task.data.root.dir=/user/irteam/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/user/irteam/gobblintest/job-output


The following command is used to submit the job:
bin/gobblin-mapreduce.sh  --conf job/kafka2.pull



Server log is look like :

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home1/irteam/apps/gobblin/gobblin-dist/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home1/irteam/apps/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARN [JobContext] Property writer.staging.dir is deprecated. No need to use it if task.data.root.dir is specified.
WARN [JobContext] Property writer.output.dir is deprecated. No need to use it if task.data.root.dir is specified.
WARN [MRJobLauncher] Job working directory already exists for job GobblinKafkaQuickStart2
WARN [KafkaSource] Previous offset for partition test:0 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:1 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:2 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:3 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:4 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:5 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:6 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:7 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:8 does not exist. This partition will be skipped.
WARN [KafkaSource] Previous offset for partition test:9 does not exist. This partition will be skipped.
Error: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at gobblin.util.ConfigUtils.propertiesToConfig(ConfigUtils.java:95)
        at gobblin.metrics.reporter.OutputStreamReporter$Builder.build(OutputStreamReporter.java:162)
        at gobblin.metrics.GobblinMetrics.buildFileMetricReporter(GobblinMetrics.java:468)
        at gobblin.metrics.GobblinMetrics.startMetricReporting(GobblinMetrics.java:380)
        at gobblin.metrics.GobblinMetrics.startMetricReportingWithFileSuffix(GobblinMetrics.java:340)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.setup(MRJobLauncher.java:533)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:540)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at gobblin.util.ConfigUtils.propertiesToConfig(ConfigUtils.java:95)
        at gobblin.metrics.reporter.OutputStreamReporter$Builder.build(OutputStreamReporter.java:162)
        at gobblin.metrics.GobblinMetrics.buildFileMetricReporter(GobblinMetrics.java:468)
        at gobblin.metrics.GobblinMetrics.startMetricReporting(GobblinMetrics.java:380)
        at gobblin.metrics.GobblinMetrics.startMetricReportingWithFileSuffix(GobblinMetrics.java:340)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.setup(MRJobLauncher.java:533)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:540)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at gobblin.util.ConfigUtils.propertiesToConfig(ConfigUtils.java:95)
        at gobblin.metrics.reporter.OutputStreamReporter$Builder.build(OutputStreamReporter.java:162)
        at gobblin.metrics.GobblinMetrics.buildFileMetricReporter(GobblinMetrics.java:468)
        at gobblin.metrics.GobblinMetrics.startMetricReporting(GobblinMetrics.java:380)
        at gobblin.metrics.GobblinMetrics.startMetricReportingWithFileSuffix(GobblinMetrics.java:340)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.setup(MRJobLauncher.java:533)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:540)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

WARN [TaskStateCollectorService] Output task state path /user/irteam/gobblin-kafka/working/GobblinKafkaQuickStart2/output/job_GobblinKafkaQuickStart2_1455636098215 does not exist
WARN [ContextAwareReporter] Reporter MetricReportReporter has already been stopped.
WARN [ContextAwareReporter] Reporter MetricReportReporter has already been stopped.


help me please.

Ziyang Liu

unread,
Feb 16, 2016, 12:55:11 PM2/16/16
to gobblin-users
Hi Lee: is config-1.2.1.jar in your lib dir?

Also, "This partition will be skipped." doesn't seem right. Can you remove the duplicate "bootstrap.with.offset=earliest" and try again.

Thanks
Ziyang

Sahil Takiar

unread,
Feb 16, 2016, 3:42:38 PM2/16/16
to Ziyang Liu, gobblin-users
Also, make sure you are using the latest code. The ClassNotFoundException was a bug that was fixed in the following PR: https://github.com/linkedin/gobblin/pull/690

--Sahil

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-user...@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/57235778-25b3-4f2d-ae81-8b3e61fd77a9%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Lee kyo

unread,
Feb 16, 2016, 8:05:51 PM2/16/16
to gobblin-users
Hi. Ziyang

"config-1.2.1.jar" in my lib 
and delete bootstrap.with.offset=earliest 

-rw-rw-r-- 1 irteam irteam   219554 2016-02-16 12:16 config-1.2.1.jar

but It does not work 
 

2016년 2월 17일 수요일 오전 2시 55분 11초 UTC+9, Ziyang Liu 님의 말:

Lee kyo

unread,
Feb 16, 2016, 8:20:24 PM2/16/16
to gobblin-users, zli...@asu.edu
hi Sahil

modify gobblin-mapreduce.sh 
but It does not work

My log looks like

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home1/irteam/apps/gobblin/gobblin-dist/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home1/irteam/apps/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARN [JobContext] Property writer.staging.dir is deprecated. No need to use it if task.data.root.dir is specified.
WARN [JobContext] Property writer.output.dir is deprecated. No need to use it if task.data.root.dir is specified.
WARN [KafkaSource] Previous offset for partition test:0 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:1 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:2 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:3 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:4 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:5 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:6 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:7 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:8 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition test:9 does not exist. This partition will start from the earliest offset: 0
Error: java.io.IOException: Not all tasks running in container attempt_1448965280464_0036_m_000000_0 completed successfully
        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:580)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.io.IOException: Not all tasks running in container attempt_1448965280464_0036_m_000000_1 completed successfully
        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:580)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.io.IOException: Not all tasks running in container attempt_1448965280464_0036_m_000000_2 completed successfully
        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:580)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

WARN [AbstractJobLauncher] Not committing dataset  of job job_GobblinKafkaQuickStart2_1455671920621 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
ERROR [AbstractJobLauncher] Failed to launch and run job job_GobblinKafkaQuickStart2_1455671920621: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart2_1455671920621
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart2_1455671920621
        at gobblin.runtime.JobContext.commit(JobContext.java:417)
        at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:274)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:60)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:133)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Failed to launch the job due to the following exception:
gobblin.runtime.JobException: Job job_GobblinKafkaQuickStart2_1455671920621 failed

Bala Kasaram

unread,
May 27, 2016, 4:51:29 AM5/27/16
to gobblin-users, zli...@asu.edu
Any update on this issue?

Sahil Takiar

unread,
May 27, 2016, 12:33:56 PM5/27/16
to Bala Kasaram, gobblin-users, Ziyang Liu
The "IOException Not all tasks running in container ... completed successfully" just means that a Map Task failed, but it does not show why the Map Task failed. Take a look at http://gobblin.readthedocs.io/en/latest/user-guide/FAQs/#resolve-gobblin-on-mr-exception-ioexception-not-all-tasks-running-in-mapper-attempt_id-completed-successfully

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-user...@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages