WARN [JobContext] Property task.data.root.dir is missing.
WARN [KafkaSource] Previous offset for partition mytopic:0 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition mytopic:1 does not exist. This partition will start from the earliest offset: 0
WARN [TaskStateCollectorService] No output task state files found in gobblin/kafka2gcs/job_kafka2gcs_1495211213990/output/job_kafka2gcs_1495211213990
WARN [SafeDatasetCommit] Not committing dataset of job job_kafka2gcs_1495211213990 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
ERROR [JobContext] Iterator executor failure.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Not committing dataset of job job_kafka2gcs_1495211213990 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
... stacktrace continues
job.name=kafka2gcs
job.group=gkafka2gcs
job.description=Gobblin job to read messages from Kafka and save them as-is to GCS
job.lock.enabled=false
kafka.brokers=mykafka:9092
topic.whitelist=mytopic
bootstrap.with.offset=earliest
source.class=gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
kafka.deserializer.type=BYTE_ARRAY
extract.namespace=nskafka2gcs
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
mr.job.max.mappers=2
writer.output.format=txt
data.publisher.type=gobblin.publisher.BaseDataPublisher
metrics.enabled=false
fs.uri=file:///.
writer.fs.uri=${fs.uri}
mr.job.root.dir=gobblin
writer.output.dir=${mr.job.root.dir}/out
writer.staging.dir=${mr.job.root.dir}/stg
fs.gs.project.id=my-test-project
data.publisher.fs.uri=gs://my-bucket
state.store.fs.uri=${data.publisher.fs.uri}
data.publisher.final.dir=gobblin/pub
state.store.dir=gobblin/state
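The job config above leans on Gobblin's ${...} property substitution (writer.output.dir expands against mr.job.root.dir, state.store.fs.uri against data.publisher.fs.uri). A minimal shell sketch of how those values resolve, with the literals copied from the config (this only mirrors the expansion, it is not Gobblin's own resolver):

```shell
#!/bin/bash
# Sketch of how the ${...} substitutions in the job config above resolve.
# Values are copied verbatim from the config; only the expansion is shown.
fs_uri="file:///."
mr_job_root_dir="gobblin"
data_publisher_fs_uri="gs://my-bucket"

writer_fs_uri="${fs_uri}"                      # writer.fs.uri=${fs.uri}
writer_output_dir="${mr_job_root_dir}/out"     # writer.output.dir
writer_staging_dir="${mr_job_root_dir}/stg"    # writer.staging.dir
state_store_fs_uri="${data_publisher_fs_uri}"  # state.store.fs.uri

echo "writer.fs.uri=${writer_fs_uri}"
echo "writer.output.dir=${writer_output_dir}"
echo "writer.staging.dir=${writer_staging_dir}"
echo "state.store.fs.uri=${state_store_fs_uri}"
```

Note that writer.output.dir and state.store.dir are relative paths, so where they land depends on the filesystem URI in effect when the job runs.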
gcloud dataproc clusters create myspark \
--image-version 1.1 \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 10 \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--initialization-actions gs://my-bucket/gobblin-setup.sh
gcloud dataproc jobs submit hadoop --cluster=myspark \
  --class gobblin.runtime.mapreduce.CliMRJobLauncher \
  --properties mapreduce.job.user.classpath.first=true \
  -- -sysconfig /opt/gobblin-dist/conf/gobblin-mapreduce.properties \
  -jobconfig gs://my-bucket/gobblin-kafka-gcs.job
#!/bin/bash
sed -i 's%hdfs://myspark-m%file:///.%g' /usr/lib/hadoop/etc/hadoop/core-site.xml
cd /opt
gsutil cp gs://my-bucket/gobblin-distribution-0.10.0.tar.gz .
tar xvf gobblin-distribution-0.10.0.tar.gz
BASE="$PWD/gobblin-dist"
HADOOP_LIB="/usr/lib/hadoop/lib"
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
rm $HADOOP_LIB/commons-cli-1*.jar $HADOOP_LIB/guava-1*.jar
cp $BASE/lib/* $HADOOP_LIB
else
rm $HADOOP_LIB/guava-1*.jar
GOBBLIN_VERSION="0.10.0"
FWDIR_LIB=$BASE/lib
# Copied from gobblin-dist/bin/gobblin-mapreduce.sh
cp \
$FWDIR_LIB/gobblin-api-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-avro-json-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-codecs-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-core-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-core-base-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-crypto-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-crypto-provider-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-data-management-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-metastore-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-metrics-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-metrics-base-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-metadata-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/gobblin-utility-$GOBBLIN_VERSION.jar \
$FWDIR_LIB/avro-1.8.1.jar \
$FWDIR_LIB/avro-mapred-1.8.1.jar \
$FWDIR_LIB/commons-lang3-3.4.jar \
$FWDIR_LIB/config-1.2.1.jar \
$FWDIR_LIB/data-2.6.0.jar \
$FWDIR_LIB/gson-2.6.2.jar \
$FWDIR_LIB/guava-15.0.jar \
$FWDIR_LIB/guava-retrying-2.0.0.jar \
$FWDIR_LIB/joda-time-2.9.3.jar \
$FWDIR_LIB/javassist-3.18.2-GA.jar \
$FWDIR_LIB/kafka_2.11-0.8.2.2.jar \
$FWDIR_LIB/kafka-clients-0.8.2.2.jar \
$FWDIR_LIB/metrics-core-2.2.0.jar \
$FWDIR_LIB/metrics-core-3.1.0.jar \
$FWDIR_LIB/metrics-graphite-3.1.0.jar \
$FWDIR_LIB/scala-library-2.11.8.jar \
$FWDIR_LIB/influxdb-java-2.1.jar \
$FWDIR_LIB/okhttp-2.4.0.jar \
$FWDIR_LIB/okio-1.4.0.jar \
$FWDIR_LIB/retrofit-1.9.0.jar \
$FWDIR_LIB/reflections-0.9.10.jar \
$HADOOP_LIB
fi
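Given that the run later reports duplicate SLF4J bindings (slf4j-log4j12 1.7.21 and 1.7.10 both on the classpath) and the init action above prunes guava by hand, a small helper can sanity-check that a lib directory holds at most one version of a jar family after the init action runs. check_single_version is a hypothetical name of mine, not part of the script above:

```shell
#!/bin/bash
# Hypothetical helper (not part of the init action above): report whether
# a lib directory still contains more than one version of a jar family.
check_single_version() {
  local dir="$1" pattern="$2"
  local n
  # The unquoted $pattern is intentional: it must glob inside $dir.
  n=$(ls "$dir"/$pattern 2>/dev/null | wc -l)
  if [ "$n" -gt 1 ]; then
    echo "CONFLICT: $n jars matching $pattern in $dir"
    return 1
  fi
  echo "OK: at most one jar matching $pattern in $dir"
}

# Example usage on a Dataproc node (paths from the script above):
# check_single_version /usr/lib/hadoop/lib 'guava-*.jar'
# check_single_version /usr/lib/hadoop/lib 'slf4j-log4j12-*.jar'
```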
--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-users+unsubscribe@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/6305a694-c576-4c87-a6ff-c1a475c8f267%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Job [9f744870-e283-42c3-a416-55605263199f] submitted.
Waiting for job output...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARN [AbstractJobLauncher] Creating a job specific SharedResourcesBroker. Objects will only be shared at the job level.
WARN [JobContext] Property task.data.root.dir is missing.
WARN [KafkaSource] Previous offset for partition of_sample:0 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Previous offset for partition of_sample:1 does not exist. This partition will start from the earliest offset: 0
WARN [TaskStateCollectorService] No output task state files found in gobblin/kafka2gcs/job_kafka2gcs_1495220348066/output/job_kafka2gcs_1495220348066
WARN [SafeDatasetCommit] Not committing dataset of job job_kafka2gcs_1495220348066 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
ERROR [JobContext] Iterator executor failure.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Not committing dataset of job job_kafka2gcs_1495220348066 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at gobblin.util.executors.IteratorExecutor.executeAndGetResults(IteratorExecutor.java:128)
at gobblin.runtime.JobContext.commit(JobContext.java:432)
at gobblin.runtime.JobContext.commit(JobContext.java:398)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:431)
at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Caused by: java.lang.RuntimeException: Not committing dataset of job job_kafka2gcs_1495220348066 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:86)
at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:54)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR [AbstractJobLauncher] Failed to launch and run job job_kafka2gcs_1495220348066: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_kafka2gcs_1495220348066
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_kafka2gcs_1495220348066
at gobblin.runtime.JobContext.commit(JobContext.java:438)
at gobblin.runtime.JobContext.commit(JobContext.java:398)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:431)
at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Caused by: gobblin.runtime.JobException: Job job_kafka2gcs_1495220348066 failed
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:480)
at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)
... 5 more
WARN [ServiceBasedAppLauncher] ApplicationLauncher has already stopped
ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [9f744870-e283-42c3-a416-55605263199f] entered state [ERROR] while waiting for [DONE]
2017-05-19 22:37:26 UTC INFO [main] gobblin.kafka.client.Kafka08ConsumerClient 143 - Fetching topic metadata from broker mykafka:9092
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.KafkaSource 161 - Discovered topic of_sample
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 186 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7ab1127c[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
2017-05-19 22:37:27 UTC WARN [pool-8-thread-1] gobblin.source.extractor.extract.kafka.KafkaSource 342 - Previous offset for partition of_sample:0 does not exist. This partition will start from the earliest offset: 0
2017-05-19 22:37:27 UTC INFO [pool-8-thread-1] gobblin.source.extractor.extract.kafka.KafkaSource 465 - Created workunit for partition of_sample:0: lowWatermark=0, highWatermark=0, range=0
2017-05-19 22:37:27 UTC WARN [pool-8-thread-1] gobblin.source.extractor.extract.kafka.KafkaSource 342 - Previous offset for partition of_sample:1 does not exist. This partition will start from the earliest offset: 0
2017-05-19 22:37:27 UTC INFO [pool-8-thread-1] gobblin.source.extractor.extract.kafka.KafkaSource 465 - Created workunit for partition of_sample:1: lowWatermark=0, highWatermark=0, range=0
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 205 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7ab1127c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.KafkaSource 185 - Created workunits for 1 topics in 0 seconds
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.workunit.packer.KafkaAvgRecordTimeBasedWorkUnitSizeEstimator 149 - For all topics not pulled in the previous run, estimated avg time to pull a record is 1.0 milliseconds
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.workunit.packer.KafkaWorkUnitPacker 237 - Created MultiWorkUnit for partitions [of_sample:0, of_sample:1]
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.workunit.packer.KafkaWorkUnitPacker 311 - MultiWorkUnit 0: estimated load=0.003010, partitions=[]
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.workunit.packer.KafkaWorkUnitPacker 311 - MultiWorkUnit 1: estimated load=0.003010, partitions=[[of_sample:0, of_sample:1]]
2017-05-19 22:37:27 UTC INFO [main] gobblin.source.extractor.extract.kafka.workunit.packer.KafkaWorkUnitPacker 297 - Min load of multiWorkUnit = 0.003010; Max load of multiWorkUnit = 0.003010; Diff = 0.000000%
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 186 - Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@3d299393
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 205 - Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@3d299393
2017-05-19 22:37:27 UTC INFO [main] gobblin.runtime.AbstractJobLauncher 386 - Starting job job_kafka2gcs_1495233445721
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 186 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@5dc769f9[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 205 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@5dc769f9[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 186 - Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@17fede14
2017-05-19 22:37:27 UTC INFO [main] gobblin.util.ExecutorsUtils 205 - Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@17fede14
2017-05-19 22:37:27 UTC INFO [TaskStateCollectorService STARTING] gobblin.runtime.TaskStateCollectorService 97 - Starting the TaskStateCollectorService
2017-05-19 22:37:27 UTC INFO [main] gobblin.runtime.mapreduce.MRJobLauncher 229 - Launching Hadoop MR job Gobblin-kafka2gcs
2017-05-19 22:37:28 UTC INFO [main] org.apache.hadoop.yarn.client.RMProxy 98 - Connecting to ResourceManager at myspark-m/10.128.0.2:8032
2017-05-19 22:37:29 UTC INFO [main] org.apache.hadoop.mapreduce.JobSubmitter 249 - Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1495233380584_0001
2017-05-19 22:37:29 UTC INFO [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 103 - Stopping the TaskStateCollectorService
2017-05-19 22:37:29 UTC WARN [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 131 - No output task state files found in gobblin/kafka2gcs/job_kafka2gcs_1495233445721/output/job_kafka2gcs_1495233445721
2017-05-19 22:37:29 UTC INFO [main] gobblin.runtime.mapreduce.MRJobLauncher 505 - Deleted working directory gobblin/kafka2gcs/job_kafka2gcs_1495233445721
2017-05-19 22:37:29 UTC ERROR [main] gobblin.runtime.AbstractJobLauncher 442 - Failed to launch and run job job_kafka2gcs_1495233445721: java.io.FileNotFoundException: File does not exist: hdfs://myspark-m/user/root/gobblin/kafka2gcs/job_kafka2gcs_1495233445721/job.state
java.io.FileNotFoundException: File does not exist: hdfs://myspark-m/user/root/gobblin/kafka2gcs/job_kafka2gcs_1495233445721/job.state
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
Henrique G. Abreu