Hi all,
I am running a Gobblin Kafka-to-HDFS ingestion as a regular job via Jenkins. The job regularly fails with a
java.io.EOFException. Because the next run always succeeds and the temporary files have been deleted by then, I have not yet been able to track down the cause of the error.
Has anyone experienced a similar issue?
mxraw is the job name.
Here's the stack trace:
May 26, 2016 8:16:08 AM com.google.common.util.concurrent.AbstractScheduledService$1$1 run
WARNING: Error while attempting to shut down the service after failure.
java.io.IOException: java.io.EOFException: hdfs://nameservice1/data/mxraw/.ingestion/working/mxraw/output/job_mxraw_1464243302979/task_mxraw_1464243302979_7.tst not a SequenceFile
at gobblin.util.ParallelRunner.close(ParallelRunner.java:291)
at gobblin.runtime.TaskStateCollectorService.collectOutputTaskStates(TaskStateCollectorService.java:145)
at gobblin.runtime.TaskStateCollectorService.runOneIteration(TaskStateCollectorService.java:81)
at gobblin.runtime.TaskStateCollectorService.shutDown(TaskStateCollectorService.java:102)
at com.google.common.util.concurrent.AbstractScheduledService$1$1.run(AbstractScheduledService.java:175)
at com.google.common.util.concurrent.Callables$3.run(Callables.java:93)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: hdfs://nameservice1/data/mxraw/.ingestion/working/mxraw/output/job_mxraw_1464243302979/task_mxraw_1464243302979_7.tst not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1852)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at gobblin.util.ParallelRunner$3.call(ParallelRunner.java:160)
at gobblin.util.ParallelRunner$3.call(ParallelRunner.java:154)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
16/05/26 08:16:13 INFO mapreduce.Job: Job job_1461182855125_53342 completed successfully
16/05/26 08:16:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=4210550
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=83616628
HDFS: Number of bytes written=1833615219
HDFS: Number of read operations=132219
HDFS: Number of large read operations=0
HDFS: Number of write operations=13269
Job Counters
Launched map tasks=30
Other local map tasks=30
Total time spent by all maps in occupied slots (ms)=3701607
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1233869
Total vcore-seconds taken by all map tasks=1233869
Total megabyte-seconds taken by all map tasks=3790445568
Map-Reduce Framework
Map input records=30
Map output records=0
Input split bytes=4710
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=4946
CPU time spent (ms)=998150
Physical memory (bytes) snapshot=24812699648
Virtual memory (bytes) snapshot=82111893504
Total committed heap usage (bytes)=25757220864
File Input Format Counters
Bytes Read=37174
File Output Format Counters
Bytes Written=0
16/05/26 08:16:13 ERROR runtime.AbstractJobLauncher: Failed to launch and run job job_mxraw_1464243302979: java.lang.IllegalStateException: Expected the service to be TERMINATED, but the service has FAILED
java.lang.IllegalStateException: Expected the service to be TERMINATED, but the service has FAILED
at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:334)
at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:303)
at com.google.common.util.concurrent.AbstractScheduledService.awaitTerminated(AbstractScheduledService.java:402)
at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:227)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:261)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:60)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:133)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: java.io.EOFException: hdfs://nameservice1/data/mxraw/.ingestion/working/mxraw/output/job_mxraw_1464243302979/task_mxraw_1464243302979_15.tst not a SequenceFile
at gobblin.util.ParallelRunner.close(ParallelRunner.java:291)
at gobblin.runtime.TaskStateCollectorService.collectOutputTaskStates(TaskStateCollectorService.java:145)
at gobblin.runtime.TaskStateCollectorService.runOneIteration(TaskStateCollectorService.java:81)
at com.google.common.util.concurrent.AbstractScheduledService$1$1.run(AbstractScheduledService.java:172)
at com.google.common.util.concurrent.Callables$3.run(Callables.java:93)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: hdfs://nameservice1/data/mxraw/.ingestion/working/mxraw/output/job_mxraw_1464243302979/task_mxraw_1464243302979_15.tst not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1852)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at gobblin.util.ParallelRunner$3.call(ParallelRunner.java:160)
at gobblin.util.ParallelRunner$3.call(ParallelRunner.java:154)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
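For anyone looking into this: as far as I can tell from the Hadoop source, SequenceFile.Reader throws an EOFException with "not a SequenceFile" when it cannot read even the 4-byte header ('S','E','Q' plus a version byte). A file with the wrong magic bytes gives a plain IOException instead. So the .tst file might have been empty or incompletely written at the moment the TaskStateCollectorService read it. If a copy of such a file can be grabbed before cleanup (e.g. with hdfs dfs -get), a small standalone check like the sketch below could confirm that. SeqHeaderCheck is a hypothetical helper of my own, not part of Gobblin or Hadoop, and it only inspects the header bytes of a local file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper, not part of Gobblin or Hadoop.
class SeqHeaderCheck {
    // A SequenceFile starts with 'S','E','Q' followed by a one-byte format version.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};
    private static final int HEADER_LEN = 4;

    // Reads at most the first HEADER_LEN bytes of a local file.
    static byte[] readHeader(Path path) throws IOException {
        byte[] buf = new byte[HEADER_LEN];
        int n = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (n < HEADER_LEN) {
                int r = in.read(buf, n, HEADER_LEN - n);
                if (r < 0) {
                    break; // end of file reached before the header was complete
                }
                n += r;
            }
        }
        return Arrays.copyOf(buf, n);
    }

    // Classifies a header: too short (the EOFException case), wrong magic, or plausible.
    static String check(byte[] header) {
        if (header.length < HEADER_LEN) {
            return "TRUNCATED (" + header.length + " of " + HEADER_LEN + " header bytes)";
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return "BAD_MAGIC";
            }
        }
        return "OK (version " + header[3] + ")";
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            System.out.println(path + ": " + check(readHeader(path)) + ", size=" + Files.size(path));
        }
    }
}
```

Usage, with a hypothetical local path: javac SeqHeaderCheck.java && java SeqHeaderCheck /tmp/task.tst. A "TRUNCATED (0 of 4 header bytes)" result would point towards a task-state file that was read before anything had been flushed to it.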