Failed to commit dataset state for some dataset(s) of job


Mark Sorbello

Sep 9, 2015, 6:39:13 PM
to gobblin-users
I am attempting to deploy a Gobblin MapReduce job and am getting the following stack traces. I have increased the memory limits all over the place to no avail. Any help would be appreciated.


Key Classes:



./bin/gobblin-mapreduce.sh --jars ../kafka-gobblin-hdfs-test-0.0.0.jar --conf ../kafka-gobblin-hdfs-test.pull --workdir hdfs://server1:8020/gobblin/work
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/tmp/gobblin-dist/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2621/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARN [KafkaSource] Previous offset for partition kafka-gobblin-hdfs-test:0 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Avg event size for partition kafka-gobblin-hdfs-test:0 not available, using default size 1024
Error: java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0003_m_000091_0 completed successfully
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0003_m_000091_1 completed successfully
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Error: java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0003_m_000091_2 completed successfully
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

WARN [AbstractJobLauncher] Not committing dataset  of job job_kafka-gobblin-hdfs-test_1441836713419 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
ERROR [AbstractJobLauncher] Failed to launch and run job job_kafka-gobblin-hdfs-test_1441836713419: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_kafka-gobblin-hdfs-test_1441836713419
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_kafka-gobblin-hdfs-test_1441836713419
        at gobblin.runtime.JobContext.commit(JobContext.java:257)
        at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:271)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:60)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:133)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Failed to launch the job due to the following exception:
gobblin.runtime.JobException: Job job_kafka-gobblin-hdfs-test_1441836713419 failed


Sample exceptions from my job history container logs:

2015-09-09 18:09:37,783 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1441819802872_0002_m_000088_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2015-09-09 18:09:37,783 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:37,784 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000094_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2015-09-09 18:09:37,796 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1441819802872_0002_01_000096 taskAttempt attempt_1441819802872_0002_m_000094_0
2015-09-09 18:09:37,796 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1441819802872_0002_m_000094_0
2015-09-09 18:09:37,796 INFO [ContainerLauncher #3] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:37,835 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1441819802872_0002_m_000094_0 : 13562
2015-09-09 18:09:37,835 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1441819802872_0002_m_000094_0] using containerId: [container_1441819802872_0002_01_000096 on NM: [server1:45454]
2015-09-09 18:09:37,835 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000094_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2015-09-09 18:09:37,835 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000094 Task Transitioned from SCHEDULED to RUNNING
2015-09-09 18:09:38,405 INFO [IPC Server handler 13 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1441819802872_0002_m_000089_0 is : 0.0
2015-09-09 18:09:38,459 INFO [Socket Reader #1 for port 52683] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1441819802872_0002 (auth:SIMPLE)
2015-09-09 18:09:38,557 INFO [IPC Server handler 23 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1441819802872_0002_m_000093 asked for a task
2015-09-09 18:09:38,558 INFO [IPC Server handler 23 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1441819802872_0002_m_000093 given task: attempt_1441819802872_0002_m_000091_0
2015-09-09 18:09:38,636 INFO [IPC Server handler 26 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1441819802872_0002_m_000089_0 is : 1.0
2015-09-09 18:09:38,677 INFO [IPC Server handler 14 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1441819802872_0002_m_000089_0
2015-09-09 18:09:38,680 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000089_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2015-09-09 18:09:38,680 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1441819802872_0002_01_000091 taskAttempt attempt_1441819802872_0002_m_000089_0
2015-09-09 18:09:38,680 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1441819802872_0002_m_000089_0
2015-09-09 18:09:38,680 INFO [ContainerLauncher #5] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:38,770 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000089_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-09-09 18:09:38,771 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1441819802872_0002_m_000089_0
2015-09-09 18:09:38,771 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000089 Task Transitioned from RUNNING to SUCCEEDED
2015-09-09 18:09:38,771 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 90
2015-09-09 18:09:38,784 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:6 AssignedReds:0 CompletedMaps:90 CompletedReds:0 ContAlloc:95 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:38,802 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1441819802872_0002: ask=1 release= 0 newContainers=1 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
2015-09-09 18:09:38,802 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2015-09-09 18:09:38,802 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:38,802 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1441819802872_0002_01_000097 to attempt_1441819802872_0002_m_000095_0
2015-09-09 18:09:38,802 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:90 CompletedReds:0 ContAlloc:96 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:38,802 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:38,803 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000095_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2015-09-09 18:09:38,803 INFO [ContainerLauncher #7] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1441819802872_0002_01_000097 taskAttempt attempt_1441819802872_0002_m_000095_0
2015-09-09 18:09:38,803 INFO [ContainerLauncher #7] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1441819802872_0002_m_000095_0
2015-09-09 18:09:38,803 INFO [ContainerLauncher #7] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:38,894 INFO [ContainerLauncher #7] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1441819802872_0002_m_000095_0 : 13562
2015-09-09 18:09:38,896 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1441819802872_0002_m_000095_0] using containerId: [container_1441819802872_0002_01_000097 on NM: [server1:45454]
2015-09-09 18:09:38,896 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000095_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2015-09-09 18:09:38,896 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000095 Task Transitioned from SCHEDULED to RUNNING
2015-09-09 18:09:39,868 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1441819802872_0002: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit=<memory:0, vCores:0> knownNMs=1
2015-09-09 18:09:39,869 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1441819802872_0002_01_000091
2015-09-09 18:09:39,869 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:6 AssignedReds:0 CompletedMaps:90 CompletedReds:0 ContAlloc:96 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:39,874 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1441819802872_0002_m_000089_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2015-09-09 18:09:40,960 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2015-09-09 18:09:40,960 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:40,960 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1441819802872_0002_01_000098 to attempt_1441819802872_0002_m_000096_0
2015-09-09 18:09:40,960 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:3 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:90 CompletedReds:0 ContAlloc:97 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:40,961 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:40,961 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000096_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2015-09-09 18:09:40,961 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1441819802872_0002_01_000098 taskAttempt attempt_1441819802872_0002_m_000096_0
2015-09-09 18:09:40,962 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1441819802872_0002_m_000096_0
2015-09-09 18:09:40,962 INFO [ContainerLauncher #4] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:41,036 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1441819802872_0002_m_000096_0 : 13562
2015-09-09 18:09:41,037 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1441819802872_0002_m_000096_0] using containerId: [container_1441819802872_0002_01_000098 on NM: [server1:45454]
2015-09-09 18:09:41,037 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000096_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2015-09-09 18:09:41,037 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000096 Task Transitioned from SCHEDULED to RUNNING
2015-09-09 18:09:41,284 INFO [IPC Server handler 18 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1441819802872_0002_m_000090_0 is : 0.0
2015-09-09 18:09:41,552 INFO [IPC Server handler 27 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1441819802872_0002_m_000090_0 is : 1.0
2015-09-09 18:09:41,629 INFO [IPC Server handler 26 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1441819802872_0002_m_000090_0
2015-09-09 18:09:41,641 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000090_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2015-09-09 18:09:41,643 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1441819802872_0002_01_000092 taskAttempt attempt_1441819802872_0002_m_000090_0
2015-09-09 18:09:41,664 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1441819802872_0002_m_000090_0
2015-09-09 18:09:41,664 INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:41,745 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000090_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-09-09 18:09:41,745 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1441819802872_0002_m_000090_0
2015-09-09 18:09:41,745 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000090 Task Transitioned from RUNNING to SUCCEEDED
2015-09-09 18:09:41,766 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 91
2015-09-09 18:09:41,957 INFO [Socket Reader #1 for port 52683] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1441819802872_0002 (auth:SIMPLE)
2015-09-09 18:09:41,960 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:3 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:91 CompletedReds:0 ContAlloc:97 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:41,962 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1441819802872_0002: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
2015-09-09 18:09:42,128 INFO [IPC Server handler 6 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1441819802872_0002_m_000094 asked for a task
2015-09-09 18:09:42,129 INFO [IPC Server handler 6 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1441819802872_0002_m_000094 given task: attempt_1441819802872_0002_m_000092_0
2015-09-09 18:09:42,992 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1441819802872_0002_01_000092
2015-09-09 18:09:42,992 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2015-09-09 18:09:42,992 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:42,992 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1441819802872_0002_01_000099 to attempt_1441819802872_0002_m_000097_0
2015-09-09 18:09:42,992 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:2 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:91 CompletedReds:0 ContAlloc:98 ContRel:0 HostLocal:0 RackLocal:0
2015-09-09 18:09:42,992 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1441819802872_0002_m_000090_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2015-09-09 18:09:42,992 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved server1 to /default-rack
2015-09-09 18:09:42,993 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000097_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2015-09-09 18:09:43,036 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1441819802872_0002_01_000099 taskAttempt attempt_1441819802872_0002_m_000097_0
2015-09-09 18:09:43,036 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1441819802872_0002_m_000097_0
2015-09-09 18:09:43,037 INFO [ContainerLauncher #8] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : server1:45454
2015-09-09 18:09:43,082 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1441819802872_0002_m_000097_0 : 13562
2015-09-09 18:09:43,083 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1441819802872_0002_m_000097_0] using containerId: [container_1441819802872_0002_01_000099 on NM: [server1:45454]
2015-09-09 18:09:43,083 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1441819802872_0002_m_000097_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2015-09-09 18:09:43,083 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1441819802872_0002_m_000097 Task Transitioned from SCHEDULED to RUNNING
2015-09-09 18:09:43,629 INFO [Socket Reader #1 for port 52683] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1441819802872_0002 (auth:SIMPLE)
2015-09-09 18:09:43,789 INFO [IPC Server handler 24 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1441819802872_0002_m_000095 asked for a task
2015-09-09 18:09:43,789 INFO [IPC Server handler 24 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1441819802872_0002_m_000095 given task: attempt_1441819802872_0002_m_000093_0
2015-09-09 18:09:43,994 INFO [Socket Reader #1 for port 52683] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1441819802872_0002 (auth:SIMPLE)
2015-09-09 18:09:44,005 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1441819802872_0002: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
2015-09-09 18:09:44,199 INFO [IPC Server handler 1 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1441819802872_0002_m_000096 asked for a task
2015-09-09 18:09:44,199 INFO [IPC Server handler 1 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1441819802872_0002_m_000096 given task: attempt_1441819802872_0002_m_000094_0
2015-09-09 18:09:44,826 INFO [Socket Reader #1 for port 52683] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1441819802872_0002 (auth:SIMPLE)
2015-09-09 18:09:45,091 INFO [IPC Server handler 10 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1441819802872_0002_m_000097 asked for a task
2015-09-09 18:09:45,091 INFO [IPC Server handler 10 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1441819802872_0002_m_000097 given task: attempt_1441819802872_0002_m_000095_0
2015-09-09 18:09:47,019 INFO [IPC Server handler 10 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1441819802872_0002_m_000091_0 is : 0.0
2015-09-09 18:09:47,021 FATAL [IPC Server handler 6 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1441819802872_0002_m_000091_0 - exited : java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0002_m_000091_0 completed successfully
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2015-09-09 18:09:47,021 INFO [IPC Server handler 6 on 52683] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1441819802872_0002_m_000091_0: Error: java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0002_m_000091_0 completed successfully
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2015-09-09 18:09:47,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1441819802872_0002_m_000091_0: Error: java.io.IOException: Not all tasks running in mapper attempt_1441819802872_0002_m_000091_0 completed successfully
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.runWorkUnits(MRJobLauncher.java:724)
at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:622)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)




Yinan Li

Sep 9, 2015, 6:43:19 PM
to gobblin-users
Hi Mark,

The container logs you posted here do not show the actual exception, so they are not very useful for figuring out what went wrong. Can you post the complete logs of the mappers that failed?

Yinan

Mark Sorbello

Sep 10, 2015, 3:12:54 PM
to gobblin-users
Thanks for pointing me in the right direction. I found the relevant exception stack trace in the MapReduce job logs.

Thanks.

jh...@kochava.com

Oct 21, 2015, 5:35:45 PM
to gobblin-users
Hi Mark,

I got the same error running my test as you did.
Can you give me a hint as to how you eventually fixed this?

ERROR [AbstractJobLauncher] Failed to launch and run job job_kafka-gobblin-hdfs-test_1445462773096: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_kafka-gobblin-hdfs-test_1445462773096

java.io.IOException: Failed to commit dataset state for some dataset(s) 

jh...@kochava.com

Oct 21, 2015, 5:39:09 PM
to gobblin-users
FYI, I'm trying to migrate from Camus to Gobblin as part of a Druid system.
I think I have the same goals that you have.
I'm on day 2 of playing with Gobblin, so any hints you could provide would be deeply appreciated.
Cheers!

Johnny

Ziyang Liu

Oct 21, 2015, 6:24:29 PM
to gobblin-users
Hi Johnny, could you post the full stack trace of the "Failed to commit dataset state for some dataset(s)" error, as well as some of the logs before it?
Also, if there is any mapper failure, please post the exceptions from the mapper log.



jh...@kochava.com

Oct 21, 2015, 7:18:00 PM
to gobblin-users
Ziyang,
I posted a tar of logs with exceptions.
Let me know if you need more info.
Thank you!
...
logs.tar.gz

Mark

Oct 21, 2015, 7:51:24 PM
to gobblin-users
I don't recall the exact issue right now, though you can find my working Gobblin example here:

Mark

Oct 21, 2015, 9:39:09 PM
to gobblin-users
From memory, I was getting an exception in my MapReduce logs because of problems parsing some invalid data I had added to my Kafka topic (thrown in KafkaJsonConverter.java), which then resulted in the IOException stack trace I gave above. It actually took some digging for me to find the MapReduce logs.

Generally, I was not quick enough to drill down to the MapReduce logs once the job was launched via Hadoop, and the logging configuration in my Ambari cluster meant that the logs were getting deleted before I could read them.

So:

jh...@kochava.com

Oct 22, 2015, 12:24:46 PM
to gobblin-users
Hi Mark,
OK, that makes sense. I'll take another look at it today.
Thanks also for the hint about the logs. Like you, I couldn't find any logs that gave more info than what you posted.
Was the parsing error due to invalid JSON? Or was it part of the Avro conversion process?

Mark

Oct 22, 2015, 1:47:33 PM
to gobblin-users
While I was sometimes able to get my MapReduce logs from the MapReduce JobHistory Server (http://localhost:19888), I found the yarn logs tool really helpful: "yarn logs -applicationId application_1414530900704_0007". When I was desperate, I used a simple recursive grep, "grep -r 'texthere' /", to find matches. I also stumbled over this YouTube video, which you might find helpful: Map Reduce Debugging with print statement


The particular issue I described above occurred because I was expecting a certain type of JSON message, with certain fields, in my Kafka topic. Unfortunately, I had created a couple of random messages that did not conform to the JSON schema I was expecting, and my code did not handle this gracefully.
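
A minimal sketch of handling non-conforming messages more gracefully, for illustration only. It is not the code from my converter and it does not use any Gobblin API: the class name TolerantConverter and its methods are hypothetical, and Integer::parseInt stands in for whatever JSON parsing the converter actually does. The idea is simply to catch the parse failure per record, count it, and move on, instead of letting one bad message fail the whole mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper: wraps a record parser so one bad record does not fail the whole task. */
public class TolerantConverter<T> {
    private final Function<String, T> parser;
    private int skipped = 0;

    public TolerantConverter(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Returns the parsed record, or empty if the parser threw (the failure is counted and logged). */
    public Optional<T> convert(String raw) {
        try {
            return Optional.ofNullable(parser.apply(raw));
        } catch (RuntimeException e) {
            skipped++;
            System.err.println("Skipping malformed record: " + e);
            return Optional.empty();
        }
    }

    /** Converts every record, keeping only the ones that parsed. */
    public List<T> convertAll(List<String> rawRecords) {
        List<T> out = new ArrayList<>();
        for (String raw : rawRecords) {
            convert(raw).ifPresent(out::add);
        }
        return out;
    }

    /** Number of records whose parse threw an exception. */
    public int getSkipped() {
        return skipped;
    }

    public static void main(String[] args) {
        // Integer::parseInt is a stand-in for a real JSON parser.
        TolerantConverter<Integer> converter = new TolerantConverter<>(Integer::parseInt);
        List<Integer> good = converter.convertAll(Arrays.asList("1", "oops", "3"));
        System.out.println(good + " skipped=" + converter.getSkipped());
    }
}
```

Whether silently skipping bad records is acceptable depends on the pipeline; logging the count at least makes the data loss visible.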

jh...@kochava.com

Oct 26, 2015, 1:54:52 PM
to gobblin-users
Hi Mark,
I'm still having problems getting the MR job to run.
I'm curious, how does your job figure out where all the Hadoop properties are?
I think there's some issue where certain Hadoop classes are not being located, so I just wonder how you got your job to point to the right Hadoop configs, etc.

jh...@kochava.com

Oct 26, 2015, 5:04:50 PM
to gobblin-users
Hi Mark,
I actually had a breakthrough and got your code to work.
I had to modify the jar path and give the specific jars that it's calling.
I had also deleted the schema in your pull file. Big mistake.
Anyway, all is well!
Thanks for all your help!

Vamsikrushna L

Jan 6, 2016, 1:20:01 AM
to gobblin-users
Hi,

I am getting the same error.
Please let me know how to fix it.

Thanks in advance!

Sahil Takiar

Jan 6, 2016, 3:26:54 PM
to Vamsikrushna L, gobblin-users
@Vamsi, I believe your issue was resolved in another thread. However, I have added this exception and a resolution to the Gobblin FAQ page: https://github.com/linkedin/gobblin/wiki/FAQs#resolve-gobblin-on-mr-exception-ioexception-not-all-tasks-running-in-mapper-attempt_id-completed-successfully

--Sahil
