Before the Hadoop cluster was Kerberized, Druid Hadoop ingestion worked fine. The ingestion jobs started failing after the cluster was Kerberized.
All Druid nodes (MiddleManager, Overlord, Historical, Broker, Coordinator) have been verified to reach the KDC, and normal Hadoop commands work fine from those machines.
I see the error below in the MiddleManager log after ingestion is kicked off. The MiddleManager talks to HDFS and obtains the Kerberos delegation token to launch the job; the application then fails after two AM attempts.
Can anyone help with this issue?
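For context, this is roughly how I verified Kerberos authentication from each Druid host (the keytab path and realm below are placeholders, not our real values; the user name is the one that appears in the delegation token):

```shell
# On each Druid host, as the service user, confirm a TGT can be obtained
# from the keytab and that plain HDFS access works.
# Keytab path and realm are placeholders.
kinit -kt /etc/security/keytabs/pp_dt_fpti_batch.keytab pp_dt_fpti_batch@EXAMPLE.COM
klist                      # shows the obtained TGT and its expiry
hdfs dfs -ls /sys/dt/fpti  # plain HDFS access succeeds with Kerberos auth
```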
[INFO ] 2017-01-24 04:39:47.589 [task-runner-0-priority-0] FileInputFormat - Total input paths to process : 50
[INFO ] 2017-01-24 04:39:47.959 [task-runner-0-priority-0] JobSubmitter - number of splits:50
[INFO ] 2017-01-24 04:39:48.076 [task-runner-0-priority-0] JobSubmitter - Submitting tokens for job: job_1484691336524_131839
[INFO ] 2017-01-24 04:39:48.076 [task-runner-0-priority-0] JobSubmitter - Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:stampy, Ident: (HDFS_DELEGATION_TOKEN token 1028122 for pp_dt_fpti_batch)
[INFO ] 2017-01-24 04:39:48.213 [task-runner-0-priority-0] YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
[INFO ] 2017-01-24 04:39:50.185 [task-runner-0-priority-0] YarnClientImpl - Application submission is not finished, submitted application application_1484691336524_131839 is still in NEW
[INFO ] 2017-01-24 04:39:56.209 [task-runner-0-priority-0] YarnClientImpl - Application submission is not finished, submitted application application_1484691336524_131839 is still in NEW
[INFO ] 2017-01-24 04:39:57.817 [task-runner-0-priority-0] YarnClientImpl - Submitted application application_1484691336524_131839
[INFO ] 2017-01-24 04:39:57.855 [task-runner-0-priority-0] Job - Running job: job_1484691336524_131839
[INFO ] 2017-01-24 04:40:05.878 [task-runner-0-priority-0] Job - Job job_1484691336524_131839 running in uber mode : false
[INFO ] 2017-01-24 04:40:05.879 [task-runner-0-priority-0] Job - map 0% reduce 0%
[INFO ] 2017-01-24 04:40:05.890 [task-runner-0-priority-0] Job - Job job_1484691336524_131839 failed with state FAILED due to: Application application_1484691336524_131839 failed 2 times due to AM Container for appattempt_1484691336524_131839_000002 exited with exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_1484691336524_131839_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is pp_dt_fpti_batch
main : requested yarn user is pp_dt_fpti_batch
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
[INFO ] 2017-01-24 04:40:05.910 [task-runner-0-priority-0] Job - Counters: 0
[ERROR] 2017-01-24 04:40:05.911 [task-runner-0-priority-0] DetermineHashedPartitionsJob - Job failed: job_1484691336524_131839
[INFO ] 2017-01-24 04:40:05.912 [task-runner-0-priority-0] JobHelper - Deleting path[hdfs://stampy/sys/dt/fpti/druid/workingPath/druid_ingest/2017-01-24T043937.904Z_dd58b1f1d85f48b4a1c8f5e51e65d8ee]
[ERROR] 2017-01-24 04:40:05.935 [task-runner-0-priority-0] ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=druid_ingest-17-01-16-16-10zN9oKA, type=index_hadoop, dataSource=druid_ingest}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.jar:0.9.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_73]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_73]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.DetermineHashedPartitionsJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_73]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_73]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
[INFO ] 2017-01-24 04:40:05.944 [task-runner-0-priority-0] TaskRunnerUtils - Task [druid_ingest-17-01-16-16-10zN9oKA] status changed to [FAILED].
[INFO ] 2017-01-24 04:40:05.947 [task-runner-0-priority-0] ExecutorLifecycle - Task completed with status: {
"id" : "druid_ingest-17-01-16-16-10zN9oKA",
"status" : "FAILED",
"duration" : 22621
}
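The YARN diagnostics above only show exit code 1. If it helps, I can pull the failed AM container's own stdout/stderr with the standard log-aggregation command (assuming log aggregation is enabled on the cluster) and attach the output:

```shell
# Fetch aggregated logs for the failed application; the AM container's
# stderr usually shows the real reason it exited with code 1.
yarn logs -applicationId application_1484691336524_131839
```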