Dear all,
I am trying to run batch ingestion on EMR (index_hadoop). The job is submitted to the Overlord, but it fails at the MapReduce step on EMR. I am getting this error in the task logs:
2025-01-08T06:53:47,542 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
2025-01-08T06:53:47,554 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1736316777435_0001 failed with state FAILED due to: Application application_1736316777435_0001 failed 2 times due to AM Container for appattempt_1736316777435_0001_000002 exited with exitCode: 1
The druid-indexing-hadoop jar file is in my Druid pod, and the lib location is part of the classpath. I am getting this error in the EMR logs:
2025-01-08 06:53:46,620 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:545)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:525)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1790)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:525)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:310)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1745)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1676)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2428)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:223)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2332)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2426)
... 13 more
2025-01-08 06:53:46,622 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
Can someone please guide me here? Thanks!