HadoopDruidIndexer - Hadoop HDFS Chmod Issues


Peter Thai

Jun 13, 2014, 2:21:37 AM6/13/14
to druid-de...@googlegroups.com
Hello!
I've been trying to get the HadoopDruidIndexer to work with my cluster, but to no avail.

After digging around, it seems like it may be a version issue between my version of Hadoop and the one Druid expects.

Thanks in advance!!

Here's how I'm starting my index task:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath services/target/*:services/target/classes:lib/*:/etc/hadoop/conf/ io.druid.cli.Main index hadoop macrosense/hadoopindexer.specFile

Here's the error:

2014-06-13 06:17:00,871 WARN [main] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-06-13 06:17:00,894 INFO [main] io.druid.indexer.path.StaticPathSpec - Adding paths[s3n://processing/input/]
2014-06-13 06:17:03,622 INFO [main] io.druid.indexer.JobHelper - Uploading jar to path[/tmp/classpath/druid-services-0.6.122-SNAPSHOT.jar]
2014-06-13 06:17:03,806 INFO [main] io.druid.indexer.path.StaticPathSpec - Adding paths[s3n://macrosense-rtb-processing/input/]
2014-06-13 06:17:03,818 INFO [main] org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-06-13 06:17:03,819 INFO [main] org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-06-13 06:17:03,843 WARN [main] org.apache.hadoop.mapreduce.JobSubmitter - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2014-06-13 06:17:03,850 INFO [main] org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/user/sense367716156/.staging/job_local367716156_0001
2014-06-13 06:17:03,850 WARN [main] org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:sense (auth:SIMPLE) cause:org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access ‘/user/sense367716156/.staging/job_local367716156_0001’: No such file or directory

2014-06-13 06:17:03,852 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:113)
at io.druid.cli.Main.main(Main.java:92)
Caused by: java.lang.RuntimeException: org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access ‘/user/sense367716156/.staging/job_local367716156_0001’: No such file or directory

at com.google.common.base.Throwables.propagate(Throwables.java:160)
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:202)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:57)
at io.druid.cli.Main.main(Main.java:92)
... 6 more
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access ‘/user/sense367716156/.staging/job_local367716156_0001’: No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:179)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:120)
... 11 more


Here's my hadoop version info:

Hadoop 2.3.0-cdh5.0.0

Nishant Bangarwa

Jun 13, 2014, 5:01:03 AM6/13/14
to druid-de...@googlegroups.com
Hi Peter, 
it looks like either the indexer doesn't have permission to access the files on HDFS, or the input file is not present in the directory.
Can you check that the input file exists and has the proper HDFS permissions?

For using Druid with Hadoop 2.3.0-cdh5.0.0, you can set the hadoopCoordinates in the task to point to the hadoop-client 2.3.0-cdh5.0.0 version.
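A hedged sketch of what that might look like in an index_hadoop task (the Maven coordinate is an assumption derived from the CDH version Peter reported, and the exact placement of this field can vary across Druid versions; the rest of the task spec is elided):

```json
{
  "type": "index_hadoop",
  "hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.3.0-cdh5.0.0"
}
```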


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/c6d09cfa-af63-4e30-9262-aa78adf56436%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Peter Thai

Jun 13, 2014, 1:50:23 PM6/13/14
to druid-de...@googlegroups.com
Hi Nishant,

Thanks for your reply.

It looks like the folder "/user/sense367716156/" is not being created. The user submitting the job is 'sense', but I don't know where the '367716156' is coming from.
Am I correct in thinking that all I have to do to configure the HadoopDruidIndexer for my Hadoop setup is include the path to the hadoop-conf/ folder in the java classpath when I run the indexer?
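As a rough sketch of that wiring (paths are illustrative assumptions, not Peter's actual layout): the Hadoop conf directory just needs to be a classpath entry so the client can read core-site.xml and mapred-site.xml, while `hadoop classpath` appends the cluster's own jars at launch time:

```shell
# Sketch only: building the indexer classpath so the Hadoop client
# picks up the cluster configuration. HADOOP_CONF_DIR is an assumed location.
HADOOP_CONF_DIR=/etc/hadoop/conf
DRUID_CP="services/target/*:services/target/classes:lib/*"

# The conf dir must be a classpath entry (a directory), so that
# core-site.xml / mapred-site.xml are visible to the Hadoop client.
CLASSPATH="${DRUID_CP}:${HADOOP_CONF_DIR}"

# At launch, append the cluster's jars via `hadoop classpath`:
echo "java -classpath ${CLASSPATH}:\$(hadoop classpath) io.druid.cli.Main index hadoop <specFile>"
```

Without the conf directory on the classpath, the Hadoop client falls back to its defaults, which can mean running against the local filesystem instead of the cluster.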

Also, I found from digging around in other tickets that hadoop 2.3.0 is already the default in the hadoopDruidIndexer... so that should not be a problem.

-pt

Peter Thai

Jun 13, 2014, 5:24:42 PM6/13/14
to druid-de...@googlegroups.com
I adjusted the pathSpec to point to only one file on S3, and I'm making progress. I was also able to get the job submitted to my cluster after adding the correct hadoop-conf folder and `hadoop classpath` into the indexer's classpath.

Thanks for the tips!

Now I'm faced with yet another error. After the job starts, the map tasks fail with:

2014-06-13 20:58:52,572 INFO [main] org.apache.hadoop.mapred.JobClient - Task Id : attempt_201406132057_0001_m_000003_8, Status : FAILED
Error: com.google.inject.CreationException: Guice creation errors:

1) No implementation for io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=, uniqueId=17, type=MAPBINDER) was bound.
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=, uniqueId=17, type=MAPBINDER)
  at com.google.inject.multibindings.MapBinder$RealMapBinder$1.initialize(MapBinder.java:380)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:38)

2) No implementation for io.druid.query.QueryToolChest annotated with @com.google.inject.multibindings.Element(setName=, uniqueId=3, type=MAPBINDER) was bound.
  while locating io.druid.query.QueryToolChest annotated with @com.google.inject.multibindings.Element(setName=, uniqueId=3, type=MAPBINDER)
  at com.google.inject.multibindings.MapBinder$RealMapBinder$1.initialize(MapBinder.java:380)
  at io.druid.guice.DruidBinders.queryToolChestBinder(DruidBinders.java:45)

After they all fail, the job client reports:

2014-06-13 20:59:00,641 INFO [main] org.apache.hadoop.mapred.JobClient - Job complete: job_201406132057_0001
2014-06-13 20:59:01,285 INFO [main] org.apache.hadoop.mapred.JobClient - Counters: 7
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -   Job Counters 
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Failed map tasks=1
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Launched map tasks=40
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Rack-local map tasks=40
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Total time spent by all maps in occupied slots (ms)=89022
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Total time spent by all reduces in occupied slots (ms)=0
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Total time spent by all maps waiting after reserving slots (ms)=0
2014-06-13 20:59:01,288 INFO [main] org.apache.hadoop.mapred.JobClient -     Total time spent by all reduces waiting after reserving slots (ms)=0
2014-06-13 20:59:01,291 ERROR [main] io.druid.indexer.DetermineHashedPartitionsJob - Job failed: job_201406132057_0001
2014-06-13 20:59:01,291 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/geo3/2014-06-13T205738.417Z]
2014-06-13 20:59:01,319 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:113)
at io.druid.cli.Main.main(Main.java:92)
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.DetermineHashedPartitionsJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:155)
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:57)
at io.druid.cli.Main.main(Main.java:92)
... 6 more

My job's classpath looks like this:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath processing/target/*:services/target/*:services/target/classes:lib/*:/sense/hadoop-clusters/two-nodes/:`hadoop classpath` io.druid.cli.Main index hadoop macrosense/hadoopindexer.specFile

I would appreciate any insight!!
Thanks :) 

Nishant Bangarwa

Jun 16, 2014, 3:02:21 AM6/16/14
to druid-de...@googlegroups.com
Hi Peter, 

Can you try explicitly adding only the druid-services-selfcontained jar to the classpath? Also remove the processing jar from the classpath; the selfcontained jar should have all the required dependencies from Druid.
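A sketch of that suggestion (the jar name follows the pattern of the `druid-services-0.6.122-SNAPSHOT.jar` upload in the earlier log and is an assumption; adjust it to whatever your build actually produces):

```shell
# Sketch only: classpath with just the selfcontained services jar,
# the Hadoop conf dir, and the cluster's jars -- no processing jar.
# DRUID_JAR is an assumed filename; HADOOP_CONF_DIR an assumed location.
DRUID_JAR="services/target/druid-services-0.6.122-SNAPSHOT-selfcontained.jar"
HADOOP_CONF_DIR=/etc/hadoop/conf

CLASSPATH="${DRUID_JAR}:${HADOOP_CONF_DIR}"
echo "java -Xmx256m -classpath ${CLASSPATH}:\$(hadoop classpath) io.druid.cli.Main index hadoop <specFile>"
```

The idea is to avoid mixing two copies of the Druid classes (the exploded `target/classes` plus per-module jars) on the classpath, which can produce Guice binding errors like the ones above.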


