Fail to mkdir


Chan 乔伊

Dec 4, 2016, 1:08:20 PM
to mrjob
Hi, everybody:
   This is my first time using mrjob, but I encountered the following problem when running the relevant Python script with mrjob:

  No configs found; falling back on auto-configuration
Looking for hadoop binary in /home/work/alex/tools/hadoop-client-1.5.5/hadoop/bin...
Found hadoop binary: /home/work/alex/tools/hadoop-client-1.5.5/hadoop/bin/hadoop
Creating temp directory /tmp/simrank_mr.work.20161204.050846.350418
Using Hadoop version 2
STDERR: 16/12/04 13:08:48 INFO common.UpdateService: ZkstatusUpdater to hn01-lp-hdfs.dmop.ac.com:54310 started
STDERR: mkdir: cannot create directory -p: File exists
STDERR: java.io.IOException: cannot create directory -p: File exists
STDERR:         at org.apache.hadoop.fs.FsShell.mkdir(FsShell.java:1020)
STDERR:         at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1934)
STDERR:         at org.apache.hadoop.fs.FsShell.run(FsShell.java:2259)
STDERR:         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
STDERR:         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
STDERR:         at org.apache.hadoop.fs.FsShell.main(FsShell.java:2331)
Traceback (most recent call last):
  File "simrank_mr.py", line 121, in <module>
    MRSimRank.run()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/job.py", line 429, in run
    mr_job.execute()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/job.py", line 447, in execute
    super(MRJob, self).execute()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/launch.py", line 158, in execute
    self.run_job()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/launch.py", line 228, in run_job
    runner.run()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/runner.py", line 481, in run
    self._run()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/hadoop.py", line 335, in _run
    self._upload_local_files_to_hdfs()
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/hadoop.py", line 362, in _upload_local_files_to_hdfs
    self.fs.mkdir(self._upload_mgr.prefix)
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/fs/composite.py", line 76, in mkdir
    return self._do_action('mkdir', path)
  File "/home/work/.jumbo/lib/python2.7/site-packages/mrjob-0.5.6-py2.7.egg/mrjob/fs/composite.py", line 63, in _do_action
    raise first_exception
IOError: Could not mkdir hdfs:///user/work/alex/tmp/cluster/mrjob/tmp/tmp/simrank_mr.work.20161204.050846.350418/files/
      
     Does anyone know how to solve this problem? Many thanks!

János Brezniczky

Dec 16, 2016, 8:47:37 PM
to mrjob
Hi Chan,

You'll need to create and apply a configuration file (mrjob.conf).

You can specify it with, e.g., the command-line switch --conf-path ./mrjob.conf.

I think it's easiest to experiment locally for starters. But if you go 'live', you need a fully capable user: create the AWS credentials (top right-hand corner of the AWS console), confirm the dialog, then use those credentials in the configuration file.
See aws_access_key_id and aws_secret_access_key. If you are new to this, I'd also mention that setting the region to your preferred location makes it easier to find your running jobs, since AWS filters them by region, and the default US one isn't everyone's first choice.
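
For reference, a minimal mrjob.conf might look something like the sketch below. This is only a sketch: the credential values and the hadoop_bin path are placeholders, and the exact option names depend on your mrjob version (in the 0.5.x docs the EMR region option is spelled aws_region; it was renamed later), so check them against the documentation for the version you have installed:

```yaml
runners:
  emr:
    # Placeholders -- substitute your own AWS credentials.
    aws_access_key_id: YOUR_ACCESS_KEY_ID
    aws_secret_access_key: YOUR_SECRET_ACCESS_KEY
    # Pick your preferred region so the console shows your jobs.
    aws_region: eu-west-1
  hadoop:
    # Placeholder path -- point this at your own hadoop binary.
    hadoop_bin: /path/to/hadoop/bin/hadoop
```

You would then run the job with something like `python simrank_mr.py -r hadoop --conf-path ./mrjob.conf input.txt`.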

For more info you can check out the Runners topic in the mrjob documentation.

I hope this helps.

Janos