subprocess.CalledProcessError: Command returned non-zero exit status 1


Manish Maheshwari

Aug 21, 2014, 2:06:34 PM
to mr...@googlegroups.com

Hi,

Following the mrjob guide, I have created a sample job:

[cloudera@quickstart mrjob]$ cat mr_first_job.py
from mrjob.job import MRJob
class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1
    def reducer(self, key, values):
        yield key, sum(values)
if __name__ == '__main__':
    MRWordFrequencyCount.run()

Next, I uploaded a file to HDFS:
[cloudera@quickstart mrjob]$ hadoop fs -ls /user/cloudera/ngrams
Found 1 items
-rw-r--r--   1 cloudera cloudera    9175040 2014-08-21 10:23 /user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv

But when I run the job on Hadoop, I get the error below:
[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop --hadoop-bin /usr/bin/hadoop --jobconf mapred.reduce.tasks=1 -o hdfs:///user/cloudera/output-mrjob hdfs:///user/cloudera/ngrams
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.180047.687023
writing wrapper script to /tmp/mr_first_job.cloudera.20140821.180047.687023/setup-wrapper.sh
STDERR: mkdir: `hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.180047.687023/files/': No such file or directory
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 238, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 265, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 273, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/lib/python2.6/site-packages/mrjob/fs/hadoop.py", line 109, in invoke_hadoop
    raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.180047.687023/files/']' returned non-zero exit status 1
[cloudera@quickstart mrjob]$
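
For context on what this traceback means: mrjob shells out to the hadoop binary and raises CalledProcessError whenever that subprocess exits non-zero, after echoing its stderr. A minimal sketch of the same mechanism, using a stand-in command instead of hadoop (the function name and stand-in command are illustrative, not mrjob's actual source):

```python
import subprocess
import sys

def invoke(args):
    """Run a command; raise CalledProcessError on a non-zero exit,
    mirroring how mrjob's invoke_hadoop surfaces hadoop failures."""
    proc = subprocess.Popen(args, stderr=subprocess.PIPE)
    _, stderr = proc.communicate()
    if proc.returncode != 0:
        # mrjob prints the command's stderr before raising
        print("STDERR: %s" % stderr.decode().strip())
        raise subprocess.CalledProcessError(proc.returncode, args)

try:
    # stand-in for ['/usr/bin/hadoop', 'fs', '-mkdir', ...]
    invoke([sys.executable, '-c', 'import sys; sys.exit("boom")'])
except subprocess.CalledProcessError as e:
    print("Command %r returned non-zero exit status %d" % (e.cmd, e.returncode))
```

So the CalledProcessError itself only says that the hadoop command failed; the real diagnostic is the STDERR line printed just above it.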


The Python code itself is correct, as it runs in local mode:

[cloudera@quickstart mrjob]$ python mr_first_job.py mr_first_job.py
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.180158.291763
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper-sorted
> sort /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper_part-00000
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
Moving /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-reducer_part-00000 -> /tmp/mr_first_job.cloudera.20140821.180158.291763/output/part-00000
Streaming final output from /tmp/mr_first_job.cloudera.20140821.180158.291763/output
"chars" 308
"lines" 12
"words" 31
removing tmp directory /tmp/mr_first_job.cloudera.20140821.180158.291763
[cloudera@quickstart mrjob]$


I also tried:
[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths=false
Usage: mr_first_job.py [options] [input files]
mr_first_job.py: error: --check-input-paths option does not take a value
[cloudera@quickstart mrjob]$


Thanks for your help.

Manish

Anusha Rajan

Aug 21, 2014, 2:14:57 PM
to mr...@googlegroups.com
Hi Manish,
The --check-input-paths option does not take a value, i.e. "--check-input-paths=false" is incorrect syntax.
Use "--check-input-paths" for true and "--no-check-input-paths" for false, as mentioned in the docs.

For your case,
$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths

should work.
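
For readers wondering why "=false" is rejected at all: boolean switches like this are typically declared as a flag/anti-flag pair, e.g. with optparse's store_true/store_false actions, and such flags take no argument. A hypothetical sketch under that assumption (the option names mirror mrjob's, but this is not its actual source):

```python
from optparse import OptionParser

# Illustrative store_true/store_false flag pair, similar in spirit
# to mrjob's --check-input-paths / --no-check-input-paths options.
parser = OptionParser()
parser.add_option('--check-input-paths', dest='check_input_paths',
                  action='store_true', default=True,
                  help='check that input paths exist before running (default)')
parser.add_option('--no-check-input-paths', dest='check_input_paths',
                  action='store_false',
                  help='skip the input path existence check')

opts, args = parser.parse_args(['--no-check-input-paths'])
print(opts.check_input_paths)  # False

# Passing '--check-input-paths=false' would make optparse exit with:
# "error: --check-input-paths option does not take a value"
```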

-anusha

--
You received this message because you are subscribed to the Google Groups "mrjob" group.

Manish Maheshwari

Aug 21, 2014, 2:29:00 PM
to mr...@googlegroups.com
Hi Anusha,

Thanks for the quick reply. 

I tried both options and they still error out, as below:

[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
STDERR: Error: Could not find or load main class org.apache.hadoop.util.PlatformName
STDERR: Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 235, in _run
    self._check_input_exists()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 249, in _check_input_exists
    if not self.path_exists(path):
  File "/usr/lib/python2.6/site-packages/mrjob/fs/composite.py", line 78, in path_exists
    return self._do_action('path_exists', path_glob)
  File "/usr/lib/python2.6/site-packages/mrjob/fs/composite.py", line 62, in _do_action
    raise first_exception
IOError: Could not check path hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv



[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --no-check-input-paths
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.182418.897267
writing wrapper script to /tmp/mr_first_job.cloudera.20140821.182418.897267/setup-wrapper.sh
STDERR: Error: Could not find or load main class org.apache.hadoop.util.PlatformName
STDERR: Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 238, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 265, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 273, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/lib/python2.6/site-packages/mrjob/fs/hadoop.py", line 109, in invoke_hadoop
    raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files/']' returned non-zero exit status 1


Since it is complaining about the mkdir command, I tried to execute it manually as the same user and got the output below:
[cloudera@quickstart mrjob]$ hadoop fs -mkdir hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files
mkdir: `hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files': No such file or directory
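
The manual mkdir failing the same way points at a likely root cause: on Hadoop 2 (which recent CDH quickstart VMs ship), `hadoop fs -mkdir` no longer creates missing parent directories unless `-p` is passed, so creating the deep `.../tmp/mrjob/.../files/` path in one shot fails with "No such file or directory". Newer mrjob releases account for this; as a workaround, pre-creating the prefix with `hadoop fs -mkdir -p hdfs:///user/cloudera/tmp/mrjob` may help. The local-filesystem analogue of the two behaviours:

```python
import os
import tempfile

root = tempfile.mkdtemp()
deep = os.path.join(root, 'tmp', 'mrjob', 'job', 'files')

# Like Hadoop 2's `fs -mkdir` without -p: fails when parents are missing.
try:
    os.mkdir(deep)
except OSError as e:
    print('mkdir failed: %s' % e)

# Like `fs -mkdir -p`: creates the whole chain of parent directories.
os.makedirs(deep)
print(os.path.isdir(deep))  # True
```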


Just in case it is relevant, HADOOP_HOME is set:
[cloudera@quickstart mrjob]$ env | grep HAD
HADOOP_HOME=/usr/lib/hadoop-0.20-mapreduce

Thanks,
Manish