subprocess.CalledProcessError: Command returned non-zero exit status 1

Manish Maheshwari

Aug 21, 2014, 2:06:34 PM

to mr...@googlegroups.com

Hi,

Following the mrjob guide, I created a sample job:

[cloudera@quickstart mrjob]$ cat mr_first_job.py

from mrjob.job import MRJob
class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1
    def reducer(self, key, values):
        yield key, sum(values)
if __name__ == '__main__':
    MRWordFrequencyCount.run()

Next, I uploaded a file to HDFS:

[cloudera@quickstart mrjob]$ hadoop fs -ls /user/cloudera/ngrams
Found 1 items
-rw-r--r-- 1 cloudera cloudera 9175040 2014-08-21 10:23 /user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv

But when I run the job on Hadoop, I get the error below:

[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop --hadoop-bin /usr/bin/hadoop --jobconf mapred.reduce.tasks=1 -o hdfs:///user/cloudera/output-mrjob hdfs:///user/cloudera/ngrams
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.180047.687023
writing wrapper script to /tmp/mr_first_job.cloudera.20140821.180047.687023/setup-wrapper.sh
STDERR: mkdir: `hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.180047.687023/files/': No such file or directory
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 238, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 265, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 273, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/lib/python2.6/site-packages/mrjob/fs/hadoop.py", line 109, in invoke_hadoop
    raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.180047.687023/files/']' returned non-zero exit status 1
[cloudera@quickstart mrjob]$
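The last line of that traceback comes from the wrapper, not from Hadoop itself: mrjob ran `hadoop fs -mkdir`, that process exited with status 1, and `invoke_hadoop` turned the exit status into a `CalledProcessError`. The real reason is in the `STDERR:` line just above the traceback. A minimal sketch of that run-then-raise pattern follows; `invoke_command` is a hypothetical name, and the `STDERR: ` prefix is an assumption based only on the log format seen here, not on mrjob's source.

```python
import subprocess
import sys


def invoke_command(args):
    """Hypothetical helper: run args, echo each stderr line with a
    'STDERR: ' prefix, and raise CalledProcessError on a non-zero exit."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = proc.communicate()
    # Surface the child's error output; this is where the real cause shows up.
    for line in stderr.splitlines():
        sys.stderr.write('STDERR: %s\n' % line)
    if proc.returncode != 0:
        # Only the exit status and the command survive in the exception.
        raise subprocess.CalledProcessError(proc.returncode, args)
    return stdout


if __name__ == '__main__':
    # Stand-in for the failing `hadoop fs -mkdir`: a child that prints an
    # error message and exits with status 1.
    failing = [sys.executable, '-c',
               'import sys; sys.stderr.write("mkdir: No such file or directory\\n"); sys.exit(1)']
    try:
        invoke_command(failing)
    except subprocess.CalledProcessError as e:
        print('exit status %d from %r' % (e.returncode, e.cmd))
```

So when reading a `CalledProcessError` like this one, the useful diagnostic is usually the preceding `STDERR:` output rather than the exception text.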

The Python code itself seems to be correct, since it runs fine in local mode:

[cloudera@quickstart mrjob]$ python mr_first_job.py mr_first_job.py
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.180158.291763
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper_part-00000
Counters from step 1:
(no counters found)
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper-sorted
> sort /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-mapper_part-00000
writing to /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-reducer_part-00000
Counters from step 1:
(no counters found)
Moving /tmp/mr_first_job.cloudera.20140821.180158.291763/step-0-reducer_part-00000 -> /tmp/mr_first_job.cloudera.20140821.180158.291763/output/part-00000
Streaming final output from /tmp/mr_first_job.cloudera.20140821.180158.291763/output
"chars" 308
"lines" 12
"words" 31
removing tmp directory /tmp/mr_first_job.cloudera.20140821.180158.291763
[cloudera@quickstart mrjob]$
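The local run goes through three phases visible in its log: the mapper writes `step-0-mapper_part-00000`, `sort` orders that output, and the reducer sums the values for each key. A pure-Python sketch of the same map, sort, and reduce flow is handy for checking the job's logic without Hadoop at all. `run_locally` is a hypothetical helper, and it only approximates mrjob's local runner (which sorts serialized text lines, not Python tuples).

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    # Same logic as MRWordFrequencyCount.mapper
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Same logic as MRWordFrequencyCount.reducer
    yield key, sum(values)


def run_locally(lines):
    # Map phase: apply the mapper to every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Sort phase: bring equal keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(key, (value for _, value in group)))
    return results


if __name__ == '__main__':
    for key, value in run_locally(["hello world", "mrjob on hadoop"]):
        print('"%s"\t%d' % (key, value))
```

Because local mode already exercises this logic successfully, the Hadoop failures point at the environment (HDFS paths, the `hadoop` binary) rather than at the job code.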

I also tried disabling the input path check:

[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths=false
Usage: mr_first_job.py [options] [input files]
mr_first_job.py: error: --check-input-paths option does not take a value
[cloudera@quickstart mrjob]$

Thanks for your help.

Manish

Anusha Rajan

Aug 21, 2014, 2:14:57 PM

to mr...@googlegroups.com

Hi Manish,

The check_input_paths option does not take a value, i.e. "--check-input-paths=false" is incorrect syntax.

Use "--check-input-paths" for true and "--no-check-input-paths" for false, as mentioned in the doc:

https://pythonhosted.org/mrjob/guides/configs-hadoopy-runners.html#options-available-to-hadoop-runner-only

For your case,

$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths

should work.

-anusha
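The error message Manish saw ("option does not take a value") is the wording Python's standard `optparse` module uses when a boolean flag is given an `=value`. A minimal sketch of how a paired on/off flag behaves, assuming an `optparse`-style parser; this is an illustration, not mrjob's own option-parsing code, and `make_parser` is a hypothetical helper.

```python
from optparse import OptionParser


def make_parser():
    # Hypothetical parser with a paired on/off flag sharing one dest.
    parser = OptionParser(usage='%prog [options] [input files]')
    parser.add_option('--check-input-paths', dest='check_input_paths',
                      action='store_true', default=True,
                      help='Check that input paths exist before running')
    parser.add_option('--no-check-input-paths', dest='check_input_paths',
                      action='store_false',
                      help='Skip the input path check')
    return parser


if __name__ == '__main__':
    opts, args = make_parser().parse_args(['--no-check-input-paths', 'input.csv'])
    print(opts.check_input_paths)
    # make_parser().parse_args(['--check-input-paths=false']) would instead
    # print usage plus "... error: --check-input-paths option does not take
    # a value" to stderr and exit with status 2.
```

Both flags write to the same destination, so the last one on the command line wins, and neither accepts an explicit value.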

Manish Maheshwari

Aug 21, 2014, 2:29:00 PM

to mr...@googlegroups.com

Hi Anusha,

Thanks for the quick reply.

I tried both options, and both still fail:

[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --check-input-paths
no configs found; falling back on auto-configuration
STDERR: Error: Could not find or load main class org.apache.hadoop.util.PlatformName
STDERR: Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 235, in _run
    self._check_input_exists()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 249, in _check_input_exists
    if not self.path_exists(path):
  File "/usr/lib/python2.6/site-packages/mrjob/fs/composite.py", line 78, in path_exists
    return self._do_action('path_exists', path_glob)
  File "/usr/lib/python2.6/site-packages/mrjob/fs/composite.py", line 62, in _do_action
    raise first_exception
IOError: Could not check path hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv
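The `IOError` only says the path could not be checked; the underlying failure is in the `STDERR:` lines, where the `hadoop` binary cannot load its own main classes. The traceback's `_do_action` ending in `raise first_exception` suggests a dispatch pattern like the minimal sketch below: try each filesystem in turn and, if all of them fail, re-raise the first error. All class names here are hypothetical, and this is not mrjob's actual implementation.

```python
class CompositeFilesystem(object):
    """Hypothetical sketch: dispatch a call to each wrapped filesystem in
    turn; if every one raises IOError, re-raise the first exception."""

    def __init__(self, *filesystems):
        self.filesystems = filesystems

    def _do_action(self, action, path, *args):
        first_exception = None
        for fs in self.filesystems:
            if not hasattr(fs, action):
                continue
            try:
                return getattr(fs, action)(path, *args)
            except IOError as e:
                # Remember only the first failure; later ones are dropped.
                if first_exception is None:
                    first_exception = e
        if first_exception is None:
            raise IOError("No filesystem supports %s" % action)
        raise first_exception

    def path_exists(self, path):
        return self._do_action('path_exists', path)


class BrokenHadoopFS(object):
    # Stands in for a Hadoop filesystem whose `hadoop` binary cannot start.
    def path_exists(self, path):
        raise IOError("Could not check path %s" % path)


class LocalFS(object):
    # Stands in for a local filesystem that rejects hdfs:// URIs.
    def path_exists(self, path):
        if path.startswith('hdfs://'):
            raise IOError("Not a local path: %s" % path)
        return False


if __name__ == '__main__':
    fs = CompositeFilesystem(BrokenHadoopFS(), LocalFS())
    try:
        fs.path_exists('hdfs:///user/cloudera/ngrams')
    except IOError as e:
        print(e)  # the BrokenHadoopFS error is reported, not the LocalFS one
```

In this pattern the final exception text carries no trace of why the Hadoop call failed, which is why the `STDERR:` lines are the place to look.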

[cloudera@quickstart mrjob]$ python mr_first_job.py -r hadoop hdfs:///user/cloudera/ngrams/googlebooks-eng-all-5gram-20090715-199.csv --no-check-input-paths
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_first_job.cloudera.20140821.182418.897267
writing wrapper script to /tmp/mr_first_job.cloudera.20140821.182418.897267/setup-wrapper.sh
STDERR: Error: Could not find or load main class org.apache.hadoop.util.PlatformName
STDERR: Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "mr_first_job.py", line 10, in <module>
    MRWordFrequencyCount.run()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 238, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 265, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/lib/python2.6/site-packages/mrjob/hadoop.py", line 273, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/lib/python2.6/site-packages/mrjob/fs/hadoop.py", line 109, in invoke_hadoop
    raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files/']' returned non-zero exit status 1

Since it's complaining about the mkdir command, I ran the same command by hand as the same user and got the output below:

[cloudera@quickstart mrjob]$ hadoop fs -mkdir hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files
mkdir: `hdfs:///user/cloudera/tmp/mrjob/mr_first_job.cloudera.20140821.182418.897267/files': No such file or directory
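The manual `mkdir` fails exactly as mrjob's did, which points at HDFS rather than at mrjob. One likely explanation, stated here as an assumption about this quickstart VM: on Hadoop 2.x, `hadoop fs -mkdir` does not create missing parent directories unless it is given `-p`, so the command fails while `mr_first_job.cloudera.20140821.182418.897267` (and possibly `tmp/mrjob`) does not exist yet. A minimal sketch of building the command with `-p`; `build_mkdir_cmd` and `mkdir_on_hdfs` are hypothetical helpers, not mrjob functions.

```python
import subprocess


def build_mkdir_cmd(hadoop_bin, path, create_parents=True):
    """Hypothetical helper: build a `hadoop fs -mkdir` command line.

    Assumption: on Hadoop 2.x, `-p` is needed to create missing parents.
    """
    cmd = [hadoop_bin, 'fs', '-mkdir']
    if create_parents:
        cmd.append('-p')
    cmd.append(path)
    return cmd


def mkdir_on_hdfs(hadoop_bin, path):
    # check_call raises CalledProcessError on a non-zero exit status,
    # the same exception type seen in the tracebacks in this thread.
    subprocess.check_call(build_mkdir_cmd(hadoop_bin, path))
```

If this is the cause, rerunning the manual command above with `-p` added should succeed where the plain `-mkdir` failed.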

In case it matters, HADOOP_HOME is set:

[cloudera@quickstart mrjob]$ env | grep HAD
HADOOP_HOME=/usr/lib/hadoop-0.20-mapreduce
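The failing runs used different binaries. The first passed `--hadoop-bin /usr/bin/hadoop` and its traceback shows `/usr/bin/hadoop`; the later runs passed no `--hadoop-bin` and invoked `/usr/lib/hadoop-0.20-mapreduce/bin/hadoop`, which sits under this `HADOOP_HOME`. Only those later runs show the "Could not find or load main class" errors. A minimal sketch of a lookup order that would produce this behavior; the order (explicit flag, then `$HADOOP_HOME/bin/hadoop`, then `hadoop` on `PATH`) is an assumption inferred from these logs, not confirmed from mrjob's source, and `resolve_hadoop_bin` is a hypothetical name.

```python
import os


def resolve_hadoop_bin(explicit_bin=None, environ=None):
    """Hypothetical sketch: choose which `hadoop` executable to run.

    Assumed order: an explicit --hadoop-bin value, then
    $HADOOP_HOME/bin/hadoop, then plain `hadoop` looked up on PATH.
    """
    if environ is None:
        environ = os.environ
    if explicit_bin:
        return explicit_bin
    hadoop_home = environ.get('HADOOP_HOME')
    if hadoop_home:
        return os.path.join(hadoop_home, 'bin', 'hadoop')
    return 'hadoop'
```

Under this assumption, passing `--hadoop-bin /usr/bin/hadoop` on every run (or unsetting `HADOOP_HOME`) would keep the job away from the `hadoop-0.20-mapreduce` launcher that cannot load its classes.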

Thanks,

Manish
