Hadoop command fails with python3 & works with python 2.7

bhoot...@gmail.com

Apr 1, 2017, 11:48:46 AM

to mrjob

I have a MacBook Pro and have installed Hadoop 2.7.3 on it, following this video:

https://www.youtube.com/watch?v=06hpB_Rfv-w

I am trying to run a Hadoop MRJob command with python3, and it gives me this error:

bhoots21304s-MacBook-Pro:2.7.3 bhoots21304$ python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt

No configs found; falling back on auto-configuration

Looking for hadoop binary in /usr/local/Cellar/hadoop/2.7.3/bin...

Found hadoop binary: /usr/local/Cellar/hadoop/2.7.3/bin/hadoop

Using Hadoop version 2.7.3

Looking for Hadoop streaming jar in /usr/local/Cellar/hadoop/2.7.3...

Found Hadoop streaming jar: /usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar

Creating temp directory /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/Mr_Jobs.bhoots21304.20170328.165022.965610

Copying local files to hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/...

Running step 1 of 1...

Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

packageJobJar: [/var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/hadoop-unjar5078580082326840824/] [] /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/streamjob2711596457025539343.jar tmpDir=null

Connecting to ResourceManager at /0.0.0.0:8032

Total input paths to process : 1

number of splits:2

Submitting tokens for job: job_1490719699504_0003

Submitted application application_1490719699504_0003

The url to track the job: http://bhoots21304s-MacBook-Pro.local:8088/proxy/application_1490719699504_0003/

Running job: job_1490719699504_0003

Job job_1490719699504_0003 running in uber mode : false

map 0% reduce 0%

Task Id : attempt_1490719699504_0003_m_000001_0, Status : FAILED

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127

at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)

at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)

at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)

at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Task Id : attempt_1490719699504_0003_m_000000_0, Status : FAILED

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127

at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)

at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)

at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)

at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The problem is that if I run the same command with python2.7, it runs fine and shows the correct output.

Python 3 is added in my .bash_profile:

export JAVA_HOME=$(/usr/libexec/java_home)

export PATH=/usr/local/bin:$PATH

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

# Setting PATH for Python 2.6

PATH="/System/Library/Frameworks/Python.framework/Versions/2.6/bin:${PATH}"

export PATH

# Setting PATH for Python 2.7

PATH="/System/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"

export PATH

# added by Anaconda2 4.2.0 installer

export PATH="/Users/bhoots21304/anaconda/bin:$PATH"

export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3

export PATH=$HADOOP_HOME/bin:$PATH

export HIVE_HOME=/usr/local/Cellar/hive/2.1.0/libexec

export PATH=$HIVE_HOME:$PATH

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/libexec/share/hadoop/common

export PATH=$HADOOP_COMMON_LIB_NATIVE_DIR:$PATH

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/libexec/share/hadoop"

export PATH=$HADOOP_OPTS:$PATH

export PYTHONPATH="$PYTHONPATH:/usr/local/Cellar/python3/3.6.1/bin"

# Setting PATH for Python 3.6

# The original version is saved in .bash_profile.pysave

PATH="/usr/local/Cellar/python3/3.6.1/bin:${PATH}"

export PATH

This is my Mr_Jobs.py:

    #!/usr/local/Cellar/python3/3.6.1/bin/python3

    from mrjob.job import MRJob
    import re

    WORD_RE = re.compile(r"[\w']+")


    class MRWordFreqCount(MRJob):

        def mapper(self, _, line):
            for word in WORD_RE.findall(line):
                yield (word.lower(), 1)

        def combiner(self, word, counts):
            yield (word, sum(counts))

        def reducer(self, word, counts):
            yield (word, sum(counts))


    if __name__ == '__main__':
        MRWordFreqCount.run()
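For reference, the counting logic of this job can be checked without Hadoop or mrjob. The following is only a minimal sketch in plain Python: the function name `word_freq` and the sample lines are made up for illustration, and it folds the mapper, combiner and reducer into a single pass rather than reproducing how mrjob actually schedules them.

```python
import re
from collections import Counter

# Same pattern as in the job above.
WORD_RE = re.compile(r"[\w']+")


def word_freq(lines):
    """Count lowercased words across lines (mapper and reducer in one pass)."""
    counts = Counter()
    for line in lines:
        for word in WORD_RE.findall(line):
            counts[word.lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical input, not the contents of File.txt.
    sample = ["The cat saw the dog", "the dog didn't care"]
    for word, n in sorted(word_freq(sample).items()):
        print(word, n)
```

If this gives the same counts under both interpreters, the word-count logic itself is unlikely to be what differs between python2.7 and python3.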

I am running it on Hadoop using this command:

/usr/local/Cellar/python3/3.6.1/bin/python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt

If I run the same file with the above command on my Ubuntu machine, it works, but when I run it on my Mac it gives me an error.

Here are the logs from my Mac:

+ __mrjob_PWD=/tmp/nm-local-dir/usercache/bhoots21304/appcache/application_1490719699504_0005/container_1490719699504_0005_01_000010

+ exec

+ python3 -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'

setup-wrapper.sh: line 6: python3: command not found
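The last log line matches the earlier "subprocess failed with code 127": in POSIX shells, exit status 127 means the command was not found. One possible reading is that the task's shell searches a PATH that lacks the directory added in .bash_profile. The sketch below illustrates that lookup with `shutil.which`. The `CONTAINER_PATH` value is an assumption for illustration only; the real PATH inside a YARN container depends on how the NodeManager was started.

```python
import shutil

# Hypothetical PATH for a container that never sourced ~/.bash_profile
# (an assumption; check the real value on the machine in question).
CONTAINER_PATH = "/usr/bin:/bin:/usr/sbin:/sbin"

# The same PATH with the Python 3.6 directory that .bash_profile prepends.
LOGIN_PATH = "/usr/local/Cellar/python3/3.6.1/bin:" + CONTAINER_PATH


def find_on_path(name, path):
    """Return the full path of `name` if a directory in `path` has it, else None."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    for label, path in [("container", CONTAINER_PATH), ("login shell", LOGIN_PATH)]:
        found = find_on_path("python3", path)
        print(label, "->", found or "not found (a shell would exit with 127)")
```

If the lookup fails only for the container-style PATH, mrjob's documented `python_bin` option (`--python-bin` on the command line) may be worth trying with the absolute interpreter path, though this has not been verified against this setup.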
