Hadoop command fails with python3 & works with python 2.7

110 views
Skip to first unread message

bhoot...@gmail.com

unread,
Apr 1, 2017, 11:48:46 AM4/1/17
to mrjob
I have a macbook pro & i have installed hadoop 2.7.3 on it following this :

I am trying to run hadoop MRJob command via python3 & it is giving me this error:.

bhoots21304s-MacBook-Pro:2.7.3 bhoots21304$ python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt
No configs found; falling back on auto-configuration
Looking for hadoop binary in /usr/local/Cellar/hadoop/2.7.3/bin...
Found hadoop binary: /usr/local/Cellar/hadoop/2.7.3/bin/hadoop
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /usr/local/Cellar/hadoop/2.7.3...
Found Hadoop streaming jar: /usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
Creating temp directory /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/Mr_Jobs.bhoots21304.20170328.165022.965610
Copying local files to hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/...
Running step 1 of 1...
 Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 packageJobJar: [/var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/hadoop-unjar5078580082326840824/] [] /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/streamjob2711596457025539343.jar tmpDir=null
 Connecting to ResourceManager at /0.0.0.0:8032
 Connecting to ResourceManager at /0.0.0.0:8032
 Total input paths to process : 1
 number of splits:2
 Submitting tokens for job: job_1490719699504_0003
 Submitted application application_1490719699504_0003
 Running job: job_1490719699504_0003
 Job job_1490719699504_0003 running in uber mode : false
  map 0% reduce 0%
 Task Id : attempt_1490719699504_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

 Task Id : attempt_1490719699504_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Problem is if i run the same command with python2.7 then it runs fine & shows me the correct output.

Python3 is added in bash_profile.

        export JAVA_HOME=$(/usr/libexec/java_home)

export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/bin:/usr/local/sbin:$PATH

# Setting PATH for Python 2.6
PATH="/System/Library/Frameworks/Python.framework/Versions/2.6/bin:${PATH}"
export PATH

# Setting PATH for Python 2.7
PATH="/System/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH

# added by Anaconda2 4.2.0 installer
export PATH="/Users/bhoots21304/anaconda/bin:$PATH"

export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
export PATH=$HADOOP_HOME/bin:$PATH

export HIVE_HOME=/usr/local/Cellar/hive/2.1.0/libexec
export PATH=$HIVE_HOME:$PATH

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/libexec/share/hadoop/common 
export PATH=$HADOOP_COMMON_LIB_NATIVE_DIR:$PATH

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/libexec/share/hadoop"
export PATH=$HADOOP_OPTS:$PATH

export PYTHONPATH="$PYTHONPATH:/usr/local/Cellar/python3/3.6.1/bin"

# Setting PATH for Python 3.6
# The original version is saved in .bash_profile.pysave
PATH="/usr/local/Cellar/python3/3.6.1/bin:${PATH}"
export PATH


This is my MR_Jobs.py:

    #!/usr/local/Cellar/python3/3.6.1/bin/python3
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

def mapper(self, _, line):
for word in WORD_RE.findall(line):
yield (word.lower(), 1)

def combiner(self, word, counts):
yield (word, sum(counts))

def reducer(self, word, counts):
yield (word, sum(counts))


if __name__ == '__main__':
MRWordFreqCount.run()

&&

I am running it on hadoop using this command:

/usr/local/Cellar/python3/3.6.1/bin/python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt

If i run the same file using the above mentioned command on my ubuntu machine..it works but when i run the same thing on my mac machine it gives me an error.

Here are the logs from my mac machine :

    + __mrjob_PWD=/tmp/nm-local-
    dir/usercache/bhoots21304/appcache/application_1490719699504_0005/ 
    container_1490719699504_0005_01_000010
  + exec
+ python3 -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python3: command not found

Reply all
Reply to author
Forward
0 new messages