I have a MacBook Pro, and I have installed Hadoop 2.7.3 on it following this:
I am trying to run an MRJob job on Hadoop via python3, and it gives me this error:
bhoots21304s-MacBook-Pro:2.7.3 bhoots21304$ python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt
No configs found; falling back on auto-configuration
Looking for hadoop binary in /usr/local/Cellar/hadoop/2.7.3/bin...
Found hadoop binary: /usr/local/Cellar/hadoop/2.7.3/bin/hadoop
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /usr/local/Cellar/hadoop/2.7.3...
Found Hadoop streaming jar: /usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
Creating temp directory /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/Mr_Jobs.bhoots21304.20170328.165022.965610
Copying local files to hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/...
Running step 1 of 1...
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/hadoop-unjar5078580082326840824/] [] /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/streamjob2711596457025539343.jar tmpDir=null
Total input paths to process : 1
number of splits:2
Submitting tokens for job: job_1490719699504_0003
Submitted application application_1490719699504_0003
Running job: job_1490719699504_0003
Job job_1490719699504_0003 running in uber mode : false
map 0% reduce 0%
Task Id : attempt_1490719699504_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Task Id : attempt_1490719699504_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
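The `subprocess failed with code 127` lines report the exit status of the streaming subprocess. In a POSIX shell, 127 is the status returned when the command to run cannot be found. A minimal sketch that reproduces that status by asking `/bin/sh` to run `python3` with a deliberately useless PATH (the helper name `exit_status_with_path` and the value `/nonexistent` are hypothetical, for illustration only):

```python
import subprocess


def exit_status_with_path(command, path):
    """Run `command` through /bin/sh with PATH set to `path`; return its exit status."""
    # /bin/sh is given as an absolute path so that starting the shell itself
    # does not depend on the PATH being restricted.
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        env={"PATH": path},
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,  # hide the shell's "not found" message
    )
    return result.returncode


if __name__ == "__main__":
    # No directory on this PATH contains python3, so the shell cannot resolve
    # the name and exits with 127 ("command not found").
    print(exit_status_with_path("python3 -c pass", "/nonexistent"))
```

This only demonstrates what 127 means; it does not show which PATH the Hadoop containers actually use.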
The problem is that if I run the same command with Python 2.7, it runs fine and shows me the correct output.
Python 3 is added to the PATH in my .bash_profile:
export JAVA_HOME=$(/usr/libexec/java_home)
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
# Setting PATH for Python 2.6
PATH="/System/Library/Frameworks/Python.framework/Versions/2.6/bin:${PATH}"
export PATH
# Setting PATH for Python 2.7
PATH="/System/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
# added by Anaconda2 4.2.0 installer
export PATH="/Users/bhoots21304/anaconda/bin:$PATH"
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
export HIVE_HOME=/usr/local/Cellar/hive/2.1.0/libexec
export PATH=$HIVE_HOME:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/libexec/share/hadoop/common
export PATH=$HADOOP_COMMON_LIB_NATIVE_DIR:$PATH
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/libexec/share/hadoop"
export PATH=$HADOOP_OPTS:$PATH
export PYTHONPATH="$PYTHONPATH:/usr/local/Cellar/python3/3.6.1/bin"
# Setting PATH for Python 3.6
# The original version is saved in .bash_profile.pysave
PATH="/usr/local/Cellar/python3/3.6.1/bin:${PATH}"
export PATH
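Each `PATH=...:$PATH` line in the profile above prepends, so the directory added last is searched first (and `/usr/local/bin` is prepended twice). A minimal sketch of a `which -a`-style lookup (the helper name `all_matches` is hypothetical) that lists every directory on a PATH string holding an executable with a given name, in lookup order; run from the interactive shell, it shows which `python3` that shell would pick:

```python
import os
import sys


def all_matches(name, path):
    """Return every executable file called `name` on `path`, in lookup order."""
    matches = []
    seen = set()
    for directory in path.split(os.pathsep):
        # Skip empty entries (a shell would treat them as the current
        # directory) and directories already checked, e.g. one prepended twice.
        if not directory or directory in seen:
            continue
        seen.add(directory)
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "python3"
    for match in all_matches(name, os.environ.get("PATH", "")):
        print(match)
```

Note that this reflects only the PATH of the shell it is run from, which is the environment `.bash_profile` sets up.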
This is my Mr_Jobs.py:
#!/usr/local/Cellar/python3/3.6.1/bin/python3
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordFreqCount.run()
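To confirm that the word-count logic itself works under Python 3, independently of Hadoop and mrjob, a minimal pure-Python sketch can push text through the same regex and the same mapper, combiner and reducer steps. The function names below are hypothetical, and the grouping step only imitates Hadoop's sort/shuffle; it is not mrjob's runner. The default of two splits mirrors the `number of splits:2` line in the log.

```python
import re
from collections import defaultdict
from itertools import chain

WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    # Same logic as MRWordFreqCount.mapper: emit (lowercased word, 1).
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def sum_counts(word, counts):
    # Shared body of MRWordFreqCount.combiner and MRWordFreqCount.reducer.
    yield word, sum(counts)


def group_by_key(pairs):
    # Stand-in for the shuffle: collect values per key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def word_freq(lines, num_splits=2):
    lines = list(lines)
    # Cut the input into contiguous chunks, one per simulated map task.
    size = max(1, -(-len(lines) // num_splits))
    splits = [lines[i:i + size] for i in range(0, len(lines), size)]
    combined = []
    for split in splits:
        # Map, then combine locally within each split.
        mapped = chain.from_iterable(mapper(line) for line in split)
        for word, counts in group_by_key(mapped):
            combined.extend(sum_counts(word, counts))
    # Reduce the combined partial counts across all splits.
    result = {}
    for word, counts in group_by_key(combined):
        for key, total in sum_counts(word, counts):
            result[key] = total
    return result


if __name__ == "__main__":
    print(word_freq(["The cat sat", "the cat's hat"]))
```

If this prints the expected counts under python3, the failure is more likely in how the Hadoop tasks launch Python than in the job code.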
I am running Mr_Jobs.py on Hadoop using this command:
/usr/local/Cellar/python3/3.6.1/bin/python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt
If I run the same file with the above command on my Ubuntu machine, it works, but when I run it on my Mac, it gives me an error.
Here are the logs from my Mac:
+ __mrjob_PWD=/tmp/nm-local-dir/usercache/bhoots21304/appcache/application_1490719699504_0005/container_1490719699504_0005_01_000010
+ exec
+ python3 -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python3: command not found
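The last log line shows that `setup-wrapper.sh` could not resolve `python3` by name. One way to check whether a name resolves outside the interactive shell is to compare `shutil.which` against the current PATH and against a reduced one. The reduced value `/usr/bin:/bin:/usr/sbin:/sbin` and the helper name `resolve` below are assumptions for illustration; this is not the PATH YARN actually gives its containers, which depends on the environment the NodeManager was started with.

```python
import os
import shutil
import sys

# Assumed, illustrative PATH for a non-login process; not read from YARN's config.
MINIMAL_PATH = "/usr/bin:/bin:/usr/sbin:/sbin"


def resolve(name, path):
    """Return the full path `name` resolves to on `path`, or None if it does not."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "python3"
    print("interactive PATH:", resolve(name, os.environ.get("PATH", "")))
    print("minimal PATH:    ", resolve(name, MINIMAL_PATH))
```

If the first line prints the Homebrew `python3` while the second prints `None`, that would be consistent with the container failing to find `python3` even though the interactive shell can.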