I'm getting an error when running an mrjob job on Amazon EMR. Other than comments, the beginning of the Python file looks like this:
import math
from heapq import heappush, heappop, heappushpop
from mrjob.job import MRJob
from mrjob.protocol import RawProtocol, JSONValueProtocol
The stderr log file is:
Traceback (most recent call last):
  File "mr.py", line 8, in <module>
    from heapq import heappush, heappop, heappushpop
ImportError: cannot import name heappushpop
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:372)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:582)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:477)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.
Later I SSH'd into an Amazon EMR instance and found Python 2.6.6 there, with heapq providing all of these functions. Any ideas?