Using all available cores with -r local (and comparison with Spark)


Luis Quesada

May 30, 2017, 11:29:55 AM
to mrjob
Dear all,

I am new to mrjob. I apologise in advance if this question has been asked multiple times.

I am using mrjob 0.5.10 on macOS Sierra. How can I make mrjob use all the available cores with -r local? Even though I have 4 cores, mrjob only seems to be using two threads of execution.

On a separate topic, when it comes to comparing performance with Spark: my understanding is that if there is only one MapReduce iteration (as in my naive implementation of all-pairs shortest path, attached), Spark would not be much faster. Is that correct?

Thanks in advance for your answers!

Cheers,
Luis 

from mrjob.job import MRJob
import networkx as nx
import pickle
import time


class APSP(MRJob):

    def mapper(self, _, line):
        # Note: this reloads the pickled graph for every input line;
        # mapper_init would be a better place to load it once per task.
        with open("/Users/lquesada/Dropbox/hadoop/mrjob/apsp/G.pk", "rb") as f:
            G = pickle.load(f)
        s = int(line.strip())
        length = nx.single_source_dijkstra_path_length(G, s)
        for t in length:
            yield (s, t), length[t]

    def reducer(self, key, weights):
        # Tuple parameters (def reducer(self, (s, t), wI)) are Python 2
        # only, and wI.next() is the Python 2 iterator protocol;
        # unpack inside the body and use next() instead.
        s, t = key
        yield (s, t), next(weights)


if __name__ == '__main__':
    start = time.time()
    APSP.run()
    with open('output.txt', 'w') as f:
        f.write('total time ' + str(time.time() - start))
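(The per-source step the mapper delegates to networkx can be checked standalone. Here is a minimal pure-Python Dijkstra sketch standing in for `nx.single_source_dijkstra_path_length`, on a tiny hypothetical adjacency dict, just to illustrate what each mapper call computes:)

```python
import heapq

def sssp_lengths(adj, s):
    """Shortest-path lengths from source s via Dijkstra.
    adj maps node -> list of (neighbor, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph: 0 -1- 1 -2- 2, plus a direct edge 0 -4- 2.
adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2)],
    2: [(1, 2), (0, 4)],
}
print(sssp_lengths(adj, 0))  # {0: 0, 1: 1, 2: 3}
```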