Using all available cores with -r local (and comparison with Spark)


Luis Quesada

May 30, 2017, 11:29:55 AM
to mrjob
Dear all,

I am new to mrjob. I apologise in advance if this question has been asked multiple times.

I am using mrjob 0.5.10 on macOS Sierra. How can I make mrjob use all the available cores with -r local? Even though I have 4 cores, mrjob only seems to be using two threads of execution.

On a separate topic, when it comes to comparing performance with Spark: my understanding is that if there is only one MapReduce iteration (as in my naive implementation of all-pairs shortest path, attached), Spark would not be much faster. Is that correct?

Thanks in advance for your answers!

Cheers,
Luis 

from mrjob.job import MRJob
import networkx as nx
import pickle
import time


class APSP(MRJob):

    def mapper(self, _, line):
        # Note: this reloads the pickled graph for every input line;
        # mapper_init would be a better place to load it once per task.
        with open("/Users/lquesada/Dropbox/hadoop/mrjob/apsp/G.pk", "rb") as f:
            G = pickle.load(f)
        s = int(line.strip())
        length = nx.single_source_dijkstra_path_length(G, s)
        for t in length:
            yield (s, t), length[t]

    def reducer(self, key, weights):
        # Tuple parameters (def reducer(self, (s, t), wI)) are Python 2
        # only, and wI.next() is the Python 2 iterator protocol;
        # unpack inside the body and use next() instead.
        s, t = key
        yield (s, t), next(weights)


if __name__ == '__main__':
    start = time.time()
    APSP.run()
    with open('output.txt', 'w') as f:
        f.write('total time ' + str(time.time() - start))
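(The per-source step the mapper delegates to networkx can be checked standalone. Here is a minimal pure-Python Dijkstra sketch standing in for `nx.single_source_dijkstra_path_length`, on a tiny hypothetical adjacency dict, just to illustrate what each mapper call computes:)

```python
import heapq

def sssp_lengths(adj, s):
    """Shortest-path lengths from source s via Dijkstra.
    adj maps node -> list of (neighbor, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph: 0 -1- 1 -2- 2, plus a direct edge 0 -4- 2.
adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2)],
    2: [(1, 2), (0, 4)],
}
print(sssp_lengths(adj, 0))  # {0: 0, 1: 1, 2: 3}
```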