Hi Andreas,
On Monday, July 23, 2012 09:57:38 AM Andreas Weller wrote:
> I plan to use Jug to parallelize data-mining tasks on a SGE cluster.
Ok, great!
> Ideally, I would write a script in a map/reduce fashion where part of the
> script is to iterate over several massive files in parallel and collect
> relevant information into one dictionary per file (the map step), then
> combine those dictionaries (the reduce) and do other work on the result
> dictionary, all within one script.
Ok.
> I am confused now how to accomplish the collection and merging of results.
> Is the jug.mapreduce module the right one for me?
It is one possibility, yes.
> If yes, how do I implement the merging of dicts in the reduce step?
> Or do I need a reduce function decorated with @TaskGenerator?
No; jug.mapreduce takes plain, undecorated functions and creates the Tasks for you behind the scenes.
> global_results = {}
> targets = [target_file1, target_file2, target_file3]
>
> def map(filename):
>     local_results = {}
>     with open(filename):
>         # do iteration, collect relevant info
>     return local_results
>
> def reduce(local_results, global_results):
>     for key in local_results:
>         global_results[key] = local_results[key]
> run = jug.mapreduce(map, reduce, targets)
This would not actually work.
Imagine you were not using jug at all, but just called the builtin map and reduce directly:
run = reduce(reduce, map(map, targets))
Even that would fail: your reduce mutates global_results in place and returns None, so after the first reduction step the running value is None and the next call has nothing to merge into.
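To make that concrete, here is a minimal, self-contained sketch (plain Python, no jug; the three small dictionaries are made-up stand-ins for your per-file results):

from functools import reduce

def bad_reduce(local_results, global_results):
    # mutates its second argument and returns None
    for key in local_results:
        global_results[key] = local_results[key]

per_file = [{'a': 1}, {'b': 2}, {'c': 3}]

# first step: bad_reduce({'a': 1}, {'b': 2}) returns None
# second step: bad_reduce(None, {'c': 3}) raises TypeError (None is not iterable)
try:
    reduce(bad_reduce, per_file)
except TypeError as exc:
    print("reduce failed:", exc)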
With jug, you should not use any global variables: tasks can run in separate processes (or, on your SGE cluster, on separate machines), so changes to a global made in one task are never seen by the others. Your functions should be pure (i.e., depend only on their inputs and communicate through their return values). You are looking for something like:
def reduce(local0, local1):
    combined = local0.copy()
    combined.update(local1)
    return combined
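Putting it together, a jugfile along these lines should do what you described. This is only a sketch: the file names and the line-counting in the mapper are placeholders for your actual analysis, and it assumes the mapreduce(reducer, mapper, inputs) argument order from jug.mapreduce (note: reducer comes first), so do check it against the docs.

import jug.mapreduce

def mapper(filename):
    # the "map" step: scan one file and return a dict of results
    # (placeholder analysis: number of lines per file; keys are unique
    # per file, so the plain dict merge below is exactly right)
    with open(filename) as f:
        n_lines = sum(1 for _ in f)
    return {filename: n_lines}

def reducer(local0, local1):
    # the "reduce" step: merge two result dicts without mutating either
    combined = local0.copy()
    combined.update(local1)
    return combined

targets = ['file1.txt', 'file2.txt', 'file3.txt']

# returns a Task whose result is the fully merged dictionary
final = jug.mapreduce.mapreduce(reducer, mapper, targets)

You would save that as a jugfile and run `jug execute jugfile.py` in as many SGE jobs as you like; any further work on the merged dictionary goes into additional Tasks that take `final` as input.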
HTH
Luis
--
Luis Pedro Coelho | Institute for Molecular Medicine |
http://luispedro.org
LxMLS 2012: Lisbon Machine Learning School
http://lxmls.it.pt