how would you improve this map-reduce style fn?


Jim - FooBar();

Jun 1, 2013, 7:49:35 AM
to clo...@googlegroups.com
Hi all,

I've been using the following fn quite extensively lately and I'm just
wondering if you can see any flaws with it...or perhaps something that
can be done better? I generally find that this performs better than a
bare pmap and in fact it often performs better than my 'pool-map' (which
I'm not showing here) which is backed by executors.

(defn mapr
  "A basic map-reduce style mapping-function. Will partition the data
  according to p-size and assign a future to each partition (per pmap)."
  ([f coll p-size]
   (->> coll
        (partition-all p-size)
        (pmap (fn [p] (reduce #(conj %1 (f %2)) [] p)))
        (apply concat))) ;; concat the inner vectors that represent the partitions
  ([f coll]
   ;; cpu-no is not defined in this post; presumably the number of available CPUs
   (mapr f coll (+ 2 cpu-no))))

thanks,

Jim

James Reeves

Jun 1, 2013, 8:06:11 AM
to clo...@googlegroups.com
Why do you use:

  (reduce #(conj %1 (f %2)) [] p)

Instead of:

  (mapv f p)

?

Also, this looks a lot like what you can achieve with the reducers library:

    (r/fold p-size r/cat r/append! (r/map f (vec coll)))

- James
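
For reference, a minimal sketch of the two suggestions above. The names via-reduce and fold-map are hypothetical and only serve the illustration. It assumes a Clojure version that ships clojure.core.reducers (1.5 or later). Note that r/fold with r/cat does not return a vector, so the sketch pours the result into one before comparing.

```clojure
(require '[clojure.core.reducers :as r])

;; The hand-written reduce from the original mapr (hypothetical name).
(defn via-reduce [f p]
  (reduce #(conj %1 (f %2)) [] p))

;; The reducers one-liner, wrapped so it returns a vector (hypothetical name).
;; r/fold needs a foldable collection, hence (vec coll).
(defn fold-map [f coll p-size]
  (into [] (r/fold p-size r/cat r/append! (r/map f (vec coll)))))

;; Both forms of the per-partition step agree with mapv.
(println (= (via-reduce inc [1 2 3]) (mapv inc [1 2 3])))

;; The fold version keeps the input order.
(println (= (fold-map inc (range 1000) 64) (mapv inc (range 1000))))
```

Whether the extra (into [] ...) matters depends on whether the caller really needs a vector; the folded result can also be consumed directly with seq or reduce.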




--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en



Jim - FooBar();

Jun 1, 2013, 8:12:23 AM
to clo...@googlegroups.com
On 01/06/13 13:06, James Reeves wrote:
> Why do you use:
>
> (reduce #(conj %1 (f %2)) [] p)
>
> Instead of:
>
> (mapv f p)
>
> ?

Oops, you're right: mapv is simpler and has the same effect :)

> Also, this looks a lot like what you can achieve with the reducers
> library:
>
> (r/fold p-size r/cat r/append! (r/map f (vec coll)))

hmmm...I'll have to think about that for a while...

thanks a lot James :)
I suspected it could be simplified...

Jim

Jim - FooBar();

Jun 1, 2013, 9:56:12 AM
to clo...@googlegroups.com
On 01/06/13 13:06, James Reeves wrote:
> Also, this looks a lot like what you can achieve with the reducers library:
>
>     (r/fold p-size r/cat r/append! (r/map f (vec coll)))

well, for fork-join-based parallelism I was using this:

(require '[clojure.core.reducers :as r])

(defn fold-into-vec
  "Provided a reducer, concatenate into a vector.
  Same as (into [] coll), but parallel."
  [chunk coll]
  (r/fold chunk (r/monoid into vector) conj coll))

(defn rmap
  "A fork-join based mapping function that pours the results in a vector."
  [f coll fj-chunk-size]
  (fold-into-vec fj-chunk-size (r/map f (vec coll))))

where 'fold-into-vec' is taken from http://www.thebusby.com/2012/07/tips-tricks-with-clojure-reducers.html

Basically, the only difference is the reducing/combining fns: you suggest r/cat & r/append!, whereas I'm using (r/monoid into vector) and conj.
From a quick look in reducers.clj it seems that cat uses an ArrayList underneath...is this why it's considered high-performance? I also see there is 'foldcat', which is (fold cat append! coll) - exactly what you're suggesting! Interesting stuff...I'll try it now :)

Jim
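
As a rough check of the comparison above, the sketch below puts the two approaches side by side. rmap-cat is a hypothetical name, not something from the thread or the library. r/foldcat takes no chunk-size argument, so it runs with r/fold's default of 512. The sketch only checks that the results contain the same elements in the same order; it makes no claim about which one is faster.

```clojure
(require '[clojure.core.reducers :as r])

;; Jim's combining strategy: merge vectors with into, add items with conj.
(defn fold-into-vec [chunk coll]
  (r/fold chunk (r/monoid into vector) conj coll))

;; The same mapping via r/foldcat, i.e. (r/fold r/cat r/append! coll).
;; Hypothetical name; the result is seqable and reducible but not a vector.
(defn rmap-cat [f coll]
  (r/foldcat (r/map f (vec coll))))

(def xs (range 2000))

;; Same elements, same order, once the cat result is poured into a vector.
(println (= (fold-into-vec 512 (r/map inc (vec xs)))
            (into [] (rmap-cat inc xs))))
```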