Proposal - improving performance of reduce for persistent collections

Mikera

unread,

Dec 2, 2011, 3:41:50 AM12/2/11

to Clojure Dev

Hi all,

Currently reduce is implemented in Clojure by calling seq on a
collection. Construction of an entire new seq seems to be an
unnecessary and expensive operation, particularly for the key
persistent collections (map, set and vector suffer - list is probably
fine because it implements ISeq directly for free).

I personally ran into this issue when trying to optimise some
particularly map-intensive code.

To improve this situation, I believe it would be an improvement to do
the following:
1. Define a Java interface for reducible collections (IReducible or
similar)
2. Make persistent collections support implementations of reduce
directly by implementing IReducible
3. Modify the internal-reduce protocol to operate on concrete
collections (not just seqs) and make use of IReducible implementations
where available
4. Change reduce itself to call internal-reduce directly rather than
calling seq first

I hacked a quick proof of concept together for PersistentHashMap
(https://github.com/mikera/clojure/tree/better-reduce) and found with
some informal testing that this approach resulted in a noticeable
speedup (over 2x) for a fairly standard reduce-over-map test case:

;; some test data
(def ms (zipmap (range 100) (range 100)))

;; 1.3 release
(time (dotimes [i 100000] (reduce (fn [acc [k v]] (+ acc v)) 0 ms)))
"Elapsed time: 716.919024 msecs"

;; 1.4 master
(time (dotimes [i 100000] (reduce (fn [acc [k v]] (+ acc v)) 0 ms)))
=> "Elapsed time: 583.392149 msecs"

;; 1.4 + better reduce
(time (dotimes [i 100000] (reduce (fn [acc [k v]] (+ acc v)) 0 ms)))
=> "Elapsed time: 280.546348 msecs"

Does this kind of optimisation / improvement make sense to go into the
next Clojure release?

If so, happy to work on a patch if someone can give feedback / steer
me through the process.

Mike.

Stuart Halloway

unread,

Dec 2, 2011, 11:26:33 AM12/2/11

to cloju...@googlegroups.com

Can you put all the tests in an outer (dotimes [_ 10]) and report
those numbers? First pass is often very misleading.

Stu

Mikera

unread,

Dec 2, 2011, 8:34:12 PM12/2/11

to Clojure Dev

> Can you put all the tests in an outer (dotimes [_ 10]) and report
> those numbers? First pass is often very misleading.
>
> Stu

Sure thing Stu.

Doing multiple passes seems to help all versions a little but doesn't
change the overall conclusion.

Clojure 1.3.0

(def ms (zipmap (range 100) (range 100)))

#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 646.512414 msecs"
"Elapsed time: 553.264593 msecs"
"Elapsed time: 544.999672 msecs"
"Elapsed time: 510.606868 msecs"
"Elapsed time: 513.192451 msecs"
"Elapsed time: 537.665664 msecs"
"Elapsed time: 524.166514 msecs"
"Elapsed time: 512.966964 msecs"
"Elapsed time: 501.677219 msecs"
"Elapsed time: 496.379194 msecs"

Clojure 1.4.0-alpha2
user=> (def ms (zipmap (range 100) (range 100)))
#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 548.822683 msecs"
"Elapsed time: 469.275299 msecs"
"Elapsed time: 464.742096 msecs"
"Elapsed time: 443.500129 msecs"
"Elapsed time: 431.272138 msecs"
"Elapsed time: 430.514649 msecs"
"Elapsed time: 432.753752 msecs"
"Elapsed time: 431.195876 msecs"
"Elapsed time: 435.254274 msecs"
"Elapsed time: 433.516375 msecs"

Clojure 1.4.0 snapshot (with better reduce modifications)

(def ms (zipmap (range 100) (range 100)))

#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 202.29696 msecs"
"Elapsed time: 186.589505 msecs"
"Elapsed time: 179.691805 msecs"
"Elapsed time: 184.191644 msecs"
"Elapsed time: 183.265131 msecs"
"Elapsed time: 180.162578 msecs"
"Elapsed time: 182.323219 msecs"
"Elapsed time: 181.810649 msecs"
"Elapsed time: 182.896285 msecs"
"Elapsed time: 187.30153 msecs"

Mikera

unread,

Dec 5, 2011, 8:41:10 PM12/5/11

to Clojure Dev

Any further thoughts on this?

I'd quite like something of this sort to get into the next Clojure
release, if only because it's a pretty big speedup for some common
cases in my code and presumably also for other people.

Stuart Sierra

unread,

Dec 6, 2011, 11:00:39 AM12/6/11

to cloju...@googlegroups.com

I think this is worth having a JIRA/patch for, so others can try it out.
-S

Mikera

unread,

Dec 9, 2011, 5:04:15 AM12/9/11

to Clojure Dev

On Dec 6, 4:00 pm, Stuart Sierra <the.stuart.sie...@gmail.com> wrote:
> I think this is worth having a JIRA/patch for, so others can try it out.
> -S

OK I've created a JIRA issue as below (hope I've done it right,
feedback/comments appreciated!):

http://dev.clojure.org/jira/browse/CLJ-894

Reply all

Reply to author

Forward