reduce, reduce-kv, map, mapv, reducers/map and nil

1,255 views
Skip to first unread message

Wolodja Wentland

unread,
Oct 29, 2012, 9:00:02 AM10/29/12
to clo...@googlegroups.com
Hi all,

I am currently testing performance of different reduce and map implementations
in my programs and have problems because their treatment of nil is different.
The "normal" clojure.core implementations of reduce and map work well when
called on nil, but reduce-kv and functions in clojure.reducers throw
exceptions.

--- snip ---
user=> (defn fold-into-vec
"Provided a reducer, concatenate into a vector.

Note: same as (into [] coll), but parallel."
([coll]
(r/fold (r/monoid into vector) conj coll))
([n coll]
(r/fold n (r/monoid into vector) conj coll)))

user=> (map (fn [el] (* 2 el)) nil)
()
user=> (mapv (fn [el] (* 2 el)) nil)
[]
user=> (fold-into-vec (r/map (fn [el] (* 2 el)) nil))
#<IllegalArgumentException java.lang.IllegalArgumentException: No implementation of method: :coll-fold of protocol: #'clojure.core.reducers/CollFold found for class: nil>
user=> (reduce (fn [ret el] (+ el el)) {} nil)
{}
user=> (reduce (fn [ret [k v]] (+ k v)) {} nil)
{}
user=> (reduce-kv (fn [ret k v] (+ k v)) {} nil)
#<IllegalArgumentException java.lang.IllegalArgumentException: No implementation of method: :kv-reduce of protocol: #'clojure.core.protocols/IKVReduce found for class: nil>
--- snip ---

I find this behaviour quite unfortunate because I now have to explicitly test
for nil? and ensure consistent behaviour. This inconsistency violates the
principle of least-surprise and I am not sure if the current behaviour is
intentional or merely an unfortunate implementation detail.

Wouldn't it make sense if reduce-kv and r/map mirror the behaviour of reduce,
map and mapv in core?

P.S. Would it be possible to have something like fold-into-vec in clojure.reducers?
--
Wolodja <bab...@gmail.com>

4096R/CAF14EFC
081C B7CD FF04 2BA9 94EA 36B2 8B7F 7D30 CAF1 4EFC
signature.asc

abp

unread,
Oct 30, 2012, 7:56:01 AM10/30/12
to clo...@googlegroups.com
I find this behaviour quite unfortunate because I now have to explicitly test 
for nil? and ensure consistent behaviour.

Yes, especially unfortunate considering that Rich said the reducers lib could be used as a drop in replacement for core to speed up programs, or something along the lines.

Herwig Hochleitner

unread,
Oct 30, 2012, 10:17:18 AM10/30/12
to clo...@googlegroups.com
I've also run into this. Maybe this is just an oversight, since clojure handles nils gracefully almost everywhere else.

Should CollFold and IKVReduce be extended to nil, or is there some rationale against it?

Wolodja Wentland

unread,
Oct 31, 2012, 8:02:52 AM10/31/12
to clo...@googlegroups.com
I would much rather prefer consistent behaviour than the current mixture and
can't really think of a good reason why we see the current behaviour. Maybe
someone more knowledgeable with the actual implementation of those function
can shed some light on the issue.

Guess it wouldn't be too far fetched to consider this to be a bug, but I
thought I ask here first.
signature.asc

Herwig Hochleitner

unread,
Oct 31, 2012, 2:27:49 PM10/31/12
to clo...@googlegroups.com
Created an issue: http://dev.clojure.org/jira/browse/CLJ-1098

Don't know if patch is welcome, but fix should be trivial, so not much is lost if declined.

Alan Busby

unread,
Nov 1, 2012, 2:11:19 AM11/1/12
to clo...@googlegroups.com
On Mon, Oct 29, 2012 at 10:00 PM, Wolodja Wentland <bab...@gmail.com> wrote:
> I find this behaviour quite unfortunate because I now have to explicitly test
> for nil? and ensure consistent behaviour. This inconsistency violates the
> principle of least-surprise and I am not sure if the current behaviour is
> intentional or merely an unfortunate implementation detail.

fold wont work in parallel on list/seqs/etc, so if you're trying to
get the improved threading performance out of fold/fold-into-vec
you'll have to supply a vector.

(vec nil) => []

Then;
(->> data
vec
(r/map inc)
fold-into-vec)



Unfortunately if you "vec" a vector, it'll actually do the work so you may want;

(defn ensure-vec [xs]
"Ensure's that the value provided is a vector"
(if (= (type xs) clojure.lang.PersistentVector)
xs
(vec xs)))



> P.S. Would it be possible to have something like fold-into-vec in clojure.reducers?

Don't forget fold-into-map and fold-into-map-with, but both of those
will likely require a better merge/merge-with function for maps. :(



P.S.
As soon as I can find a moment, I can provide a JVM friendly library
for reducing over mmap'ed text files as well.

Wolodja Wentland

unread,
Nov 1, 2012, 7:27:25 AM11/1/12
to clo...@googlegroups.com
On Thu, Nov 01, 2012 at 15:11 +0900, Alan Busby wrote:
> On Mon, Oct 29, 2012 at 10:00 PM, Wolodja Wentland <bab...@gmail.com> wrote:
> > I find this behaviour quite unfortunate because I now have to explicitly test
> > for nil? and ensure consistent behaviour. This inconsistency violates the
> > principle of least-surprise and I am not sure if the current behaviour is
> > intentional or merely an unfortunate implementation detail.
>
> fold wont work in parallel on list/seqs/etc, so if you're trying to
> get the improved threading performance out of fold/fold-into-vec
> you'll have to supply a vector.

I am aware of that, but just tried to highlight the behaviour with nil. Your
ensure-vec function will actually do the right thing™ here, so I might just
start using that.

It seems to me as if we are currently figuring out which (boilerplate?)
functions are missing in reducers.clj and that we will have a nice and
well-integrated library at the end.

> > P.S. Would it be possible to have something like fold-into-vec in clojure.reducers?
>
> Don't forget fold-into-map and fold-into-map-with, but both of those
> will likely require a better merge/merge-with function for maps. :(

Oh, fold-into-map and fold-into-map-with would be wonderful and I tried to
implement the former along the lines of fold-into-vec, but the performance was
abysmal. I am now using fold-into-vec + r/map with zipmap which is better, but
I wouldn't consider that optimal.
signature.asc

Alan Busby

unread,
Nov 1, 2012, 9:34:15 AM11/1/12
to clo...@googlegroups.com
On Thu, Nov 1, 2012 at 8:27 PM, Wolodja Wentland <bab...@gmail.com> wrote:
> It seems to me as if we are currently figuring out which (boilerplate?)
> functions are missing in reducers.clj and that we will have a nice and
> well-integrated library at the end.

To be fair, it's in beta and it's open source; so if anyone thinks it
needs something they can always volunteer. :)



>> > P.S. Would it be possible to have something like fold-into-vec in clojure.reducers?
>>
>> Don't forget fold-into-map and fold-into-map-with, but both of those
>> will likely require a better merge/merge-with function for maps. :(
>
> Oh, fold-into-map and fold-into-map-with would be wonderful and I tried to
> implement the former along the lines of fold-into-vec, but the performance was
> abysmal. I am now using fold-into-vec + r/map with zipmap which is better, but
> I wouldn't consider that optimal.

The bottleneck with fold-into-map is that merge just doesn't scale up
well and you end up doing a significant amount of merging.
Currently a merge operation takes elements one at a time from one map,
and adds them one at a time to another.

This is incredibly wasteful for large maps as you already have two
"pre-sorted" tries, and can pair each branch together recursively. If
each branch node stored the number of child nodes, then you can assign
different threads to work on different branches as well. This would be
perfect for reducers, but from a quick look it didn't appear that any
of the key internals were exposed to be taken advantage of.

Unfortunately I haven't discovered a way to facilitate more efficient
merging without modifying the actual Clojure core source. I guess this
is my next step.


If you're interesting in using reducers over text files, I've uploaded
a library to facilitate that here;
http://github.com/thebusby/iota

I'll be cleaning up the code with more documentation and examples at
some point in the near future, and will send out an announcement then.
In the mean time though feel free to use it.

Wolodja Wentland

unread,
Jan 7, 2013, 10:18:04 AM1/7/13
to clo...@googlegroups.com
On Thu, Nov 01, 2012 at 22:34 +0900, Alan Busby wrote:
> On Thu, Nov 1, 2012 at 8:27 PM, Wolodja Wentland <bab...@gmail.com> wrote:

> > Oh, fold-into-map and fold-into-map-with would be wonderful and I tried to
> > implement the former along the lines of fold-into-vec, but the performance was
> > abysmal. I am now using fold-into-vec + r/map with zipmap which is better, but
> > I wouldn't consider that optimal.
>
> The bottleneck with fold-into-map is that merge just doesn't scale up
> well and you end up doing a significant amount of merging.
> Currently a merge operation takes elements one at a time from one map,
> and adds them one at a time to another.
>
> This is incredibly wasteful for large maps as you already have two
> "pre-sorted" tries, and can pair each branch together recursively. If
> each branch node stored the number of child nodes, then you can assign
> different threads to work on different branches as well. This would be
> perfect for reducers, but from a quick look it didn't appear that any
> of the key internals were exposed to be taken advantage of.
>
> Unfortunately I haven't discovered a way to facilitate more efficient
> merging without modifying the actual Clojure core source. I guess this
> is my next step.

Just stumbled over this old thread again and wondered if you found time to
look into this again?
signature.asc

Alan Busby

unread,
Jan 7, 2013, 7:26:41 PM1/7/13
to clo...@googlegroups.com
I've done a good bit of work here but haven't finished anything yet unfortunately.

In the meantime I've been working on other reducer utility functions/libraries like,
1. fold-into-lazy-seq ( https://gist.github.com/4479877 )
2. Iota, for treating large text files as input for a reducer ( https://github.com/thebusby/iota )

Unfortunately some changes in reducers between alpha2 and beta1 have slowed 
performance down a bit, so I've been waiting for 1.5 to be released before spending 
too much time on it.

I guess all of the above is motivation for me to finally send in my Clojure license stuff...
Reply all
Reply to author
Forward
0 new messages