Reducers

Rich Hickey

May 8, 2012, 11:20:37 AM
to clo...@googlegroups.com
I'm happy to have pushed [1] today the beginnings of a new Clojure library for higher-order manipulation of collections, based upon *reduce* and *fold*. Of course, Clojure already has Lisp's *reduce*, which corresponds to the traditional *foldl* of functional programming. *reduce* is based upon sequences, as are many of the core functions of Clojure, like *map*, *filter* etc. So, what could be better? It's a long story, so I'll give you the ending first:

* There is a new namespace: clojure.core.reducers
* It contains new versions of *map*, *filter* etc based upon transforming reducing functions - reducers
* It contains a new function, **fold**, which is a parallel reduce+combine
* *fold* uses **fork/join** when working with (the existing!) Clojure vectors and maps
* Your new parallel code has exactly the same shape as your existing seq-based code
* The reducers are composable
* Reducer implementations are primarily functional - no iterators
* The model uses regular data structures, not 'parallel collections' or other OO malarkey
* It's fast, and can become faster still
* This is work-in-progress
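
A quick taste in code, as a minimal sketch using the names listed
above (the blog post below has the full treatment):

(require '[clojure.core.reducers :as r])

;; Same shape as (reduce + (filter even? (map inc [1 1 1 2]))),
;; but built from reducers and foldable in parallel:
(r/fold + (r/filter even? (r/map inc [1 1 1 2])))
;;=> 6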

I've described the library in more detail here:

http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html

Rich

[1] https://github.com/clojure/clojure/commit/89e5dce0fdfec4bc09fa956512af08d8b14004f6

Mark Engelberg

May 8, 2012, 1:57:01 PM
to clo...@googlegroups.com
Exciting!

I'm having trouble visualizing at what points the computation is actually executed, and therefore, I'm not clear on how this feature interacts with dynamic binding.  If the reducers/map occurs in one dynamic binding context, and the fold occurs in another, what happens?

Alan Malloy

May 8, 2012, 3:10:46 PM
to Clojure
My understanding is that reducers/map merely returns a new function
(or something like a function, maybe an instance of IReducible or
something), and all actual computation is done during the reduce. So
dynamic bindings around the scope of the map have no effect at all.
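
To make that concrete, here is a minimal sketch (my example, assuming a
build with clojure.core.reducers; *scale* is a made-up dynamic var):

(require '[clojure.core.reducers :as r])

(def ^:dynamic *scale* 1)

(def xs (binding [*scale* 10]
          (r/map #(* *scale* %) [1 2 3]))) ; no computation happens here

(reduce + 0 xs) ; the mapping fn runs now, under the root binding
;;=> 6, not 60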

kovas boguta

May 8, 2012, 5:14:37 PM
to clo...@googlegroups.com
Will definitely be using this, thanks! One question:

"Those IFn.LLL, DDD etc primitive-taking function interfaces can now
spring to life."

Can someone unpack this? What are those things, why does this allow
them to exist, why do we need them, what can be built with them?

Daniel Solano Gómez

May 8, 2012, 5:29:37 PM
to clo...@googlegroups.com
On Tue May 8 17:14 2012, kovas boguta wrote:
> Will definitely be using this, thanks! One question:
>
> "Those IFn.LLL, DDD etc primitive-taking function interfaces can now
> spring to life."
>
> Can someone unpack this? What are those things, why does this allow
> them to exist, why do we need them, what can be built with them?

As of Clojure 1.3, you can add primitive type hints to function
definitions, allowing the resulting function to take raw doubles or
longs instead of boxed numbers. This results in much better numeric
performance. For more details, see
<http://dev.clojure.org/display/doc/Documentation+for+1.3+Numerics>.
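
For illustration, a small sketch of such a hinted function (my own
example, not from that page); the ^long hints make Clojure compile an
interface like IFn$LLL that takes and returns raw longs:

(defn add-squares ^long [^long x ^long y]
  (+ (* x x) (* y y)))

(add-squares 3 4) ;=> 25, with no boxed Longs at the call site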

As I understand it, since the new reducers library relies on composing
functions together, if those functions have this primitive support, it
should be able to take advantage of it.

For example, there are implementations of persistent vectors that keep
unboxed primitives internally. Unfortunately, it is currently
impossible to get values into or out of these structures without boxing.
The new reducers library means that if the primitive-supporting vectors
are reducible, they can invoke the reducer using primitives and store
the result in yet another primitive-supporting vector, all without any
boxing. This could also apply to raw primitive arrays.
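
For reference, the primitive-storing vectors mentioned above are the
ones made with clojure.core/vector-of (the example is mine):

(def v (apply vector-of :long (range 5))) ; longs stored unboxed

;; Today each element gets boxed on its way into +; a primitive-aware
;; reduce could avoid that round trip:
(reduce + v) ;=> 10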

I hope that (1) this helps and (2) I am not mistaken.

Sincerely,

Daniel

Brent Millare

May 8, 2012, 7:07:54 PM
to clo...@googlegroups.com
Yet another example of doing more by complecting less, a mantra I'm continuing to pursue. Awesome library; I can't wait to use it and see the performance benefits.

Christian Romney

May 9, 2012, 6:56:26 AM
to clo...@googlegroups.com
All I can say is, wow! I do have some questions, though.

Am I mistaken, or is fork/join Java 7? Has anyone got this running on OS X yet? If so, with the Oracle preview or OpenJDK? Also, core map and friends are lazy, while reducer map and friends are inherently parallel. We also have pmap/preduce. Can someone help me sort out some basic heuristics (and trade-offs) so I can be better equipped to make an informed choice among these?

I understand I'd still want lazy seqs for things like infinite sequences, but after that I confess I'm hazy. Any additional illumination is very much appreciated.

Baishampayan Ghose

May 9, 2012, 7:00:56 AM
to clo...@googlegroups.com
On Wed, May 9, 2012 at 4:26 PM, Christian Romney <xml...@gmail.com> wrote:
> Am I mistaken or is fork/join Java 7? Anyone have this running on OS X yet? If so with the Oracle preview or Open JDK?

Yes, it's a part of Java 7, but there is also a reference
implementation of the spec (jsr166y) available as a standalone JAR,
which Clojure now includes as a compile-time dependency.

So the reducers library should work on Java 1.5 and 1.6 as well.

Regards,
BG

--
Baishampayan Ghose
b.ghose at gmail.com

nicola...@gmail.com

May 9, 2012, 7:09:11 AM
to clo...@googlegroups.com
It looks very good.

Just a few side-notes:
- this representation is quite well known: it is the Church
encoding of lists, often used in System F, for example.
It corresponds to stating the definition of sequences as the
initial algebra of F A = 1 + Any x A.
- you can play a dual trick for Streams and coLists by representing
them by how to generate them, the same way that you can represent lists
by how you reduce them.
This gives a representation of a coList as
[ seed f ] where f is a function from the seed to either nil (if the
coList is finished) or a pair of an element and a seed.
For example, the integers would be
nat = [ 0 (fn [n] (vector n (+ n 1))) ]
Then, you can play the same game and define map, filter (which is not
always productive...), and so on, on this
representation, with the same benefits.

You can then also write converters from this representation to
reducibles. This would allow one to write
(reduce + (take n nat)) without allocating any sequence...

Best,

Nicolas.

Rich Hickey

May 9, 2012, 8:00:00 AM
to clo...@googlegroups.com
This analogy is not quite right.

> (fn [n] (vector n (+ n 1)))

is not a reducing fn. It is a cons cell builder, and the Church encoding builds lists.

The point of this library is to define map/filter etc *without* using lists/streams - not as input, not as output, not producing links/thunks etc, nor implicit streams formed by first/rest chained recursion, nor presuming input collections defined similarly.

It is only by doing this that you can get the true prize of this library - versions of map/filter etc that can be run in parallel, and that are completely independent of the data structures on which they are run.
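
For instance (my sketch, not from the post): fold takes a combining fn
and a reducing fn, so partial results from parallel chunks can be
merged, and + supplies the identity 0 when called with no args:

(require '[clojure.core.reducers :as r])

;; count elements chunk-by-chunk, combining the partial counts with +:
(r/fold + (fn [acc _] (inc acc)) (vec (range 100000)))
;;=> 100000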

Rich

nicola...@gmail.com

May 9, 2012, 9:44:26 AM
to clo...@googlegroups.com
On Wed, May 9, 2012 at 1:00 PM, Rich Hickey <richh...@gmail.com> wrote:
> This analogy is not quite right.
>
>> (fn [n] (vector n (+ n 1))
>
> is not a reducing fn. Tt is a cons cell builder, and the Church encoding builds lists.
>
It is because I did not write it in the right way; sorry about that.
It can be made nicer with protocols, probably. But the idea is quite simple.
Something is generative (dual to reducible) if it contains a seed, a
function from the seed to the first element (called now),
and a function from a seed to the next seed (called later).
Then

(defn from [n]
  [n identity inc])

(def nat (from 0))

Then, take converts a generative to a reducible:

(defn take [n [seed now later]]
  (fn [r] ; r is the function we reduce with
    (loop [m (long n), seed seed, acc (r)]
      (if (zero? m)
        acc
        (recur (dec m) (later seed) (r acc (now seed)))))))

So in (reduce + (map square (take n nat))), there is no cons cell
created in any way.
It just beta-reduces to the loop someone would write for the same
problem (without any kind of deforestation):

(loop [m (long n), seed 0, acc 0]
  (if (zero? m)
    acc
    (recur (dec m) (inc seed) (+ acc (square seed)))))

By the way, map has also a definition in the generative setting:

(defn map-generative [f [seed now later]]
  [seed (comp f now) later])

There might be multiple interesting way to convert from generative to reducible.
take is one, until can be another:

(defn until [p [seed now later]]
  (fn [r]
    (loop [seed seed, acc (r)]
      (let [v (now seed)]
        (if (p v)
          acc
          (recur (later seed) (r acc v)))))))

It can happen that this never ends if p is never true. (You can add an
optional limit on the number of elements.)

Then, to sum all squares smaller than 1000000:
(reduce + (until #(> % 1000000) (map-generative square nat)))

Then again, no cons cell allocated.

It is straightforward to extend generative things so that they can
finish early, when now returns nil
for example. Then you can easily convert those finite generatives to reducibles.

One last point: you can convert any sequence (including infinite
streams) to a generative by using
(fn [s] [ s first next ] )

It is also possible to convert any generative to a stream (for
memoisation, for example).
However, the generative representation is independent of the sequence
representation.

> It is only by doing this that you can get the true prize of this library - versions of map/filter etc that can be run in parallel, and that are completely independent of the data structures on which they are run.
>

It is less clear that generative things are amenable to parallel reduction.
However, it should be possible to add a function to jump ahead
( [seed now later jump] ).
This function could default to calling later repeatedly, but it can
sometimes be implemented more efficiently.
It might be possible to do better if your reduction function is commutative.

I am not saying it replaces reducible, but that it can be an
interesting complement for a generate/reduce pattern.

I hope this clarifies my last email.

Best,

Nicolas.

Christian Romney

May 9, 2012, 9:11:05 PM
to clo...@googlegroups.com
I would be indebted to you if you could point me in the direction of the reading material necessary to follow this discussion. I'm afraid I'm currently out of my depth, but very eager to understand.

Sincerely,
Christian Romney

nicola...@gmail.com

May 10, 2012, 8:02:09 AM
to clo...@googlegroups.com
I can describe the background needed to understand my last email.

From the programming point of view, I have been told since yesterday
that what I explained has already been explained better in "Stream
Fusion: From Lists to Streams to Nothing at All" by Coutts,
Leshchinskiy, and Stewart:

metagraph.org/papers/stream_fusion.pdf

The article is very clear if you can get used to the strange Haskell
syntax and its lack of brackets.

<off-topic>
From the cultural/theoretical point of view, there is a quite
well-understood link between data structures and
some categorical structures.
(I try to stay informal and put fancy names and links to definitions
inside brackets.)

Types of finite recursive data structures (think of a list) are
defined by being the smallest solution to an
equation like List = 1 + Any x List. Here 1 means the type that
contains only nil, Any means the type of anything, x corresponds to a
pair and + to a choice. So this reads: a list is either nil or a pair
of anything and another list.
Being the smallest solution means that it contains only finite data-structures.
( http://en.wikipedia.org/wiki/Initial_algebra )

An element of such a type is characterised by how it can be reduced/folded.
( http://en.wikipedia.org/wiki/Catamorphism )
fold: forall Acc, List -> ((1 + Any x Acc) -> Acc) -> Acc
Meaning that there is a one-to-one mapping between lists and functions of type:
forall Acc, ((1 + Any x Acc) -> Acc) -> Acc
This is the type of #(reduce _ l): you give it a function that takes
either no arguments and gives you
an initial Accumulator, or takes an element of the list and the last
Accumulator and returns a new Accumulator; and it returns the final
Accumulator.

This one-to-one correspondence can be seen as (reduce _ l) in one
direction and #(_ (fn [p] (and p (apply cons p)))) in the other
direction.

This definition of lists (as functions) has been used in programming
languages without data constructors.
( System F for example, http://en.wikipedia.org/wiki/System_F ).

If you look at infinite data structures, like Streams, they are the
biggest solutions of similar equations:
Stream = 1 + Any x Stream (if you accept that a Stream can finish early)

They are dual (which means you invert all arrows in all definitions) to Lists.
( see Final Coalgebra in http://en.wikipedia.org/wiki/Initial_algebra )

A stream is characterised by how it is unfolded/generated.
unfold : forall B, (B -> 1 + Any x B) -> B -> Stream
(See how the arrows are reversed with respect to fold.)
And so they are in one-to-one correspondence with:
exists Seed, (Seed x (Seed -> 1 + Any x Seed))

Which means any Stream can be defined as a Seed, of a type that you
can choose freely,
and a way to grow from the seed: a function grow that takes a Seed and
tells you
- either that the Stream is finished
- or the first element of the Stream and the Seed for the rest of the Stream.

For example, the stream of even natural numbers can be seen as the seed
0 together with the function
grow = (fn [seed] [seed (+ seed 2)]), saying that the first element
is equal to the seed and the rest of the stream
is the one defined by the seed seed + 2.

A very good paper showing this style of programming, and more, is :

http://eprints.eemcs.utwente.nl/7281/01/db-utwente-40501F46.pdf
</off-topic>

Sorry for the long off-topic.

Best regards,

Nicolas.

Nico Balestra

May 10, 2012, 7:30:21 AM
to clo...@googlegroups.com
I'm with you, Christian;
maybe a code example would be very helpful.

Thank you guys,
Nico

Christian Romney

May 10, 2012, 1:26:45 PM
to clo...@googlegroups.com


On Thursday, May 10, 2012 8:02:09 AM UTC-4, Nicolas Oury wrote:
I can describe the background to understand my last email. 

Thank you very much for taking the time to post all of that; I've got some reading to do for sure.

Rich Hickey

May 10, 2012, 7:47:47 PM
to clo...@googlegroups.com
IMO, Nicolas' material is a distraction in understanding reducers, except as historical background.

The reducers library is a rejection/avoidance of the primacy of recursively/generatively defined data structures and the operations thereon.

I recommend, rather than reading about stream fusion, watching the Guy Steele presentation linked from the post:

Organizing Functional Code for Parallel Execution
or, foldl and foldr Considered Slightly Harmful:

http://vimeo.com/6624203

Which boils down to: stop programming with streams, lists, generators, etc. if you intend to exploit parallelism, which the reducers library does.

Rich

nicola...@gmail.com

May 11, 2012, 4:21:59 AM
to clo...@googlegroups.com
On Fri, May 11, 2012 at 12:47 AM, Rich Hickey <richh...@gmail.com> wrote:
> IMO, Nicolas' material is a distraction in understanding reducers, except as historical background.

I perfectly agree.

Softaddicts

May 11, 2012, 9:28:46 AM
to clo...@googlegroups.com
Hi Rich,

I may be a bit aggressive here :) but will this be ready for 1.5? Or available
as an independent feature? (I am not sure about this, given the name.)

I intend to skip 1.4, depending on our delivery cycles and the availability of 1.5. We are
about to move 1.3 into prod here, and my eyes are
trying to stay focused on that. I plan releases on a horizon of 6 to 10 months.

Do not feel pressed to answer; take more hammock time if needed. After all,
the outside temperatures rose a bit, so time spent there is not exposing you anymore
to various diseases :)

We run production on clusters of tiny boxes (2 to 4 cores with HT) and we like optimizing
our code along the way. This would be a nice improvement; future hardware
upgrades will likely move us to an increased number of cores.

I really like this stuff; it's not alien to the language at all, contrary to
attempts made in other languages to implement "transparent" parallelism.

Thank you
Luc

Sean Corfield

May 11, 2012, 12:17:23 PM
to clo...@googlegroups.com
On Fri, May 11, 2012 at 6:28 AM, Softaddicts
<lprefo...@softaddicts.ca> wrote:
> I may be a bit aggressive here :) but will this be ready for 1.5 ? Or available
> as an independent feature ? (I am not sure about this given the name)

Reducers are part of the 1.5.0 master branch. The only reason they're
not already available in a master-SNAPSHOT build is that Clojure
isn't building at the moment due to the jsr166y change and Java 6
dependency. So, not to speak for Rich or Clojure/core, but I can't
imagine reducers being removed from 1.5.0 at this point...
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

Sean Corfield

May 11, 2012, 12:30:47 PM
to clo...@googlegroups.com
On Fri, May 11, 2012 at 9:17 AM, Sean Corfield <seanco...@gmail.com> wrote:
> Reducers are part of the 1.5.0 master branch. The only reason they're
> not already available in a master-SNAPSHOT build is because Clojure
> isn't building at the moment due to the jsr166y change and Java 6
> dependency.

Just to clarify: Clojure isn't building at the moment _on
build.clojure.org_ but you can build it yourself easily enough:
* git clone git://github.com/clojure/clojure.git
* mvn install
Then you'll have a local clojure-1.5.0-master-SNAPSHOT.jar you can
depend on and try the reducers out for yourself.
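
For example, with Leiningen (assuming you've run mvn install so the
snapshot jar is in your local repository):

;; in project.clj
:dependencies [[org.clojure/clojure "1.5.0-master-SNAPSHOT"]]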

Stuart Sierra

May 15, 2012, 11:33:54 AM
to clo...@googlegroups.com
It's fixed now.

Rich Hickey

May 15, 2012, 2:58:01 PM
to clo...@googlegroups.com
I've written another post which goes into the reducers in more detail:

http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html

Rich

Robert McIntyre

May 15, 2012, 3:54:49 PM
to clo...@googlegroups.com
There's a right parenthesis missing at
http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html :

Now:

(reduce + 0 (map inc [1 2 3 4]))
;;becomes
(reduce + 0 (reducer [1 2 3 4] (mapping inc)) < MISSING PAREN


under the heading "Reducers"


sincerely,
--Robert McIntyre, Dylan Holmes

Rich Hickey

May 15, 2012, 7:17:53 PM
to clo...@googlegroups.com
Fixed - thanks.

Rich

Leif

May 20, 2012, 3:24:57 PM
to clo...@googlegroups.com
From the article: "The combining fn must supply an identity value when called with no arguments." This means that you can't use combining functions whose identity value can't be computed but nevertheless satisfies the proper rules. E.g. set intersection:

(intersection) == e == the set of all possible elements of a set

Of course this arity doesn't exist, but the arity-1 could be viewed as shorthand for:
(intersection s) == s
; == (intersection s e) == (intersection e s)  ; if e could actually be computed

So, the new lib behaves slightly differently than core/reduce here:
(use 'clojure.set)
(require '(clojure.core [reducers :as r]))
(reduce intersection [#{1 2} #{2 3}])  ;==> #{2}
(r/reduce intersection [#{1 2} #{2 3}])  ;==> throws ArityException
; for completeness
(reduce intersection []) ;==> throws ArityException
(r/reduce intersection []) ;==> throws ArityException

It might fix things to special-case empty collections and make the "leaves" of the recursion single elements, but maybe functions with these weird non-computable identity elements, like set intersection, are too rare to bother with. I can't think of another one off the top of my head.

--Leif

Leif

Jun 27, 2012, 7:57:39 PM
to clo...@googlegroups.com
OK, the complete silence leads me to believe that not many people reduce with set/intersection.

However, I thought of a couple of other functions that don't work with the new framework: max and min.

user> (require '[clojure.core.reducers :as r])
user=> (reduce min [1 2])
1
user=> (r/reduce min [1 2])
ArityException Wrong number of args (0) passed to: core$min  clojure.lang.AFn.throwArity (AFn.java:437)

Min, like intersection, has a noncomputable "identity" value -- infinity.  And max and min are used in many more algorithms than intersection, so maybe this will start some discussion.

I really think that single elements are the place to bottom out the recursion.

Devin Walters

Jun 28, 2012, 12:30:37 AM
to clo...@googlegroups.com
Christian: Thank you for asking for additional reading material.
Nicolas: Thank you for providing additional reading material.

Sean Corfield

Jun 28, 2012, 12:53:03 AM
to clo...@googlegroups.com
On Wed, Jun 27, 2012 at 4:57 PM, Leif <leif.p...@gmail.com> wrote:
> Min, like intersection, has a noncomputable "identity" value -- infinity.
> And max and min are used in many more algorithms than intersection, so maybe
> this will start some discussion.

It seems to me the reducers framework is clearly documented to either
need a computable identity or to be given an initial value. Given that, it
seems reasonable that for functions that don't have a computable
identity you would need:

(r/reduce func (first coll) (rest coll))

[perhaps that's why there was no response in this thread?]
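
For example, with min (a sketch of the workaround above):

(require '[clojure.core.reducers :as r])

(let [coll [3 1 2]]
  (r/reduce min (first coll) (rest coll)))
;;=> 1, no identity value needed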