A proposal for an alternative implementation of clojure.core/transduce


Daniel James

Oct 15, 2014, 9:22:28 AM
to clo...@googlegroups.com
Hi All,

I’d like to offer an alternative implementation of clojure.core/transduce.

(I’ve done some searching, and I didn’t find any evidence that this has been raised before, but my apologies if I missed anything.)

This alternative is best motivated by example. Consider the following transducers:

(defn init-with
  [x]
  (fn [rf]
    (fn
      ([] (rf (rf) x))
      ([result] (rf result))
      ([result input]
       (rf result input)))))

(defn complete-with
  [x]
  (fn [rf]
    (fn
      ([] (rf))
      ([result]
       (rf (rf result x)))
      ([result input]
       (rf result input)))))

(defn dupl
  []
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
       (rf (rf result input) input)))))

I have selected these because each illustrates something ‘interesting’ occurring in one of the three arities of the reducing functions produced.

Now consider the following transducer, which transforms numeric reducing functions.

(def xform (comp
             (map inc)
             (complete-with 10)
             (map inc)
             (init-with 100)
             (map inc)
             (dupl)))

The current behavior of transduce (and into, for illustration purposes) is:

(into [37] xform (range 5))
;=> [37 3 3 4 4 5 5 6 6 7 7 12 12]


(transduce xform + 37 (range 5))
;=> 111

(reduce + (into [37] xform (range 5)))
;=> 111

(transduce xform + (range 5))
;=> 74


So the dupl and complete-with transducers work as I expected, but the init-with does not.

What I was hoping for was:

;=> [37 101 101 3 3 4 4 5 5 6 6 7 7 12 12]

;=> 313

;=> 313

;=> 276


This is because transduce and into are currently implemented as something equivalent to the following:

(defn curr-transduce
  ([xform f coll]
   (curr-transduce xform f (f) coll))
  ([xform f init coll]
   (let [rf (xform f)]
     (rf (reduce rf init coll)))))


(defn curr-into
  [to xform from]
  (curr-transduce xform conj to from))

In the first arity of transduce, the initial value is taken from the reducing function f, not the reducing function produced by (xform f).
And in the second arity, the supplied initial value is supplied directly to the reduction.
Both bypass the transducer xform for the initial value: the 0-arity of (xform f) is never invoked, so no transformation of the seed occurs.
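To make this concrete, here is a small probe (hypothetical, not from the post) whose 0-arity throws; under the current transduce it completes without error, confirming that the 0-arity of (xform f) is never invoked:

```clojure
;; A probe transducer: its 0-arity throws, the other arities delegate.
(defn probe-init []
  (fn [rf]
    (fn
      ([] (throw (ex-info "0-arity was invoked!" {})))
      ([result] (rf result))
      ([result input] (rf result input)))))

;; Under the current implementation no exception is raised:
(transduce (probe-init) + 0 (range 5))
;=> 10
```

The 3-arity (transduce (probe-init) + (range 5)) likewise succeeds, because its seed is taken from (+), not from the transformed reducing function.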

I would like to propose the following alternative implementation:

(defn alt-transduce
  ([xform f coll]
   (let [rf (xform f)]
     (rf (reduce rf (rf) coll))))
  ([xform f init coll]
   (let [rf (xform
              (fn
                ([] init)
                ([result] (f result))
                ([result input] (f result input))))]
     (rf (reduce rf (rf) coll)))))

In the first arity, the initial value is drawn from the transformed reducing function, rather than the supplied reducing function.
In the second arity, a reducing function is constructed from f and init to be f in arity 1 and 2, but return init in arity 0.
Both cases ensure that the transducer is used to transform all three components of the reduction: init, step, and completion.

Because of the way that the second arity is implemented, into doesn’t need to change:

(defn alt-into
  [to xform from]
  (alt-transduce xform conj to from))

The effect is that the reducing function passed to xform is conj, except in arity 0, where the `to` collection is returned.
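In other words, alt-into effectively hands xform a reducing function like the following sketch (conj-seeded-with is a name I’m introducing just for illustration):

```clojure
;; Hypothetical helper: conj, except the 0-arity yields the target coll.
(defn conj-seeded-with [to]
  (fn
    ([] to)                        ; seed the reduction with `to`
    ([result] (conj result))       ; completion: conj's 1-arity is identity
    ([result input] (conj result input))))
```

(alt-into to xform from) then behaves like reducing with (xform (conj-seeded-with to)).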

These return the results I was originally expecting:

(alt-into [37] xform (range 5))
;=> [37 101 101 3 3 4 4 5 5 6 6 7 7 12 12]

(reduce + (alt-into [37] xform (range 5)))
;=> 313

(alt-transduce xform + 37 (range 5))
;=> 313

(alt-transduce xform + (range 5))
;=> 276


I went through the list of transducers currently provided in clojure.core, and none of them does anything ‘interesting’ in the 0-arity init component, so the alternative implementation would not change their observable behavior. It only affects transducers that do something other than the trivial implementation of the 0-arity. The discussion at https://groups.google.com/d/topic/clojure/M-13lRPfguc/discussion was the only related one I could find.

Thanks,
Dan

Andy Fingerhut

Oct 15, 2014, 10:23:37 AM
to clo...@googlegroups.com
Dan, I haven't read yours or Christophe Grand's articles thoroughly enough even to know whether your ideas are similar, but you may be interested to read this thread, and the blog posts written by Christophe, linked from his first message in this discussion thread: https://groups.google.com/forum/#!topic/clojure-dev/cWzMS_qqgcM


Daniel James

Oct 15, 2014, 12:34:39 PM
to clo...@googlegroups.com
Hi Andy,

Thanks for the reference. I had already read Christophe’s blog posts, and had seen some of the discussion around them. I can attempt to offer a comparison of what is being proposed.

Christophe is proposing changing the types. To illustrate this, in his formulation, the identity transducer would look like:

(def identity
  (fn [rf]
    (fn
      ([] (rf))
      ([input] (rf input)))))

With transducers as they are in Clojure 1.7, the identity transducer is:

(def identity
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (rf result input)))))

In my proposal above, nothing is changing about the fact that transducers transform reducing functions to new reducing functions. The simple change is to use the reducing function that is produced by a transformation stack to seed the initial value of the reduction, rather than using the input reducing function (or explicit input seed value, in the second arity of transduce).

I hope that helps,

Dan

Michael van Acken

Oct 16, 2014, 1:54:03 AM
to clo...@googlegroups.com
On Wednesday, October 15, 2014 at 6:34:39 PM UTC+2, Daniel James wrote:
[...]

In my proposal above, nothing is changing about the fact that transducers transform reducing functions to new reducing functions. The simple change is to use the reducing function that is produced by a transformation stack to seed the initial value of the reduction, rather than using the input reducing function (or explicit input seed value, in the second arity of transduce).


This is a change I am interested in as well.  Just yesterday I was bitten by the fact that in
the 3-arity case the init value of transduce defaults to (f) instead of ((xform f)):

While experimenting I was trying to splice a group-by-style transducer into an existing
(comp ...) chain.  This step works on a map-valued accumulator, and needs an initial value
of {} regardless of what (f) would provide.

-- Michael


Daniel James

Oct 16, 2014, 10:02:52 PM
to clo...@googlegroups.com

On Thursday, October 16, 2014 1:54:03 AM UTC-4, Michael van Acken wrote:

This is a change I am interested in as well.  Just yesterday I was bitten by the fact that in
the 3-arity case the init value of transduce defaults to (f) instead of ((xform f)):

While experimenting I was trying to splice a group-by-style transducer into an existing
(comp ...) chain.  This step works on a map-valued accumulator, and needs an initial value
of {} regardless of what (f) would provide.

-- Michael

Hi Michael,

I’m glad you are in favor of this change; however, and with tongue firmly in cheek, you’ve taken a beautiful thing and corrupted it, which I can’t condone. ;)

Let me offer an explanation as to why I half-jokingly say this.

From what you’ve said, your “group-by-style transducer” is going to have the following form:

(fn [rf]
  (fn
    ([] {})
    ...))

Irrespective of the rest of its implementation, it’s a function that transforms a reducing function into another reducing function, but it is most certainly *not* a transducer—being such a function is a necessary, but not a sufficient, condition.

Before I make an appeal to types, let me build some intuition by borrowing the analogy that Rich Hickey made in his talk at Strange Loop. He was saying that he wanted to give instructions to the baggage handlers about how to handle the baggage, without having to say anything about whether the baggage arrived on a trolley or by conveyor belt. By returning {} in the 0-arity, you are effectively saying, “No, it’s definitely going to be on a trolley!”.

Transducers have a type. Borrowing from the exposition in
http://conscientiousprogrammer.com/blog/2014/08/07/understanding-cloure-transducers-through-types/
to a first order of approximation (ignoring initialization, stateful transducers, and early termination), but with no loss of generality with respect to this discussion, transducers can be typed as follows:

type Reducer a r = r -> a -> r

type Transducer a b = forall r . Reducer a r -> Reducer b r

Your reducing-function transformation effectively has the type:

type SomethingElse a b r = Reducer a r -> Reducer b PersistentMap

No matter what reducing function comes in, a reducing function on persistent maps is going to come out the other side.

The `Transducer` type says: for all means of baggage conveyance, you can convey me some baggage, and I might transform it in some way, but no matter what I give you back, it will be on *exactly* the same means of baggage conveyance.

This is *extremely* important, and is the crux of what makes transducers great.

You have to promise to give up the right to know the type of the reduction—whether you are building a collection with `into`, or reducing to, say, a number with `transduce`, or transforming what flows through a channel—and in return your transducer can be used to transform values in any and all of those processes.

The rank-2 polymorphism used in the type of `Transducer` above expresses exactly this promise (the `forall r.` part).

If I call

(into [] xform (range 5))

(transduce xform + 0 (range 5))

I expect a vector and an integer to pop out, respectively, and I’m going to be sorely disappointed if a map pops out instead!


By the way, please, please, don’t read this as a polemic. I just want to be precise about the issue, and I am certainly sympathetic to “While experimenting I…”!


Note the 0-arity of my `init-with` transducer:

(defn init-with
  [x]
  (fn [rf]
    (fn
      ([] (rf (rf) x))
      ...)))

I want to add an input `x`. I need to return some result, but the type of the result is unknowable to me. The only way I can construct one of these is with `rf`. The 2-arity of `rf` requires a value of this unknowable type as a first argument, and again the only way I can construct one of these is with `rf` (the 0-arity).

With just `x` and `rf` in scope, there are still an infinite number of plausible and valid implementations:

([] (rf))
([] (rf (rf) x))
([] (rf (rf (rf) x) x))
...

etc., etc.

Actually, this is not quite the case. Consider,

([]
 (let [r1 (rf)
       r2 (rf r1 x)]
   (rf r1 x)))

This will satisfy the rank-2 polymorphic type for `Transducer` above, but this will *not* lead to a valid transducer.
Here we used `(rf)` to get a seed `r1` to the reduction, we then fed `r1` and `x` into `rf` to get `r2`, but then we *discarded* `r2` and fed `r1` a *second* time into `rf`. It should be rather obvious that skipping over intermediate values of the reduction and letting them float off into the ether is incorrect.

To express this property of transducers, one would need not only rank-2 polymorphism but also linear typing, which would adequately express that values produced by the reducing function supplied to a transducer must be used *exactly once*.
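The breakage is easiest to observe with a stateful reducing function. The following sketch (my own, built on transients; not from the discussion above) feeds r1 into rf a second time, exactly the linearity violation just described:

```clojure
;; A stateful reducing fn built on transients; reusing r1 after it has
;; already been fed to rf mutates the accumulator that r1 points at.
(let [rf (fn
           ([] (transient []))
           ([result] (persistent! result))
           ([result input] (conj! result input)))
      r1 (rf)
      r2 (rf r1 :x)]  ; r2 is computed here and then discarded
  (rf (rf r1 :y)))    ; r1 is used a second time
;; In practice this tends to yield [:x :y] rather than [:y]: the step
;; that produced the discarded r2 has leaked into the result.
```

With purely functional reducing functions the double use happens to be invisible, which is why the type alone cannot rule it out.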


As it stands, the current implementation of the “transducible processes” `into`, `transduce`, and `sequence`, all ignore their given transducers when constructing the seed value… but that does mean that one can’t corrupt the seed value… of course you are still free to return any heinous value you like in the other arities if you so choose…


Ok, I better leave it there…

Dan

Michael van Acken

Oct 17, 2014, 2:58:20 AM
to clo...@googlegroups.com
On Friday, October 17, 2014 at 4:02:52 AM UTC+2, Daniel James wrote:
Hi Michael,

I’m glad you are in favor of this change; however, and with tongue firmly in cheek, you’ve taken a beautiful thing and corrupted it, which I can’t condone. ;)

Let me offer an explanation as to why I half-jokingly say this.

From what you’ve said, your “group-by-style transducer” is going to have the following form:

(fn [rf]
  (fn
    ([] {})
    ...))

Irrespective of the rest of its implementation, it’s a function that transforms a reducing function into another reducing function, but it is most certainly *not* a transducer—being such a function is a necessary, but not a sufficient, condition.

Before I make an appeal to types, let me build some intuition by borrowing the analogy that Rich Hickey made in his talk at Strange Loop. He was saying that he wanted to give instructions to the baggage handlers about how to handle the baggage, without having to say anything about whether the baggage arrived on a trolley or by conveyor belt. By returning {} in the 0-arity, you are effectively saying, “No, it’s definitely going to be on a trolley!”.

Hi Daniel,

I'm not quite sure if your trolley-analogy is viable here (but see below).  Please note that I'm working
within a reduce scenario: a "conveyor belt" is moving data to the reducing function, and once the
conveyor stops a single value is left behind -- the final result of the reducing fn.

The grouping I am using does two things within this metaphor: in the one direction, it splits one conveyor
into n independent conveyors; in the other direction, it combines n reduced results again into a single result.

In code it looks like this:

(defn group-by-xf [f]
  (fn [rf]
    (let [init (rf)]
      (fn
        ([] {})
        ([result]
         (reduce-kv (fn [acc k v] (assoc acc k (rf v)))
                    {} result))
        ([result input]
         (let [k (f input)
               group-result (get result k init)]
           (assoc result k (rf group-result input))))))))


[...]


If I call

(into [] xform (range 5))

(transduce xform + 0 (range 5))

I expect a vector and an integer to pop out, respectively, and I’m going to be sorely disappointed if a map pops out instead!


In this case you might prefer to write something like (transduce xform (comp group-by-xf +) ...) instead
of (transduce (comp xform group-by-xf) + ...) . If I had used the first variant, then I would not have stumbled
over the default init value of transduce at all.  Mh.  Seen this way, I only have to adjust my code accordingly and can
continue to use the default init as it is.  Thank you for this insight!
 
By the way, please, please, don’t read this as a polemic. I just want to be precise about the issue, and I am certainly sympathetic to “While experimenting I…”!

No offense taken.  I do understand that beauty is always in the eye of the beholder.

Using the grouping function above I was able to take a gnarly aggregation function (basically a multi
level update-in plus a transformation on the leaves when done) and turn it into a sequence of
group-by-xf plus a much simpler reducing fn.  This is very beautiful to me ;-)

-- Michael


Michael van Acken

Oct 17, 2014, 3:42:15 AM
to clo...@googlegroups.com
Correction after actually trying it out:

In this case you might prefer to write something like (transduce xform (comp group-by-xf +) ...) instead
of (transduce (comp xform group-by-xf) + ...) . [...]

The first variant must use a valid rf, and needs to be written
as (transduce xform ((group-by-xf odd?) +) ...)

In the second it should be e.g.
(comp xform (group-by-xf odd?))

-- Michael

Daniel James

Oct 17, 2014, 7:11:12 AM
to clo...@googlegroups.com
Hi Michael,

As I said above, while this is certainly a function that transforms one reducing function into another, you should not call this a transducer. It is something else, with a distinct type.

Compare

type Transducer a b = forall r . Reducer a r -> Reducer b r

with

type SomethingElse a b r = Reducer a r -> Reducer b PersistentMap

The baggage handler analogy is getting at the idea that you really shouldn’t have to know anything about the conveyance, the `r` here. The rank-2 polymorphic type neatly enforces that a valid implementation *can’t* know. Your function does know, because it has removed the polymorphism and always reduces on maps—you’ve dictated that you are using map trolleys.

 


If I call

(into [] xform (range 5))

(transduce xform + 0 (range 5))

I expect a vector and an integer to pop out, respectively, and I’m going to be sorely disappointed if a map pops out instead!


In this case you might prefer to write something like (transduce xform (comp group-by-xf +) ...) instead
of (transduce (comp xform group-by-xf) + ...) . If I had used the first variant, then I would not have stumbled
over the default init value of transduce at all.  Mh.  Seen this way, I only have to adjust my code accordingly and can
continue to use the default init as it is.  Thank you for this insight!

I think you may have missed my point here.

Your group-by-xf function only makes sense in a specific interpretation of `transduce`, again indicating that it is not a valid transducer.

If `into` were changed to reflect my proposal, you would find that it blows up with your group-by-xf:

(into [1 2 3] (group-by-xf odd?) (range 100))

A map would be returned, not a vector! Similarly, if you used it with `sequence`, you’d get a map, not a sequence; and with core.async channels, you’d corrupt the channel by replacing its buffer with a map.
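Assuming both the alt-into from my first message and your group-by-xf are in scope, a sketch of the mismatch (output computed by hand, so treat it as illustrative):

```clojure
;; alt-into seeds the reduction via the transduced 0-arity ({}), and
;; group-by-xf uses the wrapped conj's 0-arity ([1 2 3]) as the seed of
;; each group, so a map of vectors comes back where a vector went in:
(alt-into [1 2 3] (group-by-xf odd?) (range 5))
;=> {false [1 2 3 0 2 4], true [1 2 3 1 3]}
```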

  
By the way, please, please, don’t read this as a polemic. I just want to be precise about the issue, and I am certainly sympathetic to “While experimenting I…”!

No offense taken.  I do understand that beauty is always in the eye of the beholder.

Using the grouping function above I was able to take a gnarly aggregation function (basically a multi
level update-in plus a transformation on the leaves when done) and turn it into a sequence of
group-by-xf plus a much simpler reducing fn.  This is very beautiful to me ;-)

-- Michael

I agree that your group-by-xf is a nice formulation of a ‘group by’ operation. But I maintain that it should not be considered a valid transducer. It should only be used with `reduce`.


I hope that helps with making my point clear.

Dan
 

Daniel James

Oct 17, 2014, 9:01:51 AM
to clo...@googlegroups.com
Just a quick follow up to add some clarifications…


On Friday, October 17, 2014 7:11:12 AM UTC-4, Daniel James wrote:

Hi Michael,

As I said above, while this is certainly a function that transforms one reducing function into another, you should not call this a transducer. It is something else, with a distinct type.

Compare

type Transducer a b = forall r . Reducer a r -> Reducer b r

with

type SomethingElse a b r = Reducer a r -> Reducer b PersistentMap

The baggage handler analogy is getting at the idea that you really shouldn’t have to know anything about the conveyance, the `r` here. The rank-2 polymorphic type neatly enforces that a valid implementation *can’t* know. Your function does know, because it has removed the polymorphism and always reduces on maps—you’ve dictated that you are using map trolleys.


I’ve been making some simplifications to abstract away from the unnecessary details, but I think I was a bit too loose when I said “it has removed the polymorphism”.

Your group-by-xf function has a type that can be characterized as the following:

group-by-xf :: (a -> k) -> (forall r . Reducer a r -> Reducer a (Map k r))

You are still polymorphic in the type of values that are being reduced, as well as in the type of the values of the map (you are using the input reducing function to construct the values of the map). Nonetheless, it still remains something other than a legitimate transducer: you have specialized `Reducer a r` to `Reducer a (Map k r)`. And that’s the bit that counts.

 

I think you may have missed my point here.

Your group-by-xf function only makes sense in a specific interpretation of `transduce`, again indicating that it is not a valid transducer.

If `into` were changed to reflect my proposal, you would find that it blows up with your group-by-xf:

(into [1 2 3] (group-by-xf odd?) (range 100))

A map would be returned, not a vector! Similarly, if you used it with `sequence`, you’d get a map, not a sequence; and with core.async channels, you’d corrupt the channel by replacing its buffer with a map.


Let me rephrase. As you said in your first reply, `transduce` *already* doesn’t jibe with your `group-by-xf`.

- transduce:
  it doesn’t call your 0-arity, so it will give you `init`, but you require it to be a map.
- into:
  it doesn’t call your 0-arity, so it will give you the `to`, a collection, but you require it to be a map.
- sequence:
  it doesn’t call your 0-arity, so it will give you a lazy sequence, but you require it to be a map.
- chan:
  it doesn’t call your 0-arity, so it will give you a channel buffer, but you require it to be a map.


*If*, these were all changed to call your 0-arity:

- transduce:
  you replaced `init` with {}, so the reduction returns a map.
- into:
  you throw away the output collection `to`, and return a map.
- sequence:
  exception??? whatever machinery is handling the laziness will surely not like being handed a map
- chan:
  exception??? whatever machinery is handling the channel will surely not like it when you replace the channel buffer with a map


Your function is good for use with `reduce`, but only that. I hope I’ve helped build an intuition for why it’s actually impossible to implement ‘group by’ as a transducer. 
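Driven directly by reduce, with the seed drawn from its own 0-arity, your group-by-xf does exactly what you want (a sketch assuming the group-by-xf as posted above):

```clojure
;; Seed from the transformed fn's 0-arity, reduce, then complete:
(let [rf ((group-by-xf odd?) +)]
  (rf (reduce rf (rf) (range 10))))
;=> {false 20, true 25}
```

This is essentially the reduce scenario you describe: each group is reduced with +, seeded by (+), and the completion arity finishes each group.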


Dan

Michael van Acken

Oct 17, 2014, 9:21:19 AM
to clo...@googlegroups.com


On Friday, October 17, 2014 at 3:01:51 PM UTC+2, Daniel James wrote:
[...]

Your function is good for use with `reduce`, but only that. I hope I’ve helped build an intuition for why it’s actually impossible to implement ‘group by’ as a transducer. 

This is correct.  Unlike you, I am exclusively talking about the reduce case: transformations on reducing
functions, and the general init-reduce-complete cycle represented by the 0/2/1-arities of the reducing
functions.  My use case is a reduce using the reducing function (xform f), but by default transduce
does not pick this reducing function's 0-arity for its init value, but rather uses (f) instead.

-- Michael

Daniel James

Oct 17, 2014, 9:52:49 AM
to clo...@googlegroups.com


On Oct 17, 2014 9:21 AM, "Michael van Acken" <michael....@gmail.com> wrote:
>
> This is correct.  Unlike you, I am exclusively talking about the reduce case: transformations on reducing
> functions, and the general init-reduce-complete cycle represented by the 0/2/1-arities of the reducing
> functions.  My use case is a reduce using the reducing function (xform f), but by default transduce
> does not pick this reducing function's 0-arity for its init value, but rather uses (f) instead.

Great, Michael, I think we're both on the same page.

After all that, I was ultimately arguing that it is technically incorrect to say:

“… trying to splice a group-by-style *transducer* into an existing
(comp ...) chain.”
(emphasis mine)

As you can tell, I come to transducers with type theory in mind, but I'm actually equally interested in the algebraic properties of transducers, if not more so. Why? Because these can be formulated as property-based tests (using test.check, for example) and then used to search for counterexamples against functions that claim to be transducers. I also feel that these, along with types, would be helpful in characterizing the boundaries of what can and can't be expressed as a transducer.
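As a sketch of what such a property-based test might look like (hypothetical, assuming the test.check library is on the classpath; the property and its name are mine):

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; One algebraic property of transducible processes: for a given
;; transducer, `into` should agree with transducing conj by hand.
(def into-agrees-with-transduce
  (prop/for-all [v (gen/vector gen/int)]
    (= (into [] (map inc) v)
       (transduce (map inc) conj [] v))))

(tc/quick-check 100 into-agrees-with-transduce)
;; returns a map whose :result key is true when the property holds
```

A richer suite would generate transducers as well as inputs, and check each candidate across several transducible processes.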

Dan

Michael van Acken

Oct 17, 2014, 10:24:32 AM
to clo...@googlegroups.com


On Friday, October 17, 2014 at 3:52:49 PM UTC+2, Daniel James wrote:

[...]

Great, Michael, I think we're both on the same page.

After all that, I was ultimately arguing that it is technically incorrect to say:

“… trying to splice a group-by-style *transducer* into an existing
(comp ...) chain.”
(emphasis mine)

As you can tell, I come to transducers with type theory in mind, but I'm actually equally interested in the algebraic properties of transducers, if not more so.

Ah, understood.  I am using the term as in http://clojure.org/transducers, where it is defined as

A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another:
;; transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)

From my point of view, this is a general formulation that covers the composition of reducing functions in both a process and a reduce scenario.

-- Michael

Daniel James

Oct 17, 2014, 10:48:02 AM
to clo...@googlegroups.com

On Friday, October 17, 2014 10:24:32 AM UTC-4, Michael van Acken wrote:
Ah, understood.  I am using the term as in http://clojure.org/transducers, where it is defined as

A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another:
;; transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)

In my point of view, this is a general formulation that covers the composition of reducing functions both in a
process and a reduce scenario.


As I said before, this is a necessary, but not a sufficient condition. It does not tell the full story. That definition provides a useful introduction for building an intuition, but it is refined later in that document as well as in Rich Hickey’s talk at Strange Loop.

The promise of transducers is that you can give me one and I am free to use it anywhere in an *open* world of transducible processes, with `transduce`, `into`, `sequence`, and `chan` being initial examples. There is no way to write a transformation on reducing functions that provides the ‘group by’ semantics you are looking for without breaking that contract.

Irrespective of whether you accept my claims about typing transducers[1], the very fact that your `group-by-xf` function does not operate correctly in this open world is a sufficient counter-example to show that it is not a ‘transducer’ in the precise sense.


Dan


[1]
“The rank-2 type in particular captures an important property.” — Rich Hickey