Datatypes and protocols - update


Rich Hickey

Nov 30, 2009, 8:42:45 PM
to Clojure
An updated version of the code for datatypes[1] and protocols[2] is
now available in the 'new' branch[3].

I have done a lot of work on performance, and refined the design. The
big news is that you can now directly implement a protocol inside a
deftype, and you can also reify protocols. This cements protocols as
the superior way to model the things for which they are suitable,
since they can match the performance of interfaces without their
limitations.

In addition, defprotocol automatically generates a corresponding
interface, which can be used to reach the protocol from Java with the
highest performance. This same interface is used internally by the
deftype and reify support described above.

Small changes include:

- No more use of . in reify/deftype method names
- No more implicit this in reify/deftype
- Protocols are supported directly, where interfaces are, in reify
and deftype (and may be intermingled).

I've striven to make these features consistent, flexible, and high
performance, with good support for dynamic interactive development. If
you have the time and inclination, please try them out.

Feedback is particularly welcome as they are being finalized.

Thanks,
Rich

[1] http://www.assembla.com/wiki/show/clojure/Datatypes
[2] http://www.assembla.com/wiki/show/clojure/Protocols
[3] http://github.com/richhickey/clojure/tree/new
[4] http://build.clojure.org/

Krukow

Dec 1, 2009, 2:59:43 PM
to Clojure


On Dec 1, 2:42 am, Rich Hickey <richhic...@gmail.com> wrote:
> I have done a lot of work on performance, and refined the design. The
> big news is that you can now directly implement a protocol inside a
> deftype, and you can also reify protocols. This cements protocols as
> the superior way to model the things for which they are suitable,
> since they can match the performance of interfaces without their
> limitations.

First of all, I think this is a wonderful addition to the language.
I've tried what is in the "new" branch on a small but real example,
and I am quite happy with it: So thanks!

Could you go into more detail about how protocols and datatypes are
actually implemented, and the performance improvements you've recently
made? (probably I'm not the only one interested :-)

What is generated when I define a protocol, datatype and extend the
type to the protocol?

As I understand it, the performance of calling a protocol function matches
the performance of calling an interface method in Java. How is it
possible to achieve this in combination with the dynamic extensibility
of extend?

/Karl

Rich Hickey

Dec 1, 2009, 4:56:41 PM
to Clojure


On Tue, Dec 1, 2009 at 2:59 PM, Krukow <karl....@gmail.com> wrote:
>
>
> On Dec 1, 2:42 am, Rich Hickey <richhic...@gmail.com> wrote:
>> I have done a lot of work on performance, and refined the design. The
>> big news is that you can now directly implement a protocol inside a
>> deftype, and you can also reify protocols. This cements protocols as
>> the superior way to model the things for which they are suitable,
>> since they can match the performance of interfaces without their
>> limitations.
>
> First of all, I think this is a wonderful addition to the language.
> I've tried what is in the "new" branch on a small but real example,
> and I am quite happy with it: So thanks!
>

You're welcome.

> Could you go into more detail about how protocols and datatypes are
> actually implemented, and the performance improvements you've recently
> made? (probably I'm not the only one interested :-)
>
> What is generated when I define a protocol, datatype and extend the
> type to the protocol?
>

A protocol is a data structure that contains a set of signatures, and
a set of implementations, which are maps of fn-name to fn, supplied by
explicit extenders. A protocol also generates a corresponding
interface, and protocol fns special-case that interface.

There are 2 ways to make a deftype reach a protocol. First, you can
implement the protocol directly in the deftype/reify, supplying the
protocol where you do interfaces, and the methods of the protocol as
methods of the type. The type will be made to implement the protocol's
interface. The second way, for types you don't control, is to use
extend-type/class/protocol, which will create method maps and register
them with the protocol.

Note that these do not differ in their support for dynamic use (i.e.
neither requires AOT), just that the former is type-intrusive (you
have to 'own' the type you are modifying) while the latter is not.
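
The two routes described above can be sketched as follows. This is a minimal illustration, not code from the branch: the names Describable, describe and Point are made up, and it is written in the syntax of released Clojure (1.2 and later), where deftype method bodies take an explicit first argument, unlike the 'new' branch snapshot discussed in this thread.

```clojure
;; Hypothetical protocol, for illustration only.
(defprotocol Describable
  (describe [x]))

;; Route 1: implement the protocol directly in a type you own.
;; The type is made to implement the protocol's generated interface.
(deftype Point [x y]
  Describable
  (describe [this] (str "Point(" x ", " y ")")))

;; Route 2: extend the protocol to a type you do not control.
;; This registers an implementation with the protocol; String is untouched.
(extend-type String
  Describable
  (describe [s] (str "String of length " (count s))))

(describe (Point. 1 2)) ;; => "Point(1, 2)"
(describe "abc")        ;; => "String of length 3"
```

Callers use the same protocol function in both cases; only the way the implementation is attached differs.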

> As I understand the performance of calling a protocol function matches
> the performance of calling an interface method in Java. How is it
> possible to achieve this in combination with the dynamic extensibility
> of extend?
>

Different methods of implementing the protocol have different
performance. Implementing directly in deftype or reify is as fast as a
direct interface call. Using extend-* is not quite as fast, but still
fast. Both methods have direct support in callsites, so a call to a
protocol fn has support both for using the interface and caching
lookup results.

The most important thing is, writing to protocols gives you a dynamic,
open, extensible system not tied to derivation, and is fast, so a
great way to architect the polymorphic part of your designs (when
single-dispatch is appropriate).

Rich

Krukow

Dec 2, 2009, 12:29:12 AM
to Clojure


On Dec 1, 10:56 pm, Rich Hickey <richhic...@gmail.com> wrote:
[snip]
> There are 2 ways to make a deftype reach a protocol. First, you can
> implement the protocol directly in the deftype/reify, supplying the
> protocol where you do interfaces, and the methods of the protocol as
> methods of the type. The type will be made to implement the protocol's
> interface.

OK. With extend you can use maps and merge to share implementations.
Does directly implementing the protocol in deftype allow also for
"abstract super-classes", i.e., sharing protocol-function
implementations across types?

[snip]

> Different methods of implementing the protocol have different
> performance. Implementing directly in deftype or reify is as fast as a
> direct interface call. Using extend-* is not quite as fast, but still
> fast. Both methods have direct support in callsites, so a call to a
> protocol fn has support both for using the interface and caching
> lookup results.

Great. So the preferred way for data-types in my program is to
implement the protocol directly, whereas for other types I can still
use my protocol with extend.

Just to confirm my understanding: Is it correct to say, for example,
that clojure.lang.Seqable will be a protocol implemented directly in
the Clojure data types, whereas it would reach the Java lang types
using extend?

In Clojure-in-Java interfaces like IPersistentCollection extend
Seqable: would these be unrelated type-wise as protocols?

> The most important thing is, writing to protocols gives you a dynamic,
> open, extensible system not tied to derivation, and is fast, so a
> great way to architect the polymorphic part of your designs (when
> single-dispatch is appropriate).
>
> Rich

I think these constructs will have great impact on how we will
structure our Clojure programs.

/Karl

Rich Hickey

Dec 2, 2009, 11:13:56 AM
to Clojure


On Dec 2, 12:29 am, Krukow <karl.kru...@gmail.com> wrote:
> On Dec 1, 10:56 pm, Rich Hickey <richhic...@gmail.com> wrote:
> [snip]
>
> > There are 2 ways to make a deftype reach a protocol. First, you can
> > implement the protocol directly in the deftype/reify, supplying the
> > protocol where you do interfaces, and the methods of the protocol as
> > methods of the type. The type will be made to implement the protocol's
> > interface.
>
> OK. With extend you can use maps and merge to share implementations.
> Does directly implementing the protocol in deftype allow also for
> "abstract super-classes", i.e., sharing protocol-function
> implementations across types?
>

Right now you would just call an implementation helper inside your
method. I'm still considering if more support is needed, and what form
it might take.
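
A rough sketch of the "implementation helper" approach, assuming released Clojure syntax (explicit first argument in deftype methods). Shape, area, describe-shape and describe-with-area are hypothetical names chosen for the example.

```clojure
;; Hypothetical protocol with one method whose body is shared across types.
(defprotocol Shape
  (area [s])
  (describe-shape [s]))

;; The shared helper is an ordinary function; each type's method delegates to it.
(defn describe-with-area [kind s]
  (str kind " with area " (area s)))

(deftype Square [side]
  Shape
  (area [_] (* side side))
  (describe-shape [this] (describe-with-area "square" this)))

(deftype Rect [w h]
  Shape
  (area [_] (* w h))
  (describe-shape [this] (describe-with-area "rect" this)))

(describe-shape (Square. 3)) ;; => "square with area 9"
(describe-shape (Rect. 2 3)) ;; => "rect with area 6"
```

Each type still has to spell out the delegating method, which is the boilerplate the question is about.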

> > Different methods of implementing the protocol have different
> > performance. Implementing directly in deftype or reify is as fast as a
> > direct interface call. Using extend-* is not quite as fast, but still
> > fast. Both methods have direct support in callsites, so a call to a
> > protocol fn has support both for using the interface and caching
> > lookup results.
>
> Great. So the preferred way for data-types in my program is to
> implement the protocol directly, whereas for other types I can still
> use my protocol with extend.
>
> Just to confirm my understanding: Is it correct to say, for example,
> that clojure.lang.Seqable will be a protocol implemented directly in
> the Clojure data types, whereas it would reach the Java lang types
> using extend?
>

Yes. extend is the key to removing the current (closed) multiway
conditionals in, e.g., RT.seq/seqFrom, and is much faster as well.

> In Clojure-in-Java interfaces like IPersistentCollection extend
> Seqable: would these be unrelated type-wise as protocols?
>

Yes. One of the reasons we use interface inheritance in a language
like Java is that, short of generifying everything, we have no way to
say:

foo(Counted+Sorted+Seqable+Collection coll){...}

We only get to specify one type, and any other interfaces it doesn't
imply require casts. So we use hierarchy to reduce the required
casting, but it has a cost in flexibility - i.e. you can't make a non-
Seqable collection, if that made sense.

In a dynamic language with protocols there is no reason to do it this
way. You don't need to have a single type imply multiple types via
hierarchy, and you don't need to declare anything. So each protocol is
a la carte, and a piece of code that requires the collection support
Counted, Sorted, Seqable and Collection protocols will simply use
those protocols, and work with anything that supports them. A type can
support any and just the protocols that make sense for it, without
bringing in others as a side effect of hierarchy.

Protocols are very much about polymorphism without hierarchy.

Rich
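
The "a la carte" idea can be illustrated with a small sketch. HasCount, HasItems, Bag and Tally are invented names, and the syntax is that of released Clojure rather than the branch discussed here.

```clojure
;; Two independent, hypothetical protocols; neither implies the other.
(defprotocol HasCount (item-count [c]))
(defprotocol HasItems (items [c]))

;; Bag supports both protocols.
(deftype Bag [xs]
  HasCount (item-count [_] (count xs))
  HasItems (items [_] (seq xs)))

;; Tally supports only HasCount; nothing forces it to provide items.
(deftype Tally [n]
  HasCount (item-count [_] n))

;; Code that needs both protocols simply calls both; no declaration needed.
(defn summarize [coll]
  {:n (item-count coll) :first (first (items coll))})

(summarize (Bag. [1 2 3])) ;; => {:n 3, :first 1}
(item-count (Tally. 5))    ;; => 5
```

summarize works with anything that supports both protocols, while Tally remains usable wherever only HasCount is required.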

Krukow

Dec 2, 2009, 12:35:13 PM
to Clojure
Thanks for sharing the insights.

/Karl

Konrad Hinsen

Dec 7, 2009, 3:11:14 AM
to clo...@googlegroups.com
On 01.12.2009, at 02:42, Rich Hickey wrote:

> An updated version of the code for datatypes[1] and protocols[2] is
> now available in the 'new' branch[3].

This weekend I finally got around to converting all my deftype-and-
defprotocol-using code to the current Clojure new branch. It is now
more compact and more readable, and the few parts where performance
matters are faster. A big step forward!

> Small changes include:
>
> - No more use of . in reify/deftype method names
> - No more implicit this in reify/deftype

That's the only feature that I regret a bit. It looks weird in a
functional language to have functions (or something that is very
similar) that have an implicit argument named outside its definition
(in the :as option). Of course, one can think of it as similar to a
closure over the object, but it still looks a bit weird to me.

Konrad.

Rich Hickey

Dec 7, 2009, 6:53:38 AM
to clo...@googlegroups.com
Yes, methods are not really functions. Thinking about them as closures
over the object is a good way to go - you can see that analogy in play
when you consider recur, which works with these methods, but could not
rebind 'this'. The recur case sealed the deal in the decision not to
include 'this' in the argument lists.

Rich
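
The recur point can be seen in a short sketch. Note the assumption: this uses released Clojure syntax, where the target object does appear as the first parameter of a deftype method, yet recur still does not pass or rebind it. Countdown, count-down and Counter are hypothetical names.

```clojure
(defprotocol Countdown
  (count-down [c n]))

(deftype Counter [step]
  Countdown
  (count-down [this n]
    (if (pos? n)
      ;; recur targets the method head but supplies only n;
      ;; the target object is passed along automatically.
      (recur (- n step))
      n)))

(count-down (Counter. 3) 10) ;; => -2  (10, 7, 4, 1, -2)
```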

Hugo Duncan

Dec 7, 2009, 10:51:10 AM
to clo...@googlegroups.com
On Mon, 07 Dec 2009 06:53:38 -0500, Rich Hickey <richh...@gmail.com>
wrote:

> Yes, methods are not really functions. Thinking about them as closures
> over the object is a good way to go - you can see that analogy in play
> when you consider recur, which works with these methods, but could not
> rebind 'this'. The recur case sealed the deal in the decision not to
> include 'this' in the argument lists.

I had a quick play with protocols, and the biggest problem I had getting
started was realising that the signature of a method definition in
defprotocol was different to the signature required to implement the same
method in deftype. FWIW, I found it very non-intuitive.

--
Hugo Duncan

Laurent PETIT

Dec 7, 2009, 12:07:12 PM
to clo...@googlegroups.com


2009/12/7 Hugo Duncan <hugod...@users.sourceforge.net>

Hello,

Now that you've got it, do you still find this non-intuitive?
I had the same feeling at first: I thought I would never remember how things work, and why 'this-like args go here and not there ...
But now that everything has "clicked into place", I find the current state of what Rich has built the most natural and intuitive.

Basically, what helped me was along the lines of what Konrad said :
 * defprotocol and extend are "purely functional": you have to specify every argument, including the object the function acts upon.
 * deftype with an embedded protocol implementation for the type, or reify, on the contrary, do not define pure functions. They define methods. You cannot get them as values and pass them around like higher-order functions, for example. And you must know this fact; it cannot be an implementation detail. Since you know it, you remember that you're in a method definition (in the general sense of object-oriented languages: a method of a class) and, as you do in e.g. Java, C#, ..., when defining methods, you do not add the target of the method as an explicit argument.

The big advantage I see in this is that once you get it, you no longer have to remember where 'this is explicit and where it's implicit: it's intuitive.
The other big advantage is that the use of recur inside these function/method bodies continues to match exactly the signature of the function/method (otherwise you would have had to remember that, e.g. in methods defined via deftype, you must place an explicit "this" argument in the method arg list, but not place it in the recur calls ... :-( )

HTH,

--
laurent

ataggart

Dec 7, 2009, 2:23:54 PM
to Clojure


On Dec 7, 9:07 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:
> 2009/12/7 Hugo Duncan <hugodun...@users.sourceforge.net>
>
> > On Mon, 07 Dec 2009 06:53:38 -0500, Rich Hickey <richhic...@gmail.com>
That was my experience as well. It started off as a gotcha (since I
was copy/pasting the protocol definitions over to the deftype), but
then after playing for a bit it wasn't a big deal. In all my usages
so far I haven't needed to reference 'this'.

The one area I am running into issues is with being able to provide a
default implementation, or extending types such that I can override a
method.

For example, I have:

(defprotocol http-resource
  (GET [res req resp])
  (POST [res req resp])
  (PUT [res req resp])
  (DELETE [res req resp])
  (HEAD [res req resp])
  (OPTIONS [res req resp])
  (TRACE [res req resp]))

But considering the usage, most of those need not be implemented
per-type, and could all be defaulted to something like:

(deftype resource [] http-resource
  (GET [req resp] (send-status! resp 405))
  (POST [req resp] (send-status! resp 405))
  (PUT [req resp] (send-status! resp 405))
  (DELETE [req resp] (send-status! resp 405))
  (HEAD [req resp] (send-status! resp 405))
  (OPTIONS [req resp] (send-status! resp 405))
  (TRACE [req resp] (send-status! resp 405)))

Alas I can't simply extend-type since that modifies the type, instead
of creating a new, modified type. And even then, methods in the
extension map don't get called if the method exists directly on the
type, i.e., no overriding.

I'm sure my problem is simply vestigial OO thinking, but I'm not sure
how to achieve the simplicity I want. The one route I tried that sort-
of works is making a macro to create the types, rather than extending
some extant implementation. The downside is I have to see which
methods I'm being given and only provide "defaults" for the method
names/arities that aren't.

It might be sufficient if there was some facility for cloning another
type and overriding certain methods, though I can foresee problems
dealing with managing the field definitions between the original and
the altered clone.

ataggart

Dec 7, 2009, 2:35:54 PM
to Clojure
One idea: (defdefault name options* specs*)

Similar to deftype except without any field definitions (a simplifying
restriction), thus can only operate on their args. With that I could
do:

(defdefault base-resource http-resource
  (GET [req resp] (send-status! resp 405))
  (POST [req resp] (send-status! resp 405))
  (PUT [req resp] (send-status! resp 405))
  (DELETE [req resp] (send-status! resp 405))
  (HEAD [req resp] (send-status! resp 405))
  (OPTIONS [req resp] (send-status! resp 405))
  (TRACE [req resp] (send-status! resp 405)))

Then modify deftype to include "defaults" in the spec* set:

(deftype user-resource [user-id] base-resource
  (GET [req resp]
    (if-let [u (find-user user-id)]
      (send-entity! resp 200 u)
      (send-status! resp 404))))
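
For comparison, one way to get defaults with what already exists is the map-and-merge style of extend mentioned earlier in the thread. This is only a sketch under assumptions: HttpResource, default-resource and UserResource are hypothetical, the methods return plain values instead of calling send-status!, only two verbs are shown, and defrecord (from released Clojure) is used so the field can be read with a keyword.

```clojure
;; Cut-down, hypothetical version of the protocol above.
(defprotocol HttpResource
  (GET [res req])
  (POST [res req]))

;; Defaults as a method map: protocol fn name (as keyword) -> fn.
(def default-resource
  {:GET  (fn [_ _] 405)
   :POST (fn [_ _] 405)})

(defrecord UserResource [user-id])

;; Start from the defaults and override only GET.
(extend UserResource
  HttpResource
  (merge default-resource
         {:GET (fn [res _] [200 (:user-id res)])}))

(GET (UserResource. 42) nil)  ;; => [200 42]
(POST (UserResource. 42) nil) ;; => 405
```

The trade-off noted earlier in the thread applies: implementations attached via extend are not quite as fast as methods defined directly in the type.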





DTH

Dec 7, 2009, 2:59:49 PM
to Clojure
On Dec 1, 9:56 pm, Rich Hickey <richhic...@gmail.com> wrote:
>
> There are 2 ways to make a deftype reach a protocol. First, you can
> implement the protocol directly in the deftype/reify, supplying the
> protocol where you do interfaces, and the methods of the protocol as
> methods of the type. The type will be made to implement the protocol's
> interface. The second way, for types you don't control, is to use
> extend-type/class/protocol, which will create method maps and register
> them with the protocol.
>

For the record; this means that you cannot implement two protocols
directly in deftype or reify if those protocols have fns with the same
names and signatures. You can, however, implement one protocol
directly, and then extend it to the other, as with the following,
rather silly, example:

user> (ns foo)
nil
foo> (defprotocol Foo (write [this]))
Foo
foo> (ns bar)
nil
bar> (defprotocol Bar (write [this badger]))
Bar
bar> (ns me)
nil
me> (defprotocol Me (write [this] [this badger]))
Me
me> (ns user)
nil
user> (deftype FooMe [a b c d] foo/Foo (write [] a) bar/Bar (write [badger] b))
#'user/FooMe
user> (def fm (FooMe :foo :bar :me1 :me2))
#'user/fm
user> (foo/write fm)
:foo
user> (bar/write fm 1)
:bar
user> (extend-type ::FooMe me/Me (write ([this] (:c this)) ([this badger] (:d this))))
nil
user> (me/write fm)
:me1
user> (me/write fm 1)
:me2
user> (foo/write fm)
:foo
user> (bar/write fm 1)
:bar
user>

which was unexpectedly sweet, yet totally consistent with your
explanation of how it works. Rich, you _are_ the Badgers Nadgers.

-Dave

Hugo Duncan

Dec 9, 2009, 10:35:50 AM
to clo...@googlegroups.com
On Mon, 07 Dec 2009 12:07:12 -0500, Laurent PETIT
<lauren...@gmail.com> wrote:

> 2009/12/7 Hugo Duncan <hugod...@users.sourceforge.net>
>
>> On Mon, 07 Dec 2009 06:53:38 -0500, Rich Hickey <richh...@gmail.com>
>> wrote:
>>
>> > Yes, methods are not really functions. Thinking about them as closures
>> > over the object is a good way to go - you can see that analogy in play
>> > when you consider recur, which works with these methods, but could not
>> > rebind 'this'. The recur case sealed the deal in the decision not to
>> > include 'this' in the argument lists.
>>
>> I had a quick play with protocols, and the biggest problem I had getting
>> started was realising that the signature of a method definition in
>> defprotocol was different to the signature required to implement the
>> same
>> method in deftype. FWIW, I found it very non-intuitive.
>>
> Now that you've got it, do you still find this non-intuitive?
> I had the same feeling at first: I thought I would never remember
> how things work, and why 'this-like args go here and not there ...
> But now that everything has "clicked into place", I find the current
> state of what Rich has built the most natural and intuitive.


I'll no doubt get used to it :-) A couple of things that would have
helped me "get it":

From the deftype doc:

"Thus methods for protocols will take one fewer arguments than do the
protocol functions."

would (at least for me) be clearer as:

"Thus methods for protocols are implemented with one fewer argument than
in the protocol function definitions."


The example of a deftype protocol implementation that is in the
defprotocol doc string could be repeated in the deftype doc string.
That explanation certainly helps describe and clarify the reasons for the
difference. I still find it counter-intuitive that the definition follows
the syntax of the functional world, and the implementation that of an
object orientated world. However, I can't think of any suggestion to
resolve this, and as you say it does reflect the reality of the situation,
so I'll get used to it :-)

--
Hugo Duncan

Chris Kent

Dec 11, 2009, 8:48:56 AM
to clo...@googlegroups.com
Rich Hickey <richhickey <at> gmail.com> writes:

>
> An updated version of the code for datatypes[1] and protocols[2] is
> now available in the 'new' branch[3].

I've converted some code that used gen-class to use deftype and defprotocol and
the results are great so far. The code is shorter, easier to write and the
intent is much clearer. I'm a big fan.

I've come across one problem though. I've created a type with deftype that
calls a function in one of its methods:

(ns ns1.deftypetest)

(defn bar [] "bar")
(defprotocol P (foo [p]))
(deftype T [] P (foo [] (bar)))

When I compile the namespace, create an instance of the type from Java and
invoke the foo method I get:

java.lang.IllegalStateException: Var ns1.deftypetest/bar is unbound.

I guess the namespace isn't getting loaded. Should this work? I've created a
class in the same namespace using gen-class and that has no problem invoking the
function when it's instantiated from Java.

Thanks
Chris


Jason Wolfe

Dec 11, 2009, 7:14:08 PM
to Clojure
I've been trying out the new branch, and on the whole I like it a lot.
I know it'll take some time to learn how to do things properly the "new"
way, and I've figured out how to do most of the things I want to do
thus far. Thanks, Rich!

One thing I haven't figured out how to do cleanly without inheritance
is to specify "properties" of objects in a hierarchical domain in a
clean, efficient way. I'm sure I haven't fully wrapped my head around
the new abstractions, so I'd love to hear about a clean way to solve
this problem.

A very simple example is: I have a protocol "A", and sub-protocols
"A1" and "A2". Every A is either an A1 or A2, but not both (and this
split is closed, as far as I'm concerned). Sometimes I want to deal
with instances of A1 and A2 together, and so I put the methods shared
between all "As" in protocol "A". But, at some point I need to
separate out the "A1"s from the "A2"s. To do this, it seems like I
have at least three options:

1. Add an "is-A1" method to Protocol A. The problem with this option
is that every type that derives from A1 needs to manually write out
this method returning "true", and vice-versa for implementers of A2.
Users could eliminate this by "extending" their types with a mixin map
to A, rather than implementing it directly in the deftype. But, this
sacrifices readability (IMO) as well as efficiency.

2. use (satisfies? A1 x) to determine if x satisfies A1. The main
problem with this, at least currently, is that satisfies? seems to be
really slow in the negative case. I profiled my (non-trivial) program
and half the runtime was going to reflection in satisfies? Moreover,
this solution is not as general.

3. Use a multimethod. This would work generally and be reasonably
efficient, but I feel like I'd be cluttering up my interface by mixing
up protocols and multimethods. On the other hand, I guess
multimethods are the main (only?) hierarchical construct built into
Clojure, so maybe this is what's intended.

So, which do people feel is preferred? Or have I missed a better
option?

Thanks!
Jason

ataggart

Dec 12, 2009, 2:44:54 AM
to Clojure
If I understand Rich's reasoning, what you want runs antithetical to
the protocols design, namely it being explicitly non-hierarchical.

As such, you'd instead have 3 composable protocols, A, B, and C (where
B and C would correspond to the functions of A1 and A2, respectively):

(defprotocol A (a [x]))
(defprotocol B (b [x]))
(defprotocol C (c [x]))

And then the type would reify the appropriate protocols:

(deftype A1 [] A B
  (a [] (println "in A1.a"))
  (b [] (println "in A1.b")))

(deftype A2 [] A C
  (a [] (println "in A2.a"))
  (c [] (println "in A2.c")))


ataggart

Dec 12, 2009, 2:56:26 AM
to Clojure
I should also note that isa? can be used for differentiation:

user=> (def my-a (A1))
#'user/my-a
user=> (isa? (type my-a) ::A1)
true
user=> (isa? (type my-a) ::A2)
false

ataggart

Dec 12, 2009, 3:02:05 AM
to Clojure
Oh, and it occurs to me that you could create an after-the-fact
hierarchical relationship:

user=> (derive ::A1 ::A)
nil
user=> (derive ::A2 ::A)
nil
user=> (isa? (type my-a) ::A)
true
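
A note for anyone trying this on released Clojure (1.2 and later): there a deftype defines a plain class and type returns that class rather than a keyword tag, so the keyword form above is specific to the branch. The same after-the-fact relationship can still be sketched by deriving the classes themselves; the empty types here are stand-ins for the real ones.

```clojure
;; Stand-in types with no fields or protocols, for illustration only.
(deftype A1 [])
(deftype A2 [])

;; derive accepts a class as the child and a namespaced keyword as the parent.
(derive A1 ::A)
(derive A2 ::A)

(isa? (class (A1.)) ::A) ;; => true
(isa? (class (A2.)) ::A) ;; => true
(isa? String ::A)        ;; => false
```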

Konrad Hinsen

Dec 12, 2009, 7:12:52 AM
to clo...@googlegroups.com
On 12 Dec 2009, at 01:14, Jason Wolfe wrote:

> A very simple example is: I have a protocol "A", and sub-protocols
> "A1" and "A2". Every A is either an A1 or A2, but not both (and this
> split is closed, as far as I'm concerned). Sometimes I want to deal
> with instances of A1 and A2 together, and so I put the methods shared
> between all "As" in protocol "A". But, at some point I need to
> separate out the "A1"s from the "A2"s. To do this, it seems like I
> have at least three options:

Those are the options you have in the world of protocols, types/
classes, and hierarchies. But there are many more options for
classifying objects. For example, you could have a set A1 to which you
add all the types in your A1 category. Or you could call functions
specific to A1 or A2 through a lookup table implemented as a map. I
can't judge if any of these would work fine for you, of course.
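
Both ideas can be sketched in a few lines, assuming released Clojure where each deftype is a class. Circle, Segment, a1-types and handlers are hypothetical names, and the field access is unhinted, so it goes through reflection.

```clojure
;; Hypothetical types standing in for members of the two categories.
(deftype Circle [r])
(deftype Segment [len])

;; Option 1: a set holding every type in the A1 category.
(def a1-types #{Circle})

(defn a1? [x]
  (contains? a1-types (class x)))

;; Option 2: a lookup table from type to a category-specific function.
(def handlers
  {Circle  (fn [c] (* 2 (.-r c)))
   Segment (fn [s] (.-len s))})

(defn handle [x]
  ((handlers (class x)) x))

(a1? (Circle. 1))    ;; => true
(a1? (Segment. 2))   ;; => false
(handle (Circle. 2)) ;; => 4
```

Since the split is described as closed, a fixed set or map like this may be enough; an open split would need some way to register new types.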

Konrad.

Rich Hickey

Dec 12, 2009, 10:01:07 AM
to Clojure
Currently deftype classes do no automatic namespace loading, unlike
gen-class classes. The difference is gen-class classes are AOT-only,
whereas deftype classes can be defined dynamically, where reloading
namespaces would not be desired. I'm still thinking about how best to
support Java consumption of deftype classes. For now, you will need
some init glue code to load any support namespaces.

Rich