Protocols, subclassing and the semantics of reify/proxy

Antony Lee

May 27, 2012, 9:30:05 PM
to clojure...@googlegroups.com
Some Python standard classes, such as cmd.Cmd (http://docs.python.org/library/cmd.html), 1/ exist only to be subclassed, and 2/ expect their subclasses to define arbitrary methods (in the case of cmd.Cmd, methods whose names start with "do_").
As a consequence, I believe we actually need to support easy subclassing, and method definition should not be limited to protocols or interfaces.  Yes, that feels unclojuresque, but well -- better approaches are welcome.
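
To make the cmd.Cmd convention concrete, here is a minimal sketch in plain Python (not clojure-py). The base class finds handlers purely by name, so a subclass has to define real do_* methods. The class name GreetShell and the "greet" command are made up for illustration.

```python
import cmd
import io


class GreetShell(cmd.Cmd):
    """Hypothetical subclass: cmd.Cmd looks up handlers named do_<command>."""

    def do_greet(self, arg):
        # Called for input lines of the form "greet <arg>".
        self.stdout.write("hello, %s\n" % arg)


out = io.StringIO()
shell = GreetShell(stdout=out)
shell.onecmd("greet world")   # dispatches to do_greet by attribute lookup
print(out.getvalue(), end="")
```

Any mechanism that only registers do_greet in a protocol dispatch table, without making it an attribute of the class, would be invisible to onecmd.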

Apparently the plan is to support only reify and to use it both for actual subclassing (https://github.com/halgari/clojure-py/issues/61) and for protocol extension.  But these are very different beasts, because protocol extension should not be simulated by subclassing (otherwise one could not extend protocols to classes that already exist).  So I guess we could have the following syntax:

(proxify superclasses ;; a better name is welcome
  (method-name [args+] body)*
  ...
  protocol
  (method-name [args+] body)*
  protocol
  (method-name [args+] body)*
  ...)

or, more formally

(proxify superclasses method* specs*)
superclasses => a vector of superclasses
specs => protocol method*
method => (method-name [args+] body) (self, rather than this, is explicit, in Python tradition)

The methods before the first protocol are actual methods of the class of the object created by reify.  The methods after the first protocol are implementations of protocolfns, and NOT methods (i.e. cannot be called as (.method-name args+), only as (protocolfn args+)).  In particular it should be allowed to have the same function-name appear twice, once as a method and once as a protocolfn implementation.  (Or, actual methods could come last, e.g. introduced by an :else (or other) keyword.)

Compare with what clojure-jvm provides:

(reify
  protocol-or-Object
  (method-name [args+] body)*
  protocol-or-Object
  (method-name [args+] body)*
  ...)
=> Does not allow (multiple) inheritance (except, de facto, from Object), or definition of methods outside protocols.

(proxy class-and-interfaces args
  (method-name [args*] body)*)
=> Actually looks more suitable for the job (allows inheritance, and definition of arbitrary methods).  However, the "class-and-interfaces" vector should really become "superclasses-and-protocols"; and we should separate superclasses (to be inherited from) and protocols (to be extended).  Moreover, it is not clear if a method-name should be used also as the implementation of a protocolfn.  So we may as well remove the protocols from the "superclasses-and-protocols" vector and put them directly with their implementations.

What do you think?

Antony

Konrad Hinsen

May 28, 2012, 1:47:45 AM
to clojure...@googlegroups.com
Antony Lee writes:

> Some Python standard classes, such as cmd.Cmd
> (http://docs.python.org/library/cmd.html) are here 1/ only to be
> subclassed, and 2/ expect their subclasses to define arbitrary
> methods (in the case of cmd.Cmd, methods that start with "do_*").
> As a consequence, I believe it is needed to actually support easy
> subclassing, and method definition should not be limited to
> protocols or interfaces.  Yes, that feels unclojuresque, but well
> -- better approaches are welcome.

This raises once more the question of what the priority for ClojurePy
is: staying close to JVM-Clojure, or close to the Python platform.

Python is obviously much more dynamic than Java, and it has the
tradition of defining interfaces as mere conventions, with perhaps
some support code once an interface is recognized as important (I am
thinking of abc here). Clojure's protocols are more formalized than
Python's interface conventions, and I think that's something worth
bringing to the Python platform.

On the other hand, examples such as Cmd illustrate how Python is too
much anchored in the OO world. The Clojure way of doing something like
Cmd is maps of symbols to functions. For interop reasons there should
be some way to work with Cmd-like frameworks, but I wouldn't use them
as a criterion for choosing a representation of Clojure protocols.

Konrad.


Timothy Baldridge

May 28, 2012, 11:18:07 AM
to clojure...@googlegroups.com
As reify works now, it doesn't keep you from inheriting from multiple
interfaces, but you're right, we would run into problems when there's
conflicting method names. I suggest we fix reify to use extend, and
don't have reified classes inherit from the protocol interfaces. Let
me explain:

(reify
  cmd.Cmd
  (do_foo [x] 43)
  MyInterface
  (foo [x] 42)        ;; interface, so we add this to superclasses
  bar/MyConflictingProto1
  (foo_proto [x] 42)  ;; does not get added to superclasses, but
                      ;; calls extend for MyConflictingProto1
  baz/MyConflictingProto1
  (foo_proto [x] 43)) ;; does not get added to superclasses, but
                      ;; calls extend for MyConflictingProto1


We do not keep users from defining extra methods not covered in the
parent classes (as in cmd.Cmd) but it is discouraged. Instead users
should create an interface and put do_foo into there.

To answer Konrad's question, my view has been that we should try to
have Clojure-Py run Clojure code but not always the other way around.

I'm not sure if that answers all your questions, but those are my
thoughts for now. Yes if we


Timothy

Antony Lee

May 29, 2012, 6:12:21 PM
to clojure...@googlegroups.com
The protocols branch on my repo now implements something like that for deftype and reify (and defrecord, but I'd like to move that one back to ad-hoc namedtuples).  I also had to change the multimethods implementation, as the __call__ method was supplied through a protocol (IMultiFn), which is of course not enough to make multimethods callable.  Likewise, the with-open test now fails because __enter__ and __exit__ are no longer defined in the type, but as protocolfns.  Yes, a big chunk of Python assumes that methods are defined in classes...  A better approach is welcome.
(Less importantly, "first" now also calls the protofn defined in the ISeq protocol, instead of the .first method.)
Antony

Antony Lee

May 29, 2012, 7:35:32 PM
to clojure...@googlegroups.com
I guess we have to accept for now that special methods such as __enter__ and __exit__ cannot be overridden in a protocol; rather they must be "set in the py/object protocol":
(reify py/object (__enter__ [self] etc.))

Except for that glitch I think the move from "interfaces" to protocols is going to be relatively easy now: in my protocols branch I was already able to get rid of the ISeq "interface" entirely, and it is probably only a matter of patience to replace all the interface inheritances with protocol extensions:

class Foo(Interface, Bar): ...
=>
@protocol.extends(Interface)
class Foo(Bar): ...

Note that there is no "inheritance of protocols" so there are actually a lot of extensions to be written... but that's probably not the hardest part :-)
Antony

Antony Lee

May 29, 2012, 11:09:52 PM
to clojure...@googlegroups.com
Well, apparently subclasses do inherit their parents' protocols in clojure-jvm (the code in core_deftype.clj explicitly walks through the superclasses to find an implementor):
user=> (defprotocol P (foo [self]))
P
user=> (extend-type Object P (foo [self] "obj"))
nil
user=> (foo 1)
"obj"

It's only a matter of a couple of lines to walk through the MRO in Python (again, see my repo) but I am becoming very confused... by what follows (still in clojure-jvm).
user=> (extends? P Object)
true
user=> (extends? P Long)
false

Now I really have no idea what the semantics of extends? are: (extends? P Cls) == True doesn't imply that I can call foo on an object of class Cls (the implementation may be missing); (extends? P Cls) == False doesn't imply that I cannot call foo on an object of class Cls (the implementation may be provided by a superclass).
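
The mismatch can be reproduced with a toy model in Python. This is only a sketch of the behaviour seen in the REPL session above, not the code of clojure-jvm or clojure-py. The Protocol class and its method names are hypothetical: extends looks at direct registrations only, while the call itself walks the MRO.

```python
class Protocol(object):
    """Toy protocol with a single function; all names here are hypothetical."""

    def __init__(self, name):
        self.name = name
        self.table = {}          # class -> implementation

    def extend(self, cls, impl):
        self.table[cls] = impl

    def extends(self, cls):
        # Like the REPL session: only direct registrations count.
        return cls in self.table

    def find_impl(self, cls):
        # Walk the MRO so that subclasses pick up a parent's implementation.
        for klass in cls.__mro__:
            if klass in self.table:
                return self.table[klass]
        return None

    def __call__(self, obj, *args):
        impl = self.find_impl(type(obj))
        if impl is None:
            raise TypeError("no implementation of %s for %s"
                            % (self.name, type(obj).__name__))
        return impl(obj, *args)


foo = Protocol("foo")
foo.extend(object, lambda self: "obj")

print(foo.extends(object))  # True
print(foo.extends(int))     # False, although the call below succeeds
print(foo(1))               # obj
```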

Ugh.

Antony Lee

May 30, 2012, 6:23:29 PM
to clojure...@googlegroups.com
Random thoughts...

If I understand the clojure model well, protocols should never modify the classes they extend (well, in the Java world that is not possible anyways AFAIK).  The fact that deftype creates a Java interface (and thus allows calling protocolfns through interop) is an "implementation detail" (http://stackoverflow.com/questions/5605192/mocking-clojure-protocols) that should not be relied on; after all protocol extensions that are not provided inline (i.e., that are provided through extend-type) are not callable through interop.

The current implementation of protocols in clojurepy uses multiple approaches: for non-modifiable classes (e.g. Python builtins), store the implementor fn in a dispatch table carried by the protocolfn; for modifiable classes, store the implementor fn in a new __proto__<fname> attribute of the class, for extra efficiency; for the classes needed for bootstrapping (PersistentList, etc.), use inheritance first, and then do some magic to get back to the previous case.  As a side note, somehow it feels like adding the __proto__<fname> field breaks the whole contract of protocols (which is to leave classes intact); if one says that we should exploit the openness of Pythonic classes (compared to Java classes) then we may as well throw the whole concept of independent protocols away.  Also, using inheritance will not work to extend protocols to classes after class definition-time, so I'd like to get rid of it (even more).  Yes, it has the nice side effect of adding the protocolfns as actual methods but we can't rely on that.

But my main point is that there is already a Pythonic notion of protocols, which is checked through the presence of special methods.  For example, the Counted protocol, in clojure-py, is defined by a __len__ method.  Unfortunately, extending Counted to a class (after definition-time, not by inheritance) will not work as expected: it will allow (count foo), but not (py/len foo) (or len(foo) from the Python side), because the class will have a __proto____len__ method, not a __len__ method.  Sure, I can call RT.count(foo) in this case, but other special methods (__enter__ and __exit__, for example) are not that easy to handle.  Of course, we could special-case protocolfns whose names start and end with "__" so that they are *always* installed as new methods on the class itself and *always* called directly, but that feels awkward to me...  Still, I am looking for better options.
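
The len() problem is easy to check in plain Python. The sketch below assumes only the __proto__<fname> naming scheme described above; the class Foo is made up. It shows that len() ignores both the renamed attribute and an instance-level __len__, and only honours a real __len__ on the class.

```python
class Foo(object):
    pass


# Roughly what a protocol extension installs under the __proto__<fname> scheme.
setattr(Foo, "__proto____len__", lambda self: 3)

try:
    len(Foo())
    len_worked = True
except TypeError:
    len_worked = False
print(len_worked)  # False: len() only looks for __len__ on the type

# Setting __len__ on the instance does not help either ...
f = Foo()
f.__len__ = lambda: 3
try:
    len(f)
    instance_worked = True
except TypeError:
    instance_worked = False
print(instance_worked)  # False

# ... only a real method on the class does.
Foo.__len__ = lambda self: 3
print(len(Foo()))  # 3
```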

Timothy Baldridge

May 31, 2012, 10:20:41 AM
to clojure...@googlegroups.com
I agree, it's ugly that we're modifying the classes to add protocols,
and I really don't like it either.

So here's an idea. Technically protocols are nothing more than
multimethods that dispatch on type. Source for Clojure Multimethods:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/MultiFn.java

So we basically could implement protocols thusly:

(defmulti foo py/type)

(defmethod foo float "float")
(defmethod foo object "object")

(foo 1.0)
"float"
(foo (py/object))
"object"
(foo 1)
"object"


This won't be super fast, but it'll be compatible with Clojure.
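
For comparison on the Python side, the standard library's functools.singledispatch (added in Python 3.4, so later than this thread) behaves much like the defmulti above: it dispatches on the type of the first argument and falls back along the MRO. This is only an analogy, not how clojure-py implements multimethods.

```python
from functools import singledispatch


@singledispatch
def foo(x):
    # Default implementation, registered for object.
    return "object"


@foo.register(float)
def _(x):
    return "float"


print(foo(1.0))       # float
print(foo(object()))  # object
print(foo(1))         # object (int has no entry, so the object default applies)
```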

So my excursions into PyPy have been going well. I'm thinking more and
more that we should make Clojure-Py be the "Runs on CPython, may be
slow" version, and Clojure-PyPy be the "High-performance, but requires
a custom copy of PyPy"

Things like these multimethods can be extremely fast in PyPy (they
basically get optimized away by the JIT). So I'm not sure if it's all
that bad if we're not super fast in CPython. I'd probably recommend
against implementing +, -, / and * with them though.
--
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)

Antony Lee

May 31, 2012, 7:29:39 PM
to clojure...@googlegroups.com
Sounds good -- at least, I'm personally very happy to drop the
redundancy between multimethods and protocols. However, we'll probably
need to add some logic to the protocolfn-multimethod hierarchy to follow
Python's MRO in that special case.

As for __special_methods__, I guess we could add some options to
defprotocol:

(defprotocol ContextManager
(__enter__ ^:real-method [self])
(__exit__ ^:real-method [self exc-type exc-val tb]))

in which case 1/ no protocolfns are generated and 2/ calls to
extend-type (or reify, etc) directly add the protocols to the classes
themselves (with the risk of overriding a preexisting implementation --
or we could throw an error in that case).
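
A rough Python sketch of what the ^:real-method option could do at extend-type time, taking the "throw an error" variant. The helper install_real_methods and the class Resource are hypothetical names, not part of clojure-py.

```python
def install_real_methods(cls, methods):
    """Hypothetical helper: install functions as real attributes of cls.

    Refuses to override a method that cls itself already defines.
    """
    for name in methods:
        if name in cls.__dict__:
            raise TypeError("%s already defines %s" % (cls.__name__, name))
    for name, fn in methods.items():
        setattr(cls, name, fn)


class Resource(object):
    def __init__(self):
        self.log = []


install_real_methods(Resource, {
    "__enter__": lambda self: self,
    "__exit__": lambda self, exc_type, exc_val, tb: self.log.append("closed"),
})

# Because the methods live on the class, the with statement finds them.
with Resource() as r:
    r.log.append("open")
print(r.log)  # ['open', 'closed']
```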

By the way, this reminds me of something else: is there any plan to
check function signatures? I've toyed with the inspect module before
and that's definitely possible; in fact, as much as protocols "annoy"
me as slightly "unpythonic", I would be happy to (have a way to) enforce
signature compatibility.
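
One possible shape for such a check, using inspect.signature (available in Python 3.3 and later). This is a sketch under the assumption that a protocol declares a plain list of positional argument names; the function names positional_arity and check_signature are made up.

```python
import inspect


def positional_arity(fn):
    """Number of plain positional parameters fn declares."""
    params = inspect.signature(fn).parameters.values()
    return sum(1 for p in params
               if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD))


def check_signature(declared_args, impl):
    """Hypothetical check: impl must take exactly the declared arguments."""
    expected = len(declared_args)
    actual = positional_arity(impl)
    if actual != expected:
        raise TypeError("expected %d positional args, implementation takes %d"
                        % (expected, actual))


# Argument list of __exit__ as declared in the defprotocol above.
exit_args = ["self", "exc_type", "exc_val", "tb"]

check_signature(exit_args, lambda self, exc_type, exc_val, tb: None)  # passes

try:
    check_signature(exit_args, lambda self: None)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
print(mismatch_detected)  # True
```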

Antony

Antony Lee

Jun 1, 2012, 1:12:26 AM
to clojure...@googlegroups.com
I'm not done yet... I've just noticed that in the Python world, it is
probably *not* true, actually, that protocols are just multimethods that
dispatch on type, because of multiple inheritance. At least, in my
understanding, I expect protocols to follow the MRO, whereas
multimethods have no reason to do so (special-casing multimethods that
dispatch on py/type would be very ugly indeed); moreover the MRO cannot
be "simulated" using clojure's ad-hoc hierarchies. Let's say for
example that we have

# assume this implements a foo protocolfn using a foo method in the
# class:
@protocol.extends(Foo)
class A(object):
    def foo(self): pass

@protocol.extends(Foo)
class B(object):
    def foo(self): pass

class C(A, B): pass
class D(B, A): pass

(foo (C)) should call foo_A, but (foo (D)) should call foo_B, so neither
(prefer-method foo A B) nor (prefer-method foo B A) is going to help us.
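
A self-contained Python version of this example, with a dispatch table that is searched in MRO order. The decorator extends_foo is a hypothetical stand-in for @protocol.extends(Foo), and the return values "foo_A" and "foo_B" are added only so the result is visible.

```python
dispatch_table = {}   # class -> implementation of the foo protocolfn


def extends_foo(cls):
    """Hypothetical stand-in for @protocol.extends(Foo): register cls.foo."""
    dispatch_table[cls] = cls.__dict__["foo"]
    return cls


def foo(obj):
    # The first registered class along the MRO wins.
    for klass in type(obj).__mro__:
        if klass in dispatch_table:
            return dispatch_table[klass](obj)
    raise TypeError("foo is not implemented for %s" % type(obj).__name__)


@extends_foo
class A(object):
    def foo(self):
        return "foo_A"


@extends_foo
class B(object):
    def foo(self):
        return "foo_B"


class C(A, B):
    pass


class D(B, A):
    pass


print(foo(C()))  # foo_A
print(foo(D()))  # foo_B
```

A single global preference between A and B cannot produce both answers, whereas the per-class MRO does.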

Also, protocols should probably also take care of abstract methods:

@protocol.extends(Foo)
class A(object):
    __metaclass__ = ABCMeta
    @abstractmethod
    def foo(self): pass

class B(A):
    def foo(self): return 42

(foo (B)) should return 42. Indeed, for example, ASeq extends the ISeq
protocol, but does not define first and next, so (first foo) (when foo is
an ASeq) should call type(foo).seq (modulo Pythonic complications
regarding __getattribute__ etc etc.), and we don't really want to
repeat, for each subclass of ASeq, that it extends ISeq.

(This is actually easy to do, and already implemented in my code:
basically, if protocolfn.dispatchtable[type(args[0])] is a function fn
that is an abstract method (i.e., that sets __isabstractmethod__ to
True), then instead of calling it on the args, call getattr(args[0],
fn.__name__) on the args.)
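
The rule in the parenthesis above can be sketched as follows. This is not the code from the branch, just an illustration written for Python 3 (abc.ABC instead of __metaclass__), with a module-level dispatch_table standing in for protocolfn.dispatchtable.

```python
from abc import ABC, abstractmethod

dispatch_table = {}   # class -> function registered for the foo protocolfn


def foo(obj, *args):
    for klass in type(obj).__mro__:
        fn = dispatch_table.get(klass)
        if fn is None:
            continue
        if getattr(fn, "__isabstractmethod__", False):
            # The registered function is only a placeholder: defer to the
            # concrete method of the same name on the object itself.
            return getattr(obj, fn.__name__)(*args)
        return fn(obj, *args)
    raise TypeError("foo is not implemented for %s" % type(obj).__name__)


class A(ABC):
    @abstractmethod
    def foo(self):
        pass


dispatch_table[A] = A.__dict__["foo"]   # A extends the protocol abstractly


class B(A):
    def foo(self):
        return 42


print(foo(B()))  # 42
```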

Anyways, all this probably means that we're back to (mostly) separate
implementations of multimethods and protocolfns.

Antony

PS: still on the issue of protocols for special methods, probably we
want to be able to still specify an associated protocolfn:

(defprotocol Foo
(__bar__ ^:real-method [self]) ; don't create a protocolfn
(__baz__ ^{:real-method baz} [self])) ; create a baz protocolfn

Konrad Hinsen

Jun 1, 2012, 7:19:56 AM
to clojure...@googlegroups.com
Antony Lee writes:

> I'm not done yet... I've just noticed that in the Python world, it is
> probably *not* true, actually, that protocols are just multimethods that
> dispatch on type, because of multiple inheritance. At least, in my
> understanding, I expect protocols to follow the MRO, whereas

Why would you expect protocols to follow the MRO, and how? Protocols
don't provide inheritance. In fact, nothing in Clojure provides
inheritance, except low-level Java interop stuff. I'd say that MRO
is not an issue.

The example you give requires a Python subclass of a type for which
a protocol extension has been defined at the Clojure level. This is
"advanced Python interop" to me. It should do something predictable,
but that could even be 'raise an exception' as far as I am concerned.

Konrad.

Antony Lee

Jun 1, 2012, 12:16:20 PM
to clojure...@googlegroups.com
Clojure-JVM's protocols do provide inheritance: try extending some
protocol to Object, and you'll see that Long also extends that protocol.
If you have a look at the source, this even requires some code that
explicitly walks through the list of superclasses to find an
extender (the super-chain fn in core_deftype.clj, etc.). You can more
or less copy that approach to Python, except, of course, that __mro__ is
already here for you to walk through.

Also, there is a practical reason why I would like protocols to follow
inheritance, and that is if we want to move to a protocol-based
implementation rather than an "interface" (whatever this means for
Python)-based one: as I said in the previous email, it would be tricky
if we had to mark *each* subclass of ASeq as extending ISeq, instead of
just marking ASeq itself as extending ISeq. I can understand that this
is a relatively minor argument compared to whether this is good language
design but it's still there.

Antony

David Nolen

Jun 1, 2012, 12:37:26 PM
to clojure...@googlegroups.com
On Fri, Jun 1, 2012 at 11:16 AM, Antony Lee <anntz...@gmail.com> wrote:
> Clojure-JVM's protocols do provide inheritance: try extending some
> protocol to Object

This is not about inheritance, this is about providing default behavior. I recommend examining closely how these bits were implemented by Rich Hickey in ClojureScript.

David 

Konrad Hinsen

Jun 1, 2012, 12:53:36 PM
to clojure...@googlegroups.com
Antony Lee writes:

> Clojure-JVM's protocols do provide inheritance: try extending some
> protocol to Object, and you'll see that Long also extends that protocol.

Right. I looked at the docs again, and found something even weirder:
you can extend nil and Object, where extending to Object applies to
anything but nil. That makes sense on the JVM, where nil is the
null-pointer, but not with Python, where nil is None, whose type is a
subclass of object.

If only Clojure had a language spec independent of the idiosyncrasies
of the JVM...

Konrad.

David Nolen

Jun 1, 2012, 1:27:34 PM
to clojure...@googlegroups.com
On Fri, Jun 1, 2012 at 11:53 AM, Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
> Antony Lee writes:
>
> > Clojure-JVM's protocols do provide inheritance: try extending some
> > protocol to Object, and you'll see that Long also extends that protocol.
>
> Right. I looked at the docs again, and found something even weirder:
> you can extend nil and Object, where extending to Object applies to
> anything but nil. That makes sense on the JVM, where nil is the
> null-pointer, but not with Python, where nil is None, whose type is a
> subclass of object.

Object is obscuring the purpose - it's just for providing a default. In CLJS you have the same cases - better named:

extend-type default
extend-type nil
extend-type specific-type 
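
Read literally, these three cases suggest a dispatch that does not consult inheritance at all. A Python sketch of that reading follows. It is an interpretation for illustration only: the _DEFAULT marker, the table and the extend function are hypothetical, and this is not how ClojureScript is implemented.

```python
NONE_TYPE = type(None)
_DEFAULT = object()    # hypothetical marker playing the role of "default"

table = {}


def extend(key, impl):
    table[key] = impl


def foo(x):
    # 1. exact type (this is where an extension to nil / NoneType lives)
    impl = table.get(type(x))
    # 2. otherwise the catch-all default
    if impl is None:
        impl = table.get(_DEFAULT)
    if impl is None:
        raise TypeError("foo is not implemented for %s" % type(x).__name__)
    return impl(x)


extend(_DEFAULT, lambda x: "default")
extend(NONE_TYPE, lambda x: "nil")
extend(int, lambda x: "int")

print(foo(None))   # nil
print(foo(3))      # int
print(foo("s"))    # default
```

With this scheme None never falls through to an extension made on object, because object is not treated as a superclass here, only as a named default.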

Antony Lee

Jun 1, 2012, 6:42:12 PM
to clojure...@googlegroups.com
If the goal was only to provide a default implementation through Object,
then there are quite a few holes...

user=> (defprotocol P (foo [self]))
P
user=> (extend java.util.HashMap P {:foo (fn [self] (.size self))})
nil
user=> (foo (java.util.LinkedHashMap.))
0

and again I don't know why one would want to prevent a
java.util.LinkedHashMap from using the extension provided by
java.util.HashMap to P.

===

As for CLJS (sorry, I discovered JavaScript approximately 1 hour ago):

(def C (js* "function C(arg) {this.arg=arg;};"))
(def D (js* "function D(arg) {this.arg=arg;};"))
(js* "~{D}.prototype = new ~{C}()")
(def c (C.))
(def d (D.))
(defprotocol P (foo [self]))
(extend-type C P (foo [self] 42))
(foo c) ; => 42
(foo d) ; => 42

===

As for nil, I don't see a problem if extending to py/object also extends
to py/NoneType. After all:
CLJJ: (isa? (class nil) java.lang.Object) ; => false
CLJP: (isa? (class nil) object) ; => true

===

Konrad is right, having a spec that's independent of the quirks of the
JVM would be nice...

Antony

David Nolen

Jun 1, 2012, 7:06:52 PM
to clojure...@googlegroups.com
2012/6/1 Antony Lee <anton...@berkeley.edu>
> user=> (defprotocol P (foo [self]))
> P
> user=> (extend java.util.HashMap P {:foo (fn [self] (.size self))})
> nil
> user=> (foo (java.util.LinkedHashMap.))
> 0

The behavior is a host-y convenience. Nothing critical to the idea behind protocols.
 
> As for CLJS (sorry, I discovered JavaScript approximately 1 hour ago):
>
> (def C (js* "function C(arg) {this.arg=arg;};"))
> (def D (js* "function D(arg) {this.arg=arg;};"))
> (js* "~{D}.prototype = new ~{C}()")
> (def c (C.))
> (def d (D.))
> (defprotocol P (foo [self]))
> (extend-type C P (foo [self] 42))
> (foo c) ; => 42
> (foo d) ; => 42

Another host-y inheritance side-effect. Nothing specifically stated to work. Also there are no interfaces in JS so not really totally analogous to what's possible on the JVM.

Protocols are there to implement Clojure - built upon whatever constructs the host provides best suited for the purpose. I think implementations may choose to leverage some host specific conveniences when extending host types.

But it's important to note that you can't extend-type on a protocol on the JVM w/o going under the hood. Protocols are not about inheritance.

David

Timothy Baldridge

Jun 1, 2012, 9:56:53 PM
to clojure...@googlegroups.com
> But it's important to note that you can't extend-type on a protocol on the
> JVM w/o going under the hood. protocols are not about inheritance.

Right, I don't think we were ever questioning that. What we were
questioning is if the following should be supported:

(defprotocol IFoo (foo [x]))

(extend-type clojure.lang.AFn IFoo (foo [x] "AFn"))

(foo clojure.core/cons) ; => "AFn"


So it's a question of how we respect inheritance on the dispatched
type, not of inheritance of the actual protocol.

Timothy