And pretty soon, I hit an obvious design problem (which is okay,
really - this is still an exercise in exploring the right approach).
Namely, lots of languages actually have two namespaces when it comes
to accessing data belonging to an object. I'll refer to one of them as
"attributes" (as that's what they're called in both Ruby and Python)
and to the other as "items", which are elements of some container
object. All objects have attributes, but only some objects
(containers) have items.
Some languages don't make the distinction, most notably JavaScript. In
JavaScript, all objects are containers and they only have items (and
an item can be a function, in which case it functions as a method on
the object). Can't get much more generic than that, right? Other
languages (Ruby, Python) will distinguish between the two; the
containers are arrays/lists and hashes/dictionaries/maps. As a matter
of fact, it helps to think of Java as having the distinction -
JavaBeans properties are the attributes, and arrays, Maps, and Lists
will have items.
I'd like to think that most people's mental model of objects actually
distinguishes the two. This, of course, like everything else I say
here, is open to debate (and I'd actually welcome a good debate to
iron things out).
Now, to make matters a bit more complicated, in lots of languages the
container API is actually just syntactic sugar. Give an object a []
and a []= method, and it's a container in Ruby! Give it __getitem__,
__setitem__, and a few others, and it's a container in Python! Honestly,
this is okay - as a byproduct of duck typing, one shouldn't expect
there to be any sort of explicit declaration of "containerness", right?
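A minimal sketch of the Python half of that claim (Python 3; the class
name Squares and its behavior are made up purely for illustration). The
object is neither a list nor a dict, yet the [] syntax works on it
because the two methods are present:

```python
class Squares:
    """Hypothetical class: not a list or dict, but supports obj[i] syntax."""

    def __init__(self):
        # An ordinary attribute; it lives in a separate namespace from items.
        self._overrides = {}

    def __getitem__(self, index):
        # obj[index] is dispatched to this method.
        return self._overrides.get(index, index * index)

    def __setitem__(self, index, value):
        # obj[index] = value is dispatched to this method.
        self._overrides[index] = value


squares = Squares()
print(squares[4])    # item read goes through __getitem__
squares[4] = -1      # item write goes through __setitem__
print(squares[4])
print(hasattr(squares, "_overrides"))  # attribute lookup, not item lookup
```

Note that nothing in the class declares "I am a container"; the syntax
support follows from the method names alone.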
Bottom line is, I feel this is a big deal to solve in an interoperable
manner, as the raison d'être of the MOP would be to allow
interoperability between programs written in different languages
within a single JVM; I imagine in most cases the programs will pass
complex data structures built out of lists and dictionaries to one
another, so it feels... essential to get this right. It also feels
like something that can rightfully belong in a generic MOP as most
languages do have the concept of ordered sequences and associative
arrays. Of course, I might also be wrong here; it is also an essential
goal to not end up with a baroque specification that contains
everything plus the kitchen sink. E.g., you might notice that my
current MOP effort for now doesn't have a concept of the class, as not
all languages have a class concept; funnily enough the concept of
sequences and maps actually looks more important for interop (since
it's more general) to me right now than the concept of a class.
So, here I am, wondering whether this is something that can be made
sufficiently unified across the languages to the point that if a Ruby
program is given a Python dictionary, and it calls []= on it, it
actually ends up being translated into a __setitem__ call. The goal
seems worthwhile, and is certainly possible, but I'm not entirely sure
how much effort it will take. There's only one way to find out
(doing it), but I'd really appreciate some debate and feedback
here before I embark on this.
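A rough sketch of the kind of translation meant here, written in Python 3
only. Everything in it is an assumption for illustration: RubyView,
send, and the RUBY_TO_PYTHON table are invented names, not part of any
existing MOP library, JRuby, or Jython. The idea is just that a
Ruby-side call named []= is renamed before being dispatched on the
Python object:

```python
# Hypothetical mapping from Ruby-style method names to Python protocol names.
RUBY_TO_PYTHON = {"[]": "__getitem__", "[]=": "__setitem__"}


class RubyView:
    """Made-up stand-in for what a MOP might hand to a Ruby caller."""

    def __init__(self, target):
        self._target = target

    def send(self, name, *args):
        # Translate the Ruby-side name if we know it, then dispatch on the
        # underlying Python object. Unknown names are passed through as-is.
        python_name = RUBY_TO_PYTHON.get(name, name)
        return getattr(self._target, python_name)(*args)


settings = {}
view = RubyView(settings)
view.send("[]=", "color", "red")   # becomes settings.__setitem__("color", "red")
print(view.send("[]", "color"))    # becomes settings.__getitem__("color")
print(settings)
```

A real implementation would of course have to deal with argument
conversion and error mapping as well; this only shows the renaming step.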
Thanks,
Attila.
Attila Szegedi wrote:
> So, here I am trying to further my metaobject protocol library. In
> order to get further with it, I tried the eat-my-own-dog-food approach
> and decided - after having written a MOP for POJOs - to try and
> write actual MOP implementations for some dynamic language
> implementations, most notably, Jython and JRuby.
Excellent. JRuby 1.1 will be out soon, and then this sort of work (and
the improved integration with Java and other languages it brings) will
become a high priority for us as well.
> And pretty soon, I hit an obvious design problem (which is okay,
> really - this is still an exercise in exploring the right approach).
> Namely, lots of languages actually have two namespaces when it comes
> to accessing data belonging to an object. I'll refer to one of them as
> "attributes" (as that's what they're called in both Ruby and Python)
> and the other are "items", which are elements of some container
> object. All objects have attributes, but only some objects
> (containers) have items.
Some clarification for Ruby. Ruby doesn't distinguish between the public
representation of attributes and methods. Attributes *are* just accessor
methods that return values. There's no way to iterate only attributes or
only methods, because they are the same structure and stored in the same
way. But Ruby does have a concept of instance variables, represented as
always-protected entries in a per-object dictionary. The set of instance
variables is not determined ahead of time, and can grow as the program
runs. The only way to access an object's instance variables is from
within that object...or by wiring up attribute accessors (which just
creates methods).
I think the MOP would be entirely satisfactory if it represented only
methods for Ruby, but I understand this may not be the case for other
languages.
> Some languages don't make a distinction, most notably, JavaScript. In
> JavaScript, all objects are containers and they only have items (and
> an item can be a function, in which case it functions as a method on
> the object). Can't get much more generic than that, right? Other
> languages (Ruby, Python) will distinguish between the two; the
> containers are arrays/lists and hashes/dictionaries/maps. As a matter
> of fact, it helps thinking of Java as having the distinction -
> JavaBeans properties are the attributes, and arrays, Maps, and Lists
> will have items.
It's also important to note here that in general Ruby is more about
"maps" than "lists". There is a core class that's a list, but methods
and instance variables and global variables and constants are all held
in hash-like structures. But true, they're not all the same exact
structure, and most of them you can't really access as primitive data
structures.
> Now, to make matters a bit more complicated, in lots of languages the
> container API is actually just a syntactic sugar. Give an object a []
> and a []= method, and it's a container in Ruby! Give it __getitem__,
> __setitem__, and few others, and it's a container in Python! Honestly,
> this is okay - as a byproduct of duck typing, one shouldn't expect
> there be any sort of an explicit declaration of "containerness", right?
In Ruby, the container API is defined by more than [], really. [] and
[]= are just method calls...syntactic sugar for foo.[](key) and
foo.[]=(key,value). They may make an object look like a collection, but
they don't mean it *is* a collection. And they're frequently used for
other syntactic magic. However...your thought about defining some of
these collection operations as an additional protocol across languages
is a great one. CLR already has low-level operations to abstract
collection gets/sets, which get translated into collection operations
across languages. So if a language is defined to handle them, its
collections are transportable to other languages without too much fuss.
> Bottom line is, I feel this is a big deal to solve in interoperable
> manner, as the raison d'être of the MOP would be to allow
> interoperability between programs written in different languages
> within a single JVM; I imagine in most cases the programs will pass
> complex data structures built out of lists and dictionaries to one
> another, so it feels... essential to get this right. It also feels
> like something that can rightfully belong in a generic MOP as most
> languages do have the concept of ordered sequences and associative
> arrays. Of course, I might also be wrong here; it is also an essential
> goal to not end up with a baroque specification that contains
> everything plus the kitchen sink. I.e. you might notice that my
> current MOP effort for now doesn't have a concept of the class, as not
> all languages have a class concept; funnily enough the concept of
> sequences and maps actually looks more important for interop (since
> it's more general) to me right now than the concept of a class.
In Ruby, I think the problem is not as bad as you think. Almost everyone
working with collections in Ruby uses Array and Hash or descendants of
them, since they have everything you'd need from such data structures
plus nice literal syntaxes. Supporting the concept of lists and
dictionaries in the MOP as being Array and Hash in Ruby would obviously
be a minimum requirement...but it might also be a suitable maximum as
well. Far more interesting and powerful, I think, is how the MOP can be
appropriately wired in to the formal coercion protocols of a given
language, such as the to_* methods in Ruby. These coercion methods take
two forms...those typically used by programmers explicitly to coerce
values (to_i, to_s, to_a, ...) and those mostly used internally for
implicit coercion (to_int, to_str, to_ary). The protocols aren't 100%
defined (because nothing is in Ruby), but they are fairly well
understood and key to type interop in Ruby...and therefore key to
language interop in the MOP.
> So, here am I wondering whether this is something that can be made
> sufficiently unified across the languages to the point that if a Ruby
> program is given a Python dictionary, and it calls []= on it, it
> actually ends up being translated into a __setitem__ call. The goal
> seems worthwhile, and is certainly possible but I'm not entirely sure
> how much of an effort will it take. There's only one way to find out
> though (doing it), but I'd really appreciate some debate and feedback
> here before I embark on this.
I think perhaps we want to do a quick survey of how most of the key
languages represent collections and try to find the commonality. Where
possible, we should always represent collections in a way that all
languages can see them as such and use them as such, but we also need to
define a bit more clearly where the demarcation is between the low-level
"collection" concept and higher-level duck-typed operations, so we don't
pull in too much for too little gain.
- Charlie
The C++ STL has a notion of "concepts" that is interesting, although
it is naturally syntax-specific since it applies only to one language.
http://www.sgi.com/tech/stl/stl_introduction.html
After describing the requirements for the first argument to the
standard function "find", it says:
"Find isn't the only STL algorithm that has such a set of
requirements; the arguments to for_each and count, and other
algorithms, must satisfy the same requirements. These requirements are
sufficiently important that we give them a name: we call such a set of
type requirements a concept, and we call this particular concept Input
Iterator. We say that a type conforms to a concept, or that it is a
model of a concept, if it satisfies all of those requirements. We say
that int* is a model of Input Iterator because int* provides all of
the operations that are specified by the Input Iterator requirements."
Clearly for a cross-language object model, you would have to be more
specific about which operations support which concepts. If you were
compiling the class MyArray in language Foo, you would not say (using
made-up types and concepts) "MyArray satisfies IndexableList" but
rather "(MyArray, __item__) satisfies IndexableList." Then when an
instance of MyArray was indexed using [] in Python or Ruby, the call
would be linked or dispatched to __item__.
If this information can be maintained and used at runtime in a
reasonably efficient way, or even used at compile-time to augment a
type with standardly-named operations, that would be extremely
exciting for me as a language user.
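As a sketch of how such a pairing might be kept and used at runtime
(Python 3; CONCEPTS, declare, and invoke are invented names, and
"IndexableList" is the made-up concept from above, so none of this is an
existing API):

```python
# Hypothetical registry: (type, concept name) -> name of the implementing method.
CONCEPTS = {}


def declare(cls, concept, method_name):
    """Record that (cls, method_name) satisfies the named concept."""
    CONCEPTS[(cls, concept)] = method_name


def invoke(obj, concept, *args):
    """Dispatch a concept-level operation to whatever method was declared."""
    # Walk the class hierarchy so subclasses inherit declarations.
    for cls in type(obj).__mro__:
        method_name = CONCEPTS.get((cls, concept))
        if method_name is not None:
            return getattr(obj, method_name)(*args)
    raise TypeError(f"{type(obj).__name__} does not satisfy {concept}")


class MyArray:
    """Stand-in for a class compiled from language Foo."""

    def __init__(self, values):
        self._values = list(values)

    def __item__(self, index):
        return self._values[index]


declare(MyArray, "IndexableList", "__item__")
declare(list, "IndexableList", "__getitem__")

print(invoke(MyArray("abc"), "IndexableList", 1))
print(invoke([10, 20, 30], "IndexableList", 1))
```

The caller only names the concept; which method actually runs depends on
the declaration made for the object's type.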
However (as a language *user* only, who lurks on this list out of
curiosity) I do not like the idea of "objects having items" being
promoted to an object-model-level concept. It would be just as much a
(confusing) mistake as promoting "objects having creation dates" or
"objects having IDs" to the linguistic level merely because you happen
to be working in a database-oriented business language where creation
dates and IDs are supported by special syntax. Language syntax that
maps to operations such as iteration doesn't change the fact that they
are *operations* that happen to be standardized within a language, and
potentially across languages, for the sake of supporting syntactic
sugar, and not fundamentally different (below the syntactic level)
from other operations.
I don't think you can distinguish a priori, for the purposes of
creating an object model, between operations that might need syntactic
sugar and operations that will not. If half a dozen popular languages
erupt tomorrow with built-in syntax for date operations ("for obj on
Saturday mornings do"), does that mean that "date" is now no longer an
attribute and must be raised to a different category on the
object-model level? Would you say, "Objects have 'attributes', but
they also have 'items' and 'dates'"?
On Tue, Mar 25, 2008 at 2:26 PM, Charles Oliver Nutter
<charles...@sun.com> wrote:
> I think perhaps we want to do a quick survey of how most of the key
> languages represent collections and try to find the commonality. Where
> possible, we should always represent collections in a way that all
> languages can see them as such and use them as such, but we also need to
> define a bit more clearly where the demarcation is between the low-level
> "collection" concept and higher-level duck-typed operations, so we don't
> pull in too much for too little gain.
I think the demarcation is that the collection concept goes beyond
duck typing in order to bridge between languages where "looks like a
duck" has different meanings. "It may not look like a duck to you,
owing to its lack of a quack() method, but I swear that if you call
kvack() it will actually quack like a duck." E.g., my
IncrediblyVerboseArray class doesn't have an __item__ method, but if
you call its GetItemByNonnegativeIntegerIndex method it does the same
thing.
Ideally this would be implemented in such a way that there is no final
list of concepts, so there are no agonizing decisions about whether to
support list slicing or other concepts with marginally popular syntax
support. It would be better to allow a language to declare its own
extension concepts than to declare the end of syntactic history ;-)
(On the other hand, better to have it available, working, and usably
fast....)
- David
> > And pretty soon, I hit an obvious design problem (which is okay,
> > really - this is still an exercise in exploring the right approach).
> > Namely, lots of languages actually have two namespaces when it comes
> > to accessing data belonging to an object. I'll refer to one of them as
> > "attributes" (as that's what they're called in both Ruby and Python)
> > and the other are "items", which are elements of some container
> > object. All objects have attributes, but only some objects
> > (containers) have items.
>
> Some clarification for Ruby. Ruby doesn't distinguish between the public
> representation of attributes and methods. Attributes *are* just accessor
> methods that return values. There's no way to iterate only attributes or
> only methods, because they are the same structure and stored in the same
> way. But Ruby does have a concept of instance variables, represented as
> always-protected entries in a per-object dictionary. The set of instance
> variables is not determined ahead of time, and can grow as the program
> runs. The only way to access an object's instance variables is from
> within that object...or by wiring up attribute accessors (which just
> creates methods.
Python can be seen as taking the opposite approach: Classes, instances
and modules are at the fundamental level just dictionaries of
attributes where some of the attributes happen to be methods (though
it is a bit more complicated than this -- but in the end this is a
good approximation).
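A small illustration of that approximation (Python 3; the Point class is
made up for the example, and this ignores complications such as
__slots__ and descriptors):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)
print(vars(p))                 # {'x': 3, 'y': -4}
print("norm1" in vars(Point))  # the method is an entry in the class's dictionary

# Writing to the instance dictionary directly creates an attribute.
p.__dict__["z"] = 5
print(p.z)

# A method is looked up by name like any other attribute, then called.
print(getattr(p, "norm1")())
```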
> I think the MOP would be entirely satisfactory if it represented only
> methods for Ruby, but I understand this may not be the case for other
> languages.
Despite my last comment, I *think* this may work for Python as well,
since in the end there are always methods to call on the dictionaries
that ultimately hold the attributes.
> > Some languages don't make a distinction, most notably, JavaScript. In
> > JavaScript, all objects are containers and they only have items (and
> > an item can be a function, in which case it functions as a method on
> > the object). Can't get much more generic than that, right? Other
> > languages (Ruby, Python) will distinguish between the two; the
> > containers are arrays/lists and hashes/dictionaries/maps. As a matter
> > of fact, it helps thinking of Java as having the distinction -
> > JavaBeans properties are the attributes, and arrays, Maps, and Lists
> > will have items.
>
> It's also important to note here that in general Ruby is more about
> "maps" than "lists". There is a core class that's a list, but methods
> and instance variables and global variables and constants are all held
> in hash-like structures. But true, they're not all the same exact
> structure, and most of them you can't really access as primitive data
> structures.
Python is also primarily about "maps" -- what I called dictionaries
above. From the outside pretty much anything in Python can be seen as
living in a dictionary, even sequences for example:
>>> x = []
>>> dir(x)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__',
'__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__',
'__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__setslice__', '__str__', 'append', 'count', 'extend', 'index',
'insert', 'pop', 'remove', 'reverse', 'sort']
In the above, x is assigned an empty list, and dir() shows the keys in
x's internal dictionary.
> > Now, to make matters a bit more complicated, in lots of languages the
> > container API is actually just a syntactic sugar. Give an object a []
> > and a []= method, and it's a container in Ruby! Give it __getitem__,
> > __setitem__, and few others, and it's a container in Python! Honestly,
> > this is okay - as a byproduct of duck typing, one shouldn't expect
> > there be any sort of an explicit declaration of "containerness", right?
Python 3000 is growing "containerness" in the form of abstract base
classes (see http://www.python.org/dev/peps/pep-3119/), but of course
duck typing is not going away, so the above is true (though now it
will be easier to find the true definition of the methods that make up
the collection "duck types" -- in Python we often call them
"protocols").
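A short sketch of how those abstract base classes sit next to duck typing
in a current Python 3, where they live in collections.abc (the classes
Bag and Registry are made up for the example):

```python
from collections.abc import Mapping, Sized


class Bag:
    """Defines __len__ and nothing else."""

    def __len__(self):
        return 0


class Registry:
    """Duck-typed read-only mapping; does not inherit from Mapping."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Sized is recognized structurally: having __len__ is enough.
print(isinstance(Bag(), Sized))

# Mapping is not inferred from the methods alone...
print(isinstance(Registry(), Mapping))

# ...but a class can be declared to be one without inheriting from it.
Mapping.register(Registry)
print(isinstance(Registry(), Mapping))
```

So there is an explicit declaration of "containerness" for those who
want one, while objects that merely have the right methods keep working
with the [] syntax as before.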
> In Ruby, the container API is defined by more than [], really. [] and
> []= are just method calls...syntactic sugar for foo.[](key) and
> foo.[]=(key,value). They may make an object look like a collection, but
> they don't mean it *is* a collection. And they're frequently used for
> other syntactic magic. However...your thoughts about defining some of
> these collection operations as an additional protocol across languages
> is a great one. CLR already has low-level operations to abstract
> collection gets/sets, which get translated into collection operations
> across languages. So if a language is defined to handle them, its
> collections are transportable to other languages without too much fuss.
This would be of great value for Jython as well, I think.
> > Bottom line is, I feel this is a big deal to solve in interoperable
> > manner, as the raison d'être of the MOP would be to allow
> > interoperability between programs written in different languages
> > within a single JVM; I imagine in most cases the programs will pass
> > complex data structures built out of lists and dictionaries to one
> > another, so it feels... essential to get this right. It also feels
> > like something that can rightfully belong in a generic MOP as most
> > languages do have the concept of ordered sequences and associative
> > arrays. Of course, I might also be wrong here; it is also an essential
> > goal to not end up with a baroque specification that contains
> > everything plus the kitchen sink. I.e. you might notice that my
> > current MOP effort for now doesn't have a concept of the class, as not
> > all languages have a class concept; funnily enough the concept of
> > sequences and maps actually looks more important for interop (since
> > it's more general) to me right now than the concept of a class.
>
> In Ruby, I think the problem is not as bad as you think. Almost everyone
> working with collections in Ruby uses Array and Hash or descendants of
> them, since they have everything you'd need from such data structures
> plus nice literal syntaxes. Supporting the concept of lists and
> dictionaries in the MOP as being Array and Hash in Ruby would obviously
> be a minimum requirement...but it might also be a suitable maximum as
> well.
Much of Python's semantics can be thought of as being sequences and
maps as well (in Python they are generally list and dict), and as I've
said, most of Python can be represented this way, I think.
> Far more interesting and powerful, I think, is how the MOP can be
> appropriately wired in to the formal coercion protocols of a given
> language, such as the to_* methods in Ruby. These coercion methods take
> two forms...those typically used by programmers explicitly to coerce
> values (to_i, to_s, to_a, ...) and those mostly used internally for
> implicit coercion (to_int, to_str, to_ary). The protocols aren't 100%
> defined (because nothing is in Ruby), but they are fairly well
> understood and key to type interop in Ruby...and therefore key to
> language interop in the MOP.
Python is at least a little different here: in general, implicit
coercion doesn't really happen from Python objects to Python objects.
If you have one thing and you want the other, you generally coerce it
yourself (so if you have a number x, to get a string you must say
str(x)). Of course, in Jython's case some things are mapped from Java
into the appropriate Python types, so in the MOP case we would need
some coercion from some set of generic types (String, Integer, etc.).
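For example, a minimal sketch in Python 3 (label and implicit_worked are
just illustrative names):

```python
def label(value):
    # The conversion to a string has to be requested explicitly.
    return "value: " + str(value)


try:
    "value: " + 42          # no implicit int -> str coercion
    implicit_worked = True
except TypeError:
    implicit_worked = False

print(implicit_worked)
print(label(42))
```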
-Frank