Tricky translation issues

Glenn Vanderburg

Nov 14, 2006, 1:04:17 PM

to Smalltalk.rb

There are a couple of differences between the Smalltalk and Ruby
semantics that seem to me to be quite tricky to deal with. I'm
wondering if anyone on the list has good ideas about how to handle
them.

The first is instance variables. In Smalltalk, the definition of
instance variables is associated with the class, and all instances have
the same instance variables. Furthermore, methods (in compiled form)
refer to those instance variables by numeric index, not by name. New
instance variables can be added at any time, but the process involves
chasing down all of the instances of the class and adding the slot for
the new variable (and possibly also recompiling methods).

In Ruby, on the other hand, each instance can have entirely different
instance variables. This means that compiled methods can't refer to
instance variables by numeric index, because @a might be variable 1 in
one instance, but variable 3 in another instance. Is there a way to
model this efficiently in an existing Smalltalk VM?
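
To make the Ruby side concrete, here is a small sketch (the Point class and its method names are made up for illustration) in which two instances of the same class end up with different sets of instance variables:

```ruby
# Hypothetical class: each method creates a different instance variable.
class Point
  def set_a
    @a = 1
  end

  def set_b
    @b = 2
  end
end

p1 = Point.new
p1.set_a
p1.set_b

p2 = Point.new
p2.set_b   # @a is never assigned on this instance

p1_vars = p1.instance_variables   # both @a and @b
p2_vars = p2.instance_variables   # only @b
```

So a compiled method that touches @b cannot assume it lives at the same position in every instance unless the implementation reserves a slot for @a as well.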

The second issue is the "arity" of methods. All Smalltalk methods have
a strict number of parameters that can be deduced from the method
selector. That means that it's invalid to call a method with more or
fewer parameters than required, and the method implementation can just
know how many parameters to pull off the stack. Ruby methods, on the
other hand, can have optional parameters of two different kinds
(param=val and *other_params) as well as the implicit block.
Furthermore, the arity does not depend on the method name; it depends
on the implementation. So, for example, Hash#[] takes one parameter,
while Array#[] takes one mandatory parameter and an optional second
parameter. This suggests to me that each Ruby method call (when
compiled to Smalltalk bytecodes) requires an extra parameter that
describes the number of parameters being passed, and each compiled
method body will have to begin with code to map those parameters to
local variables appropriate to that particular implementation.
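
As an illustration of how arity follows the implementation rather than the name, Ruby itself reports it through Method#arity. The class and method names below are made up; a negative value means the method accepts optional or variable arguments:

```ruby
# Hypothetical methods covering the parameter kinds mentioned above.
class Example
  def one(x); end          # exactly one required parameter
  def opt(x, y = 3); end   # one required, one optional
  def rest(*args); end     # any number of arguments
end

e = Example.new
arities = {
  one:   e.method(:one).arity,    # 1
  opt:   e.method(:opt).arity,    # -2, i.e. -(required + 1)
  rest:  e.method(:rest).arity,   # -1
  hash:  {}.method(:[]).arity,    # 1
  array: [].method(:[]).arity     # -1, because the second argument is optional
}
```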

I hope my explanations are clear. It seems to me that in these two
areas, Ruby code running on a stock Smalltalk VM will incur some
performance penalty. I'd love to be proven wrong.

(I don't think these are killer issues. Strongtalk will still be a
very fast platform for Ruby code. I'm just thinking through the
implementation strategy.)

---Glenn

Avi Bryant

Nov 14, 2006, 1:33:05 PM

to smallta...@googlegroups.com

On 11/14/06, Glenn Vanderburg <glenn.va...@gmail.com> wrote:

> The first is instance variables. In Smalltalk, the definition of
> instance variables is associated with the class, and all instances have
> the same instance variables. Furthermore, methods (in compiled form)
> refer to those instance variables by numeric index, not by name. New
> instance variables can be added at any time, but the process involves
> chasing down all of the instances of the class and adding the slot for
> the new variable (and possibly also recompiling methods).
>
> In Ruby, on the other hand, each instance can have entirely different
> instance variables. This means that compiled methods can't refer to
> instance variables by numeric index, because @a might be variable 1 in
> one instance, but variable 3 in another instance. Is there a way to
> model this efficiently in an existing Smalltalk VM?

My inclination would be to have every instance actually get an indexed
slot for every possible instance variable. So, at the time that you
compile a method like

class A
  def foo=(x)
    @foo = x
  end
end

The "foo" instance variable would be added to class A. The set of all
possible instance variables for a given class should be small enough
to make the runtime memory overhead, and the compile time overhead of
incrementally adding instance variables, pretty manageable in almost
all cases.

Eventually we probably would need a fallback for the cases where
someone uses eval or similar to generate hundreds of instance
variables, but that can probably be ignored at first.

This would imply that we need to extend the compilation protocol to
inform the class about instance variables referenced in the method:

class compileSmalltalkMethod: 'foo: x foo := x' instVars: #(foo).
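
A rough model of this scheme, written in Ruby purely for illustration (SlotLayout and SlotObject are hypothetical names, not part of any VM or of the proposal itself): the class-level layout assigns each instance variable name a fixed index the first time a compiled method mentions it, and every instance is addressed by those shared indices.

```ruby
# Hypothetical per-class layout: maps ivar names to slot indices.
class SlotLayout
  def initialize
    @index_of = {}
  end

  # Called at "compile time" for every ivar a method references.
  # Returns the existing index, or allocates the next free one.
  def declare(name)
    @index_of[name] ||= @index_of.size
  end

  def size
    @index_of.size
  end
end

# Hypothetical instance: storage is addressed by index only.
class SlotObject
  def initialize
    @slots = []
  end

  def read(index)
    @slots[index]           # nil for a slot that was never written
  end

  def write(index, value)
    @slots[index] = value   # the array grows on demand, padding with nil
  end
end

layout = SlotLayout.new
foo = layout.declare(:foo)        # first ivar seen gets slot 0
bar = layout.declare(:bar)        # next one gets slot 1
foo_again = layout.declare(:foo)  # already known, same slot as before

obj = SlotObject.new
obj.write(bar, 42)
```

In this model an instance that never assigns @foo still has slot 0 reserved for it, which is the memory overhead discussed above.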

> So, for example, Hash#[] takes one parameter,
> while Array#[] takes one mandatory parameter and an optional second
> parameter. This suggests to me that each Ruby method call (when
> compiled to Smalltalk bytecodes) requires an extra parameter that
> describes the number of parameters being passed, and each compiled
> method body will have to begin with code to map those parameters to
> local variables appropriate to that particular implementation.

I mentioned this briefly in my Specification post - I suggest we just
have some bridge methods that translate a given call into the
canonical form. So say we had something like this:

class A
  def foo(x, y=3)
    x+y
  end

  def bar
    foo(1)
  end
end

I'd expect the following methods:

foo: x with: y
    ^ x + y

foo: x
    ^ self foo: x with: 3

bar
    ^ self foo: 1

We can do similar things for blocks and array args:

class A
  def foo(*args)
    yield args
  end

  def bar
    foo(1,2){|a| ...}
  end
end

fooWithArgs: argArray block: aBlock
    ^ aBlock value: argArray

foo: arg1 block: aBlock
    ^ self fooWithArgs: (Array with: arg1) block: aBlock

foo: arg1 with: arg2 block: aBlock
    ^ self fooWithArgs: (Array with: arg1 with: arg2) block: aBlock

bar
    ^ self foo: 1 with: 2 block: [:a | ...]
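
The naming convention in these examples could be computed at each call site roughly as follows. This is only a sketch: bridge_selector is a hypothetical helper, and the selector used for a call with a block but no arguments (fooBlock:) is an assumption, since the examples above do not cover that case.

```ruby
# Hypothetical helper: derive the Smalltalk selector for a Ruby call site
# from the method name, the number of positional arguments, and whether
# a block is passed.
def bridge_selector(name, argc, block: false)
  if argc.zero?
    # Assumption: "fooBlock:" for a block-only call; plain name otherwise.
    return block ? "#{name}Block:" : name.to_s
  end

  parts = ["#{name}:"] + ["with:"] * (argc - 1)   # foo: with: with: ...
  parts << "block:" if block
  parts.join
end

one_arg    = bridge_selector(:foo, 1)               # "foo:"
two_block  = bridge_selector(:foo, 2, block: true)  # "foo:with:block:"
no_args    = bridge_selector(:bar, 0)               # "bar"
```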

Now, we don't have to generate all the possibilities eagerly - we
should be able to trap doesNotUnderstand: and, if the missing method
is just a differently parameterized version of a method we have in
canonical form, build the appropriate stub at that point and continue.
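
The same lazy-stub idea can be sketched in Ruby itself, with method_missing standing in for doesNotUnderstand:. The names here (Calc, foo_x_y, foo_x, the BRIDGES table) are hypothetical and only illustrate the control flow, not how a real translator would record its canonical forms:

```ruby
class Calc
  # Canonical form: every parameter is explicit.
  def foo_x_y(x, y)
    x + y
  end

  # Hypothetical table: stub name => [canonical method, defaults to append].
  BRIDGES = { foo_x: [:foo_x_y, [3]] }.freeze

  def method_missing(name, *args)
    canonical, defaults = BRIDGES[name]
    return super unless canonical   # not a known variant: normal failure path

    # Build the stub once; later calls dispatch to it directly
    # and never reach this hook again.
    self.class.send(:define_method, name) do |*given|
      send(canonical, *given, *defaults)
    end
    send(name, *args)
  end

  def respond_to_missing?(name, include_private = false)
    BRIDGES.key?(name) || super
  end
end

calc = Calc.new
stub_defined_before = Calc.method_defined?(:foo_x)   # false: no stub yet
first_result = calc.foo_x(1)                          # stub built, 1 + 3
stub_defined_after = Calc.method_defined?(:foo_x)    # true: stub now exists
```

Only the first call with a given shape pays for the lookup; after that the stub is an ordinary method.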

Glenn Vanderburg

Nov 14, 2006, 1:43:54 PM

to Smalltalk.rb

Avi Bryant wrote:
> My inclination would be to have every instance actually get an indexed
> slot for every possible instance variable.

[...]

> The set of all
> possible instance variables for a given class should be small enough
> to make the runtime memory overhead, and the compile time overhead of
> incrementally adding instance variables, pretty manageable in almost
> all cases.

The case I was actually worried about is Rails controllers, where each
instance (depending on which action method is invoked) typically has
completely different instance variables from most other instances.
(And then, when the view is identified and instantiated, all assigned
instance variables are identified and copied over to the view.) But at
least in that case there are generally few instances of those classes.
So yeah, this sounds like the best approach.

> Now, we don't have to generate all the possibilities eagerly - we
> should be able to trap doesNotUnderstand: and, if the missing method
> is just a differently parameterized version of a method we have in
> canonical form, build the appropriate stub at that point and continue.

Gotcha. This is the part that hadn't occurred to me:
doesNotUnderstand: would do the fallback work to detect these cases
before punting and calling method_missing.