I've taken a first crack at a simple Ruby/Smalltalk source code pairing to act as a starter test case for any translators and runtime implementations. It's simple enough to be quickly doable on both sides, but it has enough meat to be worth discussing, I think. Here's the code, with line numbers added to the Smalltalk to aid in dissection.
---- Ruby ----
class Person
  attr_accessor :first, :last

  def full_name
    if(first && last)
      first + " " + last
    else
      "John Doe"
    end
  end
end
--------- Smalltalk ---------
"1" Ruby classNamed: 'Person' asRubySymbol do:
"2"   [:class |
"3"   class attr_accessor: 'first' asRubySymbol with: 'last' asRubySymbol.
"4"   class compileSmalltalkMethod: '
"5"   full_name
"6"     ^
"7"     (self first notFalseOrNil and: [self last notFalseOrNil])
"8"       ifTrue: [self first + '' '' asRubyString + self last]
"9"       ifFalse: [''John Doe'' asRubyString]']
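Since this is meant as a starter test case, it may help to pin down what the Ruby side actually does before comparing translations. A minimal sketch that exercises the class (the four instances are illustrative, not part of the original test case); the last one is the kind of edge case a translation of the conditional could easily get wrong:

```ruby
# The Ruby half of the pairing, reproduced so this sketch runs on its own.
class Person
  attr_accessor :first, :last

  def full_name
    if first && last
      first + " " + last
    else
      "John Doe"
    end
  end
end

nobody      = Person.new
first_only  = Person.new.tap { |x| x.first = "Jane" }
both        = Person.new.tap { |x| x.first = "Jane"; x.last = "Smith" }
empty_first = Person.new.tap { |x| x.first = ""; x.last = "Smith" }

puts nobody.full_name      # John Doe (both accessors return nil)
puts first_only.full_name  # John Doe (last is still nil)
puts both.full_name        # Jane Smith
p empty_first.full_name    # " Smith" (an empty String is not false in Ruby)
```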
----- Notes -----
Line 1. Ruby classes/globals should be in a separate namespace from Smalltalk classes. We'll need a single global in the Smalltalk namespace from which to access them; by analogy with the "Smalltalk" global, I think "Ruby" makes sense.
The #classNamed:do: method is intended to mimic the scoping inside Ruby class definitions: within the block passed as the second parameter, the :class variable will be bound to the current open class.
Note that we have to convert Smalltalk string literals to Ruby symbols (#asRubySymbol) and strings (#asRubyString), since we want these to be distinct classes from Smalltalk's Symbol and String, and because Smalltalk's literal semantics (only one instance, usually immutable, ever created per method, at compile time) don't work for Ruby.
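A quick Ruby check of the literal semantics in question (the method name `separator` is just for illustration): unlike a Smalltalk method literal, a Ruby string literal yields a new object every time it is evaluated, while a symbol is always the same object. The magic comment pins down the default non-frozen behavior in case the interpreter was started with frozen string literals enabled.

```ruby
# frozen_string_literal: false

# Each evaluation of a string literal produces a new String object.
def separator
  " "
end

first_call  = separator
second_call = separator

puts first_call == second_call       # true:  same contents
puts first_call.equal?(second_call)  # false: two distinct objects

# A symbol, by contrast, is unique: the same name always gives the same object.
puts :first.equal?("first".to_sym)   # true
```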
Line 3. The advantage of representing the source this way (as a single sequence of Smalltalk statements) instead of in Smalltalk's chunk format is that we can do the usual Ruby stuff with invoking class methods inside a class definition in a natural way. I'm suggesting the addition of "with:" keywords to method names for parameters beyond the first one, which is as close to idiomatic Smalltalk as we're likely to get. Another option would be to always pass an array of arguments, but all of that array creation is going to incur a performance penalty. In cases like this, where the method implementation actually takes a variable number of arguments, we can generate stub methods that delegate to a varargs version:
attr_accessorWithArgs: anArray "the actual method" ...
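On the translator side, generating those stubs might look something like the following Ruby sketch. The helper names (`keyword_selector`, `varargs_stubs`) and the `arg1`...`argN` parameter names are hypothetical; the generated Smalltalk assumes `Array with:with:...` for packing and the `WithArgs:` suffix used above.

```ruby
# Hypothetical translator helpers (not part of any existing tool).

# Smalltalk selector for a call with +argc+ arguments, using the
# "name:with:with:" convention, e.g. attr_accessor:with: for two arguments.
def keyword_selector(name, argc)
  return name if argc.zero?
  "#{name}:" + "with:" * (argc - 1)
end

# Smalltalk source for stubs taking 1..max_args arguments, each of which packs
# its arguments into an Array and forwards to the varargs implementation.
def varargs_stubs(name, max_args)
  (1..max_args).map do |argc|
    params = (1..argc).map { |i| "arg#{i}" }
    header = "#{name}: #{params.first}" +
             params.drop(1).map { |param| " with: #{param}" }.join
    packed = "(Array" + params.map { |param| " with: #{param}" }.join + ")"
    "#{header}\n\t^ self #{name}WithArgs: #{packed}"
  end
end

puts keyword_selector("attr_accessor", 2)
puts varargs_stubs("attr_accessor", 2)
```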
Line 4. The actual method definition carries none of the enclosing scope, so it doesn't make sense to make it a block or anything - we just want to give the class some Smalltalk source code for the method, to compile as usual.
Line 5. There's no possibility of being passed parameters, and no use of "yield" inside the method, so we can do a unary method here.
Line 6. Unlike Ruby, Smalltalk doesn't implicitly return the value of the last expression in the method, so we need to explicitly add a return here.
Line 7. In Ruby, nil is treated the same as false in conditionals. Smalltalk's optimized control methods expect only true or false, so we have to do something like #notFalseOrNil to normalize.
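To make the normalization concrete, here is a hypothetical Ruby model of what `#notFalseOrNil` has to compute, checked against Ruby's own conditional. The point is that only `nil` and `false` are false; `0`, `""` and `[]` all count as true.

```ruby
# Hypothetical Ruby model of the proposed #notFalseOrNil: only nil and false
# count as false in a Ruby conditional; every other object counts as true.
def not_false_or_nil(value)
  !(nil.equal?(value) || false.equal?(value))
end

[nil, false, true, 0, "", []].each do |value|
  # Compare against Ruby's own conditional to confirm the two agree.
  ruby_says = value ? true : false
  puts "#{value.inspect}: #{not_false_or_nil(value)} (Ruby if: #{ruby_says})"
end
```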
Line 8. Note the doubling of the single quotes to escape them - we're inside a string literal being passed to the compiler.
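Since the translator will be producing these nested literals mechanically, the escaping rule is worth writing down. Smalltalk has no backslash escapes; a quote inside a string is written as two quotes. A minimal sketch (`smalltalk_string_literal` is a hypothetical helper name):

```ruby
# Hypothetical translator helper: wrap generated method source in a Smalltalk
# string literal, doubling every embedded single quote.
def smalltalk_string_literal(source)
  "'" + source.gsub("'", "''") + "'"
end

body = "ifFalse: ['John Doe' asRubyString]"
puts smalltalk_string_literal(body)
# prints: 'ifFalse: [''John Doe'' asRubyString]'
```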
This would mimic Ruby behavior pretty well because it gives the runtime a chance to freak out if you do this...
class Foo < Baz
end

class Foo < Grr
end
Also, given what you've written Avi, might classes defined inside modules or other classes look something like this? I'm going to use the version without subClassOf below since it's shorter. ;-)
Avi Bryant wrote: > I've taken a first crack at a simple Ruby/Smalltalk source code pairing > to act as a starter test case for any translators and runtime > implementations. [snip]
I might suggest you look at two more difficult areas of Ruby before exploring straight-up translation. Translation of boring old normal Ruby won't be difficult in any language, and as you see it's pretty simple in Smalltalk. However Ruby's eval logic and scoping rules are going to be a challenge to exactly duplicate. We're trying to iron out the last bugs in JRuby related to these areas and running into many weird and wonderful edge cases that don't make much sense. Tackling these areas first would be very valuable in the long term, so you don't paint yourself into a corner with a too-simple translation early on.
> Also, given what you've written Avi, might classes defined inside > modules or other classes look something like this? I'm going to use > the version without subClassOf below since it's shorter. ;-)
> Avi Bryant wrote: > > I've taken a first crack at a simple Ruby/Smalltalk source code pairing [snip]
> I might suggest you look at two more difficult areas of Ruby before > exploring straight-up translation. [snip] Tackling > these areas first would be very valuable in the long term, so you don't > paint yourself into a corner with a too-simple translation early on.
That's good advice. My personal interest is in seeing a proof of concept that gets things 80% right which people can use to do some benchmarking, and see if there's interest at that point to do it "for real", so I'm not too worried if we get the true edge cases wrong at first. But others may feel differently.
> > Ruby classNamed: 'Person' asRubySymbol subClassOf: 'Object' asRubySymbol do:
> Yep, except that the superclass is an expression, not a symbol - so that should be
With an asRubySymbol tossed in, too (this stuff's tricky to do by hand, good thing we've got machines to do it for us ;-).
It occurs to me though, that there will probably need to be two variants of this classNamed:* method, since if a class was created with a supertype, it's legal to reopen it with the same supertype or with no specified supertype. But attempting to use Object or some other supertype of the supertype won't work. So not specifying a supertype is different than having an invisible < Object, I guess.
# Superclass
class One
end

# Subclass
class Two < One
end

# Reopens fine
class Two < One
end

# Reopens fine
class Two
end

# Blows up
class Two < Object
end
Anyways, just peanuts, but thought we might as well have it documented on list.
On 11/14/06, Topher Cyll <christopherc...@gmail.com> wrote:
> [snip] So not specifying a supertype > is different than having an invisible < Object, I guess.
It's different in implementation, but not in interface (you can't tell looking at a given .rb file whether that's going to be the case or not). So basically #classNamed: is going to have to create a new class in some cases and not in others, and #classNamed:subclassOf: would have to error out if the class already existed. Maybe the former should be #ensureClassNamed:.
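A minimal Ruby model of the reopening rules that #classNamed: (or #ensureClassNamed:) and #classNamed:subclassOf: would have to reproduce, assuming Ruby classes live in their own namespace as proposed with the "Ruby" global. `RubyNamespace` and `class_named` are hypothetical names; the error mirrors the `TypeError` ("superclass mismatch for class Two") that real Ruby raises for the "Blows up" example.

```ruby
# Hypothetical model of a separate Ruby class namespace (a Hash here).
class RubyNamespace
  def initialize
    @classes = {}
  end

  # Reopen an existing class or create a new one, following Ruby's rules:
  # - no superclass given: reopen if present, otherwise subclass Object
  # - superclass given: it must be exactly the existing superclass
  def class_named(name, superclass = nil)
    existing = @classes[name]
    if existing
      if superclass && !existing.superclass.equal?(superclass)
        raise TypeError, "superclass mismatch for class #{name}"
      end
      existing
    else
      @classes[name] = Class.new(superclass || Object)
    end
  end
end

ns  = RubyNamespace.new
one = ns.class_named(:One)
two = ns.class_named(:Two, one)
ns.class_named(:Two, one)   # reopens fine
ns.class_named(:Two)        # reopens fine
begin
  ns.class_named(:Two, Object)
rescue TypeError => e
  puts e.message  # superclass mismatch for class Two
end
```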
Have you given any thought to debugging? It would be a pity to be in a smalltalk image and not have that ability. Perhaps some debugger pragma could go into the generated code to cross-reference it to the original ruby?
Here's a new revision of the specification for discussion, which incorporates some of the things we've been discussing (like subclassing, instance variables, and calling conventions) and adds a couple of new twists. Things to note:
- How do we translate method names like "empty?" and "name="? I'm proposing "emptyQuestion" and "nameEquals" but there are other possibilities... for example "pEmpty" and "setName" to be Lisp-ish. Any of these could conflict with real Ruby methods of the same name, but I'm not sure how to avoid this (short of adding a lot of underscores or something to make it even less probable).
- We're implementing #set_childrenWithArgs: but sending #set_children:with:. That implies automatic stub creation to make that work (see my discussion with Glenn V.).
- I've tried to mimic Ruby's odd scoping for block parameters. It's kinda tempting to use proper block scoping, but that would break things.
- How do people feel about this specification being a real-ish model rather than just stuff like "class A < B"? It may get awkward to come up with plausible method names for the various things we want to test, but I also like grounding things in semi-realistic examples.
---- Ruby ----
class Person < Thing
  attr_accessor :first, :last

  def full_name
    if(first && last)
      first + " " + last
    else
      "John Doe"
    end
  end
end

class Person
  def set_children(*kids)
    @children = kids
  end

  def each_child
    @children.each{|k| yield k}
  end
end
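Exercising the new methods in plain Ruby shows what the translation has to preserve: the splat in set_children collects however many arguments were passed into one Array (which is what a #set_childrenWithArgs: implementation would receive), and each_child hands each element to the caller's block. `Thing` is not shown in the spec, so this sketch assumes an empty class, and full_name is omitted for brevity.

```ruby
class Thing; end  # assumed: the spec does not define Thing

class Person < Thing
  attr_accessor :first, :last
end

class Person
  # The splat collects all arguments into a single Array.
  def set_children(*kids)
    @children = kids
  end

  # Yields each child to the block given by the caller.
  def each_child
    @children.each { |k| yield k }
  end
end

parent = Person.new
parent.set_children("Ann", "Bob", "Cy")
names = []
parent.each_child { |kid| names << kid }
p names  # ["Ann", "Bob", "Cy"]
```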
> Here's a new revision of the specification for discussion, which > incorporates some of the things we've been discussing [snip]
It'd be helpful to think about the Ruby in terms of the AST that it's parsed into, because those are the symbols we get as input to the translator. Below are the ParseTree sexpressions corresponding to the various bits (any errors in brackets or commas are my own from breaking up the expression into chunks)
One interesting observation is that Ruby seems to treat arguments passed to a method as an array from the outset, pulling off from the front in the order they appear in the method definition, and then tossing the rest into any catchall at the end if applicable. Looking at the output of RubyNode on the same methods is similarly interesting, but for another night.
On 11/15/06, Dane Jensen <ca...@fastmail.fm> wrote:
> It'd be helpful to think about the Ruby in terms of the AST that it's > parsed into [snip]
Hm, if we're gonna be looking at these sexprs a lot, can we write them in real sexpr notation rather than Ruby's much more verbose array literal format? So, eg,
(attrasgn (lvar x) last= (array (str "Smith")))
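For what it's worth, the conversion itself is small. A standalone sketch (`to_sexpr` is a hypothetical name, and the input below is written by hand in the nested-array format rather than produced by ParseTree):

```ruby
# Render a nested-array parse tree as compact s-expression text.
def to_sexpr(node)
  case node
  when Array  then "(" + node.map { |child| to_sexpr(child) }.join(" ") + ")"
  when Symbol then node.to_s     # node types and method names print bare
  else             node.inspect  # strings keep their quotes; numbers print as-is
  end
end

tree = [:attrasgn, [:lvar, :x], :last=, [:array, [:str, "Smith"]]]
puts to_sexpr(tree)
# prints: (attrasgn (lvar x) last= (array (str "Smith")))
```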
> One interesting observation is that Ruby seems to treat arguments > passed to a method as an array from the outset, pulling off from the > front in the order they appear in the method definition, and then > tossing the rest into any catchall at the end if applicable.
> Hm, if we're gonna be looking at these sexprs a lot, can we write them > in real sexpr notation rather than Ruby's much more verbose array > literal format? So, eg,
> (attrasgn (lvar x) last= (array (str "Smith")))
If we'd like to go that route on the mailing list, I have a Rubygem named 'sexp' on Rubyforge that can do this. It's a little frustrating to get working because ParseTree also provides a 'sexp.rb' in the root directory of their gem that conflicts.
But here's an example of how to get them working together well enough to do what we need (uses a manual require_gem)
Avi Bryant wrote: > On 11/15/06, Dane Jensen <ca...@fastmail.fm> wrote: [snip] > > One interesting observation is that Ruby seems to treat arguments > > passed to a method as an array from the outset, pulling off from the > > front in the order they appear in the method definition, and then > > tossing the rest into any catchall at the end if applicable.
> Not sure I follow, can you point to an example?
Using "real" MRI (as it appears ParseTree is doing), all methods that take arguments actually receive just 1 argument: an array. The contents of that array are then mapped to the declared method arguments at runtime. (Yarv also waits until the last second to do this mapping). An example is provided below, using Dane's sexp util.
Before the example, here are some strongtalk bytecodes to keep in mind:
bytecode id#: desc
-----------------------------------------
80:     interpreted send, 0 args
81:     interpreted send, 1 args
82:     interpreted send, 2 args
83:     interpreted send, n args
90-93:  polymorphic send bytecodes (0, 1, 2, and n args)
A0-A3:  compiled send bytecodes (0, 1, 2, and n args)
(B0-B3 are not relevant)
C0-C3:  megamorphic sends
In strongtalk, the method inlining optimizations stem in part from the deep understanding of the implications of smalltalk's method call semantics. If you want to benefit, you will likely have to change the ruby message send semantics to match smalltalk's whenever possible.
=== some example method calls w/ generated sexp follow ===
class Foo
  def add_it_up(a, b, *manyargs)
    answer = a + b
    manyargs.each { |argn| answer += argn }
    answer
  end

  def add_it_up_plenty
    add_it_up(5, 5, 5, 5, 5)
  end

  def add_it_up_just_enough
    add_it_up(2, 2)
  end

  def add_it_up_not_enough
    add_it_up(1)
  end
end
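The "one array, unpacked at call time" behavior described above can be modeled in a few lines. This is a hypothetical sketch (`bind_args` is not an MRI function) applied to the argument lists from the three calls in Foo; the error text follows the wording of recent Ruby versions' ArgumentError.

```ruby
# Hypothetical model of array-then-bind argument passing: the callee gets all
# actual arguments as one Array and maps them onto its declared parameters.
# +required+ counts mandatory positional parameters; +splat+ says whether a
# trailing *rest parameter collects whatever is left over.
def bind_args(args, required, splat: false)
  if args.size < required || (!splat && args.size > required)
    expected = splat ? "#{required}+" : required.to_s
    raise ArgumentError,
          "wrong number of arguments (given #{args.size}, expected #{expected})"
  end
  bound = args.first(required)
  bound << args.drop(required) if splat
  bound
end

# add_it_up(a, b, *manyargs): two required parameters plus a splat.
p bind_args([5, 5, 5, 5, 5], 2, splat: true)  # [5, 5, [5, 5, 5]]
p bind_args([2, 2], 2, splat: true)           # [2, 2, []]
begin
  bind_args([1], 2, splat: true)
rescue ArgumentError => e
  puts e.message  # wrong number of arguments (given 1, expected 2+)
end
```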
> - How do we translate method names like "empty?" and "name="? I'm > proposing "emptyQuestion" and "nameEquals" but there are other > possiblities... for example "pEmpty" and "setName" to be lispish. Any > of these are potentially in conflict with real Ruby methods of the same > name but I'm not sure how to avoid this (short of adding a lot of > underscores or something to make it even less probable).
We could use 'name' rather than 'nameEquals,' but that may be throwing away the equal sign.
We may also be able to take advantage of things that are not legal names in ruby but are in smalltalk. For example, we could compile things to use a dummy keyword parameter, like so:
rubyObject name: name equals: nil.
where 'name:equals' is not a legal symbol in ruby.
That is not particularly pretty, but it would avoid collisions with ruby names. This also makes me wonder what we will do for multiple arguments. I would propose appending with:, then with:with:, and so on, once per extra argument. That would probably cover most of our calls. For example
def diabolical_method(a, b)
  return a + b
end
Would look like:
diabolical_methodWith: a with: b
I notice that Avi is doing a similar thing with *args in his proposed spec.
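For comparison, the suffix mapping from the revised spec ("empty?" to "emptyQuestion", "name=" to "nameEquals") is mechanical enough to sketch as a translator function. `smalltalk_name` is hypothetical and only handles the two suffixes discussed; operator methods such as `==` pass through untouched. The last line shows the collision worry: a Ruby method that really is named emptyQuestion maps to the same selector.

```ruby
# Hypothetical translator helper for the naming scheme proposed in the spec.
SUFFIX_NAMES = { "?" => "Question", "=" => "Equals" }.freeze

def smalltalk_name(ruby_name)
  name = ruby_name.to_s
  # Only identifier-style names ending in ? or = are rewritten.
  match = name.match(/\A([A-Za-z_]\w*)([?=])\z/)
  match ? match[1] + SUFFIX_NAMES.fetch(match[2]) : name
end

puts smalltalk_name(:empty?)     # emptyQuestion
puts smalltalk_name(:name=)      # nameEquals
puts smalltalk_name(:full_name)  # full_name
puts smalltalk_name(:==)         # ==
# Collision: a method literally called emptyQuestion gets the same selector.
puts smalltalk_name(:emptyQuestion) == smalltalk_name(:empty?)  # true
```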
> where 'name:equals' is not a legal symbol in ruby.
Unfortunately that's not quite true: "name:equals".to_sym does give you a symbol, and it's only when you try to use it as an instance variable name that the runtime rejects it.
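A quick check of that point in plain Ruby: the symbol exists, and nothing stops Ruby code from defining and calling a method under that name via `define_singleton_method` and `public_send`, so a dummy-keyword selector is not completely collision-proof. Only the instance-variable use is rejected.

```ruby
sym = "name:equals".to_sym
puts sym.inspect  # :"name:equals"

obj = Object.new
# A method with this name can be defined and called from Ruby code...
obj.define_singleton_method(sym) { :reached }
puts obj.public_send(sym)  # reached

# ...but it is not accepted as an instance variable name.
begin
  obj.instance_variable_set(:"@name:equals", nil)
rescue NameError => e
  puts e.class  # NameError
end
```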
> Ah, ok. We definitely don't want to actually build an array for every > method call. I'm sure we can get the same calling semantics without > that...
Yep. However, my guess is that it will be nontrivial. Consider the following pathological case. First, the user generates this simple class:
class LikesToAdd
  def add_or_mult(a, b)
    answer = a + b
  end
end

class Foo < LikesToAdd
  def go
    add_or_mult(3,4)
  end

  def goMore
    add_or_mult(3,4,5)
  end
end

p Foo.new.go()        # ==> 7
#p Foo.new.goMore()   # ==> Would give wrong number of args error
Now, imagine the user comes along a couple days later, fires up the image, and writes the LikesToMult module below:
module LikesToMult
  def add_or_mult(a, b, *manyargs)
    answer = a * b
    manyargs.each { |argn| answer *= argn }
    answer
  end
end

#The user also changes Foo to include the new module
class Foo < LikesToAdd
  include LikesToMult
end

p Foo.new.go()      # ==> 12
p Foo.new.goMore()  # ==> 60
In this pathological example, Foo now includes a new def of "add_or_mult" that overrides the one in LikesToAdd. Now add_or_mult likes to multiply, and furthermore, it now takes a splat. If you're not *very* careful, the old optimization you did when originally compiling add_or_mult (which used to take 2 params) will still call super's "self addOrMult: arg1 with: arg2" instead of self's "self addOrMult: argArray". Using the messageNotUnderstood trick will not bail you out on this one, because the message will be understood, just the wrong one.
So if you can't catch it at runtime, how to handle this at code generation time? If you want to optimize the ruby by changing the call semantics to look like smalltalk at compile time, then *any* later changes to a class will require a deoptimization/reoptimization scan on previously translated methods to make sure their optimization is still valid (and in all subclass methods too). If you change a module, you will have to scan all classes that include the module (and their subclasses). If you create a singleton at runtime that overrides a method, all of the methods in the class it was derived from might now fail to hit the reimplemented method (if it changed the method signature).
Perhaps you're better off following ruby's call semantics (always an array, unpack the array at the start of the method) and paying the performance price. Later, once everything else is working, some ruby-friendly send bytecodes can be added to the vm (rubysend, rubysend2arg, rubysendNargs, ...). To the smalltalk send bytecodes, all ruby method calls would appear monomorphic, with 1 argument of type Array, thus breaking PIC logic. You would have to change the PIC logic in the rubysend bytecode to generate type info and make inlining choices based on the contents of the Array instead of on the method arguments.
On 11/15/06, Steven Swerling <swerling...@gmail.com> wrote:
> [snip] Using the messageNotUnderstood trick will not bail you out on > this one, because the message will be understood, just the wrong one.
Very good point. So I think the best solution is just not to use the lazy DNU trick but to actually eagerly generate all the possible variations whenever you implement or override a method. Thus, at the LikesToMult level, #addOrMult:with: would be redefined to dispatch to #addOrMultWithArgs:.
Then we just set a limit on how many args we want to use the optimized version for. I'd say that we go up to method:with:with: and beyond that (so 4 args or more), we do just collect them into an array and use #methodWithArgs:. If we have variants with and without a block param, we end up with 7 stubs and one implementation for each method, which is probably fine.
Avi Bryant wrote: > On 11/15/06, Steven Swerling <swerling...@gmail.com> wrote:
>> [snip]
> Very good point. So I think the best solution is just not to use the > lazy DNU trick but to actually eagerly generate all the possible > variations whenever you implement or override a method. [snip]
That would work. Although if you try to call a method with too *few* args, it will fail to fail. Eg. If you call add_or_mult(1), it will succeed after all cases of add_or_mult are rerouted to "self addOrMultWithArgs: args", even though the programmer never explicitly created a method to handle that.
It is clear that this too can be worked out. And that it will be tricky.
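One way the "fail to fail" case might be worked out, sketched under the assumption that stubs are regenerated eagerly whenever a method is defined or overridden: each fixed-arity stub is generated either to forward to the WithArgs: implementation or to raise Ruby's ArgumentError, depending on the method's declared parameters. `stub_plan`, `keyword_selector`, and the `:forward`/`:argument_error` labels are hypothetical. Calls with four or more arguments would go straight to the WithArgs: implementation, which would need the same check.

```ruby
# Fixed-arity selectors go up to method:with:with: (three arguments), as proposed.
MAX_FIXED_ARGS = 3

def keyword_selector(name, argc)
  return name if argc.zero?
  "#{name}:" + "with:" * (argc - 1)
end

# Decide, for each fixed-arity stub, whether it forwards to the
# "<name>WithArgs:" implementation or raises Ruby's ArgumentError.
# +required+ is the number of mandatory parameters; +splat+ marks a *rest param.
def stub_plan(name, required, splat: false)
  (0..MAX_FIXED_ARGS).map { |argc|
    accepts = argc >= required && (splat || argc == required)
    [keyword_selector(name, argc), accepts ? :forward : :argument_error]
  }.to_h
end

# LikesToAdd#add_or_mult(a, b) versus LikesToMult#add_or_mult(a, b, *manyargs)
p stub_plan("add_or_mult", 2)
p stub_plan("add_or_mult", 2, splat: true)
```

With the splat version, add_or_mult(1) still lands on an :argument_error stub, so the call fails the way Ruby would.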
Anyway, this is all screaming "unit test", so one is attached. It tests the "failure to fail on not enough args" scenario just mentioned, as well as a bunch of other override behaviors that should happen before and after a module is mixed in.
Still can't help wondering if we should just stick to ruby's call scheme (always use array) and implement a new bytecode later.