> -----Original Message-----
> From: Carlo Dapor [mailto:cat...@gmail.com]
> Sent: Friday, October 27, 2006 4:36 PM
> To: David Griswold
> Subject: Future of StrongTalk
> David
> What does it mean for StrongTalk once Java is open sourced? After all, HotSpot is based on it, with a lot more to it.
> Will StrongTalk still continue? Will it inherit all the intellectual property from HotSpot?
The HotSpot VM is for Java, not Smalltalk, and the languages and implementations are very different. The Java VM has static implementation type information available, and it did not use the type-feedback capabilities of the Strongtalk VM (the last I heard), which was a decision I was against. So while it also does extensive inlining, it uses different algorithms that inline things in different ways, and while it can do better inlining for some things than Strongtalk, it does worse inlining for other things, so the tradeoffs are not very clear (and I don't think anyone has ever done a detailed comparison of the impact of the different inlining strategies).
Nor does the Java VM use tagging, which hurts Java a lot as a language, since it makes it really painful and expensive to treat basic types as objects. So I don't think the Java VM will ever be a better VM to host true dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use tagging and need type-feedback.
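As a minimal sketch of what tagging buys, assume a one-bit low tag that marks small integers. The tag layout and the helper names below are illustrative assumptions, not Strongtalk's actual encoding. A machine word is modelled as a Ruby Integer, and adding two tagged values needs no heap object and no boxing.

```ruby
# Illustrative only: a machine word is modelled as a Ruby Integer.
# Assumed scheme: low bit 1 marks a small integer, low bit 0 a pointer.
TAG_BITS = 1
INT_TAG  = 1

def tag_small_int(n)
  (n << TAG_BITS) | INT_TAG
end

def small_int?(word)
  (word & INT_TAG) == INT_TAG
end

def untag_small_int(word)
  word >> TAG_BITS # arithmetic shift, so the sign is preserved
end

# Tagged addition: both operands carry the tag, so subtract one copy of it.
# No object is allocated for the result.
def add_tagged(a, b)
  raise ArgumentError, "not small integers" unless small_int?(a) && small_int?(b)
  a + b - INT_TAG
end

x = tag_small_int(20)
y = tag_small_int(22)
puts untag_small_int(add_tagged(x, y)) # prints 42
```

A VM without such a scheme has to wrap each primitive value in a heap object before it can be treated as an object, which is the cost the paragraph above refers to.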
Another big disadvantage of the Java VM is that because it has to count on the type system for dynamic safety, rather than using the flexibility of type-feedback, it has an enormous amount of complexity in things like the bytecode verifier, which must prove that the bytecode type information is valid. And it has all those different kinds of basic types in addition to objects. So it is more complex than Strongtalk.
The one big engineering advantage of the HotSpot VM is that it is internally multi-threaded, which means it can take better advantage (right now) of multi-core and native preemptive threading. But hopefully that engineering will eventually be done in Strongtalk too.
And a bigger point is that, although I don't know for sure, I doubt that Sun will release the Java VM under a totally open-source license like BSD. It will probably be under a more proprietary license like the one they use for other Java open-source stuff, which I don't think is nearly as nice as the Strongtalk license.
-Dave
> Nor does the Java VM use tagging, which hurts Java a lot as a language, > since it makes it really painful and expensive to treat basic types as > objects. So I don't think the Java VM will ever be a better VM to host true > dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use > tagging and need type-feedback.
I'm wondering how much needs to be added (if anything) to host Ruby. I see the problems as:
Ruby is more dynamic than Smalltalk. All instance variables are currently implemented as slots, i.e. you can add a new instance variable to an object in a running program, and you need to be able to add mixins to a class and/or change methods and expect the change to affect all of the live objects.
So:
1. Adding new instance variables isn't exactly the same as "become", because it has to be able to handle live stack frames referencing objects whose definition changed, whereas in Smalltalk you just expect the system to crash in that case.
2. Changing the definition of methods can invalidate inlining optimizations in live frames: deoptimization plus reoptimization anyone?
3. Even adding methods to unrelated classes can change optimizations based on assuming that only a single class (or a limited number of them) uses a given selector.
4. One optimization that makes sense in Ruby but not in Smalltalk is to figure out which instance variables are rarely instantiated but which are taking up a lot of memory, and to change them into being stored in a hidden subobject. I assume that using a hidden variable holding a hidden object is better than trying to implement a completely general, safe and fast become (that's almost impossible, I think).
Another point is that since Ruby supports loading a class definition in independent parts, there's no way to fake your way past the need to be able to mutate class definitions.
By the way, the exciting thing about Ruby is that there is a large community of people using it.
Josh wrote: > > Nor does the Java VM use tagging, which hurts Java a lot as a language, > > since it makes it really painful and expensive to treat basic types as > > objects. So I don't think the Java VM will ever be a better VM to host > true > > dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use > > tagging and need type-feedback.
> I'm wondering how much needs to be added (if anything) to host > Ruby. I see > the problems as:
> Ruby is more dynamic than Smalltalk. All instance variables are currently > implemented as slots, ie you can add new instance variable to an > object in a > running program, and you need to be able to add mixins to a class and or > change methods and expect the change to effect all of the live objects.
> So:
> 1 Adding new instance variables isn't exactly the same as "become" because > it has to be able to handle live stack frames referencing objects whoes > definition changed, whereas in Smalltalk you just expect the > system to crash > in that case.
From your description I don't understand how that is different than become. In Smalltalk live stack frames can reference objects that change via become, and it should work just fine if become is implemented properly. If adding instance variables has to be *fast*, however, that is another issue. It would require an object table (or a forwarding wrapper for every object), and to make the class modification itself fast would require changes that would slow down all instance var accesses, unless you discriminated against the new instance variables performance-wise. That is probably a big reason why Ruby is so slow.
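The "forwarding wrapper for every object" idea can be sketched as follows. This is a hypothetical model, not how any particular VM implements it: the `Handle` class and the two `Point` structs are invented for illustration. Every reference goes through a handle, so `become` is a single pointer swap, and the price is one extra indirection on every message.

```ruby
# Hypothetical forwarding wrapper: all references point at the Handle,
# so "become" is one pointer swap instead of a scan of the heap.
class Handle < BasicObject
  def initialize(target)
    @target = target
  end

  # Swap the object behind this handle; every holder of the handle sees it.
  def become(new_target)
    @target = new_target
    self
  end

  # Every other message pays one extra indirection.
  def method_missing(name, *args, &block)
    @target.__send__(name, *args, &block)
  end
end

PointV1 = Struct.new(:x, :y)     # old class shape
PointV2 = Struct.new(:x, :y, :z) # same class after adding a variable

point = Handle.new(PointV1.new(1, 2))
other_reference = point # e.g. a reference held by a live stack frame

point.become(PointV2.new(point.x, point.y, nil))
other_reference.z = 3
puts other_reference.to_a.inspect # prints [1, 2, 3]
```

The indirection in `method_missing` stands in for the slowdown on all instance variable accesses mentioned above.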
> 2. Changing the definition of methods can invalidate inlining > optimizations > in live frames: deoptimization plus reoptimization anyone?
That machinery is already there; deoptimizing live frames is done all the time in Strongtalk. They don't have to be immediately reoptimized, you can do what is necessary with the interpreted frames after deoptimization, and if they are used frequently thereafter, they will get reoptimized by the normal mechanism.
> 3. Even adding methods to unrelated classes can change optimizations based > on assuming that only a single class (or a limited number of them) uses a > given selector.
Once again, deoptimization solves this problem. This exact issue exists in the JVM, and once you have deoptimization it is trivial. We don't inline in Strongtalk right now based on the number of method implementations, because type-feedback can already inline those methods anyway (because if there is only one implementation, then any send of that message is by definition going to be monomorphic, which is what Strongtalk looks at).
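A rough sketch of the kind of record type-feedback relies on, assuming a per-call-site table of receiver classes. The `CallSite` class is hypothetical and only models the bookkeeping; a real VM would keep this in inline caches and feed it to the compiler.

```ruby
# Hypothetical call-site record of the sort an interpreter could keep
# for type-feedback.
class CallSite
  attr_reader :selector, :seen_classes

  def initialize(selector)
    @selector = selector
    @seen_classes = Hash.new(0) # receiver class => number of sends observed
  end

  # Perform the send and record the receiver's class.
  def call(receiver, *args)
    @seen_classes[receiver.class] += 1
    receiver.public_send(@selector, *args)
  end

  # Monomorphic: only one receiver class has been seen, so the target
  # method is known and could be inlined behind a class check.
  def monomorphic?
    @seen_classes.size == 1
  end
end

site = CallSite.new(:size)
["a", "bc", "def"].each { |s| site.call(s) }
puts site.monomorphic? # true: only String receivers so far
site.call([1, 2])
puts site.monomorphic? # false: String and Array have both been seen
```

If a selector has only one implementation in the whole system, every site sending it is monomorphic in this sense, which is why counting implementations adds little.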
> 4. One optimization that makes sense in Ruby but not in Smalltalk is to > figure out which instance variables are rarely instanciated but which are > taking up a lot of memory and changing them into being stored in a hidden > subobject. I assume that using a hidden variable holding a > hidden object is > better than trying to implement a completely general, safe and fast become > (that's almost impossible, I think).
Yes, adding a slower mechanism for handling lazily-added instance variables as an uncommon case would probably be the way to do it.
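One way to picture that slower, uncommon-case mechanism is an overflow table that is allocated only when a variable is added late. The `SlotObject` class below is a hypothetical model; in a real VM the name-to-index map would belong to the class rather than to each instance.

```ruby
# Hypothetical object layout: declared variables live in fixed slots,
# lazily added ones go to a hidden overflow table (the slower path).
class SlotObject
  def initialize(fixed_names)
    # In a real VM this index would be shared per class, not per instance.
    @index = {}
    fixed_names.each_with_index { |name, i| @index[name] = i }
    @slots = Array.new(fixed_names.size)
    @overflow = nil # allocated only when a variable is added late
  end

  def get(name)
    i = @index[name]
    return @slots[i] if i        # fast path: fixed offset
    @overflow ? @overflow[name] : nil
  end

  def set(name, value)
    i = @index[name]
    if i
      @slots[i] = value          # fast path: fixed offset
    else
      (@overflow ||= {})[name] = value # slow path: hidden sub-object
    end
  end

  def overflow_size
    @overflow ? @overflow.size : 0
  end
end

obj = SlotObject.new([:x, :y])
obj.set(:x, 1)
obj.set(:cache, "added late")
puts obj.get(:x)
puts obj.get(:cache)
puts obj.overflow_size
```

Objects that never gain a late variable pay only for the one hidden pointer.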
Ruby would probably be a great fit for a modified Strongtalk VM, but Ruby apparently has other problems like an ill-defined grammar that would have to be dealt with first. But if someone wanted to build a StrongRuby, I would encourage them all I could.
> I'm wondering how much needs to be added (if anything) to host > Ruby. I see > the problems as:
> Ruby is more dynamic than Smalltalk. All instance variables are > currently > implemented as slots, ie you can add new instance variable to an > object in a > running program, and you need to be able to add mixins to a class > and or > change methods and expect the change to effect all of the live > objects.
No, Smalltalk does all this stuff dynamically as well. Smalltalk typically doesn't have mixins, but it's not hard to implement them, and Strongtalk has explicit support for them. In fact, Ruby's object model is almost exactly that of Strongtalk. Only the syntax is different.
> So:
> 1 Adding new instance variables isn't exactly the same as "become" > because > it has to be able to handle live stack frames referencing objects > whoes > definition changed, whereas in Smalltalk you just expect the system > to crash > in that case.
Not so. In Smalltalk, when you add an instance variable to a class, a new version of the class is built and all instances of the old class are converted to instances of the new class using become. It doesn't crash; live stack frames continue to function correctly.
> 2. Changing the definition of methods can invalidate inlining > optimizations > in live frames: deoptimization plus reoptimization anyone?
In Smalltalk, methods used by active contexts aren't converted to the new version of the method, but all subsequent invocations of the method use the new version. If it's the same in Ruby, (and if it's not, I'd like to know how the contexts are migrated) then the Strongtalk VM should handle this transparently.
> 3. Even adding methods to unrelated classes can change > optimizations based > on assuming that only a single class (or a limited number of them) > uses a > given selector.
Again, this is normal and expected in Smalltalk, and the Strongtalk VM is designed to handle it correctly.
> 4. One optimization that makes sense in Ruby but not in Smalltalk > is to > figure out which instance variables are rarely instanciated but > which are > taking up a lot of memory and changing them into being stored in a > hidden > subobject. I assume that using a hidden variable holding a hidden > object is > better than trying to implement a completely general, safe and fast > become > (that's almost impossible, I think).
That kind of optimization is probably better done up in Ruby code. If you add instance variables on the fly, then the ones that never get used will never be added anyway. Once they do, the only cost will be an extra pointer in all instances, so it's not catastrophic.
In general, though, I think you're right. The Ruby object model is almost exactly that of Strongtalk. The main issues that come to my mind are:
Singleton objects: The runtime would have to create a custom subclass and convert the object to an instance of it.
Continuations: currently not supported by Strongtalk, but David has said it's doable.
Ruby message sends can have a variable number of arguments. The runtime would have to capture sends with an "unexpected" number of arguments (probably using #doesNotUnderstand:) and create specialized versions of the method.
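For the singleton-object point, Ruby itself already behaves like the "custom subclass" described above, which a short example in standard Ruby can show. The `Dog` class is just an illustration.

```ruby
class Dog
  def speak
    "woof"
  end
end

rex  = Dog.new
fido = Dog.new

# Defining a per-object method makes Ruby create a hidden class for rex.
def rex.speak
  "WOOF"
end

puts rex.singleton_class.superclass == Dog # true: the hidden class sits below Dog
puts rex.speak                             # WOOF
puts fido.speak                            # woof: other instances are unaffected
puts rex.class                             # Dog: the hidden class is skipped
```

A Strongtalk-hosted runtime would have to do the equivalent by hand: build the subclass, then convert the one object into an instance of it.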
Probably the biggest chunk of work in implementing Ruby-on-Strongtalk would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is notoriously difficult to parse, so it would be a fair amount of work to make sure all the nooks and crannies of the grammar get covered.
Pulling this off would be a huge coup, though, as it would be way, way faster than the existing Ruby implementation. I'd really like to see this happen.
On Oct 28, 2006, at 1:30 PM, David Griswold wrote:
> In Smalltalk live stack frames can reference objects that change > via become, > and it should work just fine if become is implemented properly. If > adding > instance variables has to be *fast*, however, that is another > issue. It > would require an object table (or a forwarding wrapper for every > object), > and to make the class modification itself fast would require > changes that > would slow down all instance var accesses, unless you discriminated > against > the new instance variables performance-wise. That is probably a > big reason > why Ruby is so slow.
No, Ruby is slow because it doesn't have a VM. It's an interpreter that walks the AST, evaluating each node as it goes. The core classes are implemented in C, so you get acceptable performance out of them. It's easy to create Ruby bindings for C code, so people just implement performance critical code in C and call it from Ruby.
I claim adding instance variables doesn't have to be fast. It's going to happen at most a handful of times per class. Migrating all the instances won't take very long - the pathological case of adding lots of instance variables after there are lots of instances is going to be extremely rare. The common case will be that all the methods will be added before any instances are created, and in that case no object migration will be needed.
> > 1 Adding new instance variables isn't exactly the same as "become" > > because > > it has to be able to handle live stack frames referencing objects > > whoes > > definition changed, whereas in Smalltalk you just expect the system > > to crash > > in that case.
> Not so. In Smalltalk, when you add an instance variable to a class a > new version of the class is built and all instances of the old class > are converted to instances of the new class using become. It doesn't > crash, live stack frames continue to function correctly.
Mutation of the instances can be deferred; they do not all need to be mutated immediately. In the Visual Smalltalk implementation, objects can change their method dictionaries to get a per-instance lookup path. This is valuable when mutating instances, because the mutation can be deferred until the old-format object is actually used. The "old" instance is given a method dictionary that implements only #doesNotUnderstand:, and when a message reaches the object it is mutated and "becomed" into the new shape. Done this way, instances are not lost when changes are reverted, and the time needed to change all instances of a class is not paid if the mapping from the obsolete class to the current class does not impose a size change... (I do not know whether VS avoids the #become in such situations, but I think it is worth evaluating the convenience of not having a reference to "the class of the object" in the object header.)
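A small model of deferred mutation, under loud assumptions: it uses an explicit version check where Visual Smalltalk is described as swapping in a method dictionary that only understands #doesNotUnderstand:, and the `LazyClass` and `LazyInstance` names are invented for the sketch.

```ruby
# Hypothetical model: the class keeps a layout version and instances are
# reshaped only when they are next touched.
class LazyClass
  attr_reader :names, :version

  def initialize(names)
    @names = names.dup
    @version = 0
  end

  def add_ivar(name)
    @names << name
    @version += 1 # no instance is visited here
  end
end

class LazyInstance
  attr_reader :migrations

  def initialize(klass)
    @klass = klass
    @version = klass.version
    @slots = Array.new(klass.names.size)
    @migrations = 0
  end

  def get(name)
    migrate_if_stale
    @slots[@klass.names.index(name)]
  end

  def set(name, value)
    migrate_if_stale
    @slots[@klass.names.index(name)] = value
  end

  private

  # Stand-in for the #doesNotUnderstand: trap: reshape on first use.
  def migrate_if_stale
    return if @version == @klass.version
    @slots.concat(Array.new(@klass.names.size - @slots.size))
    @version = @klass.version
    @migrations += 1
  end
end

point = LazyClass.new([:x, :y])
a = LazyInstance.new(point)
b = LazyInstance.new(point)
a.set(:x, 1)
point.add_ivar(:z)
point.add_ivar(:w)
a.set(:z, 3)
puts a.migrations # 1: two class changes, one reshape
puts b.migrations # 0: b was never touched, so it was never reshaped
```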
----- Original Message ----- From: "Colin Putney" <cput...@wiresong.ca> To: <strongtalk-general@googlegroups.com> Sent: Saturday, October 28, 2006 5:58 PM Subject: Re: Future of StrongTalk
> On Oct 28, 2006, at 12:07 PM, Cafe Alpha wrote:
> > I'm wondering how much needs to be added (if anything) to host > > Ruby. I see > > the problems as:
> > Ruby is more dynamic than Smalltalk. All instance variables are > > currently > > implemented as slots, ie you can add new instance variable to an > > object in a > > running program, and you need to be able to add mixins to a class > > and or > > change methods and expect the change to effect all of the live > > objects.
> No, Smalltalk does all this stuff dynamically as well. Smalltalk > typically doesn't have mixins, but it's not hard to implemented them, > and Strongtalk has explicit support for them. In fact, Ruby's object > model is almost exactly that of Strongtalk. Only the the syntax is > different.
> > So:
> > 1 Adding new instance variables isn't exactly the same as "become" > > because > > it has to be able to handle live stack frames referencing objects > > whoes > > definition changed, whereas in Smalltalk you just expect the system > > to crash > > in that case.
> Not so. In Smalltalk, when you add an instance variable to a class a > new version of the class is built and all instances of the old class > are converted to instances of the new class using become. It doesn't > crash, live stack frames continue to function correctly.
> > 2. Changing the definition of methods can invalidate inlining > > optimizations > > in live frames: deoptimization plus reoptimization anyone?
> In Smalltalk, methods used by active contexts aren't converted to the > new version of the method, but all subsequent invocations of the > method use the new version. If it's the same in Ruby, (and if it's > not, I'd like to know how the contexts are migrated) then the > Strongtalk VM should handle this transparently.
> > 3. Even adding methods to unrelated classes can change > > optimizations based > > on assuming that only a single class (or a limited number of them) > > uses a > > given selector.
> Again, this is normal and expected in Smalltalk, and the Strongtalk > VM is designed to handle it correctly.
> > 4. One optimization that makes sense in Ruby but not in Smalltalk > > is to > > figure out which instance variables are rarely instanciated but > > which are > > taking up a lot of memory and changing them into being stored in a > > hidden > > subobject. I assume that using a hidden variable holding a hidden > > object is > > better than trying to implement a completely general, safe and fast > > become > > (that's almost impossible, I think).
> That kind of optimization is probably better done up in Ruby code. If > you add instance variables on the fly, then the ones that never get > used will never be added anyway. Once they do, the only cost will be > an extra pointer in all instances, so it's not catastrophic.
> In general, though, I think you're right. The Ruby object model is > almost exactly that of Strongtalk. The main issues that come to my > mind are:
> Singleton objects: The runtime would have to create a custom subclass > and convert the object to an instance of it.
> Continuations: currently not supported by Strongtalk, but David has > said it's doable.
> Ruby message sends can have a variable number of arguments. The > runtime would have to capture sends with an "unexpected" number of > arguments (probably using #doesNotUnderstand:) and create specialized > versions of the method.
> Probably the biggest chunk of work in implementing Ruby-on-Strongtalk > would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is > notoriously difficult to parse, so it would be a fair amount of work > to make sure all the nooks and crannies of the grammar get covered.
> Pulling this off would be a huge coup, though, as it would be way, > way faster than the existing Ruby implementation. I'd really like to > see this happen.
> > 1 Adding new instance variables isn't exactly the same as "become" > > because > > it has to be able to handle live stack frames referencing objects > > whoes > > definition changed, whereas in Smalltalk you just expect the system > > to crash > > in that case.
> Not so. In Smalltalk, when you add an instance variable to a class a > new version of the class is built and all instances of the old class > are converted to instances of the new class using become. It doesn't > crash, live stack frames continue to function correctly.
I think that, in most smalltalks, generally you can call "become" to change an object to any object that has the same instance variables in the same order - otherwise live contexts will involve code that mis-accesses instance variables.
I suppose adding new instance variables on to the end of an object is the only case that is safe, after all.
> > 2. Changing the definition of methods can invalidate inlining > > optimizations > > in live frames: deoptimization plus reoptimization anyone?
> In Smalltalk, methods used by active contexts aren't converted to the > new version of the method, but all subsequent invocations of the > method use the new version. If it's the same in Ruby, (and if it's > not, I'd like to know how the contexts are migrated) then the > Strongtalk VM should handle this transparently.
The problem I was thinking about was that of inlined functions that aren't officially part of the active context. But I guess that deoptimization can handle it.
As an example consider this code:
foo: someObject ||
[someObject bar] whileTrue: [someObject baz] ...
Imagine that the function has inlined "baz" but the definition of "baz" changes while this thread is up a frame evaluating "bar". The definition of "foo:" hasn't changed but the definition of optimized foo has.
I guess you have to keep track of which functions inline which others and deoptimize their frames when the functions they inline change.
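That bookkeeping could look roughly like this. The `InlineDependencies` class is a hypothetical illustration of the idea, not Strongtalk's data structure, and it only marks compiled methods invalid; actually converting live frames back to interpreted ones is the VM's job.

```ruby
# Hypothetical bookkeeping for an optimizing compiler: for each method,
# remember which compiled methods inlined it.
class InlineDependencies
  def initialize
    @inlined_by = Hash.new { |h, k| h[k] = [] } # callee => [compiled callers]
    @valid = {}                                  # compiled caller => still valid?
  end

  def record_compile(caller_name, inlined_callees)
    @valid[caller_name] = true
    inlined_callees.each { |callee| @inlined_by[callee] << caller_name }
  end

  # Called when a method is redefined: every compiled method that inlined
  # it is invalidated, and its live frames would be deoptimized.
  def method_changed(callee)
    stale = @inlined_by.delete(callee) || []
    stale.each { |caller_name| @valid[caller_name] = false }
    stale
  end

  def valid?(caller_name)
    @valid.fetch(caller_name, false)
  end
end

deps = InlineDependencies.new
deps.record_compile(:foo, [:bar, :baz]) # optimized foo: inlined bar and baz
puts deps.method_changed(:baz).inspect  # [:foo]
puts deps.valid?(:foo)                  # false: foo must fall back to the interpreter
```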
> > 4. One optimization that makes sense in Ruby but not in Smalltalk > > is to > > figure out which instance variables are rarely instanciated but > > which are > > taking up a lot of memory and changing them into being stored in a > > hidden > > subobject. I assume that using a hidden variable holding a hidden > > object is > > better than trying to implement a completely general, safe and fast > > become > > (that's almost impossible, I think).
> That kind of optimization is probably better done up in Ruby code. If > you add instance variables on the fly, then the ones that never get > used will never be added anyway. Once they do, the only cost will be > an extra pointer in all instances, so it's not catastrophic.
As I was going to say in my response to Dave, I think the optimal answer for Ruby isn't an object table or extra indirection for all variables, but rather a system that recognizes that mutating classes is relatively rare:
1. the ability to mutate all instances during a stop-the-world collect and compact later on. This collect would also have to mark affected stack frames for deoptimization.
2. a stopgap where all objects have a free pointer or two in order to hold possible additions to the class until such time as a stop and collect can change the definition.
> In general, though, I think you're right. The Ruby object model is > almost exactly that of Strongtalk. The main issues that come to my > mind are:
> Singleton objects: The runtime would have to create a custom subclass > and convert the object to an instance of it.
> Continuations: currently not supported by Strongtalk, but David has > said it's doable.
> Ruby message sends can have a variable number of arguments. The > runtime would have to capture sends with an "unexpected" number of > arguments (probably using #doesNotUnderstand:) and create specialized > versions of the method.
> Probably the biggest chunk of work in implementing Ruby-on-Strongtalk > would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is > notoriously difficult to parse, so it would be a fair amount of work > to make sure all the nooks and crannies of the grammar get covered.
> Pulling this off would be a huge coup, though, as it would be way, > way faster than the existing Ruby implementation. I'd really like to > see this happen.
> Colin
The one fly in the ointment here is that there IS a project underway to give Ruby a VM, but since development is going on in Japanese I can't really be sure how sophisticated it is planned to be. So far I don't think it's showing much speedup but it could be that their plans are ambitious.
> No, Ruby is slow because it doesn't have a VM. It's an interpreter > that walks the AST, evaluating each node as it goes. The core classes > are implemented in C, so you get acceptable performance out of them. > It's easy to create Ruby bindings for C code, so people just > implement performance critical code in C and call it from Ruby.
I wonder to what extent the conversion of Ruby extensions to a Strongtalk Ruby could be facilitated by a limited C or C++ to Strongtalk or to Strongtalk VM compiler.
I've been thinking about something similar for my own non-Strongtalk Ruby project: throwing together a C compiler that supports a copying, non-conservative collector. Partly I meant it as a test for my code generator, but it seemed like a cool hack for facilitating the conversion of existing libraries and extensions.
>The one fly in the ointment here is that there IS a project underway to give > Ruby a VM, but since development is going on in Japanese I can't really be > sure how sophisticated it is planned to be. So far I don't think it's > showing much speedup but it could be that their plans are ambitious.
I wouldn't worry too much about that. There are currently something like 6 or 7 projects underway to either port Ruby to an existing VM or write a new one. Matz, the creator of Ruby, seems to be encouraging this. I'm sure none of them have the price/performance ratio that building a VM on top of Strongtalk would have, but at least two of them have pretty big budgets and are pretty invested in their current VMs, namely .NET and the JVM. YARV, the Japanese one, is being written from scratch essentially by one guy, and seems to be moving pretty slowly. Probably most interesting from a Smalltalk point of view is Rubinius, which was created by its author after reading through the Blue Book.
Anyway, getting Ruby to run on Strongtalk was a project I flagged as 'something that would be cool to do if I had time', but after looking into it a bit and hearing horror stories about actually parsing Ruby correctly and figuring out what its semantics are supposed to be in corner cases, I decided to look elsewhere for a hobby project. Don't let that discourage you, though!
> I think that, in most smalltalks, generally you can call "become" > to change > an object to any object that has the same instance variables in the > same > order - otherwise live contexts will involve code that mis-accesses > instance > variables.
> I suppose adding new instance variables on to the end of an object > is the > only case that is safe, after all.
> The problem I was thinking about was that of inlined functions that > aren't > officially part of the active context. But I guess that > deoptimization can > handle it.
> Imagine that the function has inlined "baz" but the definition of > "baz" > changes while this thread is up a frame evaluating "bar". The > definintion > of "foo:" hasn't changed but the definition of optimized foo has.
> I guess you have to keep track of which functions inline which > others and > deoptimize their frames when the functions they inline change.
When you modify methods, you do want to flush any native versions that have been compiled. But you don't have to modify existing stack frames. Consider the following ruby code:
class Alpha

  def foo
    garple = 3
    bar
    garple
  end

  def bar
    self.class.class_eval {
      def foo
        garple = 4
        bar
        garple
      end
    }
  end

end

alpha = Alpha.new
puts alpha.foo
puts alpha.foo
The first time it's called, #foo answers 3. The second time, it answers 4. It works the same way in Smalltalk.
> As I was going to say in my response to Dave, I think the optimal > answer for > Ruby isn't an object table or extra indirection for all variables, but > rather a system that recognizes that mutating classes is relatively > rare:
> 1. the ability to mutate all instances during a stop-the-world > collect and > compact later on. This collect would also have to mark effected stack > frames for deoptimization.
Yeah, you'd want mutation of the instances to be uninterruptible, but it needn't be tied to garbage collection. Some Smalltalks have a primitive to do mass becomes, but even if Strongtalk doesn't, I'm sure you could do #valueUninterruptibly, or something similar.
Again, I don't see why you'd have to deoptimize stacks. Ruby doesn't have a way of specifying the internal layout of an object, so you could always add instvars to the end of the object. This would mean that existing stacks needn't be deoptimized.
> 2. a stopgap where all objects have a free pointer or two in order > to hold > possible additions to the class until such time as a stop and > collect can > change the definition.
I don't see why this has to be so complicated. Why not just do something like this:
1. A method is compiled that references an instance variable not yet present in the class.
2. The compiler notices this, and does a migration:
a. It creates a new class with the new variable after the existing variables
b. It enumerates the existing instances, creating instances of the new class
c. All the old instances are converted to the new instances in a mass become
3. The new method is installed in the new class
It's actually pretty easy.
> -----Original Message----- > From: strongtalk-general@googlegroups.com > [mailto:strongtalk-general@googlegroups.com]On Behalf Of Colin Putney > Sent: Sunday, October 29, 2006 8:37 AM > To: strongtalk-general@googlegroups.com > Subject: Re: Future of StrongTalk
> On Oct 28, 2006, at 3:48 PM, Cafe Alpha wrote:
> > I think that, in most smalltalks, generally you can call "become" > > to change > > an object to any object that has the same instance variables in the > > same > > order - otherwise live contexts will involve code that mis-accesses > > instance > > variables.
> > I suppose adding new instance variables on to the end of an object > > is the > > only case that is safe, after all.
> Right... luckily that's what we'd be doing
> > The problem I was thinking about was that of inlined functions that > > aren't > > officially part of the active context. But I guess that > > deoptimization can > > handle it.
> > Imagine that the function has inlined "baz" but the definition of > > "baz" > > changes while this thread is up a frame evaluating "bar". The > > definintion > > of "foo:" hasn't changed but the definition of optimized foo has.
> > I guess you have to keep track of which functions inline which > > others and > > deoptimize their frames when the functions they inline change.
> When you modify methods, you do want to flush any native versions > that have been compiled. But you don't have to modify existing stack > frames. Consider the following ruby code:
> class Alpha
> def foo > garple = 3 > bar > garple > end
> def bar > self.class.class_eval { > def foo > garple = 4 > bar > garple > end > } > end
> The first time it's called, #foo answers 3. The second time, it > answers 4. It works the same way in Smalltalk.
> > As I was going to say in my response to Dave, I think the optimal > > answer for > > Ruby isn't an object table or extra indirection for all variables, but > > rather a system that recognizes that mutating classes is relatively > > rare:
> > 1. the ability to mutate all instances during a stop-the-world > > collect and > > compact later on. This collect would also have to mark effected stack > > frames for deoptimization.
> Yeah, you'd want mutation of the instances to be uninterruptible, but > it needn't be tied to garbage collection. Some Smalltalks have a > primitive to do mass becomes, but even if Strongtalk doesn't, I'm > sure you could do #valueUninterruptibly, or something similar.
> Again, I don't see why you'd have to deoptimize stacks. Ruby doesn't > have a way of specifying the internal layout of an object, so you > could always add instvars to the end of of the object. This would > mean that existing stacks needn't be deoptimized.
> > 2. a stopgap where all objects have a free pointer or two in order > > to hold > > possible additions to the class until such time as a stop and > > collect can > > change the definition.
> I don't see why this has to be so complicated. Why not just do > something like this:
> 1. A method is compiled that references an instance variable not yet > present in the class.
> 2. The compiler notices this, and does a migration:
> a. It creates a new class with the new variable after the existing > variables > b. It enumerates the existing instances, creating instances of the > new class > c. All the old instances are converted to the new instances in a > mass become
> 3. The new method is installed in the new class
> It's actually pretty easy.
Unless the class has subclasses, in which case the added instance vars aren't at the end for the subclasses.
-Dave
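The subclass problem is easy to see with a toy layout rule, assuming (as is conventional, but an assumption here) that a class's slots are its superclass's slots followed by its own. The `Layout` class is hypothetical.

```ruby
# Hypothetical layout rule: a class's slots are its superclass's slots
# followed by its own.
class Layout
  attr_reader :own

  def initialize(superlayout, own)
    @superlayout = superlayout
    @own = own.dup
  end

  def slots
    (@superlayout ? @superlayout.slots : []) + @own
  end

  def offset(name)
    slots.index(name)
  end
end

base = Layout.new(nil, [:a, :b])
sub  = Layout.new(base, [:c])

before = sub.offset(:c)
base.own << :d # add an instance variable to the superclass
after = sub.offset(:c)

puts "c moved from slot #{before} to slot #{after}" # c moved from slot 2 to slot 3
```

Compiled code for the subclass that reads `c` at its old offset would now read `d`, so appending at the end is only safe for classes without subclasses.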
> Probably the biggest chunk of work in implementing Ruby-on-Strongtalk > would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is > notoriously difficult to parse, so it would be a fair amount of work > to make sure all the nooks and crannies of the grammar get covered.
However, this work could (and should) be done last, after a full proof of concept had been completed without it. There are at least two independent parsers of Ruby's syntax (JRuby and the standard C Ruby), either of which could be used to preprocess Ruby source into a form that could be much more easily handled (say s-expr or an XML serialization of a parse tree). It would then be easy enough to produce Smalltalk source code from this that could be fed to any Smalltalk compiler, like Strongtalk's. It would certainly be awkward, but it would be enough to make for a killer demo and allow serious benchmarking, which I would think should produce enough momentum to get people interested in building a new parser.
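As a sketch of that pipeline in today's terms: current MRI Ruby ships a `Ripper` library whose `Ripper.sexp` returns the parse tree as nested arrays (it postdates this thread, so treat it as a stand-in for the parsers named above). The translator below is a toy that handles only integer literals and binary operators; `to_smalltalk` is an invented helper name.

```ruby
require "ripper"

# Translate a tiny subset of Ripper's s-expressions (integer literals and
# binary operators) into Smalltalk source text. Everything else is rejected.
def to_smalltalk(node)
  case node.first
  when :program
    node[1].map { |stmt| to_smalltalk(stmt) }.join(". ")
  when :binary
    _, left, op, right = node
    # Smalltalk evaluates binary messages strictly left to right, so
    # parenthesize to preserve Ruby's operator precedence.
    "(#{to_smalltalk(left)} #{op} #{to_smalltalk(right)})"
  when :@int
    node[1]
  else
    raise NotImplementedError, "unsupported node: #{node.first}"
  end
end

sexp = Ripper.sexp("1 + 2 * 3")
puts to_smalltalk(sexp)
```

The explicit parentheses matter: without them a Smalltalk compiler would evaluate `1 + 2 * 3` as `(1 + 2) * 3`.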
> Pulling this off would be a huge coup, though, as it would be way, way faster than the existing Ruby implementation. I'd really like to see this happen.
Me too. I've been advocating and tinkering with Ruby-on-Smalltalk for at least two years now (sadly, far more of the former than the latter), but having a high-performance liberally licensed VM available makes the story much more compelling.
As long as I can get useful help from this list, it should take me much less time to adapt the StrongTalk VM than to write my own... I got over my "not invented here" attitude about a day ago, when I realized that I could have something usable a year sooner this way, and that any optimization I wanted that StrongTalk doesn't do, I could probably implement at least as quickly working with StrongTalk as working on my own. And then my code would be useful to other projects as well, if it gets accepted.
I tend to change directions quickly and often, but my current inclination is to devote my spare time in the next month to trying to understand and gut Ruby 1.8.5's yacc and lex files to make an adapted parser. Then I don't have to worry about duplicating the context-dependent grammar; I'll use the original.
I like the idea of parsing to s-expressions as well; that has been my plan for a while. I intended my Ruby to always convert to s-expressions as the first step in parsing, and to extend the language so that the s-expression form of all code is always available in order to facilitate metaprogramming.
----- Original Message ----- From: "Avi Bryant" <avi.bry...@gmail.com> To: <strongtalk-general@googlegroups.com> Sent: Sunday, October 29, 2006 8:06 PM Subject: Re: Future of StrongTalk
> On Oct 28, 2006, at 1:58 PM, Colin Putney wrote:
> Me too. I've been advocating and tinkering with Ruby-on-Smalltalk for at least two years now (sadly, far more of the former than the latter), but having a high-performance liberally licensed VM available makes the story much more compelling.
> > The problem I was thinking about was that of inlined functions that aren't officially part of the active context. But I guess that deoptimization can handle it.
> > Imagine that the function has inlined "baz" but the definition of "baz" changes while this thread is up a frame evaluating "bar". The definition of "foo:" hasn't changed but the definition of optimized foo has.
> > I guess you have to keep track of which functions inline which others and deoptimize their frames when the functions they inline change.
> When you modify methods, you do want to flush any native versions that have been compiled. But you don't have to modify existing stack frames. Consider the following Ruby code:
> class Alpha
>   def foo
>     garple = 3
>     bar
>     garple
>   end
>
>   def bar
>     self.class.class_eval {
>       def foo
>         garple = 4
>         bar
>         garple
>       end
>     }
>   end
> end
> The first time it's called, #foo answers 3. The second time, it answers 4. It works the same way in Smalltalk.
Sure, the behavior you want is simple, but the problem is that getting optimized, inlined code to give you that simple behavior is very hard. It's a kind of multitasking problem, and anyone who's tried to write multiprocessor code knows that you have to consider every combination of states.
In fact, I have my doubts that Strongtalk can preserve the semantics you're asking for if it does global optimization and scheduling beyond simple inlining.
Imagine that you're inside of a loop that has an inlined function. The unoptimized loop would have called "baz", but now it's just a loop that intertwines "baz" code with whatever other code runs in that loop; maybe it has even unrolled the loop by 4 and intertwined 4 instances of "baz". In that case you're basically screwed if you have to simulate changing "baz" on an index that isn't a multiple of 4.
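To make the worry concrete, here is a toy sketch. It assumes a made-up "optimizer" that re-checks the definition of baz only once per unrolled block of 4; `OLD_BAZ`, `NEW_BAZ`, `run_dynamic` and `run_unrolled` are hypothetical names, and nothing here reflects what the Strongtalk compiler actually does.

```ruby
OLD_BAZ = ->(i) { i }
NEW_BAZ = ->(i) { i * 100 }

# Unoptimized loop: baz is looked up on every iteration, so a
# redefinition takes effect on the very next call.
def run_dynamic(n)
  methods = { baz: OLD_BAZ }
  out = []
  n.times do |i|
    out << methods[:baz].call(i)
    methods[:baz] = NEW_BAZ if i == 1   # the loop body redefines baz
  end
  out
end

# "Optimized" loop: baz is inlined and the loop unrolled by 4, with the
# inlined body revalidated only at block boundaries (n must be a multiple of 4).
def run_unrolled(n)
  methods = { baz: OLD_BAZ }
  out = []
  (0...n).step(4) do |base|
    inlined = methods[:baz]             # checked once per block of 4
    4.times do |k|
      i = base + k
      out << inlined.call(i)
      methods[:baz] = NEW_BAZ if i == 1
    end
  end
  out
end

puts run_dynamic(8).inspect
puts run_unrolled(8).inspect
```

The two runs disagree at indexes 2 and 3: the redefinition lands in the middle of an unrolled block, which is exactly the case where the optimized code would have to be unwound to a state that never existed in it.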
You'd get similar problems if you inlined up more than one level. And I haven't even taken the time to think about what happens to code and data that were folded, partially executed in compilation and optimized out of existence if you have to change some level in the inlined code...
If the code was changed by another thread, then you can just pretend that the other thread waited until a safer time, but if code in one thread modifies itself at just the wrong point, then you can't use the trick of slipping time between threads, so the compiler has to be completely correct in detecting that possibility and preventing the wrong optimization. I hope that's possible.
Even if you don't have that sort of aggressive optimization, mere inlining makes the situation awfully complicated. You have to take a stack frame for a single function (that has inlines inside of it) and turn it into nested stack frames for each of the functions inlined at the current program counter position.
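A sketch of that frame expansion, assuming the optimizing compiler leaves behind a table that maps each machine-code position to the chain of source-level activations inlined there. The table format, the `SCOPES` constant and the struct names are all hypothetical; the Strongtalk VM's real debug information is not described in this thread.

```ruby
PhysicalFrame = Struct.new(:compiled_method, :pc, :registers)
VirtualFrame  = Struct.new(:method_name, :bytecode_index, :locals)

# Hypothetical debug info: for each pc in an optimized method, the chain
# of source-level activations (outermost first) and which register holds
# each of their locals.
SCOPES = {
  "foo:(opt)" => {
    0x40 => [
      { method: "foo:", bci: 7, locals: { "x" => :r1 } },
      { method: "bar",  bci: 3, locals: { "tmp" => :r2 } },
      { method: "baz",  bci: 1, locals: {} }
    ]
  }
}.freeze

# Turn one optimized frame into the nested unoptimized frames it stands for.
def deoptimize(frame, scopes)
  chain = scopes.fetch(frame.compiled_method).fetch(frame.pc)
  chain.map do |scope|
    locals = scope[:locals].transform_values { |reg| frame.registers.fetch(reg) }
    VirtualFrame.new(scope[:method], scope[:bci], locals)
  end
end

frame = PhysicalFrame.new("foo:(opt)", 0x40, { r1: 10, r2: 20 })
frames = deoptimize(frame, SCOPES)
frames.each do |f|
  puts "#{f.method_name} at bytecode #{f.bytecode_index}, locals #{f.locals.inspect}"
end
```

The hard part that the sketch skips is producing that table: every value the optimizer folded away or kept only in a register has to remain recoverable at each point where deoptimization can happen.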
And that's on top of the complications that every Smalltalk has, like the need to keep obsolete code around until all of the callers have counted out.
No doubt Mr. Griswold is infinitely more familiar with these problems than I am and can correct my misconceptions.
> > As I was going to say in my response to Dave, I think the optimal answer for Ruby isn't an object table or extra indirection for all variables, but rather a system that recognizes that mutating classes is relatively rare:
> > 1. the ability to mutate all instances during a stop-the-world collect and compact later on. This collect would also have to mark affected stack frames for deoptimization.
> Yeah, you'd want mutation of the instances to be uninterruptible, but it needn't be tied to garbage collection. Some Smalltalks have a primitive to do mass becomes, but even if Strongtalk doesn't, I'm sure you could do #valueUninterruptibly, or something similar.
> Again, I don't see why you'd have to deoptimize stacks. Ruby doesn't have a way of specifying the internal layout of an object, so you could always add instvars to the end of the object. This would mean that existing stacks needn't be deoptimized.
> > 2. a stopgap where all objects have a free pointer or two in order to hold possible additions to the class until such time as a stop and collect can change the definition.
> I don't see why this has to be so complicated. Why not just do something like this:
> 1. A method is compiled that references an instance variable not yet present in the class.
> 2. The compiler notices this, and does a migration:
> a. It creates a new class with the new variable after the existing variables
> b. It enumerates the existing instances, creating instances of the new class
> c. All the old instances are converted to the new instances in a mass become
> 3. The new method is installed in the new class
> It's actually pretty easy.
> Colin
Well, in a Smalltalk without indirection through an object table, every become requires the equivalent of a full collect, because you have to find every single reference to the object and update it.
I suppose there's a trick where you change the object in place to hold some sort of forwarding object, but you'll be stuck with forwarding objects until you do a full collect that replaces them all. And forwarding objects are going to gunk up the type feedback and optimization until they're gone.
And forwarding objects will force a deoptimization of active frames because the forwarding object will not be the type already optimized for.
And as David pointed out, subclasses of changed objects will have to force deoptimization even if we do a full collect and don't bother with forwarding.
But my idea of having extra pointers waiting to hold extra slots would allow us to postpone deoptimization and the full sweep. Postponing is a good thing because it lets you wait until you've aggregated enough work to be worth your while, and it prevents thrashing on worst cases.
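A minimal sketch of the spare-pointer idea, assuming each object carries one normally empty pointer that can hold an overflow table for variables added after the object was allocated. `ClassDef`, `Obj` and `migrate!` are hypothetical names, and the overflow is modeled as a Ruby hash purely for illustration.

```ruby
class ClassDef
  attr_reader :names

  def initialize(names)
    @names = names.dup
  end

  def add_var(name)
    @names << name
  end
end

class Obj
  def initialize(klass)
    @klass = klass
    @slots = Array.new(klass.names.size)  # sized for the layout at allocation time
    @spare = nil                          # the free pointer, unused until needed
  end

  def get(name)
    i = index(name)
    i < @slots.size ? @slots[i] : (@spare && @spare[name])
  end

  def set(name, value)
    i = index(name)
    if i < @slots.size
      @slots[i] = value
    else
      (@spare ||= {})[name] = value       # variable added since allocation
    end
  end

  # What a later stop-and-collect pass would do: grow the object to the
  # current layout and drain the overflow.
  def migrate!
    old_size = @slots.size
    @klass.names.each_with_index do |name, i|
      next if i < old_size
      @slots[i] = @spare && @spare[name]
    end
    @spare = nil
  end

  def overflowing?
    !@spare.nil?
  end

  private

  def index(name)
    @klass.names.index(name) or raise KeyError, "no such variable: #{name}"
  end
end

point = ClassDef.new([:x, :y])
p1 = Obj.new(point)
p1.set(:x, 1)
point.add_var(:z)        # class changes while p1 is live
p1.set(:z, 3)            # stored through the spare pointer
puts p1.overflowing?
p1.migrate!              # deferred until enough work has piled up
puts p1.get(:z)
```

Reads and writes of the new variable pay for an extra indirection until migration, which is the cost being traded for not having to sweep the heap immediately.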
Good luck. I would focus before anything else on doing a proof of concept that can run some reasonable benchmarks, because with those you can get the attention of other people out there to help out, and maybe get some of the other people working on a Ruby VM to make the switch (although from our experience trying to attract other Smalltalk VM people, that might be harder than you might expect). -Dave
> -----Original Message----- > From: strongtalk-general@googlegroups.com > [mailto:strongtalk-general@googlegroups.com]On Behalf Of Cafe Alpha > Sent: Monday, October 30, 2006 8:01 AM > To: strongtalk-general@googlegroups.com > Subject: Re: Future of StrongTalk
> I'm going to try to do it.
> > Avi
> Cool.
> Josh Scholar
Hi Josh,
Been following this thread, and I'm pleased that you have stepped up to the plate. There was recently a symposium called Lang.NET. One of the presentations was by John Gough from Queensland University. Apparently he has already written a mostly working Ruby compiler for the CLR. The back end, as I understand it, generates C#. He has also created some yacc- and lex-like tools specifically for the job, all open source.
John has a presentation here. I like Avi's suggestion of compiling to s-expressions also. One of the things on my wish list is better interoperability between OO languages. Using s-expressions as a lingua franca just sounds right.
BTW: Has anyone thought about approaching the universities? This kind of stuff sounds like it's made for a PhD thesis. Surely there are professors and post grads out there eager and able to help?
The only problem I had with Smalltalk before was that it wasn't truly multithreaded - and I think that's the only thing that Java really has going for it (aside from a backer who is a champion at shameless self-promotion).
Is Strongtalk going to be fully multithreaded?
On 10/27/06, David Griswold <David.Grisw...@acm.org> wrote:
> And a bigger point is that, although I don't know for sure, I doubt that Sun will release the Java VM under a totally open-source license like BSD. It will probably be under a more proprietary license like the one they use for other Java open-source stuff, which I don't think is nearly as nice as the Strongtalk license.
> -Dave
> John has a presentation here. I like Avi's suggestion of compiling to s-expressions also. One of the things on my wish list is better interoperability between OO languages. Using s-expressions as a lingua franca just sounds right.
Well, it's Lisp. Perhaps a younger person than me would have picked a more structured intermediate language built on XML. XML is the new s-expression.
> The only problem I had with Smalltalk before was that it wasn't truly multithreaded - and I think that's the only thing that Java really has going for it (aside from a backer who is a champion at shameless self-promotion).
> Is Strongtalk going to be fully multithreaded?
Strongtalk's Process class is already mapped to native threads, yes, as discussed in the conversation about continuations. The idea is that eventually it will support M:N style multiprocessing to support lighter-weight concurrency and control-flow changes.
[mailto:strongtalk-general@googlegroups.com]On Behalf Of John Kwon Sent: Monday, October 30, 2006 4:33 PM To: strongtalk-general@googlegroups.com Subject: Re: Future of StrongTalk
The only problem I had with Smalltalk before was that it wasn't truly multithreaded - and I think that's the only thing that Java really has going for it (aside from a backer who is a champion at shameless self-promotion).
Is Strongtalk going to be fully multithreaded?
Smalltalk semantics traditionally haven't included that, and it would break a lot of existing Smalltalk code, but given that the world is going multi-core, it is becoming much more important to be fully multithreaded, so I think it is definitely something that should be on the to-do-list (and it is already, if you look at the issues database).
At least all the Smalltalk code I wrote in the core libraries was designed with multi-threading in mind, so there are monitors or critical regions around important shared globals, class variables etc, so some of the work at the Smalltalk level is done, although you never really know if it will deadlock until you turn multithreading on. In fact the frequent hanging problem we had in the initial release was because of a deadlock caused by a critical region I had put around the event 'grab' stack, which I solved for now just by removing the semaphore, since it isn't needed yet anyway. So these things are not easy.
But most of the work needed is in the VM. We already use native threads, so that part is done, but only one thread is allowed to run in Smalltalk at a time right now, because the VM itself isn't internally thread-safe. It is a big job to make a VM multi-threaded (and the hardest part is testing, which becomes non-deterministic once you go multi-threaded), and like all the other big items on the list, it all depends on whether we can get smart people to step up to the plate and work on it. It isn't going to happen by itself.
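As an illustration only: the arrangement described above (native threads, but only one of them running Smalltalk code at a time) behaves like threads sharing a single global lock. The sketch below models that with a plain Ruby `Mutex`; `VM_LOCK` and the counters are hypothetical, and the thread does not say how the Strongtalk VM actually enforces the restriction.

```ruby
VM_LOCK = Mutex.new   # stands in for "permission to run interpreted code"

running = 0       # threads currently inside the VM
max_running = 0   # highest value of running ever observed
counter = 0       # shared state that is only safe because of the lock

threads = 4.times.map do
  Thread.new do
    100.times do
      VM_LOCK.synchronize do
        running += 1
        max_running = [max_running, running].max
        counter += 1          # unsynchronized VM internals would be touched here
        running -= 1
      end
    end
  end
end
threads.each(&:join)

puts "max threads inside the VM at once: #{max_running}"
puts "counter: #{counter}"
```

Making the VM internally thread-safe amounts to removing that single lock and protecting each shared structure individually, which is where the non-deterministic testing burden comes from.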
I am a bit of an expert in multiprocessing algorithms, non-blocking algorithms, etc., so I may be able to help in multithreading the VM. My current business is not programming related, but at my last programming job I invented a few algorithms for a multi-processor database.
If you pick the right algorithms and principles then there doesn't have to be any problem.
I started to write a list of useful multiprocessor algorithms, principles and common bugs, but that needs an essay. I don't have time to write an essay right this moment.
Also, I don't have a multicore machine, so I can't usefully test that multitasking code at the moment. I miss having 8-core machines that I could use to beat the hell out of the code.
Depending on the idioms used, it could be a bit scary that the code wasn't designed with multiple processors in mind in the first place.
>>> I suppose adding new instance variables on to the end of an object is the only case that is safe, after all.
>> Right... luckily that's what we'd be doing
> As David pointed out, that's not always what we'll be doing.
A simple way around this would be to compile accessor methods for all instance variables and have the Ruby compiler generate accessor calls rather than direct variable accesses. Then when you add a variable, you can put it anywhere you like; you just have to make sure to update all the accessor methods. With inlining, it wouldn't even be much of a performance hit.
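A small sketch of the accessor approach, assuming objects are plain slot arrays and that all variable access goes through per-class accessors that are regenerated whenever the layout changes. `ToyClass` and its methods are hypothetical names for illustration, not part of any Ruby or Strongtalk API.

```ruby
# Toy class model: callers never see slot indexes, only accessor calls.
class ToyClass
  def initialize(names)
    @names = names.dup
    regenerate_accessors
  end

  def new_instance
    Array.new(@names.size)
  end

  def read(obj, name)
    @accessors.fetch(name)[:get].call(obj)
  end

  def write(obj, name, value)
    @accessors.fetch(name)[:set].call(obj, value)
  end

  # Insert a variable at any position, re-lay-out the given instances,
  # and recompile every accessor against the new offsets.
  def insert_var(name, at, instances)
    @names.insert(at, name)
    instances.each { |obj| obj.insert(at, nil) }
    regenerate_accessors
  end

  private

  def regenerate_accessors
    @accessors = {}
    @names.each_with_index do |name, i|
      @accessors[name] = {
        get: ->(obj) { obj[i] },
        set: ->(obj, v) { obj[i] = v }
      }
    end
  end
end

klass = ToyClass.new([:x, :y])
obj = klass.new_instance
klass.write(obj, :x, 1)
klass.write(obj, :y, 2)
klass.insert_var(:w, 1, [obj])   # new variable lands in the middle
klass.write(obj, :w, 9)
puts [klass.read(obj, :x), klass.read(obj, :w), klass.read(obj, :y)].inspect
```

Existing instances still have to be re-laid-out, but no caller holds a raw offset, so only the accessors (and any code that inlined them) need to be invalidated.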
[snip a bunch of stuff, here and at the end]
> Imagine that you're inside of a loop that has an inlined function. The unoptimized loop would have called "baz", but now it's just a loop that intertwines "baz" code with whatever other code runs in that loop; maybe it has even unrolled the loop by 4 and intertwined 4 instances of "baz". In that case you're basically screwed if you have to simulate changing "baz" on an index that isn't a multiple of 4.
Ok, I think I see where you're coming from. If you've got code that calls a method inside a loop, and that method is modified, you've now inlined the wrong implementation and you've got to deoptimize the stack with the partially-executed loop intact so that subsequent invocations of the changed method will work correctly.
Fair enough, but I still don't see the need for explicit support for Ruby in the Strongtalk VM. My logic goes like this:
1. Ruby and Smalltalk both allow methods to be modified at arbitrary times during execution.
2. The Strongtalk VM supports this for Smalltalk code today. (As far as I know...)
3. Ruby methods and Smalltalk methods would be indistinguishable at the bytecode level.
4. Therefore dynamically modified Ruby methods should work just as well as Smalltalk methods.
My assumptions might be wrong, and there might be hidden gotchas, but I do think this is pretty straightforward. My inclination would be to implement the Ruby compiler and runtime support entirely in Smalltalk, and focus on getting the semantics right, even at the expense of speed. That would still produce a Ruby implementation faster than the existing one, and it could then be tuned further if need be.