RE: Future of StrongTalk

32 views
Skip to first unread message

David Griswold

unread,
Oct 27, 2006, 4:41:38 PM10/27/06
to Carlo Dapor, strongtal...@googlegroups.com

> -----Original Message-----
> From: Carlo Dapor [mailto:cat...@gmail.com]
> Sent: Friday, October 27, 2006 4:36 PM
> To: David Griswold
> Subject: Future of StrongTalk
>
>
> David
>
>
> What does it mean for StrongTalk, once Java is open sourced.
> After all, the hotspot is based on it, with a lot more to it.
>
> Will StrongTalk still continue, will it take inherit all the
> intellectual properties from hotspot ?

The HotSpot VM is for Java, not Smalltalk, and the languages and
implementations are very different. The Java VM has static implementation
type information available, and it did not use the type-feedback
capabilities of the Strongtalk VM (the last I heard), which was a decision I
was against. So while it also does extensive inlining, it uses different
algorithms that inline things in different ways, and while it can do better
inlining for some things than Strongtalk, it does worse inlining for other
things, so the tradeoffs are not very clear (and I don't think anyone has
ever done a detailed comparison of the impact of the different inlining
strategies).

Nor does the Java VM use tagging, which hurts Java a lot as a language,
since it makes it really painful and expensive to treat basic types as
objects. So I don't think the Java VM will ever be a better VM to host true
dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use
tagging and need type-feedback.

Another big disadvantage of the Java VM is that because it has to count on
the type system for dynamic safety, rather than using the flexibility of
type-feedback, it has an enormous amount of complexity in things like the
bytecode verifier, which must prove that the bytecode type information is
valid. And it has all those different kinds of basic types in addition to
objects. So it is more complex than Strongtalk.

The one big engineering advantage of the HotSpot VM is that it is internally
multi-threaded, which means it can take better advantage (right now) of
multi-core and native preemptive threading. But hopefully that engineering
will eventually be done in Strongtalk too.

And a bigger point is that, although I don't know for sure, I doubt that Sun
will release the Java VM under a totally open-source license like BSD. It
will probably be under a more proprietary license like the one they use for
other Java open-source stuff, which I don't think is nearly as nice as the
Strongtalk license.
-Dave


Cafe Alpha

unread,
Oct 28, 2006, 3:07:47 PM10/28/06
to strongtal...@googlegroups.com
> Nor does the Java VM use tagging, which hurts Java a lot as a language,
> since it makes it really painful and expensive to treat basic types as
> objects. So I don't think the Java VM will ever be a better VM to host
true
> dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use
> tagging and need type-feedback.

I'm wondering how much needs to be added (if anything) to host Ruby. I see
the problems as:

Ruby is more dynamic than Smalltalk. All instance variables are currently
implemented as slots, ie you can add new instance variable to an object in a
running program, and you need to be able to add mixins to a class and or
change methods and expect the change to effect all of the live objects.

So:

1 Adding new instance variables isn't exactly the same as "become" because
it has to be able to handle live stack frames referencing objects whoes
definition changed, whereas in Smalltalk you just expect the system to crash
in that case.

2. Changing the definition of methods can invalidate inlining optimizations
in live frames: deoptimization plus reoptimization anyone?

3. Even adding methods to unrelated classes can change optimizations based
on assuming that only a single class (or a limited number of them) uses a
given selector.

4. One optimization that makes sense in Ruby but not in Smalltalk is to
figure out which instance variables are rarely instanciated but which are
taking up a lot of memory and changing them into being stored in a hidden
subobject. I assume that using a hidden variable holding a hidden object is
better than trying to implement a completely general, safe and fast become
(that's almost impossible, I think).

Josh Scholar

Josh Scholar

unread,
Oct 28, 2006, 3:12:02 PM10/28/06
to Strongtalk-general
Another point is that since Ruby supports loading a class definition in
independent parts, there's no way to fake your way past the need to be
able to mutate class definitions.

By the way, the exciting thing about Ruby is that there is a large
community of people using it.

David Griswold

unread,
Oct 28, 2006, 4:30:27 PM10/28/06
to strongtal...@googlegroups.com

Josh wrote:
> > Nor does the Java VM use tagging, which hurts Java a lot as a language,
> > since it makes it really painful and expensive to treat basic types as
> > objects. So I don't think the Java VM will ever be a better VM to host
> true
> > dynamic languages like Smalltalk (JavaScript, Ruby, etc), since they use
> > tagging and need type-feedback.
>
> I'm wondering how much needs to be added (if anything) to host
> Ruby. I see
> the problems as:
>
> Ruby is more dynamic than Smalltalk. All instance variables are currently
> implemented as slots, ie you can add new instance variable to an
> object in a
> running program, and you need to be able to add mixins to a class and or
> change methods and expect the change to effect all of the live objects.
>
> So:
>
> 1 Adding new instance variables isn't exactly the same as "become" because
> it has to be able to handle live stack frames referencing objects whoes
> definition changed, whereas in Smalltalk you just expect the
> system to crash
> in that case.

From your description I don't understand how that is different than become.
In Smalltalk live stack frames can reference objects that change via become,
and it should work just fine if become is implemented properly. If adding
instance variables has to be *fast*, however, that is another issue. It
would require an object table (or a forwarding wrapper for every object),
and to make the class modification itself fast would require changes that
would slow down all instance var accesses, unless you discriminated against
the new instance variables performance-wise. That is probably a big reason
why Ruby is so slow.

> 2. Changing the definition of methods can invalidate inlining
> optimizations
> in live frames: deoptimization plus reoptimization anyone?

That machinery is already there; deoptimizing live frames is done all the
time in Strongtalk. They don't have to be immediately reoptimized, you can
do what is necessary with the interpreted frames after deoptimization, and
if they are used frequently thereafter, they will get reoptimized by the
normal mechanism.

> 3. Even adding methods to unrelated classes can change optimizations based
> on assuming that only a single class (or a limited number of them) uses a
> given selector.

Once again, deoptimization solves this problem. This exact issue exists in
the JVM, and once you have deoptimization it is trivial. We don't inline in
Strongtalk right now based on the number of method implementations, because
type-feedback can already inline those methods anyway (because if there is
only one implementation, then any send of that message is by definition
going to be monomorphic, which is what Strongtalk looks at).

> 4. One optimization that makes sense in Ruby but not in Smalltalk is to
> figure out which instance variables are rarely instanciated but which are
> taking up a lot of memory and changing them into being stored in a hidden
> subobject. I assume that using a hidden variable holding a
> hidden object is
> better than trying to implement a completely general, safe and fast become
> (that's almost impossible, I think).

Yes, adding a slower mechanism for handling lazily-added instance variables
as an uncommon case would probably be the way to do it.

Ruby would probably be a great fit for a modified Strongtalk VM, but Ruby
apparently has other problems like an ill-defined grammar that would have to
be dealt with first. But if someone wanted to build a StrongRuby, I would
encourage them all I could.

-Dave


Colin Putney

unread,
Oct 28, 2006, 4:58:27 PM10/28/06
to strongtal...@googlegroups.com

On Oct 28, 2006, at 12:07 PM, Cafe Alpha wrote:

> I'm wondering how much needs to be added (if anything) to host
> Ruby. I see
> the problems as:
>
> Ruby is more dynamic than Smalltalk. All instance variables are
> currently
> implemented as slots, ie you can add new instance variable to an
> object in a
> running program, and you need to be able to add mixins to a class
> and or
> change methods and expect the change to effect all of the live
> objects.

No, Smalltalk does all this stuff dynamically as well. Smalltalk
typically doesn't have mixins, but it's not hard to implemented them,
and Strongtalk has explicit support for them. In fact, Ruby's object
model is almost exactly that of Strongtalk. Only the the syntax is
different.

> So:
>
> 1 Adding new instance variables isn't exactly the same as "become"
> because
> it has to be able to handle live stack frames referencing objects
> whoes
> definition changed, whereas in Smalltalk you just expect the system
> to crash
> in that case.

Not so. In Smalltalk, when you add an instance variable to a class a
new version of the class is built and all instances of the old class
are converted to instances of the new class using become. It doesn't
crash, live stack frames continue to function correctly.

> 2. Changing the definition of methods can invalidate inlining
> optimizations
> in live frames: deoptimization plus reoptimization anyone?

In Smalltalk, methods used by active contexts aren't converted to the
new version of the method, but all subsequent invocations of the
method use the new version. If it's the same in Ruby, (and if it's
not, I'd like to know how the contexts are migrated) then the
Strongtalk VM should handle this transparently.

> 3. Even adding methods to unrelated classes can change
> optimizations based
> on assuming that only a single class (or a limited number of them)
> uses a
> given selector.

Again, this is normal and expected in Smalltalk, and the Strongtalk
VM is designed to handle it correctly.

> 4. One optimization that makes sense in Ruby but not in Smalltalk
> is to
> figure out which instance variables are rarely instanciated but
> which are
> taking up a lot of memory and changing them into being stored in a
> hidden
> subobject. I assume that using a hidden variable holding a hidden
> object is
> better than trying to implement a completely general, safe and fast
> become
> (that's almost impossible, I think).

That kind of optimization is probably better done up in Ruby code. If
you add instance variables on the fly, then the ones that never get
used will never be added anyway. Once they do, the only cost will be
an extra pointer in all instances, so it's not catastrophic.

In general, though, I think you're right. The Ruby object model is
almost exactly that of Strongtalk. The main issues that come to my
mind are:

Singleton objects: The runtime would have to create a custom subclass
and convert the object to an instance of it.

Continuations: currently not supported by Strongtalk, but David has
said it's doable.

Ruby message sends can have a variable number of arguments. The
runtime would have to capture sends with an "unexpected" number of
arguments (probably using #doesNotUnderstand:) and create specialized
versions of the method.

Probably the biggest chunk of work in implementing Ruby-on-Strongtalk
would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is
notoriously difficult to parse, so it would be a fair amount of work
to make sure all the nooks and crannies of the grammar get covered.

Pulling this off would be a huge coup, though, as it would be way,
way faster than the existing Ruby implementation. I'd really like to
see this happen.

Colin

Colin Putney

unread,
Oct 28, 2006, 5:13:48 PM10/28/06
to strongtal...@googlegroups.com

On Oct 28, 2006, at 1:30 PM, David Griswold wrote:
> In Smalltalk live stack frames can reference objects that change
> via become,
> and it should work just fine if become is implemented properly. If
> adding
> instance variables has to be *fast*, however, that is another
> issue. It
> would require an object table (or a forwarding wrapper for every
> object),
> and to make the class modification itself fast would require
> changes that
> would slow down all instance var accesses, unless you discriminated
> against
> the new instance variables performance-wise. That is probably a
> big reason
> why Ruby is so slow.

No, Ruby is slow because it doesn't have a VM. It's an interpreter
that walks the AST, evaluating each node as it goes. The core classes
are implemented in C, so you get acceptable performance out of them.
It's easy to create Ruby bindings for C code, so people just
implement performance critical code in C and call it from Ruby.

I claim adding instance variables doesn't have to be fast. It's going
to happen at most a handful of times per class. Migrating all the
instances won't take very long - the pathological case of adding lots
of instance variables after there are lots of instances is going to
be extremely rare. The common case will be that all the methods will
be added before any instances are created, and in that case no object
migration will be needed.

Colin

Alejandro F. Reimondo

unread,
Oct 28, 2006, 5:26:27 PM10/28/06
to strongtal...@googlegroups.com

> > 1 Adding new instance variables isn't exactly the same as "become"
> > because
> > it has to be able to handle live stack frames referencing objects
> > whoes
> > definition changed, whereas in Smalltalk you just expect the system
> > to crash
> > in that case.
>
> Not so. In Smalltalk, when you add an instance variable to a class a
> new version of the class is built and all instances of the old class
> are converted to instances of the new class using become. It doesn't
> crash, live stack frames continue to function correctly.
>

All instances mutation can be defferred
and do not need to be mutated inmmediatelly.
In Visual Smalltalk implementation objects can change
their method dictionaries to have a per-instance lookup path,
it is valuable when mutating instances because the mutation
can be deferred until the old-fashioned object is really used.
The "old" instance is set to have a method dictionary that
implements only #doesNotUnderstand: and when a message
impacts the object it is mutated and "becomed" to new shape.
Doing this way, instances are not missed when changes
are reverted and the time spent to change all instances
of a class is not payed if obsolete class to current class
mapping do not impose a size change... (I do not know
if VS evade the #become on such situations, but I
think that it is interesting to evaluate the convenience of
do not have a reference to "the class of the object" in
the object header)

Ale.

Cafe Alpha

unread,
Oct 28, 2006, 6:48:35 PM10/28/06
to strongtal...@googlegroups.com

>
> > So:
> >
> > 1 Adding new instance variables isn't exactly the same as "become"
> > because
> > it has to be able to handle live stack frames referencing objects
> > whoes
> > definition changed, whereas in Smalltalk you just expect the system
> > to crash
> > in that case.
>
> Not so. In Smalltalk, when you add an instance variable to a class a
> new version of the class is built and all instances of the old class
> are converted to instances of the new class using become. It doesn't
> crash, live stack frames continue to function correctly.
>

I think that, in most smalltalks, generally you can call "become" to change
an object to any object that has the same instance variables in the same
order - otherwise live contexts will involve code that mis-accesses instance
variables.

I suppose adding new instance variables on to the end of an object is the
only case that is safe, after all.

> > 2. Changing the definition of methods can invalidate inlining
> > optimizations
> > in live frames: deoptimization plus reoptimization anyone?
>
> In Smalltalk, methods used by active contexts aren't converted to the
> new version of the method, but all subsequent invocations of the
> method use the new version. If it's the same in Ruby, (and if it's
> not, I'd like to know how the contexts are migrated) then the
> Strongtalk VM should handle this transparently.
>

The problem I was thinking about was that of inlined functions that aren't
officially part of the active context. But I guess that deoptimization can
handle it.

As an example consider this code:

foo: someObject
||

[someObject bar] whileTrue: [someObject baz]
...

Imagine that the function has inlined "baz" but the definition of "baz"
changes while this thread is up a frame evaluating "bar". The definintion
of "foo:" hasn't changed but the definition of optimized foo has.

I guess you have to keep track of which functions inline which others and
deoptimize their frames when the functions they inline change.

>
> > 4. One optimization that makes sense in Ruby but not in Smalltalk
> > is to
> > figure out which instance variables are rarely instanciated but
> > which are
> > taking up a lot of memory and changing them into being stored in a
> > hidden
> > subobject. I assume that using a hidden variable holding a hidden
> > object is
> > better than trying to implement a completely general, safe and fast
> > become
> > (that's almost impossible, I think).
>
> That kind of optimization is probably better done up in Ruby code. If
> you add instance variables on the fly, then the ones that never get
> used will never be added anyway. Once they do, the only cost will be
> an extra pointer in all instances, so it's not catastrophic.
>

As I was going to say in my response to Dave, I think the optimal answer for
Ruby isn't an object table or extra indirection for all variables, but
rather a system that recognizes that mutating classes is relatively rare:

1. the ability to mutate all instances during a stop-the-world collect and
compact later on. This collect would also have to mark effected stack
frames for deoptimization.

2. a stopgap where all objects have a free pointer or two in order to hold
possible additions to the class until such time as a stop and collect can
change the definition.

> In general, though, I think you're right. The Ruby object model is
> almost exactly that of Strongtalk. The main issues that come to my
> mind are:
>
> Singleton objects: The runtime would have to create a custom subclass
> and convert the object to an instance of it.
>
> Continuations: currently not supported by Strongtalk, but David has
> said it's doable.
>
> Ruby message sends can have a variable number of arguments. The
> runtime would have to capture sends with an "unexpected" number of
> arguments (probably using #doesNotUnderstand:) and create specialized
> versions of the method.
>
> Probably the biggest chunk of work in implementing Ruby-on-Strongtalk
> would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is
> notoriously difficult to parse, so it would be a fair amount of work
> to make sure all the nooks and crannies of the grammar get covered.
>
> Pulling this off would be a huge coup, though, as it would be way,
> way faster than the existing Ruby implementation. I'd really like to
> see this happen.
>
> Colin
>

The one fly in the ointment here is that there IS a project underway to give
Ruby a VM, but since development is going on in Japanese I can't really be
sure how sophisticated it is planned to be. So far I don't think it's
showing much speedup but it could be that their plans are ambitious.

Josh Scholar

Cafe Alpha

unread,
Oct 28, 2006, 6:59:47 PM10/28/06
to strongtal...@googlegroups.com
>
> No, Ruby is slow because it doesn't have a VM. It's an interpreter
> that walks the AST, evaluating each node as it goes. The core classes
> are implemented in C, so you get acceptable performance out of them.
> It's easy to create Ruby bindings for C code, so people just
> implement performance critical code in C and call it from Ruby.
>

I wonder to what extent the conversion of Ruby extensions to a Strongtalk
Ruby could be facilitated by a limited C or C++ to Strongtalk or to
Strongtalk VM compiler.

I've been thinking about something similar for my own non-strongtalk Ruby
project.. Throwing together a C compiler that supports a copying,
non-conservative collector. Partially I meant it as a test for my code
generator, but it seemed like a cool hack for facilitating the conversion of
existing libraries and extensions.

Anyway I may switch over to strongtalk.

Josh Scholar

Levi Pearson

unread,
Oct 29, 2006, 12:24:57 AM10/29/06
to Strongtalk-general
>The one fly in the ointment here is that there IS a project underway to give
> Ruby a VM, but since development is going on in Japanese I can't really be
> sure how sophisticated it is planned to be. So far I don't think it's
> showing much speedup but it could be that their plans are ambitious.

I wouldn't worry too much about that. There are currently something
like 6 or 7 projects underway to either port Ruby to an existing vm or
write a new one. Matz, the creator of Ruby, seems to be encouraging
this. I'm sure none of them have the price/performance ratio that
building a VM on top of Strongtalk would have, but at least two of them
have pretty big budgets and are pretty invested in their current VMs,
namely .net and JVM. YARV, the Japanese one, is being written from
scratch essentially by one guy, and seems to be moving pretty slowly.
Probably most interesting from a Smalltalk point of view is Rubinius,
which was created by its author after reading through the Blue Book.

Anyway, getting Ruby to run on Strongtalk was a project I flagged as
'something that would be cool to do if I had time', but after looking
into it a bit and hearing horror stories about actually parsing Ruby
correctly and figuring out what its semantics are supposed to be in
corner cases, I decided to look elsewhere for a hobby project. Don't
let that discourage you, though!

Colin Putney

unread,
Oct 29, 2006, 2:37:16 AM10/29/06
to strongtal...@googlegroups.com

On Oct 28, 2006, at 3:48 PM, Cafe Alpha wrote:

> I think that, in most smalltalks, generally you can call "become"
> to change
> an object to any object that has the same instance variables in the
> same
> order - otherwise live contexts will involve code that mis-accesses
> instance
> variables.
>
> I suppose adding new instance variables on to the end of an object
> is the
> only case that is safe, after all.

Right... luckily that's what we'd be doing

> The problem I was thinking about was that of inlined functions that
> aren't
> officially part of the active context. But I guess that
> deoptimization can
> handle it.
>
> As an example consider this code:
>
> foo: someObject
> ||
>
> [someObject bar] whileTrue: [someObject baz]
> ...
>
> Imagine that the function has inlined "baz" but the definition of
> "baz"
> changes while this thread is up a frame evaluating "bar". The
> definintion
> of "foo:" hasn't changed but the definition of optimized foo has.
>
> I guess you have to keep track of which functions inline which
> others and
> deoptimize their frames when the functions they inline change.

When you modify methods, you do want to flush any native versions
that have been compiled. But you don't have to modify existing stack
frames. Consider the following ruby code:

class Alpha

def foo
garple = 3
bar
garple
end

def bar
self.class.class_eval {
def foo
garple = 4
bar
garple
end
}
end

end

alpha = Alpha.new
puts alpha.foo
puts alpha.foo

The first time it's called, #foo answers 3. The second time, it
answers 4. It works the same way in Smalltalk.

> As I was going to say in my response to Dave, I think the optimal
> answer for
> Ruby isn't an object table or extra indirection for all variables, but
> rather a system that recognizes that mutating classes is relatively
> rare:
>
> 1. the ability to mutate all instances during a stop-the-world
> collect and
> compact later on. This collect would also have to mark effected stack
> frames for deoptimization.

Yeah, you'd want mutation of the instances to be uninterruptible, but
it needn't be tied to garbage collection. Some Smalltalks have a
primitive to do mass becomes, but even if Strongtalk doesn't, I'm
sure you could do #valueUninterruptibly, or something similar.

Again, I don't see why you'd have to deoptimize stacks. Ruby doesn't
have a way of specifying the internal layout of an object, so you
could always add instvars to the end of of the object. This would
mean that existing stacks needn't be deoptimized.

> 2. a stopgap where all objects have a free pointer or two in order
> to hold
> possible additions to the class until such time as a stop and
> collect can
> change the definition.

I don't see why this has to be so complicated. Why not just do
something like this:

1. A method is compiled that references an instance variable not yet
present in the class.

2. The compiler notices this, and does a migration:

a. It creates a new class with the new variable after the existing
variables
b. It enumerates the existing instances, creating instances of the
new class
c. All the old instances are converted to the new instances in a
mass become

3. The new method is installed in the new class

It's actually pretty easy.

Colin

David Griswold

unread,
Oct 29, 2006, 4:11:01 AM10/29/06
to strongtal...@googlegroups.com
> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Colin Putney
> Sent: Sunday, October 29, 2006 8:37 AM
> To: strongtal...@googlegroups.com
> Subject: Re: Future of StrongTalk
>
>
>
>

Unless the class has subclasses, in which case the added instance vars
aren't at the end for the subclasses.
-Dave


Avi Bryant

unread,
Oct 29, 2006, 11:06:40 PM10/29/06
to strongtal...@googlegroups.com

On Oct 28, 2006, at 1:58 PM, Colin Putney wrote:
>
> Probably the biggest chunk of work in implementing Ruby-on-Strongtalk
> would be writing a Ruby-to-Strongtalk-bytecode compiler. Ruby is
> notoriously difficult to parse, so it would be a fair amount of work
> to make sure all the nooks and crannies of the grammar get covered.

However, this work could (and should) be done last, after a full
proof of concept had been completed without it. There are at least
two independent parsers of Ruby's syntax (JRuby and the standard C
Ruby), either of which could be used to preprocess Ruby source into a
form that could be much more easily handled (say s-expr or an XML
serialization of a parse tree). It would then be easy enough to
produce Smalltalk source code from this that could be fed to any
Smalltalk compiler, like Strongtalk's. It would certainly be
awkward, but it would be enough to make for a killer demo and allow
serious benchmarking, which I would think should produce enough
momentum to get people interested in building a new parser.

> Pulling this off would be a huge coup, though, as it would be way,
> way faster than the existing Ruby implementation. I'd really like to
> see this happen.

Me too. I've been advocating and tinkering with Ruby-on-Smalltalk
for at least two years now (sadly, far more of the former than the
latter), but having a high-performance liberally licensed VM
available makes the story much more compelling.

Avi

Cafe Alpha

unread,
Oct 30, 2006, 2:01:27 AM10/30/06
to strongtal...@googlegroups.com
I'm going to try to do it.

As long as I can get useful help from this list, it should take me much less
time to adapt the StrongTalk VM than write my own... I got over my "not
invented here" attitude about a day ago when I realized that time to getting
something usable was a year less this way and that every optimization that
StrongTalk doesn't do that I wanted, I could probably implement at least as
quickly working with StrongTalk than working on my own. And then my code
would be useful to other projects as well, if it gets accepted.

I tend to change directions quickly and often, but my current inclination is
to devote my spare time in the next month to trying to understand and gut
Ruby 1.8.5's yacc and lex files to make an adapted parser. Then I don't
have to worry about duplicating the context dependent grammar, I'll use the
original.

I like the idea of parsing to s-expressions as well, that has been my plan
for a while. I intended my Ruby to always convert to s-expressions as the
first step in parsing and to extend the language so that the s-expression
form of all code is always available in order to facilitate metaprogramming.


----- Original Message -----
From: "Avi Bryant" <avi.b...@gmail.com>
To: <strongtal...@googlegroups.com>
Sent: Sunday, October 29, 2006 8:06 PM
Subject: Re: Future of StrongTalk


Cool.

Josh Scholar

Cafe Alpha

unread,
Oct 30, 2006, 3:05:23 AM10/30/06
to strongtal...@googlegroups.com

----- Original Message -----
From: "Colin Putney" <cpu...@wiresong.ca>
To: <strongtal...@googlegroups.com>
Sent: Saturday, October 28, 2006 11:37 PM
Subject: Re: Future of StrongTalk

> > I suppose adding new instance variables on to the end of an object
> > is the
> > only case that is safe, after all.
>
> Right... luckily that's what we'd be doing

As David pointed out, that's not always what we'll be doing.

I think we're going to need a little bit of support in the VM, at least for
getting Ruby fully optimized.

Sure, the behavior you want is simple, but the problem is that getting
optimized, inlined code to give you that simple behavior is very hard. It's
a kind of multitasking problem, and anyone who's tried to write
multiprocessor code knows that you have to consider every combination of
states.

In fact I have my doubts that Strongtalk can preserve the symantics you're
asking for if it does global optimization and scheduling beyond simple
inlining.

Imagine that you're inside of a loop that's has an inlined function. The
unoptimized loop would have called "baz" but now its just a loop that
intertwines "baz" code with whatever other code runs in that loop, maybe
it's even unrolled the loop by 4 and interwined 4 instances of "baz". In
that case you're basically screwed if you have to simulate changing "baz" on
an index that isn't a multiple of 4.

You'd get similar problems if you inlined up more than one level. And I
haven't even taken the time to think about what happens to code and data
that were folded, partially executed in compilation and optimized out of
existence if you have to change some level in the inlined code...

If the code was changed by another thread, then you can just pretend that
the other thread waited until a safer time, but if code in one thread
modifies itself at just the wrong point, then you can't use the trick of
slipping time between threads, so the the compiler has to be completely
correct in detecting that possibility and preventing the wrong optimization.
I hope that's possible.

Even if you don't have that sort of aggressive optimization, mere inlining
makes the situation awfully complicated. You have to take a stack frame for
a single function (that has inlines inside of it) and turn it into nested
stack frames for each of the functions inlined at the current program
counter position.

And that's on top of the complications that every smalltalk has, like the
need to keep obsolete code around until all of the callers have counted out.

No doubt Mr. Griswald is infinitely more familiar with these problems than I
am and can correct my misconceptions.

Well, in a smalltalk without indirection through an object table, every
become requires the equivalent to a full collect, because you have to find
every single reference to the object and update it.

I suppose there's a trick where you change the object in place to hold some
sort of forwarding object, but you'll be stuck with forwarding objects until
you do a full collect that replaces them all. And forwarding objects are
going to gunk up the type feedback and optimization until they're gone.

And forwarding objects will force a deoptimization of active frames because
the forwarding object will not be the type already optimized for.

And as David pointed, out subclasses of change objects will have to force
deoptimization even if we do a full collect and don't bother with
forwarding.

But my idea of having extra pointers waiting to hold extra slots would allow
us to postpone deoptimization and the full sweep. Postponing is a good
thing because it lets you wait until you've aggregated enough work to be
worth your while and prevent thrashing on worst cases.

Josh Scholar

David Griswold

unread,
Oct 30, 2006, 5:36:05 AM10/30/06
to strongtal...@googlegroups.com
Good luck. I would focus before anything else on doing a proof of concept
that can run some reasonable benchmarks, because with those you can get the
attention of other people out there to help out, and maybe get some of the
other people working on a Ruby VM to make the switch (although from our
experience trying to attract other Smalltalk VM people, that might be harder
than you might expect).
-Dave

beck...@googlemail.com

unread,
Oct 30, 2006, 9:15:21 AM10/30/06
to Strongtalk-general

Hi Josh,

Been following this thread, and I'm pleased that you have stepped up to
the plate. There was recently a symposium called Lang.NET. One of the
presentations was by John Gough from Queensland University. Apparently
he as already written a mostly working Ruby compiler for the CLR. The
back-end as I understand it generates C#. He as also created some yacc
and lex like tools specifically for the job, all open source.

Here is a link to the Symposium website:

http://www.langnetsymposium.com/speakers.asp

John as a presentation here. I like Avi's suggestion of compiling to
s-expressions also. One of the thing on my wish list is better
interoperability between OO languages. Using s-expressions as a
franca-lingua just sounds right.

BTW: As anyone though about approaching the universities? This kind of
stuff sounds like it's made for a PhD thesis. Surely there are
proffessors and post grads out there eager and able to help?

Paul.

John Kwon

unread,
Oct 30, 2006, 10:33:02 AM10/30/06
to strongtal...@googlegroups.com
The only problem I had with Smalltalk before was that it wasn't truly multithreaded - and I think that's the only thing that Java really has going for it (aside from a backer who is a champion at shameless self-promotion).

Is Strongtalk going to be fully multithreaded?

Cafe Alpha

unread,
Oct 30, 2006, 11:06:39 AM10/30/06
to strongtal...@googlegroups.com
> John as a presentation here. I like Avi's suggestion of compiling to
> s-expressions also. One of the thing on my wish list is better
> interoperability between OO languages. Using s-expressions as a
> franca-lingua just sounds right.
>

Well it's lisp. Perhaps a younger person than me would have picked a more
structured intermediate language built on XML. XML is the new s-expression.

Josh Scholar

Brian Rice

unread,
Oct 30, 2006, 12:11:33 PM10/30/06
to strongtal...@googlegroups.com

On Oct 30, 2006, at 7:33 AM, John Kwon wrote:

> The only problem I had with Smalltalk before was that it wasn't
> truly multithreaded - and I think that's the only thing that Java
> really has going for it (aside from a backer who is a champion at
> shameless self-promotion).
>
> Is Strongtalk going to be fully multithreaded?

Strongtalk's Process class is already mapped to native threads, yes,
as discussed in the conversation about continuations. The idea is
that eventually it will support M:N style multiprocessing to support
lighter-weight concurrency and control-flow changes.

--
Brian T. Rice
http://briantrice.com

David Griswold

unread,
Oct 30, 2006, 2:09:15 PM10/30/06
to strongtal...@googlegroups.com
-----Original Message-----
From: strongtal...@googlegroups.com [mailto:strongtal...@googlegroups.com]On Behalf Of John Kwon
Sent: Monday, October 30, 2006 4:33 PM
To: strongtal...@googlegroups.com
Subject: Re: Future of StrongTalk

The only problem I had with Smalltalk before was that it wasn't truly multithreaded - and I think that's the only thing that Java really has going for it (aside from a backer who is a champion at shameless self-promotion).

Is Strongtalk going to be fully multithreaded?
 
Smalltalk semantics traditionally haven't included that, and it would break a lot of existing Smalltalk code, but given that the world is going multi-core, it is becoming much more important to be fully multithreaded, so I think it is definitely something that should be on the to-do-list (and it is already, if you look at the issues database).   
 
At least all the Smalltalk code I wrote in the core libraries was designed with multi-threading in mind, so there are monitors or critical regions around important shared globals, class variables etc, so some of the work at the Smalltalk level is done, although you never really know if it will deadlock until you turn multithreading on.   In fact the frequent hanging problem we had in the initial release was because of a deadlock caused by a critical region I had put around the event 'grab' stack, which I solved for now just by removing the semaphore, since it isn't needed yet anyway.  So these things are not easy.
 
But most of the work needed is in the VM.  We already use native threads, so that part is done, but only one thread is allowed to run in Smalltalk at a time right now, because the VM itself isn't internally thread-safe.  It is a big job to make a VM multi-threaded (and the hardest part is testing, which becomes non-deterministic once you go multi-threaded), and like all the other big items on the list, it all depends on whether we can get smart people to step up to the plate and work on it.  It isn't going to happen by itself. 
 
-Dave

John Kwon

unread,
Oct 30, 2006, 3:49:10 PM10/30/06
to strongtal...@googlegroups.com
Does anyone have access to the old Actra?

Cafe Alpha

unread,
Oct 30, 2006, 6:54:39 PM10/30/06
to strongtal...@googlegroups.com
I am a bit of an expert in is multiprocessing algorithms, non-blocking algorithms etc. , so I may be able to help in multithreading the VM.  My current business is not programming related, but at my last programming job I invented a few algorithms for a multi-processor database.
 
If you pick the right algorithms and principles then there doesn't have to be any problem. 
 
I started to write a list of usefully multiprocessor algorithms, principles and common bugs, but that needs an essay.  I don't have time to write an essay right this moment.
 
Also, I don't have a multicore machine, so I can't usefully test that multitasking code at the moment.  I miss having 8 core machines that I could use to beat the hell out of the code.
 
It could be a bit scary that the code wasn't designed with multiple processors in mind in the first place depending on the idioms used.
 
Josh Scholar
----- Original Message -----

Josh Scholar

unread,
Oct 30, 2006, 7:27:36 PM10/30/06
to Strongtalk-general
Ignore my embarrassing grammatical errors. It's what happens when I
try to edit and don't proofread.

Colin Putney

unread,
Oct 30, 2006, 8:07:05 PM10/30/06
to strongtal...@googlegroups.com

On Oct 30, 2006, at 12:05 AM, Cafe Alpha wrote:

>>> I suppose adding new instance variables on to the end of an object
>>> is the
>>> only case that is safe, after all.
>>
>> Right... luckily that's what we'd be doing
>
> As David pointed out, that's not always what we'll be doing.

A simple way around this would be to compile accessor methods for all
instance variables and have the Ruby compiler generate accessor calls
rather than direct variable accesses. Then when you add a variable,
you can put it anywhere you like; you just have to make sure to
update all the accessor methods. With inlining, it wouldn't even be
much of a performance hit.

[snip a bunch of stuff, here and at the end]

> Imagine that you're inside of a loop that's has an inlined
> function. The
> unoptimized loop would have called "baz" but now its just a loop that
> intertwines "baz" code with whatever other code runs in that loop,
> maybe
> it's even unrolled the loop by 4 and interwined 4 instances of
> "baz". In
> that case you're basically screwed if you have to simulate changing
> "baz" on
> an index that isn't a multiple of 4.

Ok, I think I see where you're coming from. If you've got code that
calls a method inside a loop, and that method is modified, you've now
inlined the wrong implementation and you've got to deoptimize the
stack with the partially-executed loop intact so that subsequent
invocations of the changed method will work correctly.

Fair enough, but I still don't see the need for explicit support for
Ruby in the Strongtalk VM. My logic goes like this:

1. Ruby and Smalltalk both allow methods to be modified at arbitrary
times during execution.

2. The Strongtalk VM supports this for Smalltalk code today. (As far
as I know...)

3. Ruby methods and Smalltalk methods would be indistinguishable at
the bytecode level.

4. Therefore dynamically modified Ruby methods should work just as
well as Smalltalk methods.

My assumptions might be wrong, and there might be hidden gotchas, but
I do think this is pretty straightforward. My inclination would be to
implement the Ruby compiler and runtime support entirely in
Smalltalk, and focus on getting the semantics right, even at the
expense of speed. That would still produce a Ruby implementation
faster than the existing one, and it could then be tuned further if
need be.

Colin

Cafe Alpha

unread,
Oct 30, 2006, 11:35:22 PM10/30/06
to strongtal...@googlegroups.com

>
> A simple way around this would be to compile accessor methods for all
> instance variables and have the Ruby compiler generate accessor calls
> rather than direct variable accesses. Then when you add a variable,
> you can put it anywhere you like; you just have to make sure to
> update all the accessor methods. With inlining, it wouldn't even be
> much of a performance hit.
>
> [snip a bunch of stuff, here and at the end]
>

Two problems:

1. Assuming that Strongtalk is neither more safe nor less safe on "become:"
than other smalltalks then conceptually you've just shortened the window in
which the program can screw up from being whole functions to be short
accessor functions. But in multitasking algorithms, even a window of a
couple of instructions that can crash the system is unacceptable. So, in
principle, if you call "become:" on an object when another thread is just
entering an accessor function, you'll still get the miss-access and the
crash. So it will still need some VM support to avoid that problem.

Now I'm not saying VM support is impossible, but it's not high level
smalltalk. But, assuming, as I said, that Strongtalk is neither more safe
nor less safe on "become:" than other smalltalks then this would be good
enough for a demonstration, but not for a release.

Still, I'm going to believe that deoptimization working 100% without causing
crashes when I see it. It's VERY subtle and very new to be able to
deoptimize, I'd be surprised if it's bug free. It may be that these edge
conditions I'm discribing are not handled perfectly.

2. More importantly, we need VM support for a _mass_ become (or we need
another level of indirection - which is also a VM change), because if the
damn system has to walk the stack and look for routines to deoptimize on
every "become" then walking the objects of a specific class and calling
"become" on all of them is could take hours. And if it does the "walk the
whole image and update pointers" on every become, it could take weeks.

Having a correct (or nearly correct) algorithm that can be implemented in
high level smalltalk isn't enough, it has to run acceptably quickly.


Josh Scholar

Josh Scholar

unread,
Oct 30, 2006, 11:42:15 PM10/30/06
to Strongtalk-general
I just realized that the first problem I mentioned isn't really a
problem until Strongtalk has real multitasking which it doesn't have
yet.

Of course it isn't Ruby until it has preemptive scheduling!
.........

That said, the second problem still holds. Unless strongtalk has a
fast "become:" (or some sort of lazy, aggregating "become:" that I
think is impossible in an optimized system) then your suggestion would
work but be too slow on any test with more than one or two objects.

Colin Putney

unread,
Oct 31, 2006, 3:08:48 AM10/31/06
to strongtal...@googlegroups.com

On Oct 30, 2006, at 8:42 PM, Josh Scholar wrote:

> I just realized that the first problem I mentioned isn't really a
> problem until Strongtalk has real multitasking which it doesn't have
> yet.
>
> Of course it isn't Ruby until it has preemptive scheduling!

Right. Strongtalk doesn't run Smalltalk processes concurrently on
multiple processor machines, so we don't have to worry about
optimized methods being preempted at the wrong moment. However, that
doesn't mean that Smalltalk processes are not preemptively scheduled!
The VM uses cooperative multitasking to ensure that processes are
preempted at safe points (eg, between bytecodes) but from the point
of view of Smalltalk code, the scheduling is preemptive.

Ruby works the same way - Ruby threads do not map directly to OS
threads the way Java threads do, but they are preemptively scheduled
by the Ruby interpreter. Mapping Ruby processes to Smalltalk
processes ought to work fine.

Colin

Cafe Alpha

unread,
Oct 31, 2006, 4:01:09 AM10/31/06
to strongtal...@googlegroups.com

----- Original Message -----
From: "Colin Putney" <cpu...@wiresong.ca>
To: <strongtal...@googlegroups.com>
Sent: Tuesday, October 31, 2006 12:08 AM
Subject: Re: Future of StrongTalk


>
>

As David just wrote, Smalltalk sematics traditionally did not include
preemptive scheduling.

I wish you were right that Strongtalk already has preemptive scheduling, but
I don't think you are. As he also wrote, things are likely to start
breaking when we DO turn preemptive scheduing on.

And consider that cooperative preemptive scheduling means sticking polling
code in every basic block.

Josh Scholar

Josh Scholar

Cafe Alpha

unread,
Oct 31, 2006, 4:18:41 AM10/31/06
to strongtal...@googlegroups.com
Ok, I found some old posts. David has said more than once that there is no
preemptive scheduling in Strongtalk at this time.

Another point is that if there was green-thread preemptive scheduling, then
unless that was specifically designed to make accessor functions safely fix
that obscure object redefinition problem we've been talking about, there
would be no guarantee that it would be any safer than OS threading.

It IS obscure because smalltalk isn't ruby.

And as I said before, I believe that perfectly safe and transparent
deoptimization in the case of classes being redefined while in use rather
severely limits what sorts of optimizations are completely safe. Since that
sort of event would be extremely rare in a Smalltalk program, and since
Strongtalk hasn't been completely debugged anyway, I wouldn't be surprised
to find out that there are some unsafe optimizations going on.

Josh Scholar

David Griswold

unread,
Oct 31, 2006, 6:48:00 AM10/31/06
to strongtal...@googlegroups.com
-----Original Message-----
From: strongtal...@googlegroups.com [mailto:strongtal...@googlegroups.com]On Behalf Of Cafe Alpha
Sent: Tuesday, October 31, 2006 12:55 AM
To: strongtal...@googlegroups.com
Subject: Re: Future of StrongTalk

I am a bit of an expert in is multiprocessing algorithms, non-blocking algorithms etc. , so I may be able to help in multithreading the VM.  My current business is not programming related, but at my last programming job I invented a few algorithms for a multi-processor database.
 
If you pick the right algorithms and principles then there doesn't have to be any problem. 
 
I started to write a list of usefully multiprocessor algorithms, principles and common bugs, but that needs an essay.  I don't have time to write an essay right this moment.
 
Also, I don't have a multicore machine, so I can't usefully test that multitasking code at the moment.  I miss having 8 core machines that I could use to beat the hell out of the code.
 
It could be a bit scary that the code wasn't designed with multiple processors in mind in the first place depending on the idioms used.
 
It's not just as simple as locking algorithms.  There are many tricky issues to consider.  All the self modifying code has to be made thread-safe, such as updating polymorphic inline-caches.  There are major design decisions to make, like: do you let multiple threads compile at the same time?  If so, where do they store their temp data?  What about object allocation?  Right now allocation is only a few instructions because objects are tail-allocated in a single "new" space.  In a multi-threaded system, do you have a separate thread-local new-space for each thread, which would be required for fast allocation, but make each thread heavy, or do you lock on a shared new-space which would make threads lighter but allocation more expensive?    Unless a completely new concurrent garbage collector is written, GC requires all threads to roll forward to the nearest safe-point (call or backward branch) and stop while a single VM thread scavenges or marks-and-sweeps.  Any other kind of reflective system modification also has to be changed.
 
So it requires major changes to the VM architecture.
 
That being said, it has been done with the Java VM, which is based on the same code base, so it is doable.  But it requires a lot of work and a deep knowledge of the VM.
 
Just making the threading preemptive would be much much easier, but that isn't going to increase scalability on multi-core machines.
 
-Dave

David Griswold

unread,
Oct 31, 2006, 6:48:02 AM10/31/06
to strongtal...@googlegroups.com

It is true that deoptimization needs a lot of testing and I'm sure there are
still bugs there.

> 2. More importantly, we need VM support for a _mass_ become (or we need
> another level of indirection - which is also a VM change), because if the
> damn system has to walk the stack and look for routines to deoptimize on
> every "become" then walking the objects of a specific class and calling
> "become" on all of them is could take hours. And if it does the "walk the
> whole image and update pointers" on every become, it could take weeks.
>
> Having a correct (or nearly correct) algorithm that can be implemented in
> high level smalltalk isn't enough, it has to run acceptably quickly.

If you look at the issue database, you will see I put in an item for a mass
become a long time ago. So when (not if) we implement become, it will be
mass.

-Dave


David Griswold

unread,
Oct 31, 2006, 6:48:03 AM10/31/06
to strongtal...@googlegroups.com

Josh wrote:
> Ok, I found some old posts. David has said more than once that
> there is no
> preemptive scheduling in Strongtalk at this time.
>
> Another point is that if there was green-thread preemptive
> scheduling, then
> unless that was specifically designed to make accessor functions
> safely fix
> that obscure object redefinition problem we've been talking about, there
> would be no guarantee that it would be any safer than OS threading.
>
> It IS obscure because smalltalk isn't ruby.
>
> And as I said before, I believe that perfectly safe and transparent
> deoptimization in the case of classes being redefined while in use rather
> severely limits what sorts of optimizations are completely safe.
> Since that
> sort of event would be extremely rare in a Smalltalk program, and since
> Strongtalk hasn't been completely debugged anyway, I wouldn't be surprised
> to find out that there are some unsafe optimizations going on.

Actually, I've just discovered that Strongtalk *does* currently support
dynamic modifications to object format, while methods using existing
instances and their instvars are in use. I didn't realize that
functionality was there, but it seems to work as if become-like
functionality is being used under the covers, even though become itself
isn't yet supported at the API level.

If so, then most of the work needed to implement become is already there. I
wrote a method that creates an object, initialize some instance variables,
then waits on a semaphore, then refers to instance variables after that, in
the same method. I started the method, and then while it is waiting on the
semaphore in one thread, I modified the object format by adding and removing
instance variables *before* the referenced instance variables. The
references continue to work correctly; apparently the object is being
modified in place with existing instance variable moved and references being
dynamically updated to read from the relocated location, in the running
method. If I add a new instance variable, it is added to the existing
object with a value of nil, and can be either before or after existing
instance variables.

This bears looking into, since I don't understand how this can be done
without become under the covers. It appears that all the functionality you
would need for adding instvars dynamically in Ruby is already there.

-Dave


Jecel Assumpcao Jr

unread,
Oct 31, 2006, 1:35:33 PM10/31/06
to strongtal...@googlegroups.com
Josh Scholar wrote on Tue, 31 Oct 2006 01:01:09 -0800

> As David just wrote, Smalltalk sematics traditionally did not include
> preemptive scheduling.
>
> I wish you were right that Strongtalk already has preemptive scheduling, but
> I don't think you are. As he also wrote, things are likely to start
> breaking when we DO turn preemptive scheduing on.
>
> And consider that cooperative preemptive scheduling means sticking polling
> code in every basic block.

Smalltalk has traditionally had a combination of preemptive and
cooperative scheduling, which is making this discussion a bit confusing.
Going all the way back to the "blue book" VM threads (called Process -
terms have changed since then) are divided into a set of fixed
priorities. A higher priority thread will preempt a lower priority one
as soon as it becomes active but within a single priority a thread will
run until it blocks or explicitly sends the #yield message.

The libraries were not written taking this into account but it is
extremely rare for this to be an actual problem. The reason for this is
that very little use is made of the priorities in user code - they are
mostly for things like i/o or the idle thread. Normally code that runs
at different priority levels is highly specialized and only makes
limited calls to the standard libraries.

Changing this to a fully preemptive system is as simple as running a
loop at a very high priority that wakes up a few times a second and
shuffles the lower priority queues. But, as David pointed out, some
subtle bugs will turn up if you do this. For Smalltalk V/286 I did
something a little more sophisticated that at the Smalltalk level
implemented various real time scheduling policies (rate monotonic, first
deadline first, etc) and by taking just a little care when writing the
applications I was able to avoid such bugs. Obviously it would be best
to fix the standard libraries.

-- Jecel

Colin Putney

unread,
Oct 31, 2006, 12:46:03 PM10/31/06
to strongtal...@googlegroups.com

On Oct 31, 2006, at 3:48 AM, David Griswold wrote:

> Actually, I've just discovered that Strongtalk *does* currently
> support
> dynamic modifications to object format, while methods using existing
> instances and their instvars are in use. I didn't realize that
> functionality was there, but it seems to work as if become-like
> functionality is being used under the covers, even though become
> itself
> isn't yet supported at the API level.

Interesting. Dan Ingalls mentioned to me at OOPSLA that one of the
features of Strongtalk was that variables are looked up dynamically,
unlike traditional Smalltalks. I hadn't heard this anywhere else, and
haven't had a chance to investigate further. Perhaps your experiment
shows this in action.

Good to know.

Colin

Cafe Alpha

unread,
Oct 31, 2006, 1:35:15 PM10/31/06
to strongtal...@googlegroups.com
> Smalltalk has traditionally had a combination of preemptive and
> cooperative scheduling, which is making this discussion a bit confusing.
...

> Changing this to a fully preemptive system is as simple as running a
> loop at a very high priority that wakes up a few times a second and
> shuffles the lower priority queues. But, as David pointed out, some
> subtle bugs will turn up if you do this. For Smalltalk V/286 I did
> something a little more sophisticated that at the Smalltalk level
> implemented various real time scheduling policies (rate monotonic, first
> deadline first, etc) and by taking just a little care when writing the
> applications I was able to avoid such bugs. Obviously it would be best
> to fix the standard libraries.
>
> -- Jecel

I was aware of all of that but hadn't thought about how it applies to
strongtalk.

If thread priorities DO work completely properly in Strongtalk then there
already has to be polling code compiled into every basic block (because
there's no intepreter for compiled methods).

Josh Scholar

David Griswold

unread,
Oct 31, 2006, 1:42:48 PM10/31/06
to strongtal...@googlegroups.com
> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Jecel Assumpcao
> Jr
> Sent: Tuesday, October 31, 2006 7:36 PM
> To: strongtal...@googlegroups.com
> Subject: preemptive scheduling (was: Future of StrongTalk)
>
>
>
> Josh Scholar wrote on Tue, 31 Oct 2006 01:01:09 -0800
> > As David just wrote, Smalltalk sematics traditionally did not include
> > preemptive scheduling.
> >
> > I wish you were right that Strongtalk already has preemptive
> scheduling, but
> > I don't think you are. As he also wrote, things are likely to start
> > breaking when we DO turn preemptive scheduing on.
> >
> > And consider that cooperative preemptive scheduling means
> sticking polling
> > code in every basic block.
>
> Smalltalk has traditionally had a combination of preemptive and
> cooperative scheduling, which is making this discussion a bit confusing.
> Going all the way back to the "blue book" VM threads (called Process -
> terms have changed since then) are divided into a set of fixed
> priorities. A higher priority thread will preempt a lower priority one
> as soon as it becomes active but within a single priority a thread will
> run until it blocks or explicitly sends the #yield message.

I don't think what you are talking about is preemptive scheduling in the
sense we are talking about. In the scheme you are talking about, thread
switching *only* occurs because of yield, blocking I/O, or at the point the
running thread creates and schedules a new process at a higher priority.

We are talking about preemption that is *guaranteed* to happen in bounded
time, so that even if the running process does "[ true ] whileTrue", another
process at the same priority will get time to run.

> The libraries were not written taking this into account but it is
> extremely rare for this to be an actual problem. The reason for this is
> that very little use is made of the priorities in user code - they are
> mostly for things like i/o or the idle thread. Normally code that runs
> at different priority levels is highly specialized and only makes
> limited calls to the standard libraries.
>
> Changing this to a fully preemptive system is as simple as running a
> loop at a very high priority that wakes up a few times a second and
> shuffles the lower priority queues. But, as David pointed out, some
> subtle bugs will turn up if you do this. For Smalltalk V/286 I did
> something a little more sophisticated that at the Smalltalk level
> implemented various real time scheduling policies (rate monotonic, first
> deadline first, etc) and by taking just a little care when writing the
> applications I was able to avoid such bugs. Obviously it would be best
> to fix the standard libraries.

Just running a high priority loop that shuffles the queues won't give full
preemption the way we are talking about it. It would require either a fully
multithreaded VM, *or* some kind of preemptive timer in the VM that marks
the running process in some way (by patching or setting a flag that is
polled for)that causes it to halt at the next safepoint (call or backward
branch) so that control can be transferred to the scheduler. Otherwise
"[true] whileTrue" will lock out all other processes.
-Dave


David Griswold

unread,
Oct 31, 2006, 1:42:49 PM10/31/06
to strongtal...@googlegroups.com
> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Colin Putney
> Sent: Tuesday, October 31, 2006 6:46 PM
> To: strongtal...@googlegroups.com
> Subject: Re: Future of StrongTalk
>

I think Dan is talking about something else, which is that because all code
in Strongtalk is actually held in mixins inside the VM, instance variable
offsets can't be represented directly in the method as it is
bytecode-compiled, because the instance variable offset is different for
each class that the mixin is applied to. So the offsets are lazily computed
at the point at which the mixin is applied.

But once a mixin is applied, there is no extra dynamic work involved in
instance variable access- it's just a single load instruction, in compiled
code.

-Dave


Jecel Assumpcao Jr

unread,
Oct 31, 2006, 2:42:41 PM10/31/06
to strongtal...@googlegroups.com
Josh,

> I was aware of all of that but hadn't thought about how it applies to
> strongtalk.
>
> If thread priorities DO work completely properly in Strongtalk then there
> already has to be polling code compiled into every basic block (because
> there's no intepreter for compiled methods).

I haven't looked at the Strongtalk sources but a quick browse through
the doxygen pages gave me the impression that there is a lot in common
between these and the Self 4.0 VM. So perhaps this detail is similar as
well: in Self the stack is checked for overflow on messege sends and on
backward branches for loops that have had all their messages inlined.
When an interrupt arrives the stack limit register is cleared and
execution jumps right back to where it was. On the next check a stack
overflow is signalled and the system looks to see if that was the real
cause or if there is a pending interrupt. In the latter case the limit
is restored to the real value and the interrupt is handled.

The reason for limiting these interrupts to these "safe points" is
because a lot of extra state has to be stored to make deoptimization
possible. If interrupts can't happen between these safe points, then
deoptimization can't either.

-- Jecel

Cafe Alpha

unread,
Oct 31, 2006, 1:54:55 PM10/31/06
to strongtal...@googlegroups.com
My vote on the GC would be to give each thread it's own allocation block.  Of course on multiprocessor system that means that each eden pointer has to be in thread local storage - in my own version I was thinking of keeping it in a register! 
 
One unusual algorithm that may have some use, (because it makes it easy to get all threads to safe points) is to give each processor in the system a maximum of 1 OS thread, and to use green threading within each of these OS thread to simulate an unlimited number of threads.  That way you can have a form of green threading and still be able to take advantage of multiple processors.
 
 
----- Original Message -----
Sent: Tuesday, October 31, 2006 3:48 AM
Subject: RE: Future of StrongTalk

 
-----Original Message-----

Josh Scholar

unread,
Oct 31, 2006, 2:01:34 PM10/31/06
to Strongtalk-general

Cafe Alpha wrote:
> My vote on the GC would be to give each thread it's own allocation block. Of course on multiprocessor system that means that each eden pointer has to be in thread local storage - in my own version I was thinking of keeping it in a register!
>
> One unusual algorithm that may have some use, (because it makes it easy to get all threads to safe points) is to give each processor in the system a maximum of 1 OS thread, and to use green threading within each of these OS thread to simulate an unlimited number of threads. That way you can have a form of green threading and still be able to take advantage of multiple processors.
>
>

I'm not recommending that algorithm, exactly.

In my own design all times were safe for a GC, so there was no need for
safe points. Memory was pre-cleared, for instance. During a GC, I was
going suspend all non-GC threads that were in compiled code (calls
outside of the code could continue till return).

It's just that all of this talk about schema changes has got me worried
that the deoptimizer needs safe points more than a GC does. So if it's
really important to be able to lock down the whole system at safe
points then that suggesting about running multiple green thread systems
makes some sense.

Colin Putney

unread,
Oct 31, 2006, 2:46:46 PM10/31/06
to strongtal...@googlegroups.com

On Oct 31, 2006, at 10:42 AM, David Griswold wrote:

> I think Dan is talking about something else, which is that because
> all code
> in Strongtalk is actually held in mixins inside the VM, instance
> variable
> offsets can't be represented directly in the method as it is
> bytecode-compiled, because the instance variable offset is
> different for
> each class that the mixin is applied to. So the offsets are lazily
> computed
> at the point at which the mixin is applied.
>
> But once a mixin is applied, there is no extra dynamic work
> involved in
> instance variable access- it's just a single load instruction, in
> compiled
> code.

Aha. But that means that the semantics of the bytecode are "read from
ivar named X" and "write to ivar named Y", are they not? If the VM
can optimize that down to a single load instruction, great. It just
means it has to be careful about recomputing offsets when classes are
reshaped, which it apparently does just fine.

That's very cool. The more I learn about Strongtalk, the more I like
it. :-)

(And this would eliminate the need for special accessors in Ruby
code. The VM should just do the right thing.)

David Griswold

unread,
Oct 31, 2006, 3:37:35 PM10/31/06
to strongtal...@googlegroups.com

Your GC must have been conservative (non-relocating). Non-conservative GC
in general will require all threads to be at a safe point, otherwise, how do
you know where the GC roots are on the stack in compiled code? If a
register *ever* holds anything other than a tagged value, or if *any* heap
or stack modification requires more than one store to reach a GC consistent
state, you need safepoints. For example, if you are doing a smallinteger
calculation, and you have to retag the result aftwerwards, there is a window
of an instruction or two when the value in the register is not correctly
tagged. How will you tell if that is an valid GC root or not? I don't
understand how you could do compacting GC without safepoints.

This gets even more important when you start doing fancy optimization; for
example, when iterating through an array a good compiler can generate faster
code by keeping an interior pointer to the middle of an array that is being
iterated; that pointer is not a valid pointer to an object, yet still needs
to be translated into a root for the GC, and updated after relocation.
Without safepoints you would have to keep data structures that keep track of
the format and location of every value in every register and temporary
storage location, for every possible instruction pointer value in the entire
system.

The mixed green thread scheme you mentioned is interesting, although the VM
would still need to be internally multithreaded first, which is the big job.
While it would make rolling all threads forward to a safepoint faster, it's
not clear exactly how often that would be needed, or how slow that would be
relative to the scavenge itself, since all the thread stacks have to be
scanned anyway during a scavenge.

But if scavenging is a bottleneck, and there are very many threads, then it
would have the advantage that you could have just one Eden per CPU, which
means you could make the Edens much larger, and thus make scavenges
correspondingly much rarer, which could be a *big* help. But by making the
scavenges rarer, you correspondingly reduce the benefit of making it faster
to roll all threads forward to a safepoint. These things can be hard to
reason about.

Deoptimization is designed to only occur in uncommon cases, and to eliminate
itself at equilibrium, so the speed of deoptimization isn't that important,
including the cost of rolling any threads forwards. And off the top of my
head, I don't think deoptimization for uncommon traps (which are more
"common" than reflective changes ;-) would necessarily always have to
require rolling all threads forward to a safepoint.

-Dave


Cafe Alpha

unread,
Oct 31, 2006, 5:21:44 PM10/31/06
to strongtal...@googlegroups.com

----- Original Message -----
From: "David Griswold" <David.G...@acm.org>
To: <strongtal...@googlegroups.com>
Sent: Tuesday, October 31, 2006 12:37 PM
Subject: RE: Future of StrongTalk

>


> Your GC must have been conservative (non-relocating). Non-conservative GC
> in general will require all threads to be at a safe point, otherwise, how
do
> you know where the GC roots are on the stack in compiled code? If a
> register *ever* holds anything other than a tagged value, or if *any* heap
> or stack modification requires more than one store to reach a GC
consistent
> state, you need safepoints. For example, if you are doing a smallinteger
> calculation, and you have to retag the result aftwerwards, there is a
window
> of an instruction or two when the value in the register is not correctly
> tagged. How will you tell if that is an valid GC root or not? I don't
> understand how you could do compacting GC without safepoints.

Code annotation!! You know what's in each register by lookup on the program
counter.

>
> This gets even more important when you start doing fancy optimization; for
> example, when iterating through an array a good compiler can generate
faster
> code by keeping an interior pointer to the middle of an array that is
being
> iterated; that pointer is not a valid pointer to an object, yet still
needs
> to be translated into a root for the GC, and updated after relocation.
> Without safepoints you would have to keep data structures that keep track
of
> the format and location of every value in every register and temporary
> storage location, for every possible instruction pointer value in the
entire
> system.
>

"Without safepoints you would have to keep data structures that keep track
of the format and location of every value in every register and temporary
storage location, for every possible instruction pointer value in the entire
system."

Well, yes exactly. I thought it was worth the trouble in order to get full
threading without the overhead of polling.

I was planning collect internal pointers, but not store them in objects.
It's more work, but if the only internal pointers are in registers or a few
stack temps it's not so bad. If you really think internal pointers are too
much trouble, then you just generate code that keeps the non internal
version on the stack.

> The mixed green thread scheme you mentioned is interesting, although the
VM
> would still need to be internally multithreaded first, which is the big
job.
> While it would make rolling all threads forward to a safepoint faster,
it's
> not clear exactly how often that would be needed, or how slow that would
be
> relative to the scavenge itself, since all the thread stacks have to be
> scanned anyway during a scavenge.

I was thinking about the need for schema changes and the like, not for GC
which doesn't need safe points, really.

>
> But if scavenging is a bottleneck, and there are very many threads, then
it
> would have the advantage that you could have just one Eden per CPU, which
> means you could make the Edens much larger, and thus make scavenges
> correspondingly much rarer, which could be a *big* help. But by making
the
> scavenges rarer, you correspondingly reduce the benefit of making it
faster
> to roll all threads forward to a safepoint. These things can be hard to
> reason about.
>
> Deoptimization is designed to only occur in uncommon cases, and to
eliminate
> itself at equilibrium, so the speed of deoptimization isn't that
important,
> including the cost of rolling any threads forwards. And off the top of my
> head, I don't think deoptimization for uncommon traps (which are more
> "common" than reflective changes ;-) would necessarily always have to
> require rolling all threads forward to a safepoint.
>
> -Dave
>

Rolling threads forward otherwise involves suspending all threads and
setting breakpoints at the next basic block in each. You need code
annotation to even accomplish that, don't you!!! And it DOES sound awfully
slow. If someone has hundreds of threads running, it could take seconds to
stop the world every time you need a deoptimization - where as with the
green method it could be a small fraction of a second.

And if you're going so far as to annotate code and set breakpoints to roll
threads forward, then you could be using exceptions and code annotation for
other things, such as escaping from small integer loops to bigint on integer
overflow, thus saving the overhead of checking overflow bits, and you could
use exceptions instead of write barriers, thus saving overhead on every
instance variable store.

I missed breakfast this morning. I better go get some food now.

Josh Scholar

Cafe Alpha

unread,
Oct 31, 2006, 5:32:00 PM10/31/06
to strongtal...@googlegroups.com
By the way, I remember reading that the .NET CLR uses safe points. If it
does support mulitprocessors (does anyone know?) then maybe it does the
multiple green threading I suggested.

Josh

David Griswold

unread,
Nov 1, 2006, 7:05:10 AM11/1/06
to strongtal...@googlegroups.com

> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Cafe Alpha
> Sent: Tuesday, October 31, 2006 11:22 PM
> To: strongtal...@googlegroups.com
> Subject: Re: Future of StrongTalk
>
>
>
>

That's exactly what safepoints are. What you are describing is just putting
a safepoint at every instruction in the system. But the problem is the
amount of data required to store all those annotations would be enormous,
much larger than the code itself. Even when only putting safepoints at
calls, type-tests and backward branches, it generates a lot of data; in
Strongtalk a lot of work was put into compressing that information so that
it isn't out of control, even with the far fewer safepoints we use. So it
is the space issue that keeps people from putting safepoints on every
instruction, that's all.

> [...]


>
> Rolling threads forward otherwise involves suspending all threads and
> setting breakpoints at the next basic block in each. You need code
> annotation to even accomplish that, don't you!!! And it DOES
> sound awfully
> slow. If someone has hundreds of threads running, it could take
> seconds to
> stop the world every time you need a deoptimization - where as with the
> green method it could be a small fraction of a second.

The JVM rolls all threads forward to a safepoint on each scavenge (or at
least it did when I was there), using a small eden for every thread, and it
is very fast.

-Dave


David Griswold

unread,
Nov 1, 2006, 7:05:11 AM11/1/06
to strongtal...@googlegroups.com

> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Cafe Alpha
> Sent: Tuesday, October 31, 2006 11:32 PM
> To: strongtal...@googlegroups.com
> Subject: Re: Future of StrongTalk
>
>
>

> By the way, I remember reading that the .NET CLR uses safe points. If it
> does support mulitprocessors (does anyone know?) then maybe it does the
> multiple green threading I suggested.

I doubt it. It probably does just the same thing we did in the JVM.
Rolling threads forward works just fine.

-Dave


Cafe Alpha

unread,
Nov 2, 2006, 2:38:05 AM11/2/06
to strongtal...@googlegroups.com

I think we've both been exaggurating and not thinking things through in this
arguement.

I posted after skipping breakfast, and I wasn't thinking clearly.

I was wrong to assume that rolling forward has to be expensive - it CAN be
expensive because to do things safely you're requiring at least one task
switch per thread that you want to roll forward, and you may have to wait
for that thread to be scheduled - but since you've stopped the whole world,
it will only be a long wait if the system is bogged down by a DIFFERENT
program. The catastrophically bad case is where you have hundreds of theads
in Strongtalk and a server bogged down by hundreds of threads or processes
outside of Strongtalk, because, assuming that windows simply schedules any
resumed thread, round robin, the paused to roll the threads forward will be
proportional to n*m where n is the number thread in Strongtalk and m is the
wait until a thread is scheduled to run, ie the number of theads OUTSIDE of
Strongtalk.

But, I also know that it is possible fake the whole thing under Windows with
NO task switches because you're allowed to read and alter thread
continuations under Windows. As long as you know that your compiled code
doesn't rely on thread local storage or other thread-local mappings, you
should be able restore the state in the current thread well enough to set a
non-exception breakpoint and jump into the code, run to the next safe point
then update the appropriate continuation record. That would be FAST no
matter what; it could be asking for incompatibility problems with future
versions of the OS and with future versions of the CPUs, however.

Partially I just wanted to pitch the idea of using exceptions. I'd started
experimenting with write-locking pages for a generational garbage collector
and with catching exceptions on int overflows (for writing fast short int
code, in the average case).

That's because I love the idea that you could drop ANY C++ code into this
engine and it would run as fast or faster than the original. But in C++
instance variables are just as fast as local variables, so programmers write
into them in loops, there being no reason not to. Similarly, if an
optimizer can unbox ints and floats and catch the overflows in exceptions
then Strongtalk can be 100% as fast as C for digital signal processing,
imagine that!

--

Anyway, your own assertion that "annotations would be enormous, much larger
than the code itself" is also wrong.

Most of the data is just 1 bit per register:
a register is a relocatable (possibly internal) pointer or it isn't!
And since most bytes aren't the start of an instruction and many/most
instructions won't change this bit array, most of this data could be
compressed to 1 bit specifying that the array didn't change. You could
probably get 4 to 1 compression this way.

If you used a lookup table of instruction widths, then you could probably
get 4 to 1 compression from that alone, maybe 10 to 1 combined with the
run-length bit.

--

Also I think we were talking past each other about what safe points mean
about threading; you said that safe points and code annotation are the same
thing. The article talking about the .net CLR safe points was talking about
_not _switching _threads _until _the _safe _point! That's green threading
by putting polling code at the end of every basic block.

And with threading by polling, you never need to roll thread forward.

THAT'S what I thought you were talking about when you said "safe points".

David Griswold

unread,
Nov 2, 2006, 7:31:29 AM11/2/06
to strongtal...@googlegroups.com

There is more to it than just that. If you pass values on the stack you
also need to record information on changes to stack value liveness and
adjustments to the stack pointer. If you use interior pointers you also
need to make sure you know how to derive the original pointer from the
interior pointer; that is not necessarily obvious, depending on the heap
format. Things like the write barrier can cause problems, because there is
a window between the store to update the object and the store to update the
card mark (not relevant if you use read-locked pages instead of card marking
as you suggest, although I think I've heard that there are other problems
with doing that). Allocation would also need to be done in a way that is GC
safe even in the middle of a partially completed allocation; in general you
could never update the heap in any way that required more than one store to
leave memory in a GC-consistent format, which is a pretty big restriction.

The only effort I know of to do safepoints at every instruction is described
in
http://portal.acm.org/citation.cfm?id=301652&coll=portal&dl=ACM&CFID=5034807
&CFTOKEN=50586361. They seemed to get pretty good results, although their
scheme seems dependent on the particular calling conventions, stack usage,
and the fact that they don't generate interior pointers, which they admit
could cause problems. They don't talk about the allocation issue, though.
I don't know how well their scheme works in practice; you can never tell
just from the paper with these kinds of things.

-Dave


Cafe Alpha

unread,
Nov 2, 2006, 1:25:20 PM11/2/06
to strongtal...@googlegroups.com

----- Original Message -----
From: "David Griswold" <David.G...@acm.org>
To: <strongtal...@googlegroups.com>
Sent: Thursday, November 02, 2006 4:31 AM
Subject: RE: Future of StrongTalk
> >

The abstract says "We show how to use simple compression techniques to
reduce the size of the GC map to about 20% of the generated code size, a
result that is competitive with the best previously published results."

That sounds reasonable and realistic.

"..seems dependent on the particular calling conventions, stack usage.."

And yes of course, to get good garbage collection working in a concurrent
system (without safe points) you need fine control over the generated code.

I mentioned pre-clearing memory, that was in order to fix another one of
those sorts of concurrent collection problems. If you allocate from dirty
memory then you have garbage pointers that the GC will try to follow. But
you make it sound as if safe at every instruction = safe points at every
instruction. That isn't the terminology I've seen used before.

josh

David Griswold

unread,
Nov 2, 2006, 3:40:13 PM11/2/06
to strongtal...@googlegroups.com

> -----Original Message-----
> From: strongtal...@googlegroups.com
> [mailto:strongtal...@googlegroups.com]On Behalf Of Cafe Alpha
> Sent: Thursday, November 02, 2006 7:25 PM
> To: strongtal...@googlegroups.com
> Subject: Re: Future of StrongTalk
>
>
>
>

What do you think is the difference? A "Safe" point *is* just a point where
it is safe!
If it is safe at every instruction, that is a safepoint at every
instruction.

Anyway, right now we shouldn't get too wrapped up in these distant
theoretical discussions, until we get more people actually working on
Strongtalk. I think your idea of doing a Ruby adaptation is a good one;
let's focus on that.
-Dave


Reply all
Reply to author
Forward
0 new messages