Local vs. global GC in Huemul/Hydra peninsulae-partitioned systems


Klaus D. Witzel

Nov 15, 2008, 5:08:17 AM
to moebius-proje...@googlegroups.com, Guillermo Adrián Molina
Hi folks,

I'm drawing from the very fruitful discussion with Guille and Igor, in
which we addressed *almost unsolvable* problems with cross-heap pointers
and GC in peninsulae-partitioned systems (thanks again for your patience
and contributions!).

I think good use cases come from the knowledge-processing community
(main algorithm and examples in [1], [2]), which naturally has a high
demand for n-core parallel work (many of their queries at once).

In such a system the main .image contains intrinsic (base) knowledge which
the independent peninsulae/heaps/threads can access but typically *not*
manipulate; their tasks and goals are "temporary" in character.

So all .images (mainland+peninsulae) can GC independently of each other,
with the following sole exception:

- before mainland can do full GC, it has to suspend all parallel threads

This is quite acceptable: the mainland .image handles the GUI and controls
the parallel work, and not much else.

So, mainland can do incremental GC whenever it pleases. Relative to the
SqueakVM, the following *small* changes are needed in ObjectMemory:

- the comparison for < youngSpace must also check >= startOfMemory
- peninsula GC must ignore cross-heap pointers
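To make the two ObjectMemory changes concrete, here is a small Python model; it is not Slang and not the actual Squeak/Hydra code. It assumes each heap occupies one contiguous address range [startOfMemory, endOfMemory) with old space below the youngSpace boundary (called young_start here). HeapBounds, mark_from_roots and the pointers_of accessor are hypothetical names. The old-space test gains its lower bound, and the peninsula's mark phase simply does not follow pointers that leave its own range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapBounds:
    """One heap (mainland or a peninsula) inside a shared address space.

    Hypothetical model: a heap is the contiguous range
    [start_of_memory, end_of_memory), with old space below young_start.
    """
    start_of_memory: int
    young_start: int      # the boundary the text calls youngSpace
    end_of_memory: int

    def owns(self, oop: int) -> bool:
        return self.start_of_memory <= oop < self.end_of_memory

    def is_old(self, oop: int) -> bool:
        # A single-heap VM could test 'oop < young_start' alone. Here an
        # address below start_of_memory belongs to some other heap, so the
        # lower bound must be checked as well.
        return self.start_of_memory <= oop < self.young_start

    def is_young(self, oop: int) -> bool:
        return self.young_start <= oop < self.end_of_memory


def mark_from_roots(heap, roots, pointers_of):
    """Mark phase of a peninsula-local GC that ignores cross-heap pointers.

    pointers_of(oop) is a hypothetical accessor returning the oops that
    oop refers to. References leaving this heap (e.g. into mainland's
    base knowledge) are neither marked nor traversed.
    """
    marked = set()
    stack = [r for r in roots if heap.owns(r)]
    while stack:
        oop = stack.pop()
        if oop in marked:
            continue
        marked.add(oop)
        for ref in pointers_of(oop):
            if heap.owns(ref):          # cross-heap pointer: skip it
                stack.append(ref)
    return marked
```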

No further cross-heap interaction is needed for this use case; Igor's
channels are sufficient for control and communication in a (massively)
parallel knowledge-processing system.

Cheers,
Klaus

P.S. recall that the citeseer site usually does not like being accessed
over weekends ...

[1] http://www.google.com/search?q=ConceptNet+Spreading+Activation

[2] http://www.google.com/search?q=Knowledge+Directed+Spreading+Activation

Igor Stasenko

Nov 15, 2008, 6:07:38 AM
to moebius-proje...@googlegroups.com, Guillermo Adrián Molina
2008/11/15 Klaus D. Witzel <klaus....@cobss.com>:
> So all .images (mainland+peninsulae) can GC independently of each other,
> with the following sole exception:
>
> - before mainland can do full GC, it has to suspend all parallel threads
>

This looks inevitable in all the models I have tried to imagine. :)
There is a need for a kind of barrier that divides the old heap state
from the new one, in which garbage has already been identified and
cleaned. And since the heap is a single one (in the global sense,
multiple heaps are just one heap split into parts), you must do it
synchronously.
The question, therefore, is how to minimize the suspension time, and
where to introduce the checkpoint for interrupts in thread execution
so that they are caught at some safe point.

--
Best regards,
Igor Stasenko AKA sig.

Klaus D. Witzel

Nov 15, 2008, 10:06:29 AM
to Moebius project discussion, Guillermo Adrián Molina
On Nov 15, 12:07 pm, "Igor Stasenko" wrote:
> 2008/11/15 Klaus D. Witzel wrote:
...
> > So all .images (mainland+peninsulae) can GC independently of each other,
> > with the following sole exception:
>
> > - before mainland can do full GC, it has to suspend all parallel threads
>
> This looks inevitable in all the models I have tried to imagine. :)

Right :) this is also a reason for bringing the earlier email discussion
up here.

> There is a need for a kind of barrier that divides the old heap state
> from the new one, in which garbage has already been identified and
> cleaned. And since the heap is a single one (in the global sense,
> multiple heaps are just one heap split into parts), you must do it
> synchronously.

Yes, this synchronicity seems inevitable. Was there anything about a new
n-core VM at OOPSLA?

> The question, therefore, is how to minimize the suspension time, and
> where to introduce the checkpoint for interrupts in thread execution
> so that they are caught at some safe point.

I'd say: let mainland hardware-interrupt the independent runners, then
wait for *all* ACKs, then GC, then resume the world.
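The suspend, wait-for-all-ACKs, GC, resume sequence can be sketched with a condition variable. This is a minimal Python model under stated assumptions: Python threads stand in for the native runners, and it swaps the hardware interrupt for a polled flag that each runner notices at its next safe point (the checkpoint approach discussed in this thread). WorldStopper, safe_point and stop_gc_resume are hypothetical names.

```python
import threading


class WorldStopper:
    """Stop-the-world handshake between mainland and n runner threads."""

    def __init__(self, n_runners):
        self.n_runners = n_runners
        self.cond = threading.Condition()
        self.stop_requested = False
        self.acks = 0
        self.epoch = 0                  # bumped once per resume

    # Runner side: called at every safe point (e.g. on each send).
    def safe_point(self):
        with self.cond:
            if not self.stop_requested:
                return                  # fast path: nothing pending
            epoch = self.epoch
            self.acks += 1              # ACK the stop request
            self.cond.notify_all()      # wake mainland, it counts ACKs
            while self.epoch == epoch:  # park until the world resumes
                self.cond.wait()

    # Mainland side: request the stop, wait for *all* ACKs, GC, resume.
    def stop_gc_resume(self, full_gc):
        with self.cond:
            self.stop_requested = True
            self.cond.wait_for(lambda: self.acks == self.n_runners)
            full_gc()                   # every runner is parked right now
            self.stop_requested = False
            self.acks = 0
            self.epoch += 1
            self.cond.notify_all()      # resume the world
```

The epoch counter lets a parked runner tell a real resume from a spurious wakeup, and because mainland holds the lock for the whole GC, no runner can leave safe_point() until the collection is done. A runner stuck in native or foreign code never reaches safe_point(), so mainland would wait forever; that is exactly the complication raised in the next reply.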

> > Cheers,
> > Klaus
...

Igor Stasenko

Nov 15, 2008, 2:31:41 PM
to moebius-proje...@googlegroups.com, Guillermo Adrián Molina
2008/11/15 Klaus D. Witzel <klaus....@cobss.com>:
>
> On Nov 15, 12:07 pm, "Igor Stasenko" wrote:
>> 2008/11/15 Klaus D. Witzel wrote:
> ...
>> > So all .images (mainland+peninsulae) can GC independently of each other,
>> > with the following sole exception:
>>
>> > - before mainland can do full GC, it has to suspend all parallel threads
>>
>> This looks inevitable in all the models I have tried to imagine. :)
>
> Right :) this is also a reason for bringing the earlier email discussion
> up here.
>
>> There is a need for a kind of barrier that divides the old heap state
>> from the new one, in which garbage has already been identified and
>> cleaned. And since the heap is a single one (in the global sense,
>> multiple heaps are just one heap split into parts), you must do it
>> synchronously.
>
> Yes, this synchronicity seems inevitable. Was there anything about a new
> n-core VM at OOPSLA?
>

I contacted Dave, who presented his work on running Squeak on Tilera
chips (I also watched the video).
From what I understood, he is using a kind of shared heap, where each
oop can be associated with a different core.
So it's different from the way Hydra does it.
Btw, his implementation requires a world freeze during the GC scavenge phase as well.

>> The question, therefore, is how to minimize the suspension time, and
>> where to introduce the checkpoint for interrupts in thread execution
>> so that they are caught at some safe point.
>
> I'd say: let mainland hardware-interrupt the independent runners, then
> wait for *all* ACKs, then GC, then resume the world.
>

Well, this introduces some complexity. You can't know where the interrupt
will happen:
- in ST code (good case)
- in native code (not so good)
- in an OS/foreign function (awful case)

This is why I think there should be a checkpoint inserted at a safe
position in the code (most probably it can be safely put into the method
lookup/dispatch routine) to check whether there are any outstanding events.
Then, if any thread stops, we will know where it stopped and can make
assumptions about its current stack/register state.
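A toy version of such a checkpoint, assuming a dispatch routine written in Python rather than the real VM's lookup code: every send first drains the pending events, so a stop request (or any other event) is only ever honored between two sends, where the interpreter state is known. Interpreter, check_for_events and pending_events are hypothetical names.

```python
class Interpreter:
    """Toy interpreter with a safe-point check in the send/lookup path."""

    def __init__(self, methods):
        self.methods = methods        # selector -> callable(interp, *args)
        # Filled by other threads. A real VM would poll an atomic flag
        # here instead of a Python list.
        self.pending_events = []
        self.handled = []

    def check_for_events(self):
        # Safe point: we are between two sends, so the stack/register
        # state is known and a GC (or other event) may safely run now.
        while self.pending_events:
            self.handled.append(self.pending_events.pop(0))

    def send(self, selector, *args):
        self.check_for_events()       # checkpoint on every lookup/dispatch
        method = self.methods[selector]
        return method(self, *args)
```

Putting the check into dispatch keeps it off the fast arithmetic paths while still guaranteeing that any Smalltalk code reaches a safe point within one send.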

What I think is one of the best features of the green threading model
(even though it runs on a single core) is cheap parallelism: you can
have dozens of Processes, each doing its own job. Clearly, if we
make Process == native thread, we will lose the 'cheap' part, because
instantiating a new native thread consumes a considerable amount of
address space for its stack as well as hidden OS-dependent state, while
a Squeak Process is only a few bytes long :)
This is why I am against making Process == native thread.

Klaus D. Witzel

Nov 15, 2008, 3:25:09 PM
to Moebius project discussion, gui...@losmolina.com.ar
On Nov 15, 8:31 pm, Igor Stasenko wrote:
> 2008/11/15 Klaus D. Witzel wrote:
> > ...
> >> > So all .images (mainland+peninsulae) can GC independently of each other,
> >> > with the following sole exception:
>
> >> > - before mainland can do full GC, it has to suspend all parallel threads
>
> >> This looks inevitable in all the models I have tried to imagine. :)
>
> > Right :) this is also a reason for bringing the earlier email discussion
> > up here.
>
> >> There is a need for a kind of barrier that divides the old heap state
> >> from the new one, in which garbage has already been identified and
> >> cleaned. And since the heap is a single one (in the global sense,
> >> multiple heaps are just one heap split into parts), you must do it
> >> synchronously.
>
> > Yes, this synchronicity seems inevitable. Was there anything about a new
> > n-core VM at OOPSLA?
>
> I contacted Dave, who presented his work on running Squeak on Tilera
> chips (I also watched the video).
> From what I understood, he is using a kind of shared heap, where each
> oop can be associated with a different core.
> So it's different from the way Hydra does it.

I wonder how Dave would handle parallel allocation with *one* shared heap
(Guille's main argument for multiple heaps is the performance pressure
on malloc).
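The allocation argument can be illustrated with two bump-pointer allocators, used here as a stand-in for malloc; this is a minimal sketch, not Squeak's or Hydra's actual allocator. One allocator is private to a single peninsula and needs no synchronization; the other is shared by all cores and must take a lock on every allocation. BumpHeap and SharedHeap are hypothetical names, and addresses are plain integers.

```python
import threading


class BumpHeap:
    """Private heap: owned by one thread, so allocation needs no lock."""

    def __init__(self, start, size):
        self.free = start
        self.limit = start + size

    def alloc(self, nbytes):
        if self.free + nbytes > self.limit:
            return None               # heap full: time for a local GC
        oop = self.free
        self.free += nbytes           # bump the free pointer
        return oop


class SharedHeap(BumpHeap):
    """One heap shared by all cores: every allocation must synchronize."""

    def __init__(self, start, size):
        super().__init__(start, size)
        self.lock = threading.Lock()

    def alloc(self, nbytes):
        with self.lock:               # contention point for n cores
            return super().alloc(nbytes)
```

With one heap per peninsula, the fast path is a compare and an add; with one shared heap, every core serializes on the same lock (or needs a cleverer lock-free scheme).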

> Btw, his implementation requires a world freeze during the GC scavenge phase as well.

Arrg.

> >> The question, therefore, is how to minimize the suspension time, and
> >> where to introduce the checkpoint for interrupts in thread execution
> >> so that they are caught at some safe point.
>
> > I'd say: let mainland hardware-interrupt the independent runners, then
> > wait for *all* ACKs, then GC, then resume the world.
>
> Well, this introduces some complexity. You can't know where the interrupt
> will happen:
> - in ST code (good case)
> - in native code (not so good)
> - in an OS/foreign function (awful case)

Nah, async I/O faces this situation as well. A good solution for async
I/O implies a good solution for suspending the world during global GC.

> This is why I think there should be a checkpoint inserted at a safe
> position in the code (most probably it can be safely put into the method
> lookup/dispatch routine) to check whether there are any outstanding events.
> Then, if any thread stops, we will know where it stopped and can make
> assumptions about its current stack/register state.

Sure.

> What I think is one of the best features of the green threading model
> (even though it runs on a single core) is cheap parallelism: you can
> have dozens of Processes, each doing its own job. Clearly, if we
> make Process == native thread, we will lose the 'cheap' part, because
> instantiating a new native thread consumes a considerable amount of
> address space for its stack as well as hidden OS-dependent state, while
> a Squeak Process is only a few bytes long :)
> This is why I am against making Process == native thread.

You're right. Then how about this: #primitiveCreateAndRunNativeThread
takes these arguments:

- a bytearray oop for the new heap
- a context for what is to be done in parallel

and then proceeds with

- #clone special objects table
- #clone Scheduler things
- do what a regular #fork does to the block
- do this all in the bytearray heap

This should be feasible given Hydra's objectified interpreter
instance, which would be the first object allocated in the bytearray
heap.
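A Python sketch of the proposed primitive's steps, under heavy assumptions: the bytearray only serves as the size budget of the new heap (Python objects are not really laid out inside it), the object sizes are made up, and a Python thread stands in for the native thread. Only #primitiveCreateAndRunNativeThread is named in the proposal; PeninsulaHeap, primitive_create_and_run_native_thread and the size constants are hypothetical.

```python
import threading

# Made-up object sizes, for illustration only.
INTERPRETER_BYTES = 64
SLOT_BYTES = 8
SCHEDULER_BYTES = 32


class PeninsulaHeap:
    """Heap carved out of a caller-supplied bytearray (used as a budget)."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.free = 0
        self.objects = []             # allocation order within the heap

    def allocate(self, obj, nbytes):
        if self.free + nbytes > len(self.buffer):
            raise MemoryError("peninsula heap exhausted")
        self.free += nbytes
        self.objects.append(obj)
        return obj


def primitive_create_and_run_native_thread(buffer, block, special_objects, scheduler):
    heap = PeninsulaHeap(buffer)
    # 1. The objectified interpreter instance is the first object in the heap.
    interp = heap.allocate({"heap": heap}, INTERPRETER_BYTES)
    # 2. #clone the special objects table into the new heap.
    interp["specialObjects"] = heap.allocate(
        list(special_objects), SLOT_BYTES * len(special_objects))
    # 3. #clone the Scheduler state.
    interp["scheduler"] = heap.allocate(dict(scheduler), SCHEDULER_BYTES)
    # 4. Do what #fork does with the block, but on its own native thread.
    outcome = {}

    def run():
        outcome["value"] = block()

    thread = threading.Thread(target=run)
    thread.start()
    return interp, thread, outcome
```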

Igor Stasenko

Nov 15, 2008, 5:33:40 PM
to moebius-proje...@googlegroups.com, gui...@losmolina.com.ar
2008/11/15 Klaus D. Witzel <klaus....@cobss.com>:
>

Yeah, I have to get back to it, to implement this primitive and try
creating a child heap and playing with it. There is nothing
complicated there; the problem is not the actual implementation, but
how to detect unwanted stuff referenced by various objects as we
traverse the graph from the most important objects, which are required
for our computation, to the less important ones.
I'm not very happy that, to create a separate object memory, we need
to transfer not only data but behavior as well (since in Smalltalk
everything is data). Some kind of sharing would be very useful here.
For instance, VW has fixed memory; this could be a place where most
methods could live, and it's safe to use them concurrently.

Another approach is the distributed model Craig built in Spoon, using a
marking algorithm that marks all methods as they are executed. This way
he can determine a likely set of behavior to transfer to another domain
to make the application work. Of course it can't detect all possible
branches, especially for applications that expect user interaction,
but it catches at least the most frequently used ones.
The rest of the behavior could be 'imprinted' on demand, as soon as
there is any attempt to use it.
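The mark-then-imprint idea can be modeled in a few lines. This is a toy Python sketch, not Spoon's implementation: a training run marks every method it actually executes, only the marked methods are installed in the new domain, and any other method is fetched ("imprinted") from the source domain the first time it is sent. MarkingRunner and ImprintingRunner are hypothetical names.

```python
class MarkingRunner:
    """Training run: records every method that is actually executed."""

    def __init__(self, methods):
        self.methods = methods        # selector -> callable(runner, *args)
        self.marked = set()

    def send(self, selector, *args):
        self.marked.add(selector)     # mark the method as used
        return self.methods[selector](self, *args)


class ImprintingRunner:
    """Target domain: starts with the marked methods only."""

    def __init__(self, installed, source):
        self.installed = dict(installed)  # behavior transferred up front
        self.source = source              # domain to fetch the rest from
        self.imprinted = []

    def send(self, selector, *args):
        if selector not in self.installed:
            # Missing behavior: imprint it on first use.
            self.installed[selector] = self.source[selector]
            self.imprinted.append(selector)
        return self.installed[selector](self, *args)
```

Branches the training run never took (here, anything behind user interaction) simply cost one fetch the first time they are reached.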
