OGC2 - major improvement of optional garbage collector for C++

Mirek Fidler

unread,

Nov 9, 2002, 3:01:24 AM11/9/02

to

Hi,

I achieved some major improvements in my attempt to create library
based
solution for non-conservative optional garbage collector.

Original version (presented 2 weeks ago) is at

http://www.volny.cz/cxl

New version (implementation for 32-bit flat pointer architectures)
with
small example can be found at

http://www.volny.cz/cxl/ogc2/

It has following improvements:

* it is much faster (3 - 5 times). Currently, it is almost as fast
as
Boehm's conservative GC. There is only one remaining performance problem
-
sweep phase - that is naturally slower for C++ optional collector than
for
non-optional one (as we need to call destructors immediately).

* sizeof(gc_ptr<T>) was reduced from 12 to 8 bytes (on 32 bit
platform)
while increasing gc_ptr performance (gc_ptr destruction times were
significantly reduced).

One drawback:

* So far I have not implemented managment of large blocks (>1024). I
already know how to do it, but I had no time and energy do write the
actual
code.

I think this GC attemtp is already as good that it could serve well
as
(anti)argument in discussion about adding language based GC to C++. From
my
current perspective, optional GC could be effectively implemented using
library. Also, non-optional one will never work well with destructors.

Mirek

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Larry Evans

unread,

Nov 9, 2002, 4:01:54 PM11/9/02

to

Mirek Fidler wrote:
> I achieved some major improvements in my attempt to create library
> based
> solution for non-conservative optional garbage collector.
>

[snip]

> New version (implementation for 32-bit flat pointer architectures)
> with
> small example can be found at
>
> http://www.volny.cz/cxl/ogc2/
>

Code comments would help me understand it better. I'm guessing that:

1)BlockInfo is a "type descriptor" somewhat like GC_descr in:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc_typedh.txt
in that the BlockInfo::ptrs allow enumeration of the "internal pointers"
[as defined on p. 23 of
ftp://ftp.cse.ucsc.edu/pub/tr/ucsc-crl-93-20.ps.Z ]

This conclusion is based on looking at while(p) loop at line 51 of
gc_collect.cpp.

2)GC::FlushArena, defined in gc_ptr.cpp, calculates the
BlockInfo::ptrs as well as the root pointers.

3)The complexity of GC::FlushArena is O(A) where A is the size of
the arcs in the pointers graph. This is based on the assumption
that GC::Arena GC::arena contains pointers to all the pointers
(i.e. arcs). The constant factor is the constant time to execute
FindBlockInfo (obviously a vast improvement over FindBlock in ogc)

Are 1-3) correct?

Also, based on a cursory look long ago at the way allocation is done
in BGC and also attardi et.al's cmm
[ftp://ftp.di.unipi.it/pub/project/posso/cmm], it looks like OGC2 uses
some of the same methods. What about merging OGC2 with cmm or BGC?
That would take advantage of the work already done with those
collectors as well as yours. I hacked on cmm's conservative collector
long ago to get it to enumerate the vertices (i.e. base pointers[
http://www.memorymanagement.org/glossary/b.html#base.pointer ]) so I
could sweep them once I'd used Detlef's method to get the internal
pointers and my own template method to define the roots. If you want
it, I could post it in boost files section. However, there was some
dissent about this "merging" by some boosters.

Furthermore, OGC2 memory requirements are V+A to keep track of the
vertices(in GC::PageInfo** GC::page?) as well as arcs(in GC::RawPtr**
GC::arena). The V is unavoidable, I think, since mark-sweep must
somehow access all the vertices when sweeping. That leaves the A
term. Now OGC2 nees the A to calculate the internal pointers(IA for
"internal arcs") as well as root pointers (RA for root arcs), where
IA*RA=0 and IA+RA=A (where * is set intersection and + set union).
Since Detlef's method calculates this once before main, this would
eliminate this memory need for calculating IA as well as the "if(q)
then block" in GC::FlushArena. This still requires A for filtering
(in the "if(q) else block") the RA from the A's. However, this also
means that once this filter occurs, there's no longer any need for the
IA to be stored in GC::arena since Detlef's method stores them as part
of a class variable. Thus, using Detlef's method would reduce the
memory and time complexity by just finding roots as follows:

First, define V = RV+IV, where RV is the stack and static
memory areas and IV is the heap.

RA = GC::arena;
foreach vertex in IV
{ foreach internal arc, ia, of vertex //using Detlef method
{ rm ia from RA
;}
;}

The only reason GC::arena would acquire an IA would be because
GC::RawPtr::RawPtr() was called for some vertex in IV.

Does this makes sense? Also, what about moving this thread to
gcl...@iecc.com since it's getting pretty specific for c++?
---

Mirek Fidler

unread,

Nov 10, 2002, 7:32:17 PM11/10/02

to

> Code comments would help me understand it better. I'm guessing that:

Sorry. I plan to write some description as I did with previous
version...

> 1)BlockInfo is a "type descriptor" somewhat like GC_descr in:
>
http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc_typedh.txt

BlockInfo contains information about finzalizer (destructor) and
internal pointers.

> 2)GC::FlushArena, defined in gc_ptr.cpp, calculates the
> BlockInfo::ptrs as well as the root pointers.

Right. Note one important thing - using arena avoids examinination of
most temporary pointers. In simple syntetic benchmark I did, average 50% of
pointers in arena died before FlushArena was called. In real world examples,
this could be even more.

> 3)The complexity of GC::FlushArena is O(A) where A is the size of
> the arcs in the pointers graph. This is based on the assumption
> that GC::Arena GC::arena contains pointers to all the pointers
> (i.e. arcs). The constant factor is the constant time to execute
> FindBlockInfo (obviously a vast improvement over FindBlock in ogc)

Right.

> Also, based on a cursory look long ago at the way allocation is done
> in BGC and also attardi et.al's cmm
> [ftp://ftp.di.unipi.it/pub/project/posso/cmm], it looks like OGC2 uses
> some of the same methods.

Yes. Separate heap allocator is inspired by BGC's one. I only simplified
it (and perhaps improved a little bit :).

> What about merging OGC2 with cmm or BGC?
> That would take advantage of the work already done with those
> collectors as well as yours.

Oh, why ? Merging would took significantly more development time...
Also, BGC is now about 100KB of sources. I really do not want to be as big
:) Note also that this is still rather experiment than real production code.
Anyway, I can turn it to production quality code if there is any interest in
C++ community...

> Furthermore, OGC2 memory requirements are V+A to keep track of the
> vertices(in GC::PageInfo** GC::page?)

Not exactly. OGC2 uses 16 byte descriptor per each 1024 bytes page
(that is PageInfo). Plus it adds 8 bytes per allocated block - 4 bytes are
used as pointer to finalizer (unavoidable), 4 bytes are used as link to
internal pointers. Since you have to align memory blocks at 8 bytes
boundaries anyway, block info costs practically nothing. Then there are
about 6 bytes per 1024 bytes allocated (PageInfo **page), sadly this applies
also to memory managed by normal heap (page contains NULL for such page).
Anyway, practically this is not an issue.

Plus, there is additional memory requirement (comapared to BGC) caused
by fact, that all gc_ptrs are twice as big as normal ones.

> as well as arcs(in GC::RawPtr**
> GC::arena). The V is unavoidable, I think, since mark-sweep must
> somehow access all the vertices when sweeping.

Right.

> That leaves the A
> term. Now OGC2 nees the A to calculate the internal pointers(IA for
> "internal arcs") as well as root pointers (RA for root arcs), where
> IA*RA=0 and IA+RA=A (where * is set intersection and + set union).

In benchmark I did, time consumption is

TIMING Sweep : 10.445 s / 145 =72.034 ms
TIMING Mark : 11.638 s / 145 =80.262 ms
TIMING FlushArena : 1.595 s / 92173 =17.209 us

So most time is spend in mark and sweep phase. FlushArena is not an
issue. This is most likely due to fact that when FlushArena is triggered,
most block pointers are still in CPU cache, so managing them is much faster
than during mark-phase.

> Since Detlef's method calculates this once before main, this would
> eliminate this memory need for calculating IA as well as the "if(q)
> then block" in GC::FlushArena. This still requires A for filtering
> (in the "if(q) else block") the RA from the A's. However, this also
> means that once this filter occurs, there's no longer any need for the
> IA to be stored in GC::arena since Detlef's method stores them as part
> of a class variable. Thus, using Detlef's method would reduce the
> memory and time complexity by just finding roots as follows:
>
> First, define V = RV+IV, where RV is the stack and static
> memory areas and IV is the heap.
>
> RA = GC::arena;
> foreach vertex in IV
> { foreach internal arc, ia, of vertex //using Detlef method
> { rm ia from RA
> ;}
> ;}
>
> The only reason GC::arena would acquire an IA would be because
> GC::RawPtr::RawPtr() was called for some vertex in IV.
>
> Does this makes sense? Also, what about moving this thread to

This approach has one big problem: In optional GC, static and stack are
not only areas that can contain pointers. You can have them also in heap
memory that is not managed by GC. So to determine where pointer is would be
as complex as FindBlockInfo is now anyway.

Let me also note one important thing: While there is a lot of smart
ideas how to create high-performance GC, none of them can be judged before
some benchmarks are run.

I have not seen any benchmarks of all those nice and smart GC ideas that
were in boost libraries. Before that, I do not believe that any smart
management would lead to real performance improvements.

For my development, I take BGC as reference simply because it is fastest
solution currently available. My mark phase is already as good as in BGC
(perhaps better). Only remaining problem is sweep phase - and there main
problem is that C++ needs to call destructors - that makes me still somewhat
slower.

> gcl...@iecc.com since it's getting pretty specific for c++?

Well, all this stuff was started with discussion about adding GC to
C++... So I think this is somewhat hot topic for C++...

Mirek

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated. To submit articles, try just posting with ]

Larry Evans

unread,

Nov 10, 2002, 7:53:45 PM11/10/02

to

Larry Evans wrote:
> Mirek Fidler wrote:
[snip]

> Furthermore, OGC2 memory requirements are V+A to keep track of the
> vertices(in GC::PageInfo** GC::page?) as well as arcs(in GC::RawPtr**
> GC::arena). The V is unavoidable, I think, since mark-sweep must
> somehow access all the vertices when sweeping. That leaves the A
> term. Now OGC2 nees the A to calculate the internal pointers(IA for
> "internal arcs") as well as root pointers (RA for root arcs), where
> IA*RA=0 and IA+RA=A (where * is set intersection and + set union).

[snip]

> of a class variable. Thus, using Detlef's method would reduce the
> memory and time complexity by just finding roots as follows:
>
> First, define V = RV+IV, where RV is the stack and static
> memory areas and IV is the heap.
>
> RA = GC::arena;
> foreach vertex in IV
> { foreach internal arc, ia, of vertex //using Detlef method
> { rm ia from RA
> ;}
> ;}
>
> The only reason GC::arena would acquire an IA would be because
> GC::RawPtr::RawPtr() was called for some vertex in IV.
>

Thinking a little more about it, there's no reason to test the ia of
vertices more than once; hence, keeping two sets of vertices, old_IV
and new_IV, where new_IV are the vertices created since the last
execution of the above loop, would be all that needed to be processed
during the next collection. Hence:

RA = GC::arena;
foreach vertex in new_IV

{ foreach internal arc, ia, of vertex //using Detlef method
{ rm ia from RA
;}
;}

old_IV+=new_IV;
new_IV=null_set;

So, the time complexity for updating RA would be the number of arcs
contained by the vertices in new_IV.

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated. To submit articles, try just posting with ]

Larry Evans

unread,

Nov 10, 2002, 11:16:38 PM11/10/02

to

Mirek Fidler wrote:
[snip]

>> 2)GC::FlushArena, defined in gc_ptr.cpp, calculates the
>> BlockInfo::ptrs as well as the root pointers.
>

>>What about merging OGC2 with cmm or BGC?
>>That would take advantage of the work already done with those
>>collectors as well as yours.
>
>
> Oh, why ? Merging would took significantly more development time...
> Also, BGC is now about 100KB of sources. I really do not want to be as big
> :) Note also that this is still rather experiment than real production code.
> Anyway, I can turn it to production quality code if there is any interest in
> C++ community...

Sounds good. I tried to understand BGC and gave up. I then tried cmm
and had more success. OGC2 looks a lot more manageable. Charles
Fiterman one time urged me to try Baker's treadmill algorithm (a
non-moving copying collector), and maybe by modifying your code I'd
have a chance to implement it. Thanks.

[snip]

>
>>as well as arcs(in GC::RawPtr**
>>GC::arena). The V is unavoidable, I think, since mark-sweep must
>>somehow access all the vertices when sweeping.
>
>
> Right.
>
>
>> That leaves the A

>>term. Now OGC2 needs the A to calculate the internal pointers(IA for

I disagree. The advantage of Detlefs method is that the internal pointer
offsets for a specific type, call it Subject, are calculated only once
(see:
mk_internal_pointers_descriptor_of<Subject>::mk_internal_pointers_descriptor_of(void)
in:
http://groups.yahoo.com/group/boost/files/shared_cyclic_ptr/shared_cyclic_ptr.zip
in file boost/shared_cyclic_ptr/mk_internal_pointers_descriptor_of.hpp
)
It doesn't matter whether Subject is on the stack or in the heap, the
offsets are the same.

I'm proposing that the ONLY change to OGC2 code to take advantage of
Detlefs method is to replace the GC::FlushArena with the equivalent of
the above pseudo code and the BlockInfo::ptrs with a pointer to:
mk_internal_pointers_descriptor_of<Subject>::c_my_type.m_ip_descriptor
and make the appropriate changes to the loop after the:
"Marking root pointed block "
block in void GC::Collect().

> Let me also note one important thing: While there is a lot of smart
> ideas how to create high-performance GC, none of them can be judged before
> some benchmarks are run.

True, I'll but together the code and run a benchmark.
---

Mirek Fidler

unread,

Nov 11, 2002, 1:09:58 PM11/11/02

to

> > The only reason GC::arena would acquire an IA would be because
> > GC::RawPtr::RawPtr() was called for some vertex in IV.
> >
> Thinking a little more about it, there's no reason to test the ia of
> vertices more than once; hence, keeping two sets of vertices, old_IV
> and new_IV, where new_IV are the vertices created since the last
> execution of the above loop, would be all that needed to be processed
> during the next collection. Hence:
>
> RA = GC::arena;
> foreach vertex in new_IV
> { foreach internal arc, ia, of vertex //using Detlef method
> { rm ia from RA
> ;}
> ;}
> old_IV+=new_IV;
> new_IV=null_set;
>
> So, the time complexity for updating RA would be the number of arcs
> contained by the vertices in new_IV.

Of course. But this is already implemented in OGC2. I have to write full
description of current code.... In fact, new_IV is represented by arena.
old_IV is represented by 'ptrs' member of block info. And there is 'roots'
that represents root pointers.

I have some minor improvements on my mind (mostly to improve sweep
phase). Expect somewhat faster OGC2.1 in a week or so :)

Mirek
---

Mirek Fidler

unread,

Nov 11, 2002, 2:12:48 PM11/11/02

to

Yes. But problem is that you still have to perform some form of
examination to filter out root pointers. Complexity of such examination will
be most likely same as complexity of FindBlockInfo.

Mirek

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated. To submit articles, try just posting with ]

Larry Evans

unread,

Nov 13, 2002, 6:12:59 PM11/13/02

to

Mirek Fidler wrote:
>> > This approach has one big problem: In optional GC, static and stack

[snip]

>
> Yes. But problem is that you still have to perform some form of
> examination to filter out root pointers. Complexity of such examination will
> be most likely same as complexity of FindBlockInfo.
>

I thought only heap objects were placed in IV; hence, any pointers contained
by members of IV wouldn't be root pointers. Have I misunderstood something?

Some might think a more direct way for determining the root pointers would be
to check, in GC::RawPtr::RawPtr(), whether this is on the stack in the same way
that BGC checks whether something is on the stack; however, this would fail in
case of:
vector<gc_ptr<T> > a_vec;
since, even though each gc_ptr<T> is NOT on the stack, it should be considered
a root pointer.
---

John Nagle

unread,

Nov 13, 2002, 6:13:17 PM11/13/02

to

Mirek Fidler wrote:

> BlockInfo contains information about finzalizer (destructor) and
> internal pointers.

Please don't confuse finalizers and destructors. They
have very different semantics. Forgetting this is punishable
by race conditions.

John Nagle
Animats
---