An accurate, pauseless & deterministic GC for C++

65 views
Skip to first unread message

Mingnan G.

unread,
Nov 17, 2007, 8:11:46 PM11/17/07
to
The HnxGC Libraray - an accurate, pauseless & deterministic garbage
collector for C++ application, is released at

http://hnxgc.harnixworld.com

(MD5: b96a83d0899317e732abb46a5e1f2dd5)

Features include:

Accurate - HnxGC is a fully accurate tracing garbage collector. We
don't use any tricks to guess/distinguish pointers from other data.
All effective references are identified. There is no leak of garbage
objects uncollected and all associated native resources will be
freed.

Pauseless - HnxGC come with a lock-free and block-free concurrent
collector with no need to scan root-set area. We don't need to take a
snapshot to determine application status, thus the normal execution of
application thread will never be interrupted, suspended, blocked or
forced to spin-waiting by our collector, no matter in a multi-
processors or an uni-processor environment.

Deterministic - HnxGC always executes the destructor of an object when
application program drops the last reference to the object. In
addition, application program can declare dependence rules for each
object to control the destructing ordering of cyclic garbage objects.
The system will figure out an ordering that satisfies all user-
declared rules. Thus, application programmers have more controls on
how unreachable objects are reclaimed, more compliance to RAII
(Resource Acquisition Is Initialization) design pattern, such as
closing file handle or network connection in the destructor routine of
objects.

Efficient - Even HnxGC provide deterministic reclamation feature
similiar to the one in regular reference counting algorithm, we remove
the cost of reference maintenance significantly. The cost of most-
frequently-used reference copying, such as passing references as
parameters or return values between function calls, are removed. More
advanced technique, such as GlobalMemoryFence, are used to eliminate
the memory ordering cost for a multi-processors platforms.

Easy-to-Use and Portable - Just linking with our library, you can
enjoy all of above features and the more. You can even use our library
with .NET managed code in visual C++. There is no too much
restriction, you can use template, union, bit-fields, multiple-
inheritance, object composition, and array of objects in HnxGC as
efficiency as in traditional C++. The source code is available, and
the only required for your application is a standard C++ compiler,
such as Visual C++ or GNU C++. No use of any special features of any
specific compiler, nor any platform specific system calls. Can even
run without virtual memory support. Write your garbage-collected C++
application once, and it is easy to port to a wider array of
platforms, no constraint to any specific platform vendors.

Juha Nieminen

unread,
Nov 18, 2007, 10:20:15 AM11/18/07
to
Mingnan G. wrote:
> http://hnxgc.harnixworld.com

"The author of HnxGC has filed several patents of HnxGC in the United
States and other countries. Whoever without authority makes, uses,
offers to sell, or sells any of these patented invention, within the
United States and/or other countries infringes the patent(s)."

You are claiming patent protection for something which has not been
granted a patent yet?

Besides, you should really acquaint yourself about how patents work
outside the US. You just can't patent something in the US and then claim
that using the algorithm outside the US infringes some patent law. Most
other countries do not have software patents, do not respect software
patents of other countries, and above all, even if it was a valid
patent, they allow the *use* of such patents without express permission
(even if distributing products using the patent is prohibited). Just
claiming that "you can't use this without permission" has no legal
validity. You are claiming more rights than you really have.

Good thing you didn't claim copyright prohibits people from using your
algorithms, as algorithms do not fall into the realm of copyright.

Joe Seigh

unread,
Nov 18, 2007, 10:55:11 AM11/18/07
to

CC'ing c.p.t

I don't know what the GlobalMemoryFence is yet since that part is not
filled in yet. Generally with shared pointers you need acquire/release
semantics which can't be elided.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh

unread,
Nov 18, 2007, 11:03:05 AM11/18/07
to
(sorry for the repost but it probably would help if I actually included c.p.t)

CC'ing c.p.t

osmium

unread,
Nov 18, 2007, 11:11:13 AM11/18/07
to
"Juha Nieminen" wrote:

You seem to have some hostility problems with the patent system, nothing
wrong with that. However:

It is, and has always been, standard in the USA to warn people that a patent
has been applied for. I think the lack of such a warning was a part of the
LZW fiasco a few years ago.

The OPs mention of "other countries" instead of saying "the planet" contains
a clear implication that he is speaking of *some* other countries.

Using number of countries (most) is a pitiful metric. It equates, say,
Sierra Leone with Germany. Germany is more likely to produce patents by
their own people and are thus interested in the whole notion of patents..
As I understand it, the Vatican is a country, and so is France - why treat
them the same?


Chris Thomasson

unread,
Nov 18, 2007, 2:50:40 PM11/18/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:ZmZ%i.2695$Jy1.83@trndny02...

> (sorry for the repost but it probably would help if I actually included
> c.p.t)
>
> Mingnan G. wrote:
>> The HnxGC Libraray - an accurate, pauseless & deterministic garbage
>> collector for C++ application, is released at
>>
>> http://hnxgc.harnixworld.com
>>

[...]

Have you addressed the concerns I had:

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/95007ffdf246d50e

?

Chris Thomasson

unread,
Nov 18, 2007, 2:51:57 PM11/18/07
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:TcOdnUISFt2BC93a...@comcast.com...

Are you still using smart-pointers?

Chris Thomasson

unread,
Nov 18, 2007, 2:56:05 PM11/18/07
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:TcOdnUISFt2BC93a...@comcast.com...

AFAICT, it seems like your using an atomic reference counting algorithm with
smart pointers. Do you need to use interlocked instructions to modify the
count? How does you reference counting technique get around prior art? I
assume that you did a full-blown patent search/feasibility check...

Erik Wikström

unread,
Nov 18, 2007, 3:47:08 PM11/18/07
to
On 2007-11-18 17:11, osmium wrote:
> "Juha Nieminen" wrote:
>
>> Mingnan G. wrote:
>>> http://hnxgc.harnixworld.com
>>
>> "The author of HnxGC has filed several patents of HnxGC in the United
>> States and other countries. Whoever without authority makes, uses,
>> offers to sell, or sells any of these patented invention, within the
>> United States and/or other countries infringes the patent(s)."

>> Besides, you should really acquaint yourself about how patents work


>> outside the US. You just can't patent something in the US and then claim
>> that using the algorithm outside the US infringes some patent law. Most
>> other countries do not have software patents, do not respect software
>> patents of other countries, and above all, even if it was a valid
>> patent, they allow the *use* of such patents without express permission
>> (even if distributing products using the patent is prohibited). Just
>> claiming that "you can't use this without permission" has no legal
>> validity. You are claiming more rights than you really have.

> The OPs mention of "other countries" instead of saying "the planet" contains

> a clear implication that he is speaking of *some* other countries.

Actually, if I say "within the United States and/or other countries" I
mean *all* countries, including the US, and I think that my
interpretation is quite common. This however is totally off-topic in
this group so I will say no more on the subject.

--
Erik Wikström

Roland Pibinger

unread,
Nov 18, 2007, 4:43:04 PM11/18/07
to
On Sat, 17 Nov 2007 17:11:46 -0800 (PST), "Mingnan G." wrote:
>The HnxGC Libraray - an accurate, pauseless & deterministic garbage
>collector for C++ application, is released
[...]

>Deterministic - HnxGC always executes the destructor of an object when
>application program drops the last reference to the object.

Apart from the commercial background and apart from the apparent
contradiction in 'deterministic garbage collector' this library, IMO,
touches some interesting aspects:
A future C++-like language should have deterministic destruction but
no 'delete' operator. It's mostly a task for the compiler to determine
the right scope for destruction, e.g. by escape analysis (to a lesser
extent some runtime mechanism (reference counting?) needs to be
involved). Moreover, there should be no difference in syntax between
creating objects on the 'stack' and creating them on the 'heap'. This
distinction becomes an unimportant implementation detail for the
programmer.


--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch

Chris Thomasson

unread,
Nov 18, 2007, 10:56:36 PM11/18/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:ZmZ%i.2695$Jy1.83@trndny02...
> (sorry for the repost but it probably would help if I actually included
> c.p.t)
>
> Mingnan G. wrote:
>> The HnxGC Libraray - an accurate, pauseless & deterministic garbage
>> collector for C++ application, is released at
>>
>> http://hnxgc.harnixworld.com

Why not use a simple lock-free proxy garbage collector (e.g., PDR):

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/82376b7295e6ce1a

http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=PDR

? Here is a free example:

http://home.comcast.net/~vzoom/demos/pc_sample.c


This uses interlocked operations for extracting the necessary epoch
information, but its way better then rw-mutexs for protected read-size proxy
collected region. How do you beat RCU read-side performance characteristics
in general?

The proxy method has its advantages, such as the ability to plain/raw atomic
loads and stores w/ dependant load memory-barriers to access the shared
data-structures; no interlocking and #StoreLoad barriers required for
read-side critical regions. Please note that the membar mentioned is a NOP
on virtually every system out there, except the Alpha architecture:

https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068

https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?messageID=11001&forumID=1797


The proxy collector allows the writers to use many different techniques,
including but definitely not limited to POSIX mutexs and lock-free
algorithms. I would consider STM to be a candidate for the writer side of a
reader pattern; its about the only thing that STM is useful for, IMHO at
least. Solving the deadlocking problem can be done with correct locking
pattern; STM is not needed for that...

How does your stuff get around prior art? Please inform us when your patent
application is published. I would like to read it as I am interested in the
way you algorithms works. I like implementation details. You mentions that
you do not make any systems calls? You get around using libc and POSIX API
correct? You don't need assembly language to do the atomic/membar stuff? I
am interested in your implementation details indeed.


Message has been deleted

Mingnan G.

unread,
Nov 18, 2007, 11:13:51 PM11/18/07
to
Joe Seigh wrote:
> I don't know what the GlobalMemoryFence is yet since that part is not
> filled in yet. Generally with shared pointers you need acquire/release
> semantics which can't be elided.

By downloading the source code, you will find in HNXGC_MT directory
which uses the Global Memory Fence technology to elide acquire/
release
semantics in a multi-processors environment, such as for Windows 64-
bit
IA64 platform. The binary code for Windows 64-bit IA64 is also
available.

--
Mingnan Guo


Chris Thomasson

unread,
Nov 19, 2007, 1:22:06 AM11/19/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:e6f2cf61-122a-4a57...@s8g2000prg.googlegroups.com...

> >
>> I don't know what the GlobalMemoryFence is yet since that part is not
>> filled in yet. Generally with shared pointers you need acquire/release
>> semantics which can't be elided.
>>
>
> By downloading the source code, you will find in HNXGC_MT directory
> which uses
> the Global Memory Fence technology to elide acquire/release semantics
> in a multi-processors environment, such as for Windows 64-bit IA64

> platform. The binary code
> for Windows 64-bit IA64 is also available.

You don't make systems calls (e.g., no POSIX), you don't tie to a platform
(e.g., no assembly language) and your GlobalMemoryFence is 100% portable?

Are you tracking quiescent state epochs? How do you get around RCU, SMR,
vZOOM and other prior art? I don't have time to read thought a lot of
source-code right now. Can you please explain some details without me having
to read source-code? We have discussed a lot of ways to get around explicit
memory barriers here in cpt. Can you give some details on how your is
fundamentally different that my free proxy collector example, or Joes
SMR+SMR? You don't need acquire/release? Really? You have to be tracking
something to ensure stores don't escape the drop-to-zero condition in
reference counting in general? Do tell. Please? Thanks for your time.

Arnold Hendriks

unread,
Nov 19, 2007, 1:57:33 PM11/19/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:d5idnVx6RpmGt9za...@comcast.com...

> "Mingnan G." <lela...@yahoo.com.cn> wrote in message
> news:e6f2cf61-122a-4a57...@s8g2000prg.googlegroups.com...
>> By downloading the source code, you will find in HNXGC_MT directory
>> which uses
>> the Global Memory Fence technology to elide acquire/release semantics
>> in a multi-processors environment, such as for Windows 64-bit IA64
>> platform. The binary code
>> for Windows 64-bit IA64 is also available.

> Are you tracking quiescent state epochs? How do you get around RCU, SMR,

> vZOOM and other prior art? I don't have time to read thought a lot of

Either way, being outside the US, he may not care about prior art :)


hyc

unread,
Nov 20, 2007, 4:00:05 AM11/20/07
to
On Nov 18, 10:22 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Mingnan G." <leland...@yahoo.com.cn> wrote in message

You'd rather read a vague patent application written in legalese than
actual source code? Talk about a masochist. What makes you think there
would ever *be* a patent application? What makes you even think
software patents are in any way a proper use of anyone's time? By the
way, patents generally can't be applied for after publication. If this
code is already available for download, then there won't be any patent
application.

hyc

unread,
Nov 20, 2007, 4:21:32 AM11/20/07
to
On Nov 20, 1:00 am, hyc <h...@highlandsun.com> wrote:
> What makes you think there
> would ever *be* a patent application?

Never mind. The web site says patent-pending. Sigh...

Mingnan G.

unread,
Nov 20, 2007, 8:57:25 AM11/20/07
to
On Nov 20, 1:00 am, hyc <h...@highlandsun.com> wrote:
> would ever *be* a patent application? What makes you even think
> software patents are in any way a proper use of anyone's time? By the

You worry too much. For a non-commercial educational purpose, you can
apply
for a *FREE* license for this software. See follows:

Permission to use, copy and distribute this software and its
documentation
for a non-commercial educational purpose can be granted free
by electronic registration via hnxgc.harnixworld.com, provided that
this
copyright notice appears in all copies.

Juha Nieminen

unread,
Nov 20, 2007, 11:58:36 AM11/20/07
to
hyc wrote:
> By the
> way, patents generally can't be applied for after publication. If this
> code is already available for download, then there won't be any patent
> application.

That my be true in Europe but not in the US.

Matthias Buelow

unread,
Nov 20, 2007, 1:12:03 PM11/20/07
to
Mingnan G. wrote:

> You worry too much. For a non-commercial educational purpose, you can
> apply
> for a *FREE* license for this software. See follows:

[blah]

If your development is so breathtakingly innovative and of general
applicability to C++ programming, why don't you make it available to the
whole programming community as free software under a liberal license.

Instead of spamming this newsgroup with your product advertisement and
annoying everyone with "software patent" nonsense.

Arnold Hendriks

unread,
Nov 20, 2007, 4:30:44 PM11/20/07
to
"Juha Nieminen" <nos...@thanks.invalid> wrote in message
news:47431142$0$3519$39db...@news.song.fi...
In Europe, software patents cannot be enforced at all. (Although they can be
registered, to keep the EPO busy)


Chris Thomasson

unread,
Nov 20, 2007, 7:32:02 PM11/20/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:919adfc5-0e9d-422e...@e10g2000prf.googlegroups.com...

Please inform us when you have filled in the following page:

http://hnxgc.harnixworld.com/algo_gmf.htm

Humm, well I think I well go ahead and sign-up and download the source-code
sometime today.

Chris Thomasson

unread,
Nov 20, 2007, 7:35:32 PM11/20/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:919adfc5-0e9d-422e...@e10g2000prf.googlegroups.com...

[...]

I can't seem to find the sign-up page. I sent you an e-mail requesting
further instruction.

Thanks.

Chris Thomasson

unread,
Nov 20, 2007, 9:32:37 PM11/20/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:GomdndSAuZZA5t7a...@comcast.com...

STUPID ME!!!!

I found it, and have downloaded your source-code. Anyway, it seems like your
using some form of reference counting (need to examine the code to find you
exactly algorithm) and using tracing GC to handle all of the circular
references.

Chris Thomasson

unread,
Nov 20, 2007, 9:53:37 PM11/20/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:ZmZ%i.2695$Jy1.83@trndny02...
> (sorry for the repost but it probably would help if I actually included
> c.p.t)
[...]

> CC'ing c.p.t
>
> I don't know what the GlobalMemoryFence is yet since that part is not
> filled in yet. Generally with shared pointers you need acquire/release
> semantics which can't be elided.
[...]

I took a quick look at the code. AFAICT, he creates a thread per-processor
and binds it to that processor.


So if your running an a 64-processor system, he creates:

Thread 1 - Bound To CPU 1
Thread 2 - Bound To CPU 2
Thread 3 - Bound To CPU 3
Thread 4 - Bound To CPU 4
Thread 5 - Bound To CPU 5
Thread 6 - Bound To CPU 6

[and on and on to 64]

Those threads basically sit on an event and wait. When they are awoken, they
execute a membar and atomically decrement a counter and check to see if it
dropped to zero, which means that every "CPU" thread has executed it.
AFAICT, this is exactly the same as the synchronize_rcu() function
implemented with a daemon per-cpu. So, there is really nothing new here at
all. I think he is going to have a problem with prior are for sure.

As for the smart pointer reference counting, I haven't looked yet as I was
mainly interested in GlobalMemoryFence (a.k.a, synchronize_rcu()).


Any thoughts?

Mingnan G.

unread,
Nov 21, 2007, 3:54:48 AM11/21/07
to
On Nov 20, 6:53 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Joe Seigh" <jseigh...@xemaps.com> wrote in message

synchronize_rcu is for locking(lock-free), GlobalMemoryFence is for
memory semantics for MP.
Totally two different things.

Joe Seigh

unread,
Nov 21, 2007, 6:52:23 AM11/21/07
to

I figured it was some form of RCU w/ memory barriers in the quiescent states.
That by itself doesn't let you elide memory barriers. Whether you can elide
a membar depends on the specific situation. And the proofs can be a bit
complicated depending on the situation (at least for SMR+RCU).

I'm guessing from the docs, the refcounting is a bit like atomic_ptr with
the distinguishing of local non-shared vs. global heap shared reference counts.

With refcounting, it was always safe to use a raw reference as long as you had
(owned) one refcount. There seems to be a wrapper class that is the logical
equivalent.

The burden of proof is on him. He needs to show how his stuff is different
from prior art. That means explicitly listing the prior art and contrasting
it with his stuff.

Chris Thomasson

unread,
Nov 21, 2007, 5:10:20 PM11/21/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:b2a15ae9-d72c-45a1...@l1g2000hsa.googlegroups.com...

synchronize_rcu() produces the exact same overall effect as
GlobalMemoryFence. When the daemon run on each CPU, it constitutes a global
memory barrier. The well known technique is not tied to just the writer-side
of a non-blocking reader-pattern. You can use this for other things. For
instance, I could implement the reference counting algorithm you use with
synchronize_rcu(), no problem.


> GlobalMemoryFence is for
> memory semantics for MP.
> Totally two different things.

I believe that they do the same thing. BTW, have you looked at Joe Seighs
prior art? He has a more scaleable method of extracting "global" quiescent
states:

http://atomic-ptr-plus.sourceforge.net


I had done some in-house work on this in an experimental setting so luckily
I knew how to sketch out a solution to Joe Seighs challenge to show a
Windows implementation of a "global" memory fence:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5d4e81b46352905d


I say that those are more scaleable because you don't have to create a
thread per-cpu, and issue a global broadcast. I don't think that would go
over well with a distributed NUMA system with a shitload of processors. I
could make your system work on a NUMA, however you would have to change your
API to allow for thread grouping.

Chris Thomasson

unread,
Nov 21, 2007, 5:31:40 PM11/21/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:XZU0j.1586$Mr.1194@trnddc04...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:ZmZ%i.2695$Jy1.83@trndny02...
[...]

> I figured it was some form of RCU w/ memory barriers in the quiescent
> states.
> That by itself doesn't let you elide memory barriers. Whether you can
> elide
> a membar depends on the specific situation. And the proofs can be a bit
> complicated depending on the situation (at least for SMR+RCU).

Yeah. For instance, the polling logic needs to do some things (e.g., add
extra epoch) to allow you to have release semantics. If your doing your fast
SMR, the polling logic needs to scan the hazard pointers in a way that gives
them acquire semantics. Been there done that. ;^)

> I'm guessing from the docs, the refcounting is a bit like atomic_ptr with
> the distinguishing of local non-shared vs. global heap shared reference
> counts.

Thats what I am leaning to. I have not examined the refcount logic in the
source-code I downloaded enough to make a definitive answer. The first thing
I will check for is wether or not it uses interlocking instructions. If not,
then it has to be a form of distributed counting. If so, I will compare and
contrast against the method I created for vZOOM ref-counting:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/1216861ac568b2be


> With refcounting, it was always safe to use a raw reference as long as you
> had
> (owned) one refcount. There seems to be a wrapper class that is the
> logical
> equivalent.

Indeed. Its also safe to take and use a non-NULL raw "read" reference w/
dd-load from a data-structure being protected by a proxy collector:


pc_read& read = proxy_read(&pc_anchor);

// inside a read-only epoch
//_______________________________________
node* n = load_depends(&shared_lifo);
while (n) {
node* const nx = load_depends(&n->nx);
read_only_process(n);
n = nx;
}

proxy_release(read);


> The burden of proof is on him. He needs to show how his stuff is
> different
> from prior art. That means explicitly listing the prior art and
> contrasting
> it with his stuff.

Yup. He also has to do it with fewer claims than before; the USPO issued new
rules that effect the number of claims you can have. Here the are:


___________________________________________________________________

(A) - No more than twenty-five total claims.


(B) - No more than five independent claims.


(C) - No more than two continuations per-patent.


(D) - Identify all co-pending patent applications.

___________________________________________________________________


Luckily, my vZOOM stuff passes all the above. Although, I was wondering how
RCU is going to explain the numerous continuations? I think that is going to
be a problem...


Mingnan G.

unread,
Nov 21, 2007, 5:54:06 PM11/21/07
to
HnxGC can perform Lock-free concurrent garbage collection even without
GlobalMemoryFence. GlobalMemoryFence is designed to remove extra
memory ordering semantics of HnxGC, such as Acquire/Release semantics
for IA64. We don't need to detect quiescent states as RCU does to
perform lock-free access.

Although GMF is very interesting, but it is only useful for some
processor platforms. For example, x86 architecture has enforced memory
semantics for atomic operations, so GMF is not needed even in a x86
multi-processors environment. I will describe and discuss GMF later in
the documentation of hnxgc.harnixworld.com

Anyway, thanks for your interests in HnxGC. I don't have experiences
on NUMA, but it will be great if you
can make it work on a NUMA and get some performance benchmarks.

Chris Thomasson

unread,
Nov 21, 2007, 7:51:02 PM11/21/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:e6feb91f-7b7c-4c2c...@t47g2000hsc.googlegroups.com...

> HnxGC can perform Lock-free concurrent garbage collection even without
> GlobalMemoryFence. GlobalMemoryFence is designed to remove extra
> memory ordering semantics of HnxGC, such as Acquire/Release semantics
> for IA64.

Sure. But that's more expensive. Can the code as is run without
GlobalMemoryFence and still elude Acquire/Release?

> We don't need to detect quiescent states as RCU does to
> perform lock-free access.

AFAICT, you using GMF to lower the overhead of reference counter mutations.
I can't quite see if your using it to get release semantics. As for acquire,
well I am talking about avoiding a #StoreLoad | #StoreStore barrier, I guess
the would be a store-acquire to be more precise...


> Although GMF is very interesting, but it is only useful for some
> processor platforms. For example, x86 architecture has enforced memory
> semantics for atomic operations, so GMF is not needed even in a x86
> multi-processors environment.

Some sort of GMF is required on x86 to get the #StoreLoad barrier in a
lock-free reader pattern. You can use GC for that pattern an it has similar
concerns indeed. For instance, you have to use a GMF to implement SMR. You
have to keep in mind that the implied memory barrier for stores on x86
(#LoadStore | #StoreStore) is not strong enough from keeping some subsequent
operations from rising above it...


> I will describe and discuss GMF later in
> the documentation of hnxgc.harnixworld.com
>
> Anyway, thanks for your interests in HnxGC.

Sure. Your stuff looks like it will work. I was just trying to compare and
contrast it against prior art. I can't seem to find a difference between GMF
and synchronize_rcu()...


> I don't have experiences
> on NUMA, but it will be great if you
> can make it work on a NUMA and get some performance benchmarks.

I bet it can work, 95% sure it could... Here is how I did it for my vZOOM
algorithm:

Distributed reference counting and polling based epoch detection with high
locality wrt the distance of the memory from the CPU. You network the local
polling entities with wait-free distributed message passing. That should
scale to thousands upon thousands of processors. Applications are advised to
keep their actions as local as possible.

Chris Thomasson

unread,
Nov 22, 2007, 1:54:53 AM11/22/07
to
http://groups.google.com/group/comp.lang.c++/msg/b3e1d10e219908f1

the last three, or so, paragraphs are relevant to synchronization epoch
detection. Global Memory Fence is, well, AFICT for now, covered by prior
art on multiple fronts.

Nick Keighley

unread,
Nov 22, 2007, 4:59:17 AM11/22/07
to
On 20 Nov, 18:12, Matthias Buelow <m...@incubus.de> wrote:
> Mingnan G. wrote:

> > You worry too much. For a non-commercial educational purpose, you can
> > apply
> > for a *FREE* license for this software. See follows:
>
> [blah]
>
> If your development is so breathtakingly innovative and of general
> applicability to C++ programming, why don't you make it available to the
> whole programming community as free software under a liberal license.

why should he?

Matthias Buelow

unread,
Nov 22, 2007, 9:28:48 AM11/22/07
to
Nick Keighley wrote:

>> If your development is so breathtakingly innovative and of general
>> applicability to C++ programming, why don't you make it available to the
>> whole programming community as free software under a liberal license.
>
> why should he?

Because locking it up with a patent effectively removes it from the
common pool of knowledge (at least in the U.S.). It isn't even as if it
hadn't been invented yet -- it essentially cannot be (re-)invented by
someone else (until the patent runs out, of course) and thus is
effectively lost to humanity (or at least, the US software community.)

As I don't live in the US, I personally couldn't care less about the
software patent nonsense but still.

Nick Keighley

unread,
Nov 23, 2007, 3:48:13 AM11/23/07
to
On 22 Nov, 14:28, Matthias Buelow <m...@incubus.de> wrote:
> Nick Keighleywrote:

there's a difference between not patenting it and releasing it as
free software.

I don't like software patents either.


--
Nick Keighley

Matthias Buelow

unread,
Nov 23, 2007, 5:06:01 AM11/23/07
to
Nick Keighley wrote:

> there's a difference between not patenting it and releasing it as
> free software.

True.. but such an algorithm, if unpatented (the desirable case), could
be reimplemented relatively easily, thus rendering the original
implementation commercially worthless to its author. So it's rather
logical to me to release it as free software in the first place.

Chris Thomasson

unread,
Nov 23, 2007, 1:28:54 PM11/23/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:BJGdnfBWbsdmTdna...@comcast.com...

> "Mingnan G." <lela...@yahoo.com.cn> wrote in message
> news:e6feb91f-7b7c-4c2c...@t47g2000hsc.googlegroups.com...
>> HnxGC can perform Lock-free concurrent garbage collection even without
>> GlobalMemoryFence. GlobalMemoryFence is designed to remove extra
>> memory ordering semantics of HnxGC, such as Acquire/Release semantics
>> for IA64.
>
> Sure. But that's more expensive. Can the code as is run without
> GlobalMemoryFence and still elude Acquire/Release?
>
>
>
>> We don't need to detect quiescent states as RCU does to
>> perform lock-free access.
>
> AFAICT, you using GMF to lower the overhead of reference counter
> mutations. I can't quite see if your using it to get release semantics. As
> for acquire, well I am talking about avoiding a #StoreLoad | #StoreStore
> barrier, I guess the would be a store-acquire to be more precise...
>
>
>
>
>> Although GMF is very interesting, but it is only useful for some
>> processor platforms. For example, x86 architecture has enforced memory
>> semantics for atomic operations, so GMF is not needed even in a x86
>> multi-processors environment.
>
> Some sort of GMF is required on x86 to get the #StoreLoad barrier in a
> lock-free reader pattern. You can use GC for that pattern an it has
> similar concerns indeed. For instance, you have to use a GMF to implement
> SMR.

[...]

I need to make a correction to the last sentence. You can certainly use GMF
to implement SMR, but its not required. You can also use explicit memory
barrier calls...

Chris Thomasson

unread,
Nov 26, 2007, 1:20:09 PM11/26/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:ZmZ%i.2695$Jy1.83@trndny02...
> (sorry for the repost but it probably would help if I actually included
> c.p.t)
>
> Mingnan G. wrote:
>> The HnxGC Libraray - an accurate, pauseless & deterministic garbage
>> collector for C++ application, is released at
>>
>> http://hnxgc.harnixworld.com
>>
>> (MD5: b96a83d0899317e732abb46a5e1f2dd5)
>>
>> Features include:
>>
[...]

Some observations on your pointer manipulation functions...

_________________________________________________________________
- I notice that the _hnxgc_assign_lptr function is made up of more than an
interlocked update, or simple distributed reference count increment. I
notice at multiple lines of code that makes calls into other object
functions made up of multiple lines of code themselves.


- I notice that the _hnxgc_assign_mptr function has more lines than
_hnxgc_assign_lptr, and it does make calls to functions made up of multiple
lines of code.


- I am thinking that a CLockedPtr is analog of a shared pointer, and
CWeakPtr is analog of local pointer. Am I Right?


- I don't see support for lock-free operations on the CLockedPtr objects.
Refer to this post:

http://groups.google.com/group/comp.lang.c++/msg/eada4a93d932430e

_________________________________________________________________

Humm, well, I am thinking that pointer the DWCAS version of Joes atomic_ptr
would be good to test against. Its also good to keep in mind that circular
references can be designed around. I can give you some examples if you
want...


Have you benchmarked this against a proxy garbage collector yet? This would
be in the context of a reader/writer solution of course.. I am thinking that
it will outperform your GC simply because the pointer access within the
collected region can be completely naked on most of the existing
architectures out there. Another advantage of this type of garbage
collection is that you can clearly separate the writers from the readers.
Also, refer to last few paragraphs of following message:

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/95007ffdf246d50e


Also, the implementation of a proxy collector can be very lightweight...
Check this out:

http://groups.google.com/group/comp.programming.threads/msg/f83460262d48e916


I used that code as a benchmark against my vZOOM stuff. The code is
basically in a pseudo-code state because some error/sanity checks for
brevity...

Any thoughts on this?

Chris Thomasson

unread,
Nov 26, 2007, 1:22:39 PM11/26/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:5KCdnV6JM49Mkdba...@comcast.com...

> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:ZmZ%i.2695$Jy1.83@trndny02...
>> (sorry for the repost but it probably would help if I actually included
>> c.p.t)
>>
>> Mingnan G. wrote:
>>> The HnxGC Libraray - an accurate, pauseless & deterministic garbage
>>> collector for C++ application, is released at
>>>
>>> http://hnxgc.harnixworld.com
>>>
>>> (MD5: b96a83d0899317e732abb46a5e1f2dd5)
>>>
>>> Features include:
>>>
> [...]
>
> Some observations on your pointer manipulation functions...
>
> _________________________________________________________________
> - I notice that the _hnxgc_assign_lptr function is made up of more than an
> interlocked update, or simple distributed reference count increment. I
> notice at multiple lines of code that makes calls into other object
> functions made up of multiple lines of code themselves.
>
[...]

I also notice that you use a barrier in one of those pointer assignment
functions. Also, what does the macro HNX_CONFIG_FASTCALL do in relation to
_hnxgc_assign_lptr?

Chris Thomasson

unread,
Nov 26, 2007, 1:24:04 PM11/26/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:5KCdnV6JM49Mkdba...@comcast.com...
[...]

> Have you benchmarked this against a proxy garbage collector yet? This
> would be in the context of a reader/writer solution of course.. I am
> thinking that it will outperform your GC simply because the pointer access
> within the collected region can be completely naked on most of the
> existing architectures out there. Another advantage of this type of
> garbage collection is that you can clearly separate the writers from the
> readers. Also, refer to last few paragraphs of following message:
>
> http://groups.google.com/group/comp.lang.c++/browse_frm/thread/95007ffdf246d50e
>
[...]

Wrong link! Here is the right one:

http://groups.google.com/group/comp.lang.c++/msg/b3e1d10e219908f1


Mingnan G.

unread,
Nov 28, 2007, 7:13:31 PM11/28/07
to
On Nov 26, 10:20 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> - I notice that the _hnxgc_assign_lptr function is made up of more than an
> interlocked update, or simple distributed reference count increment. I
> notice at multiple lines of code that makes calls into other object
> functions made up of multiple lines of code themselves.

Yes, basically _hnxgc_assign_lptr is handling that a new reference is
assigned to a CLockedPtr pointer. This will include two operations
(so
you notice more than an interlocked update), the first one is to
decrement the
RC of the object that the pointer originally pointed to, the second
one is
to increment the RC of the object that the pointer will point to
after
assignment.

Note: assign a CLockedPtr to another CLockedPtr will not cause any
RC operation, so will not execute the _hnxgc_assign_lptr function.

I give an example, suppose two pointers: 'p1' points to 'A', 'p2'
points to 'B',
'p1' is a CLockedPtr pointer, then considering the following C++
statement:

p1 = p2;

(1) if 'p2' is alos a CLockedPtr pointer, then there is no RC
operation, no
calling to _hnxgc_assign_lptr function.

(2) if 'p2' is other type of pointer, such as CWeakPtr, CMemberPtr,
then the
_hnxgc_assign_lptr is called. It will complete following works:
a. increment the RC of object 'B';
b. decrement the RC of object 'A'.
c. set the pointer 'p1' pointing to 'B';

> - I notice that the _hnxgc_assign_mptr function has more lines than
> _hnxgc_assign_lptr, and it does make calls to functions made up of multiple
> lines of code.

the situation _hnxgc_assign_mptr is like _hnxgc_assign_lptr. It also
include a
RC decrement and a RC increment operation.

- Mingnan Guo

Chris Thomasson

unread,
Nov 29, 2007, 6:29:11 PM11/29/07
to
"Mingnan G." <lela...@yahoo.com.cn> wrote in message
news:5a2620ce-6cfe-4d37...@s36g2000prg.googlegroups.com...

> On Nov 26, 10:20 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>> - I notice that the _hnxgc_assign_lptr function is made up of more than
>> an
>> interlocked update, or simple distributed reference count increment. I
>> notice at multiple lines of code that makes calls into other object
>> functions made up of multiple lines of code themselves.
[...]

>
> Note: assign a CLockedPtr to another CLockedPtr will not cause any
> RC operation, so will not execute the _hnxgc_assign_lptr function.
>
> I give an example, suppose two pointers: 'p1' points to 'A', 'p2'
> points to 'B',
> 'p1' is a CLockedPtr pointer, then considering the following C++
> statement:
>
> p1 = p2;
>
> (1) if 'p2' is alos a CLockedPtr pointer, then there is no RC
> operation, no
> calling to _hnxgc_assign_lptr function.
>
> (2) if 'p2' is other type of pointer, such as CWeakPtr, CMemberPtr,
> then the
> _hnxgc_assign_lptr is called. It will complete following works:
> a. increment the RC of object 'B';
> b. decrement the RC of object 'A'.
> c. set the pointer 'p1' pointing to 'B';
>
>> - I notice that the _hnxgc_assign_mptr function has more lines than
>> _hnxgc_assign_lptr, and it does make calls to functions made up of
>> multiple
>> lines of code.
>
> the situation _hnxgc_assign_mptr is like _hnxgc_assign_lptr. It also
> include a
> RC decrement and a RC increment operation.

Thank you for that information Mingnan as it help me understand what's going
on much better. Okay, so I think what you are saying is that CLockedPtr
provides the strong thread-safety level and the following pseudo-code would
be legal if the refcount object was replaced with CLockedPtr:
_____________
static refcount<foo> g_foo(new foo);

threads_a_x() {
for(;;) {
refcount<foo> l_foo(g_foo);
if (l_foo) {
l_foo->foo();
}
}
}


threads_y_z() {
for(;;) {
refcount<foo> l_foo(new foo);
g_foo = l_foo;
}

}
______________


This example will not work if the implementation of refcount does not have
the strong thread-safety guarantee. Refer to the following thread for far
more details of what I am getting at here:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e5167941d32340c6


Can I do lock-free programming with your smart-pointers? Refer to the
following post for more info:

http://groups.google.com/group/comp.lang.c++/msg/eada4a93d932430e

Is that a legal programming style within the scope of your GC?

Mingnan G.

unread,
Nov 30, 2007, 12:14:45 PM11/30/07
to
On Nov 29, 3:29 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> Thank you for that information Mingnan as it help me understand what's going
> on much better. Okay, so I think what you are saying is that CLockedPtr
> provides the strong thread-safety level and the following pseudo-code would
> be legal if the refcount object was replaced with CLockedPtr:
> _____________
> static refcount<foo> g_foo(new foo);
>
> threads_a_x() {
> for(;;) {
> refcount<foo> l_foo(g_foo);
> if (l_foo) {
> l_foo->foo();
> }
> }
>
> }
>
> threads_y_z() {
> for(;;) {
> refcount<foo> l_foo(new foo);
> g_foo = l_foo;
> }
>
> }
>
> ______________
>
> This example will not work if the implementation of refcount does not have
> the strong thread-safety guarantee. Refer to the following thread for far
> more details of what I am getting at here:
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/th...

>
> Can I do lock-free programming with your smart-pointers? Refer to the
> following post for more info:
>
> http://groups.google.com/group/comp.lang.c++/msg/eada4a93d932430e
>
> Is that a legal programming style within the scope of your GC?- Hide quoted text -
>
> - Show quoted text -

Yes, I think it is okay for this example. But lock-free programming in
HnxGC application should be with very cautions, the lock-free (non-
blocking) feature of HnxGC just means that pointer assignment in
application mutator threads is lock-free with backgound concurrently
running collector thread. Correct lock-free mechanism *between
application threads* as in this example is the responsibility of
application program not HnxGC collector. We don't provide extra lock-
free mechanism above traditional C/C++ programming for application.
Accessing a shared resource normally should under some synchronization
mechanism protection, e.g. mutex, unless you deliberately act without
lock for performance.

For illustration, I modified your example a little as following:

--------------------------------------------------

// atomically shared with multiple threads (it should be on nature-
boundary address, e.g. 4 byte for 32-bit platform)
static void * volatile g_foo = NULL;

threads_a_x() {
for(;;) {
lp<CFoo> l_foo;

l_foo.attach(g_foo); // read atomically from 'g_foo'

if (l_foo) {
l_foo->foo();
}
}
}

threads_y_z() {
for(;;) {
lp<CFoo> l_foo = gcnew CFoo;
void *p = l_foo.detach();

// write to 'g_foo' with racing
if(atomicCompareAndSwap(g_foo, 0, p) != 0) {
// failed, restore to 'l_foo' to reclaim this object
l_foo.attach(p);
}
}
}

-----------------------------------------

Actually, if an application maintains consistency of reference
relationship before and after an reference operation, the operation
can be invisible to HnxGC collector and it is free to use any multi-
threading mechanisms as you like. The "moving reference" mechanism in
HnxGC Smart Pointer is also an example, refer to the following link
for more details:

http://hnxgc.harnixworld.com/algo_refcount.htm ,

and

"Assignment between 2 CLockedPtr" in http://hnxgc.harnixworld.com/prog_b03.htm

Dmitriy Vyukov

unread,
Dec 4, 2007, 5:53:17 AM12/4/07
to
On Nov 22, 1:10 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Mingnan G." <leland...@yahoo.com.cn> wrote in message
> http://groups.google.com/group/comp.programming.threads/browse_frm/th...

>
> I say that those are more scaleable because you don't have to create a
> thread per-cpu, and issue a global broadcast. I don't think that would go
> over well with a distributed NUMA system with a shitload of processors. I
> could make your system work on a NUMA, however you would have to change your
> API to allow for thread grouping.


BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx

MSDN contains only following indistinct description:
"The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It
guarantees the visibility of write operations performed on one
processor to the other processors."

I can't understand what it means. And one will find nothing about this
function in the whole web - just few links to msdn. But I think this
is very promising - just generate IPI to all other processors, and
they execute membar straight away.


Dmitriy V'jukov

Chris Thomasson

unread,
Dec 4, 2007, 5:25:25 PM12/4/07
to
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:7477cf32-4620-41ef...@s8g2000prg.googlegroups.com...
[...]

> BTW, did you see this function:
> FlushProcessWriteBuffers()
> http://msdn2.microsoft.com/en-us/library/ms683148.aspx
>
> MSDN contains only following indistinct description:
> "The function generates an interprocessor interrupt (IPI) to all
> processors that are part of the current process affinity. It
> guarantees the visibility of write operations performed on one
> processor to the other processors."
>
> I can't understand what it means. And one will find nothing about this
> function in the whole web - just few links to msdn. But I think this
> is very promising - just generate IPI to all other processors, and
> they execute membar straight away.

Humm. Well, it seems as if it would work. I think it might be analog of the
GlobalMemoryFence (aka, synchronize_rcu) used in Mingnan's GC.

Joe Seigh

unread,
Dec 4, 2007, 8:34:52 PM12/4/07
to

McKenny has a patent application for essentially the same thing. Also
one of the patents I did on an early form of something similar to RCU
cites a patent issued in 1976 which discloses a method by MVS to cause
all processors to purge their TLB's by issuing signal processor instructions.

Reply all
Reply to author
Forward
0 new messages