
The Hoard Scalable Memory Allocator


Emery Berger

Oct 5, 2007, 10:53:30 AM
The Hoard memory allocator is a fast, scalable, and memory-efficient
memory allocator. It runs on a variety of platforms, including Linux,
Solaris (x86 and Sparc), and Windows. Hoard is a drop-in replacement
for malloc() that can dramatically improve application performance,
especially for multithreaded programs running on multicore CPUs or
multiprocessors. No change to your source is necessary.

This release (3.7) provides substantial performance enhancements for
both 32-bit and 64-bit systems, especially Linux x86-64.

Source and pre-compiled binaries are available at http://www.hoard.org.

--
Emery Berger
Assistant Professor
Dept. of Computer Science
University of Massachusetts Amherst
http://www.cs.umass.edu/~emery

Chris Thomasson

Oct 5, 2007, 11:08:20 PM
"Emery Berger" <emery....@gmail.com> wrote in message
news:1191596010....@50g2000hsm.googlegroups.com...

> The Hoard memory allocator is a fast, scalable, and memory-efficient
> memory allocator.

[...]

Hoard is left in the dust when compared to:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855


Markus Elfring

Oct 6, 2007, 2:45:09 AM
> Hoard is a drop-in replacement for malloc() that can dramatically
> improve application performance, especially for multithreaded programs
> running on multicore CPUs or multiprocessors.

How much will complete error detection and handling affect its speed?


Would anybody like to try a comparison with the following approaches?
- http://www.nedprod.com/programs/portable/nedmalloc/index.html
- http://www.garret.ru/~knizhnik/threadalloc/readme.html
- http://portal.acm.org/citation.cfm?id=1133967 (McRT-Malloc)
- http://www.fourmilab.ch/bget/

Regards,
Markus

David Schwartz

Oct 7, 2007, 10:50:18 PM
On Oct 5, 8:08 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Hoard is left in the dust when compared to:
>

> http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a0...

Certainly. It's a lot faster to cross the street in a jalopy than a
747.

DS

Chris Thomasson

Oct 8, 2007, 12:36:59 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1191811818....@y42g2000hsy.googlegroups.com...

Have you looked at my work wrt scalable allocation? I posted it here....

My algorithm beats the pants off hoard... I dare you to disprove me!

llothar

Oct 8, 2007, 1:50:54 AM
On 5 Oct., 21:53, Emery Berger <emery.ber...@gmail.com> wrote:

> Solaris (x86 and Sparc), and Windows. Hoard is a drop-in replacement
> for malloc() that can dramatically improve application performance,
> especially for multithreaded programs running on multicore CPUs or
> multiprocessors. No change to your source is necessary.

Is it only me who thinks - especially in this newsgroup - that manual
memory management and multicore CPUs are pieces that do not belong
together? Doing a lot of multithreaded code, I really cannot see any
future for this combination. It is way too unproductive.

Yes, I'm sure Chris Thomasson will come up with every possible argument
to deny this.

Chris Thomasson

Oct 8, 2007, 2:11:52 AM
"llothar" <llo...@web.de> wrote in message
news:1191822654.7...@v3g2000hsg.googlegroups.com...

Do you an automatic "highly-efficient" memory management technique? I am
interested in hearing about your methods indeed.

Chris Thomasson

Oct 8, 2007, 2:14:21 AM
[...]

> Do you an automatic "highly-efficient"

^^^^^^^^^^^
have


> memory management technique?

> I
^^^^^^^^^^

If so,

David Schwartz

Oct 8, 2007, 8:24:50 AM
On Oct 7, 9:36 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Have you looked at my work wrt scalable allocation? I posted it
> here....

Yes.

> My algorithm beats the pants off hoard... I dare you to disprove me!

Your algorithm (optimized slab) solves a completely different problem
from the one Hoard solves (optimized multi-threaded memory allocator).
You can't compare them.

DS


David Schwartz

Oct 8, 2007, 8:26:28 AM
On Oct 7, 10:50 pm, llothar <llot...@web.de> wrote:

> Is it only me who thinks - especially in this newsgroup - that manual
> memory management and multicore CPUs are pieces that do not belong
> together? Doing a lot of multithreaded code, I really cannot see any
> future for this combination. It is way too unproductive.

What is "manual memory management"? Do you mean as opposed to
automatic garbage collection? Automatic garbage collection has always
seemed like an awful kluge to me, useful only to people who don't
understand how to properly manage an object's lifetime.

Or do you mean something completely different, and probably equally
wrong? ;)

DS

Chris Thomasson

Oct 8, 2007, 4:40:30 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1191846290.9...@k79g2000hse.googlegroups.com...

My algorithm _is_ an "optimized multi-threaded memory allocator". You can
use a slab-based scheme as a general purpose allocator. I did not want to
stick all the segregated slab logic into the pseudo-code because I wanted to
keep things simple:

http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f

http://groups.google.com/group/comp.programming.threads/msg/bc019dc04362be41

Anyway, doesn't Hoard use segregated slabs of objects under the hood?

> You can't compare them.

I believe that you can compare two multi-threaded allocators. I am thinking
of releasing the version of the allocator I submitted to Sun Microsystems
under the GPL license for use in free software only.

Chris Thomasson

Oct 8, 2007, 8:15:41 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:gZ6dnQ32o4dcCZfa...@comcast.com...
[...]

> I believe that you can compare two multi-threaded allocators. I am
> thinking of releasing the version of the allocator I submitted to Sun
> Microsystems under the GPL license for use in free software only.

Humm... I think I will use the following license instead of GPL:

http://appcore.home.comcast.net/vzoom/vzoom_personal_licence.pdf

Too bad that it's not compatible with SourceForge. At least I don't
think it is...

Logan Shaw

Oct 8, 2007, 11:30:17 PM
David Schwartz wrote:
> What is "manual memory management"? Do you mean as opposed to
> automatic garbage collection? Automatic garbage collection has always
> seemed like an awful kluge to me, useful only to people who don't
> understand how to properly manage an object's lifetime.

Garbage collection is useful within a certain niche. You basically
need to satisfy all or most of the following conditions:

(1) Real-time behavior is not necessary.
(2) You are working with a lot of data where it's irrelevant to the
functioning of your program whether objects (and data structures)
are destroyed immediately or a week from now. That is, you have
a lot of objects where memory is the only important resource and
you don't need to clean up external resources (filehandles, etc.).
(3) You do not need/want to follow a keep-it-simple-stupid approach
and you can accept the added complexity of a GC implementation
within your system.
(4) You want to practice modularity and separation of concerns to the
extent that you gain something by eliminating memory management
("who owns this object and when does the ownership transfer")
questions from the interfaces between modules.

To me, #4 is a key point. If I have some object which two different
and unrelated subsystems are going to deal with, it's nice to be able
to hand both of them a reference to the same object and not make them
play the "last one to leave, turn out the lights" coordination game
between themselves.

For example, suppose I have an object that represents the data parsed
from a config file. Several modules might need to share that object
and get some information out of it (a network module needs to get an
IP address to listen on, another module needs to get some pathnames),
but maybe none of them needs the information for very long (maybe only
to initialize stuff, or maybe not, and maybe they need it forever).
Because you have a bunch of unrelated modules with only a small
tangential thing in common, it's best if they don't have to know or
understand about how the other modules deal with the tangential
issue of coordinating when it's OK to free that memory. If they can
all remain blissfully ignorant about it, they can have a cleaner, more
general design.

Anyway, the point is that if all of the above conditions are met, you
might be in a niche where garbage collection actually makes your code
better because it frees you up to be a bit more modular. If you are
outside that niche, then garbage collection is often not worth it. But
the niche where all 4 of the above conditions are satisfied is actually
a fairly big niche.

- Logan

llothar

Oct 9, 2007, 1:24:29 AM
On 8 Oct., 13:11, "Chris Thomasson" <cris...@comcast.net> wrote:

> Do you an automatic "highly-efficient" memory management technique? I am
> interested in hearing about your methods indeed.

It's not about being highly efficient; in many cases that is far less
important than application stability and coding productivity.
Multithreading is already extremely expensive to do, and manual memory
management is even more expensive.

So for me as an MT application-level programmer, I would never go back
to malloc/free and any kind of C++/C/Pascal etc. Too dangerous for too
few benefits.


llothar

Oct 9, 2007, 1:27:21 AM
On 8 Oct., 19:26, David Schwartz <dav...@webmaster.com> wrote:

> What is "manual memory management"? Do you mean as opposed to
> automatic garbage collection? Automatic garbage collection has always
> seemed like an awful kluge to me, useful only to people who don't
> understand how to properly manage an object's lifetime.

I forgot the world is only full of perfect people, coding in perfect
mood with perfect concentration on their work and always perfect
designs.

Sorry, but your argument sounds like one from an experienced kid.

GC, if usable (and only few applications can't use it), is the real
productivity boost and the real advancement in software development in
the last decade (not UML or OO or whatever).

llothar

Oct 9, 2007, 1:53:40 AM
On 9 Oct., 10:30, Logan Shaw <lshaw-use...@austin.rr.com> wrote:

> (1) Real-time behavior is not necessary.

Agree. RT and GC do not work well.

> (2) You are working with a lot of data where it's irrelevant to the
> functioning of your program whether objects (and data structures)
> are destroyed immediately or a week from now. That is, you have
> a lot of objects where memory is the only important resource and
> you don't need to clean up external resources (filehandles, etc.).

It is indeed a stupid idea to use finalizers for anything else than
memory management. For file handles there should be an explicit close
etc. And I even use the GC_free() function of the Boehm GC to
explicitly free large chunks of memory immediately when I have finished
using them (for example images or large texts). But more than 95% of
all code doesn't need fine tuning.

> (3) You do not need/want to follow a keep-it-simple-stupid approach
> and you can accept the added complexity of a GC implementation
> within your system.

Sorry, that's bullshit. If the GC is integrated it doesn't affect
KISS. Filesystems are very complex parts, but they are integrated, so
you can use them too. Maybe your failure is that you think of an
additional layer because you use an ancient language (a second failure
for good application development).

> (4) You want to practice modularity and separation of concerns to the
> extent that you gain something by eliminating memory management
> ("who owns this object and when does the ownership transfer")
> questions from the interfaces between modules.
>
> To me, #4 is a key point.

Yes, this and the use of immutable data objects wherever possible
(like strings and tuples and even the cons lists of Lisp) is the
success factor for good MT programming.

> If you are
> outside that niche, then garbage collection is often not worth it. But
> the niche where all 4 of the above conditions are satisfied is actually
> a fairly big niche.

Yes, as the number of newly written applications in Java or C# shows.

Marcel Müller

Oct 9, 2007, 4:00:26 AM
Logan Shaw wrote:

> Garbage collection is useful within a certain niche. You basically
> need to satisfy all or most of the following conditions:
>
> (1) Real-time behavior is not necessary.
> (2) You are working with a lot of data where it's irrelevant to the
> functioning of your program whether objects (and data structures)
> are destroyed immediately or a week from now. That is, you have
> a lot of objects where memory is the only important resource and
> you don't need to clean up external resources (filehandles, etc.).
> (3) You do not need/want to follow a keep-it-simple-stupid approach
> and you can accept the added complexity of a GC implementation
> within your system.
> (4) You want to practice modularity and separation of concerns to the
> extent that you gain something by eliminating memory management
> ("who owns this object and when does the ownership transfer")
> questions from the interfaces between modules.

I would add:

(5) You are dealing with cyclic references between objects that do not
have a natural parent-child order. In this case anything but a GC is
rather complicated.


Marcel

Chris Thomasson

Oct 9, 2007, 4:52:03 AM
"llothar" <llo...@web.de> wrote in message
news:1191907469....@o3g2000hsb.googlegroups.com...

> On 8 Oct., 13:11, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Do you an automatic "highly-efficient" memory management technique? I am
>> interested in hearing about your methods indeed.
>
> It's not about being highly efficient; in many cases that is far less
> important than application stability and coding productivity.
> Multithreading is already extremely expensive to do, and manual memory
> management is even more expensive.
[...]

1: EXACTLY how is multi-threading extremely expensive?

2: EXACTLY how is manual memory management of all things extremely
expensive?


Put those questions to a crappy programmer and, well, yes, their
implementation will likely be very bad... However, I am interested in
why _you_ think that threading and "traditional" memory management are
a Bad Idea (TM). Please explain in great detail.

Chris Thomasson

Oct 9, 2007, 5:07:58 AM
"llothar" <llo...@web.de> wrote in message
news:1191909220.3...@v3g2000hsg.googlegroups.com...

> On 9 Oct., 10:30, Logan Shaw <lshaw-use...@austin.rr.com> wrote:
>
>> (1) Real-time behavior is not necessary.
>
> Agree. RT and GC do not work well.
>
>> (2) You are working with a lot of data where it's irrelevant to the
>> functioning of your program whether objects (and data structures)
>> are destroyed immediately or a week from now. That is, you have
>> a lot of objects where memory is the only important resource and
>> you don't need to clean up external resources (filehandles, etc.).
>
> It is indeed a stupid idea to use finalizers for anything else than
> memory management. For file handles there should be an explicit close
> etc. And I even use the GC_free() function of the Boehm GC to
> explicitly free large chunks of memory immediately when I have
> finished using them (for example images or large texts). But more
> than 95% of all code doesn't need fine tuning.

"95% of all code" is __WAY__ too ambiguous to mean anything remotely
useful at all; you simply _need_ to drill down on what you actually
mean. In _fact_, most, if not _all_, multi-threading algorithms
_benefit_ from _fine-tuning_ indeed. If you don't know about MT and
fine-tuning, well, too bad for you. Remember, all MT algorithms (e.g.,
lock-based/wait-free/lock-free) will _ALL_ benefit from good
engineering/architectural design skills for sure. Period. If you don't
design your synchronization strategy as efficiently as possible, well,
your application's chances of any sort of stability may suffer indeed!

>> (3) You do not need/want to follow a keep-it-simple-stupid approach
>> and you can accept the added complexity of a GC implementation
>> within your system.
>
> Sorry thats bullshit. If the GC is integrated it doesn't affect the
> KISS.

IMVHO, that is 100% PURE bullshit in _real-world_practice_! GC defeats
KISS by forcing you to do things like:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7b8b62ceef171988

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d

How do all of those kludgy/tedious workarounds directly support the
KISS principle? Do tell...
[...]

>> (4) You want to practice modularity and separation of concerns to the
>> extent that you gain something by eliminating memory management
>> ("who owns this object and when does the ownership transfer")
>> questions from the interfaces between modules.
>>
>> To me, #4 is a key point.
>
> Yes, this and the use of immutable data objects wherever possible
> (like strings and tuples and even the cons lists of Lisp) is the
> success factor for good MT programming.

Speak for yourself, Sir! Immutable data is not a "fair realization" of
real-world uses of MT in general. Mutable data works VERY well within
the realm of multi-threading. We can use efficient algorithms like RCU,
Joe's SMR+RCU, or my vZOOM to achieve a very scalable success factor
for good MT programming that uses mutable data structures all over the
place.

>> If you are
>> outside that niche, then garbage collection is often not worth it. But
>> the niche where all 4 of the above conditions are satisfied is actually
>> a fairly big niche.
>
> Yes, as the number of newly written applications in Java or C# shows.

Are you trolling?

Chris Thomasson

Oct 9, 2007, 5:09:09 AM
"Marcel Müller" <news.5...@spamgourmet.com> wrote in message
news:470b355e$0$16107$9b4e...@newsspool1.arcor-online.net...

Yeah. GC can drop the complication factor, _unless_ you are really into
fine-tuning for scalability.

Chris Thomasson

Oct 9, 2007, 5:11:20 AM
"llothar" <llo...@web.de> wrote in message
news:1191907641.1...@o80g2000hse.googlegroups.com...

> On 8 Oct., 19:26, David Schwartz <dav...@webmaster.com> wrote:
>
>> What is "manual memory management"? Do you mean as opposed to
>> automatic garbage collection? Automatic garbage collection has always
>> seemed like an awful kluge to me, useful only to people who don't
>> understand how to properly manage an object's lifetime.
>
> I forgot the world is only full of perfect people, coding in perfect
> mood with perfect concentration on their work and always perfect
> designs.
>
> Sorry, but your argument sounds like one from an experienced kid.

Your argument should like one from an less experienced kid? Na... I must be
mistaken.

> GC, if usable (and only few applications can't use it), is the real
> productivity boost and the real advancement in software development
> in the last decade (not UML or OO or whatever).

It's a productivity boost for sure. However, when does high
productivity end up destroying quality?

Chris Thomasson

Oct 9, 2007, 5:14:17 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:C8Sdnfyg9fon2Zba...@comcast.com...

> "llothar" <llo...@web.de> wrote in message
> news:1191907641.1...@o80g2000hse.googlegroups.com...
>> On 8 Oct., 19:26, David Schwartz <dav...@webmaster.com> wrote:
>>
>>> What is "manual memory management"? Do you mean as opposed to
>>> automatic garbage collection? Automatic garbage collection has always
>>> seemed like an awful kluge to me, useful only to people who don't
>>> understand how to properly manage an object's lifetime.
>>
>> I forgot the world is only full of perfect people, coding in perfect
>> mood with perfect concentration on their work and always perfect
>> designs.
>>
>> Sorry but your argument sounds like one from an experienced kid.
>
> Your argument should like one from an less experienced kid?

^^^^^^^^^^^^^

Your argument _sounds_ like one from a less experienced kid?

Chris Thomasson

Oct 9, 2007, 5:16:07 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1191846388.1...@g4g2000hsf.googlegroups.com...

> On Oct 7, 10:50 pm, llothar <llot...@web.de> wrote:
>
>> Is it only me who thinks - especially in this newsgroup - that manual
>> memory management and multicore CPUs are pieces that do not belong
>> together? Doing a lot of multithreaded code, I really cannot see any
>> future for this combination. It is way too unproductive.

I feel the need to ask this question again:


It's a productivity boost for sure. However, when does high
productivity end up destroying quality?

Joe Seigh

Oct 9, 2007, 6:23:44 AM
Logan Shaw wrote:
[...]

It's pretty rare in single-threaded programming to need GC unless
you're doing programming with digraphs or something.

GC can seem to be useful in multi-threaded programming but that's only
because no one's bothered to work out proper usage of reader/writer
patterns. You basically end up with informal lock-free reader semantics
without being explicitly aware that they are not the same as lock based
reader/writer solutions.

This is also why STM is going to be problematic. STM requires all references, not
just write references, to be part of the transaction. Kind of difficult to
do that if you aren't even aware of what references your code is actually
using.

The great thing about GC, and what makes it so successful, is that it
allows one to completely ignore memory management problems. Kind of
like how spreadsheets allow you to ignore calculation problems because
they always give you a result; probably not a correct one, but a
result nonetheless.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

llothar

Oct 9, 2007, 10:13:22 AM
On 9 Oct., 16:09, "Chris Thomasson" <cris...@comcast.net> wrote:

> Yeah. GC can drop the complication factor, _unless_, you are really into
> fine-tuning for scalability.

You can't say that. You can fine-tune GC applications, but often such
fine tuning is not important. If you have 4 or 8 cores you normally
need every bit of help to make use of them, and that is a 100% scaling
gain per additional core, while fine-tuning might be nice and save a
few tens of percent.

Sure, if you have something as easy as a webserver you don't have the
scaling problem at all.

But in many applications, especially non-server ones, it is just
wasted time: "premature optimization".

llothar

Oct 9, 2007, 10:18:49 AM
On 9 Oct., 16:16, "Chris Thomasson" <cris...@comcast.net> wrote:

> It's a productivity boost for sure. However, when does high-productivity end
> up destroying quality?

I don't see a serious competition here. Did structured programming
destroy quality? Did object orientation destroy quality? (Well, it may
not improve quality either.)

llothar

Oct 9, 2007, 10:22:20 AM
On 9 Oct., 16:07, "Chris Thomasson" <cris...@comcast.net> wrote:

> IMVHO, that is 100% PURE bullshit in _real-world_practice_ ! GC defeats KISS
> by forcing you to do things like:
>

> http://groups.google.com/group/comp.programming.threads/browse_frm/th...

> http://groups.google.com/group/comp.programming.threads/browse_frm/th...

So there are problems with weak references, a concept that only exists
because of a GC. Come on, this is a very bad argument.

And on the other hand, optimizing with the different heaps and
different allocators that you need is KISS?

> Are you trolling?

Not my intention, but I knew beforehand that it is almost useless to
discuss this with you.

J de Boyne Pollard

Oct 9, 2007, 1:27:24 PM
CT> I am thinking of releasing the version of the allocator I
CT> submitted to Sun Microsystems under the GPL license
CT> for use in free software only.
CT>
CT> Humm... I think I will use the following license instead of GPL:
CT>
CT> http://appcore.home.comcast.net/vzoom/vzoom_personal_licence.pdf

That is not a free software licence. It denies freedoms 0, 2, and 3
of the Free Software Definition to an entire class of people. That is
not an open source licence. It fails to satisfy items 1, 3, 5, and 6
of the Open Source Definition. I suggest that if you are going to
make a big deal about your competing with Hoard, you don't fall at the
first hurdle by producing software that is neither free nor open,
unlike Hoard. Hoard is dual-licensed, and one of the licences is the
GNU General Public Licence. Encumbering your software with patent
claims doesn't help your case, either. Your attempts to deny everyone
else their freedoms with your non-free software licences and your
patent claims will almost certainly be rejected.

<URL:http://fsf.org./licensing/essays/free-sw.html>
<URL:http://opensource.org./docs/definition.php>

Ian Collins

Oct 9, 2007, 3:01:39 PM
Chris Thomasson wrote:
> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:gZ6dnQ32o4dcCZfa...@comcast.com...
> [...]
>> I believe that you can compare two multi-threaded allocators. I am
>> thinking of releasing the version of the allocator I submitted to Sun
>> Microsystems under the GPL license for use in free software only.
>
> Humm... I think I will use the following license instead of GPL:
>
> http://appcore.home.comcast.net/vzoom/vzoom_personal_licence.pdf
>
If you intend submitting something to be included in OpenSolaris, use
the CDDL, or BSD license.

--
Ian Collins.

Chris Thomasson

Oct 9, 2007, 6:13:58 PM
"llothar" <llo...@web.de> wrote in message
news:1191939740.8...@50g2000hsm.googlegroups.com...

>
>> Are you trolling?
>
> Not my intention but i knew before it is almost useless to discuss
> this with you.

Sorry for being a dick!

Chris Thomasson

Oct 9, 2007, 11:51:18 PM
"llothar" <llo...@web.de> wrote in message
news:1191939740.8...@50g2000hsm.googlegroups.com...

> On 9 Oct., 16:07, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> IMVHO, that is 100% PURE bullshit in _real-world_practice_ ! GC defeats
>> KISS
>> by forcing you to do things like:
>>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/th...
>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/th...
>
> So there are problems with weak references, a concept that only
> exists because of a GC. Come on, this is a very bad argument.

You have to explicitly try to work around a pain in the neck that only
exists because a GC is present. Try to do efficient caching with an
overbearing GC breathing down your application's neck...

> And on the other hand, optimizing with the different heaps and
> different allocators that you need is KISS?

Optimizing with different allocators that follow a _standard_interface_ can
be a workable solution indeed. Trying to conform your application to the
"particular" flavor of GC that is _imposed_ by a "particular" implementation
of a VM is _not_really_portable_, and IMHO, certainly doesn't follow KISS in
any way, shape or form...

[...]

Chris Thomasson

Oct 10, 2007, 12:02:00 AM
"llothar" <llo...@web.de> wrote in message
news:1191939202.8...@o3g2000hsb.googlegroups.com...

> On 9 Oct., 16:09, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Yeah. GC can drop the complication factor, _unless_, you are really into
>> fine-tuning for scalability.
>
> Can't say this. You can fine tune GC applications but often such a
> fine tuning is not important.

Apparently it is important. Are you saying that all the efforts wrt the
links I posted:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7b8b62ceef171988

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d

are basically worthless?

> If you have 4 or 8 cores you normally need every help to make use of
> them and this is
> a 100% scale per additional core. While finetuning might be nice and
> save a few tens of percent.

Fine-tuning your synchronization algorithms into a coherent overall
scheme is the key to real scalability wrt adding more and more
processors. Think distributed synchronization techniques:

http://groups.google.com/group/comp.programming.threads/msg/20136dd228988817

http://groups.google.com/group/comp.programming.threads/msg/cab0505f87084c2b

> Sure if you have something as easy as a webserver you don't have the
> scaling problem at all.

You don't have a scaling problem at all? Really? I suppose you might
be thinking that some of the c10k solutions are a waste of time?

> But in many applications, especially non server ones it is just wasted
> time: "premature optimization".

Non-servers... Well, you seem to be confining a server to the realm of
the internet... This is not a completely correct way of thinking, IMVHO
of course. Servers can exist in the context of shared memory and
provide synchronization mediums to communicate with their various
clients. Think of a daemon that manages a lock-free database and serves
it to a plurality of processes via shared memory and virtually
zero-overhead wait-free queues?
How is the transaction between a process

Chris Thomasson

Oct 10, 2007, 12:04:15 AM
"J de Boyne Pollard" <j.deboyn...@tesco.net> wrote in message
news:1191950844.4...@g4g2000hsf.googlegroups.com...

Can I download Hoard, make some changes, post the source code free of
charge, and charge a fee for a commercial product that uses a function
offered by the compiled binary of the modified source code?

David Schwartz

Oct 10, 2007, 12:42:25 AM
On Oct 8, 1:40 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> My algorithm _is_ an "optimized multi-threaded memory allocator".

Yes, but a specialized one. It cannot do what hoard does.

> You can
> use a slab-based scheme as a general purpose allocator. I did not want
> to stick all the segregated slab logic into the pseudo-code because I
> wanted to keep things simple:

You cannot use a slab-based scheme as a general purpose allocator. You
can use a slab-base scheme to make a general purpose allocator, which
you can then use as a general-purpose allocator.

> Anyway, doesn't Hoard use segregated slabs of objects under the hood?

It makes no difference what hoard does under the hood. Even if your
algorithm does some of the same things hoard does under the hood, it
is still not comparable to hoard. It may be comparable to hoard's slab
logic, and if its slabs outperform hoard slabs, that may or may not
mean that it will outperform hoard if turned into a general-purpose
memory allocator.

> > You can't compare them.
>
> I believe that you can compare two multi-threaded allocators.

Fine, let's compare your algorithm with Hoard. Your algorithm is a
slab implementation, hoard is a general-purpose replacement for
'malloc'. Comparison over.

> I am thinking
> of releasing the version of the allocator I submitted to Sun Microsystems
> under the GPL license for use in free software only.

If you can find anybody who cares, more power to you. That you are
comparing a slab implementation to hoard suggests that there's no
reason anyone should.

DS

Chris Thomasson

Oct 10, 2007, 2:25:15 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1191991345.3...@y42g2000hsy.googlegroups.com...

> On Oct 8, 1:40 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> My algorithm _is_ an "optimized multi-threaded memory allocator".
>
> Yes, but a specialized one. It cannot do what hoard does.

Sure it can:


>> You can
>> use a slab-based scheme as a general purpose allocator. I did not want to
>> stick all the segregated slab logic into the pseudo-code because I wanted
>> > to keep things simple:
>
> You cannot use a slab-based scheme as a general purpose allocator.

That's not what I ultimately stated. I qualified the first sentence with the
following:

"I did not want to stick all the SEGREGATED slab logic into the pseudo-code
because I wanted to keep things simple."

> You
> can use a slab-base scheme to make a general purpose allocator, which
> you can then use as a general-purpose allocator.

Of course. That's what I was trying to get across. I think I have sparked
enough of a response to propose a general-purpose memory allocator
performance test, which would be defined by this newsgroup. A free-software
license should not be the ONLY compatible license... In the worst case,
you should be able to sign an NDA, run the tests, and post the results on
this newsgroup. Let's stick to the Standard C 'malloc/free' interface. Everybody
can run the test and post their results here. I know vZOOM and StreamFlow
can beat Hoard. If somebody else can beat us, well, I want to run their test
suite on my own multi-processor systems and post the results here... Good or
bad.

I am going to post my solution here under a restrictive license that will
allow you to download it and benchmark it against your allocator's performance
in your specific application's usage-graph of malloc/free calls issued by
threads over its lifetime...

Let's set a submission deadline. Hoard is already done... I want to be
able to allow multiple competitors in this scalability test, besides Hoard
and vZOOM...

>> Anyway, doesn't Hoard use segregated slabs of objects under the hood?
>
> It makes no difference what hoard does under the hood. Even if your
> algorithm does some of the same things hoard does under the hood, it
> is still not comparable to hoard.

All I ask is that you download my allocator code (which I am going to post
*here* before 2008) and benchmark it against Hoard, then report your results
here with a high level of verbosity and truthfulness. Agreed?

> It may be comparable to hoard's slab
> logic, and if its slabs outperform hoard slabs, that may or may not
> mean that it will outperform hoard if turned into a general-purpose
> memory allocator.

Many allocation strategies have quirks with respect to the usage patterns an
application can generate:

http://groups.google.com/group/comp.programming.threads/msg/e7cedbbc91c6825b

http://groups.google.com/group/comp.programming.threads/msg/a4353e0f587658e0

http://groups.google.com/group/comp.programming.threads/msg/21c8dbf3ed29b812

http://groups.google.com/group/comp.programming.threads/msg/b400d56c7151cf00


>> > You can't compare them.
>>
>> I believe that you can compare two multi-threaded allocators.
>
> Fine, let's compare your algorithm with Hoard. Your algorithm is a
> slab implementation, hoard is a general-purpose replacement for
> 'malloc'. Comparison over.

Hoard is a slab implementation as well; it's a segregated slab allocator.
Okay? You seem unable to visualize how the synchronization aspect of my
presented allocator pseudo-code can be used in a general allocator.
Segregated slabs. The pseudo-code directs slab requests to the same bucket
for brevity. I am definitely going to release my code so that every reader
of this group can benchmark it against Hoard. Please, David, I ask you to
join in the festivities. Expect a post here within a month and a half. I
think we should work on a benchmark application that is compatible with
'malloc/free' for this newsgroup to use.

>> I am thinking
>> of releasing the version of the allocator I submitted to Sun Microsystems
>> under the GPL license for use in free software only.
>
> If you can find anybody who cares, more power to you. That you are
> comparing a slab implementation it to hoard suggests that there's no
> reason anyone should.

HOARD USES SEGREGATED SLABS AS WELL! My pseudo-code demonstrated a highly
elegant synchronization scheme that is perfectly compatible with basically
any slab-based scheme out there. Please read the following for me, David:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/eb87f9cb2cb62d84

My synchronization scheme is so broad that it is compatible with many
single-threaded allocators. Imagine using the Microsoft Heap* API with the
HEAP_NO_SERIALIZE flag asserted for every call in a heavily
multi-threaded scenario? My algorithm can allow you to realize that situation
fully and clearly. Can Hoard transform inherently single-threaded allocation
schemes into a full-blown scalable multi-threaded 'malloc/free'? I think NOT.


In layman's terms:

vZOOM has the algorithms that allow you to transform most single-threaded
allocators into highly scalable multi-threaded general purpose allocators.


See? Why should I post a full blown segregated-slab storage scheme here when
everybody already knows how to do that? I was focusing on how to efficiently
apply synchronization to it within the context of a multi-threaded
environment.

Chris Thomasson

unread,
Oct 10, 2007, 2:31:12 AM10/10/07
to
[...]

> Non-Servers... Well, you seem to be confining a server to the realm of the
> internet... This is not a completely correct way of thinking, IMVHO of
> course. Servers can exist in the context of shared memory and provide
> synchronization mediums for communicating with their various clients. Think
> of a daemon that manages a lock-free database and serves it to a plurality
> of processes via shared memory and virtually zero-overhead wait-free
> queues?


> How is the transaction between a process

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[to correct that last sentence:]

How is a transaction between a process and a daemon via shared memory
on the same machine different from one that goes across a wire to
another nearby server, or perhaps a remote one? Remember, a process can die
in the middle of a mutual exclusion operation within a shared memory
operation... This crash, and the possible corruption of mutable shared
data-structures, would basically be the analog of a CRC check in a
"coherent" network protocol (e.g., TCP)...

Chris Thomasson

unread,
Oct 10, 2007, 2:53:16 AM10/10/07
to
[...]

> Of course. That's what I was trying to get across. I think I have sparked
> adequate response to make the proposal of a general-purpose memory
> allocator performance test, which would be defined by this newsgroup. A
> free-software license should not be the ONLY compatible license...
[...]

Fine... How should this group go about defining its standard test
applications/suite that will ultimately be used to drill down on any
allocator submission which follows the 'malloc/free' API?

Chris Thomasson

unread,
Oct 10, 2007, 3:00:47 AM10/10/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:gr-dnaVFU8JG6JHa...@comcast.com...

This newsgroup, and the various experts that frequent it, should be able to
define a clever memory allocation benchmark, one worthy of being highly
respected within the multi-threaded programming community at large!

Any thoughts? Let's do this!
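To seed the discussion, a trivial harness might look like this (a sketch only, not a proposed standard: the thread count, iteration count, and size distribution are placeholder choices, and a real suite would also model cross-thread frees and producer/consumer patterns):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ITERS 100000  /* malloc/free pairs per thread (placeholder) */

static atomic_long g_allocs;  /* successful allocations, for sanity checks */

/* One worker: hammer malloc/free at mixed small sizes -- the
   simplest usage-graph such a suite would start from. */
static void *worker(void *arg) {
    unsigned seed = (unsigned)(size_t)arg + 1;
    for (int i = 0; i < ITERS; i++) {
        seed = seed * 1103515245u + 12345u;       /* tiny LCG */
        size_t sz = 1 + (size_t)(seed >> 16) % 512;
        void *p = malloc(sz);
        assert(p != NULL);
        atomic_fetch_add(&g_allocs, 1);
        free(p);
    }
    return NULL;
}

/* Spawn N threads; wall-clock this loop under each candidate
   allocator (e.g., swapped in via LD_PRELOAD) to compare them. */
static void run_bench(int nthreads) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, (void *)(size_t)i);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
}
```

Compiled with `-pthread`, the same binary can then be timed against Hoard, the system malloc, or any other drop-in replacement without source changes.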

Logan Shaw

unread,
Oct 10, 2007, 3:05:24 AM10/10/07
to
llothar wrote:
> On 9 Okt., 10:30, Logan Shaw <lshaw-use...@austin.rr.com> wrote:

>> (3) You do not need/want to follow a keep-it-simple-stupid approach
>> and you can accept the added complexity of a GC implementation
>> within your system.
>
> Sorry, that's bullshit. If the GC is integrated it doesn't affect the
> KISS.

I won't deny that you can follow KISS and use GC at the same time. I
just meant that in some systems, GC does not come for free (or cheap),
and adding it can make things complicated rather than simple. For
example, suppose you are doing some embedded programming, the only
convenient compiler available is a C compiler, and you only have space
for 4K of executable code. In that case the "system" is the entire
embedded device (not just your code base), and the added complexity
doesn't cause reliability problems, but it does cause your code to be
bigger. Another example might be dealing with code in a safety-critical
environment where every line of code needs to be hand-audited.

It's probably pretty easy to satisfy #3, but I thought I'd state it
anyway, because if you can't satisfy it, you don't want garbage
collection.

- Logan

Logan Shaw

unread,
Oct 10, 2007, 3:09:02 AM10/10/07
to
Chris Thomasson wrote:
> It's a productivity boost for sure. However, when does high-productivity
> end
> up destroying quality?

Given N hours to complete a project, would you rather spend it on making
sure you have memory-management issues correct, or would you rather spend
it making sure you have *other* issues correct?

The answer to that question probably depends on the project and the balance
of priorities, but at least some of the time, eliminating part of the work
could free you up to invest more time in quality in other areas.

- Logan

Chris Thomasson

unread,
Oct 10, 2007, 4:02:24 AM10/10/07
to
"Logan Shaw" <lshaw-...@austin.rr.com> wrote in message
news:470c7a8b$0$32557$4c36...@roadrunner.com...

> Chris Thomasson wrote:
>> It's a productivity boost for sure. However, when does high-productivity
>> end
>> up destroying quality?
>
> Given N hours to complete a project, would you rather spend it on making
> sure you have memory-management issues correct, or would you rather spend
> it making sure you have *other* issues correct?

If you can't get memory management issues correct then you're in a bit of
trouble from the get-go. GC is a bandage that does not prevent systemic
infection(s).

> The answer to that question probably depends on the project and the
> balance
> of priorities, but at least some of the time, eliminating part of the work
> could free you up to invest more time in quality in other areas.

Proper memory management is a substantial influence on scalability overall.
Eliminating part of the work by relying on *automatic*, perhaps
silver-bullet quick fix remedies for such important aspects of your
applications can end up biting you somewhere in the posterior down the line.

Anthony Williams

unread,
Oct 10, 2007, 5:08:50 AM10/10/07
to
"Chris Thomasson" <cri...@comcast.net> writes:

> Can I download Hoard, make some changes, post the source code free of
> charge,

Yes, because it's licensed under the GPL.

> and charge a fee for a commercial product that uses a function
> offered by the compiled binary of the modified source code?

Not under the GPL license. If you additionally purchase a commercial licence
then you can.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

J de Boyne Pollard

unread,
Oct 10, 2007, 6:07:43 AM10/10/07
to
CT> Can I download Hoard, make some changes, post the source
CT> code free of charge, ...

AW> Yes, because it's licensed under the GPL.

CT> ... and charge a fee for a commercial product that uses a
CT> function offered by the compiled binary of the modified
CT> source code?

AW> Not under the GPL license.

Incorrect. Come now! This is a GPL Frequently Given Answer. The
correct answer, which you will find at
<URL:http://fsf.org./licensing/licenses/gpl-faq.html#DoesTheGPLAllowMoney>,
as well as on Hoard's own web site at
<URL:http://prisms.cs.umass.edu./emery/index.php?page=frequently-asked-questions-2>,
is that one can sell a product that incorporates free software, such
as Hoard, as long as that product is itself free software. In other
words, the product incorporating the code must, overall, provide users
with the same four freedoms from the Free Software Definition that the
incorporated code does.

Anthony Williams

unread,
Oct 10, 2007, 6:46:06 AM10/10/07
to

My assumption was that when Chris said "charge a fee for a commercial product"
he meant a CLOSED SOURCE product, or at least one that wasn't itself GPL.

Chris Thomasson

unread,
Oct 10, 2007, 4:11:28 PM10/10/07
to
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:3awj9r...@yahoo.com...

>J de Boyne Pollard <j.deboyn...@tesco.net> writes:
>
>> CT> Can I download Hoard, make some changes, post the source
>> CT> code free of charge, ...
>>
>> AW> Yes, because it's licensed under the GPL.
>>
>> CT> ... and charge a fee for a commercial product that uses a
>> CT> function offered by the compiled binary of the modified
>> CT> source code?
>>
>> AW> Not under the GPL license.
>>
>> Incorrect. [...]

>
> My assumption was that when Chris said "charge a fee for a commerical
> product"
> he meant a CLOSED SOURCE product, or at least one that wasn't itself GPL.

I was thinking of charging a fee for a commercial product and posting
the full source code of all the modified GPL code it uses for free. Some people
are not going to want to deal with the source code; they can choose to pay a
fee instead. What they would get for the fee could include the compiled
binaries, links to the source code, a manual, consulting options, etc...

Chris Thomasson

unread,
Oct 10, 2007, 4:21:07 PM10/10/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:zYqdnSVNAq1wrZDa...@comcast.com...
[...]

> I was thinking of charging a fee for a commercial product and posting
> full-source code of all the modified GPL code it uses for free.

Would GPL force me to provide the source code for an entire commercial
product?


Also, could I abstract all of the modified GPL software into a
separate shared library (e.g., libgplmod.so) and post all of its code for
free, then create a closed-source commercial product that uses something in
'libgplmod' and charge a fee?


If GPL forces me to provide the source-code of any software that links to
'libgplmod' then you would have to rely on the following:

> Some people are not going to want to deal with the source code; they can
> choose to pay a fee instead. What they would get from the fee could
> include the compiled binaries, links to the source code, manual,
> consulting options, ect...

in order to make any money...

Chris Thomasson

unread,
Oct 10, 2007, 8:41:00 PM10/10/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:zYmdnTiD0N-srpDa...@comcast.com...

> If GPL forces me to provide the source-code of any software that links to
> 'libgplmod' then you would have to rely on the following:

"J de Boyne Pollard" <j.deboyn...@tesco.net> wrote in message
news:1192010863.1...@d55g2000hsg.googlegroups.com...

>>"is that one can sell a product that incorporates free software, such as
>>Hoard, as long as that
>>product is itself free software."

I read that as any product that links with a GPL'd library has to be free
software.

Chris Thomasson

unread,
Oct 10, 2007, 8:45:38 PM10/10/07
to
"Emery Berger" <emery....@gmail.com> wrote in message
news:1191596010....@50g2000hsm.googlegroups.com...
> The Hoard memory allocator is a fast, scalable, and memory-efficient
> memory allocator.

[...]

Are you still using locks when a thread tries to free a block of memory that
it did not allocate itself? This of course is in the context of per-thread
memory pools... The last time I read the Hoard source code was about a year
or two ago...

David Schwartz

unread,
Oct 10, 2007, 8:55:43 PM10/10/07
to
On Oct 9, 11:25 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> My synchronization scheme is so broad that it is compatible with many
> single-threaded allocators. Imagine using the Microsoft Heap* API with the
> HEAP_NO_SERIALIZE flag asserted for every call in a heavily
> multi-threaded scenario? My algorithm can allow you to realize that situation
> fully and clearly. Can Hoard transform inherently single-threaded allocation
> schemes into a full-blown scalable multi-threaded 'malloc/free'? I think NOT.
>
> In Lay-mans terms:

You are comparing an engine to a car.

DS

Logan Shaw

unread,
Oct 11, 2007, 12:52:01 AM10/11/07
to
Chris Thomasson wrote:
> "Logan Shaw" <lshaw-...@austin.rr.com> wrote in message
> news:470c7a8b$0$32557$4c36...@roadrunner.com...

>> Given N hours to complete a project, would you rather spend it on making


>> sure you have memory-management issues correct, or would you rather spend
>> it making sure you have *other* issues correct?

> If you can't get memory management issues correct then you're in a bit of
> trouble from the get-go. GC is a bandage that does not prevent systemic
> infection(s).

Your statement seems to imply that garbage collection cannot be part of
a 'correct' solution.

> Proper memory management is a substantial influence on scalability
> overall.

I realize we are in comp.programming.threads, but scalability is not
always the primary concern. Maybe correctness is more important.
Although I can see it'd be reasonable to assume scalability is a
primary concern in this particular discussion given that the word
"scalable" appears in the subject line. I was speaking in broader
terms, so maybe we are just talking about two different things.

> Eliminating part of the work by relying on *automatic*, perhaps
> silver-bullet quick fix remedies for such important aspects of your
> applications can end up biting you somewhere in the posterior down the
> line.

Of course that depends on what you are trying to achieve. If you have
a project where some scalability is required but overall just achieving
correctness will be a challenge in the time allotted, then choosing
something that can reduce complexity (like garbage collection) can be
a good decision. If I am forced to choose between correct and scalable,
I will choose the correctness 100% of the time. If there is sufficient
time to achieve both, then garbage collection might be a hindrance to
scalability and might be a bad choice.

- Logan

Chris Thomasson

unread,
Oct 11, 2007, 2:22:28 AM10/11/07
to

"Logan Shaw" <lshaw-...@austin.rr.com> wrote in message
news:470dabf2$0$18936$4c36...@roadrunner.com...
[...]

> If I am forced to choose between correct and scalable,
> I will choose the correctness 100% of the time. If there is sufficient
> time to achieve both, then garbage collection might be a hindrance to
> scalability and might be a bad choice.

Yeah, time is of the essence.

Chris Thomasson

unread,
Oct 11, 2007, 2:26:34 AM10/11/07
to
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1192064143.6...@19g2000hsx.googlegroups.com...

Combining the lock-free algorithm presented in the slab-allocator with a
per-thread MS Heap* w/ HEAP_NO_SERIALIZE results in a full-blown
multi-threaded general purpose allocator. Yes, the lock-free aspect of the
thing is the analog of an engine... However, once you apply the engine to a
user-provided single-threaded heap (e.g., the MS heap) you end up with a
very fast car.

J de Boyne Pollard

unread,
Oct 12, 2007, 4:24:07 AM10/12/07
to
JdeBP> Incorrect. Come now! This is a GPL Frequently Given
JdeBP> Answer. The correct answer, which you will find at
JdeBP> <URL:http://fsf.org./licensing/licenses/gpl-faq.html#DoesTheGPLAllowMoney>,
JdeBP> as well as on Hoard's own web site at
JdeBP> <URL:http://prisms.cs.umass.edu./emery/index.php?page=frequently-asked-questions-2>,
JdeBP> is that one can sell a product that incorporates free
JdeBP> software, such as Hoard, as long as that product is
JdeBP> itself free software. In other words, the product
JdeBP> incorporating the code must, overall, provide users
JdeBP> with the same four freedoms from the Free Software
JdeBP> Definition that the incorporated code does.

AW> My assumption was that when Chris said "charge a
AW> fee for a commercial product" he meant a CLOSED
AW> SOURCE product, or at least one that wasn't itself GPL.

That assumption, that "commercial" and "closed source" are synonyms,
is _exactly_ the error that the GPL FGA is combatting.

J de Boyne Pollard

unread,
Oct 12, 2007, 4:36:12 AM10/12/07
to
CT> I read that as any product that links with a GPL'd library has
CT> to be free software.

... which has nothing to do with whether one charges money for it.
Read the several pages whose URLs were given. This is basic
definition-of-free-software stuff.

Anthony Williams

unread,
Oct 12, 2007, 4:48:37 AM10/12/07
to
J de Boyne Pollard <j.deboyn...@tesco.net> writes:

My assumption was about Chris's use of the term, not about "commercial" and
"closed source" in general --- I know many examples of software licensed under
the GPL which you can pay money for. However, given the common misconceptions
it's probably wise to make that explicit.

Anthony Williams

unread,
Oct 12, 2007, 4:59:44 AM10/12/07
to
"Chris Thomasson" <cri...@comcast.net> writes:

> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:zYqdnSVNAq1wrZDa...@comcast.com... [...]
>> I was thinking of charging a fee for a commercial product and posting
>> full-source code of all the modified GPL code it uses for free.
>
> Would GPL force me to provide the source code for an entire commercial product?

Pretty much. See below.

> Also, could I could abstract all of the modified GPL software into a
> separate shared library (e.g., libgplmod.so) and post all of the code for
> free, then create a closed-source commercial product that uses something in
> 'libgplmod' and charge a fee?

If the GPL part is provided in a separate binary for which an
alternative non-GPL implementation can be substituted without recompiling the
main app, you might not have to make the source code for the full application
available. I expect such an alternative implementation would have to be
actually available, and not just theoretically available. I am not a lawyer,
so if you intend to try this you probably ought to get proper legal advice.

> If GPL forces me to provide the source-code of any software that links to
> 'libgplmod' then you would have to rely on the following:
>
>> Some people are not going to want to deal with the source code; they can
>> choose to pay a fee instead. What they would get from the fee could include
>> the compiled binaries, links to the source code, manual, consulting
>> options, ect...
>
> in order to many any money...

Yes. That's what some of the commercial linux distributors do --- provide
binaries, printed manuals, support, etc.

You may choose to only distribute copies yourself to people who are willing to
pay (i.e. don't make the source code a public download, but one that's only
available to people who paid for the binaries), but there's nothing to stop
your customers distributing their own versions built from the source code.

Sascha Bohnenkamp

unread,
Oct 12, 2007, 5:45:37 AM10/12/07
to
> Combining the lock-free algorithm presented in the slab-allocator with
> per-thread MS Heap* w/ HEAP_NO_SERIALIZE results in a full-blown
> multi-threaded general purpose allocator.
I hope it is allowed to allocate in one thread and release in another ....

David Schwartz

unread,
Oct 12, 2007, 7:30:37 AM10/12/07
to
On Oct 12, 1:59 am, Anthony Williams <anthony_w....@yahoo.com> wrote:

> If the GPL part is provided in a separate binary for which an
> alternative non-GPL implementation can be substituted without recompiling the
> main app, you might not have to make the source code for the full application
> available. I expect such an alternative implementation would have to be
> actually available, and not just theoretically available. I am not a lawyer,
> so if you intend to try this you probably ought to get proper legal advice.

If there's some reason you think that makes a difference, I'd love to
hear it. I can't imagine why that would be so.

In any event, my answer would be that so long as neither work is a
derivative of the other, you are just fine. Since hoard contains none
of your code and your code contains no hoard code (assuming you don't
design it specifically around hoard) you are perfectly fine. They are
two legally-independent works. There is no reason the license for one
should affect the other.

DS

David Schwartz

unread,
Oct 12, 2007, 7:33:25 AM10/12/07
to
On Oct 10, 5:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> I read that as any product that links with a GPL'd library has to be free
> software.

That's a commonly-made but nonsensical claim. That could only be true
if a linker could create a derivative work. That could only be true if
a linker could create a new work (since a derivative work is a new
work, distinct from the work it is derivative of). But a linker has no
creativity, it's purely a mechanical process. It can't create a new
work, so it can't create a new derivative work.

If you take two works, A and B, and feed them into a linker, all you
can get out is A and B, perhaps combined with each other and bits of
the linker. If you could get a new work that way, the linker would be
entitled to copyright on the new work (since it made it), which is
nonsensical.

DS

Anthony Williams

unread,
Oct 12, 2007, 7:50:46 AM10/12/07
to
David Schwartz <dav...@webmaster.com> writes:

Like I said, I am not a lawyer, but the "assuming you don't design it
specifically around hoard" is the bit I was referring to --- if your code only
works with hoard, you're in a gray area, and ought to get proper advice.

Chris Thomasson

unread,
Oct 12, 2007, 9:35:58 AM10/12/07
to
"Sascha Bohnenkamp" <bohne...@mevisbreastcare.de> wrote in message
news:5n8u21F...@mid.individual.net...

Yes it is. That's where the lock-free algorithm comes into play...


Here is the "basic" vZOOM allocation scheme:

________________________
- create a thread-local instance of a user-defined single-threaded
allocator in every thread (e.g., ms heap w/ HEAP_NO_SERIALIZE).

- allocation requests are forwarded to this thread-local user allocator
directly.

- if a free request comes from the thread that allocated the block (i.e., the
origin thread), then the free request is forwarded to this thread-local user
allocator.

- if a free request comes from another thread, then you accumulate the block
in a per-thread stack-based freelist "belonging to the origin thread", using
a single atomic CAS.

- blocks from this freelist are actually reused/freed in batches using a
single atomic SWAP when the origin thread allocates/deallocates another
block. For instance, a thread that fails to allocate from its thread-local
heap will do a SWAP on the freelist and try to fulfill the allocation request
from there.
________________________
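The CAS-push / SWAP-flush pair in the scheme above can be sketched with C11 atomics (illustrative only, not the actual vZOOM code: the names `remote_free` and `flush_remote` are made up, and a real allocator would hang one such freelist off every thread's heap):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Header prepended to every block so a remote thread can push it
   back onto the origin thread's freelist without a lock. */
typedef struct block {
    struct block *next;
} block;

/* Anchor of one origin thread's remote freelist. */
typedef struct freelist {
    _Atomic(block *) head;
} freelist;

/* Remote thread: push the block with a single CAS loop. */
static void remote_free(freelist *fl, block *b) {
    block *old = atomic_load_explicit(&fl->head, memory_order_relaxed);
    do {
        b->next = old;  /* 'old' is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
        &fl->head, &old, b,
        memory_order_release, memory_order_relaxed));
}

/* Origin thread: grab the whole list with a single atomic SWAP and
   hand the blocks back to the thread-local allocator in a batch. */
static block *flush_remote(freelist *fl) {
    return atomic_exchange_explicit(&fl->head, NULL,
                                    memory_order_acquire);
}
```

A thread that fails to allocate from its local heap would call `flush_remote()` and refill from the returned chain before falling back to the system.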


I decoupled the allocation logic from the synchronization mechanism so
that single-threaded allocators can now be used in full-blown multi-threaded
environments. IMHO, it's a really neat feature. You could even set things up
where thread A uses a different allocator than thread B, yet they are
still able to allocate, pass around, and free blocks between themselves. It's
an extremely flexible framework.

This can be an important invention. Here is the basic sales pitch:

Question: How do you create a multi-threaded allocator, without creating a
multi-threaded allocator?

Answer: Create a single-threaded allocator and tell the vZOOM library to
multi-thread it for you...

Wow! :^)

Alexander Terekhov

unread,
Oct 12, 2007, 1:21:57 PM10/12/07
to

Anthony Williams wrote:
[...]

> If the GPL part is provided in a separate binary for which an
> alternative non-GPL implementation can be substituted without recompiling the
> main app, you might not have to make the source code for the full application
> available. I expect such an alternative implementation would have to be
> actually available, and not just theoretically available. I am not a lawyer,
> so if you intend to try this you probably ought to get proper legal advice.

To quote Lee Hollaar (who worked on Internet, copyright, and patent
issues as a U.S. Senate Judiciary Committee Fellow):

-----
One can tie oneself in knots trying to make sense of the GPL and
the statements made about it. It ignores provisions of the copyright
statutes that allow the modification or redistribution of works
without permission of the copyright owner. It talks about "derived"
works which don't seem to be the same as "derivative works." And
the explanations from RMS and others often make little sense, as
in the case where something was a derived work until somebody wrote
a non-GPLed math library compatible with the GPLed one.
-----

See also

http://www.usfca.edu/law/determann/softwarecombinations060403.pdf

regards,
alexander.

rfis...@gmail.com

unread,
Oct 13, 2007, 5:47:09 AM10/13/07
to
J de Boyne Pollard wrote:

[...]

> AW> My assumption was that when Chris said "charge a
> AW> fee for a commerical product" he meant a CLOSED
> AW> SOURCE product, or at least one that wasn't itself GPL.
>
> That assumption, that "commercial" and "closed source" are synonyms,
> is _exactly_ the error that the GPL FGA is combatting.

Not very well, it would seem! Obviously its attempts to reclaim
the pejoratives "open source" and "free software" have failed.
It could be time to change tactics and try a euphemistic locution
(cf. differently abled).

RF

rani_s...@hotmail.com

unread,
Oct 14, 2007, 11:27:24 PM10/14/07
to
On Oct 12, 6:35 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Sascha Bohnenkamp" <bohnenk...@mevisbreastcare.de> wrote in message

> > I hope it is allowed to allocate in one thread and release in another ....
>
> Yes it is. That's where the lock-free algorithm comes into play...
> Here is the "basic" vZOOM allocation scheme:
>
> ________________________
> - create a thread-local instance of a user-defined single-threaded
> allocator in every thread (e.g., ms heap w/ HEAP_NO_SERIALIZE).
>
> - allocation requests are forwarded to this thread-local user allocator
> directly.
>
> - if free request goes from thread that allocated block (e.g., the origin
> thread), then free request is forwarded to this thread-local user allocator.
>
> - if free request goes from another thread, then you accumulate this block
> in per-thread stack-based freelist "belonging to the origin thread", using
> single atomic CAS.

What happens when a thread exits and there are still outstanding
allocations related to its heap?

Chris Thomasson

unread,
Oct 15, 2007, 12:37:29 AM10/15/07
to
<rani_s...@hotmail.com> wrote in message
news:1192418844.6...@y27g2000pre.googlegroups.com...

Excellent question. This can be VERY important. For instance, I have built
several single-threaded allocators that are based entirely on the memory
provided by their own stacks. The stack goes bye-bye when its parent thread
exits! That scenario is handled by having the exiting thread recognize a
specific condition indicating that other threads still hold references. If
other threads have references, it waits on a condition variable. Since the
act of recognizing this condition sets a flag (it uses XCHG to set the
anchor of its per-thread freelist to 1), all subsequent threads which own
references will eventually free them, notice the flag on the freelist since
they are not the origin, and subsequently use XADD to decrement the
refcount. The thread that drops it to zero signals the condition variable,
which wakes the "exiting" thread, enabling it to free itself back to the
operating system.

99% of the logic is contained in the slab-allocator code I posted here. The
post does not contain the per-thread condvars. If you are having problems
understanding this, perhaps I can post some pseudo-code.

?

P.S.

Using a plurality of threads' stack space to implement a multi-threaded
allocator is perfect for embedded systems that have very limited heap space,
or none at all. You can create a scalable heap from stack space... Neat!
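Here is a rough sketch of that exit handshake (illustrative only, not the actual vZOOM code: the `origin` struct, the separate `refs` counter, and the function names are my assumptions, and the condition-variable wait/signal is reduced to comments):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EXITING ((uintptr_t)1)  /* sentinel XCHG'd into the anchor */

typedef struct origin {
    _Atomic uintptr_t anchor;   /* remote freelist head, or EXITING */
    _Atomic int       refs;     /* outstanding remote references */
} origin;

/* Origin thread, on exit: mark the freelist with the sentinel.
   Returns nonzero if remote references are still outstanding, in
   which case the real implementation blocks on a per-thread
   condition variable until the count drops to zero. */
static int origin_begin_exit(origin *o) {
    atomic_exchange(&o->anchor, EXITING);  /* the XCHG-to-1 flag */
    return atomic_load(&o->refs) > 0;
}

/* Remote thread, freeing a block after the origin marked exit:
   XADD-decrement the refcount; whoever drops it to zero signals
   the origin's condition variable so it can reclaim its stack. */
static int remote_release(origin *o) {
    int prev = atomic_fetch_sub(&o->refs, 1);
    return prev == 1;  /* nonzero => caller must signal the condvar */
}
```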

rani_s...@hotmail.com

unread,
Oct 16, 2007, 3:48:11 PM10/16/07
to
On Oct 14, 9:37 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> <rani_shar...@hotmail.com> wrote in message

> > On Oct 12, 6:35 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> >> ________________________
> >> - create a thread-local instance of a user-defined single-threaded
> >> allocator in every thread (e.g., ms heap w/ HEAP_NO_SERIALIZE).
>
> >> - allocation requests are forwarded to this thread-local user allocator
> >> directly.
>
> > What happens when a thread exits and there are still outstanding
> > allocations related to its heap?
>
> Excellent question. This can be VERY important. For instance, I have built
> several single-threaded allocators that are entirely based on the memory
> provided by their own stacks. The stack goes bye-bye when its parent thread
> exits! That scenario is handled by recognizing a specific condition that
> indicates that other threads have references before it exits. If other
> threads have references, it waits on a condition-variable.

Binding the ref-counting (of outstanding allocations) to the thread
itself seems usage-sensitive; there might be use cases in which threads
are exhausted.
Don't you think that it makes more sense to bind the ref-counting to
the heap itself?
In that case you could even recycle "thread-less" heaps when new threads
are created, and therefore bound the number of heaps by the total
number of threads.

I'm referring to the HeapAlloc use case and not the thread's stack that
you mentioned (not sure how you can accomplish that in a portable and
reliable way).

Thanks,
Rani

Chris Thomasson

unread,
Oct 16, 2007, 9:19:58 PM10/16/07
to
<rani_s...@hotmail.com> wrote in message
news:1192564091.1...@y27g2000pre.googlegroups.com...

> On Oct 14, 9:37 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <rani_shar...@hotmail.com> wrote in message
>> > On Oct 12, 6:35 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>> >> ________________________
>> >> - create a thread-local instance of a user-defined single-threaded
>> >> allocator in every thread (e.g., ms heap w/ HEAP_NO_SERIALIZE).
>>
>> >> - allocation requests are forwarded to this thread-local user
>> >> allocator
>> >> directly.
>>
>> > What happens when a thread exits and there are still outstanding
>> > allocations related to its heap?
>>
>> Excellent question. This can be VERY important. For instance, I have
>> built
>> several single-threaded allocators that are entirely based on the memory
>> provided by their own stacks. The stack goes bye-bye when its parent
>> thread
>> exits! That scenario is handled by recognizing a specific condition that
>> indicates that other threads have references before it exits. If other
>> threads have references, it waits on a condition-variable.
>
> Bounding the ref-counting (of outstanding allocations) to the thread
> itself seems to be usage sensitive for which there might be use cases
> in which threads are exhausted.
> Don't you think that it makes more sense to bound the ref-counting to
> the heap itself?

The only time I have to bind the counting to the thread is when a
user-defined allocator depends on thread-local structures. I created an
allocator that is based on thread stacks, therefore it's bound to the
thread.

> In that case you can even recycle "thread-less" heaps when new threads
> are created and therefore to bound the number of heaps with the total
> number of threads.


> I'm referring to the HeapAlloc use case and not thread's stack that

> you mentioned.


Bingo! I use exactly that for everything but the per-stack allocator. You
are catching on to how my allocator works.


> (not sure how you can accomplish that in portable and
> reliable way).

I have to put vZOOM on Quadros OS for an ARM processor... The setup had no
heap! However, a Quadros task has some stack space... So, I simply created
a single-threaded allocator per-task that used this space. I plugged it
into vZOOM via a single API call, and it worked. Bingo, they had a highly
scalable heap out of thin air!

Chris Thomasson

unread,
Oct 16, 2007, 9:26:56 PM10/16/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:F_WdnRTC3bW1_4ja...@comcast.com...
[...]

>> I'm referring to the HeapAlloc use case and not thread's stack that
>> you mentioned.
>
> Bingo! I use exactly that for every but the per-stack allocator. You are
> catching on to how my allocator works.
>> (not sure how you can accomplish that in portable and
>> reliable way).
>
> I have to put vZOOM on Quadros OS for an ARM processor...
^^^^^^^^^^^^^^^

Need to change the word 'have' to 'had'. The job was accomplished a while
ago.

> The setup had no heap! However, a Quadros task has some stack space... So,
> I simply created a single threaded allocator per-task that used this
> space. I plugged it into vZOOM via. single api call, and it worked.

The design team wanted to hack up a heap by modifying the operating
system, as the source code was provided by Quadros. I said no fuc%ing way
are we going to muck up the OS code when you can license an extremely
simple solution for your problem.


> Bingo, they had a highly scaleable heap out of thin air!

:^)

rani_s...@hotmail.com

unread,
Oct 18, 2007, 12:02:16 AM10/18/07
to
> scaleable heap out of thin air!

What is the point of creation and destruction of the per-thread heap?
There are cases in which you don't control the creation and destruction
of threads (e.g., the MS kernel32.dll thread-pool, or threads from
external components calling my DLL).

Thanks,
Rani

Chris Thomasson

unread,
Oct 19, 2007, 9:55:38 PM10/19/07
to
<rani_s...@hotmail.com> wrote in message
news:1192680136.8...@z24g2000prh.googlegroups.com...

> On Oct 16, 6:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
[...]

>>
>> I have to put vZOOM on Quadros OS for an ARM processor... The setup had
>> no
>> heap! However, a Quadros task has some stack space... So, I simply
>> created a
>> single threaded allocator per-task that used this space. I plugged it
>> into
>> vZOOM via. single api call, and it worked. Bingo, they had a highly
>> scaleable heap out of thin air!
>
> What is the point of creation and destruction of the per-thread heap?

The point of creation is lazy initialization on first use:


void* malloc(size_t sz) {
    per_thread_heap* _this =
        pthread_getspecific(...);
    if (! _this) {
        /* create heap, then cache it */
        pthread_setspecific(...);
    }
    /* forward the request to _this */
}


The point of destruction is the destructor function registered with
pthread_key_create.


> There are case in which you don't control the creation and destruction
> of threads (e.g. MS kernel32.dll thread-pool and thread from external
> components calling my DLL).

Yes. The "stack"-based allocator is the only setup that has problems
working in that scenario.
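As a concrete illustration of those creation and destruction points, here is a self-contained sketch using POSIX TSD. `st_heap`, `my_heap`, and `my_malloc` are made-up names standing in for the user-defined single-threaded allocator:

```c
#include <pthread.h>
#include <stdlib.h>

/* Made-up stand-in for a user-defined single-threaded heap. */
typedef struct st_heap { size_t bytes_served; } st_heap;

static pthread_key_t g_heap_key;
static pthread_once_t g_heap_once = PTHREAD_ONCE_INIT;

/* Runs automatically for each exiting thread with a non-NULL value. */
static void heap_dtor(void *p) { free(p); }

static void heap_key_init(void) { pthread_key_create(&g_heap_key, heap_dtor); }

/* Lazily create this thread's heap on first use. */
static st_heap *my_heap(void)
{
    pthread_once(&g_heap_once, heap_key_init);
    st_heap *h = pthread_getspecific(g_heap_key);
    if (!h) {
        h = calloc(1, sizeof *h);      /* create the per-thread heap */
        pthread_setspecific(g_heap_key, h);
    }
    return h;
}

/* Front end: forward each request to the thread-local heap. */
static void *my_malloc(size_t sz)
{
    st_heap *h = my_heap();
    h->bytes_served += sz;             /* a real heap would carve blocks here */
    return malloc(sz);
}
```

Note that nothing in this sketch handles blocks that outlive their origin thread; that is what the refcounting discussed earlier in the thread is for.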

rani_s...@hotmail.com

unread,
Oct 20, 2007, 4:42:30 AM10/20/07
to
On Oct 19, 6:55 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> <rani_shar...@hotmail.com> wrote in message

> > What is the point of creation and destruction of the per-thread heap?
>
> The point of creation is:
>
> void* malloc(size_t sz) {
> per_thread_heap* _this =
> pthread_getspecific(...);
> if (! _this) {
> /* create heap */
> pthread_setspecific(...);
> }
>
> }
>
> The point of destruction is the destructor function registered with
> pthread_key_create.

Does such pthread_key_create based destructor also work when the
shared module is unloaded (e.g. COM DLL)?
The interesting case is thread is still running when the DLL is
unloaded.
Do you destruct all the per-thread heaps at that point of DLL unload?

Thanks,
Rani


Dave Butenhof

unread,
Oct 20, 2007, 6:54:00 PM10/20/07
to

First off, WHAT shared library? The one implementing the
thread-specific malloc()-wrapper? If you destroyed all the per-thread
heaps when that's unloaded, what would become of other code that has
stashed references to allocated memory in that heap? There's no way to
do this gracefully. Do you think it could or should do something based
on the shared library containing each malloc() CALL SITE? That'd be
horrendously complicated.

(And, no, POSIX TSD doesn't -- and couldn't -- work that way in any case
because there's no association to a shared library. Each thread
associates a thread-specific value with a shared key. Even deleting the
key doesn't asynchronously remove values from existing threads; nor is
operation of the per-thread destructor in any way dependent on the location
or existence of the pthread_key_t value.)

Now, if the per-thread heap was a "thread local storage" value,
including the windows declspec(thread), it WOULD be associated with a
shared library. I'm not actually sure about the Windows semantics here,
but while many implementations support some form of language-specified
thread-local storage the semantics are completely non-standard. Some
won't even work with dynamically loaded shared libraries -- the thread
local data won't be instantiated. Some that DO work with dynamic loads
won't work with unloads. In this case, cleanup on unload would be fairly
catastrophic because there are likely to be many pointers into the heaps
stored elsewhere. (But then again, even if you unloaded the library
defining this wrapped malloc, the system would be unlikely to actually
unload it due to references from the other libraries that reference its
symbols.)

Chris Thomasson

unread,
Oct 21, 2007, 5:18:21 AM10/21/07
to

Wow! :^)

Any thoughts/comments/suggestions/rants/raves?

;^)

Chris Thomasson

unread,
Oct 21, 2007, 5:20:23 AM10/21/07
to
[comp.lang.c++ added because it's going to standardize fine-grained, very
low-level multi-threading primitives; refer to the cpp-threads mailing
list...]

Chris Thomasson

unread,
Oct 21, 2007, 5:22:03 AM10/21/07
to
[Ignore the following/do not respond to this message; reply to the following
instead:]

http://groups.google.com/group/comp.programming.threads/msg/97d2db3b6fd4c36b

Sorry for any confusions.

Bill Todd

unread,
Oct 21, 2007, 9:44:11 AM10/21/07
to

Describing how blocks eventually get deallocated from the free lists
back to the originating thread's heap (without requiring interlocks that
defeat your goal of not having them on such heaps) so that those heaps
don't become so fragmented that they become unusable might be nice. So
might a simulation to demonstrate that the free lists themselves don't
create increasingly unusable (effectively, fragmented) storage for their
temporary owners (surely you're not assuming that all block requests are
equal in size).

- bill

Chris Thomasson

unread,
Oct 21, 2007, 9:55:29 AM10/21/07
to
"Dave Butenhof" <david.b...@hp.com> wrote in message
news:ffe0u8$ht1$1...@usenet01.boi.hp.com...

> rani_s...@hotmail.com wrote:
>> On Oct 19, 6:55 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>>> <rani_shar...@hotmail.com> wrote in message
>>>> What is the point of creation and destruction of the per-thread heap?
>>> The point of creation is:
>>>
>>> void* malloc(size_t sz) {
>>> per_thread_heap* _this =
>>> pthread_getspecific(...);
>>> if (! _this) {
>>> /* create heap */
>>> pthread_setspecific(...);
>>> }
>>>
>>> }
>>>
>>> The point of destruction is the destructor function registered with
>>> pthread_key_create.
>>
>> Does such pthread_key_create based destructor also work when the
>> shared module is unloaded (e.g. COM DLL)?
>> The interesting case is thread is still running when the DLL is
>> unloaded.
>> Do you destruct all the per-thread heaps at that point of DLL unload?
>
> First, off, WHAT shared library? The one implementing the thread-specific
> malloc()-wrapper? If you destroyed all the per-thread heaps when that's
> unloaded, what would become of other code that has stashed references to
> allocated memory in that heap? There's no way to do this gracefully.

[...]

Humm. Well, the actual code in the per-thread TSD key dtor:

void slab_sys_tls_dtor( void *state )
{
    per_thread_t *_this = state;

    /* flush the shared freelist; the +1 covers the owner's own reference */
    int count = slab_sys_shared_pop( _this, 1 ) + 1;

    /* XADD returns the prior value; equal to count means we hit zero */
    if ( XADD( &_this->refs, -count ) == count )
    {
        /* destroy _this */
    }
}


So, if any other library is using a block from the per-thread heap, the
heap will be around for the duration.

Chris Thomasson

unread,
Oct 21, 2007, 9:57:22 AM10/21/07
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:0d6dncep3uuxxIba...@comcast.com...

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69

rani_s...@hotmail.com

unread,
Oct 21, 2007, 2:51:29 PM10/21/07
to
On Oct 21, 6:55 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dave Butenhof" <david.buten...@hp.com> wrote in message
> will be around for the duration.

Seems strange that the per-thread heap outlives its owner (i.e., the
shared library). What happens if the shared library is loaded/unloaded
multiple times? Unless I'm missing something, it seems that you might
quickly exhaust the TSD keys.

I thought that you could assume that upon module unload there are no
more outstanding allocations from its heap (i.e., no leaks) and
therefore you can safely destroy all the per-thread heaps.

I'm quite sure that this can be safely done on windows via the
*serialized* DllMain notifications but according to David Butenhof
it's not possible using POSIX's pthread_key_create/pthread_key_delete.
I'll start another thread to clarify the "TSD and shared libraries"
issue.

Thanks,
Rani


Chris Thomasson

unread,
Oct 21, 2007, 6:24:24 PM10/21/07
to
<rani_s...@hotmail.com> wrote in message
news:1192992689.7...@v23g2000prn.googlegroups.com...

> On Oct 21, 6:55 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Dave Butenhof" <david.buten...@hp.com> wrote in message
[...]

>>
>> So, if any other library is using block from per-thread heap, well, the
>> heap
>> will be around for the duration.
>
> Seem strange that the per-thread heap outlives its owner (i.e. shared
> library). What happens if the shared library is loaded/unloaded
> multiple times? Unless I'm missing something, it seems that you might
> quickly exhaust the TSD keys.
>
> I thought that you can assume that upon module unload there are no
> more outstanding allocations from its heap (i.e. no leaks) and
> therefore you can safely destroy all the per-thread heaps.
>
> I'm quite sure that this can be safely done on windows via the
> *serialized* DllMain notifications but according to David Butenhof
> it's not possible using POSIX's pthread_key_create/pthread_key_delete.
> I'll start another thread to clarify the "TSD and shared libraries"
> issue.

I know that Hoard uses per-thread heaps, however, I need to look at the
source code to see how it handles destroying them.

Dave Butenhof

unread,
Oct 22, 2007, 6:19:22 AM10/22/07
to

You're confusing mechanisms. Windows notifiers on DLL load/unload have
nothing to do with thread-specific data. UNIX dlopen/dlclose has
different semantics -- in particular, there's no explicit user
application notifier. Windows TSD has no more explicit connection with a
specific shared library than POSIX TSD. The TSD key itself is
APPLICATION scope; the TSD values are thread-scope.

(Again, "TLS" is an entirely different LANGUAGE/LOADER concept, which
may or may not be integrated with dynamic shared library behavior
depending on the platform... as it's entirely non-standard, there's no
reliable specification.)

Given a DLL unload notification, you can certainly destroy a TSD key at
that point, if you know that no other code cares (this is one way to
implement TLS, in fact). Destroying a per-thread heap at that point is a
different matter; those semantics have nothing to do with dynamic shared
libraries or TSD.

Chris Thomasson

unread,
Oct 22, 2007, 7:09:17 AM10/22/07
to
"Dave Butenhof" <david.b...@hp.com> wrote in message
news:ffhtfa$555$1...@usenet01.boi.hp.com...

> rani_s...@hotmail.com wrote:
>> On Oct 21, 6:55 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Seem strange that the per-thread heap outlives its owner (i.e. shared
>> library). What happens if the shared library is loaded/unloaded
>> multiple times? Unless I'm missing something, it seems that you might
>> quickly exhaust the TSD keys.
>>
>> I thought that you can assume that upon module unload there are no
>> more outstanding allocations from its heap (i.e. no leaks) and
>> therefore you can safely destroy all the per-thread heaps.
>>
>> I'm quite sure that this can be safely done on windows via the
>> *serialized* DllMain notifications but according to David Butenhof
>> it's not possible using POSIX's pthread_key_create/pthread_key_delete.
>> I'll start another thread to clarify the "TSD and shared libraries"
>> issue.
[...]

>
> Given a DLL unload notification, you can certainly destroy a TSD key at
> that point, if you know that no other code cares (this is one way to
> implement TLS, in fact). Destroying a per-thread heap at that point is a
> different matter; those semantics have nothing to do with dynamic shared
> libraries or TSD.

Right. You have to ensure that you don't destroy the heap if any other
thread has references to blocks within it. In my setup, a DLL unload
notification can destroy the TSD keys fine. The TSD destructors for that
library will have done their job of modifying/examining the reference
count and deferring destruction accordingly. In that case, the threads in
different libraries that hold references to blocks in the heap will
eventually free those references, and the heap will be destroyed when the
count drops to zero.

Chris Thomasson

unread,
Oct 22, 2007, 7:30:16 AM10/22/07
to
"Bill Todd" <bill...@metrocast.net> wrote in message
news:HcWdnZMPHa6BxYba...@metrocastcablevision.com...

> Chris Thomasson wrote:
>> [comp.lang.c++ added because its going to standardize fine-grain, very
>> low-level multi-threading primitives; refer cpp-threads mailing list...]
>>
>>
>> I decoupled the allocation logic from the synchronization mechanism such
>> that single-threaded allocators can now be used in full-blown
>> multi-threaded
>> environments.
[...]

>> Here is the "basic" vZOOM allocation scheme:
>>
>> ________________________
[...]

>> ________________________
>>
>>
>>
>> Any thoughts/comments/suggestions/rants/raves?
>
> Describing how blocks eventually get deallocated from the free lists back
> to the originating thread's heap (without requiring interlocks that defeat
> your goal of not having them on such heaps) so that those heaps don't
> become so fragmented that they become unusable might be nice. So might a
> simulation to demonstrate that the free lists themselves don't create
> increasingly unusable (effectively, fragmented) storage for their
> temporary owners (surely you're not assuming that all block requests are
> equal in size).

I use the per-thread user-provided allocator, to allocate a simple
per-thread segregated array of slabs. Something like:


thread::slab[0].sz = sizeof(void*);
thread::slab[1].sz = thread::slab[0].sz * 2;
thread::slab[2].sz = thread::slab[1].sz * 2;
thread::slab[3].sz = thread::slab[2].sz * 2;
thread::slab[4].sz = thread::slab[3].sz * 2;


Each slab is aligned on a common boundary value such that you can round
the address of one of its blocks down to that boundary in order to get at
its anchor/header. Each slab anchor has a freelist, a thread-id, and
blocks of memory.


- When a thread deallocates a block, it rounds the block's address down to
get at the slab anchor, compares the anchor's thread-id with its own, and
pushes the block onto the slab's freelist via CAS if they do not match.


- A thread allocates by locating the right slab based on the requested
size and trying to get a block from that slab; if the slab is empty, it
does a SWAP on the slab's shared freelist and tries to get a block from
there. If that fails, it allocates directly from the user-allocator.
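The round-down and remote-free logic can be sketched as below, assuming a 4 KB slab boundary and C11 atomics in place of raw CAS/SWAP; all names here are illustrative rather than vZOOM's:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_ALIGN 4096  /* assumed common boundary; the real value is tunable */

typedef struct block { struct block *next; } block;

typedef struct slab {
    pthread_t owner;               /* origin thread's id */
    block *local_free;             /* owner-only; no interlocked ops needed */
    _Atomic(block *) remote_free;  /* CAS-pushed by non-origin threads */
} slab;

/* Round any block address down to the boundary to recover its slab anchor. */
static slab *slab_of(void *blk)
{
    return (slab *)((uintptr_t)blk & ~(uintptr_t)(SLAB_ALIGN - 1));
}

static void slab_free(void *p)
{
    slab *s = slab_of(p);
    block *b = p;
    if (pthread_equal(s->owner, pthread_self())) {
        b->next = s->local_free;   /* fast path: origin thread, plain store */
        s->local_free = b;
    } else {
        block *head = atomic_load(&s->remote_free);
        do {                       /* lock-free push onto the remote list */
            b->next = head;
        } while (!atomic_compare_exchange_weak(&s->remote_free, &head, b));
    }
}

static void *slab_alloc(slab *s)
{
    if (!s->local_free)            /* empty: reclaim remote frees in one SWAP */
        s->local_free = atomic_exchange(&s->remote_free, NULL);
    block *b = s->local_free;
    if (b)
        s->local_free = b->next;
    return b;                      /* NULL: fall back to the user allocator */
}
```

The owner thread never needs interlocked operations on its local freelist; remote threads pay one CAS per free, and the owner reclaims the whole remote list with a single SWAP when its local list runs dry.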

Chris Thomasson

unread,
Oct 22, 2007, 10:23:36 AM10/22/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:rLGdnersrM0DFYHa...@comcast.com...

> "Bill Todd" <bill...@metrocast.net> wrote in message
> news:HcWdnZMPHa6BxYba...@metrocastcablevision.com...
[...]

>>> Any thoughts/comments/suggestions/rants/raves?
>>
>> Describing how blocks eventually get deallocated from the free lists back
>> to the originating thread's heap (without requiring interlocks that
>> defeat your goal of not having them on such heaps) so that those heaps
>> don't become so fragmented that they become unusable might be nice. So
>> might a simulation to demonstrate that the free lists themselves don't
>> create increasingly unusable (effectively, fragmented) storage for their
>> temporary owners (surely you're not assuming that all block requests are
>> equal in size).
>
> I use the per-thread user-provided allocator, to allocate a simple
> per-thread segregated array of slabs. Something like:
>
[...]

The user can request vZOOM to bypass the segregated slabs and forward to the
user-defined allocator directly.


Blocks are deallocated from the free-list directly back to the user-defined
allocator, in this case. Something like:

__________________________________________
void* malloc(size_t sz) {
    per_thread* const _this = pthread_getspecific(...);
    void* buf = _this->fp_user_defined_malloc(sz);
    if (! buf) {
        /* out of memory: flush the deferred freelist back to the
           user-defined allocator, then retry the request once */
        block* blk = SWAP(&_this->freelist, 0);
        while (blk) {
            block* const next = blk->next;
            _this->fp_user_defined_free(blk);
            blk = next;
        }
        buf = _this->fp_user_defined_malloc(sz);
    }
    return buf;
}
__________________________________________

Does that answer your question?

rani_s...@hotmail.com

unread,
Oct 22, 2007, 12:34:25 PM10/22/07
to
On Oct 22, 4:09 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dave Butenhof" <david.buten...@hp.com> wrote in message

How do you protect against TSD destructor calls (upon thread-exit
notifications) after the library was unloaded? Notice that the
destructor code was unmapped when the library was unloaded, unless you
are binding the lifetime of the library to the lifetime of the process,
and that defeats the ability to dynamically unload such libraries.

In case pthread_key_delete guarantees no TSD destructor calls after it
returns (the problematic case is pending ones), then you can safely use
it upon library unload for proper cleanup, but if I understand David
correctly there is no such guarantee.

Rani

Dave Butenhof

unread,
Oct 22, 2007, 2:27:18 PM10/22/07
to

A lot of dlclose() implementations never actually unmap any memory --
partly because dependencies like this are difficult to resolve safely.

An implementation that can track dependencies well enough to safely
unmap memory has to consider dangling callbacks like TSD destructors,
process exit handlers, and so forth. Those callback handlers might be
removed (as they can't do any good -- that's what Tru64 UNIX did), or
they might count as pending references that prevent unmapping -- or
something else entirely.

rani_s...@hotmail.com

unread,
Oct 22, 2007, 2:52:58 PM10/22/07
to
On Oct 22, 3:19 am, Dave Butenhof <david.buten...@hp.com> wrote:
> libraries or TSD.

Sorry if I confused some terms, though from a usage POV POSIX TSD and
Windows TLS are quite similar, as they are meant to solve the same
problem (the pthread_key_create and TlsAlloc APIs look almost the
same, except for the TSD destructor). I admit that I favor the POSIX
interface, since via the TSD destructor it abstracts away "low level"
library-wide details like loader notifications (DLL_THREAD_ATTACH/
DETACH). OTOH, I also want an interface that can be safely used in
facilities (e.g. hosted in shared libraries), and providing better
guarantees for pthread_key_delete is essential for that (i.e., no TSD
destructor called asynchronously after delete returns). IMHO, any
facility with async notifications should provide a way to safely cancel
them, since otherwise there is no safe way to destruct it before process
termination. I saw the "ref-counting" technique in your book, but it
seems lacking since it binds the lifetime of the facility (using TSD)
to the lifetime of the threads that accessed it, and therefore
independent facility shutdown is not possible.

I just want to clarify that I'm *not* expecting the TSD interface to
manage the user-defined items (e.g., per-thread heaps); the application
should have additional bookkeeping in order to destruct them regardless
of the TSD destructor (e.g., upon the destruction of the memory
manager). A higher-level TSD facility should probably encapsulate such
lifetime management for both thread-exit notifications and shutdown.

I'd highly appreciate it if you'd reply to the other thread that I
started about TSD and shared libraries, in which I described my
motivation for a safer TSD API.

Thanks,
Rani

BTW - Though I solely work on windows (no POSIX) your book is my main
guidance book for programming in multi-threaded environments.

Chris Thomasson

unread,
Oct 22, 2007, 6:06:01 PM10/22/07
to
<rani_s...@hotmail.com> wrote in message
news:1193070865.3...@q3g2000prf.googlegroups.com...

That's why you don't call pthread_key_delete until all of your threads
are joined. A proper DLL should allow users to join its threads and clean
up before you unload it. After that, the TSD destructor calls will have
run. Something like:

dll_t* const _this = dll_load("foo.dll");

int (*fp_dll_startup) (void) =
dll_getfunc(_this, "foo_startup");

int (*fp_dll_shutdown) (void) =
dll_getfunc(_this, "foo_shutdown");

int status = fp_dll_startup();
if (! status) {
status = fp_dll_shutdown();
}

dll_unload(_this);


Why would you unload a DLL when it still has active threads? This is a QOI
issue wrt DLL design.

Chris Thomasson

unread,
Oct 22, 2007, 6:18:46 PM10/22/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:J7qdnclGyrxegIDa...@comcast.com...

> <rani_s...@hotmail.com> wrote in message
> news:1193070865.3...@q3g2000prf.googlegroups.com...
>> On Oct 22, 4:09 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>>> "Dave Butenhof" <david.buten...@hp.com> wrote in message
>>>
>>> news:ffhtfa$555$1...@usenet01.boi.hp.com...
[...]

>> In case that pthread_key_delete guarantees no TSD destructor calls
>> after it returns (problematic case is pending ones) then you can
>> safely use it upon library load for proper cleanup but if I understand
>> David correctly there is no such guarantee.

I would expect a proper implementation to simply wipe out any pending
destructors. If that happens in my design, you get a memory leak.

> That's why, IMHO, you should not call pthread_key_delete until all of the

> threads are joined. Proper DLL should allow users to join its threads and
> cleanup before you unload it. After that, the TSD destructor calls will

> have surely ran. Something like:

rani_s...@hotmail.com

unread,
Oct 22, 2007, 6:37:14 PM10/22/07
to
On Oct 22, 3:06 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> <rani_shar...@hotmail.com> wrote in message

I think that you are missing the point. The user of the DLL is not
aware of implementation details like per-thread heaps when he simply
wants to unload the DLL. Because the DLL is internally using per-thread
heaps, it can't be unloaded, since it's bound to the lifetime of the
active threads. Assume that one of these threads is the main thread:
the DLL then can't be unloaded before process termination. Without the
per-thread heap it's possible to unload the DLL at any given time, so
your implementation is restrictive in that sense. Since you are
promoting a general-purpose memory manager, you should think about how
to preserve such important usability (i.e., provide proper
memory-manager destruction).

With proper TSD API you should be able to destruct the memory manager
upon DLL unload (e.g. foo_shutdown).

Rani


Chris Thomasson

unread,
Oct 22, 2007, 8:42:03 PM10/22/07
to

<rani_s...@hotmail.com> wrote in message
news:1193092634.9...@q5g2000prf.googlegroups.com...

> On Oct 22, 3:06 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <rani_shar...@hotmail.com> wrote in message
>>
[...]

> I think that you are missing the point. The user of the DLL is not
> aware about implementation details like per-thread heap when he simply
> wants to unload the DLL. Now because the DLL is internally using per-
> thread heap it can't be unloaded since it's bound to the lifetime of
> the active threads. Assuming that one of these threads is the main
> thread and therefore the DLL can't be unloaded before process
> termination. Without the per-thread heap it's possible to unload the
> DLL at any given time so your implementation is restrictive in that
> sense. Since you are promoting a general purpose memory manager you
> should think on how to preserve such important usability (i.e. provide
> proper memory manager destruction).

Hoard has the exact same issues, and it is a general-purpose memory
manager. Have you seen how they handle it?


> With proper TSD API you should be able to destruct the memory manager
> upon DLL unload (e.g. foo_shutdown).

You cannot destroy a heap if it is being referenced by something.

Chris Thomasson

unread,
Oct 22, 2007, 8:43:48 PM10/22/07
to

<rani_s...@hotmail.com> wrote in message
news:1193092634.9...@q5g2000prf.googlegroups.com...

> On Oct 22, 3:06 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <rani_shar...@hotmail.com> wrote in message
[...]

>> Why would you unload a DLL when it still has active threads? This is a
>> QOI
>> issue wrt DLL design.
>
> I think that you are missing the point.

DLL's can be unloaded in my scheme. A per-thread heap created by a thread in
DLL A can persist across a dll_unload if a thread in DLL B has a reference
to a block.

Chris Thomasson

unread,
Oct 22, 2007, 8:47:29 PM10/22/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:jPCdnX2havOt34Da...@comcast.com...

>
> <rani_s...@hotmail.com> wrote in message
> news:1193092634.9...@q5g2000prf.googlegroups.com...
>> On Oct 22, 3:06 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>>> <rani_shar...@hotmail.com> wrote in message
>>>
> [...]
>> I think that you are missing the point. The user of the DLL is not
>> aware about implementation details like per-thread heap when he simply
>> wants to unload the DLL. Now because the DLL is internally using per-
>> thread heap it can't be unloaded since it's bound to the lifetime of
>> the active threads. Assuming that one of these threads is the main
>> thread and therefore the DLL can't be unloaded before process
>> termination. Without the per-thread heap it's possible to unload the
>> DLL at any given time so your implementation is restrictive in that
>> sense. Since you are promoting a general purpose memory manager you
>> should think on how to preserve such important usability (i.e. provide
>> proper memory manager destruction).
>
> Hoard has the same exact issues, and it is a general purpose memory
> manager. Have you seen how they do it?
[...]

StreamFlow has the exact same issues as well.

rani_s...@hotmail.com

unread,
Oct 22, 2007, 9:02:49 PM10/22/07
to
On Oct 22, 5:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> <rani_shar...@hotmail.com> wrote in message
>
> news:1193092634.9...@q5g2000prf.googlegroups.com...
>
> > On Oct 22, 3:06 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> >> <rani_shar...@hotmail.com> wrote in message
>
> [...]
> > I think that you are missing the point. The user of the DLL is not
> > aware about implementation details like per-thread heap when he simply
> > wants to unload the DLL. Now because the DLL is internally using per-
> > thread heap it can't be unloaded since it's bound to the lifetime of
> > the active threads. Assuming that one of these threads is the main
> > thread and therefore the DLL can't be unloaded before process
> > termination. Without the per-thread heap it's possible to unload the
> > DLL at any given time so your implementation is restrictive in that
> > sense. Since you are promoting a general purpose memory manager you
> > should think on how to preserve such important usability (i.e. provide
> > proper memory manager destruction).
>
> Hoard has the same exact issues, and it is a general purpose memory manager.
> Have you seen how they do it?

Maybe the Hoard memory manager is restricted to the lifetime of the
process and is intended to be used as a process-wide allocator.
That's similar to the MS process heap (GetProcessHeap) without the
ability to create private heaps.
OTOH, I do believe that it's not very difficult to provide TSD without
such a restriction.

> > With proper TSD API you should be able to destruct the memory manager
> > upon DLL unload (e.g. foo_shutdown).
>
> You cannot destroy a heap if it is being referenced by something.

Unloading the DLL while there are still pending allocations from its
memory manager is a caller pre-condition violation (i.e. a leak).
There is no way to refer to or release such memory since the memory
manager is gone. This is basic lifetime management.
You can safely assume that there are no pending allocations on DLL
unload (in debug builds you can assert that there are no leaks).

Rani

rani_s...@hotmail.com

unread,
Oct 22, 2007, 9:10:32 PM10/22/07
to
On Oct 22, 11:27 am, Dave Butenhof <david.buten...@hp.com> wrote:

Windows has the same problem due to the poorly designed
QueueUserWorkItem API.
QueueUserWorkItem makes it simpler to run work on a thread, but there
is no way to know for sure when it has finished running the worker
function, which might be located in a DLL. Therefore it's common to
see services that avoid DLL unloading due to usage of
QueueUserWorkItem (though there are some limited "tricks" to overcome
that in Vista). In our project we are disallowing usage of such
restrictive, error-prone functions and favor async facilities with
proper shutdown.

Rani

Chris Thomasson

unread,
Oct 23, 2007, 4:04:05 PM10/23/07
to
<rani_s...@hotmail.com> wrote in message
news:1193101369....@e34g2000pro.googlegroups.com...

Hoard creates the per-thread heaps in libhoard.so, and assigns them to
threads with pthread_get/setspecific. That's what my stuff does: it
creates an instance of a user-defined allocator, and binds it to
threads with pthread_get/setspecific.


>> > With proper TSD API you should be able to destruct the memory manager
>> > upon DLL unload (e.g. foo_shutdown).
>>
>> You cannot destroy a heap if it is being referenced by something.
>
> Unloading the DLL while there are still pending allocations from its
> memory manager his a caller pre-condition violation (i.e. leaks).
> There is no way to refer or release to such memory since the memory
> manager is gone. This is basic lifetime management.

[...]

Unloading the DLL while it has running threads is a violation as well. Also,
in my internal testing, things seem to work fine within several dynamically
loaded libraries. The creation of the per-thread heaps occurs in
libvzoom.so, just like all MS heaps are created in kernel32.dll/ntdll.dll.

Chris Thomasson

unread,
Oct 23, 2007, 4:04:59 PM10/23/07
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:9e2dnaOi9P0az4Pa...@comcast.com...

The allocator is created in libvzoom.so, just as Hoard's per-thread heaps
are created in libhoard.so.

Chris Thomasson

unread,
Oct 24, 2007, 1:03:57 AM10/24/07
to
<rani_s...@hotmail.com> wrote in message
news:1193101369....@e34g2000pro.googlegroups.com...
[...]

> Unloading the DLL while there are still pending allocations from its
> memory manager his a caller pre-condition violation (i.e. leaks).
> There is no way to refer or release to such memory since the memory
> manager is gone. This is basic lifetime management.
> You can safely assume that there are no pending allocations on DLL
> unload (in debug builds you can assert no leaks).


Unloading the DLL while there are threads that exist within it is an error,
IMHO at least!

rani_s...@hotmail.com

unread,
Oct 24, 2007, 12:22:45 PM10/24/07
to
> libvzoom.so, just like all ms heaps are created in kernel.32/ntdll.dll.

We are actually debating the threads that use the per-thread heap but
are not owned by the DLL. Notice that using a per-thread heap should
not imply thread ownership, so as to allow DLL unload. Think about the
case in which the main thread loaded the DLL, called some function
that instantiated a per-thread heap for the main thread, and then
unloaded the DLL (i.e. load_dll, call_dll and unload_dll).

Nevertheless I'll try to address your concern.

Any object has to properly release the resources that it owns upon
destruction. Not being able to do so means that it can *not* be
destructed, which usually indicates a flawed design. Notice that
objects that contain such objects will have the same limitation (i.e.
the limitation is recursive).

A DLL that owns threads is a special case of the above. To unload the
DLL it has to properly release the threads that it owns, which is
usually done by some sort of cancellation (e.g. SetEvent(hCancel)),
waiting for each thread to terminate (i.e. wait on the handle), and
then destructing it (i.e. close the handle).

Rani
