Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Storage management techniques in C

0 views
Skip to first unread message

Richard Harter

unread,
Dec 24, 2009, 2:38:33 AM12/24/09
to

In this thread I would like to explore storage allocation
techniques in C programming.

One of the performance issues of concern in C programming is the
amount of time spent in allocating and deallocating storage. In
measurements that I have made and that others have reported
upwards of 25% of the time used is spent in the allocator - and
it can be much higher, depending on the allocator design.

The issue is simple; the more calls that are made to malloc and
free, the greater the time cost of memory allocation. What is
more, the likelihood of memory leaks goes up with the number of
calls to the allocator.

What should be done to reduce the cost of allocation?

One approach, one that is vigorously recommended by some, is to
add garbage collection to C. Garbage collection simplifies user
code and can be faster because it simplifies coding. I dare say
that in many applications it is a satisfactory solution. Sad to
say, it is not an entirely reliable solution in C programming.

An alternative is to use special purpose techniques. I will
list several here with the hope that others will point out
others. There are two main strategies used - slab chipping and
reuse.

The idea in slab chipping is to allocate a "large" block and then

mark off individual buffers by increasing a pointer into the
block. The win here is that allocation is cheap and only the
slab needs be deallocated. The catch is that all buffers chipped

from the slab must be harvestable, i.e., no longer in use at slab

deallocation time.

The idea in reuse is to reuse already allocated buffer space.
Thus if we create buffer A and dispose of it, and then create
buffer B and dispose of it, we can use the space for A and then
for B, eliminating the intermediate deallocation and allocation.
The win is eliminating many allocation/deallocation pairs. The
danger is simultaneous use, i.e., using the space for A and B at
the same time.

(1) Multiple copies

Suppose we need m buffers. Instead of doing m allocations of
individual buffers we allocate an array of m buffers. This can
work very nicely. There is a catch. The array of buffers can
only be freed when all of the m buffers are no longer needed.

(2) LIFO storage

Suppose we have some buffers that satisfy the LIFO pattern,
i.e., the last buffer allocated is the first one deallocated.
These buffers can be allocated from a storage stack. Seemingly
no special code for this pattern is needed - the LIFO pattern is
embedded in the C language.

(3) FIFO storage

Suppose instead that we have buffers that satisfy the FIFO
pattern, i.e., the eldest buffer is always the first one
deallocated. This pattern can be implemented using a ring
buffer. If there is a known bound for the amount of space used
by the buffers we don't even need to record deallocation provided

that the size of the ring buffer is large enough.

In the more general case the usual first and last pointers will
be needed; likewise a strategy for dealing with overflow is
necessary. One way to deal with overflow is to create a new,
larger ring buffer, take all new allocations from it, and drain
the old ring buffer until it is empty.

The FIFO pattern can be generalized to "almost FIFO". Instead of
always deallocating the eldest, deallocate an old buffer with the
proviso that there is an age limit for buffers.

(4) Persistent scratch space

Suppose f is a function that uses temporary scratch space that
has to come from the heap. Suppose further that f is not
recursive; i.e., it does not call itself directly or indirectly.
Naive code allocates and deallocates scratch space each time f is

called. Instead we could make the scratch space persistent and
resizable. (In C persistent is called static.) Then, except for

resizing, no allocations or deallocations are needed.

Persistent scratch space can be shared. Suppose f1...fn are n
mutually non-recursive functions, i.e., they don't call each
other, either directly or indirectly. Then functions f1...fn can

share the same persistent scratch space. Sharing does not
significantly decrease the number of allocations/deallocations;
however it does decrease the amount of space used. It may also
improve performance because the shared scratch space is more
likely to be cache resident.

(5) Limited duration space

Suppose that we need buffers, some of which may be resized, and
suppose that we know that at some definite point in the future
all of the buffers will be stale and can be discarded. One way
to handle this is to allocate a slab and allocate using slab
chipping. At the definite point in the future the slab is
deallocated and all of the stale buffers go away.

Instead of allocating one large slab a good alternative is to
allocate smaller blocks (increasing block size each allocation)
and record them in a linked list.

Perhaps there are other techniques that people might like to
mention.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

bartc

unread,
Dec 24, 2009, 6:35:06 AM12/24/09
to

"Richard Harter" <c...@tiac.net> wrote in message
news:4b331a37....@text.giganews.com...

> One of the performance issues of concern in C programming is the
> amount of time spent in allocating and deallocating storage. In
> measurements that I have made and that others have reported
> upwards of 25% of the time used is spent in the allocator - and
> it can be much higher, depending on the allocator design.

You're talking about C's malloc(), realloc() and free() functions?

> The issue is simple; the more calls that are made to malloc and
> free, the greater the time cost of memory allocation. What is
> more, the likelihood of memory leaks goes up with the number of
> calls to the allocator.
>
> What should be done to reduce the cost of allocation?

I would like more detailed figures than the 25% you mentioned before
creating complicated new strategies.

How much in each of malloc, realloc and free? What proportion of the calls,
or of the total time spent, is in different sized allocations (ie. are most
of the calls related to allocating/deallocating fairly small blocks)? What
proportion of calls are for the same size allocation? And so on.

> (1) Multiple copies

> (2) LIFO storage

> (3) FIFO storage

> (4) Persistent scratch space

> (5) Limited duration space

> Perhaps there are other techniques that people might like to
> mention.

I think there are many other patterns.

For example, large numbers of smallish allocations, all of the same size
(similar to your (1), but probably much bigger). This can easily be handled
by allocating one big block (which can be augmented if needed) and
allocating/deallocating from that, using free lists. This would be very
fast.

Another is where the allocations may be mixed size (but still smallish), but
do not need to be freed until the end of some processing step. Again,
allocate a large block (and add more as needed if difficult to estimate
total), then deallocate the large blocks at the end using perhaps just one
or two free() calls.

And another, where a block has to grow in size using realloc. Depending on
how efficient you think realloc might be, allocating spare capacity and
testing for that before needing to call realloc might help.

But I'm sure these are all things that people will try as soon as they find
allocation is a bottleneck.

And myself, for small allocations (up to 1KB), I use my own allocator based
on a large allocated block or two. This is faster because the
allocation/deallocation functions take a short, fixed time (no searching for
things or moving stuff around). (However, if the large blocks involved
become mostly empty, this space cannot be used elsewhere, eg, to allocate a
4KB block.)

--
Bartc

Richard Harter

unread,
Dec 24, 2009, 2:00:17 PM12/24/09
to
On Thu, 24 Dec 2009 11:35:06 GMT, "bartc" <ba...@freeuk.com>
wrote:

>
>"Richard Harter" <c...@tiac.net> wrote in message
>news:4b331a37....@text.giganews.com...
>
>> One of the performance issues of concern in C programming is the
>> amount of time spent in allocating and deallocating storage. In
>> measurements that I have made and that others have reported
>> upwards of 25% of the time used is spent in the allocator - and
>> it can be much higher, depending on the allocator design.
>
>You're talking about C's malloc(), realloc() and free() functions?

Essentially, yes. AFAIK the major open source implementations
all use high quality allocators - this definitely wasn't true
years ago. I've never looked at Microsoft's allocators, but I
would be surprised if they weren't efficient.

>
>> The issue is simple; the more calls that are made to malloc and
>> free, the greater the time cost of memory allocation. What is
>> more, the likelihood of memory leaks goes up with the number of
>> calls to the allocator.
>>
>> What should be done to reduce the cost of allocation?
>
>I would like more detailed figures than the 25% you mentioned before
>creating complicated new strategies.
>
>How much in each of malloc, realloc and free? What proportion of the calls,
>or of the total time spent, is in different sized allocations (ie. are most
>of the calls related to allocating/deallocating fairly small blocks)? What
>proportion of calls are for the same size allocation? And so on.

Sorry, I no longer have precise numbers at hand - they are buried
away in industrial reports. In any case, case studies are only
guides.

The general pattern I have seen is that in the initial
implementations of applications allocation costs are buried by
the overall general inefficiency. Allocation costs only become
significant after the worst inefficiencies have been carved away.

As far as the relative cost of malloc, realloc, and free are
concerned, that depends on the allocator implementation.
Commonly the cost of malloc is about 50% more than the cost of
free. On average realloc is only slightly less than the cost of
malloc+free.

Most of the cases that I am familiar with the block sizes are
mostly small and intermediate - same size allocation doesn't make
much difference. YMMV.

>
>> (1) Multiple copies
>
>> (2) LIFO storage
>
>> (3) FIFO storage
>
>> (4) Persistent scratch space
>
>> (5) Limited duration space
>
>> Perhaps there are other techniques that people might like to
>> mention.
>
>I think there are many other patterns.
>
>For example, large numbers of smallish allocations, all of the same size
>(similar to your (1), but probably much bigger). This can easily be handled
>by allocating one big block (which can be augmented if needed) and
>allocating/deallocating from that, using free lists. This would be very
>fast.

That is (1). Free lists are problematic. What can happen is
that the space becomes fragmented, i.e., the pointers jump all
over the place. When this happens you can run into cache faults.

Another issue is that if you have blooms (an explosion of
allocated space that dies of rapidly) you end up with large
amounts of dead space that can't be deallocated.

That said, multiple copies with a free list is simple and
(usuallY) fast. However it is no simpler nor faster than the
"complicated" alternatives.


>
>Another is where the allocations may be mixed size (but still smallish), but
>do not need to be freed until the end of some processing step. Again,
>allocate a large block (and add more as needed if difficult to estimate
>total), then deallocate the large blocks at the end using perhaps just one
>or two free() calls.

That is (5).

>
>And another, where a block has to grow in size using realloc. Depending on
>how efficient you think realloc might be, allocating spare capacity and
>testing for that before needing to call realloc might help.
>
>But I'm sure these are all things that people will try as soon as they find
>allocation is a bottleneck.
>
>And myself, for small allocations (up to 1KB), I use my own allocator based
>on a large allocated block or two. This is faster because the
>allocation/deallocation functions take a short, fixed time (no searching for
>things or moving stuff around). (However, if the large blocks involved
>become mostly empty, this space cannot be used elsewhere, eg, to allocate a
>4KB block.)

Well done. It's a start.

Thanks for the comments.

BGB / cr88192

unread,
Dec 24, 2009, 5:31:21 PM12/24/09
to

"Richard Harter" <c...@tiac.net> wrote in message
news:4b331a37....@text.giganews.com...
>
>
> In this thread I would like to explore storage allocation
> techniques in C programming.
>
> One of the performance issues of concern in C programming is the
> amount of time spent in allocating and deallocating storage. In
> measurements that I have made and that others have reported
> upwards of 25% of the time used is spent in the allocator - and
> it can be much higher, depending on the allocator design.
>
> The issue is simple; the more calls that are made to malloc and
> free, the greater the time cost of memory allocation. What is
> more, the likelihood of memory leaks goes up with the number of
> calls to the allocator.
>
> What should be done to reduce the cost of allocation?
>

<snip>

one thing that generally works fairly well IME for lots of objects all of a
particular type and size (a fairly common case IME), is to have a special
"free" function which doesn't actually free the object, but instead adds it
to a type-specific free-list.

when allocating an object, first one can check that the free-list is not
NULL, and if this is the case, simply grab the first item and update the
list to the next item.

if the list is empty, fall back to a more general purpose allocator.
periodically, one may also free the free list (for example, via a GC
callback, to maintain a memory quota, ...).

in general, this can notably speed up object allocation/deallocation for a
given object type.

Gene

unread,
Dec 24, 2009, 5:37:49 PM12/24/09
to
> Richard Harter, c...@tiac.nethttp://home.tiac.net/~cri,http://www.varinoma.com

> Infinity is one of those things that keep philosophers busy when they
> could be more profitably spending their time weeding their garden.- Hide quoted text -


There's always the old GNU obstack idea, which combines a couple of
yours. An obstack is a big chunk of memory that can be allocated in
arbitrary-sized pieces, with a "mark-release" facility to free items
allocated within the obstack in LIFO order. Entire obstacks are freed
in any order automatically when all the items in them are released.

The idea is that a common pattern is to build a significant data
structure composed of small nodes, use it for its intended purpose,
then get rid of it. An example where this occurs is in a compiler,
where you're processing procedures or modules in order, building
syntax trees, optmization graphs, code blocks, etc. Some of these can
be thrown away when as each procedure is completed. You'd allocate
such structures in an obstack.

Obstacks grow automatically by chaining more big chunks of the same
size. The chunks can in turn be managed efficiently because same-size
blocks don't fragment.

Bruce Cook

unread,
Dec 25, 2009, 11:57:12 AM12/25/09
to
Richard Harter wrote:

>
>
> In this thread I would like to explore storage allocation
> techniques in C programming.
>
> One of the performance issues of concern in C programming is the
> amount of time spent in allocating and deallocating storage. In
> measurements that I have made and that others have reported
> upwards of 25% of the time used is spent in the allocator - and
> it can be much higher, depending on the allocator design.
>
> The issue is simple; the more calls that are made to malloc and
> free, the greater the time cost of memory allocation. What is
> more, the likelihood of memory leaks goes up with the number of
> calls to the allocator.

Obviously different C implementations (potentially) use very different
memory allocation algorithms. Are the measurements you have made across
differing C platforms ?

[...]


> An alternative is to use special purpose techniques. I will
> list several here with the hope that others will point out
> others. There are two main strategies used - slab chipping and
> reuse.

[...]

What you are discussing is essentially replacing the "native" c library
allocation functions with your own, and the various allocation algorithms
you could choose to do that.

This unfortunately doesn't reduce the number of calls to allocation
routines, it just moves those calls up one level. (The calls will be to your
functions instead of the C runtime functions).

Unless the C runtime libraries you're working with are particularly
inefficient I would not expect this to give you any gain - Anyone have any
profiling information for this ?

The exception here is where the C runtime is doing OS level memory
allocation calls - there is potentially a higher cost here than a simply in-
memory allocation list.

The solution for this would be to look at tunable parameters for chunk size
allocation in the C runtime libraries; the larger the chunk allocated from
OS resources, the fewer the more expensive OS allocation calls required.

I at least one C runtime library I have worked with, this is exactly how
things worked: memory allocation was done using in-memory free lists, when
it ran out of free space, an OS call was made to extended the processes'
address space & that new chunk added to free lists. Chunk size was tunable.

So the question is really: is it the malloc()/free() calls which are
expensive, or the resultant OS allocation that is the real concern ?

Bruce

Richard Harter

unread,
Dec 25, 2009, 2:34:18 PM12/25/09
to
On Thu, 24 Dec 2009 15:31:21 -0700, "BGB / cr88192"
<cr8...@hotmail.com> wrote:

>
>"Richard Harter" <c...@tiac.net> wrote in message
>news:4b331a37....@text.giganews.com...

<snip>
>
>one thing that generally works fairly well IME for lots of objects all of a
>particular type and size (a fairly common case IME), is to have a special
>"free" function which doesn't actually free the object, but instead adds it
>to a type-specific free-list.
>
>when allocating an object, first one can check that the free-list is not
>NULL, and if this is the case, simply grab the first item and update the
>list to the next item.
>
>if the list is empty, fall back to a more general purpose allocator.
>periodically, one may also free the free list (for example, via a GC
>callback, to maintain a memory quota, ...).
>
>in general, this can notably speed up object allocation/deallocation for a
>given object type.

It's a good technique; I've used it myself. However there are
situations where it doesn't work very well. See my reply to
bartc.


Richard Harter

unread,
Dec 25, 2009, 3:08:57 PM12/25/09
to
On Thu, 24 Dec 2009 14:37:49 -0800 (PST), Gene
<gene.r...@gmail.com> wrote:

>On Dec 24, 2:00=A0pm, c...@tiac.net (Richard Harter) wrote:
>> On Thu, 24 Dec 2009 11:35:06 GMT, "bartc" <ba...@freeuk.com>

<snip>


>
>
>There's always the old GNU obstack idea, which combines a couple of
>yours. An obstack is a big chunk of memory that can be allocated in
>arbitrary-sized pieces, with a "mark-release" facility to free items
>allocated within the obstack in LIFO order. Entire obstacks are freed
>in any order automatically when all the items in them are released.
>
>The idea is that a common pattern is to build a significant data
>structure composed of small nodes, use it for its intended purpose,
>then get rid of it. An example where this occurs is in a compiler,
>where you're processing procedures or modules in order, building
>syntax trees, optmization graphs, code blocks, etc. Some of these can
>be thrown away when as each procedure is completed. You'd allocate
>such structures in an obstack.
>
>Obstacks grow automatically by chaining more big chunks of the same
>size. The chunks can in turn be managed efficiently because same-size
>blocks don't fragment.

I've never seen obstacks, but I will look at them. I use a
package I call storage pools (stgpool) that is similar in some
respects. Significant differences:

(1) There are no mark/release tags within a pool. The
presumption is that all items in the pool can be deleted at the
same time. Dead space is ignored until pool deletion time.

(2) Pools use chained blocks; however block sizes start out
small. Each block is larger than its predecessor.

(3) Release strategy is separated from storage management; code
using storage pools decides when to close a pool. Typically this
is done when some task or a set of tasks are completed - the pool
is used for allocated storage needed within the task.

I have the impression that my storage pools are substantially
lighter weight than obstacks. Thanks for mentioning them.

Gene

unread,
Dec 25, 2009, 7:26:37 PM12/25/09
to
On Dec 25, 3:08 pm, c...@tiac.net (Richard Harter) wrote:
> On Thu, 24 Dec 2009 14:37:49 -0800 (PST), Gene
>
> lighter weight than obstacks.  Thanks for mentioning them.- Hide quoted text -

Obstacks are actually very light weight. Allocation is an "enough
space" calculation and infrequent possible new chunk allocation
followed by a pointer increment.

Release checks to see if the released address entails any full chunks
(a couple of pointer comparisons) and, if necessary, pushes the tail
chain onto a free list of chunks. The op is O(n) for n freed chunks.

Now that you mention the term storage pool, I'll mention that a
prudent thing to do is use different functions or perhaps macros to
allocate like items (vice calling malloc in all cases) so that
experimenting with special purpose allocators after profiling is
simplified.

In the Ada language, there is a built-in facility for per-class
storage allocation called "storage pools." I suppose there are no hard
real time programmers participating here, as another disadvantage of
most general purpose allocators is wide variation between best and
worst case run time behavior. I believe a major reason for storage
pools in Ada was to allow embedded programmers to substitute highly
deterministic algorithms to meet hard time constraints.

Richard Harter

unread,
Dec 26, 2009, 2:49:04 PM12/26/09
to
On Fri, 25 Dec 2009 16:26:37 -0800 (PST), Gene
<gene.r...@gmail.com> wrote:

>On Dec 25, 3:08=A0pm, c...@tiac.net (Richard Harter) wrote:
>> On Thu, 24 Dec 2009 14:37:49 -0800 (PST), Gene
>>
>> <gene.ress...@gmail.com> wrote:

>> >On Dec 24, 2:00=3DA0pm, c...@tiac.net (Richard Harter) wrote:
>> >> On Thu, 24 Dec 2009 11:35:06 GMT, "bartc" <ba...@freeuk.com>
>>
<snip>
>>

>> I've never seen obstacks, but I will look at them. =A0I use a


>> package I call storage pools (stgpool) that is similar in some

>> respects. =A0Significant differences:
>>
>> (1) There are no mark/release tags within a pool. =A0The


>> presumption is that all items in the pool can be deleted at the

>> same time. =A0Dead space is ignored until pool deletion time.


>>
>> (2) Pools use chained blocks; however block sizes start out

>> small. =A0Each block is larger than its predecessor.


>>
>> (3) Release strategy is separated from storage management; code

>> using storage pools decides when to close a pool. =A0Typically this


>> is done when some task or a set of tasks are completed - the pool
>> is used for allocated storage needed within the task.
>>
>> I have the impression that my storage pools are substantially

>> lighter weight than obstacks. =A0Thanks for mentioning them.

>
>Obstacks are actually very light weight. Allocation is an "enough
>space" calculation and infrequent possible new chunk allocation
>followed by a pointer increment.

"Lightweight" may have been an unfortunate wording - I wasn't
thinking of performance costs. Nice special purpose allocators
are supposed to be very fast. I was thinking more of things like
how much space an initial chunk takes up, usage commands, and
anticipated usage.

>
>Release checks to see if the released address entails any full chunks
>(a couple of pointer comparisons) and, if necessary, pushes the tail
>chain onto a free list of chunks. The op is O(n) for n freed chunks.

Incorporating release marking is an expense, not so much in terms
of per operation cost as in coding and design cost. There has to
be release statement executed for each release. Doing this is
needful for obstacks because they implement LIFO storage. I
hadn't thought of parsing and syntax trees originally.

It's a tradeoff thing; obstacks are more efficient users of
storage, my storage pools take less coding effort.

>
>Now that you mention the term storage pool, I'll mention that a
>prudent thing to do is use different functions or perhaps macros to
>allocate like items (vice calling malloc in all cases) so that
>experimenting with special purpose allocators after profiling is
>simplified.

That may not be all that easy. What are "like items"? Are they
items in the same context, or are they items of the same type?



>
>In the Ada language, there is a built-in facility for per-class
>storage allocation called "storage pools." I suppose there are no hard
>real time programmers participating here, as another disadvantage of
>most general purpose allocators is wide variation between best and
>worst case run time behavior. I believe a major reason for storage
>pools in Ada was to allow embedded programmers to substitute highly
>deterministic algorithms to meet hard time constraints.

That could be. I took a quick look at them, but I don't yet
understand what the benefits are supposed to be.

BGB / cr88192

unread,
Dec 31, 2009, 6:32:51 PM12/31/09
to

"Richard Harter" <c...@tiac.net> wrote in message
news:4b35129c...@text.giganews.com...

cases where objects are used in bursts, partial solutions:
keep track of number of free items and start freeing if there are too many;
in some cases, maybe register a callback with the GC which can trigger bulk
freeing if memory is tight.

fragmentation:
allocate objects in lumps (granted, this may not always turn out well, and
may require special MM/GC support).

actually, I have not really had too many problems with fragmentation IME,
although I suspect this may be due to my usage of generally "unorthodox"
allocator algos (AKA: bitmap and cell-based allocators, which oddly almost
no one too far outside LISP-land tend to use...).

granted, fragmentation-related performance issues may be difficult to
detect, so possibly I might not have noticed them...


in general though, in my newer GC I keep cons cells and object-cells in
different places, though mostly this is more due to technical reasons.

cons cells are in one place, with a special cons-specific allocator (special
cases: a cons is always 2 pointers, and a cons is never more than 1 cell);
object cells are in another place, and use a different allocator (for small
objects, the allocator is specialized for small objects, as finding space
for larger objects via bitmap searches would be slow);
larger objects use free lists.

one can debate what the exact ideal cutoff is between a cell-based allocator
and a free-list allocator though, but IME it hasn't been too much of an
issue thus far.

a cell allocator has a small per-object overhead at the cost of a
linear-space overhead, and a free-list allocator has a higher per-object
overhead with no linear-space overhead.

for example, with masses of 10 byte objects, a 40 byte object header would
be a killer;
but, with multi-MB objects, a 6-8% space overhead would be a killer (say, if
for every 10MB there were 820kB of bitmaps...), whereas the 40 byte object
header would be insignificant.

hence, the use of different strategies depending on object size, ...


>


bartc

unread,
Jan 1, 2010, 6:00:03 AM1/1/10
to
"BGB / cr88192" <cr8...@hotmail.com> wrote in message
news:hhjcb5$pet$1...@news.albasani.net...

> for example, with masses of 10 byte objects, a 40 byte object header would
> be a killer;
> but, with multi-MB objects, a 6-8% space overhead would be a killer (say,
> if for every 10MB there were 820kB of bitmaps...), whereas the 40 byte
> object header would be insignificant.

What are the bitmaps used for?

With 1 bit per byte, the overhead on 10MB is about 1.25MB. But with 1 bit
per 16 bytes say, the overhead is some 80KB, some 0.8%.

--
Bartc

BGB / cr88192

unread,
Jan 1, 2010, 9:04:07 PM1/1/10
to

"bartc" <ba...@freeuk.com> wrote in message
news:TAk%m.20822$Ym4....@text.news.virginmedia.com...

> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hhjcb5$pet$1...@news.albasani.net...
>
>> for example, with masses of 10 byte objects, a 40 byte object header
>> would be a killer;
>> but, with multi-MB objects, a 6-8% space overhead would be a killer (say,
>> if for every 10MB there were 820kB of bitmaps...), whereas the 40 byte
>> object header would be insignificant.
>
> What are the bitmaps used for?
>

space allocation and keeping track of which objects exist and where they
exist at, ...

this can be constrasted with the strategy of using the object-headers and
linked lists for memory management.
for example, with bitmaps, one need not actually even have object headers,
but in my case they exist mostly to contain semantic information (exact
object size, type, flags, ... but, given a lack of any embedded pointers, it
allows the entire header to be packed into 64 bits in my case...).

some of my early allocators in this style did not have object headers, but
eventually these headers became a part of the core MM/GC, as this made it so
that the information available in the header was available for every object
(eliminating the occasional "odd man out" which did not have a header).


for example, in a "traditional" MM, one could have a collection of
free-lists (of assorted object sizes), and find a memory block of sufficient
size from the needed list. in this case, the object would then likely be
migrated to a list of live objects, and have the size of the object managed
in the header via a size field.

so, a memory block might look something like:
struct MemBlock_s {
MemBlock *next_list; //next block in free-list / live-list
MemBlock *next_phys; //next block physically (so we can merge / coalesce
blocks)
size_t size; //object size
int flags; //some flags and stuff
...
//data starts here
};

for very small objects, this header is massive...


if the object is, instead, an external bit pattern, we don't need to keep
track of objects via such headers.
similarly, coalescing is "free", since freeing any 2 (or more) adjacent
objects will create a big hole in the bitmap, meaning a space which can hold
a bigger object (IOW: coalescing free space is free).

similarly, for GC it tends to be MUCH faster than when working with
free-lists.


the one, notable, cost with this strategy is that the performance turns to
crap if ones' objects are large, hence there is a good reason to manage
larger objects via a free list instead (these 2 strategies seem to work
fairly well when used in conjunction).

this works well, also, since large objects tend to be fairly rare vs small
objects, and the small number of large objects greatly reduces time used in
things like linked-list hopping.

at the same time, I can also speed up the large-object GC by using a sorted
array of large objects, so that I can locate objects quickly via a binary
search, ...


> With 1 bit per byte, the overhead on 10MB is about 1.25MB. But with 1 bit
> per 16 bytes say, the overhead is some 80KB, some 0.8%.
>

it is typically 8 bits per 16-bytes in my current GC (or 6.25%).
so, it is not "strictly" a bitmap in the 1-bit-per-item sense, but I lack a
better term.

in this setup:
2 bits: cell type (free/header/data/reserved);
2 bits: cell color (white/black/gray/extra_black);
3 bits: gc-type (conservative/atomic, 'GCP': defiled, ref-count/many)
1 bit: (?...)

sadly, I can't really remember how all this works, as I didn't exactly
document it so well, and getting this much required going back and trying to
decipher a lot of the evil bit-twiddly...

earlier versions of the GC (from years ago) used 4 bits per cell, but this
was expanded to 8 bits both to improve performance and provide a few more
bits for other uses.

a bunch more info goes on in the object header (more bit-twiddly).

'GCP' was an ill-advised feature (an attempt at pointer-based precise
collection to be used in-conjunction with conservative collection), which
has mostly died back (not part of the main API anymore), but it first added
ref-counting (in this incarnation, another past "sibling" GC had also added
ref-counting earlier).

there is also conservative ref-counting which is implemented via 4 bits
located in the object header, which is now used instead.


> --
> Bartc


0 new messages