
supporting header-less blocks in memory allocators...


Chris M. Thomasson

Oct 2, 2008, 10:28:39 AM
I was thinking of an idea for removing a current limitation in some of my
allocators which support the use of header-less blocks. The way I currently
do it is to simply align chunks on a fairly large boundary, say 16 pages.
These chunks contain a header at the head. The layout is simple:

[H][CHUNK]

A 16-page chunk would be aligned on a 16-page boundary. If a header-less
block needs to access its header, it simply rounds its address up to a chunk
boundary, and subtracts the size of the chunk plus the size of the header.
This has a limitation: the largest memory allocation can't be larger than
the size of a chunk. To overcome this, I was thinking about generating a
unique GUID per-computer, and using that as a header identifier. The header
would store the GUID right after it. So, the layout for a multi-chunk
allocation would look like:


[H][GUID][CHUNK1][CHUNK2]


Now, when a header-less block needs to find its chunk header, it simply
rounds its address up to a chunk boundary; subtracts the size of a chunk
plus the size of the GUID; and compares the existing memory contents with
the GUID. If it does not find a match, it subtracts the size of another
chunk and repeats the comparison. This process is repeated until a match is
hit. After that, it subtracts the size of the header, and bingo, it has a
pointer to its chunk header.

The process is simple, but it has a caveat! If, for some reason, the memory
contents of a chunk that is not the true header just happen to contain data
which matches the GUID, well, BOOOOOM!!!!!!! Oh SHI%!!! The program will be
royally screwed... OUCH!


First of all, what do you think of the scheme?


Also, are there better ways to support 100% header-less blocks in a memory
allocator? IMVHO, header-less blocks are a MAJOR plus when designing highly
efficient memory allocation implementations. Think about it... A 4-byte
allocation actually means 4 bytes are allocated internally. Not 4 bytes plus
internal block header information! That's crap and totally wastes memory on
a per-allocation basis, which can quickly add up indeed.


There is a way to minimize the GUID scheme, but it requires two sections of
virtual memory which can be wasteful. It involves block sizes <= chunk size
to be allocated from a separate memory pool than block sizes that are >
chunk size. The former can use the normal scheme of simple rounding up and
subtracting to get chunk header... However, the latter would need to use the
GUID search algorithm.


any thoughts?


BTW, do any of you know off-hand if Hoard uses header-less blocks?


Thanks for your time!

Chris M. Thomasson

Oct 2, 2008, 10:30:14 AM

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:6O4Fk.12965$ex3....@newsfe02.iad...

>I was thinking of an idea of expanding a current limitation in some of my
>allocators which support the use of header-less blocks. The way I currently
>do it is to simply align chunks on a fairly large boundary, say 16 pages.
>These chunks contain a header at the head. Layout is simple:
>
> [H][CHUNK]
>
> A 16-page chunk would be aligned on a 16-page chunk boundary. If a
> header-less block needs to access its header, it simply rounds its address
> up to a chunk boundary, and subtracts the size of the chunk plus the size
> of the header. This has a limitation that the largest memory allocation
> can't be larger than the size of a chunk. To overcome this, I was thinking
> about generating a unique GUID per-computer,

Or, perhaps generate a unique GUID per-process; I think that would be much
safer...


> and using that as a header identifier. The header would store the GUID
> right after it. So, the layout for a multi-chunk allocation would look
> like:
>
>
> [H][GUID][CHUNK1][CHUNK2]
>

[...]

Eric Sosman

Oct 2, 2008, 10:55:36 AM

A couple questions arise right away. First, what information
does this header provide; why do you need it; what's it for? Second,
how does a chunk discover its own size? Are all chunks the same
fixed size, or is the size stored somewhere in the chunk -- and if
you're already storing metadata in the chunk, why not store the
rest of the header there, too?

An observation: The GUID doesn't seem to add much value. Any
unlikely signature would do just as well -- that is to say, just as
poorly.

A counter-question: Why must headers be adjacent to chunks? Would
it help if they lived in their own dedicated data structure? I've used
skip-lists in a separate area for this purpose as part of a malloc()
intended for debugging; keeping the metadata apart from the allocated
data helps limit the damage from common programming errors.

> Also, are there better ways to support 100% header-less blocks in a
> memory allocator? IMVHO, header-less blocks are a MAJOR plus when
> designing highly efficient memory allocation implementations. Think
> about it... A 4-byte allocation actually means 4-bytes are allocated
> internally. No 4-bytes plus internal block header information! That's
> crap and totally wastes memory on a per-allocation basis; which can
> quickly add up indeed.

Well, you need *some* way to store the size information, and maybe
other information as well. I suppose you could use the block address
to encode size: A block address of binary ...xxx100 would indicate a
size of four bytes, ...xx1000 would indicate eight bytes, and so on.
Fragmentation would be horrific, though.

> There is a way to minimize the GUID scheme, but it requires two sections
> of virtual memory which can be wasteful. It involves block sizes <=
> chunk size to be allocated from a separate memory pool than block sizes
> that are > chunk size. The former can use the normal scheme of simple
> rounding up and subtracting to get chunk header... However, the latter
> would need to use the GUID search algorithm.

You haven't described your allocator's purposes and constraints
fully enough for me to follow this paragraph.

--
Eric....@sun.com

Dmitriy V'jukov

Oct 2, 2008, 11:41:31 AM
Chris M. Thomasson wrote:

> There is a way to minimize the GUID scheme, but it requires two sections of
> virtual memory which can be wasteful. It involves block sizes <= chunk size
> to be allocated from a separate memory pool than block sizes that are >
> chunk size. The former can use the normal scheme of simple rounding up and
> subtracting to get chunk header... However, the latter would need to use the
> GUID search algorithm.

If an object is bigger than 16 pages, I think it's perfectly OK to spend
16 bytes on a header.
What is the point of huge header-less blocks?

Dmitriy V'jukov

Chris M. Thomasson

Oct 2, 2008, 12:29:16 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:280c8b35-0fe0-403a...@q35g2000hsg.googlegroups.com...


But how could I efficiently intermingle large blocks and small blocks from a
single reserved memory pool? Anyway, I don't think I was thinking clearly
when I wrote the following portion of my original message:


"There is a way to minimize the GUID scheme, but it requires two sections of
virtual memory which can be wasteful. It involves block sizes <= chunk size
to be allocated from a separate memory pool than block sizes that are >
chunk size. The former can use the normal scheme of simple rounding up and
subtracting to get chunk header... However, the latter would need to use the
GUID search algorithm."


I would not need to use the GUID search algorithm at all if I split the
reserved virtual memory address space into 2 sections and _strictly_
enforced it. For instance, I reserve 4 gigs of memory, and split it into two
sections; one for allocations <= chunk size, and one for allocations > chunk
size. But that wastes memory. What if the application only made small
allocations and wanted to use more than 2 gigs? Or if the application only
did large allocations and wanted to use more than 2 gigs? This scheme would
allow me to get around the search algorithm by using the following
algorithm; pseudo-code:
__________________________________________________________________________
#define CACHE_LINE 128UL
#define PAGE_SIZE 4096UL
#define CHUNK_SIZE (PAGE_SIZE * 16UL)
#define RESERVE_SIZE 0xFFFFFFFFUL
#define LARGE_SIZE (RESERVE_SIZE / 2UL)


static unsigned char* reserved_mem = VirtualAlloc(RESERVE_SIZE);
static unsigned char* small_mem = reserved_mem;
static unsigned char* large_mem = small_mem + LARGE_SIZE;


struct header {
struct header* next;
struct header* prev;
per_thread* thread;
size_t size;
};


/* small chunks aligned on CHUNK_SIZE boundary! */
union chunk {
struct header header;
unsigned char pad[CACHE_LINE];
};


bool
is_mem_large(unsigned char* mem) {
if (mem < large_mem) {
return false;
}
return true;
}


union chunk*
chunk_from_mem(unsigned char* mem) {
    union chunk* chunk;
    if (! is_mem_large(mem)) {
        chunk = (union chunk*)
            (ALIGN(mem, CHUNK_SIZE) - sizeof(union chunk));
    } else {
        chunk = (union chunk*)(mem - sizeof(union chunk));
    }
    return chunk;
}


void free(void* mem) {
    per_thread* const this_thread = pthread_getspecific(...);
    union chunk* const chunk = chunk_from_mem((unsigned char*)mem);
    if (chunk->header.thread) {
        if (chunk->header.thread == this_thread) {
            per_thread_local_free(chunk, mem);
        } else {
            per_thread_remote_free(chunk, mem);
        }
    } else {
        global_free(chunk);
    }
}
__________________________________________________________________________

That would work fine, but it has limitations in that small chunks could not
be allocated from the large chunk pool and vice-versa. However, if I used
the GUID search thing, then the entire reserved memory could be organized
into adjacent chunks and both small and large allocations could use the same
pool. What am I missing here? I hope it's not something ridiculously simple!


Humm... Perhaps splitting the pool is not so bad. Humm...

;^/

Chris M. Thomasson

Oct 2, 2008, 1:17:04 PM
"Eric Sosman" <Eric....@sun.com> wrote in message
news:1222959200.617375@news1nwk...

The most important thing is that it contains a pointer to the thread which
currently owns the chunk. I do this so that allocations on a non-empty
chunk do not need any synchronization whatsoever. Also, local frees do not
need any sync. Remote frees need a single atomic RMW, and local allocation
boundary conditions, such as chunk-empty, need a single atomic RMW. It's a
highly scalable design. Here is brief pseudo-code for such an algorithm:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69

This is not optimized because each block has a header. I can get rid of the
header by using the following:

http://groups.google.com/group/comp.programming.threads/msg/bc019dc04362be41

http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f

As you can see, this technique gives 100% header-less blocks.


> Second,
> how does a chunk discover its own size? Are all chunks the same
> fixed size,

Small chunks would all be the same size. Large chunks would be any size >
the small chunk size. I would use small chunks in the per-thread heap, and
large chunks in the global heap. When a thread is created, it requests some
chunks from the "global chunk server" and builds a segregated bucket-based
heap. Or, I could do a region allocator and have a single doubly linked list
of heaps. BTW, here is fully working example code of such a per-thread
region allocator:

http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html

In this case, std::malloc is the global chunk server. This allocator has
100% header-less blocks. Blocks can be of varying sizes, as long as they're
smaller than the region size; the `THREAD_ALLOCATOR_HEAP_SIZE' macro
determines this fixed size. This is the problem I am trying to overcome. I
want to be able to allocate blocks larger than the region size in this
allocator. I can do it like this:

http://groups.google.com/group/comp.programming.threads/msg/12f4a1fc1adbe993

But it has limitations. I think I can use the GUID search HACK to go ahead
and get around those...


> or is the size stored somewhere in the chunk -- and if
> you're already storing metadata in the chunk, why not store the
> rest of the header there, too?

Typically, I would store the following in a chunk header for a slab-based
allocator with fixed sized chunks:

struct chunk {
struct chunk* next;
struct chunk* prev;
per_thread* thread;
void* blocks;
size_t blksize;
};


`chunk::thread' points to the thread which owns it. `chunk::blocks' is a
LIFO of unused memory. `chunk::blksize' holds the block size. For a
region allocator the header can look like:

struct chunk {
struct chunk* next;
struct chunk* prev;
per_thread* thread;
unsigned char* base_mem;
size_t offset;
};


where `chunk::base_mem + chunk::offset' is the next available memory. The
advantage is that blocks can be any size < chunk size. The advantage the
slab allocator has over the region allocator is that freed blocks can be
reused immediately. A disadvantage slabs have is that memory sizes requested
by the application will be rounded up to `sizeof(void*)'. That is, any size
less than sizeof(void*) will be set to sizeof(void*); all sizes greater than
that stay the same. But, you already know the differences between slab and
region allocation, so enough with that.


> An observation: The GUID doesn't seem to add much value. Any
> unlikely signature would do just as well -- that is to say, just as
> poorly.

It's not bullet-proof in any way, shape, or form! However, I can't really
think of any other way except the method I laid out for Dmitriy. Humm... I
need some clever suggestions!


> A counter-question: Why must headers be adjacent to chunks?

A block of memory needs to be able to get at this header using just its own
address. Remember, blocks are header-less. However, it turns out that the
address of a block is all the data it needs in order to get its containing
chunk's header. The algorithm is simple... The block rounds its address up
to a chunk boundary, then subtracts the size of the chunk plus the size of
the chunk's header. Bingo! It has a pointer to the chunk's header. For a
slab allocator the block can then figure out its size by using the
`chunk::blksize' member. For a region allocator, well, the block couldn't
care less how big it is.


> Would
> it help if they lived in their own dedicated data structure? I've used
> skip-lists in a separate area for this purpose as part of a malloc()
> intended for debugging; keeping the metadata apart from the allocated
> data helps limit the damage from common programming errors.

How could a header-less block get at these external chunk headers without
any synchronization whatsoever? Or, better question, how would you
efficiently map a chunk to these external headers such that no sync was
needed? In my scheme, small chunks' (16-page) headers are owned on a
per-thread basis, thus eliminating the need for synchronization in nearly
all cases. The exception is when a block grabs its chunk's header during the
freeing process and realizes that the thread which owns the chunk is not the
one calling `free()'. This requires a single atomic RMW. For a slab
allocator this would be a single-width CAS; for regions it would be
fetch-and-add.


>> Also, are there better ways to support 100% header-less blocks in a
>> memory allocator? IMVHO, header-less blocks are a MAJOR plus when
>> designing highly efficient memory allocation implementations. Think about
>> it... A 4-byte allocation actually means 4-bytes are allocated
>> internally. No 4-bytes plus internal block header information! That's
>> crap and totally wastes memory on a per-allocation basis; which can
>> quickly add up indeed.
>
> Well, you need *some* way to store the size information, and maybe
> other information as well. I suppose you could use the block address
> to encode size: A block address of binary ...xxx100 would indicate a
> size of four bytes, ...xx1000 would indicate eight bytes, and so on.
> Fragmentation would be horrific, though.

I round the block address and perform a subtraction to get at its containing
chunk's header. It works. I posted some working example code; perhaps you
should take a quick look at it. Any questions are welcome indeed.


>> There is a way to minimize the GUID scheme, but it requires two sections
>> of virtual memory which can be wasteful. It involves block sizes <= chunk
>> size to be allocated from a separate memory pool than block sizes that
>> are > chunk size. The former can use the normal scheme of simple rounding
>> up and subtracting to get chunk header... However, the latter would need
>> to use the GUID search algorithm.
>
> You haven't described your allocator's purposes and constraints
> fully enough for me to follow this paragraph.

I hope I have begun describing it in some more detail; how am I doing?
Should I give full high-level pseudo-code of the entire allocator algorithm
I am thinking of?

;^|

Dmitriy V'jukov

Oct 2, 2008, 2:00:02 PM
On Oct 2, 8:29 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> But how could I efficiently intermingle large blocks and small blocks from a
> single reserved memory pool?

In one of my programs I used the following scheme:

template<typename derived_t>
class object_t
{
static void* operator new (size_t sz)
{
// derived_t must be most derived type
assert(sz == sizeof(derived_t));
// resolved at compile-time
if (sizeof(derived_t) <= 4)
return thread().fixed_alloc_4.alloc();
else if (sizeof(derived_t) <= 8)
return thread().fixed_alloc_8.alloc();
else if (sizeof(derived_t) <= 16)
return thread().fixed_alloc_16.alloc();
else if (sizeof(derived_t) <= 32)
return thread().fixed_alloc_32.alloc();
[...]
else
return ::malloc(sizeof(derived_t));
}

static void operator delete (void* p)
{
// resolved at compile-time
if (sizeof(derived_t) <= 4)
thread().fixed_alloc_4.free(p);
else if (sizeof(derived_t) <= 8)
thread().fixed_alloc_8.free(p);
else if (sizeof(derived_t) <= 16)
thread().fixed_alloc_16.free(p);
else if (sizeof(derived_t) <= 32)
thread().fixed_alloc_32.free(p);
[...]
else
::free(p);
}
};


Dmitriy V'jukov

Chris M. Thomasson

Oct 2, 2008, 2:50:57 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:a5656716-2077-4177...@v53g2000hsa.googlegroups.com...

On Oct 2, 8:29 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > But how could I efficiently intermingle large blocks and small blocks
> > from a
> > single reserved memory pool?

> In one of my programs I used following scheme:

[...]

Please correct me if I am wrong, but it seems as if this particular
allocator interface would "need" to know what type it was dealing with in
advance; correct? In other words, it probably could not be used to overload
global operator new/delete, right? Or is `derived_t' dealing with internal
allocator structures? Where am I going wrong here, Dmitriy? You know, I was
thinking about creating something special for C++. But, I am a C programmer,
and felt the need to support my language of choice. The cool thing about C++
is that you can do exactly what you did. Also, you can overload the
class-local delete operator and have it automatically receive the size of
the dynamic allocation, be it a single object or an array. If the user
allocated an array, you simply divide this size by the object size, and bam,
you have the array size. I think C++ is neat, but I have to support C++. I
mean, the region allocator I did in C++ did not even think about taking
advantage of its "features". A C guy thinks: well, need to overload global
new/delete. We don't think about the fact that C++ can provide the size of
an allocation in the class-local overloaded delete operator.

Humm... I need to learn C++! I have some problems reacting to error
conditions in dtors, as per my following query over in `comp.lang.c++':

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/ec97ab562016d016

lol; I think in ASM and program in C; am I totally lost?

;^D


BTW, I wonder why C++ could not return the size of a dynamic allocation in
the global overloaded delete operator? Humm... There must be a perfectly
straightforward reason indeed. Also, try to cut me some slack if I ask
totally retarded questions about C++... :^P

Chris M. Thomasson

Oct 2, 2008, 2:55:02 PM

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:%D8Fk.39667$hX5....@newsfe06.iad...

> "Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
> news:a5656716-2077-4177...@v53g2000hsa.googlegroups.com...
> On Oct 2, 8:29 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
>> > But how could I efficiently intermingle large blocks and small blocks
>> > from a
>> > single reserved memory pool?
>
>> In one of my programs I used following scheme:
>
> [...]
>
> Please correct me if I am wrong, but it seems as if this particular
> allocator interface would "need" to know what type it was dealing with in
> advance; correct? In other words, it probably could not be used to
> overload global operator new/delete right? Or, is `derived_t' dealing with
> internal allocator structures. Where am I going wrong here Dmitriy? You
> know, I was thinking about creating something special for C++. But, I am a
> C programmer, and felt the need to support my language of choice. The cool
> thing about C++ is that you can do exactly what you did. Also, you can
> overload class local delete operator and have it automatically return the
> size of the dynamic allocation be it single object or array. If the user
> allocated an array, you simply divide this size of object size, and bam
> you have array size. I think C++ is neat, but I have to support C++.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Ummm... That last sentence should have read as:

I think C++ is neat, but I have to support C.

;^|

[...]


What do you think, Dmitriy... Should I create a pseudo general-purpose
allocator for C++ only, one that is not able to overload the global
new/delete operators? It could be extremely efficient indeed if it took
advantage of the fact that C++ can return the size of the deallocation
directly to the allocator. Humm... It's fairly enticing indeed!

;^D

Dmitriy V'jukov

Oct 2, 2008, 3:00:46 PM
On Oct 2, 10:50 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> Please correct me if I am wrong, but it seems as if this particular
> allocator interface would "need" to know what type it was dealing with in
> advance; correct?

Yes,

> In other words, it probably could not be used to overload
> global operator new/delete right?

Yes.

> Or, is `derived_t' dealing with internal
> allocator structures. Where am I going wrong here Dmitriy? You know, I was
> thinking about creating something special for C++. But, I am a C programmer,
> and felt the need to support my language of choice. The cool thing about C++
> is that you can do exactly what you did. Also, you can overload class local
> delete operator and have it automatically return the size of the dynamic
> allocation be it single object or array. If the user allocated an array, you
> simply divide this size of object size, and bam you have array size.

It's not always true. For example:

template<typename derived_t>
struct object_t
{
    static void* operator new (size_t sz)
    {
        return malloc(sizeof(derived_t));
    }
};

struct derived1 : object_t<derived1>
{
    int x1;
};

struct derived2 : derived1
{
    int x2;
};

// CRASH! Memory is allocated only for derived1;
// derived2 will not fit into that memory.
derived2* d2 = new derived2 ();

Dmitriy V'jukov

Dmitriy V'jukov

Oct 2, 2008, 3:07:41 PM
On Oct 2, 10:55 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> I think C++ is neat, but I have to support C.


You can mimic this pattern in C:

#define ALLOC(T) \
( sizeof(T) <= 4 ? thread().fixed_alloc_4.alloc() \
: sizeof(T) <= 8 ? thread().fixed_alloc_8.alloc() \
[...]
: malloc(sizeof(T)) ) \
/**/

#define FREE(T, p) \
( sizeof(T) <= 4 ? thread().fixed_alloc_4.free(p) \
: sizeof(T) <= 8 ? thread().fixed_alloc_8.free(p) \
[...]
: free(p) ) \
/**/


Usage:

struct my_obj_s
{
int x, y;
};

typedef struct my_obj_s my_obj_t;

my_obj_t* x = ALLOC(my_obj_t);
...
FREE(my_obj_t, x);


It's still not global malloc/free, and the user still has to supply the
exact size of the allocation to FREE(). But it's C.


Dmitriy V'jukov

Chris M. Thomasson

Oct 2, 2008, 3:20:25 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:QH8Fk.39668$hX5....@newsfe06.iad...


Well, I could encourage the user not to do anything like:
_____________________________________________________________
#include <cstdio>


int main() {
char* buf = new char[1234];
delete [] buf;
std::getchar();
return 0;
}


_____________________________________________________________

But instead, do something like:
_____________________________________________________________
template<std::size_t T_size>
class buf : public allocator_base<buf<T_size> > {
char mem[T_size];
};


int main() {
buf<1234>* b = new buf<1234>;
delete b;

b = new buf<1234>[5];
delete [] b;

std::getchar();
return 0;
}
_____________________________________________________________


The back-end support code would be like:
_____________________________________________________________
#include <cstdio>
#include <cstdlib>
#include <new>


struct custom_allocator {
static void* allocate(std::size_t size)
throw(std::bad_alloc) {
void* const mem = ::operator new(size);
std::printf("custom_allocator::allocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
return mem;
}

static void deallocate(void* const mem, std::size_t size)
throw() {
std::printf("custom_allocator::deallocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
::operator delete(mem);
}
};


template<typename T>
struct allocator_base {
void* operator new(std::size_t size) throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

void* operator new[](std::size_t size) throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

void operator delete(void* mem) throw() {
if (mem) {
custom_allocator::deallocate(mem, sizeof(T));
}
}

void operator delete [](void* mem, std::size_t size) throw() {
if (mem) {
custom_allocator::deallocate(mem, size);
}
}
};
_____________________________________________________________


Humm... AFAICT, this is pretty neat for a C guy to look at!


;^P

Chris M. Thomasson

Oct 2, 2008, 3:30:09 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:aabd7f24-429e-4276...@j22g2000hsf.googlegroups.com...

> On Oct 2, 10:50 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > Please correct me if I am wrong, but it seems as if this particular
> > allocator interface would "need" to know what type it was dealing with
> > in
> > advance; correct?
>
> Yes,
>
> > In other words, it probably could not be used to overload
> > global operator new/delete right?
>
> Yes.
>
> > Or, is `derived_t' dealing with internal
> > allocator structures. Where am I going wrong here Dmitriy? You know, I
> > was
> > thinking about creating something special for C++. But, I am a C
> > programmer,
> > and felt the need to support my language of choice. The cool thing about
> > C++
> > is that you can do exactly what you did. Also, you can overload class
> > local
> > delete operator and have it automatically return the size of the dynamic
> > allocation be it single object or array. If the user allocated an array,
> > you
> > simply divide this size of object size, and bam you have array size.

> It's not always true. For example:


[...]


Check this program out:
_____________________________________________________________________


#include <cstdio>
#include <cstdlib>
#include <new>


struct custom_allocator {
static void* allocate(std::size_t size)
throw(std::bad_alloc) {
void* const mem = ::operator new(size);
std::printf("custom_allocator::allocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
return mem;
}

static void deallocate(void* const mem, std::size_t size)
throw() {
std::printf("custom_allocator::deallocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
::operator delete(mem);
}
};


template<typename T>
struct allocator_base {

static void* operator new(std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

static void* operator new[](std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

static void operator delete(void* mem)
throw() {
if (mem) {
custom_allocator::deallocate(mem, sizeof(T));
}
}

static void operator delete [](void* mem, std::size_t size)
throw() {
if (mem) {
custom_allocator::deallocate(mem, size);
}
}
};

template<std::size_t T_size>
class buf : public allocator_base<buf<T_size> > {
char mem[T_size];
};


class buf2 : public buf<1234> {
char mem2[1000];
};


int main() {
buf2* b = new buf2;
delete b;

b = new buf2[5];
delete [] b;

return 0;
}

_____________________________________________________________________


Well, on GCC, it seems to calculate correct sizes... All I would need for
hard-core optimizations would be the size on delete... Humm... What say you?

Chris M. Thomasson

Oct 2, 2008, 3:30:51 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:C39Fk.39672$hX5....@newsfe06.iad...
[...]

> template<typename T>
> struct allocator_base {
> void* operator new(std::size_t size) throw(std::bad_alloc()) {
> return custom_allocator::allocate(size);
> }
>
> void* operator new[](std::size_t size) throw(std::bad_alloc()) {
> return custom_allocator::allocate(size);
> }
>
> void operator delete(void* mem) throw() {
> if (mem) {
> custom_allocator::deallocate(mem, sizeof(T));
> }
> }
>
> void operator delete [](void* mem, std::size_t size) throw() {
> if (mem) {
> custom_allocator::deallocate(mem, size);
> }
> }
> };


Whoops! I forgot to make the member functions static!!! Yikes!


> _____________________________________________________________
[...]

Dmitriy V'jukov

Oct 2, 2008, 3:24:01 PM
On Oct 2, 11:20 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

>   b = new buf<1234>[5];

> Humm... AFAICT, this is pretty neat for a C guy to look at!


Don't miss caveat I describe here:
http://groups.google.com/group/comp.programming.threads/msg/1463529870b08125?hl=en

Dmitriy V'jukov

Chris M. Thomasson

Oct 2, 2008, 3:32:18 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:c1a7e985-758d-460b...@t65g2000hsf.googlegroups.com...

What about:

http://groups.google.com/group/comp.programming.threads/msg/11a98f6836d38880

Is that undefined behavior?

Chris M. Thomasson

Oct 2, 2008, 3:35:16 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:Me9Fk.39675$hX5....@newsfe06.iad...

OOOOOHHHHH!!!

It works perfectly fine upon array delete... However, NOT on single object
delete!!!!! WTF!!!! Here is the exact output:


custom_allocator::allocate(00246C50, 2234)
custom_allocator::deallocate(00246C50, 1234)
custom_allocator::allocate(00247760, 11174)
custom_allocator::deallocate(00247760, 11174)


What the FUC%!!!!! I see, I am using sizeof(T)... SHI%.

Chris M. Thomasson

Oct 2, 2008, 3:38:26 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:Kc9Fk.39673$hX5....@newsfe06.iad...
[...]

Okay. I see it works on the array dtor. It does not work on the single
object dtor... Please check this out! Run the original program. I get an
output of:

custom_allocator::allocate(00246C50, 2234)
custom_allocator::deallocate(00246C50, 1234)
custom_allocator::allocate(00247760, 11174)
custom_allocator::deallocate(00247760, 11174)


Now, swap out the original main function with the following:


int main() {
buf2* b = new buf2[1];
delete [] b;

b = new buf2[5];
delete [] b;

std::getchar();

return 0;
}


Here is the output I get now:

custom_allocator::allocate(00246C50, 2238)
custom_allocator::deallocate(00246C50, 2238)


custom_allocator::allocate(00247760, 11174)
custom_allocator::deallocate(00247760, 11174)


Wow... It works now! Shi%T!!!!!

;^O

Chris M. Thomasson

Oct 2, 2008, 3:48:45 PM

"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:aabd7f24-429e-4276-bbcc-> 790bbc...@j22g2000hsf.googlegroups.com...

> Yes.


Well, apparently I found a bug in GCC, or a workaround... Try this:


template<typename derived_t>
struct object_t
{
static void* operator new (size_t sz)
{
return malloc(sizeof(derived_t));
}
};

struct derived1 : object_t<derived1>
{
int x1;
};

struct derived2 : public derived1, public object_t<derived2>
{
int x2;
};


Here is full blown example code that compiles with GCC and MSVC:


______________________________________________________________
#include <cstdio>
#include <cstdlib>
#include <new>


struct custom_allocator {
static void* allocate(std::size_t size)
throw(std::bad_alloc) {
void* const mem = ::operator new(size);
std::printf("custom_allocator::allocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
return mem;
}

static void deallocate(void* const mem, std::size_t size)
throw() {
std::printf("custom_allocator::deallocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
::operator delete(mem);
}
};

template<typename T>
struct allocator_base {

static void* operator new(std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

static void* operator new[](std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

static void operator delete(void* mem)
throw() {
if (mem) {
custom_allocator::deallocate(mem, sizeof(T));
}
}

static void operator delete [](void* mem, std::size_t size)
throw() {
if (mem) {
custom_allocator::deallocate(mem, size);
}
}
};


template<std::size_t T_size>
class buf {
char mem[T_size];
};


class buf2 : public buf<1234>, public allocator_base<buf2> {
char mem2[1000];
};


int main() {
buf2* b = new buf2;
delete b;

b = new buf2[5];
delete [] b;

return 0;
}
______________________________________________________________

Here is the output I get:


custom_allocator::allocate(00246C50, 2234)
custom_allocator::deallocate(00246C50, 2234)


custom_allocator::allocate(00247760, 11174)
custom_allocator::deallocate(00247760, 11174)

Workaround or UB? Well, it seems that it is 100% UB, because the output I
get on MSVC 8 is:


custom_allocator::allocate(00362850, 2234)
custom_allocator::deallocate(00362850, 2234)
custom_allocator::allocate(00366B68, 11170)
custom_allocator::deallocate(00366B68, 2234)


WTF? No wonder I am a C guy!!!!

;^/

Chris M. Thomasson

Oct 2, 2008, 5:47:48 PM

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:_f7Fk.12973$ex3...@newsfe02.iad...

> "Eric Sosman" <Eric....@sun.com> wrote in message
[...]

>> Second,
>> how does a chunk discover its own size? Are all chunks the same
>> fixed size,
>
> Small chunks would all be the same size. Large chunks would be any size >
> the small chunk size. I would use small chunks in per-thread heap, and
> large chunks in global heap. When a thread is created it requests some
> chunks from the "global chunk server" and builds a segregated bucket-based
> heap. Or, I could do a region allocator and have a single doubly linked
> list of heaps. BTW, here is fully working example code of such a
> per-thread region allocator:
>
> http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html
[...]

Here is context for the odd name of the file on the Charter web-server...
Where did the name Sergey come from? Well, here of course:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4038a7a8a4f5a7cb
(read all for full context...)

Dmitriy V'jukov

Oct 3, 2008, 6:40:04 AM
On Oct 2, 11:30 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> >  void operator delete [](void* mem, std::size_t size) throw() {
> >    if (mem) {
> >      custom_allocator::deallocate(mem, size);
> >    }
> >  }
> > };
>
> Whoops! I forgot to make the member-function static!!! Yikes!

It's implicitly static. You can add static for clarity, but it's not
required.
It's like declaring a function virtual when you override a virtual
function from a base class: whether or not you mark it virtual, it
will be virtual anyway.

Dmitriy V'jukov

Dmitriy V'jukov

Oct 3, 2008, 6:42:50 AM
On Oct 2, 11:38 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

>   buf2* b = new buf2[1];
>   delete [] b;

The problem is that you have to specify the *exact* dynamic type of the
array elements to delete[]. This is why it works - the compiler knows the
exact type (and so the sizeof) of the elements.

Dmitriy V'jukov

Michel Decima

Oct 3, 2008, 7:56:34 AM
Dmitriy V'jukov wrote:

But if you use the 'sz' parameter of operator new to call malloc(),
then you will allocate enough memory for derived2.
And you can overload the class-local delete operator with a second
size_t parameter, so you will know the size of the object to deallocate:

#include <stdlib.h>
#include <stdio.h>

void* alloc_8() { printf( "alloc_8\n") ; return malloc( 8 ) ; }
void* alloc_16() { printf( "alloc_16\n") ; return malloc( 16 ) ; }
void free_8( void* ptr ) { printf( "free_8\n" ); ::free( ptr ) ; }
void free_16( void* ptr ) { printf( "free_16\n" ); ::free( ptr ) ; }

template<typename derived_t>
struct object_t
{

// derived_t is not required to be most derived type

static void* operator new ( size_t sz )
{
// resolution at runtime
if ( sz <= 8 )
return alloc_8() ;
else if ( sz <= 16 )
return alloc_16() ;
else
return ::malloc( sz ) ;
}

static void operator delete ( void* ptr, size_t sz )
{
// resolution at runtime
if ( sz <= 8 )
return free_8( ptr ) ;
else if ( sz <= 16 )
return free_16( ptr ) ;
else
::free( ptr ) ;
}
} ;

struct A : object_t< A >
{
virtual ~A() {}
long x ;
} ;

struct B : A
{
long y ;
} ;

int main()
{
printf( "sizeof(A)=%lu\n", sizeof( A ) ) ;
A* a = new A ;
delete a ;

printf( "sizeof(B)=%lu\n", sizeof( B ) ) ;
B* b = new B ;
delete b ;
}

Dmitriy V'jukov

Oct 3, 2008, 8:52:20 AM
On Oct 3, 3:56 pm, Michel Decima <michel.dec...@orange-ft.com> wrote:

> > derived2* d2 = new derived2 (); // CRASH! Allocate memory only for
> > derived1, derived2 will not fit into that memory
>
> But if you use the 'sz' parameter of operator new to call malloc(),
> then you will allocate enough memory for derived2.
> And you can overload the class-local delete operator with a second
> size_t parameter, so you will know the size of the object to deallocate:


Cool! I completely missed this point.
The point is that if the user calls delete on an object and the object
doesn't have a virtual destructor, then the static and dynamic types of
the object MUST be identical. So the compiler can always determine the
correct size.

ISO C++ 12.5/4:
If a delete-expression begins with a unary :: operator, the deallocation
function's name is looked up in global scope. Otherwise, if the
delete-expression is used to deallocate a class object whose static type
has a virtual destructor, the deallocation function is the one found by
the lookup in the definition of the dynamic type's virtual destructor
(12.4). Otherwise, if the delete-expression is used to deallocate an
object of class T or array thereof, the static and dynamic types of the
object shall be identical and the deallocation function's name is looked
up in the scope of T.


The only thing I don't like is dynamic dispatch on size.
Allocation/deallocation functions in a fixed-size allocator can be as
small as 5 machine instructions, so dynamic dispatch on size can
increase the cost of allocation several times!

The only thing we can do is:

__forceinline static void* operator new ( size_t sz )
{
// resolution at runtime
if ( sz <= 4 && sizeof(derived_t) <= 4 )
return alloc_4() ;
else if ( sz <= 8 && sizeof(derived_t) <= 8 )
return alloc_8() ;
else if ( sz <= 16 && sizeof(derived_t) <= 16 )
return alloc_16() ;
else
return ::malloc( sz ) ;
}

and pray for inlining :)

Dmitriy V'jukov

Michel Decima

Oct 3, 2008, 9:28:13 AM
Dmitriy V'jukov wrote:

> The only thing I don't like is dynamic dispatch on size.
> Allocation/deallocation functions in fixed size allocator can be as
> small as 5 machine instructions. So dynamic dispatch on size can
> increase cost of allocation several times!

Mmh. I think my programs don't need as much optimisation as yours ;)

> The only thing we can do is:
>
> __forceinline static void* operator new ( size_t sz )
> {
> // resolution at runtime
> if ( sz <= 4 && sizeof(derived_t) <= 4 )
> return alloc_4() ;
> else if ( sz <= 8 && sizeof(derived_t) <= 8 )
> return alloc_8() ;
> else if ( sz <= 16 && sizeof(derived_t) <= 16 )
> return alloc_16() ;
> else
> return ::malloc( sz ) ;
> }
>
> and pray for inlining :)

Yes.

Chris M. Thomasson

Oct 4, 2008, 1:19:59 AM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:4eee5b3c-df5a-4b28...@a70g2000hsh.googlegroups.com...

On Oct 3, 3:56 pm, Michel Decima <michel.dec...@orange-ft.com> wrote:

> > > derived2* d2 = new derived2 (); // CRASH! Allocate memory only for
> > > derived1, derived2 will not fit into that memory
> >
> > But if you use the 'sz' parameter of operator new to call malloc(),
> > then you will allocate enough memory for derived2.
> > And you can overload the class-local delete operator with a second
> > size_t parameter, so you will know the size of the object to deallocate:


> Cool! I completely missed this point.
> The point is that if the user calls delete on an object and the object
> doesn't have a virtual destructor, then the static and dynamic types of
> the object MUST be identical. So the compiler can always determine the
> correct size.

Also, I don't think that your `object_t' class needs to be a template at
all. Here is test code:
_________________________________________________________________


#include <cstdio>
#include <cstdlib>
#include <new>


struct custom_allocator {
static void* allocate(std::size_t size)
throw(std::bad_alloc) {
void* const mem = std::malloc(size);
if (! mem) {
throw std::bad_alloc();
}
std::printf("custom_allocator::allocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
return mem;
}

static void deallocate(void* const mem, std::size_t size)
throw() {
if (mem) {
std::printf("custom_allocator::deallocate(%p, %lu)\n",
(void*)mem, (unsigned long)size);
std::free(mem);
}
}
};


struct allocator_base {

void* operator new(std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

void* operator new [](std::size_t size)
throw(std::bad_alloc) {
return custom_allocator::allocate(size);
}

void operator delete(void* mem, std::size_t size)
throw() {
custom_allocator::deallocate(mem, size);
}

void operator delete [](void* mem, std::size_t size)
throw() {
custom_allocator::deallocate(mem, size);
}
};


template<std::size_t T_size>
class buf : public allocator_base {
char mem[T_size];
public:
virtual ~buf() throw() {}
};


class buf2 : public buf<1234> {
char mem2[1000];
};


int main() {
buf<1024>* b1 = new buf<1024>;
delete b1;

buf2* b2 = new buf2;
delete b2;

b2 = new buf2[5];
delete [] b2;

return 0;
}

_________________________________________________________________

On every version of GCC I have, I get the following output on a 32-bit
machine:

custom_allocator::allocate(00246C50, 1028)
custom_allocator::deallocate(00246C50, 1028)
custom_allocator::allocate(002472A8, 2240)
custom_allocator::deallocate(002472A8, 2240)
custom_allocator::allocate(002472A8, 11204)
custom_allocator::deallocate(002472A8, 11204)


On every version of MSVC, I get:

custom_allocator::allocate(00365B28, 1028)
custom_allocator::deallocate(00365B28, 1028)
custom_allocator::allocate(00362850, 2240)
custom_allocator::deallocate(00362850, 2240)
custom_allocator::allocate(00366FA8, 11204)
custom_allocator::deallocate(00366FA8, 2240)

Well, AFAICT, MSVC has a NASTY bug indeed! That's too bad. There is nothing
worse than a bug in the fuc%ing compiler! Ouch.


[...]

> The only thing I don't like is dynamic dispatch on size.
> Allocation/deallocation functions in fixed size allocator can be as
> small as 5 machine instructions. So dynamic dispatch on size can
> increase cost of allocation several times!

> The only thing we can do is:

> __forceinline static void* operator new ( size_t sz )
> {
> // resolution at runtime
> if ( sz <= 4 && sizeof(derived_t) <= 4 )
> return alloc_4() ;
> else if ( sz <= 8 && sizeof(derived_t) <= 8 )
> return alloc_8() ;
> else if ( sz <= 16 && sizeof(derived_t) <= 16 )
> return alloc_16() ;
> else
> return ::malloc( sz ) ;
> }
>
> and pray for inlining :)

lol. ;^D

Chris M. Thomasson

Oct 4, 2008, 1:41:41 AM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:68aa47ab-a67b-46d3...@u57g2000hsf.googlegroups.com...

On Oct 2, 10:55 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > I think C++ is neat, but I have to support C.


> You can mimic this pattern in C:

> [...]

> It's still not global malloc/free, and the user still has to supply the
> exact size of the allocation to free(). But it's C.

Humm... What do you think about the following technique; here is a full
example program:
_________________________________________________________________________
#include <stdio.h>
#include <stdlib.h>


static void*
thread_alloc_4(size_t size) {
void* const mem = malloc(size);
printf("thread_alloc_4(%lu) -> %p\n",
(unsigned long int)size, mem);
return mem;
}

static void
thread_free_4(void* mem, size_t size) {
printf("thread_free_4(%lu) -> %p\n",
(unsigned long int)size, mem);
free(mem);
}

static void*
thread_alloc_8(size_t size) {
void* const mem = malloc(size);
printf("thread_alloc_8(%lu) -> %p\n",
(unsigned long int)size, mem);
return mem;
}

static void
thread_free_8(void* mem, size_t size) {
printf("thread_free_8(%lu) -> %p\n",
(unsigned long int)size, mem);
free(mem);
}


#define MALLOC(mp_ptr) ( \
(sizeof(*(mp_ptr)) <= 4) \
? thread_alloc_4(sizeof(*(mp_ptr))) \
: (sizeof(*(mp_ptr)) <= 8) \
? thread_alloc_8(sizeof(*(mp_ptr))) \
: malloc(sizeof(*(mp_ptr))) \
)


#define FREE(mp_ptr) ( \
(sizeof(*(mp_ptr)) <= 4) \
? thread_free_4((mp_ptr), sizeof(*(mp_ptr))) \
: (sizeof(*(mp_ptr)) <= 8) \
? thread_free_8((mp_ptr), sizeof(*(mp_ptr))) \
: free((mp_ptr)) \
)


struct foo1 {
int a;
};

struct foo2 {
int a;
int b;
};

struct foo3 {
char mem[6];
};

int main() {
struct foo1* f1 = MALLOC(f1);
struct foo2* f2 = MALLOC(f2);
struct foo3* f3 = MALLOC(f3);
FREE(f3);
FREE(f2);
FREE(f1);
return 0;
}
_________________________________________________________________________

Humm... That should work fine.

Dmitriy V'jukov

Oct 6, 2008, 3:02:32 AM
On Oct 4, 9:19 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> On every version of MSVC, I get:
>
> custom_allocator::allocate(00365B28, 1028)
> custom_allocator::deallocate(00365B28, 1028)
> custom_allocator::allocate(00362850, 2240)
> custom_allocator::deallocate(00362850, 2240)
> custom_allocator::allocate(00366FA8, 11204)
> custom_allocator::deallocate(00366FA8, 2240)
>
> Well, AFAICT, MSVC has a NASTY bug indeed! That's too bad. There is nothing
> worse than a bug in the fuc%ing compiler!  Ouch.


This should help:

void operator delete [](void* mem, std::size_t size) throw()
{

#if defined(_MSC_VER)
# if _MSC_VER == 1310 || _MSC_VER == 1400 || _MSC_VER == 1500
/* These MSVC versions wrongly pass sizeof(element) as `size`; the
   workaround assumes the first size_t of the block is the array cookie
   holding the element count, and rebuilds the true block size from it. */
size = size * *(size_t*)mem + sizeof(size_t);
# else
# error "check this MSVC ver for the error with delete[]"
# endif
#endif
custom_allocator::deallocate(mem, size);
}


Dmitriy V'jukov
