Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

newbie: ar[1] at end of POD

0 views
Skip to first unread message

andre...@yahoo.com

unread,
Sep 12, 2006, 5:17:22 PM9/12/06
to
Hi,

I am creating pools of small integer arrays. There are individual
allocs from the pool, but
no frees to the pool, and one free for the whole pool. The pool
creates chunks of 10000 elem
arrays. I subdivide the chunk into Snippet as follows:

class Snippet {
friend class SnippetPool;

int ar[1]; // first element is length
// enough room in Snippet object to hold more
elems
// this is the ONLY data member of POD

public:

int length () const
{
return ar[0];
}

int& operator [] (int i)
{
return ar[i+1]; // bounds check omitted
}
};

My question is if I have a large chunk of int*, how do I create
Snippets of length N
from these chunks? Want it to work on all platforms! For example,

class SnippetPool {
// assume that we have one chunk big enough to
// hold all snippets, and destructor throws away chunk

int* buffptr;
int buffcnt;

Snippet* Alloc (int len)
{
len++;
if (len < 1 || len > buffcnt)
throw MyException();

void* p = buffptr;
buffptr += len;

// does the following make assumptions about the C++
// implementation of POD? What's better?
Snippet* s = static_cast<Snippet*>(p);

// set the logical length for caller
// but leave rest of array as garbage
s->ar[0] = len-1;
return s;
}
};


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

kanze

unread,
Sep 13, 2006, 9:47:23 AM9/13/06
to
andre...@yahoo.com wrote:

> I am creating pools of small integer arrays. There are
> individual allocs from the pool, but no frees to the pool, and
> one free for the whole pool. The pool creates chunks of 10000
> elem arrays. I subdivide the chunk into Snippet as follows:

> class Snippet {
> friend class SnippetPool;

> int ar[1]; // first element is length
> // enough room in Snippet object to hold more
> elems
> // this is the ONLY data member of POD

> public:

> int length () const
> {
> return ar[0];
> }

> int& operator [] (int i)
> {
> return ar[i+1]; // bounds check omitted
> }

Note that this is undefined behavior unless i is -1. Both in C
and in C++.

> };

> My question is if I have a large chunk of int*, how do I
> create Snippets of length N from these chunks? Want it to
> work on all platforms!

I'm not sure I understand. An int* is not an int, and if you
have a large chunk of int*, you have int*, and not Snippet.

> For example,

> class SnippetPool {
> // assume that we have one chunk big enough to
> // hold all snippets, and destructor throws away chunk

> int* buffptr;
> int buffcnt;

> Snippet* Alloc (int len)
> {
> len++;
> if (len < 1 || len > buffcnt)
> throw MyException();

> void* p = buffptr;
> buffptr += len;

> // does the following make assumptions about the C++
> // implementation of POD?

Definitly. (Note too that Snippet is NOT a POD.)

> What's better?

Maintaining unsigned char*, and adding the actual size of a
Snippet.

> Snippet* s = static_cast<Snippet*>(p);

> // set the logical length for caller
> // but leave rest of array as garbage
> s->ar[0] = len-1;
> return s;
> }
> };

It looks to me that first, you are trying to use the dirty
struct hack, which is forbidden in both C and C++. C has since
added VLA, in order to support something similar; C++ hasn't.
It's possible to do something similar, with:

struct Snippet
{
int length ;
int* buffer()
{
return reinterpret_cast< int* >( this + 1 ) ;
}
} ;

As the reinterpret_cast suggests, it is a bit playing with fire,
and you have to be very careful about alignment issues in
particular. (At least at one point in time, g++ did something
similar in its implementation of std::basic_string. And
std::basic_string<double> was a recepe for instant core dump on
my machine.)

Second, if you're managing memory, unless all elements are
really the same type (which isn't the case here), you should use
unsigned char* to manage it, calculating each time how many
bytes you need. (Here too, you need to pay attention to
alignment issues.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

alex

unread,
Sep 13, 2006, 11:57:58 AM9/13/06
to
andre...@yahoo.com wrote:
> Hi,
>
> I am creating pools of small integer arrays. There are individual
> allocs from the pool, but
> no frees to the pool, and one free for the whole pool. The pool
> creates chunks of 10000 elem
> arrays. I subdivide the chunk into Snippet as follows:
>
> class Snippet {
> friend class SnippetPool;
>
> int ar[1]; // first element is length
> // enough room in Snippet object to hold more
elems
> // this is the ONLY data member of POD
> public:
> int length () const
> {
> return ar[0];
> }
> int& operator [] (int i)
> {
> return ar[i+1]; // bounds check omitted
> }
> };

Strictly speaking, this is not a POD since it have member functions.

{ incorrect. it's not a POD because 'ar' is private. -mod }

If the purpose of such a class is to implement bounds checking then I
would use something like this:

#include <stdexcept>

class Snippet
{
public:
Snippet(int* _begin, size_t _length)
: begin(_begin), length(_length)
{
}
size_t get_length(void) const
{
return length;
}
int& operator[](size_t offset) const
{
if (offset >= length)
throw std::out_of_range(__FUNCTION__);
return begin[offset];
}
private:
int* begin;
size_t length;
};

> My question is if I have a large chunk of int*, how do I create
> Snippets of length N
> from these chunks? Want it to work on all platforms! For example,
>
> class SnippetPool {
> // assume that we have one chunk big enough to
> // hold all snippets, and destructor throws away chunk
>
> int* buffptr;
> int buffcnt;
>
> Snippet* Alloc (int len)
> {
> len++;
> if (len < 1 || len > buffcnt)
> throw MyException();
>
> void* p = buffptr;
> buffptr += len;
>
> // does the following make assumptions about the C++
> // implementation of POD? What's better?
> Snippet* s = static_cast<Snippet*>(p);
>
> // set the logical length for caller
> // but leave rest of array as garbage
> s->ar[0] = len-1;
> return s;
> }
> };

For this class I would use something like:

#include <vector>

class SnippetPool
{
public:
SnippetPool(size_t capacity)
: buffer(capacity), room(capacity)
{
}
Snippet alloc(size_t length)
{
if (length > room)
throw std::bad_alloc(__FUNCTION__);
int* p = &buffer[0] + (buffer.size() - room);
room -= length;
return Snippet(p, length);
}
private:
std::vector< int > buffer;
size_t room;
};

1. Quite portable.
2. Snippet object is returned by value, not by pointer, and may be
copied safely.
3. The buffer elements are initialized upon SnippetPool construction.

The classes may be further generalized using templates if you want.
Simply add

template< typename T >

right before class declarations. And replace 'int' with 'T' where
appropriate.


Alex

Greg Herlihy

unread,
Sep 13, 2006, 12:26:47 PM9/13/06
to
andre...@yahoo.com wrote:
> Hi,
>
> I am creating pools of small integer arrays. There are individual
> allocs from the pool, but
> no frees to the pool, and one free for the whole pool. The pool
> creates chunks of 10000 elem
> arrays. I subdivide the chunk into Snippet as follows:
>
> class Snippet {
> friend class SnippetPool;
>
> int ar[1]; // first element is length
> // enough room in Snippet object to hold more
> elems
> // this is the ONLY data member of POD
>
> public:
>
> int length () const
> {
> return ar[0];
> }
>
> int& operator [] (int i)
> {
> return ar[i+1]; // bounds check omitted
> }
> };
>
> My question is if I have a large chunk of int*, how do I create
> Snippets of length N
> from these chunks?

First, I would consult the Boost "pool" library for one implementation
of a memory pool. It seems as if the requirements in this case have
similiarities to a memory pool. Generally, in a memory pool each
unallocated block contains a pointer to the next free block (or zero if
there is none). The head of this "free block" list is kept at the
beginning of the pool itself.

If the blocks are not all of the same size in this case, then an
additional field holding the size of the free block would be necessary.
It would be straightforward to a declare a simple struct:

struct FreeBlock
{
FreeBlock *nextFreeBlock;
unsigned size;
};

to represent a free block header. Allocating and freeing blocks would
involve changing the pointer and the size field as necessary.

Greg

andre...@yahoo.com

unread,
Sep 13, 2006, 12:26:17 PM9/13/06
to
{ reminder to all: please do not quote signatures. thanks! -mod }

kanze wrote:
>
> It looks to me that first, you are trying to use the dirty
> struct hack, which is forbidden in both C and C++. C has since
> added VLA, in order to support something similar; C++ hasn't.

Can you explain the "dirty struct"? Do you mean that if I have an
ar[1] at the end of a struct, and I have made sure there is enough
memory to read and write to ar[N], where N is less than the logical
length, that this is not ok? It seems like an ar[1] where ar[1] is the
last (and only) data member of Snippet solves the issue of instant
allocation and no need to free. (Note that ar[0] holds the length, and
ar[2] holds the second logical element, not the third.)

> It's possible to do something similar, with:
>
> struct Snippet
> {
> int length ;
> int* buffer()
> {
> return reinterpret_cast< int* >( this + 1 ) ;
> }
> } ;
>
> As the reinterpret_cast suggests, it is a bit playing with fire,
> and you have to be very careful about alignment issues in
> particular. (At least at one point in time, g++ did something
> similar in its implementation of std::basic_string. And
> std::basic_string<double> was a recepe for instant core dump on
> my machine.)
>
> Second, if you're managing memory, unless all elements are
> really the same type (which isn't the case here), you should use
> unsigned char* to manage it, calculating each time how many

Is unsigned char always 8 bits?

andre...@yahoo.com

unread,
Sep 13, 2006, 2:39:21 PM9/13/06
to

alex wrote:

> 2. Snippet object is returned by value, not by pointer, and may be
> copied safely.

I think you're idea is better than mine. The snippet object is not an
object
at all, just a value. This will work fine in my implementation.

Just for informative purposes, can anyone explain the dangers inherent
in putting an ar[1] at the end of a Snippet (or especially if its the
only data member, and there are no virtual functions), and then
accessing elements beyond index of 0 under the assumption that the
SnippetPool created a bigger buffer to hold the Snippet? I'm not going
to do this because I like Alex's idea, but I just want to know what's
wrong with my original idea. I guess this is the dirty struct concept
but I would like to know how that can go wrong.

Andy

kanze

unread,
Sep 14, 2006, 8:10:09 AM9/14/06
to
andre...@yahoo.com wrote:
> { reminder to all: please do not quote signatures. thanks! -mod }

> kanze wrote:

> > It looks to me that first, you are trying to use the dirty
> > struct hack, which is forbidden in both C and C++. C has
> > since added VLA, in order to support something similar; C++
> > hasn't.

> Can you explain the "dirty struct"? Do you mean that if I
> have an ar[1] at the end of a struct, and I have made sure
> there is enough memory to read and write to ar[N], where N is
> less than the logical length, that this is not ok?

If you declare something as ar[n], any access ar[i], with i < 0
or i >= n, is undefined behavior. The C standard was carefully
worded to allow fat pointers and full bounds checking. (I
believe that in fact, Centerline once sold a compiler which did
this checking.)

The technique of declaring a struct something like:

struct S
{
size_t length ;
char data[ 1 ] ;
} ;

then doing something like:

S p = malloc( sizeof( S ) + n ) ;
p->length = n ;

and then using the struct as if it had a final member data[n]
was known to the C committee, where it was commonly called the
"dirty struct hack", or something along those lines. It was a
conscious decision (at least on the part of some members---my
own information concerning this comes from discussions with Tom
Plum) of the the C committee to not support it.

Other members of the committee apparently didn't realize that
they had banned it:-); at any rate, the C committee later
decided that support for something like this was necessary, and
went even further, and adopted VLA's (which can also be used to
simulate alloca---another common feature that wasn't supported
in standard C). To date, as far as I know, there has not even
been a proposal to integrate VLA into C++; this will probably
remain an additional difference between the two languages. (If
anyone on the committee knows differently, please correct me.
I've not been able to follow the committee activities anywhere
near as much as I'd like in the last couple of years.)

In practice, of course, the technique is widely used in C;
widely enough that I can't imagine a compiler not allowing it.

> It seems like an ar[1] where ar[1] is the last (and only) data
> member of Snippet solves the issue of instant allocation and
> no need to free. (Note that ar[0] holds the length, and ar[2]
> holds the second logical element, not the third.)

That's what I understood. The fact that C allows bounds
checking, including bounds checking on derived pointers (the
"fat pointers" I mentionned earlier) means that it is undefined
behavior, and that an implementation is allowed to make it fail.
(In practice, if I were working in C, and the profiler showed
that the extra allocations were a serious problem, I'd probably
use it anyway.)

> > It's possible to do something similar, with:

> > struct Snippet
> > {
> > int length ;
> > int* buffer()
> > {
> > return reinterpret_cast< int* >( this + 1 ) ;
> > }
> > } ;

> > As the reinterpret_cast suggests, it is a bit playing with
> > fire, and you have to be very careful about alignment issues
> > in particular. (At least at one point in time, g++ did
> > something similar in its implementation of
> > std::basic_string. And std::basic_string<double> was a
> > recepe for instant core dump on my machine.)

> > Second, if you're managing memory, unless all elements are
> > really the same type (which isn't the case here), you should
> > use unsigned char* to manage it, calculating each time how
> > many

> Is unsigned char always 8 bits?

No. It is the classical type for raw memory, however.

If you want to be able to generate arbitrary addresses, with any
desired alignment, you need a pointer to a character type. In
C, if you want to be sure of being able to read and write the
memory, regardless of its actually type, you can only access it
as an unsigned char. (C++ also allows char, I think, although
there is some ambiguity concerning whether an implementation can
always force a -0 to a +0 when copying: the representations are
different on 1's complement and signed magnitude machines.
Which, of course, isn't a real concern for many of us.)

Martin Bonner

unread,
Sep 15, 2006, 10:14:01 AM9/15/06
to

andre...@yahoo.com wrote:
> alex wrote:
>
> > 2. Snippet object is returned by value, not by pointer, and may be
> > copied safely.
>
> I think you're idea is better than mine. The snippet object is not an
> object
> at all, just a value. This will work fine in my implementation.
>
> Just for informative purposes, can anyone explain the dangers inherent
> in putting an ar[1] at the end of a Snippet (or especially if its the
> only data member, and there are no virtual functions),

The standards (both C and C++) say that it is undefined behaviour to
access the second element of an array which has been declared with one
element.

The compiler can make your code do what it likes without anyone being
able to claim that it is not a C++ compiler (or C compiler) because of
what it made your code do.

*In practise*, if you get memory overwriting issues and start to use a
tool to find them, you will (hopefully) get all sorts of false
positives from the tool about this. (Hopefully, because otherwise it
probably isn't going to be able to find your real issue.)

0 new messages