Generic linked list with internal storage?

Jef Driesen

unread,

Apr 8, 2010, 6:20:06 AM4/8/10

to

Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for
the user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to
the mydata_t structure like this:

typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;
} node_t;

Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

But I think this might cause trouble due to alignment issues?

Thanks,

Jef

Malcolm McLean

unread,

Apr 8, 2010, 6:36:14 AM4/8/10

to

On 8 Apr, 11:20, Jef Driesen <jefdrie...@hotmail.com.invalid> wrote:
>
> node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
> mydata_t *data = node + sizeof (node_t);
>
> But I think this might cause trouble due to alignment issues?
>

Yes. In practice it's unlikely that a structure consisting only of
pointers would break alignment, and even less likely that a couplet of
two pointers would do so. But it is possible and allowed by the
standard.
You can still store and extract the data by treating it as raw bytes,
but that's not very efficient.

MaciekL

unread,

Apr 8, 2010, 6:48:04 AM4/8/10

to Jef Driesen

Hi,

> A typical (double) linked list is implemented with these two data
> structures:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> void *data;
> } node_t;
>
> typedef struct list_t {
> node_t *head;
> node_t *tail;
> } list_t;
>
> typedef struct mydata_t {
> ...
> } mydata_t;

You can ommit aligment issue with follwowing example:

typedef struct node_mydata_t {
node_t node;
mydata_t mydata;
};

void function(void)
{
node_mydata_t * node_mydata = malloc(sizeof(*node_mydata));
node_t * node = &(node_mydata->node); /* same address as
node_mydata - node is a first element in structure */
node->data = &(node_mydata->mydata)
...
}

Best Regards

--
Maciek

MaciekL

unread,

Apr 8, 2010, 6:53:35 AM4/8/10

to Jef Driesen

On 2010-04-08 12:48, MaciekL wrote:

Invalid declaration:

> typedef struct node_mydata_t {
> node_t node;
> mydata_t mydata;
> };

fixed:

typedef struct node_mydata_t {
node_t node;
mydata_t mydata;

} node_mydata_t;

--
Maciek

jacob navia

unread,

Apr 8, 2010, 6:57:43 AM4/8/10

to

Jef Driesen a écrit :

In the containers library I am working I added a field "ElementSize" to
the list header (your list_t), and when I allocate a node I allocate with:

node_t *n = malloc(sizeof(node)+list->ElementSize);

To assign the data I do:

memcpy(node->data,object,list->ElementSize);

The node structure is declared exactly like yours but at the end I
differ with:

typedef struct node_t {
node_t *prev;
node_t *next;

char data[];
} node_t;

Note that this is using C99.

Malcolm McLean

unread,

Apr 8, 2010, 7:04:17 AM4/8/10

to

On 8 Apr, 11:48, MaciekL <__nospam__mac...@o2.pl> wrote:
>
> You can ommit aligment issue with follwowing example:
>
> typedef struct node_mydata_t {
> node_t node;
> mydata_t mydata;
>
> };
>

Then the linked list isn't generic.
What we need is some sort of way of specifying padding to the next
alignment point. Let's say, to make it complicated, that pointers are
4 bytes whilst long doubles need an alignment of ten (the strictest
alignment).

Position Bytes Contents
0 4 prev
4 4 next
8 2 padding
10 any arbitrary data (eg, a long double)

We need some portable way of inserting the two padding bytes. Im not
sure that this is possible in ANSI C.

ImpalerCore

unread,

Apr 8, 2010, 11:31:05 AM4/8/10

to

On Apr 8, 6:20 am, Jef Driesen <jefdrie...@hotmail.com.invalid> wrote:
> Hi,
>
> A typical (double) linked list is implemented with these two data
> structures:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> void *data;
>
> } node_t;
>
> typedef struct list_t {
> node_t *head;
> node_t *tail;
>
> } list_t;
>
> typedef struct mydata_t {
> ...
>
> } mydata_t;
>
> The major disadvantage of this approach is that the data contents of the
> list needs to be stored outside of the list data structures. Thus if you
> want to store some user defined data in the list, you end up with two
> memory allocations per node: one for the node_t structure, and one for
> the user defined mydata_t structure.

That is one of the drawbacks of using a void* in a linked list.
However, since the linked list nodes using void* are always the same
size (12 bytes assuming 32-bit pointers), this can be somewhat
mitigated by using some memory pooling technique. The GLib
implementation has a "memory slice" that it uses to obtain linked
list, but I've never done any measurements to see what the performance
effect of it is.

> Is there a way to avoid this additional allocation, but without loosing
> the ability to store custom data? Thus I don't want to tie my list to
> the mydata_t structure like this:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> mydata_t data;
>
> } node_t;

This is not a generic linked list, its a specific linked list for
mydata_t. You need some generic mechanism like a void* or a resizable
byte array.

> Maybe allocate storage for the node_t and the mydata_t structure in a
> single block. Something like this:
>
> node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
> mydata_t *data = node + sizeof (node_t);
>
> But I think this might cause trouble due to alignment issues?

Since this is in reference to a generic list, there may be alignment
issues in general. However, for many structures, you can use the
technique and get away with it, it's a caveat of the technique. I
know very little about alignment issues in general, but many memory
debuggers use a similar technique to pad a memory allocation with a
struct to maintain properties of the memory returned, like the size of
the allocation, start and end addresses for fence checking. You may
be able to glean how they handle alignment problems from their
source. DMalloc is one memory debugger that you could inspect the
code to see if they have something that you could use towards this
effect.

> Thanks,
>
> Jef

While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.

1. Using function pointers to abstract behavior on the structure
pointed to by void*.

The two main function pointer types I'm referring to are these.

typedef void (*c_free_function_t)( void* );
typedef int (*c_compare_function_t)( const void* p, const void* q );

Consider what I use for my linked list free function.

\code
void c_list_free( c_list_t* list, c_free_function_t free_fn )
{
c_list_t* node;

while ( list )
{
node = list;
list = list->next;

if ( free_fn ) {
(*free_fn)( node->object );
}
c_free( node );
}
}
\endcode

The free_fn parameter abstracts what is required to properly destroy
the list node object. You can think of free_fn as a parallel to a C++
object destructor. If the linked list is responsible for the objects
it maintains, all you need to do to clean up the list is pass the
right destructor function. If the linked list is pointing to char*
arrays allocated with 'malloc', you can pass 'free' as the argument
and it will release each node's memory. If the structure is more
complicated, (for instance I have an auto-allocating structure called
c_string_t that maintains a dynamically allocated buffer), I can pass
a destructor specific to that c_string_t object. If the linked list
is a shallow reference to data, NULL can be passed to just destroy the
nodes themselves. I find this function convenient since I don't have
to write an iterating loop to release node resources.

While the free function is the appetizer, the comparison function is
the main course. It allows you to abstract list operations that
require comparisons. Typically this involves some kind of sorting,
but there other creative uses for them as well.

Consider a function to insert a generic object into a sorted linked
list (crosses my fingers that the code doesn't wrap too badly).

\code
c_list_t* c_list_insert_sorted( c_list_t* list,
void* object,
c_compare_function_t cmp_fn )
{
c_list_t* tmp_list = list;
c_list_t* node = NULL;
int cmp;

if ( cmp_fn )
{
/* c_new is a macro wrapper around malloc */
node = c_new( c_list_t, 1 );
if ( node )
{
node->object = object;
node->prev = NULL;
node->next = NULL;

if ( !list ) {
return node;
}

/*
* When defining the comparison function, it's second argument
is
* required to be the same type as the first, which should be
the
* structure pointed to by the list node's object pointer.
*/
cmp = (*cmp_fn)( object, tmp_list->object );

while ( (tmp_list->next) && (cmp > 0) )
{
tmp_list = tmp_list->next;
cmp = (*cmp_fn)( object, tmp_list->object );
}

if ( (!tmp_list->next) && (cmp > 0) )
{
tmp_list->next = node;
node->prev = tmp_list;
return list;
}

if ( tmp_list->prev )
{
tmp_list->prev->next = node;
node->prev = tmp_list->prev;
}
node->next = tmp_list;
tmp_list->prev = node;

if ( tmp_list == list ) {
return node;
}
}
}

return list;
}
\endcode

I can use this engine to insert any kind of object into a homogenous
list as long as I have some method of defining an order. Here are a
couple of comparison functions you could use to sort strings that use
char*.

int c_vstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return c_strcmp( s1, s2 );
}

int vstrlencmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

int slen1 = (int)strlen( s1 );
int slen2 = (int)strlen( s2 );

return slen1 - slen2;
}

Note that one orders strings alphanumerically while the other orders
strings by their length.

\code
int main( void )
{
c_list_t* grocery_list = NULL;
c_list_t* l;
size_t i;
char* s;

char* food_items[] = {
"flour", "milk", "butter", "sugar", "bread", "apples", "eggs"
};

for ( i = 0; i < C_ARRAY_N( food_items ); ++i )
{
s = c_strdup( food_items[i] );
if ( s ) {
grocery_list = c_list_insert_sorted( grocery_list, s,
c_vstrcmp );
}
}

printf( "Grocery List:" );
for ( l = grocery_list; l != NULL; l = l->next ) {
printf( " %s", (char*)l->object );
}
printf( "\n" );

c_list_free( grocery_list, c_free );

return EXIT_SUCCESS;
}
\endcode

\output
Grocery List: apples bread butter eggs flour milk sugar
\endoutput

2. The other main advantage of using a void* or equivalent is the
ability to create *shallow copies* of the list.

Tying the data structure into the linked list node makes creating
views of your data really expensive. When the data is referenced by a
void*, all I need to do to create a linked list that is a shallow
reference to another linked list is to copy the pointer. I don't want
to reallocate a potentially big struct tied to the linked list node in
order to make a different view. Here is my function for creating
shallow copy of a linked list.

\code
c_list_t* c_list_copy( c_list_t* list )
{
c_list_t* new_list = NULL;
c_list_t* node;
c_list_t* last;
c_bool list_alloc_failed;

if ( list )
{
new_list = c_new( c_list_t, 1 );
if ( new_list )
{
new_list->object = list->object;
new_list->prev = NULL;
last = new_list;
list = list->next;

list_alloc_failed = FALSE;
while ( list && !list_alloc_failed )
{
last->next = c_new( c_list_t, 1 );
if ( last->next )
{
last->next->prev = last;
last = last->next;
last->object = list->object;
list = list->next;
}
else {
list_alloc_failed = TRUE;
}
}

last->next = NULL;

if ( list_alloc_failed )
{
while ( new_list )
{
node = new_list;
new_list = new_list->next;

c_free( node );
}
new_list = NULL;
}
}
}

return new_list;
}
\endcode

\code
int main( void )
{
c_list_t* donut_recipe = NULL;
c_list_t* low_cal;
c_list_t* l;
size_t i;
char* s;

char* ingredients[] = {
"dry yeast", "water", "milk", "shortening", "sugar", "salt",
"all-purpose flour", "eggs", "vanilla extract"
};
char* high_calories[] = { "shortening", "sugar", "eggs" };

for ( i = 0; i < C_ARRAY_N( ingredients ); ++i )
{
s = c_strdup( ingredients[i] );
if ( s ) {
donut_recipe = c_list_insert_sorted( donut_recipe, s,
c_vstrcmp );
}
}

low_cal = c_list_copy( donut_recipe );

for ( i = 0; i < C_ARRAY_N( high_calories ); ++i ) {
low_cal = c_list_remove_if( low_cal, high_calories[i], c_vstrcmp,
NULL );
}

printf( "---Donut Recipe (low calorie)---\n" );
for ( l = low_cal; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}
printf( "\n" );

printf( "---Donut Recipe---\n" );
for ( l = donut_recipe; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}

c_list_free( low_cal, NULL );
c_list_free( donut_recipe, c_free );

return EXIT_SUCCESS;
}
\endcode

The output of the above program is

\output
---Donut Recipe (low calorie)---
all-purpose flour
dry yeast
milk
salt
vanilla extract
water

---Donut Recipe---
all-purpose flour
dry yeast
eggs
milk
salt
shortening
sugar
vanilla extract
water
\endoutput

Note that I removed some ingredients from the original donut recipe
without *modifying* or *copying* the original data. This is done
through the 'c_list_remove_if' function that is defined by the
following.

/*!
* \brief Remove all objects from a \c c_list_t that satisfy the
* comparison function \c cmp_fn. If none of the objects
* match, the \c c_list_t is unchanged.
* \param list A \c c_list_t.
* \param object The object to remove.
* \param cmp_fn The function to call for each object comparison. It
* should return 0 when the objects match.
* \param free_fn The list object free function.
* \return The new start of the \c c_list_t.
*/
c_list_t* c_list_remove_if( c_list_t* list, void* object,
c_compare_function_t cmp_fn, c_free_function_t free_fn );

Let's examine the call to c_list_remove_if a bit further

\code
low_cal = c_list_remove_if( low_cal, high_calories[i], c_vstrcmp,
NULL );
\endcode

The low_cal is the start of the linked list.
The high_calories[i] is a specific string to search for using the
'c_vstrcmp' function to determine equality.
The last argument is NULL because this is a shallow copy of the list
data.

At the end, the 'low_cal' list view is freed using
'c_list_free( low_cal, NULL )' because the list doesn't own the memory
its pointing to.

These views can be anything from sorting records by a particular
field, removing nodes that don't match a constraint, all maintained
for the low-low cost of creating a list node.

I'm obviously slanted towards using the void* to implement a linked
list in a generic way. I think the linked list with the contained
data have a place, but they should be used for specific, not generic
applications. And by specific, I mean that if the generic list is too
slow, and can be verified via profiling, then go about creating a
custom linked list for the situation.

Best regards,
John D.

bartc

unread,

Apr 9, 2010, 6:45:27 AM4/9/10

to

"Jef Driesen" <jefdr...@hotmail.com.invalid> wrote in message
news:hpkals$4l5$1...@ikarus.fw.belnet.be...

Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?

(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)

--
Bartc

Jef Driesen

unread,

Apr 9, 2010, 7:47:26 AM4/9/10

to

On 8/04/2010 17:31, ImpalerCore wrote:
>> The major disadvantage of this approach is that the data contents of the
>> list needs to be stored outside of the list data structures. Thus if you
>> want to store some user defined data in the list, you end up with two
>> memory allocations per node: one for the node_t structure, and one for
>> the user defined mydata_t structure.
>
> That is one of the drawbacks of using a void* in a linked list.
> However, since the linked list nodes using void* are always the same
> size (12 bytes assuming 32-bit pointers), this can be somewhat
> mitigated by using some memory pooling technique. The GLib
> implementation has a "memory slice" that it uses to obtain linked
> list, but I've never done any measurements to see what the performance
> effect of it is.

Implementing memory pooling is a little overkill for my little app.

>> Maybe allocate storage for the node_t and the mydata_t structure in a
>> single block. Something like this:
>>
>> node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
>> mydata_t *data = node + sizeof (node_t);
>>
>> But I think this might cause trouble due to alignment issues?
>
> Since this is in reference to a generic list, there may be alignment
> issues in general. However, for many structures, you can use the
> technique and get away with it, it's a caveat of the technique. I
> know very little about alignment issues in general, but many memory
> debuggers use a similar technique to pad a memory allocation with a
> struct to maintain properties of the memory returned, like the size of
> the allocation, start and end addresses for fence checking. You may
> be able to glean how they handle alignment problems from their
> source. DMalloc is one memory debugger that you could inspect the
> code to see if they have something that you could use towards this
> effect.

I want to avoid non-portable code if possible. It's not that I don't
want to use external storage with void* pointers. I was just wondering
if it was possible to avoid it.

> While the separate memory allocation cost of a linked list node with a
> void* and its corresponding structure can be a major disadvantage,
> there are some *major advantages* to using a void* in a generic linked
> list interface.
>
> 1. Using function pointers to abstract behavior on the structure
> pointed to by void*.
>
> The two main function pointer types I'm referring to are these.
>
> typedef void (*c_free_function_t)( void* );
> typedef int (*c_compare_function_t)( const void* p, const void* q );
>

> [... lots of interesting code ...]

One of the disadvantages with the external storage is that to destroy
the list, you have to walk the list and destroy each data item first.
This free function is a nice workaround.

> 2. The other main advantage of using a void* or equivalent is the
> ability to create *shallow copies* of the list.
>

> [... lots of interesting code ...]

I don't really need this functionality at the moment.

> I'm obviously slanted towards using the void* to implement a linked
> list in a generic way. I think the linked list with the contained
> data have a place, but they should be used for specific, not generic
> applications. And by specific, I mean that if the generic list is too
> slow, and can be verified via profiling, then go about creating a
> custom linked list for the situation.

I highly doubt the external storage will be too slow. Part of my
question was simply to find out whether it was possible to implement
something like what I proposed, but without running into problems due to
alignment, non-portable behavior, etc.

Jef Driesen

unread,

Apr 9, 2010, 7:47:49 AM4/9/10

to

On 9/04/2010 12:45, bartc wrote:
>> A typical (double) linked list is implemented with these two data
>> structures:
>>
>> typedef struct node_t {
>> node_t *prev;
>> node_t *next;
>> void *data;
>> } node_t;
>>
>> typedef struct list_t {
>> node_t *head;
>> node_t *tail;
>> } list_t;
>>
>> typedef struct mydata_t {
>> ...
>> } mydata_t;
>>
>> The major disadvantage of this approach is that the data contents of the
>> list needs to be stored outside of the list data structures. Thus if you
>> want to store some user defined data in the list, you end up with two
>> memory allocations per node: one for the node_t structure, and one for the
>> user defined mydata_t structure.
>>
>> Is there a way to avoid this additional allocation, but without loosing
>> the ability to store custom data?
>

> Why not include a node_t (or just next and prev pointers) at the start of
> the mydata_t struct?
>
> (The list_t stuff would have to be a bit different though, either a
> dedicated list_t for each mydata_t, or discrete head/tail pointers of type
> *mydata_t)

That makes the list is non generic, and that's not what I want.

SG

unread,

Apr 9, 2010, 7:55:11 AM4/9/10

to

jacob navia wrote:
>
> In the containers library I am working I added a field "ElementSize" to
> the list header (your list_t), and when I allocate a node I allocate with:
>
> node_t *n = malloc(sizeof(node)+list->ElementSize);
>
> To assign the data I do:
>
> memcpy(node->data,object,list->ElementSize);
>
> The node structure is declared exactly like yours but at the end I
> differ with:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> char data[];
> } node_t;
>
> Note that this is using C99.

But how do you access the data stored in that node? Say, you want to
store a list of doubles. Do you extract the double by memcopying the
bytes back into a double variable or do you write something like
(double*)(&node->data[0])? Alignment seems to be a problem in the
latter case which is IMHO unfortunate. OK, you could try to compute
how much padding would be necessary, store this offset in the "list
header" and allocate a couple of more bytes, I suppose.

If I remember correctly the Linux kernel uses something like this:

struct listlinks_t {
struct listlinks_t *prev;
struct listlinks_t *next;
};

#define containerof(ptr,type,member) \
((type*)((char*)ptr - offsetof(type,member)))

struct mli { /* mli = "my list item", concrete example */
struct listlinks_t listlinks;
double value;
};

(I didn't test the code, but I think the idea is clear)

where the list data structure only deals with listlinks_t structs and
alters some pointers of type listlinks_t*. When the user wants to
access the data of one list node he/she writes something like this:

struct listlinks_t *node = ...;
struct mli *p = containerof(node,struct mli,listlinks);
p->value;

The user manages the memory of the list items him/herself. In other
places the Linux kernel uses its own "kobject model" which includes
reference counting and something I would describe as "virtual
destructors" which allows the data structure (i.e. "kset") to take
control of the life-time management of objects.

IMHO, this is a nice solution -- for C code.

At first glimpse, it shares many similarities with what std::list<T>
can give you in C++. The only practical difference I see here is that
in C you could accidentally use the wrong type/member "parameters" for
the containerof macro. This is because the type information is lost
during the pointer arithmetic/casting hacks and the compiler can't
help you spot these kinds of errors for that reason. The compiler
"trusts" the programmer to get it right. In a way this reminds me of
the early Java days where containers' elements stored "references to
objects" which required manual casting in user code. Java's solution
to that problem was "generics". So, Java has a solution for this. C++
has a solution for this. What about C? Are you [not specifically you,
Jacob, but everyone here who's interested in a generic container
library written in C] comfortable with these kinds of pointer casts
and pointer arithmetics? Since there's no kind of runtime type
checking to see if a cast is valid at runtime nor is it desirable to
do something like this per default in a supposedly generic container
library for overhead reasons, wouldn't it make sense to get a little
help from the compiler via type checking at compile-time? I currently
don't see how this could be done in C, unfortunately. But maybe it's
just me. Maybe I'm overrating type-safety.

Cheers,
SG

SG

unread,

Apr 9, 2010, 8:20:36 AM4/9/10

to

ImpalerCore wrote:
>
> While the separate memory allocation cost of a linked list node with a
> void* and its corresponding structure can be a major disadvantage,
> there are some *major advantages* to using a void* in a generic linked
> list interface.

If you go for the first approach (data embedded in the "node object")
it allows you to store a void pointer as data. So, while the second
approach /forces/ you to do one extra allocation, the first approach
is more flexible in that it allows you to emulate the behaviour of the
other approach (by storing pointers as data).

Cheers,
SG

Malcolm McLean

unread,

Apr 9, 2010, 8:34:12 AM4/9/10

to

On 9 Apr, 12:55, SG <s.gesem...@gmail.com> wrote:

> jacob navia wrote:
>
> wouldn't it make sense to get a little
> help from the compiler via type checking at compile-time? I currently
> don't see how this could be done in C, unfortunately.
>

You could do it by creating a variable type caled "type", to hold type
values like "int", or "struct fred". It would be a simple enumeration
with a field for levels of indirection.
The vast majority of the use would be in constants that would
propagate.

SG

unread,

Apr 9, 2010, 9:41:24 AM4/9/10

to

On 9 Apr., 14:34, Malcolm McLean wrote:
> On 9 Apr, 12:55, SG wrote:
> > jacob navia wrote:
> > > [...]
> > [...] wouldn't it make sense to get a little

> > help from the compiler via type checking at compile-time? I
> > currently don't see how this could be done in C, unfortunately.

(I meant: ...without adding new language features)

> You could do it by creating a variable type caled "type", to hold type
> values like "int", or "struct fred". It would be a simple enumeration
> with a field for levels of indirection.
> The vast majority of the use would be in constants that would
> propagate.

Care to explain this a little more? Are you talking about a
hypothetical language feature? Does what you suggest help at compile-
time or at run-time? How does it help?

Cheers,
SG

ImpalerCore

unread,

Apr 9, 2010, 10:34:01 AM4/9/10

to

I will agree that using my generic linked list approach where the list
node uses a void*, I'm not able to avoid the extra allocation, (at
least not without some macro-ey and clever hackery). In a kernel
where things are intended to be very tight, this is a use-case where
the extra allocation would be highly frowned upon.

When trying to develop a linked list for my application, I looked at a
few popular implementations. The thing that caught my eye the most
was GLib's, primarily because of their use of a function pointer to
abstract sorting and search on the linked list via a comparison
function pointer. The concept was easily to understand, and I decided
to use their approach as a starting point. The linux kernel's
implementation was clever, but since it lacked an obvious method to
search and sort a generic list, I was pulled towards the GLib
implementation. Since I didn't find that the linux kernel's version
had a method to abstract sorting and searching, I didn't view it's
other benefits that strongly. But if someone did the work to
integrate a comparison function interface into the linux linked list
interface, it would get my attention.

Best regards,
John D.

> Cheers,
> SG

ImpalerCore

unread,

Apr 9, 2010, 10:58:12 AM4/9/10

to

Just curious, what kinds of list sizes are you talking about?

The linux kernel linked list implementation may be something you can
go for if you don't need an abstraction for searching and sorting the
linked list. You may find its approach more useful if efficiency is a
high priority. It's an interesting implementation to study.

I honestly don't have enough experience to know if what you're
intending is portable. I use that technique you mention to create a
special allocator wrapper that keeps track of the number of malloc
function calls used and then return NULL to insert allocation faults
into my code to test response to allocation errors.

Someone with more experience may be able to give more of the pitfalls
of the technique, but I haven't been exposed to an environment where
that technique has caused a problem yet.

Best regards,
John D.

ImpalerCore

unread,

Apr 9, 2010, 11:33:57 AM4/9/10

to

Does it bother me that I don't have the compiler helping me out with
type safety with my linked list? No, not really, since there's not a
whole lot that one can do without either changing C or doing a lot of
extra work.

Casting void* to a specific pointer type doesn't bother me at all
simply because that's how generic things are done in C. It's well
defined, it's portable, so there isn't much of an issue for me there.
Sure it's a bit cumbersome to introduce casts when I need to access an
object in a linked list node, but it's not a deal killer to the point
that I find a different language to rewrite a project that is already
well entrenched in C.

Now doing things like allocating a single memory block and chopping it
up using pointer arithmetic and casting I'm less comfortable with, but
if it works for the platforms I need it too, I'll still do it if I
have to.

I don't think that type-safety is overrated. I think it should be
very highly rated. In fact, the issue is important enough that
several other languages were invented to handle type-safety better
than C. To me the question is if type-safety is so important that C
*needs* to be modified to support generics to remove this problem to
implement generic containers. At this point in time, my opinion is
no. If a programmer wants to swim in these shark-infested waters, who
are we to deny him.

SG

unread,

Apr 9, 2010, 12:34:26 PM4/9/10

to

On 9 Apr., 16:34, ImpalerCore wrote:

I see no reason why this design approach from the Linux kernel
prohibits arbitrary lists to be sorted according to arbitrary ordering
criteria. I would think that a generic list sort function a la

/**
* functions of type compare_t return TRUE if
* and only if *a comes before *b.
*/
typedef bool compare_t(void const *a, void const *b);

/**
* implements merge sort for linked lists
* using some arbitrary ordering criterion
*/
void list_sort(
struct listlinks_t *first,
ptrdiff_t link_member_offset,
compare_t *comp);

should do the trick. Here, list_sort doesn't need to know that all the
listlinks_t objects (that contain the prev/next pointers) are embedded
in some other struct containing the user data as well. But in order to
call the comparison function with the correct void pointers it needs
to know how to adjust the pointers similarly to how the containerof
macro works. This is done via the link_member_offset parameter. Usage
example:

struct my_double_item {
struct listlinks_t links;
double value;
};

bool my_double_compare(void const *a, void const *b)
{
return ((double const*)a) < ((double const*)b);
}

...
struct listlinks_t *first_node = ...;
list_sort(first_node,
offsetof(struct my_double_item,links),
&my_double_compare);
...

But what we get here is genericity via pointer casts, pointer
arithmetics (the type is actually "erased" which prevents the compiler
from helping us avoid type and pointer errors) and function pointers
(which do bring a certain runtime penalty compared to specialized
sorting functions with an "inline comparator"). Can you do better in
C? I don't know. Frankly, I'm not too happy about the type-safety
here. It looks a bit too hacky for my taste.

Cheers,
SG

ImpalerCore

unread,

Apr 9, 2010, 1:08:21 PM4/9/10

to

I didn't say that the linux list sort couldn't be done, it's just that
the feature wasn't there in plain sight, and GLib's was. I chose the
path of least resistance.

> /**
> * functions of type compare_t return TRUE if
> * and only if *a comes before *b.
> */
> typedef bool compare_t(void const *a, void const *b);
>
> /**
> * implements merge sort for linked lists
> * using some arbitrary ordering criterion
> */
> void list_sort(
> struct listlinks_t *first,
> ptrdiff_t link_member_offset,
> compare_t *comp);

and I guess list_search would look something like this?

struct listlinks_t* list_search( struct listlinks_t* first,
void* object,
ptrdiff_t link_member_offset,
compare_t* comp );

The easy answer to that question is, "use a different language",
*cough* C++ STL *cough*, that has better type safety built into the
language. These questions have been around for a very long time, and
C has tended to remain the way that it is with respect to type
safety. I think your criticisms are valid, and agree with them, and
if it was the only deciding factor, no one would choose C to do
anything of this sort. In fact, C was around when Stepanov was
working on his stuff, and he didn't use C. Hmmm, is that a
coincidence?

Do you believe that this is a worthy pursuit in C at all? You're not
alone if you don't think it is. However, there are some people who do
think it is.

jacob navia

unread,

Apr 9, 2010, 2:05:02 PM4/9/10

to

SG a écrit :

> jacob navia wrote:
>> In the containers library I am working I added a field "ElementSize" to
>> the list header (your list_t), and when I allocate a node I allocate with:
>>
>> node_t *n = malloc(sizeof(node)+list->ElementSize);
>>
>> To assign the data I do:
>>
>> memcpy(node->data,object,list->ElementSize);
>>
>> The node structure is declared exactly like yours but at the end I
>> differ with:
>>
>> typedef struct node_t {
>> node_t *prev;
>> node_t *next;
>> char data[];
>> } node_t;
>>
>> Note that this is using C99.
>
> But how do you access the data stored in that node?

Very easy. You use an iterator

Iterator *it = newIterator(mylist);
double *d;
for (d = GetFirst(it);
d != NULL;
d = it->GetNext(it) {
// Work with *d here
}

> Say, you want to
> store a list of doubles. Do you extract the double by memcopying the
> bytes back into a double variable or do you write something like
> (double*)(&node->data[0])?

You do not access the data directly. You use the API.

>
> At first glimpse, it shares many similarities with what std::list<T>
> can give you in C++. The only practical difference I see here is that
> in C you could accidentally use the wrong type/member "parameters" for
> the containerof macro. This is because the type information is lost
> during the pointer arithmetic/casting hacks and the compiler can't
> help you spot these kinds of errors for that reason. The compiler
> "trusts" the programmer to get it right. In a way this reminds me of
> the early Java days where containers' elements stored "references to
> objects" which required manual casting in user code. Java's solution
> to that problem was "generics". So, Java has a solution for this. C++
> has a solution for this. What about C? Are you [not specifically you,
> Jacob, but everyone here who's interested in a generic container
> library written in C] comfortable with these kinds of pointer casts
> and pointer arithmetics?

I have written a small utility (called "template.exe") that will take
a container written in a generic way, and will produce C code adapted
to the specific data structure you want (that should be typedefed)

I will publish its code and the sample input with the container
library.

SG

unread,

Apr 9, 2010, 2:57:49 PM4/9/10

to

On 9 Apr., 19:08, ImpalerCore wrote:
> SG wrote:
> > Frankly, I'm not too happy about the type-safety.

> > It looks a bit too hacky for my taste.
>
> The easy answer to that question is, "use a different language",
> *cough* C++ STL *cough*, that has better type safety built into the

:-)

> language. These questions have been around for a very long time,
> and C has tended to remain the way that it is with respect to type
> safety. I think your criticisms are valid, and agree with them,
> and if it was the only deciding factor, no one would choose C to
> do anything of this sort. In fact, C was around when Stepanov was
> working on his stuff, and he didn't use C. Hmmm, is that a
> coincidence?
>
> Do you believe that this is a worthy pursuit in C at all?

Depends on what you mean exactly. Is it worth to invent a new
language feature for C to "fix" this? No, I don't think so. I also
think that "generic containers" (whatever they look like) don't belong
into future C standards as part of the library. But that's just my
opinion. Still, I _am_ interested in what people can come up with in
C and what die-hard C programmers consider to be "elegant" designs of
"generic linked lists".

> You're not
> alone if you don't think it is. However, there are some people who do
> think it is.

Cheers,
SG

Eric Sosman

unread,

Apr 9, 2010, 3:59:05 PM4/9/10

to

On 4/9/2010 2:57 PM, SG wrote:
>
> [...] I also

> think that "generic containers" (whatever they look like) don't belong
> into future C standards as part of the library. But that's just my
> opinion. Still, I _am_ interested in what people can come up with in
> C and what die-hard C programmers consider to be "elegant" designs of
> "generic linked lists".

Taking the bait like a true DHCP: I think a generic linked
list package is about as useful as a generic array package.
The linked list is so simple a beast that the code to handle it
isn't complex or difficult enough to bother hiding away behind a
lot of genericized interfaces. Just stick a `struct node *next'
element in the data item of your choice and use open code.

If you've got a lot of linked lists of a lot of disparate
struct types, I could see putting a `struct linkstuff' at the
start of each "payload" struct and using functions that dealt
in `struct linkstuff' objects, casting as needed. This could
be handy with the more exotic flavors of linked list, like lists
built with self-relative offsets instead of pointers. But for
a plain vanilla linked list ... Why bother?

The principal benefits of generic "containers" are, IMHO,
two. For a complex or intricate container -- the AVL tree
that every CS student is forced to write and never uses again,
for example -- writing and debugging the code is enough labor
that it's worth capturing and re-using. But this consideration
doesn't apply for an ordinary linked list; it's more work to
learn the conventions of somebody's function suite than to just
diddle the links directly.

The other benefit is that a suitable package can "contain"
data items that are ignorant of their containment. You want to
put a bunch of `struct tm' instances in a list or a tree or some
such, fine: The list/tree/whatever package manages the navigation,
and the `struct tm' is untouched. In particular, the `struct tm'
can be contained in multiple different containers at the same
time, with its membership fluctuating as the program runs. But
if by "internal storage" you mean "one allocation for metadata
and payload," I don't think this can be made to work.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Seebs

unread,

Apr 9, 2010, 3:57:46 PM4/9/10

to

On 2010-04-09, Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> Taking the bait like a true DHCP: I think a generic linked
> list package is about as useful as a generic array package.
> The linked list is so simple a beast that the code to handle it
> isn't complex or difficult enough to bother hiding away behind a
> lot of genericized interfaces. Just stick a `struct node *next'
> element in the data item of your choice and use open code.

I think the Linux kernel "embeddable list" may be a genuine
counterexample -- it allows things to be on multiple lists, for
one thing, with shared code.

> But for
> a plain vanilla linked list ... Why bother?

Generally, true. I don't even consciously notice adding a small
list implementation -- it's too trivial to notice.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

A. Bolmarcich

unread,

Apr 9, 2010, 4:02:53 PM4/9/10

to

On 2010-04-08, Jef Driesen <jefdr...@hotmail.com.invalid> wrote:
[snip]

> Is there a way to avoid this additional allocation, but without loosing
> the ability to store custom data? Thus I don't want to tie my list to
> the mydata_t structure like this:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> mydata_t data;
> } node_t;

Why not tie the list to the type of data being stored? Doing so allows a
compiler to do type checking that it could not if a generic approach were used.

A cost of having a different list node type for each data type stored in a list
is having a copy of the list functions for each list node type. However, for
lists the functions should be small enough to be suitable for inlining. A
compiler would see the instantiaton of the functions for each list type, but
the resulting code would be inlined pointer operations.

Willem

unread,

Apr 9, 2010, 4:20:17 PM4/9/10

to

Eric Sosman wrote:
) The principal benefits of generic "containers" are, IMHO,
) two. For a complex or intricate container -- the AVL tree
) that every CS student is forced to write and never uses again,
) for example -- writing and debugging the code is enough labor
) that it's worth capturing and re-using. But this consideration
) doesn't apply for an ordinary linked list; it's more work to
) learn the conventions of somebody's function suite than to just
) diddle the links directly.
)
) The other benefit is that a suitable package can "contain"
) data items that are ignorant of their containment. You want to
) put a bunch of `struct tm' instances in a list or a tree or some
) such, fine: The list/tree/whatever package manages the navigation,
) and the `struct tm' is untouched. In particular, the `struct tm'
) can be contained in multiple different containers at the same
) time, with its membership fluctuating as the program runs. But
) if by "internal storage" you mean "one allocation for metadata
) and payload," I don't think this can be made to work.

The third benefit is that when you decide that a linked list isn't
giving enough performance, you can swap in a generic AVL tree or
whatever an hey presto! Instant performance win! You can even
test which kind of generic container implementation gives you the
best oerformance, all with just a single change, because the interface
to all the containers is the same.

I have no idea if this happens in practise, though. I sure have
never seen it happen.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Ben Pfaff

unread,

Apr 9, 2010, 4:31:49 PM4/9/10

to

Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 4/9/2010 2:57 PM, SG wrote:
>>
>> [...] I also
>> think that "generic containers" (whatever they look like) don't belong
>> into future C standards as part of the library. But that's just my
>> opinion. Still, I _am_ interested in what people can come up with in
>> C and what die-hard C programmers consider to be "elegant" designs of
>> "generic linked lists".
>
> Taking the bait like a true DHCP: I think a generic linked
> list package is about as useful as a generic array package.
> The linked list is so simple a beast that the code to handle it
> isn't complex or difficult enough to bother hiding away behind a
> lot of genericized interfaces. Just stick a `struct node *next'
> element in the data item of your choice and use open code.

I find that the power of generic linked list packages is in
nontrivial operations. Yes, there are some. For example, I
consider ll_splice() and ll_swap_range() below to be in that
category, in that it would take me a few minutes to rewrite them,
and probably a few hours to write an exhaustive unit test. But
I've already done both, so I don't have to do so again.

/* Linked list node. */
struct ll
{
struct ll *next; /* Next node. */
struct ll *prev; /* Previous node. */
};

/* Linked list. */
struct ll_list
{
struct ll null; /* Null node. */
};

/* Returns the node preceding LL in its list,
or the null node if LL is the first node in its list.
(In an empty list, the null node precedes itself.) */
static inline struct ll *
ll_prev (const struct ll *ll)
{
return ll->prev;
}

/* Removes R0...R1 from their current list
and inserts them just before BEFORE. */
void
ll_splice (struct ll *before, struct ll *r0, struct ll *r1)
{
if (before != r0 && r0 != r1)
{
/* Change exclusive range to inclusive. */
r1 = ll_prev (r1);

/* Remove R0...R1 from its list. */
r0->prev->next = r1->next;
r1->next->prev = r0->prev;

/* Insert R0...R1 before BEFORE. */
r0->prev = before->prev;
r1->next = before;
before->prev->next = r0;
before->prev = r1;
}
}

/* Exchanges the positions of A0...A1 and B0...B1,
which may be in the same list or different lists but must not
overlap. */
void
ll_swap_range (struct ll *a0, struct ll *a1, struct ll *b0, struct ll *b1)
{
if (a0 == a1 || a1 == b0)
ll_splice (a0, b0, b1);
else if (b0 == b1 || b1 == a0)
ll_splice (b0, a0, a1);
else
{
struct ll *x0 = ll_prev (a0), *x1 = a1;
struct ll *y0 = ll_prev (b0), *y1 = b1;
a1 = ll_prev (a1);
b1 = ll_prev (b1);
x0->next = b0;
b0->prev = x0;
b1->next = x1;
x1->prev = b1;
y0->next = a0;
a0->prev = y0;
a1->next = y1;
y1->prev = a1;
}
}

--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Eric Sosman

unread,

Apr 9, 2010, 4:33:04 PM4/9/10

to

On 4/9/2010 4:20 PM, Willem wrote:
> Eric Sosman wrote:
> ) The principal benefits of generic "containers" are, IMHO,

> ) two. [...]

>
> The third benefit is that when you decide that a linked list isn't
> giving enough performance, you can swap in a generic AVL tree or
> whatever an hey presto! Instant performance win! You can even
> test which kind of generic container implementation gives you the
> best oerformance, all with just a single change, because the interface
> to all the containers is the same.

That would (or could) be true for a "generic containers"
package, but the O.P. specifically said "generic linked list."
A generic "sequenced container" with assorted common-interface
implementations (simple linked list, skip list, starboard list)
might be useful. A generic "random-access container" with
assorted implementations (simple array, Judy tree) might also
be useful. The interfaces can hide implementation detail, and
therein lies some benefit. But if the implementation detail
("it has to be a linked list" or "it has to be an array") is
already specified, where's the gain in hiding?

--
Eric Sosman
eso...@ieee-dot-org.invalid

jacob navia

unread,

Apr 9, 2010, 5:00:01 PM4/9/10

to

SG a écrit :

Look "SG":

In the other thread you asked:

> But how do you access the data stored in that node?

My answer is:

Very easy. You use an iterator

Iterator *it = newIterator(mylist);
double *d;
for (d = GetFirst(it);
d != NULL;
d = it->GetNext(it) {
// Work with *d here
}

>> You're not

>> alone if you don't think it is. However, there are some people who do
>> think it is.
>

I do.

I have today finished the implementation of scapegoat trees, that
together with AVL trees, and redblack trees round up the tree section of
my library.

I have added a heap object, that amortizes allocations by allocating a
bigger block and avoids allocating each time a new element is added to
the container.

I will add the template executable to the library. That code allows you
to write generic code and use that template for any data structure you
want. Obviously it is an external utility (whose code is provided by the
library). The compiler is notmodified.

What is fun of that utility is that it is so simple (a few hundred lines
of C) that you can customize template expansion as you wish.

Please remember:

Here is comp.lang.c

If you like C++ and think C is only for "die ahrds" note that
comp.lang.c++ is a group dedicated to your preferred monstruosity.

Thanks in advance for staying on topic within this group.

Jef Driesen

unread,

Apr 9, 2010, 6:34:38 PM4/9/10

to

On 09/04/10 21:59, Eric Sosman wrote:
> Taking the bait like a true DHCP: I think a generic linked
> list package is about as useful as a generic array package.
> The linked list is so simple a beast that the code to handle it
> isn't complex or difficult enough to bother hiding away behind a
> lot of genericized interfaces. Just stick a `struct node *next'
> element in the data item of your choice and use open code.
>
> If you've got a lot of linked lists of a lot of disparate
> struct types, I could see putting a `struct linkstuff' at the
> start of each "payload" struct and using functions that dealt
> in `struct linkstuff' objects, casting as needed. This could
> be handy with the more exotic flavors of linked list, like lists
> built with self-relative offsets instead of pointers. But for
> a plain vanilla linked list ... Why bother?

Writing a linked list isn't very complex, but I prefer to spend my time
on the main program I'm writing, rather than having to rewrite the low
level data structures every time I need one. It's just more practical to
have a toolbox with well tested code that can be reused every time you
need it. Performance might be a little worse compared to a custom made
data structure, but if that's a problem you can always switch to a
better implementation.

SG

unread,

Apr 10, 2010, 3:47:01 AM4/10/10

to

On 9 Apr., 23:00, jacob navia wrote:

> SG wrote:
> > But how do you access the data stored in that node?
>
> My answer is:
>
> Very easy. You use an iterator
>
> Iterator *it = newIterator(mylist);
> double *d;
> for (d = GetFirst(it);
> d != NULL;
> d = it->GetNext(it) {
> // Work with *d here
> }

You've posted this fragment already. It sure looks nice. But I was
actually more interested in what you do in GetFirst and GetNext and
how you derive the double pointer with respect to possible alignment
problems. In this case you could argue that after two pointers (next/
prev) it's likely that the char buffer is well-aligned for storing
objects of many possible types and this would probably work on many
platforms already. But unless you have taken special care (manual
padding if necessary) this might break as far as I can tell -- that
is, assuming the double pointer points directly into the char buffer
of your node struct objects.

> I will add the template executable to the library. That code allows you
> to write generic code and use that template for any data structure you
> want. Obviously it is an external utility (whose code is provided by the
> library). The compiler is notmodified.

Interesting.

> If you like C++ and think C is only for "die ahrds" note that
> comp.lang.c++ is a group dedicated to your preferred monstruosity.

I didn't say "C is only for die-hards". But I did use -- actually, I
seem to have /misused/ -- the term "die-hard". I'm not a native
speaker. I picked "die-hard" because it felt right. It turns out that
it doesn't convey what I /meant/ to say, at all. For a lack of a
better term please just ignore it.

But it does strike me as a little odd, that you combined a plea for
staying on-topic with the "monstruosity" comment.

Cheers,
SG

Eric Sosman

unread,

Apr 10, 2010, 8:01:12 AM4/10/10

to

On 4/9/2010 6:34 PM, Jef Driesen wrote:
> [...]

>> If you've got a lot of linked lists of a lot of disparate
>> struct types, I could see putting a `struct linkstuff' at the
>> start of each "payload" struct and using functions that dealt
>> in `struct linkstuff' objects, casting as needed. This could
>> be handy with the more exotic flavors of linked list, like lists
>> built with self-relative offsets instead of pointers. But for
>> a plain vanilla linked list ... Why bother?
>
> Writing a linked list isn't very complex, but I prefer to spend my time
> on the main program I'm writing, rather than having to rewrite the low
> level data structures every time I need one. It's just more practical to
> have a toolbox with well tested code that can be reused every time you
> need it. Performance might be a little worse compared to a custom made
> data structure, but if that's a problem you can always switch to a
> better implementation.

Tastes differ, obviously. But ponder this: Suppose you
have a pointer `listnode' to a node in a linked list, and
another pointer `newnode' to a free-floating node you'd like
to insert, right after the existing node. Which of

insertAfter(listnode, newnode);

or

insertAfter(newnode, listnode);

is correct, and which is a bug? Is there any possible way
to declare insertAfter() so the compiler can spot the incorrect
usage? (Note that both parameters will be of the same type,
and neither can be `const'.) In short, is there any defense
against silly mistakes other than "RTFM?"

If instead you were looking at the open-code fragments

newnode->next = listnode->next;
listnode->next = newnode;

and

listnode->next = newnode->next;
newnode->next = listnode;

would it be easier to spot the error? More to the point,
perhaps, would it be harder to get it wrong in the first place?

Don't commit canaricide by cannon.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Tim Rentsch

unread,

Apr 10, 2010, 1:42:46 PM4/10/10

to

Jef Driesen <jefdr...@hotmail.com.invalid> writes:

> A typical (double) linked list is implemented with these two data
> structures:
>

> typedef struct node_t {
> node_t *prev;
> node_t *next;

> void *data;
> } node_t;
>
> typedef struct list_t {
> node_t *head;
> node_t *tail;
> } list_t;
>
> typedef struct mydata_t {
> ...
> } mydata_t;
>

> The major disadvantage of this approach is that the data contents of
> the list needs to be stored outside of the list data structures. Thus
> if you want to store some user defined data in the list, you end up
> with two memory allocations per node: one for the node_t structure,
> and one for the user defined mydata_t structure.
>

> Is there a way to avoid this additional allocation, but without
> loosing the ability to store custom data? Thus I don't want to tie my
> list to the mydata_t structure like this:
>
> typedef struct node_t {
> node_t *prev;
> node_t *next;
> mydata_t data;
> } node_t;
>

> Maybe allocate storage for the node_t and the mydata_t structure in a
> single block. Something like this:
>
> node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
> mydata_t *data = node + sizeof (node_t);
>
> But I think this might cause trouble due to alignment issues?

Yes, it might. In fact, almost certainly will, if the code has
any significant longevity.

It is possible to do something along the lines of what you're
asking about. Very briefly, the list "header" has an additional
member that stores a byte offset of where the node value is
stored, and what numeric value is stored in this offset member is
computed using one of the standard techniques for determining
alignment of a type. Doing this reliably and portably requires a
lot of attention to details regarding not only alignment but also
the rules about storing into structure members and padding bytes.
(There are some other details about what type is being used and
what interface to supply to read or store the data, but I'm
skipping over these.)

A more significant question, I think, is are you really sure you
want to do this, given that it seems like you aren't completely
sure of how to do it so it works? It's easy to sidestep all the
difficulties, just by doing two allocations. Until you know how
to avoid the problems, and avoid them reliably, you're probably
not in the best position to decide whether the single allocation
approach is a good one. So I would suggest either just doing
two allocations, or making learning how to implement a "data
and list header together" approach a pre-requisite to deciding
whether you do, in fact, actually want to use such an approach
in your program(s).

jacob navia

unread,

Apr 10, 2010, 4:34:14 PM4/10/10

to

Eric Sosman a écrit :

[snip]

> Which of
>
> insertAfter(listnode, newnode);
>
> or
>
> insertAfter(newnode, listnode);
>
> is correct, and which is a bug?

Which of

strcpy(dst,src);

or
for (int = 0; i<=strlen(src);i++)
dst[i] = src[i];

is clearer?

You will notice that for somebody that has never used strcpy the second
form is clearer, following your argument.

Fact is, strcpy is a *standard* function, i.e. one which is part of the
standard language and everybody uses it.

That is why I insist that a STANDARD list library would be a progress
for the language.

Willem

unread,

Apr 10, 2010, 4:57:09 PM4/10/10

to

jacob navia wrote:
) Which of
)
) strcpy(dst,src);
)
) or
) for (int = 0; i<=strlen(src);i++)
) dst[i] = src[i];
)
) is clearer?

You're missing the point, which would have been illustrated
by the following:

Which of

strcpy(src, dst);

or

for (i = 0; dst[i] != 0; i++)
src[i] = dst[i];

is clearer ?

ImpalerCore

unread,

Apr 10, 2010, 6:10:38 PM4/10/10

to

On Apr 9, 3:57 pm, Seebs <usenet-nos...@seebs.net> wrote:

> On 2010-04-09, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>
> > Taking the bait like a true DHCP: I think a generic linked
> > list package is about as useful as a generic array package.
> > The linked list is so simple a beast that the code to handle it
> > isn't complex or difficult enough to bother hiding away behind a
> > lot of genericized interfaces. Just stick a `struct node *next'
> > element in the data item of your choice and use open code.
>
> I think the Linux kernel "embeddable list" may be a genuine
> counterexample -- it allows things to be on multiple lists, for
> one thing, with shared code.
>
> > But for
> > a plain vanilla linked list ... Why bother?
>
> Generally, true. I don't even consciously notice adding a small
> list implementation -- it's too trivial to notice.

I'm curious what you consider a plain vanilla linked list? Perhaps if
you could describe the vanilla list feature subset, maybe by listing
functions of an existing implementation like GLib or the C++ STL list,
or is there even a function interface to it?

Do you feel that adding a c_compare_function_t makes it a 'chocolate
twist with jimmies' linked list?

Best regards,
John D.

> -s
> --
> Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.nethttp://www.seebs.net/log/<-- lawsuits, religion, and funny pictureshttp://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

ImpalerCore

unread,

Apr 10, 2010, 6:43:38 PM4/10/10

to

That's pretty much my feelings as well. I personally don't have an
interest in pushing for any library feature into the standard,
although there is evidence that it can be a benefit a la C++ STL. The
downside is if a library feature becomes standard, and later discover
it was a mistake (gets), it complicates things.

Best regards,
John D.

ImpalerCore

unread,

Apr 10, 2010, 7:08:59 PM4/10/10

to

On Apr 10, 8:01 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 4/9/2010 6:34 PM, Jef Driesen wrote:
>
>
>
>
>
> > [...]
> >> If you've got a lot of linked lists of a lot of disparate
> >> struct types, I could see putting a `struct linkstuff' at the
> >> start of each "payload" struct and using functions that dealt
> >> in `struct linkstuff' objects, casting as needed. This could
> >> be handy with the more exotic flavors of linked list, like lists
> >> built with self-relative offsets instead of pointers. But for
> >> a plain vanilla linked list ... Why bother?
>
> > Writing a linked list isn't very complex, but I prefer to spend my time
> > on the main program I'm writing, rather than having to rewrite the low
> > level data structures every time I need one. It's just more practical to
> > have a toolbox with well tested code that can be reused every time you
> > need it. Performance might be a little worse compared to a custom made
> > data structure, but if that's a problem you can always switch to a
> > better implementation.
>
> Tastes differ, obviously. But ponder this: Suppose you
> have a pointer `listnode' to a node in a linked list, and
> another pointer `newnode' to a free-floating node you'd like
> to insert, right after the existing node. Which of
>
> insertAfter(listnode, newnode);
>
> or
>
> insertAfter(newnode, listnode);

That's a problem with any function that has identical arguments. Are
you going to stop using the C string library functions because of this
(rhetorical)?

> is correct, and which is a bug? Is there any possible way
> to declare insertAfter() so the compiler can spot the incorrect
> usage? (Note that both parameters will be of the same type,
> and neither can be `const'.) In short, is there any defense
> against silly mistakes other than "RTFM?"
>
> If instead you were looking at the open-code fragments
>
> newnode->next = listnode->next;
> listnode->next = newnode;
>
> and
>
> listnode->next = newnode->next;
> newnode->next = listnode;
>
> would it be easier to spot the error? More to the point,
> perhaps, would it be harder to get it wrong in the first place?
>
> Don't commit canaricide by cannon.

And how would you redesign the C string interface to solve your
conundrum?

I don't view the argument as strong since the C library itself has
that problem, and there isn't a lot of complaint about it either that
I've seen. Like all the str* functions, using a linked list library
would take a while to learn the feel of. Whether you continue to use
depends on how easy or hard it makes your life. One thing that lacks
in GLib's documentation is a good set of examples that demonstrate how
to use the list in certain use cases. The examples are tedious and
difficult to make relevant, so I can understand why they aren't
there. Contrast this with the std::list where several books have been
written to document usage, pitfalls, and the like.

I still go back and forth between whether I want to use the GLib style
where you have something like.

1. c_list_t* c_list_remove( c_list_t* list, c_list_t* node,
c_free_function_t free_fn );

\code fragment
node = c_list_search( list, "Find Me", c_vstrcmp );
if ( node ) {
list = c_list_remove( list, node, c_free );
}
\endcode

If you forget the 'list =' and instead have

\code fragment
node = c_list_search( list, "Find Me", c_vstrcmp );
if ( node ) {
c_list_remove( list, node, c_free );
}
\endcode

You introduce a subtle bug where if 'node' is the head of the list,
you've got serious problems.

If I use a c_list_t** instead, I don't need to make the user assign
the result back into the list. Does this make the interface bad? I'm
not sure. But I still think that a function interface that abstracts
the complicated stuff in a linked list is worthwhile.

Best regards,
John D.

> --
> Eric Sosman
> esos...@ieee-dot-org.invalid- Hide quoted text -
>
> - Show quoted text -

bartc

unread,

Apr 10, 2010, 8:53:01 PM4/10/10

to

"Jef Driesen" <jefdr...@hotmail.com.invalid> wrote in message
news:3YNvn.198409$Z63....@newsfe08.ams2...

> On 09/04/10 21:59, Eric Sosman wrote:
>> Taking the bait like a true DHCP: I think a generic linked
>> list package is about as useful as a generic array package.
>> The linked list is so simple a beast that the code to handle it
>> isn't complex or difficult enough to bother hiding away behind a
>> lot of genericized interfaces. Just stick a `struct node *next'
>> element in the data item of your choice and use open code.
>>
>> If you've got a lot of linked lists of a lot of disparate
>> struct types, I could see putting a `struct linkstuff' at the
>> start of each "payload" struct and using functions that dealt
>> in `struct linkstuff' objects, casting as needed. This could
>> be handy with the more exotic flavors of linked list, like lists
>> built with self-relative offsets instead of pointers. But for
>> a plain vanilla linked list ... Why bother?
>
> Writing a linked list isn't very complex, but I prefer to spend my time on
> the main program I'm writing, rather than having to rewrite the low level
> data structures every time I need one.

If I had access to such a generic linked list library, I'm not even sure I
would what to do with it.

I use linked lists a fair bit, but mostly each node exists in several
different linked lists simultaneously. Usually there are some up/down links
as well (the list forms part of a tree).

In the few cases I have a list forming a single chain, that tends to be a
'tight' structure that needs to be super-efficient, or that is somehow
special (for example part of a memory allocator; then the generic routine
can't allocate or free the memory for it; it *is* the memory!)

I also rarely use doubly-linked lists, so a 'prev' field would be wasted in
a generic routine.

And since it seems that custom create()/free() routines might be required
anyway, it's a simple matter to also do a couple of custom routines to do
whatever it is your generic code might do, but without the headaches of
making them fit into an awkward generic framework, and working out how to
get the data in or out, or even where it's located!

In short, I agree with Eric...

> It's just more practical to
> have a toolbox with well tested code that can be reused every time you
> need it.

But offset by the extra difficulties of trying to make generic routines fit
every situation. Perhaps have the tested, generic code, but in
pseudo-language pinned up near your desk. Then it can be adapted as
required. A bit like templates..

--
Bartc

jacob navia

unread,

Apr 11, 2010, 2:50:16 AM4/11/10

to

bartc a écrit :

In the case of a custom data structure the generic list can't be OK.

I agree.

> In the few cases I have a list forming a single chain, that tends to be a
> 'tight' structure that needs to be super-efficient, or that is somehow
> special (for example part of a memory allocator; then the generic routine
> can't allocate or free the memory for it; it *is* the memory!)
>

In the container library in this case you would use a list of pointers.
A list of pointers allocates just memory for a pointer at each addition.

In case you want to avoid ANY allocation, you write a custom allocator
that allocates from a fixed pool for instance.

> I also rarely use doubly-linked lists, so a 'prev' field would be wasted in
> a generic routine.
>

The container library offers single and double linked lists for that
reason.

> And since it seems that custom create()/free() routines might be required
> anyway, it's a simple matter to also do a couple of custom routines to do
> whatever it is your generic code might do, but without the headaches of
> making them fit into an awkward generic framework, and working out how to
> get the data in or out, or even where it's located!
>

Yes. That is why we find many hastily written list routines in each
application, all full of bugs because testing them was skipped because
they are so 'trivial' to write.

Then, the lack of standards makes each of them incompatible and needing
an adapter to work with other lists, etc etc.

> In short, I agree with Eric...
>

>> It's just more practical to
>> have a toolbox with well tested code that can be reused every time you
>> need it.
>
> But offset by the extra difficulties of trying to make generic routines
> fit every situation. Perhaps have the tested, generic code, but in
> pseudo-language pinned up near your desk. Then it can be adapted as
> required. A bit like templates..
>

Much better to write them in C and expand them at each occasion with a
simple utility.

Phil Carmody

unread,

Apr 11, 2010, 5:16:47 AM4/11/10

to

Willem <wil...@turtle.stack.nl> writes:
> jacob navia wrote:
> ) Which of
> )
> ) strcpy(dst,src);
> )
> ) or

> ) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']

> ) dst[i] = src[i];
> )
> ) is clearer?

Given that they do different things, you can't compare clarity.

> You're missing the point, which would have been illustrated
> by the following:
>
> Which of
>
> strcpy(src, dst);
>
> or
>
> for (i = 0; dst[i] != 0; i++)
> src[i] = dst[i];
>
> is clearer ?

Well, both clearly have issues, so does that make them
equally clear?

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1

jacob navia

unread,

Apr 11, 2010, 6:13:24 AM4/11/10

to

Phil Carmody a écrit :

> Willem <wil...@turtle.stack.nl> writes:
>> jacob navia wrote:
>> ) Which of
>> )
>> ) strcpy(dst,src);
>> )
>> ) or
>> ) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
>> ) dst[i] = src[i];
>> )
>> ) is clearer?
>
>
> Given that they do different things, you can't compare clarity.
>

Step 0: The first character of src is copied into dst. If src
has length zero, the terminating zero is copied
Step 1: If the length of the source is greater than zero
the second character is copied.

etc

You are wrong (again) carmody.

Note the <=, that means that the terminating zero is copied.
Note that neither src nor dst are touched: they point always to
the start of the source and destination buffers.

pete

unread,

Apr 11, 2010, 7:45:26 AM4/11/10

to

Phil Carmody wrote:
>
> Willem <wil...@turtle.stack.nl> writes:
> > jacob navia wrote:
> > ) Which of
> > )
> > ) strcpy(dst,src);
> > )
> > ) or
> > ) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
> > ) dst[i] = src[i];
> > )
> > ) is clearer?
>
> Given that they do different things, you can't compare clarity.
>
> > You're missing the point, which would have been illustrated
> > by the following:
> >
> > Which of
> >
> > strcpy(src, dst);
> >
> > or
> >
> > for (i = 0; dst[i] != 0; i++)
> > src[i] = dst[i];
> >
> > is clearer ?
>
> Well, both clearly have issues, so does that make them
> equally clear?

You got it backwards.
jacob navia's for loop, does what strcpy does.

Willem's for loop, doesn't write a null byte.

--
pete

Eric Sosman

unread,

Apr 11, 2010, 8:40:49 AM4/11/10

to

On 4/10/2010 6:10 PM, ImpalerCore wrote:
> [...]

>>> But for
>>> a plain vanilla linked list ... Why bother?
>>
>> Generally, true. I don't even consciously notice adding a small
>> list implementation -- it's too trivial to notice.
>
> I'm curious what you consider a plain vanilla linked list?

I'll assume a basic familiarity with the notions of "list"
and of "link" (if these are lacking, ask any IUT student). As
expressed in C, a list's nodes are usually structs because this
is a convenient way to package payload and link(s) together in
one easily-managed blob. Each node contains a link to its
successor and perhaps also to its predecessor. The links are
usually struct pointers, sometimes array indices (if all the
nodes inhabit a single array), with a null pointer (or a special
index value) indicating "no successor/predecessor." There'll
also be some metadata: At the very least, a link to the list's
first node and often a link to the last as well (sometimes a
pointer-to-struct, sometimes a pointer-to-pointer-to-struct).
The metadata may be free-standing, or may inhabit a special
"list header" (especially for a circular list).

I'd consider the variations covered by the above to be
"plain vanilla" linked lists. Programmers should be familiar
with -- and comfortable with -- the basic operations of inserting,
deleting, searching, and traversing such lists. These operations
are so basic, in my view, that hiding the details away behind an
interface is (1) overkill and (2) obfuscatory. It makes about
as much sense to me as

int fetchFromIntArray(const int *array, size_t index) {
return array[index];
}
void storeIntoIntArray(int *array, size_t index, int value) {
array[index] = value;
}

What things might change a list's flavor to something other
than vanilla, to some flavor that might justify more opacity? A
non-exhaustive, er, list:

- Unusual links. If the links are self-relative offsets (as
might be used in lists shared between processes that map them
to different virtual addresses), or if the stored values are
the XOR of a forward and backward link, or some other odd
encoding is used, link manipulation may become intricate
enough to justify being packaged away somewhere.

- Bit-diddling. If the bits of the links' representation are
twiddled to encode additional information (e.g., by using
a few low-order "known to be zero" bits as flags), it's
probably best to put the encoding and decoding out of sight
somewhere. (The writers of LISP interpreters seem particularly
attracted to such techniques.)

- Peculiar nodes. If the list nodes are not structs, there's
usually something slightly exotic going on, worthy of being
hidden from us easily-scandalized maiden aunts.

- Mammoth metadata. If a list carries more than the usual
amount of metadata -- a node count, a link to the node most
recently inserted (at whatever position), a link to the node
at the midmost position, whatever -- it's probably best to
let a function package take care of the maintenance.

- Additional linkage. If there are additional links, as in
a skip list or some such, the burden of maintaining them
grows to the point where pre-packaging becomes justifiable.

... and so on. However, it doesn't seem to me that a "generic"
package is what's needed for this sort of thing; the oddities are,
well, odd, and not easily generified. If you've got a list in
shared memory, using self-relative links whose low-order bits
encode extra information about the pointed-to nodes, you'll most
likely write a few functions to manage lists of exactly that
description; the IUT Senior Thesis Generic List Package simply
won't suffice.

> Perhaps if
> you could describe the vanilla list feature subset, maybe by listing
> functions of an existing implementation like GLib or the C++ STL list,
> or is there even a function interface to it?

I'm not intimate with GLib, and I have an irrational dislike
for all things C++, so I'll take a pass on this one.

> Do you feel that adding a c_compare_function_t makes it a 'chocolate
> twist with jimmies' linked list?

Sorry; I don't even know what a c_compare_function_t might be.

--
Eric Sosman
eso...@ieee-dot-org.invalid

ImpalerCore

unread,

Apr 11, 2010, 9:18:46 AM4/11/10

to

The fact that you ignore std::list and the C++ STL, one of the best
things to happen to containers, does say something to me about your
bias. If you haven't tried to use GLib shows that you prefer to do
things your own way, which is fine by me. If you had actually tried
to use either of these libraries, and found them wanting for reasons
x, y, and z, that would be one thing. But not experiencing them first
hand makes your opinion rather difficult to digest since I have a
completely different background than you. I came from C++ to C, so
that probably describes my bias.

> > Do you feel that adding a c_compare_function_t makes it a 'chocolate
> > twist with jimmies' linked list?
>
> Sorry; I don't even know what a c_compare_function_t might be.

Something that abstracts comparison.

typedef int (*c_compare_function_t)( const void* p, const void* q )

If you manually encode a different search and sort for each struct and
fields that you manage, that's pretty hardcore.

jacob navia

unread,

Apr 11, 2010, 9:28:26 AM4/11/10

to

pete a écrit :

It is VERY easy to make mistakes in trivial function code as
this example demonstrates.

I do not doubt that willem and carmody are experienced C
programmers, but...

There is nothing more trivial that strcpy, yet even an experienced
programer will not see a small (trivial) error...

This reinforces my argument: even if lists are "trivial" to implement,
it is better to avoid reimplementing the wheel...

Richard Harter

unread,

Apr 11, 2010, 10:37:04 AM4/11/10

to

On Sun, 11 Apr 2010 07:45:26 -0400, pete <pfi...@mindspring.com>
wrote:

Correct. Willem's example illustrates the problem with
positional notation for arguments. One sees "src" and "dst" and
thinks of them as source and destination. If they are, then the
strcpy and the loop are both copying the source into the
destination.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
It's not much to ask of the universe that it be fair;
it's not much to ask but it just doesn't happen.

Eric Sosman

unread,

Apr 11, 2010, 10:36:16 AM4/11/10

to

On 4/11/2010 9:18 AM, ImpalerCore wrote:
> [...]

>>> Do you feel that adding a c_compare_function_t makes it a 'chocolate
>>> twist with jimmies' linked list?
>>
>> Sorry; I don't even know what a c_compare_function_t might be.
>
> Something that abstracts comparison.
>
> typedef int (*c_compare_function_t)( const void* p, const void* q )
>
> If you manually encode a different search and sort for each struct and
> fields that you manage, that's pretty hardcore.

For lists where node-to-node comparison is important (which
is not true of all lists), it seems to me that the comparison is
associated with the nodes and not with the lists/trees/treaps/...
in which they reside. I'll usually search for an "equal" node
with something like

for (ptr = head; ptr != NULL; ptr = ptr->next) {
if (compareNodes(ptr, key) == 0)
break;
}
if (ptr != NULL) ...

and not with

static int compareWrapper(const void *p, const void *q) {
return compareNodes(p, q);
}
...
ptr = searchList(head, key, compareWrapper);
if (ptr != NULL) ...

The former seems to me (eye of the beholder, don'cha know)
simpler than the latter. Safer, too, because the compiler will
squawk if I accidentally use the comparator for some different
node type.

Sorting a list (when that makes sense) is, I feel, better
done by a "generic list-sorting function" than by a "generic list
package." That is, it makes more sense to me to abstract the
sorting operation itself than to abstract the features of the
thing being sorted. I've written such a function (that I'm
inordinately proud of), and I use it whenever needed -- but I'll
still use open code to build the list in the first place, and to
navigate it after sorting. Similarly, I'll use qsort() to sort
an array, but I'll use [] rather than some function package to
access it before and after. I truly don't see why I shouldn't.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Phil Carmody

unread,

Apr 11, 2010, 12:16:52 PM4/11/10

to

pete <pfi...@mindspring.com> writes:
> Phil Carmody wrote:
>> Willem <wil...@turtle.stack.nl> writes:
>> > jacob navia wrote:
>> > ) Which of
>> > )
>> > ) strcpy(dst,src);
>> > )
>> > ) or
>> > ) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
>> > ) dst[i] = src[i];
>> > )
>> > ) is clearer?
>>
>> Given that they do different things, you can't compare clarity.
>

> You got it backwards.
> jacob navia's for loop, does what strcpy does.

char buff[]="hello world";
src=buff+1;
dest=buff;
if (flag) {
strcpy(dst, src); // UB
} else {
for (int i = 0; i<=strlen(src);i++) // not UB

dst[i] = src[i];
}

UB is not the same as not UB.

Tim Rentsch

unread,

Apr 12, 2010, 5:24:09 PM4/12/10

to

jacob navia <ja...@jacob.remcomp.fr> writes:

> [snip]

>
> This reinforces my argument: even if lists are "trivial" to implement,
> it is better to avoid reimplementing the wheel...

Simple linked lists predate the wheel. Historically
there is some question about whether the wheel was
invented before or after circular linked lists.

(Sorry, I wouldn't resist. :)

Nick

unread,

Apr 12, 2010, 5:28:21 PM4/12/10

to

Jef Driesen <jefdr...@hotmail.com.invalid> writes:

> From: Jef Driesen <jefdr...@hotmail.com.invalid>
> Subject: Re: Generic linked list with internal storage?
> Newsgroups: comp.lang.c
> Date: Fri, 09 Apr 2010 13:47:49 +0200
> Organization: BELNET - The Belgian Research Network

>
> On 9/04/2010 12:45, bartc wrote:
>>> A typical (double) linked list is implemented with these two data
>>> structures:
>>>
>>> typedef struct node_t {
>>> node_t *prev;
>>> node_t *next;
>>> void *data;
>>> } node_t;
>>>
>>> typedef struct list_t {
>>> node_t *head;
>>> node_t *tail;
>>> } list_t;
>>>
>>> typedef struct mydata_t {
>>> ...
>>> } mydata_t;
>>>
>>> The major disadvantage of this approach is that the data contents of the
>>> list needs to be stored outside of the list data structures. Thus if you
>>> want to store some user defined data in the list, you end up with two
>>> memory allocations per node: one for the node_t structure, and one for the
>>> user defined mydata_t structure.
>>>
>>> Is there a way to avoid this additional allocation, but without loosing
>>> the ability to store custom data?
>>

>> Why not include a node_t (or just next and prev pointers) at the start of
>> the mydata_t struct?
>>
>> (The list_t stuff would have to be a bit different though, either a
>> dedicated list_t for each mydata_t, or discrete head/tail pointers of type
>> *mydata_t)
>
> That makes the list is non generic, and that's not what I want.

But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.

The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.

This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

Tim Rentsch

unread,

Apr 12, 2010, 6:35:51 PM4/12/10

to

Nick <3-no...@temporary-address.org.uk> writes:

Actually I'm pretty sure that's not guaranteed. That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different. For example, if we have

struct foo { int i; double d; char b[4]; };
struct bas { int i; double d; char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1. I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).

Message has been deleted

Tim Rentsch

unread,

Apr 13, 2010, 1:12:02 AM4/13/10

to

Gene <gene.r...@gmail.com> writes:

> On Apr 12, 6:35 pm, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Nick <3-nos...@temporary-address.org.uk> writes:
>> > Jef Driesen <jefdrie...@hotmail.com.invalid> writes:

> But the structs having different alignments don't cause a problem for
> the "generic" list implementation. The Standard requires that a
> pointer to one will access the common initial fields of the other.

No, it doesn't. The "common initial sequence rule" holds only when
the two structure objects in question are members of a union object
(an actual union object, not just a type relationship), and then
only when the definition of the union type in question is visible at
the point of through-other-struct-type access. Trying to do that
with two arbitrary structs like the ones above, which happen to have
the same types of members at the beginning, but that aren't part of
an actual union object having those structs as members, results in
undefined behavior.

stan

unread,

Apr 13, 2010, 12:13:02 PM4/13/10

to

Tim Rentsch wrote:
> Simple linked lists predate the wheel. Historically
> there is some question about whether the wheel was
> invented before or after circular linked lists.
>
> (Sorry, I wouldn't resist. :)

I'm pretty sure Knuth covered all of the above, in real time as they
were discovered; and exhaustively too!

(Sorry I had a lapse of judgement myself :)

Nick

unread,

Apr 13, 2010, 3:09:36 PM4/13/10

to

Tim Rentsch <t...@alumni.caltech.edu> writes:

I asked, with examples, about this a while ago, because I wanted to be
sure I was legal, and was assured I was. Anyone else got a view?

Can I do this (just typed in, not compiled or tested):

struct generic_list {
struct generic_list *next;
};

typedef void operate_function(void *list_item, void *args);

void iterate_over_list(void *list_param, operate_function *fn,
void *args) {
struct generic_list *list = list_param;
while(list) {
fn(list,args);
list = list->next;
}
}

And use this to operate on any list where the first item is a normal
"next" pointer?

[as in another thread, this is an example where I actually might well do
(*fn)(list,args)]

For the record I do, and it works. But I know how little that counts
for.

Seebs

unread,

Apr 13, 2010, 3:21:34 PM4/13/10

to

On 2010-04-13, Nick <3-no...@temporary-address.org.uk> wrote:
> I asked, with examples, about this a while ago, because I wanted to be
> sure I was legal, and was assured I was. Anyone else got a view?

I am pretty sure that the compiler would have to go out of its way
to break anything; consensus of the committee has been that "all
struct pointers smell the same" -- meaning, as long as you really do have
something as an initial sequence, the fact that it could have been in
a union somewhere in another module means that it has to work as though
it is in fact in a union somewhere.

Consider a pair of struct types, struct a and struct b, which have
a common initial subsequence. And consider:

union c { struct a a; struct b b; };
and
void unionize(struct a *a, struct b *b) {
union c c;
if (a)
c->a = *a;
if (b)
c->b = *b;
};

Since that can exist, and unions have to hold objects in their unaltered
form, and you have to be able to access the common initial subsequence
in a and b, a compiler has to work VERY HARD to not work if you
pass a pointer to struct a or b, suitably cast, to a function expecting
their common initial subsequence.

I think the intent is probably that it is supposed to work. It's an easier
pitch in the case where you have a separate struct for the common initial
subsequence, because it is definitely the case that a pointer to a structure,
suitably converted, is a valid pointer to its initial member, and vice
versa.

Oliver Jackson

unread,

Apr 13, 2010, 3:33:11 PM4/13/10

to

On Apr 11, 9:16 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:

> pete <pfil...@mindspring.com> writes:
> > Phil Carmody wrote:
> >> Willem <wil...@turtle.stack.nl> writes:
> >> > jacob navia wrote:
> >> > ) Which of
> >> > )
> >> > ) strcpy(dst,src);
> >> > )
> >> > ) or
> >> > ) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
> >> > ) dst[i] = src[i];
> >> > )
> >> > ) is clearer?
>
> >> Given that they do different things, you can't compare clarity.
>
> > You got it backwards.
> > jacob navia's for loop, does what strcpy does.
>
> char buff[]="hello world";
> src=buff+1;
> dest=buff;
> if (flag) {
> strcpy(dst, src); // UB} else {
>
> for (int i = 0; i<=strlen(src);i++) // not UB
> dst[i] = src[i];
>
> }
>
> UB is not the same as not UB.

That's retarded. The strcpy call invokes undefined behavior only
because the standard doesn't spell out strcpy's precise
implementation. Anyway, your your protest has absolutely nothing to
do with the point Jacob was making, which has to do with accidentally
passing arguments to a function in the wrong order.

Gene

unread,

Apr 13, 2010, 5:47:13 PM4/13/10

to

On Apr 13, 1:12 am, Tim Rentsch <t...@alumni.caltech.edu> wrote:

> undefined behavior.- Hide quoted text -

>
> - Show quoted text -

You're right. Thanks for the correction. I'll delete the earlier
posting.

Gene

unread,

Apr 13, 2010, 7:05:20 PM4/13/10

to

On Apr 13, 3:21 pm, Seebs <usenet-nos...@seebs.net> wrote:

> On 2010-04-13, Nick <3-nos...@temporary-address.org.uk> wrote:
>
> > I asked, with examples, about this a while ago, because I wanted to be
> > sure I was legal, and was assured I was. Anyone else got a view?
>
> I am pretty sure that the compiler would have to go out of its way
> to break anything; consensus of the committee has been that "all
> struct pointers smell the same" -- meaning, as long as you really do have
> something as an initial sequence, the fact that it could have been in
> a union somewhere in another module means that it has to work as though
> it is in fact in a union somewhere.
>
> Consider a pair of struct types, struct a and struct b, which have
> a common initial subsequence. And consider:
>
> union c { struct a a; struct b b; };
> and
> void unionize(struct a *a, struct b *b) {
> union c c;
> if (a)
> c->a = *a;
> if (b)
> c->b = *b;
> };
>
> Since that can exist, and unions have to hold objects in their unaltered
> form, and you have to be able to access the common initial subsequence
> in a and b, a compiler has to work VERY HARD to not work if you
> pass a pointer to struct a or b, suitably cast, to a function expecting
> their common initial subsequence.

This is the argument I remember, and it made sense to me. However,
then I read this:
http://coding.derkeiler.com/Archive/C_CPP/comp.lang.c/2010-03/msg00220.html
which seems to offer a meaningful counterexample. Here it is for
convenience:

#include <stdio.h>
struct common { int a, b; };
struct big_struct { struct common com; double c; };
struct small_struct { struct common com; };

int main(void)
{
struct big_struct big;
struct small_struct *otherp = (void *)&big;
big.com.a = 0;
otherp->com.a = 11;
printf("big.com.a = %d\n", big.com.a);
return 0;
}

Ben says his gcc prints 0 (rather than 11 as mine does).

>
> I think the intent is probably that it is supposed to work. It's an easier
> pitch in the case where you have a separate struct for the common initial
> subsequence, because it is definitely the case that a pointer to a structure,
> suitably converted, is a valid pointer to its initial member, and vice
> versa.

But hypothetically, the conversion could involve pointer arithmetic,
no? If so, even this is, again, a not-certain-to-be-portable way to
implement generic lists, the OP's question.

Seebs

unread,

Apr 13, 2010, 7:35:25 PM4/13/10

to

On 2010-04-13, Gene <gene.r...@gmail.com> wrote:
> #include <stdio.h>
> struct common { int a, b; };
> struct big_struct { struct common com; double c; };
> struct small_struct { struct common com; };
>
> int main(void)
> {
> struct big_struct big;
> struct small_struct *otherp = (void *)&big;
> big.com.a = 0;
> otherp->com.a = 11;
> printf("big.com.a = %d\n", big.com.a);
> return 0;
> }
>
> Ben says his gcc prints 0 (rather than 11 as mine does).

The interesting question, I guess, is whether simply declaring the
union (even if you don't use it) changes this. Or, for that matter,
having the addresses of one or more of these taken and passed
to something outside this module.

> But hypothetically, the conversion could involve pointer arithmetic,
> no?

So far as I can tell, no -- the pointer to the initial member has to
be the same as the pointer to the whole structure/union. (We spent a
while looking for this, but it's defined under the equality operator.)

> If so, even this is, again, a not-certain-to-be-portable way to
> implement generic lists, the OP's question.

Yeah. I think the problem is that strict aliasing rules have to be
subverted, and I think some compilers are smart enough to check whether
or not you've demonstrated that they're being subverted.

Tim Rentsch

unread,

Apr 13, 2010, 9:10:51 PM4/13/10

to

Nick <3-no...@temporary-address.org.uk> writes:

> "next" pointer? [snip]

Technically, it's undefined behavior. Practically speaking,
my guess is it will work on every implementation you're
likely to encounter. However, it's easy to devise
a perverse-but-conforming implementation where it would
fail.

I'm not sure how much this helps you, but if you follow
up with a more specific question I'll try to give a
more specific answer.

Tim Rentsch

unread,

Apr 13, 2010, 9:26:23 PM4/13/10

to

Seebs <usenet...@seebs.net> writes:

> On 2010-04-13, Nick <3-no...@temporary-address.org.uk> wrote:
>> I asked, with examples, about this a while ago, because I wanted to be
>> sure I was legal, and was assured I was. Anyone else got a view?
>
> I am pretty sure that the compiler would have to go out of its way
> to break anything; consensus of the committee has been that "all
> struct pointers smell the same" -- meaning, as long as you really do have
> something as an initial sequence, the fact that it could have been in
> a union somewhere in another module means that it has to work as though
> it is in fact in a union somewhere.
>
> Consider a pair of struct types, struct a and struct b, which have
> a common initial subsequence. And consider:
>
> union c { struct a a; struct b b; };
> and
> void unionize(struct a *a, struct b *b) {
> union c c;
> if (a)
> c->a = *a;
> if (b)
> c->b = *b;
> };

Of course you meant c.a and c.b, not c->a and c->b.

> Since that can exist, and unions have to hold objects in their unaltered
> form, and you have to be able to access the common initial subsequence
> in a and b, a compiler has to work VERY HARD to not work if you
> pass a pointer to struct a or b, suitably cast, to a function expecting
> their common initial subsequence.

Practically speaking casting one struct pointer type to another
will almost certainly work for any members that happen to have
the same offset (and members in the "common leading sequence"
certainly will). However technically it's still undefined
behavior. In fact even just doing the cast of the pointer is
undefined behavior, since the two structures aren't guaranteed to
have the same alignment requirements; pointers to structures all
have the same alignment, but the structures they point to don't
have to.

> I think the intent is probably that it is supposed to work. It's an easier
> pitch in the case where you have a separate struct for the common initial
> subsequence, because it is definitely the case that a pointer to a structure,
> suitably converted, is a valid pointer to its initial member, and vice
> versa.

The "common initial sequence rule" is required to work _only_ if
the struct object in question is contained in an actual union
object of the appropriate type. Not just if the union _type_
exists and is visible, but if the struct _object_ is actually in
a union _object_. If it's a freestanding struct, not contained
in a union object of a type that includes both structs, all bets
are off.

Seebs

unread,

Apr 13, 2010, 9:25:37 PM4/13/10

to

On 2010-04-14, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> The "common initial sequence rule" is required to work _only_ if
> the struct object in question is contained in an actual union
> object of the appropriate type. Not just if the union _type_
> exists and is visible, but if the struct _object_ is actually in
> a union _object_. If it's a freestanding struct, not contained
> in a union object of a type that includes both structs, all bets
> are off.

This surprises me, although I think you're right.

Speaking only for myself, I would *prefer* that the spec simply
give the guarantee that, if the types of the first N members of
two structure types are the same types in the same order, then you
can use a pointer to one as a pointer to the other so long as
you only access those members. In practice, it's what people
seem to mostly expect, and I don't think the lost optimizations
will be all that big a deal. In particular, "restrict" lets us
specify the thing we're most likely to care about -- that the
two arguments to a function which are both pointers to structures
necessarily point to different structures.

Tim Rentsch

unread,

Apr 13, 2010, 9:48:26 PM4/13/10

to

Seebs <usenet...@seebs.net> writes:

> On 2010-04-13, Gene <gene.r...@gmail.com> wrote:
>> #include <stdio.h>
>> struct common { int a, b; };
>> struct big_struct { struct common com; double c; };
>> struct small_struct { struct common com; };
>>
>> int main(void)
>> {
>> struct big_struct big;
>> struct small_struct *otherp = (void *)&big;
>> big.com.a = 0;
>> otherp->com.a = 11;
>> printf("big.com.a = %d\n", big.com.a);
>> return 0;
>> }
>>
>> Ben says his gcc prints 0 (rather than 11 as mine does).
>
> The interesting question, I guess, is whether simply declaring the
> union (even if you don't use it) changes this.

Declaring a union type (including defining its members) makes
no difference. Only if the struct objects in question are
actually inside an actual union object does that provision
come into play.

> Or, for that matter,
> having the addresses of one or more of these taken and passed
> to something outside this module.

Practically speaking that might have an effect, but the
technical answer is still undefined behavior. The second line
of main(), putting '(void*)&big' into 'otherp', is already
undefined behavior; however, even if that works (as indeed
it is likely to) the accesses through 'otherp' are also
undefined behavior, for the reason you later point out --
an object of one type is being accessed through a different
(and not an otherwise allowed) type.

>> But hypothetically, the conversion could involve pointer arithmetic,
>> no?
>
> So far as I can tell, no -- the pointer to the initial member has to
> be the same as the pointer to the whole structure/union. (We spent a
> while looking for this, but it's defined under the equality operator.)
>
>> If so, even this is, again, a not-certain-to-be-portable way to
>> implement generic lists, the OP's question.
>
> Yeah. I think the problem is that strict aliasing rules have to be
> subverted, and I think some compilers are smart enough to check whether
> or not you've demonstrated that they're being subverted.

With some minor exceptions, it's always undefined behavior
to access an object of one type using a different type
if the object isn't in a union object that has both
types in it. Everyone knows about the exceptions for
accessing through (char*), and there a a couple of
others. But independent structs never qualify, outside
of the case where they are both in a union and the
actual object is in an actual union object.

Nick Keighley

unread,

Apr 14, 2010, 3:44:17 AM4/14/10

to

yes they /should/ but are they actually?

> These operations
> are so basic, in my view, that hiding the details away behind an
> interface is (1) overkill and (2) obfuscatory. It makes about
> as much sense to me as
>
> int fetchFromIntArray(const int *array, size_t index) {
> return array[index];
> }
> void storeIntoIntArray(int *array, size_t index, int value) {
> array[index] = value;
> }

I disagree. Accessing a list is /not/ as simple as accessing an array.
If you'd spent as much time as I have debugging other peoples' buggy
list implementaions you'd be much more in favour of a list library.
It's one of the things that attracted me to C++, or at least to the
STL. I also object to application inter-mingled with list manipulation
code.

find_matching_call();

shouldn't be replaced with inline list walking code. Hey one day we
may decide the over-head of walking a list is too expensive and switch
to a tree or hash.

This is called abstraction and there should be separation of domain
and implementation.

<snip>

> > Do you feel that adding a c_compare_function_t makes it a 'chocolate
> > twist with jimmies' linked list?
>
> Sorry; I don't even know what a c_compare_function_t might be.

me neither

--

In the development of the understanding of complex phenomena,
the most powerful tool available to the human intellect is
abstraction. Abstraction arises from the recognition of similarities
between certain objects, situations, or processes in the real world
and the decision to concentrate on these similarities and to ignore,
for the time being, their differences.
- C.A.R. Hoare

Nick Keighley

unread,

Apr 14, 2010, 3:57:16 AM4/14/10

to

On 11 Apr, 14:18, ImpalerCore <jadil...@gmail.com> wrote:
> On Apr 11, 8:40 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:

[vanilla list]

> > > Do you feel that adding a c_compare_function_t makes it a 'chocolate
> > > twist with jimmies' linked list?
>
> > Sorry; I don't even know what a c_compare_function_t might be.
>
> Something that abstracts comparison.
>
> typedef int (*c_compare_function_t)( const void* p, const void* q )
>
> If you manually encode a different search and sort for each struct and
> fields that you manage, that's pretty hardcore.

I've worked on a application that did this a *lot* (hand coded
searches)
It induced nausea

ImpalerCore

unread,

Apr 14, 2010, 10:35:24 AM4/14/10

to

On Apr 14, 3:57 am, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:

Interesting. How severe were the symptoms?

The kinds of comparison functions I use the most are of the form

\code
int c_vstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return c_strcmp( s1, s2 );
}

int c_intcmp( const void* p, const void* q )
{
const int* i1 = (const int*)p;
const int* i2 = (const int*)q;

return ( *i1 > *i2 ) - ( *i1 < *i2 );
}

/* Not usable for sorting, but for searching I find it quite useful.
*/
int vstrstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return strstr( s1, s2 ) ? 0 : -1;
}
\endcode

For struct types, there are specific comparison functions for each
field I need to search for. This is from one of the samples that I
used to test my list implementation.

\code
struct departure_record_tag
{
int id;
char destination[20];
char airline[20];
int flight_number;
char time[10];
};
typedef struct departure_record_tag departure_record;

departure_record departure_data[] =
{
{ 1, "Atlanta", "Delta", 1963, "12:00 PM" },
{ 2, "Atlanta", "AirTran Airways", 189, "12:30 PM" },
{ 3, "Boston", "Delta", 6650, "12:00 PM" },
{ 4, "Boston", "American", 4480, "13:00 PM" },
...
}

int departure_compare_by_destination( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_vstrcmp( dr1->destination, dr2->destination );
}

int departure_compare_by_destination( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_vstrcmp( dr1->airline, dr2->airline );
}

int departure_compare_by_flight_number( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_intcmp( &dr1->flight_number, &dr2->flight_number );
}
\endcode

There are also some comparison functions I use for searching that
don't require instantiating a departure_record to use. I find them to
be convenient, but they are not necessary.

\code
/* Not usable for sorting. */
int search_departure_by_airline( const void* p, const void* q )
{
const departure_record* dr = (const departure_record*)p;
const char* airline = (const char*)q;

return c_vstrcmp( dr->airline, airline );
}

c_list_t* departures = NULL;
c_list_t* l;

/* Fill up departures with records. */

l = c_list_search( departures, "Delta",
search_departures_by_airline );
if ( l ) {
/* Do something with it. */
}
\endcode

The alternative is hand-coding an iterative loop for each specific
search, while I don't find it nausea inducing, I do find it carpal
tunnel inducing for the easy stuff. For the harder stuff, I still
write hand-written loops. And by harder I mean where I have more than
simple equality to look for, like 'eq|ne|lt|gt|lteq|gteq' or multiple
fields. I wish I could figure out a way to do a tcpdump like filter
string that I could use, but it seems quite a bit out of my league.

Just some more random thoughts in the pile.

Best regards,
John D.

Michael Angelo Ravera

unread,

Apr 14, 2010, 2:14:41 PM4/14/10

to

On Apr 10, 10:42 am, Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Jef Driesen <jefdrie...@hotmail.com.invalid> writes:
> > A typical (double) linked list is implemented with these two data
> > structures:
>
> > typedef struct node_t {
> > node_t *prev;
> > node_t *next;
> > void *data;
> > } node_t;
>
> > typedef struct list_t {
> > node_t *head;
> > node_t *tail;
> > } list_t;
>
> > typedef struct mydata_t {
> > ...
> > } mydata_t;
>
> > The major disadvantage of this approach is that the data contents of
> > the list needs to be stored outside of the list data structures. Thus
> > if you want to store some user defined data in the list, you end up
> > with two memory allocations per node: one for the node_t structure,
> > and one for the user defined mydata_t structure.
>
> > Is there a way to avoid this additional allocation, but without

> > loosing the ability to store custom data? Thus I don't want to tie my
> > list to the mydata_t structure like this:

>
> > typedef struct node_t {
> > node_t *prev;
> > node_t *next;

> > mydata_t data;
> > } node_t;
>
> > Maybe allocate storage for the node_t and the mydata_t structure in a
> > single block. Something like this:
>
> > node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
> > mydata_t *data = node + sizeof (node_t);
>
> > But I think this might cause trouble due to alignment issues?
>
> Yes, it might. In fact, almost certainly will, if the code has
> any significant longevity.
>
> It is possible to do something along the lines of what you're
> asking about. Very briefly, the list "header" has an additional
> member that stores a byte offset of where the node value is
> stored, and what numeric value is stored in this offset member is
> computed using one of the standard techniques for determining
> alignment of a type. Doing this reliably and portably requires a
> lot of attention to details regarding not only alignment but also
> the rules about storing into structure members and padding bytes.
> (There are some other details about what type is being used and
> what interface to supply to read or store the data, but I'm
> skipping over these.)
>
> A more significant question, I think, is are you really sure you
> want to do this, given that it seems like you aren't completely
> sure of how to do it so it works? It's easy to sidestep all the
> difficulties, just by doing two allocations. Until you know how
> to avoid the problems, and avoid them reliably, you're probably
> not in the best position to decide whether the single allocation
> approach is a good one. So I would suggest either just doing
> two allocations, or making learning how to implement a "data
> and list header together" approach a pre-requisite to deciding
> whether you do, in fact, actually want to use such an approach
> in your program(s).- Hide quoted text -

>
> - Show quoted text -

Assuming that the data contents are self-described, you SHOULD be able
to create a doublely linked list header structure that looks like
this:
typedef struct list_head
{
list_head * next;
list_head * prev;
grand_union data [0];
} list_head;

grand_union should be a union of all of the types with any alignment
issues something like:
typedef union grand_union
{
char *pc;
void *pv;
char c;
long l;
long long ll;
float f;
long float lf;
/* Include any other types that might have alignment issues here */
} grand_union;

In theory, the compiler should position the union within the struct in
such a way that all alignment issues are handled properly and the size
of the list_head struct will now make it possible to align on any of
the types included in grand_union (including a pointer, if that turns
you one).

The zero size allocation *IS* however, by some compilers, believed to
be nonstandard and will cost you a warning. If it works, you'd
allocate as follows:
new_node = malloc (sizeof (list_head) + mysize);

If you want to be completely standard, you can make the space
allocation be data [1] and use this allocation instead:
new_node = malloc (offsetof (list_head, data) + mysize);

If you have included a "mydatatype mdt" in the grand_union definiton,
you may allocate as follows:

new_node = malloc (offsetof (list_head, data.mdt[1]));

If your data is an array of long floats with num_cells occurances,
you could allocated as follows:

new_node = malloc (offsetof (list_head, data.lf[num_cells]));

Get the idea?

Eric Sosman

unread,

Apr 14, 2010, 4:15:15 PM4/14/10

to

On 4/14/2010 2:14 PM, Michael Angelo Ravera wrote:
> [...]

> Assuming that the data contents are self-described, you SHOULD be able
> to create a doublely linked list header structure that looks like
> this:
> typedef struct list_head
> {
> list_head * next;
> list_head * prev;
> grand_union data [0];
> } list_head;
>
>

> grand_union should be a union of all of the types [...]

>
> The zero size allocation *IS* however, by some compilers, believed to
> be nonstandard and will cost you a warning.

s/believed/correctly believed/
s/warning/diagnostic/

The C99 "flexible array member" is a way to do this cleanly.

> If it works, you'd
> allocate as follows:
> new_node = malloc (sizeof (list_head) + mysize);
>
> If you want to be completely standard, you can make the space
> allocation be data [1] and use this allocation instead:
> new_node = malloc (offsetof (list_head, data) + mysize);

For "completely standard," mysize must be greater than or
equal to sizeof(grand_union), even if the data to be stored
there is smaller. Why? Because the compiler has been told
that an entire `struct list_head' is allocated, and is therefore
at liberty to make references to any byte thereof. If some of
those bytes aren't there, you're out of luck.

Yes, I've seen this happen, even in a "slightly cleaner"
context. The program had a pile of struct types with common
initial sequences and different-sized continuations, and a union
that held all of them. A function took a pointer to the union
type, looked at a type code in the common initial part to decide
what kind of struct was actually present, and switch'ed to type-
specific code that dealt only with the "really-there" member:

struct foo { int type; int data; };
struct bar { int type; char data[42]; };
...
union all { struct foo foo; struct bar bar; ... };

void func(union all *up) {
switch(up->foo.type) {
case FOO:
... use up->foo ...
break;
case BAR:
... use up->bar ...
break;
...
}
}

On one platform, the program would get SIGSEGV (rarely; this
was very difficult to reproduce). Eventually, three of us figured
it out: The function was called with a pointer to a small-sized
`struct foo' that just happened to be allocated at the very end
of mapped memory, and the optimizing compiler ("knowing" that the
entire large-sized `union all' was present) had decided to do a
speculative pre-fetch of a toward-the-end field, inserting the
pre-fetch before the switch. And the field offset, being greater
than sizeof(struct foo), put the access off the end of memory ...

"If you lie to the compiler, it will get its revenge."
-- Henry Spencer

--
Eric Sosman
eso...@ieee-dot-org.invalid