Re: [comp.lang.c.moderated] valid "struct hack" in C89

Ersek, Laszlo

Dec 18, 2009, 1:56:36 PM
From: "Clive D.W. Feather" <cl...@davros.org>
Date: Thu, 17 Dec 2009 18:21:01 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> <http://www.open-std.org/Jtc1/sc22/wg14/www/docs/dr_073.html>
> <http://www.open-std.org/Jtc1/sc22/wg14/www/docs/dr_178.html>
> <http://www.open-std.org/Jtc1/sc22/wg14/www/docs/dr_072.html>


From: "Clive D.W. Feather" <cl...@davros.org>
Date: Thu, 17 Dec 2009 19:08:31 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> <http://www.open-std.org/Jtc1/sc22/wg14/www/docs/dr_051.html>

Thank you for steering me to these links; they are very useful and have
convinced me that the struct hack cannot be made strictly conforming in
C89, just as the C99 rationale says. These were the key words, from
DR51:

it permits an implementation to tailor how it represents pointers
to the size of the objects they point at.

And I've remembered near and far pointers from DOS or so. (I didn't
program at that time in C yet, just read about them, so they might be
completely irrelevant here.) This sentence also clarifies that it makes no
difference to replace the array member holding one element by a single
nonarray object member.


> The proposal that led to the C99 feature was:
> <http://www.davros.org/c/wg14n791.txt>

I'd have never hoped for an answer directly from the originator of the
flexible array member. Awesome.
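
For readers comparing the two idioms, here is a minimal sketch of what the C99 flexible array member looks like in use. The type `struct packet` and the function `packet_new` are hypothetical names chosen for illustration; they do not come from the thread or from lbzip2. The sketch assumes a C99 (or later) compiler and does not guard against overflow in the size computation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only. */
struct packet {
    size_t len;
    unsigned char data[];   /* C99 flexible array member: no size given */
};

/* Allocate header and payload with a single malloc() call. */
struct packet *packet_new(const unsigned char *src, size_t len)
{
    /* sizeof *p does not count the flexible member, so add len bytes. */
    struct packet *p = malloc(sizeof *p + len);

    if (p != NULL) {
        p->len = len;
        if (len > 0)
            memcpy(p->data, src, len);
    }
    return p;
}
```

In C89 the same layout is usually spelled with `unsigned char data[1];` and an allocation of `sizeof *p + len - 1` bytes, and it is the indexing past that one declared element that the defect reports above treat as undefined.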


> You're missing the wording in 6.3.6:
>
> For the purposes of these operators, a pointer to a nonarray object
> behaves the same as a pointer to the first element of an array of
> length one with the type of the object as its element type.

Yes, I did miss that.


From: James Kuyper <james...@verizon.net>
Date: Thu, 17 Dec 2009 18:20:39 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> My understanding is that code which made use of the struct hack technically
> had undefined behavior in C89, but that it was essentially 100%
> portable.

Now I plan to document the struct hack in my code [0] instead of replacing
it -- I really don't want to do a second malloc() there. I think I may
finally convince myself with the following thought process:

- my code is written for The Single UNIX(R) Specification, Version 2;

- SUSv2 mandates the interfaces msgsnd() [1] and msgrcv() [2], and the
(normative) descriptions of both of those functions show a struct hack
example;

- even though the SUS versions defer to the corresponding ISO C standards
in any case of conflict, I cannot imagine that any certified (or
non-certified) SUSv2-conformant platform would not support the interfaces
mentioned above, and as a consequence, the struct hack;

- in the description of the c89 utility [3], section "Programming
Environments" makes me believe that pointers consist of at least 32 bits
on SUSv2; that covers my struct hack subscript ranges conveniently.

I thank you both very much,
Laszlo Ersek

[0] http://freshmeat.net/projects/lbzip2
[1] http://www.opengroup.org/onlinepubs/007908775/xsh/msgsnd.html
[2] http://www.opengroup.org/onlinepubs/007908775/xsh/msgrcv.html
[3] http://www.opengroup.org/onlinepubs/007908775/xcu/c89.html
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

Ersek, Laszlo

Dec 20, 2009, 3:42:52 AM
On Fri, 18 Dec 2009, Ersek, Laszlo wrote:

> - in the description of the c89 utility [3], section "Programming
> Environments" makes me believe that pointers consist of at least 32 bits on
> SUSv2; that covers my struct hack subscript ranges conveniently.
>

> [3] http://www.opengroup.org/onlinepubs/007908775/xcu/c89.html

This last one was stupid; I apologize. The point is exactly that we don't
use a "generic" pointer but an array of known size, or a pointer whose
target of known size might be figured out by the compiler.


This made me recall the last code part of the C FAQ 2.6:

http://www.c-faq.com/struct/structhack.html

----v----

struct name {
    int namelen;
    char *namep;
};

struct name *makename(char *newname)
{
    char *buf = malloc(sizeof(struct name) +
                       strlen(newname) + 1);
    struct name *ret = (struct name *)buf;
    ret->namelen = strlen(newname);
    ret->namep = buf + sizeof(struct name);
    strcpy(ret->namep, newname);

    return ret;
}

However, piggybacking a second region onto a single malloc call like this
is only portable if the second region is to be treated as an array of
char.

----^----

I do need an array of "char unsigned" elements. Is the above strictly
conforming? (I can't trust the FAQ anymore at face value.)

Thank you, and I apologize for beating a dead horse.
Laszlo Ersek

James Kuyper

Dec 20, 2009, 8:55:45 PM
Ersek, Laszlo wrote:
...

> This made me recall the last code part of the C FAQ 2.6:
>
> http://www.c-faq.com/struct/structhack.html
>
> ----v----
>
> struct name {
> int namelen;
> char *namep;
> };
>
> struct name *makename(char *newname)
> {
> char *buf = malloc(sizeof(struct name) +
> strlen(newname) + 1);
> struct name *ret = (struct name *)buf;
> ret->namelen = strlen(newname);
> ret->namep = buf + sizeof(struct name);
> strcpy(ret->namep, newname);
>
> return ret;
> }
>
> However, piggybacking a second region onto a single malloc call like
> this is only portable if the second region is to be treated as an array
> of char.

Making it portable for other types just requires a little bit of
additional complication to get a suitably aligned pointer:

struct vector {
    int veclen;
    double *vecp;
};

struct vector *makevector(double *newvec, int length)
{
    size_t bufdbl = (sizeof(struct vector) + sizeof(double) - 1)
                    / sizeof(double);
    void *buf = malloc( (bufdbl + length) * sizeof(double) );
    struct vector *ret = (struct vector *)buf;
    ret->veclen = length;
    ret->vecp = (double*)buf + bufdbl;
    memcpy(ret->vecp, newvec, length*sizeof(double));

    return ret;
}

Note that this wastes a little space if the alignment requirement for
double is smaller than sizeof(double); this is an example of a case
where alignof(type) would be a useful addition to the language.

Clive D. W. Feather

Dec 20, 2009, 8:47:00 PM
In message <clcm-2009...@plethora.net>, "Ersek, Laszlo"
<la...@caesar.elte.hu> wrote:
>Now I plan to document the struct hack in my code [0] instead of
>replacing it -- I really don't want to do a second malloc() there. I
>think I may finally convince myself with the following thought process:
>
>- my code is written for The Single UNIX(R) Specification, Version 2;
>
>- SUSv2 mandates the interfaces msgsnd() [1] and msgrcv() [2], and the
>(normative) descriptions of both of those functions show a struct hack
>example;

It may well (I haven't checked) be that those interfaces require that
the struct hack work on any conforming system.

>- even though the SUS versions defer to the corresponding ISO C
>standards in any case of conflict, I cannot imagine that any certified
>(or non-certified) SUSv2-conformant platform would not support the
>interfaces mentioned above, and as a consequence, the struct hack;

Note that this would not be a conflict.

The struct hack invokes behaviour that is undefined in C. That doesn't
prevent some other standard from defining the behaviour if the
application is required to conform to both standards.

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>

Ersek, Laszlo

Dec 21, 2009, 12:36:45 PM
From: James Kuyper <james...@verizon.net>
Date: Sun, 20 Dec 2009 19:55:45 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> Making it portable for other types just requires a little bit of
> additional complication to get a suitably aligned pointer:
>
> struct vector{
> int veclen;
> double *vecp;
> };
>
> struct vector *makevector(double *newvec, int length)
> {
> size_t bufdbl = (sizeof(struct vector) + sizeof(double) - 1)
> / sizeof(double);
> void *buf = malloc( (bufdbl + length) * sizeof(double) );
> struct vector *ret = (struct vector *)buf;
> ret->veclen = length;
> ret->vecp = (double*)buf + bufdbl;
> memcpy(ret->vecp, newvec, length*sizeof(double));
>
> return ret;
> }

Great! If I get it, you're calculating the (rounded up) integral number of
doubles in an array so that the array, when overlaid at the beginning of
the structure, fully covers the struct. You then add those doubles you
actually want to use, and never access the former part. The latter part is
correctly aligned because it's a slice of the larger array placed at the
beginning of a malloc()'d area, and such areas are correctly aligned for
any type.

I won't try to prove the above is strictly conforming, but I've learned
something again. Thanks.
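
The rounding step described above can be checked in isolation. The following is only a sketch: the helper names `doubles_covering` and `payload_offset` are hypothetical and were introduced here for illustration, while `struct vector` is copied from the quoted code.

```c
#include <stddef.h>

struct vector {
    int veclen;
    double *vecp;
};

/* Number of doubles needed to cover nbytes, rounded up
   (the same arithmetic as "bufdbl" in the quoted code). */
size_t doubles_covering(size_t nbytes)
{
    return (nbytes + sizeof(double) - 1) / sizeof(double);
}

/* Byte offset, from the start of the malloc()'d area, at which the
   array of doubles begins. It is a multiple of sizeof(double) and
   never smaller than sizeof(struct vector). */
size_t payload_offset(void)
{
    return doubles_covering(sizeof(struct vector)) * sizeof(double);
}
```

As a purely hypothetical data point, on a platform where sizeof(struct vector) is 12 and sizeof(double) is 8, the header would be covered by 2 doubles, the payload would start at byte 16, and 4 bytes would go unused.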

-o-

Both in the original version,

>> struct name {
>> int namelen;
>> char *namep;
>> };
>>
>> struct name *makename(char *newname)
>> {
>> char *buf = malloc(sizeof(struct name) +
>> strlen(newname) + 1);
>> struct name *ret = (struct name *)buf;
>> ret->namelen = strlen(newname);
>> ret->namep = buf + sizeof(struct name);
>> strcpy(ret->namep, newname);
>>
>> return ret;
>> }

and additionally in your version at the top, the pointer member of the
structure is set by an expression that depends on "buf" (of type "void *"
or "char *") and not on "ret" (of type "struct ... *"). I wonder if this
is accidental or intentional? Couldn't we rewrite the original version
like the following? (My changes that I intend to be significant are on
lines A and B which are meant as alternatives):


struct name
{
    size_t namelen;
    char *namep;
};

struct name *
makename(const char *newname)
{
    size_t len;
    struct name *ret;

    len = strlen(newname);
    ret = malloc(sizeof *ret + (len + 1u));
    if (0 != ret) {
        ret->namelen = len;

        ret->namep = (char *)(ret + 1);          /* A */
        ret->namep = (char *)ret + sizeof *ret;  /* B */

        (void)strcpy(ret->namep, newname);       /* C */
    }
    return ret;
}


I believe section "6.3.6 Additive operators" of the C89 standard makes
both A and B valid. I'm not sure whether the write access on line C is
strictly conformant, though. If the careful dependency on nothing else
than "buf" in the above code examples is deliberate, then maybe the
compiler is permitted to draw some conclusions of the fact that I
immediately cast the return value of malloc() to "struct name *", and
perhaps after that point I must not write to *ret->namep. 6.3.6 says,

unless both the pointer operand and the result point to elements of
the same array object, or the pointer operand points one past the
last element of an array object and the result points to an element
of the same array object, the behavior is undefined if the result
is used as an operand of the unary * operator.

i) I'm handling a nonarray object (*ret), but for this section, it behaves
like an array with one element.

ii) On line A, the result of the addition points (before the cast) one
past the last element of the array.

iii) The result is used as an operand of the unary * operator on line C,
but only after casting the result to char *.

Is the cast to "char *" on line A, between steps ii and iii, sufficient
for keeping the write access well-defined? If not, is line B sufficient?
Or is there no way out once we depend on "ret" (instead of "buf")?

Thanks,
lacos

James Kuyper

Dec 21, 2009, 4:10:16 PM

Intentional. The struct type and the type pointed at by its member have
no special relationship which would, in general, allow for safe
conversion between the two pointer types. In this particular case, the
pointer is guaranteed to be suitably aligned for either type. However,
since in general it would not be a safe conversion, I prefer to convert
directly from void*. The safety of that conversion is guaranteed by the
definition of malloc(). However, this is just a preference; it should
work fine either way.

Ersek, Laszlo

Dec 21, 2009, 4:09:52 PM
On Mon, 21 Dec 2009, la...@ludens.elte.hu wrote:

From: "Clive D. W. Feather" <cl...@davros.org>
Date: Sun, 20 Dec 2009 19:47:00 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> In message <clcm-2009...@plethora.net>, "Ersek, Laszlo"
> <la...@caesar.elte.hu> wrote:
>>
>> - SUSv2 mandates the interfaces msgsnd() [1] and msgrcv() [2], and the
>> (normative) descriptions of both of those functions show a struct hack
>> example;
>
> It may well (I haven't checked) be that those interfaces require that
> the struct hack work on any conforming system.

I'm a bit uneasy because it doesn't appear to be an explicit requirement;
it's rather two instances of an example (in normative parts of the
standard).


>> - even though the SUS versions defer to the corresponding ISO C
>> standards in any case of conflict, I cannot imagine that any certified
>> (or non-certified) SUSv2-conformant platform would not support the
>> interfaces mentioned above, and as a consequence, the struct hack;
>
> Note that this would not be a conflict.
>
> The struct hack invokes behaviour that is undefined in C. That doesn't
> prevent some other standard from defining the behaviour if the
> application is required to conform to both standards.

One comes to equate "undefined" to "never do this". It never ever crossed
my mind that a standard based on C89 might "supplement" C89 (not that
SUSv2 would do that as far as I know). I was OK with choosing (and
documenting) some behavior for "implementation-defined" and choosing (and
not documenting) some behavior for "unspecified", but defining "undefined"
catches me by surprise. Thanks.

lacos

Chris M. Thomasson

Dec 22, 2009, 3:12:40 AM
"James Kuyper" <james...@verizon.net> wrote in message
news:clcm-2009...@plethora.net...

I believe that you can use this to compute the alignment of any type:
_________________________________________________
#define ALIGN_OF(mp_type) \
    offsetof( \
        struct \
        { \
            char pad_ALIGN_OF; \
            mp_type type_ALIGN_OF; \
        }, \
        type_ALIGN_OF \
    )
_________________________________________________

James Kuyper

Dec 22, 2009, 2:41:36 PM
Chris M. Thomasson wrote:
> "James Kuyper" <james...@verizon.net> wrote in message
> news:clcm-2009...@plethora.net...
...

>> Note that this wastes a little space if the alignment requirement for
>> double is smaller than sizeof(double); this is an example of a case
>> where alignof(type) would be a useful addition to the language.
>
> I believe that you can use this to compute the alignment of any type:
> _________________________________________________
> #define ALIGN_OF(mp_type) \
> offsetof( \
> struct \
> { \
> char pad_ALIGN_OF; \
> mp_type type_ALIGN_OF; \
> }, \
> type_ALIGN_OF \
> )

That's pretty likely to produce the desired result, but it is not
required to do so. The C standard does allow padding to be inserted
between struct elements. The only good reason I'm aware of for doing so
is to insert the minimum number of bytes needed to give each member its
required alignment. However, the standard does not require that only the
minimum needed amount of padding be used.

Clive D.W. Feather

Dec 22, 2009, 7:05:04 PM
> One comes to equate "undefined" to "never do this". It never ever crossed
> my mind that a standard based on C89 might "supplement" C89 (not that
> SUSv2 would do that as far as I know).

Oh yes, it would. For example, it defines macros beginning with "__" and
functions beginning "is".

It is normal for such standards to define at least some things that are
undefined in the C Standard. Remember that the meaning of "undefined" is
that "this Standard puts no requirements on the behaviour".

> I was OK with choosing (and
> documenting) some behavior for "implementation-defined" and choosing (and
> not documenting) some behavior for "unspecified", but defining "undefined"
> catches me by surprise. Thanks.

You're welcome.

--
Clive D.W. Feather | If you lie to the compiler,
Email: cl...@davros.org | it will get its revenge.
Web: http://www.davros.org | - Henry Spencer
Mobile: +44 7973 377646

Chris M. Thomasson

Dec 23, 2009, 1:56:50 PM

Well, AFAICT the amount of padding used will be enough for proper alignment,
so the value returned should be usable. If the compiler inserts more than
the minimum amount, you will definitely be wasting some space; however, the
alignment should still be correct. What am I missing here?

jameskuyper

Dec 23, 2009, 2:41:33 PM

Nothing: I'm just pointing out that, in principle, it can be
wastefully larger than the true alignment requirement, just as is true
of sizeof(type); it could even be more wasteful than sizeof(type). In
practice, it's very likely to be less wasteful than sizeof(type) in
any circumstance where the alignment is smaller than the size of the
type. However, a true language-level construct would be better than
either workaround.

Keith Thompson

Dec 23, 2009, 2:57:08 PM
"Chris M. Thomasson" <n...@spam.invalid> writes:
> "James Kuyper" <james...@verizon.net> wrote in message
> news:clcm-2009...@plethora.net...
>> Chris M. Thomasson wrote:
[...]

>>> I believe that you can use this to compute the alignment of any type:
>>> _________________________________________________
>>> #define ALIGN_OF(mp_type) \
>>> offsetof( \
>>> struct \
>>> { \
>>> char pad_ALIGN_OF; \
>>> mp_type type_ALIGN_OF; \
>>> }, \
>>> type_ALIGN_OF \
>>> )
>>
>> That's pretty likely to produce the desired result, but it is not
>> required to do so. The C standard does allow padding to be inserted
>> between struct elements. The only good reason I'm aware of for doing
>> so is to insert the minimum number of bytes needed to give each
>> member its required alignment. However, the standard does not
>> require that only the minimum needed amount of padding be used.
>
> Well, AFAICT the amount of padding used will be enough for proper
> alignment, so the value returned should be usable. If the compiler
> inserts more than the minimum amount, you will definitely be
> wasting some space; however, the alignment should still be
> correct. What am I missing here?

You're missing the fact that the compiler is allowed to insert more
padding than necessary.

For example, assume an implementation where int is 4 bytes and has an
alignment of 4 bytes, but for some reason, given:
struct {
char pad;
int i;
};
the compiler inserts 7 bytes of padding between "pad" and "i", rather
than the required 3. Then ALIGN_OF(int) will yield 8, which is an
alignment that will work for int, but not *the* alignment of int.
(Note that the alignment of a type cannot exceed the size of a type,
because arrays cannot have gaps.)

I know of no real-world implementation that behaves this way,
but the standard allows it (basically because the standard doesn't
bother to impose tight restrictions on struct member layout).
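
To make the distinction concrete, here is a small sketch that compares the offsetof-based probe with the `_Alignof` operator that C11 (published after this thread) later added. It assumes a C11 compiler; the names `align_probe_int`, `probed_align_int` and `true_align_int` are hypothetical, and a named struct is used instead of the anonymous one in the macro purely to keep the example simple.

```c
#include <stddef.h>

/* Hypothetical probe struct: a char followed by the type of interest. */
struct align_probe_int {
    char pad;
    int value;
};

/* Offset of the int member: an alignment that works for int, but
   possibly larger than the one the implementation actually requires. */
size_t probed_align_int(void)
{
    return offsetof(struct align_probe_int, value);
}

/* C11 only: the alignment requirement reported by the implementation. */
size_t true_align_int(void)
{
    return _Alignof(int);
}
```

On the hypothetical implementation described above, the first function would return 8 and the second 4. The only relationship that can be relied on is that the probed value is a nonzero multiple of the true alignment.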

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Chris M. Thomasson

Jan 4, 2010, 7:29:35 PM
"Keith Thompson" <ks...@mib.org> wrote in message

If the compiler inserts more than the minimum amount, then you will be
wasting space.


> For example, assume an implementation where int is 4 bytes and has an
> alignment of 4 bytes, but for some reason, given:
> struct {
> char pad;
> int i;
> };
> the compiler inserts 7 bytes of padding between "pad" and "i", rather
> than the required 3. Then ALIGN_OF(int) will yield 8, which is an
> alignment that will work for int, but not *the* alignment of int.

Exactly. I was only interested if the value returned by `ALIGN_OF()' will
always work for the given type. I agree with James Kuyper in that an
`alignof' keyword should be provided by the language standard. Actually,
IMHO at the very least C should have an `ALIGN_MAX' macro defined in
`limits.h'...


> (Note that the alignment of a type cannot exceed the size of a type,
> because arrays cannot have gaps.)

One could align the array on a boundary provided by `ALIGN_OF(type)' and use
`sizeof(type)' to ensure there are no gaps. That should work.
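
One way to read this suggestion, applied to the earlier vector example, is sketched below. This is an interpretation and not code from the thread: `round_up`, `DOUBLE_ALIGN`, `align_probe_double` and `makevector_aligned` are hypothetical names, `veclen` is changed to `size_t`, and a NULL check is added. The header is rounded up to the probed alignment instead of to sizeof(double), and the elements are then packed at sizeof(double) strides.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct vector {
    size_t veclen;
    double *vecp;
};

/* Probe struct standing in for ALIGN_OF(double). */
struct align_probe_double {
    char pad;
    double value;
};
#define DOUBLE_ALIGN offsetof(struct align_probe_double, value)

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

struct vector *makevector_aligned(const double *src, size_t length)
{
    /* The payload starts at the first suitably aligned byte after
       the header; malloc() aligns the start of the area for any type. */
    size_t offset = round_up(sizeof(struct vector), DOUBLE_ALIGN);
    char *buf = malloc(offset + length * sizeof(double));
    struct vector *ret = (struct vector *)buf;

    if (ret != NULL) {
        ret->veclen = length;
        ret->vecp = (double *)(buf + offset);
        if (length > 0)
            memcpy(ret->vecp, src, length * sizeof(double));
    }
    return ret;
}
```

Whether this saves anything over rounding to sizeof(double) depends on the platform; where the alignment of double equals its size, the two computations give the same offset.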


> I know of no real-world implementation that behaves this way,
> but the standard allows it (basically because the standard doesn't
> bother to impose tight restrictions on struct member layout).