Aliasing and subtyping structs

Andreas

Nov 1, 2016, 8:16:02 AM

Hi all,

while searching the web I've found a lot of assertions that a common
idiom of subtyping structs in C is well-defined by virtue of the first
struct member always having the same address as the surrounding struct.
What I'm concerned about is aliasing in this context.

struct sub_type
{
uint8_t super_tag;
...
};

struct super_type
{
struct sub_type super;
...
};

struct hyper_type
{
struct super_type super;
...
};

void foo (struct sub_type* base)
{
struct super_type* up = 0;
struct hyper_type* upper = 0;
if (base->super_tag == UP_ID)
{
up = (struct super_type*) base; // *NOT aliasing!?*
...
}
else if (base->super_tag == UP_ID)
{
upper = (struct hyper_type*) base; // *NOT aliasing!?*
...
}
...
}

void bar (struct super_type* up, struct hyper_type* upper)
{
struct sub_type* base1 = (struct sub_type*) up; // *ALIASING?*
struct sub_type* base2 = (struct sub_type*) upper; // *ALIASING?*
struct sub_type* base1 = up.super;
struct sub_type* base2 = upper.super.super;
...
}

Sorry for the lengthy code ...

Could you please tell me whether my assumptions in the comments above
are right or not?


TIA,

Andreas

--
What do you mean? An African or a European swallow?

Andreas

Nov 1, 2016, 8:34:56 AM

Following up to myself to add missing address operators to sample code:

struct sub_type
{
uint8_t super_tag;
...
};

struct super_type
{
struct sub_type super;
...
};

struct hyper_type
{
struct super_type super;
...
};

void foo (struct sub_type* base)
{
struct super_type* up = 0;
struct hyper_type* upper = 0;
if (base->super_tag == UP_ID)
{
up = (struct super_type*) base; // *NOT aliasing!?*
...
}
else if (base->super_tag == UP_ID)
{
upper = (struct hyper_type*) base; // *NOT aliasing!?*
...
}
...
}

void bar (struct super_type* up, struct hyper_type* upper)
{
struct sub_type* base1 = (struct sub_type*) up; // *ALIASING?*
struct sub_type* base2 = (struct sub_type*) upper; // *ALIASING?*
struct sub_type* base3 = &up.super;
struct sub_type* base4 = &upper.super.super;
...
}

Sorry,

Ben Bacarisse

Nov 1, 2016, 11:27:23 AM

Andreas <nos...@invalid.invalid> writes:

> Following up to myself to add missing address operators to sample code:
>
> struct sub_type
> {
> uint8_t super_tag;
> ...
> };
>
> struct super_type
> {
> struct sub_type super;
> ...
> };
>
> struct hyper_type
> {
> struct super_type super;
> ...
> };

This looks odd to me. I'd expect more and more specific classes to be
progressively derived from a base class. Those derived classes would be
sub- (and sub-sub-classes). But this is a sketch so I might have the
wrong end of the stick.

> void foo (struct sub_type* base)
> {
> struct super_type* up = 0;
> struct hyper_type* upper = 0;
> if (base->super_tag == UP_ID)
> {
> up = (struct super_type*) base; // *NOT aliasing!?*
> ...
> }
> else if (base->super_tag == UP_ID)
> {
> upper = (struct hyper_type*) base; // *NOT aliasing!?*
> ...
> }
> ...
> }
>
> void bar (struct super_type* up, struct hyper_type* upper)
> {
> struct sub_type* base1 = (struct sub_type*) up; // *ALIASING?*
> struct sub_type* base2 = (struct sub_type*) upper; // *ALIASING?*
> struct sub_type* base3 = &up.super;
> struct sub_type* base4 = &upper.super.super;

(you need -> rather than the first .)

> ...
> }

You ask if your comments are correct. The trouble is that aliasing is a
very general word. In this context two pointers to the same object are
always aliases -- that's all it means -- so all of your pointer
assignments set up aliases.

I think what you are asking is whether the aliases conflict with places
where the compiler may assume that no aliasing is taking place. That
will crop up in places where you don't have any comments. For example
since 'up' and 'upper' are pointers to distinct types, the compiler may
assume that they do not alias.

Unfortunately I can't follow what you are doing in enough detail to
comment on whether it's OK or not.

--
Ben.

James Kuyper

Nov 1, 2016, 11:42:03 AM

Yes, it is guaranteed that "A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.".
(6.7.2.1p15). Technically, that means that you'd have to use (struct
sub_type*)(struct super_type*) to convert upper to base2. The standard
does NOT guarantee that this pair of conversions has the same effect as
the direct conversion to struct sub_type*. However, in practice, that's
unlikely to be problematic.

*base1 is now an alias for up->super, and *base2 is now an alias for
upper->super.super. This is not a problem in itself, but it is something
to be aware of.

> struct sub_type* base1 = up.super;
> struct sub_type* base2 = upper.super.super;

That code contains at least two errors on each line, possibly more, so
I'm not sure what was intended. It looks like you are defining two new
variables, base1 and base2, in the same scope where you've already
defined variables with those names. I see two possibilities: you
intended this as an alternative to the first pair of lines - in that
case, you should have used #if #else #endif to separate the two
alternatives. The other possibility is that you forgot to give these
variables new names, such as base3 and base4.

The next problem is that the initializers for those definitions have the
type "struct sub_type", while those variables have the type "struct
sub_type*". I suspect that there's a missing "&" after the "=" in each
of those definitions.
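A minimal compilable sketch of the chained conversion James describes. It assumes concrete members in place of the elided "..."; the `payload` and `extra` fields are hypothetical. Each cast moves down exactly one nesting level, so each step is covered by 6.7.2.1p15, and the result can be compared with the pointer obtained by naming the members directly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical concrete versions of the structs from the original post;
   "payload" and "extra" stand in for the elided "..." members. */
struct sub_type   { uint8_t super_tag; };
struct super_type { struct sub_type super; int payload; };
struct hyper_type { struct super_type super; int extra; };

/* Convert one nesting level at a time: hyper -> super -> sub.
   Each step converts a pointer to a struct into a pointer to its
   initial member, which 6.7.2.1p15 covers directly. */
struct sub_type *hyper_to_sub_chained(struct hyper_type *upper)
{
    return (struct sub_type *)(struct super_type *)upper;
}

/* The same object reached by naming the members; note the "->"
   for the pointer and "." for the nested struct member. */
struct sub_type *hyper_to_sub_member(struct hyper_type *upper)
{
    return &upper->super.super;
}
```

Both functions yield the same address; the member form avoids pointer conversions altogether.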

Andreas

Nov 2, 2016, 4:48:03 AM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

>
> This looks odd to me. I'd expect more and more specific classes to be
> progressively derived from a base class. Those derived classes would be
> sub- (and sub-sub-classes). But this is a sketch so I might have the
> wrong end of the stick.

It has been some time since I last used Usenet, so my question was
completely misleading/wrong. I'll try again below ...

> Unfortunately I can't follow what you are doing in enough detail to
> comment on whether it's OK or not.

I reformulate in OOP terms ...

struct base_type
{
uint8_t type_tag;
...
};

struct derived_type
{
struct base_type base;
// more specific data here
...
};

struct more_derived_type
{
struct derived_type base;
// even more specific data here
...
};

I find many references to this idiom, often asking whether the language
standard supports it. Unfortunately I can only find answers referring
to the fact that the 'base' members above are guaranteed by C (and
consequentially for PODs by C++) to have the same address as each value
of their containing struct types.

That's fine but I believe that there is another aspect to be considered
when using this idiom, namely aliasing. Regarding this aspect I find
the following passage in both C99 (I refer to draft N1256) and C11
(as of draft N1570) in 6.5p7 (section 6.5, paragraph 7):

<quote N 1256>

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

...

- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)

...

</quote N 1256>

When using the idiom mentioned above it is often necessary to cast a
pointer from a derived type to a base type (upcast) or vice versa (downcast).

As a use case, consider a library that defines the base type and allows
the user to define derived types using the idiom above. When the library
calls back into user code, downcasts are usually necessary in the user
code to recover the more detailed information from the derived types
(which the library code doesn't know about), i.e. something along these lines:

void downcaster (struct base_type* sth)
{
struct derived_type* detailed = 0;
struct more_derived_type* more_detailed = 0;

if (sth->type_tag == DERIVED_ID)
{
detailed = (struct derived_type*) sth; // OK
}
else if (sth->type_tag == MORE_DERIVED_ID)
{
more_detailed = (struct more_derived_type*) sth; // OK
}
...
}

My understanding of 6.5p7 is that neither 'detailed' nor
'more_detailed' alias the object pointed to by 'sth'.
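A compilable version of the downcaster callback, offered only as a sketch: the tag values, the `detail`/`more_detail` members and the return-value convention are hypothetical. In line with James's remark about 6.7.2.1p15, the conversion to the most derived type goes through the intermediate type one level at a time:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; the original post leaves them undefined. */
enum { BASE_ID, DERIVED_ID, MORE_DERIVED_ID };

struct base_type         { uint8_t type_tag; };
struct derived_type      { struct base_type base; int detail; };
struct more_derived_type { struct derived_type base; int more_detail; };

/* A callback a (hypothetical) library invokes with a base pointer.
   Each downcast is valid only because the tag says which complete
   object the pointer originally came from. */
int downcaster(struct base_type *sth)
{
    if (sth->type_tag == DERIVED_ID) {
        struct derived_type *detailed = (struct derived_type *)sth;
        return detailed->detail;
    } else if (sth->type_tag == MORE_DERIVED_ID) {
        /* base -> derived -> more_derived, one level per cast */
        struct more_derived_type *more_detailed =
            (struct more_derived_type *)(struct derived_type *)sth;
        return more_detailed->more_detail;
    }
    return -1; /* plain base object: nothing more specific to read */
}
```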

On the other hand when user code calls library APIs it will be necessary
to upcast pointers, as the library API will only know about base types.

void upcaster (struct derived_type* d, struct more_derived_type* md)
{
struct base_type* b1 = (struct base_type*) d; // NOK
struct base_type* b2 = (struct base_type*) md; // NOK

struct base_type* b3 = & (d->base);
struct base_type* b4 = & (md->base->base);

call_lib_api (b[1234]);
...
}

I read 6.5p7 to mean that 'b1' is aliasing 'd' and 'b2' is
aliasing 'md', while 'b3' and 'b4' are obviously OK.
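A compilable version of the upcaster, using the same hypothetical struct members as a sketch. Since `md->base` is itself a struct rather than a pointer, the member path is `&md->base.base`, not `&(md->base->base)`. `lib_get_tag` is a hypothetical stand-in for `call_lib_api`:

```c
#include <assert.h>
#include <stdint.h>

struct base_type         { uint8_t type_tag; };
struct derived_type      { struct base_type base; int detail; };
struct more_derived_type { struct derived_type base; int more_detail; };

/* Hypothetical library API that only knows the base type. */
uint8_t lib_get_tag(const struct base_type *b)
{
    return b->type_tag;
}

/* Upcast by naming the members: no pointer conversion at all. */
struct base_type *upcast_member(struct more_derived_type *md)
{
    return &md->base.base;
}

/* Upcast by casts, one nesting level at a time (6.7.2.1p15). */
struct base_type *upcast_cast(struct more_derived_type *md)
{
    return (struct base_type *)(struct derived_type *)md;
}
```

Both forms produce the same pointer value with the same type, so the library sees no difference between them.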

My question is: Is my understanding, as described above, correct or am I
lost in standardese here?


TIA,

Andreas

Nov 2, 2016, 4:56:15 AM

James,

thanks for your review, you are completely right: my original posting was
both misleading and syntactically wrong, i.e. very careless;
please consider my followup to Ben's response. I hope I have got it
right this time ...

Ben Bacarisse

Nov 2, 2016, 11:48:50 AM

Andreas <nos...@invalid.invalid> writes:
<snip>
> struct base_type
> {
> uint8_t type_tag;
> ...
> };
>
> struct derived_type
> {
> struct base_type base;
> // more specific data here
> ...
> };
>
> struct more_derived_type
> {
> struct derived_type base;
> // even more specific data here
> ...
> };
>
> I find many references to this idiom, often asking whether the language
> standard supports it. Unfortunately I can only find answers referring
> to the fact that the 'base' members above are guaranteed by C (and
> consequentially for PODs by C++) to have the same address as each value
> of their containing struct types.

Right, but address is a little vague. A C pointer is an address with a
type and, whilst the offset of the first (non-bitfield) member of a
struct is guaranteed to be zero, a pointer to the struct and a pointer
to the first member will have different types. This matters because the
effective type rules you are concerned with are all about types. The
address is not really the issue.

<snip>
> My understanding of 6.5p7 is that neither 'detailed' nor
> 'more_detailed' alias the object pointed to by 'sth'.

No, they are all aliases.

> On the other hand when user code calls library APIs it will be necessary
> to upcast pointers, as the library API will only know about base types.
>
> void upcaster (struct derived_type* d, struct more_derived_type* md)
> {
> struct base_type* b1 = (struct base_type*) d; // NOK
> struct base_type* b2 = (struct base_type*) md; // NOK
>
> struct base_type* b3 = & (d->base);
> struct base_type* b4 = & (md->base->base);
>
> call_lib_api (b[1234]);

What's b?

> ...
> }
>
> I read 6.5p7 to mean that 'b1' is aliasing 'd' and 'b2' is
> aliasing 'md', while 'b3' and 'b4' are obviously OK.

They are all aliases and they are all in some vague sense OK.

The key issue is not what aliases what, but what will go wrong when the
compiler assumes that two references do not alias a single object when
in fact they do. Your sketch has no cases where that is a problem (as
far as I can see) and I think such code is usually OK.

Do you have an example of something that someone has said is a problem
that is worrying you?

--
Ben.

wil...@wilbur.25thandclement.com

Nov 3, 2016, 9:55:29 PM

Andreas <nos...@invalid.invalid> wrote:
<snip>
> I find many references to this idiom, often asking whether the language
> standard supports it. Unfortunately I can only find answers referring
> to the fact that the 'base' members above are guaranteed by C (and
> consequentially for PODs by C++) to have the same address as each value
> of their containing struct types.
>
> That's fine but I believe that there is another aspect to be considered
> when using this idiom, namely aliasing. Regarding this aspect I find
> the following passage in both C99 (I refer to draft N1256) and C11
> (as of draft N1570) in 6.5p7 (section 6.5, paragraph 7):
>
> <quote N 1256>
>
> An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:
>
> ...
>
> - an aggregate or union type that includes one of the aforementioned
> types among its members (including, recursively, a member of a
> subaggregate or contained union)
>
> ...
>
> </quote N 1256>
>
> When using the idiom mentioned above it is often necessary to cast a
> pointer from a derived type to a base type (upcast) or vice versa (downcast).

I don't think I can provide any more clarity than what has been stated
elsethread, but perhaps this recent GCC regression and ensuing discussion
might help:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71120

The GCC optimization bug shows how compilers are applying the aliasing
rules. Pay attention to not only what GCC was doing wrong, but to what it
was doing that wasn't wrong. Be sure to follow the link to the glibc issue
page, too.

I think the thing to keep in mind is that the rules are designed so that
compilers can track side-effects and, under heavy optimization, avoid
eliding operations that your code depends on. The most pertinent part of the
6.5p7 quote isn't "of the following types", but "shall have its stored value
accessed only by an [compatible] lvalue".

In the absence of the restrict qualifier, and without any other information,
compilers must assume that pointers might alias the same object. So if you
pass two pointers, a and b, to a function and cast them willy-nilly, as long
as the effective type of any lvalue expression, inside or outside of that
function, is the same as the preceding store, you're good. The issue is when
the compiler can prove (or thinks it can prove) that the pointers don't
alias (such as by proving that they couldn't alias in a well-defined
program) and then decides to elide some side-effect under license of the
as-if rule.
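A small illustration of the point about pointers that might alias (a sketch, not taken from the linked bug report). Both parameters have the same type and neither is `restrict`-qualified, so the compiler has to allow for them designating the same `int`:

```c
#include <assert.h>

/* Without restrict and without other information, the compiler must
   allow for a == b, so the final read of *a has to observe the store
   made through b. */
int store_then_reload(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;   /* 2 if a and b point to the same int, else 1 */
}
```

Were the pointers of unrelated types (say `int *` and `float *`), 6.5p7 would let the compiler return 1 without reloading.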

I'll abstain from providing a simpler example here because I'll probably get
some detail wrong and just add to the confusion.

supe...@casperkitty.com

Nov 4, 2016, 12:56:59 PM

On Thursday, November 3, 2016 at 8:55:29 PM UTC-5, wil...@wilbur.25thandclement.com wrote:
> I think the thing to keep in mind is that the rules are designed so that
> compilers can track side-effects and, under heavy optimization, avoid
> eliding operations that your code depends on. The most pertinent part of the
> 6.5p7 quote isn't "of the following types", but "shall have its stored value
> accessed only by an [compatible] lvalue".

The present rules are a mess because the authors of the original standard
don't seem to have put much thought into the original rules beyond the idea
that a compiler given something like:

float f;

float test(int *p)
{
f = 23.0f;
*p = 5;
return f;
}

shouldn't be required to make pessimistic aliasing assumptions about the
possibility of "p" aliasing "f". If the authors of the original Standard
had given the matter more thought than that, they haven't produced any
evidence of it that I've seen.

C was in wide use before the Standard was written, and there was a
substantial corpus of code that uses forms of aliasing not provided for
in C89. The authors of the C89 Standard, however, said they wanted to
avoid making any breaking changes. The only way I can see to reconcile
those viewpoints was that they expected compiler writers to behave as
though the Standard included a proviso that implementation authors should
also support other forms of aliasing which would be useful for the
implementations' target platform and intended application field. To have
actually said such a thing could have been considered condescending, but
I can't imagine they expected that people wanting to write useful
implementations would do otherwise.

Note also, btw, that for the kinds of optimization that most compilers
would want to do in 1989 or 1999, it would be easy to achieve compatibility
with most code that relied upon aliasing, at relatively little cost in
efficiency, by adding one simple rule: flush all register-cached
variables of a given type whenever a pointer of that type is cast to any
other. Such a rule would not be usefully applied to types which have
padding bits and trap representations such that no value written as an "int"
would be readable as "float", however, and the Standard does not tend to
mandate rules which make sense for some platforms but not others.

Later standards claim merely to be "clarifying" previous rules, but they do
so under the presumption that no code would ever need to rely on behavior
that compiler writers had provided by near-unanimous judgment, rather than
because the previous Standard mandated it. If rules were being designed from
the ground up to allow vectorization, but with the intention that they
be sufficient to accommodate programmers' needs, they'd be very different.
There are many kinds of optimizations they'd allow which are presently
forbidden, but they'd also recognize more usable constructs than are
presently recognized (e.g. by providing a form of pointer declaration for
a pointer which can be used to interpret, as an X, storage which might also
be holding a Y or Z).

Andreas

Nov 7, 2016, 6:25:40 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> Do you have an example of something that someone has said is a problem
> that is worrying you?

I did some experiments (inspired by
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html),
sorry for the somewhat lengthy code following:

<test_alias.c>
#include <stddef.h> /* size_t */
#include <stdint.h> /* fixed-width integers */
#include <string.h> /* memcpy */

struct foo
{
uint16_t a;
};

struct bar
{
uint16_t b;
};

struct base
{
uint32_t b;
};

struct derived
{
struct base base;
uint32_t d;
};

struct more_derived
{
struct derived base;
uint32_t m;
};


uint32_t foobar1 (uint32_t v)
{
uint16_t* const sp = (uint16_t*) &v;
uint16_t hi = sp[0];
uint16_t lo = sp[1];

sp[1] = hi;
sp[0] = lo;

return v;
}

uint32_t foobar2 (uint32_t v)
{
uint8_t tmp[2] = {0}; /* "= {}" is not valid C before C23 */
uint8_t* const b = (uint8_t*) &v;

(void) memcpy (tmp, b + 2, 2);
(void) memcpy (b + 2, b, 2);
(void) memcpy (b, tmp, 2);

return v;
}

void foobar3 (uint32_t* four, uint16_t* two, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
four[x] = ((uint32_t) *two) + x;
}
}

void foobar4 (uint32_t* four, int32_t* ofour, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
four[x] = ((uint32_t) *ofour) + x;
}
}

void foobar5 (struct foo* f, struct bar* b, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
f[x].a = b->b + x;
}
}

void foobar6 (struct derived* d, struct base* b, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
*((uint32_t*) &d[x]) = b->b + x;
}
}

void foobar7 (struct more_derived* m, struct base* b, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
*((uint32_t*) &m[x]) = b->b + x;
}
}

void foobar8 (struct base* b, uint32_t* v, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
b[x].b = *v + x;
}
}

void foobar9 (struct base* b, uint16_t* v, size_t len)
{
size_t x;

for (x = 0; x < len; ++x)
{
b[x].b = *v + x;
}
}
</test_alias.c>

If you compile that code with GCC 4.9 and Clang 3.5:

gcc -g -fverbose-asm -Wall -m32 -fstrict-aliasing -O3 -S -o test_alias_32_O3.s test_alias.c
clang -g -fverbose-asm -Wall -m32 -fstrict-aliasing -O3 -S -o test_alias_clang_32_O3.s test_alias.c

then this results in the following:

- foobar1: v and sp do not alias, although they probably could according
to C99 6.5p7 (I guess, strictly speaking, it could just return v without
doing anything else)

- foobar2: v and b do not alias and must not according to 6.5p7; it swaps
words portably

Interestingly enough, both compilers emit identical code for foobar1 and
foobar2 (NB: GCC emits good code, two fetches and three stores, but the
code produced by Clang is impressive; I doubt one could write faster
assembler code complying with the standard x86 calling convention even
by hand)

- foobar3: four and two alias; if they somewhere refer to the same
stored value, results may not be what you expect (nasal demons etc.)

- foobar4: four and ofour do not alias, as expected according to 6.5p7

- foobar5: f and b alias, as expected according to 6.5p7

- foobar6: d and b do not alias, as expected according to 6.5p7

- foobar7: m and b do not alias, as expected according to 6.5p7

- foobar8: b and v do not alias, as expected according to 6.5p7; note that
struct base contains a member of type uint32_t

- foobar9: b and v alias, as expected according to 6.5p7; there is no
uint16_t in struct base
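For comparison with foobar1 and foobar2, a sketch of the same half-word swap done on the value with shifts (a different technique from the thread's test functions; the name `swap_halves` is hypothetical). No pointer conversion is involved, so 6.5p7 never comes into play, and the result is the same on little- and big-endian machines:

```c
#include <assert.h>
#include <stdint.h>

/* Swap the upper and lower 16-bit halves of v using arithmetic on the
   value itself instead of accessing it through a uint16_t pointer. */
uint32_t swap_halves(uint32_t v)
{
    return (uint32_t)((v << 16) | (v >> 16));
}
```

Compilers commonly turn this pattern into a single rotate instruction.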

So my initial doubt about casting from base to derived and vice versa
not being reflexive in the standard was based on a fallacy. I guess it
does not matter how two pointers get their values; the aliasing rule
in the standard just says that the compiler needs to assume that a
pointer to base and a pointer to derived might refer to the same stored
value.
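A sketch of the base/derived case from the conclusion above, reusing `struct base` and `struct derived` from test_alias.c (the function name is hypothetical). The store through `d` and the load through `b` both use `uint32_t` lvalues, so the compiler has to assume they may refer to the same object:

```c
#include <assert.h>
#include <stdint.h>

struct base    { uint32_t b; };
struct derived { struct base base; uint32_t d; };

/* d->base.b and b->b are both lvalues of type uint32_t, so the compiler
   must reload b->b after the store through d. */
uint32_t store_via_derived(struct derived *d, struct base *b)
{
    b->b = 1;
    d->base.b = 2;
    return b->b;   /* 2 if b == &d->base, else 1 */
}
```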


Thanks for your support,

supe...@casperkitty.com

Nov 8, 2016, 12:42:47 AM

On Monday, November 7, 2016 at 5:25:40 PM UTC-6, Andreas wrote:
> - foobar1: v and sp do not alias, although they probably could according
> to C99 6.5p7 (I guess, strictly speaking, it could just return v without
> doing anything else)

If a pointer p of type T is cast to another type and the resulting pointer is
accessed, gcc will often recognize the access as modifying *p, but will
not recognize aliasing with any objects of type T that are accessed
elsewhere via other means. As of version 6.2, it will also in some cases omit
code generation for assignments where the destination holds the
same bit pattern as the source, even if the assignment should force the
compiler to recognize an object as a different type. For example, given:
#include <stdint.h>

union lll { long l; long long ll; };

long testlll(long *lp, long long *llp, union lll *up)
{
long long t;
if (*lp != 1234)
return 1234;
t=up->l;
up->ll = t;
*llp = 5;
t=up->ll;
up->l = t;
return *lp;
}
long testlll2(void)
{
union lll lv;
lv.l=1234;
return testlll(&lv.l, &lv.ll, &lv);
}

gcc 6.2 will generate code for testlll2 that unconditionally returns 1234
even though the compiler should have no problem recognizing aliasing
in testlll between *lp and up->l (since they're both the same type), nor
between *llp and up->ll. In this example, there is no type punning of
any form except in the compiler's imagination, but in both cases where
code writes to *up it reinterprets the value that was there as the new
type and observes that it matches the new value, thus omitting the write.

If you want to see whether gcc is properly recognizing aliasing, use code
more like the above. Otherwise you may hit upon situations where it
happens to correctly recognize aliasing and reach false conclusions about
the cases where it actually will.

[Incidentally, I must begrudgingly praise gcc for at least being "honest"
in how it processes in-lining with examples like the above; if a function
won't recognize aliasing in the general case, gcc will generally not have
it recognize aliasing even if it is in-lined with arguments that are
conspicuously aliased.]

> - foobar3: four and two alias; if they somewhere refer to the same
> stored value, results may not be, what you expect (nasal daemons etc.)

While it might be sensible for a compiler to assume that *two won't
alias *four, I see no justification for that in the Standard. Of course,
if the intention is that a compiler not be required to recognize that a
pointer may alias other things, one may achieve that using "restrict".
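A sketch of the `restrict` approach applied to foobar3 (the name `foobar3_restrict` is hypothetical). With the qualifiers, the caller promises that nothing modified through `four` is also accessed through `two` during the call, which lets the compiler load `*two` once; calling it with overlapping storage is undefined:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same loop as foobar3, but restrict promises that no object modified
   through four is also accessed through two during this call, so the
   compiler may hoist the load of *two out of the loop. */
void foobar3_restrict(uint32_t *restrict four,
                      const uint16_t *restrict two, size_t len)
{
    size_t x;

    for (x = 0; x < len; ++x)
        four[x] = (uint32_t)*two + (uint32_t)x;
}
```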

> So my initial doubt about casting from base to derived and vice versa
> not being reflexive in the standard was based on a fallacy. I guess it
> does not matter how two pointers get their values; the aliasing rule
> in the standard just says that the compiler needs to assume that a
> pointer to base and a pointer to derived might refer to the same stored
> value.

Having derived types contain a structure of the base type will satisfy the
aliasing rules, but if the length of the common members is not a multiple
of their alignment, it might not satisfy layout requirements--especially if
code needs to communicate with a library that doesn't waste space in its
structures.

Andreas

Nov 8, 2016, 3:54:08 PM

William,

<wil...@wilbur.25thandClement.com> writes:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71120

thanks for a very interesting read. I've never had to use the POSIX
socket API directly in serious code, but the way it approaches
genericity has always made me feel uncomfortable.

Anton Shepelev

Nov 21, 2016, 8:59:35 AM

Andreas:

>Hi all,
>
>while searching the web I've found a lot of
>assertions that a common idiom of subtyping structs
>in C is well-defined by virtue of the first struct
>member always having the same address as the
>surrounding struct. What I'm concerned about is
>aliasing in this context.
>
>struct sub_type
>{
> uint8_t super_tag;
> ...
>};
>
>struct super_type
>{
> struct sub_type super;
> ...
>};
>
>struct hyper_type
>{
> struct super_type super;
> ...
>};
>[...]

Am I right in thinking that "Object-oriented
programming in C":

http://www.cs.rit.edu/~ats/books/ooc.pdf

is based entirely on the violation of strict
aliasing?

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

supe...@casperkitty.com

Nov 21, 2016, 10:32:57 AM

On Monday, November 21, 2016 at 7:59:35 AM UTC-6, Anton Shepelev wrote:
> Am I right in thinking that "Object-oriented
> programming in C":
>
> http://www.cs.rit.edu/~ats/books/ooc.pdf
>
> is based entirely on the violation of strict
> aliasing?

That book goes to some extra lengths to comply with the aliasing rules,
which can sometimes make structure layouts less efficient than they would
otherwise be. If one wants to have some structures whose header's size
isn't a multiple of the alignment, then in Ritchie's language one could
write the code as e.g. :

struct header { void *info; uint16_t flags; };
struct thing1 { void *info; uint16_t flags;
uint16_t coords[3]; };
struct thing2 { void *info; uint16_t flags;
uint8_t label[7]; uint32_t dat[]; };

and cast pointers to either thing1 or thing2 into pointers of type header.
The approach exemplified by the book, however, would make thing1 and thing2
each contain a field of type "struct header", thus increasing the size of
those structures thing1 and thing2 by 8 bytes each [a 50% increase in the
case of thing1].
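To make the layout cost concrete, a sketch comparing thing1 written flat against a version that embeds `struct header` (the `_flat`/`_nested` names and the `hdr` member are hypothetical). Exact sizes depend on the ABI; on a typical LP64 target the flat version is 16 bytes and the nested one 24, because the header's trailing padding is carried into the derived struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct header { void *info; uint16_t flags; };

/* Flat layout: the header fields are repeated by hand, so coords can
   start right after flags. */
struct thing1_flat   { void *info; uint16_t flags; uint16_t coords[3]; };

/* Nested layout: the header is an actual member, so coords can only
   start after sizeof(struct header), including its trailing padding. */
struct thing1_nested { struct header hdr; uint16_t coords[3]; };
```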

James R. Kuyper

Nov 21, 2016, 11:15:56 AM

On 11/21/2016 08:59 AM, Anton Shepelev wrote:
> Andreas:
>
>> Hi all,
>>
>> while searching the web I've found a lot of
>> assertions that a common idiom of subtyping structs
>> in C is well-defined by virtue of the first struct
>> member always having the same address as the
>> surrounding struct. What I'm concerned about is
>> aliasing in this context.
>>
>> struct sub_type
>> {
>> uint8_t super_tag;
>> ...
>> };
>>
>> struct super_type
>> {
>> struct sub_type super;
>> ...
>> };
>>
>> struct hyper_type
>> {
>> struct super_type super;
>> ...
>> };
>> [...]
>
> Am I right in thinking that "Object-oriented
> programming in C":
>
> http://www.cs.rit.edu/~ats/books/ooc.pdf
>
> is based entirely on the violation of strict
> aliasing?

I don't have time to read that entire document right now. Could you cite
a particular page that involves such violations? I know that the general
concept of "object-oriented programming in C" (as opposed to the book of
that title) does not rely on violation of C's aliasing rules.

supe...@casperkitty.com

Nov 21, 2016, 11:35:51 AM

On Monday, November 21, 2016 at 10:15:56 AM UTC-6, James R. Kuyper wrote:
> I don't have time to read that entire document right now. Could you cite
> a particular page that involves such violations? I know that the general
> concept of "object-oriented programming in C" (as opposed to the book of
> that title) does not rely on violation of C's aliasing rules.

Object-oriented code targeting compilers which impose looser rules than what
the Standard would allow can sometimes be written in a more efficient or
readable fashion than code which abides by gcc's interpretation of the Standard.
Among other things, if one has an array of pointers which are all going to
target things of derived type, but needs to be able to pass it to a
function which expects an array of base-type pointers, the Standard would
require that the function accept an array of either header-type pointers or
void* pointers, and that the type of the array match that choice. All code
which needs to use the array as the derived type would then have to
explicitly cast its contents at every use.
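A sketch of the workaround described above, borrowing the Animal/Cat names from the Java comparison (all names and members are hypothetical). The array holds base pointers, so it can be passed to code that knows only the base type, while code that knows the elements are cats must cast each element back at the point of use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct animal { uint8_t kind; };
struct cat    { struct animal base; int lives; };

/* Code that only knows the base type takes an array of base pointers. */
size_t count_kind(struct animal *const *arr, size_t n, uint8_t kind)
{
    size_t i, count = 0;

    for (i = 0; i < n; ++i)
        if (arr[i]->kind == kind)
            ++count;
    return count;
}

/* Code that knows every element is a cat still stores base pointers
   and has to downcast each element where it is used. */
int total_lives(struct animal *const *arr, size_t n)
{
    size_t i;
    int total = 0;

    for (i = 0; i < n; ++i)
        total += ((struct cat *)arr[i])->lives;
    return total;
}
```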

Despite C's reputation for flexible pointer semantics, even Java has far more
flexible semantics here than what C supports, since Java allows a function
which expects a reference to an array of references to Animal to be given
a reference to an array of references to Cat, provided that it doesn't try
to store a reference to something that isn't a Cat (or subtype thereof) into
the array.

Anton Shepelev

Nov 22, 2016, 3:42:39 AM

supercat:

>Despite C's reputation for flexible pointer
>semantics, even Java has far more flexible semantics
>here than what C supports, since Java allows a
>function which expects a reference to an array of
>references to Animal to be given a reference to an
>array of references to Cat, provided that it
>doesn't try to store a reference to something that
>isn't a Cat (or subtype thereof) into the array.

Covariance of collection elements, or of array
elements in your example, is a relatively new
feature in computer languages. Did any classic
procedural languages have it?

supe...@casperkitty.com

Nov 22, 2016, 9:50:57 AM

On Tuesday, November 22, 2016 at 2:42:39 AM UTC-6, Anton Shepelev wrote:
> Covariance of collection elements, or of array
> elements in your example, is a relatively new
> feature in computer languages. Did any classic
> procedural languages have it?
> languages have it?

C74 could certainly handle such constructs without any difficulty whatsoever.
I'm not sure when or whether it was deliberately removed from the language,
but it's very clear that C74 supported such things, and today's compilers do
not.

Ben Bacarisse

Nov 22, 2016, 11:07:39 AM

supe...@casperkitty.com writes:
<snip>
> C74

What's that? Do you have a reference for it?

--
Ben.

supe...@casperkitty.com

Nov 22, 2016, 3:03:27 PM

On Tuesday, November 22, 2016 at 10:07:39 AM UTC-6, Ben Bacarisse wrote:
> supercat writes:
> <snip>
> > C74
>
> What's that? Do you have a reference for it?

Sorry--I thought you'd seen other posts where I've used the term previously
as an informal name for the language described by the 1974 C Reference
Manual.

See https://www.bell-labs.com/usr/dmr/www/cman.pdf paying particular
attention to 7.1.7 and 7.1.8 which make it abundantly clear that given:

struct s1 {double foo; int bar; int baz;};
struct s2 {double foo; int bar; float woozle; };

the types "struct s1*" and "struct s2*" could be used interchangeably by code
which only needed to access "foo" and "bar". The lack of any type checking
at all wasn't a good thing, but it was clearly possible to write functions
that could operate on objects of either type, pointers to either type, or
arrays of pointers to either type, without having to care about the exact
structure type in question.

The fact that later versions of the language would require casts when
converting between the two pointer types does not imply that there would
need to have been any problem allowing code which uses such casts to
achieve the same semantics as had been possible under C74. What is not
clear is whether the removal of such semantics from the language was
deliberate, or if the Committee expected that the Common Initial Sequence
rule was sufficient to express the intent that such semantics remain
available, and that compiler writers would recognize the usefulness of
related aspects whether or not the Standard identified them all.

Anton Shepelev

Nov 22, 2016, 3:23:07 PM

James R. Kuyper to Anton Shepelev:

> > Am I right in thinking that "Object-oriented
> > programming in C":
> >
> > http://www.cs.rit.edu/~ats/books/ooc.pdf
> >
> > is based entirely on the violation of strict
> > aliasing?
>
> I don't have time to read that entire document
> right now. Could you cite a particular page that
> involves such violations? I know that the general
> concept of "object-oriented programming in C" (as
> opposed to the book of that title) does not rely
> on violation of C's aliasing rules.

The author makes extensive use of casting between
non-aliasing pointers. He writes:

1. Passing a circle as a point means converting
from a struct Circle * to a struct Point *.
We will refer to this as an up-cast from a
subclass to a superclass -- in ANSI-C it can
only be accomplished with an explicit conver-
sion operator or through intermediate void *
values.

2. It is usually unsound, however, to pass a
point to a function intended for circles such
as Circle_draw(): converting from a struct
Point * to a struct Circle * is only permissi-
ble if the point originally was a circle. We
will refer to this as a down-cast from a su-
perclass to a subclass -- this requires ex-
plicit conversions or void * values, too, and
it can only be done to pointers to objects
that were in the subclass to begin with.

3. #define x(p) (((const struct Point *)(p)) -> x)
#define y(p) (((const struct Point *)(p)) -> y)

These macros can be applied to a pointer to
any object that starts with a struct Point,
i.e., to objects from any subclass of our
points. The technique is to up-cast the
pointer into our superclass and reference the
interesting component there. const in the
cast blocks assignments to the result. If
const were omitted

#define x(p) (((struct Point *)(p)) -> x)

a macro call x(p) produces an l-value which
can be the target of an assignment. A better
modification function would be the macro defi-
nition

#define set_x(p,v) (((struct Point *)(p)) -> x = (v))

which produces an assignment.

James R. Kuyper

Nov 22, 2016, 4:02:56 PM

On 11/22/2016 03:23 PM, Anton Shepelev wrote:
> James R. Kuyper to Anton Shepelev:
>
>>> Am I right in thinking that "Object-oriented
>>> programming in C":
>>>
>>> http://www.cs.rit.edu/~ats/books/ooc.pdf
>>>
>>> is based entirely on the violation of strict
>>> aliasing?
>>
>> I don't have time to read that entire document
>> right now. Could you cite a particular page that
>> involves such violations? I know that the general
>> concept of "object-oriented programming in C" (as
>> opposed to the book of that title) does not rely
>> on violation of C's aliasing rules.
>
> The author makes extensive use of casting between
> non-aliasing pointers. He writes:
>
> 1. Passing a circle as a point means converting
> from a struct Circle * to a struct Point *.
> We will refer to this as an up-cast from a
> subclass to a superclass -- in ANSI-C it can
> only be accomplished with an explicit conver-
> sion operator or through intermediate void *
> values.

On page 34, he writes:
struct Circle { const struct Point _; int rad; };

Given
struct Circle c, *pc=&c;
struct Point p, *pp=&p;

The explicit conversion he referred to is (const struct Point *)pc, and
it is allowed by 6.7.2.1p15. However, he's wrong about void* being the
only other way to do this. &pc->_ is simpler and equivalent, and will
continue to work even if some other field becomes the first member of
struct Circle.

> 2. It is usually unsound, however, to pass a
> point to a function intended for circles such
> as Circle_draw(): converting from a struct
> Point * to a struct Circle * is only permissi-
> ble if the point originally was a circle. We

In other words, given

pp = &c._;

then (struct Circle*)pp is also allowed by 6.7.2.1p15. However, given:

pp - &p;

then (struct Circle*)pp would not be allowed.

I don't see any problem with this approach. What are your concerns about it?

supe...@casperkitty.com

Nov 22, 2016, 5:13:13 PM

On Tuesday, November 22, 2016 at 3:02:56 PM UTC-6, James R. Kuyper wrote:
> The explicit conversion he referred to is (const struct Point *)pc, and
> it is allowed by 6.7.2.1p15. However, he's wrong about void* being the
> only other way to do this. &pc->_ is simpler and equivalent, and will
> continue to work even if some other field becomes the first member of
> struct Circle.

...
> In other words, given
>
> pp = &c._;
>
> then (struct Circle*)pp is also allowed by 6.7.2.1p15. However, given:
>
> pp - &p;
>
> then (struct Circle*)pp would not be allowed.
>
> I don't see any problem with this approach. What are your concerns about it?

Do you mean pp = &p for the second one?

The ability to convert a pointer to the contained "parent" object
into a pointer to the derived object is reliant upon the parent object being
the first field of the object in question; if the parent gets moved to some
other spot in the structure, code may still compile but it won't work.

Many compilers require code with a derived-type pointer that wants to
access fields of the parent to perform the access explicitly through the
parent member (e.g. if "foo" is derived from "moo", and starts with a member
"parent" of type "moo", then code wanting to access an "inherited" field x
of object *y must write y->parent.x rather than just y->x), even in cases
where the code in question shouldn't need to care about which members are in
the original or derived class.

If the length of the common portion isn't a multiple of its alignment
requirement it will be necessary to add padding which can't be used by
derived types. This wastes space and may also make it impossible for
the structures to have layouts that match other APIs.

Ben Bacarisse

Nov 22, 2016, 6:36:36 PM

supe...@casperkitty.com writes:

> On Tuesday, November 22, 2016 at 10:07:39 AM UTC-6, Ben Bacarisse wrote:
>> supercat writes:
>> <snip>
>> > C74
>>
>> What's that? Do you have a reference for it?
>
> Sorry--I thought you'd seen other posts where I've used the term previously
> as an informal name for the language described by the 1974 C Reference
> Manual.

Ah, right, pre-K&R C. Not a language I'd like to go back to. From what
you go on to say I don't think you do either. It sounds like you want
some tailored mash-up that fits your view of what C should be.

<snip>
--
Ben.

Noob

Nov 22, 2016, 7:10:20 PM

On 22/11/2016 23:13, supe...@casperkitty.com wrote:

> The ability to convert a pointer to the contained "parent" object
> into a pointer to the derived object is reliant upon the parent object being
> the first field of the object in question; if the parent gets moved to some
> other spot in the structure, code may still compile but it won't work.

Behold Linux's container_of() macro.
(110% dependent on GCC ID behavior)

http://lxr.free-electrons.com/source/include/linux/kernel.h#L830

#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})

supe...@casperkitty.com

Nov 22, 2016, 7:24:50 PM

On Tuesday, November 22, 2016 at 5:36:36 PM UTC-6, Ben Bacarisse wrote:
> supercat writes:
> Ah, right, pre-K&R C. Not a language I'd like to go back to. From what
> you go on to say I don't think you do either. It sounds like you want
> some tailored mash-up that fits your view of what C should be.

I am of the opinion that later versions of languages should strive to be
supersets of earlier versions whenever practical. If earlier features
become too expensive to support by default the solution is to define a
means by which programs can indicate when/whether they need support for
the features in question; in some cases that may be accomplished by adding
a "language-version" directive and then specifying that code which states
that it is written for language version X or later must not use feature Y
unless it explicitly specifies that it does so.

Voila--old code continues to work as it always has, new code that doesn't
need to use the features in question can be compiled with optimizations
that would break older code, and code that needs the features can be marked
to block optimizations in the few places they'd be problematic while
allowing them everywhere else.

In what way would that not be a preferable state of affairs to what exists
now? For many possible behavioral guarantees, there will be some programs
where they have a high cost and no benefit, and others where they have ZERO
cost but significant benefit. Rather than trying to forge some horrible
compromise, it would be much more helpful to simply let code *specify* which
guarantees it needs.

To allow for the possibility that some platforms might not be capable of
supporting some guarantees, implementations would have the option to
refuse compilation of code requiring them. A compiler that refuses to
compile any program that requires any guarantee beyond the minimums could
be conforming, but would likely be regarded as being of poor quality. What
would be important would be that use of the features would never result in
UB. Either they would work as expected, or compilation would fail outright.
Both situations are vastly preferable to generating machine code which does
not meet requirements.

Richard Damon

Nov 22, 2016, 10:21:52 PM

If you look at how the Standards Committee has worked on the Standard,
backwards compatibility HAS been a key point. If the Standard mandated a
certain behavior for a code sequence, they try very hard to not change
that, and in cases where the previous standard has left behavior
undefined or implementation defined, they try hard to not mandate
behavior in a way incompatible with a real existing implementation.

What they haven't required is that an implementation when it becomes a
new implementation (maybe implementing a new version of the standard) be
backwards compatible with its previous non-standard required behavior,
in part because they ARE different implementations (and in fact, most
'implementations' are really to the standard a whole lot of different
implementations based on the option flags given when processing, many of
which may actually be non-conforming).

Most of the 'additional guarantees' that you would like to see would,
from the standards process's point of view, be great things for a
supplemental standard (in the vein of POSIX), which provides, as an
extension, limitations on implementations and promises to the user
beyond the base C standard.

The Standards Committee has been somewhat reluctant to add 'optional'
features to the Standard (though it does have a few, normally for
special reasons). I suspect that since what you are looking for fits so
nicely as an extension, I can't see it likely for them to add a bunch of
new optional features to the base Standard.

wil...@wilbur.25thandclement.com

Nov 22, 2016, 11:15:14 PM

AFAICT the use of typeof (and thus also the statement expression) seems to
only serve a type safety function. I don't see why the following isn't
semantically equivalent for otherwise correct code.

(type *)((char *)(ptr) - offsetof(type, member))

David Brown

Nov 23, 2016, 5:31:42 AM

Just as a note here, the Linux kernel is always compiled with
-fno-strict-aliasing. So there is no problem if code does not follow the C
rules about which types are compatible regarding aliasing.

Anton Shepelev

Nov 23, 2016, 6:08:07 AM

David Brown:

>Just as a note here, the Linux kernel is always
>compiled with -fno-strict-aliasing. So there is no
>problem if code does not follow the C rules about
>which types are compatible regarding aliasing.

And here is Linus's opinion about it:

http://stackoverflow.com/a/2959468/2862241

The second post in full is here:

http://www.mail-archive.com/linux...@vger.kernel.org/msg01647.html

See also the comments to the StackOverflow post
about the way gcc follows the letter rather than the
spirit of the standard.

supe...@casperkitty.com

Nov 23, 2016, 11:31:33 AM

On Tuesday, November 22, 2016 at 9:21:52 PM UTC-6, Richard Damon wrote:
> If you look at how the Standards Committee has worked on the Standard,
> backwards compatibility HAS been a key point. If the Standard mandated a
> certain behavior for a code sequence, they try very hard to not change
> that, and in cases where the previous standard has left behavior
> undefined or implementation defined, they try hard to not mandate
> behavior in a way incompatible with a real existing implementation.

If the C89 Standard Committee was trying to maintain backward compatibility
(I believe they were) that would imply that they did not expect that
describing formerly-defined actions as invoking "Undefined Behavior" would
break programs that relied upon them. Implementations would be free, after
all, to keep on handling those actions in useful fashion whether or not the
Standard mandated such treatment.

I think the difficulty is that the Committee's actions have been
reinterpreted as a decision to break code which used those behaviors. The
1974 document (which I think represents the best definition of what the C
language was in 1974) clearly and unambiguously defines semantics which are
sometimes useful, but are not available in the language defined by today's
Standard. At what point, if ever, was there a conscious decision to remove
them, or was there instead a progression where one version of the Standard
didn't think it necessary to mandate them since compilers universally
supported them anyway, and then a later version of the Standard was drawn
from the perspective that the language had never supported the behaviors?

> What they haven't required is that an implementation when it becomes a
> new implementation (maybe implementing a new version of the standard) be
> backwards compatible with its previous non-standard required behavior,
> in part because they ARE different implementations (and in fact, most
> 'implementations' are really to the standard a whole lot of different
> implementations based on the option flags given when processing, many of
> which may actually be non-conforming).

I think they expected that if existing implementations for machines with
some characteristic [e.g. 16-bit big-endian silent-wraparound two's-
complement, using 32-bit linear pointers] naturally supported certain
useful semantics, future implementations for similar platforms would
naturally support those same behaviors. Compilers might offer various
options to loosen up behavioral guarantees, but quality implementations
should try to make clear what guarantees would be waived by which options,
and should ensure that old semantics remain available.

For example, a compiler might treat signed integer types as being implicitly
promotable to larger types (which could be larger than any types available
to the user). Given "int i1,i2; long L1,L2;" a compiler might transform
"L1=i1*i2;" into "L1=(long)i1*i2;" or "L1+1 > L2" into "L1 >= L2", but
allow code to force wrapping semantics with "L1=(int)(i1*i2);" or
"(long)(L1+1) > L2". Note that compilers wouldn't have to reach very far
to apply the former transforms, and that the use of casts to block them is
analogous to what's required in floating-point when using platforms that
don't promise to refrain from using extra precision during computations
[e.g. given "float f=16777216.0f; double d1=f+1.0f, d2=(float)(f+1.0f);"
the value of d1 might be 16777217.0f, but d2 would be 16777216.0f].

> Most of the 'additional guarantees' that you would like to see, actually
> to the standard process would be great things for a supplemental
> standard (in the vein of POSIX), which provide limitations to the
> implementations and promises to the user beyond the base C standard, as
> an extension.

A problem I think is that people writing documentation seldom bother to
specify things which they consider obvious. How many shirt catalogs
explicitly promise that shirts have two sleeves on opposite sides?

> The Standards Committee has been somewhat reluctant to add 'optional'
> features to the Standard (though it does have a few, normally for
> special reasons). I suspect that since what you are looking for fits so
> nicely as an extension, I can't see it likely for them to add a bunch of
> new optional features to the base Standard.

For the "Standard" to really qualify as a standard, it should define classes
of things such that knowing that x is in class X and knowing that y is in
class Y will allow one to predict something useful about the interaction of
objects x and y. By that definition, the so-called C "Standard" is only
applicable to programs with constraint violations (which generally aren't
very useful).

Recognizing optional behaviors would greatly increase the value of the
Standard by making it possible and practical to define two very useful
categories of implementations and programs and a couple of useful guarantees
related to them:

If an implementation which is *at least* Minimally Conforming [satisfying
requirements somewhat looser than what the Standard presently mandates]
is fed a Selectively Conforming program, it would be required to either
run the program without UB or fail in Implementation-Defined fashion.

If an implementation which is Conforming is fed a Strictly-Conforming
program [which satisfies a couple of additional requirements beyond what
would be required under the present standard(*)] it will run the program
successfully.

(*) Among other things, for a program to be strictly conforming, it would
be required to include a directive mandating either static or run-time
stack checking, and if a program which specifies static checking uses
recursion, it would need to include directives that promise a limit
to recursive nesting. This would eliminate stack overflow as a possible
cause of UB for strictly-conforming programs.

While many tasks cannot be done by Strictly Conforming programs, and the
Standard does not provide any guarantee that even Strictly Conforming
programs will run without UB on more than one implementation, adding
optional behaviors to the Standard would make it possible to greatly expand
the range of programs to which the Standard would apply, and offer an
extremely useful guarantee about their behavior.

supe...@casperkitty.com

Nov 23, 2016, 11:41:23 AM

On Wednesday, November 23, 2016 at 4:31:42 AM UTC-6, David Brown wrote:
> Just as a note here, the Linux kernel is always compiled with
> -fno-strict-aliasing. So there is no problem if code does not follow the C
> rules about which types are compatible regarding aliasing.

The authors of gcc have stated that they feel no obligation to uphold
behaviors which are not documented and are not mandated by the Standard,
no matter how consistently the compiler has upheld them in the past. Does
anything in gcc's documentation actually promise that -fno-strict-aliasing
will cause it to always and forever regard aliasing as defined behavior,
rather than merely disabling *today's* optimizations that would break such
code but not precluding the possibility of imposing future optimizations
which would break it?

If the authors of gcc had not explicitly expressed the indicated attitude
when responding to bug reports, I would be satisfied that the only sensible
purpose of -fno-strict-aliasing would be to cause aliasing to be treated as
having defined behavior, but if the authors of gcc want to reserve
the right to change any behavior that isn't explicitly documented, then
failure to explicitly document any behavior should be taken as reserving the
right to break code that uses it.

BTW, are there any versions of Unix that do not rely upon forms of aliasing
not mandated by the C or Posix standards?

me

Nov 23, 2016, 12:01:18 PM

On Wednesday, November 23, 2016 at 8:31:33 AM UTC-8, supe...@casperkitty.com wrote:
> On Tuesday, November 22, 2016 at 9:21:52 PM UTC-6, Richard Damon wrote:
> > If you look at how the Standards Committee has worked on the Standard,
> > backwards compatibility HAS been a key point. If the Standard mandated a
> > certain behavior for a code sequence, they try very hard to not change
> > that, and in cases where the previous standard has left behavior
> > undefined or implementation defined, they try hard to not mandate
> > behavior in a way incompatible with a real existing implementation.
>
> If the C89 Standard Committee was trying to maintain backward compatibility
> (I believe they were) that would imply that they did not expect that
> describing formerly-defined actions as invoking "Undefined Behavior" would
> break programs that relied upon them. Implementations would be free, after
> all, to keep on handling those actions in useful fashion whether or not the
> Standard mandated such treatment.

In a very real way, _nothing_ had defined behaviour before the standard,
so claiming that formerly defined behaviour changed to undefined doesn't
make sense. If implementations want to maintain their previous behaviour
even in the face of it now being officially "undefined", they're free to
do so.

> I think the difficulty is that the Committee's actions have been
> reinterpreted as a decision to break code which used those behaviors.

No, the standard merely describes which of those behaviours are in fact
defined, and how.

> The
> 1974 document (which I think represents the best definition of what the C
> language was in 1974) clearly and unambiguously defines semantics which are
> sometimes useful, but are not available in the language defined by today's
> Standard.

And thank goodness. Someone else has already mentioned that you seem to want
current C to be pretty much like your C74, but I'd wager a significant
amount of beer that very few others here share that view.

A shirt catalogue is not a standard. Standards _do_ tend to define things
that are "obvious". What's interesting to me is how convoluted and wordy
those definitions often are, probably because when one looks closely, it's
really not so "obvious" after all.

> > The Standards Committee has been somewhat reluctant to add 'optional'
> > features to the Standard (though it does have a few, normally for
> > special reasons). I suspect that since what you are looking for fits so
> > nicely as an extension, I can't see it likely for them to add a bunch of
> > new optional features to the base Standard.
>
> For the "Standard" to really qualify as a standard, it should define classes
> of things such that knowing that x is in class X and knowing that y is in
> class Y will allow one to predict something useful about the interaction of
> objects x and y.

Um, that's exactly what the standard _does_ do; it says "if you do /this/,
you'll get /that/ result". (And, explicitly or by implication, if you do
anything different you're on your own.)

> By that definition, the so-called C "Standard" is only
> applicable to programs with constraint violations (which generally aren't
> very useful).

I don't see how that follows.

> Recognizing optional behaviors would greatly increase the value of the
> Standard by making it possible and practical to define two very useful
> categories of implementations and programs and a couple of useful guarantees
> related to them:

Taken to the extreme, _any_ implementation could be called conforming as long
as every single way in which its behaviour differs from other implementations
was documented as an option. How is this better than no standard at all?

supe...@casperkitty.com

Nov 23, 2016, 3:06:06 PM

On Wednesday, November 23, 2016 at 11:01:18 AM UTC-6, someone wrote:
> On Wednesday, November 23, 2016 at 8:31:33 AM UTC-8, supercat wrote:
> > If the C89 Standard Committee was trying to maintain backward compatibility
> > (I believe they were) that would imply that they did not expect that
> > describing formerly-defined actions as invoking "Undefined Behavior" would
> > break programs that relied upon them. Implementations would be free, after
> > all, to keep on handling those actions in useful fashion whether or not the
> > Standard mandated such treatment.
>
> In a very real way, _nothing_ had defined behaviour before the standard,
> so claiming that formerly defined behaviour changed to undefined doesn't
> make sense. If implementation want to maintain their previous behaviour
> even in the face of it now being officially "undefined", they're free to
> do so.

Standards committees are not the only entities in the universe that
"define" things. The 1974 document specifies how a couple of particular
implementations work. Authors of other implementations were expected to
make them behave in analogous fashion as appropriate for the target
platform when practical. For target platforms that were very much like
the PDP-11, almost all behaviors could be defined as "do what the PDP-11
would do". Only when targeting platforms which were different did things
become problematic (e.g. on a PDP-11 there's no question about what (-1)<<1
should mean, but on a ones'-complement machine there would be two plausible
meanings and on a sign-magnitude machine there would be no clear meaning).

> > I think the difficulty is that the Committee's actions have been
> > reinterpreted as a decision to break code which used those behaviors.
>
> No, the standard merely describes which of those behaviours are in fact
> defined, and how.

Which of those behaviors are defined *by the Standard*. Before the Standard
was ratified, any compiler for two's-complement silent-wraparound hardware
which didn't evaluate (-1)<<1 as -2 would have been considered "broken".
Whether or not anything specifically defined the behavior of <<, the meaning
of left-shifting two's-complement values was most likely established even
before Dennis Ritchie was born [I don't know exactly when during the design
of the Atanasoff-Berry Computer the behavior of two's-complement math was
established, or if it had been established for something earlier, but it
would presumably have been established by the time work was discontinued in
1942].

I see no evidence that the authors of the Standard intended that it cause
anything whose meaning had been well established on certain kinds of
machines prior to the Standard's ratification to be regarded as less well
defined than it had been previously.

> > The
> > 1974 document (which I think represents the best definition of what the C
> > language was in 1974) clearly and unambiguously defines semantics which are
> > sometimes useful, but are not available in the language defined by today's
> > Standard.
>
> And thank goodness. Someone else has already mentioned that you seem to want
> current C to be pretty much like your C74, but I'd wager a significant
> amount of beer that very few others here share that view.

I have said that I want a language whose available semantics are a superset
of those available in C74, at least when running on platforms which share
some basic architectural features with the PDP-11 (storage consists of a
plurality of linearly-addressable octets, each integer type larger than
char is stored as two half-size chunks without padding, integer math is
performed using two's-complement hardware without overflow traps, etc.).

I wouldn't mind having to explicitly request semantics which aren't always
needed *if* there were a concise way of doing so which could be expected
to work on platforms which would naturally support the behaviors in
question. The problem is that in many situations where the easiest
way for compilers to support such operations on platforms where they would
be practical would be to simply treat them as defined whether the Standard
required it or not, nobody saw any need to specify any alternative which
would be unambiguously defined on all platforms that supported it.

> A shirt catalogue is not a standard. Standards _do_ tend to define things
> that are "obvious". What's interesting to me is how convoluted and wordy
> those definitions often are, probably because when one looks closely, it's
> really not so "obvious" after all.

The analogous "standard" would be the fact that shirts intended for use
by normal humans will have two sleeves, one on each side. Many standards
do go out of their way to state the obvious, but the C89 Standard went
out of its way to avoid suggesting that any particular
implementation should be considered inferior to others, even when that
meant refraining from recommending things that almost all implementations
should support when practical, but which a few implementations might not
be able to.

Most modern standards make use of terms MUST, SHOULD, etc. as defined in
RFC-2119. In many cases, a lot of the value of a Standard comes not from
the MUST items, but from the SHOULD items. A standard written by someone
who wants to avoid SHOULD items will often be inferior to one written by
someone who embraces them.

> > For the "Standard" to really qualify as a standard, it should define classes
> > of things such that knowing that x is in class X and knowing that y is in
> > class Y will allow one to predict something useful about the interaction of
> > objects x and y.
>
> Um, that's exactly what the standard _does_ do; it says "if you do /this/,
> you'll get /that/ result". (And, explicitly or by implication, if you do
> anything different you're on your own.)

The Standard requires that for every conforming implementation, there must
be at least one program meeting certain criteria that the implementation
will process as described by the Standard. Nothing in the Standard would
forbid an implementation from dying with a stack overflow (and behaving
in arbitrary fashion) when fed any other program.

> > By that definition, the so-called C "Standard" is only
> > applicable to programs with constraint violations (which generally aren't
> > very useful).
>
> I don't see how that follows.

See above.

> > Recognizing optional behaviors would greatly increase the value of the
> > Standard by making it possible and practical to define two very useful
> > categories of implementations and programs and a couple of useful guarantees
> > related to them:
>
> Taken to the extreme, _any_ implementation could be called conforming as long
> as every single way in which its behaviour differs from other implementations
> was documented as an option. How is this better than no standard at all?

Under my proposed rules, feeding any Selectively Conforming program to an
implementation that was at least Minimally Conforming would result in one of
two things happening:

1. The program would function successfully.

2. The program would fail in Implementation-Defined fashion (depending
upon the directives it uses, such failure might be guaranteed to occur
at compile time, or could be deferred until run time).

A program which relies upon features which some implementations support and
others do not would be less portable than one which does not rely upon such
features, but depending upon how common the feature was it could be, for
all practical purposes, "almost" as portable. An implementation which does
not support features that are needed by many programs could be viewed as
inferior to one which does support them, but could still be very useful for
running programs that don't need such features.

Chris M. Thomasson

Nov 23, 2016, 4:40:05 PM

On 11/21/2016 5:59 AM, Anton Shepelev wrote:
> Andreas:
>
>> Hi all,
[...]
> Am I right in thinking that "Object-oriented pro-
> gramming in C":
>
> http://www.cs.rit.edu/~ats/books/ooc.pdf
>
> is based entirely on the violation of strict alias-
> ing?

Well, I sort of like the following technique:

http://pastebin.com/f52a443b1

Not even trying to make full OOP, but it's fairly nice...

;^)

Chris M. Thomasson

Nov 23, 2016, 6:10:36 PM
Looks okay to me.

FWIW, check this out wrt offsetof:
_________________________________________________
#define ALIGN_OF(mp_type) \
    offsetof( \
        struct \
        { \
            char pad_ALIGN_OF; \
            mp_type type_ALIGN_OF; \
        }, \
        type_ALIGN_OF \
    )
_________________________________________________

I posted in this thread:

https://groups.google.com/d/topic/comp.lang.c.moderated/dfFaOdBRq-0/discussion

;^)
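For comparison, C11 added a standard operator for this, `_Alignof` (spelled `alignof` via `<stdalign.h>`). The sketch below puts the two side by side for `double`; it is an illustration, not part of the original post. As far as I know, C23 states that defining a new type inside the `offsetof` type argument is undefined, so the probe struct here is declared separately (`align_probe_double` and the helper names are hypothetical).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical named probe struct, declared outside offsetof. */
struct align_probe_double {
    char pad;        /* occupies offset 0 */
    double member;   /* placed at the next suitably aligned offset */
};

/* The offsetof trick from the post, applied to double.  The Standard
   only guarantees this is a positive multiple of the alignment. */
static size_t probe_align_double(void)
{
    return offsetof(struct align_probe_double, member);
}

/* The C11 operator; alignof is the <stdalign.h> spelling of _Alignof. */
static size_t c11_align_double(void)
{
    return alignof(double);
}
```

On common ABIs the two values agree, but only `alignof` is defined to mean "the alignment requirement".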


Also, wrt container_of, here is some quick code I just typed out:
_________________________________________________
#include <stdio.h>
#include <stddef.h>

struct foo
{
    int a;
};

struct contain
{
    char a;
    double b;
    struct foo c;
    char d;
};

#define CONTAINS(mp_ptr, mp_type, mp_member) \
    ((mp_type*)(((unsigned char*)(mp_ptr)) - \
        offsetof(mp_type, mp_member)))

int main(void)
{
    struct contain c = { 'A', -.66, { 67 }, 'D' };
    struct foo* bp = &c.c;
    struct contain* cp = CONTAINS(bp, struct contain, c);

    printf("struct contain:(%p)->c(%c, %f, %d, %c)\n",
        (void*)&c, c.a, c.b, c.c.a, c.d);

    printf("struct foo:(%p)->bp(%d)\n",
        (void*)bp, bp->a);

    printf("struct contain:(%p)->cp(%c, %f, %d, %c)\n",
        (void*)cp, cp->a, cp->b, cp->c.a, cp->d);

    return 0;
}
_________________________________________________

Quite a useful construct. I have been taking advantage of it for a while
now.
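A typical use of this construct, sketched here as an illustration (the `list_node`, `item`, and `sum_items` names are hypothetical), is an intrusive linked list: generic code links `struct list_node` members, and `CONTAINS` recovers the enclosing object even when the node is not the first member.

```c
#include <assert.h>
#include <stddef.h>

#define CONTAINS(mp_ptr, mp_type, mp_member) \
    ((mp_type*)(((unsigned char*)(mp_ptr)) - \
        offsetof(mp_type, mp_member)))

/* Generic singly linked list node; knows nothing about its container. */
struct list_node {
    struct list_node* next;
};

/* Hypothetical payload type; the node is deliberately NOT first. */
struct item {
    int value;
    struct list_node link;
};

/* Walk the list and recover each enclosing struct item. */
static int sum_items(struct list_node const* head)
{
    int sum = 0;
    for (; head; head = head->next) {
        struct item const* it = CONTAINS(head, struct item, link);
        sum += it->value;
    }
    return sum;
}
```

The list code never mentions `struct item`, so the same node type can be embedded in any number of unrelated structs.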

luser droog

Nov 23, 2016, 6:10:56 PM
I tried looking at it, but I couldn't see the text under
the failing-to-load Flash ads. Have you considered gist.github.com?
I find it infinitely superior to pastebin.

Chris M. Thomasson

Nov 23, 2016, 6:13:45 PM
Yikes! Okay, try this link:

http://pastebin.com/raw/f52a443b1

It should be raw text; I will also just post it here:
___________________________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>


struct object_prv_vtable {
    int (*fp_destroy) (void* const);
};


struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
};


struct device_vtable {
    struct object_prv_vtable const object;
    struct device_prv_vtable const device;
};


struct device {
    struct device_vtable const* vtable;
};


#define object_destroy(mp_self) ( \
    (mp_self)->vtable->object.fp_destroy((mp_self)) \
)


#define device_read(mp_self, mp_buf, mp_size) ( \
    (mp_self)->vtable->device.fp_read((mp_self), (mp_buf), (mp_size)) \
)


#define device_write(mp_self, mp_buf, mp_size) ( \
    (mp_self)->vtable->device.fp_write((mp_self), (mp_buf), (mp_size)) \
)






/* Sample Header (usb_drive.h)
____________________________________________________________________*/
#if ! defined(USB_HEADER_H)
#define USB_HEADER_H


extern int usb_drive_create(struct device** const);


#endif







/* Sample Impl (usb_drive.c)
____________________________________________________________________*/
/* #include "usb_drive.h" */
#include <stdio.h>
#include <stdlib.h>


struct usb_drive {
    struct device device;
    /* whatever */
};


static int usb_drive_object_destroy(void* const);
static int usb_drive_device_read(void* const, void*, size_t);
static int usb_drive_device_write(void* const, void const*, size_t);


static struct device_vtable const g_table = {
    { /* object */
        usb_drive_object_destroy
    },

    { /* device */
        usb_drive_device_read,
        usb_drive_device_write
    }
};


int usb_drive_create(
    struct device** const pself
) {
    struct usb_drive* const self = malloc(sizeof(*self));
    if (self) {
        self->device.vtable = &g_table;
        *pself = &self->device;
        return 0;
    }
    return -1;
}


int usb_drive_object_destroy(
    void* const self_
) {
    struct usb_drive* const self = self_;
    printf("usb_drive_object_destroy(%p)\n", (void*)self);
    free(self_);
    return 0;
}


int usb_drive_device_read(
    void* const self_,
    void* buf,
    size_t size
) {
    struct usb_drive* const self = self_;
    printf("usb_drive_device_read(%p, %p, %lu)\n",
        (void*)self, buf, (unsigned long)size);
    return 0;
}


int usb_drive_device_write(
    void* const self_,
    void const* buf,
    size_t size
) {
    struct usb_drive* const self = self_;
    printf("usb_drive_device_write(%p, %p, %lu)\n",
        (void*)self, buf, (unsigned long)size);
    return 0;
}







/* Sample Application
____________________________________________________________________*/
void read_write(
    struct device* const self
) {
    char buf[100];

    device_read(self, buf, 50);

    device_write(self, buf, 5);
}


int main(void) {
    struct device* a_device;

    if (! usb_drive_create(&a_device)) {
        read_write(a_device);

        object_destroy(a_device);
    }

    return 0;
}
___________________________________________________



luser droog

Nov 23, 2016, 9:29:20 PM
On Wednesday, November 23, 2016 at 5:13:45 PM UTC-6, Chris M. Thomasson wrote:
> On 11/23/2016 3:10 PM, luser droog wrote:
> > On Wednesday, November 23, 2016 at 3:40:05 PM UTC-6, Chris M. Thomasson wrote:
> >> On 11/21/2016 5:59 AM, Anton Shepelev wrote:
> >>> Andreas:
> >>>
> >>>> Hi all,
> >> [...]
> >>> Am I right in thinking that "Object-oriented pro-
> >>> gramming in C":
> >>>
> >>> http://www.cs.rit.edu/~ats/books/ooc.pdf
> >>>
> >>> is based entirely on the violation of strict alias-
> >>> ing?
> >>
> >> Well, I sort of like the following technique:
> >>
> >> http://pastebin.com/f52a443b1
> >>
> >> Not even trying to make full OOP, but its fairly nice...
> >>
> >> ;^)
> >
> > I tried looking at it, but I couldn't see the text under
> > the failing-to-load Flash ads. Have you considered gist.github.com?
> > I find it infinitely superior to pastebin.
>
> Yikes! Okay, try this link:
>
> http://pastebin.com/raw/f52a443b1
>
> It should be raw text, also I will just post it here:
> ___________________________________________________
<snip>

Excellent. Thanks for the help. I like it. Inheritance by
composition is the current fav AFAIAA from OOP blogs.

I wonder if some of the void* could be written as their
actual types (or superclass type as appropriate) for more
meaning to the reader, if not type-safety per se, ipso facto.
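One way to do what luser droog suggests, sketched with hypothetical, cut-down names (only a read entry is kept): forward-declare `struct device` so the vtable entries can take `struct device *` instead of `void *`, and let each implementation convert back to its own type. The compiler can then reject an unrelated pointer passed to `fp_read`, while the down-cast stays confined to the implementation.

```c
#include <assert.h>
#include <stddef.h>

struct device;  /* forward declaration so the vtable can name it */

struct device_vtable {
    int (*fp_read)(struct device* self, void* buf, size_t size);
};

struct device {
    struct device_vtable const* vtable;
};

/* Hypothetical concrete type; 'device' must stay the first member. */
struct usb_drive {
    struct device device;
    int reads;
};

static int usb_drive_read(struct device* dev, void* buf, size_t size)
{
    /* Down-cast: valid because dev really points at the first member
       of a struct usb_drive (6.7.2.1p15). */
    struct usb_drive* self = (struct usb_drive*)dev;
    (void)buf;
    self->reads++;
    return (int)size;
}

static struct device_vtable const usb_vtable = { usb_drive_read };

static void usb_drive_init(struct usb_drive* d)
{
    d->device.vtable = &usb_vtable;
    d->reads = 0;
}
```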

Anton Shepelev

Nov 24, 2016, 3:41:20 AM
Chris M. Thomasson, quoting
http://pastebin.com/raw/f52a443b1:

>#include <stddef.h>
>
>
>struct object_prv_vtable {
> int (*fp_destroy) (void* const);
>};
>
>
>struct device_prv_vtable {
> int (*fp_read) (void* const, void*, size_t);
> int (*fp_write) (void* const, void const*, size_t);
>};
>
>
>struct device_vtable {
> struct object_prv_vtable const object;
> struct device_prv_vtable const device;
>};
>
>
>struct device {
> struct device_vtable const* vtable;
>};
>
>
>#define object_destroy(mp_self) ( \
> (mp_self)->vtable->object.fp_destroy((mp_self)) \
>)
>
>
>#define device_read(mp_self, mp_buf, mp_size) ( \
> (mp_self)->vtable->device.fp_read((mp_self), (mp_buf), (mp_size)) \
>)
>
>
>#define device_write(mp_self, mp_buf, mp_size) ( \
> (mp_self)->vtable->device.fp_write((mp_self), (mp_buf), (mp_size)) \
>)

What is the point of separating object- and
device-related functions? Is it to give the same
"object" interface (destroy()) to different types of
objects, i.e. to objects with different sets of
specific functions?
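A possible answer, sketched under assumptions (the `cleanup_entry`, `destroy_all`, and `counter_destroy` names are hypothetical): because the destroy entry lives in its own `struct object_prv_vtable`, generic code such as a shutdown list can hold any kind of object by storing only that part of its vtable, without knowing whether the object is a device or something else entirely.

```c
#include <assert.h>
#include <stddef.h>

struct object_prv_vtable {
    int (*fp_destroy)(void* const);
};

/* Generic bookkeeping: needs only the "object" part of any vtable. */
struct cleanup_entry {
    struct object_prv_vtable const* ops;
    void* self;
};

/* Destroys every registered object; returns how many reported failure. */
static int destroy_all(struct cleanup_entry* list, size_t n)
{
    int failures = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        if (list[i].ops->fp_destroy(list[i].self) != 0)
            ++failures;
    }
    return failures;
}

/* A hypothetical non-device object type sharing the object interface. */
static int destroyed_count = 0;

static int counter_destroy(void* const self)
{
    (void)self;
    ++destroyed_count;
    return 0;
}

static struct object_prv_vtable const counter_ops = { counter_destroy };
```

With the posted code, a USB drive could be registered as `{ &g_table.object, a_device }` alongside objects that have no device functions at all.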

Thiago Adams

Nov 24, 2016, 6:29:18 AM
On Wednesday, November 23, 2016 at 9:13:45 PM UTC-2, Chris M. Thomasson wrote:
> On 11/23/2016 3:10 PM, luser droog wrote:
> > On Wednesday, November 23, 2016 at 3:40:05 PM UTC-6, Chris M. Thomasson wrote:
> >> On 11/21/2016 5:59 AM, Anton Shepelev wrote:
> >>> Andreas:
> >>>
> >>>> Hi all,
> >> [...]
> >>> Am I right in thinking that "Object-oriented pro-
> >>> gramming in C":
> >>>
> >>> http://www.cs.rit.edu/~ats/books/ooc.pdf
> >>>
> >>> is based entirely on the violation of strict alias-
> >>> ing?
> >>
> >> Well, I sort of like the following technique:
> >>
> >> http://pastebin.com/f52a443b1

This is like the C++ implementation. I think this is the classic approach of Java, C#, and C++: the idea of "interfaces".

The problem with interfaces (virtual functions) is the coupling they cause. You cannot solve problems independently of each other; instead, you have to find the set of interfaces that best fits the algorithms you have so far. When you need to add or remove algorithms, you have to refactor your interfaces into the best set again, and doing this can sometimes break code that is already working. It is very easy to increase coupling with interfaces, and it's hard to keep the interface sets correct. Even when you manage it, the interfaces are probably useful only in your specific project, making your classes less reusable.



Chris M. Thomasson

Nov 24, 2016, 8:37:18 PM
You are correct: there is a hard-core binding to the interface API.
How can the API be made flexible in this type of scheme? Well, it was never
meant to be full OOP, so perhaps the inflexible can become flexible
within the realm of certain narrow use cases? I did not want to add more
forms of indirection...

;^o

supe...@casperkitty.com

Nov 25, 2016, 12:56:04 PM
On Thursday, November 24, 2016 at 7:37:18 PM UTC-6, Chris M. Thomasson wrote:
> You are correct: There is a hard core binding with the interface API.
> How to make API flexible in this type of scheme? Well, it was never
> meant to be full OOP, so perhaps the inflexible can become flexible
> within the realm of certain narrow use cases? I did not want to add more
> forms of indirection...

If a compiler treats pointers to any kind of struct as interchangeable for
purposes of aliasing, the Common Initial Sequence rule, or function-pointer
signature compatibility, that will allow things to be written very nicely:

struct baseThing;  /* forward declarations, so the prototypes below */
struct derived1;   /* refer to the file-scope struct types          */

struct baseThingFuncs {
    void (*act1)(struct baseThing *it);
};
struct derived1Funcs {
    void (*act1)(struct derived1 *it);
    void (*act2)(struct derived1 *it);
};

struct baseThing { struct baseThingFuncs *f; int prop1, prop2; };
struct derived1 { struct derived1Funcs *f; int prop1, prop2, prop3; };

Code which has a pointer p of type struct baseThing* would be able to
invoke act1 upon it via p->f->act1(p), but if *p was a derived1, it
would be able to use all the members of derived1 or derived1Funcs without
needing a cast.

Nice and elegant, if one is willing to use the dialect of C associated with
the -fno-strict-aliasing flag.
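For comparison, here is a sketch of the same hierarchy written so that it does not depend on -fno-strict-aliasing, assuming the derived struct embeds the base struct (and the derived function table embeds the base table) as its first member. The price is the explicit conversions that supercat's dialect avoids; the names mirror his example, but the member functions and `derived1_init` are illustrative.

```c
#include <assert.h>

struct baseThing;

struct baseThingFuncs {
    void (*act1)(struct baseThing* it);
};

struct baseThing {
    struct baseThingFuncs const* f;
    int prop1, prop2;
};

struct derived1Funcs {
    struct baseThingFuncs base;          /* must be first */
    void (*act2)(struct baseThing* it);
};

struct derived1 {
    struct baseThing base;               /* must be first */
    int prop3;
};

static void derived1_act1(struct baseThing* it)
{
    /* Valid only because 'it' points at the base member of a derived1. */
    struct derived1* d = (struct derived1*)it;
    d->prop3 += it->prop1;
}

static void derived1_act2(struct baseThing* it)
{
    ((struct derived1*)it)->prop3 *= 2;
}

static struct derived1Funcs const derived1_funcs = {
    { derived1_act1 }, derived1_act2
};

static void derived1_init(struct derived1* d, int p1)
{
    d->base.f = &derived1_funcs.base;    /* points at the first member */
    d->base.prop1 = p1;
    d->base.prop2 = 0;
    d->prop3 = 0;
}
```

Base-level code calls `p->f->act1(p)` unchanged; reaching `act2` needs `((struct derived1Funcs const*)p->f)->act2(p)`.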

Thiago Adams

Nov 25, 2016, 6:22:22 PM
The problem of interfaces I mentioned is not an implementation problem, but a conceptual one.

The "interface" approach is useful when you don't want to know the concrete type. For instance, you want to create an algorithm that works without change with code someone else will give you, and the only thing you know is that the code uses the same interface pointers ("plugins").

I have a name for this kind of polymorphism.
I call it "open polymorphism".

When I know all the types, I call it "closed polymorphism".

I implement closed polymorphism with run-time selection on a number that identifies the type. It is faster than a vtable.

For open polymorphism I use the vtable style.

In C, because there is nothing ready to use, I have many other variations that work better in different cases, such as unions used as a kind of "variant type".
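A minimal sketch of the union-based "variant type" mentioned here, with a hypothetical `struct shape` holding either a box or a circle: the tag says which union member is live, and a switch on it does the closed dispatch.

```c
#include <assert.h>

enum shape_kind { SHAPE_BOX, SHAPE_CIRCLE };

/* Hypothetical variant: 'kind' says which union member is valid. */
struct shape {
    enum shape_kind kind;
    union {
        struct { double w, h; } box;
        struct { double r; } circle;
    } u;
};

/* Closed polymorphism: every kind is known at this point. */
static double shape_area(struct shape const* s)
{
    switch (s->kind) {
    case SHAPE_BOX:
        return s->u.box.w * s->u.box.h;
    case SHAPE_CIRCLE:
        return 3.14159265358979 * s->u.circle.r * s->u.circle.r;
    }
    assert(0 && "unknown shape kind");
    return 0.0;
}
```

All variants share one size and no casts are needed, at the cost of every variant occupying the size of the largest.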




Chris M. Thomasson

Nov 28, 2016, 9:18:12 PM
How are you doing this selection/dispatch?

>
> The open polymorphism I use vtable-style.
>
> In C, because there is nothing ready to use, I have many other variations that are better in different cases like unions kind of "variant type"

Reminds me of the XEvent union in X11.
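The XEvent arrangement relies on the common initial sequence rule (C11 6.5.2.3p6): inside a union, structs that all begin with the same members may have those members inspected through any of them. XEvent also places a bare `int type` in the union, which reads the same bytes. A cut-down, hypothetical imitation (`key_event`, `motion_event`, and `describe` are invented names):

```c
#include <assert.h>

enum { EV_KEY = 1, EV_MOTION = 2 };

/* Every event struct starts with the same 'int type' member. */
struct key_event    { int type; int keycode; };
struct motion_event { int type; int x, y; };

union event {
    int type;                  /* like XEvent's leading 'int type' */
    struct key_event key;
    struct motion_event motion;
};

/* Dispatch on the shared leading member, then use the specific one. */
static int describe(union event const* ev)
{
    switch (ev->type) {
    case EV_KEY:
        return ev->key.keycode;
    case EV_MOTION:
        return ev->motion.x + ev->motion.y;
    default:
        return -1;
    }
}
```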

Thiago Adams

Nov 29, 2016, 7:48:48 AM
like this:

#include <assert.h>
#include <stdbool.h>

typedef enum { Type1_ID, Type2_ID, ... } Types;
typedef struct { Types Type; } TypePtr;
typedef struct { TypePtr Type; ... } Type1;
#define TYPE1_INIT { {Type1_ID}, ... }
typedef struct { TypePtr Type; ... } Type2;
#define TYPE2_INIT { {Type2_ID}, ... }

//I guess here is the controversial point of this topic
//(this works for me)

#define CAST(T1, T2) \
static inline T2* T1##_As_##T2(T1* p) { \
    return (p->Type == T2##_ID) ? (T2*)p : 0; \
} \
static inline T1* T2##_As_##T1(T2* p) { return (T1*)p; }

typedef TypePtr Shape;
CAST(Shape, Type1)
CAST(Shape, Type2)

#define Case(T) case T##_ID

void Shape_Print(Shape* p)
{
    switch (p->Type)
    {
    Case(Type1):
        Type1_Print((Type1*)p);
        break;

    Case(Type2):
        Type2_Print((Type2*)p);
        break;

    default:
        assert(false);
        break;
    }
}
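Filling in the elided parts with hypothetical members (`Box` and `Circle` with a field or two each, and `*_Area` functions standing in for `*_Print`), a compilable version of the same pattern might look like the sketch below. The tag is the first member, so a `Shape *` obtained from `&box.Type` can be converted back to `Box *`.

```c
#include <assert.h>

typedef enum { Box_ID, Circle_ID } Types;
typedef struct { Types Type; } TypePtr;
typedef TypePtr Shape;

/* Hypothetical concrete types; the tag must be the first member. */
typedef struct { TypePtr Type; int w, h; } Box;
typedef struct { TypePtr Type; int r; } Circle;

#define BOX_INIT(w, h) { { Box_ID }, (w), (h) }
#define CIRCLE_INIT(r) { { Circle_ID }, (r) }

/* Checked down-cast (returns 0 on mismatch) and unchecked up-cast. */
#define CAST(T1, T2) \
    static inline T2* T1##_As_##T2(T1* p) \
    { return (p->Type == T2##_ID) ? (T2*)p : 0; } \
    static inline T1* T2##_As_##T1(T2* p) { return (T1*)p; }

CAST(Shape, Box)
CAST(Shape, Circle)

#define Case(T) case T##_ID

static int Box_Area(Box* p) { return p->w * p->h; }
static int Circle_Area(Circle* p) { return 3 * p->r * p->r; } /* crude pi */

/* Closed dispatch on the type number. */
static int Shape_Area(Shape* p)
{
    switch (p->Type) {
    Case(Box): return Box_Area((Box*)p);
    Case(Circle): return Circle_Area((Circle*)p);
    }
    assert(0 && "unknown shape");
    return 0;
}
```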


----

How does it differ from the vtable style?

Let's say I have Circle and Cat objects.
The initial design was IObject, IAnimal, IShape.
Now I have to add a new algorithm, Print.
It was not in the initial design. Where do I add it now?
Do I put it inside the IObject interface, or do I create a new interface IPrintable?
The puzzle of interface definition begins, and it's not a localized/independent problem.
To add Print to the IObject interface I have to check whether it makes sense for all objects derived from IObject.
Let's look at another case. I have IControl, which is an interface for widgets.
Initially I put the method "Draw" in this interface because all of my objects derived from IControl had to be drawn.
At some point, I have to add a new object similar to IControl that doesn't need to be drawn.
Then I have to create a new interface called IDraw and move Draw from IControl to IDraw.
When a new algorithm needs to be added, I may have to reorganize the interfaces and the types.
The types now depend on many interfaces, adjusted globally for the needs of that program.


With this type-selection approach you can solve the problem locally at the collection point (you can think in small parts).

For instance, for the IControl "Draw" problem I can adjust my for-each-draw to just ignore the new type that doesn't need to be drawn.
The new type and the old ones don't need any changes.


You can also create a new specialized algorithm without worrying about interface relationships.
You can have many kinds of Print algorithms applied differently to different types without worrying whether a given one applies to all types or not.
And you can reuse a type: if you don't need polymorphism, just remove one line from the beginning of your struct and you are done.

But with this approach you cannot add a new type without changing the algorithm, nor put a new, unknown type ("plugins") in some collection.
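A small illustration of solving the "Draw" problem locally at the collection point, with hypothetical `Button`, `Label`, and `Spacer` types (`Spacer` is the new type that does not need drawing): only the drawing loop learns to skip the new tag, and neither the old types nor the new one change.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { Button_ID, Label_ID, Spacer_ID } ControlTypes;
typedef struct { ControlTypes Type; } ControlPtr;

/* Hypothetical widget types; the tag is the first member of each. */
typedef struct { ControlPtr Type; int clicks; } Button;
typedef struct { ControlPtr Type; char const* text; } Label;
typedef struct { ControlPtr Type; int width; } Spacer;   /* never drawn */

static int draw_calls = 0;   /* counts draws so the effect is visible */
static void Button_Draw(Button* p) { (void)p; ++draw_calls; }
static void Label_Draw(Label* p) { (void)p; ++draw_calls; }

/* The for-each-draw at the collection point. */
static void DrawAll(ControlPtr* items[], size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        switch (items[i]->Type) {
        case Button_ID: Button_Draw((Button*)items[i]); break;
        case Label_ID:  Label_Draw((Label*)items[i]);   break;
        case Spacer_ID: break;   /* the only change: skip it here */
        }
    }
}
```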

--
What a language could do is to create the type selection for us.

Imagine a new language:

typedef (Box | Circle) Shape;

void Func(Shape* p)
{
    //we know that shape can be only Box or Circle
    // "closed polymorphism"

    //please language make this dispatch for me
    Shape_Draw(p);
}

void Box_Draw(Box* p) {...}
void Circle_Draw(Circle* p) {...}
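C has no such construct, but the usual preprocessor workaround, an "X-macro" list, gets part of the way: the closed set of types is written once, and the enum, the union, and the dispatching switch are generated from it. This is a sketch with hypothetical `Box`/`Circle` types and macro names, not something proposed in the thread; the `*_Draw` functions just return a number so the dispatch can be checked.

```c
#include <assert.h>

/* The closed set of shapes, listed exactly once. */
#define SHAPE_TYPES(X) \
    X(Box)             \
    X(Circle)

typedef struct { int w, h; } Box;
typedef struct { int r; } Circle;

/* Generated tag enum: Box_ID, Circle_ID. */
#define AS_ENUM(T) T##_ID,
typedef enum { SHAPE_TYPES(AS_ENUM) } ShapeKind;

/* Generated tagged union: Box Box_v; Circle Circle_v; */
#define AS_MEMBER(T) T T##_v;
typedef struct {
    ShapeKind kind;
    union { SHAPE_TYPES(AS_MEMBER) } u;
} Shape;

static int Box_Draw(Box* p) { (void)p; return 1; }
static int Circle_Draw(Circle* p) { (void)p; return 2; }

/* Generated dispatch: one case per listed type. */
#define AS_CASE(T) case T##_ID: return T##_Draw(&p->u.T##_v);
static int Shape_Draw(Shape* p)
{
    switch (p->kind) {
    SHAPE_TYPES(AS_CASE)
    }
    assert(0 && "unknown shape");
    return -1;
}
```

Adding a type means adding one `X(...)` line and its `*_Draw` function; a missing function is a compile-time error rather than a silent gap.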







supe...@casperkitty.com

Nov 29, 2016, 10:28:39 AM
On Tuesday, November 29, 2016 at 6:48:48 AM UTC-6, Thiago Adams wrote:
> Now I have to add a new algorithm Print.
> It was not at the initial design. Where do I add it now?

If things are added in a well-defined sequence, such that there will be an
entity that knows all the functions, then it will be possible to have code
which was compiled earlier pass its function and its size to the all-knowing
entity, and ask it to allocate and return a pointer to a table which includes
all functions the entity knows about. If the all-knowing entity knows of a
print function, but it's given a table which doesn't include it, it could
fill in the "print" slot with either a stub that returns an error, or with a
function that uses some supported method of rendering to generate an image
of the object and then uses its own logic to print that.
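A rough sketch of that scheme, assuming table entries are only ever appended so that every older layout is a prefix of the newest one (`full_vtable`, `old_vtable`, `complete_vtable`, and `unsupported` are hypothetical names). Missing slots get an error stub; a real system might instead install a fallback such as the render-then-print function described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Newest known layout; older layouts are prefixes of it. */
struct full_vtable {
    int (*destroy)(void* self);
    int (*draw)(void* self);
    int (*print)(void* self);      /* added in a later version */
};

static int unsupported(void* self) { (void)self; return -1; }

/* Allocate a full table: defaults first, then copy whatever prefix the
   caller supplied.  Assumes the shared prefix has identical offsets,
   which holds on common ABIs but is not promised outside a union. */
static struct full_vtable* complete_vtable(void const* partial, size_t size)
{
    struct full_vtable* out = malloc(sizeof *out);
    if (!out)
        return NULL;
    out->destroy = unsupported;
    out->draw = unsupported;
    out->print = unsupported;
    if (size > sizeof *out)
        size = sizeof *out;
    memcpy(out, partial, size);
    return out;
}

/* A hypothetical "class" compiled before 'print' existed. */
struct old_vtable {
    int (*destroy)(void* self);
    int (*draw)(void* self);
};

static int old_destroy(void* self) { (void)self; return 0; }
static int old_draw(void* self) { (void)self; return 7; }
```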

Anton Shepelev

Nov 29, 2016, 4:14:32 PM
James R. Kuyper to Anton Shepelev:

> > The author makes extensive use of casting be-
> > tween non-aliasing pointers. The author writes:
> >
> > 1. Passing a circle as a point means convert-
> > ing from a struct Circle * to a
> > struct Point *. We will refer to this as
> > an up-cast from a subclass to a super-
> > class -- in ANSI-C it can only be accom-
> > plished with an explicit conversion opera-
> > tor or through intermediate void * values.
> >
> On page 34, he writes:
> struct Circle { const struct Point _; int rad; };
>
> Given
>
> struct Circle c, *pc=&c;
> struct Point p, *pp=&p;
>
> The explicit conversion he referred to is
> (const struct Point *)pc ,
> and it is allowed by 6.7.2.1p15.

Is it the item saying that a pointer to a struct,
suitably converted, points to its first member? I
am sorry, I have no standard at hand.

> However, he's wrong about void* being the only
> other way to do this. &pc->_ is simpler and equiv-
> alent, and will continue to work even if some oth-
> er field becomes the first member of struct Cir-
> cle.

In OOP this method is out of the question, because a
member should be accessible immediately from the ob-
ject at hand, without knowing the inheritance level
at which it has been introduced. Whereas the ->_ in
your example betrays the object's internal structure
and the assumption that it is derived from Point.
What should you do if there were a three-tier hier-
archy -- ->_->_? For aught I know, this is allowed
under the hood but not in user code.

> > 2. It is usually unsound, however, to pass a
> > point to a function intended for circles
> > such as Circle_draw(): converting from a
> > struct Point * to a struct Circle * is on-
> > ly permissible if the point originally was
> > a circle. We will refer to this as a
> > down-cast from a superclass to a sub-
> > class -- this requires explicit conver-
> > sions or void * values, too, and it can
> > only be done to pointers to objects that
> > were in the subclass to begin with.
> >
> In other words, given
>
> pp = &c._;
>
> then (struct Circle*)pp is also allowed by
> 6.7.2.1p15. However, given:
>
> pp - &p;
>
> then (struct Circle*)pp would not be allowed.
>
> I don't see any problem with this approach. What
> are your concerns about it?

The problem is that the author uses the latter ex-
pression. In the appendix "ANSI-C Programming
Hints" he shows stark innocence of the aliasing lim-
itations:

The first component of a structure starts right
at the beginning of the structure; therefore,
structures can be lengthened or shortened:

struct a { int x; };
struct c { struct a a; ... } c, * cp = & c;
struct a * ap = & c.a;

ANSI-C permits neither implicit conversions of
pointers to different structures nor direct
access to the components of an inner structure:

cp -> a.x ok, fully specified
((struct a *) cp) -> x ok, explicit conversion (!!!)

Or am I missing something?

James R. Kuyper

Nov 29, 2016, 5:23:16 PM
On 11/29/2016 04:14 PM, Anton Shepelev wrote:
> James R. Kuyper to Anton Shepelev:
>
>>> The author makes extensive use of casting be-
>>> tween non-aliasing pointers. The author writes:
>>>
>>> 1. Passing a circle as a point means convert-
>>> ing from a struct Circle * to a
>>> struct Point *. We will refer to this as
>>> an up-cast from a subclass to a super-
>>> class -- in ANSI-C it can only be accom-
>>> plished with an explicit conversion opera-
>>> tor or through intermediate void * values.
>>>
>> On page 34, he writes:
>> struct Circle { const struct Point _; int rad; };
>>
>> Given
>>
>> struct Circle c, *pc=&c;
>> struct Point p, *pp=&p;
>>
>> The explicit conversion he referred to is
>> (const struct Point *)pc ,
>> and it is allowed by 6.7.2.1p15.
>
> Is it the item saying that a pointer suitably con-
> verted shall point the first field of a struct?

Yes.

>> However, he's wrong about void* being the only
>> other way to do this. &pc->_ is simpler and equiv-
>> alent, and will continue to work even if some oth-
>> er field becomes the first member of struct Cir-
>> cle.
>
> In OOP this method is out of the question, because a
> member should be accessible immediately from the ob-
> ject at hand, without knowing the inheritance level
> at which it has been introduced. Whereas the ->_ in
> your example betrays the object's internal structure
> and the assumption that it is derived from Point.

The cast makes the assumption that the first actual member of struct
Circle is an object of type struct Point, or an array of struct Point -
either directly, or recursively because the first member has a struct
type of which that is true.
The use of ->_, on the other hand, makes the assumption that there is
some member of struct Circle, not necessarily the first one, with a name
of _ and type of struct Point.

Is making an assumption about the location of the struct Point member
any better, or any worse, than making an assumption about the name of
that member? That depends entirely upon which assumption is more likely
to be true. If you're certain that one assumption is valid, and are
worried that the other one might not be, you should use the expression
which relies upon the assumption you're certain of. Otherwise, it doesn't
matter.
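To make the two assumptions concrete, here is a hypothetical three-level version of the book's layout (`struct Ring` and the two helper functions are invented for illustration). The cast relies only on each base being the first member at every level; the member path spells out the depth as `_._`, which is the dependence Anton objects to. Both yield the same pointer.

```c
#include <assert.h>

struct Point  { int x, y; };
struct Circle { const struct Point _; int rad; };    /* as on page 34 */
struct Ring   { const struct Circle _; int inner; }; /* hypothetical 3rd tier */

/* Up-cast by conversion: assumes Point is first at every level. */
static const struct Point* ring_as_point_cast(const struct Ring* r)
{
    return (const struct Point*)r;
}

/* Up-cast by member names: assumes the names and the depth. */
static const struct Point* ring_as_point_path(const struct Ring* r)
{
    return &r->_._;
}
```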

> What should you do if there were a three-tier hier-
> archy -- ->_->_? For aught I know, this is allowed
> under the hood but not in user code.
>
>>> 2. It is usually unsound, however, to pass a
>>> point to a function intended for circles
>>> such as Circle_draw(): converting from a
>>> struct Point * to a struct Circle * is on-
>>> ly permissible if the point originally was
>>> a circle. We will refer to this as a
>>> down-cast from a superclass to a sub-
>>> class -- this requires explicit conver-
>>> sions or void * values, too, and it can
>>> only be done to pointers to objects that
>>> were in the subclass to begin with.
>>>
>> In other words, given
>>
>> pp = &c._;
>>
>> then (struct Circle*)pp is also allowed by
>> 6.7.2.1p15. However, given:
>>
>> pp - &p;

Correction: that should have been
pp = &p;

>> then (struct Circle*)pp would not be allowed.
>>
>> I don't see any problem with this approach. What
>> are your concerns about it?
>
> The problem is that the author uses the latter ex-
> pression. ...

I don't see any expression corresponding to that one in the code below:

> ... In the appendix "ANSI-C Programming
> Hints" he shows stark innocence of the aliasing lim-
> itations:
>
> The first component of a structure starts right
> at the beginning of the structure; therefore,
> structures can be lengthened or shortened:
>
> struct a { int x; };
> struct c { struct a a; ... } c, * cp = & c;
> struct a * ap = & c.a;

The code that I said above would not be allowed would be analogous to
the following:

struct a a;
ap = &a;
(struct c*)ap

There is no such code in the sample you've given me, and no indications
about whether or not the author is aware that such code would be
problematic.
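A sketch of the distinction drawn here (the `extra` member stands in for the elided "...", and `c_from_a` is a hypothetical helper): converting `ap` back to `struct c *` is fine when `ap` was obtained from `&c.a`, because a `struct c` really exists there. The same conversion applied to a standalone `struct a` would yield a pointer to an object that does not exist, so that case is left in a comment rather than executed.

```c
#include <assert.h>

struct a { int x; };
struct c { struct a a; int extra; };   /* 'extra' is a hypothetical stand-in */

/* Down-cast: only meaningful if ap points at the 'a' member of a struct c. */
static struct c* c_from_a(struct a* ap)
{
    return (struct c*)ap;
}

/* Not allowed (never executed here):
 *     struct a lone;
 *     c_from_a(&lone)->extra = 1;   // no struct c exists at &lone
 */
```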

> ANSI-C permits neither implicit conversions of
> pointers to different structures nor direct
> access to the components of an inner structure:
>
> cp -> a.x ok, fully specified
> ((struct a *) cp) -> x ok, explicit conversion (!!!)
>
> Or am missing something?

Possibly. Or maybe I'm missing something. I don't see any aliasing
problems in the above code. Could you please identify them?

Thiago Adams

Nov 30, 2016, 11:26:29 AM
Do you mean runtime-vtable-build?






supe...@casperkitty.com

Nov 30, 2016, 12:13:23 PM
On Wednesday, November 30, 2016 at 10:26:29 AM UTC-6, Thiago Adams wrote:
> On Tuesday, November 29, 2016 at 1:28:39 PM UTC-2, supercat wrote:
> > If things are added in a well-defined sequence, such that there will be an
> > entity that knows all the functions, then it will be possible to have code
> > which was compiled earlier pass its function and its size to the all-knowing
> > entity, and ask it to allocate and return a pointer to a table which includes
> > all functions the entity knows about. If the all-knowing entity knows of a
> > print function, but it's given a table which doesn't include it, it could
> > fill in the "print" slot with either a stub that returns an error, or with a
> > function that uses some supported method of rendering to generate an image
> > of the object and then uses its own logic to print that.
>
> Do you mean runtime-vtable-build?

In a sense, but many such systems also require runtime computation of
offsets. If functions are added to the library in a fixed universally-
agreed-upon sequence, such that given any two versions of the vtable,
one will be a subset of the other, all offsets will be known at compile
time. Consequently, there would be a one-time cost at system startup
but all code which used the vtables would run at full-speed [unlike some
dynamic-vtable-generation approaches where every function call would
require an extra variable fetch].

Actually, on many systems it would be possible to have a build step between
the compiler and linker which could look for symbols with certain magic
names and perform all the necessary adjustments at link time. For example,
if every "class" declared a const vtable whose name followed a certain
convention and an extern vtable whose name was related to the first, a
tool could search all object files for definitions of the first form and
define suitably-expanded vtables of the second form. The exact means by
which the utility would identify the contents of all the appropriate tables
would depend upon the system's object file format, of course.

Anton Shepelev

Dec 4, 2016, 10:48:01 AM
James R. Kuyper to Anton Shepelev:

> > > On page 34, he writes:
> > > struct Circle { const struct Point _; int rad;
> > > };
> > >
> > > Given
> > >
> > > struct Circle c, *pc=&c;
> > > struct Point p, *pp=&p;
> > >
> > > The explicit conversion he referred to is
> > > (const struct Point *)pc ,
> > > and it is allowed by 6.7.2.1p15.
> > >
> > > However, he's wrong about void* being the only
> > > other way to do this. &pc->_ is simpler and
> > > equivalent, and will continue to work even if
> > > some other field becomes the first member of
> > > struct Circle.
> >
> > In OOP this method is out of the question, be-
> > cause a member should be accessible immediately
> > from the object at hand, without knowing the in-
> > heritance level at which it has been introduced.
> > Whereas the ->_ in your example betrays the ob-
> > ject's internal structure and the assumption
> > that it is derived from Point. What should you
> > do if there were a three-tier hierar-
> > chy -- ->_->_? For aught I know, this is al-
> > lowed under the hood but not in user code...
>
> The cast makes the assumption that the first actu-
> al member of struct Circle is an object of type
> struct Point, or an array of struct Point -- ei-
> ther directly, or recursively because the first
> member has a struct type of which that is true.
> The use of ->_, on the other hand, makes the as-
> sumption that there is some member of struct Cir-
> cle, not necessarily the first one, with a name of
> _ and type of struct Point.

Right.

> Is making an assumption about the location of the
> struct Point member any better, or any worse, than
> making an assumption about the name of that mem-
> ber? That depends entirely upon which assumption
> is more likely to be true.

I would rather say it depends on one's chosen design,
because mixing both ways of access is likely a bad
idea anyway, and code should be structured to
guarantee the correctness of one of them.

> If you're certain that one assumption is valid,
> and are worried that the other one might not be,
> you should use the expression which relies upon
> the assumption you're certain of. Otherwise, it
> doesn't matter.

It's not what I am worried about. Rather, I doubt
that by-member access agrees with the concept of
OOP's hierarchical data structures, where members
are accessed in a uniform way regardless of the
object's line of ascent. For example, the syntax
for accessing the coordinates of a Circle should
not depend on the way Circle is derived, e.g.:

Object => Circle
(coordinates defined in Circle)

Object => Point => Circle
(coordinates defined in Point)

Object => RadialFigure => Point => Circle
(coordinates defined in RadialFigure)

If that rule is violated, changing the implementation
of Circle may break the code that uses it.
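A common way to get the uniform access described here, sketched with hypothetical names (`point_x`, `circle_x`, `circle_y`, `circle_rad`): expose accessor macros (or functions) for each class, and let only the class's own header know how deep the coordinates sit. If Circle is later re-derived, only the macro bodies change, not the calling code.

```c
#include <assert.h>

struct Point  { int x, y; };
struct Circle { struct Point _; int rad; };

/* Public accessors: callers never spell out '_' or the depth. */
#define point_x(p)    ((p)->x)
#define point_y(p)    ((p)->y)
#define circle_x(c)   point_x(&(c)->_)
#define circle_y(c)   point_y(&(c)->_)
#define circle_rad(c) ((c)->rad)
```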

> > In the appendix "ANSI-C Programming Hints" he
> > shows stark innocence of the aliasing limita-
> > tions:
> >
> > The first component of a structure starts
> > right at the beginning of the structure;
> > therefore, structures can be lengthened or
> > shortened:
> >
> > struct a { int x; };
> > struct c { struct a a; ... } c, * cp = & c;
> > struct a * ap = & c.a;
> >
> > ANSI-C permits neither implicit conversions
> > of pointers to different structures nor
> > direct access to the components of an inner
> > structure:
> >
> > cp -> a.x ok, fully specified
> > ((struct a *) cp) -> x ok, explicit conversion (!!!)
> >
> >
> > Or am I missing something?
>
> Possibly. Or maybe I'm missing something. I
> don't see any aliasing problems in the above code.
> Could you please identify them?

Ok. My initial concern was about aliasing, but
since conversion here is made in-place, it cannot be
a problem, because there is no second pointer vari-
able pointing to the same address. Would this:

ap = (struct a *) cp; /* alias to cp */
ap -> x = 10;

be wrong?

James R. Kuyper

Dec 4, 2016, 12:18:02 PM
On 12/04/2016 10:48 AM, Anton Shepelev wrote:
> James R. Kuyper to Anton Shepelev:
...
>> If you're certain that one assumption is valid,
>> and are worried that the other one might not be,
>> you should use the expression which relies upon
>> the assumption you're certain of. Otherwise, it
>> doesn't matter.
>
> It's not what I am worried about. Rather, I doubt
> that the by-member access agrees with the concept of
> OOP's hierarchical data structures, where the mem-
> bers are accessed in a uniform way indifferently of
> the object's line of ascent. For example, the syn-
> tax for accessing the coordinates of a Circle shall
> not depend on the way Circle is derived, e.g.:
>
> Object => Circle
> (coordinates defined in the Circle)
>
> Object => Point => Circle
> (coordinates defined in Point)
>
> Object => RadialFigure => Point => Circle
> (coordinates defined in RadialFigure)
>
> If that rule be violated, changing the implementa-
> tion of Circle may break the code that uses it.

You're talking about a convention used for writing object-oriented C
code; code that conforms to that convention should always use the access
method that is guaranteed, by use of that convention, to work.

I was giving the more general rule: C code doesn't have to follow that
convention, and if it doesn't, accessing the member by name might work
in situations where the cast wouldn't work.

> Would this:
>
> ap = (struct a *) cp; /* alias to cp */
> ap -> x = 10;
>
> be wrong?

No, because the result of that conversion is a pointer which is defined
as pointing at cp->a. That is an object with the declared type "struct
a", which is therefore also its effective type. Since the effective
type of the object exactly matches the type of the lvalue that is used
to access it, it is covered by the first item in 6.5p7: "a type
compatible with the effective type of the object". "Two types have
compatible type if their types are the same." (6.2.7p1)
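A sketch of the case analyzed above, using the thread's `struct a`/`struct c` (the `int y` member stands in for the elided "...", and `set_x_via_cast` is a hypothetical helper): the converted pointer designates `cp->a`, so writing through it is an ordinary access to that object and is visible through `c`. Because `cp == &c`, converting `&c` gives the identical pointer.

```c
#include <assert.h>

struct a { int x; };
struct c { struct a a; int y; };  /* 'y' is a hypothetical stand-in */

/* Writes through a pointer obtained by converting a struct c pointer. */
static void set_x_via_cast(struct c* cp, int value)
{
    struct a* ap = (struct a*)cp;  /* points at cp->a (6.7.2.1p15) */
    ap->x = value;                 /* int lvalue, int object: no aliasing issue */
}
```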

Anton Shepelev

Dec 4, 2016, 2:14:32 PM
Given the code:

struct a { int x; };
struct c { struct a a; ... } c, * cp = & c;
struct a * ap = & c.a;

James R. Kuyper to Anton Shepelev:

> > Would this:
> >
> > ap = (struct a *) cp; /* alias to cp */
> > ap -> x
> >
> > be wrong?
>
> No, because the result of that conversion is a
> pointer which is defined as pointing at cp->a.
> That is an object with the declared type "struct
> a", which is therefore also its effective type.
> Since the effective type of the object exactly
> matches the type of the lvalue that is used to ac-
> cess it, it is covered by the first item in 6.5p7:
> "a type compatible with the effective type of the
> object". "Two types have compatible type if their
> types are the same." (6.2.7p1)

Thanks. If the first field of a structure has the
same type as the pointer's target, then pointer
conversion works. Will I violate aliasing if I
replace cp with &c, i.e.:

cp = &c;
ap = (struct a *)&c; /* alias to cp */
ap -> x

James R. Kuyper

Dec 4, 2016, 3:06:37 PM

Why are you worried about the possibility of getting a different answer
in this case? Since cp == &c, (struct a*)cp and (struct a*)&c have
precisely the same behavior, and in particular, precisely the same lack
of aliasing problems.

Unless I've missed some important point (which is not impossible - it's
happened before), it seems to me that you must be seriously
misunderstanding something, but your questions have not yet revealed to
me what that misunderstanding is. If you could explain, in detail, why
you thought there might be a problem, I might be able to identify what
it is that you're misunderstanding. Alternatively, that explanation
might help me identify the "important point" that I've been missing.

Anton Shepelev

Dec 12, 2016, 2:31:59 PM

James R. Kuyper to Anton Shepelev:

> > struct a { int x; };
> > struct c { struct a a; ... } c, * cp = & c;
> > struct a * ap = & c.a;
> >
> > Thanks. If the first field of a stucture has the
> > same type as that of the pointer target then pointer
> > conversion works. Will I violate aliasing if I re-
> > place cp with &c, i.e.:
> >
> > cp = &c;
> > ap = (struct a *)&c; /* alias to cp */
> > ap -> x
>
> Why are you worried about the possibility of get-
> ting a different answer in this case? Since cp ==
> &c, (struct a*)cp and (struct a*)&c have precisely
> the same behavior, and in particular, precisely
> the same lack of aliasing problems.
>
> Unless I've missed some important point (which is
> not impossible - it's happened before), it seems
> to me that you must be seriously misunderstanding
> something, but your questions have not yet re-
> vealed to me what that misunderstanding is. If you
> could explain, in detail, why you thought there
> might be a problem, I might be able to identify
> what it is that you're misunderstanding. Alterna-
> tively, that explanation might help me identify
> the "important point" that I've been missing.

No, no, it is clearly I who lack understanding :-)

I am worried because C does not allow two pointers
with different target types to be used to access
the same location, except in the cases known as
the strict aliasing rules. Why can cp and ap alias
in the example above? Is it because of the
following item in 6.5p7:

An object shall have its stored value accessed
only by an lvalue expression that has one of the
following types:
[...]

-- an aggregate or union type that includes one
of the aforementioned types among its members
(including, recursively, a member of a
subaggregate or contained union),
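A small sketch of what the aggregate item means in practice, using a hypothetical function name: because struct a has an int among its members, a store through a struct a lvalue may change an int object, so a compiler has to re-read *p after writing *s rather than assume the two pointers refer to unrelated objects.

```c
#include <assert.h>

struct a { int x; };

/* p and s may refer to overlapping storage: s may point at a struct a
   whose x member is *p.  Under the aggregate item of 6.5p7 the store
   through *s may modify an int object, so *p must be re-read. */
int store_then_read(int *p, struct a *s)
{
    assert(p != 0 && s != 0);
    *p = 1;
    *s = (struct a){ 5 };   /* C99 compound literal */
    return *p;              /* 5 if p == &s->x, otherwise 1 */
}
```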

Chris M. Thomasson

Dec 12, 2016, 7:44:42 PM

I see. So, if we add a new Shape TypeN, we have to add a new entry here
in Shape_Print and add a new entry wrt the CAST macro, right? That's fine.
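A rough sketch of that kind of type-selection dispatch, with hypothetical names (ShapeTag, Box_Print, Circle_Print) since the original Shape_Print is not quoted here. Each shape embeds a tagged Shape as its first member, so converting the Shape pointer back to the containing struct relies on the first-member rule. Adding a shape means a new tag, a new case, and new casts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tags; a new shape needs a new tag and a new case below. */
enum ShapeTag { Box_ID = 1, Circle_ID = 2 };

typedef struct { enum ShapeTag Type; } Shape;
typedef struct { Shape base; double w, h; } Box;    /* base must be first */
typedef struct { Shape base; double r; } Circle;

static void Box_Print(const Box *p)
{
    printf("Box %g x %g\n", p->w, p->h);
}

static void Circle_Print(const Circle *p)
{
    printf("Circle r=%g\n", p->r);
}

/* Dispatch on the tag, then convert back to the containing struct.
   Valid because each Shape here is the first member of its object.
   Returns 1 if something was printed, 0 for an unknown tag. */
int Shape_Print(const Shape *p)
{
    assert(p != NULL);
    switch (p->Type) {
    case Box_ID:    Box_Print((const Box *)p);       return 1;
    case Circle_ID: Circle_Print((const Circle *)p); return 1;
    }
    return 0;
}
```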



>
> //I guess here is the controversial point of this topic
> //(this works for me)
>
> #define CAST(T1, T2) \
> inline T2* T1_##As##_T2(T1* p) { \
> return (p->Type == T2_##ID) ? (T2*)p : 0;\
> }\
> inline T1* T2_##As##_T1(T2* p) { return (T1*)p; }

Concat in pre-processor is fine.
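One detail in the quoted CAST macro: a macro parameter is replaced only when it appears as a separate token, so T1_##As##_T2 and T2_##ID paste the literal identifiers T1_As_T2 and T2_ID rather than the arguments, and every use of the macro would define the same two function names. The sketch below, with hypothetical Shape, Circle and Box types, pastes T1##_As_##T2 and T2##_ID instead, so CAST(Shape, Circle) produces Shape_As_Circle and Circle_As_Shape. It also uses static inline, because a plain inline definition in C99 is only an inline definition, and a call that is not inlined then needs an external definition elsewhere. Embedding Shape as the first member, rather than repeating a Type field in each struct, is an assumption made to keep the conversions within the first-member rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; each derived type embeds Shape as its first member. */
enum { Circle_ID = 1, Box_ID = 2 };

typedef struct { int Type; } Shape;
typedef struct { Shape base; double r; } Circle;
typedef struct { Shape base; double w, h; } Box;

/* Defines T1_As_T2 (checked down-cast, NULL on a tag mismatch) and
   T2_As_T1 (unchecked up-cast), e.g. Shape_As_Circle and Circle_As_Shape. */
#define CAST(T1, T2)                                      \
    static inline T2 *T1##_As_##T2(T1 *p) {               \
        assert(p != NULL);                                \
        return (p->Type == T2##_ID) ? (T2 *)p : NULL;     \
    }                                                     \
    static inline T1 *T2##_As_##T1(T2 *p) { return (T1 *)p; }

CAST(Shape, Circle)
CAST(Shape, Box)
```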

>
>
> ----
>
> How does it differ from the vtable style?
>
> Let's say I have Circle and Cat objects.
> The initial design was IObject, IAnimal, IShape.
> Now I have to add a new algorithm, Print.
> It was not in the initial design. Where do I add it now?



> Do I put it inside the IObject interface, or do I create a new interface IPrintable?
> The puzzle of interface definition begins, and it is not a localized/independent problem.
> To add Print to the IObject interface, I have to check whether it makes sense for all objects derived from IObject.
> Let's look at another case. I have IControl, which is an interface for widgets.
> Initially I put the method "Draw" in this interface because all of my objects derived from IControl had to be drawn.
> At some point, I have to add a new object similar to IControl that doesn't need to be drawn.
> Then I have to create a new interface called IDraw and move Draw from IControl to IDraw.
> When a new algorithm needs to be added, I may have to reorganize the interfaces and the types.
> The types now depend on many interfaces, adjusted globally to the needs of that program.
>
>
> With this type-selection approach you can solve the problem locally, at the collection point (you can think in small parts).

I agree with you. Keep in mind that my little interface code worked
great wrt my problem set. Think of testing a dozen mutex impls using a
common interface for a "mutex".
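For contrast, a minimal sketch of the vtable style: a table of function pointers that each implementation fills in, so generic code such as a test harness can drive any implementation through the same calls. The interface code referred to here is not shown in the thread, so every name below (mutex_vtable, mutex_iface, flag_mutex, with_lock) is hypothetical, and flag_mutex is a single-threaded stand-in, not a real lock.

```c
#include <assert.h>

/* Every implementation supplies the same table of operations. */
struct mutex_vtable {
    void (*lock)(void *self);
    void (*unlock)(void *self);
};

/* An interface value: the table plus the implementation's state. */
struct mutex_iface {
    const struct mutex_vtable *vt;
    void *self;
};

/* Single-threaded stand-in used only to exercise the interface. */
struct flag_mutex { int locked; int lock_count; };

static void flag_lock(void *self)
{
    struct flag_mutex *m = self;
    assert(!m->locked);     /* must not already be held */
    m->locked = 1;
    m->lock_count++;
}

static void flag_unlock(void *self)
{
    struct flag_mutex *m = self;
    assert(m->locked);
    m->locked = 0;
}

static const struct mutex_vtable flag_vtable = { flag_lock, flag_unlock };

/* Generic code written once against the interface. */
static void with_lock(struct mutex_iface m, int *counter)
{
    m.vt->lock(m.self);
    ++*counter;
    m.vt->unlock(m.self);
}
```

Testing another implementation would only mean writing its two functions and its table; with_lock stays unchanged.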


>
> For instance, for the IControl "Draw" problem I can adjust my for-each-draw to just ignore the new type that doesn't need to be drawn.
> Neither the new type nor the old ones need any changes.
>
>
> You can create a new specialized algorithm as well, without worrying about the interface relationships.
> You can have many kinds of Print algorithm, applied differently to different types, without worrying whether each one applies to all types or not.
> And you can reuse a type. If you don't need polymorphism, just remove one line from the beginning of your struct and you are done.
>
> But with this approach you cannot add a new type without changing the algorithm, nor add a new, unknown type to some collection ("plugins").

Yup. I have used the crude minimalist interface code for plugins. It
works well, and keeps a standard interface for said plugins.
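A sketch of the "for-each-draw" idea from the quoted text: the loop that walks the collection decides which types are drawn, so a Label type that is never drawn needs neither a Draw function nor an interface change. Control, Button, Label and DrawAll are hypothetical names, and each type embeds a tagged Control as its first member.

```c
#include <assert.h>
#include <stddef.h>

enum ControlTag { Button_ID = 1, Label_ID = 2 };

typedef struct { enum ControlTag Type; } Control;
typedef struct { Control base; int draw_count; } Button;
typedef struct { Control base; const char *text; } Label;

static void Button_Draw(Button *p)
{
    p->draw_count++;   /* stands in for real drawing */
}

/* Type selection at the collection point: only this loop knows which
   types get drawn.  Returns the number of controls drawn. */
size_t DrawAll(Control *items[], size_t n)
{
    size_t drawn = 0;

    for (size_t i = 0; i < n; i++) {
        assert(items[i] != NULL);
        switch (items[i]->Type) {
        case Button_ID:
            Button_Draw((Button *)items[i]);
            drawn++;
            break;
        case Label_ID:
            break;     /* labels are never drawn */
        }
    }
    return drawn;
}
```

The other side of the trade-off is visible too: a type the switch does not know about is silently skipped, which is why unknown "plugin" types do not fit this approach.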


>
> --
> What a language could do is to create the type selection for us.
>
> Imagine a new language:
>
> typedef (Box | Circle) Shape;
>
> void Func(Shape* p)
> {
> //we know that a shape can only be a Box or a Circle
> // "closed polymorphism"
>
> //please language make this dispatch for me
> Shape_Draw(p);
> }
>
> void Box_Draw(Box* p) {...}
> void Circle_Draw(Circle* p) {...}

Interesting.

:^)

Melzzzzz

Dec 12, 2016, 8:18:52 PM

Haskell:

data Shape = Circle Double | Rectangle Double Double | Square Double
    deriving (Show,Read)

perimeter (Circle r) = 2*r*pi
perimeter (Rectangle x y) = 2*(x+y)
perimeter (Square x) = 4*x

shapes = [Circle 2.4, Rectangle 3.1 4.4, Square 2.1]

main = do
    putStrLn $ show $ map perimeter shapes
    putStrLn $ show shapes
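For comparison, one way to express the same closed set of shapes in C is a tagged union, where the enum tag stands in for the Haskell data constructor and perimeter switches on it. This is a sketch, not a translation of any code posted in the thread; the names are hypothetical, and pi is written as a literal constant to avoid depending on the non-standard M_PI.

```c
#include <assert.h>
#include <stddef.h>

enum shape_kind { CIRCLE, RECTANGLE, SQUARE };

struct shape {
    enum shape_kind kind;          /* says which union member is valid */
    union {
        struct { double r; } circle;
        struct { double x, y; } rectangle;
        struct { double x; } square;
    } u;
};

static const double pi = 3.14159265358979323846;

double perimeter(const struct shape *s)
{
    assert(s != NULL);
    switch (s->kind) {
    case CIRCLE:    return 2.0 * s->u.circle.r * pi;
    case RECTANGLE: return 2.0 * (s->u.rectangle.x + s->u.rectangle.y);
    case SQUARE:    return 4.0 * s->u.square.x;
    }
    return 0.0;    /* not reached for a valid kind */
}
```

Unlike Haskell, C will not check that the tag and the union member read actually agree; that discipline is left to the programmer.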


--
press any key to continue or any other to quit...

Thiago Adams

Dec 14, 2016, 12:47:45 PM

The cast macro creates the cast functions. Because the language does not check any of the casts, these functions play a very important role: if the function exists, then the cast is valid.
IDE autocomplete also shows which casts are valid: when I type Shape_As_... it offers the options.



Thiago Adams

Dec 14, 2016, 1:15:05 PM

Something interesting is that, at some level, I have to iterate over the types myself to see what the possible casts are.

Some of the casts are just logical renaming.

For instance:

Shape = Circle | Box;

Animal = Cat | Dog;

Anything = Shape | Animal;


Anything, Shape and Animal have the same implementation (TypePtr).

But I can cast Anything_As_Shape.

I also can cast Anything to Box, because Anything can be Shape that can be Box.

I am considering creating a DSL to generate these casts for me.
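A hand-written sketch of the kind of casts such a DSL might generate, assuming (hypothetically) that TypePtr is a pointer to an enum tag stored as the first member of every concrete type. Anything, Shape and Animal are then all the same pointer type, Anything_As_Shape is a tag check plus a renaming, and Anything_As_Box is composed from Anything_As_Shape and Shape_As_Box. The enum, the tag values and the function bodies are invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one tag enum covering every concrete type. */
enum Type { Circle_ID = 1, Box_ID, Cat_ID, Dog_ID };

/* Anything, Shape and Animal share one representation: a pointer to the
   tag, which is the first member of every concrete type. */
typedef enum Type *TypePtr;
typedef TypePtr Anything;   /* Shape | Animal */
typedef TypePtr Shape;      /* Circle | Box   */
typedef TypePtr Animal;     /* Cat | Dog      */

typedef struct { enum Type Type; double w, h; } Box;

/* Anything -> Shape: a tag check plus a logical renaming. */
Shape Anything_As_Shape(Anything p)
{
    assert(p != NULL);
    return (*p == Circle_ID || *p == Box_ID) ? p : NULL;
}

/* Shape -> Box: the tag is the first member of a Box. */
Box *Shape_As_Box(Shape p)
{
    assert(p != NULL);
    return (*p == Box_ID) ? (Box *)p : NULL;
}

/* Anything -> Box: Anything can be a Shape, which can be a Box. */
Box *Anything_As_Box(Anything p)
{
    Shape s = Anything_As_Shape(p);
    return (s != NULL) ? Shape_As_Box(s) : NULL;
}
```

Because the three typedefs are the same type, the compiler cannot stop an Animal from being passed where a Shape is expected; only the tag checks catch it at run time.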