Re: [comp.lang.c.moderated] Does "sizeof array" equal "nel * sizeof

Ersek, Laszlo

Jan 13, 2010, 11:40:04 AM

From: Barry Schwarz <schw...@dqel.com>
Date: Tue, 12 Jan 2010 00:06:14 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

> On Mon, 11 Jan 2010 18:25:48 -0600 (CST), "Ersek, Laszlo"
> <la...@caesar.elte.hu> wrote:
>>
>> Here's my problem: does it hold that
>>
>> sizeof array == 10 * sizeof array[0]
>
> If the array is in scope, you can determine the number of elements
> with the expression sizeof array / sizeof *array. From this, you can
> determine that the answer to your question is yes.

Suppose:
- the element type of array is "int",
- the array has 10 elements,
- sizeof(int) == 4,
- sizeof array == 42 (*)

Now

sizeof array / sizeof array[0]
== 42 / 4
== 10

But

sizeof array != 10 * sizeof array[0]
42 != 10 * 4
42 != 40

I have no idea why this would be useful, but I can't see "(*)" explicitly
forbidden anywhere, nor can I derive that prohibition myself. And the blog
post (which I'm simply unable to find) suggested this was permitted. More
exactly, it suggested that the non-equivalence of the following two
statements was permitted:

(void)memcpy(&array[0], ..., 40);
(void)memcpy(&array, ..., 40);

(Yes, yes, I mean the ampersand in the second, and "array" is an array,
not a pointer.)
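A minimal sketch of how one might probe this on a given implementation, assuming a 10-element int array. The helper names are illustrative and not from the thread, and the result only reports what the compiler at hand does; it cannot settle what the standard permits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int array[10];

/* Nonzero if this implementation gives the array no extra bytes. */
int array_size_is_exact(void)
{
    return sizeof array == 10 * sizeof array[0];
}

/* Nonzero if &array and &array[0] designate the same first byte. */
int array_starts_at_first_element(void)
{
    return (unsigned char *)&array == (unsigned char *)&array[0];
}

/* Copy 10 ints through both pointer spellings and compare the results. */
int both_memcpy_forms_agree(const int *src)
{
    int via_element[10];
    int via_array[10];

    assert(src != NULL);
    (void)memcpy(&via_element[0], src, 10 * sizeof src[0]);
    (void)memcpy(&via_array, src, 10 * sizeof src[0]);
    return memcmp(via_element, via_array, 10 * sizeof src[0]) == 0;
}
```

If all three helpers return nonzero, the two memcpy() statements above behave identically on that implementation.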

Furthermore (I apologize for repeating myself), Chris Torek's article [0]
states:

"if you convert these pointers [ie. &array and &array[0])] to `byte
addresses' and print them out with a %p directive in printf(), all three
are QUITE LIKELY [emph. mine] to produce the same output (on a TYPICAL
MODERN [emph. mine] computer"

That "quite likely" makes me uncomfortable.

Thank you,
lacos

[0] http://web.torek.net/torek/c/pa.html
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

Ersek, Laszlo

Jan 13, 2010, 11:40:17 AM

From: Francis Glassborow <francis.g...@btinternet.com>
Date: Tue, 12 Jan 2010 21:59:33 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

>> On Mon, 11 Jan 2010 18:25:48 -0600 (CST), "Ersek, Laszlo"
>> <la...@caesar.elte.hu> wrote:
>>>

>>> (char unsigned *)&array == (char unsigned *)&array[0]
>>>
>>> (Provided the comparison via the equality operator is defined at all, C89
>>> 6.3.8, 6.3.9 -- "If the objects pointed to are not members of the same
>>> aggregate or union object, the result is undefined [...]".)
>
> I do not have a copy of C89 to hand but I am pretty certain that that is
> not about the (in)equality operators. It is a standard way to determine
> if two things are the same object by comparing their addresses for
> equality. Indeed if you could not do that how could you check aliasing?

I probably had a bit of a problem with reading comprehension.

1) 6.3.9 "Equality operators" says "Where the operands have types and
values suitable for the relational operators, the semantics detailed in
6.3.8 apply".

2) I happily moved over to 6.3.8 "Relational operators" and my quote above
is from there. The condition of the quoted statement *does* hold, because
the pointed-to objects are not *members* of the same aggregate (array)
object:

- &array points to the whole array (has type "int (*)[10]"),
- &array[0] points to the first element (has type "int *").

Whether the condition remains true after converting both pointers to (char
unsigned *) depends exactly on my question. If they are equal, then the
condition ceases to hold and they can be compared with relational
operators (and will compare equal, ie. both <= and >= will hold). If they
are not equal after the conversion, then the condition stays true and they
cannot even be compared.

--o--

This was my original thought process. However, moving over to 6.3.8
"Relational operators" and never returning to 6.3.9 "Equality operators"
was wrong on my part, because the latter goes on to say effectively what
you say at the top. In my understanding, after the conversion to (char
unsigned *) both pointers are comparable with ==.

I apologize if I make no sense.

Thank you,
lacos

Francis Glassborow

Jan 13, 2010, 5:01:41 PM

Either I am misreading you or you are mistaken. There are two sets of
operators under consideration here:

== and != (comparison for equality)
< <= > >= (comparison for ordering)

The requirement that both pointers point into the same object applies only
to the second set. == and != can be applied to any pair of pointers of the
same type (if for some reason you want to compare pointers of different
types, just convert them to void *).
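The distinction can be sketched as two small helpers. The names are hypothetical; the first accepts any two object pointers, while the second deliberately takes one array and two indices so that both operands of < always point into the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Equality is defined for any two object pointers once both have been
   converted to a common type such as void *. */
int same_address(const void *a, const void *b)
{
    return a == b;
}

/* Ordering is only defined when both pointers point into (or one past)
   the same array object, so this helper builds both from one base. */
int comes_before(const int *base, size_t i, size_t j)
{
    assert(base != NULL);
    return &base[i] < &base[j];
}
```

For example, same_address(&array, &array[0]) is a valid call for an array of any element type, because both arguments convert implicitly to const void *.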

Clive D. W. Feather

Jan 13, 2010, 5:02:02 PM

In message <clcm-2010...@plethora.net>, "Ersek, Laszlo"

<la...@caesar.elte.hu> wrote:
>>> Here's my problem: does it hold that
>>>
>>> sizeof array == 10 * sizeof array[0]

>Suppose:
>- the element type of array is "int",
>- the array has 10 elements,
>- sizeof(int) == 4,
>- sizeof array == 42 (*)

>I have no idea why this would be useful, but I can't see "(*)"
>explicitly forbidden anywhere, nor can I derive that prohibition myself.

(1) The Standard only allows padding in certain places, and arrays
aren't one of them.

(2) An array is a contiguously allocated nonempty set of objects
(6.2.5#20) - again, no mention of padding.

(3) If you calculate (array + 10) you get a pointer to beyond the end of
the array, not to somewhere within it.
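Point (3) can be checked empirically by measuring two byte distances: from the first element to one past the last element, and from the start of the array object to one past the whole object. This is only a sketch with made-up helper names; if the two distances differ on some implementation, that implementation pads its arrays.

```c
#include <assert.h>
#include <stddef.h>

static int array[10];

/* Byte distance between two pointers into the same object. */
static size_t byte_distance(const void *from, const void *to)
{
    const char *a = from;
    const char *b = to;

    assert(a <= b);
    return (size_t)(b - a);
}

/* From the first element to one past the last element. */
size_t bytes_to_end_of_elements(void)
{
    return byte_distance(array, array + 10);
}

/* From the start of the array object to one past the whole object. */
size_t bytes_to_end_of_object(void)
{
    return byte_distance(&array, &array + 1);
}
```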

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>

Keith Thompson

Jan 13, 2010, 5:02:18 PM

"Ersek, Laszlo" <la...@caesar.elte.hu> writes:
> From: Barry Schwarz <schw...@dqel.com>

>> On Mon, 11 Jan 2010 18:25:48 -0600 (CST), "Ersek, Laszlo"
>> <la...@caesar.elte.hu> wrote:
>>>
>>> Here's my problem: does it hold that
>>>
>>> sizeof array == 10 * sizeof array[0]
>>
>> If the array is in scope, you can determine the number of elements
>> with the expression sizeof array / sizeof *array. From this, you can
>> determine that the answer to your question is yes.
>
> Suppose:
> - the element type of array is "int",
> - the array has 10 elements,
> - sizeof(int) == 4,
> - sizeof array == 42 (*)
>
> Now
>
> sizeof array / sizeof array[0]
> == 42 / 4
> == 10
>
> But
>
> sizeof array != 10 * sizeof array[0]
> 42 != 10 * 4
> 42 != 40
>
> I have no idea why this would be useful, but I can't see "(*)"
> explicitly forbidden anywhere, nor can I derive that prohibition
> myself.

Fascinating. I've always assumed that the size of an array must be
exactly N times the size of an element, but I can't find a guarantee
in the standard either.

C99 6.5.3.4p3 says:

When applied to an operand that has array type, the result is the
total number of bytes in the array.

p6 (which is non-normative) says:

EXAMPLE 2 Another use of the sizeof operator is to compute the
number of elements in an array:

sizeof array / sizeof array[0]

If padding in arrays (presumably at the end) is permitted, then in
order for the example to be valid, the amount of padding allowed would
have to be less than the size of a single element; for example an
array of 4-byte elements could have at most 3 bytes of padding. That
strikes me as a strange and arbitrary requirement, and one that's not
stated anywhere.

Allowing padding in arrays would also affect the representation of
two-dimensional arrays. The individual elements in the second row of
a 2-D array must still be properly aligned. So if the padding must be
less than the element size, there can be no padding for arrays of
elements that are strictly aligned (e.g., if a 4-byte int requires
4-byte alignment).

It's much simpler to assume that the size of an array is exactly N
times the size of an element; the expression in the example then
follows straightforwardly from that.
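The two-dimensional case can be probed with a sketch like the following, assuming a 3 x 4 array of int. The names ROWS, COLS, grid and both helpers are illustrative. It checks the sizes at both levels and whether each row begins exactly where the previous row's elements end.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };
static int grid[ROWS][COLS];

/* Nonzero if a row is exactly COLS ints and the grid exactly ROWS rows. */
int rows_have_exact_size(void)
{
    return sizeof grid[0] == COLS * sizeof grid[0][0]
        && sizeof grid == ROWS * sizeof grid[0];
}

/* Nonzero if row r + 1 starts one past the last element of row r. */
int next_row_is_adjacent(size_t r)
{
    assert(r + 1 < (size_t)ROWS);
    return grid[r] + COLS == &grid[r + 1][0];
}
```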

Perhaps the authors of the standard thought this was so obvious it
didn't need to be stated. Or perhaps it's stated somewhere and we
both missed it.

Other paragraphs that don't quite state this are:

C99 6.2.5p20:

An array type describes a contiguously allocated nonempty set
of objects with a particular member object type, called the
element type. Array types are characterized by their element
type and by the number of elements in the array. An array type
is said to be derived from its element type, and if its element
type is T, the array type is sometimes called ‘‘array of
T’’. The construction of an array type from an element type
is called ‘‘array type derivation’’.

C99 6.2.6, Representations of types, doesn't mention arrays.

> And the blog post (which I'm simply unable to find) suggested
> this was permitted. More exactly, it suggested that the
> non-equivalence of the following two statements was permitted:
>
> (void)memcpy(&array[0], ..., 40);
> (void)memcpy(&array, ..., 40);
>
> (Yes, yes, I mean the ampersand in the second, and "array" is an
> array, not a pointer.)
>
> Furthermore (I apologize for repeating myself), Chris Torek's article
> [0] states:
>
> "if you convert these pointers [ie. &array and &array[0])] to `byte
> addresses' and print them out with a %p directive in printf(), all
> three are QUITE LIKELY [emph. mine] to produce the same output (on a
> TYPICAL MODERN [emph. mine] computer"
>
> That "quite likely" makes me uncomfortable.

I think that's independent of the question of whether arrays can have
padding.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

Jan 14, 2010, 3:20:22 AM

"Clive D. W. Feather" <cl...@davros.org> writes:
> In message <clcm-2010...@plethora.net>, "Ersek, Laszlo"
> <la...@caesar.elte.hu> wrote:
>>>> Here's my problem: does it hold that
>>>>
>>>> sizeof array == 10 * sizeof array[0]
>
>>Suppose:
>>- the element type of array is "int",
>>- the array has 10 elements,
>>- sizeof(int) == 4,
>>- sizeof array == 42 (*)
>
>> I have no idea why this would be useful, but I can't see "(*)"
>> explicitly forbidden anywhere, nor can I derive that prohibition
>> myself.
>
> (1) The Standard only allows padding in certain places, and arrays
> aren't one of them.

But does it *disallow* padding? Do the places in the standard that
permit padding state that it's not allowed elsewhere?

Not saying that arrays can have padding isn't the same as saying that
arrays can't have padding.

> (2) An array is a contiguously allocated nonempty set of objects
> (6.2.5#20) - again, no mention of padding.

A structure is a sequentially allocated nonempty set of member objects
(6.2.5#21) - again, no mention of padding, at least not in that place.

"Contiguously allocated" implies that there can't be padding between
elements; I'm trying to figure out whether padding at the end of an
array is permitted. I'm reasonably sure it wasn't intended to be, but
I can't find an actual prohibition.

> (3) If you calculate (array + 10) you get a pointer to beyond the end of
> the array, not to somewhere within it.

That's not what the standard says. C99 6.5.6p8:

Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last
element of the array object,
[...]

It could point to the trailing padding (which is not an element).

Please note that I'm not arguing that arrays can have padding, or that
they should be able to have padding. The point of contention is
whether the standard actually says they can't.


Ersek, Laszlo

Jan 14, 2010, 3:21:01 AM

From: "Clive D. W. Feather" <cl...@davros.org>
Date: Wed, 13 Jan 2010 16:02:02 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

> In message <clcm-2010...@plethora.net>, "Ersek, Laszlo"
> <la...@caesar.elte.hu> wrote:
>
>> - the element type of array is "int",
>> - the array has 10 elements,
>> - sizeof(int) == 4,
>> - sizeof array == 42 (*)
>

> (1) The Standard only allows padding in certain places, and arrays
> aren't one of them.
>

> (2) An array is a contiguously allocated nonempty set of objects
> (6.2.5#20) - again, no mention of padding.

Thank you! Same wording in C89 6.1.2.5 Types ("contiguously allocated
nonempty set of objects"). I guess I should start reading the standard
like a novel sometime. (A very dense novel.)

> (3) If you calculate (array + 10) you get a pointer to beyond the end of
> the array, not to somewhere within it.

A (fictive) padding starting at *(char unsigned*)&array and ending before
*(char unsigned*)&array[0], that is, a leading padding, would leave this
statement intact and still be a padding, but please ignore this paragraph
of mine. I'm happy with (1) and (2).

Thank you,
lacos

Ersek, Laszlo

Jan 14, 2010, 3:21:14 AM

From: Keith Thompson <ks...@mib.org>
Date: Wed, 13 Jan 2010 16:02:18 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>>
>> sizeof array / sizeof array[0]
>> == 42 / 4
>> == 10
>>
>> But
>>
>> sizeof array != 10 * sizeof array[0]
>> 42 != 10 * 4
>> 42 != 40
>>
>

> Fascinating.

I'm not sure this is a compliment, but thank you anyway :)

> C99 6.5.3.4p3 says:
>
> When applied to an operand that has array type, the result is the
> total number of bytes in the array.

Well, if I wasn't a presumable OCPD case, I guess I would have gotten my
answer by accepting just the next sentence there:

When applied to an operand that has structure or union type, the
result is the total number of bytes in such an object, including
internal and trailing padding.

Adjacent phrases. The one on arrays doesn't mention padding, the one on
structs and unions does. Same in C89 6.3.3.4. <blushes>

(Leading padding for structures and unions is explicitly forbidden in C99
6.7.2.1 p13-14, and trailing padding is allowed in p15. Similar wording
can be found towards the end of C89 6.5.2.1.)
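For contrast with arrays, here is a small sketch of the structure rules just cited, using a hypothetical struct sample. The first member must sit at offset 0, while the total size may exceed the sum of the member sizes; how many padding bytes there are, if any, is up to the implementation.

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    int  i;
    char c;
};

/* The first member sits at offset 0: no leading padding is allowed. */
int has_no_leading_padding(void)
{
    return offsetof(struct sample, i) == 0;
}

/* Bytes of struct sample occupied by neither i nor c, that is,
   internal plus trailing padding on this implementation. */
size_t padding_bytes(void)
{
    assert(sizeof(struct sample) >= sizeof(int) + sizeof(char));
    return sizeof(struct sample) - sizeof(int) - sizeof(char);
}
```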

> Other paragraphs that don't quite state this are:
>
> C99 6.2.5p20:
>

> An array type describes a contiguously allocated nonempty set
> of objects

FWIW, this sort of convinced me.

Thanks!

Ersek, Laszlo

Jan 14, 2010, 3:21:27 AM

From: Francis Glassborow <francis.g...@btinternet.com>
Date: Wed, 13 Jan 2010 16:01:41 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

> == and != (comparison for equality)
> < <= > >= (comparison for ordering)
>
> The requirement that the pointer be into the same object only applies to
> the second set. == != can be applied to any pair of pointers of the same
> type (if for some reason you want to compare pointers of different types
> just convert them to void*).

Sure.

1. Trust intuition, disperse a few instances of undefined behavior in the
code, have a few nasty surprises (only by reading about them, because
today's permissive mainstream platforms sometimes don't catch you -- see
the struct hack in C89, or volatile with threads, or accessing a 2D array
through a flat pointer etc).

2. Distrust all intuition, run to the standard and c.l.c.m at every second
LOC and doubt.

3. Learn the standard by heart, no need for intuition.

Take a wild guess at which stage I'm in :)

Perhaps stage 1 is a happier form of existence. I used to see job
advertisements with code snippets going "what does this code do?" Some
people can't tell, some people can tell and are proud, and some people are
frustrated because they can't tell what the task is: pointing out the
multitude of undefined behavior in those five lines of code (and still one
is not sure about having noticed *everything*), or pointing out the
intended behavior of the code. I never tried to verify.

Since I figure c.l.c.m is generally in stage 3, I feel free to spam you
(after trying to do my homework). I hope you don't mind.

Cheers,

Barry Schwarz

Jan 14, 2010, 12:17:49 PM

On Wed, 13 Jan 2010 10:40:04 -0600 (CST), "Ersek, Laszlo"
<la...@caesar.elte.hu> wrote:

Section 6.5.3.4-3 implies that padding is restricted to unions and
structures. Or at least that sizeof will not notice padding except
for operands of those types.

Sections 6.5.6-7 and 6.5.8-4 say that given

    int i;
    int *p = &i;

p can be treated as if i were defined

    int i[1];

Since your argument is that there could be padding before the first
element (not between elements), this says it won't happen for arrays
with one element.
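A short sketch of that rule, with a hypothetical function name: a pointer to a single int may be advanced by one, exactly as if it pointed to the first element of a one-element array, and the difference between the two pointers is one element.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements between &i and one past i. */
ptrdiff_t span_of_single_object(void)
{
    int i = 0;
    int *p = &i;
    int *end = p + 1;   /* valid: i behaves like int i[1] here */

    assert(end != p);
    return end - p;
}
```

Only the pointer value is formed here; dereferencing end would be undefined.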

Since the output of %p is implementation specific, there is no
requirement that two calls to printf using the same operand produce
the same output. I think the emphasized text was leaving the option
for a perverse compiler to produce different output.

--
Remove del for email

Tim Rentsch

Jan 14, 2010, 12:18:29 PM

"Ersek, Laszlo" <la...@caesar.elte.hu> writes:

> From: Francis Glassborow <francis.g...@btinternet.com>
> Date: Tue, 12 Jan 2010 21:59:33 -0600 (CST)
> Message-ID: <clcm-2010...@plethora.net>
>
>>> On Mon, 11 Jan 2010 18:25:48 -0600 (CST), "Ersek, Laszlo"
>>> <la...@caesar.elte.hu> wrote:
>>>>
>>>> (char unsigned *)&array == (char unsigned *)&array[0]
>>>>
>>>> (Provided the comparison via the equality operator is defined at all, C89
>>>> 6.3.8, 6.3.9 -- "If the objects pointed to are not members of the same
>>>> aggregate or union object, the result is undefined [...]".)
>>
>> I do not have a copy of C89 to hand but I am pretty certain that that is
>> not about the (in)equality operators. It is a standard way to determine
>> if two things are the same object by comparing their addresses for
>> equality. Indeed if you could not do that how could you check aliasing?
>
> I probably had a bit of a problem with reading comprehension.
>
> 1) 6.3.9 "Equality operators" says "Where the operands have types and
> values suitable for the relational operators, the semantics detailed
> in 6.3.8 apply".
>
> 2) I happily moved over to 6.3.8 "Relational operators" and my quote
> above is from there. The condition of the quoted statement *does*
> hold, because the pointed-to objects are not *members* of the same
> aggregate (array) object:
>
> - &array points to the whole array (has type "int (*)[10]"),
> - &array[0] points to the first element (has type "int *").

These two pointers (suitably converted) must compare
equal because one points to an object and the other
points to a subobject at the beginning of the other
object.

Francis Glassborow

Jan 14, 2010, 12:19:08 PM

Keith Thompson wrote:
> That's not what the standard says. C99 6.5.6p8:
>
> Moreover, if the expression P points to the last element of an
> array object, the expression (P)+1 points one past the last
> element of the array object,
> [...]
>
> It could point to the trailing padding (which is not an element).
>
> Please note that I'm not arguing that arrays can have padding, or that
> they should be able to have padding. The point of contention is
> whether the standard actually says they can't.

I think that somewhere, either explicitly or implicitly (Clive probably
can identify where) it is required that

sizeof(array of n Ts) == n*sizeof T;

Tim Rentsch

Jan 14, 2010, 12:18:42 PM

Keith Thompson <ks...@mib.org> writes:

> in the standard either. [...]

I raised basically this same question in comp.std.c about
five years ago. A query

rentsch assertion language

in google groups should find the thread fairly easily.

Tim Rentsch

Jan 14, 2010, 12:18:15 PM

"Ersek, Laszlo" <la...@caesar.elte.hu> writes:

> [snip]

>
> Furthermore (I apologize for repeating myself), Chris Torek's article
> [0] states:
>
> "if you convert these pointers [ie. &array and &array[0])] to `byte
> addresses' and print them out with a %p directive in printf(), all
> three are QUITE LIKELY [emph. mine] to produce the same output (on a
> TYPICAL MODERN [emph. mine] computer"
>
> That "quite likely" makes me uncomfortable.

The explanation is, two pointers can compare equal and yet have
different representations under a %p directive. Also most
machines have only one representation for all kinds of pointers,
but some machines have different representations for pointers of
different types (which still could point to the same object if,
eg, converted to a (void*)). That's why it's only "quite likely"
that these different equal pointers will print out the same way.
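To see the difference between "compare equal" and "print alike" on a particular system, something like the sketch below could be used. The helper names and the 64-byte buffers are assumptions; the first check is expected to hold everywhere, while the second merely reports what %p does on the implementation at hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int array[10];

/* Format p with %p into buf; nonzero on success without truncation. */
int format_pointer(char *buf, size_t n, const void *p)
{
    int len;

    assert(buf != NULL && n > 0);
    len = snprintf(buf, n, "%p", (void *)p);
    return len >= 0 && (size_t)len < n;
}

/* Nonzero if &array and &array[0] compare equal as void *. */
int pointers_compare_equal(void)
{
    return (void *)&array == (void *)&array[0];
}

/* Nonzero if %p happens to render both pointers identically here. */
int pointers_print_alike(void)
{
    char a[64];
    char b[64];

    if (!format_pointer(a, sizeof a, &array)
        || !format_pointer(b, sizeof b, &array[0]))
        return 0;
    return strcmp(a, b) == 0;
}
```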

Ersek, Laszlo

Jan 14, 2010, 6:33:40 PM

From: =?ISO-8859-1?Q?Hans-Bernhard_Br=F6ker?= <HBBr...@t-online.de>
Date: Thu, 14 Jan 2010 02:20:35 -0600 (CST)
Message-ID: <clcm-2010...@plethora.net>

> Ersek, Laszlo wrote:

>> Namely, if such a padding is in fact allowed, then
>>
>> void f(const int *p)
>> {
>>     (void)memcpy(array, p, sizeof array);
>> }
>>
>> is wrong, and the "sizeof array" expression should be replaced by
>> "sizeof array / sizeof array[0] * sizeof array[0]".
>
> The fact that you're dividing, then multiplying, by the same number,
> should already be sufficient indication that this can't be right.

The multiplicative operators (division included) have the same precedence
and are left-associative. Furthermore, the division operator applied to
two positive integer operands returns the floor of the mathematical
result. (I won't look up the exact wording of the standard, but that
should be the gist of the case in question.)

The C expression

sizeof array / sizeof array[0] * sizeof array[0]

is equivalent to

(sizeof array / sizeof array[0]) * sizeof array[0]

and computes the mathematical value of

floor(sizeof array / sizeof array[0]) * sizeof array[0]

For example, in C

42 / 10 * 10 == (42 / 10) * 10 == 4 * 10 == 40

The question was exactly whether the numerator MUST be an integral multiple
of the denominator. (In which case floor maps to identity.)
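The divide-then-multiply idiom can be written as a small function. The name is illustrative; with the hypothetical sizes from this thread (42 bytes total, 4 bytes per element) it yields 40.

```c
#include <assert.h>
#include <stddef.h>

/* Largest multiple of elem_size that does not exceed total_size.
   Relies on unsigned integer division truncating toward zero. */
size_t round_down_to_multiple(size_t total_size, size_t elem_size)
{
    assert(elem_size != 0);
    return total_size / elem_size * elem_size;
}
```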

--o--

Rounding A up to the smallest integral multiple of B that is not less than
A is sometimes written as

(A + (B - 1)) / B * B

Because

       A        |     A / B      | (A + (B - 1)) / B
                | quot | rem     | quot   | rem
----------------+------+---------+--------+--------
x * B           | x    | 0       | x      | B - 1
x * B + 1       | x    | 1       | x + 1  | 0
...             | x    | ...     | x + 1  | ...
x * B + (B - 1) | x    | B - 1   | x + 1  | B - 2

(Closed form congruences would look better.)
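As a function, the rounding-up expression looks like the sketch below. The name is illustrative, and the addition is assumed not to wrap around, which the caller has to ensure for values near the top of the size_t range.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of b that is not less than a.
   Assumes a + (b - 1) does not exceed the maximum size_t value. */
size_t round_up_to_multiple(size_t a, size_t b)
{
    assert(b != 0);
    return (a + (b - 1)) / b * b;
}
```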

--o--

When allocating memory for an array via malloc(), I like to check if the
expression computing the size_t argument contains no overflow. (I'm not
sure if calloc() is required to do this internally.) Let SIZE_MAX denote
(size_t)-1, ie. the greatest number representable as size_t.

size_t nmemb, size;

are the input parameters and their product must be checked against
overflow before actually multiplying them. I'll assume they are both
positive. I'll try to mark each expression below with (M) if it's a
mathematical expression and with (C) if it's a C expression. The product
can be represented in C iff

(1) (M) nmemb * size <= SIZE_MAX

One comes to think that the C condition

(2) (C) nmemb <= SIZE_MAX / size

is trivially equivalent to the mathematical fact under (1). It wasn't
trivial for me. (2) is equivalent to

(3) (M) nmemb <= floor(SIZE_MAX / size)

Claim: (1) iff (3) -- thus (1) iff (2).

"Proof":

a) (3) implies (1):

(3) (M) nmemb <= floor(SIZE_MAX / size)

There uniquely exists an integer T in [0, size-1] so that

(4) (M) floor(SIZE_MAX / size) = (SIZE_MAX - T) / size

Putting (4) into (3),

(5) (M) nmemb <= (SIZE_MAX - T) / size

Multiplying by size,

(6) (M) nmemb * size <= SIZE_MAX - T

T is non-negative,

(7) (M) nmemb * size <= SIZE_MAX - T <= SIZE_MAX

Which yields (1).

b) !(3) implies !(1):

Reformulating !(3) results in

(8) (M) nmemb > floor(SIZE_MAX / size)

Since both sides are integers, this is equivalent to

(9) (M) nmemb >= floor(SIZE_MAX / size) + 1

Reusing the definition of T from under (4),

(10) (M) nmemb >= (SIZE_MAX - T) / size + 1

Moving +1 in between the parentheses,

(11) (M) nmemb >= (SIZE_MAX - T + size) / size

Multiplying by size,

(12) (M) nmemb * size >= SIZE_MAX + (size - T)

(size - T) is positive, see T's definition:

(13) (M) T <= size - 1

(14) (M) 1 <= size - T

Thus, from (12),

(15) (M) nmemb * size >= SIZE_MAX + (size - T) > SIZE_MAX

Which is !(1).

Therefore,

#include <stdlib.h>

void *
myalloc(size_t nmemb, size_t size)
{
    return
        (0u == size || nmemb <= (size_t)-1 / size)
        ? malloc(nmemb * size)
        : 0;
}

(If passing 0 to malloc() is allowed and setting errno to ENOMEM is not
required.)

Simply checking if the (possibly truncated) product is not less than each
of the factors is not enough; if size_t is an unsigned integer type with
32 value bits, then 0xAAAAAAAA * 0x3 tricks that check.
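The counterexample can be reproduced independently of the platform's size_t by using uint32_t, as in this sketch (function names are made up for illustration). The naive test accepts 0xAAAAAAAA * 3 because the wrapped product 0xFFFFFFFE is not less than either factor, while the division test rejects it.

```c
#include <assert.h>
#include <stdint.h>

/* Naive test: accept if the wrapped 32-bit product is not less than
   either factor. */
int naive_check_accepts(uint32_t nmemb, uint32_t size)
{
    uint32_t wrapped = (uint32_t)((uint64_t)nmemb * size);

    return wrapped >= nmemb && wrapped >= size;
}

/* Division test: accept only if the true product fits in 32 bits. */
int division_check_accepts(uint32_t nmemb, uint32_t size)
{
    return size == 0 || nmemb <= UINT32_MAX / size;
}

/* Product of two factors that the division test has accepted. */
uint32_t checked_product(uint32_t nmemb, uint32_t size)
{
    assert(division_check_accepts(nmemb, size));
    return (uint32_t)((uint64_t)nmemb * size);
}
```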

But I digress. I'm sure this is all well known or even trivial, I just
felt like fooling around a bit, sorry.

Cheers,
lacos

Francis Glassborow

Jan 14, 2010, 6:33:53 PM

Tim Rentsch wrote:
> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>
>> [snip]
>>
>> Furthermore (I apologize for repeating myself), Chris Torek's article
>> [0] states:
>>
>> "if you convert these pointers [ie. &array and &array[0])] to `byte
>> addresses' and print them out with a %p directive in printf(), all
>> three are QUITE LIKELY [emph. mine] to produce the same output (on a
>> TYPICAL MODERN [emph. mine] computer"
>>
>> That "quite likely" makes me uncomfortable.
>
> The explanation is, two pointers can compare equal and yet have
> different representations under a %p directive. Also most
> machines have only one representation for all kinds of pointers,
> but some machines have different representations for pointers of
> different types (which still could point to the same object if,
> eg, converted to a (void*)). That's why it's only "quite likely"
> that these different equal pointers will print out the same way.

It is perfectly possible for two distinct bit-patterns to represent the
same address. However in such a case they must compare equal. However, I
am not sure that %p requires that such different bit-patterns should
result in identical output.

For those of you who doubt my assertion think of the way that windows
on early machines used a base and offset to represent addresses. Using
different bases could (and did) result in different offsets but when
pointers were compared they were reduced to some canonical form.

Keith Thompson

Jan 14, 2010, 6:34:19 PM

Francis Glassborow <francis.g...@btinternet.com> writes:
> Keith Thompson wrote:
>> That's not what the standard says. C99 6.5.6p8:
>>
>> Moreover, if the expression P points to the last element of an
>> array object, the expression (P)+1 points one past the last
>> element of the array object,
>> [...]
>>
>> It could point to the trailing padding (which is not an element).
>>
>> Please note that I'm not arguing that arrays can have padding, or that
>> they should be able to have padding. The point of contention is
>> whether the standard actually says they can't.
> I think that somewhere, either explicitly or implicitly (Clive
> probably can identify where) it is required that
>
> sizeof(array of n Ts) == n*sizeof T;

I eagerly await a citation. Nobody was able to offer one when this
was raised in comp.std.c in 2005.


Keith Thompson

Jan 14, 2010, 6:34:06 PM

Found it. The subject was "sizeof(T[N]) == (N) * sizeof(T) ?";
the original article was posted 2005-03-28, Message-ID
<kfnll87...@alumnus.caltech.edu>. I even posted 3 followups
in the thread myself.

Several participants seemed to feel that C99 6.2.5p20:

An _array type_ describes a contiguously allocated nonempty set of
objects with a particular member object type, called the _element
type_.

settles the question. I disagree, particularly in view of p21:

A _structure type_ describes a sequentially allocated nonempty set
of member objects (and, in certain circumstances, an incomplete
array), each of which has an optionally specified name and
possibly distinct type.

Neither paragraph says whether padding at the end is allowed or not.
It clearly is allowed for structures (this is stated explicitly
in C99 6.7.2.1p13). And given C99 6.2.6.1p1:

The representations of all types are unspecified except as stated
in this subclause.

I'm not comfortable concluding from the lack of explicit permission
for array types to have padding that padding is forbidden.

I'm convinced that it's the intent that arrays *cannot* have padding,
and that the authors of the standard implicitly assumed this;
see the example of the sizeof operator in C99 6.5.3.4p6.

But if I saw a compiler that added padding to the end of an array
(which must be smaller than the element size, and must allow for
proper element alignment in 2-dimensional arrays), I'm not at all
sure what wording from the standard I could cite to prove that
it's non-conforming.

Another little glitch:

C99 6.2.6.1p1 says:
The representations of all types are unspecified except as stated
in this subclause.

But C99 6.7.2.1p13 says:
There may be unnamed padding within a structure object, but not at
its beginning.


Keith Thompson

Jan 14, 2010, 11:41:24 PM

Francis Glassborow <francis.g...@btinternet.com> writes:
[...]

> It is perfectly possible for two distinct bit-patterns to represent
> the same address. However in such a case they must compare
> equal.

Agreed.

> However, I am not sure that %p requires that such different
> bit-patterns should result in identical output.

Hmm.

C99 7.19.6.1p8, describing the "%p" conversion specifier, says:

The argument shall be a pointer to void. The value of the pointer
is converted to a sequence of printing characters, in an
implementation-defined manner.

One plausible reading of that is that the sequence depends only on
the *value* of the pointer, not on its representation. (If two
different pointer representations compare equal, they represent
the same value.)

> For those of you who doubt my assertion think of the way that windows
> on early machines used a base and offset to represent addresses. Using
> different bases could (and did) result in different offsets but when
> pointers were compared they were reduced to some canonical form.

I see no real harm in having fprintf("%p", ...) produce different
results for different representations of the same pointer value, as
long as the results returned by fscanf("%p", ...) compare equal.
But one could argue that this would violate the standard.

If so, it shouldn't be too much of a burden for "%p" to canonicalize
its argument before printing it.
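The round trip through "%p" can be sketched with snprintf() and sscanf() standing in for fprintf() and fscanf(). The function name and the 128-byte buffer are assumptions, and null pointers are left out of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Print p with %p, read the text back with %p, and report whether the
   recovered pointer compares equal to the original. */
int pointer_survives_round_trip(void *p)
{
    char text[128];
    void *back = NULL;
    int len;

    assert(p != NULL);
    len = snprintf(text, sizeof text, "%p", p);
    if (len < 0 || (size_t)len >= sizeof text)
        return 0;   /* formatting failed or output was truncated */
    if (sscanf(text, "%p", &back) != 1)
        return 0;   /* text could not be parsed back */
    return back == p;
}
```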


Francis Glassborow

Jan 15, 2010, 12:15:43 PM

Keith Thompson wrote:

>
> Found it. The subject was "sizeof(T[N]) == (N) * sizeof(T) ?";
> the original article was posted 2005-03-28, Message-ID
> <kfnll87...@alumnus.caltech.edu>. I even posted 3 followups
> in the thread myself.
>
> Several participants seemed to feel that C99 6.2.5p20:
>
> An _array type_ describes a contiguously allocated nonempty set of
> objects with a particular member object type, called the _element
> type_.
>
> settles the question. I disagree, particularly in view of p21:
>
> A _structure type_ describes a sequentially allocated nonempty set
> of member objects (and, in certain circumstances, an incomplete
> array), each of which has an optionally specified name and
> possibly distinct type.

Note that the first uses contiguously and the second sequentially. I
think we all agree that an array cannot have any internal padding.

So what about at the end? Well, we have already argued that any such
padding must be smaller than the size of an element.
However, how does this work when we have a 2D array? The constituent 1D
arrays are the elements of the 2D array and these must be contiguous,
not even a single byte between successive 1D array elements.
I wonder if that is enough to deduce that there cannot be any padding at
the end of an array?

In addition any implementation that chose to add padding (assuming that
the Standard actually does not forbid it) would break reams of code (for
example code where a 2D array is walked as a 1D array).

Keith Thompson

Jan 15, 2010, 2:00:35 PM

Francis Glassborow <francis.g...@btinternet.com> writes:
> Keith Thompson wrote:
>> Found it. The subject was "sizeof(T[N]) == (N) * sizeof(T) ?";
>> the original article was posted 2005-03-28, Message-ID
>> <kfnll87...@alumnus.caltech.edu>. I even posted 3 followups
>> in the thread myself.
>>
>> Several participants seemed to feel that C99 6.2.5p20:
>>
>> An _array type_ describes a contiguously allocated nonempty set of
>> objects with a particular member object type, called the _element
>> type_.
>>
>> settles the question. I disagree, particularly in view of p21:
>>
>> A _structure type_ describes a sequentially allocated nonempty set
>> of member objects (and, in certain circumstances, an incomplete
>> array), each of which has an optionally specified name and
>> possibly distinct type.
>
> Note that the first uses contiguously and the second sequentially. I
> think we all agree that an array cannot have any internal padding.
>
> So what about at the end? Well we have already argued that any such
> padding must be less than the sizeof an element.

Yes, but only based on a (non-normative) example.

> However how does this work when we have a 2D array. The constituent 1D
> arrays are the elements of the 2D array and these must be contiguous,
> not even a single byte between successive 1D array elements.
> I wonder if that is enough to deduce that there cannot be any padding
> at the end of an array?

No.

Suppose sizeof (int) == 4, alignof (int) == 2 (assume an "alignof"
keyword with the obvious semantics), and sizeof (int[10]) == 42 (the
latter's legality is the point of contention).

Given:
    int arr[10][10];
each element of arr is of type int[10]. Each such element occupies
42 bytes, consisting of 40 bytes of int sub-elements plus 2 bytes
of trailing padding. The int[10] elements of arr are allocated
contiguously. The second element starts 42 bytes after the first,
and sizeof (arr) is 420 plus the size of any trailing padding.

> In addition any implementation that chose to add padding (assuming
> that the Standard actually does not forbid it) would break reams of
> code (for example code where a 2D array is walked as a 1D array).

Yes. But it's been argued that such code is already non-conforming,
because it attempts to access an array (the 1D sub-array) beyond
its bounds.

If there were any examples of such code in the standard, it would
argue (though only non-normatively) that it's intended to be
conforming and that arrays may not have trailing padding, but I
don't think there is.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Tim Rentsch

Jan 15, 2010, 11:06:23 PM

Keith Thompson <ks...@mib.org> writes:

It's hard to understand the point of this complaint. (Forgive me
for using a loaded word like complaint but it was the best I
could think of.) You think the meaning is clear even though it
isn't stated directly enough? I know of no language in the
Standard that suggests that arrays may have padding at the end.
Conversely, in all cases I can think of, what evidence there is
seems to imply that arrays do not have padding. I know about the
"unspecified except as stated in this subclause" phrase, but the
contents of arrays _are_ specified: a contiguously allocated
nonempty set of objects. Structures can have padding, but there
is an explicit statement to that effect; why would such a
statement be there for structures and not for arrays? Also,
padding bytes in structures take unspecified values when a member
is stored -- if arrays can have padding bytes, why isn't there
some statement about what happens to the padding bytes when an
array member is stored? Or, here's another idea -- if arrays
can have padding bytes, then why can't they also have floating
point exception flags?

I should apologize here, because I don't really mean this as
sarcastically as it probably sounds. I _think_ all you're saying
is that the Standard's meaning is clear but the actual text
doesn't state this requirement directly enough or explicitly
enough. Is that right? If so then I wouldn't presume to argue
with that statement. (Of course I might not agree with it but I
still wouldn't presume to argue with it.)

> Another little glitch:
>
> C99 6.2.6.1p1 says:
> The representations of all types are unspecified except as stated
> in this subclause.
>
> But C99 6.7.2.1p13 says:
> There may be unnamed padding within a structure object, but not at
> its beginning.

Sadly the entry in 6.7.2.1p13 is one of several places outside
of 6.2.6 that discuss various aspects of how different types
are represented.

Clive D. W. Feather

Jan 20, 2010, 4:26:17 PM

In message <clcm-2010...@plethora.net>, Keith Thompson
<ks...@mib.org> wrote:
>> (1) The Standard only allows padding in certain places, and arrays
>> aren't one of them.
>
>But does it *disallow* padding? Do the places in the standard that
>permit padding state that it's not allowed elsewhere?
>
>Not saying that arrays can have padding isn't the same as saying that
>arrays can't have padding.

WG14 has always taken the approach, when interpreting the Standard, that
when X is described as permitted in situations Y and Z, this implies
that it is not permitted in any other situation.

I was told that this is part of the ISO rules for interpreting
standards, but I can't provide you with any evidence of that.

>> (2) An array is a contiguously allocated nonempty set of objects
>> (6.2.5#20) - again, no mention of padding.
>
>A structure is a sequentially allocated nonempty set of member objects
>(6.2.5#21) - again, no mention of padding, at least not in that place.
>
>"Contiguously allocated" implies that there can't be padding between
>elements; I'm trying to figure out whether padding at the end of an
>array is permitted.

If it was, I'm sure we would have written "contiguously allocated
nonempty set of objects, plus potential padding", or something like
that.

>> (3) If you calculate (array + 10) you get a pointer to beyond the end of
>> the array, not to somewhere within it.
>
>That's not what the standard says. C99 6.5.6p8:
>
> Moreover, if the expression P points to the last element of an
> array object, the expression (P)+1 points one past the last
> element of the array object,
> [...]
>
>It could point to the trailing padding (which is not an element).

True.

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>

Clive D. W. Feather

Jan 20, 2010, 4:26:30 PM

In message <clcm-2010...@plethora.net>, Keith Thompson
<ks...@mib.org> wrote:

>Another little glitch:
>
>C99 6.2.6.1p1 says:
> The representations of all types are unspecified except as stated
> in this subclause.
>
>But C99 6.7.2.1p13 says:
> There may be unnamed padding within a structure object, but not at
> its beginning.

Since I wrote 6.2.6.1, I have to take the blame for that one.

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>

Keith Thompson

Jan 20, 2010, 5:03:15 PM

"Clive D. W. Feather" <cl...@davros.org> writes:

> In message <clcm-2010...@plethora.net>, Keith Thompson
> <ks...@mib.org> wrote:
>>> (1) The Standard only allows padding in certain places, and arrays
>>> aren't one of them.
>>
>>But does it *disallow* padding? Do the places in the standard that
>>permit padding state that it's not allowed elsewhere?
>>
>>Not saying that arrays can have padding isn't the same as saying that
>>arrays can't have padding.
>
> WG14 has always taken the approach, when interpreting the Standard,
> that when X is described as permitted in situations Y and Z, this
> implies that it is not permitted in any other situation.
>
> I was told that this is part of the ISO rules for interpreting
> standards, but I can't provide you with any evidence of that.

[...]

That would be well worth mentioning somewhere in the standard, even if
only in a non-normative foreword or introduction.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

Jan 20, 2010, 11:08:04 PM

Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:

[big snip]

>> I'm convinced that it's the intent that arrays *cannot* have padding,
>> and that the authors of the standard implicitly assumed this;
>> see the example of the sizeof operator in C99 6.5.3.4p6.
>>
>> But if I saw a compiler that added padding to the end of an array
>> (which must be smaller than the element size, and must allow for
>> proper element alignment in 2-dimensional arrays), I'm not at all
>> sure what wording from the standard I could cite to prove that
>> it's non-conforming.
>
> It's hard to understand the point of this complaint. (Forgive me
> for using a loaded word like complaint but it was the best I
> could think of.) You think the meaning is clear even though it
> isn't stated directly enough?

Something like that.

It's fairly obvious that a lot of things in the standard are
based on the implicit assumption that arrays may not have padding.
But there's apparently no direct statement that arrays may have
padding. It seems obvious, but it's not actually stated.

In Ada, for example, you can easily have a record (structure) type
whose size is not a multiple of its required alignment. Given the
Ada equivalent of:
    struct s {
        int i;
        char c;
    };
you might have a 4-byte alignment requirement for both int and
struct s, but sizeof(struct s) could be 5. An array of struct s
would then require 3 bytes of padding after each element.

This isn't possible in C, and we've already established that C
can't have padding *between* array elements (which means that,
in this case, the padding has to be in the structure). The point,
though, is the idea of padding in arrays isn't inherently absurd.

An analogy: Imagine if C99 6.5.6p5:

The result of the binary + operator is the sum of the operands.

were deleted. There are plenty of other things in the standard that
are based on the assumption that "+" yields the sum of its operands,
and it would be unreasonable for an implementer to have it do
something else. But I would certainly complain if it were never
stated.

> I know of no language in the
> Standard that suggests that arrays may have padding at the end.
> Conversely, in all cases I can think of, what evidence is there
> seems to imply that arrays do not have padding. I know about the
> "unspecified except as stated in this subclause" phrase, but the
> contents of arrays _are_ specified: a contiguously allocated
> nonempty set of objects. Structures can have padding, but there
> is an explicit statement to that effect; why would such a
> statement be there for structures and not for arrays? Also,
> padding bytes in structures take unspecified values when a member
> is stored -- if arrays can have padding bytes, why isn't there
> some statement about what happens to the padding bytes when an
> array member is stored?

Yes, all of this demonstrates that the authors of the standard
*assumed* that arrays can't have padding. I don't dispute that.

> Or, here's another idea -- if arrays
> can have padding bytes, then why can't they also have floating
> point exception flags?
>
> I should apologize here, because I don't really mean this as
> sarcastically as it probably sounds. I _think_ all you're saying
> is that the Standard's meaning is clear but the actual text
> doesn't state this requirement directly enough or explicitly
> enough. Is that right? If so then I wouldn't presume to argue
> with that statement. (Of course I might not agree with it but I
> still wouldn't presume to argue with it.)

My complaint isn't that the standard "doesn't state this requirement
directly enough or explicitly enough". It's that it doesn't state it
at all.

[snip]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Tim Rentsch

Jan 21, 2010, 12:02:48 PM

"Clive D. W. Feather" <cl...@davros.org> writes:

> In message <clcm-2010...@plethora.net>, Keith Thompson
> <ks...@mib.org> wrote:
>>> (1) The Standard only allows padding in certain places, and arrays
>>> aren't one of them.
>>
>>But does it *disallow* padding? Do the places in the standard that
>>permit padding state that it's not allowed elsewhere?
>>
>>Not saying that arrays can have padding isn't the same as saying that
>>arrays can't have padding.
>
> WG14 has always taken the approach, when interpreting the Standard,
> that when X is described as permitted in situations Y and Z, this
> implies that it is not permitted in any other situation.

Is it also true that when X is described as _not_ permitted in
some particular circumstance(s), this implies that it
is permitted in circumstances that don't specifically
prohibit it? (Sorry I don't have an example handy.)

Tim Rentsch

Jan 21, 2010, 12:02:35 PM

Keith Thompson <ks...@mib.org> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Keith Thompson <ks...@mib.org> writes:
> [big snip]
>>> I'm convinced that it's the intent that arrays *cannot* have padding,
>>> and that the authors of the standard implicitly assumed this;
>>> see the example of the sizeof operator in C99 6.5.3.4p6.
>>>
>>> But if I saw a compiler that added padding to the end of an array
>>> (which must be smaller than the element size, and must allow for
>>> proper element alignment in 2-dimensional arrays), I'm not at all
>>> sure what wording from the standard I could cite to prove that
>>> it's non-conforming.
>>
>> It's hard to understand the point of this complaint. (Forgive me
>> for using a loaded word like complaint but it was the best I
>> could think of.) You think the meaning is clear even though it
>> isn't stated directly enough?
>
> Something like that.
>
> It's fairly obvious that a lot of things in the standard are
> based on the implicit assumption that arrays may not have padding.
> But there's apparently no direct statement that arrays may have
> padding. [snip]

I assume you meant there's no direct statement that arrays may
NOT have padding. Certainly there is no direct statement
that arrays may HAVE padding.

>> I know of no language in the
>> Standard that suggests that arrays may have padding at the end.
>> Conversely, in all cases I can think of, what evidence is there
>> seems to imply that arrays do not have padding. I know about the
>> "unspecified except as stated in this subclause" phrase, but the
>> contents of arrays _are_ specified: a contiguously allocated
>> nonempty set of objects. Structures can have padding, but there
>> is an explicit statement to that effect; why would such a
>> statement be there for structures and not for arrays? Also,
>> padding bytes in structures take unspecified values when a member
>> is stored -- if arrays can have padding bytes, why isn't there
>> some statement about what happens to the padding bytes when an
>> array member is stored?
>
> Yes, all of this demonstrates that the authors of the standard
> *assumed* that arrays can't have padding. I don't dispute that.

IMO it demonstrates that the standard's authors believe the
standard _requires_ arrays can't have padding. Not that
they just assumed it.

>> Or, here's another idea -- if arrays
>> can have padding bytes, then why can't they also have floating
>> point exception flags?
>>
>> I should apologize here, because I don't really mean this as
>> sarcastically as it probably sounds. I _think_ all you're saying
>> is that the Standard's meaning is clear but the actual text
>> doesn't state this requirement directly enough or explicitly
>> enough. Is that right? If so then I wouldn't presume to argue
>> with that statement. (Of course I might not agree with it but I
>> still wouldn't presume to argue with it.)
>
> My complaint isn't that the standard "doesn't state this requirement
> directly enough or explicitly enough". It's that it doesn't state it
> at all.

In my reading it does.

Keith Thompson

Jan 21, 2010, 3:01:40 PM

Yes, that's what I meant.

[snip]

>> Yes, all of this demonstrates that the authors of the standard
>> *assumed* that arrays can't have padding. I don't dispute that.
>
> IMO it demonstrates that the standard's authors believe the
> standard _requires_ arrays can't have padding. Not that
> they just assumed it.

Or it demonstrates that they thought it was so obvious that it didn't
need to be stated. If so, I disagree.

[...]

>> My complaint isn't that the standard "doesn't state this requirement
>> directly enough or explicitly enough". It's that it doesn't state it
>> at all.
>
> In my reading it does.

I suggest that the fact that we're having this discussion implies
that, at the very least, the standard would be clearer if it stated
this explicitly.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Tim Rentsch

Jan 21, 2010, 4:27:49 PM

Keith Thompson <ks...@mib.org> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Keith Thompson <ks...@mib.org> writes:
> [...]
>>> My complaint isn't that the standard "doesn't state this requirement
>>> directly enough or explicitly enough". It's that it doesn't state it
>>> at all.
>>
>> In my reading it does.
>
> I suggest that the fact that we're having this discussion implies
> that, at the very least, the standard would be clearer if it stated
> this explicitly.

I absolutely agree on that.

blytkerchan

Jan 23, 2010, 12:09:56 AM

On Jan 14, 6:34 pm, Keith Thompson <ks...@mib.org> wrote:
> Tim Rentsch <t...@alumni.caltech.edu> writes:
> > Keith Thompson <ks...@mib.org> writes:
>
> >> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:

> >>> From: Barry Schwarz <schwa...@dqel.com>
> >>>> On Mon, 11 Jan 2010 18:25:48 -0600 (CST), "Ersek, Laszlo"
> >>>> <la...@caesar.elte.hu> wrote:
>
> >>>>> Here's my problem: does it hold that
>
> >>>>> sizeof array == 10 * sizeof array[0]
>
> >>>> If the array is in scope, you can determine the number of elements
> >>>> with the expression sizeof array / sizeof *array. From this, you can
> >>>> determine that the answer to your question is yes.
>
> >>> Suppose:
> >>> - the element type of array is "int",
> >>> - the array has 10 elements,
> >>> - sizeof(int) == 4,
> >>> - sizeof array == 42 (*)
>
> >>> Now
>
> >>> sizeof array / sizeof array[0]
> >>> == 42 / 4
> >>> == 10
>
> >>> But
>
> >>> sizeof array != 10 * sizeof array[0]
> >>> 42 != 10 * 4
> >>> 42 != 40
>
> >>> I have no idea why this would be useful, but I can't see "(*)"
> >>> explicitly forbidden anywhere, nor can I derive that prohibition
> >>> myself.
>
> >> Fascinating. I've always assumed that the size of an array must be
> >> exactly N times the size of an element, but I can't find a guarantee
> >> in the standard either. [...]
>
> > I raised basically this same question in comp.std.c about
> > five years ago. A query
>
> > rentsch assertion language
>
> > in google groups should find the thread fairly easily.
>
> Found it. The subject was "sizeof(T[N]) == (N) * sizeof(T) ?";
> the original article was posted 2005-03-28, Message-ID
> <kfnll871jtq....@alumnus.caltech.edu>. I even posted 3 followups
> in the thread myself.
>
> Several participants seemed to feel that C99 6.2.5p20:
>
> An _array type_ describes a contiguously allocated nonempty set of
> objects with a particular member object type, called the _element
> type_.
>
> settles the question. I disagree, particularly in view of p21:
>
> A _structure type_ describes a sequentially allocated nonempty set
> of member objects (and, in certain circumstances, an incomplete
> array), each of which has an optionally specified name and
> possibly distinct type.
>
> Neither paragraph says whether padding at the end is allowed or not.
> It clearly is allowed for structures (this is stated explicitly
> in C99 6.7.2.1p13). And given C99 6.2.6.1p1:
>
> The representations of all types are unspecified except as stated
> in this subclause.
>
> I'm not comfortable concluding from the lack of explicit permission
> for array types to have padding that padding is forbidden.
>
> I'm convinced that it's the intent that arrays *cannot* have padding,
> and that the authors of the standard implicitly assumed this;
> see the example of the sizeof operator in C99 6.5.3.4p6.
>
> But if I saw a compiler that added padding to the end of an array
> (which must be smaller than the element size, and must allow for
> proper element alignment in 2-dimensional arrays), I'm not at all
> sure what wording from the standard I could cite to prove that
> it's non-conforming.

Sorry for butting in so late in the discussion - I'm (quite) a bit
behind on my reading. However, I'd like to offer a look from a
different angle: IIRC the Cray-1 architecture, which has "address
registers" into which pointers are loaded, generates a fault if an
address is loaded that doesn't correspond to a valid memory location.
If my memory isn't playing tricks on me, then a C standard that
prohibits padding at the end of an array would either make it
impossible to use the one-past-the-end (conventional) way of finding
the end of an array, or would make it impossible to write a reasonably
useful but standard-compliant C compiler for Cray-1.

The pointer to one-past-the-end should be valid, but not dereferenceable
- i.e.
    T a[10];
    T * a_end = a + 10;
should be valid code, but dereferencing a_end results in undefined
behavior. On an architecture where pointers must point to real memory,
this would require at least one byte (word, maybe?) of padding behind
the array.

I don't know if this also means that sizeof should report the extra
padding. Normally, I'd say it should, but if it does, and T is char,
it would report sizeof a as 11, which would strike me as odd and would
break sizeof a / sizeof a[0].

Just a little disclaimer: I've been looking for documentation that
confirms my memory on Cray-1, but I haven't found anything detailed
enough, so this is basically conjecture at this point.

Ronald Landheer-Cieslak

--
Software Development Professional
http://landheer-cieslak.com
http://vlinder.ca

Keith Thompson

Jan 23, 2010, 4:37:22 AM

blytkerchan <blytk...@gmail.com> writes:
> On Jan 14, 6:34 pm, Keith Thompson <ks...@mib.org> wrote:

[...]

>> I'm not comfortable concluding from the lack of explicit permission
>> forarraytypes to havepaddingthat theypaddingis forbidden.
>>
>> I'm convinced that it's the intent that arrays *cannot* havepadding,
>> and that the authors of the standard implicitly assumed this;
>> see the example of the sizeof operator in C99 6.5.3.4p6.
>>
>> But if I saw a compiler that addedpaddingto theendof anarray
>> (which must be smaller than the element size, and must allow for
>> proper element alignment in 2-dimensional arrays), I'm not at all
>> sure what wording from the standard I could cite to prove that
>> it's non-conforming.

There really were spaces between all the words. I've seen Google
Groups mess up this way before. I wonder what causes it.

Anyway ...

> Sorry for butting in so late in the discussion - I'm (quite) a bit
> behind on my reading. However, I'd like to offer a look from a
> different angle: IIRC the Cray-1 architecture, which has "address
> registers" into which pointers are loaded, generates a fault if an
> address is loaded that doesn't correspond to a valid memory location.
> If my memory isn't playing tricks on me, then a C standard that
> prohibits padding at the end of an array would either make it
> impossible to use the one-past-the-end (conventional) way of finding
> the end of an array, or would make it impossible to write a reasonably
> useful but standard-compliant C compiler for Cray-1.

It wouldn't be that much of a problem. All that would be necessary is
for each full object (meaning each object that isn't part of a larger
object) to have a single extra byte (word?) at the end. A full object
is either an explicitly declared named object, or an object created by
malloc, calloc, or realloc.

For a 2-dimensional array, for example, you wouldn't need the extra
byte at the end of each row; a pointer just past the end of the first
row points to the beginning of the second row.

And even this would apply only to an object at the very tail end of
the range of valid memory locations. The compiler could just refuse
to allocate an object there, restricting the address space usable by C
from, say, 1048576 words to 1048575 words.

[snip]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper

Jan 23, 2010, 5:22:04 PM

blytkerchan wrote:
...

> Sorry for butting in so late in the discussion - I'm (quite) a bit
> behind on my reading. However, I'd like to offer a look from a
> different angle: IIRC the Cray-1 architecture, which has "address
> registers" into which pointers are loaded, generates a fault if an
> address is loaded that doesn't correspond to a valid memory location.
> If my memory isn't playing tricks on me, then a C standard that
> prohibits padding at the end of an array would either make it
> impossible to use the one-past-the-end (conventional) way of finding
> the end of an array, or would make it impossible to write a reasonably
> useful but standard-compliant C compiler for Cray-1.
>
> The pointer to one-past-the-end should be valid, but not referencible
> - i.e.
> T a[10];
> T * a_end = a + 10;
> should be valid code, but dereferencing a_end results in undefined
> behavior. On an architecture where pointers must point to real memory,
> this would require at least one byte (word, maybe?) of padding behind
> the array.

I think that you're assuming that each array is a separate block of
valid memory locations. I know nothing about that particular machine,
but I do know that similar issues are dealt with on other machines by
allocating large blocks of memory, and then subdividing those blocks
into the various objects needed by the program. As long as an array is
never allocated too close to the end of the large block of memory,
there's no hardware problem with the one-past-the-end pointers.

> I don't know if this also means that sizeof should report the extra
> padding.

No, it should not. The extra padding is an implementation detail which
is not part of the C object itself.

Francis Glassborow

Jan 23, 2010, 5:21:50 PM

Keith Thompson wrote:

>> Sorry for butting in so late in the discussion - I'm (quite) a bit
>> behind on my reading. However, I'd like to offer a look from a
>> different angle: IIRC the Cray-1 architecture, which has "address
>> registers" into which pointers are loaded, generates a fault if an
>> address is loaded that doesn't correspond to a valid memory location.
>> If my memory isn't playing tricks on me, then a C standard that
>> prohibits padding at the end of an array would either make it
>> impossible to use the one-past-the-end (conventional) way of finding
>> the end of an array, or would make it impossible to write a reasonably
>> useful but standard-compliant C compiler for Cray-1.
>
> It wouldn't be that much of a problem. All that would be necessary is
> for each full object (meaning each object that isn't part of a larger
> object) to have a single extra byte (word?) at the end. A full object
> is either an explicitly declared named object, or an object created by
> malloc, calloc, or realloc.

We do not have to go that far. All that is required is that the address
one beyond the end of an array is owned by the program. In the case of
static and locals that is not generally a problem. In the case of
dynamic objects it means that the malloc family (as part of the
suitably aligned requirement?) must ensure that there is always a spare
byte at the end of any allocation. For example when the programmer
requests n bytes malloc allocates n+1 starting from a suitable alignment
boundary.

>
> For a 2-dimensional array, for example, you wouldn't need the extra
> byte at the end of each row; a pointer just past the end of the first
> row points to the beginning of the second row.
>
> And even this would apply only to an object at the very tail end of
> the range of valid memory locations. The compiler could just refuse
> to allocate an object there, restricting the address space usable by C
> from, say, 1048576 words to 1048575 words.
>
> [snip]
>

--
Note that robinton.demon.co.uk addresses are no longer valid.

Pierre Asselin

Jan 30, 2010, 3:13:30 PM

Keith Thompson <ks...@mib.org> wrote:

> Francis Glassborow <francis.g...@btinternet.com> writes:
> > In addition any implementation that chose to add padding (assuming
> > that the Standard actually does not forbid it) would break reams of
> > code (for example code where a 2D array is walked as a 1D array).

The ability to switch between 1-D and n-D arrays is *absolutely essential*
to numerical computing.

> Yes. But it's been argued that such code is already non-conforming,
> because it attempts to access an array (the 1D sub-array) beyond
> its bounds.

> If there were any examples of such code in the standard, it would
> argue (though only non-normatively) that it's intended to be
> conforming and that arrays may not have trailing padding, but I
> don't think there is.

There is an example in the Rationale (5.10) that comes close.
About section 6.7.5.3, Function declarators:

    void g(double *ap, int n)
    {
        double (*a)[n] = (double (*)[n]) ap;
        /* ... */ a[1][2] /* ... */
    }

    /* ... */

    {
        double x[10][10];
        g(&x[0][0], 10);
    }

I say "comes close" because "x" is declared "double x[10][10]" as
opposed to "double x[10*10]". The example is type-punning a 2-D
array to 1-D and back, as opposed to type-punning a true 1-D array.
The example would still work with padding but the whole thing would
look really contrived. (If that was the intent the example would
be better written with (void *ap) or (unsigned char *ap) instead
of double.)

The Rationale mentions applicability to numerical computing a few
times. I know the committee is not always of one mind, but it
seems that some members expect the straightforward memory layout.
That the normative standard allows padding looks like a [minor]
bug.
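
A quick way to see what a given implementation does is to compute the "extra" bytes directly. This is only a sketch with made-up function names. It reports what one compiler does and says nothing about what the Standard guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes by which a double[10] exceeds ten doubles on this implementation. */
size_t row_padding_bytes(void)
{
    assert(sizeof(double[10]) >= 10 * sizeof(double));
    return sizeof(double[10]) - 10 * sizeof(double);
}

/* Same question for a 10x10 matrix against one hundred doubles. */
size_t matrix_padding_bytes(void)
{
    assert(sizeof(double[10][10]) >= 100 * sizeof(double));
    return sizeof(double[10][10]) - 100 * sizeof(double);
}
```

A result of zero from both functions means the implementation uses the layout that numerical code relies on.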

Keith Thompson

Jan 30, 2010, 11:50:49 PM

p...@see.signature.invalid (Pierre Asselin) writes:
> Keith Thompson <ks...@mib.org> wrote:
>
>> Francis Glassborow <francis.g...@btinternet.com> writes:
>> > In addition any implementation that chose to add padding (assuming
>> > that the Standard actually does not forbid it) would break reams of
>> > code (for example code where a 2D array is walked as a 1D array).
>
> The ability to switch between 1-D and n-D arrays is *absolutely essential*
> to numerical computing.
>
>> Yes. But it's been argued that such code is already non-conforming,
>> because it attempts to access an array (the 1D sub-array) beyond
>> its bounds.

[snip]

I agree that there are good reasons for allowing safe type-punning
between 1-D and 2-D (and N-D) arrays. But consider this:

    #include <stdio.h>

    static double matrix[10][10];

    static void foo(double (*p)[10])
    {
        (*p)[10] = 1.25;
    }

    int main(void)
    {
        foo(&matrix[0]);
        printf("matrix[1][0] = %g\n", matrix[1][0]);
        return 0;
    }

Is the code generated for foo() not allowed to perform bounds
checking on references to *p, which is of type double[10]?
What if the argument passed to foo() were a pointer to a standalone
double[10] array rather than to an element of a double[10][10] array?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Hans-Bernhard Bröker

Jan 31, 2010, 12:47:46 PM

Keith Thompson wrote:
> #include <stdio.h>
>
> static double matrix[10][10];
>
> static void foo(double (*p)[10])
> {
> (*p)[10] = 1.25;

This is plainly wrong, but for reasons that bear no relevance to the
issue under discussion.

This example just misses whatever point is still being discussed here.

Keith Thompson

Feb 1, 2010, 12:16:33 AM

Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
> Keith Thompson wrote:
>> #include <stdio.h>
>>
>> static double matrix[10][10];
>>
>> static void foo(double (*p)[10])
>> {
>> (*p)[10] = 1.25;
>
> This is plainly wrong, but for reasons that bear no relevance to the
> issue under discussion.
>
> This example just misses whatever point is still being discussed here.

Would you care to expand on that? It's entirely possible that
you're right and I'm wrong, but just saying so with no explanation
is hardly helpful.

Seebs

Feb 1, 2010, 12:36:49 PM

On 2010-02-01, Keith Thompson <ks...@mib.org> wrote:
> Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
>> Keith Thompson wrote:
>>> static void foo(double (*p)[10])
>>> {
>>> (*p)[10] = 1.25;

>> This is plainly wrong, but for reasons that bear no relevance to the
>> issue under discussion.

> Would you care to expand on that? It's entirely possible that
> you're right and I'm wrong, but just saying so with no explanation
> is hardly helpful.

type x[10];

x[10] = (value of type);

Looks like an array overflow to me.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Jasen Betts

Feb 1, 2010, 12:37:02 PM

On 2010-02-01, Keith Thompson <ks...@mib.org> wrote:

> Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
>> Keith Thompson wrote:
>>> #include <stdio.h>
>>>
>>> static double matrix[10][10];
>>>
>>> static void foo(double (*p)[10])
>>> {
>>> (*p)[10] = 1.25;
>>
>> This is plainly wrong, but for reasons that bear no relevance to the
>> issue under discussion.
>>
>> This example just misses whatever point is still being discussed here.
>
> Would you care to expand on that? It's entirely possible that
> you're right and I'm wrong, but just saying so with no explanation
> is hardly helpful.

double (*p)[10] // p is a pointer to array(s) of 10 doubles

(*p)[10] // element number 10 of what p points at

That's an array bounds error, as the last element of the array is number 9.
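
The element count is part of the pointer's type, so it can be recovered with sizeof. A small sketch (the function name is made up):

```c
#include <assert.h>
#include <stddef.h>

/* Highest index that stays inside the array p points at. */
size_t last_valid_index(double (*p)[10])
{
    assert(p != NULL);
    size_t count = sizeof *p / sizeof (*p)[0];   /* 10, from the type */
    return count - 1;
}
```

The answer is the same whether p points at a standalone double[10] or at one row of a double[10][10], because only the type is consulted.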

Keith Thompson

Feb 1, 2010, 6:45:30 PM

Jasen Betts <ja...@xnet.co.nz> writes:
> On 2010-02-01, Keith Thompson <ks...@mib.org> wrote:
>> Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
>>> Keith Thompson wrote:
>>>> #include <stdio.h>
>>>>
>>>> static double matrix[10][10];
>>>>
>>>> static void foo(double (*p)[10])
>>>> {
>>>> (*p)[10] = 1.25;
>>>
>>> This is plainly wrong, but for reasons that bear no relevance to the
>>> issue under discussion.
>>>
>>> This example just misses whatever point is still being discussed here.
>>
>> Would you care to expand on that? It's entirely possible that
>> you're right and I'm wrong, but just saying so with no explanation
>> is hardly helpful.
>
> double (*p)[10] // p is a pointer to array(s) of 10 doubles
>
> (*p)[10] // element number 10 of the what p points at
>
> that's an array bound error as the last element of the array is number 9,

Upthread, Pierre Asselin stated that

    The ability to switch between 1-D and n-D arrays is *absolutely
    essential* to numerical computing.

If so, then I think this:

    double matrix[10][10];
    double *p = &matrix[0][0];
    p[10] = 1.25;

should be well-defined, and should set matrix[1][0] to 1.25.

By making the dimension of the array, 10, explicit in the parameter
type, yes, I may have missed the point. (But again, Hans-Bernhard
Bröker's followup would have been more constructive if he'd explained
it, rather than just asserting it.)

Ok, here's another (hopefully better) example:

    #include <stdio.h>

    static double matrix[10][10];

    static void foo(double *p, int index)
    {
        p[index] = 1.25;
    }

    int main(void)
    {
        foo(&matrix[0][0], 10);

        printf("matrix[1][0] = %g\n", matrix[1][0]);
        return 0;
    }

Consider an implementation with fat pointers, where a pointer
carries bounds information that can be checked. What bounds are
associated with p? For what values of index is the assignment's
behavior undefined?

If we can freely do type-punning between 1-D and 2-D arrays, then
the behavior of foo(&matrix[0][0], N) is well-defined for any N
from 0 to 99. If not, then it's well-defined only for N from 0 to 9.
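
To make the two readings concrete, here is a rough simulation of such a fat pointer in ordinary C. Everything in it (struct fat_ptr, checked_store(), from_row(), from_matrix()) is hypothetical and describes no real implementation. It only shows how the answer depends on which object the bounds are taken from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: an address plus the number of elements the
   implementation considers reachable from it. */
struct fat_ptr {
    double *base;
    size_t  count;
};

/* Checked counterpart of foo(): store only when index is in bounds. */
bool checked_store(struct fat_ptr p, size_t index, double value)
{
    assert(p.base != NULL);
    if (index >= p.count)
        return false;           /* a real checker would trap here */
    p.base[index] = value;
    return true;
}

/* Reading 1: the bounds are those of the 1-D sub-array m[0]. */
struct fat_ptr from_row(double (*m)[10])
{
    struct fat_ptr p = { &m[0][0], 10 };
    return p;
}

/* Reading 2: the bounds are those of the whole 2-D object. */
struct fat_ptr from_matrix(double (*m)[10], size_t rows)
{
    struct fat_ptr p = { &m[0][0], rows * 10 };
    return p;
}
```

Under the first reading a store at index 10 is rejected. Under the second it goes through and lands in matrix[1][0].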

Is there a difference between a double[10] object that's declared as
"double obj[10];", and one that happens to be an element of a larger
array? Are the bounds-checking rules different for the two cases?
(By "bounds-checking rules", I don't mean to imply that bounds
checking is required, only that the behavior is undefined in cases
where a bounds check, if it were there, would have failed.)

I agree with Pierre Asselin that allowing that kind of freedom would
be quite useful, but I don't see any support for it in the standard.
I think that the behavior of the above program is undefined, though
it's likely to work as expected (printing "matrix[1][0] = 1.25")
on most or all implementations.

Tim Rentsch

Feb 2, 2010, 5:37:05 PM

Keith Thompson <ks...@mib.org> writes:

> Jasen Betts <ja...@xnet.co.nz> writes:
>> On 2010-02-01, Keith Thompson <ks...@mib.org> wrote:

> Bröker's followup would have been more constructive if he'd explained
> it, rather than just asserting it.)
>
> Ok, here's another (hopefully better) example:
>
> #include <stdio.h>
>
> static double matrix[10][10];
>
> static void foo(double *p, int index)
> {
> p[index] = 1.25;
> }
>
> int main(void)
> {
> foo(&matrix[0][0], 10);
> printf("matrix[1][0] = %g\n", matrix[1][0]);
> return 0;
> }
>
> Consider an implementation with fat pointers, where a pointer
> carries bounds information that can be checked. What bounds are
> associated with p? For what values of index is the assignment's
> behavior undefined?
>
> If we can freely do type-punning between 1-D and 2-D arrays, then
> the behavior of foo(&matrix[0][0], N) is well-defined for any N
> from 0 to 99. If not, then it's well-defined only for N from 0 to 9.

I think if you think this through you'll see the reasoning here
isn't quite right. There are two separate questions, one about
interpreting a 1-D array as a 2-D array (or vice versa), and one
about array bounds violation, and they are orthogonal. In the
specific example here, the expression '&matrix[0][0]' doesn't
point into a 2-D array but into a 1-D array (namely, matrix[0]). To
make the example just about interdimensional array reframing, the
calling expression should be, e.g., 'foo( (double *) &matrix, N )',
which I think most everyone would agree should work for any N
between 0 and 99.

> Is there a difference between a double[10] object that's declared as
> "double obj[10];", and one that happens to be an element of a larger
> array? Are the bounds-checking rules different for the two cases?
> (By "bounds-checking rules", I don't mean to imply that bounds
> checking is required, only that the behavior is undefined in cases
> where a bounds check, if it were there, would have failed.)
>
> I agree with Pierre Asselin that allowing that kind of freedom would
> be quite useful, but I don't see any support for it in the standard.

It seems clear that interdimensional array reframing is meant to
work, provided (1) there are no alignment issues on the pointers
involved (which normally there will not be if the two base types
are the same) and (2) a suitably large object is reframed so that
accesses in the reframed array aren't outside the limits of the
original reframed object. I will grant you the Standard doesn't
do as good a job as it could of making that explicit and obvious,
but it does seem clear that this is the expectation.

> I think that the behavior of the above program is undefined, though
> it's likely to work as expected (printing "matrix[1][0] = 1.25")
> on most or all implementations.

I agree on that, but that's because the pointer passed to 'foo()'
is based on an insufficiently sized object, not because a 2-D array
can't be interpreted as a 1-D array. If the call is written as
'foo( (double *) &matrix, 10 )' the behavior is defined and will
have the expected result.

Tim Rentsch

Feb 2, 2010, 5:36:52 PM

p...@see.signature.invalid (Pierre Asselin) writes:

Actually there isn't any type punning going on here, only pointer
conversions (the implicit one(s) in '&x[0][0]', and the explicit
one in '(double (*)[n]) ap'). All of these conversions are
guaranteed to work (with or without padding, as you point out),
because of the rule that pointers to objects may be converted to
pointers to subobjects at their beginning, and vice versa.
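
The conversions themselves are easy to observe. In this sketch (function names made up), the first function compares the three starting addresses and the second converts an element pointer back to a row pointer, using a fixed bound of 10 in place of n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The matrix, its first row, and its first element all begin at the
   same address. */
bool same_start(double (*m)[10][10])
{
    assert(m != NULL);
    void *whole = (void *) m;
    void *row   = (void *) &(*m)[0];
    void *elem  = (void *) &(*m)[0][0];
    return whole == row && row == elem;
}

/* Element pointer converted back to a pointer to the row that starts
   there, as in '(double (*)[n]) ap'. */
bool round_trip(double (*rows)[10])
{
    assert(rows != NULL);
    double *ap = &rows[0][0];
    double (*a)[10] = (double (*)[10]) ap;
    return a == rows;
}
```

Both checks concern only the pointer values. They say nothing about which indices may be used afterwards.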

There remains the question about whether the access 'a[1][2]' is
a bounds violation, which is difficult to answer definitively
since the Standard is so non-specific about what is or isn't
allowed in this area. In any case though that question is
orthogonal to any concerns about the pointer conversions, which
we know must work in this particular example (and unfortunately
that doesn't tell us anything about the more general case,
except, as you say, it does give some indication of how the
Rationale's authors view the question).