Struct members -> Array elements


Tomás

May 22, 2006, 10:45:51 PM
Do you think the following code should be legal?:

struct Monkey {
    double a;
    double b;
    double c;
};


int main()
{
    Monkey obj;

    double *p = obj.a;

    ++p = 54.3;

    ++p = 23.17;
}


There's a lot of novices out there who have no knowledge of padding; they
commonly ask why sizeof(SomeArbitraryStruct) is not equal to the sum of
its members.
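
To make the sizeof question concrete, here is a minimal sketch. The type Mixed and both helper functions are hypothetical names made up for illustration; the amount of padding, if any, depends on the implementation.

```cpp
#include <cstddef>

// Hypothetical example type: a char followed by a double usually forces padding.
struct Mixed {
    char   tag;
    double value;
};

// Sum of the member sizes, ignoring any padding the compiler may add.
constexpr std::size_t sum_of_members() {
    return sizeof(char) + sizeof(double);
}

// Number of padding bytes in Mixed on this implementation (may be zero).
constexpr std::size_t padding_bytes() {
    return sizeof(Mixed) - sum_of_members();
}
```

On many common platforms padding_bytes() comes out as 7, but the only thing that can be relied on is that sizeof(Mixed) is at least the sum of its members.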

We all know that it's possible to position an object in memory directly
after another object of the same type, and this is how we can increment a
pointer to go through an array's elements.

However, I propose that the Standard guarantee that there be no padding
between members of the same type within a POD struct. Such that the
following code would work perfectly:

struct Monkey {

    union {
        double array[5];

        struct {
            double a;
            double b;
            double c;
            double d;
            double e;
        };
    };
};

int main()
{
    Monkey monkey;

    monkey.array[2] = 65.3;

    monkey.d = 23.23;
}

(The idea is that the named members correspond directly to the array
elements).

I can't think of any reason why this wouldn't work on any platform.


-Tomás

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]

kanze

May 23, 2006, 11:59:47 AM
"Tomás" wrote:
> Do you think the following code should be legal?:

> struct Monkey {
> double a;
> double b;
> double c;
> };

> int main()
> {
> Monkey obj;
> double *p = obj.a;
> ++p = 54.3;
> ++p = 23.17;
> }

No. Why on earth should it be?

> There's a lot of novices out there who have no knowledge of
> padding; they commonly ask why sizeof(SomeArbitraryStruct) is
> not equal to the sum of its members.

Generally, I suspect that the opposite is true. Why on earth
should a novice suppose anything about the layout of a class?
It takes a relatively experienced C++ programmer to recognize
that this is a POD, and that there is no real reason why the
compiler wouldn't lay it out exactly as it would an array.

> We all know that it's possible to position an object in memory
> directly after another object of the same type, and this is
> how we can increment a pointer to go through an array's
> elements.

> However, I propose that the Standard guarantee that there be
> no padding between members of the same type within a POD
> struct.

And what would that buy us?

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Greg Herlihy

May 23, 2006, 1:08:15 PM
"Tomás" wrote:
> Do you think the following code should be legal?:
>
> struct Monkey {
> double a;
> double b;
> double c;
> };
>
>
> int main()
> {
> Monkey obj;
>
> double *p = obj.a;
>
> ++p = 54.3;
>
> ++p = 23.17;
> }
>
>
> There's a lot of novices out there who have no knowledge of padding; they
> commonly ask why sizeof(SomeArbitraryStruct) is not equal to the sum of
> its members.
>
> We all know that it's possible to position an object in memory directly
> after another object of the same type, and this is how we can increment a
> pointer to go through an array's elements.

Correct. Objects in an array all align properly with no padding between
any of them. In other words, if one object in memory is correctly
aligned (such as Monkey.a), then an object of the same type directly
following it in memory must also be correctly aligned (such as
Monkey.b) - and no intervening padding will exist between them.

Moreover padding can be inserted between objects only where it is
needed for correct alignment, and cannot be inserted between objects
already correctly aligned. Therefore we can conclude that there is no
padding between Monkey.a, Monkey.b, and Monkey.c.

> However, I propose that the Standard guarantee that there be no padding
> between members of the same type within a POD struct.

I believe that such a guarantee can be deduced from the Standard
already.
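
Whether a particular implementation really lays the members out back to back can at least be checked rather than assumed. A minimal sketch follows; the function name is hypothetical, and a passing check only describes the layout on one compiler. It does not by itself make pointer arithmetic from one member to the next well defined.

```cpp
#include <cstddef>

struct Monkey {
    double a;
    double b;
    double c;
};

// True if b and c each start exactly sizeof(double) bytes after the previous
// member, i.e. the three members are laid out like double[3] here.
inline bool monkey_members_are_contiguous() {
    return offsetof(Monkey, a) == 0
        && offsetof(Monkey, b) == sizeof(double)
        && offsetof(Monkey, c) == 2 * sizeof(double)
        && sizeof(Monkey) == 3 * sizeof(double);
}
```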

Greg

Ducky Yazy

May 23, 2006, 1:05:51 PM

"Tomás" wrote:
> Do you think the following code should be legal?:
>
> struct Monkey {
> double a;
> double b;
> double c;
> };
>
>
> int main()
> {
> Monkey obj;
>
> double *p = obj.a;

I don't think the line above is legal. Are you expecting the following
code?
double *p = &obj.a;

>
> ++p = 54.3;
>
> ++p = 23.17;

Oops, these two lines! If you intended "p" to point to "obj.a", these two
lines may work on some compilers. But this trick will not always work,
because of memory alignment (padding).

Francis Glassborow

May 23, 2006, 12:06:08 PM
In article <Hgscg.9439$j7.3...@news.indigo.ie>, Tomás
<No.E...@Address.ucar.edu.invalid> writes
And I cannot think of a reason why we should want to spend valuable
committee time considering such a proposal. The special handling of PODs
is purely for backward compatibility with C. Now either you can
demonstrate that C requires your code to work (and then you have a
compatibility issue which could be addressed) or it does not in which
case your change would introduce a compatibility issue which would be a
reason for not accepting it.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

kuy...@wizard.net

May 23, 2006, 4:59:19 PM
Ducky Yazy wrote:
> "Tomás" wrote:
> > Do you think the following code should be legal?:
                                    ^^^^^^^^^

> >
> > struct Monkey {
> > double a;
> > double b;
> > double c;
> > };
> >
> >
> > int main()
> > {
> > Monkey obj;
> >
> > double *p = obj.a;
..

> >
> > ++p = 54.3;
> >
> > ++p = 23.17;
>
> Oops, these two lines! If you intended "p" to point to "obj.a", these two
> lines may work on some compilers. But this trick will not always work,
> because of memory alignment (padding).

Of course that doesn't work with the standard as it is currently
written. The point of his message was to propose that standard be
changed to prohibit such padding between consecutive data members of
the same type.

My own opinion is that if you want values of the same type to be
contiguous, you should declare an array, and store those values in the
elements of that array. I think this proposed change would encourage
bad coding practices that treat unrelated data items as if they were in
fact related. Code that relied upon such a feature would break if any
one of the variables involved were changed to a different data type;
such code is too fragile for my tastes.

Tomás

May 23, 2006, 4:02:41 PM

Firstly, I did make a typo in my original post; I should have written:

double *p = &obj.a;

(I left out the ampersand in my original post).


I'll reply in sequence to the replies which have been posted thus far.


kanze posted:

<in reference to my proposal>


> And what would that buy us?


We would be able to write:

struct Monkey {

    union {
        double array[5];

        struct {
            double a;
            double b;
            double c;
            double d;
            double e;
        };
    };

};


rather than:

class Monkey {
public:
    double array[5];

    double &a, &b, &c, &d, &e;

    Monkey() : a(*array),
               b(array[1]),
               c(array[2]),
               d(array[3]),
               e(array[4]) {}
};


The first form is preferable as it's a POD, there's no need for a
constructor, and there's no need to complicate things with references.
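
Two costs of the reference-member form can be shown in a short sketch. MonkeyRef is a hypothetical name standing in for the second class above: the implicit copy assignment is deleted, and a copy-constructed object keeps referring to the source object's array.

```cpp
#include <type_traits>

// Hypothetical name; mirrors the reference-member class above.
class MonkeyRef {
public:
    double array[5];
    double &a, &b, &c, &d, &e;

    MonkeyRef()
        : array(),   // zero-initialise the storage
          a(array[0]), b(array[1]), c(array[2]), d(array[3]), e(array[4]) {}
};

// Reference members cannot be reseated, so the implicit copy assignment is deleted.
static_assert(!std::is_copy_assignable<MonkeyRef>::value,
              "reference members delete implicit copy assignment");
```

The implicitly generated copy constructor copies each reference, so the copy's a still names the original object's array[0]; a hand-written copy constructor would be needed to avoid that.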


Greg Herlihy posted:

> Therefore we can conclude that there is no
> padding between Monkey.a, Monkey.b, and Monkey.c.

<snip>


> I believe that such a guarantee can be deduced from the
> Standard already

It's a minority who are prepared to make that assertion (as you can see
from the replies). I do agree with your logic... (which is why I started
this thread)... but it seems that some people here are quick to discredit
the code as simply being incorrect, stating that the code makes a false
presumption that the double's are positioned right after one another in
memory.

The purpose of my proposal is that the Standard will explicitly state in
plain English that there's no padding, such that the topic won't even be
open for debate as it is now.


Ducky Yazy posted:

> But this trick may not work sometimes
> because of the "memory alignment (padding)".


My argument is that there shouldn't be any padding because the objects are
of the same type, and thus should fit neatly one after another in memory.
(The padding needed at the end of a struct is already taken into account by
"sizeof", and this value is used in pointer arithmetic -- which is how we
get the desired behaviour of using a pointer to go through an array).
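
The remark about sizeof and pointer arithmetic can be sketched as follows. Padded and element_stride are hypothetical names; the point is only that the distance between consecutive array elements is always sizeof of the element type, trailing padding included.

```cpp
#include <cstddef>

struct Padded {      // hypothetical type that is likely to have trailing padding
    double d;
    char   c;
};

// Distance in bytes between consecutive array elements.
inline std::ptrdiff_t element_stride() {
    Padded arr[2];
    return reinterpret_cast<const char*>(&arr[1])
         - reinterpret_cast<const char*>(&arr[0]);
}
```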


Francis Glassborow posted:

> And I cannot think of a reason why we should want to spend valuable
> committee time considering such a proposal.


It would improve the language. (If the committee time is "too valuable" to
consider improving the language, then maybe we need a new committee).


> The special handling of PODs is purely for backward compatibility with C.


I am vehemently against this argument. The Fancy Class Types we have in C++
are built upon the fundamental core functionality of primitive types and
POD's. C++ should progress and improve and get better and better, but it
should never forget that it's built on the core functionality of primitive
types and POD's. C++ would be a poor language if the following code was
illegal:

struct Monkey {
    union {
        long double a;

        struct {
            int b;
            void* c;
        };
    };

    char d;

    short e;
};

int main()
{
    Monkey obj;

    int * const p = reinterpret_cast<int*>(&obj);

    *p = 67;

    if ( obj.b != 67 ) PerformSomeUndefinedBehaviour();
}

I believe that this "core functionality" should be milked for everything
it's worth, stretching it as far as we can, getting every last penny out of
it. An advanced programming language with features such as multiple virtual
inheritance, and templates, would be even better if it was built on a solid
core functionality. We should improve this core functionality by explicitly
stating in the Standard that there's no padding between members of the same
type within a POD.


> Now either you can
> demonstrate that C requires your code to work (and then you have a
> compatibility issue which could be addressed) or it does not in which
> case your change would introduce a compatibility issue which would be a
> reason for not accepting it.

Can anyone mention one sole platform upon which there would be padding
between members of the same type in a struct? I can't think of any reason
whatsoever why there would be. I'd go on to say that the following two
structures should be exactly the same size:

struct { double a,b,c,d,e; };

struct { double array[5]; };

Ron Natalie

May 23, 2006, 4:40:55 PM
Greg Herlihy wrote:

>> double *p = obj.a;

Presumably double *p = &obj.a;
>>
>> ++p = 54.3;
and
*++p = 54.3;


>
> Moreover padding can be inserted between objects only where it is
> needed for correct alignment, and cannot be inserted between objects
> already correctly aligned. Therefore we can conclude that there is no
> padding between Monkey.a, Monkey.b, and Monkey.c.

Upon what do you base this? There is nothing in C or C++ that seems
to preclude internal padding for whatever whim the implementation
desires.

Tomás

May 23, 2006, 5:23:59 PM
To summarise, I'd like to see the following program output the following
on all platforms:

"Program is absent of Undefined Behaviour. YIPPIE!"


(I wonder if anyone can find a platform where it wouldn't? I doubt it.)


#include <iostream>
#include <cstdlib>

bool has_printed_UB = false;

void PerformUndefinedBehaviour()
{
    std::cout << "Undefined Behaviour!\n";

    has_printed_UB = true;
}

struct Monkey {
    union {
        double array[5];

        struct { double a, b, c, d, e; };

        struct { int f; int g; };
    };

    char h;

    double i;

    double j;
};

int main()
{
    Monkey obj;

    /* First demonstrate that the struct elements correspond
       to the array elements */

    obj.a = 1;
    obj.b = 2;
    obj.c = 3;
    obj.d = 4;
    obj.e = 5;

    if ( *obj.array != 1 ||
         obj.array[1] != 2 ||
         obj.array[2] != 3 ||
         obj.array[3] != 4 ||
         obj.array[4] != 5 ) PerformUndefinedBehaviour();


    /* Now demonstrate that the first member of a POD union or POD struct
       has the same address as the actual union/struct. (Yes I realise
       that the Standard already guarantees this.) */


    int * p = reinterpret_cast<int*>(&obj);

    *p = 67;

    *++p = 54;

    if ( !( 67 == obj.f && 54 == obj.g ) ) PerformUndefinedBehaviour();


    /* Now show that i and j are contiguous: */

    if ( sizeof(double) !=
         reinterpret_cast<const char*>(&obj.j)
         - reinterpret_cast<const char*>(&obj.i)
       ) PerformUndefinedBehaviour();


    if (!has_printed_UB)
        std::cout << "Program is absent of Undefined Behaviour. YIPPIE!\n";

    std::system("PAUSE");
}

Seungbeom Kim

May 23, 2006, 5:55:52 PM
kanze wrote:

> "Tomás" wrote:
>> However, I propose that the Standard guarantee that there be
>> no padding between members of the same type within a POD
>> struct.
>
> And what would that buy us?

Suppose you have a struct with members named x, y, and z, and you could
want to refer to them sometimes by names, and sometimes by indices. The
former because it's more natural and closer to the problem domain, and
the latter because it's better suited for across-the-board operations
(and can even benefit from standard algorithms such as std::for_each,
std::transform, etc.). Without any guarantee from the standard, though,
you are forced to write something like:

#include <cassert>

struct point
{
    double x, y, z;
    double& operator[](int i)
    {
        switch (i) {
        case 0:
            return x;
        case 1:
            return y;
        case 2:
            return z;
        default:
            assert(false && "index out of range");
        }
    }
    // and the const version:
    const double& operator[](int i) const { /* ... */ }
};
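
One way to shorten the switch without assuming anything about layout is a table of pointers to members. This is only a sketch of that alternative, not something proposed in this thread; the helper members() and the typedef member_ptr are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct point {
    double x, y, z;

    double& operator[](std::size_t i) {
        assert(i < 3 && "index out of range");
        return this->*members()[i];
    }
    const double& operator[](std::size_t i) const {
        assert(i < 3 && "index out of range");
        return this->*members()[i];
    }

private:
    typedef double point::*member_ptr;

    // Table mapping index 0, 1, 2 to the members x, y, z.
    static const member_ptr* members() {
        static const member_ptr table[3] = { &point::x, &point::y, &point::z };
        return table;
    }
};
```

The struct stays an aggregate (the private parts are a typedef and a static function, not data), so point p = { 1.0, 2.0, 3.0 }; still works, and p[1] names the same object as p.y.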

I'm not sure, though, whether the condition for no padding "between
members of the same type within a POD struct" is either necessary or
sufficient for something useful; maybe we can guarantee more, or
maybe we can require less.

I feel this is also related to Library Issue #387 ("std::complex
over-encapsulated"), which proposes essentially that the real part and
the imaginary part of z should be accessible by an array of value_type
laid on z.
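
For reference, the array-oriented access asked for in that issue is, as far as I know, guaranteed from C++11 onwards. A minimal sketch of what it permits (the function name part is hypothetical):

```cpp
#include <complex>

// Array-style access to the parts of a complex number (guaranteed since C++11).
inline double part(std::complex<double>& z, int i) {
    return reinterpret_cast<double(&)[2]>(z)[i];   // i must be 0 (real) or 1 (imag)
}
```

Note that this is a special guarantee for std::complex; it says nothing about user-defined structs.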

--
Seungbeom Kim

Seungbeom Kim

May 23, 2006, 5:58:36 PM
Tomás wrote:
>
> Can anyone mention one sole platform upon which there would be padding
> between members of the same type in a struct? I can't think of any reason
> whatsoever why there would be. I'd go on to say that the following two
> structures should be exactly the same size:
>
> struct { double a,b,c,d,e; };
>
> struct { double array[5]; };

The Library Issue 387 states "All existing implementations already have
the layout proposed here." From this, I can infer that no existing
implementation has padding between members of type T in std::complex<T>,
and thus (probably) neither between members of the same type in a POD
struct.

--
Seungbeom Kim

Francis Glassborow

May 23, 2006, 7:49:14 PM
In article <1148365513....@u72g2000cwu.googlegroups.com>, Greg
Herlihy <gre...@pacbell.net> writes

>Moreover padding can be inserted between objects only where it is
>needed for correct alignment, and cannot be inserted between objects
>already correctly aligned. Therefore we can conclude that there is no
>padding between Monkey.a, Monkey.b, and Monkey.c.

No that is not true. Let me give you an example:

struct X {
char a, b, c;
};

If you set your compiler switches to optimise for space it will usually
just add a byte of padding at the end. If you set it for speed, on some
platforms it might add three bytes of padding between each of a, b and
c. (Optimum alignment on 32 bit word systems). Nonetheless for an array
of char it has to take the no padding route.
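
The two layouts can be imitated in portable code. In this sketch Tight and WordAligned are hypothetical names, and alignas is used only to simulate what a speed-optimising compiler might choose on its own:

```cpp
#include <cstddef>

struct Tight {            // typical space-optimised layout
    char a, b, c;
};

struct WordAligned {      // hypothetical stand-in for a speed-optimised layout
    alignas(4) char a;
    alignas(4) char b;
    alignas(4) char c;
};
```

Here offsetof(WordAligned, b) is 4 and sizeof(WordAligned) is 12, while char[3] always occupies exactly 3 bytes.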


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Francis Glassborow

May 23, 2006, 6:50:50 PM
In article <e4vu27$57d$1...@news.Stanford.EDU>, Seungbeom Kim
<musi...@bawi.org> writes

>The Library Issue 387 states "All existing implementations already have
>the layout proposed here." From this, I can infer that no existing
>implementation has padding between members of type T in std::complex<T>,
>and thus (probably) neither between members of the same type in a POD
>struct.

May well be true for doubles but not necessarily true for char (just as
an example)


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

kuy...@wizard.net

May 23, 2006, 7:50:43 PM
"Tomás" wrote:
..

What you haven't explained is why that's desirable. If a, b, c, and d
have truly independent meanings, what value is there in having them
adjacent? If there's value in having them be adjacent, then they should
be stored as an array, and the existence of separate named references
pointing at the elements of that array is merely a notational
convenience. In other words, in the only case where the creation of
'array' makes sense to me, the second approach strikes me as a better
representation of the way you should be thinking about it.

> Francis Glassborow posted:
>
> > And I cannot think of a reason why we should want to spend valuable
> > committee time considering such a proposal.
>
>

> It would improve the language. ...

How? Please explain.

> ... (If the committee time is "too valuable" to


> consider improving the language, then maybe we need a new committee).

He's not saying the committee time is "too valuable" to consider
improving the language. He's saying that he doesn't consider this
suggestion to be a significant improvement in the language. I agree.

> > The special handling of PODs is purely for backward compatibility with C.
>
>
> I am vehemently against this argument. The Fancy Class Types we have in C++
> are built upon the fundamental core functionality of primitive types and
> POD's. C++ should progress and improve and get better and better, but it
> should never forget that it's built on the core functionality of primitive

> types and POD's. ...

You might not like it, but that is indeed the approach the committee
has chosen, and it has already moved a long way along that road. The
concept of a POD type was invented solely to allow certain guarantees
to apply to POD types that do NOT apply to C++ classes in general. The
last time I posted a list of those guarantees on this newsgroup it had
about 26 items. That's 26 important ways in which C++ classes are
allowed, by a deliberate decision of the C++ committee, to differ from
"core functionality" of POD types.

You can try to convince them that this was a bad idea, but an attitude
with that much momentum behind it isn't likely to change easily.

Note: Saying "primitive types and PODs" is redundant - the primitive
types are POD types.

> ... C++ would be a poor language if the following code was


> illegal:
>
> struct Monkey {
> union {
> long double a;
>
> struct {
> int b;
> void* c;
> };
> };
>
> char d;
>
> short e;
> };
>
> int main()
> {
> Monkey obj;
>
> int * const p = reinterpret_cast<int*>(&obj);
>
> *p = 67;
>
> if ( obj.b != 67 ) PerformSomeUndefinedBehaviour();
> }

I can't imagine any reason why code like that should be encouraged. It
goes seriously wrong, without producing any compile-time error
messages, if the type of 'b' is changed, or if any data members are
added prior to the union, or prior to b within the unnamed struct.
Setting p to point directly at obj.b would be much safer.

Carl Barron

May 23, 2006, 7:25:42 PM
Herlihy <gre...@pacbell.net> wrote:

>
> Moreover padding can be inserted between objects only where it is
> needed for correct alignment, and cannot be inserted between objects
> already correctly aligned.

That's not what I read. Apart from the requirement that the first member
of a POD struct be at the same address as the struct itself (that is, at
offset zero), I see no requirements about what padding is or is not
allowed.

Tomás

May 23, 2006, 7:51:45 PM
posted:


> What you haven't explained is why that's desirable.


I'll keep going with the array example. For human beings writing code,
it's more natural to deal with names. However, for the purposes of loops,
it may be handier to work with array indexes. Let's say I'm writing a
program to manage my finances; I have a POD structure which stores the
amount I get paid for each day. I want to give members meaningful names,
but at the same time, I want to be able to access them by array indexes.
The following would be perfect in my opinion:

struct WeekPayment {

    union {

        unsigned days[7];

        struct {

            unsigned monday, tuesday, wednesday, thursday,
                     friday, saturday, sunday;

        };

    };
};


int main()
{
    WeekPayment wp;

    for ( unsigned i = 0, pay = 100; 7 > i; ++i )
    {
        wp.days[i] = pay += 50;
    }

    /* Then somewhere else in the code: */

    if ( wp.saturday == wp.sunday )
        TakeMondayOffNextWeek();
}
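
For comparison, much the same naming convenience is available today without any layout guarantee, by keeping the array and naming the indices. This is a sketch only; the enum Day, its enumerator day_count and the helper make_week are hypothetical names.

```cpp
enum Day { monday, tuesday, wednesday, thursday, friday, saturday, sunday, day_count };

struct WeekPayment {
    unsigned days[day_count];

    unsigned&       operator[](Day d)       { return days[d]; }
    const unsigned& operator[](Day d) const { return days[d]; }
};

// Fills the week the same way as the loop above: 150, 200, ..., 450.
inline WeekPayment make_week() {
    WeekPayment wp;
    unsigned pay = 100;
    for (int i = 0; i < day_count; ++i) {
        wp.days[i] = pay += 50;
    }
    return wp;
}
```

With this, wp[saturday] == wp[sunday] reads almost as naturally as wp.saturday == wp.sunday, and wp.days remains an ordinary array for loops.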


-Tomás

Francis Glassborow

May 24, 2006, 12:51:42 AM
In article <3LIcg.9453$j7.3...@news.indigo.ie>, Tomás
<No.E...@Address.ucar.edu.invalid> writes

>The purpose of my proposal is that the Standard will explicitly state in
>plain English that there's no padding, such that the topic won't even be
>open for debate as it is now.

I understand your proposal, I just disagree with it. It would disqualify
implementations that always place a buffer between variables (for
debugging purposes) as well as making assumptions that are not valid for
all types.

>

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Greg Herlihy

May 24, 2006, 3:52:26 AM
Ron Natalie wrote:
> Greg Herlihy wrote:
>
> >> double *p = obj.a;
>
> Presumably double *p = &obj.a;
> >>
> >> ++p = 54.3;
> and
> *++p = 54.3;
>
>
> >
> > Moreover padding can be inserted between objects only where it is
> > needed for correct alignment, and cannot be inserted between objects
> > already correctly aligned. Therefore we can conclude that there is no
> > padding between Monkey.a, Monkey.b, and Monkey.c.
>
> Upon what do you base this? There is nothing in C or C++ that seems
> to preclude internal padding for whatever whim the implementation
> desires.

§9.2/18 states: "There might therefore be unnamed padding within a
POD-struct object, but not at its beginning, as necessary to achieve
appropriate alignment."

The "as necessary" part certainly seems to require that any padding
separating members in a POD-struct has to be in fact necessary for the
purpose of attaining appropriate alignment of the struct's members.

But we also know that such padding can never be needed to separate two
adjacent objects of the same type in memory. It makes no difference how
those two objects came to be adjacent - whether they are elements in
the same array, or members of the same struct or are adjacent simply by
sheer chance - makes no difference, because the alignment requirements
for a type are both constant and universal. So the fact that two
objects of the same type need no padding to separate them when residing
in an array means that no two objects of the same type anywhere can
ever require padding between them.

And if it were the case that C++ compilers were free to add whimsical
padding between struct members of the same type - then it should be
possible to find a C++ compiler that does in fact do so.

Greg

kanze

May 24, 2006, 9:16:23 AM
kuy...@wizard.net wrote:
> Ducky Yazy wrote:
> > "Tomás" wrote:
> > > Do you think the following code should be legal?:

> > > struct Monkey {


> > > double a;
> > > double b;
> > > double c;
> > > };

> > > int main()
> > > {
> > > Monkey obj;

> > > double *p = obj.a;
> ..

> > > ++p = 54.3;
> > > ++p = 23.17;

> > Oops, these two lines! If you intended "p" to point to
> > "obj.a", these two lines may work on some compilers.
> > But this trick will not always work, because of memory
> > alignment (padding).

> Of course that doesn't work with the standard as it is currently
> written. The point of his message was to propose that standard be
> changed to prohibit such padding between consecutive data members of
> the same type.

Note that prohibiting padding is not sufficient. Given a
declaration like:

double a[2][2] ;

the compiler is not allowed to insert padding, but something
like:

double* p = &a[0][0] ;
++ p ;
++ p ; // Dereferencing p has just become undefined behavior
++ p ; // And this is completely undefined behavior...

The C standard was carefully formulated to allow bounds
checking.

In the case in question, for example, the compiler could easily
see that p was initialized to point to a scalar. If there were
later an expression along the lines of p[n], the compiler will
then assume that n is 0, since anything else is undefined
behavior.
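
A traversal that stays within these rules can be sketched as follows. The function name sum2x2 is hypothetical; the idea is that each pointer is derived from one row and never moves beyond one past the end of that row.

```cpp
// Sums a 2x2 array by walking each row with its own pointer range,
// never stepping a pointer past one-past-the-end of the row it came from.
inline double sum2x2(double (&a)[2][2]) {
    double total = 0.0;
    for (int row = 0; row < 2; ++row) {
        const double* first = a[row];
        const double* last  = a[row] + 2;   // one past the end of this row: fine to form
        for (const double* p = first; p != last; ++p) {
            total += *p;
        }
    }
    return total;
}
```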

> My own opinion is that if you want values of the same type to
> be contiguous, you should declare an array, and store those
> values in the elements of that array.

Quite. The structure to do it exists. If that's what you need,
use it.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

May 24, 2006, 9:16:21 AM
Seungbeom Kim wrote:
> kanze wrote:
> > "Tomás" wrote:
> >> However, I propose that the Standard guarantee that there
> >> be no padding between members of the same type within a POD
> >> struct.

> > And what would that buy us?

> Suppose you have a struct with members named x, y, and z, and
> you could want to refer to them sometimes by names, and
> sometimes by indices.

So that x and a[0] are the same object? Sounds like obfuscation
to me.

> The former because it's more natural and closer to the problem
> domain, and the latter because it's better suited for
> across-the-board operations (and can even benefit from
> standard algorithms such as std::for_each, std::transform,
> etc.).

I can see a class making both interfaces more or less available.
I can't see any use of it at the lowest, data representation
level, however. What's wrong with simply:

class Point3D
{
public:
    double getX() const { return coords[ 0 ] ; }
    double getY() const { return coords[ 1 ] ; }
    double getZ() const { return coords[ 2 ] ; }
    double const* getXYZ() const { return coords ; }

    // Since you mention STL...
    double const* begin() const { return coords ; }
    double const* end() const { return coords + 3 ; }
    size_t size() const { return 3 ; }
    // ...

private:
    double coords[ 3 ] ;
} ;

(or just x(), y(), z() and xyz(), if you prefer).

The functions are small enough that even the worst compiler will
inline them, and you end up with the interface that you want.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

May 24, 2006, 12:28:12 PM
Greg Herlihy wrote:
> Ron Natalie wrote:
> > Greg Herlihy wrote:

> > >> double *p = obj.a;

> > Presumably double *p = &obj.a;

> > >> ++p = 54.3;
> > and
> > *++p = 54.3;

> > > Moreover padding can be inserted between objects only
> > > where it is needed for correct alignment, and cannot be
> > > inserted between objects already correctly aligned.
> > > Therefore we can conclude that there is no padding between
> > > Monkey.a, Monkey.b, and Monkey.c.

> > Upon what do you base this? There is nothing in C or C++
> > that seems to preclude internal padding for whatever whim
> > the implementation desires.

> §9.2/18 states: "There might therefore be unnamed padding
> within a POD-struct object, but not at its beginning, as
> necessary to achieve appropriate alignment."

> The "as necessary" part certainly seems to require that any
> padding separating members in a POD-struct has to be in fact
> necessary for the purpose of attaining appropriate alignment
> of the struct's members.

And who defines "as necessary"? According to what criteria? On
an IA-32, padding is not "necessary" in:
struct S { char c ; double d ; } ;
Every compiler I know inserts some padding between c and d,
however, at least by default.

I think you're reading too much into what is, when all is said
and done, a parenthetical remark.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

John Nagle

May 24, 2006, 12:28:18 PM
kuy...@wizard.net wrote:
> Ducky Yazy wrote:
>
>>"Tomás" wrote:
> Of course that doesn't work with the standard as it is currently
> written. The point of his message was to propose that standard be
> changed to prohibit such padding between consecutive data members of
> the same type.

It's certainly common in networking code to assume the obvious
placement of structure elements. Is that assumption supported
by the standard, or not?

Pascal had "packed", to indicate that there was to be no padding
in the data representation. ("packed array of bool" was a bit string.)
That was a good idea that didn't make it into C.
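
One way networking code can avoid depending on struct layout at all is to write each field out explicitly. A minimal sketch, assuming a made-up two-field header (Header and serialize are hypothetical names) and big-endian wire order:

```cpp
#include <cstddef>
#include <cstdint>

struct Header {            // hypothetical wire header
    std::uint16_t type;
    std::uint32_t length;
};

// Writes the header field by field in big-endian order; returns bytes written (6).
inline std::size_t serialize(const Header& h, unsigned char* out) {
    out[0] = static_cast<unsigned char>(h.type >> 8);
    out[1] = static_cast<unsigned char>(h.type & 0xFF);
    out[2] = static_cast<unsigned char>((h.length >> 24) & 0xFF);
    out[3] = static_cast<unsigned char>((h.length >> 16) & 0xFF);
    out[4] = static_cast<unsigned char>((h.length >> 8) & 0xFF);
    out[5] = static_cast<unsigned char>(h.length & 0xFF);
    return 6;
}
```

The wire format is then 6 bytes regardless of whether the compiler pads Header in memory.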

John Nagle
Animats

Tomás

May 24, 2006, 11:03:30 AM
Francis Glassborow posted:


> struct X {
> char a, b, c;
> };
>
> If you set your compiler switches to optimise for space it will usually
> just add a byte of padding at the end. If you set it for speed, on some
> platforms it might add three bytes of padding between each of a, b and
> c. (Optimum alignment on 32 bit word systems). Nonetheless for an array
> of char it has to take the no padding route.

When I first read that, I mistakenly thought you meant that the compiler
may change it to the following behind your back:

struct X {

int a, b, c;

};

telling you that they are char's, when in actual fact they are actually
int's... so I was going to demonstrate how that could be a problem:-

#include <climits>
#include <cstring>

struct X {
    int a, b, c;
};

int main()
{
    X original = { 2, 3, 4 };

    X &clone = *new X;

    std::memset( &clone, CHAR_MAX, sizeof( X ) );

    std::memcpy( &clone.a, &original.a, 1 );

    /* Wups, clone.a still has its high bits set! */

    delete &clone;
}

(Yes I know that's entirely irrelevant but what the hey I had written the
code before I copped it.)

Moving on to something of relevance:

So are there actual implementations out there which may put padding
between members of the same type within a POD struct?

(Out of curiosity: A char has no alignment requirements... how would
aligning it on a 4-byte boundary make it any faster?)


-Tomás

Andrew

May 24, 2006, 12:27:42 PM
If I really wanted to do what you want, I would use a compiler-specific
pragma (something like #pragma pack(1)) to tell the compiler explicitly not
to add padding. I would also place a compile-time assertion after it to
verify that the size of the union equals the size of a double[5] array,
and put it all in one header file. I think there is no need to
constrain implementations in the way you want. You can already do it, so
why would we want to lose such freedom?
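
A rough sketch of that approach, written with C++11's static_assert for brevity and assuming a compiler that accepts #pragma pack (GCC, Clang and MSVC do; it is not standard C++). FiveDoubles is a hypothetical name.

```cpp
#pragma pack(push, 1)      // compiler-specific; not part of standard C++
struct FiveDoubles {       // hypothetical name
    double a, b, c, d, e;
};
#pragma pack(pop)

// Compile-time check that the packed struct is exactly as large as the array.
static_assert(sizeof(FiveDoubles) == sizeof(double[5]),
              "FiveDoubles must have the same size as double[5]");
```

Packing can lower the alignment of the struct as a whole, so this is a trade-off to weigh rather than a free guarantee.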

kanze

May 24, 2006, 12:37:46 PM
"Tomás" wrote:

> kanze posted:

> <in reference to my proposal>
> > And what would that buy us?

> We would be able to write:

> struct Monkey {
> union {
> double array[5];
>
> struct {
> double a;
> double b;
> double c;
> double d;
> double e;
> };
> };
> };

> rather than:

> class Monkey {
> public:
> double array[5];
> double &a, &b, &c, &d, &e;
> Monkey() : a(*array),
> b(array[1]),
> c(array[2]),
> d(array[3]),
> e(array[4]) {}
> };

But that doesn't answer my question. I've never seen anyone
write anything like the second anyway, and just addressing the
padding issue doesn't make the first work. So I'm still up in
the air as to what possible advantages this would buy us.

> Greg Herlihy posted:

> > Therefore we can conclude that there is no
> > padding between Monkey.a, Monkey.b, and Monkey.c.
> <snip>
> > I believe that such a guarantee can be deduced from the
> > Standard already

> It's a minority who are prepared to make that assertion (as
> you can see from the replies).

Well, the people who actually wrote the original text in the C
standard disagree with him, and none of the compilers I know for
IA-32 respect his point of view either, so it doesn't seem very
well founded. (The issue was raised with regards to C -- the
language here is, I believe, taken verbatim from the C standard
-- and the conclusion was that "as necessary" means "whatever
the implementor considers necessary, for whatever reasons he
feels appropriate".)

> I do agree with your logic... (which is why I started this
> thread)... but it seems that some people here are quick to
> discredit the code as simply being incorrect, stating that the
> code makes a false presumption that the double's are
> positioned right after one another in memory.

Well, the code is invalid for several reasons, not just because
of padding. Mainly, it is invalid because the C++ standard (and
the C standard) say that it is. With regards to pointer
arithmetic, a scalar is considered to be an array with one
element. And the standards very explicitly say that you can
increment a pointer to one past the end of the array, and no
further, and that you cannot access through a pointer to one
past the end of the array.

The expressed intention, here, of course, is to allow
implementations which do array bounds checking.
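
The rule can be sketched as follows. This is only an illustration: `Monkey` is the struct from the start of the thread, and `one_past_end_is_formable` and `sum` are hypothetical names. The range-based for loop assumes C++11.

```cpp
struct Monkey {
    double a, b, c;
};

// For pointer arithmetic, m.a counts as an array of one element:
// forming &m.a + 1 is allowed, dereferencing it is not.
inline bool one_past_end_is_formable(Monkey& m) {
    double* p = &m.a;
    double* end = p + 1;   // OK: one past the end of a one-element "array"
    return end != p;       // OK: comparing; *end would be undefined behaviour
}

// A portable way to visit every member: take each address explicitly
// instead of incrementing a pointer from one member to the next.
inline double sum(const Monkey& m) {
    const double* members[] = { &m.a, &m.b, &m.c };
    double total = 0.0;
    for (const double* p : members) {
        total += *p;
    }
    return total;
}
```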

[...]
> Francis Glassborow posted:

> > And I cannot think of a reason why we should want to spend
> > valuable committee time considering such a proposal.

> It would improve the language. (If the committee time is "too
> valuable" to consider improving the language, then maybe we
> need a new committee).

I think his point was, precisely, that it wouldn't improve the
language, and that any time spent on it would thus be wasted.

> > The special handling of PODs is purely for backward
> > compatibility with C.

> I am vehemently against this argument.

So use C. Except, of course, that what you are asking for
isn't legal in C, either.

What you're asking for is basically that C++ be even less type
safe than C. I don't think it's in the cards; the reason many
of us moved to C++ in the first place was that it offered more
type safety.

> The Fancy Class Types we have in C++ are built upon the
> fundamental core functionality of primitive types and POD's.
> C++ should progress and improve and get better and better, but
> it should never forget that it's built on the core
> functionality of primitive types and POD's. C++ would be a
> poor language if the following code was illegal:

> struct Monkey {
> union {
> long double a;
>
> struct {
> int b;
> void* c;
> };
> };
> char d;
> short e;
> };

> int main()
> {
> Monkey obj;

> int * const p = reinterpret_cast<int*>(&obj);
> *p = 67;

> if ( obj.b != 67 ) PeformSomeUndefinedBehaviour();
> }

Nobody is proposing to make it illegal, although I can't see
where it would be any loss. Anybody who actually wrote code
like this would be fired from just about any place I've worked
for.

[...]


> > Now either you can demonstrate that C requires your code to
> > work (and then you have a compatibility issue which could be
> > addressed) or it does not in which case your change would
> > introduce a compatibility issue which would be a reason for
> > not accepting it.

> Can anyone mention one sole platform upon which there would be
> padding between members of the same type in a struct?

You're getting hung up on the padding issue. The C committee
decided, back in the 1980's, that the standard would have to be
so written that a bounds checking implementation was legal.
They modified the existing K&R C to some degree to achieve this;
FWIW, people like Kernighan, Thompson and Ritchie were consulted,
and approved of this evolution. And bounds checking
implementations have existed, although I don't know if they are
currently available. Changing this would mean going backwards
some 20 years.

I think that there is a very strong consensus in the standards
committee that we want to go forwards, not backwards.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,
May 24, 2006, 12:37:43 PM5/24/06
to
Francis Glassborow wrote:
> In article <e4vu27$57d$1...@news.Stanford.EDU>, Seungbeom Kim
> <musi...@bawi.org> writes
> >The Library Issue 387 states "All existing implementations
> >already have the layout proposed here." From this, I can
> >infer that no existing implementation has padding between
> >members of type T in std::complex<T>, and thus (probably)
> >neither between members of the same type in a POD struct.

> May well be true for doubles but not necessarily true for char
> (just as an example)

Note that

1. The library issue is still open -- the standards committee
has not decided to act on it, and may never act on it.

2. The proposed change does NOT forbid padding between elements
of a POD struct -- it only concerns the layout of complex.
The most obvious implementation, in fact, is to declare an
array of two T in complex, accessing data[0] or data[1],
according to whether the real or the imaginary part is
needed.

3. The motivation is strictly compatibility with other
languages -- originally Fortran, and now C99 as well.

In fact, this defect report is completely irrelevant to the
discussion here.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nicola Musatti

unread,
May 24, 2006, 1:00:29 PM5/24/06
to

"Tomás" wrote:
> posted:
>
>
> > What you haven't explained is why that's desirable.
>
> I'll keep going with the array example. For human beings writing code,
> it's more natural to deal with names. However, for the purposes of loops,
> it may be handier to work with array indexes. Let's say I'm writing a
> program to manage my finances; I have a POD structure which stores the
> amount I get paid for each day. I want to give members meaningful names,
> but at the same time, I want to be able to access them by array indexes.
> The following would be perfect in my opinion:
>
> struct WeekPayment {
>
> union {
>
> unsigned days[7];
>
> struct {
>
> unsigned monday, tuesday, wednesday, thursday,
> friday, saturday, sunday;
>
> };
>
> };
> };

You can either use an enum as index:

enum week_days { monday, tuesday, wednesday, thursday, friday,
saturday, sunday };

or provide accessors:

unsigned monday() const { return days[0]; }
unsigned & monday() { return days[0]; }

Cheers,
Nicola Musatti
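
A sketch of how the enum-as-index suggestion might look when written out in full. The enumerator names are Nicola's; `days_per_week`, the `operator[]` overloads and `total` are assumptions added for this illustration, not part of the original post.

```cpp
enum week_days { monday, tuesday, wednesday, thursday, friday,
                 saturday, sunday };

const int days_per_week = sunday + 1;   // hypothetical helper constant

struct WeekPayment {
    unsigned days[days_per_week];

    // Access by name: w[monday], w[friday], ...
    unsigned& operator[](week_days d)       { return days[d]; }
    unsigned  operator[](week_days d) const { return days[d]; }
};

// Access by index: ordinary array iteration, no layout assumptions.
inline unsigned total(const WeekPayment& w) {
    unsigned sum = 0;
    for (int d = monday; d < days_per_week; ++d) {
        sum += w.days[d];
    }
    return sum;
}
```

WeekPayment stays an aggregate here, so it can still be initialised with braces, and the names and the indices cannot drift apart because they are the same values.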

Anders Dalvander

unread,
May 24, 2006, 1:00:51 PM5/24/06
to
Here is another solution:

#include <cstddef>
#include <iostream>

template <typename T>
struct vector4
{
public:
T x, y, z, w;

vector4()
: x(),
y(),
z(),
w()
{
}

vector4(T _x, T _y, T _z, T _w)
: x(_x),
y(_y),
z(_z),
w(_w)
{
}

const T& operator[](size_t i) const
{
return this->*v[i];
}

T& operator[](size_t i)
{
return this->*v[i];
}

private:
typedef T vector4<T>::* const vec[4];
static const vec v;
};

template <typename T>
const typename vector4<T>::vec vector4<T>::v = { &vector4<T>::x,
&vector4<T>::y, &vector4<T>::z, &vector4<T>::w };

int main()
{
vector4<int> v;

v[0] = 10;

std::cout << "v[0]: " << v[0] << '\n';
std::cout << "v.x: " << v.x << '\n';
}

Seungbeom Kim wrote:
> Suppose you have a struct with members named x, y, and z, and you could
> want to refer to them sometimes by names, and sometimes by indices.

> <snip>

wa...@stoner.com

unread,
May 24, 2006, 1:24:10 PM5/24/06
to
kanze wrote:

> And what would that buy us?

An ability to interface with other languages at a higher level.

I'm communicating with Fortran. Some interface tells me that the
floating point values associated with a particular pipe live at a
particular location, ndx, in a large floating point array, d[]. For
instance, the pipe's outside diameter is stored at an offset of -10,
and the pipe's minimum elevation is stored at offset +54.

We wanted to interface to the Fortran code, and eventually convert it
all to C++. We wrote a converter that treated the Fortran source as an
IDL, and generated C++ code that would convert ndx into a smart-pointer
(actually a proxy) for the pipe info.

class PipeDptr : public dStruct
{
public:
// Public constructors
PipeDptr(){};
PipeDptr(int i):dStruct(i){};
.
float& ODiameter() { return rm(-10); };
.
float& MinElev() { return rm(54); };
.
};

which gave our code a standard-conforming way to get at the elements by
name, rather than by index. Of course the interface between C++ and
Fortran isn't covered by the standard, but our documentation tells us
that the layout of a Fortran REAL array matches the layout of a C++
float array, and we also take care of the calling conventions.

However, a much better solution (for us) assumed that floats were
adjacent in memory

struct PipeD
{
.
float ODiameter;
.
float MinElev;
.
};

with a factory that uses placement-new or reinterpret_cast to
"interpret" a hunk of the array as a PipeD. (Placement new has the
advantage of being standard-conforming. reinterpret_cast has the
advantage that it better expresses the intent. Neither generates any
code, since the construction of a POD-struct is a no-op).

The main advantages of PipeD are:

1) In the debugger, it is nice to be able to see named struct data
members.
2) When C++ becomes the "source" development platform, you don't have
to maintain those magic integers (the "54" associated with MinElev) by
hand.

This is less standard-conforming, because it assumes that adjacent
floats in a struct are at adjacent addresses (we can confirm that
assumption with a compile-time assert on sizeof PipeD) and also assumes
that float alignment is good enough for struct-of-float (we can confirm
that using sizeof struct_that_contains_one_float).
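
The two compile-time checks mentioned above might be sketched like this. The two-field `PipeD` is a simplified stand-in (in the post the fields sit at offsets -10 and +54 with others in between), and `OneFloat` and `load_pipe` are hypothetical names. Rather than reinterpreting the array in place, this sketch copies the floats out with `std::memcpy`, which stays within what the standard defines once the layout has been checked. `static_assert` assumes C++11.

```cpp
#include <cstring>  // std::memcpy

// Simplified stand-in for the generated struct.
struct PipeD {
    float ODiameter;
    float MinElev;
};

// Stand-in for "struct_that_contains_one_float".
struct OneFloat {
    float f;
};

// Adjacent floats in the struct must sit at adjacent addresses...
static_assert(sizeof(PipeD) == 2 * sizeof(float),
              "floats in PipeD are not adjacent");
// ...and a struct of float must need nothing beyond what float needs.
static_assert(sizeof(OneFloat) == sizeof(float),
              "struct-of-float is larger than float");

// Copy one record out of the large array d, starting at index ndx.
inline PipeD load_pipe(const float* d, int ndx) {
    PipeD p;
    std::memcpy(&p, d + ndx, sizeof p);
    return p;
}
```

The copy costs a few instructions that the in-place view avoids, so this is a trade-off rather than a drop-in replacement for the factory described in the post.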

It's not the case that all-the-world is C++ (or even C++ with one little
island of C). When you want to interface to another language, you need
to have some way of describing your data layouts, or accepting the
other guy's data layout. Unfortunately, the most complex layout the
standard gives us is for array of small integers called unsigned char
(once you look up CHAR_BIT, and then make assumptions about bit-order
and byte order). In practice we tend to make additional assumptions
(it is a pretty good bet that the four-byte C++ float on your platform
is pretty much the same as the four-byte Fortran REAL on the same
platform, ...).

I currently use implementations which pack POD elements tightly, and
have found that to be a useful feature to exploit. I'm not yet
convinced that the feature is so useful that it should be standardized.
However, it does seem to be existing practice.

kuy...@wizard.net

unread,
May 24, 2006, 1:24:28 PM5/24/06
to
"Tomás" wrote:
..

> I'll keep going with the array example. For human beings writing code,
> it's more natural to deal with names. However, for the purposes of loops,
> it may be handier to work with array indexes. Let's say I'm writing a
> program to manage my finances; I have a POD structure which stores the
> amount I get paid for each day. I want to give members meaningful names,
> but at the same time, I want to be able to access them by array indexes.
> The following would be perfect in my opinion:
>
> struct WeekPayment {
>
> union {
>
> unsigned days[7];
>
> struct {
>
> unsigned monday, tuesday, wednesday, thursday,
> friday, saturday, sunday;
>
> };
>
> };
> };

Personally, I consider the following code (which is based on an example
you've already given of what you didn't want to do) a much more
appropriate way to implement something like that:

struct WeekPayment
{
public:
unsigned days[7];

unsigned &monday, &tuesday, &wednesday, &thursday, &friday,
&saturday, &sunday;

WeekPayment() : monday(days[0]), tuesday(days[1]),
wednesday(days[2]),
thursday(days[3]), friday(days[4]), saturday(days[5]),
sunday(days[6]) {}
};

4zum...@gmail.com

unread,
May 24, 2006, 2:02:58 PM5/24/06
to
In the specific case where you want a union of
struct { double a,b,c,d,e; }; and struct { double array[5]; };, this
might be reasonable. It would be an extension of the fact that you can
have a union of PODs with a common initial sequence of members and
access that sequence through either of them.

Adding this just to give different names to things using a union seems
like a lot of work, and it doesn't seem like it would be that much more
work to just use references, especially if you used a macro.

Outside of unions, you obviously don't want to be able to access these
classes through each other, as it would hurt optimisation a lot: many
more things would alias with each other.

wa...@stoner.com

unread,
May 24, 2006, 3:44:49 PM5/24/06
to

kanze wrote:

> What's wrong with simply:
>
> class Point3D
> {
> public:
> double getX() const { return coords[ 0 ] ; }
> double getY() const { return coords[ 1 ] ; }
> double getZ() const { return coords[ 2 ] ; }
> double* getXYZ() const { return coords ; }
>
> // Since you mention STL...
> double* begin() const { return coords ; }
> double* end() const { return coords + 3 ; }
> size_T size() const { return 3 ; }
> // ...
>
> private:
> double coords[ 3 ] ;
> } ;
>
> (or just x(), y(), z() and xyz(), if you prefer).

What's wrong:
1) In the debugger, when I look at a Point3D object, it shows me array
values, not named values. Not only do I want to see all three values,
I want to see all three names.
2) Those literal integers are a maintenance issue. Use an enum
instead, and you've got a different set of maintenance issues.
3) You need two functions for each array element (one const getter,
and one non-const setter).
4) This isn't a POD (although that would be trivial to fix), meaning
that any struct or array containing a Point3D is also not a POD.

Change that to

struct Point3D
{
// No data members other than double allowed.
double x,y,z;

size_T size() const
{
const int mysize = 3;
compile_time_assert(sizeof(*this)==mysize*sizeof(double));
return mysize;
}

cv double* begin() cv { return (cv double *)this; }
cv double* end() cv { return begin() + size(); }
} ;

There are a number of things wrong with this also, but if my issues
(1-4) are important to you, it does a pretty good job of addressing
them.

Seungbeom Kim

unread,
May 24, 2006, 2:49:14 PM5/24/06
to
kanze wrote:
> Francis Glassborow wrote:
>> In article <e4vu27$57d$1...@news.Stanford.EDU>, Seungbeom Kim
>> <musi...@bawi.org> writes
>>> The Library Issue 387 states "All existing implementations
>>> already have the layout proposed here." From this, I can
>>> infer that no existing implementation has padding between
>>> members of type T in std::complex<T>, and thus (probably)
>>> neither between members of the same type in a POD struct.
>
>> May well be true for doubles but not necessarily true for char
>> (just as an example)
>
> Note that
>
> 1. The library issue is still open -- the standards committee
> has not decided to act on it, and may never act on it.

I'm aware of it. My quotation does not depend on whether that issue
will be accepted or not; I just quoted what was stated there as a
(supporting) "fact" (though the issue itself is a "proposal").

> 2. The proposed change does NOT forbid padding between elements
> of a POD struct -- it only concerns the layout of complex.
> The most obvious implementation, in fact, is to declare an
> array of two T in complex, accessing data[0] or data[1],
> according to whether the real or the imaginary part is
> needed.

Again, I'm referring to the fact. And I assumed that many library
implementations used two named members for the real and the imaginary
parts but that still had the proposed layout. At least, the GNU
Standard C++ Library v3 does that. If there is an implementation that
uses two named members and thus fails to have the proposed layout, I
will be interested to hear about it.

> 3. The motivation is strictly compatibility with other
> languages -- originally Fortran, and now C99 as well.

Yes, compatibility is important; not only in complex but probably also
in other data structures.

> In fact, this defect report is completely irrelevant to the
> discussion here.

Not completely, if you understand what I really meant.

--
Seungbeom Kim

Bart van Ingen Schenau

unread,
May 24, 2006, 2:50:20 PM5/24/06
to
Tomás wrote:

> (Out of curiosity: A char has no alignment requirements... how would
> aligning it on a 4-byte boundary make it any faster?)

Because then the compiler can use fast word-sized instructions to access
each struct member, instead of using, or worse: emulating, much slower
byte-sized instructions.

>
>
> -Tomás
>

Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://www.eskimo.com/~scs/C-faq/top.html
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/

Tomás

unread,
May 24, 2006, 5:28:44 PM5/24/06
to
Bart van Ingen Schenau posted:

> Tomás wrote:
>
>> (Out of curiosity: A char has no alignment requirements... how would
>> aligning it on a 4-byte boundary make it any faster?)
>
> Because then the compiler can use fast word-sized instructions to access
> each struct member, instead of using, or worse: emulating, much slower
> byte-sized instructions.


But would it not have to use 16/32 Bit int's? If so, I demonstrated in
another post how this would cause problems.


-Tomas

James Kanze

unread,
May 25, 2006, 10:48:36 AM5/25/06
to
wa...@stoner.com wrote:
> kanze wrote:

>> What's wrong with simply:

>> class Point3D
>> {
>> public:
>> double getX() const { return coords[ 0 ] ; }
>> double getY() const { return coords[ 1 ] ; }
>> double getZ() const { return coords[ 2 ] ; }
>> double* getXYZ() const { return coords ; }
>>
>> // Since you mention STL...
>> double* begin() const { return coords ; }
>> double* end() const { return coords + 3 ; }
>> size_T size() const { return 3 ; }
>> // ...

>> private:
>> double coords[ 3 ] ;
>> } ;

>> (or just x(), y(), z() and xyz(), if you prefer).
>
> What's wrong:
> 1) In the debugger, when I look at a Point3D object, it shows
> me array values, not named values. Not only do I want to see
> all three values, I want to see all three names.

I take it you avoid std::string as well, because you don't see
its value as you would expect in the debugger.

But if the semantics are those of named variables, why do you
want the array? There's a contradiction here.

> 2) Those literal integers are a maintenance issue. Use an
> enum instead, and you've got a different set of maintenance
> issues.

I'd say that those literal integers are the definition of x, y
and z. I don't see them as a great problem. And any solution
giving two names to a single object, and trying to keep them in
sync, will create maintenance issues.

> 3) You need two functions for each array element (one const
> getter, and one non-const setter).

So?

> 4) This isn't a POD (although that would be trivial to fix),
> meaning that any struct or array containing a Point3D is also
> not a POD.

Remove the private, above, and it is a POD, if that's what is
needed.

--
James Kanze kanze...@neuf.fr


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

James Kanze

unread,
May 25, 2006, 10:48:50 AM5/25/06
to
No.E...@Address.ucar.edu wrote:
> Bart van Ingen Schenau posted:

>> Tomás wrote:

>>> (Out of curiosity: A char has no alignment requirements...
>>> how would aligning it on a 4-byte boundary make it any
>>> faster?)

>> Because then the compiler can use fast word-sized
>> instructions to access each struct member, instead of using,
>> or worse: emulating, much slower byte-sized instructions.

> But would it not have to use 16/32 Bit int's?

No. On at least some machines, writing a byte involves reading
a word, updating the byte, and then writing the word. If the
compiler aligns the char fields of a struct on a word boundary,
it can generate a write word instruction, rather than a write
byte.
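
A rough model of the difference, written as ordinary C++ rather than machine code. Both function names are invented for this sketch, and numbering the bytes from the least significant end is an arbitrary choice made here; real hardware does the equivalent work in the memory interface.

```cpp
#include <cassert>
#include <cstdint>

// What a "write byte" costs on a machine that can only move whole words:
// read the word, replace one byte, write the word back.
inline void store_byte(std::uint32_t& word, unsigned index, std::uint8_t value) {
    assert(index < 4);
    const unsigned shift = 8u * index;         // byte 0 = least significant
    std::uint32_t tmp = word;                  // 1. read the whole word
    tmp &= ~(std::uint32_t(0xFF) << shift);    // 2. clear the target byte
    tmp |= std::uint32_t(value) << shift;      //    and merge in the new value
    word = tmp;                                // 3. write the whole word back
}

// If the char has a word to itself, the other three bytes are padding
// whose values do not matter, so a single word store is enough.
inline void store_padded(std::uint32_t& word, std::uint8_t value) {
    word = value;
}
```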

> If so, I demonstrated in another post how this would cause
> problems.

I didn't see any demonstration of any problem. Where's the
problem -- the language makes no guarantee concerning the values
of padding bytes.

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

---

kuy...@wizard.net

unread,
May 25, 2006, 11:53:36 AM5/25/06
to
John Nagle wrote:
> kuy...@wizard.net wrote:
.

> > Of course that doesn't work with the standard as it is currently
> > written. The point of his message was to propose that standard be
> > changed to prohibit such padding between consecutive data members of
> > the same type.
>
> It's certainly common in networking code to assume the obvious
> placement of structure elements. Is that assumption supported
> by the standard, or not?

Not. Whether padding which is not mandatory to achieve alignment is
allowed is controversial. However, padding which is needed to achieve
proper alignment clearly is permitted, and that's sufficient to violate
any assumption that structures are packed.
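
For networking code, one way to avoid depending on struct layout at all is to write each field to the wire explicitly. The sketch below assumes a made-up `Header` with a one-byte type and a four-byte big-endian length; the names `Header`, `wire_size`, `encode` and `decode` are all hypothetical. On most implementations `sizeof(Header)` is 8 rather than 5, which is exactly the alignment padding described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical message header.
struct Header {
    std::uint8_t  type;
    std::uint32_t length;
};

const std::size_t wire_size = 5;   // 1 + 4 bytes on the wire, no padding

// Write the fields one byte at a time, most significant byte first.
inline void encode(const Header& h, unsigned char* out) {
    out[0] = h.type;
    out[1] = static_cast<unsigned char>((h.length >> 24) & 0xFF);
    out[2] = static_cast<unsigned char>((h.length >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((h.length >> 8) & 0xFF);
    out[4] = static_cast<unsigned char>(h.length & 0xFF);
}

// Rebuild the struct from the same five bytes.
inline Header decode(const unsigned char* in) {
    Header h;
    h.type = in[0];
    h.length = (std::uint32_t(in[1]) << 24) | (std::uint32_t(in[2]) << 16)
             | (std::uint32_t(in[3]) << 8)  |  std::uint32_t(in[4]);
    return h;
}
```

This works the same whatever padding or byte order the compiler and CPU use, at the price of writing the field list twice.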

James Kanze

unread,
May 25, 2006, 11:46:13 PM5/25/06
to
wa...@stoner.com wrote:
> kanze wrote:

>> And what would that buy us?

> An ability to interface with other languages at a higher level.

> I'm communicating with Fortran. Some interface tells me that
> the floating point values associated with a particular pipe
> live at a particular location, ndx, in a large floating point
> array, d[]. For instance, the pipe's outside diameter is
> stored at an offset of -10, and the pipe's minimum elevation
> is stored at offset +54.

This is sort of an argument. Of course, anything you do to
interface with Fortran will be implementation dependent, so you
don't need any additional guarantees from the standard. And I
sort of think that some sort of wrapper interface would be the
preferred solution.

[...]


> However, a much better solution (for us) assumed that floats
> were adjacent in memory

> struct PipeD
> {
> .
> float ODiameter;
> .
> float MinElev;
> .
> };

> with a factory that uses placement-new or reinterpret_cast to
> "interpret" a hunk of the array as a PipeD. (Placement new
> has the advantage of being standard-conforming.
> reinterpret_cast has the advantage that it better expresses
> the intent. Neither generates any code, since the
> construction of a POD-struct is a no-op).

And what's the problem? I do this myself, in some of my code.
It's undefined behavior according to the standard, and it is
neither guaranteed by the standard nor portable. But it works
in the implementation I'm targeting, and since it is being used
to implement things that are very implementation specific
anyway, that's sufficient.

> The main advantages of PipeD are:

> 1) In the debugger, it is nice to be able to see named struct data
> members.
> 2) When C++ becomes the "source" development platform, you don't have
> to maintain those magic integers (the "54" associated with MinElev) by
> hand.

I'm not sure that maintaining those "magic integers" is any
worse than ensuring the alignment of the two structs. (I'm not
familiar with modern Fortran, but in the old days, the
specification in Fortran would use magic integers. And in such
cases, I'd almost certainly write a mini-parser to generate the
C++ classes based on the Fortran code.)

[...]


> I currently use implementations which pack POD elements
> tightly, and have found that to be a useful feature to
> exploit. I'm not yet convinced that the feature is so useful
> that it should be standardized. However, it does seem to be
> existing practice.

I currently have code which counts on the fact that the Solaris
ABI forbids padding between char arrays in a struct. The code
is very Solaris dependent in many ways, however, so this doesn't
bother me too much. I can see the utility in implementation
specific situations, such as the one you describe. I just don't
see any necessity, or even any advantage, of making it a
standard requirement.

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

---

SuperKoko

unread,
May 25, 2006, 11:49:17 PM5/25/06
to

Greg Herlihy wrote:
>
> §9.2/18 states: "There might therefore be unnamed padding within a
> POD-struct object, but not at its beginning, as necessary to achieve
> appropriate alignment."
>
> The "as necessary" part certainly seems to require that any padding
> separating members in a POD-struct has to be in fact necessary for the
> purpose of attaining appropriate alignment of the struct's members.
>
Necessary?
I have counterexamples. x86 CPUs have no alignment requirements (I know
that's somewhat special).
But several popular compilers align 32-bit integers on 4-byte
boundaries, doubles on 4- or 8-byte boundaries, etc.
It is not necessary, but it is faster.
For chars, aligning on 4-byte boundaries is faster too!
But for arrays it isn't worth it, because it would be a horrible speed
pessimization.
That's why chars allocated on the stack are often aligned on 4-byte
boundaries (which avoids reading and writing the same memory word when
manipulating two different chars).
This can be a good reason to do the same for structures, as Francis
Glassborow said:
Francis Glassborow wrote:
>
> No that is not true. Let me give you an example:
>
> struct X {
> char a, b, c;
> };
>
> If you set your compiler switched to optimise for space it will usually
> just add a byte of padding at the end. If you set it for speed, on some
> platforms it might add three bytes of padding between each of a, b and
> c. (Optimum alignment on 32 bit word systems). Nonetheless for an array
> of char it has to take the no padding route.
>
On x86 CPUs, when doing things such as x.a=x.b+x.c, the speed
improvement may be noticeable.

> But we also know that such padding can never be needed to separate two
> adjacent objects of the same type in memory. It makes no difference how
> those two objects came to be adjacent - whether they are elements in
> the same array, or members of the same struct or are adjacent simply by
> sheer chance - makes no difference, because the alignment requirements
> for a type are both constant and universal.

Constant and universal?
But it depends on #pragmas for some compilers!
For instance, with Borland C++ 5.0 (32-bit x86):

#pragma pack(2)
struct X {
int d;
};
#pragma pack(4)
struct Y {
char a;
X b;
int c;
};
The b field has offset 2 in the Y structure (thus the alignment of int
doesn't seem to be 4 here).
The c field has offset 8 (and thus there are 2 bytes of padding between
b and c).
The resulting layout is:
[a][ ][d][d][d][d][ ][ ][c][c][c][c]
where [a] is the byte of Y::a, [ ] is a padding byte, [d] is a byte of
X::d and [c] is a byte of Y::c.

> objects of the same type need no padding to separate them when residing
> in an array means that no two objects of the same type anywhere can
> ever require padding between them.
>

X::d and Y::c are almost a counterexample.

Tomás wrote:
> When I first read that, I mistakenly thought you meant that the compiler
> may change it to the following behind your back:
>
> struct X {
> int a, b, c;
> };
>
> telling you that they are char's, when in actual fact they are actually
> int's... so I was going to demonstrate how that could be a problem:-
>

No, like this:
struct X {
char a;char padding0[3];
char b;char padding1[3];
char c;char padding2[3];
};


> (Out of curiosity: A char has no alignment requirements... how would
> aligning it on a 4-byte boundary make it any faster?)

Read the IA-32 architecture manual.
32-bit x86 CPUs are not really able to read a single byte at a time.
They fetch at least 4 bytes from the cache (aligned on a 4-byte
boundary) and then extract the single byte from that 4-byte word.
Even when data is not aligned you can read a 4-byte word, but internally
that requires reading two 4-byte words and combining them, which costs
CPU cycles (so alignment helps even though it is NOT necessary).
But even for 1-byte data it can be faster to have two bytes in two
different memory words, because it reduces read/write dependencies.
Thus something like x.a=x.b+x.c will be faster (on Pentium CPUs, for
instance) with padding bytes than without.

> So are there actual implementations out there which may put padding
> between members of the same type within a POD struct?

I am not aware of any (but I know only a few compilers).
But I know that many compilers (including Borland C++) put padding
bytes between automatic variables on the stack.
For instance:
void f() {
char a,b,c,d; // assume these locals are not put in registers
}
Then a, b, c and d will not be contiguous... Each one will be stored in
its own 4-byte word.
Note also that the calling conventions (__cdecl and __stdcall) on x86
CPUs do require alignment on 4-byte boundaries for each argument.
Thus
int f(char a, char b, char c);
will have padding bytes... and will use 12 bytes of stack!

This can be a reason why a compiler may want to do the same thing in
structs (and even document it).
That way, the compiler may allow (in C code) calling a function with
a "wrong" prototype, replacing several parameters by a structure having
the same layout.

With IA-32 there are two kinds of alignment:
The "required" alignment, which is always 1.
The "speed optimal" alignment, which depends on the data type.
With SSE it's even more complex: arrays of 32-bit floats may be
"faster" when aligned on 128-bit boundaries.
Thus it may be perfectly sensible for a compiler to put some padding
in this structure:
struct X{
int x,y;
float z;
float arr[4];
};
between X::z and X::arr[0], in order to align arr on a 128-bit boundary.
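
The effect can be reproduced by asking for the alignment explicitly. This sketch assumes C++11 (`alignas`, `constexpr`); `padding_before_arr` is a made-up helper name. Here the padding is requested by the programmer rather than chosen by the compiler, but the resulting layout is the one described: with 4-byte int and float, z typically ends at offset 12 and arr starts at offset 16.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct X {
    int x, y;
    float z;
    alignas(16) float arr[4];   // request 128-bit alignment for the array
};

// Number of padding bytes between the end of z and the start of arr.
constexpr std::size_t padding_before_arr() {
    return offsetof(X, arr) - (offsetof(X, z) + sizeof(float));
}
```

Both z and arr[0] are floats, so this is a struct in which two consecutive objects of the same type are separated by padding.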

It may be sensible for a compiler to use one alignment or the other,
depending on #pragmas given by the user...
It may even be possible for a user to specify individually, for each
field, whether the field must be "fast" or "compact".
In that case (__compact and __fast being hypothetical keywords):
struct X {
char a;
__compact int b; // this field will not be accessed often
__fast int c; // this field will be accessed very often, so fast access is wanted
};
would yield a structure where offsetof(X,b)==1, but offsetof(X,c)==8.
And thus there would be some padding between two consecutive integers.

Seungbeom Kim wrote:
> Suppose you have a struct with members named x, y, and z, and you could
> want to refer to them sometimes by names, and sometimes by indices. The
> former because it's more natural and closer to the problem domain, and
> the latter because it's better suited for across-the-board operations
> (and can even benefit from standard algorithms such as std::for_each,
> std::transform, etc.). Without any guarantee from the standard, though,
> you are forced to write something like:
>
There are many alternatives:
First, use an array, and provide accessors:
struct point {
int coord[3];
int& x() {return coord[0];}
int& y() {return coord[1];}
int& z() {return coord[2];}
const int& x() const {return coord[0];}
const int& y() const {return coord[1];}
const int& z() const {return coord[2];}
};
Second, use an enumeration to give names to the coordinates:
struct point {
enum Coordinate {x,y,z};
int coord[3];
};

then p.coord[x] will identify the first coordinate.
This enumeration can be used everywhere a "name of coordinate" is
expected.
And, for convenience, it is even possible to overload
point::operator[](Coordinate)
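
A minimal sketch of that second alternative with the operator[] overload, assuming C++11 for the lambda; the scale helper is a hypothetical example of an across-the-board operation and is not something from the thread:

```cpp
#include <algorithm>  // std::transform
#include <iterator>   // std::begin, std::end

struct point {
    enum Coordinate { x, y, z };
    int coord[3];

    // Access by name through the enumeration: p[point::x]
    int& operator[](Coordinate c) { return coord[c]; }
    const int& operator[](Coordinate c) const { return coord[c]; }
};

// Hypothetical across-the-board operation: multiply every coordinate by k.
// The array member works directly with the standard algorithms.
inline void scale(point& p, int k) {
    std::transform(std::begin(p.coord), std::end(p.coord), std::begin(p.coord),
                   [k](int v) { return v * k; });
}
```

Named access then reads p[point::y], while loops and algorithms work on p.coord, and nothing depends on how the compiler lays out separate members.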

Third, assuming that you may want the padding (for speed reasons) that
the compiler may put between fields, you don't want an array.
Then, you can still use pointers to members or the offsetof macro, and
put all the pointers to members in an array with static storage
duration.
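
The third alternative could look roughly like this. The names vec3, vec3_members, at and sum are hypothetical, and the range-based for assumes C++11. The members stay ordinary named fields, so the compiler remains free to pad between them.

```cpp
#include <cstddef>  // std::size_t

struct vec3 {
    int x, y, z;   // ordinary named members; layout is left to the compiler
};

// Table of pointers to members, with static storage duration.
static int vec3::* const vec3_members[3] = { &vec3::x, &vec3::y, &vec3::z };

// Indexed access that does not rely on the members being contiguous.
inline int& at(vec3& v, std::size_t i) {
    return v.*vec3_members[i];
}

// Across-the-board operation written against the table.
inline int sum(const vec3& v) {
    int total = 0;
    for (int vec3::* m : vec3_members) {
        total += v.*m;
    }
    return total;
}
```

The cost is one extra indirection through the table, which a compiler can often remove when the index is a constant.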

kanze wrote:
> And who defines "as necessary"? According to what criteria? On
> an IA-32, padding is not "necessary" in:
> struct S { char c ; double d ; } ;
> Every compiler I know inserts some padding between c and d,
> however, at least by default.
>

Which doesn't mean that every compiler does. For instance, Borland C++
inserts no padding by default.
I *perfectly agree* with the argument, though.
I only mention it to make clear that implementations are not all the
same.

John Nagle:


> It's certainly common in networking code to assume the obvious
> placement of structure elements. Is that assumption supported
> by the standard, or not?
>

On x86 CPUs there are at least two *different obvious* placements:
The no-padding placement (the easiest to use when you want a
specific scheme).
The padding-for-speed placement.
Since it may be very useful in networking to have a very specific
layout, many popular compilers have a #pragma pack(1) or another way to
specify that structures have no padding.
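
As an illustration, here is a sketch using the push/pop form of that pragma, which MSVC, GCC and Clang accept; it is an extension, not standard C++. The record wire_header and its fields are hypothetical, and std::uint8_t and friends assume a C++11 <cstdint>.

```cpp
#include <cstdint>  // fixed-width integer types

// Same fields with the compiler's default ("padding-for-speed") layout.
struct natural_header {
    std::uint8_t  type;
    std::uint32_t length;
    std::uint16_t checksum;
};

// Non-standard but widely supported: request the no-padding layout.
#pragma pack(push, 1)
struct wire_header {
    std::uint8_t  type;
    std::uint32_t length;
    std::uint16_t checksum;
};
#pragma pack(pop)

// Make the layout assumption explicit: if the compiler did not honour the
// pragma, this fails to compile instead of silently misreading packets.
static_assert(sizeof(wire_header) == 7, "wire_header must have no padding");
```

Packing only removes the padding; byte order still has to be handled separately, and access to misaligned members may be slow or unsupported on some CPUs.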

There would be several big problems if the C++ committee adopted this
proposal:
1) It would be necessary to convince WG14 (the C committee) to make the
same modification to the C standard.
Otherwise the incompatibility would be a big problem when porting C++
code to C, or, simply, for C++ to C compilers.
2) It would introduce an incompatibility between C++ and *all* existing
C and C++ compilers.
Thus, for example, it would no longer be possible to write a portable
C++0x to C compiler.
While, nowadays, it is possible to write a C++ to C compiler which
works with any C89 compiler.

wa...@stoner.com wrote:
> An ability to interface with other languages at a higher level.
>

It is the exact opposite.
Such a binary interface specification is not the business of the C++
standard.
Layout specifications are a good thing, but they depend on the
platform...
And often, when a compiler vendor wants to port his C++ compiler to
a platform, he wants to respect, as far as possible, the layout
specifications of that platform.
That is *exactly why* the C++ committee must not specify it...
Otherwise, it would forbid some implementations from respecting the
structure layout of some weird platform.
The basic idea of C (and, in some measure, C++) is that it abstracts
away the differences and details of platforms.
That's why C and C++ are extremely portable languages, at the cost of
"unspecified things".

Alternatively, the C++ committee could specify the memory layout of
everything on *all existing platforms*.
But that would be stupid, because there is a huge (and growing) number
of platforms, and I don't see why every platform would accept that the
committee imposes on it a memory layout that it doesn't like... For
example, because it currently has another memory layout that it does like.

Nothing says that C++ implementations are not allowed to specify the
memory layout of structures.
And many, many C++ implementations do specify such layouts.
Of course, if you want to write a very portable application (and it is
not always a concern), you can't use all platform-specific things.

wa...@stoner.com wrote:
> Its not the case that all-the-world is C++ (or even C++ with one little
> island of C). When you want to interface to another language, you need
> to have some way of describing your data layouts, or accepting the
> other guy's data layout. Unfortunately, the most complex layout the
> standard gives us is for array of small integers called unsigned char
> (once you look up CHAR_BIT, and then make assumptions about bit-order
> and byte order). In practice we tend to make additional assumptions
> (it is a pretty good bet that the four-byte C++ float on your platform
> is pretty much the same as the four-byte Fortran REAL on the same
> platform, ...).

> I currently use implementations which pack POD elements tightly, and
> have found that to be a useful feature to exploit. I'm not yet
> convinced that the feature is so useful that it should be standardized.
> However, it does seem to be existing practice.

It IS often standardized.
http://www.google.com/search?hl=en&lr=&q=C+ABI+%22system+V%22&btnG=Search

And you can assume things on the memory layout, on EACH specific
platform.
But the point is that there are many platforms.
And on each platform there is a more or less standard specification,
or at least a reference specification, of the binary layout.
This reference specification depends on the platform.
There are platforms where you can assume that there is no padding
between chars... But there are perhaps platforms where you can assume
that there is padding between chars... And if there was no padding, the
compiler would not be "compliant" with the ABI of this platform.
However, the idea is that C++ programmers don't need to know every ABI...
Because C++ is a high-level language.
They only need to know the C++ standard, and can, more or less, assume
that all languages, on a specific platform, interact gracefully.
C++ implementers have two papers to read:
The C++ standard.
And the reference ABI of the target platform.

Language interactions are platform-specific, and are thus a matter for
compiler implementers and platform-specific committees.


wa...@stoner.com wrote:
> 2) Those literal integers are a maintenance issue. Use an enum
> instead, and you've got a different set of maintenance issues.
>

But your code has terrible maintenance issues...
Swapping z and y would completely change the structure...
Seriously, I would not want to have to maintain such a terrible thing.


Seungbeom Kim wrote:
> Again, I'm referring to the fact. And I assumed that many library
> implementations used two named members for the real and the imaginary
> parts but that still had the proposed layout. At least, the GNU
> Standard C++ Library v3 does that. If there is an implementation that
> uses two named members and thus fails to have the proposed layout, I
> will be interested to hear about it.
>

Compiler implementers do know what the layout of their structures is.
And it is very probable that many implementations use a memory layout
where it works.
For instance, GNU libstdc++ respects a specific ABI.
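
For what it is worth, this particular layout was later written into the standard: from C++11 on, a std::complex<T> lvalue may be accessed as an array of two T, real part first. A sketch of what that permits follows; the helper names are hypothetical, and under C++98/03 the same code works only where the implementation happens to use this layout.

```cpp
#include <complex>

// Reads the real part through the array view of a std::complex<double>.
inline double real_via_array(std::complex<double>& z) {
    return reinterpret_cast<double(&)[2]>(z)[0];
}

// Reads the imaginary part through the same array view.
inline double imag_via_array(std::complex<double>& z) {
    return reinterpret_cast<double(&)[2]>(z)[1];
}
```

This guarantee covers std::complex only; it says nothing about user-defined structs with two double members.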

Seungbeom Kim:


> Yes, compatibility is important; not only in complex but probably also
> in other data structures.
>

You fail to understand that different platforms use different layouts.
It is impossible to give a memory layout which would work (efficiently)
on all platforms.
Each platform must specify its own memory layout.

For instance, on IA-32, the calling convention is not a big deal when
communicating between Fortran and C++, because there are only a few
well-defined calling conventions: mainly __cdecl and __stdcall.
Do you think that the C++ standard should specify these calling
conventions?
At least I hope that the C++ standard will never specify that in:
void f(char c1, char c2);

(&c2)==((&c1)+1)
Because it would invalidate the binary compatibility of C++0x compilers
with C++98/C99/Fortran/Ada/and_others compilers on this platform.

4zum...@gmail.com


> Outside of unions, you obviously don't want to be able to access these
> classes through each other, as it would hurt optimising a lot, as many
> more things would alias with each other.

I have thought about that statement, and now I think that the compiler
can't really do no-alias optimizations (unless it really puts
padding between elements).

struct X {
    double x,y,z;
};
int main() {
    X k;
    double *p=&k.x; // ok
    /* p+1 is ok: a pointer one past the end of an array is valid
       (as described in 6.5.6p7 and p8) */
    if (p+1 == &k.y) {
        /* in this code, p[1] is an alias of k.y */
        /* Moreover, p+2 is valid here (one past the end of y) */
        if (p+2 == &k.z) {
            /* now p[2] is an alias of k.z */
        }
    }
}

So, I don't think that compilers are able to do no-alias optimizations.
However, from the standard's point of view, code is forbidden to rely
on such aliasing unless:
1) The implementation effectively documents this layout.
2) OR, the code explicitly tests it with operator==
Otherwise, a hypothetical implementation may do a no-alias optimization
(even if it is practically impossible).

Tomás wrote:
> But would it not have to use 16/32 Bit int's? If so, I demonstrated in
> another post how this would cause problems.

No, it can just use an IA-32 byte move instruction such as "mov al,
byte ptr [esi]".
Pentium and later CPUs are not internally able to work on single
bytes... They move a whole word.
Thus, having each byte in a separate word reduces the number of
read/write dependencies.

Ron Natalie

unread,
May 26, 2006, 5:15:37 PM5/26/06
to
Greg Herlihy wrote:


> And if it were the case that C++ compilers were free to add whimsical
> padding between struct members of the same type - then it should be
> possible to find a C++ compiler that does in fact do so.
>

You're making a leap that the wording of the standard doesn't support.
Many implementations can deal with different alignment issues at
different performance. Padding to increase performance isn't whimsical
and is permitted provided it isn't at the beginning of the structure.
