C structs & A question about octet

Lawrence W. McVoy

Nov 4, 1986, 6:16:27 PM

I have a question about alignment and padding. I have noticed (context:
VAX 780, 4.3BSD) that the C compiler pads out struct sizes to be
long word aligned. And it does the pointer arithmetic based on the
padded sizes. (no sh*t, sherlock, one would hope that they are the same)
For instance,

typedef struct {
    char byte;
    short word;
} three_bytes;

sizeof(three_bytes) == 4, not 3.

three_bytes* p = 100;

p == 100, p+1 == 104, not 103.

For all of you that knew this, you're all saying big deal, so what? Well,
I do (did) stuff like this all the time:

head = (three_bytes*)calloc(N, sizeof(three_bytes));

This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
The fact that pointer arith is "wrong" makes this very icky to work around
even if you are aware of the problem. Anyone have any comments or
suggestions? Does everyone except me know about this?

Also, what's this about alignment that I hear all the time? If compilers are
already aligning things for you, why bother to do it explicitly? You might
say "so it works on stupid compilers" but who are you to say what alignment
should be? I mean, if you port code to a machine with 24 bit alignment and
you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
yourself. No fun. Also no gain.

OK, next question: I want to define some types to hold bytes, words, and
long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
long words == 32 unsigned bits. I want to give them nice names, names
that imply the number of bits. I could use u8, u16, and u32, but I
don't *like* those names. I thought I had a better plan:
use octet for the byte
use hexdectet for sixteen
use <latin for two>, <latin for 30> for 32
but 32 turned out to be "duotrentet" or something and that's ugly. So
does anyone have any better names? Something nice and intuitive and
not ugly? How about Greek? How do they spell them?
--
Larry McVoy mc...@rsch.wisc.edu,
{seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy

"They're coming soon! Quad-stated guru-gates!"

Tim Pozar

Nov 5, 1986, 12:22:25 AM

Funny you were mentioning structure alignments. I was just writing a
programme that plays with the PSP on an MS-DOS machine. I couldn't figure out
why the name of the file was always cut off by two bytes. Oh! The structure
is aligned on the int boundary. Geezsh. But there is a switch for the
Microsoft 4.0 C compiler (/Zp) that packs structure members.

This is for all...
Is there any spec that a puts() should add a \n at the end of everything? My
Microsoft 4.0 compiler does it, and I can't find any reference that describes
puts() doing something like that in K&R. Is this a new standard?
Tim Pozar
______________________________
| |
| UUCP: ihp4!hplabs!well!pozar |
| Fido: 125/406 Sysop |
|______________________________|

BEATTIE

Nov 5, 1986, 9:22:14 AM

> OK, next question: I want to define some types to hold bytes, words, and
> long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
> long words == 32 unsigned bits. I want to give them nice names, names
> that imply the number of bits. I could use u8, u16, and u32, but I
> don't *like* those names. I thought I had a better plan:
> use octet for the byte
> use hexdectet for sixteen
> use <latin for two>, <latin for 30> for 32
> but 32 turned out to be "duotrentet" or something and that's ugly. So
> does anyone have any better names? Something nice and intuitive and
> not ugly? How about Greek? How do they spell them?
> --
> Larry McVoy mc...@rsch.wisc.edu,
> {seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy

I use my own typedefs for portability.
I simply redefine the typedefs to get the required length and
characteristics.
For example, SINT32 is a signed 32-bit integer anywhere I go.
The prefixes are S - Signed, P - Positive, U - Unsigned.
They are nice and intuitive and not very ugly :-)
My typedefs for the VAX are:
typedef long PINT32;
typedef long SINT32;
typedef unsigned long UINT32;
typedef long BIT32;

typedef short PINT16;
typedef short SINT16;
typedef unsigned short UINT16;
typedef short BIT16;

typedef char PINT8;
typedef char SINT8;
typedef unsigned char UINT8;
typedef char BIT8;
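
One way to "redefine the typedefs" per machine can be sketched as follows. This is only an illustration, not the set of headers described above: it assumes a compiler that provides <limits.h> and the "signed" keyword (ANSI C or later), and it picks the underlying type from the limit macros. The macro BITS_OF and the function check_widths are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <limits.h>

#if UCHAR_MAX != 0xFFU
#error "char is not 8 bits wide; add a case for this machine"
#endif
typedef unsigned char UINT8;
typedef signed char   SINT8;    /* "signed" needs an ANSI C or later compiler */

#if USHRT_MAX == 0xFFFFU
typedef unsigned short UINT16;
typedef short          SINT16;
#else
#error "no 16-bit integer type found; add a case for this machine"
#endif

#if UINT_MAX == 0xFFFFFFFFU
typedef unsigned int UINT32;    /* int is 32 bits wide here */
typedef int          SINT32;
#elif ULONG_MAX == 0xFFFFFFFFU
typedef unsigned long UINT32;   /* fall back to long, e.g. where int is 16 bits */
typedef long          SINT32;
#else
#error "no 32-bit integer type found; add a case for this machine"
#endif

/* Width in bits implied by the storage size of a type (hypothetical helper). */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Run-time sanity check that the names mean what they say. */
void check_widths(void)
{
    assert(BITS_OF(UINT8) == 8 && BITS_OF(SINT8) == 8);
    assert(BITS_OF(UINT16) == 16 && BITS_OF(SINT16) == 16);
    assert(BITS_OF(UINT32) == 32 && BITS_OF(SINT32) == 32);
}
```

Calling check_widths() once at start-up is a cheap guard when the code is moved to a new machine. On a machine where both int and long are 32 bits wide, this sketch selects int rather than long.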

---
Tom.
T.W. Beattie
...!{ihnp4 | houxm | whuxl | ulysses}!hoqax!twb
...!{decvax | ucbvax}!ihnp4!hoqax!twb

Guy Harris

Nov 5, 1986, 2:29:14 PM

> I have a question about alignment and padding. I have noticed (context:
> Vax 780, 4.3BSD) that the c compiler pads out struct sizes to be
> long word aligned.

Well, no, actually, it doesn't. It *aligns* members of structures on the
same boundaries that it would align non-structure-member items of the same
type. Thus, in your example, the "short" would be aligned on a 2-byte
boundary; since the only thing that precedes it is a single "char", it would
require one padding byte between them.

On some machines, it simply doesn't have much of a choice. It *could*,
presumably, just pack the structure as tightly as possible; however, it
would then have to generate a *lot* of extra code to access members not on
their "proper" boundary. (No, Virginia, not all machines allow
arbitrary-boundary references to "short"s, "int"s, "long"s, etc.)

On other machines, it could pack the structure as tightly as possible,
because those machines do allow arbitrary-boundary references to "short"s,
"int"s, etc. It would just mean the code would run more slowly, since few
(if any) machines that allow arbitrary-boundary references do them as
quickly as proper-boundary references.

> For all of you that knew this, you're all saying big deal, so what? Well,
> I do (did) stuff like this all the time:
>
> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>
> This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
> The fact that pointer arith is "wrong" makes this very icky to work around
> even if you are aware of the problem. Anyone have any comments or
> suggestions? Does everyone except me know about this?

Allocating arrays that big is relatively uncommon; C's padding rules make
the more common cases work well, and as such are doing the right thing. I'd
suggest you allocate 3*N bytes as a single array, and then extract the
"short" yourself. NOTE: if you absolutely insist on doing this extraction
by casting a pointer to the byte following the "char" into a "short *", and
just dereferencing that pointer, surround that code with some "#ifdef" and
put a more portable version in the "#else" clause!
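
The portable version of that extraction might look like the sketch below. It is an illustration under stated assumptions, not code from this thread: each record is taken to be one char followed by a 16-bit unsigned value stored low byte first (an arbitrary choice), and REC_SIZE, rec_alloc, rec_set, rec_get_byte and rec_get_word are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define REC_SIZE 3   /* 1 byte for the char + 2 bytes for the 16-bit value */

/* Allocate n zeroed 3-byte records as one flat byte array: 3*n bytes, no padding. */
unsigned char *rec_alloc(size_t n)
{
    return (unsigned char *)calloc(n, REC_SIZE);
}

/* Store record i; the 16-bit value is split into bytes by hand. */
void rec_set(unsigned char *base, size_t i, unsigned char b, unsigned w)
{
    unsigned char *p = base + i * REC_SIZE;
    assert(w <= 0xFFFFu);
    p[0] = b;
    p[1] = (unsigned char)(w & 0xFFu);         /* low byte  */
    p[2] = (unsigned char)((w >> 8) & 0xFFu);  /* high byte */
}

/* Fetch the char member of record i. */
unsigned char rec_get_byte(const unsigned char *base, size_t i)
{
    return base[i * REC_SIZE];
}

/* Reassemble the 16-bit value of record i from its two bytes.
 * No "short *" is ever formed, so alignment does not matter. */
unsigned rec_get_word(const unsigned char *base, size_t i)
{
    const unsigned char *p = base + i * REC_SIZE + 1;
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}
```

The price is a shift and an OR on every access in exchange for saving one byte per record.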
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
g...@sun.com (or g...@sun.arpa)

Peter S. Shenkin

Nov 6, 1986, 2:09:07 AM

In article <sun.8943> g...@sun.uucp (Guy Harris) writes:

> [words to the effect that structures are internally padded so that members
> wind up on word-boundaries most efficient for that machine]

>> For all of you that knew this, you're all saying big deal, so what? Well,
>> I do (did) stuff like this all the time:
>>
>> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>>
>> This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
>> ...Anyone have any comments or suggestions?
>
>Allocating arrays that big is relatively uncommon; C's padding rules make
>the more common cases work well, and as such are doing the right thing. I'd
>suggest you allocate 3*N bytes as a single array, and then extract the
>"short" yourself....

Another way to do it: put your short and your char in different structures,
and allocate storage for them separately. The program won't be as easy to
read, and you won't feel virtuous (what, don't YOU feel virtuous when you
write a well-structured program?), but as Guy points out this is a time/storage
trade-off, made by most (if not all) C compilers in favor of time, and at the
expense of storage. Even if you had a compiler which allowed you to resolve
the issue in favor of storage, doing it that way would significantly, perhaps
prohibitively, increase the execution time, assuming you really do have to
do something to all those structure elements beyond allocating space for them.

Time/storage is the best-known trade-off in the programming world, but there
are others; for instance, programming_ease/program_performance and
program_readability/program_performance. Yours is obviously a case of the
latter, and to some extent of the former as well.

Peter S. Shenkin Columbia Univ. Biology Dept., NY, NY 10027
{philabs,rna}!cubsvax!peters cubsvax!pet...@columbia.ARPA

Gregory Smith

Nov 6, 1986, 12:21:06 PM

In article <29...@rsch.WISC.EDU> mc...@rsch.WISC.EDU (Lawrence W. McVoy) writes:
>
>I have a question about alignment and padding. I have noticed (context:
>Vax 780, 4.3BSD) that the c compiler pads out struct sizes to be
>long word aligned. And it does the pointer arithmetic based on the
>padded sizes. (no sh*t, sherlock, one would hope that they are the same)
>For instance,
>
> typedef struct {
> char byte;
> short word;
> } three_bytes;
>
> sizeof(three_bytes) == 4, not 3.
>
> three_bytes* p = 100;
>
> p == 100, p+1 == 104, not 103.
>
>For all of you that knew this, you're all saying big deal, so what? Well,
>I do (did) stuff like this all the time:
>
> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>
>This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.

My, what a big computer you have...

>The fact that pointer arith is "wrong" makes this very icky to work around
>even if you are aware of the problem. Anyone have any comments or
>suggestions? Does everyone except me know about this?

The arithmetic is not 'wrong'. On a VAX, 16-bit quantities must be at
an even address to guarantee that they can be written and read easily
(I think the VAX can read and write any 16-bit field, but it takes more
code, and is much slower; this is not what you want).
Similarly, 32-bit quantities ( ints and longs ) must be at an address
which is a multiple of four.

The struct three_bytes is laid out like this:

0: byte
1: ( not used )
2: word ( lo byte
3: hi byte )

The 'not used' byte makes sure that the word will be on an even address,
and furthermore the compiler will make sure that all structs of this type
start on an even address.
Thus it is of size four. structs are not always padded out to a multiple
of four; a struct consisting of 3 chars will have sizeof = 3.

Suppose you put byte *after* word. You will get this:

0: word ( lo byte
1: hi byte )
2: byte
3: not used

And sizeof is still 4. Why? Suppose you make an array of these things.
In order to ensure that the 'word's are on even addresses, each of the
structs must start on an even address. Thus a blank byte must be inserted
after each to pad them out to four. If you have ' three_bytes *p',
then ++p must add '4' to the pointer to reach the next one. To keep
things consistent, sizeof() gives 4, and the extra byte is added to *any*
instance of the struct. Make sense yet?
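
The layout rules above can be checked on any given compiler with sizeof and offsetof. The following is a small sketch; the struct and function names are made up for illustration, and the exact numbers depend on the machine, so the code computes them rather than assuming them.

```c
#include <assert.h>
#include <stddef.h>

struct byte_first  { char byte; short word; };  /* layout from the original post */
struct word_first  { short word; char byte; };  /* same members, swapped         */
struct three_chars { char a, b, c; };           /* no member needs alignment     */

/* Bytes of padding = total size minus the sizes of the members. */
size_t padding_byte_first(void)
{
    return sizeof(struct byte_first) - (sizeof(char) + sizeof(short));
}

size_t padding_word_first(void)
{
    return sizeof(struct word_first) - (sizeof(short) + sizeof(char));
}

/* Distance in bytes between adjacent array elements; this is what ++p adds. */
size_t array_stride(void)
{
    struct byte_first a[2];
    assert(offsetof(struct byte_first, byte) == 0);  /* first member is at offset 0 */
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

On a machine with 2-byte shorts aligned on even addresses, one would expect offsetof(struct byte_first, word) to be 2, both padding functions to return 1, and array_stride() to return 4. The stride always equals sizeof(struct byte_first), which is why the padding has to be counted in sizeof.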

If there is a 32-bit quantity within the struct, the whole struct must be
padded out to a multiple of four, for the same reasons. Check out
Kernighan & Ritchie, page 196.

If you can't afford this wasted memory, you can keep two separate arrays,
one containing the bytes and the other containing the words.
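
A minimal sketch of the two-array approach, assuming the arrays are allocated together and indexed in parallel. The names split_table, split_table_init, split_table_free and split_table_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Element i of the logical table is the pair (bytes[i], words[i]). */
struct split_table {
    size_t n;
    char  *bytes;
    short *words;
};

/* Allocate both arrays, zero-filled. Returns 0 on success, -1 on failure. */
int split_table_init(struct split_table *t, size_t n)
{
    assert(t != NULL);
    t->n = n;
    t->bytes = (char *)calloc(n, sizeof(char));
    t->words = (short *)calloc(n, sizeof(short));
    if (t->bytes == NULL || t->words == NULL) {
        free(t->bytes);
        free(t->words);
        t->bytes = NULL;
        t->words = NULL;
        return -1;
    }
    return 0;
}

/* Release both arrays and leave the table empty. */
void split_table_free(struct split_table *t)
{
    free(t->bytes);
    free(t->words);
    t->bytes = NULL;
    t->words = NULL;
    t->n = 0;
}

/* Storage used for n elements: no per-element padding is needed,
 * because each array holds only one type. */
size_t split_table_bytes(size_t n)
{
    return n * (sizeof(char) + sizeof(short));
}
```

Each array is aligned for its own element type, so the words stay on their proper boundaries and nothing is wasted between elements. The cost is that an element is no longer a single object that can be passed around by pointer.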

>
>Also, what's this about alignment that I hear all the time? If compilers are
>already aligning things for you, why bother to do it explicitly? You might
>say "so it works on stupid compilers" but who are you to say what alignment
>should be? I mean, if you port code to a machine with 24 bit alignment and
>you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
>yourself. No fun. Also no gain.
>

Mostly the compiler takes care of it. If you are using the same pointer type
to point to different object types, or other things like that, you may have to
go to some trouble to make the compiler take care of it. Most of the
discussion occurs because people want more ways to make their programs
portable. Again, you only get into trouble here if you are doing 'unusual'
things.

A classic example is a program which must create a sequence of
mixed structs of different sizes. The type ( and thus size ) of each
struct is determined by looking at its first member ( which would be of
the same type for all of them ). You can use a union containing all the
struct types, but then space will be wasted because they will all be the
same ( worst-case ) size.

The difficulty here is to write the program in such a way that all these
structs will be aligned properly, and to make it work on any machine.
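
One common way to attack this is to round every record size up to the strictest alignment any record could need, and let the compiler tell you what that alignment is. The sketch below assumes the strictest types involved are long, double and a pointer; the record types and all names (max_align, REC_ALIGN, rec_small, rec_big, rec_size, rec_next, seq_alloc) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: no record member is stricter than these three types. */
union max_align { long l; double d; void *p; };
struct align_probe { char c; union max_align m; };

/* The compiler places m on its proper boundary, so its offset is that boundary. */
#define REC_ALIGN offsetof(struct align_probe, m)

enum rec_type { REC_SMALL, REC_BIG };

/* Every record starts with the same member, which identifies its type. */
struct rec_small { int type; char c; };
struct rec_big   { int type; double values[4]; };

/* Round n up to the next multiple of REC_ALIGN. */
size_t round_up(size_t n)
{
    return (n + REC_ALIGN - 1) / REC_ALIGN * REC_ALIGN;
}

/* Space one record of the given type occupies in the sequence. */
size_t rec_size(int type)
{
    assert(type == REC_SMALL || type == REC_BIG);
    return round_up(type == REC_SMALL ? sizeof(struct rec_small)
                                      : sizeof(struct rec_big));
}

/* Step from one record to the next by inspecting its first member. */
void *rec_next(void *rec)
{
    return (char *)rec + rec_size(*(int *)rec);
}

/* malloc returns storage suitably aligned for any object type. */
void *seq_alloc(size_t bytes)
{
    return malloc(bytes);
}
```

Because the buffer starts on a suitable boundary and every step is a multiple of REC_ALIGN, each record lands on a boundary that is safe for any of the record types. Small records still carry some padding, but much less than a union sized for the worst case.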

--
----------------------------------------------------------------------
Greg Smith University of Toronto UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

Eric Lund

Nov 6, 1986, 3:10:48 PM

"The C Programming Language", Kernighan and Ritchie, p. 196: "Each
non-field member of a structure begins on an addressing boundary
appropriate to its type; therefore, there may be unnamed holes in a
structure."

Kernighan and Ritchie say nothing of how big or small the unnamed holes
may be. I've used one C compiler that byte aligned chars in structs
until it encountered a short, then used even byte alignment until it
encountered a long, and then aligned everything on four byte
boundaries. There is nothing in the quoted sentence that prevents
anyone from writing a code generator that always aligns every non-field
member on a 4K boundary.

Mayhaps you can use fields for some of your needs, but "Field members
are packed into machine integers; they do not straddle words."
(p. 196), and "... implementations are not required to support any but
integer fields. Moreover, even int fields may be considered to be
unsigned." (p. 197) To increase your confidence in predicting the
behavior of the C compiler, p. 212 starts out "Purely hardware issues
like word size...".

(Ever tried sending a raw structure over a heterogeneous network
without benefit of RPC? Back when you were young and foolish and
didn't know that VAXen and SUNs have different byte orders? And you
thought that structs were all the same? You have my sympathies!)

You could name bit types of differing lengths VIII, XVI, and XXXII,
but I don't think that that's what you intend.

Eric the DBA

Disclaimer: any opinions found herein are mine. Please return them;
no questions asked.

Doug Gwyn

Nov 6, 1986, 9:22:47 PM

In article <20...@well.UUCP> po...@well.UUCP (Tim Pozar) writes:
> Is there any spec that a puts() should add a \n at the end of everything?

Yes; this is what the original puts() did and what every puts() I
have ever seen does. Note that fputs() does NOT append a newline.
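
The difference can be sketched like this: the function below does for an arbitrary stream what puts() does for stdout, namely write the string and then a newline. The name put_line is hypothetical; fputs() and fputc() are the standard library calls.

```c
#include <assert.h>
#include <stdio.h>

/* Write s followed by a newline to fp, as puts(s) does on stdout.
 * Returns 0 on success, EOF on a write error. */
int put_line(const char *s, FILE *fp)
{
    assert(s != NULL && fp != NULL);
    if (fputs(s, fp) == EOF)      /* fputs writes the string only, no newline */
        return EOF;
    if (fputc('\n', fp) == EOF)   /* the newline that puts adds */
        return EOF;
    return 0;
}
```

So fputs("abc", stdout) followed by puts("def") would leave "abcdef" and a single newline on the output.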

Henry Spencer

Nov 7, 1986, 2:32:55 PM

> Is there any spec that a puts() should add a \n at the end of everything? My
> Microsoft 4.0 compiler does it, and I can't find any reference that describes
> puts() doing something like that in K&R. Is this a new standard?

No, it's an extremely old one. You won't find puts() (or its friend gets())
in K&R at all -- they are too old and too thoroughly obsolete. Try using
fputs(), which is the modern equivalent (and does not add a newline).
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry