Who owns the variable in my header file?

lipska the kat

Oct 3, 2012, 2:13:23 PM

Hi

I have the following program
distributed over 4 files

/* foo.h */
int foo;

void fooset(int f);
int fooget(void);
void fooinc(void);

/* main.c */
#include <stdio.h>
#include <foo.h>

int main(int argc, char **argv){

fooset(10);
printf("foo is %d\n", fooget());

fooinc();
printf("foo is now %d\n", fooget());

return 0;
}

/* fooset.c */
#include <foo.h>

void fooset(int f){
extern int foo;
foo = f;
}

/* fooget.c */
#include <foo.h>

int fooget(void){
extern int foo;
return foo;
}

I run make on my makefile (I'm a beginner at make, Ant is more my thing)
and see a humungous great glob of bytes called foo.h.gch, looks like
foo.h has been compiled ... but I've no idea
why it's so huge.

-rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch

Anyway, the question is who 'owns' the foo declared in foo.h
Storage is obviously set aside as when I run the program I get the
expected output

foo is 10
foo is now 11

I guess this big old lump of bytes has something to do with it.

thanks

lipska

--
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun

Eric Sosman

Oct 3, 2012, 3:05:41 PM

On 10/3/2012 2:13 PM, lipska the kat wrote:
> Hi
>
> I have the following program
> distributed over 4 files
>
> /* foo.h */
> int foo;

> [...]

>
> /* main.c */
> #include <stdio.h>
> #include <foo.h>

> [...]

>
> /* fooset.c */
> #include <foo.h>

> [...]

>
> /* fooget.c */
> #include <foo.h>

> [...]

This is wrong, assuming all three modules are linked into
one program. Each module provides its own definition of the
variable `foo', and those three definitions collide. There
must be one and only one `foo' in the program, not three.

The way to accomplish this is to remove the *definition*
of `foo' from foo.h and replace it with a *declaration*. The
difference is not so hard to understand: It is the difference
between "I am Lipska" and "I know someone named Lipska." The
way you spell "I know someone named" in C is

/* foo.h */
extern int foo; // "I know an int named foo"
[...]

Each of the three modules thus gets an introduction to `foo'.
In exactly one of these modules (it doesn't matter which; just
pick one that makes the most sense) you also put a definition:

/* wherever.c */
#include <foo.h> // "I know an int named foo"
int foo; // "Yes, and here I am!"
[...]

(The defining module doesn't actually *need* the declaration,
since defining the variable also declares it. But it's a good
idea to use the #include anyhow, because if the compiler sees
both the declaration and the definition together it can alert
you if they disagree -- like if you change one to a `long' and
forget to change the other.)
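As a quick single-file check of the pattern above, the sketch below puts what foo.h would declare and what the one defining module would define into a single translation unit. The layout is an assumption for illustration: in the real program the declarations live in foo.h, the definition lives in exactly one .c file, and fooinc is assumed to add 1 to foo.

```c
/* --- what foo.h would contain: declarations only --- */
extern int foo;          /* "I know an int named foo" */
void fooset(int f);
int fooget(void);
void fooinc(void);

/* --- what exactly one .c file would contain --- */
int foo;                 /* "Yes, and here I am!" (zero-initialized) */

void fooset(int f) { foo = f; }
int fooget(void) { return foo; }
void fooinc(void) { ++foo; }
```

Because the extern declaration is visible where foo is defined, the compiler will reject the file if someone later changes one of the two to long and forgets the other.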

Incidentally, your use of #include <foo.h> is suspect. The
<> form is for system-provided headers like <stdio.h>, while
programmer-provided header files should use #include "foo.h"
instead. Compilers search for <> and "" inclusions in different
places, and even if the mixup is sometimes harmless it is also
sometimes not so harmless.

> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
> and see a humungous great glob of bytes called foo.h.gch, looks like
> foo.h has been compiled ... but I've no idea
> why it's so huge.
>
> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch

This looks like a "precompiled header," generated as a time-
saving step by the (wait for it...) C++ compiler you're using.
(You may have thought you were writing C, but the available
evidence suggests you've set up your build environment to use
C++ instead. Might want to check your setup ...)

> Anyway, the question is who 'owns' the foo declared in foo.h

If you have exactly one definition (as C requires), you might
say that `foo' is "owned" by the module where that definition
appears. (Or you might not; once the modules are linked together,
all global variables are on an equal footing and might as well be
said to be "owned" by the entire program.)

If you have three colliding definitions -- well, there's no
useful way to answer questions about undefined behavior.

> Storage is obviously set aside as when I run the program I get the
> expected output
>
> foo is 10
> foo is now 11

As I hope you're beginning to learn, "It worked" does not
imply "It's right." The possible manifestations of undefined
behavior include "It did (or seemed to do) what I expected."

> I guess this big old lump of bytes has something to do with it.

Possibly, but probably not.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Alan Curry

Oct 3, 2012, 3:17:10 PM

In article <k4i2a7$uhj$1...@dont-email.me>,

Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
>On 10/3/2012 2:13 PM, lipska the kat wrote:
>> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
>> and see a humungous great glob of bytes called foo.h.gch, looks like
>> foo.h has been compiled ... but I've no idea
>> why it's so huge.
>>
>> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
>
> This looks like a "precompiled header," generated as a time-
>saving step by the (wait for it...) C++ compiler you're using.
>(You may have thought you were writing C, but the available
>evidence suggests you've set up your build environment to use
>C++ instead. Might want to check your setup ...)

gcc creates those files in C mode too, when you run gcc -c foo.h

>
>> Anyway, the question is who 'owns' the foo declared in foo.h
>
> If you have exactly one definition (as C requires), you might
>say that `foo' is "owned" by the module where that definition
>appears. (Or you might not; once the modules are linked together,
>all global variables are on an equal footing and might as well be
>said to be "owned" by the entire program.)
>
> If you have three colliding definitions -- well, there's no
>useful way to answer questions about undefined behavior.

Oh please. It's not unuseful to explain what actually happened. gcc made a
"common" symbol in each object file, and the linker merged them. This
behavior may not be standardized but it's not hard to explain, and after
you've explained it you can add that there are ways to change it:

Compile with -fno-common and the common symbol will be changed to a normal
symbol, and the linker will fail when it sees multiple normal symbols with
the same name. This way your program won't link until you've fixed it to obey
the "one owner" rule.

Or, assuming the GNU linker is being used, link with -Wl,--warn-common, which
will still merge the common symbols but also warn you about them, allowing
you to use the program while providing a reminder that you still have some
work to do to make it portable.

--
Alan Curry

Kaz Kylheku

Oct 3, 2012, 3:40:37 PM

On 2012-10-03, lipska the kat <lipska...@yahoo.co.uk> wrote:
> Hi
>
> I have the following program
> distributed over 4 files
>
> /* foo.h */
> int foo;

This is an external *definition* and not merely a declaration, so you have
screwed up. As soon as this file is included in more than one translation unit,
foo becomes multiply defined. You want:

extern int foo;

This is purely a declaration now because of two features: the presence of the
extern, and the lack of an initializer. Now you need a definition of foo
somewhere.

Pick a source file where foo is going to reside, and put an "int foo;"
there. It is also a good idea to ensure that this source file includes
the header: if you ever change the type of foo, you get type checking
between the declaration in the header and the definition in that file.

> void fooset(int f);

Note that no "extern" is needed on function declarations, because declarations
without bodies are only declarations and not definitions. You can use extern
if you want:

extern void fooset(int f);

Putting extern on function declarations could help you remember to add it to
object declarations.
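A small sketch of the distinction: for functions the keyword changes nothing, while for objects it is what turns a definition into a pure declaration. The names follow the thread; putting everything in one file is only for illustration.

```c
/* Function declarations: these two lines are equivalent, because a
   function declared at file scope has external linkage by default. */
void fooset(int f);
extern void fooset(int f);

/* Object declarations: here extern matters. This line only declares foo... */
extern int foo;
/* ...and this line defines it (allocates storage, initial value 0). */
int foo = 0;

/* The function definition matches both declarations above. */
void fooset(int f) { foo = f; }
```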

> int fooget(void);
> void fooinc(void);
>
> /* main.c */
> #include <stdio.h>
> #include <foo.h>

It's a poor idea to use angle brackets on your own header file.

And in fact, in many environments it will not work. You must have passed some
option to your compiler to make it work.

Header files are searched for in two places. Double quotes like #include
"foo.h" specify that one place is searched, and if the header is not found
there, then the other place is searched. Angle brackets indicate that
only the second place is searched.

In many common environments, the first place that is searched is the
same directory which contains the source file which is invoking the #include.
And the second place (used by the angle brackets include, or as
fallback for double quotes) is a bunch of system and compiler header file
directories, outside of the program.

So #include <foo.h> won't work unless you tell the compiler that your
project directory is one of the system directories. That's a bad idea
because then if someone makes a header which clashes with the name of
some system header, that header may get mistakenly included.

> Anyway, the question is who 'owns' the foo declared in foo.h
> Storage is obviously set aside as when I run the program I get the
> expected output

Why it works is that your environment implements the "Relaxed Ref/Def" model
of C linkage. That is to say, it allows multiple external definitions
for an object, and these external definitions are merged into a single
definition during linkage.

You are being allowed to get away with a programming error which, under a
"Strict Ref/Def" linkage model, would prevent your program from linking.

It is sharp of you to raise a question mark about this, and wonder
"who owns foo". Good for you!

For more information about "Relaxed Ref/Def" and "Strict Ref/Def", find
a document called "Rationale for the ANSI C Programming Language".
This is covered in section 3.1.2.2 of that document.

http://www.lysator.liu.se/c/rat/title.html

This was written by the committee back in the late 1980s, when they
produced the first C standard.

Ike Naar

Oct 3, 2012, 3:52:53 PM

On 2012-10-03, lipska the kat <lipska...@yahoo.co.uk> wrote:

> I have the following program
> distributed over 4 files
>
> /* foo.h */
> int foo;
>
> void fooset(int f);
> int fooget(void);
> void fooinc(void);
>
> /* main.c */
> #include <stdio.h>
> #include <foo.h>
>
> int main(int argc, char **argv){
>
> fooset(10);
> printf("foo is %d\n", fooget());
>
> fooinc();
> printf("foo is now %d\n", fooget());
>
> return 0;
> }
>
> /* fooset.c */
> #include <foo.h>
>
> void fooset(int f){
> extern int foo;

This extern declaration of 'foo' is harmless but unnecessary since
the #include <foo.h> already provides such a declaration.

> foo = f;
> }
>
> /* fooget.c */
> #include <foo.h>
>
> int fooget(void){
> extern int foo;

Same here.

> return foo;
> }
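To see why the inner "extern int foo;" is redundant but harmless, the sketch below (one file standing in for foo.h plus fooget.c, with a definition added only so that it links) declares foo at file scope and again at block scope. Both declarations refer to the same object.

```c
extern int foo;          /* file-scope declaration, as foo.h should provide */

int fooget(void) {
    extern int foo;      /* block-scope redeclaration: same object, adds nothing */
    return foo;
}

int foo = 0;             /* the single definition, here only so the sketch links */
```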

What is missing here is fooinc.c, but let's assume it exists
and contains

/* fooinc.c */
#include <foo.h>

void fooinc(void)
{
++foo;
}

> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
> and see a humungous great glob of bytes called foo.h.gch, looks like
> foo.h has been compiled ... but I've no idea
> why it's so huge.
>
> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
>
> Anyway, the question is who 'owns' the foo declared in foo.h
> Storage is obviously set aside as when I run the program I get the
> expected output
>
> foo is 10
> foo is now 11
>
> I guess this big old lump of bytes has something to do with it.

Actually, the behaviour of your program is undefined, because
there is more than one external definition for int foo.
All translation units main.c, fooget.c, fooset.c and fooinc.c
provide an external definition of int foo. This violates 6.9 p5:

An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an
object.
If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.

Your linker will probably "fix" this, and merge all external
definitions into one, but if you want your program to be portable
you should not rely on this linker behaviour.
It would be cleaner to replace the declaration 'int foo;'
in foo.h with 'extern int foo;', and add a new
translation unit, say foo.c, to contain the (single)
definition of int foo:

/* foo.c */
#include <foo.h>
int foo;

and link the object generated for foo.c to your program
along with the objects generated for main.c, fooget.c,
fooset.c and fooinc.c.

As for the foo.h.gch file, it's a precompiled header,
the result of compiling foo.h with gcc.
If you don't want this, don't compile foo.h,
use it only for inclusion by other *.c files.

Eric Sosman

Oct 3, 2012, 3:55:45 PM

On 10/3/2012 3:17 PM, Alan Curry wrote:
> In article <k4i2a7$uhj$1...@dont-email.me>,
> Eric Sosman <eso...@ieee-dot-org.invalid> wrote:

>>[...]

>> If you have three colliding definitions -- well, there's no
>> useful way to answer questions about undefined behavior.
>
> Oh please. It's not unuseful to explain what actually happened. gcc made a
> "common" symbol in each object file, and the linker merged them. This
> behavior may not be standardized but it's not hard to explain, and after
> you've explained it you can add that there are ways to change it:

>[...]

I'm sticking with "no useful way."

--
Eric Sosman
eso...@ieee-dot-org.invalid

lipska the kat

Oct 3, 2012, 3:59:21 PM

On 03/10/12 20:05, Eric Sosman wrote:
> On 10/3/2012 2:13 PM, lipska the kat wrote:
>> Hi
>>
>> I have the following program
>> distributed over 4 files

[snip]

> This looks like a "precompiled header," generated as a time-
> saving step by the (wait for it...) C++ compiler you're using.
> (You may have thought you were writing C, but the available
> evidence suggests you've set up your build environment to use
> C++ instead. Might want to check your setup ...)

Yes ... of course, this implies that I know how to change it :-)
I've done what I can with the hideously complex ... er I mean
feature rich gcc software to ensure that I am compiling as C

man gcc says

file.c C source code which must be preprocessed.

All my files end in .c

It also says

C Language Options -ansi

I also compile with the -ansi flag, e.g

gcc -ansi -c -g -I/<path> fooset.c foo.h

Not sure what else I need to do to make certain I'm compiling as C code.
I'll have a google.

I'll have to digest the rest of your response and respond accordingly
all I'm really trying to do is understand extern variables.

> As I hope you're beginning to learn, "It worked" does not
> imply "It's right." The possible manifestations of undefined
> behavior include "It did (or seemed to do) what I expected."

Well yes, but if I run a program 10 times with the same data and get the
same results each time I might start to think that something is 'right'.
If I design my test cases in the usual way (boundary cases and random
'middle ground' cases at the very least) then run those tests with the
same data and get the same output each time then I get a feeling that I
may be on the right path.

Is testing C code fundamentally different to testing code in other
languages ?

Many thanks for taking the time to reply.
It's much appreciated.

Eric Sosman

Oct 3, 2012, 4:32:50 PM

On 10/3/2012 3:59 PM, lipska the kat wrote:
> On 03/10/12 20:05, Eric Sosman wrote:
>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>>> Hi
>>>
>>> I have the following program
>>> distributed over 4 files
>
> [snip]
>
>> This looks like a "precompiled header," generated as a time-
>> saving step by the (wait for it...) C++ compiler you're using.
>> (You may have thought you were writing C, but the available
>> evidence suggests you've set up your build environment to use
>> C++ instead. Might want to check your setup ...)
>
> Yes ... of course, this implies that I know how to change it :-)
> I've done what I can with the hideously complex ... er I mean
> feature rich gcc software to ensure that I am compiling as c

From what other posters have written, it appears I guessed
incorrectly about the C/C++ distinction. You may in fact be
compiling your code as C -- but for some reason you're "compiling"
the header file itself. That's probably not what you want to do.

> > As I hope you're beginning to learn, "It worked" does not
> > imply "It's right." The possible manifestations of undefined
> > behavior include "It did (or seemed to do) what I expected."
>
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.

If you run it ten times with the same data, you're probably hitting
the same fragile set of coincidences each time. Running with different
data could be more illuminating -- although, as Dijkstra said, testing
cannot demonstrate absence of errors, but only their presence.

> If I design my test cases in the usual way (boundary cases and random
> 'middle ground' cases at the very least) then run those tests with the
> same data and get the same output each time then I get a feeling that I
> may be on the right path.
>
> Is testing C code fundamentally different to testing code in other
> languages ?

No, not fundamentally. It seemed to me you'd undistributed
the middle of

"Correct programs work."
"This program is correct."
"Therefore, this program works."

to obtain

"Correct programs work."
"This program works."
"Therefore, this program is correct."
"BZZZZT! Thank you for playing."

... and you would certainly not have been the first to do so.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Edward A. Falk

Oct 3, 2012, 6:36:21 PM

In article <EK-dnVoFlczbHfHN...@bt.com>,

lipska the kat <lipska...@yahoo.co.uk> wrote:

>Hi
>
>I have the following program
>distributed over 4 files
>
>/* foo.h */
>int foo;
>
>void fooset(int f);
>int fooget(void);
>void fooinc(void);

Perhaps the C standard has changed since I first read it, but AFAIK,
most of the answers so far have been wrong.

If you write this in your code:

int foo;

You've *declared* foo -- that is, described what it is -- but you haven't
*defined* it. There's a difference. In this case, no actual space
for foo has been allocated yet, and it's known as a "common" symbol.
If no module ever actually defines it, the linker will allocate space
for it at the end.

If you write

int bar = 1;

Now you've defined it. Space and an initial value for it will be
included in your module. If more than one module does this, there will
be a conflict.

So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
In Unix, you can give the command "nm fooget.o" to see all the symbols
associated with fooget:

0000000000000000 D bar
0000000000000004 C foo
0000000000000000 T fooget

So you see that bar has been defined, while foo is merely a common
symbol. (And fooget is text -- executable code).

When you compile all of your .c files and link them together, the
linker notes that there is a common named "foo" which is never actually
defined by any module, and so it allocates space for it. Bar, on the
other hand, has been defined more than once, so you have an error:

cc -o foo main.o fooget.o fooset.o
fooget.o:(.data+0x0): multiple definition of `bar'
main.o:(.data+0x0): first defined here
fooset.o:(.data+0x0): multiple definition of `bar'
main.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status

As you can see, it never complained about foo.

If you wanted, you could have defined foo in one place, say in main.c:

#include "foo.h"

int foo = 1;

...

Now, when we do "nm main.o", we get:

0000000000000004 D bar
0000000000000000 D foo
U fooget
U fooinc
U fooset
0000000000000000 T main
U printf

You see that foo and bar are both defined here.
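If you want a single file to compile with gcc -c and inspect with nm, the sketch below covers the three cases discussed. The symbol letters in the comments are what GCC with GNU nm typically reports on ELF systems; they depend on toolchain and flags. In particular, GCC 10 and later default to -fno-common, so there an uninitialized global is reported as B (.bss) rather than C unless you pass -fcommon.

```c
int bar = 1;    /* initialized: nm typically shows D (initialized data) */
int foo;        /* tentative definition: C (common) with -fcommon,
                   B (.bss) with -fno-common */

int fooget(void) { return foo; }   /* code: nm shows T (text) */
```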

Now, a couple of semi-off-topic points:

You should do

#include "foo.h"

instead of

#include <foo.h>

The <> form is for system header files. My compiler (gcc on Linux) barfed
on the includes until I fixed them. I don't know how you got it to compile
on yours unless you used '-I.' or something.

As other posters have mentioned, you really should use "extern" in your
declarations.
--
-Ed Falk, fa...@despams.r.us.com
http://thespamdiaries.blogspot.com/

Keith Thompson

Oct 3, 2012, 7:21:37 PM

fa...@rahul.net (Edward A. Falk) writes:
[...]

> Perhaps the C standard has changed since I first read it, but AFAIK,
> most of the answers so far have been wrong.
>
> If you write this in your code:
>
> int foo;
>
> You've *declared* foo -- that is, described what it is -- but you haven't
> *defined* it. There's a difference. In this case, no actual space
> for foo has been allocated yet, and it's known as a "common" symbol.
> If no module ever actually defines it, the linker will allocate space
> for it at the end.
>
> If you write
>
> int bar = 1;

Did you mean to change the name from "foo" to "bar"?

> Now you've defined it. Space and an initial value for it will be
> included in your module. If more than one module does this, there will
> be a conflict.

N1370 6.9.2p2:

A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or
with the storage-class specifier static, constitutes a *tentative
definition*. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains
no external definition for that identifier, then the behavior
is exactly as if the translation unit contains a file scope
declaration of that identifier, with the composite type as of
the end of the translation unit, with an initializer equal to 0.

So

int foo;

*can* be a definition, or at least can act as one.
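A single-translation-unit illustration of 6.9.2p2, including the "composite type" clause: two tentative definitions of an array, one of unknown size and one of size 3, combine into one int[3] that is zero-initialized. The name counts is hypothetical, and the sketch says nothing about how several translation units are linked.

```c
int counts[];     /* tentative definition, incomplete type int[] */
int counts[3];    /* tentative definition, completes the type */

/* No initializer appears anywhere in this translation unit, so at its end
   the behaviour is as if "int counts[3] = {0};" had been written. */
int total(void) {
    return counts[0] + counts[1] + counts[2];
}
```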

> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.

But then two or more translation units within your program can see that
definition, and you can get a "multiple definition" error.

Instead, add

extern int foo;

to "foo.h", and

int foo = 1;

to "foo.c". Then any translation unit that includes "foo.h" can use the
object "foo", which is defined in exactly one place in your program.
(You could drop the "extern" in foo.h, making it a tentative definition,
but that relies on the linker merging definitions; adding "extern" is
correct and more explicit.)

(I'm snipping some context in which you make some of these same points.)

[snip]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Kaz Kylheku

Oct 3, 2012, 9:43:01 PM

On 2012-10-03, Edward A. Falk <fa...@rahul.net> wrote:
> In article <EK-dnVoFlczbHfHN...@bt.com>,
> lipska the kat <lipska...@yahoo.co.uk> wrote:
>>Hi
>>
>>I have the following program
>>distributed over 4 files
>>
>>/* foo.h */
>>int foo;
>>
>>void fooset(int f);
>>int fooget(void);
>>void fooinc(void);
>
> Perhaps the C standard has changed since I first read it, but AFAIK,
> most of the answers so far have been wrong.
>
> If you write this in your code:
>
> int foo;
>
> You've *declared* foo -- that is, described what it is -- but you haven't
> *defined* it.

The above is a tentative definition. It is a definition, but there
is still a chance to override it with a value other than zero
given by a firm (term mine) definition.

When the end of a translation unit is reached, all definitions which
are still tentative become definitions with value zero.

int foo; /* tentative def */
int bar; /* tentative def */

int foo = 3; /* no longer tentative */

/* end of translation unit */
int bar = 0; /* <- not written in the source: but effective behavior */
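The snippet above compiles as written once the implied last line is dropped; the version below adds a hypothetical sum function so the result can be checked. foo's tentative definition is overridden by the initialized one, while bar is left at zero.

```c
int foo;          /* tentative definition */
int bar;          /* tentative definition */

int foo = 3;      /* external definition: foo is no longer tentative */

/* bar is never initialized in this translation unit, so at its end it
   behaves as if "int bar = 0;" had been written. */
int sum(void) { return foo + bar; }
```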

> for foo has been allocated yet, and it's known as a "common" symbol.

Common symbols originally come from the "Common" linkage model. C supports that
model, but that model allows latitudes that are not permitted of strictly
conforming programs.

See here: http://www.lysator.liu.se/c/rat/c1.html#3-1-2-2

And anyway, ironically, under the Common model, every external declaration is
also a definition (whether or not the keyword extern appears in the
declaration).

> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
> In Unix, you can give the command "nm fooget.o" to see all the symbols
> associated with fooget:
>
> 0000000000000000 D bar
> 0000000000000004 C foo
> 0000000000000000 T fooget

The "common" designation is here only being used to distinguish those
definitions which have initializers from those which do not, so that the ones
which do not can be put into a different section when the executable is linked
(the "BSS" section).

James Kuyper

Oct 4, 2012, 12:34:58 AM

On 10/03/2012 03:59 PM, lipska the kat wrote:
> On 03/10/12 20:05, Eric Sosman wrote:
>> On 10/3/2012 2:13 PM, lipska the kat wrote:

...

> > As I hope you're beginning to learn, "It worked" does not
> > imply "It's right." The possible manifestations of undefined
> > behavior include "It did (or seemed to do) what I expected."
>
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.

That's a bad assumption. One of the most common ways in which code with
undefined behavior actually behaves is to produce exactly the same
result that you incorrectly assume that it's required to produce. That's
because your assumptions happen to match decisions made by the
implementors of the version of C that you're testing with. Other
implementors of C are free to make different decisions, ones that are
incompatible with your incorrect assumptions.

> If I design my test cases in the usual way (boundary cases and random
> 'middle ground' cases at the very least) then run those tests with the
> same data and get the same output each time then I get a feeling that I
> may be on the right path.
>
> Is testing C code fundamentally different to testing code in other
> languages ?

No, the inappropriateness of concluding that a program is correct, just
because it appears to work, is common to all computer languages.
--
James Kuyper

Kaz Kylheku

Oct 4, 2012, 1:14:10 AM

On 2012-10-03, lipska the kat <lipska...@yahoo.co.uk> wrote:
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.

If the program is essentially deterministic (no real-time inputs, no threads)
then running the same test case (same data, same program, same platform)
ten times is quite silly. It is one test case.

(There may be some differences between the runs, like the OS randomizing the
stack locations or some such thing.)

But it is better to have ten different test cases in a suite and run through
those.

If you want to prove something with ten runs of one test case, perform the ten
runs on ten different platforms and show that the results are the same.

lipska the kat

Oct 4, 2012, 3:50:27 AM

On 03/10/12 21:32, Eric Sosman wrote:
> On 10/3/2012 3:59 PM, lipska the kat wrote:
>> On 03/10/12 20:05, Eric Sosman wrote:
>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>>>> Hi
>>>>
>>>> I have the following program
>>>> distributed over 4 files

[snip]

> From what other posters have written, it appears I guessed
> incorrectly about the C/C++ distinction. You may in fact be
> compiling your code as C -- but for some reason you're "compiling"
> the header file itself. That's probably not what you want to do.

I'll do some experiments.

>> > As I hope you're beginning to learn, "It worked" does not
>> > imply "It's right." The possible manifestations of undefined
>> > behavior include "It did (or seemed to do) what I expected."
>>
>> Well yes, but if I run a program 10 times with the same data and get the
>> same results each time I might start to think that something is 'right'.
>
> If you run it ten times with the same data, you're probably hitting
> the same fragile set of coincidences each time. Running with different
> data could be more illuminating -- although, as Dijkstra said, testing
> cannot demonstrate absence of errors, but only their presence.

Of course, the point I was trying to make is that if my program is
behaving in an 'undefined' way then I might expect 10 runs with
identical data to provide different results. I'm in no way sufficiently
knowledgeable about C to assume otherwise. I suppose it depends on what
you mean by undefined.

If I have a program that reverses its input a line at a time (exercise 1-19
in K&R, second edition, for example) and I try it with as many different
inputs as my feeble brain can devise and the results are what I expect,
then what can I assume from this? In other languages I have used (10s of
KLOC running daily without error) I would assume that the program was
'correct'.

[snip]

This has certainly given me much food for thought.

Thanks for taking the time to reply.

lipska the kat

Oct 4, 2012, 4:02:28 AM

On 04/10/12 05:34, James Kuyper wrote:
> On 10/03/2012 03:59 PM, lipska the kat wrote:
>> On 03/10/12 20:05, Eric Sosman wrote:
>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
> ...
>> > As I hope you're beginning to learn, "It worked" does not
>> > imply "It's right." The possible manifestations of undefined
>> > behavior include "It did (or seemed to do) what I expected."
>>
>> Well yes, but if I run a program 10 times with the same data and get the
>> same results each time I might start to think that something is 'right'.
>
> That's a bad assumption. One of the most common ways in which code with
> undefined behavior actually behaves is to produce exactly the same
> result that you incorrectly assume that it's required to produce. That's
> because your assumptions happen to match decisions made by the
> implementors of the version of C that you're testing with. Other
> implementors of C are free to make different decisions, ones that are
> incompatible with your incorrect assumptions.

Er ... wow, OK, that is a bit of a head****
Do you mean to say that even if I test my program to destruction and as
far as I can tell it's 'correct', that is it complies with requirements
and behaves as expected it could still be incorrect when compiled with
a different compiler ???

Surely there is some 'base' implementation of C that is used to test
compilers or is it a free for all ... to me this implies that there can
be more than one 'correct' implementation of the C language, or several
or many Cs in fact. Please remember I am a raw beginner at C although I
find this whole discussion fascinating.

[snip]

Given a program written in C, how does one determine that it is
'correct' if complying with requirements and returning the same output
from the same input is not enough?

Thanks for taking the time to reply.

David Brown

Oct 4, 2012, 4:14:33 AM

There is no "reference" implementation of C. But there are the
standards documents, which state the rules of C.

One issue you will see here is that there is code that is syntactically
correct, and will be compiled by the compiler, but which is "undefined
behaviour" according to the standards. This is not unique to C - it
applies to some extent to all languages, and may even be unavoidable (I
think Gödel's Incompleteness Theorem implies that). But C programmers,
and many of this group's inhabitants, tend to be more aware of these
issues than many other programmers.

When you write code that depends on such "undefined behaviour", it may
work as expected. It may do so consistently, and with different data
and different variations of the code. But you might find it fails when
using a different compiler, or different version of the compiler, or
different compiler flags (enabling optimisation often brings such flaws
to light).

The way to deal with this is to learn the rules of C - not just learn
what works under testing. And use the tools to the best of their
abilities, to aid your work. Make sure you compile with optimisation
("-Os" or "-O2" is typical), and with warnings enabled ("-Wall" and
"-Wextra" are good for most code, and there are others that can be worth
enabling).

lipska the kat

Oct 4, 2012, 4:24:12 AM10/4/12

On 03/10/12 23:36, Edward A. Falk wrote:
> In article <EK-dnVoFlczbHfHN...@bt.com>,
> lipska the kat <lipska...@yahoo.co.uk> wrote:
>> Hi
>>
>> I have the following program
>> distributed over 4 files

[snip]

>
> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
> In Unix, you can give the command "nm fooget.o" to see all the symbols
> associated with fooget:
>
> 0000000000000000 D bar
> 0000000000000004 C foo
> 0000000000000000 T fooget

Oh very nice

[lipska@sandbox externs (master)]$ nm main.o
0000000000000004 C foo
                 U fooget
                 U fooinc
                 U fooset
0000000000000000 T main
                 U printf

Thanks, I didn't know about nm, I'll try man nm then different versions
of foobar and see what the results are

Hours of fun.

[snip]

> Now, a couple of semi-off-topic points:
>
> You should do
>
> #include "foo.h"
>
> instead of
>
> #include <foo.h>
>
> The <> form is for system header files. My compiler (gcc on Linux) barfed
> on the includes until I fixed them. I don't know how you got it to compile
> on yours unless you used '-I.' or something.
>
> As other posters have mentioned, you really should use "extern" in your
> declarations.

Thanks for this, I get the #include thing now.
I'm going to refactor the foo get/set/inc program and see what I can
discover

Thanks for taking the time to reply

Keith Thompson

Oct 4, 2012, 4:30:47 AM10/4/12

lipska the kat <lipska...@yahoo.co.uk> writes:
[...]

> Of course, the point I was trying to make is that if my program is
> behaving in an 'undefined' way then I might expect 10 runs with
> identical data to provide different results. I'm in no way sufficiently
> knowledgeable about C to assume otherwise. I suppose it depends on what
> you mean by undefined.

No, that's not what undefined means. The C standard's definition of
*undefined behavior* is:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements

NOTE Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving
during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).

> If I have a program that reverses its input a line at a time (exercise 1-19
> in K&R, second edition, for example) and I try it with as many different
> inputs as my feeble brain can devise and the results are what I expect
> then what can I assume from this. In other languages I have used (10s of
> KLOC running daily without error) I would assume that the program was
> 'correct'.

C, as the saying goes, gives you enough rope to shoot yourself in the
foot. I'll show you a concrete example:

#include <stdio.h>

static void write_array(int *arr) {
    for (int i = 0; i <= 5; i ++) {
        arr[i] = i;
    }
}

static void read_array(int *arr) {
    for (int i = 0; i <= 5; i ++) {
        printf("%d", arr[i]);
        putchar(i == 5 ? '\n' : ' ');
    }
}

int main(void) {
    int x[5] = { 0 };
    int y[5] = { 0 };
    int z[5] = { 0 };

    write_array(y);
    read_array(y);

    return 0;
}

The array y is defined to have 5 elements, but the program attempts to
store 6 int values in it, and then retrieve and print those 6 values.
Accessing y[5] has undefined behavior, since it's outside the bounds of
the array. But since y is surrounded in memory by two other arrays, x
and z, it's likely that y[5] refers to an element of one of those other
two arrays. (There's no guarantee that x, y, and z are allocated in any
particular order, or even that they're adjacent, but it's likely that
one of them immediately follows y in memory.)

I can compile and run this program 100 times, and it's very likely to
produce the same output every time:

0 1 2 3 4 5

That's just one of the infinitely many things that can happen when the
language standard "imposes no requirements".

(A sufficiently clever optimizing compiler might cause it to produce
different output, or to crash, or even to be rejected at compile time.)

Angel

Oct 4, 2012, 6:17:08 AM10/4/12

Just out of curiosity, I ran this little test through gcc. Without
optimization, or at optimization level 1, gcc only warns about the
unused variables x and z.

At optimization level 2, gcc warns about a subscript out of bounds
on line 5 (in the write_array function). At optimization level 3 it
also gives this warning about line 11 (in the read_array function).

The program does give the 0 1 2 3 4 5 output every time, though.

--
"C provides a programmer with more than enough rope to hang himself.
C++ provides a firing squad, blindfold and last cigarette."
- seen in comp.lang.c

lipska the kat

Oct 4, 2012, 6:23:54 AM10/4/12

On 03/10/12 19:13, lipska the kat wrote:

First of all let me say once again how much I appreciate all your
responses. A seed of doubt has been planted in my mind WRT the
'correctness' of any C code that I have written/will write at any time
in the future. I think this must be a 'good thing' although the
implication is that before I can be as certain as it's possible to be
that any program is 'correct' I need to have read, understood and
inwardly digested the entire language specification. A rather daunting
prospect but I will certainly make a start ... when I can figure out
what standard I should be reading (C89, C90, C99 or C11)

According to man gcc on my Ubuntu Linux 12.04 64 bit machine
89,90 are fully supported whereas support for 99 onwards is 'limited'
Compiling with the -ansi switch compiles to the 89/90 spec which
reinforces my opinion that 90 is the one for me.

There are also a whole bunch of gnu dialects
but my brain imploded at this point so I went to the pub :-)

Anyway, I think I've figured out the extern thing in terms of my very
simple example. I do have one small observation which I will make after
the re-factored code, you have been very generous with your time so far
so apologies for posting this listing again.

============= Code ===============

/* foo.h */
/* explicit _declaration_ of foo */
extern int foo;

void fooset(int f);
int fooget(void);
void fooinc(void);

/* extmain.c */
#include <stdio.h>
#include "foo.h"

/* global variable */
/* tentative definition of foo: with no initializer this becomes */
/* the one definition of foo, and foo starts out as 0 */
/* without it the program fails to link: nothing else defines foo */
int foo;

int main(int argc, char **argv){

    /* an assignment, not a definition; redundant, since foo is already 0 */
    foo = 0;

    fooset(10);
    printf("foo is %d\n", fooget());

    fooinc();
    printf("foo is now %d\n", fooget());

    return 0;
}

/* fooset.c */
#include "foo.h"

void fooset(int f){

    foo = f;
}

/* fooget.c */
#include "foo.h"

int fooget(void){

    return foo;

}

/* fooinc.c */
#include "foo.h"

void fooinc(void){
    ++foo;
}

Observation:

The word 'extern' doesn't seem to be required

/* foo.h */
extern int foo;

OR

/* foo.h */
int foo;

Both work

At this point it seems to me that the only reason to use
the word extern is as an aid to program documentation.

I'll continue reading.

lipska the kat

Oct 4, 2012, 7:02:22 AM10/4/12

On 04/10/12 09:30, Keith Thompson wrote:
> lipska the kat<lipska...@yahoo.co.uk> writes:
> [...]

[snip]

> C, as the saying goes, gives you enough rope to shoot yourself in the
> foot. I'll show you a concrete example:

[snip]

gcc example.c
example.c: In function ‘write_array’:
example.c:4:9: error: ‘for’ loop initial declarations are only allowed in C99 mode
example.c:4:9: note: use option -std=c99 or -std=gnu99 to compile your code
example.c: In function ‘read_array’:
example.c:10:9: error: ‘for’ loop initial declarations are only allowed in C99 mode
make: *** [example] Error 1

gcc -ansi example.c
ditto above

gcc -std=c99 -Wall example.c
example.c: In function ‘main’:
example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]

gcc -std=c99 -O1 -Wall example.c
ditto above

gcc -std=c99 -O2 -Wall example.c
example.c: In function ‘main’:
example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]

gcc -std=c99 -O3 -Wall example.c
example.c: In function ‘main’:
example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]
example.c:11:19: warning: array subscript is above array bounds [-Warray-bounds]

0 1 2 3 4 5 every time

Now I'm really confused

Maybe I should be reading the C99 spec %-}

lipska

--
Lipska the Kat©: Troll hunter, sandbox destroyer

James Kuyper

Oct 4, 2012, 7:13:55 AM10/4/12

On 10/04/2012 03:50 AM, lipska the kat wrote:
...

> Of course, the point I was trying to make is that if my program is
> behaving in an 'undefined' way then I might expect 10 runs with
> identical data to provide different results.

That's a very bad expectation, unless your 10 runs were done using
wildly different implementations of C, on 10 wildly different platforms.

> ... I'm in no way sufficiently
> knowledgeable about C to assume otherwise. I suppose it depends on what
> you mean by undefined.

"undefined behavior" has a very specific meaning in the C standard:

"behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements" (3.4.3p1). A key phrase needs to be noted: "this
International Standard". Behavior which is defined by something else
(such as the POSIX standard, or an ABI standard for a given platform, or
the documentation for a given compiler, or the fundamental laws of
physics) would still be undefined behavior as far as the C standard is
concerned. If there's anything other than the C standard which defines
the behavior (and there usually is), it will be perfectly repeatable for
as long as that other definition applies, and will fail to be repeatable
as soon as you use it in a situation where the other definition no
longer applies.

For example, if the "undefined behavior" is defined by the POSIX
standard, you can expect the results to be perfectly repeatable on every
POSIX-conforming system, but you'll have no guarantees about non-POSIX
systems. If it's defined by Intel for all CPUs in the same family, the
undefined behavior will be perfectly repeatable as long as you execute
only on that family of CPUs, but not necessarily if you port your code
to an AMD system.

Keep in mind that most of the code constructs that have repeatable
undefined behavior will be repeatable for much less portable reasons
than the examples I gave above. It may be "defined" (though not in any
publicly available document) by a particular version of a particular
compiler when used with a particular set of command line options, and
may be defined differently if you change any of those options, or
upgrade your compiler, or change your code in any way, such as
reordering the variable declarations.

> If I have a program that reverses its input a line at a time (exercise 1-19
> in K&R, second edition, for example) and I try it with as many different
> inputs as my feeble brain can devise and the results are what I expect
> then what can I assume from this.

That it handled those test cases correctly, and might mishandle any case
you didn't think of. It could even mishandle the cases you did test, if
it contains a time-dependent defect (such as mis-handling a leap year).
You can generalize your test results beyond those test cases only in
proportion to how much you know about what the guaranteed behavior of
your code is. I would judge that you know a fair amount about C, but
your question is about a fairly fundamental point, which implies that
there's still a lot of details you don't know yet.

> ... In other languages I have used (10s of
> KLOC running daily without error) I would assume that the program was
> 'correct'.

10s of KLOC isn't a lot, and your assumptions in those other languages
would be just as unjustified as they are in C. The details of what I've
said are specific to C, but the general principle is not.
--
James Kuyper

David Brown

Oct 4, 2012, 7:26:04 AM10/4/12

On 04/10/2012 12:23, lipska the kat wrote:
> On 03/10/12 19:13, lipska the kat wrote:
>
> First of all let me say once again how much I appreciate all your
> responses. A seed of doubt has been planted in my mind WRT the
> 'correctness' of any C code that I have written/will write at any time
> in the future. I think this must be a 'good thing' although the
> implication is that before I can be as certain as it's possible to be
> that any program is 'correct' I need to have read, understood and
> inwardly digested the entire language specification. A rather daunting
> prospect but I will certainly make a start ... when I can figure out
> what standard I should be reading (C89, C90, C99 or C11)
>
> According to man gcc on my Ubuntu Linux 12.04 64 bit machine
> 89,90 are fully supported whereas support for 99 onwards is 'limited'
> Compiling with the -ansi switch compiles to the 89/90 spec which
> reinforces my opinion that 90 is the one for me.
>

gcc's C99 support has been near-perfect for all practical purposes for
years. There are a few missing features - so technically it is
"limited" - but nothing you need to worry about. Its support for
the latest C11 is also coming on nicely. There is no point in starting
to learn a new language and limiting yourself to the standards from over
twenty years ago unless you really have some special requirement for
strict ANSI C compliance. So if gcc is your main development tool, the
standard you want is "-std=gnu99", or "-std=gnu11" if you are feeling
adventurous.

> There are also a whole bunch of gnu dialects
> but my brain imploded at this point so I went to the pub :-)

That sounds like a logical move to me!

>
> Anyway, I think I've figured out the extern thing in terms of my very
> simple example. I do have one small observation which I will make after
> the re-factored code, you have been very generous with your time so far
> so apologies for posting this listing again.
>

There are various ways to structure code and files, but I prefer to
pretend that C has real "modules" like other programming languages. For
each "module", you have two files - "foo.h" and "foo.c". "foo.h"
contains only declarations, plus documentation (very important) and
occasional definitions that have to go in headers (like inline functions
usable by other modules). "foo.c" contains all the implementation.

// foo.h
#ifndef foo_h__
#define foo_h__

// extern is required here to make a declaration but not a definition
extern int foo;

// extern is not required here, but is good documentation
extern void fooset(int f);
extern int fooget(void);
extern void fooinc(void);

#endif // foo_h__

// foo.c

#include "foo.h"

int foo;

void fooset(int f){
    foo = f;
}

int fooget(void){
    return foo;
}

void fooinc(void){
    ++foo;
}

Some observations with this system:

It is best to put all your "foo" implementations in "foo.c". There is
seldom any benefit in splitting it up into multiple C files. Either use
multiple "modules" (with their own header and C file) if the functions
are significantly different or the file is too big to work with, or put
them in a single C file. (The exception is for libraries, where many
small C files can have some advantages.)

I like to be strict that variables and functions are either private to a
module, or exported from the module. If they are private, they only
exist in the "foo.c" file, and they are always declared "static". If
they are exported, then there is an "extern" declaration in "foo.h" and
a matching definition without "extern" (and obviously without "static")
in the "foo.c" file. I even enforce this with gcc flags
"-Wmissing-declarations -Wmissing-prototypes -Wredundant-decls
-Wnested-externs".

Some people do things differently - it's partly a matter of taste. But
clearly /my/ way is the best way :-), so I recommend learning it from
the start.

James Kuyper

Oct 4, 2012, 8:30:57 AM10/4/12

On 10/04/2012 04:02 AM, lipska the kat wrote:
> On 04/10/12 05:34, James Kuyper wrote:

...

>> That's a bad assumption. One of the most common ways in which code with
>> undefined behavior actually behaves is to produce exactly the same
>> result that you incorrectly assume that it's required to produce. That's
>> because your assumptions happen to match decisions made by the
>> implementors of the version of C that you're testing with. Other
>> implementors of C are free to make different decisions, ones that are
>> incompatible with your incorrect assumptions.
>
> Er ... wow, OK, that is a bit of a head****
> Do you mean to say that even if I test my program to destruction and as
> far as I can tell it's 'correct', that is it complies with requirements
> and behaves as expected it could still be incorrect when compiled with
> a different compiler ???

Certainly. That's not just because of undefined behavior, either.
There's also behavior that is merely unspecified: the standard provides
(explicitly or, more commonly, implicitly) a list of possible behaviors,
and each implementation gets to choose from that list - in some cases,
it can even make a different choice each time a given piece of code is
executed. Some unspecified behavior is "implementation-defined" which
means that an implementation is required to document which choice it has
made, but there's also a lot of cases where there's no such requirement.

> Surely there is some 'base' implementation of C that is used to test
> compilers ...

No, there is not. Even if there were, the base implementation would have
to make specific choices in every case where the C standard leaves the
behavior unspecified or undefined, and other fully-conforming
implementations of C would not be required to make the same choices,
which greatly reduces the usefulness of having a base implementation.
That may be one reason why there isn't one.

> ... or is it a free for all ...

It's not a free-for-all - the standard does impose a great many specific
requirements. However, the things that it does not specify are what
gives implementors sufficient freedom to create a conforming
implementation of C on almost every platform. That is the reason why C
is one of the most widely implemented of all computer languages.

> ... to me this implies that there can
> be more than one 'correct' implementation of the C language,

Correct - the set of possible fully-conforming implementations of the C
language is infinite. The set of actual fully-conforming implementations
is much smaller, but still large enough that it's not feasible to test
any given program on all of them. It's also sufficiently varied that
testing on only a few dozen of them is insufficient to prove that your
code will work on all of the untested ones.

> ... or several
> or many Cs in fact. Please remember I am a raw beginner at C although I
> find this whole discussion fascinating.
>
> [snip]
>
> Given a program written in C, how does one determine that it is
> 'correct' if complying with requirements and returning the same output
> from the same input is not enough.

That depends upon the requirements. Well-written requirements should
identify a specific version of the C standard (C2011 just came out, so
there aren't many implementations of it, and full implementations of C99
are still rare - but C90 has been fully implemented just about
everywhere). Those requirements should specify that your code must have
no syntax errors or constraint violations according to that version.
Then you read the standard and learn what constitutes a syntax error or
a constraint violation.

Well-written requirements should also limit the dependence of the code
on unspecified or undefined behavior in some appropriate fashion. Useful
programs seldom completely avoid undefined behavior, and almost never
avoid unspecified behavior, but you can fill in those gaps by, for
instance, requiring POSIX conformance.
--
James Kuyper

James Kuyper

Oct 4, 2012, 8:43:22 AM10/4/12

On 10/04/2012 06:23 AM, lipska the kat wrote:
> On 03/10/12 19:13, lipska the kat wrote:
>
> First of all let me say once again how much I appreciate all your
> responses. A seed of doubt has been planted in my mind WRT the
> 'correctness' of any C code that I have written/will write at any time
> in the future. I think this must be a 'good thing' although the
> implication is that before I can be as certain as it's possible to be
> that any program is 'correct' I need to have read, understood and
> inwardly digested the entire language specification. A rather daunting
> prospect but I will certainly make a start ... when I can figure out
> what standard I should be reading (C89, C90, C99 or C11)
>
> According to man gcc on my Ubuntu Linux 12.04 64 bit machine
> 89,90 are fully supported whereas support for 99 onwards is 'limited'
> Compiling with the -ansi switch compiles to the 89/90 spec which
> reinforces my opinion that 90 is the one for me.

That's a safe choice; I prefer C99 myself, and gcc's support for C99 is
almost complete.

> There are also a whole bunch of gnu dialects
> but my brain imploded at this point so I went to the pub :-)

I recommend learning standard C first. Learn about and use the Gnu
dialects later, if you want to - but avoid using gnuisms unless you're
absolutely certain that your code will never be compiled with anything
other than gcc.

The fact that the second one (plain 'int foo;' in foo.h, without 'extern')
works is due to a feature of gcc that a conforming implementation of C is
not required to have: gcc merges all of the multiple external definitions
of 'foo' into a single definition.
As others have already mentioned, this behavior can be turned off by
selecting the -fno-common option.

> At this point it seems to me that the only reason to use
> the word extern is as an aid to program documentation.

No, it makes the behavior of your code well-defined, which means you can
count on it working even if you use some other fully-conforming
implementation of C to compile it.
--
James Kuyper

lipska the kat

Oct 4, 2012, 9:18:37 AM10/4/12

On 04/10/12 13:30, James Kuyper wrote:
> On 10/04/2012 04:02 AM, lipska the kat wrote:
>> On 04/10/12 05:34, James Kuyper wrote:

[snip]

> Well-written requirements should also limit the dependence of the code
> on unspecified or undefined behavior in some appropriate fashion. Useful
> programs seldom completely avoid undefined behavior, and almost never
> avoid unspecified behavior, but you can fill in those gaps by, for
> instance, requiring POSIX conformance.

OK, another really interesting reply, thanks.

I have been twiddling with C on and off for years, but I've never had the
time to really get into the details, and I've certainly never written
anything really meaningful in C. Now I have the time to learn the detail
and hopefully write some useful code. I'm particularly interested in
using computers to control external devices. I have actually written
code to control such devices in Java in the past; the devices in
question are in widespread use every day, so I have some idea what is
involved.

I think the best thing at the moment is to pick a version (C99), pick a
compiler (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)) and start to
get my head around the relevant spec (when I can find it).

The responses to my OP have been very helpful and have got me thinking
about all sorts of things that hadn't really entered my head up until now.

Amazing

Thanks again

Joe Pfeiffer

Oct 4, 2012, 10:35:14 AM10/4/12

lipska the kat <lipska...@yahoo.co.uk> writes:

> On 04/10/12 05:34, James Kuyper wrote:
>> On 10/03/2012 03:59 PM, lipska the kat wrote:
>>> On 03/10/12 20:05, Eric Sosman wrote:
>>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>> ...
>>> > As I hope you're beginning to learn, "It worked" does not
>>> > imply "It's right." The possible manifestations of undefined
>>> > behavior include "It did (or seemed to do) what I expected."
>>>
>>> Well yes, but if I run a program 10 times with the same data and get the
>>> same results each time I might start to think that something is 'right'.
>>
>> That's a bad assumption. One of the most common ways in which code with
>> undefined behavior actually behaves is to produce exactly the same
>> result that you incorrectly assume that it's required to produce. That's
>> because your assumptions happen to match decisions made by the
>> implementors of the version of C that you're testing with. Other
>> implementors of C are free to make different decisions, ones that are
>> incompatible with your incorrect assumptions.
>
> Er ... wow, OK, that is a bit of a head****
> Do you mean to say that even if I test my program to destruction and
> as far as I can tell it's 'correct', that is it complies with
> requirements and behaves as expected it could still be incorrect when
> compiled with
> a different compiler ???

A story I trot out from time to time to illustrate this: for years, I
thought a NULL pointer could be used to represent a string of length 0.
I don't know where I actually got the idea, but it worked quite well --
on the VAXes I was using at the time, where address 0 was readable and
always happened to contain 0. So no amount of testing my code would
have demonstrated that this was a completely incorrect assumption.

Until the day I moved to Sun computers, on which that address was not
user-readable. It took a long, long time to hit all the places I had
wrong code based on that assumption.

And note something here -- it wasn't even a difference in compilers that
turned up my faulty assumption and multiple bugs. It was a difference
in hardware and operating system.

> Surely there is some 'base' implementation of C that is used to test
> compilers or is it a free for all ... to me this implies that there
> can be more than one 'correct' implementation of the C language, or
> several or many Cs in fact. Please remember I am a raw beginner at C
> although I find this whole discussion fascinating.
>
> [snip]
>
> Given a program written in C, how does one determine that it is
> 'correct' if complying with requirements and returning the same output
> from the same input is not enough.

In practice, that combined with code reviews is as close as you're
going to get.

lipska the kat

Oct 4, 2012, 10:52:25 AM10/4/12

On 04/10/12 12:26, David Brown wrote:
> On 04/10/2012 12:23, lipska the kat wrote:
>> On 03/10/12 19:13, lipska the kat wrote:
>>

[snip]

> "foo.c" contains all the implementation.
>
> // foo.h
> #ifndef foo_h__
> #define foo_h__

I don't understand what this means or why you are doing it

Can you explain?

thanks

Eric Sosman

Oct 4, 2012, 11:40:32 AM10/4/12

On 10/4/2012 10:52 AM, lipska the kat wrote:
> On 04/10/12 12:26, David Brown wrote:
>> On 04/10/2012 12:23, lipska the kat wrote:
>>> On 03/10/12 19:13, lipska the kat wrote:
>>>
>
> [snip]
>
>> "foo.c" contains all the implementation.
>>
>> // foo.h
>> #ifndef foo_h__
>> #define foo_h__
>
> I don't understand what this means or why you are doing it
>
> Can you explain?

See Question 10.7 on the comp.lang.c Frequently Asked
Questions (FAQ) page at <http://www.c-faq.com/>.

(The rest of the FAQ would be a good read, too.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

lipska the kat

Oct 4, 2012, 12:00:13 PM10/4/12

Thank you, I'm working my way through the FAQ, I hadn't got to that bit yet.

Edward A. Falk

Oct 4, 2012, 3:01:03 PM10/4/12

In article <lnfw5vp...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:
>
>Instead, add
>
> extern int foo;
>
>to "foo.h", and
>
> int foo = 1;
>
>to "foo.c". Then any translation unit that includes "foo.h" can use the
>object "foo", which is defined in exactly one place in your program.

Yes, exactly. This is how I write all my code, and is IMHO
best practices.

Also: I didn't know that tentative definitions were required to
be initialized with zero. That's good to know; I'd always assumed
it was implementation-specific.

Ben Bacarisse

Oct 4, 2012, 5:31:52 PM10/4/12

lipska the kat <lipska...@yahoo.co.uk> writes:

<snip>
> Given a program written in C, how does one determine that it is
> 'correct' if complying with requirements and returning the same output
> from the same input is not enough.

There are a few tools that can help. For example, there's valgrind (and
other similar things) that can check all of your memory accesses as you
run your tests. But there are many other things that can be wrong but
which appear to work. One general tool is to get into the habit of
reasoning about your programs.

Testing is very helpful of course, but I'd venture to say that the
balance between treating programming as a formal mathematical activity
and treating it like engineering has tended, in recent years, to downplay
the mathematical side to the detriment of the field.

--
Ben.

Keith Thompson

Oct 4, 2012, 7:08:02 PM10/4/12

lipska the kat <lipska...@yahoo.co.uk> writes:

[...]

> According to man gcc on my Ubuntu Linux 12.04 64 bit machine
> 89,90 are fully supported whereas support for 99 onwards is 'limited'
> Compiling with the -ansi switch compiles to the 89/90 spec which
> reinforces my opinion that 90 is the one for me.

C89 and C90 are exactly the same language. The history is that
ANSI (the US standards body) issued the first C standard in 1989.
In 1990, ISO (the international standards body) issued its version of
the C standard; it added some introductory material and renumbered
the sections, but made no other changes. ANSI then officially
adopted the ISO standard. This is the version of the language
that's commonly referred to as "ANSI C", though ANSI itself no
longer recognizes it. It's also the version described in the second
(and final) edition of K&R (The C Programming Language, by Kernighan
& Ritchie).

There was an amendment in 1995.

ISO issued a new standard in 1999, and another in 2011; these
are referred to as C99 and C11, respectively. Both of these were
originally issued by ISO, not ANSI, and ANSI officially adopted
both as-is after ISO issued them.

gcc has excellent support for C90, very good support for C99
(see <http://gcc.gnu.org/c99status.html>), and is working on C11
support. The library (most of it) is a separate project from gcc
(the compiler), and implementations generally use different libraries
on different systems. Linux systems use the GNU libc (glibc).

At this point, the only real reason to avoid C99-specific features
is if you need your code to compile with Microsoft's C compiler;
Microsoft has stated that they don't intend to support C past the
C90 standard.

> There are also a whole bunch of gnu dialects
> but my brain imploded at this point so I went to the pub :-)

Yes, gcc implements a number of its own extensions to the
language (as the standard permits). Use "-std=c99 -pedantic", or
"-std=c11 -pedantic" if you're brave, if you want warnings about
non-portable code. Use "-ansi -pedantic" for maximum portability --
but then you're cutting yourself off from newer language features.
Use "-std=gnuxx", where "xx" is a year number, if you want to use
gcc-specific extensions -- but then you're cutting yourself off from
portability to other compilers (except for some that implement the
GNU extensions).

Keith Thompson

Oct 4, 2012, 7:20:56 PM

lipska the kat <lipska...@yahoo.co.uk> writes:
> On 04/10/12 09:30, Keith Thompson wrote:
>> lipska the kat<lipska...@yahoo.co.uk> writes:
>> [...]
>
> [snip]
>
>> C, as the saying goes, gives you enough rope to shoot yourself in the
>> foot. I'll show you a concrete example:
>
> [snip]
>
> gcc example.c
> example.c: In function ‘write_array’:
> example.c:4:9: error: ‘for’ loop initial declarations are only allowed in C99 mode
> example.c:4:9: note: use option -std=c99 or -std=gnu99 to compile your code
> example.c: In function ‘read_array’:
> example.c:10:9: error: ‘for’ loop initial declarations are only allowed in C99 mode
> make: *** [example] Error 1
>
> gcc -ansi example.c
> ditto above

Right, I used a C99-specific feature, and gcc with no arguments, or with
"-ansi", doesn't implement C99. You can avoid that by using "-std=c99",
or by changing

for (int i = 0; i <= 5; i ++) {

/* ... */
}

to:

int i;

for (i = 0; i <= 5; i ++) {

/* ... */
}

Note that the "int i;" declaration has to be at the top of the block,
before any statements (a C90 restriction that C99 removed).

> gcc -std=c99 -Wall example.c
> example.c: In function ‘main’:
> example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
> example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
>
> gcc -std=c99 -O1 -Wall example.c
> ditto above
>
> gcc -std=c99 -O2 -Wall example.c
> example.c: In function ‘main’:
> example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
> example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
> example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]
>
> gcc -std=c99 -O3 -Wall example.c
> example.c: In function ‘main’:
> example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
> example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
> example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]
> example.c:11:19: warning: array subscript is above array bounds [-Warray-bounds]

Yes, all those warnings are valid. An "unused variable" warning
doesn't necessarily mean that your program is wrong; it just means
that you may have made a logical error.

The "array subscript is above array bounds" warning is more serious. As I
said, I deliberately wrote a program whose behavior is undefined;
this absolutely was *not* an example of what you should do.

The program attempts to store values outside the bounds of an array.
I added extra array declarations to try to give those accesses
somewhere to go.

> 0 1 2 3 4 5 every time
>
> Now I'm really confused

The program's behavior is undefined. Printing 0 1 2 3 4 5 every
time is therefore perfectly valid, since nothing in the standard
says it *shouldn't* behave that way.

If you go beyond what the standard actually says, there are reasons
why it behaves the way it does. The arrays x, y, and z probably
happen to be stored next to each other in memory. Writing past
the end of y probably clobbers the beginning of either x or z.
Since x, y, and z are all in memory that you "own", the program is
able to do that with no apparent problem.

A more stringent compiler might have caused the program to crash
when it tried to store past the end of y. Most compilers don't
do that because it requires explicit checking, which is expensive
(it would catch incorrect programs, but slow down correct programs).
A cynic might say that C compilers are designed to let you get your
wrong answers as quickly as possible.

> Maybe I should be reading the C99 spec %-}

It's not easy reading, and there's not really anything in it that
explains the way this program behaves. As far as the standard is
concerned, running that program could make demons fly out of your
nose (obviously that won't really happen, but nothing in the C
standard forbids it).

It's entirely possible that my example was more complex than it
should have been. If you don't understand it, don't worry about it
too much for now. Perhaps you should concentrate more on writing
correct code than on understanding incorrect code.

horme...@gmail.com

Oct 4, 2012, 8:15:00 PM

On Thursday, October 4, 2012 7:52:27 AM UTC-7, lipska the kat wrote:
> On 04/10/12 12:26, David Brown wrote:
>> On 04/10/2012 12:23, lipska the kat wrote:
>>> On 03/10/12 19:13, lipska the kat wrote:

> > [snip] > "foo.c" contains all the implementation.
> > // foo.h
> > #ifndef foo_h__
> > #define foo_h__

> I don't understand what this means or why you are doing it Can you explain?

Why, he's making foo.h idempotent, anybody can see that...

And by idempotent, people who use big words are just trying to
say that he is only including foo.h once, because he only includes
it if the macro foo_h__ is not defined...but if it isn't he immediately
defines it, which is what makes it idempotent. Next time the compiler
tries to include foo.h for any reason (usually because it is part of
another .c file's header files in one way or another), the compiler
(or macro preprocessor) sees the #ifndef for the macro (meaning only
execute code if the macro is NOT defined), and skips everything until
the #endif at the bottom of the file...

This is standard operating procedure for writing units or modules
or whatever in C, you should always do it to avoid problems later
on.

And now you can use the word idempotent to confuse other people...

---
William Ernest Reid

Ike Naar

Oct 5, 2012, 2:51:26 AM

On 2012-10-05, horme...@gmail.com <horme...@gmail.com> wrote:
> On Thursday, October 4, 2012 7:52:27 AM UTC-7, lipska the kat wrote:
>> On 04/10/12 12:26, David Brown wrote:
>>> On 04/10/2012 12:23, lipska the kat wrote:
>>>> On 03/10/12 19:13, lipska the kat wrote:
>
>> > // foo.h
>> > #ifndef foo_h__
>> > #define foo_h__
>
>> I don't understand what this means or why you are doing it Can you explain?
>
> Why, he's making foo.h idempotent, anybody can see that...

Picking a nit:
Some header files don't have to be made idempotent, because they
already are. The foo.h we are talking about is such a file:

/* begin foo.h */
extern int foo;
void fooset(int);
int fooget(void);
void fooinc(void);
/* end foo.h */

Including foo.h more than once has the same effect as including it
once, so the #ifndef-#define trick is not necessary to make
the file idempotent.
(It doesn't do any harm either, and it's certainly good practice,
but in this case it just doesn't make the file more idempotent than
it already was).

David Brown

Oct 5, 2012, 3:35:31 AM

Picking even deeper at the nit...

If you include the extern declarations more than once, the extra
declarations are, by definition, redundant. While they are perfectly
legal in C (as long as everything is consistent), they will trigger
gcc's "-Wredundant-decls" warning if it is enabled. And I like to
enable it, precisely as part of warnings to enforce a good clean
modularity in my declarations and definitions (see my earlier post in
this branch).

lipska the kat

Oct 5, 2012, 3:44:28 AM

On 05/10/12 00:20, Keith Thompson wrote:
> lipska the kat<lipska...@yahoo.co.uk> writes:
>> On 04/10/12 09:30, Keith Thompson wrote:
>>> lipska the kat<lipska...@yahoo.co.uk> writes:

[snip]

>
>> Maybe I should be reading the C99 spec %-}
>
> It's not easy reading, and there's not really anything in it that
> explains the way this program behaves. As far as the standard is
> concerned, running that program could make demons fly out of your
> nose (obviously that won't really happen, but nothing in the C
> standard forbids it).

Ha ha, very funny, I now have this image in my head and I can't get rid
of it.

>
> It's entirely possible that my example was more complex than it
> should have been. If you don't understand it, don't worry about it
> too much for now. Perhaps you should concentrate more on writing
> correct code than on understanding incorrect code.

I understand it perfectly well; I just think that if someone makes the
effort to reply to my question, I should make the effort to respond.
One thing I learned from this and other posts is that C99 is probably a
better choice for me than earlier standards. I added the option you
suggested to my gcc commands along with -Wall, ran make on all my current
code and waited for the explosion (of demons perhaps :-) ... but nothing
really of note appeared. There was one warning but that was about it.
Most gratifying.

Thanks for taking the time to reply

lipska

--
Lipska the Kat©: Troll hunter, sandbox destroyer

lipska the kat

Oct 5, 2012, 3:58:25 AM

On 04/10/12 22:31, Ben Bacarisse wrote:
> lipska the kat<lipska...@yahoo.co.uk> writes:

[snip]

> Testing is very helpful of course, but I'd venture to say that the
> balance between treating programming as a formal mathematical activity
> and treating it like engineering has tended, in recent years, to down
> play the mathematical side to the detriment of the field.

Well this is an interesting point. When I was at University there was
much discussion going on in the CompSci department about making Software
Engineering (in particular) an explicitly 'engineering' type discipline
leading to a professional qualification similar to those achieved by
architects, doctors, civil engineers etc. The argument against this was
that software engineering (in general) was

'A too abstract and poorly understood discipline to be encumbered by
strict rules and regulations' (not my words).

Having experienced some truly 'alternative' ways of coding in my time as
a software developer, I think I probably agree that a little more
discipline wouldn't go amiss when training future engineers.

Thanks for taking the time to reply

lipska the kat

Oct 5, 2012, 4:11:06 AM

On 05/10/12 01:15, horme...@gmail.com wrote:
> On Thursday, October 4, 2012 7:52:27 AM UTC-7, lipska the kat wrote:
>> On 04/10/12 12:26, David Brown wrote:
>>> On 04/10/2012 12:23, lipska the kat wrote:
>>>> On 03/10/12 19:13, lipska the kat wrote:
>
>>> [snip]> "foo.c" contains all the implementation.
>>> // foo.h
>>> #ifndef foo_h__
>>> #define foo_h__
>
>> I don't understand what this means or why you are doing it Can you explain?
>
> Why, he's making foo.h idempotent, anybody can see that...

Ah, well, you are probably much smarter than I am :-)

> And by idempotent, people who use big words are just trying to
> say that he is only including foo.h once,

I thought this might have something to do with it, but I was wondering if
the form of the thing had any meaning, specifically the way the underscores
are used. I have been reading some header files from the glibc libraries
and there are underscores all over the place ??? apparently they mean
something.

That was all I was asking really. The question was probably poorly
constructed.

[snip]

> And now you can use the word idempotent to confuse other people...
>

idempotent, idempotent, idempotent must remember idempotent.

lipska the kat

Oct 5, 2012, 4:27:06 AM

On 05/10/12 00:08, Keith Thompson wrote:
> lipska the kat<lipska...@yahoo.co.uk> writes:

[snip]

> At this point, the only real reason to avoid C99-specific features
> is if you need your code to compile with Microsoft's C compiler;
> Microsoft has stated that they don't intend to support C past the
> C90 standard.

It seems to me that the C specification plays a far more important role
in the life of a C developer than the specifications of other languages
play in the lives of developers in those languages. I'm afraid I have to
admit that in the past I have been guilty of using a
(particular) language without understanding its specification; I'm not
sure I can be so careless with C.

>> There are also a whole bunch of gnu dialects
>> but my brain imploded at this point so I went to the pub :-)
>
> Yes, gcc implements a number of its own extensions to the
> language (as the standard permits). Use "-std=c99 -pedantic", or
> "-std=c11 -pedantic" if you're brave, if you want warnings about
> non-portable code. Use "-ansi -pedantic" for maximum portability --
> but then you're cutting yourself off from newer language features.
> Use "-std=gnuxx", where "xx" is a year number, if you want to use
> gcc-specific extensions -- but then you're cutting yourself off from
> portability to other compilers (except for some that implement the
> GNU extensions).

Well, quite %-}

Thanks for taking the time to reply

Very informative, think I'll stick to C99 for the time being.

Keith Thompson

Oct 5, 2012, 5:16:03 AM

lipska the kat <lipska...@yahoo.co.uk> writes:
> On 05/10/12 00:08, Keith Thompson wrote:
>> lipska the kat<lipska...@yahoo.co.uk> writes:
>
> [snip]
>
>> At this point, the only real reason to avoid C99-specific features
>> is if you need your code to compile with Microsoft's C compiler;
>> Microsoft has stated that they don't intend to support C past the
>> C90 standard.
>
> It seems to me that the C specification plays a far more important role
> in the life of a C developer than the specifications of other languages
> play in the lives of developers in those languages. I'm afraid I have to
> admit that in the past I have been guilty of using a
> (particular)language without understanding it's specification, not sure
> I can be so careless with C.

I'm not at all sure that it does so for *most* C developers.
(It's been observed that the Ada programming community is unusual in
that most developers have, and actually use, a copy of the language
reference manual.)

One of C's strongest features (and greatest potential pitfalls)
is that it permits you to write blatantly *non-portable* code when
you need to. That often means having to be familiar with whatever
extensions your compiler provides. It's often necessary if you
want to get "close to the metal" (or silicon, or whatever).

Not all code needs to be, or even can be, portable.
My own preference (and recommendation) is to write portable
standard-conforming code unless there's a good reason not to,
and if you need to write non-portable code, to be *aware* of it.

[...]

Angel

Oct 5, 2012, 7:38:22 AM

On 2012-10-05, lipska the kat <lipska...@yahoo.co.uk> wrote:
> On 05/10/12 00:08, Keith Thompson wrote:
>> lipska the kat<lipska...@yahoo.co.uk> writes:
>
> [snip]
>
>> At this point, the only real reason to avoid C99-specific features
>> is if you need your code to compile with Microsoft's C compiler;
>> Microsoft has stated that they don't intend to support C past the
>> C90 standard.
>
> It seems to me that the C specification plays a far more important role
> in the life of a C developer than the specifications of other languages
> play in the lives of developers in those languages. I'm afraid I have to
> admit that in the past I have been guilty of using a
> (particular)language without understanding it's specification, not sure
> I can be so careless with C.

You're far from alone there. You wouldn't believe the amount of C code
that works fine on x86, but breaks on sparc, powerpc or even x86_64, all
because the programmer is making assumptions that may be true for x86
but are not guaranteed by any standard.

The size of an int is a common one, or the order of bytes. If you write
your code assuming that an int is 4 bytes stored little-endian, your code
will probably work on x86 but fail on the other architectures I mentioned.
(sparc and powerpc are big-endian, and the size of an int on x86_64 is
likely to be 8 bytes)

Writing a non-trivial program portably is no small task, but saves a lot
of headaches when you want to port it to a different architecture.
Understanding what the C standard (or other standards like POSIX)
guarantee and what they don't is a key part of writing portable code.

James Kuyper

Oct 5, 2012, 8:04:20 AM

On 10/05/2012 04:11 AM, lipska the kat wrote:
> On 05/10/12 01:15, horme...@gmail.com wrote:
>> On Thursday, October 4, 2012 7:52:27 AM UTC-7, lipska the kat wrote:
>>> On 04/10/12 12:26, David Brown wrote:
>>>> On 04/10/2012 12:23, lipska the kat wrote:
>>>>> On 03/10/12 19:13, lipska the kat wrote:
>>
>>>> [snip]> "foo.c" contains all the implementation.
>>>> // foo.h
>>>> #ifndef foo_h__
>>>> #define foo_h__
>>
>>> I don't understand what this means or why you are doing it Can you explain?
>>
>> Why, he's making foo.h idempotent, anybody can see that...
>
> Ah, well, you are probably much smarter than I am :-)
>
>> And by idempotent, people who use big words are just trying to
>> say that he is only including foo.h once,

Not precisely. What it means is that including foo.h more than once has
the same net effect as including it exactly once.

> I thought this might have something to do with it but I was wondering if
> the form of thing had any meaning, specifically the way the underscores
> are used. I have been reading some header files from the glibc libraries
> and there are underscores all over the place ??? apparently they mean
> something.

Trailing underscores have no special meaning. However, leading
underscores do have a special meaning - in general, identifiers starting
with an underscore are reserved. Section 7.1.3 explains reserved
identifiers:

> 1 Each header declares or defines all identifiers listed in its associated subclause, and
> optionally declares or defines identifiers listed in its associated future library directions
> subclause and identifiers which are always reserved either for any use or for use as file
> scope identifiers.
> — All identifiers that begin with an underscore and either an uppercase letter or another
> underscore are always reserved for any use.
> — All identifiers that begin with an underscore are always reserved for use as identifiers
> with file scope in both the ordinary and tag name spaces.
> — Each macro name in any of the following subclauses (including the future library
> directions) is reserved for use as specified if any of its associated headers is included;
> unless explicitly stated otherwise (see 7.1.4).
> — All identifiers with external linkage in any of the following subclauses (including the
> future library directions) and errno are always reserved for use as identifiers with
> external linkage.184)
> — Each identifier with file scope listed in any of the following subclauses (including the
> future library directions) is reserved for use as a macro name and as an identifier with
> file scope in the same name space if any of its associated headers is included.
> 2 No other identifiers are reserved. If the program declares or defines an identifier in a
> context in which it is reserved (other than as allowed by

The reason for the existence of reserved identifiers is to prevent
collisions between identifiers used by the implementation of C, and
those used by your own code. The identifiers that would give your own
code undefined behavior if you used them are precisely the only
identifiers the standard library can use in contexts where they could
otherwise conflict with your code. That applies to any identifiers with
file scope, or macros, that are defined in the standard headers, and
also to any object or function identifiers with external linkage that
are part of the standard library. Using reserved identifiers also
protects the standard headers from being affected by any #defines in
your own code that occur before #including the standard header.

glibc is part of the C implementation, and that's why it makes lots of
use of identifiers starting with underscores.

Note: if C++ compatibility is of importance to you, then you should also
avoid double underscores in the middle of an identifier.
--
James Kuyper

Nobody

Oct 5, 2012, 11:13:02 AM

On Fri, 05 Oct 2012 11:38:22 +0000, Angel wrote:

> The size of an int is a common one, or the order of bytes. If you write
> your code assuming that an int is 4 bytes stored little-endian, your code
> will probably work on x86 but fail on the other architectures I mentioned.
> (sparc and powerpc are big-endian, and the size of an int on x86_64 is
> likely to be 8 bytes)

No it isn't. The common x86_64 platforms (Windows, Linux, OSX) all have a
32-bit int. Windows 64-bit also has a 32-bit long (Linux and OSX have a
64-bit long).

64-bit int is uncommon, partly because it means that you have two
standard 64-bit types (int and long) but either no standard 32-bit type
(if short is 16-bit) or no standard 16-bit type (if short is 32-bit).

And even if you have an int32_t type, types smaller than int get
promoted to int at the drop of a hat.

So a platform with a 64-bit int type ends up not being able to run code
which works fine on just about every other 32-bit and 64-bit platform,
effectively being a great "unportability detector" but annoying to
actually port code to.

Angel

Oct 5, 2012, 12:26:28 PM

On 2012-10-05, Nobody <nob...@nowhere.com> wrote:
> On Fri, 05 Oct 2012 11:38:22 +0000, Angel wrote:
>
>> The size of an int is a common one, or the order of bytes. If you write
>> your code assuming that an int is 4 bytes stored little-endian, your code
>> will probably work on x86 but fail on the other architectures I mentioned.
>> (sparc and powerpc are big-endian, and the size of an int on x86_64 is
>> likely to be 8 bytes)
>
> No it isn't. The common x86_64 platforms (Windows, Linux, OSX) all have a
> 32-bit int. Windows 64-bit also has a 32-bit long (Linux and OSX have a
> 64-bit long).

Point taken, I should have written "might be" instead of "is likely to
be", but that doesn't invalidate the point that it's wrong to assume
that sizeof (int) == 4.

Replace int with size_t, then. On every 64-bit OS I have tested, sizeof
(size_t) == 8 (and indeed sizeof (int) == 4), but I have seen many a
program where the programmer assumed that int will do in places where
size_t should be used... :-(

[...]

Nick Keighley

Oct 6, 2012, 5:30:34 AM

<snip>

As someone remarked, this business with "undefined behaviour" is true
of pretty much all programming languages (I'm not convinced Gödel has
anything to contribute to this). To some extent C stresses it more;
this is partly because C runs nearly everywhere and has huge numbers
of implementations.

Languages like Perl and Python have less trouble with this as there
are actually very few implementations. Java sidesteps it by running
on a virtual machine. In a sense Java is utterly non-portable as it
only runs on one platform (the JVM)! Java also nails down many things
that C doesn't, such as order of evaluation of expressions and size of
fundamental types. Some languages such as Ada had extensive test
suites to validate compilers; but such things are very expensive to
maintain.

Les Cargill

Oct 6, 2012, 9:01:36 AM

Perl and Python, being interpreted, also have a "virtual machine"
each.

> In a sense java is utterly non-portable as it
> only runs on one platform (the JVM)! Java also nails down many things
> that C doesn't such as order of expression of evaluation and size of
> fundamental types. Some languages such as Ada had extensive test
> suites to validate compilers; but such things are very expensive to
> maintain.
>

--
Les Cargill

Gordon Burditt

Oct 6, 2012, 9:19:45 PM

It is very easy to write a program in C that deliberately crashes
(here this means: calls abort()) under conditions which you
might not test, for example:
- Crashes only on Sunday.
- Crashes only when the time crosses midnight.
- Crashes only when the year changes.
- Crashes only during a Daylight Savings Time transition.
- Crashes only during a leap second (23:59:60 UTC)
- Crashes only on Feb. 29.
- Crashes only when the year is divisible by 400 (leap year).
- Crashes only when the year is divisible by 100 but not 400
(not a leap year)
- Crashes only when the year is divisible by 4000 (there is
no official 4,000-year rule for leap years, but some
people implement it or expect it to be implemented anyway)
- Crashes only when the time is manually set backwards.
- Crashes only when srand(time((time_t *) 0)) causes the
next following call to rand() to return 0.
- Crashes only when calling asctime() and the year is greater
than 9,999 (Y10K bug in the *definition* of asctime()).
- Crashes when time exceeds Jan 19 03:14:07 2038 UTC.
(UNIX time overflow)

And these situations you *SHOULD* test:
- Crashes only on an input line of more than 10,000 characters.
- Crashes only when a command-line argument is missing (this
one is easy: don't test argc before accessing argv[n]).
- Crashes only when a file open fails (this one is easy: don't
check if fopen() fails and try to use the file).

It is probably easier to accidentally write such a program that
causes some type of time-dependent undefined behavior, especially
if it calls time() and localtime() and does some manipulation of
parts of the date or time. Sometimes this is caused by subscripts
going out of range when code doesn't take into account all of the
possibilities. If a program runs for only a few seconds, catching
time-dependent behavior with testing can be difficult.

Richard Damon

Oct 6, 2012, 9:30:50 PM

On 10/6/12 5:30 AM, Nick Keighley wrote:

> As someone remarked this business with "undefined behaviour" is true
> of pretty much all programming languages (I'm not convinced Godel has
> anything to contribute to this). To some extent C stresses it more,
> this is partly because C runs nearly everywhere and has huge numbers
> of implementations.
>
> Langauages like Perl and Python have less trouble with this as there
> are actually very few implementations. Java side steps it by running
> on a virtual machine. In a sense java is utterly non-portable as it
> only runs on one platform (the JVM)! Java also nails down many things
> that C doesn't such as order of expression of evaluation and size of
> fundamental types. Some languages such as Ada had extensive test
> suites to validate compilers; but such things are very expensive to
> maintain.
>

Undefined behavior is allowed in C to provide for (significantly)
improved efficiency in some operations. For example, accessing an array
beyond its bounds. If we removed pointers into arrays (and passing
arrays with unspecified bounds), then the compiler could easily add code
to check the subscripts to the array and trap on error conditions. If we
want to support pointers into arrays, then these pointers could also be
made "fatter" to include the bounds of the object they point to (and for
multidimensional arrays, the bounds for each of the larger arrays the
array is part of). This adds significant overhead to the pointers and
their operations. Since the design goal of C was to favor creating efficient
code, to make it a reasonable replacement for assembly code, the
tradeoffs tend to be made in favor of efficiency over catching "bad"
code. Many other languages have chosen to limit the realm of undefined
behavior by defining what is supposed to happen, forcing the compiler
to possibly generate less efficient (but more predictable) code.

Ian Collins

Oct 6, 2012, 10:19:27 PM

On 10/07/12 14:19, Gordon Burditt wrote:
> It is very easy to write a program in C that deliberately crashes

Are you replying to some one or posting random musings?

--
Ian Collins

BartC

Oct 7, 2012, 6:40:48 AM

"Richard Damon" <news.x.ri...@xoxy.net> wrote in message
news:k4qm0b$jr0$1...@dont-email.me...

> On 10/6/12 5:30 AM, Nick Keighley wrote:
>
>> As someone remarked this business with "undefined behaviour" is true
>> of pretty much all programming languages (I'm not convinced Godel has
>> anything to contribute to this). To some extent C stresses it more,
>> this is partly because C runs nearly everywhere and has huge numbers
>> of implementations.

> If we removed pointers into arrays (and passing
> arrays with unspecified bounds), then the compiler could easily add code
> to check the subscripts to the array and trap on error conditions. If we
> want to support pointers into arrays, then these pointers could also be
> made "fatter" to include the bounds of the object they point to (and for
> multidimensional arrays, the bounds for each of the larger arrays the
> array is part of).

Arrays can have any number of dimensions, so it would be highly impractical
for each of a thousand possible pointers into an array to duplicate
its half-dozen or dozen dimensions. You would likely also need different
pointers for each of the sub-dimensions.

And for an array whose dimensions are not realised until runtime, or for
'ragged' arrays where the bounds vary through the array, how would
such a pointer be initialised? Other languages would tend to build the
bounds into the arrays themselves.

In any case, C allows pointers into all sorts of objects, including
non-arrays, or a single element of that multi-dimensional array, or to cast
one type of pointer into another; you wouldn't then be able to step or do
arithmetic on such a pointer, without by-passing the bounds checking.

So 'undefined behaviour', if it's as simple as having the wrong value in a
pointer, is built-in to the language!

(For single-dimensional arrays, a 'fat' pointer containing exactly one
bound, could work, provided they are a new explicit type in addition to
regular pointers. Then an array allocator could return such a pointer, which
can be passed to functions and would carry its length for use by programs,
and could optionally be used for bounds checking by internal code. But for
multi-dimensions, it gets complicated...)

> This add significant overhead to the pointer and the
> operations.

Not if the alternative is to have to always pass the length of the array
together with a pointer to the array. Having bounds-checking code inserted
would be an extra overhead, but that can be optional.

--
Bartc

James Kuyper

Oct 7, 2012, 10:56:32 AM

On 10/07/2012 06:40 AM, BartC wrote:
>
>
> "Richard Damon" <news.x.ri...@xoxy.net> wrote in message
> news:k4qm0b$jr0$1...@dont-email.me...

...

>> If we removed pointers into arrays (and passing
>> arrays with unspecified bounds), then the compiler could easily add code
>> to check the subscripts to the array and trap on error conditions. If we
>> want to support pointers into arrays, then these pointers could also be
>> made "fatter" to include the bounds of the object they point to (and for
>> multidimensional arrays, the bounds for each of the larger arrays the
>> array is part of).
>
> Arrays can have any number of dimensions, so would be highly impractical
> for any of a thousand possible pointers into an array for each to duplicate
> its half-dozen or dozen dimensions. You would likely also need different
> pointers for each of the sub-dimensions.

None of that matters; only one range is needed at any given time - it
can be modified whenever changing levels in the multidimensional array.
Whenever an lvalue of array type gets converted to a pointer to its
element type, that pointer can be given a range corresponding to the
beginning and end of the array. It doesn't matter whether the element
type is itself an array type - that only comes into play when an lvalue
of the element type is in turn converted to a pointer to its first
element, at which point the same rule applies, giving that pointer a
different range.
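To make the rule concrete, here is a sketch that computes, in standard C, the byte range such an implementation could attach at each level of `double m[3][4][5]`. `byte_range` and `DECAY_RANGE` are hypothetical names; C itself does not record these ranges. The macro only works on a genuine array lvalue, because it relies on `sizeof`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte range that a fat pointer would carry. */
typedef struct {
    const char *lo;   /* first byte reachable by pointer arithmetic */
    const char *hi;   /* one past the last byte */
} byte_range;

/* Range a bounds-checking implementation could attach when the array
   lvalue ARR decays to a pointer to its first element. ARR must be an
   actual array, not a pointer, or sizeof gives the wrong answer. */
#define DECAY_RANGE(arr) \
    ((byte_range){ (const char *)(arr), (const char *)(arr) + sizeof (arr) })

/* Size in bytes of a range. */
static size_t range_size(byte_range r)
{
    assert(r.lo <= r.hi);
    return (size_t)(r.hi - r.lo);
}
```

Applied to `m`, `m[1]` and `m[1][2]`, the ranges shrink from 60 to 20 to 5 doubles: each decay picks the bounds of the array being converted, and nothing from the outer levels needs to be stored.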

> And for an array whose dimensions are not realised until runtime, or for
> 'ragged' arrays where the bounds vary through the array, how would
> such a pointer be initialised?

In C, ragged arrays can only be implemented by allocating each row from
a larger memory space. If the allocation is handled by malloc(), then
the bounds can be inserted at the time malloc() is called. If the user
code allocates one large array, and then fills in an array of pointers
to irregularly-sized pieces of that array, there's no way for the C
compiler to know what the bounds are; it will necessarily use only the
bounds of the big array.

> Other languages would tend to build the
> bounds into the arrays themselves.
>
> In any case, C allows pointers into all sorts of objects, including
> non-arrays,

That poses no problems - the C standard specifies that a pointer to a
non-array object can be treated as a pointer to the first and only
element of a 1-element array of the object's type.
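This rule (in the additive-operators section, 6.5.6) is why the following standard C sketch is valid: `&x + 1` is a legitimate one-past-the-end pointer for a lone `int`, exactly as `arr + 3` is for an `int[3]`. The helper name `sum_range` is illustrative only.

```c
#include <assert.h>

/* Sum the ints in the half-open range [first, last); both pointers
   must point into (or one past the end of) the same array. */
static long sum_range(const int *first, const int *last)
{
    long total = 0;
    assert(first <= last);
    for (const int *p = first; p != last; ++p)
        total += *p;
    return total;
}
```

A fat-pointer implementation would give `&x` the bounds `&x` and `&x + 1`, the same as for a 1-element array.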

> ... or a single element of that multi-dimensional array,

That poses no problem, either; the bounds for the pointer to the single
element are the bounds for the array from which it was selected. If the
programmer wants to restrict the permitted range more tightly than that,
the C language currently provides no mechanism for doing so, though
*(element_type (*)[n])element_pointer seems a plausible construct that
could be used to tell the compiler to treat it as though it came from an
n-element array (I do NOT claim that the current standard endorses any
such use of this construct).

This construct could also be used to tell the compiler what bounds to
use when filling in a ragged array from a single large array.
--
James Kuyper

Chicken McNuggets

Oct 7, 2012, 11:48:01 AM

On 05/10/2012 08:44, lipska the kat wrote:
>
> I understand it perfectly well, I just think if someone makes the effort
> to reply to my question I should make the effort to respond.
> One thing I learned from this and other posts is that C99 is probably a
> better choice for me than earlier standards. I added the option you
> suggested to my gcc commands along with -Wall, ran make on all my current
> code and waited for the explosion (of demons perhaps:-) ... but nothing
> really of note appeared. There was one warning but that was about it.
> Most gratifying.
>
> Thanks for taking the time to reply
>
> lipska
>

When compiling with GCC you probably want to add the -Wextra and
-pedantic flags as well to your compilation command.

lipska the kat

Oct 7, 2012, 1:22:58 PM

On 07/10/12 02:19, Gordon Burditt wrote:
> It is very easy to write a program in C that deliberately crashes
> (here this means: calls abort()) under conditions which you

[snip]

> - Crashes only when calling asctime() and the year is greater
> than 9,999 (Y10K bug in the *definition* of asctime()).

Well if I have any code running > 9999 then I'll consider it a bit of a
result ... Anyway by then we will have yottahertz fast, organographene
based tri-state quantum processors implanted at birth and everything we
have ever written, or experienced, or imagined, or fantasized about will
exist concurrently and in every possible manifestation and we will
experience multi-dimensional photoholographic realities and travel to
the far points of the beyond visible universe instantly and without
cost, pollution or loss of unicorn dust.

And if anyone cares to wager that this will not be so then I'm taking
all bets now, cash only, paid by electronic transfer into my Swiss bank
account (no really, I do have one).

James Kuyper

Oct 7, 2012, 2:38:40 PM

On 10/07/2012 01:22 PM, lipska the kat wrote:
> On 07/10/12 02:19, Gordon Burditt wrote:
>> It is very easy to write a program in C that deliberately crashes
>> (here this means: calls abort()) under conditions which you
>
> [snip]
>
>> - Crashes only when calling asctime() and the year is greater
>> than 9,999 (Y10K bug in the *definition* of asctime()).
>
> Well if I have any code running > 9999 then I'll consider it a bit of a

Well, the issue is also relevant to code that computes future times. I
admit, the need to determine calendar dates that far in the future is
quite small - but it's not non-existent.

The problem with asctime() is that it's the only C standard library
function whose behavior is defined entirely by example code (7.27.3.1p2)
showing how it could be implemented. asctime() provides a prime example
of why that's a bad idea. It can be deduced from that example code that
asctime() has undefined behavior if:

timeptr->tm_wday < 0 || timeptr->tm_wday > 6 ||
timeptr->tm_mon < 0 || timeptr->tm_mon > 11 ||
timeptr->tm_year < -2899 || timeptr->tm_year > 8099

The limits on tm_wday and tm_mon are due to their use as array indices;
the limit on tm_year is imposed by the fact that the call to sprintf()
will overflow the provided buffer. Even assuming that the date being
represented is between year 1000 and year 9999, you'll still get a
buffer overflow if

timeptr->tm_mday < -9 || timeptr->tm_mday > 99 ||
timeptr->tm_hour < -9 || timeptr->tm_hour > 99 ||
timeptr->tm_sec < -9 || timeptr->tm_sec > 99

However, until C2011, it was nowhere explicitly stated that this is the
case. C2011 added 7.27.3.1p3, which says that the behavior is
undefined if (in effect) timeptr->tm_year < -900 || timeptr->tm_year >
8099, or if any of the other fields are outside their normal ranges, as
defined in 7.27.1p4; this is more restrictive than the constraints I
deduced above.
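One way to check which field values overflow the 26-byte `result` buffer of the standard's example is to count the characters its `sprintf()` call would produce, using `snprintf()` with a null buffer and a size of 0 (which writes nothing and returns the length). The helper below is a hypothetical sketch that reuses the example's format string and name tables; it assumes `tm_wday` and `tm_mon` are already in range.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Characters (excluding the terminating null) that the standard's
   example asctime() code would write for *t. Hypothetical helper. */
static int asctime_example_length(const struct tm *t)
{
    static const char wday_name[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    /* These two fields are used as array indices, so check them first. */
    assert(t->tm_wday >= 0 && t->tm_wday <= 6);
    assert(t->tm_mon >= 0 && t->tm_mon <= 11);
    /* Same format as the example; a null buffer of size 0 only counts. */
    return snprintf(NULL, 0, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                    wday_name[t->tm_wday], mon_name[t->tm_mon],
                    t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec,
                    1900 + t->tm_year);
}
```

Anything above 25 leaves no room for the terminating null in `char result[26]`: a normal date such as "Sun Sep 16 01:03:52 1973\n" needs exactly 25, while year 10000 or an hour of 100 needs 26.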

asctime() doesn't have to be unsafe - the example code is only an
example. Undefined behavior allows, as one possibility, that asctime()
is implemented more safely than in the example code. It could return a
null pointer when tm_wday or tm_mon are out of range, or it could choose
a special month/day name (such as "INV"). It could also return a null
pointer instead of producing a buffer overflow, or it could use a
buffer large enough to avoid any possibility of overflow.
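A sketch of one such safer implementation, under the assumptions stated in its comments: it keeps the example's output format but returns a null pointer instead of indexing past the name tables or overflowing its buffer. `checked_asctime` is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical checked variant of asctime(). Like asctime(), it returns
   a pointer to static storage that the next call overwrites. */
static char *checked_asctime(const struct tm *timeptr)
{
    static const char wday_name[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    static char result[26];
    int n;

    assert(timeptr != NULL);
    if (timeptr->tm_wday < 0 || timeptr->tm_wday > 6 ||
        timeptr->tm_mon < 0 || timeptr->tm_mon > 11)
        return NULL;                 /* would index past the name tables */
    if (timeptr->tm_year > INT_MAX - 1900)
        return NULL;                 /* 1900 + tm_year would overflow int */

    n = snprintf(result, sizeof result, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                 wday_name[timeptr->tm_wday], mon_name[timeptr->tm_mon],
                 timeptr->tm_mday, timeptr->tm_hour, timeptr->tm_min,
                 timeptr->tm_sec, 1900 + timeptr->tm_year);
    if (n < 0 || (size_t)n >= sizeof result)
        return NULL;                 /* output would not fit: refuse */
    return result;
}
```

Because `snprintf()` never writes past `sizeof result`, the worst case for out-of-range fields is a null return rather than a buffer overflow.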
--
James Kuyper

Keith Thompson

Oct 7, 2012, 2:47:53 PM

"BartC" <b...@freeuk.com> writes:
> "Richard Damon" <news.x.ri...@xoxy.net> wrote in message
> news:k4qm0b$jr0$1...@dont-email.me...
>> On 10/6/12 5:30 AM, Nick Keighley wrote:
>>> As someone remarked this business with "undefined behaviour" is true
>>> of pretty much all programming languages (I'm not convinced Godel has
>>> anything to contribute to this). To some extent C stresses it more,
>>> this is partly because C runs nearly everywhere and has huge numbers
>>> of implementations.
>
>> If we removed pointers into arrays (and passing
>> arrays with unspecified bounds), then the compiler could easily add code
>> to check the subscripts to the array and trap on error conditions. If we
>> want to support pointers into arrays, then these pointers could also be
>> made "fatter" to include the bounds of the object they point to (and for
>> multidimensional arrays, the bounds for each of the larger arrays the
>> array is part of).
>
> Arrays can have any number of dimensions, so would be highly impractical
> for any of a thousand possible pointers into an array for each to duplicate
> its half-dozen or dozen dimensions. You would likely also need different
> pointers for each of the sub-dimensions.

C's multidimensional arrays are nothing more or less than arrays of
arrays. Whatever mechanism existed for 1D arrays would automatically
apply to all higher dimensions.
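A quick way to see this in standard C: each subscript level of `int A[5][4][3]` is just another array, so `sizeof` gives the element count at every level. `COUNT` and `total_ints` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* A "3D" array: 5 elements, each of type int[4][3]. */
int A[5][4][3];

/* Element count of an actual array (not a pointer). */
#define COUNT(arr) (sizeof (arr) / sizeof (arr)[0])

/* Total number of ints, computed level by level as arrays of arrays. */
static size_t total_ints(void)
{
    size_t n = COUNT(A) * COUNT(A[0]) * COUNT(A[0][0]);
    assert(n * sizeof(int) == sizeof A);   /* arrays have no padding between elements */
    return n;
}
```

Since `A[0]` is an ordinary `int[4][3]` and `A[0][0]` an ordinary `int[3]`, a bounds mechanism that handles 1D arrays applies unchanged at each level.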

> And for an array whose dimensions are not realised until runtime, or for
> 'ragged' arrays where the bounds vary through the array, how would
> such a pointer be initialised? Other languages would tend to build the
> bounds into the arrays themselves.

The language has no built-in "ragged" arrays; they're built in user code
by allocating the rows. Whatever method allocates the rows (say,
malloc()) would have to deal with any bounds tracking; for example,
malloc() would have to return a fat pointer.

> In any case, C allows pointers into all sorts of objects, including
> non-arrays, or a single element of that multi-dimensional array, or to cast
> one type of pointer into another; you wouldn't then be able to step or do
> arithmetic on such a pointer, without by-passing the bounds checking.

If C had fat pointers, all the operations you're describing would have
to maintain the "fatness".

BartC

Oct 7, 2012, 4:26:53 PM

"James Kuyper" <james...@verizon.net> wrote in message
news:k4s571$l59$1...@dont-email.me...

Do you have any syntax examples of how it might work?

>> And for an array whose dimensions are not realised until runtime, or for
>> 'ragged' arrays where the bounds vary through the array, how would
>> such a pointer be initialised?
>
> In C, ragged arrays can only be implemented by allocating each row from
> a larger memory space. If the allocation is handled by malloc(), then
> the bounds can be inserted at the time malloc() is called.

OK, the array is defined by a single fat pointer. But what would that look
like in actual code? And how do you set up a pointer to a row, or element,
in a way that includes the bounds? And what would a fat pointer actually
contain? Examples I've seen discussed here seem to be very complicated.

>> In any case, C allows pointers into all sorts of objects, including
>> non-arrays,
>
> That poses no problems - the C standard specifies that a pointer to a
> non-array object can be treated as a pointer to the first and only
> element of a 1-element array of the object's type.
>
>> ... or a single element of that multi-dimensional array,
>
> That poses no problem, either; the bounds for the pointer to the single
> element are the bounds for the array from which it was selected.

You want a pointer to a single element which rattles around the larger
array? That's consistent with C programmers demanding to do arithmetic on
their pointers!

I favour using a simple pointer to individual elements (on which you can
choose to do unchecked arithmetic), and slices to ranges of elements. Or a
slice can point to a single element too. The difference with a slice,
compared to a fat pointer that includes the limits of the array into which
it points, is that you're only allowed to access what's represented in the slice.

In the array:

int A[10];

I can pass the slice (A+5,3) to a function, it will see a 3-element array
indexed 0..2, corresponding to A[5..7]. It's not interested in the other
elements! That's more awkward and less efficient to do with a proper 'fat'
pointer; while it can point to element [5] as (A,5,10) (ptr, offset,
length), the range A[5..7] would need to be (A+5,0,3).
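The (A+5,3) slice described above can be sketched in C as a pointer plus a length. `int_slice`, `make_slice`, and `slice_at` are hypothetical names; the sketch trusts the caller that the underlying array really contains the requested range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice: pointer to the first element of a sub-range plus
   its length. The callee sees indices 0..len-1 only. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Build the slice (base+start, len); base[start] .. base[start+len-1]
   must exist in the caller's array. */
static int_slice make_slice(int *base, size_t start, size_t len)
{
    int_slice s;
    assert(base != NULL || len == 0);
    s.ptr = base + start;
    s.len = len;
    return s;
}

/* Checked element access: NULL for indices outside the slice. */
static int *slice_at(int_slice s, size_t i)
{
    return i < s.len ? &s.ptr[i] : NULL;
}

/* A callee that only ever sees its own 0-based range. */
static int slice_sum(int_slice s)
{
    int total = 0;
    for (size_t i = 0; i < s.len; ++i)
        total += s.ptr[i];
    return total;
}
```

With `int A[10]`, `make_slice(A, 5, 3)` gives a callee that sees `A[5..7]` as indices 0..2, and `slice_at(s, 3)` is rejected even though `A[8]` exists.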

--
bartc

BartC

Oct 7, 2012, 4:53:26 PM

"Keith Thompson" <ks...@mib.org> wrote in message
news:lny5jim...@nuthaus.mib.org...

> "BartC" <b...@freeuk.com> writes:
>> "Richard Damon" <news.x.ri...@xoxy.net> wrote in message
>> news:k4qm0b$jr0$1...@dont-email.me...
>>> On 10/6/12 5:30 AM, Nick Keighley wrote:
>>>> As someone remarked this business with "undefined behaviour" is true
>>>> of pretty much all programming languages (I'm not convinced Godel has
>>>> anything to contribute to this). To some extent C stresses it more,
>>>> this is partly because C runs nearly everywhere and has huge numbers
>>>> of implementations.
>>
>>> If we removed pointers into arrays (and passing
>>> arrays with unspecified bounds), then the compiler could easily add code
>>> to check the subscripts to the array and trap on error conditions. If we
>>> want to support pointers into arrays, then these pointers could also be
>>> made "fatter" to include the bounds of the object they point to (and for
>>> multidimensional arrays, the bounds for each of the larger arrays the
>>> array is part of).
>>
>> Arrays can have any number of dimensions, so would be highly impractical
>> for any of a thousand possible pointers into an array for each to duplicate
>> its half-dozen or dozen dimensions. You would likely also need different
>> pointers for each of the sub-dimensions.
>
> C's multidimensional arrays are nothing more or less than arrays of
> arrays. Whatever mechanism existed for 1D arrays would automatically
> apply to all higher dimensions.

It's more complicated than that. You have simple arrays like this:

int A[5][4][3];

and more dynamic ones like this:

int ***B,***C;

Which might be set up to have dimensions [7][2][4], and [6][6][6].

I can appreciate that it might not be practical to pass a pointer to any of
A, B or C to a function which expects a 3D array of ints; A is not
compatible with B and C.

Setting up a pointer into A would mean a fat pointer with details of 3
dimensions; other static arrays might have N dimensions, which is the
difficulty I was thinking of (a fat pointer would itself be a linear
array!).

With B and C, the same difficulty exists, *unless* the pointers comprising
the arrays are themselves fat pointers, each containing the dimension of
that row. (But RD suggested all dimensions were included in each pointer,
not just the current level.)

--
Bartc

James Kuyper

Oct 7, 2012, 7:58:58 PM

double multi_array[3][4][5];

double (*two_array)[4][5] = multi_array+1;
// the bounds for two_array are multi_array and multi_array+3

double (*array)[5] = multi_array[2]+2;
// The bounds for array are multi_array[2] and multi_array[3].

double *single = multi_array[0][3] + 1;
// The bounds for single are multi_array[0][3] and multi_array[0][4]

Here (and elsewhere in this message, for that matter) there may be
off-by-one errors, or pointer-type mismatches. Sorry - I can only double-check
what I wrote so many times without getting blurry eyes, and this
is inherently tricky to write.

>>> And for an array whose dimensions are not realised until runtime, or for
>>> 'ragged' arrays where the bounds vary through the array, how would
>>> such a pointer be initialised?
>>
>> In C, ragged arrays can only be implemented by allocating each row from
>> a larger memory space. If the allocation is handled by malloc(), then
>> the bounds can be inserted at the time malloc() is called.
>
> OK, the array is defined by a single fat pointer. But what would that look
> like in actual code? ...

void *p = malloc(100*sizeof(int));
// the bounds of p are the current value of p, and (char*)p + 100*sizeof(int).

int *int_array = (int*)p;
// bounds of int_array are the same as for p.

int (*int_twod)[5] = (int(*)[5])p;
// the bounds for int_twod are the same as for p

int_array = int_twod[2];
// The new bounds for int_array are int_twod[2] and int_twod[3]

> ... And how do you set up a pointer to a row, or element,
> in a way that includes the bounds? ...

The whole point of the fat pointer thing is that it occurs automatically
- it is NOT under your control. However, I did suggest a possible syntax
for imposing a stricter limit:

int_array = *(int (*)[50])int_array;

This could be defined as setting the bounds for int_array to the current
value of int_array, and the current value of int_array + 50.
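Under that suggested meaning, the narrowing step could be sketched as follows, assuming a hypothetical three-field `fat_ptr` (current position plus `[lo, hi)`) and a hypothetical `fat_narrow` function; nothing in the current standard provides either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: current position and the range [lo, hi). */
typedef struct {
    char *cur, *lo, *hi;
} fat_ptr;

/* Possible meaning of *(T (*)[n])ptr: keep the current position, but
   shrink the bounds to n elements starting there. Refuses (returns
   false) if that would widen the existing bounds. */
static bool fat_narrow(fat_ptr *f, size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    assert(f->lo <= f->cur && f->cur <= f->hi);
    if (n > (size_t)(f->hi - f->cur) / elem_size)
        return false;
    f->lo = f->cur;
    f->hi = f->cur + n * elem_size;
    return true;
}
```

Refusing to widen the range keeps the construct from becoming a way around the checks it is meant to tighten.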

> ... And what would a fat pointer actually
> contain? Examples I've seen discussed here seem to be very complicated.

The fat pointer would have to contain three pieces of information: the
location it currently points at, the lowest location in memory that can
be reached by pointer subtraction with defined behavior, and the highest
location in memory that can be reached by pointer addition with defined
behavior. If the maximum size of any object is much smaller than the
number of distinct memory locations that can be pointed at, it may make
sense to describe two of those memory locations as offsets from the
third, rather than as absolute locations.
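A sketch of that compact variant, assuming (hypothetically) that no object exceeds `UINT32_MAX` bytes, so the two limits can be stored as 32-bit offsets from the current address. `compact_fat_ptr`, `compact_from`, and `compact_move` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact fat pointer: current address plus two offsets. */
typedef struct {
    char    *cur;
    uint32_t below;   /* cur - below is the lowest reachable byte */
    uint32_t above;   /* cur + above is one past the end */
} compact_fat_ptr;

static compact_fat_ptr compact_from(char *lo, char *cur, char *hi)
{
    compact_fat_ptr f;
    assert(lo <= cur && cur <= hi);
    assert((size_t)(hi - lo) <= UINT32_MAX);
    f.cur = cur;
    f.below = (uint32_t)(cur - lo);
    f.above = (uint32_t)(hi - cur);
    return f;
}

/* Move by delta bytes; only the split between the offsets changes. */
static bool compact_move(compact_fat_ptr *f, ptrdiff_t delta)
{
    if (delta < 0) {
        size_t back = (size_t)0 - (size_t)delta;   /* magnitude of delta */
        if (back > f->below)
            return false;
        f->below -= (uint32_t)back;
        f->above += (uint32_t)back;
    } else {
        if ((size_t)delta > f->above)
            return false;
        f->above -= (uint32_t)delta;
        f->below += (uint32_t)delta;
    }
    f->cur += delta;
    return true;
}
```

On a 64-bit machine this needs 16 bytes per pointer instead of 24 for three absolute addresses, at the cost of the object-size limit.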

>>> In any case, C allows pointers into all sorts of objects, including
>>> non-arrays,
>>
>> That poses no problems - the C standard specifies that a pointer to a
>> non-array object can be treated as a pointer to the first and only
>> element of a 1-element array of the object's type.
>>
>>> ... or a single element of that multi-dimensional array,
>>
>> That poses no problem, either; the bounds for the pointer to the single
>> element are the bounds for the array from which it was selected.
>
> You want a pointer to a single element which rattles around the larger
> array? That's consistent with C programmers demanding to do arithmetic on
> their pointers!

The whole point of the proposal, as I understand it, is to make
violation of the current rules for pointer addition detectable by the
compiler, thereby allowing what might otherwise be dangerous
consequences to be replaced with standard-defined behavior (what that
behavior would be has not yet been specified - perhaps the raising of a
signal?).

However, you could achieve the effect you're talking about by using
*(int (*)[1])(int_array+3). That would create a pointer which can only
point at int_array[3] or int_array[4], and can only be dereferenced when
pointing at int_array[3]. It's a bit clumsy, but I'm not sure it would
be a common enough need to require a more elegant syntax.
--
James Kuyper

James Kuyper

Oct 7, 2012, 8:04:26 PM

On 10/07/2012 04:53 PM, BartC wrote:
> "Keith Thompson" <ks...@mib.org> wrote in message
> news:lny5jim...@nuthaus.mib.org...
>> "BartC" <b...@freeuk.com> writes:

...

>>> Arrays can have any number of dimensions, so would be highly impractical
>>> for any of a thousand possible pointers into an array for each to duplicate
>>> its half-dozen or dozen dimensions. You would likely also need different
>>> pointers for each of the sub-dimensions.
>>
>> C's multidimensional arrays are nothing more or less than arrays of
>> arrays. Whatever mechanism existed for 1D arrays would automatically
>> apply to all higher dimensions.
>
> It's more complicated than that. You have simple arrays like this:
>
> int A[5][4][3];
>
> and more dynamic ones like this:
>
> int ***B,***C;

As far as C is concerned, those are pointers; they may end up pointing
at arrays, but the relevant boundaries are determined by the arrays that
they point at, not by these pointers themselves.

> Which might be set up to have dimensions [7][2][4], and [6][6][6].

As pointers, not arrays, they can't have any of those dimensions.

> I can appreciate that it might not be practical to pass a pointer to any of
> A, B or C to a function which expects a 3D array of ints; A is not
> compatible with B and C.
>
> Setting up a pointer into A would mean a fat pointer with details of 3
> dimensions; other static arrays might have N dimensions, which is the
> difficulty I was thinking of (a fat pointer would itself be a linear
> array!).
> With B and C, the same difficulty exists, *unless* the pointers comprising
> the arrays are themselves fat pointers, each containing the dimension of
> that row. (But RD suggested all dimensions were included in each pointer,
> not just the current level.)

I hadn't noticed that. That's unnecessary, and a mistake on his part, I
think.
--
James Kuyper

BartC

Oct 7, 2012, 9:14:11 PM

"James Kuyper" <james...@verizon.net> wrote in message
news:k4t5ab$tle$1...@dont-email.me...

> On 10/07/2012 04:53 PM, BartC wrote:

>> It's more complicated than that. You have simple arrays like this:
>>
>> int A[5][4][3];
>>
>> and more dynamic ones like this:
>>
>> int ***B,***C;
>
> As far as C is concerned, those are pointers; they may end up pointing
> at arrays, but the relevant boundaries are determined by the arrays that
> they point at, not by these pointers themselves.
>
>> Which might be set up to have dimensions [7][2][4], and [6][6][6].
>
> As pointers, not arrays, they can't have any of those dimensions.

How else would you create a 3D array from dimensions known at runtime?

Using a hierarchy of pointers, with a *** one at the top, is the only way I
know in C, even if fiddly (see below). The dimensions (a 3-element array)
are not associated with the pointers, true, but that is what we're talking
about.

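/* mallocs not fully checked in this code... */
#include <stdio.h>
#include <stdlib.h>

typedef int T;

/* A 1D array of dim elements. */
T* make1darrayT(int dim) {
    return malloc(dim*sizeof(T));
}

/* A 2D array: dims[0] pointers, each to a row of dims[1] elements. */
T** make2darrayT(int *dims) {
    T** p;
    int i;
    p=malloc(dims[0]*sizeof(T*));
    if (p)
        for (i=0; i<dims[0]; ++i)
            p[i]=make1darrayT(dims[1]);   /* row allocations not checked */
    return p;
}

/* A 3D array: dims[0] pointers, each to a 2D array of dims[1] x dims[2]. */
T*** make3darrayT(int *dims) {
    T*** p;
    int i;
    p=malloc(dims[0]*sizeof(T**));
    if (p)
        for (i=0; i<dims[0]; ++i)
            p[i]=make2darrayT(dims+1);
    return p;
}

/* Free everything allocated by make3darrayT(); free(NULL) is a no-op. */
void free3darrayT(T ***A, int *dims) {
    int i, j;
    if (!A) return;
    for (i=0; i<dims[0]; ++i) {
        if (A[i])
            for (j=0; j<dims[1]; ++j)
                free(A[i][j]);
        free(A[i]);
    }
    free(A);
}

void printarrayT(T ***A,int *dims) {
    int i,j,k;
    for (i=0; i<dims[0]; ++i)
        for (j=0; j<dims[1]; ++j)
            for (k=0; k<dims[2]; ++k)
                printf("A[%d][%d][%d] = %d\n",i,j,k,A[i][j][k]);
}

int main (void){
    int adims[]={7,2,4};
    T ***A;
    int i,j,k;

    A=make3darrayT(adims);
    if (!A) exit(EXIT_FAILURE);   /* report failure, not success */

    for (i=0; i<adims[0]; ++i)
        for (j=0; j<adims[1]; ++j)
            for (k=0; k<adims[2]; ++k)
                A[i][j][k]=i*10000+j*100+k;

    printarrayT(A,adims);
    free3darrayT(A,adims);
    return 0;
}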

--
Bartc

Richard Damon

Oct 7, 2012, 9:16:20 PM

If we have an array: int foo[5][6];
there are 3 types that might be used with this.

int (*p1)[5][6] = &foo;
which is a pointer to the full array. Such a pointer could also be used
to point into int bar[4][5][6];

There is also int (*p2)[6] = &foo[n]; which points to one row of the
array, and

int *p3 = &foo[n][m]; which points to an element of foo;

Note that with p1 set equal to &foo, p1-- and p1[1] are invalid
operations, and p1++ or p1+1 only yield a one-past-the-end pointer that
must not be dereferenced, as p1 doesn't point into an array, but at a
single object (that happens to be an array).

We have the ability to set p2 = &(*p1)[1] to move down the dimensions, and
if the pointers are fat in that they store the base and limits, all that
information can be derived from the value in p1. What is problematic is
converting a p2 to a p1, but this can only be done via a cast in C (or
via void*'s implicit conversion). It should be noted that this type
of pointer cast is inherently very prone to "undefined behavior", so a
language trying to be C-like, but avoiding/limiting undefined behavior,
might just prohibit this action, or warn on the construct and allow
undefined behavior thereafter. Note that even in C this sort of
upcasting of an element to the array is fairly rare, and very prone to
invoking undefined behavior.
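In standard C the downward steps can be written without any cast. This sketch (the helper name `element_via_rows` is hypothetical) derives the row pointer from `*p1` and the element pointer from `*p2`; a fat-pointer implementation would derive the narrower bounds at the same points.

```c
#include <assert.h>

int foo[5][6];

/* Step down from a pointer to the whole array (p1) to a row pointer
   (p2), then to an element pointer (p3), using only standard C. */
static int *element_via_rows(int (*p1)[5][6], int n, int m)
{
    int (*p2)[6];
    int *p3;

    assert(n >= 0 && n < 5 && m >= 0 && m < 6);
    p2 = &(*p1)[n];   /* row n; (*p1) + n is equivalent */
    p3 = &(*p2)[m];   /* element m of that row */
    return p3;
}
```

Going the other way, from a `p3` back up to a `p2` or `p1`, is exactly the cast the paragraph above describes as the problematic case.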

void* pointers need to be limited if not eliminated. The void pointer
needs to either be a source of undefined behavior, or it needs to
remember the bounds of the object it pointed into (and maybe of the full
object it is pointing into), making its type punning ability much less
useful (as it becomes disallowed).

It is possible to add checking for this sort of thing, if you generate
for every array (single or multidimensional) a table describing it, and
include in the pointer information on where in the array you are
pointing. Then the cast operation could look into this data and see if
the upcast is valid and, if so, what the new bounds are.

For run time generated arrays, malloc and family would need to be
smarter, and somehow be passed the type of the array rather than just a
byte count (it would become a bit more like the C++ new operator), so
that it could return the appropriate fat pointer, since that is the only
kind of pointer that can point into an array.

For a pointer to a single object as opposed to into an array, one
solution would be to consider such an object as part of a 1 dimensional
array. If you are using the option of pointer up casting and needing
description blocks for arrays, you can either create a block of all such
objects that have their address taken, or define a special case that a
value of 0 for the pointer to the array block is a signal that the
pointer isn't into an array (and thus upcasting, or manipulating the
address are illegal).

The comment on overhead is comparing to C. A C pointer is typically 1
register big (or at least 1 address register big), and tends to be able
to be manipulated with simple direct machine instructions. The fat
pointer needed to avoid undefined behavior needs to be bigger, as it
must store extra information. The operations on it will typically not be
a simple direct machine instruction (unless the machine is one designed
for this sort of purpose, where address manipulations include bounds
testing) so is slower.

C code attempting to duplicate this bounds checking would be slower than
C code without all this checking, but possibly faster than the language
using fat pointers. One reason is that the programmer may know more
about what pointers point into, and perhaps come up with better
tests to "prove" that the accesses are valid. In normal C code, the
programmer can "prove" to himself that many accesses are safe from
other guarantees that the compiler might not be able to prove for
itself (like knowing that all strings do have a null terminator, and thus
a char scanning loop is safe). Of course the danger is that the programmer
also might make a mistake.

James Kuyper

Oct 7, 2012, 9:47:38 PM

On 10/07/2012 09:14 PM, BartC wrote:
> "James Kuyper" <james...@verizon.net> wrote in message
> news:k4t5ab$tle$1...@dont-email.me...
>> On 10/07/2012 04:53 PM, BartC wrote:
>
>>> It's more complicated than that. You have simple arrays like this:
>>>
>>> int A[5][4][3];
>>>
>>> and more dynamic ones like this:
>>>
>>> int ***B,***C;
>>
>> As far as C is concerned, those are pointers; they may end up pointing
>> at arrays, but the relevant boundaries are determined by the arrays that
>> they point at, not by these pointers themselves.
>>
>>> Which might be set up to have dimensions [7][2][4], and [6][6][6].
>>
>> As pointers, not arrays, they can't have any of those dimensions.
>
> How else would you create a 3D array from dimensions known at runtime?

Using malloc(), calloc(), or VLAs. Pointers returned by malloc() or
calloc(), or derived from a VLA, would have a range set based upon the
size passed to malloc(), the sizes passed to calloc(), or the dimensions
of the VLA, respectively. For this purpose, realloc() is treated the same
as malloc().
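A sketch of the VLA route for BartC's [7][2][4] case, assuming a C99 compiler (VLAs became optional in C11): a pointer to a variably modified type lets a single `malloc()` call hold the whole 3D array, so `a[i][j][k]` works as for a fixed-size array and the allocation itself supplies the bounds. The function name `fill_and_sum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate a d0 x d1 x d2 int array in one block, fill it like BartC's
   example, store the sum of all elements in *sum, and free it.
   Returns 0 on success, -1 if malloc() fails. */
static int fill_and_sum(size_t d0, size_t d1, size_t d2, long *sum)
{
    assert(d0 > 0 && d1 > 0 && d2 > 0);   /* VLA sizes must be positive */
    int (*a)[d1][d2] = malloc(d0 * sizeof *a);
    if (a == NULL)
        return -1;
    long total = 0;
    for (size_t i = 0; i < d0; ++i)
        for (size_t j = 0; j < d1; ++j)
            for (size_t k = 0; k < d2; ++k) {
                a[i][j][k] = (int)(i * 10000 + j * 100 + k);
                total += a[i][j][k];
            }
    free(a);   /* one allocation, one free */
    *sum = total;
    return 0;
}
```

Unlike the pointer-to-pointer version, there are no per-row allocations to check or free, and every row is part of one contiguous object.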
--
James Kuyper

Keith Thompson

Oct 8, 2012, 3:33:56 AM

No, on the C language level it really is exactly that simple.

> You have simple arrays like this:
>
> int A[5][4][3];

Right.

> and more dynamic ones like this:
>
> int ***B,***C;

Those aren't arrays, they're pointers.

> Which might be set up to have dimensions [7][2][4], and [6][6][6].

You can use multi-level pointers like that to create data structures
that behave like dynamic multi-dimensional arrays. To do so, you
have to explicitly allocate memory for each row, and for each row
of pointers to rows, and so on. Each allocation (presumably a call
to malloc()) would have to create a properly initialized fat pointer.

[snip]

The current rules of the language permit an implementation to
make all pointers fat, with information propagating through object
definitions, allocations, assignments, and so forth, so that bounds
checks can be made to fail in circumstances where the behavior
is undefined.

BartC

Oct 8, 2012, 6:13:00 AM

"Keith Thompson" <ks...@mib.org> wrote in message
news:lnipalm...@nuthaus.mib.org...

> "BartC" <b...@freeuk.com> writes:

>> and more dynamic ones like this:
>>
>> int ***B,***C;
>
> Those aren't arrays, they're pointers.
>
>> Which might be set up to have dimensions [7][2][4], and [6][6][6].
>
> You can use multi-level pointers like that to create data structures
> that behave like dynamic multi-dimensional arrays. To do so, you
> have to explicitly allocate memory for each row, and for each row
> of pointers to rows, and so on. Each allocation (presumably a call
> to malloc()) would have to create a properly initialized fat pointer.

Which according to:

"James Kuyper" <james...@verizon.net> wrote in message
news:k4t503$s2b$1...@dont-email.me...

> The fat pointer would have to contain three pieces of information: the
> location it currently points at, the lowest location in memory that can
> be reached by pointer subtraction with defined behavior, and the highest
> location in memory that can be reached by pointer addition with defined
> behavior.

might look like this:

int *p=malloc(100*sizeof(int));

p might contain (A, A, A+400), since malloc() knows nothing about what kind
of objects are to be stored in the block. (A is the address of some heap
memory, and A+400 is the highest value p can have, but cannot be
dereferenced). (All offsets in fat pointers are char offsets!)

While (p+5) might be (A+20,A,A+400). And in:

int B[10];
int (*q)[10]=&B;

q might contain (B,B,B+40). A lone 'B' term would decay to (B,B,B+40) as
well, as would a '&B' term (but with different types).

In the case of my int[7][2][4] dynamic array, the top level pointer could
be (C,C,C+84), and each of the next tier could be (D,D,D+24), located at
(C+i*12,C,C+84) (with pointers being 12 bytes). A pointer to any actual
element would have to be (X+offset,X,X+16).

A pointer to any location in my int[5][4][3] static array would be
(E+offset,E,E+240), slightly different (the pointer could be stepped
anywhere in the entire array, not just in that row).

And a pointer to an isolated int value would be just (F,F,F+4).

> The current rules of the language permit an implementation to
> make all pointers fat, with information propagating through object
> definitions, allocations, assignments, and so forth, so that bounds
> checks can be made to fail in circumstances where the behavior
> is undefined.

OK, so it's just about workable. But I can see some issues:

o It seems it needs to be all or nothing; *all* pointers in an
implementation must be fat, including those you have no intention of doing
any arithmetic on.

o It doesn't really address the issue of array bounds: it only stops a
pointer from wandering outside an allocated block, which may contain
multiple arrays or array rows. That applies even to the static [5][4][3]
array; only in static 1D arrays are the bounds protected. So, under the
scheme I outlined above, they can't be used for subscript checking as
proposed by Richard Damon.

o In fact they don't seem designed for programmer use at all, only for
internal protection, which means:

o Can't be used to extract the length of an array (pointer) passed to a
function
o Can't be used to construct a slice or sub-range of a larger array or
block

o Doubtless there are miscellaneous language issues to be sorted (converting
to and from an int for example; bounds info will be lost)

--
Bartc

James Kuyper

Oct 8, 2012, 7:40:43 AM

On 10/08/2012 06:13 AM, BartC wrote:
...

> OK, so it's just about workable. But I can see some issues:
>
> o It seems it needs to be all or nothing; *all* pointers in an
> implementation must be fat, including those you have no intention of doing
> any arithmetic on.

That's probably true, but if you don't intend to do any arithmetic on a
given pointer, and the compiler can figure out your intentions, it could
put the "fat" pointers on a diet, as an optimization.

> o It doesn't really address the issue of array bounds: it only stops a
> pointer from wandering outside an allocated block, but which contains
> multiple arrays or array rows. Not even in the static [5][4][3] array. Only
> in static 1D arrays are the bounds protected. So, under the scheme I
> outlined above, they can't be used for subscript checking as proposed by
> Richard Damon.

I don't see how you reached the conclusion that "... only in static 1D
arrays are the bounds protected.". Array bounds would apply to automatic
and allocated arrays, too. The clause in the standard which currently
specifies undefined behavior for violating the bounds of an array talks
only about a 1D array - its applicability to multi-dimensional arrays
is derived solely from the fact that what C calls a multi-dimensional
array is just a 1D array whose element type is also a 1D array,
recursively all the way down to the point where the element type is no
longer an array. Therefore, protection of 1D arrays is sufficient to
protect multi-dimensional arrays.
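The array-of-arrays point can be checked with sizeof alone. A short sketch, assuming a file-scope `int grid[5][4][3]` and a hypothetical helper `grid_dims`: every level of the array is itself a 1D array with its own element count.

```c
#include <stddef.h>

/* grid is an array of 5 elements of type int[4][3]; each of those is
   an array of 4 elements of type int[3]. */
static int grid[5][4][3];

/* Recover each dimension with sizeof. This only works on the array
   itself; once grid decays to a pointer the information is gone. */
void grid_dims(size_t dims[3])
{
    dims[0] = sizeof grid / sizeof grid[0];             /* 5 */
    dims[1] = sizeof grid[0] / sizeof grid[0][0];       /* 4 */
    dims[2] = sizeof grid[0][0] / sizeof grid[0][0][0]; /* 3 */
}
```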

The comment that started this sub-thread merely suggests that violation
of array bounds should trap, which is no better than the undefined
behavior allowed by the current standard. I've suggested that this
should be changed, for instance to the raising of a standard-defined
new signal.

Can you provide an example of code which, according to the current
standard, contains an array bounds violation, where a fat-pointer
implementation would not enable the raising of such a signal?

Here's an example of a fully dynamic 2-d array, where fat pointers would
have no trouble raising such a signal when the array bounds are violated:

int **ragarray = malloc(rows * sizeof *ragarray);
size_t *cols = malloc(rows * sizeof *cols);
if(ragarray && cols)
{
    // TBD: Fill in cols

    // The bounds for ragarray are ragarray and ragarray+rows
    // (that is, rows*sizeof *ragarray bytes past ragarray).
    size_t row;
    int errors = 0;

    for(row=0; row<rows; row++)
    {
        ragarray[row] = malloc(cols[row] * sizeof *ragarray[row]);
        if(ragarray[row] == NULL)
        {
            errors = 1; // error handling TBD
        }
        else
        {
            // The bounds for ragarray[row] are ragarray[row] and
            // ragarray[row] + cols[row].
        }
    }
    if(!errors)
    {
        // Each of the following expressions would violate an array
        // bound and therefore result in raising the specified
        // signal. (Under the current standard, each one has
        // undefined behavior.)
        ragarray-1;               // Violates lower bound on ragarray
        ragarray+(rows+1);        // Violates upper bound on ragarray
        ragarray[rows] = NULL;    // Violates upper bound on ragarray
        row = rows/2;
        ragarray[row]-1;          // Violates lower bound on ragarray[row]

        ragarray[row]+cols[row]+1;
        // violates upper bound on ragarray[row]

        ragarray[row][cols[row]] = 5;
        // violates upper bound on ragarray[row]
    }
}
// Free all of the allocated memory.
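Without fat pointers, the same checks can be done by hand if the ragged array carries its own bounds. A sketch under that assumption: `ragged`, `ragged_alloc`, `ragged_at` and `ragged_free` are hypothetical names, and `ragged_at` returns NULL where a checking implementation would raise the proposed signal.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ragged array that carries its own bounds. */
typedef struct {
    int **data;    /* one pointer per row */
    size_t *cols;  /* length of each row */
    size_t rows;   /* number of rows */
} ragged;

void ragged_free(ragged *r)
{
    size_t i;
    if (r->data != NULL)
        for (i = 0; i < r->rows; i++)
            free(r->data[i]);
    free(r->data);
    free(r->cols);
    r->data = NULL;
    r->cols = NULL;
    r->rows = 0;
}

/* Row i gets cols[i] ints. Returns 1 on success; on failure frees
   whatever was allocated and returns 0. */
int ragged_alloc(ragged *r, size_t rows, const size_t *cols)
{
    size_t i;
    r->rows = 0;                       /* no rows to free yet */
    r->data = malloc(rows * sizeof *r->data);
    r->cols = malloc(rows * sizeof *r->cols);
    if ((r->data == NULL || r->cols == NULL) && rows != 0) {
        ragged_free(r);
        return 0;
    }
    for (i = 0; i < rows; i++)
        r->data[i] = NULL;             /* makes partial cleanup safe */
    r->rows = rows;
    for (i = 0; i < rows; i++) {
        r->cols[i] = cols[i];
        r->data[i] = malloc(cols[i] * sizeof *r->data[i]);
        if (r->data[i] == NULL && cols[i] != 0) {
            ragged_free(r);
            return 0;
        }
    }
    return 1;
}

/* Pointer to element [row][col], or NULL if either subscript is out
   of range for that particular row. */
int *ragged_at(const ragged *r, size_t row, size_t col)
{
    if (row >= r->rows || col >= r->cols[row])
        return NULL;
    return &r->data[row][col];
}
```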

> o In fact they don't seem designed for programmer use at all, only for
> internal protection, which means:
>
> o Can't be used to extract the length of an array (pointer) passed to a
> function
> o Can't be used to construct a slice or sub-range of a larger array or
> block

Fat pointers would enable such features; they just weren't part of the
suggestion.

> o Doubtless there are miscellaneous language issues to be sorted (converting
> to and from an int for example; bounds info will be lost)

Not necessarily - it just requires a larger int size. Of course, the
larger int size required might not be supported, but the same is true of
the smaller int size required by skinny pointers - that's why
*[u]intptr_t and [U]INTPTR_* are optional.
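The integer-conversion point in code form: a sketch that round-trips an ordinary pointer through `uintptr_t`, guarded by `UINTPTR_MAX` because the type is optional. The function name `roundtrip_ok` is hypothetical. A fat-pointer implementation would need a `uintptr_t` wide enough to hold the bounds too, or could simply not provide the type.

```c
#include <stdint.h>

/* Returns 1 if the pointer survives a trip through uintptr_t,
   -1 if this implementation has no uintptr_t at all. */
int roundtrip_ok(int *p)
{
#ifdef UINTPTR_MAX
    /* The guarantee is for void *, so convert through it explicitly. */
    uintptr_t bits = (uintptr_t)(void *)p;
    int *back = (int *)(void *)bits;
    return back == p;
#else
    (void)p;
    return -1;
#endif
}
```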
--
James Kuyper

BartC

Oct 8, 2012, 8:13:05 AM

"James Kuyper" <james...@verizon.net> wrote in message

news:k4ue3s$3te$1...@dont-email.me...

> On 10/08/2012 06:13 AM, BartC wrote:

>> o It doesn't really address the issue of array bounds: it only stops a
>> pointer from wandering outside an allocated block, but which contains
>> multiple arrays or array rows. Not even in the static [5][4][3] array.
>> Only
>> in static 1D arrays are the bounds protected.

> I don't see how you reached the conclusion that "... only in static 1D
> arrays are the bounds protected.". Array bounds would apply to automatic
> and allocated arrays, too.

OK: my comment wasn't quite right: protection also extends to 1D allocated
arrays which *exclusively* inhabit an allocated block. Such as in
(presumably) your example code, and in my earlier post. (And by "static" I
simply meant not dynamic.)

This applies also to multi-dimensional allocated arrays built on top of such
1D arrays. So protection is better than I expected.

But what I had in mind were:

o Static (ie non-dynamic) N-dimensional arrays (unless the compiler adjusts
pointers so that they are constrained solely within the enclosing row)

o N-dimensional arrays allocated as one solid block (so a pointer could
cross from one row to another)

o Allocated blocks with mixed use (structs containing arrays, or containing
more than one array, or the array contains other data before and/or after,
or an array has been over-allocated but currently has smaller bounds, etc
etc)

(But what I also had in mind was being able to extract array dimensions from
the pointer, which runs into the problems with fat pointers that are
concerned only with the lower and upper limits of some allocation.)

> The comment that started this sub-thread merely suggests that violation
> of array bounds should trap, which is no better than the undefined
> behavior allowed by the current standard. I've suggested that this
> should be changing, for instance to the raising of a standard-defined
> new signal.
>
> Can you provide an example of code which, according to the current
> standard, contains a array bounds violation, where a fat-pointer
> implementation would not enable the raising of such a signal?

typedef struct {
int a[4];
int x,y,z;
} S;

S *p = malloc(sizeof(S));

p->a[4]=8;

--
Bartc

James Kuyper

Oct 8, 2012, 10:08:07 AM

On 10/08/2012 08:13 AM, BartC wrote:
> "James Kuyper" <james...@verizon.net> wrote in message
> news:k4ue3s$3te$1...@dont-email.me...
>> On 10/08/2012 06:13 AM, BartC wrote:

...

>>> Only
>>> in static 1D arrays are the bounds protected.
>
>> I don't see how you reached the conclusion that "... only in static 1D
>> arrays are the bounds protected.". Array bounds would apply to automatic
>> and allocated arrays, too.
>
> OK: my comment wasn't quite right: protection also extends to 1D allocated
> arrays which *exclusively* inhabit an allocated block. Such as in
> (presumably) your example code, and in my earlier post. (And by "static" I
> simply meant not dynamic.)

The standard defines four meanings for 'static'. I recommend reducing
the potential for confusion by using the word 'static' only when one of
those meanings applies. The standard doesn't define dynamic, and uses
the term primarily in Annex F, when referring to IEC 60559 dynamic
rounding, which doesn't seem relevant here.

> This applies also to multi-dimensional allocated arrays built on top of such
> 1D arrays. So protection is better than I expected.
>
> But what I had in mind were:
>
> o Static (ie non-dynamic) N-dimensional arrays (unless the compiler adjusts
> pointers so that they are constrained solely within the enclosing row)

Could you please convert your use of "static" or "non-dynamic" into
standard C terminology? I can't come up with a plausible meaning that
makes your statements here both true and relevant.

> o N-dimensional arrays allocated as one solid block (so a pointer could
> cross from one row to another)

Since moving such a pointer from one row to another currently has well
defined behavior, that doesn't constitute an array bounds violation as
far as the standard is concerned. I did propose a syntax, using a cast,
that could be defined as imposing an arbitrary array length on a
pointer, that was smaller than the actual array length. The use of that
construct to give the resulting pointer a longer length should, itself,
cause the specified signal to be raise()d, since it would constitute
disabling a safety.

> o Allocated blocks with mixed use (structs containing arrays, or containing
> more than one array, or the array contains other data before and/or after,
> or an array has been over-allocated but currently has smaller bounds, etc
> etc)

Those don't present any problems for the use of fat pointers to prevent
array bounds violations.

> (But what I also had in mind was being able to extract array dimensions from
> the pointer, which is subject to the problems with fat pointers that are
> concerned only lower and upper limits of some allocation.)

Well, yes. You need a pointer to an entire array to get the bounds on
that array. You could propose even fatter pointers, ones that contain a
pointer to an array-description-block of some kind.
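A sketch of such an array-description block (often called a dope vector), assuming a row-major block of ints and at most four dimensions; `arraydesc`, `desc_at` and `desc_extent` are hypothetical names. Because every subscript is checked against its own dimension, it also catches a step from one row into the next inside a single allocated block, which a plain (current, low, high) pointer cannot.

```c
#include <stddef.h>

#define MAX_RANK 4  /* arbitrary limit for this sketch */

/* Hypothetical descriptor: element base pointer plus every dimension. */
typedef struct {
    int *base;
    size_t rank;
    size_t dims[MAX_RANK];
} arraydesc;

/* Pointer to the element with the given subscripts, or NULL if any
   subscript is out of range for its own dimension. The layout is
   row-major, the same as C uses for int[d0][d1][d2]. */
int *desc_at(const arraydesc *d, const size_t *idx)
{
    size_t k, offset = 0;
    for (k = 0; k < d->rank; k++) {
        if (idx[k] >= d->dims[k])
            return NULL;
        offset = offset * d->dims[k] + idx[k];
    }
    return d->base + offset;
}

/* Extent of dimension k, or 0 if there is no such dimension. */
size_t desc_extent(const arraydesc *d, size_t k)
{
    return k < d->rank ? d->dims[k] : 0;
}
```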

>> The comment that started this sub-thread merely suggests that violation
>> of array bounds should trap, which is no better than the undefined
>> behavior allowed by the current standard. I've suggested that this
>> should be changed, for instance to the raising of a standard-defined
>> new signal.
>>
>> Can you provide an example of code which, according to the current
>> standard, contains a array bounds violation, where a fat-pointer
>> implementation would not enable the raising of such a signal?
>
> typedef struct {
> int a[4];
> int x,y,z;
> } S;
>
> S *p = malloc(sizeof(S));
>
> p->a[4]=8;

p->a is an lvalue of array type; in this context, as in most others, it
is automatically converted into a pointer to the first element of that
array. At the time when that conversion happens, the resulting fat
pointer would be given bounds of p->a and p->a+4. Except when preceded
by &, the expression p->a[4] violates that upper bound, and would
therefore cause the appropriate signal to be raised.
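The narrowing described above, done by hand: when `p->a` decays, the resulting bounded view gets the member's 4 elements, not the size of the whole malloc'd struct. `intspan`, `span_of_member` and `span_store` are hypothetical names; a checking implementation would do this implicitly and raise a signal instead of returning 0.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int a[4];
    int x, y, z;
} S;

/* Hypothetical bounded view: element pointer plus element count. */
typedef struct {
    int *base;
    size_t len;
} intspan;

/* Bounds come from the member array, not from the allocation. */
intspan span_of_member(S *p)
{
    intspan s;
    s.base = p->a;
    s.len = sizeof p->a / sizeof p->a[0];
    return s;
}

/* Checked store: returns 0 instead of writing out of bounds. */
int span_store(intspan s, size_t i, int value)
{
    if (i >= s.len)
        return 0;
    s.base[i] = value;
    return 1;
}
```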

Since the subscript is a constant, and the declaration of S is
necessarily in scope where this expression is used, the compiler can
simply optimize "p->a[4]=8" at compile time into a direct call to
raise(). Only if the subscript were a variable would the compiler
actually need to construct a fat pointer. However, that optimization is
simply a side issue.

Why did you think that example would be problematic? Did you expect p->a
to inherit the same array bounds as p itself? That wouldn't be right;
and it's trivial to get it right.
--
James Kuyper

BartC

Oct 8, 2012, 11:20:45 AM

"James Kuyper" <james...@verizon.net> wrote in message

news:k4umo9$n8s$1...@dont-email.me...

> On 10/08/2012 08:13 AM, BartC wrote:
>> "James Kuyper" <james...@verizon.net> wrote in message

>>> Can you provide an example of code which, according to the current
>>> standard, contains a array bounds violation, where a fat-pointer
>>> implementation would not enable the raising of such a signal?
>>
>> typedef struct {
>> int a[4];
>> int x,y,z;
>> } S;
>>
>> S *p = malloc(sizeof(S));
>>
>> p->a[4]=8;

> Why did you think that example would be problematic? Did you expect p->a
> to inherit the same array bounds as p itself? That wouldn't be right;
> and it's trivial to get it right.

Well, yes. I got the impression a fat pointer described the extents of an
allocated block and that was it. Now maybe the compiler can create a
modified pointer when it knows it's going to point to an array and when it
knows the bounds (where they stop short of the size of the block).

But this was a bad example on my part because it uses a fixed array size; it
can check bounds without using any fat pointers!

I can probably create a more contrived example, but I concede that fat
pointers might be workable, in that they can trap the majority of array
indexing errors when arrays are created and used sensibly (which probably
isn't the case with a lot of my code...).

It also sounds like quite a bit of extra work behind the scenes might be
needed, beyond just creating the fat pointer to start with and doing a
per-access check of the bounds.

--
Bartc

James Kuyper

Oct 8, 2012, 12:45:02 PM

On 10/08/2012 11:20 AM, BartC wrote:
> "James Kuyper" <james...@verizon.net> wrote in message
> news:k4umo9$n8s$1...@dont-email.me...
>> On 10/08/2012 08:13 AM, BartC wrote:
>>> "James Kuyper" <james...@verizon.net> wrote in message
>
>>>> Can you provide an example of code which, according to the current
>>>> standard, contains a array bounds violation, where a fat-pointer
>>>> implementation would not enable the raising of such a signal?
>>>
>>> typedef struct {
>>> int a[4];
>>> int x,y,z;
>>> } S;
>>>
>>> S *p = malloc(sizeof(S));
>>>
>>> p->a[4]=8;
>
>> Why did you think that example would be problematic? Did you expect p->a
>> to inherit the same array bounds as p itself? That wouldn't be right;
>> and it's trivial to get it right.
>
> Well, yes. I got the impression a fat pointer described the extents of an
> allocated block and that was it.

No - the whole point of the fat pointer was to catch array bounds
violations, so what it describes is the limits on access to the array,
not the limits on the allocated block.

> I can probably create a more contrived example, but I concede that fat
> pointers, might be workable in that they can trap the majority of array
> indexing errors, when arrays are created and used sensibly (which probably
> isn't the case with a lot of my code...).
>
> It also sounds like quite a bit of extra work behind the scenes might be
> needed, beyond just creating the fat pointer to start with, and doing a
> per-access check of the bounds

Oh yes - the message which first mentioned fat pointers also mentioned
the significant amount of overhead costs associated with such pointers,
as the reason why the C standard does NOT require that the behavior be
well-defined.

lipska the kat

Oct 8, 2012, 2:02:09 PM

On 03/10/12 19:13, lipska the kat wrote:
> Hi

[snip]

I've been reading this thread with increasing alarm ... there seems to
be no real way to be sure that one's code complies with ... what
exactly. According to Wikipedia the 'C99 spec' is not available to
download for free. I think what I need (for 'C99') is ISO/IEC 9899:1999.
This seems to be no longer available. I managed to find something called
Committee Draft — May 6, 2005 ISO/IEC 9899:TC2 but this is dated 2005
so I'm sure it's way out of date. If I go to iso.org and search for
ISO/IEC 9899 I get a bunch of results one of which is ISO/IEC 9899:2011
which costs CHF 230+ which seems a bit steep to me.
The latest 'free' version appears to be
Committee Draft — Septermber(sic) 7, 2007 ISO/IEC 9899:TC3.

Would this version be suitable for me to start with, or have significant
changes been made since 2007?

Many thanks

James Kuyper

Oct 8, 2012, 2:29:08 PM

On 10/08/2012 02:02 PM, lipska the kat wrote:
> On 03/10/12 19:13, lipska the kat wrote:
>> Hi
>
> [snip]
>
> I've been reading this thread with increasing alarm ... there seems to
> be no real way to be sure that one's code complies with ... what
> exactly. According to wikipedia the 'C99 spec' is not available to
> download for free.

...

> The latest 'free' version appears to be
> Committee Draft — Septermber(sic) 7, 2007 ISO/IEC 9899:TC3.
>
> Would this version be suitable for me to start with or have significant
> changes been made since 2007.

The official C99 documents were the original standard itself, plus three
Technical Corrigenda (TC) that describe various modifications to the
standard. The document you're referring to is n1256.pdf, which shows what
C99 would look like after applying TC1, TC2, and TC3. It's technically
not as official as those other documents, but it's free, far more
convenient to use, and has only two known defects: the typo in the date
that you mention, and the fact that "The predefined macro
__STDC_MB_MIGHT_NEQ_WC__ should be in 6.10.8p2 (conditional macros)
rather than p1 (required macros)."
I strongly recommend using n1256.pdf INSTEAD OF the official C99 standard.

I gather that n1570.pdf is essentially identical to the current
standard, C2011.

lipska the kat

Oct 8, 2012, 3:15:27 PM

On 08/10/12 19:29, James Kuyper wrote:
> On 10/08/2012 02:02 PM, lipska the kat wrote:
>> On 03/10/12 19:13, lipska the kat wrote:
>>> Hi
>>
>> [snip]
>>
>> I've been reading this thread with increasing alarm ...

[snip]

> I strongly recommend using n1256.pdf INSTEAD OF the official C99 standard.

OK, I'll go with that, thanks

Keith Thompson

Oct 8, 2012, 3:29:08 PM

"BartC" <b...@freeuk.com> writes:
[...]

> "James Kuyper" <james...@verizon.net> wrote in message
> news:k4t503$s2b$1...@dont-email.me...
>
>> The fat pointer would have to contain three pieces of information: the
>> location it currently points at, the lowest location in memory that can
>> be reached by pointer subtraction with defined behavior, and the highest
>> location in memory that can reached by pointer addition with defined
>> behavior.
>
> might look like this:
>
> int *p=malloc(100*sizeof(int));
>
> p might contain (A, A, A+400), since malloc() knows nothing about what kind
> of objects are to be stored in the block. (A is the address of some heap
> memory, and A+400 is the highest value p can have, but cannot be
> dereferenced). (All offsets in fat pointers are char offsets!)

That doesn't have to be the case. There are a number of ways the
offset information in a fat pointer could be stored. It could be
stored as two additional pointers, to the beginning and (one past)
the end of the relevant array object, or as byte offsets (which
might save space if the offset can be stored more compactly than a
pointer), or as *object* offsets, dependent on the pointed-to type.
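The *object offset* variant can be sketched like this, assuming the pointed-to type is int; `intfat`, `intfat_add` and `intfat_get` are hypothetical names. Here the bounds are an index and a count in units of the pointed-to type rather than two extra addresses.

```c
#include <stddef.h>

/* Hypothetical fat pointer with bounds held as object offsets. */
typedef struct {
    int *base;      /* start of the array object */
    size_t index;   /* current position, 0..count */
    size_t count;   /* number of elements in the array object */
} intfat;

/* Pointer addition: the index may reach count (one past the end)
   but no further, and may not drop below 0. Returns 0 on violation. */
int intfat_add(intfat *f, ptrdiff_t n)
{
    if (n < 0) {
        size_t back = (size_t)-(n + 1) + 1;  /* |n| without overflow */
        if (back > f->index)
            return 0;
        f->index -= back;
    } else {
        if ((size_t)n > f->count - f->index)
            return 0;
        f->index += (size_t)n;
    }
    return 1;
}

/* Dereference check: the index must name an actual element. */
int *intfat_get(const intfat *f)
{
    return f->index < f->count ? f->base + f->index : NULL;
}
```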

In a previous life, I worked on an Ada compiler that had to deal with
this, since Ada requires array bounds checking. The implementation
I worked on didn't use fat pointers; instead, the bounds information
was associated with the array *type*. For an array whose bounds
are compile-time constants, the bounds were not stored at run time;
the compiler would just generate code to check against the constant
values. If the bounds were run-time values, bound checking code
would refer to implicitly created variables associated with the
array type. (Unlike C, Ada's array indexing is not defined in terms
of pointer arithmetic; I don't know whether this scheme would work
for a hypothetical C implementation.)

One interesting thing: A lot of bound checks could be eliminated
during optimization. For example, if you wrote the Ada equivalent
of this C code:

for (int i = 0; i < some_value; i ++) {
arr[i] = 42;
}

the bounds check could be done just once at the top of the loop.
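The loop case, sketched in C (the function names are hypothetical). The first version tests the index on every pass; the second relies on the loop running from 0 to some_value-1, so one comparison before the loop covers every iteration.

```c
#include <stddef.h>

/* Per-iteration check. Returns how many elements were stored
   before the first out-of-range index (or all of them). */
size_t fill_checked_each(int *arr, size_t len, size_t some_value)
{
    size_t i;
    for (i = 0; i < some_value; i++) {
        if (i >= len)          /* tested on every iteration */
            return i;
        arr[i] = 42;
    }
    return i;
}

/* Hoisted check: the largest index used is some_value-1, so a single
   test before the loop is enough. Stores nothing if it fails. */
size_t fill_checked_once(int *arr, size_t len, size_t some_value)
{
    size_t i;
    if (some_value > len)      /* one test, before the loop */
        return 0;
    for (i = 0; i < some_value; i++)
        arr[i] = 42;
    return some_value;
}
```

The two are not quite equivalent when the check fails: the first has already stored the in-range elements, the second stores nothing. An optimizer that hoists a check has to preserve whatever the language says must happen up to the failing access.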

It might be interesting to design a C-like language with *mandatory*
full bounds checking. I wonder how much of C's current semantics would
have to be sacrificed. (I don't intend to design such a language, and
certainly not in this newsgroup.)

[...]

Keith Thompson

Oct 8, 2012, 5:04:56 PM

They're available here:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

The official C99 standard (not including the three TCs) is available for
$30 US from ANSI:

http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2fISO%2fIEC+9899-1999+(R2005)#.UHM-uE1kiSo

and the individual TCs (3 for C99, 1 for C11) are available at no charge
from ANSI. I think C11 is also available from ANSI, but I can't find it
at the moment.

But as James says, N1256 is *better* for most purposes than the official
C99 standard.

(The first Technical Corrigendum to C11 fixes an error in the
definitions of the __STDC_VERSION__ and __STDC_LIB_EXT1__ macros, an
error that exists in both the N1570 draft and the released standard.)
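A small sketch for checking, at compile time, which edition a compiler claims to follow and whether the optional Annex K bounds-checking library is present. The function names are hypothetical; the macro values are those of the published standards (199409L for C90 plus Amendment 1, 199901L for C99, 201112L for C11).

```c
/* Which edition does the compiler claim? C90 compilers need not
   define __STDC_VERSION__ at all. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C90 (or pre-standard)";
#elif __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 with Amendment 1";
#endif
}

/* __STDC_LIB_EXT1__ is defined only if the optional Annex K
   bounds-checking interfaces are provided. */
int has_annex_k(void)
{
#ifdef __STDC_LIB_EXT1__
    return 1;
#else
    return 0;
#endif
}
```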

BartC

Oct 8, 2012, 5:28:17 PM

"Keith Thompson" <ks...@mib.org> wrote in message

news:lna9vwn...@nuthaus.mib.org...

> "BartC" <b...@freeuk.com> writes:

> In a previous life, I worked on an Ada compiler that had to deal with
> this, since Ada requires array bounds checking. The implementation
> I worked on didn't use fat pointers; instead, the bounds information
> was associated with the array *type*.

I don't know Ada but I'd imagine that there must be an array type whose
dimension is different for each instance of the type (a 'flex' bound). In
that case bounds checking can be nearly as easy as it is for fixed bounds!

>For an array whose bounds
> are compile-time constants, the bounds were not stored at run time;
> the compiler would just generate code to check against the constant
> values. If the bounds were run-time values, bound checking code
> would refer to implicitly created variables associated with the
> array type.

Yes, the bounds must be stored in or near the array. Exactly how is the
compiler's problem.

> It might be interesting to design a C-like language with *mandatory*
> full bounds checking. I wonder how much of C's current semantic would
> have to be sacrificed. (I don't intend to design such a language, and
> certainly not in this newsgroup.)

(I don't blame you! I'm currently working on two languages, one used to
implement the other. The higher level one has bounds checking and everything
else thrown in.

The lower-level, C-like one, doesn't have bounds checking. But what is more
interesting than bounds checking, is also having bounds of arrays and
strings available to the programmer, for passing implicitly between
functions, and generally making life easier. I had an idea for a scheme that
is simpler than the fat pointers that have been discussed, and more
efficient. If I ever get round to making it work I'll post some details as
it could be applied to C too.)

--
Bartc

Keith Thompson

Oct 8, 2012, 6:47:59 PM

"BartC" <b...@freeuk.com> writes:
> "Keith Thompson" <ks...@mib.org> wrote in message
> news:lna9vwn...@nuthaus.mib.org...
>> "BartC" <b...@freeuk.com> writes:
>
>> In a previous life, I worked on an Ada compiler that had to deal with
>> this, since Ada requires array bounds checking. The implementation
>> I worked on didn't use fat pointers; instead, the bounds information
>> was associated with the array *type*.
>
> I don't know Ada but I'd imagine that there must be an array type whose
> dimension is different for each instance of the type (a 'flex' bound). In
> that case bounds checking can be nearly as easy as it is for fixed bounds!

Now that I think about it, I was imprecise. Bounds are associated with
an array *subtype*. For example, String is an array type with
unspecified bounds:

type String is array(Positive range<>) of Character;

A given String variable has specific bounds, which may be computed at
compile time or run time:

S: String(1..N);

which is equivalent to:

subtype anon is String(1..N);
S: anon;

There's some compatibility between distinct subtypes of the same type;
"S1 := S2;" is legal, and raises an exception if the two subtypes
have incompatible bounds.

It requires the compiler to do a lot of work behind the scenes.
Ada does *not* tend to follow the "Spirit of C" (though you can write
similarly low-level code in either language).

>>For an array whose bounds
>> are compile-time constants, the bounds were not stored at run time;
>> the compiler would just generate code to check against the constant
>> values. If the bounds were run-time values, bound checking code
>> would refer to implicitly created variables associated with the
>> array type.
>
> Yes, the bounds must be stored in or near the array. Exactly how, is the
> compiler's problem.

I'm not sure what you mean by "in or near". The bounds have to be
*available*; they can be either stored or computed (the language
doesn't care as long as S'First, S'Last, and S'Length yield the
correct results). If they're stored, they don't have to be anywhere
near the memory location containing the array object.

And I think this is approaching the limits of any hypothetical relevance
to C.

Joe Pfeiffer

Oct 9, 2012, 12:47:43 AM

lipska the kat <lipska...@yahoo.co.uk> writes:

> On 03/10/12 19:13, lipska the kat wrote:
>> Hi
>
> [snip]
>
> I've been reading this thread with increasing alarm ... there seems to
> be no real way to be sure that one's code complies with ... what
> exactly.

Sorry to say, but "Same as it ever was, same as it ever was, same as it
ever was, same as it ever was"
http://www.youtube.com/watch?v=XI3c-22lL_s

There has never been a language whose semantics have been that fully
defined. There is always undefined behavior -- some of which is
explicitly "undefined", and some of which is just not noticed and so
there just doesn't turn out to be a definition for.

In this newsgroup, you may have fallen in among the most lawyer-minded
group of specification-readers on the net (I'm terrified by the thought
of another newsgroup that's more so). In real life, you code to the
best of your ability, you test to the best of your ability, you ask
questions when you hit something that doesn't make sense, and you go on.

No matter what language you're writing in.
--
"Erwin, have you seen the cat?" -- Mrs. Shroedinger

lipska the kat

Oct 9, 2012, 4:04:15 AM

On 09/10/12 05:47, Joe Pfeiffer wrote:
> lipska the kat<lipska...@yahoo.co.uk> writes:
>
>> On 03/10/12 19:13, lipska the kat wrote:
>>> Hi
>>

[snip]

>

> In this newsgroup, you may have fallen in among the most lawyer-minded
> group of specification-readers on the net (I'm terrified by the thought
> of another newsgroup that's more so). In real life, you code to the
> best of your ability, you test to the best of your ability, you ask
> questions when you hit something that doesn't make sense, and you go on.
>
> No matter what language you're writing in.

Well I'm coming from the relatively 'safe' world of Java serverside
development. I'm used to automatic garbage collection, no pointers to get
me into trouble, compile-time type checking and run-time array bounds checking (although I
rarely used arrays) built in exception handling (although
catch(Throwable t) is a bit of a copout) full OO semantics, huge free
third party software libraries and frameworks, usable generics,
excellent documentation and no real concerns re portability. I never
read the Java language specification because I never had to. I regularly
wrote, tested, re-factored re-tested and packaged components on my Linux
laptop and deployed them to different Linux distros, Windows based
servers, and anything else that hosted a suitable version of the JVM and
they just worked.

Coming from there to here has been a bit of a shock to the system albeit
strangely liberating. I doubt I'll read the whole spec from cover to
cover but given the comments so far regarding 'undefined behavior' it's
comforting to know it's there .. I think, doesn't help with k&r 2nd
edition ex 2.6 tho %-|

BartC

Oct 9, 2012, 5:30:11 AM

"lipska the kat" <lipska...@yahoo.co.uk> wrote in message
news:ye2dnWJ-1aEcR-7N...@bt.com...

> On 09/10/12 05:47, Joe Pfeiffer wrote:

>> No matter what language you're writing in.
>
> Well I'm coming from the relatively 'safe' world of Java serverside
> development.

...

> Coming from there to here has been a bit of a shock to the system albeit
> strangely liberating. I doubt I'll read the whole spec from cover to cover
> but given the comments so far regarding 'undefined behavior' it's
> comforting to know it's there .. I think, doesn't help with k&r 2nd
> edition ex 2.6 tho %-|

You're not comparing like with like. What language is a typical JVM
implementation written in? It's probably not Java! Instead it leaves all the
'dirty work' to a different language and keeps itself clean.

What are Java's primitive data types? I didn't know until five minutes ago
but they seem to be 8, 16, 32, and 64-bit types; very neat and tidy, not
even any unsigned integer types to make life difficult!

C has to do a lot more of the dirty work and is expected to run on more
'interesting' architectures. That doesn't excuse everything, but it might
give an idea why programming can be a bit more challenging.
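To make the contrast concrete, a sketch of what portable C code can and cannot assume about sizes (the function names are hypothetical). Java fixes its integer widths; C only guarantees minimum ranges, and the exact-width types in <stdint.h> are optional.

```c
#include <limits.h>
#include <stdint.h>

/* INTn_MAX is defined exactly when intN_t exists, so this tests for
   Java-like 8/16/32/64-bit signed types. */
int has_exact_width_types(void)
{
#if defined(INT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX) && defined(INT64_MAX)
    return 1;
#else
    return 0;
#endif
}

/* Storage bits in an int, padding bits included. CHAR_BIT is at
   least 8 but may be larger on some architectures. */
unsigned int_storage_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}
```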

--
bartc

lipska the kat

Oct 9, 2012, 6:48:25 AM

On 09/10/12 10:30, BartC wrote:
> "lipska the kat" <lipska...@yahoo.co.uk> wrote in message
> news:ye2dnWJ-1aEcR-7N...@bt.com...
>> On 09/10/12 05:47, Joe Pfeiffer wrote

>> Coming from there to here has been a bit of a shock to the system

[snip]

> You're not comparing like with like.

Of course not, that's the point: it's not a criticism, it's an observation.

> What language is a typical JVM implementation written in?

C I should imagine.

> It's probably not Java! Instead it leaves all
> the 'dirty work' to a different language and keeps itself clean.

Ah yes, it does, and that's the beauty of it, I can get on with
meaningful work and not worry about freeing some chunk of memory when
I'm done with it ... far fewer opportunities to put on the hair shirt
and beat myself with sticks. Horses for courses as they say.

> What are Java's primitive data types? I didn't know until five minutes
> ago but they seem to be 8, 16, 32, and 64-bit types; very neat and tidy,
> not even any unsigned integer types to make life difficult!

I think you are implying that making life difficult is the mark of a
'real' programming language. I stopped using Java 'primitives' with the
advent of autoboxing, I couldn't care less how Java is implemented
really, as long as I can do my thing and earn a living then I'm happy
(but see below).

> C has to do a lot more of the dirty work and is expected to run on more
> 'interesting' architectures. That doesn't excuse everything, but it
> might give an idea why programming can be a bit more challenging.

And that's why I'm here, because I like a challenge. Please don't take
the comparison between Java and C personally, I'm just explaining where
I'm coming from. A while ago I had to write to the hardware/software
interface of some external device. The language specified was Java.
Don't ask me why, I just do what I'm asked and as long as I get my
cheque at the end of the month I'm good. It was while I was doing this
that I decided that if I wanted to do more of this type of work it would
probably be a good idea to learn C ... and that's why I'm here. I got a
real kick out of controlling that thing from my laptop so I decided on a
change of direction away from serverside programming towards more tricky
device control.

No idea if I'll get there but I will probably die trying :-)

David Thompson

Oct 11, 2012, 3:09:52 AM

On Wed, 03 Oct 2012 16:21:37 -0700, Keith Thompson <ks...@mib.org>
wrote:

> fa...@rahul.net (Edward A. Falk) writes:
> [...]
> > Perhaps the C standard has changed since I first read it, but AFAIK,
> > most of the answers so far have been wrong.
> >
> > If you write this in your code:
> >
> > int foo;
> >
> > You've *declared* foo -- that is, described what it is -- but you haven't
> > *defined* it. There's a difference. In this case, no actual space
> > for foo has been allocated yet, and it's known as a "common" symbol.
> > If no module ever actually defines it, the linker will allocate space
> > for it at the end.
> >
> > If you write
> >
> > int bar = 1;
>
> Did you mean to change the name from "foo" to "bar"?
>
> > Now you've defined it. Space and an initial value for it will be
> > included in your module. If more than one module does this, there will
> > be a conflict.
>
> N1370 6.9.2p2: <snip>
> So
>
> int foo;
>
> *can* be a definition, or at least can act as one.
>
It *must* produce a definition in that t.u., as far as C is concerned,
and if it appears in multiple t.u.s linked together, the program formally
has UB. But many object formats and linkers treat "default" initialization
to zero differently -- commonly(!) as "bss" or "udata" or somesuch -- from
initialization to a nonzero value. Whether an explicit = 0 is treated like
other explicit initialization (thus "idata" etc.) or like default zero
initialization (not "idata") may vary.

*WG14* n1370 is not a version of the standard; is that an SC (or
higher) document maybe? I generally use n869 (preadoption) or n1256
(postcorrigenda) as the best and easily available versions of C99. But
this provision hasn't changed since C90 at least; I don't have K&R1 to
check, and the original (Labs) implementation did use a Fortran-common
linker. (I don't think Labs Unix actually supported Fortran, but this
style was then mainstream.) And to nitpick, the "static" is in
typewriter font and the first "tentative definition" is in italic (not
bold) marking it as the defining occurrence of the term.

> > So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
>
> But then two or more translation units within your program can see that
> definition, and you can get a "multiple definition" error.
>
To be exact: multiple t.u.s can #include foo.h, and thus the declaration
in it, producing multiple definitions, which may or may not be diagnosed.

> Instead, add
>
> extern int foo;
>
> to "foo.h", and
>
> int foo = 1;
>
> to "foo.c". Then any translation unit that includes "foo.h" can use the
> object "foo", which is defined in exactly one place in your program.

Yes.

> (You could drop the "extern" in foo.h, making it a tentative definition,
> but adding "extern" is more explicit.)
>
No. Then you're back to multiple definitions = undefined.
Maybe you meant 'extern' is optional on the definition in foo.c?

> (I'm snipping some context in which you make some of these same points.)
>
> [snip]

Keith Thompson

Oct 11, 2012, 4:30:46 PM

David Thompson <dave.th...@verizon.net> writes:
[...]

> *WG14* n1370 is not a version of the standard; is that an SC (or
> higher) document maybe? I generally use n869 (preadoption) or n1256
> (postcorrigenda) as the best and easily available versions of C99. But
> this provision hasn't changed since C90 at least; I don't have K&R1 to
> check, and the original (Labs) implementation did use a Fortran-common
> linker. (I don't think Labs Unix actually supported Fortran, but this
> style was then mainstream.) And to nitpick, the "static" is in
> typewriter font and the first "tentative definition" is in italic (not
> bold) marking it as the defining occurrence of the term.

Sorry, that was a typo; I meant n1570.

n869 is a pre-C99 draft. n1256 is a post-C99 draft which
incorporates the three Technical Corrigenda; it's actually better for
most purposes than the original (non-free) C99 standard. n1570 is
a pre-C11 draft; I'm not aware of any differences between it and
the official C11 standard (other than page headers and so forth).
There's a small Technical Corrigendum to C11 to correct the
definition of __STDC_VERSION__ and one other macro.

[...]

Tim Rentsch

Dec 17, 2012, 2:06:37 PM

Not wrong, but an assignment like p = &bar[2]; allows access to
all elements of bar.

> There is also int (*p2)[6] = &foo[n]; which points to one row of the
> array, and

Similarly, the pointer variable p2 allows access to all of foo.

> int *p3 = &foo[n][m]; which points to an element of foo;

Again, not wrong, but p3 allows access to all of foo[n].

> Note that for p1 set equal to &foo p1++, p1--, p1+1, or p1[1] are all
> invalid operations, as p1 doesn't point into an array, but a single
> object (that happens to be an array). [snip remainder]

And so is treated as an array of length one for pointer arithmetic,
which does leave p1-- and p1[1] as undefined behavior, but p1+1
or p1++ are both well-defined.

Shao Miller

Dec 17, 2012, 5:37:26 PM

On 12/17/2012 14:06, Tim Rentsch wrote:

> Richard Damon <news.x.ri...@xoxy.net> writes:
>>
>> If we have an array: int foo[5][6];
>> there are 3 types that might be used with this.
>>
>> int (*p1)[5][6] = &foo;
>> which is a pointer to the full array. Such a pointer could also be used
>> to point into int bar[4][5][6];
>
> Not wrong, but an assignment like p = &bar[2]; allows access to
> all elements of bar.
>

(Presumably that was 'p1' in that assignment.) Let's see... It should
be the same as:

p1 = &(*(bar + 2));

and so also:

p1 = bar + 2;

Because the pointer originating from the 'bar' expression has pointee
range 'bar[0]' through 'bar[sizeof bar / sizeof *bar]', all
elements are accessible, with the last "pointee" actually being the "one
past" position.

The pointer arithmetic doesn't change the bounds, but additional levels
of indirection could.

- Shao Miller

Edward A. Falk

Dec 18, 2012, 2:14:03 PM

In article <slrn3vfsk6t...@iceland.freeshell.org>,
Ike Naar <i...@iceland.freeshell.org> wrote:
>
>Picking a nit:
>Some header files don't have to be made idempotent, because they
>already are. The foo.h we are talking about is such a file:

True, but including header guards (aka "include guard", apparently) is
best practice. foo.h might not need them *now*, but what if someone
edits it in the future and doesn't notice that there are no header guards?

http://en.wikipedia.org/wiki/Include_guard

Always take a belt-and-suspenders approach. Header guards cost nothing,
and they can save a lot of debugging pain in the future.

Remember this, especially for libraries: Just because *your* build
didn't break doesn't mean someone else's won't. Maybe foo.h winds up in
/usr/include someday, and some poor sod of a developer who doesn't have
root on their own workstation discovers they can't build their project
any more because some admin installed the latest version of the foo-dev
package on their system. That's not a far-fetched hypothetical
either; I've had to deal with it many times in my career.

--
-Ed Falk, fa...@despams.r.us.com
http://thespamdiaries.blogspot.com/

Edward A. Falk

Dec 18, 2012, 2:40:57 PM

In article <mqudnRQNa7MK3_DN...@bt.com>,
lipska the kat <lipska...@yahoo.co.uk> wrote:
>On 04/10/12 05:34, James Kuyper wrote:
>>
>> That's a bad assumption. One of the most common ways in which code with
>> undefined behavior actually behaves is to produce exactly the same
>> result that you incorrectly assume that it's required to produce. That's
>> because your assumptions happen to match decisions made by the
>> implementors of the version of C that you're testing with. Other
>> implementors of C are free to make different decisions, ones that are
>> incompatible with your incorrect assumptions.

Very well said.
>
>Er ... wow, OK, that is a bit of a head****

Get used to it.

From my quotes file:

What undefined means is:

undefined.
Do not rely on the results.
You have gone outside the domain of the function
You have broken the programming model of the C language.
Here there be dragons!
The implementors can do anything they want.
Be careful.
Use a different algorithm.
This is non-portable.
Don't do it.

They have a saying in computer science: "The source code is
the documentation." Don't believe it. The documentation is
the documentation and the source code is someone's best attempt
at meeting the contract specified in the documentation. Over
time, the source code will change, hopefully to better match
the documentation. If you find a case where the source code does
*not* do what the documentation says, file a bug and make a note
to yourself to never use that feature again because now you can't
trust it.

I've been doing this for a very long time. I still read the man
page for functions I use every day, just to confirm that I'm
calling them right and using the return value right.

Stick to the defined behavior in the spec, and you'll write code that
lasts. I run major programs on a daily basis that I wrote a decade ago
for a different operating system on a different architecture, with a
different byte order than I'm using today, and they still work fine.

>Do you mean to say that even if I test my program to destruction and as
>far as I can tell it's 'correct', that is it complies with requirements
>and behaves as expected it could still be incorrect when compiled with
>a different compiler ???

>Surely there is some 'base' implementation of C that is used to test
>compilers or is it a free for all ... to me this implies that there can
>be more than one 'correct' implementation of the C language, or several
>or many Cs in fact. Please remember I am a raw beginner at C although I
>find this whole discussion fascinating.

Stick to standard ansi C and you'll be fine. Avoid any features
from C89, C90, Cetc. unless you *really* need them. And trust
me, you don't.

>Given a program written in C, how does one determine that it is
>'correct' if complying with requirements and returning the same output
>from the same input is not enough.

Heh. Solve that one, and there's a PhD in it for you, at the very
least. Probably a parade too.

Just stick to standard C and read the man page for every function
you call, and you'll be fine.

Later on, you'll learn about tools such as code coverage
tools and memory monitors that will help, but that's relatively
advanced stuff.

Keith Thompson

Dec 18, 2012, 2:53:30 PM

fa...@rahul.net (Edward A. Falk) writes:
[...]

> Stick to standard ansi C and you'll be fine. Avoid any features
> from C89, C90, Cetc. unless you *really* need them. And trust
> me, you don't.

[...]

What do you mean by "standard ansi C"? C89 was the original ANSI
C standard; C90 is the ISO standard that defines exactly the same
language.

Strictly speaking, ANSI has adopted the 2011 ISO C standard, which
supersedes and replaces C99, which supersedes and replaces C89/C90,
so "ANSI C" is now C2011. But that's not what most people mean by
"ANSI C" -- which is why I avoid the term in favor of specifying
the year of the standard I'm referring to.

(Microsoft's C compiler is the only major hosted implementation
I'm aware of that doesn't have reasonably decent support for at
least C99.)

Edward A. Falk

Dec 18, 2012, 2:54:50 PM

In article <kcqdnSQhI_AsRe3N...@posted.internetamerica>,
Gordon Burditt <gordon...@burditt.org> wrote:
>
>It is very easy to write a program in C that deliberately crashes
>(here this means: calls abort()) under conditions which you
>might not test, for example:
> - Crashes only on Sunday.
...
> - Crashes only on Feb. 29.

For the record, every Sun computer crashed on Feb 29 1988 due
to a bug in the clock kernel driver.

There is a documented "phase of the moon" bug:

http://www.catb.org/jargon/html/P/phase-of-the-moon.html

>
>And these situations you *SHOULD* test:
> - Crashes only on an input line of more than 10,000 characters.

Test this thoroughly. Bad actors WILL deliberately try to choke
your inputs like this. I caused a CERT advisory this way. (In
my defense, the guilty code was something I'd copied from another
related project.) I've spent many an hour auditing the source code
of standard system utilities and finding dozens of bugs, in an
attempt to close loopholes that had allowed crackers into our
site.

Noob

Dec 18, 2012, 3:02:00 PM

Edward A. Falk wrote:

> I've been doing this for a very long time. I still read the man
> page for functions I use every day, just to confirm that I'm
> calling them right and using the return value right.

Case in point: memcpy vs memmove.

So many people "get it wrong" that Linus even suggested
memcpy should become an alias for memmove.

Edward A. Falk

Dec 18, 2012, 3:03:21 PM

In article <ln4njjr...@nuthaus.mib.org>,

Keith Thompson <ks...@mib.org> wrote:
>fa...@rahul.net (Edward A. Falk) writes:
>[...]
>> Stick to standard ansi C and you'll be fine. Avoid any features
>> from C89, C90, Cetc. unless you *really* need them. And trust
>> me, you don't.
>[...]
>
>What do you mean by "standard ansi C"?

Ahh, you're right. C89 *was* the original "standard C",
wasn't it?

OK, anyway, I haven't used anything more modern than that because I
haven't needed it.

Greg Martin

Dec 18, 2012, 3:48:11 PM

In reading a number of books by and about crackers the thing that struck
me most is the patience that a top cracker will exhibit. Software
development can be a tedious affair but if you cut a corner they *will*
find it because the folks that excel at this don't seem bothered by
tedium. Once they find it they will tell others since that is part of
the culture. When I check my server logs they are full of probes for old
"exploits" which are easily defeated by keeping your software up to date
but you don't see the guys who discover the exploits because they are
careful, though they might be testing your software over a period of a
year before they get tired of the game ... or find the hole they're seeking.

Edward A. Falk

Dec 19, 2012, 2:01:23 PM

In article <gE4As.34347$LS5....@newsfe10.iad>,

Greg Martin <gr...@softsprocket.com> wrote:
>On 12-12-18 11:54 AM, Edward A. Falk wrote:
>>
>> [Why it's important to validate your inputs, especially looking
>> for buffer overflows]
>>
>
>In reading a number of books by and about crackers the thing that struck
>me most is the patience that a top cracker will exhibit. Software
>development can be a tedious affair but if you cut a corner they *will*
>find it because the folks that excel at this don't seem bothered by
>tedium. Once they find it they will tell others since that is part of
>the culture. When I check my server logs they are full of probes for old
>"exploits" which are easily defeated by keeping your software up to date
>but you don't see the guys who discover the exploits because they are
>careful, though they might be testing your software over a period of a
>year before they get tired of the game ... or find the hole they're seeking.

True story: The particular server I was auditing was vulnerable via
the WUFTP server (Motto: "Providing root access since 1994"). The
script kiddie who broke in left muddy footprints all over the system,
which were remarkably easy to follow. After he got in and had root
access, he spent an hour or so typing DOS commands before he gave up
and went away.

Lessons learned: keep your software up to date. Don't enable any more
services than you absolutely need.

Here's a trick I use sometimes: If I'm writing an app which I suspect
may be executed with root privileges, but which doesn't really need them,
I deliberately drop root privileges in the very first line of code
in my program, just to make sure some bug doesn't get exploited further
down the line.

And writing your own code securely isn't good enough. Years ago, it
was discovered that the X Window System base library (libX) was copying
$DISPLAY to another location without checking its length. This meant
that ANY GUI app that ran as root could be exploited by putting bad
code into $DISPLAY.