I have the following program
distributed over 4 files
/* foo.h */
int foo;
void fooset(int f);
int fooget(void);
void fooinc(void);
/* main.c */
#include <stdio.h>
#include <foo.h>
int main(int argc, char **argv){
    fooset(10);
    printf("foo is %d\n", fooget());
    fooinc();
    printf("foo is now %d\n", fooget());
    return 0;
}
/* fooset.c */
#include <foo.h>
void fooset(int f){
    extern int foo;
    foo = f;
}
/* fooget.c */
#include <foo.h>
int fooget(void){
    extern int foo;
    return foo;
}
I run make on my makefile (I'm a beginner at make, Ant is more my thing) and see a humungous great glob of bytes called foo.h.gch, looks like foo.h has been compiled ... but I've no idea
why it's so huge.
-rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
Anyway, the question is who 'owns' the foo declared in foo.h
Storage is obviously set aside as when I run the program I get the expected output
foo is 10
foo is now 11
I guess this big old lump of bytes has something to do with it.
This is wrong, assuming all three modules are linked into
one program. Each module provides its own definition of the
variable `foo', and those three definitions collide. There
must be one and only one `foo' in the program, not three.
The way to accomplish this is to remove the *definition*
of `foo' from foo.h and replace it with a *declaration*. The
difference is not so hard to understand: It is the difference
between "I am Lipska" and "I know someone named Lipska." The
way you spell "I know someone named" in C is
/* foo.h */
extern int foo; // "I know an int named foo"
[...]
Each of the three modules thus gets an introduction to `foo'.
In exactly one of these modules (it doesn't matter which; just
pick one that makes the most sense) you also put a definition:
/* wherever.c */
#include <foo.h> // "I know an int named foo"
int foo; // "Yes, and here I am!"
[...]
(The defining module doesn't actually *need* the declaration,
since defining the variable also declares it. But it's a good
idea to use the #include anyhow, because if the compiler sees
both the declaration and the definition together it can alert
you if they disagree -- like if you change one to a `long' and
forget to change the other.)
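For illustration, here is a minimal sketch of that arrangement, squeezed into a single translation unit so it can be compiled on its own. In a real project the first part would sit in foo.h and the second part in exactly one .c file. The body of fooinc() is an assumption: it was not shown in the original post and is only inferred from the 10 -> 11 output.

```c
/* Part 1: what foo.h would contain (declarations only). */
extern int foo;        /* "I know an int named foo" */
void fooset(int f);
int fooget(void);
void fooinc(void);

/* Part 2: what one .c file would contain (the single definition). */
int foo;               /* "Yes, and here I am!" */

void fooset(int f)
{
    foo = f;
}

int fooget(void)
{
    return foo;
}

/* Assumed implementation; the original post did not show fooinc.c. */
void fooinc(void)
{
    foo++;
}
```

Because the declaration and the definition are both visible to the compiler here, changing one of them to `long` without the other should produce a diagnostic about conflicting types.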
Incidentally, your use of #include <foo.h> is suspect. The
<> form is for system-provided headers like <stdio.h>, while
programmer-provided header files should use #include "foo.h"
instead. Compilers search for <> and "" inclusions in different
places, and even if the mixup is sometimes harmless it is also
sometimes not so harmless.
> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
> and see a humungous great glob of bytes called foo.h.gch, looks like
> foo.h has been compiled ... but I've no idea
> why it's so huge.
> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
This looks like a "precompiled header," generated as a time-
saving step by the (wait for it...) C++ compiler you're using.
(You may have thought you were writing C, but the available
evidence suggests you've set up your build environment to use
C++ instead. Might want to check your setup ...)
> Anyway, the question is who 'owns' the foo declared in foo.h
If you have exactly one definition (as C requires), you might
say that `foo' is "owned" by the module where that definition
appears. (Or you might not; once the modules are linked together,
all global variables are on an equal footing and might as well be
said to be "owned" by the entire program.)
If you have three colliding definitions -- well, there's no
useful way to answer questions about undefined behavior.
> Storage is obviously set aside as when I run the program I get the
> expected output
> foo is 10
> foo is now 11
As I hope you're beginning to learn, "It worked" does not
imply "It's right." The possible manifestations of undefined
behavior include "It did (or seemed to do) what I expected."
> I guess this big old lump of bytes has something to do with it.
In article <k4i2a7$uh...@dont-email.me>,
Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>On 10/3/2012 2:13 PM, lipska the kat wrote:
>> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
>> and see a humungous great glob of bytes called foo.h.gch, looks like
>> foo.h has been compiled ... but I've no idea
>> why it's so huge.
>> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
> This looks like a "precompiled header," generated as a time-
>saving step by the (wait for it...) C++ compiler you're using.
>(You may have thought you were writing C, but the available
>evidence suggests you've set up your build environment to use
>C++ instead. Might want to check your setup ...)
gcc creates those files in C mode too, when you run gcc -c foo.h
>> Anyway, the question is who 'owns' the foo declared in foo.h
> If you have exactly one definition (as C requires), you might
>say that `foo' is "owned" by the module where that definition
>appears. (Or you might not; once the modules are linked together,
>all global variables are on an equal footing and might as well be
>said to be "owned" by the entire program.)
> If you have three colliding definitions -- well, there's no
>useful way to answer questions about undefined behavior.
Oh please. It's not unuseful to explain what actually happened. gcc made a
"common" symbol in each object file, and the linker merged them. This
behavior may not be standardized but it's not hard to explain, and after
you've explained it you can add that there are ways to change it:
compile with -fno-common and the common symbol will be changed to a normal
symbol, and the linker will fail when it sees multiple normal symbols with
the same name. This way your program won't link until you've fixed it to obey
the "one owner" rule.
Or, assuming the GNU linker is being used, link with -Wl,--warn-common which
will do the merging of common symbols but also warn you about them, allowing
you to use the program while providing a reminder that you still have some
work to do to make it portable.
On 2012-10-03, lipska the kat <lipskathe...@yahoo.co.uk> wrote:
> Hi
> I have the following program
> distributed over 4 files
> /* foo.h */
> int foo;
This is an external *definition* and not merely a declaration, so you have
screwed up. As soon as this file is included in more than one translation unit,
foo becomes multiply defined. You want:
extern int foo;
This is purely a declaration now because of two features: the presence of the
extern, and the lack of an initializer. Now you need a definition of foo
somewhere.
Pick a source file where foo is going to reside, and put an "int foo;"
there. Also, a good idea is to ensure that this source file includes
the header. If you ever change the type of foo, you get type checking
between the declaration in the header and the definition in that file.
> void fooset(int f);
Note that no "extern" is needed on function declaration, because declarations
without bodies are only declarations and not definitions. You can use extern
if you want:
extern void fooset(int f);
Putting extern on function declarations could help you remember to add it to
object declarations.
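A small sketch of that difference, with everything placed in one file only so that it compiles standalone: the two function declarations are equivalent, while for the object only the line carrying extern (and no initializer) is a pure declaration.

```c
/* Function declarations: with or without extern, neither is a definition. */
void fooset(int f);
extern void fooset(int f);

/* Object declaration: extern plus no initializer keeps this a declaration. */
extern int foo;

/* The definitions, which would normally live in a single .c file. */
int foo;

void fooset(int f)
{
    foo = f;
}
```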
It's a poor idea to use angle brackets on your own header file.
And in fact, in many environments it will not work. You must have passed some
option to your compiler to make it work.
Header files are searched for in two places. Double quotes like #include
"foo.h" specify that one place is searched, and if the header is not found
there, then the other place is searched. Angle brackets indicate that
only the second place is searched.
In many common environments, the first place that is searched is the
same directory which contains the source file which is invoking the #include.
And the second place (used by the angle brackets include, or as
fallback for double quotes) is a bunch of system and compiler header file
directories, outside of the program.
So #include <foo.h> won't work unless you tell the compiler that your
project directory is one of the system directories. That's a bad idea
because then if someone makes a header which clashes with the name of
some system header, that header may get mistakenly included.
> Anyway, the question is who 'owns' the foo declared in foo.h
> Storage is obviously set aside as when I run the program I get the
> expected output
Why it works is that your environment implements the "Relaxed Ref/Def model"
of C linkage. That is to say, it allows multiple external definitions
for an object. These external definitions are merged into a single definition
during linkage.
You are being allowed to get away with a programming error which, under a
"Strict Ref/Def" linkage model, would prevent your program from linking.
It is sharp of you to raise a question mark about this, and wonder "who owns foo". Good for you!
For more information about "Relaxed Ref/Def" and "Strict Ref/Def", find a document called "Rationale for the ANSI C Programming Language".
This is covered in section 3.1.2.2 of that document.
> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
> and see a humungous great glob of bytes called foo.h.gch, looks like
> foo.h has been compiled ... but I've no idea
> why it's so huge.
> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch
> Anyway, the question is who 'owns' the foo declared in foo.h
> Storage is obviously set aside as when I run the program I get the
> expected output
> foo is 10
> foo is now 11
> I guess this big old lump of bytes has something to do with it.
Actually, the behaviour of your program is undefined, because
there is more than one external definition for int foo.
All translation units main.c, fooget.c, fooset.c and fooinc.c
provide an external definition of int foo. This violates 6.9 p5:
An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an
object.
If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
Your linker will probably "fix" this, and merge all external
definitions into one, but if you want your program to be portable
you should not rely on this linker behaviour.
It would be cleaner to replace the declaration 'int foo;'
in foo.h with 'extern int foo;', and add a new
translation unit, say foo.c, to contain the (single)
definition of int foo:
/* foo.c */
#include <foo.h>
int foo;
and link the object generated for foo.c to your program
along with the objects generated for main.c, fooget.c,
fooset.c and fooinc.c .
As for the foo.h.gch file, it's a precompiled header,
the result of compiling foo.h with gcc.
If you don't want this, don't compile foo.h,
use it only for inclusion by other *.c files.
> In article <k4i2a7$uh...@dont-email.me>,
> Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>>[...]
>> If you have three colliding definitions -- well, there's no
>> useful way to answer questions about undefined behavior.
> Oh please. It's not unuseful to explain what actually happened. gcc made a
> "common" symbol in each object file, and the linker merged them. This
> behavior may not be standardized but it's not hard to explain, and after
> you've explained it you can add that there are ways to change it:
>[...]
> On 10/3/2012 2:13 PM, lipska the kat wrote:
>> Hi
>> I have the following program
>> distributed over 4 files
[snip]
> This looks like a "precompiled header," generated as a time-
> saving step by the (wait for it...) C++ compiler you're using.
> (You may have thought you were writing C, but the available
> evidence suggests you've set up your build environment to use
> C++ instead. Might want to check your setup ...)
Yes ... of course, this implies that I know how to change it :-)
I've done what I can with the hideously complex ... er I mean
feature rich gcc software to ensure that I am compiling as c
man gcc says
file.c C source code which must be preprocessed.
All my files end in .c
It also says
C Language Options -ansi
I also compile with the -ansi flag, e.g
gcc -ansi -c -g -I/<path> fooset.c foo.h
Not sure what else I need to do to make certain I'm compiling as C code.
I'll have a google.
I'll have to digest the rest of your response and respond accordingly
all I'm really trying to do is understand extern variables.
> As I hope you're beginning to learn, "It worked" does not
> imply "It's right." The possible manifestations of undefined
> behavior include "It did (or seemed to do) what I expected."
Well yes, but if I run a program 10 times with the same data and get the same results each time I might start to think that something is 'right'. If I design my test cases in the usual way (boundary cases and random 'middle ground' cases at the very least) then run those tests with the same data and get the same output each time then I get a feeling that I may be on the right path.
Is testing C code fundamentally different to testing code in other languages ?
Many thanks for taking the time to reply.
It's much appreciated.
> On 03/10/12 20:05, Eric Sosman wrote:
>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>>> Hi
>>> I have the following program
>>> distributed over 4 files
> [snip]
>> This looks like a "precompiled header," generated as a time-
>> saving step by the (wait for it...) C++ compiler you're using.
>> (You may have thought you were writing C, but the available
>> evidence suggests you've set up your build environment to use
>> C++ instead. Might want to check your setup ...)
> Yes ... of course, this implies that I know how to change it :-)
> I've done what I can with the hideously complex ... er I mean
> feature rich gcc software to ensure that I am compiling as c
From what other posters have written, it appears I guessed
incorrectly about the C/C++ distinction. You may in fact be
compiling your code as C -- but for some reason you're "compiling"
the header file itself. That's probably not what you want to do.
> > As I hope you're beginning to learn, "It worked" does not
> > imply "It's right." The possible manifestations of undefined
> > behavior include "It did (or seemed to do) what I expected."
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.
If you run it ten times with the same data, you're probably hitting
the same fragile set of coincidences each time. Running with different
data could be more illuminating -- although, as Dijkstra said, testing
cannot demonstrate absence of errors, but only their presence.
> If I design my test cases in the usual way (boundary cases and random
> 'middle ground' cases at the very least) then run those tests with the
> same data and get the same output each time then I get a feeling that I
> may be on the right path.
> Is testing C code fundamentally different to testing code in other
> languages ?
No, not fundamentally. It seemed to me you'd distributed
the middle of
"Correct programs work."
"This program is correct."
"Therefore, this program works."
to obtain
"Correct programs work."
"This program works."
"Therefore, this program is correct."
"BZZZZT! Thank you for playing."
... and you would certainly not have been the first to do so.
Perhaps the C standard has changed since I first read it, but AFAIK,
most of the answers so far have been wrong.
If you write this in your code:
int foo;
You've *declared* foo -- that is, described what it is -- but you haven't
*defined* it. There's a difference. In this case, no actual space
for foo has been allocated yet, and it's known as a "common" symbol.
If no module ever actually defines it, the linker will allocate space
for it at the end.
If you write
int bar = 1;
Now you've defined it. Space and an initial value for it will be
included in your module. If more than one module does this, there will
be a conflict.
So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
In Unix, you can give the command "nm fooget.o" to see all the symbols
associated with fooget:
0000000000000000 D bar
0000000000000004 C foo
0000000000000000 T fooget
So you see that bar has been defined, while foo is merely a common
symbol. (And fooget is text -- executable code).
When you compile all of your .c files and link them together, the
linker notes that there is a common named "foo" which is never actually
defined by any module, and so it allocates space for it. Bar, on the
other hand, has been defined more than once, so you have an error:
cc -o foo main.o fooget.o fooset.o
fooget.o:(.data+0x0): multiple definition of `bar'
main.o:(.data+0x0): first defined here
fooset.o:(.data+0x0): multiple definition of `bar'
main.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status
As you can see, it never complained about foo.
If you wanted, you could have defined foo in one place, say in main.c:
#include "foo.h"
int foo = 1;
...
Now, when we do "nm main.o", we get:
0000000000000004 D bar
0000000000000000 D foo
U fooget
U fooinc
U fooset
0000000000000000 T main
U printf
You see that foo and bar are both defined here.
Now, a couple of semi-off-topic points:
You should do
#include "foo.h"
instead of
#include <foo.h>
The <> form is for system header files. My compiler (gcc on Linux) barfed
on the includes until I fixed them. I don't know how you got it to compile
on yours unless you used "-I." or something.
As other posters have mentioned, you really should use "extern" in your
declarations.
-- -Ed Falk, f...@despams.r.us.com
http://thespamdiaries.blogspot.com/
> Perhaps the C standard has changed since I first read it, but AFAIK,
> most of the answers so far have been wrong.
> If you write this in your code:
> int foo;
> You've *declared* foo -- that is, described what it is -- but you haven't
> *defined* it. There's a difference. In this case, no actual space
> for foo has been allocated yet, and it's known as a "common" symbol.
> If no module ever actually defines it, the linker will allocate space
> for it at the end.
> If you write
> int bar = 1;
Did you mean to change the name from "foo" to "bar"?
> Now you've defined it. Space and an initial value for it will be
> included in your module. If more than one module does this, there will
> be a conflict.
N1570 6.9.2p2:
A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or
with the storage-class specifier static, constitutes a *tentative
definition*. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains
no external definition for that identifier, then the behavior
is exactly as if the translation unit contains a file scope
declaration of that identifier, with the composite type as of
the end of the translation unit, with an initializer equal to 0.
So
int foo;
*can* be a definition, or at least can act as one.
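A minimal sketch of what that paragraph implies, using made-up names (counter, limit and the two getter functions are hypothetical, chosen only for this example): several tentative definitions of the same object may appear in one translation unit, and an object that never gets an initializer behaves as if it had been initialized to 0.

```c
int counter;      /* tentative definition */
int counter;      /* another tentative definition of the same object: allowed in C */

int limit;        /* tentative definition ... */
int limit = 3;    /* ... superseded by an external definition with an initializer */

/* No initializer ever appeared for counter, so it acts like "int counter = 0;". */
int get_counter(void)
{
    return counter;
}

int get_limit(void)
{
    return limit;
}
```

Note that this is C only; C++ has no tentative definitions and rejects the repeated `int counter;`.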
> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
But then two or more translation units within your program can see that
definition, and you can get a "multiple definition" error.
Instead, add
extern int foo;
to "foo.h", and
int foo = 1;
to "foo.c". Then any translation unit that includes "foo.h" can use the
object "foo", which is defined in exactly one place in your program.
(You could drop the "extern" in foo.h, making it a tentative definition,
but adding "extern" is more explicit.)
(I'm snipping some context in which you make some of these same points.)
[snip]
-- Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
> Perhaps the C standard has changed since I first read it, but AFAIK,
> most of the answers so far have been wrong.
> If you write this in your code:
> int foo;
> You've *declared* foo -- that is, described what it is -- but you haven't
> *defined* it.
The above is a tentative definition. It is a definition, but there
is still a chance to override it with a value other than zero
given by a firm (term mine) definition.
When the end of a translation unit is reached, all definitions which
are still tentative become definitions with value zero.
int foo; /* tentative def */
int bar; /* tentative def */
int foo = 3; /* no longer tentative */
/* end of translation unit */
int bar = 0; /* <- not written in the source: but effective behavior */
> for foo has been allocated yet, and it's known as a "common" symbol.
Common symbols originally come from the "Common" linkage model. C supports that
model, but that model allows latitudes that are not permitted of strictly
conforming programs.
And anyway, ironically, under the Common model, every external declaration is
also a definition (whether or not the keyword extern appears in the
declaration).
> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
> In Unix, you can give the command "nm fooget.o" to see all the symbols
> associated with fooget:
> 0000000000000000 D bar
> 0000000000000004 C foo
> 0000000000000000 T fooget
The "common" designation is here only being used to distinguish those
definitions which have initializers from those which do not, so that the ones
which do not can be put into a different section when the executable is linked
(the "BSS" section).
> On 03/10/12 20:05, Eric Sosman wrote:
>> On 10/3/2012 2:13 PM, lipska the kat wrote:
...
> > As I hope you're beginning to learn, "It worked" does not
> > imply "It's right." The possible manifestations of undefined
> > behavior include "It did (or seemed to do) what I expected."
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.
That's a bad assumption. One of the most common ways in which code with
undefined behavior actually behaves is to produce exactly the same
result that you incorrectly assume that it's required to produce. That's
because your assumptions happen to match decisions made by the
implementors of the version of C that you're testing with. Other
implementors of C are free to make different decisions, ones that are
incompatible with your incorrect assumptions.
> If I design my test cases in the usual way (boundary cases and random
> 'middle ground' cases at the very least) then run those tests with the
> same data and get the same output each time then I get a feeling that I
> may be on the right path.
> Is testing C code fundamentally different to testing code in other
> languages ?
No, the inappropriateness of concluding that a program is correct, just
because it appears to work, is common to all computer languages.
-- James Kuyper
On 2012-10-03, lipska the kat <lipskathe...@yahoo.co.uk> wrote:
> Well yes, but if I run a program 10 times with the same data and get the
> same results each time I might start to think that something is 'right'.
If the program is essentially deterministic (no real-time inputs, no threads)
then running the same test case (same data, same program, same platform) ten times is quite silly. It is one test case.
(There may be some differences between the runs, like the OS randomizing the
stack locations or some such thing.)
But it is better to have ten different test cases in a suite and run through
those.
If you want to prove something with ten runs of one test case, perform the ten
runs on ten different platforms and show that the results are the same.
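As a rough sketch of what such a suite might look like, here is a table of distinct cases run against a string-reversal routine. All names (reverse_in_place, reverse_case, run_reverse_tests) are hypothetical and exist only for this example; the cases deliberately include the empty string and a single character as boundary inputs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function under test: reverses a string in place. */
static void reverse_in_place(char *s)
{
    size_t len = strlen(s);
    if (len == 0) {
        return;
    }
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}

/* One row per test case: an input and the output we expect for it. */
struct reverse_case {
    const char *input;
    const char *expected;
};

static const struct reverse_case cases[] = {
    { "",            ""            },  /* boundary: empty string */
    { "a",           "a"           },  /* boundary: one character */
    { "ab",          "ba"          },  /* even length */
    { "abc",         "cba"         },  /* odd length */
    { "hello world", "dlrow olleh" },  /* embedded space */
};

/* Runs every case once and returns the number of failures. */
int run_reverse_tests(void)
{
    int failures = 0;
    for (size_t k = 0; k < sizeof cases / sizeof cases[0]; k++) {
        char buf[64];                    /* every input above fits easily */
        strcpy(buf, cases[k].input);
        reverse_in_place(buf);
        if (strcmp(buf, cases[k].expected) != 0) {
            failures++;
        }
    }
    return failures;
}
```

A suite like this still only shows the presence of errors, not their absence, but five different inputs tell you more than five runs of the same one.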
> On 10/3/2012 3:59 PM, lipska the kat wrote:
>> On 03/10/12 20:05, Eric Sosman wrote:
>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>>>> Hi
>>>> I have the following program
>>>> distributed over 4 files
[snip]
> From what other posters have written, it appears I guessed
> incorrectly about the C/C++ distinction. You may in fact be
> compiling your code as C -- but for some reason you're "compiling"
> the header file itself. That's probably not what you want to do.
I'll do some experiments.
>> > As I hope you're beginning to learn, "It worked" does not
>> > imply "It's right." The possible manifestations of undefined
>> > behavior include "It did (or seemed to do) what I expected."
>> Well yes, but if I run a program 10 times with the same data and get the
>> same results each time I might start to think that something is 'right'.
> If you run it ten times with the same data, you're probably hitting
> the same fragile set of coincidences each time. Running with different
> data could be more illuminating -- although, as Dijkstra said, testing
> cannot demonstrate absence of errors, but only their presence.
Of course, the point I was trying to make is that if my program is behaving in an 'undefined' way then I might expect 10 runs with identical data to provide different results. I'm in no way sufficiently knowledgeable about C to assume otherwise. I suppose it depends on what you mean by undefined.
If I have a program that reverses its input a line at a time (ex 1-19 K and R second edition for example) and I try it with as many different inputs as my feeble brain can devise and the results are what I expect, then what can I assume from this? In other languages I have used (10s of KLOC running daily without error) I would assume that the program was 'correct'.
> On 10/03/2012 03:59 PM, lipska the kat wrote:
>> On 03/10/12 20:05, Eric Sosman wrote:
>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
> ...
>> > As I hope you're beginning to learn, "It worked" does not
>> > imply "It's right." The possible manifestations of undefined
>> > behavior include "It did (or seemed to do) what I expected."
>> Well yes, but if I run a program 10 times with the same data and get the
>> same results each time I might start to think that something is 'right'.
> That's a bad assumption. One of the most common ways in which code with
> undefined behavior actually behaves is to produce exactly the same
> result that you incorrectly assume that it's required to produce. That's
> because your assumptions happen to match decisions made by the
> implementors of the version of C that you're testing with. Other
> implementors of C are free to make different decisions, ones that are
> incompatible with your incorrect assumptions.
Er ... wow, OK, that is a bit of a head****
Do you mean to say that even if I test my program to destruction and as far as I can tell it's 'correct', that is it complies with requirements and behaves as expected it could still be incorrect when compiled with
a different compiler ???
Surely there is some 'base' implementation of C that is used to test compilers or is it a free for all ... to me this implies that there can be more than one 'correct' implementation of the C language, or several or many Cs in fact. Please remember I am a raw beginner at C although I find this whole discussion fascinating.
[snip]
Given a program written in C, how does one determine that it is 'correct' if complying with requirements and returning the same output from the same input is not enough?
> On 04/10/12 05:34, James Kuyper wrote:
>> On 10/03/2012 03:59 PM, lipska the kat wrote:
>>> On 03/10/12 20:05, Eric Sosman wrote:
>>>> On 10/3/2012 2:13 PM, lipska the kat wrote:
>> ...
>>> > As I hope you're beginning to learn, "It worked" does not
>>> > imply "It's right." The possible manifestations of undefined
>>> > behavior include "It did (or seemed to do) what I expected."
>>> Well yes, but if I run a program 10 times with the same data and get the
>>> same results each time I might start to think that something is 'right'.
>> That's a bad assumption. One of the most common ways in which code with
>> undefined behavior actually behaves is to produce exactly the same
>> result that you incorrectly assume that it's required to produce. That's
>> because your assumptions happen to match decisions made by the
>> implementors of the version of C that you're testing with. Other
>> implementors of C are free to make different decisions, ones that are
>> incompatible with your incorrect assumptions.
> Er ... wow, OK, that is a bit of a head****
> Do you mean to say that even if I test my program to destruction and as
> far as I can tell it's 'correct', that is it complies with requirements
> and behaves as expected it could still be incorrect when compiled with
> a different compiler ???
> Surely there is some 'base' implementation of C that is used to test
> compilers or is it a free for all ... to me this implies that there can
> be more than one 'correct' implementation of the C language, or several
> or many Cs in fact. Please remember I am a raw beginner at C although I
> find this whole discussion fascinating.
> [snip]
> Given a program written in C, how does one determine that it is
> 'correct' if complying with requirements and returning the same output
> from the same input is not enough.
> Thanks for taking the time to reply.
> lipska
There is no "reference" implementation of C. But there are the standards documents, which state the rules of C.
One issue you will see here is that there is code that is syntactically correct, and will be compiled by the compiler, but which is "undefined behaviour" according to the standards. This is not unique to C - it applies to some extent to all languages, and may even be unavoidable (I think Gödel's Incompleteness Theorem implies that). But C programmers, and many of this group's inhabitants, tend to be more aware of these issues than many other programmers.
When you write code that depends on such "undefined behaviour", it may work as expected. It may do so consistently, and with different data and different variations of the code. But you might find it fails when using a different compiler, or different version of the compiler, or different compiler flags (enabling optimisation often brings such flaws to light).
The way to deal with this is to learn the rules of C - not just learn what works under testing. And use the tools to the best of their abilities, to aid your work. Make sure you compile with optimisation ("-Os" or "-O2" is typical), and with warnings enabled ("-Wall" and "-Wextra" are good for most code, and there are others that can be worth enabling).
> In article <EK-dnVoFlczbHfHNnZ2dnUVZ8hidn...@bt.com>,
> lipska the kat <lipskathe...@yahoo.co.uk> wrote:
>> Hi
>> I have the following program
>> distributed over 4 files
[snip]
> So add "int bar = 1;" to your foo.h and then compile fooget into a binary.
> In Unix, you can give the command "nm fooget.o" to see all the symbols
> associated with fooget:
> 0000000000000000 D bar
> 0000000000000004 C foo
> 0000000000000000 T fooget
Oh very nice
[lipska@sandbox externs (master)]$ nm main.o
0000000000000004 C foo
U fooget
U fooinc
U fooset
0000000000000000 T main
U printf
Thanks, I didn't know about nm, I'll try man nm then different versions of foobar and see what the results are
> The <> form is for system header files. My compiler (gcc on Linux) barfed
> on the includes until I fixed them. I don't know how you got it to compile
> on yours unless you used "-I." or something.
> As other posters have mentioned, you really should use "extern" in your
> declarations.
Thanks for this, I get the #include thing now.
I'm going to refactor the foo get/set/inc program and see what I can discover
lipska the kat <lipskathe...@yahoo.co.uk> writes:
[...]
> Of course, the point I was trying to make is that if my program is
> behaving in an 'undefined' way then I might expect 10 runs with
> identical data to provide different results. I'm in no way sufficiently
> knowledgeable about C to assume otherwise. I suppose it depends on what
> you mean by undefined.
No, that's not what undefined means. The C standard's definition of
*undefined behavior* is:
behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements
NOTE Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving
during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
> If I have a program that reverses its input a line at a time (ex 1-19 K
> and R second edition for example) and I try it with as many different
> inputs as my feeble brain can devise and the results are what I expect
> then what can I assume from this. In other languages I have used (10s of
> KLOC running daily without error) I would assume that the program was
> 'correct'.
C, as the saying goes, gives you enough rope to shoot yourself in the
foot. I'll show you a concrete example:
#include <stdio.h>

static void write_array(int *arr) {
    for (int i = 0; i <= 5; i ++) {
        arr[i] = i;
    }
}

static void read_array(int *arr) {
    for (int i = 0; i <= 5; i ++) {
        printf("%d", arr[i]);
        putchar(i == 5 ? '\n' : ' ');
    }
}

int main(void) {
    int x[5] = { 0 };
    int y[5] = { 0 };
    int z[5] = { 0 };
    write_array(y);
    read_array(y);
    return 0;
}
The array y is defined to have 5 elements, but the program attempts to
store 6 int values in it, and then retrieve and print those 6 values.
Accessing y[5] has undefined behavior, since it's outside the bounds of
the array. But since y is surrounded in memory by two other arrays, x
and z, it's likely that y[5] refers to an element of one of those other
two arrays. (There's no guarantee that x, y, and z are allocated in any
particular order, or even that they're adjacent, but it's likely that
one of them immediately follows y in memory.)
I can compile and run this program 100 times, and it's very likely to
produce the same output every time:
0 1 2 3 4 5
That's just one of the infinitely many things that can happen when the
language standard "imposes no requirements".
(A sufficiently clever optimizing compiler might cause it to produce
different output, or to crash, or even to be rejected at compile time.)
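For contrast, here is a sketch of the same idea with the out-of-bounds access removed. It assumes the simplest fix, passing the element count alongside the pointer; the extra parameter n and the function fill_and_print are made up for this illustration and are not part of the program above.

```c
#include <stddef.h>
#include <stdio.h>

/* Store 0..n-1 into the first n elements of arr; the caller supplies n. */
void write_array(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        arr[i] = (int)i;
    }
}

/* Print the first n elements of arr on one line, separated by spaces. */
void read_array(const int *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        printf("%d", arr[i]);
        putchar(i + 1 == n ? '\n' : ' ');
    }
}

/* Hypothetical driver: fill a 5-element array, print it, and return
   its last valid element. */
int fill_and_print(void) {
    int y[5] = { 0 };
    size_t n = sizeof y / sizeof y[0];   /* element count taken from the array itself */
    write_array(y, n);
    read_array(y, n);                    /* prints "0 1 2 3 4" */
    return y[n - 1];
}
```

Because the bound is derived from the array, changing the size of y cannot leave the loops out of step with it.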
-- Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
> lipska the kat <lipskathe...@yahoo.co.uk> writes:
> C, as the saying goes, gives you enough rope to shoot yourself in the
> foot. I'll show you a concrete example:
> [snip]
Just out of curiosity, I ran this little test through gcc. Without
optimization, or at optimization level 1, gcc only warns about the
unused variables x and z.
At optimization level 2, gcc warns about a subscript out of bounds
on line 5 (in the write_array function). At optimization level 3 it
also gives this warning about line 11 (in the read_array function).
The program does give the 0 1 2 3 4 5 output every time, though.
-- "C provides a programmer with more than enough rope to hang himself.
C++ provides a firing squad, blindfold and last cigarette."
- seen in comp.lang.c
First of all let me say once again how much I appreciate all your responses. A seed of doubt has been planted in my mind WRT the 'correctness' of any C code that I have written/will write at any time in the future. I think this must be a 'good thing' although the implication is that before I can be as certain as it's possible to be that any program is 'correct' I need to have read, understood and inwardly digested the entire language specification. A rather daunting prospect but I will certainly make a start ... when I can figure out what standard I should be reading (C89, C90, C99 or C11)
According to man gcc on my Ubuntu Linux 12.04 64 bit machine
89,90 are fully supported whereas support for 99 onwards is 'limited'
Compiling with the -ansi switch compiles to the 89/90 spec which reinforces my opinion that 90 is the one for me.
There are also a whole bunch of gnu dialects
but my brain imploded at this point so I went to the pub :-)
Anyway, I think I've figured out the extern thing in terms of my very simple example. I do have one small observation which I will make after the re-factored code, you have been very generous with your time so far so apologies for posting this listing again.
============= Code ===============
/* foo.h */
/* explicit _declaration_ of foo */
extern int foo;
void fooset(int f);
int fooget(void);
void fooinc(void);
> lipska the kat<lipskathe...@yahoo.co.uk> writes:
> [...]
[snip]
> C, as the saying goes, gives you enough rope to shoot yourself in the
> foot. I'll show you a concrete example:
[snip]
gcc example.c
example.c: In function 'write_array':
example.c:4:9: error: 'for' loop initial declarations are only allowed in C99 mode
example.c:4:9: note: use option -std=c99 or -std=gnu99 to compile your code
example.c: In function 'read_array':
example.c:10:9: error: 'for' loop initial declarations are only allowed in C99 mode
make: *** [example] Error 1
gcc -ansi example.c
ditto above
gcc -std=c99 -Wall example.c
example.c: In function 'main':
example.c:19:13: warning: unused variable 'z' [-Wunused-variable]
example.c:17:13: warning: unused variable 'x' [-Wunused-variable]
gcc -std=c99 -O1 -Wall example.c
ditto above
gcc -std=c99 -O2 -Wall example.c
example.c: In function 'main':
example.c:19:13: warning: unused variable 'z' [-Wunused-variable]
example.c:17:13: warning: unused variable 'x' [-Wunused-variable]
example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]
gcc -std=c99 -O3 -Wall example.c
example.c: In function 'main':
example.c:19:13: warning: unused variable 'z' [-Wunused-variable]
example.c:17:13: warning: unused variable 'x' [-Wunused-variable]
example.c:5:20: warning: array subscript is above array bounds [-Warray-bounds]
example.c:11:19: warning: array subscript is above array bounds [-Warray-bounds]
0 1 2 3 4 5 every time
Now I'm really confused
Maybe I should be reading the C99 spec %-}
lipska
-- Lipska the Kat : Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun
> Of course, the point I was trying to make is that if my program is
> behaving in an 'undefined' way then I might expect 10 runs with
> identical data to provide different results.
That's a very bad expectation, unless your 10 runs were done using
wildly different implementations of C, on 10 wildly different platforms.
> ... I'm in no way sufficiently
> knowledgeable about C to assume otherwise. I suppose it depends on what
> you mean by undefined.
"undefined behavior" has a very specific meaning in the C standard:
"behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements" (3.4.3p1). A key phrase needs to be noted: "this
International Standard". Behavior which is defined by something else
(such as the POSIX standard, or an ABI standard for a given platform, or
the documentation for a given compiler, or the fundamental laws of
physics) would still be undefined behavior as far as the C standard is
concerned. If there's anything other than the C standard which defines
the behavior (and there usually is), it will be perfectly repeatable for
as long as that other definition applies, and will fail to be repeatable
as soon as you use it in a situation where the other definition no
longer applies.
For example, if the "undefined behavior" is defined by the POSIX
standard, you can expect the results to be perfectly repeatable on every
POSIX-conforming system, but you'll have no guarantees about non-POSIX
systems. If it's defined by Intel for all CPUs in the same family, the
undefined behavior will be perfectly repeatable as long as you execute
only on that family of CPUs, but not necessarily if you port your code
to an AMD system.
Keep in mind that most of the code constructs that have repeatable
undefined behavior will be repeatable for much less portable reasons
than the examples I gave above. It may be "defined" (though not in any
publicly available document) by a particular version of a particular
compiler when used with a particular set of command line options, and
may be defined differently if you change any of those options, or
upgrade your compiler, or change your code in any way, such as
reordering the variable declarations.
> If I have a program that reverses its input a line at a time (ex 1-19 K
> and R second edition for example) and I try it with as many different
> inputs as my feeble brain can devise and the results are what I expect
> then what can I assume from this.
That it handled those test cases correctly, and might mishandle any case
you didn't think of. It could even mishandle the cases you did test, if
it contains a time-dependent defect (such as mis-handling a leap year).
You can generalize your test results beyond those test cases only in
proportion to how much you know about what the guaranteed behavior of
your code is. I would judge that you know a fair amount about C, but
your question is about a fairly fundamental point, which implies that
there are still a lot of details you don't know yet.
> ... In other languages I have used (10s of
> KLOC running daily without error) I would assume that the program was
> 'correct'.
10s of KLOC isn't a lot, and your assumptions in those other languages
would be just as unjustified as they are in C. The details of what I've
said are specific to C, but the general principle is not.
-- James Kuyper
> First of all let me say once again how much I appreciate all your
> responses. A seed of doubt has been planted in my mind WRT the
> 'correctness' of any C code that I have written/will write at any time
> in the future. I think this must be a 'good thing' although the
> implication is that before I can be as certain as it's possible to be
> that any program is 'correct' I need to have read, understood and
> inwardly digested the entire language specification. A rather daunting
> prospect but I will certainly make a start ... when I can figure out
> what standard I should be reading (C89, C90, C99 or C11)
> According to man gcc on my Ubuntu Linux 12.04 64 bit machine
> 89,90 are fully supported whereas support for 99 onwards is 'limited'
> Compiling with the -ansi switch compiles to the 89/90 spec which
> reinforces my opinion that 90 is the one for me.
gcc's C99 support has been near-perfect for all practical purposes for years. There are a few missing features - so technically it is "limited" - but nothing you need to worry about. Its support for the latest C11 is also coming on nicely. There is no point in starting to learn a new language and limiting yourself to the standards from over twenty years ago unless you really have some special requirement to stick to ANSI C. So if gcc is your main development tool, the standard you want is "-std=gnu99", or "-std=gnu11" if you are feeling adventurous.
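If you want to check which standard a given set of flags actually selects, one rough way is to look at the __STDC_VERSION__ macro. The sketch below is only an illustration, and the function names are invented for it. A C89/C90 compiler (for example gcc -ansi) does not define the macro at all, C99 defines it as 199901L and C11 as 201112L.

```c
#include <stdio.h>

/* Return the value of __STDC_VERSION__, or 0 when the macro is not
   defined (as in strict C89/C90 mode). */
long c_standard_version(void) {
#ifdef __STDC_VERSION__
    return (long)__STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Print a human-readable name for the selected standard. */
void print_c_standard(void) {
    long v = c_standard_version();
    if (v >= 201112L) {
        printf("C11 or later (%ld)\n", v);
    } else if (v >= 199901L) {
        printf("C99 (%ld)\n", v);
    } else if (v >= 199409L) {
        printf("C90 with Amendment 1 (%ld)\n", v);
    } else {
        printf("C89/C90\n");
    }
}
```

Compiling the same file with -ansi, -std=c99 and -std=gnu11 and calling print_c_standard() from main shows the difference between the modes.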
> There are also a whole bunch of gnu dialects
> but my brain imploded at this point so I went to the pub :-)
That sounds like a logical move to me!
> Anyway, I think I've figured out the extern thing in terms of my very
> simple example. I do have one small observation which I will make after
> the re-factored code, you have been very generous with your time so far
> so apologies for posting this listing again.
There are various ways to structure code and files, but I prefer to pretend that C has real "modules" like other programming languages. For each "module", you have two files - "foo.h" and "foo.c". "foo.h" contains only declarations, plus documentation (very important) and occasional definitions that have to go in headers (like inline functions usable by other modules). "foo.c" contains all the implementation.
// foo.h
#ifndef foo_h__
#define foo_h__

// extern is required here to make a declaration but not a definition
extern int foo;

// extern is not required here, but is good documentation
extern void fooset(int f);
extern int fooget(void);
extern void fooinc(void);

#endif // foo_h__

// foo.c
#include "foo.h"

int foo;

void fooset(int f){
    foo = f;
}

int fooget(void){
    return foo;
}

void fooinc(void){
    ++foo;
}
Some observations on this system:
It is best to put all your "foo" implementations in "foo.c". There is seldom any benefit in splitting it up into multiple C files. Either use multiple "modules" (with their own header and C file) if the functions are significantly different or the file is too big to work with, or put them in a single C file. (The exception is for libraries, where many small C files can have some advantages.)
I like to be strict that variables and functions are either private to a module, or exported from the module. If they are private, they only exist in the "foo.c" file, and they are always declared "static". If they are exported, then there is an "extern" declaration in "foo.h" and a matching definition without "extern" (and obviously without "static") in the "foo.c" file. I even enforce this with gcc flags "-Wmissing-declarations -Wmissing-prototypes -Wredundant-decls -Wnested-externs".
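As a rough sketch of that private/exported split, here is a second, made-up module shown as one listing. All of the counter_* names are hypothetical and have nothing to do with the foo program; in a real project the two declarations at the top would live in "counter.h" and everything else in "counter.c".

```c
#include <limits.h>

/* --- what would go in counter.h: exported declarations only --- */
extern void counter_reset(void);
extern int  counter_next(void);

/* --- what would go in counter.c --- */

/* Private state: 'static' gives it internal linkage, so no other
   translation unit can refer to it, and it never appears in a header. */
static int current_value;

/* Private helper, also invisible outside this file.
   Saturates at INT_MAX instead of overflowing. */
static int advance(int v) {
    return v < INT_MAX ? v + 1 : v;
}

/* Exported definitions: no 'static', matching the declarations above. */
void counter_reset(void) {
    current_value = 0;
}

int counter_next(void) {
    current_value = advance(current_value);
    return current_value;
}
```

Other modules can only reach current_value through counter_reset() and counter_next(), which is the whole point of keeping it static.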
Some people do things differently - it's partly a matter of taste. But clearly /my/ way is the best way :-), so I recommend learning it from the start.
> On 04/10/12 05:34, James Kuyper wrote:
...
>> That's a bad assumption. One of the most common ways in which code with
>> undefined behavior actually behaves is to produce exactly the same
>> result that you incorrectly assume that it's required to produce. That's
>> because your assumptions happen to match decisions made by the
>> implementors of the version of C that you're testing with. Other
>> implementors of C are free to make different decisions, ones that are
>> incompatible with your incorrect assumptions.
> Er ... wow, OK, that is a bit of a head****
> Do you mean to say that even if I test my program to destruction and as
> far as I can tell it's 'correct', that is it complies with requirements
> and behaves as expected it could still be incorrect when compiled with
> a different compiler ???
Certainly. That's not just because of undefined behavior, either.
There's also behavior that is merely unspecified: the standard provides
(explicitly or, more commonly, implicitly) a list of possible behaviors,
and each implementation gets to choose from that list - in some cases,
it can even make a different choice each time a given piece of code is
executed. Some unspecified behavior is "implementation-defined" which
means that an implementation is required to document which choice it has
made, but there's also a lot of cases where there's no such requirement.
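A small illustration of unspecified (as opposed to undefined) behavior is the order in which the two operands of + are evaluated. The sketch below uses invented names and simply records which function ran first. Either order is permitted, so the log may read "LR" or "RL", but the sum is 3 in both cases.

```c
#include <stddef.h>

/* Records the order in which left() and right() were called. */
static char call_log[3];
static size_t call_count;

static int left(void)  { call_log[call_count++] = 'L'; return 1; }
static int right(void) { call_log[call_count++] = 'R'; return 2; }

/* The standard does not say whether left() or right() is called first
   here; a conforming compiler may choose either order. The two calls
   cannot overlap, so the log is always well-formed. */
int sum_with_log(void) {
    call_count = 0;
    call_log[0] = call_log[1] = call_log[2] = '\0';
    return left() + right();
}

/* Returns the recorded order as a string: "LR" or "RL". */
const char *last_call_order(void) {
    return call_log;
}
```

Code whose result depends on which of the two strings comes out is relying on a choice the implementation is not even required to document.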
> Surely there is some 'base' implementation of C that is used to test
> compilers ..
No, there is not. Even if there were, the base implementation would have
to make specific choices in every case where the C standard leaves the
behavior unspecified or undefined, and other fully-conforming
implementations of C would not be required to make the same choices,
which greatly reduces the usefulness of having a base implementation.
That may be one reason why there isn't one.
> ... or is it a free for all ...
It's not a free-for-all - the standard does impose a great many specific
requirements. However, the things that it does not specify are what
gives implementors sufficient freedom to create a conforming
implementation of C on almost every platform. That is the reason why C
is one of the most widely implemented of all computer languages.
> ... to me this implies that there can
> be more than one 'correct' implementation of the C language,
Correct - the set of possible fully-conforming implementations of the C
language is infinite. The set of actual fully-conforming implementations
is much smaller, but still large enough that it's not feasible to test
any given program on all of them. It's also sufficiently varied that
testing on only a few dozen of them is insufficient to prove that your
code will work on all of the untested ones.
> ... or several
> or many Cs in fact. Please remember I am a raw beginner at C although I
> find this whole discussion fascinating.
> [snip]
> Given a program written in C, how does one determine that it is
> 'correct' if complying with requirements and returning the same output
> from the same input is not enough.
That depends upon the requirements. Well-written requirements should
identify a specific version of the C standard (C2011 just came out, so
there aren't many implementations of it, and full implementations of C99
are still rare - but C90 has been fully implemented just about
everywhere). Those requirements should specify that your code must have
no syntax errors or constraint violations according to that version.
Then you read the standard and learn what constitutes a syntax error or
a constraint violation.
Well-written requirements should also limit the dependence of the code
on unspecified or undefined behavior in some appropriate fashion. Useful
programs seldom completely avoid undefined behavior, and almost never
avoid unspecified behavior, but you can fill in those gaps by, for
instance, requiring POSIX conformance.
-- James Kuyper
> First of all let me say once again how much I appreciate all your
> responses. A seed of doubt has been planted in my mind WRT the
> 'correctness' of any C code that I have written/will write at any time
> in the future. I think this must be a 'good thing' although the
> implication is that before I can be as certain as it's possible to be
> that any program is 'correct' I need to have read, understood and
> inwardly digested the entire language specification. A rather daunting
> prospect but I will certainly make a start ... when I can figure out
> what standard I should be reading (C89, C90, C99 or C11)
> According to man gcc on my Ubuntu Linux 12.04 64 bit machine
> 89,90 are fully supported whereas support for 99 onwards is 'limited'
> Compiling with the -ansi switch compiles to the 89/90 spec which
> reinforces my opinion that 90 is the one for me.
That's a safe choice; I prefer C99 myself, and gcc's support for C99 is
almost complete.
> There are also a whole bunch of gnu dialects
> but my brain imploded at this point so I went to the pub :-)
I recommend learning standard C first. Learn about and use the Gnu
dialects later, if you want to - but avoid using gnuisms unless you're
absolutely certain that your code will never be compiled with anything
other than gcc.
> Anyway, I think I've figured out the extern thing in terms of my very
> simple example. I do have one small observation which I will make after
> the re-factored code, you have been very generous with your time so far
> so apologies for posting this listing again.
> ============= Code ===============
> /* foo.h */
> /* explicit _declaration_ of foo */
> extern int foo;
> void fooset(int f);
> int fooget(void);
> void fooinc(void);
> /* global variable */
> /* implicit definition of foo */
> /* set get and inc can't see foo if this is not here */
> int foo;
> int main(int argc, char **argv){
>     /* explicit definition of foo */
>     foo = 0;
>     fooset(10);
>     printf("foo is %d\n", fooget());
>     fooinc();
>     printf("foo is now %d\n", fooget());
>     return 0;
> }
> /* fooset.c */
> #include "foo.h"
> void fooset(int f){
>     foo = f;
> }
> /* fooget.c */
> #include "foo.h"
> int fooget(void){
>     return foo;
> }
> /* fooinc.c */
> #include "foo.h"
> void fooinc(void){
>     ++foo;
> }
> Observation:
> The word 'extern' doesn't seem to be required
> /* foo.h */
> extern int foo;
> OR
> /* foo.h */
> int foo;
> Both work
The fact that the second one works is due to a feature of gcc that a
conforming implementation of C is not required to have: gcc merges all
of the multiple external definitions of 'foo' into a single definition.
As others have already mentioned, this behavior can be turned off by
selecting the -fno-common option.
> At this point it seems to me that the only reason to use
> the word extern is as an aid to program documentation.
No, it makes the behavior of your code well-defined, which means you can
count on it working even if you use some other fully-conforming
implementation of C to compile it.
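To make the distinction concrete, here is a single-listing sketch using the names from your program; the comments mark which lines would live in foo.h and which in exactly one .c file. Repeating the extern declaration is harmless. A second plain "int foo;" in another translation unit is what depends on gcc's merging. I believe recent gcc releases enable -fno-common by default, in which case such duplicates fail at link time, but treat that as something to check against your own version.

```c
/* foo.h: declarations only. 'extern' says "an int named foo is
   defined somewhere else"; no storage is reserved here. */
extern int foo;
extern int foo;          /* a repeated declaration is allowed */
void fooset(int f);
int  fooget(void);
void fooinc(void);

/* Exactly one .c file: the single definition, which reserves storage. */
int foo = 0;

void fooset(int f) { foo = f; }
int  fooget(void)  { return foo; }
void fooinc(void)  { ++foo; }
```

With this layout every translation unit that includes the header sees only declarations, so the program has one definition of foo no matter which conforming compiler builds it.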
-- James Kuyper