Requesting advice how to clean up C code for validating string represents integer

robert maas, see http://tinyurl.com/uh3t

Feb 11, 2007, 1:57:30 AM
I'm working on examples of programming in several languages, all
(except PHP) running under CGI so that I can show both the source
files and the actual running of the examples online. The first
set of examples, after decoding the HTML FORM contents, merely
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".

Because perl and PHP include support for regular expressions, it
was obvious how to do it, and easy to accomplish:
http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intperl
http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intphp

Because Common Lisp has good utilities for scanning strings, mostly
using position, position-if, and position-if-not, it was equally
easy, and equally obvious, how to do it:
http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intlisp

The Java API is missing some of the functions available in Common
Lisp, so I had to augment the API, but then it was as easy as in
Common Lisp, with nearly the same algorithm:
http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intjava

Now we come to C: I presently have a horrible mess:
http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc
I'm thinking of pulling out all the character-case testing into a
function that converts a character into a class-number (such as 1
for space, 2 for digit, 3 for sign, etc.), calling that all over
the place, and then using a switch statement on the result, which
won't change the logic of the code but might make it tidier.
Alternately I might hand-code replacements for the Lisp/Java
utilities for scanning strings, or find something in one of the C
libraries that would help, and then translate the Lisp or Java code
to C. Do any of you have any other ideas what I might do to clean
up the C code? Don't write my code for me, but just give hints what
library routines might do 90% of the work for me, or suggest
re-design of the algorithm? One thing I don't want to do is
download a REGEX package for C. I'm trying to give examples of how
to do things from scratch in C, not how to simply use somebody
else's program, even if the source for the REGEX module is
available. If something isn't in a standard library for C, then
it doesn't exist for the purpose of this project. (The only
exception I made is the module for collecting and decoding HTML
FORM contents, which is a prerequisite for this whole project.)

Bill Pursell

Feb 11, 2007, 3:42:38 AM
On Feb 11, 6:57 am, rem6...@yahoo.com (robert maas, see http://
tinyurl.com/uh3t) was asking about code that:

> verifies the text within a field to make sure it is a valid
> representation of an integer, without any junk thrown in, i.e. it
> must satisfy the regular expression: ^ *[-+]?[0-9]+ *$
>
> If the contents of the field are wrong I want to diagnose as much
> as reasonable what's wrong, not just say "syntax error".
>
[snip]

> Do any of you have any other ideas what I might do to clean
> up the C code? Don't write my code for me, but just give hints what
> library routines might do 90% of the work for me, or suggest
> re-design of the algorithm?


You could try something as simple as this:

    strtol( string, &end, BASE );
    if( *end != '\0' )
        fprintf( stderr, "syntax error starting at '%c'\n", *end);

I'm not sure that this gives you as much syntax error
as you want, but it tells you where it occurs. (Also,
this doesn't exactly match your specification, since
this doesn't allow trailing whitespace, but that's
a trivial fix.)
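
A minimal sketch of that fix, assuming base 10 and that any whitespace character (not only a space, as the regular expression requires) may follow the number. The name check_trailing is hypothetical and not from the original post:

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if s holds an integer optionally
 * followed by whitespace, else 0. On success *out holds the value.
 * Range checking via errno is deliberately left out of this sketch. */
static int check_trailing(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s)
        return 0;                   /* nothing was converted */
    while (isspace((unsigned char)*end))
        end++;                      /* skip trailing whitespace */
    if (*end != '\0')
        return 0;                   /* junk after the number */
    *out = v;
    return 1;
}
```

Note that strtol itself skips leading whitespace, so only the trailing part needs handling by hand.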

--
Bill Pursell


Flash Gordon

Feb 11, 2007, 4:47:06 AM
robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 06:57:

> I'm working on examples of programming in several languages, all
> (except PHP) running under CGI so that I can show both the source
> files and the actually running of the examples online. The first
> set of examples, after decoding the HTML FORM contents, merely
> verifies the text within a field to make sure it is a valid
> representation of an integer, without any junk thrown in, i.e. it
> must satisfy the regular expression: ^ *[-+]?[0-9]+ *$
>
> If the contents of the field are wrong I want to diagnose as much
> as reasonable what's wrong, not just say "syntax error".

<snip>

> Now we come to C: I presently have a horrible mess:
> http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc

<snip>

> to C. Do any of you have any other ideas what I might do to clean
> up the C code? Don't write my code for me, but just give hints what
> library routines might do 90% of the work for me, or suggest
> re-design of the algorithm? One thing I don't want to do is

<snip>

Good, we generally prefer to give help rather than do people's work for
them :-)

I would suggest you look at the strto* functions, which are part of
standard C, taking specific note of the second and third parameters,
since you want to use both. The second parameter is used to tell you the
first invalid character (or the end of the string if completely valid)
and the third parameter to specify base 10, which is what the user will
expect. These functions will even tell you if the number in the string
is out of range for the type it is converted to (they set errno to
ERANGE). Finally, since there is no strtoi you will probably have to use
strtol and then check if it is in the range of an int before assigning
it to an int.
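
As a rough sketch of that advice (not anyone's posted code; the name str_to_int and its return codes are made up for illustration), the strtol call and the int range check might be combined like this:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the start of s to an int in base 10.
 * Returns 0 on success, 1 if no number was found, 2 if the value is
 * out of range for int. *end is set by strtol to the first character
 * that was not consumed. */
static int str_to_int(const char *s, int *out, char **end)
{
    long v;

    errno = 0;                      /* strtol only sets errno on error */
    v = strtol(s, end, 10);
    if (*end == s)
        return 1;                   /* no digits converted */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 2;                   /* too big for long, or for int */
    *out = (int)v;
    return 0;
}
```

The caller can still inspect *end afterwards to decide whether trailing characters are acceptable.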
--
Flash Gordon

Random832

Feb 11, 2007, 5:44:44 AM
2007-02-11 <rem-2007...@yahoo.com>,

robert maas, see http://tinyurl.com/uh3t wrote:
> I'm working on examples of programming in several languages, all
> (except PHP) running under CGI so that I can show both the source
> files and the actually running of the examples online. The first
> set of examples, after decoding the HTML FORM contents, merely
> verifies the text within a field to make sure it is a valid
> representation of an integer, without any junk thrown in, i.e. it
> must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

I'd use strtol with a base of 10.

Things to consider:
1. It doesn't care if there's junk after the numbers, but why do you?
You can always examine *endptr.
2. Won't work for converting integers greater than eleventy billion or
however much your system supports. But how do you intend to convert
them otherwise?

CBFalconer

Feb 11, 2007, 6:08:55 AM
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
> I'm working on examples of programming in several languages, all
> (except PHP) running under CGI so that I can show both the source
> files and the actually running of the examples online. The first
> set of examples, after decoding the HTML FORM contents, merely
> verifies the text within a field to make sure it is a valid
> representation of an integer, without any junk thrown in, i.e. it
> must satisfy the regular expression: ^ *[-+]?[0-9]+ *$
>
> If the contents of the field are wrong I want to diagnose as much
> as reasonable what's wrong, not just say "syntax error".
>
> Because perl and PHP include support for regular expressions, it
> was obvious how to do it, and easy to accomplish:

Perl and PHP are off-topic here. Regular expressions are only
topical in reference to code to implement them. In addition, your
RE is wrong. A numeric field ends when the next character cannot
be used, not on a blank. This is easily done in C; see the
following example. Note that it leaves detection and use of +- to
the calling function, and similarly the decision about the
termination char. Note that this parses a stream.

/*--------------------------------------------------------------
 * Read an unsigned value. Signal error for overflow or no
 * valid number found. Returns 1 for error, 0 for no error, EOF
 * for EOF encountered before parsing a value.
 *
 * Skip all leading blanks on f. At completion getc(f) will
 * return the character terminating the number, which may be \n
 * or EOF among others. Barring EOF it will NOT be a digit. The
 * combination of error, 0 result, and the next getc returning
 * \n indicates that no numerical value was found on the line.
 *
 * If the user wants to skip all leading white space including
 * \n, \f, \v, \r, he should first call "skipwhite(f);"
 *
 * Peculiarity: This specifically forbids a leading '+' or '-'.
 */
int readxwd(unsigned int *wd, FILE *f)
{
   unsigned int value, digit;
   int          status;
   int          ch;

#define UWARNLVL (UINT_MAX / 10U)
#define UWARNDIG (UINT_MAX - UWARNLVL * 10U)

   value = 0;                      /* default */
   status = 1;                     /* default error */

   ch = ignoreblks(f);

   if (EOF == ch) status = EOF;
   else if (isdigit(ch)) status = 0; /* digit, no error */

   while (isdigit(ch)) {
      digit = ch - '0';
      if ((value > UWARNLVL) ||
          ((UWARNLVL == value) && (digit > UWARNDIG))) {
         status = 1;               /* overflow */
         value -= UWARNLVL;
      }
      value = 10 * value + digit;
      ch = getc(f);
   } /* while (ch is a digit) */

   *wd = value;
   ungetc(ch, f);
   return status;
} /* readxwd */

The #includes, skipwhite, and ignoreblks functions are omitted.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews


Malcolm McLean

Feb 11, 2007, 12:51:03 PM

"robert maas, see http://tinyurl.com/uh3t" <rem...@yahoo.com> wrote

> Now we come to C: I presently have a horrible mess:
> http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc
> I'm thinking of pulling out all the character-case testing into a
> function that converts a character into a class-number (such as 1
> for space, 2 for digit, 3 for sign, etc.), calling that all over
> the place, and the using a SELECT statement on the result, which
> won't change the logic of the code but might make it tidier.
> Alternately I might hand-code replacements for the Lisp/Java
> utilities for scanning strings, or find something in one of the C
> libraries that would help, and then translate the Lisp or Java code
> to C. Do any of you have any other ideas what I might do to clean
> up the C code? Don't write my code for me, but just give hints what
> library routines might do 90% of the work for me, or suggest
> re-design of the algorithm? One thing I don't want to do is
> download a REGEX package for C. I'm trying to give examples of how
> to do things from scratch in C, not how to simply use somebody
> else's program, even if the source for the REGEX module is
> available. If something isn't in the a standard library for C, then
> it doesn't exist for the purpose of this project. (The only
> exception I made is the module for collecting and decoding HTML
> FORM contents, which is a prerequisite for this whole project.)
>
The first thing is to make your interface clean.

If you want to parse from a string, take a leaf out of strtol's book.

int parseint(char *str, char **end)

Return the integer you read, and the end of the input you parsed up to. If
you cannot read an integer successfully, make *end equal str and return
INT_MIN. INT_MIN is much less likely than 0 or -1 to be confused with a real
integer if you have a lazy caller who doesn't check his end pointer
properly.

Skip leading whitespace.
Read the optional +/- character and make sure there aren't two of them.
Skip whitespace?
Read digits one by one into an unsigned integer, and multiply by ten if there
are more digits to come. Terminate if the unsigned overflows.
Check for INT_MAX or -INT_MIN if the negative flag is set. Terminate on
overflow.
Convert to a signed integer.
Your spec now says to skip trailing whitespace. Probably a bad idea, but if
the instructions say do it we must do it.
Set the end pointer to end of input on success, input on fail.
Return answer on success, INT_MIN on fail.
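
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: it does not allow whitespace between the sign and the digits, and to keep INT_MIN unambiguous as the failure value it limits the magnitude to INT_MAX for both signs, so INT_MIN itself is rejected as input.

```c
#include <ctype.h>
#include <limits.h>

/* Sketch of the parseint interface described above. On success returns
 * the value and sets *end past the consumed input (including trailing
 * whitespace). On failure sets *end = str and returns INT_MIN. */
int parseint(char *str, char **end)
{
    char *p = str;
    int negative = 0;
    unsigned int value = 0;

    while (isspace((unsigned char)*p))
        p++;                                /* leading whitespace */
    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        p++;
    }
    if (!isdigit((unsigned char)*p)) {      /* also catches "--5" */
        *end = str;
        return INT_MIN;
    }
    while (isdigit((unsigned char)*p)) {
        unsigned int digit = (unsigned int)(*p - '0');

        /* Would value * 10 + digit exceed INT_MAX? */
        if (value > ((unsigned int)INT_MAX - digit) / 10u) {
            *end = str;
            return INT_MIN;
        }
        value = value * 10u + digit;
        p++;
    }
    while (isspace((unsigned char)*p))
        p++;                                /* trailing whitespace */
    *end = p;
    return negative ? -(int)value : (int)value;
}
```

A caller that wants "nothing but an integer" then only has to check that *end points at the terminating '\0' and differs from str.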


robert maas, see http://tinyurl.com/uh3t

Feb 11, 2007, 2:13:34 PM
> From: Random832 <ran...@random.yi.org>

> I'd use strtol with a base of 10.

Several people suggested that, but you made some additional
comments I want to reply to, so I'm responding here.

> Things to consider:
> 1. It doesn't care if there's junk after the numbers, but why do you?

This is for processing a HTML FORM filled out by a user, a typical
user who is a total novice at computers yet is trying all sorts of
things found on the Web. If the form asks for an integer to be
entered, but the user enters something else, like two integers, or
an algebraic formula which just happens to start with an integer,
or a floating-point value or decimal fraction, or a fraction, I
don't want to just gobble the first part and ignore all the rest,
because obviously the luser didn't understand/follow instructions.
If I just process the first part and ignore the rest, the luser
will be totally confused why he/she didn't get the intended effect.
Better that I complain about the slightest mess in the input field.

> You can always examine *endptr.

Per a nice example I found on the Web:
Linkname: Bullet Proof Integer Input Using strtol()
URL: http://home.att.net/~jackklein/c/code/strtol.html
I'm indeed now checking for any diagnostics that can be obtained
just from the results returned by strtol (the actual return value,
the global error flag, and the reference pointer endptr). See end
of this message for the code as I have it now.

> 2. Won't work for converting integers greater than eleventy billion or
> however much your system supports. But how do you intend to convert
> them otherwise?

Good point. My previous idea was for the user to get just the
syntax correct for integers, and then if the result is mangled it
obviously means this particular programming language (c, c++, java)
is using fixed-length binary integers, whereas if the result is
always correct no matter how many digits are given, then the
language (lisp) is using unlimited-precision integers. But if it's
easy to diagnose explicitly, such as provided by strtol, then
perhaps I can actually tell the user when overflow happens, to make
the lesson a bit less obscure.

Anyway, using strtol, with all the possible tests on the result:

All of these produce the correct diagnosis (note 15-char buffer for input):

Type a number:555555555555555555
You typed: [55555555555555]
Number out of range.

Type a number:2147000000
Dropping EOL char from end of string.
You typed: [2147000000]
Looks good? N=2147000000

Type a number:2148000000
Dropping EOL char from end of string.
You typed: [2148000000]
Number out of range.

Type a number:
Dropping EOL char from end of string.
You typed: []
No number given.

Type a number:5x
Dropping EOL char from end of string.
You typed: [5x]
After number, extra characters on input line.

But these are not the effects I want:

Type a number:x5
Dropping EOL char from end of string.
You typed: [x5]
No number given.

Type a number:- 5
Dropping EOL char from end of string.
You typed: [- 5]
No number given.

There *is* a number given in each case, just that there's junk
before the number in the first case, and gap between sign and
number in second case. It seems I'll need to manually scan from the
start of the field to the start of the number to distinguish these
patterns (brackets indicate optional):
[white] junk [white] sign [white] digits -- junk before start of number
[white] sign white digits -- gap between sign and number
[white] sign junk digits -- junk (or gap) between sign and number
[white] sign digits -- good
[white] digits -- good
strtol doesn't seem to be helping me diagnose the cruft before the number.


Listing of source code used for the above tests:

#include <stdio.h>
#include <errno.h>

#define MAXCH 15
/* Deliberately small buffer to test buffer-full condition */

main() {
  char chars[MAXCH]; char* inres; /* Set by fgets */
  size_t len; /* Set by strlen */
  char onech;
  char* endptr; long long_var; /* Set by strtol */
  while (1) {
    fpurge(stdin);
    printf("\nType a number:");
    inres = fgets(chars, MAXCH, stdin);
    if (NULL==inres) {
      printf("*** Got NULL back, which maybe means end-of-stream?\n");
      break;
    }
    len = strlen(chars);
    /* printf("Length of string = %d\n", len); */
    if (0 >= len) {
      printf("Horrible: Input was 0 chars, not even EOL char, how??\n");
      break;
    }
    onech = chars[len-1];
    /* printf("The last character is [%c]\n", onech); */
    if ('\n' == onech) {
      printf("Dropping EOL char from end of string.\n");
      chars[len-1] = '\0';
    }
    printf("You typed: [%s]\n", inres, NULL, inres);
    errno = 0;
    long_var = strtol(chars, &endptr, 10);
    if (ERANGE == errno) {
      printf("Number out of range.\n");
    } else if (endptr==chars) {
      printf("No number given.\n");
    } else if ('\0' != *endptr) {
      printf("After number, extra characters on input line.\n");
    } else {
      printf("Looks good? N=%ld\n", long_var);
    }
    sleep(1);
  }
}

Flash Gordon

Feb 11, 2007, 5:26:27 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:

>> From: Random832 <ran...@random.yi.org>
>> I'd use strtol with a base of 10.
>
> Several people suggested that, but you made some additional
> comments I want to reply to, so I'm responding here.
>
>> Things to consider:
>> 1. It doesn't care if there's junk after the numbers, but why do you?

<snip comments about detecting bad input that happens to also contain a
number>

> Better that I complain about the slightest mess in the input field.

That is the correct attitude for handling user input.

>> You can always examine *endptr.
>
> Per a nice example I found on the Web:
> Linkname: Bullet Proof Integer Input Using strtol()
> URL: http://home.att.net/~jackklein/c/code/strtol.html
> I'm indeed now checking for any diagnostics that can be obtained
> just from the results returned by strtol (the actual return value,
> the global error flag, and the reference pointer endptr). See end
> of this message for the code as I have it now.

Jack Klein knows his stuff. You have found a good reference.

>> 2. Won't work for converting integers greater than eleventy billion or
>> however much your system supports. But how do you intend to convert
>> them otherwise?
>
> Good point. My previous idea was for the user to get just the
> syntax correct for integers, and then if the result is mangled it
> obviously means this particular programming language (c, c++, java)
> is using fixed-length binary integers, whereas if the result is
> always correct no matter how many digits are given, then the
> language (lisp) is using unlimited-precision integers.

With C and C++ assuming that bad input will lead to obviously bad output
is not in general a good idea since in far too many situations it will
produce something that is not obviously bad.

> But if it's
> easy to diagnose explicitly, such as provided by strtol, then
> perhaps I can actually tell the user when overflow happens, to make
> the lesson a bit less obscure.

OK, that's good.

<snip>

> But these are not the effects I want:
>
> Type a number:x5
> Dropping EOL char from end of string.
> You typed: [x5]
> No number given.
>
> Type a number:- 5
> Dropping EOL char from end of string.
> You typed: [- 5]
> No number given.
>
> There *is* a number given in each case, just that there's junk
> before the number in the first case, and gap between sign and
> number in second case. It seems I'll need to manually scan from the
> start of the field to the start of the number to distinguish these
> patterns (brackets indicate optional):
> [white] junk [white] sign [white] digits -- junk before start of number
> [white] sign white digits -- gap between sign and number
> [white] sign junk digits -- junk (or gap) between sign and number

Yes, you need to check for the above yourself if you want to report
them. strtol will only indicate that the first non-space character
was invalid, not whether there was something valid further in.

> [white] sign digits -- good
> [white] digits -- good

The above, of course, are indicated by strtol succeeding ;-)

> strtol doesn't seem to be helping me diagnose the cruft before the number.
>
>
> Listing of source code used for the above tests:
>
> #include <stdio.h>
> #include <errno.h>

#include <stdlib.h> /* For strtol. Very important since otherwise the
compiler is *required* to assume it returns an int not a long. */

> #define MAXCH 15
> /* Deliberately small buffer to test buffer-full condition */
>
> main() {

Since no one has mentioned it yet I will. The above, whilst legal in the
original C standard, is bad style and no longer supported in the
new(ish) C standard that might one day become commonly implemented.
Don't use implicit int, and if you don't want parameters, be explicit about it.

int main(void) {

> char chars[MAXCH]; char* inres; /* Set by fgets */
> size_t len; /* Set by strlen */
> char onech;
> char* endptr; long long_var; /* Set by strtol */
> while (1) {

I prefer 'for (;;)' but that is purely a matter of style.

> fpurge(stdin);

Standard C does not have an "fpurge" function or anything similar to
what I am guessing it does.

> printf("\nType a number:");

As per Jack's example you need to flush stdout (or have a \n at the end
of the above line). There is also an argument that using puts (which
outputs a newline after the specified text) or fputs would be better
since they do not scan the string for format specifiers.

> inres = fgets(chars, MAXCH, stdin);

Since chars is an array rather than a pointer you could use:
inres = fgets(chars, sizeof chars, stdin);

> if (NULL==inres) {
> printf("*** Got NULL back, which maybe means end-of-stream?\n");

It is end of stream or an error.

> break;
> }
> len = strlen(chars);
> /* printf("Length of string = %d\n", len); */
> if (0 >= len) {

len cannot be negative or even 0 here for at least three reasons. It is
of type size_t which is unsigned and also strlen returns a size_t. The
third reason is that fgets reads until it either has enough to fill the
buffer (allowing space for the nul termination), until error or end of
stream, or up to and including the newline, whichever comes first. So
given a buffer length of 2 or more it will *always* either return NULL
or it will have written a string with a strlen of at least 1. So this if
cannot be taken.

> printf("Horrible: Input was 0 chars, not even EOL char, how??\n");
> break;
> }
> onech = chars[len-1];
> /* printf("The last character is [%c]\n", onech); */
> if ('\n' == onech) {
> printf("Dropping EOL char from end of string.\n");
> chars[len-1] = '\0';
> }

else {
report that the line entered was too long and then probably read the
rest of the line up to and including the next newline.
}
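
One possible shape for that else branch, folded into a small helper. This is only a sketch; the name read_line and its return codes are invented here and are not from the program under review:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) and strip
 * the trailing newline. Returns 0 on success, 1 if the line was too
 * long (the excess is read and discarded), EOF on end of stream or
 * read error. */
static int read_line(char *buf, size_t size, FILE *in)
{
    size_t len;
    int ch;

    if (fgets(buf, (int)size, in) == NULL)
        return EOF;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    /* No newline was stored. Peek at what follows in the stream. */
    ch = getc(in);
    if (ch == '\n' || ch == EOF)
        return 0;               /* line fit exactly, or stream ended */
    while ((ch = getc(in)) != '\n' && ch != EOF)
        ;                       /* discard the rest of the long line */
    return 1;
}
```

Discarding the remainder means the next call starts on a fresh line instead of seeing the tail of the previous one.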

> printf("You typed: [%s]\n", inres, NULL, inres);
> errno = 0;
> long_var = strtol(chars, &endptr, 10);
> if (ERANGE == errno) {
> printf("Number out of range.\n");
> } else if (endptr==chars) {

At this point you could scan from the start of the string for the first
character that is not white space and report a different error depending
on what it is using the is* functions from ctype.h. Alternatively you
could look at using strspn or strcspn from string.h

> printf("No number given.\n");
> } else if ('\0' != *endptr) {
> printf("After number, extra characters on input line.\n");
> } else {
> printf("Looks good? N=%ld\n", long_var);
> }
> sleep(1);

sleep is not a standard function and seems rather pointless in this program.

> }
> }
--
Flash Gordon

robert maas, see http://tinyurl.com/uh3t

Feb 11, 2007, 11:48:29 PM
> From: Flash Gordon <s...@flash-gordon.me.uk>

> > sleep(1);
> sleep is not a standard function and seems rather pointless in this program.

It's absolutely essential for peace of mind when dialed into a Unix
shell with VT100 emulator at 19200 baud. The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. I immediately
pressed ctrl-C to abort C program, and held it down for about ten
seconds, but it was too late, modem buffers were grossly full. I
then pressed ctrl-Z and held that down for several minutes, but
modem buffers were still spewing to the VT100 emulator. I then
scrolled to the top of the past-screens buffer to see if I could
save anything, but it was already too late, all the past-screens
buffer (appx. 30-40 full VT100 screensfull) had already been
overwritten by the spew. I then waited about ten minutes, watching
spew spew spew incessantly, with no way to know whether the program
had even seen my ctrl-C interrupt. Finally after ten minutes or so
I finally saw a shell prompt. I immediately put in the sleep before
any further work on the program. Now if it gets into an infinite
loop, I press ctrl-C and get instant response because there's no
ten minutes of spew already in the modem buffer.

I copied a few cleanup suggestions from your message and will be
responding about them later.

Nick Keighley

Feb 12, 2007, 8:15:33 AM
On 11 Feb, 06:57, rem6...@yahoo.com (robert maas, see http://
tinyurl.com/uh3t) wrote:

<snip>

[the program]


> verifies the text within a field to make sure it is a valid
> representation of an integer, without any junk thrown in, i.e. it
> must satisfy the regular expression: ^ *[-+]?[0-9]+ *$
>
> If the contents of the field are wrong I want to diagnose as much
> as reasonable what's wrong, not just say "syntax error".

<snip>

> Alternately I might hand-code replacements for the Lisp/Java
> utilities for scanning strings, or find something in one of the C
> libraries that would help,

if it was anything other than a number then sscanf() might
be worth a look.

<snip>


--
Nick Keighley


Flash Gordon

Feb 12, 2007, 4:54:21 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 12/02/07 04:48:

>> From: Flash Gordon <s...@flash-gordon.me.uk>
>>> sleep(1);
>> sleep is not a standard function and seems rather pointless in this program.
>
> It's absolutely essential for peace of mind when dialed into a Unix
> shell with VT100 emulator at 19200 baud. The first time I ran this
> program, without the sleep call, and pressed ctrl-D to generate
> end-of-stream on stdin, the program went into infinite read-EOS
> spew-text loop, which filled up all modem buffers. I immediately

<snip>

I can only suggest that you had some other bug in your program at that
point or a bug in your modem. As presented, your program would not do
that: whether it detected an error or EOF, it would break out of the loop
and terminate.

Having said that, I can see that if you are hitting that sort of problem
that a delay could be useful.

> I copied a few cleanup suggestions from your message and will be
> responding about them later.

OK.
--
Flash Gordon

robert maas, see http://tinyurl.com/uh3t

Feb 12, 2007, 11:24:32 PM
> From: Flash Gordon <s...@flash-gordon.me.uk>
> > ... The first time I ran this

> > program, without the sleep call, and pressed ctrl-D to generate
> > end-of-stream on stdin, the program went into infinite read-EOS
> > spew-text loop, which filled up all modem buffers. ...

> I can only suggest that you had some other bug in your program at that
> point or a but in your modem. As presented your program would not do
> that whether it detected an error or EOF it would break out of the loop
> and terminate.

Not a bug. It's just that the part of the program to detect EOF wasn't yet
written, and that's the very part I was trying to develop.
Step 1: Put in a printf to see what value comes back when I press ctrl-D.
Step 2: Write code to detect that value and break out of loop.
Step 3: Test that to see whether it works.
Step 4: Remove the printf.
Unfortunately step 1 blew me out for ten minutes or so without the sleep.

Unfortunately c doesn't allow any sleep times except integers. I
looked at nanosecond sleep but it requires loading a special module
and building a special nanosecond object and then loading a number
into that object before you can then pass that object to some OO
method that does the actual sleep, a royal pain if it's just to
prevent spew from filling up modem buffers on dialups. The amount
of time I'd waste learning how to do all that would be worse than
the amount of time I waste having a full one-second sleep at each
interactive I/O transaction in the loop during the development of
this code destined for CGI where there's a completely different
logic for interactive transactions and no chance for spew hence no
need for the sleep.

Anyway, here's the latest news on my task:

While searching various clues the kind folks here sent me, I
discovered some library functions (strspn, strcspn) which are
useful for skipping across whole classes of characters or
complements of such classes, similar to the functions I implemented
in Java (explicitly) and in Common Lisp (via anonymous-function
parameters). That made it possible to translate my lisp/java
algorithms directly to c.

I decided to completely separate the code for checking general
integer syntax [white]* [sign]? [digit]+ [white]* (pseudo-regex
notation), which is independent of the programming language (except
Java where plus sign isn't allowed in integer literals or string to
parseInt), from the petty code to check whether the resultant value
is within the allowed range for this or that fixed-precision data
type in this or that programming language as implemented by this or
that vendor.

So I have one function, stringCheckInteger, which checks whether
the string is of the appropriate general format, making liberal use
of strspn and strcspn, and another function, stringIntegerTellRange,
which checks whether the string-number can be converted to an
actual number by strtoll, and if so then also checks whether it's
within ranges of the successively smaller integer data types. I
think this is my final c version for the time being.
If anyone is curious, see:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc>
go to the second form ("re-write").
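
For readers who don't follow the link, here is a rough sketch of what a strspn-based syntax check along these lines could look like. It is not the code on that page: the function name check_int_syntax and the enum constants are invented for illustration, and only the space character counts as white, as in the regular expression from the first post.

```c
#include <string.h>

/* Hypothetical diagnosis codes for the pattern  ^ *[-+]?[0-9]+ *$  */
enum int_syntax {
    SYN_OK,
    SYN_EMPTY,              /* nothing but spaces */
    SYN_JUNK_BEFORE,        /* junk before the number */
    SYN_GAP_AFTER_SIGN,     /* space between sign and digits */
    SYN_JUNK_AFTER_SIGN,    /* sign followed by a non-digit (or nothing) */
    SYN_JUNK_AFTER          /* junk after the digits */
};

static enum int_syntax check_int_syntax(const char *s)
{
    static const char digits[] = "0123456789";

    s += strspn(s, " ");                    /* leading spaces */
    if (*s == '\0')
        return SYN_EMPTY;
    if (*s == '+' || *s == '-') {
        s++;
        if (*s == ' ')
            return SYN_GAP_AFTER_SIGN;
        if (strspn(s, digits) == 0)
            return SYN_JUNK_AFTER_SIGN;
    } else if (strspn(s, digits) == 0) {
        return SYN_JUNK_BEFORE;
    }
    s += strspn(s, digits);                 /* the digits themselves */
    s += strspn(s, " ");                    /* trailing spaces */
    return *s == '\0' ? SYN_OK : SYN_JUNK_AFTER;
}
```

Because this only looks at syntax, a string of any length passes; range checking stays a separate step, as described above.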

Flash Gordon

Feb 13, 2007, 4:41:58 AM
robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:

>> From: Flash Gordon <s...@flash-gordon.me.uk>
>>> ... The first time I ran this
>>> program, without the sleep call, and pressed ctrl-D to generate
>>> end-of-stream on stdin, the program went into infinite read-EOS
>>> spew-text loop, which filled up all modem buffers. ...
>> I can only suggest that you had some other bug in your program at that
>> point or a but in your modem. As presented your program would not do
>> that whether it detected an error or EOF it would break out of the loop
>> and terminate.
>
> Not a bug. It's just that the part of the program to detect EOF wasn't yet
> written, and that's the very part I was trying to develop.

So it still was not needed in the program you posted.

> Step 1: Put in a printf to see what value comes back when I press ctrl-D.
> Step 2: Write code to detect that value and break out of loop.
> Step 3: Test that to see whether it works.
> Step 4: Remove the printf.
> Unfortunately step 1 blew me out for ten minutes or so without the sleep.

That is because it is the wrong approach
1) read the documentation to see what the correct way to do it is
2) write the code
3) test it

Fewer steps and more likely to give you a reliable result.

If you used your method with "isspace" it might lead you to think it
returns 1 to indicate a space, then due to a library upgrade your code
could break because actually it returns any non-zero value for a space.

> Unfortunately c doesn't allow any sleep times except integers. I

Wrong. C does not allow *any* sleeping. The sleep function is *not* part
of C; it is part of something else your system provides and makes
accessible from C as an extension.

<snip>

> think this is my final c version for the time being.
> If anyone is curious, see:
> <http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc>
> go to the second form ("re-write").

I may or may not look later.
--
Flash Gordon

Richard Heathfield

Feb 13, 2007, 5:36:19 AM
Flash Gordon said:

> robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:

<snip>


>
>> Unfortunately c doesn't allow any sleep times except integers. I
>
> Wrong. C does not allow *any* sleeping.

Wrong. C does *allow* sleeping. It just doesn't *support* it.

> The [sleep] function is *not* part of C

Arguable. It's not defined by the Standard, I agree. But what is a
language, if not the set of all sentences that can be formed according
to the rules of that language? It is certainly possible to call a
function named sleep(), within the rules of C.

Incidentally, I am not arguing that sleep() is topical.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

CBFalconer

Feb 13, 2007, 1:09:47 AM
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
... snip ...

>
> Not a bug. It's just that the part of the program to detect EOF
> wasn't yet written, and that's the very part I was trying to
> develop.
> Step 1: Put in a printf to see what value comes back when I press
> ctrl-D.

What for? You have a macro called EOF available. Use it.

Flash Gordon

Feb 13, 2007, 7:14:30 AM
Richard Heathfield wrote, On 13/02/07 10:36:

> Flash Gordon said:
>
>> robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:
> <snip>
>>> Unfortunately c doesn't allow any sleep times except integers. I
>> Wrong. C does not allow *any* sleeping.
>
> Wrong. C does *allow* sleeping. It just doesn't *support* it.

If you want to argue it that way the OP is still wrong. Since if C
allows it then it certainly does not prevent the sleep times from being
double or anything else.

>> The [sleep] function is *not* part of C
>
> Arguable. It's not defined by the Standard, I agree. But what is a
> language, if not the set of all sentences that can be formed according
> to the rules of that language? It is certainly possible to call a
> function named sleep(), within the rules of C.

Yes, and the rules of C allow the sleep function to take a double.

> Incidentally, I am not arguing that sleep() is topical.

Indeed. You are arguing terminology and I don't have any problem with
yours. I was just continuing using the terminology the OP used which was
possibly wrong of me. However, my original comment about the use of
sleep was simply that it was not a standard function and seemed
pointless in the code presented, the OP appeared not to have understood
that point based on talking about C only allowing integer sleep times.

It is important for the OP to realise that the sleep function s/he is
using is not one provided by the C language but one provided by his
specific implementation (and a number of other implementations, but not
even all implementations for common desktops).
--
Flash Gordon

Roland Pibinger

Feb 13, 2007, 9:32:48 AM
On Sun, 11 Feb 2007 11:13:34 -0800, robert maas, wrote:
>Per a nice example I found on the Web:
> Linkname: Bullet Proof Integer Input Using strtol()
> URL: http://home.att.net/~jackklein/c/code/strtol.html

The linked code does not reflect the current C Standard:
"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
ERANGE is stored in errno."

Best regards,
Roland Pibinger

Roland Pibinger

Feb 13, 2007, 10:45:16 AM
On Sun, 11 Feb 2007 22:26:27 +0000, Flash Gordon wrote:
>robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:

>> errno = 0;


>> long_var = strtol(chars, &endptr, 10);
>> if (ERANGE == errno) {
>> printf("Number out of range.\n");
>> } else if (endptr==chars) {
>
>At this point you could scan from the start of the string for the first
>character that is not white space and report a different error depending
>on what it is using the is* functions from ctype.h. Alternatively you
>could look at using strspn or strcspn from string.h

You consider leading whitespace an error?

>
>> printf("No number given.\n");
>> } else if ('\0' != *endptr) {
>> printf("After number, extra characters on input line.\n");
>> } else {
>> printf("Looks good? N=%ld\n", long_var);
>> }
>> sleep(1);

IMO, the last part of the function should look like the following:

errno = 0;
long_var = strtol(chars, &endptr, 0);
if (ERANGE == errno) {
    printf("Number out of range.\n");
} else if (endptr==chars) {
    printf("No number or not parsable number given.\n");
} else if ('\0' == *endptr) {
    printf("Looks good? N=%ld\n", long_var);
} else if (endptr != chars) {
    printf("After number, extra characters on input line.\n");
} else {
    printf("Unknown error, should never happen.\n");
}

Best regards,
Roland Pibinger

Flash Gordon

Feb 13, 2007, 10:13:05 AM
Roland Pibinger wrote, On 13/02/07 14:32:

Looks like it allows for that to me. It includes:
if (ERANGE == errno)
{
puts("number out of range\n");
}

Admittedly it does not separate out positive and negative out of range,
but that information is mentioned in the text.
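
Separating the two out-of-range directions could be sketched as below. The names range_status and parse_long_range are hypothetical, and syntax checking (the endptr tests) is deliberately left out to keep the example short.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum range_status { IN_RANGE, TOO_LARGE, TOO_SMALL };

/* Hypothetical helper: converts s with strtol and says which way an
   out-of-range value overflowed. Stores the value in *out when in range. */
enum range_status parse_long_range(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE) {
        /* On overflow strtol returns LONG_MAX or LONG_MIN, which tells
           us the sign of the value that did not fit. */
        return v == LONG_MAX ? TOO_LARGE : TOO_SMALL;
    }
    *out = v;
    return IN_RANGE;
}
```

Testing errno rather than the return value matters here, because LONG_MAX and LONG_MIN are also valid in-range results.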
--
Flash Gordon

Flash Gordon

Feb 13, 2007, 1:01:04 PM
Roland Pibinger wrote, On 13/02/07 15:45:

> On Sun, 11 Feb 2007 22:26:27 +0000, Flash Gordon wrote:
>> robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:
>
>>> errno = 0;
>>> long_var = strtol(chars, &endptr, 10);
>>> if (ERANGE == errno) {
>>> printf("Number out of range.\n");
>>> } else if (endptr==chars) {
>> At this point you could scan from the start of the string for the first
>> character that is not white space and report a different error depending
>> on what it is using the is* functions from ctype.h. Alternatively you
>> could look at using strspn or strcspn from string.h
>
> You consider leading whitespace an error?

Not in this case. Since the OP wanted more specific errors I suggested
scanning for the first non-whitespace character to allow identification
of the character that caused the failure.

>>> printf("No number given.\n");
>>> } else if ('\0' != *endptr) {
>>> printf("After number, extra characters on input line.\n");
>>> } else {
>>> printf("Looks good? N=%ld\n", long_var);
>>> }
>>> sleep(1);
>
> IMO, the last part of the function should look like the following:
>
> errno = 0;
> long_var = strtol(chars, &endptr, 0);
> if (ERANGE == errno) {
> printf("Number out of range.\n");
> } else if (endptr==chars) {
> printf("No number or not parsable number given.\n");

The OP wanted to be more specific in error reporting hence my suggesting
ways of analysing this further.

> } else if ('\0' == *endptr) {
> printf("Looks good? N=%ld\n", long_var);
> } else if (endptr != chars) {

You have already trapped the case when endptr==chars above, so you know
that endptr!=chars if you reach here so I would consider the above test
to be a sign of the coder having not understood what s/he was writing.

> printf("After number, extra characters on input line.\n");
> } else {
> printf("Unknown error, should never happen.\n");

It is guaranteed not to happen!

> }
--
Flash Gordon

Roland Pibinger

Feb 13, 2007, 2:04:58 PM
On Tue, 13 Feb 2007 18:01:04 +0000, Flash Gordon wrote:
>Roland Pibinger wrote, On 13/02/07 15:45:
>> IMO, the last part of the function should look like the following:
>>
>> errno = 0;
>> long_var = strtol(chars, &endptr, 0);
>> if (ERANGE == errno) {
>> printf("Number out of range.\n");
>> } else if (endptr==chars) {
>> printf("No number or not parsable number given.\n");
>
>The OP wanted to be more specific in error reporting hence my suggesting
>ways of analysing this further.
>
>> } else if ('\0' == *endptr) {
>> printf("Looks good? N=%ld\n", long_var);
>> } else if (endptr != chars) {
>
>You have already trapped the case when endptr==chars above, so you know
>that endptr!=chars if you reach here so I would consider the above test
>to be a sign of the coder having not understood what s/he was writing.

... or who wants to make explicit which condition is tested instead of
using a 'catch-all' else block. Since errno, endptr, chars and *endptr
are used in the if statements it's not so easy to correspond those
comparisons to the relevant parts of the strtol specification.

>> printf("After number, extra characters on input line.\n");
>> } else {
>> printf("Unknown error, should never happen.\n");
>
>It is guaranteed not to happen!

I'll replace the line with assert(0).

Best regards,
Roland Pibinger

Flash Gordon

Feb 13, 2007, 3:11:18 PM
Roland Pibinger wrote, On 13/02/07 19:04:

> On Tue, 13 Feb 2007 18:01:04 +0000, Flash Gordon wrote:
>> Roland Pibinger wrote, On 13/02/07 15:45:
>>> IMO, the last part of the function should look like the following:
>>>
>>> errno = 0;
>>> long_var = strtol(chars, &endptr, 0);
>>> if (ERANGE == errno) {
>>> printf("Number out of range.\n");
>>> } else if (endptr==chars) {
>>> printf("No number or not parsable number given.\n");
>> The OP wanted to be more specific in error reporting hence my suggesting
>> ways of analysing this further.
>>
>>> } else if ('\0' == *endptr) {
>>> printf("Looks good? N=%ld\n", long_var);
>>> } else if (endptr != chars) {
>> You have already trapped the case when endptr==chars above, so you know
>> that endptr!=chars if you reach here so I would consider the above test
>> to be a sign of the coder having not understood what s/he was writing.
>
> ... or who wants to make explicit which condition is tested instead of
> using a 'catch-all' else block.

Then why isn't it
else if (endptr != chars && '\0' != *endptr && errno != ERANGE)

> Since errno, endptr, chars and *endptr
> are used in the if statements it's not so easy to correspond those
> comparisons to the relevant parts of the strtol specification.

It is very easy. It is even easier to see that you have a redundant
if because you have already checked for the opposite condition and your
final if only muddies the waters.

I can see no good reason to test for a COND and !COND in a simple if
chain such as this.

>>> printf("After number, extra characters on input line.\n");
>>> } else {
>>> printf("Unknown error, should never happen.\n");
>> It is guaranteed not to happen!
>
> I'll replace the line with assert(0).

Slightly better would be to get rid of the last if above and just use an
else and then put an appropriate assert in the else clause. However, if
you are going to assert anything at all there is still the question why
you don't assert everything.
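
One way to read that suggestion, as a sketch only: the chain ends in a plain else, and the assert documents the condition that must hold there. The function classify and the enum parse_result are hypothetical names; base 10 is assumed, as in the originally posted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum parse_result { OUT_OF_RANGE, NO_NUMBER, GOOD, TRAILING_CHARS };

/* Hypothetical wrapper around the if chain discussed above. */
enum parse_result classify(const char *chars, long *out)
{
    char *endptr;
    long v;

    errno = 0;
    v = strtol(chars, &endptr, 10);
    if (ERANGE == errno) {
        return OUT_OF_RANGE;
    } else if (endptr == chars) {
        return NO_NUMBER;
    } else if ('\0' == *endptr) {
        *out = v;
        return GOOD;
    } else {
        /* The earlier branches already excluded the other cases, so the
           assert states what is known here instead of re-testing it. */
        assert(endptr != chars && '\0' != *endptr);
        return TRAILING_CHARS;
    }
}
```

With NDEBUG defined the assert disappears and the else branch behaves the same.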
--
Flash Gordon

robert maas, see http://tinyurl.com/uh3t

Feb 13, 2007, 8:41:00 PM
> From: Flash Gordon <s...@flash-gordon.me.uk>
> >>> ... The first time I ran this
> >>> program, without the sleep call, and pressed ctrl-D to generate
> >>> end-of-stream on stdin, the program went into infinite read-EOS
> >>> spew-text loop, which filled up all modem buffers. ...
> >> I can only suggest that you had some other bug in your program at that
> >> point or a bug in your modem. As presented your program would not do
> >> that whether it detected an error or EOF it would break out of the loop
> >> and terminate.
> > Not a bug. It's just that the part of the program to detect EOF wasn't yet
> > written, and that's the very part I was trying to develop.
> So it still was not needed in the program you posted.

That depends on how you think of the program. If it had been
intended as a standalone program to distribute to others, then the
sleep could be regarded as a "dunsel" (Star Trek TOS jargon), i.e.
a part that serves no useful function. However in fact it was just
a test rig to develop modules which would later be installed
primarily in a CGI environment (where the toplevel stdin test loop
would not be present at all). As a test rig, where I might at any
time add new buggy code that might produce infinite spew, whereby
I'd need protection from modem-buffer disaster, it was quite
appropriate for the sleep to be in the toplevel loop at all times.
What was posted was just the current version of that test rig at
the moment I posted. But in fact that sleep would be present in
*any* version of that test rig at any time after I encountered the
modem-buffer disaster and consequently took precautions against it
ever happening again in any version of that test rig or any other
test rig descended from it.

If anyone happens to like my program enough to copy it and use it
themselves, but doesn't like the sleep in it, feel free to remove
it, but then don't complain to me if you subsequently try to modify
the program in other ways and introduce a bug and fill up your
modem buffers or even worse fill up all free swap space on your PC
and crash the OS and can't re-boot. (YMMV)

> > Step 1: Put in a printf to see what value comes back when I press ctrl-D.
> > Step 2: Write code to detect that value and break out of loop.
> > Step 3: Test that to see whether it works.
> > Step 4: Remove the printf.
> > Unfortunately step 1 blew me out for ten minutes or so without the sleep.
> That is because it is the wrong approach
> 1) read the documentation to see what the correct way to do it is
> 2) write the code
> 3) test it

That's not good development technique. Documentation often is
misunderstood. If your approach is followed, your program might
have a subtle bug where you're not getting the value you thought
you're getting but you have the test written backwards or otherwise
wrong and for the cases you tested your multiple mistakes are
covering for each other making the program "work" despite being
totally wrongly written.

It's best to read the documentation (as I did, but didn't include in
the steps of actual program development, sorry if you assumed
contrary to fact), and then install both the call to whatever
library routine *and* a printf of the return value, then look at
the output to see if it conforms to how you read the documentation
to mean, and if so then proceed to write the test on that basis.
But if the return value doesn't agree with what you thought the
documentation said, you need to consider various alternatives:
- You aren't calling the correct function because you loaded the
wrong library.
- You are calling the correct function in the wrong way (as
happened to me the first time I tried strtoll, see other thread).
- You misunderstood the documentation.

Once you are sure the function returns the value you expect in all
test cases that cover in-range and out-of-range cases as well as
carefully constructed right-at-edge-of-range cases, if any of that
makes sense for the given function, *then* it's time to write the
test to distinguish between the various classes of results as you
now *correctly* understand them based on agreement between your
reading of documentation and your live tests.

So in this case, calling fgets, I needed to test all these cases:
- Empty input: NonNull return value, Buffer contains EOL NUL
- Normal input: NonNull return value, Buffer contains chars EOL NUL
- Input that overruns buffer: NonNull return value, Buffer contains chars NUL
- Abort via end-of-stream generated via ctrl-D: NULL return value.
- Abort via ctrl-C: Program aborts to shell all by itself.
- Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.
One case (buffer overrun) I really needed to see for myself,
because the documentation didn't make it clear whether fgets would
omit the NUL so it could fill the entire buffer with data to cram
it all in and not lose that last character, or truncate the data
one shorter to guarantee a NUL was there. In fact the latter
occurs. But I was prepared to force a NUL there, overwriting the
last byte, if fgets had done the first instead. One thing I *did*
have to do is check whether the last character before the NUL was
EOL or not, and clobber it to NUL (shortening string by one
character per c's NUL-terminated-byte-array convention for
emulating "strings") only if it was EOL, so the string seen by the
rest of the program would consistently *not* have the EOL
character.
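
The newline handling described above can be sketched as follows. The C standard specifies that fgets reads at most one character fewer than the buffer size and always writes a terminating NUL after what it read, so the string is terminated even when the line is too long. The names read_line and line_status are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_EOF, LINE_COMPLETE, LINE_PARTIAL };

/* Hypothetical helper: reads one line from in into buf (size bytes).
   LINE_COMPLETE: a whole line was read and its newline removed.
   LINE_PARTIAL:  no newline was seen (line longer than the buffer, or
                  the input ended without one); buf holds what was read.
   LINE_EOF:      nothing was read (end of file or read error). */
enum line_status read_line(FILE *in, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* strip the newline only if present */
        return LINE_COMPLETE;
    }
    return LINE_PARTIAL;
}
```

After LINE_PARTIAL the rest of an over-long line is still unread and will come back on the next call.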

Now not all those cases are actually necessary for the end purpose
of this program, developing a module intended for CGI usage, but
it's nice to know how the basic terminal-input routine of my
stdin test rig performs for *all* inputs before making extensive
use of it for *anything*. I don't want confusion later where I
don't know whether strange results are due to bug in test rig or
bug in the actual module I'm trying to develop.

> > Unfortunately c doesn't allow any sleep times except integers. ...
> Wrong. C does not allow *any* sleeping. ...

Let me re-phrase that: The sleep function provided by the library
whose header is unistd.h doesn't allow any sleep times except
integers. Now are you happy?

robert maas, see http://tinyurl.com/uh3t

Feb 13, 2007, 9:07:15 PM
> From: CBFalconer <cbfalco...@yahoo.com>

> > Step 1: Put in a printf to see what value comes back when I press
> > ctrl-D.
> What for? You have a macro called EOF available. Use it.

Even if I were to take advice, I'd *still* as a first step put in a
printf to tell me the value of EOF and also the returned value so I
can see if they really are the same when I press ctrl-D.

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

It doesn't sound like comparing the return value with EOF is a good
way to diagnose what really happened. If I ever decide this needs
fixing, I'll fix it by checking both feof and ferror, not by
comparing with EOF. (Or compare with EOF as a first pass, then if
that matches go ahead and check both feof and ferror to see which
sub-case applies.)
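
The two-stage check in that parenthesis might be sketched like this, assuming input is read with fgets (which signals failure with NULL rather than EOF). The names count_lines and stop_reason are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum stop_reason { STOP_EOF, STOP_ERROR, STOP_UNKNOWN };

/* Hypothetical helper: counts newline-terminated lines on in and reports
   why reading stopped. */
long count_lines(FILE *in, enum stop_reason *why)
{
    char buf[256];
    long lines = 0;

    /* First pass: the return value of the read call ends the loop. */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strchr(buf, '\n') != NULL)
            lines++;
    }
    /* Second pass: only after a failed read do feof and ferror say
       which sub-case applies. */
    if (ferror(in))
        *why = STOP_ERROR;
    else if (feof(in))
        *why = STOP_EOF;
    else
        *why = STOP_UNKNOWN;
    return lines;
}
```

A final line without a newline is read but not counted in this sketch.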

robert maas, see http://tinyurl.com/uh3t

Feb 13, 2007, 9:12:05 PM
> From: rpbg...@yahoo.com (Roland Pibinger)

> The linked code does not reflect the current C Standard:
> "If the correct value is outside the range of representable values,
> LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
> ERANGE is stored in errno."

Hmm, indeed I seem to recall seeing that in the specs as I was
researching this before deciding to use strtoll instead. I'll have
to take a look at that someday when I have time.

robert maas, see http://tinyurl.com/uh3t

Feb 13, 2007, 9:32:05 PM
> From: rpbg...@yahoo.com (Roland Pibinger)

> IMO, the last part of the function should look like the following:
> errno = 0;
> long_var = strtol(chars, &endptr, 0);
> if (ERANGE == errno) {
> printf("Number out of range.\n");
> } else if (endptr==chars) {
> printf("No number or not parsable number given.\n");
> } else if ('\0' == *endptr) {
> printf("Looks good? N=%ld\n", long_var);
> } else if (endptr != chars) {
> printf("After number, extra characters on input line.\n");
> } else {
> printf("Unknown error, should never happen.\n");
> }

Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.

Part of my decision was that whitespace allowance should be
symmetric. It should be allowed before iff allowed after. strtol is
asymmetric in this respect, allowing whitespace before (and
rejecting stray non-white text before), but failing to distinguish
between trailing whitespace (OK) and trailing junk (Not OK), either
rejecting both (if caller checks to make sure the final pointer
matches end of string), or accepting both (if caller doesn't make
that check).

There's so much that strtol fails to check the way I want, that
it's best to just not use it at all for preliminary syntax
checking, so I ended up writing my own code, which first version
was ugly, but second version is pretty clean, making liberal use of
strspn and strcspn, which I didn't know about until after I had
already written that ugly first version (and translated it to
equally ugly c++), and then gone ahead to write clean lisp and java
versions, and then also gone ahead to write regex stuff for perl
and PHP, and finally I came back to look at the ugly C to see if I
might make it less ugly.
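
For readers who want a picture of the approach, here is a minimal sketch of a syntax check for ^ *[-+]?[0-9]+ *$ built on strspn alone. It is not the code at the linked page; the names check_int_syntax and int_syntax, and the particular error classes, are assumptions made for illustration.

```c
#include <string.h>

enum int_syntax {
    SYN_OK,           /* matches ^ *[-+]?[0-9]+ *$ */
    SYN_EMPTY,        /* empty or only spaces */
    SYN_JUNK_BEFORE,  /* something other than sign or digit comes first */
    SYN_NO_DIGITS,    /* a sign with no digit directly after it */
    SYN_JUNK_AFTER    /* non-space text after the digits */
};

/* Hypothetical checker: classifies s without converting it. */
enum int_syntax check_int_syntax(const char *s)
{
    const char *p = s + strspn(s, " ");     /* skip leading spaces */
    size_t ndigits;
    int has_sign = 0;

    if (*p == '\0')
        return SYN_EMPTY;
    if (*p == '+' || *p == '-') {           /* optional sign */
        has_sign = 1;
        p++;
    }
    ndigits = strspn(p, "0123456789");      /* length of the digit run */
    if (ndigits == 0)
        return has_sign ? SYN_NO_DIGITS : SYN_JUNK_BEFORE;
    p += ndigits;
    p += strspn(p, " ");                    /* skip trailing spaces */
    return *p == '\0' ? SYN_OK : SYN_JUNK_AFTER;
}
```

Only the space character counts as whitespace here, to match the regular expression; strcspn could be added to report where the offending text starts.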

Your advice to use strtol to do the preliminary syntax check wasn't
good, but in an indirect way it helped, because searching for
documentation for strtol accidentally turned up the documentation for
strtoll and for strspn and strcspn.

CBFalconer

Feb 14, 2007, 1:55:17 AM
"robert maas, see http://tinyurl.com/uh3t" wrote:
>> From: CBFalconer <cbfalco...@yahoo.com>
>
>>> Step 1: Put in a printf to see what value comes back when I press
>>> ctrl-D.
>>
>> What for? You have a macro called EOF available. Use it.
>
> Even if I were to take advice, I'd *still* as a first step put in a
> printf to tell me the value of EOF and also the returned value so I
> can see if they really are the same when I press ctrl-D.

You NEVER need to know the value of EOF. You simply need to know
that it is negative, and outside the range of char, especially
unsigned char. This is why you usually receive chars in an int.
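
A short sketch of that point, with a made-up helper name count_chars: the character is received in an int and compared with the EOF macro, so a byte with value 0xFF is never confused with end of input.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining on in. */
long count_chars(FILE *in)
{
    int c;      /* int, not char: holds every unsigned char value and EOF */
    long n = 0;

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

Had c been declared as char, the comparison with EOF could either match a valid byte or never succeed, depending on whether char is signed.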

>
> But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
>
> Many of the functions described in this chapter return the value of
> the macro EOF to indicate unsuccessful completion of the operation.
> Since EOF is used to report both end of file and random errors, it's
> often better to use the feof function to check explicitly for end of
> file and ferror to check for errors.

WRONG. Those functions are to distinguish between error and
physical EOF when some input routine actually returns EOF. By the
time feof has shown up it is too late to control use of the input
data. C is unlike Pascal in this respect.

BTW, please do not strip attributions for material you quote.

CBFalconer

Feb 14, 2007, 1:59:27 AM
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
... snip ...

>
> Part of my decision was that whitespace allowance should be
> symmetric. It should be allowed before iff allowed after. strtol
> is asymmetric in this respect, allowing whitespace before (and
> rejecting stray non-white text before), but failing to distinguish
> between trailing whitespace (OK) and trailing junk (Not OK),
> either rejecting both (if caller checks to make sure the final
> pointer matches end of string), or accepting both (if caller
> doesn't make that check).

Not so. The returned value of endptr simply allows the user to
make that decision for himself.

Richard Heathfield

Feb 14, 2007, 3:12:23 AM
robert maas, see http://tinyurl.com/uh3t said:

>> From: Flash Gordon <s...@flash-gordon.me.uk>
>
>> [...] C does not allow *any* sleeping. ...


>
> Let me re-phrase that: The sleep function provided by the library
> whose header is unistd.h doesn't allow any sleep times except
> integers. Now are you happy?

"If something isn't in the a standard library for C, then it doesn't
exist for the purpose of this project." - robert maas, in the article
starting this thread.

<unistd.h> is not a standard header, and none of the functions for which
it is required to be included are in the standard library. Therefore,
by your own argument, the sleep function you are talking about does not
exist.

Flash Gordon

Feb 14, 2007, 3:52:46 AM
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 01:41:

>> From: Flash Gordon <s...@flash-gordon.me.uk>
>>>>> ... The first time I ran this
>>>>> program, without the sleep call, and pressed ctrl-D to generate
>>>>> end-of-stream on stdin, the program went into infinite read-EOS
>>>>> spew-text loop, which filled up all modem buffers. ...
>>>> I can only suggest that you had some other bug in your program at that
>>>> point or a bug in your modem. As presented your program would not do
>>>> that whether it detected an error or EOF it would break out of the loop
>>>> and terminate.
>>> Not a bug. It's just that the part of the program to detect EOF wasn't yet
>>> written, and that's the very part I was trying to develop.
>> So it still was not needed in the program you posted.
>
> That depends on how you think of the program. If it had been

<snip>

I think of programs as presented. As presented there was no reason for
the sleep.

> If anyone happens to like my program enough to copy it and use it
> themselves, but doesn't like the sleep in it, feel free to remove
> it, but then don't complain to me if you subsequently try to modify
> the program in other ways and introduce a bug and fill up your
> modem buffers or even worse fill up all free swap space on your PC
> and crash the OS and can't re-boot. (YMMV)

None of those would give me a problem. Even if it was possible for one
of those to give me a problem I would not need the sleep function.

You might want to find out how to use a debugger on your system, then
you can step through the code when you are not sure about it as part of
your testing.

>>> Step 1: Put in a printf to see what value comes back when I press ctrl-D.
>>> Step 2: Write code to detect that value and break out of loop.
>>> Step 3: Test that to see whether it works.
>>> Step 4: Remove the printf.
>>> Unfortunately step 1 blew me out for ten minutes or so without the sleep.
>> That is because it is the wrong approach
>> 1) read the documentation to see what the correct way to do it is
>> 2) write the code
>> 3) test it
>
> That's not good development technique.

True, I should have included some earlier steps such as analysing the
requirements & designing the software.

> Documentation often is
> misunderstood.

My experience is that the above applies to people who think that
experimenting with a function is a good way to find out about it. It
does not in my experience apply to those who believe the best way to
find out is to read the documentation.

> If your approach is followed, your program might
> have a subtle bug where you're not getting the value you thought
> you're getting but you have the test written backwards or otherwise
> wrong and for the cases you tested your multiple mistakes are
> covering for each other making the program "work" despite being
> totally wrongly written.

That is what testing is for. You feed in as much data (in the loosest
sense) as practical carefully crafted to do your damnedest to break the
code and thus find what is wrong with it.

You said in your post that the way to do it was basically to experiment
with the function.

> It's best to read the documentation (as I did, but didn't include in
> the steps of actual program development, sorry if you assumed
> contrary to fact),

I can only go on what you actually post.

> and then install both the call to whatever
> library routine *and* a printf of the return value, then look at
> the output to see if it conforms to how you read the documentation
> to mean, and if so then proceed to write the test on that basis.
> But if the return value doesn't agree with what you thought the
> documentation said, you need to consider various alternatives:
> - You aren't calling the correct function because you loaded the
> wrong library.
> - You are calling the correct function in the wrong way (as
> happened to me the first time I tried strtoll, see other thread).
> - You misunderstood the documentation.

Testing your program will find all of these. Well, it will if you test
it properly.

> Once you are sure the function returns the value you expect in all
> test cases that cover in-range and out-of-range cases as well as
> carefully constructed right-at-edge-of-range cases, if any of that
> makes sense for the given function, *then* it's time to write the
> test to distinguish between the various classes of results as you
> now *correctly* understand them based on agreement between your
> reading of documentation and your live tests.
>
> So in this case, calling fgets, I needed to test all these cases:
> - Empty input: NonNull return value, Buffer contains EOL NUL
> - Normal input: NonNull return value, Buffer contains chars EOL NUL
> - Input that overruns buffer: NonNull return value, Buffer contains chars NUL
> - Abort via end-of-stream generated via ctrl-D: NULL return value.
> - Abort via ctrl-C: Program aborts to shell all by itself.
> - Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.

All that would have been in the test set for testing your program so
having read the documentation and written the relevant module you would
test and see that it worked as expected, including the program
gracefully handling "garbage" input.

> One case (buffer overrun) I really needed to see for myself,
> because the documentation didn't make it clear whether fgets would
> omit the NUL so it could fill the entire buffer with data to cram
> it all in and not lose that last character, or truncate the data
> one shorter to guarantee a NUL was there. In fact the latter
> occurs.

If you cannot understand the documentation you have available that is
the time to ask those with more experience/knowledge. Had you at that
point posted here saying that it was not clear from the documentation
you have then someone here would clarify it for you.

> But I was prepared to force a NUL there, overwriting the
> last byte, if fgets had done the first instead. One thing I *did*
> have to do is check whether the last character before the NUL was
> EOL or not, and clobber it to NUL (shortening string by one
> character per c's NUL-terminated-byte-array convention for
> emulating "strings") only if it was EOL, so the string seen by the
> rest of the program would consistently *not* have the EOL
> character.

So, as you cannot tell that from your documentation how do you know that
behaviour is not specific to your implementation and might not change
when a patch is installed on the machine later today?

> Now not all those cases are actually necessary for the end purpose
> of this program, developing a module intended for CGI usage, but
> it's nice to know how the basic terminal-input routine of my
> stdin test rig performs for *all* inputs before making extensive
> use of it for *anything*. I don't want confusion later where I
> don't know whether strange results are due to bug in test rig or
> bug in the actual module I'm trying to develop.

So you test your test rig once you have written it.

>>> Unfortunately c doesn't allow any sleep times except integers. ...
>> Wrong. C does not allow *any* sleeping. ...
>
> Let me re-phrase that: The sleep function provided by the library
> whose header is unistd.h doesn't allow any sleep times except
> integers. Now are you happy?

Yes.

Understanding what is part of C and what is not is important so that you
can isolate the system specifics and know what will have to be changed
to run the program on some other system.
--
Flash Gordon

Roland Pibinger

Feb 14, 2007, 8:29:40 AM
On Tue, 13 Feb 2007 18:32:05 -0800, robert maas wrote:

>> From: rpbg...@yahoo.com (Roland Pibinger)
>> IMO, the last part of the function should look like the following:
>> errno = 0;
>> long_var = strtol(chars, &endptr, 0);
>> if (ERANGE == errno) {
>> printf("Number out of range.\n");
>> } else if (endptr==chars) {
>> printf("No number or not parsable number given.\n");
>> } else if ('\0' == *endptr) {
>> printf("Looks good? N=%ld\n", long_var);
>> } else if (endptr != chars) {
>> printf("After number, extra characters on input line.\n");
>> } else {
>> printf("Unknown error, should never happen.\n");
>> }
>
>Before I started this task, I made a design decision that
>whitespace before and/or after the number is fine, but any other
>stray character not part of the [optionalSign] oneOrMoreDigits is
>an error. Your advice is inconsistent with the part of the decision
>whereby trailing whitespace is fine.

Ok, in your original code you did not distinguish between (allowed)
trailing whitespace and (not allowed) extra characters:

} else if ('\0' != *endptr) {
printf("After number, extra characters on input line.\n");

>Part of my decision was that whitespace allowance should be
>symmetric. It should be allowed before iff allowed after. strtol is
>asymmetric in this respect, allowing whitespace before (and
>rejecting stray non-white text before), but failing to distinguish
>between trailing whitespace (OK) and trailing junk (Not OK), either
>rejecting both (if caller checks to make sure the final pointer
>matches end of string), or accepting both (if caller doesn't make
>that check).

Here is a 'symmetric' version that allows for leading and trailing
whitespace but not for 'stray non-white text':

errno = 0;
long_var = strtol(chars, &endptr, 0);
if (ERANGE == errno) {
    printf("Number out of range.\n");
} else if (endptr==chars) {
    printf("Not a (parsable) number given.\n");
} else {
    while (isspace (*endptr)) { // trailing whitespace?
        ++endptr;
    }
    if ('\0' == *endptr) {
        printf("Looks good? N=%ld\n", long_var);
    } else {
        printf("After number, invalid extra characters on input line.\n");
    }
}

I hope that this is now a 100% solution. I agree that strtol is a good
example of how not to design a function interface.
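
For reference, the fragment above can be wrapped into a self-contained function roughly as follows. This is only a sketch under some assumptions: the names parse_status and parse_long are invented for illustration, the base is fixed at 10 (with base 0, as in the fragment, strtol would also accept hex such as "0x1F" and treat "010" as octal), and the argument to isspace is cast to unsigned char so that negative char values are handled safely. Note that strtol skips any leading isspace character, which is broader than the plain spaces in the regular expression from the first post.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, named for this sketch only. */
enum parse_status {
    PARSE_OK,
    PARSE_RANGE,
    PARSE_NO_NUMBER,
    PARSE_TRAILING_JUNK
};

/* Parse one decimal integer surrounded by optional whitespace.
   On PARSE_OK the converted value is stored in *out. */
enum parse_status parse_long(const char *chars, long *out)
{
    char *endptr;
    long value;

    errno = 0;
    value = strtol(chars, &endptr, 10);      /* base 10: decimal only */
    if (endptr == chars)
        return PARSE_NO_NUMBER;              /* nothing was converted */
    if (errno == ERANGE)
        return PARSE_RANGE;                  /* value did not fit in a long */
    while (isspace((unsigned char)*endptr))
        ++endptr;                            /* skip trailing whitespace */
    if (*endptr != '\0')
        return PARSE_TRAILING_JUNK;          /* something other than whitespace */
    *out = value;
    return PARSE_OK;
}
```

A caller would map each status to its own message, for example printing "After number, invalid extra characters on input line." for PARSE_TRAILING_JUNK.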

Best regards,
Roland Pibinger

robert maas, see http://tinyurl.com/uh3t

Feb 14, 2007, 3:23:24 PM
> From: rpbg...@yahoo.com (Roland Pibinger)

> >Before I started this task, I made a design decision that
> >whitespace before and/or after the number is fine, but any other
> >stray character not part of the [optionalSign] oneOrMoreDigits is
> >an error. Your advice is inconsistent with the part of the decision
> >whereby trailing whitespace is fine.
> Ok, in your original code you did not distinguish between (allowed)
> trailing whitespace and (not allowed) extra characters:

I don't believe you've even looked at my original code.
Do you remember seeing this function definition?
/* Given a string (nul-term), and index where digits ended,
   scan to very end making sure no junk, return code:
   garafnum = garbage after number */
enum errcode strchkint4(char* str, int* pix) {
    char ch;
    while (1) {
        ch = str[*pix];
        if ((0 == ch) || ('\n' == ch)) {
            /* printf("At ix=%d, ch=%c, nul/eol reached.\n", *pix, ch); */
            return(0);
        }
        else if (' ' == ch) {
            /* printf("At ix=%d, ch=%c, skip white.\n", *pix, ch); */
            (*pix)++;
        }
        else {
            /* printf("At ix=%d, ch=%c, junk.\n", *pix, ch); */
            return(garafnum);
        }
    }
}
If you don't remember seeing that code, then you haven't looked at
the original code I wrote for the C implementation of this task,
because *that* is the relevant original code.

> } else if ('\0' != *endptr) {
> printf("After number, extra characters on input line.\n");

You're totally confused. That's not my original code at all.
Here's the chronology:
-1- Original code, such as the piece I posted above.
-2- Translation of original code to C++, which can be found here:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intcpp>
-3- Complete re-write in Common Lisp.
-4- Translation of lisp version to java.
-5- Complete re-write in perl.
-6- Translation of perl version to PHP.
-7- Getting advice to try strtol.
-8- Researching strtol, discovering strtoll which is better.
-9- Trying strtoll in test rig, having trouble.
-A- Getting advice about why strtoll didn't work for me.
-B- Fixing test rig to use strtoll correctly, but being dissatisfied
because it fails to distinguish between trailing whitespace and
trailing junk.
-C- Discovering strspn and strcspn.
-D- Translating lisp/java version to c using strspn and strcspn,
using strtoll only after the syntax check has already been
completed.
-E- Your confusion between the first version -1- using while loop
and something somewhere from -9- to the end using strtol[l].

> I agree that strtol is a good example of how not to design a
> function interface.

At least we're in agreement about that one thing!

There's still the policy decision whether to show absolute
beginners how to write their own code, such as scanning for the
first character that matches or doesn't match a bag of some type,
using position-if and position-if-not in Common Lisp or strspn and
strcspn in C, or just call a magic genie which does almost what you
want but screws up in one aspect requiring a post-call fixup to
make the result 100% correct. At the moment, I prefer the scanning
method in all languages except perl and PHP, because it's
symmetric, and easily translatable between four languages rather
than special to just one add-on library of one language. In perl and
PHP I'm presently using regular expressions, a sort of "magic
genie" but without the design flaw that strtol[l] have, because (1)
they are nicely integrated into the language, no hassle to use
them, and (2) they are in fact advertised as a primary reason to
use those languages so I might as well show off such usage when I'm
comparing how to do the same task in all six languages.

On the other hand, that's slightly moot for this specific purpose,
which was merely to extract a numeric value from a HTML FORM field
string in the safest way possible, so that the numeric value could
then be used in the actual sample code fragment, which I haven't
started writing yet.
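
To make the scanning method concrete, here is a minimal sketch of a syntax check built on strspn and strcspn for the pattern ^ *[-+]?[0-9]+ *$ from the first post. It is not the code on the web page: the names scan_status and scan_integer and the particular set of diagnostic codes are assumptions made up for this illustration. It only checks syntax; conversion (for example with strtoll) would happen afterwards.

```c
#include <string.h>

/* Hypothetical diagnostic codes for this sketch. */
enum scan_status {
    SCAN_OK,
    SCAN_EMPTY,        /* nothing but spaces */
    SCAN_JUNK_BEFORE,  /* a digit exists, but junk precedes it */
    SCAN_NO_DIGITS,    /* no digit anywhere */
    SCAN_JUNK_AFTER    /* junk after the digits */
};

/* Check str against  ^ *[-+]?[0-9]+ *$  without converting it. */
enum scan_status scan_integer(const char *str)
{
    static const char digits[] = "0123456789";
    size_t i = strspn(str, " ");             /* length of leading spaces */
    size_t ndigits;

    if (str[i] == '\0')
        return SCAN_EMPTY;
    if (str[i] == '+' || str[i] == '-')
        ++i;                                 /* optional sign */
    ndigits = strspn(str + i, digits);       /* length of the digit run */
    if (ndigits == 0) {
        /* No digit here: look ahead to tell "junk before number"
           apart from "no number at all". */
        size_t skip = strcspn(str + i, digits);
        return str[i + skip] != '\0' ? SCAN_JUNK_BEFORE : SCAN_NO_DIGITS;
    }
    i += ndigits;
    i += strspn(str + i, " ");               /* trailing spaces */
    return str[i] == '\0' ? SCAN_OK : SCAN_JUNK_AFTER;
}
```

With this sketch, " +7 " gives SCAN_OK, "x12" gives SCAN_JUNK_BEFORE, "abc" gives SCAN_NO_DIGITS, and "12 3" gives SCAN_JUNK_AFTER.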

If anyone is curious about the overall project (multi-language
"cookbook" in form of matrix per one or two datatypes that each
operation/function deals with), I've finished all the built-in c
and c++ operators, and their Common Lisp equivalents, and now I'm
doing the c libraries, starting with ctype.h where I'm about
halfway finished. See toplevel "cookbook" file:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
click on chapter 3 skeleton in progress.

Flash Gordon

Feb 14, 2007, 4:47:23 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23:
>> From: rpbg...@yahoo.com (Roland Pibinger)

<snip>

>> I agree that strtol is a good example of how not to design a
>> function interface.
>
> At least we're in agreement about that one thing!
>
> There's still the policy decision whether to show absolute
> beginners how to write their own code, such as scanning for the
> first character that matches or doesn't match a bag of some type,
> using position-if and position-if-not in Common Lisp or strspn and
> strcspn in C, or just call a magic genie which does almost what you
> want but screws up in one aspect requiring a post-call fixup to
> make the result 100% correct.

From your perspective it might "screw up" one aspect, but that is
because you are assuming the string is meant to have only one data item.
strtol and friends are designed on the basis that you might want to pass
the rest of the string to something else, so they tell you where to
start. In your case that is looking to see if the remainder is white
space or not, but sometime people might be doing other things.

> At the moment, I prefer the scanning
> method in all languages except perl and PHP, because it's
> symmetric, and easily translatable between four languages rather
> than special to just one add-on library of one language. In perl and
> PHP I'm presently using regular expressions, a sort of "magic
> genie" but without the design flaw that strtol[l] have, because (1)
> they are nicely integrated into the language, no hassle to use
> them, and (2) they are in fact advertised as a primary reason to
> use those languages so I might as well show off such usage when I'm
> comparing how to do the same task in all six languages.
>
> On the other hand, that's slightly moot for this specific purpose,
> which was merely to extract a numeric value from a HTML FORM field
> string in the safest way possible, so that the numeric value could
> then be used in the actual sample code fragment, which I haven't
> started writing yet.

Personally I would still go with strtol[l] and then check whether the
trailing data is white space or not.

> If anyone is curious about the overall project (multi-language
> "cookbook" in form of matrix per one or two datatypes that each
> operation/function deals with), I've finished all the built-in c
> and c++ operators, and their Common Lisp equivalents, and now I'm
> doing the c libraries, starting with ctype.h where I'm about
> halfway finished. See toplevel "cookbook" file:
> <http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
> click on chapter 3 skeleton in progress.

Looking at some of the earlier stuff you have work to do there as well.
The hello world programs in C are using implicit int for main which is
not allowed in the latest standard, the web one fails to include stdio.h
which is required (unless you want to do the work of providing your own
prototype), and one of them is a deliberately obfuscated program which
relies on ASCII which the C standard does not guarantee.

This from your "CookBook" is wrong for C95 and earlier, and since you
use implicit int all over the place you are not using C99:

| In c, each function definition is supposed to be before the first time
| it is called. That's because the compiler works forward through the
| file checking each fuction-call to make sure the function is defined,
| and generates an error message immediately when it sees a attempt to
| call a function that isn't defined.

So is this prototype you show "int g2(int n1,n2);"

There are several other errors.

I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.
--
Flash Gordon

Keith Thompson

Feb 14, 2007, 7:13:45 PM
Flash Gordon <sp...@flash-gordon.me.uk> writes:
> robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23:
[snip]

> This from your "CookBook" is wrong for C95 and earlier, and since you
> use implicit int all over the place you are not using C99:
>
> | In c, each function definition is supposed to be before the first time
> | it is called. That's because the compiler works forward through the
> | file checking each fuction-call to make sure the function is defined,
> | and generates an error message immediately when it sees a attempt to
> | call a function that isn't defined.
>
> So is this prototype you show "int g2(int n1,n2);"
>
> There are several other errors.

Including the use of "defined" rather than "declared". A function
call requires a declaration for the called function; it doesn't
require a definition. (That's in C99; C90 allows calls without
declarations, but providing declarations, preferably prototypes, is
still an excellent idea.)
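
A small illustration of the distinction, as a sketch only: is_even and is_odd are invented names, and the body given to g2 is made up because the CookBook's definition is not shown in this thread. The prototypes at the top are declarations; they are all the compiler needs in order to check a call, even though the definitions come later in the file. Note also that each parameter in a prototype carries its own type.

```c
/* Declarations (prototypes): enough for the compiler to check calls. */
int is_even(unsigned n);
int is_odd(unsigned n);
int g2(int n1, int n2);      /* each parameter needs its own type */

/* Definitions. is_even calls is_odd before is_odd is defined,
   which is fine because is_odd has already been declared. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}

/* Hypothetical body, for illustration only. */
int g2(int n1, int n2)
{
    return n1 + n2;
}
```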

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

robert maas, see http://tinyurl.com/uh3t

Feb 14, 2007, 7:24:47 PM
> From: Flash Gordon <s...@flash-gordon.me.uk>

> you are assuming the string is meant to have only one data item.

Yes, that's the situation here, when validating the contents of a
single HTML-FORM text field, which is supposed to contain exactly
the representation of one integer using decimal notation,
optionally with whitespace around it either/both way(s).

> strtol and friends are designed on the basis that you might want
> to pass the rest of the string to something else, so they tell
> you where to start.

So basically you make sure you've gobbled everything preceding the
item of interest, except whitespace, then you call the function,
which skips the leading whitespace and gobbles the item of
interest, leaving any trailing whitespace and any items of later
interest. So whitespace is treated in an asymmetrical manner, and
at the very end of a chain of [white]* [item]! parsing you have a
single [white]* [null] parser just to verify somebody didn't leave
more useful items that haven't been gobbled?

I'll have to remember that paradigm if and when I ever ask a user
to type in more than one item on a single line, such as if I ever
write a CGI-accessible Sudoku solver where a whole row is entered
in a single text field.

Thanks for explaining that other input paradigm, sorta like scanf
but more robust.
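
As I understand that paradigm, a chain of calls for a multi-item line might look something like the sketch below. The function name parse_row and its error convention (returning -1) are assumptions invented for this example. Each strtol call skips its own leading whitespace and reports where it stopped, and a single whitespace-then-NUL check at the very end catches leftover junk. One caveat: items are not required to be separated by whitespace, so "12-3" would be read as 12 followed by -3.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse decimal integers from line, each optionally preceded by
   whitespace, storing them in out[].  Returns the number of items
   stored, or -1 on trailing junk, overflow, or more than max items.
   (Hypothetical helper for this sketch.) */
int parse_row(const char *line, long *out, int max)
{
    const char *p = line;
    int count = 0;

    for (;;) {
        char *end;
        long value;

        errno = 0;
        value = strtol(p, &end, 10);   /* skips leading whitespace itself */
        if (end == p)
            break;                     /* no further number at this point */
        if (errno == ERANGE || count == max)
            return -1;
        out[count++] = value;
        p = end;                       /* resume where strtol stopped */
    }
    while (isspace((unsigned char)*p))
        ++p;                           /* the final [white]* [null] check */
    return *p == '\0' ? count : -1;
}
```

For a row typed as "5 3 0 7" this stores four values and returns 4, while "5 3 x" returns -1 because of the stray "x".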

> In your case that is looking to see if the remainder is white
> space or not, but sometime people might be doing other things.

Yes. If I wanted to fit my single-item syntax-check into that
multi-item-chain paradigm, I'd have to do it like you suggested in
an earlier message. But unfortunately when it says "no number
present" it really means "no number *immediately* present at start
of line, ignoring optional whitespace". So to satisfy my spec, that
would have to be sub-cased, where if it hits the no-number
condition I'll have to scan for a digit anyway to separate the
sub-cases of junk-before-number and truly-no-number-anywhere.

For now I still like the strspn and strcspn version best for the
current application. But thanks for the explanation of the other
paradigm that I might use for another application someday.

> Looking at some of the earlier stuff you have work to do there as
> well. The hello world programs in C are using implicit int for
> main which is not allowed in the latest standard, the web one
> fails to include stdio.h which is required (unless you want to do
> the work of providing your own prototype),

Let me use -Wall to fix all that ... h.c h1.c h2.c done

In cgis.c (needed for h3.c and beyond), there's a line of code that
shifts the existing value to the left 4 bits and then adds in the
four new bits obtained from the hexadecimal character in the string
it's walking. The line of code looks like this:
c = c<<4 + h;
but the gnu c compiler complains:
cgis.c:118: warning: suggest parentheses around + or - inside shift
Given that there is clearly extra spacing around the =, while the
<< is compact, it's quite clear the intention of the author was:
c = (c<<4) + h;
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
Should I leave it as-is, or put parens around the shift to avoid
the stupid mis-leading warning?? (Your personal opinion, what you'd
do in my circumstance, writing code examples to share with others,
but in this case simply using somebody else's module to which I
already had to fix a bug before it'd compile.)

Fixed h3.c, all done. Thanks for the heads-up. All my code worked
fine as they were, but they are supposed to be examples for novices
to copy and try and emulate etc. so they faltered in that respect.
Take another look now if you have time.

> and one of them is a deliberately obfuscated program which
> relies on ASCII which the C standard does not guarantee.

Which one specifically? Cite a line of code that relies on ASCII
and I'll get the idea which section of it to study?

> ... you use implicit int all over the place ...

The only place I used implicit int was in return value for main,
which has now been fixed in all cgi-bin/*.c files unless I screwed
up somewhere.

> So is this prototype you show "int g2(int n1,n2);"

I don't see anything wrong with that prototype. Do I need to
declare n1 and n2 separately, like this?
int g2(int n1, int n2);

> There are several other errors.

Feel free to find a couple totally different errors and tell me
about them, then I'll fix them and anything else they remind me of.

> I suggest you need to learn C properly before writing any kind of
> "CookBook" that includes C in the languages it uses.

I already took three semester-length C classes. That's all that are
offered at De Anza College. What do you suggest for further
correction of anything I happened to get wrong after three
semesters of formal study plus various Web-based exploration
looking for specific info such as strtoll and strcspn?

Random832

Feb 14, 2007, 10:47:19 PM
2007-02-13 <rem-2007...@yahoo.com>,

robert maas, see http://tinyurl.com/uh3t wrote:
>> From: Flash Gordon <s...@flash-gordon.me.uk>
>> > ... The first time I ran this
>> > program, without the sleep call, and pressed ctrl-D to generate
>> > end-of-stream on stdin, the program went into infinite read-EOS
>> > spew-text loop, which filled up all modem buffers. ...
>> I can only suggest that you had some other bug in your program at that
>> point or a bug in your modem. As presented your program would not do
>> that whether it detected an error or EOF it would break out of the loop
>> and terminate.
>
> Not a bug. It's just that the part of the program to detect EOF wasn't yet
> written, and that's the very part I was trying to develop.
> Step 1: Put in a printf to see what value comes back when I press ctrl-D.
> Step 2: Write code to detect that value and break out of loop.
> Step 3: Test that to see whether it works.
> Step 4: Remove the printf.
> Unfortunately step 1 blew me out for ten minutes or so without the sleep.

Why was there a loop at all in step 1?

Flash Gordon

Feb 15, 2007, 3:45:04 AM
robert maas, see http://tinyurl.com/uh3t wrote, On 15/02/07 00:24:

>> From: Flash Gordon <s...@flash-gordon.me.uk>
>> you are assuming the string is meant to have only one data item.

<snip>

>> Looking at some of the earlier stuff you have work to do there as
>> well. The hello world programs in C are using implicit int for
>> main which is not allowed in the latest standard, the web one
>> fails to include stdio.h which is required (unless you want to do
>> the work of providing your own prototype),
>
> Let me use -Wall to fix all that ... h.c h1.c h2.c done

You should use "-ansi -pedantic" as well, together with possibly -W.

> In cgis.c (needed for h3.c and beyond), there's a line of code that
> shifts the existing value to the left 4 bits and then adds in the
> four new bits obtained from the hexadecimal character in the string
> it's walking. The line of code looks like this:
> c = c<<4 + h;
> but the gnu c compiler complains:
> cgis.c:118: warning: suggest parentheses around + or - inside shift
> Given that there is clearly extra spacing around the =, while the
> << is compact, it's quite clear the intention of the author was:
> c = (c<<4) + h;
> so it's stupid for the compiler to suggest making it instead:
> c = c<<(4 + h);

You consider a compiler to be stupid for following the language
specification? C, like most computing languages, does not use white
space to group expressions. I seem to recall you also cover Perl in your
"CookBook" and based on this one assumption I would say you don't know
Perl or C.

Did you actually even go to the effort of trying code before putting it
up on your web site? I think not.

<snip>

>> I suggest you need to learn C properly before writing any kind of
>> "CookBook" that includes C in the languages it uses.
>
> I already took three semester-length C classes. That's all that are
> offered at De Anza College.

I'm sorry, but either you failed or those courses, based on your current
knowledge, appear to be almost worthless.

> What do you suggest for further
> correction of anything I happened to get wrong after three
> semesters of formal study plus various Web-based exploration
> looking for specific info such as strtoll and strcspn?

Well, in general doing web-based stuff is a bad idea unless you have a
*very* good reason to trust the specific ones you are using. Your
"CookBook" currently seems to be a prime example of why you should *not*
trust web resources.

Do the world a favour and take down your "CookBook" since you are a long
way from having enough knowledge to write it for even one language, let
alone 6.

I suggest you start looking at the comp.lang.c FAQ (Google will find it)
and buy a copy of K&R2 (the full details are in the bibliography of the
FAQ). Work through *all* the exercises in K&R2 starting with the
assumption that you do not know C since you really do not know it.
--
Flash Gordon

Richard Bos

Feb 15, 2007, 9:06:55 AM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:

> But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
> Many of the functions described in this chapter return the value of
> the macro EOF to indicate unsuccessful completion of the operation.
> Since EOF is used to report both end of file and random errors, it's
> often better to use the feof function to check explicitly for end of
> file and ferror to check for errors.

GNU is wrong on ISO C and does not care. Film at eleven.

Richard

CBFalconer

Feb 15, 2007, 5:49:41 PM

In what way?

robert maas, see http://tinyurl.com/uh3t

Feb 15, 2007, 9:22:58 PM
> From: Keith Thompson <k...@mib.org>

> > There are several other errors.
> Including the use of "defined" rather than "declared".

OK, there was one section of CookTop.html that was sloppy in the
jargon. I think I've tentatively fixed it. It's rather awkward at
present, but at least it doesn't confuse the two terms. Here's the
(backwards) diff:
% diff CookTop.html*
1787,1788c1787
< checking each fuction-call to make sure the function is declared (i.e.
< at least a prototype showing return type and formal parameters), and
---
> checking each fuction-call to make sure the function is defined, and
1790,1793c1789,1791
< function that isn't declared. It can't guess that you're calling a function
< you will be defining later in the file. Most of the time you actually define
< each function before using it. But if you really must call a
< fuction before you defie it, for example if you have two functions that
---
> function that isn't defined. It can't guess that you're calling a function
> you will be defining later in the file. But if you really must call a
> fuction before you define it, for example if you have two functions that
1795,1796c1793
< You write just a declaration for any function that needs to be called before
< it's defined. You write the type of return value,
---
> You write a function-definition template. You write the type of return value,
1811c1808
< to try to keep the declaration matching the actual function definition
---
> to try to keep the template matching the actual function definition

Thanks for the "heads-up".

robert maas, see http://tinyurl.com/uh3t

Feb 15, 2007, 9:30:14 PM
> From: Random832 <ran...@random.yi.org>

> Why was there a loop at all in step 1?

Because after compiling and starting the program and typing a test
value and restarting the program and typing another test value and
restarting the program and typing another test value and restarting
the program and typing another test value, I got fed up with having
to manually re-start the program every time I wanted to type in a
new test value.

robert maas, see http://tinyurl.com/uh3t

Feb 16, 2007, 12:25:32 AM
> From: Flash Gordon <s...@flash-gordon.me.uk>

> > Let me use -Wall to fix all that ... h.c h1.c h2.c done
> You should use "-ansi -pedantic" as well, together with possibly -W.

Why? What purpose would be served by doing that?

> > so it's stupid for the compiler to suggest making it instead:
> > c = c<<(4 + h);
> You consider a compiler to be stupid for following the language
> specification?

The language specification does not forbid or suggest against
shifting a value to the left to make room for adding another small
bunch of bits on the right. The code as written is perfectly valid,
a suggestion that it ought to be changed to add in the new bits
right on top of the old ones (mangling both) and *then* shifting to
the left (leaving a hole where the new bits should have been) is
not a good suggestion.

> Did you actually even go to the effort of trying code before
> putting it up on your web site? I think not.

The code for doing the data processing, yes. Didn't you see the
thread where I had a SLEEP call in the test rig to prevent runaway
spew if ctrl-D was pressed to generate EOS on STDIN. After I got
the code for validating string decimal representation of integer
and conversion to actual long long int datatype all working, *then*
I interface it to CGI and put it up, and tried it, and made sure it
was all working before leaving it standing for others to use.

The code for interfacing to CGI, well there's no way to test that
without putting it up on cgi-bin, where anybody might accidentally
try it while I'm right in the midst of working on it. There's no
way to avoid that. There's no way for me to run any CGI software
without making it public-available. But I make sure there's a short
period from when I first try interfacing it until it's working, and
it has a WARNING, CODE NOT YET TESTED YET ... at the boundary
between what's already working and what I'm testing at the moment,
just in case somebody else tries it in the middle of a development
period.

I'm probably going to continue the same policy in the future. Any
time I am starting to write a brand-new major algorithm that will
require a lot of work before it's ready for others to try, I'll do
it in a stdio test rig before interfacing it to CGI. But any time
I'm just adding one or two line(s) of code at a time to an existing
script I'll probably put it directly online with that warning ahead
of it. Do you have a serious problem with my policy in this matter??

> > I already took three semester-length C classes. That's all that are
> > offered at De Anza College.
> I'm sorry, but either you failed

I got an "A" in every one of those classes. If you don't believe
me, come here, we'll go to the public library where there's access
to JavaScript (required for viewing transcripts), and I'll show you
my complete DeAnza transcript. If you want to call me a liar in a
public newsgroup, then fuck you bastard!!

> or those courses, based on your current knowledge, appear to be
> almost worthless.

You're entitled to your opinion on such matters. Perhaps you should
come here and look at my transcript to see which instructors were
teaching those classes, and then you write a formal letter to
De Anza College complaining that all those instructors are
incompetent to teach C programming classes.

> Your "CookBook" currently seems to be a prime example of why you
> should *not* trust web resources.

The primary purpose of my "CookBook" is to show, in several
languages in parallel how to do various common tasks, such as the
tasks provided by standard libraries in the various languages, and
eventually some of the more advanced tasks covered in the Perl and
Common Lisp cookbooks. That should accomplish several purposes:
- If a person is trying to learn a new language, and knows how to
do something in one language but needs to know how to do
something equivalent in the new language, the person can
directly search for the library function in the language and
thereby jump directly to the place where that function is
compared to equivalents in other languages.
- If the person wants to convert one kind of data to another kind,
the person can look in the table of contents to find either type
first and the other type as a sub-heading, and thereby have a
short section of similar functions to browse, not distracted by
other functions that deal with other combinations of data types.
- If a person is trying to pick an appropriate language for some
utility, and has an idea what specific data processing steps
would be involved in the task, the person can look up each
relevant section per processing step and get an idea how well
each language covers that step, and thereby get a general idea
how much extra work would be required, or whether it's even
feasible, in the various languages.

At present my "CookBook" is very far from completion. I have
finished including one c library, and lisp equivalents, and am
starting on two more c libraries. I still need to include the rest
of the c libraries, all the stuff in lisp that has no c equivalent,
and include java equivalents for all of that. Then someday I need
to check differences between c and c++ for this all and show the
c++ way wherever different. I also need to include all the java
stuff that's not available in c or lisp. Also someday I need to
include perl and php equivalents where different from c. And of
course include the perl/php stuff that's not in the other languages
at all. For the moment, I'm concentrating on completeness of
data/processing tasks, not much covering control structures such as
thread or inter-process communication at all. Better to be complete
(eventually, this year I hope) in one major class of processing
tasks, than to jump around willy nilly and never get any particular
class of tasks completely covered in five years. In particular, in
browsing the table of contents of the fine GNU C library document:
<http://www.aquaphoenix.com/ref/gnu_c_library/>
I noticed a large amount of stuff on pipes and sockets, which I've
decided *not* to include on this first major pass, partly because
it'd be the "straw that broke the camel's back" for my workload,
but also because it doesn't fit into the datatype matrix anyway.
I've decided to stick totally with the libraries that process data
types inside the machine, until I get that virtually all done.
Actually I'm not even sure I want to finish the library I started
exploring yesterday, the stuff with floating-point numbers. I might
decide to abort that before investing any more time with it.

> I suggest you start looking at the comp.lang.c FAQ (Google will find it)

Is this the one you want me to look at? <http://c-faq.com/>

I browsed it a little, and found one apparent mistake:
<http://c-faq.com/aryptr/arraylval.html>

Q: How can an array be an lvalue, if you can't assign to it?
_________________________________________________________________

A: The term ``lvalue'' doesn't quite mean ``something you can assign
to''; a better definition is ``something that has a location (in
memory).'' [footnote] The ANSI/ISO C Standard goes on to define a
``modifiable lvalue''; an array is not a modifiable lvalue. See also
question 6.5.

In fact you *cannot* assign to an array (except if it was declared
as a formal parameter, in which case it's already degraded to a
simple pointer which *can* be assigned to). You can only assign to
an *element* of an array. For example:
int main(void) {
char name[10] = "John";
name[2] = 'a'; /* Valid, assign to element name is now "Joan". */
name = "Mike"; /* Not legal, assign to *array* itself. */
...
Am I correct there? Thus the question above presumes a false fact,
and the answer should right at the top point out the false premise,
not assume the false premise and issue a red herring of an answer.

Hmmm, curious:
<http://c-faq.com/misc/returnparens.html>
Just the other day somebody corrected me because I followed the
examples/spec in K&R on pages 23, 68, and 70, where the syntax is
repeatedly stated as return(expression). But way back on page 203
it says instead return expression; (no parens), which I noticed
just now for the very first time, in response to this FAQ item. Is
that a mistake in proofreading in K&R, and if so which was correct
at the time it was written, i.e. were pages 23/68/70 all wrong, or
was page 203 wrong, at the time it was written?

> and buy a copy of K&R2

I have no money to buy anything. Please provide me with a job that
pays earned income if you want to change this present condition of
my life.

Keith Thompson

Feb 16, 2007, 2:13:42 AM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
>> From: Flash Gordon <s...@flash-gordon.me.uk>
>> > Let me use -Wall to fix all that ... h.c h1.c h2.c done
>> You should use "-ansi -pedantic" as well, together with possibly -W.
>
> Why? What purpose would be served by doing that?

It would catch more errors.

>> > so it's stupid for the compiler to suggest making it instead:
>> > c = c<<(4 + h);
>> You consider a compiler to be stupid for following the language
>> specification?
>
> The language specification does not forbid or suggest against
> shifting a value to the left to make room for adding another small
> bunch of bits on the right. The code as written is perfectly valid,
> a suggestion that it ought to be changed to add in the new bits
> right on top of the old ones (mangling both) and *then* shifting to
> the left (leaving a hole where the new bits should have been) is
> not a good suggestion.

Quoting what you wrote up-thread:
| In cgis.c (needed for h3.c and beyond), there's a line of code that
| shifts the existing value to the left 4 bits and then adds in the
| four new bits obtained from the hexadecimal character in the string
| it's walking. The line of code looks like this:
| c = c<<4 + h;
| but the gnu c compiler complains:
| cgis.c:118: warning: suggest parentheses around + or - inside shift
| Given that there is clearly extra spacing around the =, while the
| << is compact, it's quite clear the intention of the author was:
| c = (c<<4) + h;

| so it's stupid for the compiler to suggest making it instead:
| c = c<<(4 + h);

You have misunderstood what that expression means. I think somebody
already explained it to you; I'll try again.

The spacing around operators is *ignored* by the compiler, though it
can be useful for legibility and to make your intent clear to the
reader. You wrote:

    c = c<<4 + h;

which you apparently wanted to be evaluated as

    c = (c<<4) + h;

but it actually *means*

    c = c << (4 + h);

because the "+" operator binds more tightly than the "<<" operator.
gcc was kind enough (and clever enough) to warn you about this.

The spacing might convey your intention to a human reader; it does not
convey anything to the compiler.

If you don't believe me, try this program (10 and 17 are just
arbitrary values chosen to cause the expression to give different
results depending on the grouping):

    #include <stdio.h>
    int main(void)
    {
        int c = 10;
        int h = 17;

        if (c<<4 + h == (c<<4) + h) {
            printf("c<<4 + h == (c<<4) + h\n");
        }

        if (c<<4 + h == c<<(4 + h)) {
            printf("c<<4 + h == c<<(4 + h)\n");
        }
        return 0;
    }

Another example, that might be clearer:
x+y * z
*looks* like it should mean
(x + y) * z
but it actually means
x + (y * z)

If you find yourself using spacing to indicate grouping in an
expression, I suggest you use parentheses instead.

[...]


> The code for interfacing to CGI, well there's no way to test that
> without putting it up on cgi-bin, where anybody might accidentally
> try it while I'm right in the midst of working on it. There's no
> way to avoid that.

[...]

<OFF-TOPIC>
There are a number of ways to avoid that; some of them may not be
available to you, depending on the resources to which you have access.

If you're able to set up your own web server, you can probably
configure it so that nobody else can access it, and experiment to your
heart's content.

If that's not possible, and all you can do is install your code in
cgi-bin, you can try installing it with a name that nobody is likely
to stumble across. You can exercise the code because you know its
name, but nobody else can.
</OFF-TOPIC>

If you have questions about CGI, try asking them in
comp.infosystems.www.authoring.cgi.

[snip]

>> > I already took three semester-length C classes. That's all that are
>> > offered at De Anza College.
>> I'm sorry, but either you failed
>
> I got an "A" in every one of those classes. If you don't believe
> me, come here, we'll go to the public library where there's access
> to JavaScript (required for viewing transcripts), and I'll show you
> my complete DeAnza transcript. If you want to call me a liar in a
> public newsgroup, then fuck you bastard!!

Calm down; nobody called you a liar. And consider watching your
language; there's no point in needlessly offending people.

>> or those courses based on your current knowledge or they appear
>> to be almost worthless.

[...]

>> I suggest you start looking at the comp.lang.c FAQ (Google will find it)
>
> Is this the one you want me to look at? <http://c-faq.com/>
>
> I browsed it a little, and found one apparent mistake:
> <http://c-faq.com/aryptr/arraylval.html>
>
> Q: How can an array be an lvalue, if you can't assign to it?
> _________________________________________________________________
>
> A: The term ``lvalue'' doesn't quite mean ``something you can assign
> to''; a better definition is ``something that has a location (in
> memory).'' [footnote] The ANSI/ISO C Standard goes on to define a
> ``modifiable lvalue''; an array is not a modifiable lvalue. See also
> question 6.5.
>
> In fact you *cannot* assign to an array (except if it was declared
> as a formal parameter, in which case it's already degraded to a
> simple pointer which *can* be assigned to). You can only assign to
> an *element* of an array.

[...]

Of course. Read the FAQ again, more carefully; it doesn't say or
imply that you can assign to an array. The FAQ is perfectly correct.

[...]

> Am I correct there? Thus the question above presumes a false fact,
> and the answer should right at the top point out the false premise,
> not assume the false premise and issue a red herring of an answer.

The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment, and an
"rvalue" was an expression that can appear on the right side of an
assignment. This terminology predates C, and those meanings may have
been appropriate for earlier, simpler languages. In C, the meaning of
"lvalue" has changed to include any expression that designates an
object, whether it can be assigned to or not (and the term "rvalue"
has been largely dropped).

The question is based on a misconception. Someone who's familiar with
the historical meaning of "lvalue" is likely to be confused by the
fact that an array can be an lvalue, but can't appear on the left side
of an assignment. The whole point of the answer is to correct that
misconception.
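
A minimal sketch of the distinction, assuming nothing beyond standard C
(the names rename_in_place and array_demo are made up for illustration).
It tries to show that an array has a location and assignable elements,
yet cannot itself be the target of an assignment:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  Despite the [10], the parameter has type char *,
   so assigning to "name" here would be legal; it would only change the
   local pointer, not the caller's array. */
void rename_in_place(char name[10])
{
    assert(name != NULL);
    strcpy(name, "Mike");        /* copy into the caller's array instead */
}

/* Returns 1 if every step behaved as described, 0 otherwise. */
int array_demo(void)
{
    char name[10] = "John";
    char (*whole)[10] = &name;   /* fine: the array designates an object */

    name[2] = 'a';               /* fine: an element is a modifiable lvalue */
    if (strcmp(name, "Joan") != 0)
        return 0;

    /* name = "Mike"; */         /* constraint violation: does not compile */
    rename_in_place(name);

    return strcmp(*whole, "Mike") == 0 && sizeof name == 10;
}
```

Uncommenting the assignment to name should make a conforming compiler
issue a diagnostic, which is the "not a modifiable lvalue" rule at work.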

> Hmmm, curious:
> <http://c-faq.com/misc/returnparens.html>
> Just the other day somebody corrected me because I followed the
> examples/spec in K&R on pages 23, 68, and 70, where the syntax is
> repeatedly stated as return(expression). But way back on page 203
> it says instead return expression; (no parens), which I noticed
> just now for the very first time, in response to this FAQ item. Is
> that a mistake in proofreading in K&R, and if so which was correct
> at the time it was written, i.e. were pages 23/68/70 all wrong, or
> was page 203 wrong, at the time it was written?

The examples are all correct. The parentheses are *optional*. Both
return(42);
and
return 42;
are perfectly legal.

>> and buy a copy of K&R2
>
> I have no money to buy anything. Please provide me with a job that
> pays earned income if you want to change this present condition of
> my life.

I'm sorry if your financial situation is unfavorable, but that
certainly isn't anybody else's responsibility. Complaining here about
things we can't help you with wastes your time and ours.

Richard Bos

Feb 16, 2007, 2:44:41 AM
CBFalconer <cbfal...@yahoo.com> wrote:

> Richard Bos wrote:
> > rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:
> >
> >> But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
> >>
> >> Many of the functions described in this chapter return the value of
> >> the macro EOF to indicate unsuccessful completion of the operation.
> >> Since EOF is used to report both end of file and random errors, it's
> >> often better to use the feof function to check explicitly for end of
> >> file and ferror to check for errors.
> >
> > GNU is wrong on ISO C and does not care. Film at eleven.
>
> In what way?

In that it's _not_ better to use the feof() function to check for eof.
feof() is good as an aid to distinguish between eof and error _once
you've already checked for EOF_. IOW, it's used with EOF, not better
than.
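
A minimal sketch of that order of checks, assuming a byte-at-a-time read
with getc (count_bytes is a hypothetical helper, not anything from the
GNU manual): test the return value against EOF first, and only then ask
ferror() or feof() why the read stopped.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream.
   Returns the count, or -1 if reading stopped because of an error. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                          /* int, not char, so EOF fits */

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF)   /* first: test the return value */
        n++;

    /* Only now, after a read has returned EOF, ask why it stopped. */
    if (ferror(fp))
        return -1;
    return n;                        /* end of file: feof(fp) is nonzero */
}
```

Calling feof() before any read has failed says nothing about whether the
next read will succeed, which is why it is not used as the loop test.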

Richard

CBFalconer

Feb 16, 2007, 3:50:34 AM
Richard Bos wrote:
> CBFalconer <cbfal...@yahoo.com> wrote:
>> Richard Bos wrote:
>>> rem...@yahoo.com (robert maas) wrote:
>>>
>>>> But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
>>>>
>>>> Many of the functions described in this chapter return the
>>>> value of the macro EOF to indicate unsuccessful completion
>>>> of the operation. Since EOF is used to report both end of
>>>> file and random errors, it's often better to use the feof
>>>> function to check explicitly for end of file and ferror to
>>>> check for errors.
>>>
>>> GNU is wrong on ISO C and does not care. Film at eleven.
>>
>> In what way?
>
> In that it's _not_ better to use the feof() function to check for
> eof. feof() is good as an aid to distinguish between eof and
> error _once you've already checked for EOF_. IOW, it's used with
> EOF, not better than.

While I agree with your statement above, how does that make GNU
wrong? feof shows the file is at EOF, but not that a read etc.
failed. If it is not at EOF a read may succeed, or may fail due to
reaching EOF, or may fail due to i/o error. I think you are
objecting to the fact that they don't state explicitly that these
calls should be used to resolve the cause of receiving an EOF
signal. We don't know the context of the above quote without going
to the original, which I haven't. 'better' may simply mean better
than assuming receiving EOF means the file is at eof.

CBFalconer

Feb 16, 2007, 4:03:21 AM
robert maas wrote:
>> From: Flash Gordon <s...@flash-gordon.me.uk>
>
... snip ...

>
>> and buy a copy of K&R2
>
> I have no money to buy anything. Please provide me with a job that
> pays earned income if you want to change this present condition of
> my life.

Then you might be well advised to listen to at least some of the
advice you are receiving rather than going off in the wilderness
with random insults. Your knowledge shows gaping holes, and by
your own statements that can only be due to the lack of quality in
your education and/or failure to listen. Your performance here
makes the latter more likely.

santosh

Feb 16, 2007, 6:25:08 AM
Keith Thompson wrote:
> rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
> > Flash Gordon wrote:

<snip>

If so, is there a reason to retain that term at all, and not use a
more generic term like expression?

<snip>

Chris Dollin

Feb 16, 2007, 7:16:01 AM
Keith Thompson wrote:

> The letter 'l' in the word "lvalue" originally referred to the *left*
> side of an assignment. The idea was that an "lvalue" was an
> expression that can appear on the left side of an assignment,

Actually no. The idea was that an lvalue was the /value/ that was
obtained by evaluating an expression which was to be assigned
to, ie, traditionally [but not exclusively] on the left-hand-side
of an assignment.

> and an "rvalue" was an expression that can appear on the right
> side of an assignment.

Similarly, "rvalue" meant the /value/ obtained by evaluating an
expression for its "ordinary", not-being-assigned-to, value,
traditionally the right-hand-side of an assignment.

In the assignment `L := R`, we find L's lvalue and R's rvalue,
and then do some assignment magic which puts the rvalue "into"
the lvalue, which means [absent side-effects ...] that L's
rvalue is now [maybe some conversion of] what R's rvalue was.

Part of the reason for introducing this distinction was to
formalise why the variable `a` in `a := a + 1` means two
different things in the two different places: the left-hand
`a` is evaluated for its lvalue and the right-hand one for
its rvalue. For a variable this typically means evaluating
its lvalue and then dereferencing that.

In some languages, literals have lvalues, so the assignment
`1 := 2` is legal. Depending on the language semantics, `1`
may have a single lvalue, or a different one each time it
is evaluated. (The rvalue of `1` might or might not use its
lvalue.) While for assignment this looks like the rabid
and hungry sabre-toothed tiger, it makes more sense for
parameter-passing ...

> This terminology predates C,

having been introduced or popularised by, if I recall,
Christopher Strachey, in the late 60's-early 70's; it
turns up (ditto) in his /Fundamental Concepts in Programming
Languages/ which a quick google doesn't find (references,
yes, text, no). My paper copy is somewhere at home.

> and those meanings may have been appropriate for earlier,
> simpler languages.

In fact they can work for modern languages -- many of which
are /simpler/ in these respects than some earlier languages.

> In C, the meaning of "lvalue" has changed to include any
> expression that designates an object, whether it can be
> assigned to or not

That still fits inside the original formulation: the lvalue
is the value you get /by evaluating on the left/; you may
then be able to store into (through?) it, or not.

(I agree there's a shift to calling the /expression/ the
lvalue, rather than its /value/ the lvalue. I shall
spare you what some people might be moved to call a
"hissy fit" about this.)

> (and the term "rvalue" has been largely dropped).

In favour of "value", isn't it?

C has (at least) three modes for expression evaluation:
lvalue, (r)value, and what one might call "svalue",
evaluation as the operand of `sizeof`.
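
A small sketch of those three modes, assuming an ordinary (non
variable-length) operand for sizeof; the function name
value_after_sizeof is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo.  Returns the value of i after it has appeared,
   with an increment, as the operand of sizeof. */
int value_after_sizeof(void)
{
    int a = 1;
    int i = 0;
    size_t n;

    a = a + 1;          /* left a: the object is designated;
                           right a: its stored value is fetched */
    assert(a == 2);

    n = sizeof(i++);    /* the operand is not evaluated, only its type
                           is inspected, so the increment never happens */
    assert(n == sizeof(int));

    return i;
}
```

Some compilers warn about the side effect inside sizeof, which is a fair
hint that the expression is never executed.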

--
Chris "electric hedgehog" Dollin
"Our future looks secure, but it's all out of our hands"
- Magenta, /Man and Machine/

Keith Thompson

Feb 16, 2007, 3:58:34 PM
"santosh" <santo...@gmail.com> writes:
> Keith Thompson wrote:
[...]

>> The letter 'l' in the word "lvalue" originally referred to the *left*
>> side of an assignment. The idea was that an "lvalue" was an
>> expression that can appear on the left side of an assignment, and an
>> "rvalue" was an expression that can appear on the right side of an
>> assignment. This terminology predates C, and those meanings may have
>> been appropriate for earlier, simpler languages. In C, the meaning of
>> "lvalue" has changed to include any expression that designates an
>> object, whether it can be assigned to or not (and the term "rvalue"
>> has been largely dropped).
>
> If so, is there a reason to retain that term at all, and not use a
> more generic term like expression?

The C standard has done just that. There are exactly two occurrences
of the word "rvalue" in the C99 standard. One is in a footnote in
6.3.2.1:

[...]
What is sometimes called "rvalue" is in this International
Standard described as the "value of an expression".

The other is the index entry referring to this footnote.

C90 has the same wording.

Keith Thompson

Feb 16, 2007, 4:01:51 PM
Chris Dollin <chris....@hp.com> writes:
> Keith Thompson wrote:
>> The letter 'l' in the word "lvalue" originally referred to the *left*
>> side of an assignment. The idea was that an "lvalue" was an
>> expression that can appear on the left side of an assignment,
>
> Actually no. The idea was that an lvalue was the /value/ that was
> obtained by evaluating an expression which was to be assigned
> to, ie, traditionally [but not exclusively] on the left-hand-side
> of an assignment.

Ah, that makes sense.

>> and an "rvalue" was an expression that can appear on the right
>> side of an assignment.
>
> Similarly, "rvalue" meant the /value/ obtained by evaluating an
> expression for its "ordinary", not-being-assigned-to, value,
> traditionally the right-hand-side of an assignment.

And that explains why that footnote says that an rvalue is the *value*
of the expression, while an lvalue has come to refer to the expression
itself. Thanks for the clarification.

robert maas, see http://tinyurl.com/uh3t

Feb 16, 2007, 4:20:28 PM
> From: Keith Thompson <k...@mib.org>

> >> You should use "-ansi -pedantic" as well, together with possibly -W.
> > Why? What purpose would be served by doing that?
> It would catch more errors.

More likely generate bogus errors. For example:

% more tryll.c
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
..
(the declaration for strtoll is located in stdlib.h)
Reference:
<http://www.delorie.com/gnu/docs/glibc/libc_423.html>
20.11.1 Parsing of Integers
The `str' functions are declared in `stdlib.h' ...
...
Function: long long int strtoll (const char *restrict string, char
**restrict tailptr, int base)

% gcc tryll.c -Wall
(compiles without warnings)

% gcc tryll.c -Wall -ansi
tryll.c: In function `tryParseLliTalk':
tryll.c:53: warning: implicit declaration of function `strtoll'
(that's a totally bogus warning, given that stdlib.h is included)

Perhaps -ansi might be of some value later when I'm writing actual
examples of how to do specific things. But right now strtoll is
used only to convert the decoded HTML FORM field from string to
integer so that I can *later* use that numeric value to demonstrate
arithmetic and other functions that require integers as input,
allowing the user to vary the particular integer value being used
as input simply by changing the contents of the form field.

% gcc tryll.c -Wall -pedantic
In file included from tryll.c:3:
/usr/include/stdlib.h:111: warning: ANSI C does not support `long long'
/usr/include/stdlib.h:117: warning: ANSI C does not support `long long'
tryll.c:5: warning: ANSI C does not support `long long'
tryll.c: In function `printflld':
tryll.c:6: warning: ANSI C does not support the `ll' length modifier
tryll.c:11: warning: ANSI C does not support `long long'
tryll.c: At top level:
tryll.c:25: warning: ANSI C does not support `long long'
tryll.c: In function `mystrtoll':
tryll.c:27: warning: ANSI C does not support `long long'
tryll.c: In function `tryParseLliTalk':
tryll.c:49: warning: ANSI C does not support `long long'
tryll.c:57: warning: ANSI C does not support the `ll' length modifier
tryll.c:62: warning: ANSI C does not support the `ll' length modifier

That's even worse at emitting worthless crap that's of no use to me
at this time. Maybe later when I'm making examples of code, this
might allow me to flag anything that's not usable outside the GNU
environment. For now, that would just present so much noise that
I'd never see a signal of an *actual* error in my code.

% gcc tryll.c -Wall -W
(compiles without any warnings)

> You wrote:
> c = c<<4 + h;
> which you apparently wanted to be evaluated as
> c = (c<<4) + h;
> but it actually *means*
> c = c << (4 + h);
> because the "+" operator binds more tightly than the "<<" operator.
> gcc was kind enough (and clever enough) to warn you about this.

Ah, well I didn't write that, Peter Burden wrote it, see link here:
<http://www.rawbw.com/~rem/HelloPlus/hellos.html#c3>
So you're saying his code has a bug in it?
Hmm, I tried putting special characters into the FirstName field in that
form, and indeed it failed to decode it correctly. For example,
when I enter into the field
foo&baz
it gets urlencoded as
firstname=foo%26baz
but Peter's decoder gives me:
For key [firstname] the value is [foo].
which is probably actually internally
foo<NUL>baz
where <NUL> prematurely ends the %s printf of the string.

Let me try putting parens there to make the nesting of operators
correct, done, recompiling cgis.c, done, rebuilding h3-c.cgi, done,
re-testing, hooray, the bug is fixed, it now gives:
For key [firstname] the value is [foo&baz].

Thank you very much for the heads-up on this bug in Peter's code.
Now I need to comment that change I made in the code, done, and
update my description of it in hellos.html, done!!
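
For illustration, a minimal sketch of %XX decoding with the grouping
written out. This is not Peter Burden's cgis.c; hex_value and url_decode
are hypothetical names, and the code assumes an ASCII-compatible
character set where 'a'..'f' and 'A'..'F' are contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes %XX escapes and '+' from src into dst, which must have room
   for strlen(src) + 1 bytes.  Malformed escapes are copied through
   unchanged.  Returns dst. */
char *url_decode(char *dst, const char *src)
{
    char *out = dst;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        int hi = -1, lo = -1;

        if (src[0] == '%'
            && (hi = hex_value((unsigned char)src[1])) >= 0
            && (lo = hex_value((unsigned char)src[2])) >= 0) {
            /* Shift first, then add.  Without the parentheses this
               would mean hi << (4 + lo). */
            *out++ = (char)((hi << 4) + lo);
            src += 3;
        } else {
            *out++ = (*src == '+') ? ' ' : *src;
            src++;
        }
    }
    *out = '\0';
    return dst;
}
```

With this sketch, "foo%26baz" should decode to "foo&baz". A "%00" escape
would still embed a NUL and cut the string short for %s, so a real
decoder may want to reject it.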

> If you don't believe me, try this program ...

I didn't need to. I just keyed special characters such as & and <
into my Web form, observed the hexadecimal encoding of them when I
asked lynx to show me the full URI (using GET method the urlencoded
form contents are after the ? mark in the URI), and then compared
to what came through after decoding as I showed above.

> If you find yourself using spacing to indicate grouping in an
> expression, I suggest you use parentheses instead.

Any time I need to look at the operator precedence table, it means
I shouldn't bother, use parens instead. That's what I do in all my
code. But this was Peter's, and it was posted for public use, and
despite one mismatched prototype/definition pair (which I fixed,
and assumed had been left in because his compiler treated the two
as the same whereas mine doesn't), I assumed the basic algorithms
had been tested before posting. I guess I assumed too much. I never
did a complete line-by-line examination of his code to make sure it
looked correct, although now I did a line-by-line browse of it just
to see if anything else was obviously questionable, as well as to
find that troubling line of code again and see if it was the only
case of that particular problem.

So operator precedence fooled Peter, and fooled me too, and Peter
didn't bother to unit-test each line of his code, and I just
assumed he had done so before posting his code to the net for
others to use. When I write code, I unit test *every* line of code,
even the most trivial of things, just to make sure I didn't make
some really really stupid typographic error, before I go on to the
next line of code. In lisp that style of developing code comes
totally naturally without any pain. In c it's a royal pain, which
is one of many reasons I hate to write software in c. For an actual
example of how I develop code with line-by-line unit testing, see:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html#stru>
(From the table of contents, in chapter 2, the section "Overall
structure of a program")
then search for the paragraph that begins:
Here's my preferred way to develop lisp software: Bottom-up overall,
input-forward within each level, unit-test each line of code before
incorporating it into a block of code, unit-test the completed block
before building a function around it, unit test the resultant function
before moving on to start work on the next. ...
and read the annotated transcript that follows.

> <OFF-TOPIC>
> There are a number of ways to avoid that; some of them may not be
> available to you, depending on the resources to which you have access.

> If you're able to set up your own web server, ...

Not available to me. No money to lease office space and purchase
equipment, nor to lease space on a commercial server farm.

> If that's not possible, and all you can do is install your code in
> cgi-bin, you can try installing it with a name that nobody is likely
> to stumble across. You can exercise the code because you know its
> name, but nobody else can.

But I'll be incrementally adding to existing CGI application, and
it's a royal pain to make a complete copy of everything (actual
program modules on cgi-bin, Web page with forms for exercising it
in GET and POST mode), and even if I do all that, I still have the
task of folding everything back into the main files, during which
there's a "flag day" when a mix of old and new stuff is
inconsistent. Since this isn't a major commercial service like the
Google search engine with millions or even thousands of users, I
consider it reasonable that new stuff is simply installed directly
online with a suitable warning for the zero or one users who happen
to bump into it during the few minutes it takes to complete the CGI
interfacing of the new module I already got debugged in a stdio
test rig. If I thought I had more than two users during any one
day, of which there was a good chance one of them would encounter
the temporary under-construction state, I might reconsider. At
present I have a couple other much more valuable services which
each gets used by somebody other than myself only about once in two
to six months.

Have you personally *ever* tried one of my CGI applications and
encountered a server abort, or obviously truncated output, because
the program crashed at the start (before transmitting the
CGI-MIME-type header) or in the middle respectively, because I had
just installed something new and it had a horrible bug? Have you
ever seen it produce obviously partial new output, where each time
you'd refresh it'd show a little bit more as I added line by line
of new code at the bottom? Has anyone reading this thread
encountered such work-in-progress ever?

> If you have questions about CGI, try asking them in
> comp.infosystems.www.authoring.cgi.

I don't have any such questions at the present. But if and when I
ever decide to generate cookies, I might ask there, thanks for the
pointer.

More reply later...

santosh

Feb 16, 2007, 4:54:22 PM

It's not. It means that the prototype for strtoll() *isn't* included
in stdlib.h, but its object code is present in the linker's path.
That's why it links fine but complains about a missing prototype when
compiled at a higher warning level. Manually search include files like
stdlib.h, stdint.h, inttypes.h, string.h etc., for strtoll(). If you
don't find its prototype, then write your own, as per the standard.
It's likely to work.

<snip>

> % gcc tryll.c -Wall -pedantic
> In file included from tryll.c:3:
> /usr/include/stdlib.h:111: warning: ANSI C does not support `long long'
> /usr/include/stdlib.h:117: warning: ANSI C does not support `long long'
> tryll.c:5: warning: ANSI C does not support `long long'
> tryll.c: In function `printflld':
> tryll.c:6: warning: ANSI C does not support the `ll' length modifier
> tryll.c:11: warning: ANSI C does not support `long long'
> tryll.c: At top level:
> tryll.c:25: warning: ANSI C does not support `long long'
> tryll.c: In function `mystrtoll':
> tryll.c:27: warning: ANSI C does not support `long long'
> tryll.c: In function `tryParseLliTalk':
> tryll.c:49: warning: ANSI C does not support `long long'
> tryll.c:57: warning: ANSI C does not support the `ll' length modifier
> tryll.c:62: warning: ANSI C does not support the `ll' length modifier
>
> That's even worse at emitting worthless crap that's of no use to me
> at this time. Maybe later when I'm making examples of code, this
> might allow me to flag anything that's not usable outside the GNU
> environment. For now, that would just present so much noise that
> I'd never see a signal of an *actual* error in my code.

I personally prefer to set the diagnostic level high. If it's really a
benign warning, I ignore it. Silencing the compiler means you're not
taking full advantage of its automated analysis, something that
computers are quite good at.

<snip>

Joe Wright

Feb 16, 2007, 5:55:22 PM
Keith Thompson wrote:
> Chris Dollin <chris....@hp.com> writes:
>> Keith Thompson wrote:
>>> The letter 'l' in the word "lvalue" originally referred to the *left*
>>> side of an assignment. The idea was that an "lvalue" was an
>>> expression that can appear on the left side of an assignment,
>> Actually no. The idea was that an lvalue was the /value/ that was
>> obtained by evaluating an expression which was to be assigned
>> to, ie, traditionally [but not exclusively] on the left-hand-side
>> of an assignment.
>
> Ah, that makes sense.
>
>>> and an "rvalue" was an expression that can appear on the right
>>> side of an assignment.
>> Similarly, "rvalue" meant the /value/ obtained by evaluating an
>> expression for its "ordinary", not-being-assigned-to, value,
>> traditionally the right-hand-side of an assignment.
>
> And that explains why that footnote says that an rvalue is the *value*
> of the expression, while an lvalue has come to refer to the expression
> itself. Thanks for the clarification.
>
An lvalue is an expression which allows the compiler to know the address
of an object, perhaps for purposes of assignment. The value of that
object is not really of interest.

Given:

int a, b;

a = 0;
b = 1;

Expressions a and b are both lvalues in the previous two lines. Nobody
cares the prior values of a or b.

a = b;

Now, a is the lvalue and b is not (an expression not an lvalue is an
rvalue). At this point the rvalue of b is 1 and nobody cares the rvalue
of a, the lvalue (address) of a is needed here. Now a == b.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Flash Gordon

Feb 16, 2007, 6:47:46 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 16/02/07 21:20:

>> From: Keith Thompson <k...@mib.org>
>>>> You should use "-ansi -pedantic" as well, together with possibly -W.
>>> Why? What purpose would be served by doing that?
>> It would catch more errors.
>
> More likely generate bogus errors. For example:

Nope. Well, -W warns about some things that are not necessarily
problems, but -ansi and -pedantic are very valuable.

> % more tryll.c
> #include <stdio.h>
> #include <errno.h>
> #include <stdlib.h>
> ..
> (the declaration for strtoll is located in stdlib.h)
> Reference:
> <http://www.delorie.com/gnu/docs/glibc/libc_423.html>
> 20.11.1 Parsing of Integers
> The `str' functions are declared in `stdlib.h' ...
> ...
> Function: long long int strtoll (const char *restrict string, char
> **restrict tailptr, int base)

Well, here you are failing to understand that there is more than one
version of the standard. -ansi invokes gcc in compliant (bar bugs) C89
mode, if you want things from C99 then you can use -std=c99 but you have
to understand that gcc does not fully conform to C89 or C99 in this mode.

> % gcc tryll.c -Wall
> (compiles without warnings)
>
> % gcc tryll.c -Wall -ansi
> tryll.c: In function `tryParseLliTalk':
> tryll.c:53: warning: implicit declaration of function `strtoll'
> (that's a totally bogus warning, given that stdlib.h is included)

Nope, it is totally correct. Read the man pages.

<snip>

> Perhaps -ansi might be of some value later when I'm writing actual

Actually, it is best to *start* with the compiler with the highest
warning level and only lower it as and when you understand why the
warnings are a good idea and know enough to know why they do not apply
in your specific situation. You currently do not know enough, or you
would have found out about -std as well.

<snip>

>> You wrote:
>> c = c<<4 + h;
>> which you apparently wanted to be evaluated as
>> c = (c<<4) + h;
>> but it actually *means*
>> c = c << (4 + h);
>> because the "+" operator binds more tightly than the "<<" operator.
>> gcc was kind enough (and clever enough) to warn you about this.
>
> Ah, well I didn't write that, Peter Burden wrote it, see link here:
> <http://www.rawbw.com/~rem/HelloPlus/hellos.html#c3>
> So you're saying his code has a bug in it?

Why should it not? Do you have evidence that Peter Burden never makes
mistakes?

> So operator precedence fooled Peter, and fooled me too, and Peter
> didn't bother to unit-test each line of his code, and I just
> assumed he had done so before posting his code to the net for
> others to use.

If you post stuff for people to use or link to it from your tutorial it
is *your* responsibility to test it out.

You have failed to fix the other errors pointed out. It would be better
in my opinion to take the page down or add a big warning at the top
saying "CONTAINS MAJOR FACTUAL ERRORS" rather than leaving it up as it is.
--
Flash Gordon

Keith Thompson

Feb 16, 2007, 7:45:34 PM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:

No, it is not a bogus warning. I explained this to you at length four
days ago. See
<http://groups.google.com/group/comp.lang.c/msg/3922ea6246cdbc4b>.

> Perhaps -ansi might be of some value later when I'm writing actual
> examples of how to do specific things. But right now strtoll is
> used only to convert the decoded HTML FORM field from string to
> integer so that I can *later* use that numeric value to demonstrate
> arithmetic and other functions that require integers as input,
> allowing the user to vary the particular integer value being used
> as input simply by changing the contents of the form field.
>
> % gcc tryll.c -Wall -pedantic
> In file included from tryll.c:3:
> /usr/include/stdlib.h:111: warning: ANSI C does not support `long long'
> /usr/include/stdlib.h:117: warning: ANSI C does not support `long long'

[snip]

The "-ansi" option tells gcc to support the C89/C90 standard. If you
really need long long and strtoll, you should use "-std=c99" instead
of "-ansi". (gcc does not fully support the C99 standard, but it does
support a fairly good fraction of it.) If you *don't* really need
integers bigger than 32 bits, you might consider using "long" and
"strtol" rather than "long long" and "strtoll"; then "-ansi" shouldn't
have anything to complain about.

(To be clear, I'm not telling you that you don't need integers bigger
than 32 bits; I'm advising you on what you can do *if* you don't need
integers bigger than 32 bits.)
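
As a rough sketch of the "long"/"strtol" route, something like the
following should compile cleanly with "gcc -ansi -pedantic". The helper
name parse_long is made up for illustration; it is not taken from the
CookBook or from anyone's post in this thread.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert text to long using only C90 facilities.
   Returns 1 and stores the value in *out on success; returns 0 if no
   digits were found, if anything follows the number, or if the value
   does not fit in a long. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)        /* no conversion was performed */
        return 0;
    if (errno == ERANGE)    /* result was clamped to LONG_MIN/LONG_MAX */
        return 0;
    if (*end != '\0')       /* something follows the digits */
        return 0;
    *out = value;
    return 1;
}
```

Note that this version rejects trailing whitespace as well as trailing
junk; whether that is the right policy is a separate decision.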

[snip]

>> You wrote:
>> c = c<<4 + h;
>> which you apparently wanted to be evaluated as
>> c = (c<<4) + h;
>> but it actually *means*
>> c = c << (4 + h);
>> because the "+" operator binds more tightly than the "<<" operator.
>> gcc was kind enough (and clever enough) to warn you about this.
>
> Ah, well I didn't write that, Peter Burden wrote it, see link here:
> <http://www.rawbw.com/~rem/HelloPlus/hellos.html#c3>
> So you're saying his code has a bug in it?

Apparently so. But I wasn't responding just to the code; I was
responding to your incorrect claim that gcc's warning was stupid.

You *assumed* that you knew what the expression meant, and that gcc's
warning was incorrect. It was entirely conceivable that you were
right; after all, the authors of gcc are imperfect humans, just like
the rest of us. But whenever you see a compiler warning, you
shouldn't assume that it's bogus unless you're sure that you
completely understand *why* it's bogus. The gcc authors are pretty
smart; they make mistakes, but that's not the way to bet in most
cases.
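
For anyone who wants to see the precedence point in action, here is a
small sketch. The function names are invented for illustration and are
not from Peter Burden's code. The first function spells out how the
compiler parses "c<<4 + h"; the second is the evaluation that was
apparently intended.

```c
/* Combine one hex digit h (0..15) into an accumulator c. */

/* How "c<<4 + h" is parsed: '+' binds more tightly than '<<'. */
unsigned combine_as_written(unsigned c, unsigned h)
{
    return c << (4 + h);
}

/* What was intended: shift first, then add the new digit. */
unsigned combine_as_intended(unsigned c, unsigned h)
{
    return (c << 4) + h;
}
```

With c = 4 and h = 1 the two disagree: the first gives 4 << 5, the
second gives 0x41.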

[...]

>> If that's not possible, and all you can do is install your code in
>> cgi-bin, you can try installing it with a name that nobody is likely
>> to stumble across. You can exercise the code because you know its
>> name, but nobody else can.
>
> But I'll be incrementally adding to existing CGI application, and
> it's a royal pain to make a complete copy of everything

[...]


>
> Have you personally *ever* tried one of my CGI applications and
> encountered a server abort, or obviously truncated output, because
> the program crashed at the start (before transmitting the
> CGI-MIME-type header) or in the middle respectively, because I had
> just installed something new and it had a horrible bug? Have you
> even seen it produce obviously partial new output, where each time
> you'd refresh it'd show a little bit more as I added line by line
> of new code at the bottom? Has anyone reading this thread
> encountered such work-in-progress ever?

To the best of my knowledge, I have never used one of your CGI
applications at all, and frankly I don't much care how they work
unless they raise interesting issues about C.

Flash Gordon

Feb 16, 2007, 7:18:45 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 16/02/07 05:25:
>> From: Flash Gordon <s...@flash-gordon.me.uk>

<snip stuff addressed by others>

>> Did you actually even go to the effort of trying code before
>> putting it up on your web site? I think not.
>
> The code for doing the data processing, yes. Didn't you see the

Yes. Did you see the code I pointed out that will not even compile? If
you post code that does not compile I think it is reasonable to assume
that you have not tested it.

>>> I already took three semester-length C classes. That's all that are
>>> offered at De Anza College.
>> I'm sorry, but either you failed
>
> I got an "A" in every one of those classes. If you don't believe
> me, come here, we'll go to the public library where there's access
> to JavaScript (required for viewing transcripts), and I'll show you
> my complete DeAnza transcript. If you want to call me a liar in a
> public newsgroup, then fuck you bastard!!

If you got an A in all of them then the second part of my statement applies.

>> or those courses based on your current knowledge or they appear
>> to be almost worthless.
>
> You're entitled to your opinion on such matters. Perhaps you should
> come here and look at my transcript to see which instructors were
> teaching those classes, and then you write a formal letter to
> De Anza College complaining that all those instructors are
> incompetent to teach C programming classes.

If you pay my air fares and for my time at normal consulting rates
(normal for me, that is) I will be happy to. Or you could complain to
them yourself about the fact that people who do know the languages think
that you do not.

>> Your "CookBook" currently seems to be a prime example of why you
>> should *not* trust web resources.
>
> The primary purpose of my "CookBook" is to show, in several
> languages in parallel how to do various common tasks, such as the
> tasks provided by standard libraries in the various languages, and

At the moment it does one of two things depending on the level of
knowledge of the reader. It either shows your lack of knowledge or it
leads the reader up the garden path.

> eventually some of the more advanced tasks covered in the Perl and
> Common Lisp cookbooks. That should accomplish several purposes:
> - If a person is trying to learn a new language, and knows how to

<snip>

Currently it won't show people how to do anything.

> At present my "CookBook" is very far from completion. I have

At the moment what you have up there is far from correct.

> finished including one c library, and lisp equivalents, and am
> starting on two more c libraries. I still need to include the rest

No, you need to start at the beginning and correct all the things you
have got wrong.

<snip>

>> I suggest you start looking at the comp.lang.c FAQ (Google will find it)
>
> Is this the one you want me to look at? <http://c-faq.com/>

Yes. Others have already pointed out that the things you thought were
wrong were in fact completely correct. Do you think that a group that
sometimes has members who are on the standard committee present,
sometimes has major authors of implementations of the standard C library
(and who make their living off said implementation) etc would have a FAQ
with major errors? I'm not in any of those categories and have only
about 12 years experience in C (but rather longer in professional
software development) so you should listen to the many people here who
know the language better than me. You should also take note of the
things I post that they do not correct, because believe me when I make a
mistake it gets corrected.

>> and buy a copy of K&R2
>
> I have no money to buy anything. Please provide me with a job that
> pays earned income if you want to change this present condition of
> my life.

Your finances are your problem. You putting up a seriously flawed page
about programming is a problem for any unsuspecting person who comes
across it and for those who have to pick up the pieces from their
learning incorrect information.
--
Flash Gordon

CBFalconer

Feb 16, 2007, 9:12:49 PM
Flash Gordon wrote:
> robert maas, see http://tinyurl.com/uh3t wrote:
>
... snip ...

>>
>> I have no money to buy anything. Please provide me with a job that
>> pays earned income if you want to change this present condition of
>> my life.
>
> Your finances are your problem. You putting up a seriously flawed page
> about programming is a problem for any unsuspecting person who comes
> across it and for those who have to pick up the pieces from their
> learning incorrect information.

The two facts may be connected. Any smart potential employer would
look around for independent info on a prospective employee. If
that employer happens to know C fairly well he will discover how
clueless and arrogant Maas is.

CBFalconer

Feb 16, 2007, 9:04:28 PM

You are exposing your ignorance again. long long is only available
in C99 and later. For gcc, -ansi instructs it to use the C90 standard,
which it has done, and found no such thing as strtoll in stdlib.h.

robert maas, see http://tinyurl.com/uh3t

Feb 17, 2007, 12:15:32 AM
> From: Keith Thompson <k...@mib.org>

> This terminology predates C, and those meanings may have been
> appropriate for earlier, simpler languages.

I can understand why K&R might have used such obsolete jargon,
before it was fully realized how confusing it is, but I see no
excuse for current online documentation, especially FAQs to
continue to perpetuate such misleading obsolete jargon.

> In C, the meaning of "lvalue" has changed to include any
> expression that designates an object, whether it can be assigned
> to or not (and the term "rvalue" has been largely dropped).

That's completely wrong. A simple variable is an lvalue, but it's
not an object, it's a primitive type. (And in the jargon of OOP,
nothing whatsoever in C is an object. But using the older lisp
jargon, an array is an object but a simple variable isn't.)

> >> and buy a copy of K&R2
> > I have no money to buy anything. Please provide me with a job that
> > pays earned income if you want to change this present condition of
> > my life.
> I'm sorry if your financial situation is unfavorable, but that
> certainly isn't anybody else's responsibility.

It *is* your responsibility if you tell me to spend money I don't
have after I've already told you of my situation. So in the future...
DON'T!

robert maas, see http://tinyurl.com/uh3t

Feb 17, 2007, 12:23:49 AM
> From: r...@hoekstra-uitgeverij.nl (Richard Bos)

> > >> Many of the functions described in this chapter return the value of
> > >> the macro EOF to indicate unsuccessful completion of the operation.
> > >> Since EOF is used to report both end of file and random errors, it's
> > >> often better to use the feof function to check explicitly for end of
> > >> file and ferror to check for errors.
> > > GNU is wrong on ISO C and does not care. Film at eleven.
> > In what way?
> In that it's _not_ better to use the feof() function to check for
> eof. feof() is good as an aid to distinguish between eof and
> error _once you've already checked for EOF_. IOW, it's used with
> EOF, not better than.

I take it your advice for best practice is to first check the return
value to see if it's negative or whatever the criterion is for
unsuccessful, and if unsuccessful then check feof() and errno to
diagnose why it failed?

robert maas, see http://tinyurl.com/uh3t

Feb 17, 2007, 1:31:52 AM
> From: Chris Dollin <chris.dol...@hp.com>

> Actually no. The idea was that an lvalue was the /value/ that was
> obtained by evaluating an expression which was to be assigned to,

No wonder everyone is confused. In the statement:
x = 5*w;
x is not a value of any kind, it's a **name** of a **variable**
which probably has a value but the value most certainly isn't x. If
w has the value 3, then after executing that statement x will have
the value 15. 15 isn't an lvalue, is it? But you say it is!!

> Part of the reason for introducing this distinction was to
> formalise why the variable `a` in `a := a + 1` means two
> different things in the two different places: the left-hand
> `a` is evaluated for its lvalue and the right-hand one for
> its rvalue.

That's bullshit. 'a' is the *name* of a *place* where data can be
stored and later retrieved. Depending on where a place is specified
in a statement, either the retrieval-from-place or storage-into-place
operation may occur. Some expressions denote a place, such as 'a'
in the above example, or chs[2] in the following example:
char chs[30];
chs[2] = 'a' + 5;
printf("The third character is '%c'.\n", chs[2]);
Some expressions don't denote a place, such as "'a' + 5" in the
above example. Such expressions can be used only to produce a value
*from* the expression, not to store a value gotten from elsewhere
*into* the expression.

There are basically three kinds of these expressions:
- Readonly, such as input channels (stdin), and 'a' + 5
- Writeonly, such as output channels (stdout, stderr)
- Read/Write **place**s, as discussed above
Well I guess there's a fourth kind, something where you can read
and write but what you read isn't what you wrote there previously.
For example, it's possible in some languages to define an
input/output channel as a single object, for example stdin/stdout
together, which is a useful concept if you want to perform
rubout/backspace processing of input in a user-friendly way.

> In some languages, literals have lvalues, so the assignment
> `1 := 2` is legal. Depending on the language semantics, `1`
> may have a single lvalue, or a different one each time it
> is evaluated. (The rvalue of `1` might or might not use its
> lvalue.) While for assignment this looks like the rabid
> and hungry sabre-toothed tiger, it makes more sense for
> parameter-passing ...

I fail to see how it makes any sense at all. In Fortran on the IBM
1620, you could in fact do that. For example (forgive me if I don't
have the syntax exactly correct after 40 years):

SUBROUTINE MANGLE(N)
N = 5;
END

CALL MANGLE(3)
GOTO 3
5 ... (it goes to here, because the literal 3 in the table of
constants, used by the GOTO statement, has been mangled to have the
value 5 now, but the symbol table mapping line numbers to locations
in the compiled code still links 5 to this statement)

I got bitten by that at least once, a really difficult bug to diagnose.

> the lvalue is the value you get /by evaluating on the left/; you
> may then be able to store into (through?) it, or not.

That's a completely garbled way of thinking of it. To store a value
into a place you need to know what function to call to effect the
storage. Common Lisp clarifies the whole idea best with SETF.
There's a function CAR which returns the left side of a pair, and
the function RPLACA which stores a value in the left side of a
pair, leaving the right side unchanged. You can say (setf (car x)
y), i.e. in c notation car(x) = y. How does this work?? There's a
SETF method for CAR, whereby (setf (car x) y) is macro-expanded
into (rplaca x y).

Now consider CADDR, which is defined such that (CADDR x) means (CAR
(CDR (CDR x))). Now suppose you want to say (setf (caddr x) y).
What does that do? How does it work? There's a SETF method that
causes (setf (caddr x) y) to macro-expand to (rplaca (cddr x) y).

The important point is that it's not enough to evaluate the left
side of the assignment to see where the cell is that needs
modifying, you must also say what function to call to modify just
part of that cell, not the whole thing. If you say (setf (cddr
x) y), it expands into (rplacd (cdr x) y), i.e. it goes as deep as
(cdr x) to get the object that needs modifying, but then it does
RPLACD instead of RPLACA of that object. It's not enough to say
where to make the change, you must say how specifically to make the
change there, whether to modify the left side or the right side.

You can go even smaller than the byte level of access. For example,
suppose you want to store a large number of ENUMs, where each ENUM
has four possible values 0,1,2,3, thus requiring only two bits
each, so you want to pack four ENUMs into each 8-bit byte, and have
a huge array of millions of these four-to-a-byte structures, and
you want to emulate a huge array that directly indexes the
individual ENUMs. It's easy in Common Lisp: First you define a
function for reading out the individual ENUM at location IX,
something like this (I'm using c notation here to make it easier
for you to understand):
  enum four getfour(unsigned char *arr, int ix) {
    int arrix, subix, nshift;
    arrix = ix/4;
    subix = ix%4;
    nshift = 2*subix;
    return (enum four)((arr[arrix] >> nshift) & 3);
  }

Now you define a function for storing an ENUM into the same place
you got it from, something like this:
  void putfour(unsigned char *arr, int ix, enum four newval) {
    int arrix, subix, nshift;
    arrix = ix/4;
    subix = ix%4;
    nshift = 2*subix;
    arr[arrix] = (arr[arrix] & (~ (3 << nshift)))
                 | (((unsigned char)newval) << nshift);
  }

Now (what you can do *only* in Common Lisp), you define a SETF
method so that the code:
getfour(bigarr,ix) = newval;
will macro-expand into:
putfour(bigarr,ix,newval);
That makes it easy to copy a ENUM from one place to another,
for example:
getfour(bigarr,ix1) = getfour(bigarr,ix2);
exactly like you can already (in c) do:
arr[ix1] = arr[ix2];
so you don't have to write this ugly asymmetric code instead:
putfour(bigarr,ix1,getfour(bigarr,ix2));
putarr(arr,ix1,arr[ix2]);

In summary, there's no value (usefulness) to an "lvalue" as you explain it.
What is misnomered an "lvalue" is really c's version of a setf method,
which unfortunately can't be extended by users as it can in lisp.
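
For reference, a compilable C rendering of the getfour/putfour idea
might look like the sketch below. It is only an illustration under a
few assumptions: plain unsigned values in the range 0..3 stand in for
the "enum four" type, size_t is used for the index, and the copyfour
helper is an addition that does not appear in the pseudocode above.

```c
#include <stddef.h>

/* Each element takes 2 bits, so four elements share one byte. */

/* Read element ix from the packed array. */
unsigned getfour(const unsigned char *arr, size_t ix)
{
    unsigned nshift = 2u * (unsigned)(ix % 4);
    return (arr[ix / 4] >> nshift) & 3u;
}

/* Store newval (0..3) into element ix, leaving its three neighbours
   in the same byte unchanged. */
void putfour(unsigned char *arr, size_t ix, unsigned newval)
{
    unsigned nshift = 2u * (unsigned)(ix % 4);
    unsigned cleared = arr[ix / 4] & ~(3u << nshift);
    arr[ix / 4] = (unsigned char)(cleared | ((newval & 3u) << nshift));
}

/* Copy one element to another. C has no user-extensible "places",
   so the copy has to be spelled as a put of a get. */
void copyfour(unsigned char *arr, size_t dst, size_t src)
{
    putfour(arr, dst, getfour(arr, src));
}
```

A caller would write copyfour(bigarr, ix1, ix2) where the Lisp version
would use SETF directly on the accessor form.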

Ben Bacarisse

Feb 17, 2007, 2:30:09 AM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:

>> From: Chris Dollin <chris.dol...@hp.com>
>> Actually no. The idea was that an lvalue was the /value/ that was
>> obtained by evaluating an expression which was to be assigned to,
>
> No wonder everyone is confused.

I don't feel confused.

> In the statement:
> x = 5*w;
> x is not a value of any kind, it's a **name** of a **variable**
> which probably has a value but the value most certainly isn't x. If
> w has the value 3, then after executing that statement x will have
> the value 15. 15 isn't an lvalue, is it? But you say it is!!

No, 15 is not an lvalue and I did not see anything in what Chris
Dollin wrote that says it is.

The standard is quite clear on the matter: some expressions yield
lvalues and some do not. The multiplication operator, *, does not
yield an lvalue. Simple variables, like x and w above, do.

The standard also says what happens when an lvalue, like w, appears in
an expression: it is almost always converted to the contents of the
object it denotes (my words, but look up section 6.3.2.1 if you want
the actual ones). One important place where an lvalue does not get
converted to a "plain" value is on the left of an assignment operator,
like x above.
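
A short sketch may make the distinction concrete. The function name
lvalue_demo is invented for this illustration; the commented-out lines
are ones a conforming compiler is expected to reject.

```c
/* x and w are lvalues: each designates an object. */
int lvalue_demo(void)
{
    int w = 3;
    int x;
    int *p;

    x = 5 * w;      /* left x: lvalue, says where to store;
                       right w: converted to its stored value, 3 */
    p = &x;         /* & needs an lvalue operand; x is not converted */
    *p = x + 1;     /* *p is also an lvalue; it designates the same
                       object as x */

    /* 5 * w = x;      rejected: 5 * w is not an lvalue */
    /* p = &(x + 1);   rejected for the same reason     */

    return x;
}
```

After the three assignments x holds 16: it was set to 15 and then
incremented through the pointer.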

>> Part of the reason for introducing this distinction was to
>> formalise why the variable `a` in `a := a + 1` means two
>> different things in the two different places: the left-hand
>> `a` is evaluated for its lvalue and the right-hand one for
>> its rvalue.
>
> That's bullshit.

No, I think it is a reasonable (though informal) explanation of why the
term came into common use. If you want a very formal analysis you
must turn to denotational semantics, originally developed by
Christopher Strachey and Dana Scott. I think Chris Dollin referred
to Strachey if not to the topic of denotational semantics.

> 'a' is the *name* of a *place* where data can be
> stored and later retrieved. Depending on where a place is specified
> in a statement, either the retrieval-from-place or storage-into-place
> operation may occur. Some expressions denote a place, such as 'a'
> in the above example, or chs[2] in the following example:
> char chs[30];
> chs[2] = 'a' + 5;
> printf("The third character is '%c'.\n", chs[2]);
> Some expressions don't denote a place, such as "'a' + 5" in the
> above example. Such expressions can be used only to produce a value
> *from* the expression, not to store a value gotten from elsewhere
> *into* the expression.
>
> There are basically three kinds of these expressions:

Here you define your own way of looking at variables and expressions,
but unless your terms (and the way you think about them) correspond to
the way variables and expressions are defined by the C standard, all
you will do is confuse yourself -- at least as far as understanding C
is concerned. Ask yourself, for example, if your characterisation of
expressions helps explain the behaviour of operators like sizeof.

<descriptions of other languages snipped>

--
Ben.

Richard Heathfield

Feb 17, 2007, 4:56:48 AM
robert maas, see http://tinyurl.com/uh3t said:

>> From: Keith Thompson <k...@mib.org>
>> This terminology predates C, and those meanings may have been
>> appropriate for earlier, simpler languages.
>
> I can understand why K&R might have used such obsolete jargon,
> before it was fully realized how confusing it is,

Most people don't find it confusing.

> but I see no
> excuse for current online documentation, especially FAQs to
> continue to perpetuate such misleading obsolete jargon.

It's not obsolete. It's not even particularly misleading. It's just one
of those things you learn along the way.

>> In C, the meaning of "lvalue" has changed to include any
>> expression that designates an object, whether it can be assigned
>> to or not (and the term "rvalue" has been largely dropped).
>
> That's completely wrong.

No, it isn't. Look, almost everything you've said about C, here and
elsewhere, is wrong, so I think it'd be better for you to spend a
little less time arguing with experts and a little more time listening
to them.

> A simple variable is an lvalue, but it's
> not an object, it's a primitive type.

C doesn't define "variable", but most people seem to use it in a sense
which is broadly in line with the term "object". A type is (sort of) a
handy label for the properties of an object, but it does not of itself
have any storage assigned to it at runtime, so a type cannot store a
value, but an object can.

> (And in the jargon of OOP,

Who mentioned OOP?

> nothing whatsoever in C is an object. But using the older lisp
> jargon, an array is an object but a simple variable isn't.)

In C terms, an int object is an object.

>> >> and buy a copy of K&R2
>> > I have no money to buy anything. Please provide me with a job that
>> > pays earned income if you want to change this present condition of
>> > my life.
>> I'm sorry if your financial situation is unfavorable, but that
>> certainly isn't anybody else's responsibility.
>
> It *is* your responsibility if you tell me to spend money I don't
> have after I've already told you of my situation.

No, it isn't.

> So in the future... DON'T!

It is *your* responsibility to make yourself knowledgeable about C
before trying to conduct discussions in it, and this would certainly
include obtaining basic textbooks on the language. If you aren't
prepared to mow a few lawns to raise the money needed to buy a copy of
K&R2, and if you're not prepared to ask the library to lend you a copy
for a while, then you should not be surprised if your ignorance of C
leads people to correct you frequently.

In short, it is your responsibility to know what you're talking about if
you argue with C experts.

So in the future...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Keith Thompson

Feb 17, 2007, 6:19:41 AM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
>> From: Keith Thompson <k...@mib.org>
>> This terminology predates C, and those meanings may have been
>> appropriate for earlier, simpler languages.
>
> I can understand why K&R might have used such obsolete jargon,
> before it was fully realized how confusing it is, but I see no
> excuse for current online documentation, especially FAQs to
> continue to perpetuate such misleading obsolete jargon.

There is nothing misleading about it if you understand how the word is
used. An lvalue, in modern C (at least since 1989) is an expression
that denotes an object.

>> In C, the meaning of "lvalue" has changed to include any
>> expression that designates an object, whether it can be assigned
>> to or not (and the term "rvalue" has been largely dropped).
>
> That's completely wrong. A simple variable is an lvalue, but it's
> not an object, it's a primitive type. (And in the jargon of OOP,
> nothing whatsoever in C is an object. But using the older lisp
> jargon, an array is an object but a simple variable isn't.)

You have a habit of claiming that various things are completely wrong.
In most cases, you are mistaken. I suggest checking your facts more
carefully before making such claims.

In C, an "object" is, by definition, a "region of data storage in the
execution environment, the contents of which can represent values"
(C99 3.14). Nothing more, nothing less. The term is not related to
OOP. (Incidentally, the C++ standard has a similar definition of
"object".) So yes, a simple variable is an object.

[snip]

If you can't afford a copy of K&R2 (and don't have access to a public
library), Google n1124.pdf. It's free.
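
To illustrate the C99 3.14 definition of "object" quoted above, here is
a minimal sketch showing that a plain int variable is a region of
storage whose contents represent a value. The function name is made up
for this example.

```c
#include <string.h>

/* Returns 1 if copying the bytes of one int object into another
   reproduces the value, i.e. the storage alone carries the value. */
int int_is_region_of_storage(void)
{
    int a = 12345;
    int b = 0;
    unsigned char bytes[sizeof a];

    memcpy(bytes, &a, sizeof a);   /* read the object's storage */
    memcpy(&b, bytes, sizeof b);   /* write it into another object */
    return b == a;
}
```

Nothing here depends on arrays or structures of ints; the single
variable a already has an address, a size, and contents.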

Keith Thompson

Feb 17, 2007, 6:25:34 AM

You need to check the documentation for each function to see how it
behaves.

fgetc(), for example, returns EOF (not just any negative value) on
reaching end-of-file or on an error. After getting an EOF result, you
can call feof() and/or ferror(). The standard doesn't say that errno
is set on an error, though it may be in some implementations.
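
As a sketch of that order of operations (test the return value against
EOF first, then ask why), consider the following. The function name
count_chars is invented for illustration.

```c
#include <stdio.h>

/* Count the characters remaining on stream. Returns the count, or -1
   if reading stopped because of an error rather than end-of-file. */
long count_chars(FILE *stream)
{
    long n = 0;
    int c;   /* int, not char, so that EOF stays distinguishable */

    while ((c = fgetc(stream)) != EOF)
        n++;

    /* Only now, after fgetc has returned EOF, ask why it did. */
    if (ferror(stream))
        return -1;
    return n;   /* otherwise the stream reached end-of-file */
}
```

The loop never calls feof() to decide whether to keep reading; feof()
and ferror() are consulted only after fgetc() has reported EOF.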

Malcolm McLean

Feb 18, 2007, 5:19:24 AM

"robert maas, see http://tinyurl.com/uh3t" <rem...@yahoo.com> wrote in
message

>> From: Keith Thompson <k...@mib.org>
>> This terminology predates C, and those meanings may have been
>> appropriate for earlier, simpler languages.
>
> I can understand why K&R might have used such obsolete jargon,
> before it was fully realized how confusing it is, but I see no
> excuse for current online documentation, especially FAQs to
> continue to perpetuate such misleading obsolete jargon.
>
>> In C, the meaning of "lvalue" has changed to include any
>> expression that designates an object, whether it can be assigned
>> to or not (and the term "rvalue" has been largely dropped).
>
> That's completely wrong. A simple variable is an lvalue, but it's
> not an object, it's a primitive type. (And in the jargon of OOP,
> nothing whatsoever in C is an object. But using the older lisp
> jargon, an array is an object but a simple variable isn't.)
>
There is a case for using terms in the way the ANSI committee do in the C
standard.

However the committee is not important enough to be allowed to define how we
use basic programming terms in the English language, even in a C context. If
you are using a term that has other meanings, like "object", in the sense
defined by the standard, then really you ought to qualify "as defined by the
standard".

Racaille

Feb 18, 2007, 10:05:17 AM
On Feb 14, 4:32 am, rem6...@yahoo.com (robert maas, see http://tinyurl.com/uh3t)
wrote:
> Part of my decision was that whitespace allowance should be
> symmetric. It should be allowed before iff allowed after. strtol is
> asymmetric in this respect, allowing whitespace before (and
> rejecting stray non-white text before), but failing to distinguish
> between trailing whitespace (OK) and trailing junk (Not OK), either
> rejecting both (if caller checks to make sure the final pointer
> matches end of string), or accepting both (if caller doesn't make
> that check).

I would just use sscanf() and be done:

  int num;
  char junk;
  switch (sscanf(str, "%d %c", &num, &junk)) {
  case 1:
      /* OK, 'num' is your number */
      break;
  case 2:
      /* FAIL, there was some trailing junk */
      break;
  case 0:
      /* FAIL, invalid number */
      break;
  default:
      /* FAIL, the string was either zero-length
         or just whitespace */
      break;
  }

I'm not sure if the last 2 cases could be portably
distinguished.

Keith Thompson

Feb 18, 2007, 11:35:58 AM
"Malcolm McLean" <regn...@btinternet.com> writes:
[...]

> There is a case for using terms in the way the ANSI committee do in the C
> standard.
>
> However the committee is not important enough to be allowed to
> define how we use basic programming terms in the English language,
> even in a C context. If you are using a term that has other
> meanings, like "object", in the sense defined by the standard, then
> really you ought to qualify "as defined by the standard".

Be aware that I, for one, will not necessarily bother to do so when
posting in this newsgroup. I might when it's necessary to avoid
confusion, but my threshold of confusion may differ from yours.

Keith Thompson

Feb 18, 2007, 11:41:42 AM

C99 7.19.6.7p3:
The sscanf function returns the value of the macro EOF if an input
failure occurs before any conversion. Otherwise, the sscanf
function returns the number of input items assigned, which can be
fewer than provided for, or even zero, in the event of an early
matching failure.

Unfortunately, sscanf() with "%d" invokes undefined behavior if the
number can't be represented as an int (i.e., on overflow).
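
One way around the overflow problem is to build the same checks on
strtol, which reports out-of-range input through errno instead. The
sketch below is only an illustration; the names check_int and
enum int_status are invented here and come from neither the standard
library nor the code under discussion. It accepts whitespace after the
number as well as before it.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

enum int_status {
    INT_OK,
    INT_NO_DIGITS,       /* empty, all whitespace, or not a number */
    INT_TRAILING_JUNK,   /* non-whitespace text after the number */
    INT_OUT_OF_RANGE     /* does not fit in a long */
};

/* Hypothetical validator: on INT_OK the value is stored in *num. */
enum int_status check_int(const char *str, long *num)
{
    char *end;
    long value;
    int out_of_range;

    errno = 0;
    value = strtol(str, &end, 10);
    out_of_range = (errno == ERANGE);

    if (end == str)
        return INT_NO_DIGITS;
    while (isspace((unsigned char)*end))
        end++;                       /* allow whitespace after, too */
    if (*end != '\0')
        return INT_TRAILING_JUNK;
    if (out_of_range)
        return INT_OUT_OF_RANGE;
    *num = value;
    return INT_OK;
}
```

Unlike the sscanf version, this cannot tell an empty string from one
that starts with junk; both come back as INT_NO_DIGITS.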

Richard Heathfield

Feb 18, 2007, 11:46:19 AM
Malcolm McLean said:

<snip>

> There is a case for using terms in the way the ANSI committee do in
> the C standard.

Yes.

> However the committee is not important enough to be allowed to define
> how we use basic programming terms in the English language, even in a
> C context.

What a strange way to put it.

> If you are using a term that has other meanings, like
> "object", in the sense defined by the standard, then really you ought
> to qualify "as defined by the standard".

Not in *this* newsgroup, though. This is a C newsgroup, and in C the
word "object" has a definite and widely known meaning. If you want to
add useless qualifiers to your articles, that's up to you, but nobody
else is under any obligation to copy you.

robert maas, see http://tinyurl.com/uh3t

Feb 18, 2007, 2:28:34 PM
> From: "Malcolm McLean" <regniz...@btinternet.com>

> There is a case for using terms in the way the ANSI committee do
> in the C standard.
> However the committee is not important enough to be allowed to
> define how we use basic programming terms in the English language,
> even in a C context. If you are using a term that has other
> meanings, like "object", in the sense defined by the standard,
> then really you ought to qualify "as defined by the standard".

Well said. Thanks for expressing that opinion better than I could myself.

As for sensible (non-ANSI) language: An "object" to me means
something that (1) is connected together so the various parts of it
don't easily drift apart, and (2) is itself capable of moving
independently of anything else. So an automobile is an "object",
but the engine block of that automobile while firmly locked within
the automobile is not an "object" itself, rather it's just part of
a larger object. Likewise an array is an object, because you can
easily move the object to a new location without losing any of its
important structure, for example by moving the line of source code
and recompiling, or by calling malloc and copying the array to the
newly allocated storage. But a single element of an array is *not*
an object, because it's firmly fixed adjacent to the other elements
of that same array, and if you move that array element all by
itself to a new location you lose the structure you had whereby it
could be indexed along with all the other array elements in a
uniform manner.

Now an indivisible thing, such as an electron, isn't an object,
it's just a particle. You need a large number of particles before
you have anything properly called an "object". Is a single molecule
of carbon dioxide an "object"? Probably not. Is a complete strand
or loop of DNA an "object"? Probably not, but maybe. Is a complete
living cell an "object"? Probably yes, unless it's firmly fixed
within a living body and you're talking about the body as a whole.
Is a single integer variable an "object"? Probably not, unless
you're treating it as a bit vector.

I think the ANSI committee has abused the word "object" to have too
broad a meaning, losing the essential idea that was originally in
the word, to be acceptable to me, and apparently to you either,
except in the coerced context of working on standards that mesh
with the ones the committee already did.

I sorta like the terms "object" for something that has more than
one addressable/indexable part, and can move as a unit
independently of other "objects", and the term "slot" for an
individually addressable/indexable part of an object, and the term
"place" for either a slot or a standalone variable where you can
store a value and later retrieve it. In a database the
corresponding terms are "record" (object) and "field" (slot). If
ANSI tries to refer to a single field within a record as an
"object", I don't have to copy their usage.

I suppose to be fair (same rules for goose and gander), I should
qualify my usage too:

"place" as used in Common Lisp SETF-method jargon.

"object" as used in Lisp jargon ever since the early times.
-or-
"object" as used in Java/OOP jargon.

"method" as used in Common Lisp SETF jargon.
-or-
"method" as used in Java/OOP jargon.

"function" as used in mathematics.
-or-
"function" as used in FORTRAN jargon (can't side-effect arguments).
-or-
"function" as used in C/lisp jargon (*can* side-effect via pointer arguments).
-or-
"function" as used in C++ jargon (direct manipulation of reference arguments).

Beliavsky

Feb 18, 2007, 3:30:22 PM
On Feb 18, 2:28 pm, rem6...@yahoo.com (robert maas, see http://tinyurl.com/uh3t)
wrote:

<snip>

> I suppose to be fair (same rules for goose and gander), I should
> qualify my usage too:
>
> "place" as used in Common Lisp SETF-method jargon.
>
> "object" as used in Lisp jargon ever since the early times.
> -or-
> "object" as used in Java/OOP jargon.
>
> "method" as used in Common Lisp SETF jargon.
> -or-
> "method" as used in Java/OOP jargon.
>
> "function" as used in mathematics.
> -or-
> "function" as used in FORTRAN jargon (can't side-effect arguments).

Fortran functions can change their arguments, although I think doing
so is poor style and that a SUBROUTINE should be used if one desires
this behavior. In Fortran 95, one can specify that a function is PURE,
in which case it is not allowed to change its arguments.

robert maas, see http://tinyurl.com/uh3t

Feb 18, 2007, 3:35:17 PM
> From: "Racaille" <0xef967...@gmail.com>

> I would just use sscanf() and be done:

That may be what you would do, but I prefer to do better than that.
A week or so ago I discovered strcspn which allows me to
immediately learn whether the user typed any digit whatsoever or
not, and if so then where that first digit is. That applies the
divide-and-conquer pattern: Once I know where that first digit is,
the overall task sub-divides into (1) checking backwards for
immediate optional sign, and before that only whitespace, and (2)
checking forwards for rest of digits, and after that only
whitespace. In fact the two sub-tasks could be parallelized if that
was of any value. strcspn and strspn are handy for those sub-tasks
too. It's too bad this kind of task never came up in those three c
programming classes I took, so the instructor never mentioned such
useful functions, or I would have used them from the start, and
never would have written that ugly first version of my syntax
checker. In lisp of course you don't need such functions built-in,
because it's trivial for ordinary programmers to pass an anonymous
function to POSITION-IF or POSITION-IF-NOT to achieve the same
effect (in fact that's what I did for the lisp version of the
algorithm, and re-invented strcspn and strspn under different names
for the java version before I discovered those functions in the c
library). I think my Java names make more sense than the c names BTW.
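
The divide-and-conquer scheme described above could be sketched roughly as
follows. This is only a minimal sketch against the pattern ^ *[-+]?[0-9]+ *$
from the start of the thread, not the code on the cookbook page; the names
check_int_syntax and enum int_syntax are made up for illustration.

```c
#include <string.h>

/* Hypothetical result codes; the names are illustrative only. */
enum int_syntax {
    INT_OK,
    INT_NO_DIGITS,    /* no digit anywhere in the field */
    INT_JUNK_BEFORE,  /* something other than spaces and one sign before the digits */
    INT_JUNK_AFTER    /* something other than spaces after the digits */
};

/* Check s against ^ *[-+]?[0-9]+ *$ and say which part is wrong. */
enum int_syntax check_int_syntax(const char *s)
{
    /* strcspn: length of the prefix containing no digit at all. */
    size_t first = strcspn(s, "0123456789");
    size_t lead = first;
    const char *rest;

    if (s[first] == '\0')
        return INT_NO_DIGITS;

    /* Backwards from the first digit: one optional sign, then only spaces. */
    if (lead > 0 && (s[lead - 1] == '+' || s[lead - 1] == '-'))
        lead--;
    if (strspn(s, " ") < lead)
        return INT_JUNK_BEFORE;

    /* Forwards from the first digit: the rest of the digits, then only spaces. */
    rest = s + first + strspn(s + first, "0123456789");
    rest += strspn(rest, " ");
    return *rest == '\0' ? INT_OK : INT_JUNK_AFTER;
}
```

A caller would pass the decoded form field and switch on the result to pick a
diagnostic, so "x12", "12x" and an empty field each get a different message.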

By the way, last night as I was glomping a c library into my
multi-language cookbook matrix, I got to the point where strcspn
and strspn appear, so they're in there, alongside the Common Lisp
equivalents, as of last night. Take a look if you're curious:
Start at the top of:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
Find either of these in table of contents:
+ Integers ...
and ... Strings (click here)
+ [Strings]
and ... Integers (click here)
Or cheat by using this direct URL:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html#StrInt>
but if you do that you lose the fun of seeing how the coordination
of data types works in my cookbook matrix.
However you've gotten to the "Strings and Integers" section,
next skim-read until you reach this paragraph:
Skip over particular characters in string
and the next after it:
Skip until first particular character in string
That's the best natural English description I've been able to
figure out. In the technical description I use the jargon of "bag of
characters", but I feel that's not best in the brief initial
description for eyeballing purpose. If you can think of better
English here, feel free to post a suggestion. (Also I proofread the
technical spec I wrote several times, kept finding mistakes and
fixing them, and finally the last two proofreadings didn't find
anything wrong, so I think I finally got it right, or did I?)

Yevgen Muntyan

Feb 18, 2007, 4:04:12 PM
robert maas, see http://tinyurl.com/uh3t wrote:
>> From: "Racaille" <0xef967...@gmail.com>
>> I would just use sscanf() and be done:
>
[snip]

What's "GNU-c" and why are some functions "GNU-c" while others
are "c"? Then some people read it and think they use GNU-c,
as some people use C/C++...

Yevgen

Keith Thompson

Feb 18, 2007, 5:18:03 PM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
[...]

> I think the ANSI committee has abused the word "object" to have too
> broad a meaning, losing the essential idea that was originally in
> the word, to be acceptable to me, and apparently to you either,
> except in the coerced context of working on standards that mesh
> with the ones the committee already did.
[...]

That's just too bad.

This is comp.lang.c, where we discuss the C programming language,
which is defined by the ISO C standard, which has a perfectly good
technical definition for the word "object". If you want to use the
word "object" *in this newsgroup* in a different sense, then I suggest
you state explicitly that you're doing so. Otherwise, I (and probably
many others) will correct your misuse of the word, or will just ignore
you.

I'm not saying that you shouldn't discuss "objects" in the OOP sense,
or in the sense of physical things, or whatever. I'm just saying that
*in this newsgroup*, the word has the overriding meaning of a "region
of data storage in the execution environment, the contents of which
can represent values".

I suppose the authors of the C standard could have invented new words
that couldn't be confused with existing English words, but I doubt
that you'd want to read about ferlorping a kurplok rather than
declaring an object.

Keith Thompson

Feb 18, 2007, 5:40:41 PM

GNU C is the C-like language accepted by gcc. It's very similar to
standard C, but has a number of extensions, some of which violate the
C standard.

Note that gcc is a compiler, not a complete implementation; the C
runtime library is provided separately, and varies from one platform
to another. There is a GNU C library (separate from, but usable with,
gcc).

But I have no idea why the referenced web page applies the "GNU-c" tag
to the isblank() and strtoll() functions, both of which are standard C
(but both are new in C99).

Malcolm McLean

Feb 18, 2007, 6:54:31 PM

"Richard Heathfield" <r...@see.sig.invalid> wrote in message

> Not in *this* newsgroup, though. This is a C newsgroup, and in C the
> word "object" has a definite and widely known meaning. If you want to
> add useless qualifiers to your articles, that's up to you, but nobody
> else is under any obligation to copy you.
>
In the C standard the term "object" has a very specific definition. When
most C programmers use the term "object" they are not using it in this
sense, probably even when specifically discussing C.

The standard is an important document, so we can hardly hold that it is
wrong to use its terminology, even where it is somewhat eccentric. However
we cannot allow it to totally dictate our discourse. There aren't enough
words for that. Or in programming terms, they have polluted our namespace.


Richard Heathfield

Feb 18, 2007, 8:12:56 PM
Malcolm McLean said:

>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> Not in *this* newsgroup, though. This is a C newsgroup, and in C the
>> word "object" has a definite and widely known meaning. If you want to
>> add useless qualifiers to your articles, that's up to you, but nobody
>> else is under any obligation to copy you.
>>
> In the C standard the term "object" has a very specific definition.

Correct.

> When most C programmers use the term "object" they are not using it in
> this sense, probably even when specifically discussing C.

That's their problem. :-)

> The standard is an important document, so we can hardly hold that it
> is wrong to use its terminology, even where it is somewhat eccentric.

Quite so.

> However we cannot allow it to totally dictate our discourse.

I don't. If I use the word "object" in a non-C context, I intend it to
carry a very different meaning than when I use it in a C context. Same
applies to char, long, short, float, double, pointer, file, array,
value, and so on. If I ask my wife whether she'll be long, I don't
imagine for a second that she will interpret that as a query as to
whether she intends to be an at-least-32-bits-wide integer.

But in a C context, these words have more specialised meanings.

> There
> aren't enough words for that. Or in programming terms, they have
> polluted our namespace.

Well, yes, ISO have polluted our namespace, but not for that reason.

Yevgen Muntyan

Feb 18, 2007, 8:18:09 PM
Keith Thompson wrote:
> Yevgen Muntyan <muntyan.r...@tamu.edu> writes:
>> robert maas, see http://tinyurl.com/uh3t wrote:
>>>> From: "Racaille" <0xef967...@gmail.com>
>>>> I would just use sscanf() and be done:
>> [snip]
>>> Start at the top of:
>>> <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
>> What's "GNU-c" and why are some functions "GNU-c" while others
>> are "c"? Then some people read it and think they use GNU-c,
>> as some people use C/C++...
>
> GNU C is the C-like language accepted by gcc. It's very similar to
> standard C, but has a number of extensions, some of which violate the
> C standard.

You sure you are talking about the same thing as that silly "GNU-c"
on the website?

> Note that gcc is a compiler, not a complete implementation; the C
> runtime library is provided separately, and varies from one platform
> to another. There is a GNU C library (separate from, but usable with,
> gcc).

Yeah, noted that. There are also many other libraries out there.
And there is also an Intel C compiler. Then, I like Adobe Photoshop,
one can draw pictures in it.

> But I have no idea why the referenced web page applies the "GNU-c" tag
> to the isblank() and strtoll() functions, both of which are standard C
> (but both are new in C99).

Exactly. That's my question, "why the heck?".

Yevgen

Keith Thompson

Feb 18, 2007, 8:38:24 PM
Yevgen Muntyan <muntyan.r...@tamu.edu> writes:
> Keith Thompson wrote:
>> Yevgen Muntyan <muntyan.r...@tamu.edu> writes:
>>> robert maas, see http://tinyurl.com/uh3t wrote:
>>>>> From: "Racaille" <0xef967...@gmail.com>
>>>>> I would just use sscanf() and be done:
>>> [snip]
>>>> Start at the top of:
>>>> <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
>>> What's "GNU-c" and why are some functions "GNU-c" while others
>>> are "c"? Then some people read it and think they use GNU-c,
>>> as some people use C/C++...
>> GNU C is the C-like language accepted by gcc. It's very similar to
>> standard C, but has a number of extensions, some of which violate the
>> C standard.
>
> You sure you are talking about the same thing as that silly "GNU-c"
> on the website?

I can only address what "GNU C" *really* means. I can't guess what
the author of the web page meant by it. We'll both just have to wait
for him to respond (or to correct the web page).

CBFalconer

Feb 18, 2007, 6:34:38 PM
Keith Thompson wrote:
> Yevgen Muntyan <muntyan.r...@tamu.edu> writes:
>> robert maas, see http://tinyurl.com/uh3t wrote:
>>
>> [snip]
>>> Start at the top of:
>>> <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
>>
>> What's "GNU-c" and why are some functions "GNU-c" while others
>> are "c"? Then some people read it and think they use GNU-c,
>> as some people use C/C++...
>
> GNU C is the C-like language accepted by gcc. It's very similar
> to standard C, but has a number of extensions, some of which
> violate the C standard.
>
> Note that gcc is a compiler, not a complete implementation; the
> C runtime library is provided separately, and varies from one
> platform to another. There is a GNU C library (separate from,
> but usable with, gcc).
>
> But I have no idea why the referenced web page applies the
> "GNU-c" tag to the isblank() and strtoll() functions, both of
> which are standard C (but both are new in C99).

I believe the referenced page is authored by Maas. If Yevgen scans
old posts in this newsgroup for Maas's posts (google is handy for
this), and the myriad corrections required, he may be able to
evaluate the accuracy of that page, at least as far as it applies
to C.

Yevgen Muntyan

Feb 18, 2007, 10:11:23 PM

You broke my plan guys. I figured it'd be better to ask instead of
stating that something is nonsense, this thread shows that the former
is more likely to work.

If this guy knows google-fu well enough, he can even get readers,
and those readers might even read what he wrote out there. And
then be afraid to use scary GNU-c thing. Or something. Been there
myself (as a reader, that is).

Silly code pieces are not as bad as things like "GNU-c" or what he
says about character type. The former won't be used by anyone, the
latter will be memorized, unintentionally. Anyway, too tired to speak
English. And was tired when posted, silly idea.

Yevgen

Richard Bos

Feb 19, 2007, 2:44:09 AM
"Malcolm McLean" <regn...@btinternet.com> wrote:

> There is a case for using terms in the way the ANSI committee do in the C
> standard.
>
> However the committee is not important enough to be allowed to define how we
> use basic programming terms in the English language, even in a C context. If
> you are using a term that has other meanings, like "object", in the sense
> defined by the standard, then really you ought to qualify "as defined by the
> standard".

TTBOMK, the word "object" had a couple of meanings in programming, one
of them being the one used in the ISO(! ANSI is old hat and foreign) C
Standard, long before this new-fangled fad called "object oriented
programming" introduced a new one. So really, it's the OOPsers who need
to conform, not the rest of us.

Richard

Chris Dollin

Feb 19, 2007, 9:50:27 AM
robert maas, see http://tinyurl.com/uh3t wrote:

>> From: Chris Dollin <chris.dol...@hp.com>
>> Actually no. The idea was that an lvalue was the /value/ that was
>> obtained by evaluating an expression which was to be assigned to,
>
> No wonder everyone is confused.

/I'm/ not confused. Not about this.

> In the statement:
> x = 5*w;
> x is not a value of any kind, it's a **name** of a **variable**
> which probably has a value but the value most certainly isn't x.

Duh. Yes. I didn't say otherwise.

The /evaluation/ of the expression `x` will yield an lvalue,
which we can usefully think of as "the address of x".

> If w has the value 3, then after executing that statement x will have
> the value 15. 15 isn't an lvalue, is it? But you say it is!!

No, I don't. Why do you think I do?

>> Part of the reason for introducing this distinction was to
>> formalise why the variable `a` in `a := a + 1` means two
>> different things in the two different places: the left-hand
>> `a` is evaluated for its lvalue and the right-hand one for
>> its rvalue.
>
> That's bullshit.

I admire your ability to produce a cogent and informed argument.

> 'a' is the *name* of a *place* where data can be
> stored and later retrieved. Depending on where a place is specified
> in a statement, either the retrieval-from-place or storage-into-place
> operation may occur. Some expressions denote a place, such as 'a'
> in the above example, or chs[2] in the following example:
> char chs[30];
> chs[2] = 'a' + 5;
> printf("The third character is '%c'.\n", chs[2]);
> Some expressions don't denote a place, such as "'a' + 5" in the
> above example. Such expressions can be used only to produce a value
> *from* the expression, not to store a value gotten from elsewhere
> *into* the expression.

Expressions that "denote a place" are those that can be evaluated
for their lvalues. Those that don't, can't.
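
In C terms, the distinction can be sketched as below. This is only a small
illustration built on the chs[2] example quoted above; the function name
third_char_demo is invented, and the commented-out lines show what a
conforming compiler is expected to reject.

```c
/* Stores into the place chs[2], then reads the same place back. */
char third_char_demo(void)
{
    char chs[30] = {0};
    char *place = &chs[2];   /* chs[2] denotes a place, so its address can be taken */

    *place = 'a' + 5;        /* store through that place */

    /* ('a' + 5) = 'x';         rejected: 'a' + 5 denotes no place */
    /* place = &('a' + 5);      rejected for the same reason */

    return chs[2];           /* same expression, now evaluated for its value */
}
```

The expression chs[2] appears once where a place is needed and once where a
value is needed; 'a' + 5 can only ever supply a value.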

>> In some languages, literals have lvalues, so the assignment
>> `1 := 2` is legal. Depending on the language semantics, `1`
>> may have a single lvalue, or a different one each time it
>> is evaluated. (The rvalue of `1` might or might not use its
>> lvalue.) While for assignment this looks like the rabid
>> and hungry sabre-toothed tiger, it makes more sense for
>> parameter-passing ...
>
> I fail to see how it makes any sense at all.

Consider something like

fun f( x ) = ... x := x + 1 ...
... f( 1 ) ...

in a language with pass-by-binding (the formal argument is
bound to the (l)value of the actual argument). The body of
`f` can dink around with `x`, even if the actual argument
is a literal, and without affecting /all/ the places where
the value of `1` is required.

Your FORTRAN example demonstrates why it can be better for
each (l)evaluation of `1` to yield a /new/ lvalue, not the same
one every time.

>> the lvalue is the value you get /by evaluating on the left/; you
>> may then be able to store into (through?) it, or not.
>
> That's a completely garbled way of thinking of it.

You are mistaken.

> To store a value
> into a place you need to know what function to call to effect the
> storage.

(`affect`, not `effect`)

No; in fact you don't. You can have a uniform way of /updating/
the store, and different ways of computing the lvalue. Since
the update is "done" by an operation which is roughly

update( store, lvalue, rvalue )

there's plenty of room to capture any interesting details.
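
A toy model of that idea in C, under loud assumptions: the store is just an
array of int cells, an lvalue is just an index into it, and every name here
(Store, LValue, lvalue_of_variable, lvalue_of_element, update, fetch) is
hypothetical. It is a sketch of "many ways to compute the lvalue, one way to
update", not a description of how any real compiler works.

```c
#include <stddef.h>

#define STORE_SIZE 16

typedef struct { int cells[STORE_SIZE]; } Store;  /* hypothetical model of memory */
typedef size_t LValue;                            /* a location in the store */
typedef int RValue;                               /* a plain value */

/* Different ways of computing an lvalue ... */
LValue lvalue_of_variable(size_t slot)              { return slot; }
LValue lvalue_of_element(size_t base, size_t index) { return base + index; }

/* ... but a single, uniform way of updating and reading the store. */
void update(Store *store, LValue lv, RValue rv) { store->cells[lv] = rv; }
RValue fetch(const Store *store, LValue lv)     { return store->cells[lv]; }
```

With this, `x = 5` and `a[2] = 7` both become calls to update(); only the
computation of the second argument differs.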

> Common Lisp clarifies the whole idea best with SETF.

That's a matter of opinion: whether it's a "clarification" to
make assignment depend on that systematic use of macros is ...
a choice.

(fx:snip)

> It's not enough to say
> where to make the change, you must say how specifically to make the
> change there, whether to modify the left side or the right side.

Yes. Which doesn't say anything against the lvalue/rvalue description.

> (I'm using c notation here to make it easier for you to understand):

Why do you think using C will do that? It's not as though I'm
unfamiliar with Lisp, after all.

> Now (what you can do *only* in Common Lisp), you define a SETF
> method so that the code:

You can do something very similar using Pop11's updaters. They
don't use macros to do it: they use functions plus one assignment
rewrite rule. Because they don't depend on the /macro-time/
value of the identifiers in function position, they will also
work on procedure /arguments/ as well as /globals/.

I thought Dylan had something similar to SETF as well.

> In summary, there's no value (usefulness) to an "lvalue" as you
> explain it.

Then I've done a worse job of explanation than I'd wish.

> What is misnomered an "lvalue" is really c's version of a setf method,
> which unfortunately can't be extended by users as it can in lisp.

I think you're confusing the `lvalue` I was explaining with the
term as it's used in C, and also trying to view the non-Lisp
world exclusively through Lisp lenses.

--
Chris "electric hedgehog" Dollin
"It took a very long time, much longer than the most generous estimates."
- James White, /Sector General/

Chris Dollin

Feb 19, 2007, 9:56:47 AM
Ben Bacarisse wrote:

> No, I think it is reasonable (though informal) definition of why the
> term came into common use. If you want a very formal analysis you
> must turn to denotational semantics, originally developed by
> Christopher Strachey and Dana Scott. I think Chris Dollin referred
> to Strachey if not to the topic of denotational semantics.

Indeed so. /That/ topic would take me much further outside my
CLC-comfort-zone.

--
Chris "electric hedgehog" Dollin

"The path to the web becomes deeper and wider" - October Project

robert maas, see http://tinyurl.com/uh3t

Feb 19, 2007, 11:50:36 AM
> From: Keith Thompson <k...@mib.org>

> C99 7.19.6.7p3:
> The sscanf function returns the value of the macro EOF if an input
> failure occurs before any conversion. Otherwise, the sscanf
> function returns the number of input items assigned, which can be
> fewer than provided for, or even zero, in the event of an early
> matching failure.

Does the C99 standard say anywhere whether a value is or is not
assigned in case of overflow? That is, does it say either of these:
- In case of overflow, assignment must be suppressed.
- In case of overflow, a truncated value must be assigned.
If it says neither (modulo wording of course, anything close would
suffice to define which behaviour is required), then I agree with:


> Unfortunately, sscanf() with "%d" invokes undefined behavior if the
> number can't be represented as an int (i.e., on overflow).

Accordingly I think my policy of first using strspn and strcspn to
efficiently validate the *syntax* of the integer first, and then
using strtoll to convert to long long and tell if overflow occurred
there, and then check against MIN/MAX values to determine whether
that long long can be safely cast to the narrower type wanted by
the application, is the "right" algorithm for my purpose. It's a
little more work, but it completely eliminates undefined behaviour,
and it gives diagnostics with maximum discriminatory power. Yeah,
sure, I could be sloppy and just say "syntax error" or "overflow",
or even allow silent overflow, and let the user figure out what's
wrong with what he put in the Web form, but I don't want that to be
my standard for user friendliness.
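
The conversion half of that policy might look something like this. It is a
sketch only, assuming the text has already passed a separate syntax check and
that the application wants an int; to_int and enum conv_status are made-up
names, and strtoll requires a C99 library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes for the two distinct range failures. */
enum conv_status { CONV_OK, CONV_OVERFLOW_LONG_LONG, CONV_OUT_OF_INT_RANGE };

/* Assumes s is already known to be syntactically a decimal integer. */
enum conv_status to_int(const char *s, int *out)
{
    long long v;

    errno = 0;                       /* strtoll reports overflow via errno */
    v = strtoll(s, NULL, 10);
    if (errno == ERANGE)
        return CONV_OVERFLOW_LONG_LONG;   /* does not even fit in long long */
    if (v < INT_MIN || v > INT_MAX)
        return CONV_OUT_OF_INT_RANGE;     /* fits in long long but not in int */

    *out = (int)v;                   /* now a value-preserving conversion */
    return CONV_OK;
}
```

Keeping the two failure codes separate lets the diagnostic say whether the
number is merely too big for this field or absurdly large.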

Thanks for your fine argument re ambiguity (undefined behaviour) in
C99 standard hence one more reason not to use sscanf all by itself
for validating Web-form contents. Personally, I don't consider use
of undefined behaviour to be good programming practice, so I surely
don't want to show such use in my examples for readers of my
"cookbook". (If anyone catches a place where I've been sloppy in
this respect, I hope you alert me about it!)

robert maas, see http://tinyurl.com/uh3t

Feb 19, 2007, 12:17:16 PM
> From: Yevgen Muntyan <muntyan.removet...@tamu.edu>

> > Start at the top of:
> > <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
> What's "GNU-c" and why are some functions "GNU-c" while others
> are "c"? Then some people read it and think they use GNU-c,
> as some people use C/C++...

I'm trying to warn the reader about some functions which aren't
defined in the official standard, but *are* available in GNU C
which is perhaps the most popular implementation (of compiler and
loader-commands to use corresponding libraries), so the reader
won't be totally confused if he writes code relying on such a
library function but it's not available in the implementation he's
using. If you can suggest a better way to note this distinction
between functions guaranteed to be available in all conforming
implementations and those "extras" provided by major vendors,
please do. I don't like the idea of just having a caveat at the top
of the document saying "some of the c functions described below
aren't available in all implementations of c". I'd rather
individually flag those very few which might not be available.

Now if the particular function is in the C99 spec, that would be
better to mention, but again I'd need some way of annotating those
functions which aren't in older C but are in C99 without confusing
the reader.

As to C/C++: Because I'm carefully annotating which library needs
to be included for *every* library function without exception, the
C and C++ entries for those will always be done separately, because
C++ has renamed all the C libraries to get the C++ compatible
version thereof. So when I am finished with these parts of the
matrix chapter of my "cookbook", there will typically be three LI
elements, the original C library, the C-style C++ converted
library, and the true flavor of C++ way of doing the same kind of
operation (in addition to the equivalents in lisp/java/perl/PHP of
course). I condense the c/c++ part (and java too if applicable)
only where no library is required so you can just write the code
without checking if you loaded the library first. That case applies
of course only to built-in operators operating on primitive types
or structs etc., not to any library calls.

By the way, I hope my readers will look up the complete definition
of the function, using Google for example, any place my quickie
definition isn't quite complete in all details. At least I've
provided the name of the function, which is of use in a Google
search, and the syntax of the call (parameters required, and
general nature of return value expected) which in conjunction with
the formal definition found elsewhere should be enough for the
reader/programmer to write code using the function I've described.
I'm thinking of including in the "suggestions for use of this
document" a more specific recommendation of search terms to use
when looking for complete specifications, someday ...

robert maas, see http://tinyurl.com/uh3t

Feb 19, 2007, 2:05:31 PM
> From: Keith Thompson <k...@mib.org>

> This is comp.lang.c, where we discuss the C programming language,
> which is defined by the ISO C standard, which has a perfectly
> good technical definition for the word "object".

Is it anything like last year's new technical definition of "planet"
by the IAU (International Astronom... Union)? :-(

> I'm not saying that you shouldn't discuss "objects" in the OOP
> sense, or in the sense of physical things, or whatever.

OK, then given your insistence, I'd be glad to try very hard to
always qualify my use of the word "object" with one of:
- In the original lisp sense, any distinctly allocated block of
memory with a single handle on it, which would include c structs
allocated with malloc or calloc.
- In a generalization of that sense which also includes
static/stack allocated structs and arrays in c.
- In the OOP sense, encapsulation of original lisp/struct sense
together with instance methods tied to the class of such
instances (not separate copy within each individual instance as
some java textbooks mis-state).

> I'm just saying that *in this newsgroup*, the word has the
> overriding meaning of a "region of data storage in the execution
> environment, the contents of which can represent values".

What is the definition of "region" used there? Does it refer *only*
to contiguous blocks of memory which are guaranteed to be
contiguous per the spec, or does it also include blocks of memory
which just happen to be contiguous per one implementation but not
necessarily another? For example, if the declarations are:
short int a[9];
long int b[5];
short int c[7];
if a particular compiler optimizes space by moving the long int
array ahead of either of the short int arrays to reduce amount of
padding needed to respect long boundaries, so that
a[7] a[8] c[0] c[1] c[2]
form a contiguous block of memory, is that considered a "region"
hence an "object"??
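
For reference, the part that does not depend on the compiler can be checked
in code: the elements of a single array are laid out back to back. The sketch
below tests only that; it deliberately makes no claim about where a, b and c
end up relative to one another, since that is left to the implementation. The
function name elements_are_contiguous is invented.

```c
#include <stddef.h>

/* Returns 1 if the nine elements of one array sit back to back in memory. */
int elements_are_contiguous(void)
{
    short int a[9];
    const char *first = (const char *)&a[0];
    const char *last  = (const char *)&a[8];

    /* Distance in bytes from a[0] to a[8], and total size of the array. */
    return (size_t)(last - first) == 8 * sizeof a[0]
        && sizeof a == 9 * sizeof a[0];
}
```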

I suspect that ISO hasn't defined the meaning of "region" any more
than the IAU defined the meaning of "clearing the neighborhood".
But I'll leave that conclusion as tentative, pending your reply.

Also, I'm not clear on the intended meaning of "contents" and
"values". Is it possible for an object to have only one content
which represents only one value, or must "contents" and "values" be
used strictly in a plural sense? So for example:
short int d[1];
That has only one content, which can represent only one value,
right? Or is "values" plural because you can re-assign that single
array-cell to have different values at different times?

robert maas, see http://tinyurl.com/uh3t

Feb 19, 2007, 2:40:38 PM
> From: Keith Thompson <k...@mib.org>

> GNU C is the C-like language accepted by gcc. It's very similar
> to standard C, but has a number of extensions, some of which
> violate the C standard.

Do you happen to know if there's a complete list of such violations
online in accessible format (plain text or simple HTML)? I'd like
to consult such a list as I develop my cookbook/matrix.

> Note that gcc is a compiler, not a complete implementation; the C
> runtime library is provided separately, and varies from one
> platform to another. There is a GNU C library (separate from, but
> usable with, gcc).

Yes, I understand that. Presumably if you compile a c program with
gcc, and specify the generic names of the headers for the various
libraries, for example:
#include <stdlib.h>
rather than specifying the actual path to the header for the library, for example:
#include "/usr/local/bin/ansi/c/stdlib.h"
and you don't use a switch such as -ansi that forces gcc to use the
ansi instead of gnu version, then gcc will automatically arrange
that you get the GNU C version of each library rather than the ANSI
version. (Correct me if I'm wrong on this point!)

When I have a line that looks like this:
<li>GNU-c (#include &lt;stdlib.h&gt;) -- <em>mumble(x,y)</em></li>
I mean to imply that the function mumble is defined in the GNU C
version of the stdlib library but not in the corresponding ANSI
version of that same-name library. If I make a mistake in such
annotation, feel free to correct me.

I'm thinking of changing my notation. Instead of saying "GNU-c" as
the language as I do there, instead just say "c", but have a
footnote that explains the situation regarding GNU vs. ANSI. I'll
be thinking more about it later today and maybe start updating the
file as soon as I have decided how exactly to do it.

I'd especially like to note differences between original/universal
c and C99, and differences between C99 and GNU, but still just
label it all 'c' with footnote, maybe, still thinking...

> But I have no idea why the referenced web page applies the
> "GNU-c" tag to the isblank() and strtoll() functions, both of
> which are standard C (but both are new in C99).

I had the impression they were not in original c but are in GNU c,
so that was the distinction I was making. But if they're in C99 as
you claim, then I'd rather change that to show they're in C99
(instead of GNU c) but not in original c.
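
One way code itself can draw the "C99 but not original c" line is the
standard __STDC_VERSION__ macro. The sketch below is an assumption-laden
illustration: HAVE_C99_LIBRARY and is_blank_char are invented names, and the
macro strictly describes the compiler's language mode, so treating it as
evidence that the library provides isblank() is a simplification.

```c
#include <ctype.h>

/* __STDC_VERSION__ is 199901L or greater when compiling as C99 or later. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define HAVE_C99_LIBRARY 1
#else
#define HAVE_C99_LIBRARY 0
#endif

/* Space-or-tab test that uses isblank() only where C99 is claimed. */
int is_blank_char(int c)
{
#if HAVE_C99_LIBRARY
    return isblank((unsigned char)c) != 0;   /* new in C99 */
#else
    return c == ' ' || c == '\t';            /* fallback for pre-C99 modes */
#endif
}
```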

Is the C99 standard online in searchable/HTML format, for free, so
that I could consult it to verify fine points like this instead of
just taking your word for it? And are you referring to ANSI C99 or
ISO C99 anyway??

<http://en.wikipedia.org/wiki/C_(programming_language)#C99>
publication of ISO 9899:1999 in 1999. This standard is commonly
referred to as "C99." It was adopted as an ANSI standard in March
2000.
That's not clear whether it's both an ISO standard and ANSI
standard, from the same document, or an ISO document but only an
ANSI standard.

GCC, despite its extensive C99 support, is still not a completely
compliant implementation; several key features are missing or don't
work correctly.[2]
<http://gcc.gnu.org/c99status.html>
Is that where I should be checking for any differences between C99
standard and GNU C actuality?

Also, where in your opinion is the best online reference for
pre-C99 versions, especially the original K&R C, which presumably
*every* implementation has had plenty of time to get right already?
(I'm mostly interested in the standard libraries, functions therein.)

Keith Thompson

Feb 19, 2007, 5:51:41 PM
Chris Dollin <chris....@hp.com> writes:
[...]

> Expressions that "denote a place" are those that can be evaluated
> for their lvalues. Those that don't, can't.
[...]

Right, using the old definition of "lvalue", not the one in the C
standard (as you know).

In the C standard, an "lvalue" is not the result of evaluating an
expression; instead, certain expressions are themselves lvalues. I
suspect that if the C committee had stayed with the older meaning of
the term, they could have avoided some serious problems.

The C90 definition of an "lvalue" was (C90 6.2.2.1):

An _lvalue_ is an expression (with an object type or an incomplete
type other than void) that designates an object.

Consider:

int x; /* line 1 */
int *ptr = NULL; /* line 2 */
ptr = &x; /* line 3 */

Before line 3 is executed, the expression *ptr does not designate an
object, so by a literal reading of the definition, *ptr is not an
lvalue, but it becomes one after line 3 is executed. This clearly was
not the intent, since the lvalue-ness of an expression, in many cases,
needs to be determined at compilation time. *ptr should be an lvalue
regardless of the current value of ptr; attempting to evaluate it *as
an lvalue* invokes undefined behavior if it doesn't *currently*
designate an object.

So the C99 committee attempted to solve this problem, but created a
bigger one. C99 6.3.2.1p1:

An _lvalue_ is an expression with an object type or an incomplete
type other than void; if an lvalue does not designate an object
when it is evaluated, the behavior is undefined.

So the lvalue-ness of an expression no longer depends on the current
value of the expression or any subexpression (solving the problem with
the C90 definition) -- *but* the definition no longer says that it
designates an object, which is the whole idea. By a literal reading
of the C99 definition, 42 is an lvalue (it's an expression of an
object type, namely int). Again, this clearly is not the intent.
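
To make the distinction concrete, here is a minimal sketch (the helper name store_through is made up for illustration and is not from the standard). The compiler accepts *ptr as the target of an assignment without knowing what ptr will hold at run time, while a constant such as 42 is rejected in that position; whether the store is actually defined depends on the run-time value of ptr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores value through ptr and returns what was stored.
 * The expression *ptr is treated as an lvalue at compile time no matter
 * what ptr holds at run time; the store is only defined if ptr designates
 * an object when the assignment is evaluated. */
int store_through(int *ptr, int value)
{
    assert(ptr != NULL);   /* evaluating *ptr with a null ptr would be UB */
    *ptr = value;          /* accepted by the compiler: *ptr is an lvalue */
    /* 42 = value; */      /* would be rejected: 42 cannot be assigned to */
    return *ptr;
}
```

With int x; declared in the caller, store_through(&x, 7) stores 7 in x and returns 7.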

Stating the actual intent in standardese is difficult, but not
impossible. An improvement would be to revert to the C90 definition
and add the word "potentially", with a footnote to explain what that
means:

An _lvalue_ is an expression (with an object type or an incomplete
type other than void) that potentially (footnote) designates an
object.

(footnote) An expression potentially designates an object either
if it actually does so, or if it would do so given appropriate
values for its subexpressions. For example, if ptr is an object
pointer, *ptr potentially designates an object (though it doesn't
actually designate an object unless ptr has an appropriate value).

That's off the top of my head; I'm sure it could be worded better.

Perhaps if the standard said, instead of an expression *being* an
lvalue, that its lvalue can be evaluated, this problem wouldn't have
occurred. We'd still need rules about which expressions can be
evaluated for their lvalues, and wording about when such an evaluation
invokes undefined behavior. And if such a change were made now, all
the references to "lvalue" in the standard would have to be modified
to reflect the new (old) meaning.

I suspect we're just stuck with the current meaning of lvalue (and we
have to read what the definition *should* say rather than what it
*does* say).

Keith Thompson

Feb 19, 2007, 5:58:38 PM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
>> From: Keith Thompson <k...@mib.org>
>> C99 7.19.6.7p3:
>> The sscanf function returns the value of the macro EOF if an input
>> failure occurs before any conversion. Otherwise, the sscanf
>> function returns the number of input items assigned, which can be
>> fewer than provided for, or even zero, in the event of an early
>> matching failure.
>
> Does the C99 standard say anywhere whether a value is or is not
> assigned in case of overflow? That is, does it say either of these:
> - In case of overflow, assignment must be suppressed.
> - In case of overflow, a truncated value must be assigned.
> If it says neither (modulo wording of course, anything close would
> suffice to define which behaviour is required), then I agree with:
>> Unfortunately, sscanf() with "%d" invokes undefined behavior if the
>> number can't be represented as an int (i.e., on overflow).

As I already told you, it invokes undefined behavior. That means the
standard imposes no requirements.

C99 7.19.6.2p10:

If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the
behavior is undefined.

I suggest you get your own copy of
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
so you can look these things up yourself rather than depending on the
rest of us to do it for you.
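
For contrast, strtol() has defined behavior on overflow: it returns LONG_MAX or LONG_MIN and sets errno to ERANGE. The following is only a sketch under that assumption (the helper name parse_int is made up, and it deliberately does not check for trailing junk after the digits, which a full validator would also need to do).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts the start of s to an int.
 * Returns 1 on success and stores the value in *out; returns 0 if no
 * digits were found or if the value does not fit in an int.
 * Characters after the number are ignored in this sketch. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                     /* no digits were converted */
        return 0;
    if (errno == ERANGE)              /* value does not fit in a long */
        return 0;
    if (v < INT_MIN || v > INT_MAX)   /* fits in a long but not in an int */
        return 0;
    *out = (int)v;
    return 1;
}
```

The caller decides what to report on failure; nothing is stored in *out unless the conversion succeeded.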

Mark McIntyre

Feb 19, 2007, 6:08:32 PM
On Mon, 19 Feb 2007 11:05:31 -0800, in comp.lang.c , rem...@yahoo.com
(robert maas, see http://tinyurl.com/uh3t) wrote:

>> From: Keith Thompson <k...@mib.org>
>> This is comp.lang.c, where we discuss the C programming language,
>> which is defined by the ISO C standard, which has a perfectly
>> good technical definition for the word "object".
>
>Is it anything like last year's new technical definition of "planet"
>by the IAU (International Astronom... Union)? :-(

It defines it as "a region of data storage... the contents of which
can represent values". This seems an entirely reasonable definition to
me. As someone has already said, the word has a wide variety of exact
meanings in many walks of life, so being precise is /not/
inappropriate.

>> I'm not saying that you shouldn't discuss "objects" in the OOP
>> sense, or in the sense of physical things, or whatever.
>
>OK, then given your instance, I'd be glad to try very hard to
>always qualify my use of the word "object" with one of:
>- In the original lisp sense, any distinctly allocated block of
> memory with a single handle on it, which would include c structs
> allocated with malloc or calloc.

This is pretty close to the C definition, if you think it through. I
don't think Lisp required the block of memory to be complex.

>What is the definition of "region" used there?

The standard itself doesn't define region. You would have to check
back in ISO/IEC 2382-1:1993 "Information Technology - Vocabulary
Part 1: Fundamental Terms" to see what ISO defined it as.

>if a particular compiler optimizes space by moving the long int
>array ahead of either of the short int arrays to reduce amount of
>padding needed to respect long boundaries, so that
> a[7] a[8] c[0] c[1] c[2]
>form a contiguous block of memory, is that considered a "region"
>hence an "object"??

There's nothing which requires these to be contiguous, so I can't see
how they can be considered either an object or a single region.

>I suspect that ISO hasn't defined the meaning of "region" any more
>than the IAU defined the meaning of "clearing the neighborhood".

FWIW, the IAU had no need to define that since it can be inferred from
an amazing property known as "common sense".

>Also, I'm not clear on the intended meaning of "contents" and
>"values".

Egregious.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Keith Thompson

Feb 19, 2007, 6:09:37 PM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
> From: Keith Thompson <k...@mib.org>
[...]

>> I'm just saying that *in this newsgroup*, the word has the
>> overriding meaning of a "region of data storage in the execution
>> environment, the contents of which can represent values".
>
> What is the definition of "region" used there? Does it refer *only*
> to contiguous blocks of memory which are guaranteed to be
> contiguous per the spec, or does it also include blocks of memory
> which just happen to be contiguous per one implementation but not
> necessarily another? For example, if the declarations are:
> short int a[9];
> long int b[5];
> short int c[7];
> if a particular compiler optimizes space by moving the long int
> array ahead of either of the short int arrays to reduce amount of
> padding needed to respect long boundaries, so that
> a[7] a[8] c[0] c[1] c[2]
> form a contiguous block of memory, is that considered a "region"
> hence an "object"??

The C standard does not define the word "region". It does have a
normative reference to "ISO/IEC 2382-1:1993, Information technology --
Vocabulary -- Part 1: Fundamental terms". I don't know whether that
document defines "region" or not; if not, it should be understood to
have its usual English meaning.

The C standard is not a mathematically perfect formal definition. You
have to use some common sense in reading it.

As it happens, it's possible for two or more declared objects to be
adjacent in memory, and it's possible for a program to detect portably
whether they are or not. If two or more objects happen to be adjacent
in memory, I suppose the union of their memory regions could be
considered to be a single memory region, and therefore an object.
This is a mildly interesting technical point, but it's of no
particular use as far as I can see; for any program that tries to make
use of this, there are far better and more portable ways to do it.
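
A rough sketch of such a check follows (the helper name is_adjacent is made up). It relies on the rule in C99 6.5.9 that a pointer one past the end of one array may be compared for equality with a pointer to the start of another object; the result for separately declared arrays depends entirely on the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the object that ends at end_of_first
 * is immediately followed in memory by the object that starts at
 * start_of_second, and 0 otherwise. Only == is used, because relational
 * comparison (<, >) of pointers into different objects is undefined. */
int is_adjacent(const short *end_of_first, const short *start_of_second)
{
    assert(end_of_first != NULL && start_of_second != NULL);
    return end_of_first == start_of_second;
}
```

With the declarations quoted above, is_adjacent(a + 9, c) reports whether c happens to follow a on a given implementation; a portable program should not depend on the answer.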

Keith Thompson

Feb 19, 2007, 6:23:14 PM
rem...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
>> From: Keith Thompson <k...@mib.org>
>> GNU C is the C-like language accepted by gcc. It's very similar
>> to standard C, but has a number of extensions, some of which
>> violate the C standard.
>
> Do you happen to know if there's a complete list of such violations
> online in accessible format (plain text or simple HTML)? I'd like
> to consult such a list as I develop my cookbook/matrix.

I don't know. gcc comes with extensive documentation, including a
section on gcc extensions. Its behavior on encountering a use of
such an extension depends on the command-line options. If it issues a
diagnostic (even just a warning) for anything that's a syntax error or
constraint violation in ISO C, that's probably enough for conformance.

Any gcc-specific questions not answered by the documentation should be
directed to gnu.gcc.help.

>> Note that gcc is a compiler, not a complete implementation; the C
>> runtime library is provided separately, and varies from one
>> platform to another. There is a GNU C library (separate from, but
>> usable with, gcc).
>
> Yes, I understand that. Presumably if you compile a c program with
> gcc, and specify the generic names of the headers for the various
> libraries, for example:
> #include <stdlib.h>
> rather than specifying the actual path to the header to library, for example:
> #include "/usr/local/bin/ansi/c/stdlib.h"
> and you don't use a switch such as -ansi that forces gcc to use the
> ansi instead of gnu version, then gcc will automatically arrange
> that you get the GNU C version of each library rather than the ANSI
> version. (Correct me if I'm wrong on this point!)

That's a gcc implementation detail, not a C language issue. <OT>The
gcc installation process creates modified versions of some of the
header files that already exist on the OS; the details are
off-topic.</OT>

And what exactly do you mean by "ANSI"? The current official C
standard, C99, was issued by ISO (and later adopted by ANSI). The
previous standard, which is still in wide use, is C90, also issued by
ISO (and adopted by ANSI). I suggest avoiding the use of "ANSI" as an
adjective; many people still use "ANSI C" to refer to the language
defined by the ANSI C89 and ISO C90 standard documents, but strictly
speaking that usage is incorrect. If you instead refer to "C90" or
"C99", you avoid the ambiguity.

> When I have a line that looks like this:
> <li>GNU-c (#include &lt;stdlib.h&gt;) -- <em>mumble(x,y)</em></li>
> I mean to imply that the function mumble is defined in the GNU C
> version of the stdlib library but not in the corresponding ANSI
> version of that same-name library. If I make a mistake in such
> annotation, feel free to correct me.

On the web page I saw, there were two functions marked "GNU-c", both
of them incorrectly. Both functions are defined by C99, but not by
C90.

<OT>
The phrase "the GNU C version of the stdlib library" doesn't make much
sense, unless you're referring to glibc. I use gcc on Linux, where
the C runtime library is glibc. I also use gcc on Solaris, where the
C runtime library is the one provided by Solaris. gcc is a compiler,
not a complete implementation.
</OT>

> I'm thinking of changing my notation. Instead of saying "GNU-c" as
> the language as I do there, instead just say "c", but have a
> footnote that explains the situation regarding GNU vs. ANSI. I'll
> be thinking more about it later today and maybe start updating the
> file as soon as I have decided how exactly to do it.

[...]

> I had the impression they were not in original c but are in GNU c,
> so that was the distinction I was making. But if they're in C99 as
> you claim, then I'd rather change that to show they're in C99
> (instead of GNU c) but not in original c.

Be careful with the term "original c" (or, preferably, "original C").
Versions of C existed long before the first ANSI standard.

> Is the C99 standard online in searchable/HTML format, for free, so
> that I could consult it to verify fine points like this instead of
> just taking your word for it? And are you referring to ANSI C99 or
> ISO C99 anyway??

n1124.pdf, referenced above, is the C99 standard with two Technical
Corrigenda merged into it. Any post-C99 changes are marked with
change bars.

> <http://en.wikipedia.org/wiki/C_(programming_language)#C99>
> publication of ISO 9899:1999 in 1999. This standard is commonly
> referred to as "C99." It was adopted as an ANSI standard in March
> 2000.
> That's not clear whether it's both an ISO standard and ANSI
> standard, from the same document, or an ISO document but only an
> ANSI standard.

I have no idea what you're asking.

[snip]

CBFalconer

Feb 19, 2007, 5:06:26 PM
"robert maas, see http://tinyurl.com/uh3t" wrote:
> From: Keith Thompson <k...@mib.org>
>>
>> C99 7.19.6.7p3:
>> The sscanf function returns the value of the macro EOF if an
>> input failure occurs before any conversion. Otherwise, the
>> sscanf function returns the number of input items assigned,
>> which can be fewer than provided for, or even zero, in the
>> event of an early matching failure.
>
> Does the C99 standard say anywhere whether a value is or is not
> assigned in case of overflow? That is, does it say either of these:

No, it says the behaviour is undefined. Anything may happen,
including launching WWIII. I believe C99 adds the possibility of
causing a signal. If you look at the input parsers I have
published here you will see means of detecting incipient overflow
and returning an appropriate error.
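
The following is not the code referred to above, only a sketch of the general idea of detecting incipient overflow (the helper name digits_to_int is made up, and sign and whitespace handling are left out). The test is made before each multiply-and-add, so the arithmetic itself never overflows.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: accumulates the leading decimal digits of s.
 * Returns 1 on success and stores the value in *out; returns 0 if s does
 * not start with a digit or if the value would exceed INT_MAX. */
int digits_to_int(const char *s, int *out)
{
    int value = 0;

    assert(s != NULL && out != NULL);
    if (*s < '0' || *s > '9')
        return 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int digit = *s - '0';
        /* value * 10 + digit <= INT_MAX exactly when
         * value <= (INT_MAX - digit) / 10, so check that first. */
        if (value > (INT_MAX - digit) / 10)
            return 0;
        value = value * 10 + digit;
    }
    *out = value;
    return 1;
}
```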

CBFalconer

Feb 19, 2007, 5:09:10 PM
"robert maas, see http://tinyurl.com/uh3t" wrote:
> From: Keith Thompson <k...@mib.org>
>
>> GNU C is the C-like language accepted by gcc. It's very similar
>> to standard C, but has a number of extensions, some of which
>> violate the C standard.
>
> Do you happen to know if there's a complete list of such violations
> online in accessible format (plain text or simple HTML)? I'd like
> to consult such a list as I develop my cookbook/matrix.

No. The standard says that anything that is not defined in the
standard causes undefined behaviour. You can (and should) read it
for yourself. Search for N869 or N1124.

Yevgen Muntyan

Feb 19, 2007, 10:54:35 PM
robert maas, see http://tinyurl.com/uh3t wrote:
>> From: Yevgen Muntyan <muntyan.removet...@tamu.edu>
>>> Start at the top of:
>>> <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
>> What's "GNU-c" and why are some functions "GNU-c" while others
>> are "c"? Then some people read it and think they use GNU-c,
>> as some people use C/C++...
>
> I'm trying to warn the reader about some functions which aren't
> defined in the official standard, but *are* available in GNU C
> which is perhaps the most popular implementation (of compiler and
> loader-commands to use corresponding libraries), so the reader
> won't be totally confused if he writes code relying on such a
> library function but it's not available in the implementation he's
> using.

Imagine your reader is Windows or *BSD or MacOSX or Solaris user.
Does he care about popular implementations? Just don't mention
GNU-only functions at all. Good news is that you won't have to,
there are not so many GNU-only, BSD-only, Whatever-only functions
of general use.

> If you can suggest a better way to note this distinction
> between functions guaranteed to be available in all conforming
> implementations and those "extras" provided by major vendors,
> please do. I don't like the idea of just having a caveat at the top
> of the document saying "some of the c functions described below
> aren't available in all implementations of c". I'd rather
> individually flag those very few which might not be available.
>
> Now if the particular function is in the C99 spec, that would be
> better to mention, but again I'd need some way of annotating those
> functions which aren't in older C but are in C99 without confusing
> the reader.

Better don't even try. Given that you don't know what's C99 and what's
GNU implementation of C library, imagine what reader will know after
reading your stuff. Just stick to C99.

[snip]

> By the way, I hope my readers will look up the complete definition
> of the function, using Google for example, any place my quickie
> definition isn't quite complete in all details.

Is this how *you* get the documentation? You should consider something
better, like C libraries manuals, man pages, C standard. man pages
are pretty good, they tell you about the standards given function
conforms to. Then you can check that information if you like. You can
also use code samples from man pages. Whatever is there is likely
to be of high value. Higher than random stuff from google.

At some point I did that. I had no idea what documentation to use,
how to use it, where to get it; did googling. I got some
wrong things deep in my mind then, and it's pretty hard to get
rid of those. Some such things are what is standard and what
is not. Please don't "help" other people like this. Don't write
documentation about stuff you don't know. Oh well.

Yevgen

Flash Gordon

Feb 19, 2007, 10:57:36 PM
robert maas, see http://tinyurl.com/uh3t wrote, On 19/02/07 19:40:
>> From: Keith Thompson <k...@mib.org>

<snip>

>> Note that gcc is a compiler, not a complete implementation; the C
>> runtime library is provided separately, and varies from one
>> platform to another. There is a GNU C library (separate from, but
>> usable with, gcc).
>
> Yes, I understand that.

You seem not to.

> Presumably if you compile a c program with
> gcc, and specify the generic names of the headers for the various
> libraries, for example:
> #include <stdlib.h>
> rather than specifying the actual path to the header to library, for example:
> #include "/usr/local/bin/ansi/c/stdlib.h"
> and you don't use a switch such as -ansi that forces gcc to use the
> ansi instead of gnu version, then gcc will automatically arrange
> that you get the GNU C version of each library rather than the ANSI
> version. (Correct me if I'm wrong on this point!)

<snip>

What headers you include has no effect on what libraries you link to.
The headers you include are determined by the source code + compiler
options. The libraries you link to are determined by what options you
pass to the linker. If I use GCC on a machine where there is no GNU C
library then it does not use the GNU C library because there is not one
on the machine. It uses the C library that actually is there, such as
the MS one on Windows, the AIX one on AIX etc. If I use it on Linux then
it uses the GNU one because that is the only one installed. The compiler
option just affects the visibility of the extensions in the headers, not
what you link to.
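
One way to see what a given build is actually using is to ask the preprocessor. This is only a sketch (both function names are made up): __STDC_VERSION__ is set by the compiler according to the language mode selected, and __GLIBC__ is defined by the GNU C library's own headers, so it says which headers were found, not which compiler was used.

```c
#include <assert.h>
#include <stdlib.h>   /* any standard header; glibc's headers define __GLIBC__ */

/* Hypothetical helper: returns the value of __STDC_VERSION__
 * (199901L for C99), or 0 if the compiler does not define it,
 * as is the case in C90 mode. */
long c_standard_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: returns 1 if the standard headers in use
 * come from the GNU C library, 0 otherwise. */
int headers_are_glibc(void)
{
#ifdef __GLIBC__
    return 1;
#else
    return 0;
#endif
}
```

The same source compiled with gcc on Linux and with gcc on Solaris would be expected to give different answers from headers_are_glibc().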
--
Flash Gordon

Keith Thompson

Feb 20, 2007, 2:12:17 AM
CBFalconer <cbfal...@yahoo.com> writes:
> "robert maas, see http://tinyurl.com/uh3t" wrote:
>> From: Keith Thompson <k...@mib.org>
>>>
>>> C99 7.19.6.7p3:
>>> The sscanf function returns the value of the macro EOF if an
>>> input failure occurs before any conversion. Otherwise, the
>>> sscanf function returns the number of input items assigned,
>>> which can be fewer than provided for, or even zero, in the
>>> event of an early matching failure.
>>
>> Does the C99 standard say anywhere whether a value is or is not
>> assigned in case of overflow? That is, does it say either of these:
>
> No, it says the behaviour is undefined. Anything may happen,
> including launching WWIII. I believe C99 adds the possibility of
> causing a signal. If you look at the input parsers I have
> published here you will see means of detecting incipient overflow
> and returning an appropriate error.

You're thinking of the behavior of overflow of arithmetic operations
on signed integers (e.g., INT_MAX + 1). In C90, it yields an
implementation-defined result; C99 added the possibility of raising an
implementation-defined signal.

Overflow in sscanf() for any numeric type invokes UB (which, of
course, includes the possibility of raising a signal).

Flash Gordon

Feb 20, 2007, 2:08:26 PM
Yevgen Muntyan wrote, On 20/02/07 03:54:

> robert maas, see http://tinyurl.com/uh3t wrote:
>>> From: Yevgen Muntyan <muntyan.removet...@tamu.edu>
>>>> Start at the top of:
>>>> <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
>>> What's "GNU-c" and why are some functions "GNU-c" while others
>>> are "c"? Then some people read it and think they use GNU-c,
>>> as some people use C/C++...
>>
>> I'm trying to warn the reader about some functions which aren't
>> defined in the official standard, but *are* available in GNU C
>> which is perhaps the most popular implementation (of compiler and
>> loader-commands to use corresponding libraries), so the reader
>> won't be totally confused if he writes code relying on such a
>> library function but it's not available in the implementation he's
>> using.
>
> Imagine your reader is Windows or *BSD or MacOSX or Solaris user.
> Does he care about popular implementations? Just don't mention
> GNU-only functions at all. Good news is that you won't have to,
> there are not so many GNU-only, BSD-only, Whatever-only functions
> of general use.

As a strong supporter of sticking to standard C where possible I
strongly disagree. There are a vast number of system specific functions
which are extremely useful, it's just that they are not topical here.

>> If you can suggest a better way to note this distinction
>> between functions guaranteed to be available in all conforming
>> implementations and those "extras" provided by major vendors,
>> please do. I don't like the idea of just having a caveat at the top
>> of the document saying "some of the c functions described below
>> aren't available in all implementations of c". I'd rather
>> individually flag those very few which might not be available.
>>
>> Now if the particular function is in the C99 spec, that would be
>> better to mention, but again I'd need some way of annotating those
>> functions which aren't in older C but are in C99 without confusing
>> the reader.
>
> Better don't even try. Given that you don't know what's C99 and what's
> GNU implementation of C library, imagine what reader will know after
> reading your stuff. Just stick to C99.

Very poor advice. You will leave all the poor users of Visual Studio
Express wondering why they don't have snprintf (only _snprintf IIRC
which has significant differences) etc.
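
For readers who do have the C99 function, a small sketch of how its return value is typically used (the helper name format_int is made up). C99 specifies that snprintf returns the number of characters that would have been written had the buffer been large enough, and that the output is null-terminated when the size is nonzero; the behavior of _snprintf is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats n as decimal text into buf.
 * Returns 1 if the whole text fit, 0 if it was truncated or if
 * snprintf reported an error. */
int format_int(char *buf, size_t size, int n)
{
    int needed;

    assert(buf != NULL && size > 0);
    needed = snprintf(buf, size, "%d", n);
    /* needed counts the characters excluding the terminating null,
     * so the text fit only if needed is strictly less than size. */
    return needed >= 0 && (size_t)needed < size;
}
```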

> [snip]
>
>> By the way, I hope my readers will look up the complete definition
>> of the function, using Google for example, any place my quickie
>> definition isn't quite complete in all details.
>
> Is this how *you* get the documentation? You should consider something
> better, like C libraries manuals, man pages, C standard. man pages
> are pretty good, they tell you about the standards given function
> conforms to.

When did you last look at the man pages on SCO, AIX, IRIX etc... I'm
sure some of them do what you say, but you do not know that all man
pages do, nor that the versions installed on the OP's system do. The
bibliography of the comp.lang.c FAQ references some good books and the
comp.lang.c FAQ is good.

> Then you can check that information if you like. You can
> also use code samples from man pages.

You have to check the copyright before copying and publishing code.

> Whatever is there is likely
> to be of high value. Higher than random stuff from google.

Random stuff from google is not good for those who don't know the
subject already, I agree.

> At some point I did that. I had no idea what documentation to use,
> how to use it, where to get it; did googling. I got some
> wrong things deep in my mind then, and it's pretty hard to get
> rid of those. Some such things are what is standard and what
> is not. Please don't "help" other people like this. Don't write
> documentation about stuff you don't know. Oh well.

Indeed. Robert has already written documentation about what he does not
know.
--
Flash Gordon
