I have a very simple program designed to a base string, which is a
numeral in base 12, supplied at the command line or from standard
input, into decimal, then send it back to the calling function as a
double. Very simple, under 200 lines.
Before converting the string to a decimal double, I test whether the
string is written in exponential notation, and if so, send it to a
function to convert it out of exponential notation. This consists of
juggling the integral marker (hard-coded as a semicolon, at this
point) back or forward in the string depending on the value of the
exponent, cutting off the exponent in the process. It was a hairier
problem than I expected, and it's not pretty, but it works *almost*
every time. I know that the problem is in this function because no
errors occur when the string is *not* in exponential notation, and
this function is therefore never called.
It works properly on all tested strings sent from the command line.
When I supply the strings from standard input, however, some strange
things happen. They only happen on the second string sent through the
function. The errors do *not* occur if I send a string that isn't in
exponential notation, which makes me think the problem is in the
exponential-handling function.
If the first two strings sent through stdin are "X44;4" (decimal
1492.3repeating) and "X;444e2" (the same number), the first is
converted properly and the second is converted to 17908.0000, which
about the proper number multiplied by twelve (not quite, but very
close). All subsequent passes work fine, and the same input in any
place but the second time works fine. "X44;4" does *not* fail when
passed through second.
If the first two strings sent through stdin are "X;3" (10.25 decimal)
and "0;x3e1" (the same number), the first is converted properly and
the second throws a segfault. Entering "0;x3e1" in any other place
(as in, not second from standard input) works perfectly fine,
producing "10.25" as output predictably. "X;3" does *not* fail when
passed through second.
I stared at this for some time, then in desperation sshed over to my
i686 box, recompiled the same code, and tried it there. Both of these
situations worked properly, without any issues.
I'm compiling with the following command line:
gcc -lm -o dec dec.c
I've googled the issue; in some places I'm told to try compiling with
the -arch x86_64 flag, and in other places I'm told that gcc should
automatically compile for x86_64 on an x86_64 system. I can't try it
for eight or nine more hours (I'm at work), so I thought I'd ask here
and see if perhaps the code itself is bad.
Here's the function at issue: it takes a pointer to the string to be
converted, plus an integer representing the point of the string where
the "e" is, computed at call time (I loop through the string looking
for an "e" or "E" to determine whether it needs to be called at all;
since I already had the loop index set for that when I called the
function, I thought it made sense to send it to the function rather
than let the function do the same loop again itself). Error checking
for the input has already occurred at this point, but it's proper
input that's causing this problem anyway. It calls "doztodec(char
*s)" to convert the part of the string that represents the exponent to
decimal; this function works fine on all input, and typically returns
a double. I'm putting it in a char because the exponent has to be
below 127 (my precision doesn't get anywhere like that high here), but
I understood that a cast would be done automatically in this case.
Should I make it explicit? I've sprinkled a lot of extra comments for
this posting to make it clearer what's going on.
*******
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3]; /* the base-12 exponent suffix */
char exp; /* the value of the suffix */
int semi; /* where semicolon is in original string */
for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i); /* put the exponent in zensuf */
exp = doztodec(zensuf); /* convert the exponent to decimal char */
for (semi=0; *(s+semi) != ';' && *(s+semi) != '\0'; ++semi); /* set
semi = semicolon's location */
*(s+expspot) = '\0'; /* cut off the "e" and the exponent */
if (semi >= strlen(s))
return 0; /* if the semicolon is already at the end of the string,
end processing */
if (exp > 0) { /* if the semicolon needs to be moved to the right */
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1); /* move everything over to get rid of original
semicolon */
*(s+i) = '\0'; /* move string terminator to new end of string */
for (i=strlen(s); i > semi+exp; --i)
*(s+i) = *(s+i-1); /* move the rest of string over to make room for
new semi */
*(s+semi+exp) = ';'; /* insert the new semicolon */
} else if (exp < 0) { /* if the semicolon needs to be moved to the
left */
exp = -exp, ++exp;
for (i=strlen(s)+exp; i >= 0; --i)
*(s+i) = *(s+i-exp); /* move string to right to make room for new
digits */
for (i=0; i < exp; ++i)
*(s+i) = '0'; /* pad left side of string with zeroes */
for (semi=0; *(s+semi) != ';'; ++semi); /* find location of
semicolon */
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1); /* move over to eliminate original semicolon */
*(s+i) = '\0'; /* insert new end of string */
*(s+1) = ';'; /* put in new semicolon */
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0'; /* if there's no fractional part now, don't
have semicolon */
return 0;
}
*******
The contents of the file that produces erroneous input is:
X44;4
X;444e2
The contents of the file that throws a segfault is:
X;3
0;X3e1
Thanks to anyone who can help me figure this out.
Could you post a short, complete program that demonstrates
the problem? Based on your report that "it works" with strings
entered on the command line but fails when input is read from a
stream, I have a sneaking suspicion that the problem is in the
way the strings are read. (If you're using fgets(), did you
remember that the '\n' at the end of the line is still there?)
Also, a little more information on your duodecimal notation
would be welcome. It seems that 'X' or 'x' stands for the
duodecimal digit ten, and I'm going to guess that 'Y'/'y' mean
eleven -- but that doesn't seem to be spelled out anywhere.
--
Eric Sosman
eso...@ieee-dot-org.invalid
Hmm. Not for several more hours, at least; I don't have a compiler
here at work, so I couldn't ensure that what I posted would work.
Plus, to get it really working I'd probably have to include all the
code.
I'm not using fgets(), though; I'm using a home-rolled getline(),
because I'm working through K&R and it's comfortably familiar to me.
Yes, 'X' or 'x' would work for ten; also 'T', 't', 'A', or 'a'. I
tend to use 'X' myself. Eleven is 'E', 'B', or 'b' (not 'e', as that
would conflict with the exponents). I didn't think it would be
relevant since whenever this expkill() function is not called,
everything works great; and indeed, whenever expkill() *is* called, it
all works great except for the second item read in.
I'll go ahead and humiliate myself and post the entire code. I
haven't worked in error checking yet, but since it's good input that's
breaking it I don't suppose that matters at this point.
*******
int main(int argc, char *argv[])
{
int c;
char doznum[MAXLINE];
while (*(++argv) != NULL && *argv[0] == '-') {
while (c = *++argv[0])
switch (c) {
default:
fprintf(stderr,"dec: illegal option \"%c\"\n",c);
break;
}
}
if (*(++argv) == NULL) {
printf("%f\n",doztodec(*(argv-1)));
return 0;
}
while (getline(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
return 0;
}
/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3];
char exp;
int semi; /* where semicolon is */
for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i);
exp = doztodec(zensuf);
for (semi=0; *(s+semi) != ';' && *(s+semi) != '\0'; ++semi);
*(s+expspot) = '\0';
if (semi >= strlen(s))
return 0;
if (exp > 0) {
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
for (i=strlen(s); i > semi+exp; --i)
*(s+i) = *(s+i-1);
*(s+semi+exp) = ';';
} else if (exp < 0) {
exp = -exp, ++exp;
for (i=strlen(s)+exp; i >= 0; --i)
*(s+i) = *(s+i-exp);
for (i=0; i < exp; ++i)
*(s+i) = '0';
for (semi=0; *(s+semi) != ';'; ++semi);
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
*(s+1) = ';';
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0';
return 0;
}
double doztodec(char *s)
{
char frac[MAXDIGS];
int i, j;
int endpoint;
double decnum = 0.0;
for (i=0; *(s+i) != '\0'; ++i)
if (*(s+i) == 'e' || *(s+i) == 'E')
expkill(s,i);
for (i=0; *(s+i) != ';'; ++i);
endpoint = i;
if (endpoint < (strlen(s) - 1)) {
for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
*(frac+j) = *(s+i);
*(s+endpoint) = *(frac+j) = '\0';
decnum += fracdec(frac);
}
decnum += wholedec(s);
return decnum;
}
double wholedec(char *s)
{
int addnum = 0; /* added to decnum each loop */
int i;
int multiple = 1; /* provides multiple for loop */
double decnum = 0.0; /* final result to return */
char sign = 0;
reverse(s);
if (*(s+strlen(s)-1) == '-') {
sign = 1;
*(s+strlen(s)-1) = '\0';
}
for (i=0; *(s+i) != '\0'; ++i) {
addnum = decmify(*(s+i));
addnum *= multiple;
decnum += addnum;
multiple *= 12;
}
return (sign == 0) ? decnum : decnum * -1;
}
double fracdec(char *s)
{
int i;
double divisor = 1.0;
double fracval = 0.0;
for (i=0; *(s+i) != '\0'; ++i)
divisor *= 12.0;
fracval = wholedec(s) / divisor;
return fracval;
}
char decmify(char c)
{
switch (c) {
case '0': return 0; break;
case '1': return 1; break;
case '2': return 2; break;
case '3': return 3; break;
case '4': return 4; break;
case '5': return 5; break;
case '6': return 6; break;
case '7': return 7; break;
case '8': return 8; break;
case '9': return 9; break;
case 'X': case 'x': case 'T': case 't': case 'A': case
'a':
return 10; break;
case 'E': case 'B': case 'b':
return 11; break;
default: return 0; break;
}
}
void reverse(char *s)
{
int i, j;
char tmp;
for (i=0, j=strlen(s)-1; i<j; ++i, --j) {
tmp = *(s+i);
*(s+i) = *(s+j);
*(s+j) = tmp;
}
}
int getline(char *s, int lim)
{
int c, i;
for (i=0; (c=getchar())!=EOF && c!='\n' && c!='\t' && c!=' '; ++i) {
if (i < lim-1)
*(s+i) = c;
else if (i == lim)
*(s+i) = '\0';
}
if (c == EOF)
return EOF;
*(s+i) = '\0';
return i;
}
*******
Input file that returns wrong output:
*******
X44;4
X;444e2
*******
Input file that throws a segfault:
*******
X;3
0;X3e1
*******
These errors only show up on my AMD64 box, not on my x86 box. The
same numbers that throw errors when on the second line of the input
file do not when on any other line.
Thanks again.
#define MAXDIGS 30
#define MAXLINE 30
int getline(char *s, int lim);
void reverse(char *s);
char decmify(char c);
double wholedec(char *s);
double doztodec(char *s);
double fracdec(char *s);
int expkill (char *s, int expspot);
*******
That, plus the rest of it. Thanks again.
The foolishest question is the unasked one.
Nevertheless I'd suggest, as a later exercise,
trying to simplify this code.
> /* convert exponential notation to normal b/f processing */
> int expkill (char *s, int expspot)
> {
> int i,j;
> char zensuf[3];
> char exp;
> int semi; /* where semicolon is */
>
> for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
> *(zensuf+j) = *(s+i);
> exp = doztodec(zensuf);
I'd not be surprised if your error is right here! The for loop
does NOT copy the '\0' character, but doztodec() depends on
it. Since uninitialized bytes in zensuf[] will vary based
on "irrelevant" details, this would tend to explain
your symptoms.
By the way, when in doubt, use asserts or printf's to check
what's going on. For example a
fprintf(stderr, "Expkill %d %d\n", strlen(s), expspot);
at the beginning of expkill() would error-check its args
(though not the actual problem here).
> [several more lines snipped]
I didn't study the whole code. I just assumed that, were I
to find the error before patience ran outit would
be at beginning of examined code. :-)
James Dow Allen
> On Jan 7, 11:37 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> On 1/7/2010 11:04 AM, Don wrote:
>>
>> > Hello, folks.[...]
>>
>> Could you post a short, complete program that demonstrates
>> the problem?
<snip>
The string in zensuf is not null terminated. You stop when "see" the
null but you never put it into the array.
> exp = doztodec(zensuf);
This call expects a proper string -- one the is null terminated.
See? There many be no null.
> if (*(s+i) == 'e' || *(s+i) == 'E')
> expkill(s,i);
> for (i=0; *(s+i) != ';'; ++i);
> endpoint = i;
> if (endpoint < (strlen(s) - 1)) {
> for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
> *(frac+j) = *(s+i);
> *(s+endpoint) = *(frac+j) = '\0';
> decnum += fracdec(frac);
> }
> decnum += wholedec(s);
> return decnum;
> }
This is not magic or quick eyes. I compiled with -g, ran it under
valgrind and followed the logic to see what was wrong. i don't know
how I managed without valgrind! There may be other errors, but deal
with them one at a time.
BTW, a quick tip: Your code will loads easier to read is every *(x+e)
was replaced by x[e]. Most people expect to see pointers indexed with
[] and it is much easier to read.
--
Ben.
Yes! I'm actually fairly happy with most of it, other than the expkill
() mess. But that is definitely beyond ugly. There must be a better
way to do that.
> I'd not be surprised if your error is right here! The for loop
> does NOT copy the '\0' character, but doztodec() depends on
> it. Since uninitialized bytes in zensuf[] will vary based
> on "irrelevant" details, this would tend to explain
> your symptoms.
Hmm. Yes, I'll fix that; funny I missed it, when generally I end up
putting '\0's where they already are just to be sure. But would that
really explain why it fails consistently on the second call, and only
the second call, on x86_64 and not on i686?
@Ben Bacarisse:
I'd never heard of valgrind before today; I'll be sure to look at it.
I just recently discovered Beej's Quick Guide to GDB, which looks like
it'll help me a lot, too.
> BTW, a quick tip: Your code will loads easier to read is every *(x+e)
> was replaced by x[e]. Most people expect to see pointers indexed with
> [] and it is much easier to read.
Hmm. When I went through the pointers chapter in K&R, it advised me
to go through my old programs with the array indexing and change them
to pointers, as it would be faster that way, though harder to read "at
least for the uninitiated."
Because you are clearly initiated, I question this now. Is array
indexing more normal? That would be annoying for me, since I've
finally gotten into the habit of using the pointer notation, but I'd
rather the code be maintainable.
To add to Ben's good advice, and despite the fact you may get away
with it, this isn't a good idea:
>> for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
^^^^^
Note the size for zensuf
> > char exp;
> > int semi; /* where semicolon is */
>
> > for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
> > *(zensuf+j) = *(s+i);
and what happens above if j > 2 ?
--
Fred K
Indeed; it's shockingly bad, bad enough that even I can see it's bad.
It does work, of course, but much better would be i=i+1. Which is
what I'm changing it to.
Nothing; it can't be. I'm using doubles here, and I can't get
precision with more than two digits worth of decimal places. That's
why I'm storing it in a char, too; it'll never get anywhere near 127.
As I mentioned, I haven't put in error checks yet, but I haven't been
testing with that yet, either.
Is it even possible to get more precision than that with C?
So all I need to do to fix this bizarre problem about the second call,
and only the second call, failing when every other call, even with the
same input, succeeds is null terminate the string?
But j isn't > 2 :-) :-)
I do agree with you though, Fred, and if feeling
verbose would have offered:
> Is your memory so tight that a red-zoned zensuf[5] would
> be intolerable wastage?
But then I'd be afraid of followups like
>> Why not just zensuf[100] and get it over with?
(Hopefully no pedant would invoke malloc() for this trivia!)
Meanwhile, this would have given Eric Sosman the
chance to mention Pascal's wager! :-)
James
That's exactly what I'm afraid of. I'm a C dilettante, really, just
getting my feet wet here, and I want to be as strict with myself as
possible. That way I won't go reserving 100 bytes for a string that
can't possibly be more than three (since more than three would produce
a meaningless result in the end anyway).
Frankly, I'm failing at that, too. I use "int" as a return value for
all my functions, even when I don't need a return value at all, and I
rarely *need* an int; a char would do just fine. But when I get the
right results from good input, I'll start working on what to do with
bad input; I just don't want to spend a lot of time error-checking if
I'm going to have to rewrite all the algorithms and then change the
error code anyway.
Oh, what a tangled web we weave
When first we practise to de-C-eive!
Don't feel bad; everyone's a beginner at some point. I'm
not going to try to debug the whole thing for you, but I'll
point out a few errors, awkwardnesses, and opportunities for
simplification (which usually means improvement). My overall
impression is that you're working much too hard ...
> #include<stdio.h>
> #include<stdlib.h>
> #include<string.h>
> #include<math.h>
>
> #define MAXDIGS 30
> #define MAXLINE 30
>
> int getline(char *s, int lim);
> void reverse(char *s);
> char decmify(char c);
> double wholedec(char *s);
> double doztodec(char *s);
> double fracdec(char *s);
> int expkill (char *s, int expspot);
>
> int main(int argc, char *argv[])
> {
> int c;
> char doznum[MAXLINE];
>
> while (*(++argv) != NULL&& *argv[0] == '-') {
> while (c = *++argv[0])
> switch (c) {
> default:
> fprintf(stderr,"dec: illegal option \"%c\"\n",c);
> break;
> }
> }
Not a terribly helpful start, is it? "Find the command-line
arguments beginning with '-' and ignore them, except to issue
complaints about any other characters they contain ..."
Anyhow: At the end of this loop, argv points to a pointer
to the first non-'-' argument, or to a NULL pointer if there
are no such arguments.
> if (*(++argv) == NULL) {
... and you immediately move onward, not having bothered
to test whether an "onward" even exists! If someone runs the
program with no non-'-' command-line arguments, this test takes
you to Never-Never Land because you're trying to inspect a pointer
that isn't in the argv array, has no defined value, and may
not even exist!
> printf("%f\n",doztodec(*(argv-1)));
Here and below, it might be a good idea to print the string
*and* the converted value, as a sanity check. (Since the string
gets destroyed in the process of conversion, print it before
calling doztodec() rather than after.)
> return 0;
> }
Okay (assuming one or more non-'-' arguments), but passing
strange. If there's exactly one argument, you convert it to a
number, print it, and exit. If there are two or more, you
ignore them all and proceed. Weird.
> while (getline(doznum,MAXLINE) != EOF) {
> printf("%f\n",doztodec(doznum));
> }
> return 0;
> }
[In the original, expkill() appeared here. I've moved it
further down because I think the narrative flow works better
that way.]
> double doztodec(char *s)
> {
> char frac[MAXDIGS];
> int i, j;
> int endpoint;
> double decnum = 0.0;
>
> for (i=0; *(s+i) != '\0'; ++i)
Throughout the program, readability would be improved if
you treated pointers like arrays and used array-index notation.
For example, `s[i]' is easier to read than `*(s+i)', and means
exactly the same thing.
> if (*(s+i) == 'e' || *(s+i) == 'E')
> expkill(s,i);
> for (i=0; *(s+i) != ';'; ++i);
If you ever get an input string that has no ';', this
is a Never-Never Land loop ... You should also acquaint
yourself with the strchr() function.
> endpoint = i;
> if (endpoint< (strlen(s) - 1)) {
A simpler test for "Is there anything after the semicolon?"
is to ask whether the next character is the end-of-string '\0'
or is something else: `if (s[endpoint+1] != '\0')'.
> for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
> *(frac+j) = *(s+i);
`strcpy(frac, s+endpoint)' is a lot simpler. It may or may
not do the same thing as your loop, because what you've written
has no defined behavior in the C language. (Hint: Look closely
at the first expression in the `for' statement.)
> *(s+endpoint) = *(frac+j) = '\0';
> decnum += fracdec(frac);
On the other hand, there doesn't seem to be a need to copy
the fraction digits at all. Instead, you could convert them
straight from the string: `fracdec(s+endpoint+1)' and forget
about frac[] altogether.
> }
> decnum += wholedec(s);
> return decnum;
> }
>
> /* convert exponential notation to normal b/f processing */
> int expkill (char *s, int expspot)
> {
> int i,j;
> char zensuf[3];
> char exp;
> int semi; /* where semicolon is */
>
> for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
> *(zensuf+j) = *(s+i);
You'd better hope there aren't more than three characters
after the 'e' ...
> exp = doztodec(zensuf);
Actually, it's worse: doztodec() expects a '\0'-terminated
string, and you've provided no '\0' to terminate it. Who knows
how far doztodec() might wander through uncharted memory before
it finds a '\0' byte, or what kind of result it might return from
whatever garbage it stumbles across first?
But, as with frac[] above, there doesn't seem to be any need
to copy the exponent. Use doztodec(s+expspot+1), and convert it
right where it lies.
> for (semi=0; *(s+semi) != ';'&& *(s+semi) != '\0'; ++semi);
Again, the strchr() function is your friend. Or wants to be
your friend, at any rate.
By the way, sticking the semicolon at the end of the same line
as its for is a good way to confuse people. A hasty reader may
well think that the next line is the for's body, when in fact the
for's body is that inconspicuous no-space between the ) and the ;.
Rather than make the empty body unnoticeable, highlight it with
for (...; ...; ...)
;
or even with
for (...; ...; ...)
continue;
> *(s+expspot) = '\0';
> if (semi>= strlen(s))
> return 0;
> if (exp> 0) {
> for (i=semi; *(s+i+1) != '\0'; ++i)
> *(s+i) = *(s+i+1);
> *(s+i) = '\0';
The memmove() function is your friend. You should also
meet and greet the memcpy() function, even though you should not
use the latter in this instance.
> for (i=strlen(s); i> semi+exp; --i)
> *(s+i) = *(s+i-1);
> *(s+semi+exp) = ';';
It seems what you're trying to do here is slide the semicolon
rightward to adjust for the exponent; is that it? Okay, fine:
You'll change "12;345e1=2" to 1234;5" and all's well. But what
will you do with "1;e4"? Slide four places right and you'll
slide right off the edge of the world ...
> } else if (exp< 0) {
> exp = -exp, ++exp;
"Simplify, simplify!" Try `exp = 1 - exp'.
> for (i=strlen(s)+exp; i>= 0; --i)
> *(s+i) = *(s+i-exp);
Again, memmove(). And again, what if there are too few
digits to accommodate the slide? "12;345e-1" gives "1;2345",
but what happens to "12;345e-6"?
> for (i=0; i< exp; ++i)
> *(s+i) = '0';
> for (semi=0; *(s+semi) != ';'; ++semi);
> for (i=semi; *(s+i) != '\0'; ++i)
> *(s+i) = *(s+i+1);
> *(s+i) = '\0';
> *(s+1) = ';';
> }
> if (*(s+strlen(s)-1) == ';')
> *(s+strlen(s)-1) = '\0';
> return 0;
As an alternative (and far simpler) approach, consider
a different way of combining the base number and the exponent.
Rather than doing all this string-bashing to slide the ';'
back and forth (and worrying about whether there's enough
room), stop and think: the exponent part multiplies the base
part by some power of twelve. Read up on the pow() function,
and see if anything suggests itself.
Caveat: Because floating-point arithmetic usually involves
rounding errors, the results you get from the string-bashing and
arithmetical versions are likely to be slightly different. For
example, 1;2e1 means 12; means exactly fourteen, but if you
instead compute 1;2 ~= 1.1666666667 and then multiply by 12 you
will probably get something just a little different. This may
or may not be a problem for your application.
> }
>
> double wholedec(char *s)
> {
> int addnum = 0; /* added to decnum each loop */
> int i;
> int multiple = 1; /* provides multiple for loop */
> double decnum = 0.0; /* final result to return */
> char sign = 0;
>
> reverse(s);
> if (*(s+strlen(s)-1) == '-') {
> sign = 1;
> *(s+strlen(s)-1) = '\0';
> }
> for (i=0; *(s+i) != '\0'; ++i) {
> addnum = decmify(*(s+i));
> addnum *= multiple;
> decnum += addnum;
> multiple *= 12;
> }
This is much more complicated than it needs to be. You
could do without reverse() altogether by starting at the end
of the string and working backward instead of reversing the
string and then marching forward. Even easier, you could
work from left to right with (pseudocode)
decnum = 0
for each digit d
decnum = decnum * 12 + decmify(d)
> return (sign == 0) ? decnum : decnum * -1;
> }
>
> double fracdec(char *s)
> {
> int i;
> double divisor = 1.0;
> double fracval = 0.0;
Pointless initialization: The initial value will never be
used, so you might as well initialize to `-2.71828'. Or to
nothing. As it is, specifying an initial value fools the reader
into thinking that the value is important -- when in fact it is
completely UNimportant.
>
> for (i=0; *(s+i) != '\0'; ++i)
> divisor *= 12.0;
> fracval = wholedec(s) / divisor;
> return fracval;
As it turns out, not only is the initial value of fracval
unimportant, but the variable itself is just about useless.
You could write `return wholedec(s) / divisor;' and get rid
of fracval altogether. This will not make your code any more
or less efficient, since a decent compiler will do the equivalent
of discarding fracval anyhow. But by removing irrelevant material
(I recently saw Kevin Kline as Cyrano de Bergerac, and although
the production played more for laughs than for romance the romance
was by no means wanting -- but I digress), you make it easier for
the reader (Jose Ferrer was good, too) to follow what's afoot.
Source code is written for two audiences: Compilers and
programmers. The latter are the more important; write with
their needs foremost in your mind.
> }
>
> char decmify(char c)
> {
> switch (c) {
> case '0': return 0; break;
> case '1': return 1; break;
> case '2': return 2; break;
> case '3': return 3; break;
> case '4': return 4; break;
> case '5': return 5; break;
> case '6': return 6; break;
> case '7': return 7; break;
> case '8': return 8; break;
> case '9': return 9; break;
You might simplify things a little by taking advantage of
the fact that the codes for '0' through '9' are consecutive
and ascending. Hence, if c is one of those digits, c-'0' is
its numerical value.
> case 'X': case 'x': case 'T': case 't': case 'A': case
> 'a':
> return 10; break;
> case 'E': case 'B': case 'b':
> return 11; break;
Strange choice of digits, but that's entirely up to you.
Given 'X' and 'x' as ten, I'd also have expected 'Y' and 'y'
as eleven -- but no. Also, it appears the 'E' case (and what
about 'e', by the way?) can appear only in the exponent, for
an input like "12;345eE".
> default: return 0; break;
You might want to consider returning an "impossible" value
here, something like -1, perhaps. (On the other hand, the rest
of the program doesn't seem to pay any attention to possible
anomalies detected here, but just plows blindly ahead with
whatever value comes back.)
> }
> }
>
> void reverse(char *s)
> {
> int i, j;
> char tmp;
>
> for (i=0, j=strlen(s)-1; i<j; ++i, --j) {
> tmp = *(s+i);
> *(s+i) = *(s+j);
> *(s+j) = tmp;
> }
> }
>
> int getline(char *s, int lim)
> {
> int c, i;
>
> for (i=0; (c=getchar())!=EOF&& c!='\n'&& c!='\t'&& c!=' '; ++i) {
Treating tabs and spaces as "line ends" seems odd. You might
want to rename the function to getword() or some such, since "line"
almost always means '\n'-terminated to a C programmer.
> if (i< lim-1)
> *(s+i) = c;
> else if (i == lim)
> *(s+i) = '\0';
Note that this *always* stores beyond the end of the array.
You should either change the call to pass MAXLINE-1 or (better,
IMHO) change the way you handle lim in this function.
> }
> if (c == EOF)
> return EOF;
> *(s+i) = '\0';
Oh, darn! If i reached lim, the loop above was careful to
stop storing characters (except for the off-by-one error). But
after the loop is finished, you store to s[i] anyhow. When
i==lim+12345, it'll all end in tears ...
> return i;
> }
As I said at the start, I haven't tried to find every single
bug -- I haven't even tried to find the exact cause(s) of the
bugs you reported. I've spotted a few that may (or may not) be
to blame, and I've pointed out some things you could simplify.
I think that if you undertake some of the simplifications, the
resulting code may be easier to debug than the original. Good luck!
--
Eric Sosman
eso...@ieee-dot-org.invalid
That's not a useful kind of strictness, IMHO.
Basically, if you want to be strict, be strict about being cautious and
ensuring that you have buffers against problems. It is usually unwise
to assume that you know what input might yield meaningful results after
future changes...
> Frankly, I'm failing at that, too. I use "int" as a return value for
> all my functions, even when I don't need a return value at all, and I
> rarely *need* an int; a char would do just fine.
But "int" is almost always better, because it's the native thing -- on
many targets, using char would hurt performance significantly.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
Unless you are specifically working on your use of arrays and pointers,
you might want to simplify the way you approach this. If you use some
of the standard functions, it becomes much more readable.
This is a quick and dirty example (for concept rather than style). It
takes a copy of the string (let's say "X;444e2") and:
- Takes a copy so it's non-destructive;
- Breaks off the exponent and keeps track of it's value which
it obtains by calling itself* leaving the copy as "X;444";
- Finds where the ';' is, remembers where it is and removes it
to leave the copy as "X444";
- Calculates the value of the number;
- Corrects for the ';' and exponent.
It's much shorter and simpler.
No doubt it will now be picked apart for some horrible error or its
style. ;-)
double doz2dec(char *s)
{
char *copy;
char *ptr;
int exp=0;
int offset=0;
double value;
/* Use a copy to preserve original */
copy = malloc(strlen(s)+1);
if (!copy) {
perror("Couldn't find enough memory");
exit(EXIT_FAILURE);
}
memcpy(copy, s, strlen(s)+1);
/* Find the exponent and remove
* from the copy to leave the
* number */
if ((ptr = strchr(copy, 'e')) || (ptr = strchr(copy, 'E'))) {
*ptr = '\0';
exp = doz2dec(ptr+1);
}
/* Remove ';' whilst remembering
offset */
if ((ptr = strchr(copy, ';'))) {
offset = strlen(copy) - (ptr-copy) - 1;
memmove(ptr, ptr+1, offset+1);
}
/* Calculate value */
value = 0.0L;
for (ptr=copy; *ptr; ptr++) {
value = (value * 12) + decmify(*ptr);
}
/* Adjust for exponent and
offset */
if (offset-exp) {
value /= pow(12, offset-exp);
}
free(copy); /* Don't need the copy any more */
return value;
}
You misunderstood K&R's advice. What they were suggesting is that it's
usually faster to walk a pointer through an array rather than using
subscripts. For example, instead of:
for (i = 0; i < n; i++)
a[i] += 10;
do:
for (p = a; p < a + n; p++)
*p += 10;
The notation is irrelevant -- *(a + i) is exactly equivalent to a[i],
but a[i] is usually much easier to read.
--
Larry Jones
You can never really enjoy Sundays because in the back of your
mind you know you have to go to school the next day. -- Calvin
>It does work, of course
There's no of course about it. An implementation might work out
what the incremented value will be, assign it to i, and then do the
increment.
-- Richard
--
Please remember to mention me / in tapes you leave behind.
On Jan 7, 3:31 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Not a terribly helpful start, is it? "Find the command-line
> arguments beginning with '-' and ignore them, except to issue
> complaints about any other characters they contain ..."
Placeholders, for when I do include options. I have a few I plan to
work in. I've got a converter that goes in the other direction with
arguments concerning output format and desired precision; I'll
implement the same options here, once I get the core conversion
working.
> > if (*(++argv) == NULL) {
>
> ... and you immediately move onward, not having bothered
> to test whether an "onward" even exists! If someone runs the
> program with no non-'-' command-line arguments, this test takes
> you to Never-Never Land because you're trying to inspect a pointer
> that isn't in the argv array, has no defined value, and may
> not even exist!
Is it possible to run a program with no non-'-' arguments? The
program name will always be there, won't it? Of course, I see that I
skip over that anyway with the initial ++argv in the while loop.
I suppose I should just test it "if (*(argv) == NULL)" then?
> > if (*(s+i) == 'e' || *(s+i) == 'E')
> > expkill(s,i);
> > for (i=0; *(s+i) != ';'; ++i);
>
> If you ever get an input string that has no ';', this
> is a Never-Never Land loop ... You should also acquaint
> yourself with the strchr() function.
Which is weird, because it works anyway. But clearly I've just been
lucky; I've changed it to:
for (i=0; *(s+i) != ';' && *(s+i) != '\0'; ++i);
(I haven't gotten to changing all the pointer notation yet, so I wrote
it that way to be consistent. I'll get to it.)
> A simpler test for "Is there anything after the semicolon?"
> is to ask whether the next character is the end-of-string '\0'
> or is something else: `if (s[endpoint+1] != '\0')'.
Much simpler, no function call overhead. Fixed.
>
> > for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
> > *(frac+j) = *(s+i);
>
> `strcpy(frac, s+endpoint)' is a lot simpler. It may or may
> not do the same thing as your loop, because what you've written
> has no defined behavior in the C language. (Hint: Look closely
> at the first expression in the `for' statement.)
>
> > *(s+endpoint) = *(frac+j) = '\0';
> > decnum += fracdec(frac);
>
> On the other hand, there doesn't seem to be a need to copy
> the fraction digits at all. Instead, you could convert them
> straight from the string: `fracdec(s+endpoint+1)' and forget
> about frac[] altogether.
Yes, and one less string I'll have to test for overflow. frac[] is
eliminated; fixed.
> You'd better hope there aren't more than three characters
> after the 'e' ...
There can't be, as I understand it, because that would mean much more
precision than the data type can handle. I'll write in error checking
on it later.
> > exp = doztodec(zensuf);
>
> Actually, it's worse: doztodec() expects a '\0'-terminated
> string, and you've provided no '\0' to terminate it. Who knows
> how far doztodec() might wander through uncharted memory before
> it finds a '\0' byte, or what kind of result it might return from
> whatever garbage it stumbles across first?
Yes; fixed that by giving zensuf a '\0'.
> But, as with frac[] above, there doesn't seem to be any need
> to copy the exponent. Use doztodec(s+expspot+1), and convert it
> right where it lies.
Which also saves yet another loop. Thanks; fixed.
> It seems what you're trying to do here is slide the semicolon
> rightward to adjust for the exponent; is that it? Okay, fine:
> You'll change "12;345e1=2" to 1234;5" and all's well. But what
> will you do with "1;e4"? Slide four places right and you'll
> slide right off the edge of the world ...
Yes; I should pad it with zeroes, and will do so.
> Again, memmove(). And again, what if there are too few
> digits to accommodate the slide? "12;345e-1" gives "1;2345",
> but what happens to "12;345e-6"?
Ha; I actually thought of that one. :-) They're padded on the left
with zeroes, and then semicolon inserted later. That's what
> > for (i=0; i< exp; ++i)
> > *(s+i) = '0';
is for. I'm not sure if this actually works (in fact, looking at it I
think it doesn't), but that's what I was going for. I'll test it out
tonight.
> will probably get something just a little different. This may
> or may not be a problem for your application.
It is; I'm looking for the maximum possible accuracy. Eliminating all
those small inaccuracies is something I'm very interested in.
> decnum = 0
> for each digit d
> decnum = decnum * 12 + decmify(d)
Implemented this with the following:
*******
while ((c = s[i++]) != '\0') {
decnum = decnum * 12 + decmify(c);
*******
My, that looks a lot better. Thanks; that's quite clever. (I haven't
tested it yet, but I'm pretty sure it matches your pseudocode.) And
yes, I remembered to declare char c above.
> > double fracdec(char *s)
> > {
> > int i;
> > double divisor = 1.0;
> > double fracval = 0.0;
>
> Pointless initialization: The initial value will never be
> used, so you might as well initialize to `-2.71828'. Or to
> nothing. As it is, specifying an initial value fools the reader
> into thinking that the value is important -- when in fact it is
> completely UNimportant.
Well, divisor *has* to start at 1.0, doesn't it? Otherwise, when I
say divisor *= 12.0; I'm going to get zero. Aren't number variables
intialized to zero unless otherwise stated? I don't have the standard
with me here, but that's what I was going for there. fracval, though,
I follow you; though it doesn't matter, because I got rid of fracval
entirely, as you suggested.
> Source code is written for two audiences: Compilers and
> programmers. The latter are the more important; write with
> their needs foremost in your mind.
That's as much as I could work through now; I'll get the rest later.
Thanks again; it's been very eye-opening.
> memcpy(copy, s, strlen(s)+1);
Wouldn't "strcpy(copy, s)" be quicker (no need to search through
twice), less error-prone (won't forget the "+1") and more readable
(says what is intended)?
Yes, but I'm allergic to strcpy. ;-)
Yes, the program name will always be there (it might be the
empty string "" if no name can be determined), but suppose
that's the *only* thing that's there. Then the *++argv in the
first loop will find the NULL at the end of the two-place array
{name, NULL}, and the *++argv in the next test will walk right
off the end.
> I suppose I should just test it "if (*(argv) == NULL)" then?
That would work: "Look before you leap." An alternative
would be to use the argc value, which tells you the total
number of command-line arguments.
>> A simpler test for "Is there anything after the semicolon?"
>> is to ask whether the next character is the end-of-string '\0'
>> or is something else: `if (s[endpoint+1] != '\0')'.
>
> Much simpler, no function call overhead. Fixed.
It's not about the overhead: Even though there is some, it
is likely[*] to be so tiny as to be difficult to measure. The
real improvement comes from the better expression of intent:
You really don't care about the length of the string, but about
whether the string continues or doesn't continue beyond the
semicolon -- so write your test that way. Don't weigh the car
to find out how many passengers are in it; look through the
window.
[*] I once had the experience of working on a machine where
the overhead was LARGE and could not be neglected. Is it any
wonder that the maker of that machine is no longer in business?
>> You'd better hope there aren't more than three characters
>> after the 'e' ...
>
> There can't be, as I understand it, because that would mean much more
> precision than the data type can handle. I'll write in error checking
> on it later.
Take your own route, do what works for you. In close to
forty years of writing programs in dozens of languages, it's
always -- always! -- seemed to me that handling the exceptional
cases is best thought about first, not last. One big reason
is that it's not enough just to detect a strangeness: You must
also choose how to respond to it. That choice may have profound
effects on the overall program structure and on the definitions
of your functions/subroutines/lambdas/methods, and is the sort
of thing that's hard to "stick on" at the end.
>> It seems what you're trying to do here is slide the semicolon
>> rightward to adjust for the exponent; is that it? Okay, fine:
>> You'll change "12;345e1=2" to 1234;5" and all's well. But what
>> will you do with "1;e4"? Slide four places right and you'll
>> slide right off the edge of the world ...
>
> Yes; I should pad it with zeroes, and will do so.
Ah, but is there enough room in the string? Those command-
line arguments, for example: If one of them is "12;3e7" you can
only be sure of seven characters (including the '\0'). If you
try to store the eleven characters of "123000000;" into the
space that suffices for seven, ...
> Ha; I actually thought of that one. :-) They're padded on the left
> with zeroes, and then semicolon inserted later. That's what
>
>>> for (i=0; i< exp; ++i)
>>> *(s+i) = '0';
>
> is for. I'm not sure if this actually works (in fact, looking at it I
> think it doesn't), but that's what I was going for. I'll test it out
> tonight.
I was running out of steam when I got to that loop, and didn't
look at it closely (still haven't). But again: How do you know
there's enough room for all those zeroes? The seven characters
of "12;e-7" won't hold the nine characters of ";0000012".
>> will probably get something just a little different. This may
>> or may not be a problem for your application.
>
> It is; I'm looking for the maximum possible accuracy. Eliminating all
> those small inaccuracies is something I'm very interested in.
Okay. A full-scale job that squeezes the last little bit of
accuracy out of the conversion is, to coin a phrase, "non-trivial."
>>> double divisor = 1.0;
>>> double fracval = 0.0;
>>
>> Pointless initialization: The initial value will never be
>> used, so you might as well initialize to `-2.71828'. Or to
>> nothing. As it is, specifying an initial value fools the reader
>> into thinking that the value is important -- when in fact it is
>> completely UNimportant.
>
> Well, divisor *has* to start at 1.0, doesn't it? [...]
Sorry if I was unclear: This comment was aimed only at the
single line that preceded it. Yes, divisor's value is used --
but fracval's is not, and that's the useless initialization.
--
Eric Sosman
eso...@ieee-dot-org.invalid
> On Jan 7, 12:35 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> The string in zensuf is not null terminated. You stop when "see" the
>> null but you never put it into the array.
>>
>> > exp = doztodec(zensuf);
>>
>> This call expects a proper string -- one [that] is null terminated.
<I edited a typo if mine>
>
> So all I need to do to fix this bizarre problem about the second call,
> and only the second call, failing when every other call, even with the
> same input, succeeds is null terminate the string?
I don't know for sure. It's a start. I'd be surprised it that is all
you need to fix. I only had a moment so I just wanted indicate what
the problem was.
Most people are quite friendly here, so don't be afraid to post again
if you want the code to be looked over once you have done what you
feel you can.
--
Ben.
Only by accident. It's behavior is completely undefined -- it could do
*anything* (including appearing to work, which seems to be the behavior
you're seeing).
> but much better would be i=i+1.
Or just ++i or i++. For what it's worth, most C programmers choose i++
when they don't care one way or the other.
--
Larry Jones
It COULD'VE happened by accident! -- Calvin
Why so? Are you one of these who has been scared off strcpy? It's a
cargo cult thing. There's nothing wrong with strcpy that can be fixed
with strncpy.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
I normally go with strncpy - I chose went with memcpy for no particular
reason. It's strcpy I have problems with - for obvious reasons.
What are those obvious reasons?
I can't think of anything specific that would make me use memcpy instead
of strcpy for data that's intended to be a string.
Perhaps you misread what I wrote. There's nothing wrong with strcpy that
can be fixed with strncpy. Why use strncpy, then? It offers no
advantages and one or two disadvantages.
> - I chose went with memcpy for no particular
> reason. It's strcpy I have problems with - for obvious reasons.
They're not obvious to me.
Nah - I'm being stupid.
I use memcpy when implementing dupstr, on the grounds that I've already
calculated the size anyway so I might as well use it and possibly gain a
nanopicofemptosecond or two: char*dupstr(const char*s){size_t
len=strlen(s)+1;}char*p=malloc(len);if(p)memcpy(p,s,len);return p;}
See my reply to Seebs. ;-)
I saw your later response to Seebs, but just to mention -- if
strncpy is the answer, you're very likely asking the wrong question.
If the data being copied just fits the target, the target won't be
properly null-terminated. If the target is bigger than the source,
the target will be passed with multiple null characters, which are
most likely useless.
If you really want a data structure consisting of a sequence of
non-null characters followed by *zero* or more null characters,
filling the entire array, then strncpy is the right tool. But more
often, what you really want is a string, which is a sequence of
non-null characters terminated by a single null character; what
follows the null terminator typically doesn't matter, and there's
no point in wasting effort initializing it.
(And of course don't confuse the null character, '\0', with a null
pointer, NULL.)
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
All fair points, and I do know all that. I was quite literally being
stupid - just a brainfart.
It happens to all of us.
The reason I was having trouble there was that I was
dereferencing the pointer. The *value* of the pointer
shouldn't be NULL; rather, the *address* should be NULL, to
invoke that part of the program. It now reads:
if (argv == NULL) {
And all is well.
> Take your own route, do what works for you. In close to
> forty years of writing programs in dozens of languages, it's
> always -- always! -- seemed to me that handling the exceptional
> cases is best thought about first, not last. One big reason
> is that it's not enough just to detect a strangeness: You must
> also choose how to respond to it. That choice may have profound
> effects on the overall program structure and on the definitions
> of your functions/subroutines/lambdas/methods, and is the sort
> of thing that's hard to "stick on" at the end.
Well, I certainly don't want to disregard the wisdom of my
elders. (Whether or not any of you are older than I am.)
In this case, I didn't think these were really "exceptional
cases," as you put it; they were just cases that wouldn't
work. As such, I thought I'd write the code for the cases
that can work, and write later code to reject data that
can't possibly work---like triple-digit exponents for
doubles.
>> Yes; I should pad it with zeroes, and will do so.
>
> Ah, but is there enough room in the string? Those command-
> line arguments, for example: If one of them is "12;3e7" you can
> only be sure of seven characters (including the '\0'). If you
> try to store the eleven characters of "123000000;" into the
> space that suffices for seven, ...
Ah! Yes; I was still thinking in the mode of the other
direction, where I allocated a string that was safely large
enough. In that program, I could just move the null
character back, provided that I didn't move it back past the
allocated space. In this case, the string is set
automatically when received into argv, and I don't have that
option. Hmm. I really have little idea how to handle this.
Do I need to measure how much bigger this string must be,
then call realloc() to make sure I've got room?
>> It is; I'm looking for the maximum possible accuracy. Eliminating all
>> those small inaccuracies is something I'm very interested in.
>
> Okay. A full-scale job that squeezes the last little bit of
> accuracy out of the conversion is, to coin a phrase, "non-trivial."
Yes, and probably (by which I mean "certainly") beyond my
competence. But I want to do what little I can do right, if
I can.
Absolutely. Usenet is the best place to find help. I've
learned almost everything I know about computers in general,
Unix, Linux, TeX, Metafont, and so on (more than my C
knowledge would indicate, I promise) from Usenet. If you're
an arrogant jerk demanding free tech support, you won't find
it welcoming. But if you do your homework and don't expect
everyone to do everything for you, it'll be a great help.
I was pleased to find comp.lang.c as helpful as the other
groups I've used. Thank you all again.
That seems very unlikely.
argv should never be null. argv is always a valid pointer to an array of
other pointers, the last one of which is null.
You're right.
But testing whether (*(++argv)) is null *does* work, and
sending *(argv-1) to my function works. It seems like this
doesn't really step out of array bounds, then. When I test
*(++argv), when a non-'-' command line argument is passed,
I'm testing *argv[argc], which is null. So that's fine.
When there is no such argument, though, I *am* stepping
beyond the array. I think I may finally understand this.
Why not test whether *argv is null? What do you all think
of this?
*******
while (*(++argv) != NULL && *argv[0] == '-') {
if (isdigit((*argv+1)[1]) || (*argv+1)[1] == '.')
break; /* negative numbers are not optional args */
while (c = *++argv[0])
switch (c) {
default:
fprintf(stderr,"dec: illegal option \"%c\"\n",c);
break;
}
}
if (*argv != NULL) {
printf("%f\n",doztodec(*(argv)));
return 0;
}
while (getword(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
return 0;
*******
This seems to do everything I want it to do. (That is, to
convert the command line argument if present, to convert
what it receives on stdin if not.) Am I still missing any
hidden errors here?
> >>>> No doubt it will now be picked apart for some horrible error or its
> >>>> style. ;-)
>
> >>>> memcpy(copy, s, strlen(s)+1);
>
> >>> Wouldn't "strcpy(copy, s)" be quicker (no need to search through
> >>> twice), less error-prone (won't forget the "+1") and more readable
> >>> (says what is intended)?
>
> >> Yes, but I'm allergic to strcpy. ;-)
why? and why the smiley? You do realise that any problem with strcpy()
(non-string in source, not enough space in target) apply EQUALLY to
strlen/memcpy? Plus strcpy/memcpy is more expensive- it scans the
string twice.
> > Why so? Are you one of these who has been scared off strcpy? It's a
> > cargo cult thing. There's nothing wrong with strcpy that can be fixed
> > with strncpy.
and a couple of possible problems that may be introduced by strncpy()
> I normally go with strncpy -
why? Have you read its specification?
> I chose went with memcpy for no particular
> reason. It's strcpy I have problems with - for obvious reasons.
which are?
--
"she's as headstrong as
an allegory on the banks of the Nile"
I'll refer you to my earlier answers: I was being stupid and not
thinking properly at the end of a long day - simple as that.
Okay.
To give a longer answer, I don't use strcpy much...but in my own code,
for various reasons I have the length precalculated in any case, so the
cost of strncpy/memcpy is no big deal. I completely acknowledge that
(in this case) strcpy is the better choice by far. A little less haste
would have changed that, as would a second more of considering the
follow-up.
Okay?
Can I put a smiley now? :-P
(In case it comes across wrongly, I am just lightening the mood)
[re strcpy/strncpy]
> Can I put a smiley now? :-P
> (In case it comes across wrongly, I am just lightening the mood)
It is actually quite difficult to express disagreement on Usenet (and,
it sometimes seems, in real life) without connotations of antagonism and
irritation - even when one's comments are not intended to express these
ideas, they are sometimes read that way anyway. I think you may have
over-interpreted Nick K's and my responses.
Not long ago (a few years, that's all), I had a long, long, long long
long debate with a guy in alt.comp.lang.learn.c-c++ about strncpy vs
strcpy. He was quite convinced that strcpy was unsafe, and that strncpy
was safe - at first. He argued well, so it was a good debate. It was an
even better debate because each of us was sceptical of the other's
position but nevertheless ready to try to understand it. And it was even
more betterer debate because, throughout, it was conducted with civility
and respect on both sides.
Incidentally... by the end of it, my opponent had changed his mind. And
that's the best kind of debate - one where the participants are
intellectually honest and are prepared to change their mind if necessary.
But the most important thing is that, at the end of the debate, we were
still friends. It can happen, and it does happen.
> Seebs <usenet...@seebs.net> wrote:
>> argv should never be null. argv is always a valid pointer to an array of
>> other pointers, the last one of which is null.
>
> You're right.
>
> But testing whether (*(++argv)) is null *does* work, and
> sending *(argv-1) to my function works. It seems like this
> doesn't really step out of array bounds, then. When I test
> *(++argv), when a non-'-' command line argument is passed,
> I'm testing *argv[argc], which is null. So that's fine.
argc can be 0 with *argv == NULL on some systems. When that happens,
argv may be a pointer to (the start of) a single-element array in
which case *++argv does step outside the array.
> When there is no such argument, though, I *am* stepping
> beyond the array. I think I may finally understand this.
> Why not test whether *argv is null?
Yes, you need to test *argv for NULL or, alternatively, for argc being
0. This makes me wonder why you don't!
> What do you all think of this?
> *******
> while (*(++argv) != NULL && *argv[0] == '-') {
You have to test *argv not *++argv. This alters almost everything
that follows, though you'll get away with it if there is always a
non-NULL argv[0].
> if (isdigit((*argv+1)[1]) || (*argv+1)[1] == '.')
I don't think you meant that. argv[0][1] is the character that
follows the '-' you have just identified in the 'while' test. If you
want to write that with fewer indexes, that would be *(argv[0] + 1) or
*(*argv + 1) or (*argv + 1)[0] but argv[0][1] is so much clearer than
any of the alternatives. (*argv + 1)[1] is the second digit. Did you
test with single digit negative number?
> break; /* negative numbers are not optional args */
> while (c = *++argv[0])
Whilst I don't object to modifying argv, modifying argv[0] worries me
a little. It's permitted, but it complicates matters (debugging in
particular) so I prefer to set a 'flags' pointer to the current arg
and "walk" that.
> switch (c) {
> default:
> fprintf(stderr,"dec: illegal option \"%c\"\n",c);
> break;
> }
> }
> if (*argv != NULL) {
> printf("%f\n",doztodec(*(argv)));
> return 0;
> }
> while (getword(doznum,MAXLINE) != EOF) {
> printf("%f\n",doztodec(doznum));
> }
> return 0;
<snip>
--
Ben.
Actually, I was actually trying to ensure it was clear that *I* wasn't
irritated. I didn't take any of the comments badly - it was me that was
being stupid. My only irritation was with myself. ;-)
Really? The program could be executed without its own name ever being
called?
> > When there is no such argument, though, I *am* stepping
> > beyond the array. I think I may finally understand this.
> > Why not test whether *argv is null?
>
> Yes, you need to test *argv for NULL or, alternatively, for argc being
> 0. This makes me wonder why you don't!
Well, I did, later on.
> > What do you all think of this?
> > *******
> > while (*(++argv) != NULL && *argv[0] == '-') {
>
> You have to test *argv not *++argv. This alters almost everything
> that follows, though you'll get away with it if there is always a
> non-NULL argv[0].
I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
convention, argv[0] is the name by which the program was invoked, so
argc is at least 1." Granted, that's not a statement of the standard,
but I'm having a hard time seeing how a program can be invoked without
it being invoked by some name, which would be argv[0].
*******
while (*argv != NULL && *argv[0] == '-') {
*******
Isn't that redundant anyway, though? If *argv[0] == '-', then we know
that *argv isn't null. But say I increment argv elsewhere; won't it
fail on the very first loop, unless it's called with a name beginning
with '-'?
What about the following:
*******
while (--argc > 0 && (*++argv)[0] == '-') {
}
> > if (isdigit((*argv+1)[1]) || (*argv+1)[1] == '.')
>
> I don't think you meant that. argv[0][1] is the character that
> follows the '-' you have just identified in the 'while' test. If you
> want to write that with fewer indexes, that would be *(argv[0] + 1) or
> *(*argv + 1) or (*argv + 1)[0] but argv[0][1] is so much clearer than
> any of the alternatives. (*argv + 1)[1] is the second digit. Did you
> test with single digit negative number?
Yes; haven't figured that one out yet.
> I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
> convention, argv[0] is the name by which the program was invoked, so
> argc is at least 1." Granted, that's not a statement of the standard,
> but I'm having a hard time seeing how a program can be invoked without
> it being invoked by some name, which would be argv[0].
> *******
> while (*argv != NULL && *argv[0] == '-') {
> *******
> Isn't that redundant anyway, though? If *argv[0] == '-', then we know
> that *argv isn't null. But say I increment argv elsewhere; won't it
> fail on the very first loop, unless it's called with a name beginning
> with '-'?
>
> What about the following:
*******
while (--argc > 0 && (*++argv)[0] == '-') {
while (c = *++argv[0])
/* option stuff here */
}
if (argc >= 1)
printf("%f\n",doztodec(*argv));
return 0;
}
while (getword(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
******
The first and second while tests I took out of K&R, p117. I'd tried
them before but they'd failed for reasons I couldn't understand; I
suppose it was probably due to other mistakes, because it looks to me
like it ought to work. Here, if I call it with "./dec "X44;4"", it'll
start by testing 1 > 0 and 'X' != '-', so it drops out of the loop.
It then sees that argc >= 1, so it executes that code and returns 0 to
the shell. If I call it with "./dec < testfile", it'll test 0 > 0,
drop out of the loop; it'll then see that argc < 1, so it'll go to the
second loop and input from stdin. If, on the other hand, I call it
with some future option, say "./dec -e "X44;4"", it'll start be
testing 2 > 0 and '-' == '-', so it'll execute the option code; it'll
then decrement argc and test 1 > 0, then increment argv and see that
'X' != '-', so it'll drop out and test argc, which is still 1, and run
the code there. But if I call it with "./dec -e < testfile", it'll
test 1 > 0 and '-' == '-', run the option code, decrement argc and
increment argv, then test 0 > 0, it'll fail, and it'll drop out of the
loop, see that argc < 1, and skip down to the second loop to receive
input from stdin.
It still doesn't cover the case where argv[0] is null, but otherwise,
does that sound about right?
> > > if (isdigit((*argv+1)[1]) || (*argv+1)[1] == '.')
>
> > I don't think you meant that. argv[0][1] is the character that
> > follows the '-' you have just identified in the 'while' test. If you
> > want to write that with fewer indexes, that would be *(argv[0] + 1) or
> > *(*argv + 1) or (*argv + 1)[0] but argv[0][1] is so much clearer than
> > any of the alternatives. (*argv + 1)[1] is the second digit. Did you
> > test with single digit negative number?
You're right. Plus, it doesn't account for the ';', 'X', and the
like. But to keep it simple for demonstrative purposes, what I mean
is rather this:
*******
if (isdigit(argv[0][1]) || argv[0][1] == ';')
*******
That should work, since argv now points to the current option. That
also explains the problems I couldn't figure out about single-digit
negative numbers. (Thanks; stupid of me, really.)
I busily rewriting expkill() to make sure I've got enough memory for
the new string, and to utilize strchr and memmove as you suggested.
I'll put it up when I've got something reasonably non-idiotic (I hope).
> On Jan 8, 6:57 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> argc can be 0 with *argv == NULL on some systems. When that happens,
>> argv may be a pointer to (the start of) a single-element array in
>> which case *++argv does step outside the array.
>
> Really? The program could be executed without its own name ever being
> called?
Yes, really! The rather messily numbered 5.1.2.2.1 paragraph 2 only
states that:
2 If they are declared, the parameters to the main function shall obey
the following constraints:
- The value of argc shall be nonnegative.
- argv[argc] shall be a null pointer.
- If the value of argc is greater than zero, the array members
argv[0] through argv[argc-1] inclusive shall contain pointers to
strings, [...]
- If the value of argc is greater than zero, the string pointed to
by argv[0] represents the program name; [...]
[...]
argc can be 0 and then argv[0] will be a null pointer.
>> > When there is no such argument, though, I *am* stepping
>> > beyond the array. I think I may finally understand this.
>> > Why not test whether *argv is null?
>>
>> Yes, you need to test *argv for NULL or, alternatively, for argc being
>> 0. This makes me wonder why you don't!
>
> Well, I did, later on.
Too late by then -- you might have gone over the array.
>> > What do you all think of this?
>> > *******
>> > while (*(++argv) != NULL && *argv[0] == '-') {
>>
>> You have to test *argv not *++argv. This alters almost everything
>> that follows, though you'll get away with it if there is always a
>> non-NULL argv[0].
>
> I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
> convention, argv[0] is the name by which the program was invoked, so
> argc is at least 1." Granted, that's not a statement of the standard,
> but I'm having a hard time seeing how a program can be invoked without
> it being invoked by some name, which would be argv[0].
I doubt K&R2 says that. You can't rely on a statement about C as it
was more than 30 years ago!
> *******
> while (*argv != NULL && *argv[0] == '-') {
> *******
> Isn't that redundant anyway, though?
No, but if there is an argv[0] it tests the program name for starting
with a '-' which is probably not what you want.
> If *argv[0] == '-', then we know
> that *argv isn't null.
No you don't. You know that if *argv != NULL you can ask about
*argv[0] but if you don't test first that *argv != NULL you could get
any result at all from *argv[0] because it is undefined behaviour.
> But say I increment argv elsewhere; won't it
> fail on the very first loop, unless it's called with a name beginning
> with '-'?
I don't follow this, sorry. The key point is that, to be portable,
you need to be aware that argc can be 0 and argv can point to a single
item that will (in that case) be a null pointer.
> What about the following:
> *******
> while (--argc > 0 && (*++argv)[0] == '-') {
>
> }
That's OK. Maybe not the clearest, but it's fine.
<snip>
--
Ben.
> *grumble* Stupid web interface I'm stuck with at work...
>
>> I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
>> convention, argv[0] is the name by which the program was invoked, so
>> argc is at least 1." Granted, that's not a statement of the standard,
>> but I'm having a hard time seeing how a program can be invoked without
>> it being invoked by some name, which would be argv[0].
>> *******
>> while (*argv != NULL && *argv[0] == '-') {
>> *******
>> Isn't that redundant anyway, though? If *argv[0] == '-', then we know
>> that *argv isn't null. But say I increment argv elsewhere; won't it
>> fail on the very first loop, unless it's called with a name beginning
>> with '-'?
>>
>> What about the following:
> *******
> while (--argc > 0 && (*++argv)[0] == '-') {
> while (c = *++argv[0])
> /* option stuff here */
> }
Yup.
> if (argc >= 1)
> printf("%f\n",doztodec(*argv));
Missing { but, yes, that's fine. I don't find it that clear. I had
to think long and hard to see that the -- in the while has already
changed argc so this test is the correct one.
My personal taste is to process all non-flag arguments so that last
part would be a while rather than an if.
> return 0;
> }
> while (getword(doznum,MAXLINE) != EOF) {
> printf("%f\n",doztodec(doznum));
> }
> ******
> The first and second while tests I took out of K&R, p117. I'd tried
> them before but they'd failed for reasons I couldn't understand; I
> suppose it was probably due to other mistakes, because it looks to me
> like it ought to work. Here, if I call it with "./dec "X44;4"", it'll
> start by testing 1 > 0 and 'X' != '-', so it drops out of the loop.
> It then sees that argc >= 1, so it executes that code and returns 0 to
> the shell. If I call it with "./dec < testfile", it'll test 0 > 0,
> drop out of the loop; it'll then see that argc < 1, so it'll go to the
> second loop and input from stdin. If, on the other hand, I call it
> with some future option, say "./dec -e "X44;4"", it'll start be
> testing 2 > 0 and '-' == '-', so it'll execute the option code; it'll
> then decrement argc and test 1 > 0, then increment argv and see that
> 'X' != '-', so it'll drop out and test argc, which is still 1, and run
> the code there. But if I call it with "./dec -e < testfile", it'll
> test 1 > 0 and '-' == '-', run the option code, decrement argc and
> increment argv, then test 0 > 0, it'll fail, and it'll drop out of the
> loop, see that argc < 1, and skip down to the second loop to receive
> input from stdin.
>
> It still doesn't cover the case where argv[0] is null, but otherwise,
> does that sound about right?
Yes it does cover that case! If argv[0] is null argc *must* be zero
and neither while nor later if will do anything dangerous (at least I
don't think so).
<snip>
--
Ben.
Right; it's within the standard. But I'm still having trouble seeing
how it's actually possible, in practice.
> > I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
> > convention, argv[0] is the name by which the program was invoked, so
> > argc is at least 1." Granted, that's not a statement of the standard,
> > but I'm having a hard time seeing how a program can be invoked without
> > it being invoked by some name, which would be argv[0].
>
> I doubt K&R2 says that. You can't rely on a statement about C as it
> was more than 30 years ago!
Ha! For the first time on comp.lang.c, I'm right! :-) Really,
that's a direct quotation from page 115 of the second edition. I'm an
attorney by trade, so every time I put quotation marks around
something that isn't an exact quote, a puppy dies; I'm therefore very
careful about it. And my edition was published in 1988, which is
considerably less computer-ancient than thirty years ago. Still
definitely ancient, though, I'll grant you.
I guess my question is this: granted that argv[0] == NULL is possible
under the standard, is it possible with any actual execution on a real
system? You've proven to me that it's *legal*, but I'm having a hard
time seeing how it could really happen.
> That's OK. Maybe not the clearest, but it's fine.
Thanks. As I mentioned, it didn't work originally, but there were so
many other things wrong with the initial program that it's worth
trying again.
> On Jan 8, 9:28 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> argc can be 0 and then argv[0] will be a null pointer.
>
> Right; it's within the standard. But I'm still having trouble seeing
> how it's actually possible, in practice.
>
>> > I thought that there *was* always a non-NULL argv[0]. K&R sayeth, "By
>> > convention, argv[0] is the name by which the program was invoked, so
>> > argc is at least 1." Granted, that's not a statement of the standard,
>> > but I'm having a hard time seeing how a program can be invoked without
>> > it being invoked by some name, which would be argv[0].
>>
>> I doubt K&R2 says that. You can't rely on a statement about C as it
>> was more than 30 years ago!
>
> Ha! For the first time on comp.lang.c, I'm right! :-) Really,
> that's a direct quotation from page 115 of the second edition.
That's a shame. Errors are rare. I wonder if it is corrected in the
errata. I can't see to get the official list at the moment.
> I'm an
> attorney by trade, so every time I put quotation marks around
> something that isn't an exact quote, a puppy dies; I'm therefore very
> careful about it. And my edition was published in 1988, which is
> considerably less computer-ancient than thirty years ago. Still
> definitely ancient, though, I'll grant you.
I was not doubting your quote. You just did not say what edition it
was from and since errors in K&R2 are rare, some doubt it natural.
> I guess my question is this: granted that argv[0] == NULL is possible
> under the standard, is it possible with any actual execution on a real
> system? You've proven to me that it's *legal*, but I'm having a hard
> time seeing how it could really happen.
You can certainly make it happen by calling execv.
<snip>
--
Ben.
It's not really an error, is it? After all, he says "[b]y
convention," not "[b]y standard." By convention that's certainly
true, isn't it?
> You can certainly make it happen by calling execv.
Wow. execv is news to me. Okay, well, double thanks for helping me
account for that possible situation, then.
I'm working on rewriting expkill() using strchr() and memmove() and
having a hard time doing it. Basically, I think I can get it working
with any input except when that input would require padding the left
with zeroes. But I'll get there.
While I'm at it, here's what I've worked out for ensuring that my
string has sufficient memory to hold the expanded number (if the
number has to expand, that is). As I see it, the problem is that argv
gives me a pointer to a string bounded by exactly enough memory to
hold the characters I see, plus '\0'. So, for example, if I run "./
dec "X44;4e5"", I've got 8 bytes of memory reserved. The expanded
string, though, should be "X4440000", requiring 9 bytes of memory
(including the '\0'). So I need to ensure that this allocated memory
for my string gets expanded by the appropriate amount.
Because I've been advised repeatedly not to be so stingy with
allocating memory, I'm going to go ahead and give myself more than I
need (probably), while ensuring that it's *at least* what I need
(definitely). So I'll allocate the length of the string, plus a
number of bytes equal to the exponent given. Prior to this, I've
already truncated the 'e' and the exponent. So, for "X44;4e5", then,
I'll wind up with 11 bytes reserved. As I understand it, sizeof()
returns the number of bytes allocated as an integer. So sizeof(s),
where s="X44;4e5", will return 8; however, where I've received the
string from stdin, I've allocated much more than a double can possibly
hold anyway, so I want to ensure that I'm not needlessly adding memory
when that's the case. So I want to make sure that I'm only increasing
the available memory if there's a chance the memory might be
insufficient. Here's what I've got:
*******
if (sizeof(s) < sizeof(strlen(s) + exp))
if (realloc(s, sizeof(strlen(s) + exp)) == NULL) {
fprintf(stderr,"dec: insufficient memory; converting %s\n",s);
return 0;
}
*******
I believe that this code will execute only if s has less memory
allocated than is sufficient to hold the current strlen of s, plus a
number of bytes equal to the value of my exponent, exp. If that's the
case, it reallocates memory to s such that s now has enough memory to
hold itself, plus exp more bytes. The characters allocated beyond the
'\0' that terminates s are garbage; however, those from 0 to strlen(s)
remain unaltered. If there isn't enough memory (which seems
exceedingly unlikely, but why not?), I print an error message and
return. This does return a meaningful value to the calling function
because the 'e' and the exponent have already been truncated when this
is called. They won't get the answer they're expecting, though, hence
the error message.
How's it look? Be gentle; it's my very first malloc/realloc/calloc/
etc call.
Ben has thrown you a curve: The execve() function is not
part of C, but part of the POSIX standards for Unix-style
systems. In those systems, it's one of a family of functions
whose job is to start a new program, essentially by terminating
the current program and launching a new one to replace it.
By using exec*() on POSIX systems, you can start a new
program with whatever command-line arguments you like, including
"none at all." You can also provide an argv[0] that has nothing
to do with the "name" of the program -- indeed, some POSIX
shells do this on purpose.
> I'm working on rewriting expkill() using strchr() and memmove() and
> having a hard time doing it. Basically, I think I can get it working
> with any input except when that input would require padding the left
> with zeroes. But I'll get there.
"Back at the ranch," as they say, here's a thought you might
want to ponder: What is the purpose of padding with leading or
trailing zeroes? That is, what will your program do differently
with "123;" than with "123000;" or ";000123"? And here's the
crucial question: Do you actually need those zeroes to get the
effect you want, or could you get it some other way? If nothing
comes to mind, study the way your fracval() function computes
and uses the `divisor' value ...
--
Eric Sosman
eso...@ieee-dot-org.invalid
On Unix-like systems, programs are invoked via the "exec" family of
functions (execl, execlp, execle, execv, execvp). Some or all of
these let the caller specify the argv values that the invoked program
will see. (When you run a program by typing a command to the shell,
the shell parses your command line and calls one of the exec*
functions.)
Ah. Clever. Just truncate the exponent off after saving its value
and pass it to fracdec(), then loop up to it, multiplying divisor by
12 each time. Do the same with exponents the other way, passing them
to wholdec() and working with them there. Let's see; to determine how
many extra passes through the loop are required:
*******
passes = semi + exp;
if (passes > strlen(s))
passes -= strlen(s);
*******
So if my number is "X;444e-4", I get passes == -3. If it's "X;444e6",
after exponent truncation, I get passes == 3. I then put the
semicolon first in the former case and strip it out in the second.
Then I test if the semicolon is first; if so, I give passes to
fracdec; if not, I give it to wholedec. In both, I run the operations
the appropriate number of extra times, then return the value as
before. But I only need to do this if it pushes the semicolon either
left or right beyond string boundaries; otherwise, I just move the
semicolon, which is pretty easy with strchr and memmove.
Is that what you're driving at?
> On Jan 8, 10:53 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> That's a shame. Errors are rare. I wonder if it is corrected in the
>> errata. I can't see to get the official list at the moment.
>
> It's not really an error, is it? After all, he says "[b]y
> convention," not "[b]y standard." By convention that's certainly
> true, isn't it?
I don't have the book, so I can't be sure of the context. Maybe it's
fine, but you certainly seem to have taken it the wrong way. Is it
clear form the context that argc can be 0 with argv[0] == 0 as the
only element in argv? If so, it's fine.
<snip>
--
Ben.
Well, that's wrong. But I've got something I think is right. I'll
get it home tonight and compile it, then run it through my eternally
increasing battery of tests, and once I've got it working I'll post it
here. Then you all can proceed to tell me all the ways that it'll
overrun its buffers, increase global warming, and murder innocent
kittens so I can get to fixing it.
This thread has improved my code and my approach to programming in
general more than the last month of working on my own. Thank you all
immensely.
Without looking at the context, I can't comment on whether it's wrong,
but it is inefficient, since you call strlen twice on the same string.
Save the result in a variable (a size_t, not an int):
const size_t len = strlen(s);
if (passes > len) {
passes -= len;
}
strlen() can be a bit deceptive; it looks simple, but it can be
expensive. An even worse example is:
for (i = 0; i < strlen(s); i ++) {
/* ... */
}
which turns what should be an O(N) loop into an O(N**2) loop.
This cannot work, except by amazing coincidence. Here's why:
- sizeof(s) reports the number of bytes occupied by s. So,
what is s? If it's an array, you get the number of array
elements times the sizeof each element (1, in this case).
But if s is a pointer, a char*, you get the number of bytes
in the pointer, not the number of bytes in the memory it
points to.
- If s is an array, you did not obtain it from malloc() and
company. Therefore, you cannot resize it with realloc().
(The compiler might allow you to try, in much the same way
that a divided highway might allow you to try driving the
wrong way. In both cases, a crash is likely.)
There is also evidence here of a fundamental misunderstanding
of the sizeof operator and of the way realloc() works. To put it
more bluntly, you don't know what you're doing, and you need to
revisit your textbook. Making semi-random semi-plausible changes
until something sort of seems to kind of work in a few cases is
not the way to program.
> How's it look? Be gentle; it's my very first malloc/realloc/calloc/
> etc call.
Okay, I'll be gentle: I'll let you put bubble wrap on your
head before I apply the clue-by-four. Ready? ...
(Back to the books, laddie. Usenet is a fine vehicle for
some things, but it's not a good channel for inculcating wholesale
lots of basic information.)
--
Eric Sosman
eso...@ieee-dot-org.invalid
> (Back to the books, laddie. Usenet is a fine vehicle for
> some things, but it's not a good channel for inculcating wholesale
> lots of basic information.)
Okay; I thought I got the memory allocation functions. Obviously I
don't. I'll be back. (Of course, it doesn't look like I'll need to
worry about it anyway, if I can get your other idea working.)
The only textbook I have to go with is K&R, which says precious little
about sizeof, malloc, and friends. Thus, I tried to go from what it
says in its description of the standard libraries, which clearly
wasn't enough for me. Any recommendations of where to go for more?
> On Jan 8, 11:22 am, Don <dgoodman...@gmail.com> wrote:
>> I'm working on rewriting expkill() using strchr() and memmove() and
>> having a hard time doing it. Basically, I think I can get it working
>> with any input except when that input would require padding the left
>> with zeroes. But I'll get there.
>
> While I'm at it, here's what I've worked out for ensuring that my
> string has sufficient memory to hold the expanded number (if the
> number has to expand, that is). As I see it, the problem is that argv
> gives me a pointer to a string bounded by exactly enough memory to
> hold the characters I see, plus '\0'. So, for example, if I run "./
> dec "X44;4e5"", I've got 8 bytes of memory reserved. The expanded
> string, though, should be "X4440000", requiring 9 bytes of memory
> (including the '\0'). So I need to ensure that this allocated memory
> for my string gets expanded by the appropriate amount.
You can't expand a memory region you've not allocated. You can't
increase the space provided for command-line arguments. You'll need
to make some new storage a copy the final value there (but see later).
> Because I've been advised repeatedly not to be so stingy with
> allocating memory, I'm going to go ahead and give myself more than I
> need (probably), while ensuring that it's *at least* what I need
> (definitely). So I'll allocate the length of the string, plus a
> number of bytes equal to the exponent given.
That's a reasonable strategy.
> Prior to this, I've
> already truncated the 'e' and the exponent. So, for "X44;4e5", then,
> I'll wind up with 11 bytes reserved. As I understand it, sizeof()
> returns the number of bytes allocated as an integer.
No. sizeof returns the number of bytes used to represent it's operand
(it's not a function)...
> So sizeof(s),
> where s="X44;4e5", will return 8;
If s is char *, it will tell you the size of the character pointers on
your machine. Use strlen(s) + 1 to find out how many bytes there are
in the string s points to.
> however, where I've received the
> string from stdin, I've allocated much more than a double can possibly
> hold anyway, so I want to ensure that I'm not needlessly adding memory
> when that's the case. So I want to make sure that I'm only increasing
> the available memory if there's a chance the memory might be
> insufficient. Here's what I've got:
> *******
> if (sizeof(s) < sizeof(strlen(s) + exp))
> if (realloc(s, sizeof(strlen(s) + exp)) == NULL) {
> fprintf(stderr,"dec: insufficient memory; converting %s\n",s);
> return 0;
> }
> *******
You can't realloc here because you did not malloc s (at least you
don't show us that code if you do). The sizeof stuff is wrong and the
realloc call is wrong also. Rather than type an essay about this, I'd
rather suggest you read K&R some more. You need to look at sizeof,
strlen and realloc.
> I believe that this code will execute only if s has less memory
> allocated than is sufficient to hold the current strlen of s, plus a
> number of bytes equal to the value of my exponent, exp. If that's the
> case, it reallocates memory to s such that s now has enough memory to
> hold itself, plus exp more bytes. The characters allocated beyond the
> '\0' that terminates s are garbage; however, those from 0 to strlen(s)
> remain unaltered. If there isn't enough memory (which seems
> exceedingly unlikely, but why not?), I print an error message and
> return. This does return a meaningful value to the calling function
> because the 'e' and the exponent have already been truncated when this
> is called. They won't get the answer they're expecting, though, hence
> the error message.
>
> How's it look? Be gentle; it's my very first malloc/realloc/calloc/
> etc call.
I don't know why you are allocating anything. Having found the
location of the ; and the value of the exponent I think you can just
print the result without storing it anywhere. I may not have all the
details of what you are trying to do yet, but I'd think about it.
--
Ben.
Have you read the C FAQ: http://c-faq.com/ ? It's not a learning
tool, per se, but it might be useful but it is a very useful adjunct
to any text you might have.
--
Ben.
My K&R is the original, now valuable more for sentimental
than for practical reasons. I've not seen the second edition,
and can't comment on it.
A good source for the avid learner is the comp.lang.c
Frequently Asked Questions (FAQ) site, <http://www.c-faq.com/>.
Even if some of its topics seem irrelevant to you now, I'd
recommend that you at least skim them so that someday in mid-
quandary you'll think "Wasn't this in the FAQ somewhere?"
Also, it includes a section on resources (showing some age,
but it's a starting point).
--
Eric Sosman
eso...@ieee-dot-org.invalid
It's supposed to work on all POSIX systems.
Take a close look at the execv() function (not a Standard C function,
but the relevance to Standard C will become obvious), which is one
of a set of functions used to invoke programs on UNIX/POSIX systems.
First, note that the program to execute and argv[] are separate
arguments. In spite of any statements about argv[0] being, by
convention, the name of the program being executed, any caller of
execv() can trivially (and sometimes accidentally) violate that
convention. In particular, any malicious caller can violate that
convention.
There's also a convention that argv[0][0] (the first character of
argv[0] is '-' for a login shell. Not too many filenames and *no*
absolute pathnames begin with '-' on a POSIX system. Normally,
program designated as a login shell (in, say, the password file)
expects to be used as a shell, although sometimes programs like
/bin/true or shutdown are used here.
The second argument of execv is the arguments to be passed to the
program. This code invokes /bin/ls with argv[0] == NULL :
#include <stdlib.h>
#include <unistd.h>
#define BIG_ENUF_I_HOPE 100
char *arg[BIG_ENUF_I_HOPE];
arg[0] = NULL;
execv("/bin/ls", arg);
It's also perfectly possible to invoke a program like this:
arg[0] = "/bin/ls";
execv("/home/me/bin/trojan", arg);
so that the program executed and argv[0] are totally unrelated.
Yes, I tested this approach executing a little program called zero
which prints out argc and all the strings for argv[] up to but not
including one where argv[i] is NULL. FreeBSD 7.1, i386 architecture.
It works.
Thanks for the link, both of you; I've been poring over it all
weekend. My central problem seems to have been overgeneralization. I
have, indeed, overgeneralized a lot of what I've read in K&R,
particularly the relationship between arrays and pointers. For
example, because I declared an array of char, then passed a pointer to
that array to a function, I tried to apply sizeof to it in that
function as if it were an array, which of course it isn't. And, as a
corollary, I completely misunderstood sizeof and the alloc crew, as
you mentioned. This link has been a big help. I've written a small
program which uses malloc() and free(), apparently successfully (at
least, valgrind accepts it as leakless), and I've written in bounds-
checking so that it works rationally no matter how ridiculous the
input.
The FAQ also references some places where I can read some code, which
has already cleared up a lot for me and will continue to do so. I
think I need to spend a good deal of time reading code, as well as
writing it, to make the idioms of C more second-nature to me, to avoid
at least the stupider gotchas. I've already caught myself writing to
memory prior to the start of an array, although it only happened with
some particularly absurd input, and fixed it. The wisdom of writing
in code to deal with edge cases and bad input at the beginning, as
someone here already pointed out to me, is becoming very clear.
After I've spent some time coming to grok this information, I'll
rewrite this converter and marvel at how awful my previous versions
were. Thanks again.
It motivates the rather odd truth that conflict can produce two winners.
I used to think that it was "competition" that did this. Now that
I've witnessed a couple of decades of "competition" here in the states,
I've come to regard it with disdain.
--
fred