Looking for a Better atof()

Jonathan Wood

Mar 8, 2004, 4:57:29 PM

Does anyone know of any code for a better atof() routine?

Specifically, I'm looking for code that will report bad characters or
out-of-range strings. I'm looking for the same thing for int, _int64, float,
and single as well. I can code my own for integers but just don't know where
to start for floating point. (It appears routines like atof() simply return
0.0 if the string is invalid, but 0.0 could also be a valid result.)

Any tips appreciated!

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

Doug Harrison [MVP]

Mar 8, 2004, 6:10:46 PM

Jonathan Wood wrote:

>Does anyone know of any code for a better atof() routine?
>
>Specifically, I'm looking for code that will report bad characters or
>out-of-range strings. I'm looking for the same thing for int, _int64, float,
>and single as well. I can code my own for integers but just don't know where
>to start for floating point. (It appears routines like atof() simply return
>0.0 if the string is invalid, but 0.0 could also be a valid result.)
>
>Any tips appreciated!

strtod. See this message for more:

http://groups.google.com/groups?selm=04lmevc8n9dj4gqhtsfrfekq6a3qp5ae28%404ax.com

For integer types, strtol or strtoul. See this message for more:

http://groups.google.com/groups?selm=eij430t48s0tpvmh48oap5hadp1djuuheh%404ax.com

--
Doug Harrison
Microsoft MVP - Visual C++

David Lowndes

Mar 8, 2004, 6:32:18 PM

>Does anyone know of any code for a better atof() routine?

strtod perhaps?

Dave
--
MVP VC++ FAQ: http://www.mvps.org/vcfaq

Joseph M. Newcomer

Mar 8, 2004, 9:41:20 PM

For a "better floating point", there is a simple fix: code your own that simply validates
the syntax of the floating point number. When you know it is correct, you call the
standard C routine.

There is a floating-point parser that does exactly this checking shown in my Validating
Edit Control on my MVP Tips site.
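
A rough sketch of what such a syntax check might look like. This is not the code from the Validating Edit Control; `looksLikeFloat` is a hypothetical name, and the accepted grammar (optional sign, digits, optional fraction, optional exponent) is an assumption. It checks syntax only and says nothing about range, and unlike strtod it rejects leading whitespace.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical validator: accepts [sign] digits [. digits] [e|E [sign] digits]
// with at least one mantissa digit, and rejects anything else.
bool looksLikeFloat(const char* s)
{
    assert(s != nullptr);
    auto isDigit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };

    if (*s == '+' || *s == '-')
        ++s;                                   // optional leading sign

    int mantissaDigits = 0;
    while (isDigit(*s)) { ++s; ++mantissaDigits; }
    if (*s == '.') {
        ++s;                                   // optional fraction
        while (isDigit(*s)) { ++s; ++mantissaDigits; }
    }
    if (mantissaDigits == 0)
        return false;                          // "", ".", "+" and friends

    if (*s == 'e' || *s == 'E') {
        ++s;                                   // optional exponent
        if (*s == '+' || *s == '-')
            ++s;
        if (!isDigit(*s))
            return false;                      // "1e" or "1e+" has no exponent digits
        while (isDigit(*s))
            ++s;
    }
    return *s == '\0';                         // nothing may follow the number
}
```

Once a string passes, it can be handed to atof() or strtod() for the actual conversion.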
joe

On Mon, 8 Mar 2004 14:57:29 -0700, "Jonathan Wood" <jw...@softcircuits.com> wrote:

>Does anyone know of any code for a better atof() routine?
>
>Specifically, I'm looking for code that will report bad characters or
>out-of-range strings. I'm looking for the same thing for int, _int64, float,
>and single as well. I can code my own for integers but just don't know where
>to start for floating point. (It appears routines like atof() simply return
>0.0 if the string is invalid, but 0.0 could also be a valid result.)
>
>Any tips appreciated!

Joseph M. Newcomer [MVP]
email: newc...@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

Jonathan Wood

Mar 9, 2004, 2:04:10 AM

I have some concerns about performance and so my preference would be to
avoid parsing the string twice. Also, since I'm testing for overflow, it
really seems like I need to do so as the value is being converted.

Thanks.

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:ckbq409a89nq86peh...@4ax.com...

Jonathan Wood

Mar 9, 2004, 2:04:24 AM

I think that will work.

Thanks.

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"David Lowndes" <dav...@example.invalid> wrote in message
news:hi0q4052hf4iu698a...@4ax.com...

Jonathan Wood

Mar 9, 2004, 2:10:52 AM

It looks like I can make that work.

However, it sure seems quirky. First of all, it seems like it would be so
simple to just pass a pointer to an error value that the function could set.
Instead, I need to mess with errno, an end pointer, et al. to figure
out if it was valid. (Some return values do signify errors but they could
also be valid return values so I couldn't even begin to grasp that
nonsense.) And strtoul() seems quite happy to accept negative numbers.
Hopefully, I can get this to work.

One other thing, does anyone know how I would test for overflow in a float?
strtod() sets a double but I also want to support float.

Hmmm... I guess I could do something like this:

double d;
float f;

d = strtod(...);

f = d;
if (f != d)
    // f overflowed

Although I'm not certain that's the best approach.

Thanks for the tip!

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"Doug Harrison [MVP]" <d...@mvps.org> wrote in message
news:85vp40toie6ij98io...@4ax.com...

David Lowndes

Mar 9, 2004, 4:06:35 AM

>One other thing, does anyone know how I would test for overflow in a float.
>strtod() sets a double but I also want to support float.

Test the double against +/- FLT_MAX and FLT_MIN?

Joseph M. Newcomer

Mar 9, 2004, 10:31:41 AM

Until you measure performance you have no data to substantiate your concern. It probably
takes an order of magnitude more time to READ the data than to parse it, so unless you can
quantify the costs (measure once with a single parse, and then with two parses, and
measure the overall program behavior, not just the cost of double-parsing a single
number) then the concern is unfounded.

By the way, you have the sources to the C runtimes...
joe

On Tue, 9 Mar 2004 00:04:10 -0700, "Jonathan Wood" <jw...@softcircuits.com> wrote:

>I have some concerns about performance and so my preference would be to
>avoid parsing the string twice. Also, since I'm testing for overflow, it
>really seems like I need to do so as the value is being converted.
>
>Thanks.

Joseph M. Newcomer [MVP]

Doug Harrison [MVP]

Mar 9, 2004, 11:40:21 AM

Jonathan Wood wrote:

>It looks like I can make that work.
>
>However, it sure seems quirky. First of all, it seems like it would be so
>simple to just pass a pointer to an error value that the function could set.
>Instead, I need to mess with errno, an end pointer, et al. to figure
>out if it was valid. (Some return values do signify errors but they could
>also be valid return values so I couldn't even begin to grasp that
>nonsense.)

It is quirky, but its error-detection capability is not lacking. There's no
problem distinguishing success from errors, and the end pointer provides
useful information for many applications, e.g. parsing a line of CSV.

>And strtoul() seems quite happy to accept negative numbers.

Yeah, well, C++ defines a negative integer to unsigned conversion, and
strtoul supports this. The conversion is such that 1+unsigned(-1) == 0,
2+unsigned(-2) == 0, and so on, which makes sense in a confusing sort of
way. :) If you want to use strtoul and reject negative numbers, you'll have
to eat leading whitespace and verify the first non-whitespace character
isn't the minus sign, which is easy.
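
A minimal sketch of that idea, assuming a decimal-only parser that must consume the whole string. `parseUnsigned` is a hypothetical name used for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: like strtoul, but refuses a leading minus sign.
bool parseUnsigned(const char* text, unsigned long& out)
{
    assert(text != nullptr);

    // Eat the leading whitespace ourselves so the sign can be inspected.
    const char* p = text;
    while (std::isspace(static_cast<unsigned char>(*p)))
        ++p;
    if (*p == '-')
        return false;                      // strtoul would happily negate the value

    char* end = nullptr;
    errno = 0;
    const unsigned long value = std::strtoul(p, &end, 10);
    if (errno != 0 || end == p || *end != '\0')
        return false;                      // overflow, no digits, or trailing junk

    out = value;
    return true;
}
```

With this, "  7" is accepted and "-1" is rejected, whereas a bare strtoul call would turn "-1" into ULONG_MAX.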

>Hopefully, I can get this to work.

Follow the outlines I gave in the two messages I linked to. If you expect
the whole thing to be a number, then you can consider this to be an error:

errno = 0;
x = strtowhatever(p, &ep, ...);
if (errno || ep == p || *ep)
    an error occurred;

Of course, you still may have to check the range if you want to store in a
less capacious type.
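
Spelled out for strtol, the outline might look like the sketch below. `parseInt` is a hypothetical name, base 10 is an assumption, and the final range test is the "less capacious type" check for storing a long in an int (a no-op where long and int have the same width).

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: converts the whole of `text` to an int.
// Returns false for bad characters, an empty string, or an out-of-range value.
bool parseInt(const char* text, int& out)
{
    assert(text != nullptr);

    char* end = nullptr;
    errno = 0;                               // strtol never clears errno itself
    const long value = std::strtol(text, &end, 10);

    if (errno != 0)
        return false;                        // ERANGE: does not fit in a long
    if (end == text)
        return false;                        // no digits were consumed
    if (*end != '\0')
        return false;                        // something follows the number
    if (value < INT_MIN || value > INT_MAX)
        return false;                        // fits in a long but not in an int

    out = static_cast<int>(value);
    return true;
}
```

`out` is only written on success, so a caller can keep its previous value when the function returns false.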

>One other thing, does anyone know how I would test for overflow in a float.
>strtod() sets a double but I also want to support float.

The float to double conversion is exact, so you can compare fabs(d) to
FLT_MAX. It's the same principle as shown in my strtol example.

>Hmmm... I guess I could do something like this:
>
>double d;
>float f;
>
>d = strtod(...)
>
>f = d;
>if (f != d)
> // f overflowed
>
>Although I'm not certain that's the best approach.

In the case of overflow in "f = d", it induces undefined behavior, so I
wouldn't recommend it. It would also cause many false positives as the
double to float conversion is not exact for most values of double.
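
To illustrate the false positives, here is a small sketch comparing the two tests. Both function names are hypothetical. The round-trip version asserts that its argument is already in float range, because the out-of-range conversion is the undefined case mentioned above.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// The "f != d" idea: true whenever d does not survive a trip through float.
// Precondition: fabs(d) <= FLT_MAX, so the conversion itself is well defined.
bool roundTripDiffers(double d)
{
    assert(std::fabs(d) <= FLT_MAX);
    const float f = static_cast<float>(d);
    return static_cast<double>(f) != d;
}

// The range test: true only when d is too large in magnitude for a float.
bool overflowsFloat(double d)
{
    return std::fabs(d) > FLT_MAX;
}
```

For 0.1 the round-trip test reports a difference even though nothing overflowed, because 0.1 has no exact float representation; for 0.5, which is exact in both types, it does not. The range test flags 1e40 and accepts 0.1.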

>Thanks for the tip!

Sure.

Jonathan Wood

Mar 9, 2004, 11:59:22 AM

I'm confused about this. Aren't FLT_MAX and FLT_MIN valid float values?

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"David Lowndes" <dav...@example.invalid> wrote in message
news:u52r40lqemnch7sgf...@4ax.com...

David Lowndes

Mar 9, 2004, 12:04:38 PM

>I'm confused about this. Aren't FLT_MAX and FLT_MIN valid float values?

Yes, so comparing the double value as outside that range would qualify
as an invalid float value.

Jonathan Wood

Mar 9, 2004, 12:08:58 PM

Yeah, I must admit that I don't have a ton of interest in timing this. As a
general rule, I pretty much assume that doing one thing is faster than doing
that one thing plus something else, regardless of how fast that something
else is. But your recommendation also opens the possibility that the parsing
could accept something that the "reading" does not (or the reverse). (And I
still don't understand how the parser could check for overflow.)

I guess you just start to have preferences about how things are done after
so many years of programming. I'm not saying it couldn't be a good
technique. It just doesn't feel right to me for the reasons given above.

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"Joseph M. Newcomer" <newc...@flounder.com> wrote in message

news:olor40llrc1g21822...@4ax.com...

Jonathan Wood

Mar 9, 2004, 12:15:24 PM

Doug,

> It is quirky, but its error-detection capability is not lacking. There's no
> problem distinguishing success from errors, and the end pointer provides
> useful information for many applications, e.g. parsing a line of CSV.

Yes, the code I'm working with checks all these, including the end pointer.

However, I'm still disappointed that it happily accepts a negative sign. The
only way I can think to test for this is to scan the original string for a
minus sign. I don't even see the purpose of having both a signed and
unsigned version of the routine as they appear to use the same code.

> Follow the outlines I gave in the two messages I linked to. If you expect
> the whole thing to be a number, then you can consider this to be an error:
>
> errno = 0;
> x = strtowhatever
> if (errno || ep == p || *ep)
> an error occurred;

This is essentially what I'm doing. However, I decided to scan ep after the
call and, if all I found were spaces, then I'd consider it good. In fact,
here's one of the routines so far.

bool CMultiType::StrToU4(CString &s, LPVOID p)
{
    unsigned long ul;
    LPTSTR pEnd;

    // Clear error
    errno = 0;
    // Attempt to convert string
    ul = _tcstoul(s, &pEnd, 0);
    // Test for error
    if (errno != 0)
        return false;
    // Test for invalid characters
    for ( ; *pEnd != '\0'; pEnd++)
        if (!isspace(*pEnd))
            return false;
    // Set value
    memcpy(p, &ul, sizeof(U4(p)));
    // Indicate success
    return true;
}

> Of course, you still may have to check the range if you want to store in a
> less capacious type.

I'm doing that for BYTE data types but they show overflow for negative
numbers while the code above does not. That's the main thing I need to deal
with--and I may just end up scanning for a minus sign.

> >One other thing, does anyone know how I would test for overflow in a float.
> >strtod() sets a double but I also want to support float.
>
> The float to double conversion is exact, so you can compare fabs(d) to
> FLT_MAX. It's the same principle as shown in my strtol example.

I was confused about this, thinking FLT_MAX was a valid floating point
value. If it is not, then this approach should work great. If it is, then
...

> In the case of overflow in "f = d", it induces undefined behavior, so I
> wouldn't recommend it. It would also cause many false positives as the
> double to float conversion is not exact for most values of double.

Gotcha. Thanks again.

Jonathan Wood

Mar 9, 2004, 12:43:31 PM

Doh! I'm with you now. Thanks!

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"David Lowndes" <dav...@example.invalid> wrote in message

news:j6ur40dq002eko5qm...@4ax.com...

Jonathan Wood

Mar 9, 2004, 2:05:43 PM

BTW, I guess I'm still a little confused on this.

FLT_MIN is documented as being the smallest *positive* number. So how do I
check for a valid range?

Doug mentioned comparing fabs(d) to FLT_MAX. Is testing for a valid float
range as simple as:

if (fabs(d) > FLT_MAX)

Or do I need to test against FLT_MIN as well?

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"David Lowndes" <dav...@example.invalid> wrote in message

news:j6ur40dq002eko5qm...@4ax.com...

Joseph M. Newcomer

Mar 9, 2004, 5:32:07 PM

Doing floating-point conversion is a nontrivial task. Back in 1967 I was given a piece of
assembly code that did the sort of conversions that keep numerical analysts happy. To say
that it was intimidating is to understate the situation. I had no idea floating point
conversion was so difficult to get right.

I thought the strtod code was in the CRT source, but it isn't, really. It calls a
function _fltin which does not appear to have any source, that does the real conversion.

However, if you are concerned about efficiency, I suggest single-stepping through this
code. I ran out of time after reading the first couple characters, but I'm sure that my
little syntax checker is about an order of magnitude faster (I should point out that the
_fltin function also handles NLS issues such as the proper punctuation marks, and I don't,
but I don't think that's going to change the performance very much). A simple syntax check
(without range checking and conversion) just isn't all that hard. The hard part is getting
the conversion right.
joe

On Tue, 9 Mar 2004 10:08:58 -0700, "Jonathan Wood" <jw...@softcircuits.com> wrote:

>Yeah, I must admit that I don't have a ton of interest in timing this. As a
>general rule, I pretty much assume that doing one thing is faster than doing
>that one thing plus something else, regardless of how fast that something
>else is. But your recommendation also opens the possibility that the parsing
>could accept something that the "reading" does not (or the reverse). (And I
>still don't understand how the parser could check for overflow.)
>
>I guess you just start to have preferences about how things are done after
>so many years of programming. I'm not saying it couldn't be a good
>technique. It just doesn't feel right to me for the reasons given above.

Joseph M. Newcomer [MVP]

Jonathan Wood

Mar 9, 2004, 7:52:50 PM

Joseph,

> Doing floating-point conversion is a nontrivial task. Back in 1967 I was given a piece of
> assembly code that did the sort of conversions that keep numerical analysts happy. To say
> that it was intimidating is to understate the situation. I had no idea floating point
> conversion was so difficult to get right.
>
> I thought the strtod code was in the CRT source, but it isn't, really. It calls a
> function _fltin which does not appear to have any source, that does the real conversion.
>
> However, if you are concerned about efficiency, I suggest single-stepping through this
> code. I ran out of time after reading the first couple characters, but I'm sure that my
> little syntax checker is about an order of magnitude faster (I should point out that the
> _fltin function also handles NLS issues such as the proper punctuation marks, and I don't,
> but I don't think that's going to change the performance very much). A simple syntax check
> (without range checking and conversion) just isn't all that hard. The hard part is getting
> the conversion right.

Your approach is only faster if the data is invalid. For my application, if
the data is invalid then performance won't really matter since the operation
will come to a halt. In most cases, the data will be valid and I'll want to
proceed with the conversions as quickly as possible.

Interesting about the source. I suspect that _fltin may be written in
assembly and works with the floating point library (which uses the math
coprocessor, if available). I haven't delved into that area and don't have
much interest in doing so now. :-)

Joseph M. Newcomer

Mar 9, 2004, 10:48:17 PM

Yes, but the CRT also contains assembly code source. That's why I was surprised that the
real strtod code wasn't there.

There is no question that two parses are slower than one. It is only a question of whether
or not that matters. I was only observing that a simple syntax check is substantially
faster than a syntax-check-coupled-with-a-conversion, and the conversion is going to be
very complex.
joe

Joseph M. Newcomer [MVP]

David Lowndes

Mar 10, 2004, 4:56:43 AM

>Or do I need to test against FLT_MIN as well?

Yes, FLT_MIN is 1.175494351e-38F, whereas DBL_MIN is
2.2250738585072014e-308

Jonathan Wood

Mar 10, 2004, 11:41:24 AM

Thanks, but that doesn't help with the area I'm having trouble with.

Obviously then, this is not valid:

if (d > FLT_MAX || d < FLT_MIN)
    // Out of range for float

So I start wondering if I need something like this:

if (d != 0 && (fabs(d) > FLT_MAX || fabs(d) < FLT_MIN))
    // Out of range for float

Do you know how that would look?

Thanks for any suggestions.

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"David Lowndes" <dav...@example.invalid> wrote in message

news:rbpt40lqq5g3hc6d4...@4ax.com...

David Lowndes

Mar 10, 2004, 12:31:31 PM

>if (d != 0 && (fabs(d) > FLT_MAX || fabs(d) < FLT_MIN))
>    // Out of range for float

I'm not sure why you're testing for non-zero, I'd probably do:

const double td = fabs(d);
if ( (td <= FLT_MAX) && (td >= FLT_MIN) )
{
    float f = d;
}
else
{
    // Out of range for a float value
}
Jonathan Wood

Mar 10, 2004, 12:44:31 PM

David,

> I'm not sure why you're testing for non-zero, I'd probably do:

I was assuming zero would be a valid value. No?

Doug Harrison [MVP]

Mar 10, 2004, 12:49:16 PM

Jonathan Wood wrote:

>Thanks, but that doesn't help with the area I'm having trouble with.
>
>Obviously then, this is not valid:
>
>if (d > FLT_MAX || d < FLT_MIN)
>    // Out of range for float
>
>So I start wondering if I need something like this:
>
>if (d != 0 && (fabs(d) > FLT_MAX || fabs(d) < FLT_MIN))
>    // Out of range for float
>
>Do you know how that would look?
>
>Thanks for any suggestions.

FLT_MIN is quite different than INT_MIN. Don't check against FLT_MIN unless
you honestly care about underflow. I mean, do you really care if 1E-100
becomes 0? If not, just use:

if (fabs(d) > FLT_MAX) ...

Doug Harrison [MVP]

Mar 10, 2004, 12:54:50 PM

Jonathan Wood wrote:

> ul = _tcstoul(s, &pEnd, 0);

Judging by your third argument, I guess you really do want to recognize
octal and hex numbers in addition to base 10 numbers? I ask only because
IME, it's rare to want 010 to equal 8, unless you're writing something like
a C++ compiler.
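
A short sketch of the difference the third argument makes. The two wrapper names are hypothetical; the only real call is strtoul.

```cpp
#include <cassert>
#include <cstdlib>

// Base 0: the prefix picks the base ("0x" is hex, a leading "0" is octal).
unsigned long parseAutoBase(const char* s)
{
    assert(s != nullptr);
    return std::strtoul(s, nullptr, 0);
}

// Base 10: digits are always decimal, and a "0x" prefix is not recognised.
unsigned long parseDecimal(const char* s)
{
    assert(s != nullptr);
    return std::strtoul(s, nullptr, 10);
}
```

With base 0, "010" converts to 8 and "0x1F" to 31. With base 10, "010" converts to 10, and "0x1F" stops after the leading zero and yields 0.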

Jonathan Wood

Mar 10, 2004, 1:15:12 PM

Doug,

> FLT_MIN is quite different than INT_MIN. Don't check against FLT_MIN unless
> you honestly care about underflow. I mean, do you really care if 1E-100
> becomes 0? If not, just use:
>
> if (fabs(d) > FLT_MAX) ...

In my case, I really do want to check against FLT_MIN. This code is for my
hex editor and the converted data will be written to the file. If the user
goes to the trouble of entering 1E-100, they should know that rounding would
occur. I thought I also had to compare to zero (which would be valid) but I
guess not.

> Judging by your third argument, I guess you really do want to recognize
> octal and hex numbers in addition to base 10 numbers? I ask only because
> IME, it's rare to want 010 to equal 8, unless you're writing something like
> a C++ compiler.

Yeah, I'm not crazy about the octal indicator either. But identifying the
hex indicator is critical and it's pretty much all or nothing with strtol().
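
One possible way around the all-or-nothing choice, sketched under the assumption that only a "0x"/"0X" prefix should switch bases: look at the prefix first and pass strtoul an explicit 16 or 10 instead of 0. `parseHexOrDecimal` is a hypothetical name, not something from this thread.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: accepts "0x..." hex or plain decimal, never octal,
// and rejects negative input.
bool parseHexOrDecimal(const char* text, unsigned long& out)
{
    assert(text != nullptr);

    const char* p = text;
    while (std::isspace(static_cast<unsigned char>(*p)))
        ++p;
    if (*p == '-')
        return false;                               // unsigned values only

    // Look past an optional '+' to see whether a hex prefix follows.
    const char* digits = (*p == '+') ? p + 1 : p;
    const bool isHex =
        digits[0] == '0' && (digits[1] == 'x' || digits[1] == 'X');

    char* end = nullptr;
    errno = 0;
    const unsigned long value = std::strtoul(p, &end, isHex ? 16 : 10);
    if (errno != 0 || end == p || *end != '\0')
        return false;                               // overflow, no digits, or junk

    out = value;
    return true;
}
```

Here "0x1F" gives 31 while "010" is read as decimal 10 rather than octal 8.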

Thanks.

Doug Harrison [MVP]

Mar 10, 2004, 1:26:04 PM

Jonathan Wood wrote:

>In my case, I really do want to check against FLT_MIN. This code is for my
>hex editor and the converted data will be written to the file. If the user
>goes to the trouble of entering 1E-100, they should know that rounding would
>occur. I thought I also had to compare to zero (which would be valid) but I
>guess not.

In that case, I guess you do, because 0 is a valid number < FLT_MIN.
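
Putting the pieces of this subthread together, a float conversion that reports both overflow and underflow might be sketched as follows. `FloatStatus` and `parseFloat` are hypothetical names. Treating any nonzero magnitude below FLT_MIN as underflow is the assumption agreed on above, and whether strtod sets errno when a double itself underflows is implementation-defined.

```cpp
#include <cassert>
#include <cerrno>
#include <cfloat>
#include <cmath>
#include <cstdlib>

enum class FloatStatus { Ok, Invalid, Overflow, Underflow };

// Hypothetical helper: parses the whole of `text` as a float via strtod.
FloatStatus parseFloat(const char* text, float& out)
{
    assert(text != nullptr);

    char* end = nullptr;
    errno = 0;
    const double d = std::strtod(text, &end);
    if (end == text || *end != '\0' || std::isnan(d))
        return FloatStatus::Invalid;          // no number, trailing junk, or NaN

    const double magnitude = std::fabs(d);
    if (magnitude > FLT_MAX)
        return FloatStatus::Overflow;         // also catches HUGE_VAL from strtod

    // Zero is a valid value below FLT_MIN, unless strtod reports that the
    // text was a nonzero number that underflowed to zero.
    if (magnitude < FLT_MIN && (d != 0.0 || errno == ERANGE))
        return FloatStatus::Underflow;

    out = static_cast<float>(d);              // safe: d is within float range
    return FloatStatus::Ok;
}
```

With this, "0" and "3.5" are accepted, "1e40" is reported as overflow, and "1e-100" as underflow.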

David Lowndes

Mar 10, 2004, 2:24:27 PM

>I was assuming zero would be a valid value. No?

True, but like Doug says, are you worried about underflow - on
reflection I'd agree with him and suggest that you just need to test
for FLT_MAX.

David Lowndes

Mar 10, 2004, 2:37:37 PM

>True, but like Doug says, are you worried about underflow - on
>reflection I'd agree with him and suggest that you just need to test
>for FLT_MAX.

And now I've read the other messages in the thread, I'll change my
mind again ;)

Jonathan Wood

Mar 10, 2004, 2:38:45 PM

Yup. Thanks.

--
Jonathan Wood
SoftCircuits
http://www.softcircuits.com
Available for consulting: http://www.softcircuits.com/jwood/resume.htm

"Doug Harrison [MVP]" <d...@mvps.org> wrote in message
news:ebnu40h4j1s9la1he...@4ax.com...