[perl #27690] Numeric comparison bug

Simon Glover

unread,

Mar 16, 2004, 4:02:11 PM3/16/04

to bugs-bi...@rt.perl.org

# New Ticket Created by Simon Glover
# Please include the string: [perl #27690]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=27690 >

This code:

new P0, .PerlNum
set P0, -1.2
new P1, .PerlString
set P1, "-1.2"
eq_num P0, P1, OK
print "not "
OK: print "ok\n"
end

prints "not ok", when it should print "ok". A similar problem exists with
the ne_num op, suggesting the underlying bug is somewhere in the
implementation of the cmp_num vtable function.

[And yes, I'm well aware of the problems inherent in doing floating point
comparisons. However, in this case, the floating-point respresentation
of the string '1.2' should be exactly the same as the representation of
the number 1.2, since the former is converted into the latter
internally]

Simon

Leopold Toetsch

unread,

Mar 17, 2004, 4:08:26 AM3/17/04

to perl6-i...@perl.org

Simon Glover <bugs-...@netlabs.develooper.com> wrote:

> This code:

> new P0, .PerlNum
> set P0, -1.2
> new P1, .PerlString
> set P1, "-1.2"
> eq_num P0, P1, OK
> print "not "
> OK: print "ok\n"
> end

> [And yes, I'm well aware of the problems inherent in doing floating point
> comparisons.

Breakpoint 1, Parrot_PerlNum_cmp_num (interpreter=0x82654f0, pmc=0x40305850,
value=0x40305838) at perlnum.c:301
301 diff = PMC_num_val(pmc) - VTABLE_get_number(interpreter, value);
(gdb) n
302 return diff > 0 ? 1 : diff < 0 ? -1 : 0;
(gdb) p diff
$1 = 2.2204460492503131e-16
(gdb)

> Simon

leo

Simon Glover

unread,

Mar 17, 2004, 11:22:01 AM3/17/04

to Leopold Toetsch, perl6-i...@perl.org

OK, that suggests that there's a bug somewhere in our string->number
conversion. Either that, or we're going to have to live with the fact
that 1.2 and '1.2' are not the same number (which would suck).

Simon

Larry Wall

unread,

Mar 17, 2004, 2:20:52 PM3/17/04

to perl6-i...@perl.org

On Wed, Mar 17, 2004 at 11:22:01AM -0500, Simon Glover wrote:
: OK, that suggests that there's a bug somewhere in our string->number

: conversion. Either that, or we're going to have to live with the fact
: that 1.2 and '1.2' are not the same number (which would suck).

The basic bug has to be that it's using two different routines to
do the conversion (or possibly there's one routine that's paying
attention to some context it shouldn't be paying attention to--but
that seems less likely).

I hope this doesn't point to a more general problem. Using different
routines to do the same thing for compilation and execution was a bad
mistake I made in early Perls, and had to work hard to fix in Perl 5.
For instance, all the constant folding in Perl 5 happens by actually
running the code in question, not by trying to guess what the run-time
system will do, as Perl 4 did.

I recommend not remaking my mistakes. Please make different mistakes. :-)

Larry

S. Livingston

unread,

Mar 17, 2004, 3:20:43 PM3/17/04

to perl6-i...@perl.org

Its the old problem of rounding errors in floating point arithmetic.

In string_to_num() in src/string.c,
f = f * sign * pow(10.0, exponent); /* ugly, oh yeah */

Which in this case translates to 12.0*-1*0.1 which is -1.2000...2 != -1.2.

I replaced this bit of code from a block I found in mysql. I only tested
this
on linux, but it seems to do the trick. See attached.

string.c.patch

S. Livingston

unread,

Mar 17, 2004, 4:13:40 PM3/17/04

to perl6-i...@perl.org

Oops, use this one instead... "re"-fixes the exponent support...

string.c.patch

Larry Wall

unread,

Mar 17, 2004, 5:18:51 PM3/17/04

to perl6-i...@perl.org

On Wed, Mar 17, 2004 at 04:13:40PM -0500, S. Livingston wrote:
: Oops, use this one instead... "re"-fixes the exponent support...

This still doesn't explain why the compiler would be using a different
routine to turn the string 1.2 into a number than the run time does.
This is not code that should be duplicated. Or am I misunderstanding
the problem?

Larry

Skip Livingston

unread,

Mar 17, 2004, 9:59:21 PM3/17/04

to perl6-i...@perl.org

IMCC uses atof() because it doesn't use (need?) any of the encoding
stuff. (See add_const_num() in imcc/pbc.c). Packing the cstring up
in a STRING then calling string_to_num() *fixes* the problem, but it
doesn't address the issue Larry brought up. The interpreter makes
heavy use of the encoding sensitive STRING type, the compiler does
not.

I'm not very familiar with the garbage collector, would the call to
string_make() be a memory leak? Or will it be collected?

diff -r1.68 pbc.c
593c593,594
< int k;
---
> int k, l;
> STRING *str;
594a596
> str = const_string(interpreter, buf);
596d597
<
600c601,602
< (FLOATVAL)atof(buf);
---
> string_to_num(str);

Leopold Toetsch

unread,

Mar 17, 2004, 4:28:42 PM3/17/04

to Larry Wall, perl6-i...@perl.org

Larry Wall <la...@wall.org> wrote:

> The basic bug has to be that it's using two different routines to
> do the conversion (or possibly there's one routine that's paying
> attention to some context it shouldn't be paying attention to--but
> that seems less likely).

Well, that's a very good point. The compiler just does atof() while
string_to_num() is a long and possibly buggy conversion routine that
tries to deal with unicode codepoints.

> I recommend not remaking my mistakes. Please make different mistakes. :-)

We'll try hard ...

> Larry

leo

Leopold Toetsch

unread,

Mar 18, 2004, 3:52:01 AM3/18/04

to Skip Livingston, perl6-i...@perl.org

Skip Livingston <ski...@yahoo.com> wrote:
> IMCC uses atof() because it doesn't use (need?) any of the encoding
> stuff.

Well imcc used to be a standalone language compiler that converted PIR
to PASM, which got then assembled to bytecode by assembler.pl. Now
imcc is fully integrated, parrot is the evolution of the imcc
executable and parrot has two assemblers (or compiler) inside.

*But* there is still a lot of code inside imcc, which needs
adaption. Imcc didn't use any of Parrot's internals except the necessary
things to generate the bytecode.

> ...(See add_const_num() in imcc/pbc.c). Packing the cstring up

> in a STRING then calling string_to_num() *fixes* the problem,

Done that now too, similar to your patch.

> ... but it

> doesn't address the issue Larry brought up. The interpreter makes
> heavy use of the encoding sensitive STRING type, the compiler does
> not.

Imcc or the lexer has no syntax yet to even read an Unicode string. It's
just reading bytes and converts everything to C-strings. That needs of
course fixing.

> I'm not very familiar with the garbage collector, would the call to
> string_make() be a memory leak? Or will it be collected?

I used string_from_cstring(), the memory gets collected automatically.

leo

Leopold Toetsch

unread,

Mar 18, 2004, 3:37:33 AM3/18/04

to S. Livingston, perl6-i...@perl.org

S. Livingston <ski...@yahoo.com> wrote:

> Oops, use this one instead... "re"-fixes the exponent support...

Thanks, applied.
leo

Brent 'Dax' Royal-Gordon

unread,

Mar 18, 2004, 4:07:15 AM3/18/04

to l...@toetsch.at, S. Livingston, perl6-i...@perl.org

How close is string_to_num() to being adequate for Parrot_sprintf()'s
purposes? (It currently falls back to the platform sprintf for floats,
because I didn't have, and still don't have, any idea how to properly
format a float.)

--
Brent "Dax" Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.

Leopold Toetsch

unread,

Mar 18, 2004, 5:51:39 AM3/18/04

to Brent 'Dax' Royal-Gordon, perl6-i...@perl.org

Brent 'Dax' Royal-Gordon <br...@brentdax.com> wrote:
> How close is string_to_num() to being adequate for Parrot_sprintf()'s
> purposes? (It currently falls back to the platform sprintf for floats,
> because I didn't have, and still don't have, any idea how to properly
> format a float.)

Well, string_to_num() is the opposite of it. And string.c has
string_from_num() which is using Parrot_sprintf() ;)

BTW sscanf() is missing too.

leo

Uri Guttman

unread,

Mar 18, 2004, 9:40:31 AM3/18/04

to l...@toetsch.at, Brent 'Dax' Royal-Gordon, perl6-i...@perl.org

>>>>> "LT" == Leopold Toetsch <l...@toetsch.at> writes:

LT> BTW sscanf() is missing too.

perl5 mever had a scanf (or variation) and for good reason IIRC. it was
never needed and it has a very nasty api. i hated using it in c because
you could never control how much to read and parse and also find out
where in the input stream it fails. <> and a regex were always better in
p5. so i don't think we need to do it in parrot. in fact with
rules/grammars being able to read from a stream, you could code one in
that and have it work properly. do any other langs we want to support
(other than c) have a scanf flavor in them?

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Brent 'Dax' Royal-Gordon

unread,

Mar 18, 2004, 1:47:10 PM3/18/04

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch wrote:
> BTW sscanf() is missing too.

I have never used scanf. I have no idea what it does or how.
Therefore, I'm hardly qualified to write an implementation of it.

Mark A Biggar

unread,

Mar 17, 2004, 4:08:32 PM3/17/04

to S. Livingston, perl6-i...@perl.org

The real problem is that you always want to use exactly the same code for
ALL cases of string-to-float conversion. The first public example of this
problem was the FORTRAN II compiler from IBM in the 60's. The compiler and
the IO library was written by two different people and so constants in
program code didn't match those read in from IO, OOPS! you'd think
people would remember and learn from the past. By exactly the same,
I mean exactly the same machine code (hardware floating point status and rounding mode bits included) not just the same HL source code. I.e., You
need exactly one routine, compiled only once and used EVERYWHERE. It also pays
to have a single routine for the other direction that has the property:
S = ftos(atof(S)) and F = atof(ftoa(F)). Otherwise you get obscure
very hard to find bugs.

--
Mark Biggar
mark.a...@comcast.net

> *** tmp/parrot/src/string.c Sat Mar 6 03:00:06 2004
> --- parrot/src/string.c Wed Mar 17 12:24:02 2004
> ***************
> *** 1836,1844 ****
> int exp_sign = 0;
> INTVAL in_exp = 0;
> INTVAL in_number = 0;
> ! FLOATVAL exponent = 0;
> INTVAL fake_exponent = 0;
> INTVAL digit_family = 0;
>
> while (start < end) {
> UINTVAL c = s->encoding->decode(start);
> --- 1836,1845 ----
> int exp_sign = 0;
> INTVAL in_exp = 0;
> INTVAL in_number = 0;
> ! INTVAL exponent = 0;
> INTVAL fake_exponent = 0;
> INTVAL digit_family = 0;
> + FLOATVAL exp_log=10.0, exp_val=1.0;
>
> while (start < end) {
> UINTVAL c = s->encoding->decode(start);
> ***************
> *** 1849,1855 ****
>
> if (df && df == digit_family) {
> if (in_exp) {
> ! exponent = exponent * 10 + s->type->get_digit(s->type,c);

> if (!exp_sign) {
> exp_sign = 1;
> }
> --- 1850,1856 ----
>
> if (df && df == digit_family) {
> if (in_exp) {
> ! exponent = exponent + s->type->get_digit(s->type,c);
> if (!exp_sign) {
> exp_sign = 1;
> }
> ***************
> *** 1906,1912 ****
>
> exponent = fake_exponent + exponent * exp_sign;
>
> ! f = f * sign * pow(10.0, exponent); /* ugly, oh yeah */
> }
>
> return f;
> --- 1907,1936 ----
>
> exponent = fake_exponent + exponent * exp_sign;
>
> ! if(exponent < 0) {
> ! exponent = -exponent;
> ! exp_sign=-1;
> ! }
> !
> ! for (;;) {
> ! if (exponent & 1) {
> ! exp_val *= exp_log;
> ! exponent--;
> ! }
> ! if (!exponent)
> ! break;

> ! exp_log *= exp_log;
> ! exponent >>= 1;
> ! }
> !
> ! if(exp_sign < 0)
> ! f /= exp_val;
> ! else
> ! f *= exp_val;
> !
> !
> ! if(sign < 0)
> ! f = -f;
> }
>
> return f;

Leopold Toetsch

unread,

Mar 19, 2004, 5:53:06 AM3/19/04

to Mark A Biggar, perl6-i...@perl.org

Mark A Biggar <mark.a...@comcast.net> wrote:
> The real problem is that you always want to use exactly the same code for
> ALL cases of string-to-float conversion.

Yep, was outlined by Larry too. That's already changed.

[ Please don't quote the whole thread, thanks ]

leo