int("0x123") == int(0x123) ?

Autrijus Tang

Mar 15, 2005, 10:09:11 AM

to perl6-c...@perl.org

Currently Pugs numifies hexadecimal and octal strings as if they
are literals; that means "0x123" and "0o456" all work as expected.
Is that an acceptable treatment? What about "Inf" and "NaN" in
numeric context?

Thanks,
/Autrijus/

Aaron Sherman

Mar 15, 2005, 10:34:04 AM

to Autrijus Tang, perl6-c...@perl.org

This has long been a point of contention in Perl 5.

There are two camps when it comes to how to interpret strings as
numbers:

1. You must treat them the same for consistency.
2. You must not treat them the same because someone reading in
lines of a report like the following could be seriously shocked
by the behavior:
$ Book
5foundation
8rama
0xanth
The logic goes: why should "8rama" and "0xanth" be treated
differently when they are both numbers followed by letters?

There's no particularly right answer, IMHO, so why doesn't Perl 6 just
go all ways?

Have a Perl 5 atof mode; a hard-line mode that throws an exception if
handed anything other than a number that would be valid in Perl source;
and a mode that silently makes assumptions, deletes trailing junk and
tries to act "perl6ish".

# Untried, likely wrong pseudo-code:

use atof::perl5;
assert { "0xanth" == 0 };

use atof::strict;
try { "0xanth" == 0; CATCH { assert { $@ && $@ ~ /atof/ } } }
assert { "0xa" == 0xa }

use atof::perl6; # Default?
assert { "0xanth" == 0xa }

Pardon me if I'm forgetting a document that already decided this. Of
course, partly this is better directed at p6l... I wasn't sure if I
should move it to that list or not.
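
The three modes in the pseudo-code above can be made concrete with a
small sketch. It is written in Python only so that it can actually be
run; the function name atof and the mode strings are hypothetical
stand-ins for the atof::perl5, atof::strict and atof::perl6 pragmas,
none of which exist. Only integers and the 0x prefix are handled, to
keep it short.

```python
import re

# Hypothetical modes mirroring the atof::* pragmas in the pseudo-code above.
MODES = ("perl5", "strict", "perl6")

_DEC_PREFIX = re.compile(r"\s*([+-]?\d+)")
_HEX_PREFIX = re.compile(r"\s*0x([0-9a-fA-F]+)")


def atof(text, mode="perl6"):
    """Numify `text` under one of the three proposed policies."""
    if mode not in MODES:
        raise ValueError(f"atof: unknown mode {mode!r}")

    if mode == "strict":
        # The whole string must be a valid literal, otherwise raise.
        m = re.fullmatch(r"\s*0x([0-9a-fA-F]+)\s*", text)
        if m:
            return int(m.group(1), 16)
        m = re.fullmatch(r"\s*([+-]?\d+)\s*", text)
        if m:
            return int(m.group(1))
        raise ValueError(f"atof: not a number: {text!r}")

    if mode == "perl6":
        # Recognise a hex prefix and silently drop trailing junk.
        m = _HEX_PREFIX.match(text)
        if m:
            return int(m.group(1), 16)

    # perl5 mode, and the perl6 fallback: leading decimal digits, else 0.
    m = _DEC_PREFIX.match(text)
    return int(m.group(1)) if m else 0


print(atof("0xanth", "perl5"))  # 0
print(atof("0xanth", "perl6"))  # 10
print(atof("0xa", "strict"))    # 10
```

Under the strict mode, atof("0xanth", "strict") raises ValueError,
which corresponds to the CATCH branch in the pseudo-code.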

--
Aaron Sherman <a...@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Luke Palmer

Mar 15, 2005, 10:35:54 AM

to Autrijus Tang, perl6-c...@perl.org

If we follow Perl 5's lead, they all numify to zero (generalizing for
"Inf" and "NaN").

Of course, we could afford to get stricter about numeric prefix
numification, and if we did that, then we could make them work
correctly. Personally, I'd like to see C<+"345abc"> be an error, and
allow a function that extracts a numeric prefix.

Luke

> Thanks,
> /Autrijus/

Patrick R. Michaud

Mar 15, 2005, 10:55:51 AM

to Luke Palmer, Autrijus Tang, perl6-c...@perl.org

On Tue, Mar 15, 2005 at 08:35:54AM -0700, Luke Palmer wrote:
> Autrijus Tang writes:
> > Currently Pugs numifies hexadecimal and octal strings as if they
> > are literals; that means "0x123" and "0o456" all work as expected.
> > Is that an acceptable treatment? What about "Inf" and "NaN" in
> > numeric context?
>
> If we follow Perl 5's lead, they all numify to zero (generalizing for
> "Inf" and "NaN").

In the absence of anything to the contrary in the synopses/apocalypses,
string numification should work as it does in Perl 5. In other words,
all of the above are zero, and we only numify leading digits as
decimal numbers.

FWIW, I ran into a similar problem with "Inf" in some PHP programming
I did a year or so ago. The question is, how does the string
"Information" numify? Just as "8abcde" numifies to 8, PHP took the
approach that "Information" numifies to Inf. (Actually, it was worse
than this, as someone decided that "Inf" should not numify, so they
did a string comparison for "Inf" and had it return zero, thus "Inf"
returned zero while "Info" returned Inf. Sigh.)

Perl 5's approach, while somewhat controversial, does at least
have the advantage of being consistent and reducing the number of
surprises. And from a compiler-writing perspective, we'll follow
the spec as it's written until/unless p6l changes it.

> Of course, we could afford to get stricter about numeric prefix
> numification, and if we did that, then we could make them work
> correctly. Personally, I'd like to see C<+"345abc"> be an error, and
> allow a function that extracts a numeric prefix.

Instead of an error, how about a warning?

Pm

Jonathan Scott Duff

Mar 15, 2005, 11:29:53 AM

to Aaron Sherman, Autrijus Tang, perl6-c...@perl.org

On Tue, Mar 15, 2005 at 10:34:04AM -0500, Aaron Sherman wrote:
> On Tue, 2005-03-15 at 10:09, Autrijus Tang wrote:
> > Currently Pugs numifies hexadecimal and octal strings as if they
> > are literals; that means "0x123" and "0o456" all work as expected.
> > Is that an acceptable treatment? What about "Inf" and "NaN" in
> > numeric context?
>
> This has long been a point of contention in Perl 5.
>
> There are two camps when it comes to how to interpret strings as
> numbers:
>
> 1. You must treat them the same for consistency.
> 2. You must not treat them the same because someone reading in
> lines of a report like the following could be seriously shocked
> by the behavior:
> $ Book
> 5foundation
> 8rama
> 0xanth

They may also be shocked if their data happens to have a line like the
following:

3e2-home

That particular configuration is also rarer and therefore more likely
to surprise if the programmer isn't aware.

I think that perl should auto-numify but not recognize hex, octal, or
even scientific notation without some explicit conversion.

-Scott
--
Jonathan Scott Duff
du...@pobox.com
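
The "3e2-home" case can be illustrated with a short sketch, in Python
purely so that it is runnable. The function name numify_prefix and its
allow_exponent flag are hypothetical; the flag switches between a
prefix rule that accepts scientific notation and the decimal-only rule
proposed above.

```python
import re

# Leading decimal number, with and without an optional exponent part.
_WITH_EXPONENT = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
_DECIMAL_ONLY = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)")


def numify_prefix(text, allow_exponent):
    """Return the numeric prefix of `text` as a float, or 0.0 if none."""
    pattern = _WITH_EXPONENT if allow_exponent else _DECIMAL_ONLY
    m = pattern.match(text)
    return float(m.group(0)) if m else 0.0


print(numify_prefix("3e2-home", allow_exponent=True))   # 300.0
print(numify_prefix("3e2-home", allow_exponent=False))  # 3.0
```

With exponents recognised, the report line silently becomes 300; with
the decimal-only rule it stays 3, in line with "8rama" giving 8.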

Larry Wall

Mar 15, 2005, 1:43:07 PM

to perl6-c...@perl.org

On Tue, Mar 15, 2005 at 11:09:11PM +0800, Autrijus Tang wrote:
: Currently Pugs numifies hexadecimal and octal strings as if they
: are literals; that means "0x123" and "0o456" all work as expected.
: Is that an acceptable treatment?

It's okay by me. The restriction on not autoconverting hex and octal
stems from the days when 0777 was assumed to be octal. Since we've
changed that to 0o777, it's much less likely to autoconvert zip codes
to a different number. We can allow 0b111 as well. (Scientific notation
is already allowed in Perl 5 autoconversion.)

: What about "Inf" and "NaN" in numeric context?

We should be intelligent about Inf and NaN, which means Inf is infinity
but Info is 0 (plus warning). NaN is not a number, while NaNa is 0 (plus
warning). We also recognize +Inf and -Inf.

Larry
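
The Inf/NaN rule above amounts to matching the whole string rather
than a prefix. A minimal sketch follows, in Python so that it can be
run; the name numify_special is hypothetical, and case-sensitive
matching and the tolerance for surrounding whitespace are assumptions
not stated in the message.

```python
import math
import re
import warnings

# Whole-string match only: "Inf", "+Inf", "-Inf" or "NaN".
# Case sensitivity and the surrounding \s* are assumptions.
_SPECIAL = re.compile(r"\s*(?:(?P<sign>[+-]?)Inf|NaN)\s*")


def numify_special(text):
    """Numify Inf/NaN strings; anything else is 0 plus a warning."""
    m = _SPECIAL.fullmatch(text)
    if m is None:
        warnings.warn(f"{text!r} is not numeric; using 0")
        return 0.0
    if m.group("sign") is None:
        # The NaN alternative matched, so the sign group never took part.
        return math.nan
    return -math.inf if m.group("sign") == "-" else math.inf


print(numify_special("Inf"))   # inf
print(numify_special("-Inf"))  # -inf
print(numify_special("NaN"))   # nan
```

Because the match must cover the whole string, "Info" and "NaNa" fall
through to the warning branch and yield 0 instead of being read as a
prefix followed by junk.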

Autrijus Tang

Mar 15, 2005, 1:57:36 PM

to perl6-c...@perl.org

On Tue, Mar 15, 2005 at 10:43:07AM -0800, Larry Wall wrote:
> On Tue, Mar 15, 2005 at 11:09:11PM +0800, Autrijus Tang wrote:
> : Currently Pugs numifies hexadecimal and octal strings as if they
> : are literals; that means "0x123" and "0o456" all work as expected.
> : Is that an acceptable treatment?
>
> It's okay by me. The restriction on not autoconverting hex and octal
> stems from the days when 0777 was assumed to be octal. Since we've
> changed that to 0o777, it's much less likely to autoconvert zip codes
> to a different number. We can allow 0b111 as well. (Scientific notation
> is already allowed in Perl 5 autoconversion.)

... pmichaud has since talked me into allowing only digits and
dots during numification, thereby outlawing scientific notation
altogether, which is the other end of the spectrum and is also consistent.

So, between the two consistencies, do you think that the more DWIMmy
one of parsing "0o123" is more helpful? I'll implement it tomorrow
if that's the case. :)

Thanks,
/Autrijus/

Larry Wall

Mar 15, 2005, 2:14:07 PM

to perl6-c...@perl.org

On Wed, Mar 16, 2005 at 02:57:36AM +0800, Autrijus Tang wrote:
: So, between the two consistencies, do you think that the more DWIMmy
: one of parsing "0o123" is more helpful? I'll implement it tomorrow
: if that's the case. :)

Yes, I do. That was one of the main reasons for switching to the
less ambiguous 0o123 notation. I do think that we can afford to be
pickier for the dwimmier things nowadays, but that's probably covered
by warnings being on by default. In any event, 123foo, 1e3f4x,
0o789, 0b1x, Info, and NaNa all produce warnings by default if used
in numeric context. Extra whitespace should be ignored, however.

And we might even allow exponents on 0x et al., though what radix
the exponent is assumed to be is an interesting question.

Larry
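
Taken together, the rules in this message and the earlier one can be
sketched as a single routine. This is Python only so that it is
runnable, and not the Pugs implementation. The name numify is
hypothetical. The message says only that strings such as "0o789" warn;
returning the value of the valid prefix (7 here) is an assumption, as
is leaving signs on radix literals unhandled.

```python
import math
import re
import warnings

_SPECIAL = {"Inf": math.inf, "+Inf": math.inf, "-Inf": -math.inf, "NaN": math.nan}
_RADIX = [
    (re.compile(r"0x([0-9a-fA-F]+)"), 16),
    (re.compile(r"0o([0-7]+)"), 8),
    (re.compile(r"0b([01]+)"), 2),
]
_DECIMAL = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def numify(text):
    """Numify a string, warning when trailing junk had to be dropped."""
    s = text.strip()  # extra whitespace is ignored
    if s in _SPECIAL:
        return _SPECIAL[s]

    value, consumed = 0.0, 0
    for pattern, base in _RADIX:
        m = pattern.match(s)
        if m:
            value, consumed = float(int(m.group(1), base)), m.end()
            break
    else:
        # No radix prefix matched: fall back to a decimal prefix.
        m = _DECIMAL.match(s)
        if m:
            value, consumed = float(m.group(0)), m.end()

    if consumed < len(s):
        warnings.warn(f"{text!r} is not entirely numeric")
    return value


print(numify("0x123"))   # 291.0
print(numify("0o456"))   # 302.0
print(numify(" 0b111 ")) # 7.0
```

Each of 123foo, 1e3f4x, 0o789, 0b1x, Info and NaNa leaves unconsumed
characters and so goes through the warning branch.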

James Mastros

Mar 15, 2005, 6:01:29 PM

to perl6-c...@perl.org

Larry Wall wrote:
> And we might even allow exponents on 0x et al., though what radix
> the exponent is assumed to be is an interesting question, though.

Well, the perl6-documentation guys tried to hash this out, but were
hampered both by a lack of the bits that affected the bits we were
trying to work on, and by a clear wish by the powers that be (that'd be
you, Larry) to take what we worked out, bless the good bits, and fix
the bad bits.

If you're interested, check p6-documentation for the last post of the
form "numeric literals, take \d+". I've got to get to bed, though.

Goodnight,
-=- James Mastros,
theorbtwo

Autrijus Tang

Mar 20, 2005, 5:12:46 AM

to perl6-c...@perl.org

On Tue, Mar 15, 2005 at 11:14:07AM -0800, Larry Wall wrote:
> On Wed, Mar 16, 2005 at 02:57:36AM +0800, Autrijus Tang wrote:
> : So, between the two consistencies, do you think that the more DWIMmy
> : one of parsing "0o123" is more helpful? I'll implement it tomorrow
> : if that's the case. :)
>
> Yes, I do. That was one of the main reasons for switching to the
> less ambiguous 0o123 notation.

Implemented in r948. Can you take a look at
http://svn.openfoundry.org/pugs/t/builtins/numify.t
and see if it makes sense? More tests are welcome too.

Thanks,
/Autrijus/