rb_cstr_to_inum use of strtoul as an optimization has unfortunate side effects

Florian Gross

Jul 14, 2007, 9:08:31 PM
to ruby...@ruby-lang.org
Hi,

rb_cstr_to_inum, which is used by methods like String#hex,
Kernel#Integer and so on, short-circuits to strtoul() for strings
whose maximum possible value is known to fit into a long.
Unfortunately, this causes unintended side effects.

I'm not sure if this behaviour is a good thing (please note that I'm
on a 32-bit machine; results will vary on a 64-bit machine):

Integer("0x-1") # => 4294967295
Integer("-0x-1") # => -4294967295
Integer("0x 123") # => 291
Integer("0x 123123123") # ~> in `Integer': invalid value for Integer:
"0x 123123123" (ArgumentError)
Integer("0x0x5") # => 5
Integer("0x0x000000005") # ~> in `Integer': invalid value for Integer:
"0x0x000000005" (ArgumentError)

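One plausible reading of the results above, assuming Ruby consumes the
sign and the "0x" prefix itself and then hands the remainder to
strtoul() with base 16: the C function skips leading whitespace,
accepts its own optional sign and "0x" prefix, and negates a signed
value within unsigned long. The sketch below mimics that in Ruby.
strtoul_like_hex and the 32-bit width are illustrative assumptions,
not code from Ruby's source, and overflow is not modelled.

```ruby
# Assumption: unsigned long is 32 bits wide, as on the poster's machine.
ULONG_BITS = 32
ULONG_MASK = (1 << ULONG_BITS) - 1

# Hypothetical helper: mimics how C's strtoul(..., 16) treats the text
# that follows an already-consumed "0x" prefix.
def strtoul_like_hex(rest)
  # Optional whitespace, optional sign, optional second "0x", hex digits.
  m = rest.match(/\A\s*([+-])?(?:0[xX](?=\h))?(\h+)/)
  return nil unless m

  value = m[2].to_i(16)
  value = -value if m[1] == "-"
  # Negation happens in unsigned arithmetic, so -1 wraps around.
  value & ULONG_MASK
end

p strtoul_like_hex("-1")   # remainder of "0x-1"  => 4294967295
p strtoul_like_hex(" 123") # remainder of "0x 123" => 291
p strtoul_like_hex("0x5")  # remainder of "0x0x5"  => 5
```

Longer digit strings presumably skip this fast path and go through the
general parser, which would explain why they raise ArgumentError.
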
Fixing those cases would involve a look-ahead of the first two chars
after the base prefix. If they both are [0-9a-fA-F] the optimization
can safely be used.
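
A minimal sketch of that look-ahead, written in Ruby for readability
rather than in the C of the actual implementation. fast_path_safe? is
a hypothetical name, and the argument is assumed to be the text that
remains after the base prefix has been stripped.

```ruby
# Hypothetical guard, not taken from Ruby's source: decides whether the
# text after a base prefix may go to the strtoul() fast path.
def fast_path_safe?(after_prefix)
  # Both of the first two characters must be hex digits, which rules out
  # a second sign, leading whitespace and a repeated "0x" prefix.
  after_prefix.match?(/\A\h\h/)
end

p fast_path_safe?("-1")   # => false (sign after the prefix)
p fast_path_safe?(" 123") # => false (whitespace after the prefix)
p fast_path_safe?("0x5")  # => false (second prefix)
p fast_path_safe?("123")  # => true
```

A single-digit remainder such as "5" also fails this check, so it
would simply fall back to the general path, which is slower but safe.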

But I'm not sure if it is important enough to fix. Thoughts?

Kind regards,
Florian Gross


Florian Gross

Jul 15, 2007, 1:07:44 PM
to ruby...@ruby-lang.org
On another note, String#oct allows the base to be changed by a base
prefix:

"0b1010".oct.to_s(2) # => "1010"
"0xff".oct.to_s(16) # => "ff"
"0d20".oct.to_s(10) # => "20"

This is somewhat against the expectation the method name creates, and
inconsistent with String#hex:

"0b1010".hex.to_s(16) # => "b1010"
"0o77".hex.to_s(16) # => "0"
"0d20".hex.to_s(16) # => "d20"

Is this by design? The change would be highly trivial. (Change one
character in string.c)


Nobuyoshi Nakada

Jul 15, 2007, 2:46:05 PM
to ruby...@ruby-lang.org
Hi,

At Mon, 16 Jul 2007 02:07:44 +0900,
Florian Gross wrote in [ruby-core:11693]:


> On another note, String#oct allows the base to be changed by a base
> prefix:

> Is this by design?

Yes, it's intentional.

--
Nobu Nakada

Florian Gross

Jul 15, 2007, 4:55:35 PM
to ruby...@ruby-lang.org
On Jul 15, 8:46 pm, Nobuyoshi Nakada <n...@ruby-lang.org> wrote:
> > On another note, String#oct allows the base to be changed by a base
> > prefix:
> > Is this by design?
>
> Yes, it's intentional.

OK, any chance we can adjust the RI documentation?

Currently it is:

Treats leading characters of _str_ as a string of octal digits
(with an optional sign) and returns the corresponding number.
Returns 0 if the conversion fails.

"123".oct #=> 83
"-377".oct #=> -255
"bad".oct #=> 0
"0377bad".oct #=> 255

Let me propose changing it to:

Treats leading characters of _str_ as a string of octal digits
(with an optional sign) and returns the corresponding number.
Non-octal bases are supported via "0x" style base prefixes.

"123".oct #=> 83
"-377".oct #=> -255
"bad".oct #=> 0
"0377bad".oct #=> 255
"0xff".oct #=> 255
"0b1111".oct #=> 15
"0d84".oct #=> 84

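For illustration, the prefix handling described in the proposed text
can be sketched as below. oct_like and PREFIX_BASES are hypothetical
names; this is a rough model of the documented behaviour, not the
code in string.c.

```ruby
# Hypothetical table mapping base prefixes to their radix.
PREFIX_BASES = { "0b" => 2, "0o" => 8, "0d" => 10, "0x" => 16 }.freeze

# Rough model of String#oct: octal by default, but a base prefix
# switches the radix. Returns 0 when no digits can be read.
def oct_like(str)
  m = str.match(/\A([+-])?(0[bBoOdDxX])?(.*)/m)
  base = m[2] ? PREFIX_BASES[m[2].downcase] : 8
  # String#to_i stops at the first character invalid for the base.
  value = m[3].to_i(base)
  m[1] == "-" ? -value : value
end

p oct_like("123")     # => 83
p oct_like("-377")    # => -255
p oct_like("0377bad") # => 255
p oct_like("0xff")    # => 255
p oct_like("0b1111")  # => 15
p oct_like("0d84")    # => 84
```
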
Thank you!


Florian Gross

Jul 15, 2007, 5:07:58 PM
to ruby...@ruby-lang.org
Oops:

> Let me propose changing it to:
>
> Treats leading characters of _str_ as a string of octal digits
> (with an optional sign) and returns the corresponding number.
> Non-octal bases are supported via "0x" style base prefixes.
Returns 0 if the conversion fails.
>
> "123".oct #=> 83
> "-377".oct #=> -255
> "bad".oct #=> 0
> "0377bad".oct #=> 255
