[Caml-list] int_of_string bug


Yaron Minsky

Mar 29, 2007, 12:29:03 PM
to caml-list
So, there's a weird int_of_string bug where positive decimal numbers are
sometimes read in as negative numbers without error. Here's the bug:

http://caml.inria.fr/mantis/view.php?id=0004210

This has been marked as "wontfix" in the bug database because apparently
there's some weird spot in the lexer that depends on the wrong behavior of
int_of_string.

First of all, people should be aware of this behavior and should defend
against it in their code. Secondly, the justification for not fixing it
seems really thin. The behavior seems obviously wrong, and it's hard to see
why one wouldn't simply fix the lexer (perhaps by providing an alternate
broken implementation of int_of_string) and leave the ordinary int_of_string
where it is.
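
One way to defend against this, as a minimal sketch: wrap int_of_string and reject any result whose sign disagrees with the input string. The name checked_int_of_string is made up for illustration, and the `match ... with exception` form assumes OCaml 4.02 or later.

```ocaml
(* Hypothetical helper, not part of the standard library.
   Returns None on malformed input, or when the parsed value has the
   wrong sign, which is how the silent wrap-around shows up. *)
let checked_int_of_string (s : string) : int option =
  match int_of_string s with
  | exception Failure _ -> None
  | n ->
    let negative_input = String.length s > 0 && s.[0] = '-' in
    (* Zero is accepted for either sign ("0" and "-0"). *)
    if n = 0 || negative_input = (n < 0) then Some n else None
```

A caller then handles None as "not a usable int", whether the cause was bad syntax or overflow.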

y

Oliver Bandel

Mar 29, 2007, 5:30:41 PM
to caml...@inria.fr
On Thu, Mar 29, 2007 at 12:27:05PM -0400, Yaron Minsky wrote:
> So, there's a weird int_of_string bug where positive decimal numbers are
> sometimes read in as negative numbers without error. Here's the bug:
>
> http://caml.inria.fr/mantis/view.php?id=0004210
>
> This has been marked as "wontfix" in the bug database because apparently
> there's some weird spot in the lexer that depends on the wrong behavior of
> int_of_string.
[...]

Oh, that's bad. :(

But btw, it's also bad that no exception is thrown when an int
overflows. :(

Ciao,
Oliver Bandel

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Yaron Minsky

Mar 29, 2007, 8:27:10 PM
to Oliver Bandel
On 3/29/07, Oliver Bandel <oli...@first.in-berlin.de> wrote:
>
> On Thu, Mar 29, 2007 at 12:27:05PM -0400, Yaron Minsky wrote:
> > So, there's a weird int_of_string bug where positive decimal numbers are
> > sometimes read in as negative numbers without error. Here's the bug:
> >
> > http://caml.inria.fr/mantis/view.php?id=0004210
> >
> > This has been marked as "wontfix" in the bug database because apparently
> > there's some weird spot in the lexer that depends on the wrong behavior of
> > int_of_string.
> [...]
>
> Oh, that's bad. :(
>
> But btw. it's also bad that, when overflowing of int occurs, no
> exception is thrown. :(


That's a problem too, but there is at least a defensible reason for that,
which is that it is expensive to get integer overflow to throw an exception.
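
To make the cost concrete, here is a rough sketch of an overflow-checked addition done in software. The names Overflow and checked_add are hypothetical; nothing like this is applied by the built-in (+). Every addition pays for a few extra comparisons and a branch.

```ocaml
(* Hypothetical exception and helper, for illustration only. *)
exception Overflow

let checked_add (a : int) (b : int) : int =
  let s = a + b in
  (* Overflow can only happen when both operands have the same sign,
     and it shows up as a result with the opposite sign. *)
  if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0) then raise Overflow
  else s
```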

Brian Hurt

Mar 29, 2007, 9:09:48 PM
to Yaron Minsky

On Thu, 29 Mar 2007, Yaron Minsky wrote:

> So, there's a weird int_of_string bug where positive decimal numbers are
> sometimes read in as negative numbers without error. Here's the bug:
>
> http://caml.inria.fr/mantis/view.php?id=0004210

I'm actually not sure this is a bug either. Note that ocaml will quite
happily compute max_int+1 without an error either.
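
The wrap-around mentioned here is easy to check. A small sketch, with made-up binding names:

```ocaml
(* Native int arithmetic wraps silently at both ends of the range. *)
let wraps_up = (max_int + 1 = min_int)
let wraps_down = (min_int - 1 = max_int)

let () =
  Printf.printf "max_int + 1 = min_int: %b\n" wraps_up;
  Printf.printf "min_int - 1 = max_int: %b\n" wraps_down
```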

Whether this behavior (silent wrap around) is correct or not is another
argument. Elsewhere I have opined that the only purpose for having
more than one type of integer in your programming language is so that
programmers can pick the wrong one. But I'm widely known to be a heretic.

Ocaml's behavior is, at least, *consistent*.

Brian

Yaron Minsky

Mar 29, 2007, 9:27:07 PM
to Brian Hurt
On 3/29/07, Brian Hurt <bh...@spnz.org> wrote:
>
>
> Whether this behavior (silent wrap around) is correct or not is another
> argument. Elsewhere I have opined that the only purpose for having
> more than one type of integer in your programming language is so that
> programmers can pick the wrong one. But I'm widely known to be a heretic.
>
> Ocaml's behavior is, at least, *consistent*.


Not really all that consistent:

# int_of_string "1073741824";;
- : int = -1073741824
# int_of_string "1073741825";;
Exception: Failure "int_of_string".
#
y

skaller

Mar 30, 2007, 12:30:48 AM
to Yaron Minsky
On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> On 3/29/07, Brian Hurt <bh...@spnz.org> wrote:
>
> Whether this behavior (silent wrap around) is correct or not is another
> argument. Elsewhere I have opined that the only purpose for having
> more than one type of integer in your programming language is so that
> programmers can pick the wrong one. But I'm widely known to be a heretic.
>
> Ocaml's behavior is, at least, *consistent*.
>
> Not really all that consistent:
>
> # int_of_string "1073741824";;
> - : int = -1073741824
> # int_of_string "1073741825";;
> Exception: Failure "int_of_string".
> #

skaller@rosella:/work/felix/svn/felix/felix/trunk$ ledit ocaml
Objective Caml version 3.10+dev25 (2007-03-26)

# int_of_string "1073741824";;
- : int = 1073741824
# int_of_string "1073741825";;
- : int = 1073741825

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Erik de Castro Lopo

Mar 30, 2007, 2:01:54 AM
to caml-list
skaller wrote:

> On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> > # int_of_string "1073741824";;
> > - : int = -1073741824
> > # int_of_string "1073741825";;
> > Exception: Failure "int_of_string".

That's the behaviour on 32-bit systems.

> # int_of_string "1073741824";;
> - : int = 1073741824
> # int_of_string "1073741825";;
> - : int = 1073741825

But 64 bit systems get it right.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo
+-----------------------------------------------------------+
"Java, the best argument for Smalltalk since C++." -- Frank Winkler

skaller

Mar 30, 2007, 2:24:48 AM
to caml-list
On Fri, 2007-03-30 at 15:59 +1000, Erik de Castro Lopo wrote:
> skaller wrote:
>
> > On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> > > # int_of_string "1073741824";;
> > > - : int = -1073741824
> > > # int_of_string "1073741825";;
> > > Exception: Failure "int_of_string".
>
> Thats the behaviour on 32 bit systems.
>
> > # int_of_string "1073741824";;
> > - : int = 1073741824
> > # int_of_string "1073741825";;
> > - : int = 1073741825
>
> But 64 bit systems get it right.

The point being: the behaviour for large values is
platform dependent anyhow, so in the abstract
you can say the behaviour is undefined for large values,
where 'large' isn't specified.

If you want to get it RIGHT: if you have a user input string
possibly containing digits, and you want to convert it,
you must already write a parser to parse the input,
so you won't be using int_of_string anyhow.

If the input was generated (say by another Ocaml program),
then it will already be correct.

In the Felix compiler, after lexing 'string of digits'
I use the Big_int module to convert to an integer:
that behaviour is platform independent.

If I really want an int (say for indexing), and there's
a risk of the conversion overflowing .. there's a risk
that even without overflowing the data is wrong and will
blow up, eg .. I'm not going to be indexing arrays
with max_int elements .. :)

If I really want to check, I'll use an application specific
bound such as 16000, and check the big_int against that
before converting. Thus, all the operations are deterministic
and platform independent if you do things properly.
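
As a rough sketch of the "application specific bound" idea: the version below does not use Big_int. It swaps in a hand-rolled digit loop that rejects the input as soon as the value would exceed the bound, so the accumulator never overflows on any platform. The name parse_bounded and its ~bound parameter are hypothetical.

```ocaml
(* Hypothetical helper: parse an unsigned decimal string, accepting
   only values in [0, bound]. Substitute for the Big_int approach. *)
let parse_bounded ~(bound : int) (s : string) : int option =
  let len = String.length s in
  let rec go i acc =
    if i = len then Some acc
    else
      match s.[i] with
      | '0' .. '9' as c ->
        let d = Char.code c - Char.code '0' in
        (* Check before multiplying: acc * 10 + d <= bound holds
           exactly when d <= bound and acc <= (bound - d) / 10. *)
        if d > bound || acc > (bound - d) / 10 then None
        else go (i + 1) (acc * 10 + d)
      | _ -> None
  in
  if len = 0 || bound < 0 then None else go 0 0
```

With a bound such as 16000 the result is the same on 32-bit and 64-bit systems, because no intermediate value ever gets near max_int.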

So the 'bug' in int_of_string is just an inconvenience.

IMHO there is a 'bug' in some Ocaml documentation, where the
abstract language is not clearly distinguished from the
implementation. Throwing exceptions on error should generally
NOT be considered a specified part of the language.

Undefined behaviour is sometimes the right specification because it
allows superior optimisation and prevents programmers
relying on exceptions. This doesn't prevent the implementation
throwing them, it just means catching them locally in your
code is a bug (because you can't be sure they will be thrown).

Bounds violations are a good example of this, and indeed
since Ocaml allows -unsafe switch to disable bound checks
you'd better NOT rely on catching them. The same applies
to match failures -- use a wildcard if you want to catch
unmatched cases (otherwise be willing to sketch a proof
to your boss that there can't be a violation :)
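
A minimal sketch of the alternative to catching the bounds exception: test the index explicitly, so the code behaves the same with or without -unsafe. The name get_opt is made up for this example.

```ocaml
(* Hypothetical helper: explicit bounds test instead of relying on
   Invalid_argument being raised by a.(i). *)
let get_opt (a : 'a array) (i : int) : 'a option =
  if i >= 0 && i < Array.length a then Some a.(i) else None
```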


--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Florian Weimer

Mar 30, 2007, 3:32:27 AM
to Yaron Minsky
* Yaron Minsky:

> That's a problem too, but there is at least a defensible reason for
> that, which is that it is expensive to get integer overflow to throw
> an exception.

i386 and amd64 have hardware support for that, so it's not very
expensive. There are pretty short RISC sequences for the checks, too.

MLton uses the i386 hardware support, and I think you can disable the
checks, so measuring the overhead shouldn't be too hard.

skaller

Mar 30, 2007, 4:46:33 AM
to Florian Weimer
On Fri, 2007-03-30 at 09:30 +0200, Florian Weimer wrote:
> * Yaron Minsky:
>
> > That's a problem too, but there is at least a defensible reason for
> > that, which is that it is expensive to get integer overflow to throw
> > an exception.
>
> i386 and amd64 have hardware support for that, so it's not very
> expensive. There are pretty short RISC sequences for the checks, too.
>
> MLton uses the i386 hardware support, and I think you can disable the
> checks, so measuring the overhead shouldn't be too hard.

But there is a difference you may have missed: Ocaml integers
are 31 or 63 bits, not 32 or 64 bits.
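
The width can be checked from within a program. A small sketch, assuming OCaml 4.03 or later for Sys.int_size; the binding names are illustrative:

```ocaml
(* Sys.int_size is the number of bits in an int: 31 or 63 on the
   usual native and bytecode systems. *)
let int_bits = Sys.int_size

(* Shifting 1 into the top bit wraps to min_int; subtracting 1 from
   that wraps back to max_int. *)
let computed_max = (1 lsl (int_bits - 1)) - 1

let () =
  Printf.printf "int has %d bits, max_int = %d\n" int_bits max_int
```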

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Andreas Rossberg

Mar 30, 2007, 5:03:18 AM
to skaller
skaller wrote:
>>
>>> That's a problem too, but there is at least a defensible reason for
>>> that, which is that it is expensive to get integer overflow to throw
>>> an exception.
>> i386 and amd64 have hardware support for that, so it's not very
>> expensive. There are pretty short RISC sequences for the checks, too.
>>
>> MLton uses the i386 hardware support, and I think you can disable the
>> checks, so measuring the overhead shouldn't be too hard.
>
> But there is a difference you may have missed: Ocaml integers
> are 31 or 63 bits, not 32 or 64 bits.

But it uses the most significant 31/63 bits for ints, so that becomes a
non-issue. ;-)
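
The point about the tag bit can be modelled in a few lines. This is only a sketch of the usual 2n + 1 encoding, with Int64 standing in for a 64-bit machine word. The names tag, untag and tagged_add are illustrative, and this is not what the code generator actually emits.

```ocaml
(* Model: an int n is stored as the machine word 2n + 1. *)
let tag (n : int) : int64 =
  Int64.add (Int64.shift_left (Int64.of_int n) 1) 1L

let untag (w : int64) : int =
  Int64.to_int (Int64.shift_right w 1)

(* (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1, so a tagged addition is
   one word addition plus a correction for the extra tag bit. *)
let tagged_add (x : int64) (y : int64) : int64 =
  Int64.sub (Int64.add x y) 1L
```

Because the payload sits in the upper 63 bits of the word, on a 64-bit system the word addition for max_int + 1 overflows at exactly the point where the 63-bit int does.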

--
Andreas Rossberg, ross...@ps.uni-sb.de

skaller

Mar 30, 2007, 5:22:21 AM
to Andreas Rossberg
On Fri, 2007-03-30 at 10:59 +0200, Andreas Rossberg wrote:
> skaller wrote:
> >>
> >>> That's a problem too, but there is at least a defensible reason for
> >>> that, which is that it is expensive to get integer overflow to throw
> >>> an exception.
> >> i386 and amd64 have hardware support for that, so it's not very
> >> expensive. There are pretty short RISC sequences for the checks, too.
> >>
> >> MLton uses the i386 hardware support, and I think you can disable the
> >> checks, so measuring the overhead shouldn't be too hard.
> >
> > But there is a difference you may have missed: Ocaml integers
> > are 31 or 63 bits, not 32 or 64 bits.
>
> But it uses the most significant 31/63 bits for ints, so that becomes a
> non-issue. ;-)

For addition maybe, certainly not for multiplication: one of the
operands has to be shifted right 1 place.

But it depends on the code generator internal details. You could
always shift BOTH operands, do the register calculation, then
shift back .. in which case you'd lose overflow detection.
The problem is you cannot use the carry bit after the shift back
because the bit will definitely be set if the result is negative.

From what I've seen Ocaml actually uses tricks which also might
defeat detection, for example I've seen some use of LEA
(load effective address) with the scale by 2 option to
load and shift one bit in a single instruction.

Processors are quirky about flag bits .. some set sign bit
on loading and others don't, etc, so it could be quite messy.
This is why C doesn't specify what happens on overflow:
it would compromise performance on some processors.
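
Whatever the flag bits do, overflow in multiplication can also be detected in software by dividing the product back. A minimal sketch follows; checked_mul and Overflow are hypothetical names, and this is a portable check, not something the OCaml code generator does.

```ocaml
(* Hypothetical exception and helper, for illustration only. *)
exception Overflow

let checked_mul (a : int) (b : int) : int =
  if a = 0 || b = 0 then 0
  else if a = -1 then
    (* -min_int is not representable; handled here so the division
       below is never asked to compute min_int / -1. *)
    if b = min_int then raise Overflow else - b
  else
    let p = a * b in
    (* If the product wrapped, dividing it back cannot give b. *)
    if p / a <> b then raise Overflow else p
```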

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Markus Mottl

Mar 30, 2007, 9:42:58 AM
to Erik de Castro Lopo
On 3/30/07, Erik de Castro Lopo <mle+...@mega-nerd.com> wrote:
> But 64 bit systems get it right.

Not really:

# int_of_string "4611686018427387903";;
- : int = 4611686018427387903
# int_of_string "4611686018427387904";;
- : int = -4611686018427387904
# int_of_string "4611686018427387905";;
Exception: Failure "int_of_string".

The problem is just shifted to bigger numbers. This problem arises
with all integer conversion functions, i.e. Int64.of_string,
Int32.of_string, Nativeint.of_string, int_of_string.

Regards
Markus

--
Markus Mottl http://www.ocaml.info markus...@gmail.com

Toby Kelsey

Apr 3, 2007, 1:54:46 PM
to caml-list
Markus Mottl wrote:

> The problem is just shifted to bigger numbers. This problem arises
> with all integer conversion functions, i.e. Int64.of_string,
> Int32.of_string, Nativeint.of_string, int_of_string.
>
> Regards
> Markus

This bug is not just a conversion problem:

# let x = 1073741824;;
val x : int = -1073741824
# (x < 0) && (x >= -x);;
- : bool = true

Toby

ls-ocaml-de...@m-e-leypold.de

Apr 3, 2007, 6:33:35 PM
to caml-list

Toby Kelsey <toby....@gmail.com> writes:

> Markus Mottl wrote:
>
>> The problem is just shifted to bigger numbers. This problem arises
>> with all integer conversion functions, i.e. Int64.of_string,
>> Int32.of_string, Nativeint.of_string, int_of_string.
>> Regards
>> Markus
>
> This bug is not just a conversion problem:
>
> # let x = 1073741824;;
> val x : int = -1073741824
> # (x < 0) && (x >= -x);;
> - : bool = true


# let x = - 1073741824;;
val x : int = -1073741824

# -x;;
- : int = -1073741824

But this is as specified for modular ints. No surprise here ...
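
The same thing can be stated without hard-coding the 32-bit constant. A small sketch; safe_neg is a hypothetical helper for code that must not be surprised by this:

```ocaml
(* In modular arithmetic min_int is its own negation. *)
let neg_min_is_min = (- min_int = min_int)
let abs_min_is_negative = (abs min_int < 0)

(* Hypothetical helper: negation that reports the one bad case. *)
let safe_neg (x : int) : int option =
  if x = min_int then None else Some (- x)
```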


Regards -- Markus
