Thanks
Haskell has an Integer type, which is arbitrary precision, and an Int
type, which is a machine int that can overflow with silent wraparound.
The usual document to look for in such a case is the Haskell98 Report.
>> I mean when you have two
>> integers, what happens when you multiply them?
You can't multiply them directly, you first have to convert one of them
to other type. Usually, you'd convert Int to Integer, and then multiply
two Integers, which cannot overflow (though you can run out of memory).
>> Is overflow checked?
> Haskell has an Integer type, which is arbitrary precision, and an Int
> type, which is a machine int that can overflow with silent wraparound.
Also note that the precision of an Int isn't necessarily a multiple of
eight bits. The Haskell98 Report gives a lower bound:
The finite-precision integer type Int covers at least the range
[-2^29 , 2^29 - 1].
- Dirk
You can multiply Int by Int:
Prelude> let a = 7777777777 :: Int
Prelude> a
7777777777
Prelude> a*a
5153594927266406881
Prelude> a*a*a
-6296163071503143855
Compare results with:
Prelude> let b = 7777777777 :: Integer
Prelude> b*b
60493827148395061729
Prelude> b*b*b
470507544440466392332359396433
Int is evil. I didn't even notice at first that the a*a result is
wrong.
> Int is evil. I didn't even notice at first that the a*a result is
> wrong.
abs minBound :: Int was a bit of a surprise for me. (-:
Mark
Well, if one knows that two's complement is assymetric by design ... :-)
And it's a good idea to be *very* careful when handling extreme values
in a given type, anyway. Which for me normally means converting them to
a larger type at first opportunity. Unless I'm really sure all
comparisons etc. work out, and the compiler is not going to introduce
errors by "optimizing" the code. (No matter if C or Haskell).
- Dirk
I asked because I have introduced an overflow checking option in my
C compiler system lcc-win.
What is interesting is that the responsability of handling overflows
apparently is the business of the programmer in Haskell, the same as
in C.
The problem with that approach (hence my modification to my compiler)
is that in most cases you do not EXPECT the overflow so you will NOT
take the measures that *could* be taken because when writing code you
do not think about all possible stuff.
In that sense, haskell is no different as C is now. I wanted to know
what is the situation in "higher level" languages like Haskell to gather
ideas. The answer is surprising.
jacob
---
lcc-win, a free C compiler
http://www.cs.virginia.edu/~lcc-win32
> >> I would like to know if you could point me to a document specifying
> >> the behavior of haskel with integer overflow. I mean when you have
> >> two integers, what happens when you multiply them? Is overflow
> >> checked?
> >
> > Haskell has an Integer type, which is arbitrary precision, and an
> > Int type, which is a machine int that can overflow with silent
> > wraparound.
>
> I asked because I have introduced an overflow checking option in my C
> compiler system lcc-win.
>
> What is interesting is that the responsability of handling overflows
> apparently is the business of the programmer in Haskell, the same as
> in C.
>
> The problem with that approach (hence my modification to my compiler)
> is that in most cases you do not EXPECT the overflow so you will NOT
> take the measures that *could* be taken because when writing code you
> do not think about all possible stuff.
>
> In that sense, haskell is no different as C is now. I wanted to know
> what is the situation in "higher level" languages like Haskell to
> gather ideas. The answer is surprising.
If overflow should be an error condition, then it's simply wrong to use
the Int type. Use a type, which offers the precision you need. If you
are not sure, use Integer. If you _want_ to limit the precision of your
caluclations for some reason, then your code needs to be aware of it
anyway, so throwing exceptions or aborting the program is probably not
the right way to go. Instead make a type, which implements the logic
you need and turn it into a Num instance.
Greets,
Ertugrul.
--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/
I think that is wrong. From http://www.pphsg.org/cdsmith/types.html :
I find it amusing when novice programmers believe their main job
is preventing programs from crashing... More experienced
programmers realize that correct code is great, code that crashes
could use improvement, but incorrect code that doesn't crash is a
horrible nightmare.
There is a separate set of types (Data.Word) for machine words that
have computer-like wraparound behavior. Int is supposed to represent
integers but in a limited and optimized form. Integer has rather high
overhead for the sake of supporting numbers so large that they don't
occur in most programs. Int is what we'd normally use when the
numbers are expected to be in range. In that situation, if the number
is unexpectedly out of range, that is a genuine exception condition
and throwing an exception is exactly the right thing to do. The
alternative is nasty failures like the ones afflicting old C programs
written with the entirely reasonable (at the time) expectation that
file sizes would fit in 32-bit ints. It wasn't that long ago that we
started actually seeing > 4GB files. It's better for those programs
to throw exceptions than fail in weird and possibly exploitable ways.
Floating point arithmetic is supposed to be a limited, optimized
representation of real numbers. When a float is out of range, there
is an arithmetic exception. Ints should be treated similarly. The
current mechanism is just evil.
> If you _want_ to limit the precision of your caluclations for some
> reason, then your code needs to be aware of it anyway, so throwing
> exceptions or aborting the program is probably not the right way to
> go. Instead make a type, which implements the logic you need and
> turn it into a Num instance.
And the trouble with this is that a lot of built-in functions like
"take", "drop", and "length" expect and return Int values. So you
need ugly conversions all over the place, where Int is a quite natural
and reasonable choice, even though it has some chance of unexpected
failure, say for taking the length of a lazy list that is larger than
2**32.
Not really. The practice in Haskell is that normally, you just
use Integers, so you don't have to check for overflows because there
are none. You can't do that in C (unless you use special libraries).
If you really care about performance, you can replace Integer's by
Int's, but the trade-off is that in that case it's indeed the
programmer's responsibility to make sure there are no overflows.
For example, by inspecting the code manually and calculating
the minimum/maximum possible value, and restricting only
the input values (say). Absence of side-effects makes that easy.
But automatic overflow checking would be contraproductive in that
situation, since the only reason for using Int's is performance,
and overflow checks everywhere would kill the performance gain.
Nevertheless, if for some perverse reason you'd still want this,
you could just define your own CheckedInt type by introducing another
Num instance. Another thing you can't do in C. :-)
- Dirk
Maybe have the checking happen with Control.Exception.assert, so it gets
fast automatically when you turn optimization on. At least it might be
useful if your unit tests run on the unoptimized code are good enough.
We have some generic- functions now, but the number of things
implemented in the standard libraries with Int -- IntSets, for example
-- doesn't make it very convenient to just use CheckedInt everywhere
instead.
Mark
The problem, precisely, is that it is VERY difficult to know if overflow
can be an error condition.
That is precisely the point here.
Requirements change. Quantities that would have fit in 31 bits do
not fit any more. The file size goes beyond 1GB for instance,
a table that handles an image grows too much to cope with a
resolution of 6560 x 1900 (two monitors), etc etc.
Testing for overflow costs essentially zero. I do not understand
why this is not done, since an absence of tests means that
a+b
can be WRONG. If the system traps on overflow at least we
KNOW there is a problem instead of going on with wrong results!
No. In my implementation overflow checking is essentially free.
The generated code looks like this (for an x86 implementation)
(1) do the operation (add,subtract, etc)
(2) jo errlab1 (jump on overflow to errlab1)
(3) retlab1:
(4) The rest of the program
(5) errlab1: call the error handler
(6) if the error label returns goto retlab1
(i.e. ignore overflow)
Note that the check for overflow is a SINGLE INSTRUCTION
(jo). This jump on overflow will be correctly predicted as
not taken since it is a forward branch. This means that
the pipeline will not be disturbed in any way.
Replacing all arithmetic operations with checked operations
produces NO NOTICEABLE performance loss, i.e. the overhead
is NOT MEASURABLE!
The myth of performance loss by checking for overflow is just
that: A MYTH.
jacob
>> Nevertheless, if for some perverse reason you'd still want this,
>> you could just define your own CheckedInt type by introducing another
>> Num instance. Another thing you can't do in C. :-)
> We have some generic- functions now, but the number of things
> implemented in the standard libraries with Int -- IntSets, for example
> -- doesn't make it very convenient to just use CheckedInt everywhere
> instead.
That's true. OTOH, these operations aren't likely to overflow (IntSets
don't do arithmetic, for instance). And you could still convert
between CheckedInt's and Int's for those operations, checking during
the conversion. If it get's inconvienient, write a quick wrapper.
But anyway, I'm not advocating it to do it this way. The overhead
alone is a good reason not to.
- Dirk
I don't, because I have been bitten by it in the past. It's a classical
case of premature optimization.
So now I use Integer's by default everywhere, I convert the few Int's
that still creep in as soon as possible. And I only switch to Int's
if I want to tweak performance. And then I watch for possible overflows
very carefully first.
> Floating point arithmetic is supposed to be a limited, optimized
> representation of real numbers.
No, it's not optimized. It's a compromise to represent an uncountable
datatype in a finite size, at all. You can't compare this situation to
Int and Integer, because there is no equivalent to Integer for real
numbers.
> When a float is out of range, there is an arithmetic exception.
Actually, the IEEE standard works hard to make it *not* an exception,
but to go on with various pseudo-values as result. And that still
doesn't take care of the numerical problems you have because of the
finite precision. If you have a system of linear equations that's
badly conditioned, the limited precision can make your result
completely wrong. All without causing any exception.
> Ints should be treated similarly. The current mechanism is just evil.
Only if you use Int's when you're not supposed to :-)
- Dirk
>> But automatic overflow checking would be contraproductive in that
>> situation, since the only reason for using Int's is performance,
>> and overflow checks everywhere would kill the performance gain.
> No. In my implementation overflow checking is essentially free.
Even if the cost is low, it's the wrong approach, because you
can do much better with the extreme cases: Either you use Integer's,
and don't have to worry about overflows at all (but it's slower),
or you use Int's, but then you'd better make damn sure that they
are no overflows. In the last case, you want it as fast as possible,
so you want manual control over the checks (with as few checks
as necessary).
The in-the-middle-approach is IMHO pretty useless, because you
combine the disadvantages of both cases: The program can fail
with overflows, and it doesn't run as fast as it could.
> The generated code looks like this (for an x86 implementation)
> (1) do the operation (add,subtract, etc)
> (2) jo errlab1 (jump on overflow to errlab1)
That doesn't work if the representation isn't just a vanilla number.
There's a reason the standard guarantees only 29 bits precision.
I think we should EOT here.
- Dirk
Then you just test the 2 higher bits (or similar)
test $0xC0000000,result
jnz errlab1
The overhead would be 3 instructions (jo + 2 tests), without
any pipeline turbulence, i.e. less than 0.1%...
> I think we should EOT here.
>
If you say so.
OK.
jacob
I just don't agree with this. You might be damn sure the ints won't
overflow, but the runtime should check anyway, just like you shouldn't
use array subscripts unless you're damn sure they're in bounds, but
the runtime should check those too. Array subscript overruns are a
notorious source of bugs in all kinds of code, and in each of them the
implementer was damn sure that the subscripts wouldn't overrun. It's
the same with int overflow.
In a supposedly safe language, static guarantees are great if you can
get them. In their absence, dynamic checks can keep your program from
running amuck. But mere strength of conviction on the programmer's
part is never enough. If you want wraparound operations, use the Word
types. The default types for very standard operations like "length"
should not have unsafe behavior.
Note that if the compiler is sufficiently Int aware, it can lift quite
a lot of overflow checks out of loops, similar to what's done with
array bounds checks. That's probably harder to do with user-defined
types that attempt to implement safe Ints. Safe should be the default
and if necessary, unsafe can be available on explicit request.
I said EOT because you're obviously in love with the concept, so
discussion is pretty futile. But this
> Then you just test the 2 higher bits (or similar)
> test $0xC0000000,result
> jnz errlab1
just doesn't work: As a counterexample to your code, try e.g. adding (-1)
and (-1).
The overflow flag is the XOR of the carry coming into the sign bit
with the carry going out of the sign bit. That's easy to implement on
the gate level, so special instructions like x86 "jo" are cheap.
But these special instructions don't work when the sign bit is not
the MSB of the register, or when the compiler uses C code as target
language instead of emitting assembly (as some Haskell compilers do),
or when you implement the checking directly in Haskell. In those cases,
the costs are much higher, and you usually use a different method
(like comparing the signs of the operands and the result).
- Dirk
> In a supposedly safe language, static guarantees are great if you can
> get them. In their absence, dynamic checks can keep your program from
> running amuck. But mere strength of conviction on the programmer's
> part is never enough.
It's enough if there is a safe alternative, and if you use Int's only
if you really do want the performance gain (and as a payment, if you
have checked the code). That's not that hard to do formally, you
assign ranges to all Int arguments/results and then work forward or
backward to ensure the ranges are conservative enough. And then you
can insert dynamic checks at points where there matter.
Of course if you use Int's when you just have a hunch the values might
be small enough, then yes, this isn't enough, and if you prefer testing
to a formal proof, then yes, overflow checks everywhere are useful.
But that's a question of the approach you take.
- Dirk
Of course there is. You could use IntMaps instead of arrays, for example.
> Of course if you use Int's when you just have a hunch the values might
> be small enough, then yes, this isn't enough, and if you prefer testing
> to a formal proof, then yes, overflow checks everywhere are useful.
>
> But that's a question of the approach you take.
Reality is that while Haskell has more formal machinery than something
like C, it is not Agda and it is full of partial functions. For
example it's possible to attempt pattern matching or using "head" on
an empty list, and so forth. The runtime does the only sane thing and
throws exceptions when those things happen.
I think the more normal implementation of tag bits is in the low order
bits of the word, so you would just add normally and use "jo" to
check for overflow.
> or when the compiler uses C code as target language instead of
> emitting assembly (as some Haskell compilers do),
You'd tweak the generated code to generate the best instruction
sequence via your favorite C compiler, and hope for the best
(performance-wise, not correctness-wise) with other compilers.
> or when you implement the checking directly in Haskell. In those
> cases, the costs are much higher, and you usually use a different
> method (like comparing the signs of the operands and the result).
That's why the knowledge about low level overflow checking should be
built into the compiler rather than leaving the poor user to his own
devices.
If that were really the practice, "length" would return an Integer and
not an Int. The actual practice, as observed in the actual world, is
that Ints are used all over the place where Integer would be safer but
would cause an unacceptable performance hit.
I don't know that I'd call something an optimization if the compiler
and language are gravitate so much towards doing it by default. An
optimization is where you go out of your way hairing up the code in
the hope of making it faster. If you just do the simplest and most
obvious thing, that's not an optimization. The problem is that in
Haskell, doing the simplest and most obvious thing results in wrong
code.
> But automatic overflow checking would be contraproductive in that
> situation, since the only reason for using Int's is performance,
> and overflow checks everywhere would kill the performance gain.
You're missing the point, I think: x86 CPUs can do automatic overflow
on int checking, it's just that the C standrd doesn't allow for it, so
it's almost never used. There's no reason why Haskell couldn't use it
(it probably exists on all platforms that Haskell exists on). The
reason it doesn't is almost certainly because the early
implementations where either interpreters written in C derived
languages or copmilers that compiled to C code as an intermediate
language.
> Nevertheless, if for some perverse reason you'd still want this,
> you could just define your own CheckedInt type by introducing another
> Num instance. Another thing you can't do in C. :-)
That might actually be something sensible to propose to the ghc crowd
:)
Phil
--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
> If that were really the practice, "length" would return an Integer and
> not an Int.
That's a historic accident, and has been partly corrected with the
"generic*" variants.
- Dirk
Are you saying the x86 has an integer overflow trap?! I'm no expert
but I don't remember anything like that from perusing the manuals.
Are you sure?
> it's just that the C standrd doesn't allow for it, so it's almost
> never used. There's no reason why Haskell couldn't use it (it
> probably exists on all platforms that Haskell exists on).
?????
A little googling shows that the Vax had something like that (it had
everything including the kitchen sink) but I don't see mention of
the x86 having it.
> > Nevertheless, if for some perverse reason you'd still want this,
> > you could just define your own CheckedInt type by introducing another
> > Num instance. Another thing you can't do in C. :-)
>
> That might actually be something sensible to propose to the ghc crowd :)
Only if all the functions (like "length") that currently use unchecked
ints switched to checked ints. But then the checked int type should
simply be called "Int". After all, there already is an assortment of
unchecked int types in Data.Word.
Not a trap. A conditional jump on overflow. That should have been
obvious from my message describing the generated code for checked
operations. Again, it looks like this
(1) You do the operation. This sets the overflow flag if there is
overflow, zeroes it if not.
(2) You do a conditional jump on the overflow flag. If it is one
the jump is taken, otherwise the processor continues with the next
instruction.
> I'm no expert
> but I don't remember anything like that from perusing the manuals.
> Are you sure?
>
The manuals repeat at each instruction the flags affected. Just read
them again. Addition, subtraction, multiplication and division may
cause the overflow being set.
>> it's just that the C standrd doesn't allow for it, so it's almost
>> never used.
This is actually wrong. The C standard says that overflow is
undefined behavior.
>> There's no reason why Haskell couldn't use it (it
>> probably exists on all platforms that Haskell exists on).
>
> ?????
> A little googling shows that the Vax had something like that (it had
> everything including the kitchen sink) but I don't see mention of
> the x86 having it.
>
There is no known processor that can't detect overflow. Maybe some
embedded micro-controller for coffee machines doesn't have it but it
is unlikely that haskell would run in such a machine.
>>> Nevertheless, if for some perverse reason you'd still want this,
>>> you could just define your own CheckedInt type by introducing another
>>> Num instance. Another thing you can't do in C. :-)
>> That might actually be something sensible to propose to the ghc crowd :)
>
> Only if all the functions (like "length") that currently use unchecked
> ints switched to checked ints. But then the checked int type should
> simply be called "Int". After all, there already is an assortment of
> unchecked int types in Data.Word.
Int overflow should trap, because it is an error, the result is WRONG!
I did this in my C compiler, and as told before the performance loss
is not measurable...
Now, the 29 bit problem can be disposed very easily, either masking the
higher bits before the operation or testing them afterwards. Note that
the example you gave me (adding -1 and -1) will not provoke overflow,
so the tests can be ignored.
Oh ok, that's not what I would have described as automatic, but whatever.
> There is no known processor that can't detect overflow.
What I meant about the vax is I think it had an actual overflow trap,
so if you enabled it, "add a,b" would trap on int overflow without
your having to execute an extra instruction to check a flag in the
non-overflow case.
Hi Jacob,
I think that your approach is right for C, checking for overflow on
_signed_ int is a _good_ thing as soon as there is no performance loss
(say <2-3%). But asking for the behavior of Int in a functional
language is not appropriate since most (all?) functional languages
deal with pointers in a specific way and store tags as part of the
address, typically 2 bits on 32 bits arch and 4 bits on 64 bits arch,
and require that the size of pointers be greater or equal to the size
of integer. These tags allow to reinterpret the pointer as a stored
integer value (after masking and/or shifts) and avoid boxed value
(object) allocation, but also to make the difference between pointer
to object and pointer to function required to handle functions as
values transparently. Haskell implementations use extra free tags to
identify unevaluated expression (laziness) or type's constructors to
optimize pattern matching. So in principle they have to check for
overflow (or mask the result) to avoid the corruption of these tags,
unless type and strictness analysis allow to work locally on plain
unboxed values at full speed (common).
You can google for "dynamic pointer tagging" to have more information
on the implementation details.
a+, ld.
I think that's what I meant :)
There probably are C implementations out there that do check for
overflow, but the major ones don't & you can't write portable code
that relies on overflow checking because there's not even an adhoc
universal standard for it. Gcc compiled code just aborts if you set
the -ftrapv option for instance, and even then it doesn't catch all
the potential problems.
On a modern OoO CPU, a 'not taken' branch is almost cost
free. There's a (small) amount of code bloat of course, so it's not
*entirely* free.
It's insane that no modern language gives access to checkable overflow
on machine words: pretty much every CPU can do it & integer overflow
has been a source of security issues for years.
>> There is no known processor that can't detect overflow.
>
> What I meant about the vax is I think it had an actual overflow trap,
> so if you enabled it, "add a,b" would trap on int overflow without
> your having to execute an extra instruction to check a flag in the
> non-overflow case.
Sure, that's the other way to implement it. You lose the ability to
branch to local cleanup code though.
But why should a "modern language" deal with such low level cruft?
There should just be a safe int type that traps on overflow, either
through generated code or through a machine trap.
> > What I meant about the vax is I think it had an actual overflow trap,
>
> Sure, that's the other way to implement it. You lose the ability to
> branch to local cleanup code though.
Just set up the trap handler appropriately. In Haskell though, if I
understand properly, any exception is an IO effect, with no local
cleanup allowed.
I agree completely.
>> > What I meant about the vax is I think it had an actual overflow trap,
>>
>> Sure, that's the other way to implement it. You lose the ability to
>> branch to local cleanup code though.
>
> Just set up the trap handler appropriately. In Haskell though, if I
> understand properly, any exception is an IO effect, with no local
> cleanup allowed.
If it's just memory, then the garbage collector will deal with it. If
it's some other object that needs cleanup, then you'll have to keep
track of it in state which the exception handler can access[1]. Haskell
is no different to any other language in that respect.
Phil
[1] You could probably do something with ForeignPtrs & finalisers to
handle the cleanup, but then you've got the same problem the Java
people had/have with unpredictable finalisation.
> Paul Rubin <http://phr...@nospam.invalid> wrote:
(snip)
>> Just set up the trap handler appropriately. In Haskell though, if I
>> understand properly, any exception is an IO effect, with no local
>> cleanup allowed.
>
> If it's just memory, then the garbage collector will deal with it. If
> it's some other object that needs cleanup, then you'll have to keep
> track of it in state which the exception handler can access[1]. Haskell
> is no different to any other language in that respect.
I'd have liked to have available a non-IO mechanism for synchronous
exceptions from the current thread, that evaluates all the other thunks
from the relevant expression and hands you a set of exceptions so you
can't tell what the evaluation order was, but people seemed to think
this very silly when I suggested it years ago so I kept quiet about it
since! Of course, Either lets me roll it by hand for exceptions I throw
manually, but it clutters code rather.
Of course, cleaning up objects probably involves IO operations in the
handler anyway, but an arithmetic exception (or taking the head of a
list) needn't.
Mark
> > If overflow should be an error condition, then it's simply wrong to
> > use the Int type. Use a type, which offers the precision you need.
> > If you are not sure, use Integer.
>
> I think that is wrong. From http://www.pphsg.org/cdsmith/types.html :
>
> I find it amusing when novice programmers believe their main job
> is preventing programs from crashing... More experienced
> programmers realize that correct code is great, code that crashes
> could use improvement, but incorrect code that doesn't crash is a
> horrible nightmare.
>
> There is a separate set of types (Data.Word) for machine words that
> have computer-like wraparound behavior. Int is supposed to represent
> integers but in a limited and optimized form. Integer has rather high
> overhead for the sake of supporting numbers so large that they don't
> occur in most programs. Int is what we'd normally use when the
> numbers are expected to be in range. In that situation, if the number
> is unexpectedly out of range, that is a genuine exception condition
> and throwing an exception is exactly the right thing to do. The
> alternative is nasty failures like the ones afflicting old C programs
> written with the entirely reasonable (at the time) expectation that
> file sizes would fit in 32-bit ints. It wasn't that long ago that we
> started actually seeing > 4GB files. It's better for those programs
> to throw exceptions than fail in weird and possibly exploitable ways.
Adding overflow checks to Int will make it just as slow as Integer.
Integer prefers machine words, where possible. It's better to use it
right away, because it's not slower than overflow-checked Int, but it
behaves correctly when an overflow does occur instead of causing your
program to crash (or forcing the programmer to place exception handlers
everywhere -- note also that you can only handle exceptions in the IO
monad).
As said, if you want to actually handle overflow in a non-fatal way, use
an appropriate type, e.g. Maybe Int32 or your own creation. Add a
suitable Num instance. Then you can handle it without exceptions in a
natural way, including all the goodies of monads and pattern matching.
> Floating point arithmetic is supposed to be a limited, optimized
> representation of real numbers. When a float is out of range, there
> is an arithmetic exception. Ints should be treated similarly. The
> current mechanism is just evil.
Int is supposed to be a good type for small integers like counting
(small) things and doing logic with only few bits. It is usually well
optimized and thus suitable for that kind of work. Overflow checking is
not necessary for it and would only make it slower without any real use.
If it were overflow-checked most of my arithmetic code would get much
slower unnecessarily.
You wouldn't want 'length' to return an Integer or do overflow checking
for virtually all purposes. If you do, there is 'genericLength'.
> > If you _want_ to limit the precision of your caluclations for some
> > reason, then your code needs to be aware of it anyway, so throwing
> > exceptions or aborting the program is probably not the right way to
> > go. Instead make a type, which implements the logic you need and
> > turn it into a Num instance.
>
> And the trouble with this is that a lot of built-in functions like
> "take", "drop", and "length" expect and return Int values. So you
> need ugly conversions all over the place, where Int is a quite natural
> and reasonable choice, even though it has some chance of unexpected
> failure, say for taking the length of a lazy list that is larger than
> 2**32.
See the generic list functions, but as said, you won't need them for all
but very exotic purposes.
Greets,
Ertugrul.
--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/
> > If overflow should be an error condition, then it's simply wrong to
> > use the Int type.
>
> The problem, precisely, is that it is VERY difficult to know if
> overflow can be an error condition.
>
> That is precisely the point here.
The programmer does know. After all you should know whether an overflow
will be fatal and prevent your application from continuing or not. The
language and library designers don't need to know. They simply provide
both.
> Requirements change. Quantities that would have fit in 31 bits do not
> fit any more. The file size goes beyond 1GB for instance, a table
> that handles an image grows too much to cope with a resolution of 6560
> x 1900 (two monitors), etc etc.
>
> Testing for overflow costs essentially zero. I do not understand why
> this is not done, since an absence of tests means that
>
> a+b
>
> can be WRONG. If the system traps on overflow at least we KNOW there
> is a problem instead of going on with wrong results!
Because using Integer costs the same: essentially zero. As said in
another post, it uses machine integers where possible. Overflow
checking will have the same overhead.
It's not done, because Int is not supposed to be used that way. We have
Integer for that, which can never overflow. If you want to make
overflow a non-fatal error, then exceptions are not the right way to go,
at least not in Haskell. After all we have an extremely powerful type
system for some reason. Using exceptions for this purpose is like
making fun of algebraic datatypes.
> >> Not really. The practice in Haskell is that normally, you just use
> >> Integers,
> >
> > If that were really the practice, "length" would return an Integer
> > and not an Int.
>
> That's a historic accident, and has been partly corrected with the
> "generic*" variants.
I don't think it's an accident. There is a very good reason to prefer
Int over Integer in this place.
> > Even if the cost is low, it's the wrong approach, because you can do
> > much better with the extreme cases: Either you use Integer's, and
> > don't have to worry about overflows at all (but it's slower), or you
> > use Int's, but then you'd better make damn sure that they are no
> > overflows.
>
> I just don't agree with this. You might be damn sure the ints won't
> overflow, but the runtime should check anyway, just like you shouldn't
> use array subscripts unless you're damn sure they're in bounds, but
> the runtime should check those too. Array subscript overruns are a
> notorious source of bugs in all kinds of code, and in each of them the
> implementer was damn sure that the subscripts wouldn't overrun. It's
> the same with int overflow.
That's a useless comparison. You don't create arrays of gigantic sizes
and use only a tiny part of them somewhere in the middle. Instead you
make them as small as possible to exploit them fully and not waste any
space. Hence in normal applications you always approach the edge of an
array, so there is a good reason to check indices for overflows.
Int is different. No serious application exploits anywhere near the
full range of an Int. Most can guarantee that it will never overflow or
at least make the probability of overflow neglibly small.
> In a supposedly safe language, static guarantees are great if you can
> get them.
You have them. It's guaranteed that Int has 29 bits at least. If
that's not enough, use Integer.
> Note that if the compiler is sufficiently Int aware, it can lift quite
> a lot of overflow checks out of loops, similar to what's done with
> array bounds checks. That's probably harder to do with user-defined
> types that attempt to implement safe Ints. Safe should be the default
> and if necessary, unsafe can be available on explicit request.
It's equally easy for the compiler to make Integers behave like Ints,
where possible, because the condition is exactly the same.
This is just not true. Please read the other posts in this thread.
The measurements I did point to an overhead for overflow checking
of less than 0.01%
> If it were overflow-checked most of my arithmetic code would get much
> slower unnecessarily.
>
Again you repeat this without any measurement data to back it up...
A little searching reveals that SPJ offered to imlement a CheckedInt
back in 1999 & no-one took him up on it at the time.
As a side issue, I do find this particular behaviour a little
surprising:
$ ghci
GHCi, version 6.10.4: http://www.haskell.org/ghc/ :? for help
...
Prelude> (fromInteger 2^31) :: Int
-2147483648
Prelude> (fromInteger 21474836499) :: Int
19
Holy Integer wraparound batman!
Naïvely, I would have expected an exception, or at least truncation to
INT_MAX from a conversion from Integer -> Int where the value didn't
fit in an Int. If I wanted a raw bit conversion of the least
significant word to a C Int I'd have asked for it.
The speed difference between Int and Integer is very noticable and
it's not just from the overflow checks. I don't believe (at least
with GHC 6.8.3) that Integer ever uses machine ints. Anyway, what
would it do with STUArray Integer or Uvector Integer?
You can of course write a safeFromInteger wrapper around fromInteger,
which isn't hard, but it's a gotcha that's waiting to trip people up.
Unfortunately that is naive wishful thinking, as the long and sad
history of integer overflow exploits (similar to buffer overflow
exploits) continues to show. I just got a security update for some
XML library on Ubuntu a few hours ago because of such an overflow.
Here's one in the Java runtime from a few months ago:
http://sunsolve.sun.com/search/document.do?assetkey=1-66-263428-1
Here's some from xemacs:
http://www.securityfocus.com/bid/35473
Ada actually has ranged integer types, e.g. you can declare an integer
variable whose value is constrainted to be between 37 and 205. If you
do mixed-mode arithmetic using those and regular ints, the same effect
may not be so easy to code in Haskell.
Hah, there is a Haskell-oriented blog named
http://intoverflow.wordpress.com/ !!!
>> > If that were really the practice, "length" would return an Integer
>> > and not an Int.
>>
>> That's a historic accident, and has been partly corrected with the
>> "generic*" variants.
>
> I don't think it's an accident. There is a very good reason to prefer
> Int over Integer in this place.
Counting the number of newbies asking "why doesn't my function to calculate
the average of a list compile?", and counting the number of times
I had to insert "toInteger" or change it to "genericLength", I still
say it's an accident.
The general form should be called "length". Having a specialized version
for Int, either by compiler rules or by a different name, is fine
if you want to optimize for performance. But that's the only good reason
to prefer Int here, and performance comes second.
"First make it right, then make it fast."
- Dirk
> You're missing the point, I think: x86 CPUs can do automatic overflow
> on int checking, it's just that the C standrd doesn't allow for it,
The C standard says that the result of an overflow in signed integer
arithmetic is undefined. The implementation is free to wrap, trap, or
whatever. Unsigned integer arithmetic is required to wrap.
> > Because using Integer costs the same: essentially zero. As said in
> > another post, it uses machine integers where possible.
>
> The speed difference between Int and Integer is very noticable and
> it's not just from the overflow checks. I don't believe (at least
> with GHC 6.8.3) that Integer ever uses machine ints.
I don't know, but the implementation looks like:
data Integer
= GHC.Integer.Internals.S# GHC.Prim.Int#
| GHC.Integer.Internals.J# GHC.Prim.Int# GHC.Prim.ByteArray#
So the potential is there. Seems like the implementation isn't as good
as it should be yet.
> Anyway, what would it do with STUArray Integer or Uvector Integer?
Integer is an instance of Ix, so you can use it as the index type with
no problems.
True, but another historical accident is with 'map' vs. 'fmap'. We
don't need the current list 'map' function. The Functor function used
to be named 'map' instead of 'fmap', but then its type signature and its
appearance in non-list code confused newbies. Now we have an extra
list-specialized 'map' and a general 'fmap'. The list 'map' function is
totally superfluous and probably its definition looks like 'map = fmap'
anyway.
Yet another such accident is with list comprehensions. Haskell used to
have more general monad comprehensions, of which list comprehensions are
a special case. Because that confused beginners, monad comprehensions
have been taken out of the language _completely_ and replaced by list
comprehensions. Sounds like a bad joke, but it's sad truth.
I don't consider that "right". Yes, they confused beginners. But we
should clear them up instead of limiting ourselves.
> > Int is different. No serious application exploits anywhere near the
> > full range of an Int. Most can guarantee that it will never
> > overflow or at least make the probability of overflow neglibly
> > small.
>
> Unfortunately that is naive wishful thinking, as the long and sad
> history of integer overflow exploits (similar to buffer overflow
> exploits) continues to show. I just got a security update for some
> XML library on Ubuntu a few hours ago because of such an overflow.
> Here's one in the Java runtime from a few months ago:
>
> http://sunsolve.sun.com/search/document.do?assetkey=1-66-263428-1
>
> Here's some from xemacs:
>
> http://www.securityfocus.com/bid/35473
Such overflow bugs are impossible with Integer. Adding overflow
checking wouldn't eliminate those bugs, but it would make them easier to
spot. As Dirk Thierbach pointed out: "First make it right" (make
overflows impossible by concept), "then make it fast" (optimize
Integer).
> Ada actually has ranged integer types, e.g. you can declare an integer
> variable whose value is constrainted to be between 37 and 205. If you
> do mixed-mode arithmetic using those and regular ints, the same effect
> may not be so easy to code in Haskell.
You can do that with some type hackery, but you need some extensions for
that. However, it's easy to do for fixed ranges.
> Hah, there is a Haskell-oriented blog named
> http://intoverflow.wordpress.com/ !!!
Neat. =)
Overflows aren't conceptually impossible with Integer. Your process
can run out of memory and throw an error. If Haskell used Integer
exclusively and got rid of Int altogether (or renamed it to something
like UnsafeInt so that it was quite inconvenient to get to), that
would improve on the present situation, but the performance hit would
be intolerable.
Right now we've got a situation where (+) is a partial function with
Integer but total with Int. And the price is that Int fails to
satisfy what I would have thought was a basic law of an integral type,
which is that if a > b, then a+1 > b. I think it's better to make
(+) a partial function for Int the way it is for Integer.
Sorry, I think I meant (STUarray s i Int), i.e. the elements of the
array are unboxed ints, so you expect each of them to fit in one
machine word, and for a billion-element array to fit in 4GB of
physical memory. That precludes using Integer, boxing, etc.
> "First make it right" (make overflows impossible by concept), "then
> make it fast" (optimize Integer).
Of course, retrofitting Int where there used to be Integer can require
a reanalysis of the program logic to make sure that overflow isn't even
indirectly assumed won't happen by some code written with Integers in
mind.
Mark
That precludes Int as well, because you can't predict its size. You
need to use Data.Word or Data.Int for this to be useful. I think the
best bet is to add a type, which is specifically an overflow-checked
machine word. But I still recommend using Integer where there is no
strict necessity to use machine words.
> > > http://www.securityfocus.com/bid/35473
> >
> > Such overflow bugs are impossible with Integer. Adding overflow
> > checking wouldn't eliminate those bugs, but it would make them
> > easier to spot. As Dirk Thierbach pointed out: "First make it
> > right" (make overflows impossible by concept), "then make it fast"
> > (optimize Integer).
>
> Overflows aren't conceptually impossible with Integer. Your process
> can run out of memory and throw an error.
That's not an overflow, but an out of memory error. It is handled
properly through an exception being thrown, while a silent wraparound
leads to bugs, which are hard to find. You can't compare these to kinds
of errors. Also an out of memory error is not a bug. An unexpected
integer overflow is. Integer eliminates this bug.
> If Haskell used Integer exclusively and got rid of Int altogether (or
> renamed it to something like UnsafeInt so that it was quite
> inconvenient to get to), that would improve on the present situation,
> but the performance hit would be intolerable.
Yes, I fully agree with that one.
> Right now we've got a situation where (+) is a partial function with
> Integer but total with Int. And the price is that Int fails to
> satisfy what I would have thought was a basic law of an integral type,
> which is that if a > b, then a+1 > b. I think it's better to make (+)
> a partial function for Int the way it is for Integer.
How is (+) a partial function with Integer?
Also note that (+) is not related to any order. It's not the same as
the mathematical addition operation for most types. If you need such an
operator, you would have to write a class for it first.
An out of memory error from an excessively large Integer is a bug if
you expected the numbers in your program to never be larger than 10 or
20 bits. It's the same way with excessively large Ints, except the
limit is the size of the machine word (minus a few tag bits) rather
than the size of memory.
> How is (+) a partial function with Integer?
It can fail with an out-of-memory error.
> Also note that (+) is not related to any order. It's not the same as
> the mathematical addition operation for most types. If you need such an
> operator, you would have to write a class for it first.
It's a bit of a deficiency that the relationship between (+) and
ordering isn't somehow embedded in the integer-related typeclasses.
Of course (+) itself shouldn't require order since it should work on
finite fields, complex numbers, etc.
> > > Overflows aren't conceptually impossible with Integer. Your
> > > process can run out of memory and throw an error.
> >
> > That's not an overflow, but an out of memory error. It is handled
> > properly through an exception being thrown, while a silent
> > wraparound leads to bugs, which are hard to find. You can't compare
> > these to kinds of errors. Also an out of memory error is not a bug.
> > An unexpected integer overflow is. Integer eliminates this bug.
>
> An out of memory error from an excessively large Integer is a bug if
> you expected the numbers in your program to never be larger than 10 or
> 20 bits. It's the same way with excessively large Ints, except the
> limit is the size of the machine word (minus a few tag bits) rather
> than the size of memory.
If that happens, something is seriously wrong with your arithmetic.
> > How is (+) a partial function with Integer?
>
> It can fail with an out-of-memory error.
That doesn't make it partial. It's still defined for all Integers and
would return a result in theory. Otherwise you have to consider all
functions partial, because the thread could get killed or some other
fatal error could happen. Note how this doesn't make any sense. A
function could be defined for a set of input values at one invocation,
but fail for the same in another, destroying referential transparency
and so on.
Yes, programs sometimes have serious bugs. That is not exactly news.
> > It can fail with an out-of-memory error.
> That doesn't make it partial. It's still defined for all Integers and
> would return a result in theory. Otherwise you have to consider all
> functions partial, because the thread could get killed or some other
> fatal error could happen.
Yes, serious high-availability applications have to do exactly that,
hence the features in Erlang to monitor spawned threads and restart
crashed ones, and have supervision trees to monitor and restart the
restarters, etc.
In more typical domains of programming (where Haskell is usually used)
it's ok for such an unexpected occurence to just make the program
crash, like if you take the head of an empty list. You then figure
out where the bug was, fix it, and run the program again.
Worst of all is for the program to keep running while silently
converting correct arithmetic into wrong arithmetic. That's what
Int currently does.
> >> It can fail with an out-of-memory error.
> >
> > That doesn't make it partial. It's still defined for all Integers
> > and would return a result in theory. Otherwise you have to consider
> > all functions partial, because the thread could get killed or some
> > other fatal error could happen.
>
> Yes, serious high-availability applications have to do exactly that,
> hence the features in Erlang to monitor spawned threads and restart
> crashed ones, and have supervision trees to monitor and restart the
> restarters, etc.
You can do that without viewing all functions as partial. Actually you
can't write partial functions in Haskell.
> Worst of all is for the program to keep running while silently
> converting correct arithmetic into wrong arithmetic. That's what Int
> currently does.
Yes. And I'm proposing a different solution than you.
???????
Any function that can fail to pattern match (like head on an empty
list) is partial. Any function that can throw an arithmetic exception,
like division by zero, is partial.
It's possibly a stretch from a purely mathematical point of view, but
seems fairly valid from a practical software standpoint, to say that
any function that can throw any exception at all, such as out-of-memory,
is partial.
> > Worst of all is for the program to keep running while silently
> > converting correct arithmetic into wrong arithmetic. That's what Int
> > currently does.
>
> Yes. And I'm proposing a different solution than you.
Your proposal IIRC was to use Integer, but the hassle factor (not
being able to use normal Prelude functions the normal way) and the
performance hit are excessive given the way Haskell currently works.
And I'm still unpersuaded by the logic. I don't think the concept of
Integer would be horribly perverted if there was a GHC command line
option that instructed the RTS to restrict the size of Integer values
to, say, 1 megabyte. If one of your numbers got bigger than that, the
system would signal an exception instead. Very few types of
applications would be affected since hardly any of us actually program
with numbers that large. For typical high-precision arithmetic
applications like public-key cryptography, even a 1 kilobyte limit
would usually be enough. Even still, if some unexpected input or
program bug causes a number to be larger than 1 kilobyte (or
megabyte), the program should NOT merrily keep running with incorrect
numbers.
Most programmers realize that intuitively, and in fact the typical
application wants low rather than high precision, so that 3 or 4 bytes
is enough. That happens so often that there's a special type for it,
called Int, which takes advantage of a very efficient machine
representation. Int should act like a subrange of Integer, with error
behavior if its value gets out of range. Instead it acts like
something with totally different properties.
> > You can do that without viewing all functions as partial. Actually
> > you can't write partial functions in Haskell.
>
> ???????
> Any function that can fail to pattern match (like head on an empty
> list) is partial. Any function that can throw an arithmetic exception,
> like division by zero, is partial.
> It's possibly a stretch from a purely mathematical point of view, but
> seems fairly valid from a practical software standpoint, to say that
> any function that can throw any exception at all, such as
> out-of-memory, is partial.
If f fails to pattern-match with the argument x, then it's still well
defined for it:
f x = _|_
Since _|_ is a valid inhabitant of the result type, f is defined for x.
> > > Worst of all is for the program to keep running while silently
> > > converting correct arithmetic into wrong arithmetic. That's what
> > > Int currently does.
> >
> > Yes. And I'm proposing a different solution than you.
>
> Your proposal IIRC was to use Integer, but the hassle factor (not
> being able to use normal Prelude functions the normal way) and the
> performance hit are excessive given the way Haskell currently works.
Yes, that's a good point. The base library should once be redesigned
from ground up, but unfortunately progress in this area is very slow.
> And I'm still unpersuaded by the logic. I don't think the concept of
> Integer would be horribly perverted if there was a GHC command line
> option that instructed the RTS to restrict the size of Integer values
> to, say, 1 megabyte. If one of your numbers got bigger than that, the
> system would signal an exception instead. Very few types of
> applications would be affected since hardly any of us actually program
> with numbers that large. For typical high-precision arithmetic
> applications like public-key cryptography, even a 1 kilobyte limit
> would usually be enough. Even still, if some unexpected input or
> program bug causes a number to be larger than 1 kilobyte (or
> megabyte), the program should NOT merrily keep running with incorrect
> numbers.
Well, you can restrict the heap size with a command line option. See
the RTS option -M. I know no way to hardcode that, though.
> Most programmers realize that intuitively, and in fact the typical
> application wants low rather than high precision, so that 3 or 4 bytes
> is enough. That happens so often that there's a special type for it,
> called Int, which takes advantage of a very efficient machine
> representation. Int should act like a subrange of Integer, with error
> behavior if its value gets out of range. Instead it acts like
> something with totally different properties.
If Integer were implemented properly, there would be nothing wrong with
using it instead of Int, where possible. A switch for limiting its size
would be useful, though, but that limit could be chosen arbitrarily at
least.
> Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
>> > You can do that without viewing all functions as partial. Actually
>> > you can't write partial functions in Haskell.
>>
>> ???????
>> Any function that can fail to pattern match (like head on an empty
>> list) is partial. Any function that can throw an arithmetic exception,
>> like division by zero, is partial.
>> It's possibly a stretch from a purely mathematical point of view, but
>> seems fairly valid from a practical software standpoint, to say that
>> any function that can throw any exception at all, such as
>> out-of-memory, is partial.
>
> If f fails to pattern-match with the argument x, then it's still well
> defined for it:
>
> f x = _|_
>
> Since _|_ is a valid inhabitant of the result type, f is defined for x.
This does not make f total, though. The idea of _|_ comes from denotational
semantics, where ones associates syntactic expressions with mathematical
objects. Since the mathematical functions used as targets are total, each
type is equipped with the extra value _|_ (beyond what can be built out of
constructors) to allow partial functions to have denotations as total
functions.
_|_ turns out to be a genuinely useful value to think about in Haskell
(unlike languages like ML or Scheme), but functions that result in _|_ are
still partial (although one might not consider functions where the only
partial case is f _|_ = _|_ to be genuinely partial).
Of course, denotational semantics certainly doesn't view out of memory
errors as making (all) functions partial, since there is no memory in the
mathematical model. And quite frankly, I think it's a bit silly to view out-
of-memory conditions as making functions on Integer partial. If that's true,
then functions on Int are no less partial, as you may well be 4 (8) bytes
away from being out of memory, and even Int operations may need to allocate
new space for another Int (more than that for boxed Ints, really, since they
involve pointers, too).
Incidentally, it seems no one has mentioned this library from hackage:
http://hackage.haskell.org/package/checked
which implements both Ints and Words of all sizes that check for overflow on
all (I think) operations. It does so by going through Integer, instead of
magic hooks into the CPU, so it's probably not particularly speedy, but
that's about the best you can do in pure Haskell, and it will prevent you
from having single numbers that take up multiple gigabytes of memory if you
believe that's a real possibility for your program.
-- Dan
The ideal, according to the IEEE standard, is for it to be a programmer
choice. There are situations in which carrying on with a NaN or infinity
is the right answer, and other situations in which an exception would be
much, much better.
Unfortunately, I have yet to see a language standard that does all the
things IEEE 754-2008 says a language standard should do.
Patricia