idea: Long ≠ Int64

Stefan Karpinski

Dec 15, 2011, 7:51:02 PM
to Julia Dev
This commit got me thinking (again) about the problem of integer literals like 1 and 0 "accidentally" forcing everything to be Int64. The solution of using one(x) and zero(x) everywhere has never been a very pleasant one, and it's easy to forget. Here's a new thought...

When you write int32(1) or uint8(1) — or 0x01 — you are saying that you care about the specific representation of the number 1 — you want signed 32-bits or an unsigned byte. When you just write 1, on the other hand, you don't really care how it's represented. In fact, since 1 means different things on 32- and 64-bit systems, if you just write 1, you specifically *don't* care about the representation size, because if you did, you'd have to write int64(1) or int32(1) in order to ensure the correct representation size.

So what if we distinguish between 1 where you don't care about the representation and int64(1) or int32(1) where you do care about the representation? In other words, instead of having Long be an alias for Int64 or Int32, make it a different integer type that happens to be the same size as Int64 or Int32 and largely behave the same way. Except that when promoting a Long together with another value that isn't a Long, the type of the non-Long determines the result. After all, someone at some point specifically indicated that the other value should use that representation, so we should respect it. This would allow you to just use 1 and 0 literals without causing unwanted promotion problems. I.e., we could stop using one(x) and zero(x) everywhere and just use 1 and 0.
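
A minimal sketch of that promotion rule, written in present-day Julia syntax rather than the 2011 syntax used in this thread. The `Long` wrapper, the `Sized` union, and the explicit `-` methods are assumptions made for illustration only; a real implementation would presumably hook into the language's promotion machinery instead.

```julia
# Hypothetical stand-in for the proposed "don't care about the representation" type.
struct Long
    val::Int
end

# The explicitly sized types whose representation someone asked for.
const Sized = Union{Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64}

# Mixing a Long with a sized value: the sized type decides the result type.
# `% T` converts with wraparound, so the arithmetic is modular in T.
Base.:-(a::T, b::Long) where {T<:Sized} = a - (b.val % T)
Base.:-(a::Long, b::T) where {T<:Sized} = (a.val % T) - b

# Two Longs stay Long.
Base.:-(a::Long, b::Long) = Long(a.val - b.val)

UInt8(0) - Long(1)     # 0xff
UInt32(0) - Long(1)    # 0xffffffff
Long(0) - Long(1)      # a Long holding -1
```

Here `Long(1)` plays the role of the bare literal `1`; the sketch only covers subtraction, which is enough to show the sized operand winning.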

This would even work nicely with unsigned integer types. With this proposal you would get the following:

julia> uint8(0)-1
0xff

julia> uint32(0)-1
0xffffffff

julia> uint64(0)-1
0xffffffffffffffff

These are all very reasonable answers in the respective arithmetics of Uint8, Uint32, and Uint64. Currently, you get this on a 64-bit machine:

julia> uint8(0)-1
-1

julia> typeof(ans)
Int64

julia> uint32(0)-1
-1

julia> typeof(ans)
Int64

And on a 32-bit machine you get this:

julia> uint8(0)-int32(1)
-1

julia> typeof(ans)
Int32

julia> uint32(0)-int32(1)
-1

julia> typeof(ans)
Int64

julia> uint64(0)-int32(1)
0xffffffffffffffff

This seems like kind of a shit show, not to mention the fact that these answers were different yesterday. Note that under this proposal, expressions like uint32(0)-int32(1) would still promote the same way they currently do — but it wouldn't depend at all on the machine type since this expression has explicitly specified integer types.

The main difference between this proposal and an IntLiteral approach is that *most* integer computation *should* be done with Longs, not with Int64s or Int32s — after all, most of the time, we don't really care what the actual representation type is, we just want some reasonable representation of an integer on the current machine. It's actually only when some computation specifically asks for a different representation of an integer that one would get anything besides a Long.

Stefan Karpinski

Dec 15, 2011, 7:54:08 PM
to Julia Dev
The unsigned int examples should read:

Currently, you get this on a 64-bit machine:

julia> uint8(0)-1
-1

julia> typeof(ans)
Int64

julia> uint32(0)-1
-1

julia> typeof(ans)
Int64

julia> uint64(0)-1
0xffffffffffffffff

And on a 32-bit machine you get this:

julia> uint8(0)-1
-1

julia> typeof(ans)
Int32

julia> uint32(0)-1
-1

julia> typeof(ans)
Int64

julia> uint64(0)-1
0xffffffffffffffff

Stefan Karpinski

Dec 15, 2011, 7:55:22 PM
to Julia Dev
I also think that this approach would increase overall type-stability.

Stefan Karpinski

Dec 15, 2011, 8:08:19 PM
to Julia Dev
Incidentally, I also think that -0x01 should be 0xff. Under this proposal 0-0x01 would also give the same thing. Currently, -0x01 is -1::Int16 and 0-0x01 is -1::Int64 on a 64-bit machine and -1::Int32 on a 32-bit machine.
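
A small sketch of what type-preserving (modular) negation means, in present-day Julia syntax. The helper name `modneg` is hypothetical; it spells out the two's-complement negation so that the result never leaves the argument's type.

```julia
# Hypothetical helper: negate within the argument's own type, wrapping modulo 2^bits.
modneg(x::T) where {T<:Unsigned} = (~x + one(T))::T

modneg(0x01)      # 0xff
modneg(0x0001)    # 0xffff
modneg(0x00)      # 0x00
```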

Stefan Karpinski

Dec 15, 2011, 8:12:41 PM
to Julia Dev
Ulong would be similar except that it means "I want an integer, I don't care how you really represent it, but I want it to be unsigned." Since the unsigned part is explicitly requested, it should be respected, thereby justifying the rule that promoting Long and Ulong should give you a Ulong. That's why the rule instituted by the aforementioned commit makes sense.
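
A rough sketch of the Long/Ulong rule, in present-day Julia syntax. Both wrapper types and the `+` methods are hypothetical; the only point illustrated is that the explicitly requested unsignedness survives the mix.

```julia
# Hypothetical wrappers: native-width integers whose size nobody asked for.
struct Long
    val::Int
end
struct Ulong
    val::UInt
end

# Same kind on both sides: the kind is preserved.
Base.:+(a::Long, b::Long) = Long(a.val + b.val)
Base.:+(a::Ulong, b::Ulong) = Ulong(a.val + b.val)

# Mixed: unsignedness was requested explicitly, so the result is a Ulong.
# `% UInt` reinterprets the signed value modulo the word size.
Base.:+(a::Long, b::Ulong) = Ulong((a.val % UInt) + b.val)
Base.:+(a::Ulong, b::Long) = b + a

Long(2) + Ulong(3)     # a Ulong holding 5
Long(-1) + Ulong(1)    # a Ulong holding 0, wrapping like any unsigned sum
```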

Jeff Bezanson

Dec 15, 2011, 9:23:03 PM
to juli...@googlegroups.com
C seems to give these:

(gdb) p (short)0 - (unsigned short)1
$20 = -1
(gdb) p (int)0 - (unsigned int)1
$24 = 4294967295
(gdb) p (long long)0 - (unsigned int)1
$27 = -1

which is a little weird.

We should work out the promotion behavior of the sized integer types
first. Should we always preserve width, or try to give correct answers
by promoting to a larger type when possible?
Negation should be easy since it's unary. Maybe negation should always
preserve the type of its argument? But there is something to be said
for trying to give the numerically correct answer if possible. That's
what's going on in the alleged shit show; giving the right answer if
possible, otherwise doing something else.
This stuff:

julia> uint32(0)-int32(1)
-1

julia> uint64(0)-int32(1)
0xffffffffffffffff

will of course not change if Long!=Int64. Separate issue.

Random question: should we promote numbers to matrices so that "a\1"
inverts a matrix?

Unfortunately one() and zero() will still be necessary, in cases like this:

den(x::Int) = one(x)

But I guess they would be needed in fewer cases. Actually several of
the current uses of one and zero are not necessary.

In a case like this:

f = one(n)
for i = 2:n
f *= i
end

the one() is helpful since otherwise the type of f could change during the loop.
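
To make the difference concrete, here is a small sketch in present-day Julia syntax. The function names are made up for illustration, and `Integer` plays the role that the abstract `Int` type plays in this thread. One assumption worth flagging: the loop bound `2` is itself a literal, so the first version converts it with `oftype` to keep the whole computation in the type of `n`.

```julia
# Accumulator seeded with one(n): f has n's type from the first iteration on.
function fact_one(n::Integer)
    f = one(n)
    for i = oftype(n, 2):n      # keep the range in n's type as well
        f *= i
    end
    return f
end

# Accumulator seeded with the literal 1: f starts out as the native integer
# type, whatever type n has.
function fact_literal(n::Integer)
    f = 1
    for i = 2:n
        f *= i
    end
    return f
end

typeof(fact_one(UInt8(5)))        # UInt8
typeof(fact_literal(UInt8(5)))    # Int
typeof(fact_literal(big(5)))      # BigInt: f changed type inside the loop
```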

Otherwise the Long idea seems pretty good. It'd be nice if it had a
different name. Maybe we should call it Int and make Integer the
abstract type. Several times people have seen Int and thought of the C
type, which is a reasonable type to use, unlike our current Int type.

Stefan Karpinski

Dec 15, 2011, 9:40:35 PM
to juli...@googlegroups.com
C seems to give these:

(gdb) p (short)0 - (unsigned short)1
$20 = -1
(gdb) p (int)0 - (unsigned int)1
$24 = 4294967295
(gdb) p (long long)0 - (unsigned int)1
$27 = -1

which is a little weird.

C is not necessarily the language to emulate here :-). Then again, I'm not sure that anyone has actually gotten these "right".
 
We should work out the promotion behavior of the sized integer types
first. Should we always preserve width, or try to give correct answers
by promoting to a larger type when possible?

I would argue that for specified integer types (i.e., not Long in this proposal), we should do what we're doing now and promote to a type that's large enough to represent all values. The potentially lossy cases are (Uint64,Int64), (Float64,Int64), (Float64,Uint64). For the latter two, we should obviously map ints to the nearest representable float, losing some precision, but generally remaining accurate. For the unsigned-signed case, we can perhaps just not define a promotion.
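
A minimal sketch of the "large enough to represent all values" rule for the integer cases, in present-day Julia syntax (where the types are spelled `UInt8`, `UInt64`, and so on). The helper `covering_type` is hypothetical and only searches the built-in types up to 64 bits; it returns `nothing` where no promotion would be defined.

```julia
# Hypothetical helper: the smallest built-in integer type (up to 64 bits) that
# can represent every value of both A and B, or `nothing` when none exists.
function covering_type(A::Type{<:Integer}, B::Type{<:Integer})
    lo = min(big(typemin(A)), big(typemin(B)))
    hi = max(big(typemax(A)), big(typemax(B)))
    # Candidates are ordered by width, so the first match is the smallest one.
    for T in (UInt8, Int8, UInt16, Int16, UInt32, Int32, UInt64, Int64)
        if typemin(T) <= lo && hi <= typemax(T)
            return T
        end
    end
    return nothing
end

covering_type(UInt8, Int8)      # Int16
covering_type(UInt32, Int32)    # Int64
covering_type(UInt64, Int64)    # nothing: the lossy unsigned-signed case
```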
 
Negation should be easy since it's unary. Maybe negation should always
preserve the type of its argument? But there is something to be said
for trying to give the numerically correct answer if possible. That's
what's going on in the alleged shit show; giving the right answer if
possible, otherwise doing something else.

I'm generally for viewing unsigned integer arithmetic as modular. Seems like the sanest approach. That means that -0x01 should be 0xff. If someone knows enough to be using unsigned integers, this is what they want. If they didn't specify a type, we use Longs and give them a "best effort" answer.

This stuff:

julia> uint32(0)-int32(1)
-1

julia> uint64(0)-int32(1)
0xffffffffffffffff

will of course not change if Long!=Int64. Separate issue.

What I meant to write was uint32(0)-1 and uint64(0)-1 — I was using int32(1) on my machine to emulate what happens on a 32-bit machine. Yes, if the types are explicitly requested, the Long≠Int64 thing will have no effect on it. Which is good.

This Long≠Int64 approach has the nice property that any integer computation that never requests a specific type always ends up with a Long and if any specific types are requested, regardless of mixing with Longs, the answer is the same regardless of 32- or 64-bit machine type.
 
Random question: should we promote numbers to matrices so that "a\1"
inverts a matrix?

I wouldn't mind that at all. It might also be really nice if you could write things like:

[A 1
 1  B]

and have the 1s be promoted to correctly sized matrices. That is standard mathematical notation, after all. Writing ones(size(A,1),size(B,2)) and ones(size(B,1),size(A,2)) is such a huge hassle.
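
For comparison, the explicit version looks roughly like this in present-day Julia syntax. The sample matrices and the `Int` element type passed to `ones` are assumptions for the example. Each off-diagonal block takes its row count from the block beside it and its column count from the block above or below it.

```julia
A = [1 2; 3 4]                     # 2×2
B = [5 6 7; 8 9 10; 11 12 13]      # 3×3

top_right   = ones(Int, size(A, 1), size(B, 2))   # 2×3: rows of A, columns of B
bottom_left = ones(Int, size(B, 1), size(A, 2))   # 3×2: rows of B, columns of A

M = [A top_right; bottom_left B]   # 5×5 block matrix
```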

Unfortunately one() and zero() will still be necessary, in cases like this:

den(x::Int) = one(x)

But I guess they would be needed in fewer cases. Actually several of
the current uses of one and zero are not necessary.

In a case like this:

   f = one(n)
   for i = 2:n
       f *= i
   end

the one() is helpful since otherwise the type of f could change during the loop.

Otherwise the Long idea seems pretty good. It'd be nice if it had a
different name. Maybe we should call it Int and make Integer the
abstract type. Several times people have seen Int and thought of the C
type, which is a reasonable type to use, unlike our current Int type.

I actually think that's a very good idea. I like Int as "a concrete integer type that I don't really care about the exact size of". Integer is good for the Integer abstract type also. We might eventually want to distinguish between fixed-size integer types and, say, BigInts, but that's another bridge, and we can burn it when we get there. We could always have something like FixedSizeInteger <: Integer or whatever.

Stefan Karpinski

Dec 15, 2011, 9:52:23 PM
to juli...@googlegroups.com
Yes, we will definitely still need one(x) and zero(x) in some places, but far fewer. The most important thing is that it becomes far less likely that someone writes code like x+1 that *always* produces an Int64 and is therefore less polymorphic than intended.

Speaking of which, we once discussed how `++x` should probably mean `x += one(x)`, but we never implemented it. Do we have any interest in that, or are we just passing on it? I guess that `x++` could mean `t = x; x += one(x); t` or something like that.
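
As a rough illustration of why `one(x)` rather than `1` is the natural increment, here is a sketch in present-day Julia syntax. `inc` is a hypothetical function standing in for the proposed `++x`; it is not existing syntax.

```julia
# Hypothetical stand-in for `++x`: add the multiplicative identity of x's own type.
inc(x) = x + one(x)

inc(0x07)    # 0x08, still a UInt8
inc(2.5)     # 3.5
0x07 + 1     # 8, promoted to the native Int because of the literal
```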

Stefan Karpinski

Dec 15, 2011, 10:20:44 PM
to juli...@googlegroups.com
Also, suppose someone defined `den(x::Int) = 1`. With this approach, this is really not a big deal since that 1 will behave nicely with whatever other values you combine it with. So one(x) is more correct, but 1 is not really wrong either.