
Re: Clojure 1.3 treatment of integers and longs


Rich Hickey Oct 23, 2011 2:21 PM
Posted in group: Clojure
Hi all,

This reply is to the thread, not Luc specifically.

Thanks everyone for your feedback and input.

I have pushed 3 commits:

1) Fixes the inconsistency between the hash function used by Clojure maps (was .hashCode) and =. Thanks Paul for the report.

2) Changes core/hash to return the result of this hashing function. Thus, it returns a different value than does .hashCode for Integers, Shorts, Bytes and Clojure collections. Feedback welcome.

3) Only because of the first fix, it now becomes possible to box ints to Integers without much grief. This commit implements that for evaluation purposes and is not a commitment to that policy. Note well that while the answer in the first commit is clear, on this point there is always going to be a tradeoff, and there is no 'right' answer.
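A small REPL sketch of what commits 1) and 2) address. It assumes a JVM Clojure build that includes both commits, and the var names `i` and `l` are illustrative only. Clojure's `=` treats a boxed Integer and a boxed Long with the same value as equal, but Java's per-class `.hashCode` need not agree. A hash map that buckets keys by `.hashCode` while comparing them with `=` can therefore fail to find a key that `=` says is present.

```clojure
;; Two boxed representations of -1 (names are illustrative).
(def i (Integer/valueOf (int -1)))   ; java.lang.Integer
(def l (Long/valueOf -1))            ; java.lang.Long

;; Clojure's = compares integer boxes by numeric value.
(= i l)                              ;=> true

;; Java's hashCode is defined per class: Integer uses the int value,
;; while Long folds its two 32-bit halves, (int)(v ^ (v >>> 32)).
(.hashCode i)                        ;=> -1
(.hashCode l)                        ;=> 0

;; Assuming commits 1) and 2): clojure.core/hash and hash-map lookup
;; agree with = across these box types.
(= (hash i) (hash l))                ;=> true
(get (hash-map l :found) i)          ;=> :found
```

Small non-negative values happen to get the same `.hashCode` from both classes, which is why -1 is used here.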

Here are the issues as I see them:

First, note there is no 'following' of Java semantics as an objective. Java semantics are that Integers are never equal to Longs, and I presume no one wants to go back to that.
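For contrast, a minimal sketch of the two notions of equality side by side. Java's `.equals` never considers an Integer equal to a Long, while Clojure's `=` compares them by value.

```clojure
;; Java semantics: Integer.equals(Long) is always false.
(.equals (Integer/valueOf (int 1)) (Long/valueOf 1))   ;=> false

;; Clojure semantics: = compares integer boxes by numeric value.
(= (Integer/valueOf (int 1)) (Long/valueOf 1))         ;=> true
```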

Second, boxing is a change of type, period. There is no valid complaint that 'you changed my type'. int != Integer either.

Third, there are 2 scenarios in consuming things you box in Clojure from Java:

a) You control the Java. In this case, having Clojure make everything uniform (Longs) makes things easier for you. There is no heterogeneity regardless of the source or manipulation of numbers, and you can always expect Longs.

b) You don't control the Java. In this case you must match the consumer's expectations, i.e. conform to Java promotion, the types of generics, etc. ***This will *always* require vigilance and explicitness, due to arithmetic conversions etc.*** Auto promotion is only one part. Note that this is true in Java as well: while the type checker may scold you, you still have to cast/coerce on a mismatch.
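A sketch of scenario b). A plain `java.util.HashMap` serves as a hypothetical stand-in for Java code you don't control that stores Integer keys, e.g. a `Map<Integer, String>`. Because the Java map compares keys with `.equals`, looking one up with a Clojure long literal (boxed as a Long) misses. Coercing explicitly to the type the Java side expects finds the entry.

```clojure
;; Hypothetical Java-side map with Integer keys.
(def ^java.util.HashMap java-map
  (doto (java.util.HashMap.)
    (.put (Integer/valueOf (int 1)) "one")))

;; The literal 1 is boxed as a Long, and Long.equals(Integer) is false.
(.get java-map 1)                           ;=> nil

;; Explicitly boxing to the type the Java side expects.
(.get java-map (Integer/valueOf (int 1)))   ;=> "one"
```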
 
Even with the auto box change, you are only an arithmetic operation away from having the problem again. For instance, in the original report, wrapping .getValue with dec produces an interop mismatch again:

(let [amap {1 (dec (.getValue obj))}] …)

There is no way we are going to 'fix' that by adopting Java's numeric tower, which is dangerous and requires static types. The bottom line is that specific type requirements on the Java side require explicit boxing in order to have correct, non-brittle code.
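A sketch of the `dec` case above. The Integer bound to `v` is a hypothetical stand-in for `(.getValue obj)`, since `obj` isn't defined in the thread. Clojure arithmetic computes on longs, so the result is a Long whatever box the input came in. Explicit boxing at the point of handoff restores the type the Java side wants.

```clojure
;; Hypothetical stand-in for (.getValue obj) returning an Integer.
(def v (Integer/valueOf (int 5)))

;; Arithmetic produces a Long regardless of the input's box type.
(class (dec v))                            ;=> java.lang.Long

;; Explicit coercion right where the value crosses into Java.
(class (Integer/valueOf (int (dec v))))    ;=> java.lang.Integer

;; The map from the example, built with an explicitly boxed value.
(let [amap {1 (Integer/valueOf (int (dec v)))}]
  (class (get amap 1)))                    ;=> java.lang.Integer
```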

The final consideration is collection equality. When Clojure autoboxes to Longs, you get homogeneous collection contents, and thus .equals still holds for the collection on the Java side, vs. random results ('depends on where I got the contents from and what I did with them').
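A sketch of the collection-equality point. Clojure vectors implement `java.util.List`, whose `.equals` compares elements with Java `.equals`. A vector whose contents were all boxed to Longs stays `.equals` to another all-Long vector. A vector that picked up an Integer somewhere is still `=` in Clojure but no longer `.equals` on the Java side.

```clojure
(def all-longs [1 2 3])                        ; every element is a Long
(def mixed [(Integer/valueOf (int 1)) 2 3])    ; first element is an Integer

(= all-longs mixed)          ;=> true   Clojure equality, by value
(.equals all-longs mixed)    ;=> false  Java List equality, element .equals
(.equals all-longs [1 2 3])  ;=> true   homogeneous Longs compare equal
```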

FYI - there are the RT/box functions that box as per Java. These could be exposed in Clojure.

-----
In short, having autoboxing match Java does not really free you from your responsibility to create specific boxed types when you need them on the Java side. I.e., Clojure can't help you.

On the flip side, when you are in charge of the Java code, Clojure's being more consistent makes things more consistent on the other side and *does* give you less to do to make sure things work.

I prefer what we had (auto box to Longs), but I think it matters somewhat less now with =-consistent hashing. If we decide to revert to that, we can discuss making auto boxing of short and byte consistent.
-----

In any case, those of you who still know how to use Clojure from Git can try these commits; please provide feedback on their actual effects on actual code. I think the opinion phase of this is now over :)
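For anyone trying the commits, a hypothetical one-line probe reports how the running build boxes a primitive int returned from a Java method. `int-box-class` is not part of Clojure. The probe should return `java.lang.Long` under the Clojure 1.3 policy and `java.lang.Integer` if commit 3) is in effect.

```clojure
;; Hypothetical probe: Integer/parseInt returns a primitive int,
;; which Clojure must box before passing it to `class`.
(defn int-box-class []
  (class (Integer/parseInt "1")))

(int-box-class)  ; java.lang.Long or java.lang.Integer, depending on the build
```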

Thanks again for the feedback,

Rich

1) https://github.com/clojure/clojure/commit/b5f5ba2e15dc2f20e14e05141f7de7c6a3d91179
2) https://github.com/clojure/clojure/commit/b4a2216d78173bb81597f267b6025c74a508bd03
3) https://github.com/clojure/clojure/commit/a2e4d1b4eaa6dad26a1a96b9e9af129cccca9d10

On Oct 23, 2011, at 4:01 PM, Luc Prefontaine wrote:

>
>
> On Sun, 23 Oct 2011 20:31:51 +0200
> Ivan Koblik <ivank...@gmail.com> wrote:
>
>> Hello Luc,
>>
>> In all fairness I don't see how converting ints to Integers returned
>> by class methods would break the abstraction. If you start talking
>> about portability of Clojure code, then Long is as portable as
>> Integer is. (In general they are not.)
>
> It's simpler to use one representation to port the core. You can choose the
> fastest, most efficient one. You do not have to carry all these intermediate types
> with you.
>
> The day a 128-bit primitive type becomes available, there will be few changes to make to support
> it. If you keep mixed types, that adds another one to the tower of Babel.
>
> The problem is not choosing between ints and longs; it has to do with carrying
> all these intermediate types. Frankly, aside from interop, how many are using
> short ints in Clojure? That's a leftover from the PDP-11 era.
>
>>
>> Could you explain your position on the fact that shorts get converted
>> to Short? Why is it not possible to do the same for ints?
>
> This should disappear. I think all the small primitive types, including ints,
> should be promoted to long except when doing an interop call.
> Rich can explain why it's been kept. Maybe it's a question of priority/effort,
> or something else.
>
>>
>> I don't think there was anyone in this thread who would suggest
>> keeping 32-bit math in Clojure. For what it's worth, an Integer can be
>> converted to a Long the first time it is used in any computation.
>>
>
> That is unnecessary overhead. Again, let's separate boxed values from primitive types.
> If you compute in Clojure, keeping primitive ints/shorts/bytes around has no value.
> You end up having to do type conversions depending on what is specified in the
> expression.
>
> When doing an interop call, this is when you need to be specific. Elsewhere
> I see no value in keeping this scheme.
>
> This way of thinking about primitive types has been around for at least
> 35 years, carrying 64/32/16/8-bit unsigned/signed int values. Maybe it's time we threw this away.
>
> I have written a couple of hundred thousand lines of assembly code in my
> professional life, and I understand this model. Of course, when you deal with
> hardware in a device driver you need these things, but in Clojure?
>
> And with today's hardware, why stick with these data types? To reduce memory footprint?
> Ha! Ha! I used to work on computers with 256K of physical memory.
> This concern was legitimate in that prehistoric era. But today?
>
> If you need bit manipulation in Clojure, better to write a lib for it than to mangle
> these data types.
>
>> Cheers,
>> Ivan.
>>
>>
>> On 23 October 2011 17:16, Luc Prefontaine
>> <lprefo...@softaddicts.ca> wrote:
>>
>>> CON1 - I buy your argument about consistency in Clojure
>>> maps and fixing them. Integer OBJECTS (as opposed to int primitives)
>>> should be handled consistently as objects, not as primitive values
>>> promoted to long.
>>>
>>> CON2, CON3 and CON4 - No way, the current design choice is the right
>>> one.
>>>
>>> So many languages have been plagued with numbers of different
>>> sizes/formats for ints and floating point values;
>>> it's not a direction that Clojure should follow.
>>> These distinct types are a source of many problems (overflow handling,
>>> precision problems, ...).
>>>
>>> The need for Clojure to support these things is similar to calling
>>> assembler from C. You care about bytes, shorts and similar things at the
>>> frontier: when it's time to call a low-level service, you need to
>>> be able to pass these values.
>>>
>>> By no means does this imply that you have to support them in your
>>> language runtime.
>>> It complects (;) everything, including computations, and makes your
>>> runtime much harder to port.
>>>
>>> It's an interop-centric thing, and interop is by essence not
>>> portable. It does not belong in the core of Clojure. It's better to
>>> rely on cast operators when calling interop than to expect Clojure
>>> to box numeric values according to some interop convention that
>>> may vary with the platform Clojure runs on.
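A minimal sketch of that approach, assuming a Java API that expects a `List<Integer>`: compute with Clojure's longs, then coerce explicitly at the boundary. The helper name `->integer-list` is hypothetical.

```clojure
;; Hypothetical boundary helper: Clojure code works with longs;
;; conversion to the Java-side box type happens only here.
(defn ->integer-list
  "Returns a java.util.ArrayList of java.lang.Integer built from xs.
  The (int ...) cast fails for values outside the int range."
  [xs]
  (java.util.ArrayList.
    ^java.util.Collection (vec (map #(Integer/valueOf (int %)) xs))))

(def result (->integer-list (map inc [0 1 2])))
(map class result)  ;=> (java.lang.Integer java.lang.Integer java.lang.Integer)
```

Keeping the coercion in one place means the rest of the Clojure code never has to care which box type the Java side wants.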
>>>
>>> Luc P.
>>>
>>> On Sun, 23 Oct 2011 07:19:41 -0400
>>> Paul Stadig <pa...@stadig.name> wrote:
>>>
>>>> On Sat, Oct 22, 2011 at 5:51 PM, Stuart Halloway
>>>> <stuart....@gmail.com> wrote:
>>>>
>>>>> I am dropping off this thread now. At this point I think it
>>>>> would be more useful for me (or someone) to expand the notes
>>>>> about numerics into better documentation, rather than
>>>>> continuing this rambling point-by-point treatment without
>>>>> getting all of the considerations into play at once. I hope to
>>>>> get that done by the Conj.
>>>>
>>>>
>>>> So you are still thinking that the current behavior is OK and just
>>>> needs to be documented better? Or are you saying that we need to
>>>> collect the various pros and cons to decide whether the current
>>>> behavior should change or remain the same?
>>>>
>>>> Having reviewed the thread there is lots of confusion, but from
>>>> the points made it seems clear to me that the behavior should
>>>> change.
>>>>
>>>> CON (The "we should box ints as Longs" (or "we should keep things
>>>> as they are") camp):
>>>> 1) If we box ints as Integers it will break Clojure's collections
>>>> (Stu Halloway)
>>>> 2) Boxing ints as Integers would make Clojure's design
>>>> inconsistent (David Nolen)
>>>> 3) Clojure now only has 64-bit primitives (David Nolen/Kevin
>>>> Downey)
>>>> 4) If 32-bit ints are allowed to exist, then Clojure's
>>>> numeric operators would have to handle them (David Nolen)
>>>>
>>>> CON1 is a bug in PersistentHashMap, and I opened a Jira bug for
>>>> it (http://dev.clojure.org/jira/browse/CLJ-861).
>>>>
>>>> CON2 is false. The way primitives are boxed for interop doesn't
>>>> and shouldn't have any effect on Clojure's design as such. This
>>>> is a discussion about interop consistency, and if you look at the
>>>> PRO section you will see Clojure is already inconsistent with
>>>> respect to interop. Nathan and others are arguing that it should
>>>> be made consistent.
>>>>
>>>> CON3 is false. 32-bit primitives do exist in Clojure (at least
>>>> Java Clojure); they are just not the optimized case. They may get
>>>> immediately converted to longs or boxed in some way, but we cannot
>>>> deny their existence, especially around interop.
>>>>
>>>> CON4: Again, 32-bit integers do exist, and are already handled by
>>>> the numeric operators. When you compile a function with primitive
>>>> args, Clojure also generates a method that takes Objects. If you
>>>> pass in anything other than a long, it gets boxed, cast to a
>>>> java.lang.Number, has its longValue method called, and that value
>>>> gets passed to the primitive-arg version. This is slow (as
>>>> expected) because you are not using the optimized case (64-bit
>>>> primitives). Absolutely none of that would have to change or get
>>>> slower because ints were boxed as Integers instead of Longs.
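A sketch of the mechanism described for CON4 (the function name `add-one` is illustrative). A function with a primitive long argument accepts a boxed Integer, which is converted to a long on the way in. Calling it through `apply` forces the Object-taking entry point.

```clojure
;; A function hinted to take and return a primitive long.
(defn add-one ^long [^long x] (inc x))

(add-one 41)                                  ;=> 42, primitive long path
(add-one (Integer/valueOf (int 41)))          ;=> 42, Integer converted to long
(apply add-one [(Integer/valueOf (int 41))])  ;=> 42, via the Object-taking invoke
```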
>>>>
>>>> I think the problem with all of these CONs is that they confuse
>>>> boxing for interop with either a bug in PersistentHashMap, or fast
>>>> primitive maths, and neither of those has anything to do with how
>>>> ints are boxed.
>>>>
>>>> PRO (The "we should box ints as Integers" camp):
>>>> 1) Clojure is inconsistent in how it boxes primitive data (Chris
>>>> Perkins). In Clojure 1.3:
>>>>
>>>> (class (Long/parseLong "1"))  =>  java.lang.Long
>>>> (class (Integer/parseInt "1"))  =>  java.lang.Long
>>>> (class (Short/parseShort "1"))  =>  java.lang.Short
>>>> (class (Byte/parseByte "1"))  =>  java.lang.Byte
>>>> (class (Float/parseFloat "1"))  =>  java.lang.Float
>>>> (class (Double/parseDouble "1"))  =>  java.lang.Double
>>>>
>>>>
>>>> Paul
>>>>
>>>
>>>
>>>
>>> --
>>> Luc P.
>>>
>>> ================
>>> The rabid Muppet
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient
>>> with your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>>
>>
>
>
>
> --
> Luc P.
>
> ================
> The rabid Muppet
>