Clojure 1.3 treatment of integers and longs


nathanmarz

Oct 18, 2011, 5:00:50 PM
to Clojure
Hey all,

I recently started upgrading Storm to Clojure 1.3, and I ran into
various issues due to Clojure's treatment of integers and longs. In
particular, I have a situation like the following:

1. A Java object returns me an int. Let's call this value "v".
2. I put "v" into a map, and pass that map into a Java object
3. I get ClassCastExceptions when that Java object tries to read that
Integer and instead gets a Long back

The error arises due to Clojure's auto-coercion of primitive ints to
longs.
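
For readers following along, the failure in steps 1 to 3 can be sketched in plain Java. This is a minimal sketch, not Storm's code: the class, the method names and the "v" key are hypothetical, and the explicit (long) cast stands in for the widening that Clojure 1.3 performs before boxing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Storm's code.
class BoxedAsLongDemo {
    // Stands in for the Java object that expects an Integer in the map.
    static String readAsInteger(Map<String, Object> conf) {
        try {
            Integer v = (Integer) conf.get("v"); // throws if the stored value is a Long
            return "ok: " + v;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Stores the int 5 either widened to long (boxed as Long) or boxed directly as Integer.
    static String storeAndRead(boolean widenToLong) {
        int v = Integer.parseInt("5");
        Map<String, Object> conf = new HashMap<>();
        if (widenToLong) {
            conf.put("v", (long) v); // long is autoboxed as java.lang.Long
        } else {
            conf.put("v", v);        // int is autoboxed as java.lang.Integer
        }
        return readAsInteger(conf);
    }

    public static void main(String[] args) {
        System.out.println(storeAndRead(true));
        System.out.println(storeAndRead(false));
    }
}
```

The cast only fails at the point where the Java side reads the value, which is why this kind of bug tends to surface far from the place where the number was boxed.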

Auto-coercing ints to longs is prone to errors like I ran into,
especially when interoperating with Java code. It becomes especially
confusing when considering that "Integer" objects do not get coerced
to "Long" objects. Also, if Clojure is trying to treat everything as
longs, I don't understand why there's an unchecked-divide-int function
and not an unchecked-divide-long function.

What's the rationale behind all this? Why not support both ints and
longs? I'm sure this has been discussed before, so feel free to point
me to earlier discussions on this.

-Nathan

David Nolen

Oct 18, 2011, 5:25:30 PM
to clo...@googlegroups.com
There's a 233-message thread on this from June 2010: http://groups.google.com/group/clojure/browse_thread/thread/c8c850595c91cc11/171cacba292a0583

David


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

nathanmarz

Oct 18, 2011, 10:45:42 PM
to Clojure
Thanks. I read through that and it didn't quite answer my question. To
me it seems more logical that:

1. Clojure defaults to longs when you create a new number (with a
literal 0, 1, etc)
2. You can create ints by doing (int 0)
3. Clojure never changes the types of things you're using

I find Clojure's behavior of changing the types of primitive ints to
longs highly unusual, and it is causing a lot of pain in upgrading
Storm to 1.3.

I don't mean to imply that the design choices made were wrong; I know
that Rich et al put a lot of thought and work into these changes. But
I would like to understand why changing the types of ints to longs is
necessary instead of supporting primitive ints as well.

-Nathan



Kevin Downey

Oct 19, 2011, 12:19:57 AM
to clo...@googlegroups.com
On Tue, Oct 18, 2011 at 7:45 PM, nathanmarz <natha...@gmail.com> wrote:
> Thanks. I read through that and it didn't quite answer my question. To
> me it seems more logical that:
>
> 1. Clojure defaults to longs when you create a new number (with a
> literal 0, 1, etc)
> 2. You can create ints by doing (int 0)
> 3. Clojure never changes the types of things you're using

I think you'll find that Clojure doesn't change types, except where
required, mostly for boxing. Clojure 1.2 would construct a new Integer
around an int when required. Clojure 1.3 constructs a new Long around
an int instead, because Rich has decided he prefers longs and doubles
to ints and floats. If you want to do your own boxing prior to using a
value in a way that would box it, you can, and your type will not
"change":

user> (def boxed-by-clojure (.intValue 3))
#'user/boxed-by-clojure
user> (type boxed-by-clojure)
java.lang.Long
user> (def boxed-by-me (Integer. (.intValue 3)))
#'user/boxed-by-me
user> (type boxed-by-me)
java.lang.Integer
user>

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Stuart Halloway

Oct 19, 2011, 10:38:56 AM
to clo...@googlegroups.com
> Thanks. I read through that and it didn't quite answer my question. To
> me it seems more logical that:
>
> 1. Clojure defaults to longs when you create a new number (with a
> literal 0, 1, etc)
> 2. You can create ints by doing (int 0)
> 3. Clojure never changes the types of things you're using
>
> I find Clojure's behavior of changing the types of primitive ints to
> longs highly unusual, and it is causing a lot of pain in upgrading
> Storm to 1.3.

Integers and longs are going to be painful no matter what because they are broken in Java, e.g.

        Object[] objects = new Object[] {-1, -1L};
        System.out.println(objects[0].hashCode()); // Integer -1 hashes to -1
        System.out.println(objects[1].hashCode()); // Long -1 hashes to 0

Clojure avoids this pit by standardizing on longs, which leaves you with the need to specifically request ints when you need them for interop. You can use (int n) hints to select the correct interop method invocation, or box an int if you want to hold on to a value guaranteed to be int.
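
To make the collection-key concern concrete, here is a small Java sketch. It is only an illustration of plain java.util.HashMap behavior, not of Clojure's collections, and the class and method names are hypothetical. It shows that an Integer key and a Long key with the same numeric value are distinct keys in Java, so mixing the two boxed types in one map silently produces lookup misses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of mixed Integer/Long keys in a java.util.HashMap.
class MixedKeyDemo {
    // True if a Long key finds an entry stored under the Integer with the same value.
    static boolean longFindsIntegerKey(int n) {
        Map<Object, String> m = new HashMap<>();
        m.put(Integer.valueOf(n), "here");
        return m.containsKey(Long.valueOf(n));
    }

    public static void main(String[] args) {
        // Integer.equals returns false for any Long, whatever the numeric value.
        System.out.println(Integer.valueOf(1).equals(Long.valueOf(1L)));
        System.out.println(longFindsIntegerKey(1));
        System.out.println(longFindsIntegerKey(-1));
    }
}
```

All three lines print false. Standardizing on one boxed integer type is one way to keep such keys from diverging.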

Can you post a code example that shows a problem you are having?

Stu


Stuart Halloway
Clojure/core
http://clojure.com

nathanmarz

Oct 19, 2011, 8:14:24 PM
to Clojure
Here's a code example illustrating the problem I'm having:
https://gist.github.com/1300034 I've simplified it to the bare minimum
necessary to illustrate the problem.

Agree 100% that ints and longs are broken in Java. The hashcode/
equality stuff is messed up. Clojure can try really hard to hide this,
but it can't hide it completely since Java libraries can always return
you Integer objects. The additional complexity added from changing the
types on you isn't worth it IMO.

Here's my proposal for what I think would be better behavior:

1. Clojure boxes ints into Integers rather than converting them into
longs
2. If you wrap the form in "(long ...)", Clojure skips the boxing and
does what it does now. Since this is done explicitly, there's no
confusion about types.

-Nathan

Kevin Downey

Oct 19, 2011, 8:29:22 PM
to clo...@googlegroups.com
On Wed, Oct 19, 2011 at 5:14 PM, nathanmarz <natha...@gmail.com> wrote:
> Here's a code example illustrating the problem I'm having:
> https://gist.github.com/1300034 I've simplified it to the bare minimum
> necessary to illustrate the problem.
>
> Agree 100% that ints and longs are broken in Java. The hashcode/
> equality stuff is messed up. Clojure can try really hard to hide this,
> but it can't hide it completely since Java libraries can always return
> you Integer objects. The additional complexity added from changing the

Existing Integer objects are unchanged. If the method call in your
example did return an Integer object then you would never notice
anything.

> types on you isn't worth it IMO.
>
> Here's my proposal for what I think would be better behavior:
>
> 1. Clojure boxes ints into Integers rather than convert them into
> longs

Clojure does not convert ints into longs. You are putting a primitive
into a collection, which requires boxing, and Clojure 1.3 boxes ints as
Longs. If you put (Integer. …) around your call you will be fine.

> 2. If you wrap the form in "(long ...)", Clojure skips the boxing and
> does what it does now. Since this is done explicitly, there's no
> confusion about types.
>
> -Nathan

nathanmarz

Oct 20, 2011, 4:54:17 AM
to Clojure
Yes, I understand the behavior perfectly well. The primitive int gets
converted to a Long immediately, as this code demonstrates:

user=> (class (Integer/parseInt "5"))
java.lang.Long

The int isn't being "boxed" into a Long -- the type is being changed.

I'm aware that I can "fix" things by converting the type back to an
Integer manually, but that's not the point. Changing the types is
unusual behavior that leads to hard-to-track-down bugs like the one I ran
into with the ClassCastException. My proposal still stands.

-Nathan



Stuart Halloway

Oct 20, 2011, 9:00:23 AM
to clo...@googlegroups.com
> Yes, I understand the behavior perfectly well. The primitive int gets
> converted to a Long immediately, as this code demonstrates:
>
> user=> (class (Integer/parseInt "5"))
> java.lang.Long
>
> The int isn't being "boxed" into a Long -- the type is being changed.
>
> I'm aware that I can "fix" things by converting the type back to an
> Integer manually, but that's not the point. Changing the types is
> unusual behavior that leads to hard to track down bugs like I ran into
> with the ClassCastException. My proposal still stands.
>
> -Nathan

Somebody has to work hard: either users of collections, or interop callers. The current behavior makes things "just work" for collections, at the cost of having to be explicit for some interop scenarios.

There are two reasons to favor collection users over interop users:

(1) Interop problems are local, and can be resolved by checking the type signature at the point of the problem. Collection key problems are global and break the composability of collections. It is a *huge* benefit of Clojure that collections are sane.

(2) There are a lot more lines of code working with collections than doing interop.

Stu

Luc Prefontaine

Oct 20, 2011, 9:41:12 AM
to clo...@googlegroups.com
We still have a sizable Java chunk here closely interacting with Clojure
and fully agree with #1 and #2.

Interop is environment specific and should not be driving the Clojure language design.
Otherwise Clojure generics would have to "bend" to Java, CLR, JS and future implementations in
other environments, losing its identity along the way and creating a Tower of Babel.

Luc P.

--
Luc P.

================
The rabid Muppet

Kevin Downey

Oct 20, 2011, 12:36:13 PM
to clo...@googlegroups.com
On Thu, Oct 20, 2011 at 1:54 AM, nathanmarz <natha...@gmail.com> wrote:
> Yes, I understand the behavior perfectly well. The primitive int gets
> converted to a Long immediately, as this code demonstrates:
>
> user=> (class (Integer/parseInt "5"))
> java.lang.Long

class is a clojure function that takes Objects, so the int must be boxed.

Justin Kramer

Oct 20, 2011, 1:13:03 PM
to clo...@googlegroups.com
Here's a quick proof using an interface-based primitive detector:

(definterface IPrimitiveTester
  (getType [^int x])
  (getType [^long x])
  ;; other types elided
  )

(deftype PrimitiveTester []
  IPrimitiveTester
  (getType [this ^int x] :int)
  (getType [this ^long x] :long)
  ;; other types elided
  )

(defmacro primitive-type [x]
  `(.getType (PrimitiveTester.) ~x))

(comment

  user=> (primitive-type 5) ;unboxed
  :long
  user=> (primitive-type (Integer/parseInt "5")) ;unboxed
  :int
  user=> (class (Integer/parseInt "5")) ;boxed
  java.lang.Long

  )

Justin

Justin Kramer

Oct 20, 2011, 1:19:37 PM
to clo...@googlegroups.com
Oops, I elided a little too much. Need a method with an Object signature to distinguish Integer from int:

(definterface IPrimitiveTester
  (getType [^int x])
  (getType [^long x])
  ;; etc
  (getType [^Object x]))

(deftype PrimitiveTester []
  IPrimitiveTester
  (getType [this ^int x] :int)
  (getType [this ^long x] :long)
  ;; etc
  (getType [this ^Object x] :object))

(defmacro primitive-type [x]
  `(.getType (PrimitiveTester.) ~x))

(comment

  user=> (primitive-type (Integer. 5))       
  :object
  user=> (primitive-type (Integer/parseInt "5"))
  :int

  )

nathanmarz

Oct 20, 2011, 3:15:25 PM
to Clojure
Thanks, that clarifies the behavior. Regardless though, at some point
the "int" is becoming a "Long" which is a change of type. I'm arguing
that Clojure should box primitive ints as Longs.

Stu, I wouldn't say Clojure's behavior makes it "just work". For
example, if I obtained my number using Integer/valueOf, then Clojure
will not change the Integer to a Long and will not prevent me from
putting it in a collection. It's confusing that Integer/valueOf will
stay an Integer in Clojure-land, and Integer/parseInt will become a
Long in Clojure-land.

The use case I'm interested in here is just this one point of Java
interop: what Clojure does with primitive ints that it gets from a
Java object (as far as I can tell, this is the only way to get a
primitive int in Clojure 1.3). I think it's better that Clojure be
consistent in its treatment of Integer objects and primitive ints by
not changing the types on you.

-Nathan

nathanmarz

Oct 20, 2011, 3:16:04 PM
to Clojure
Oops, I meant "Clojure should box primitive ints as Integers." :-)

David Nolen

Oct 20, 2011, 3:26:39 PM
to clo...@googlegroups.com
Such a change would make Clojure's numeric design inconsistent. You keep saying that Clojure is changing the types, but that's probably not the right way to look at it.

It's a semantic change: Clojure now only has 64-bit primitives, the same way that JavaScript only has doubles.

Prior to the 1.3 change, the semantics gave you a free lunch around primitive ints in the interop scenario. Now you have to be explicit, just as you do with pretty much any kind of Java interop.

David

nathanmarz

Oct 20, 2011, 3:45:03 PM
to Clojure
But Clojure is already inconsistent. ints and Integers in interop are
treated differently. The only way to make Clojure consistent is to
either:

1. Box ints as Integers
2. Always convert Integers to Longs.

I'm not sure on the feasibility of #2.

I'm not trying to be obtuse, but I really don't see the benefit of
boxing primitive ints as Longs given how Integer objects are treated.
Right now, if you obtain an Integer object via interop and want it to
be compatible with Clojure's regular numerics, you still have to
manually convert that Integer object into a Long. What I'm proposing
is that you treat primitive ints obtained via interop the exact same
way, which avoids the weird type issues that I ran into.

-Nathan

David Nolen

Oct 20, 2011, 3:50:13 PM
to clo...@googlegroups.com
On Thu, Oct 20, 2011 at 3:45 PM, nathanmarz <natha...@gmail.com> wrote:
> But Clojure is already inconsistent. ints and Integers in interop are
> treated differently. The only way to make Clojure consistent is to
> either:

Clojure is consistent. Whether or not that makes *interop* easier or harder is orthogonal.

You do know that Clojure now supports primitive args and return, right? How is what you are proposing going to be reconciled with that?

David

Kevin Downey

Oct 20, 2011, 3:51:58 PM
to clo...@googlegroups.com
On Thu, Oct 20, 2011 at 12:45 PM, nathanmarz <natha...@gmail.com> wrote:
> But Clojure is already inconsistent. ints and Integers in interop are
> treated differently. The only way to make Clojure consistent is to
> either:

As David said, "Clojure now only has 64-bit primitives".

An Integer is not a primitive; an int is.


nathanmarz

Oct 20, 2011, 4:11:40 PM
to Clojure
I'm not sure we're arguing about the same thing. I think that Clojure
only supporting 64 bit primitive arithmetic is fine, and I'm not
proposing that it support 32 bit primitive arithmetic. The sole point
of contention is what Clojure does when it has to box a primitive int.
I think this is orthogonal to primitive args/return, but correct me if
I'm wrong.

Right now, it boxes ints as a Long, which I think is changing the
type. My proposal is that it box ints as Integer objects. Would
changing the behavior in this way cause a fundamental performance
limitation in Clojure?

-Nathan





Sean Corfield

Oct 20, 2011, 9:14:43 PM
to clo...@googlegroups.com
On Thu, Oct 20, 2011 at 1:11 PM, nathanmarz <natha...@gmail.com> wrote:
> of contention is what Clojure does when it has to box a primitive int.

My understanding is that Clojure 1.3 has 64-bit primitives, i.e.,
longs and doubles. You only have a primitive int if you coerce the
value to int (for an interop call that expects an int) - based on what
I've understood of the numerics discussions. Similarly, you only have
a primitive float if you coerce the value.

So Clojure boxes a long as Long. If you want to box a long as Integer,
you have to explicitly say so: (Integer. 42) - and Clojure will give
you an Integer and not do anything to it.

(Is my understanding correct? I'm finding the discussion interesting
but not 100% sure whether I fully understand Clojure 1.3's primitive
numerics)
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
Railo Technologies, Inc. -- http://www.getrailo.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

Luc Prefontaine

Oct 20, 2011, 9:16:36 PM
to clo...@googlegroups.com
So you propose this:

user=> (time (dotimes [i 10000000] (let [ a (Integer. 1) b (Integer. 2)] (+ a b))))
"Elapsed time: 31.14886 msecs"
nil

Instead of this:

user=> (time (dotimes [i 10000000] (let [ a 1 b 2] (+ a b))))
"Elapsed time: 15.680386 msecs"
nil

Using a wrapper instead of a primitive type has a significant cost in a computation.

One of the purposes of normalizing to 64 bits was to get maximum performance for
compute-bound Clojure applications.

Computing with wrappers is inefficient. Your proposal looks only at one facet
of the whole problem.

It's not a Java centric issue, it's a Clojure performance enhancement.

You are coding in Clojure, not in Java. It happens that Clojure reuses some native types
efficiently implemented by the JVM and used by Java (String, long, ....) but not all of them.

Let's say one day you end up coding in ClojureScript or Clojure on JS; which do you prefer?
Dealing with idiosyncrasies of the underlying environment, or having a consistent implementation that provides
the best performance for that given pseudo-metal?

What about the day that long long (128 bits) comes around? Will Clojure drag behind because it's carrying
32-bit values?

Obviously it creates issues when you work at the fringe, but interop is not the purpose of
Clojure. It happens to be much easier to access the "outside" world than in other environments, but that
cannot justify compromising the performance or feature list of Clojure.

Luc P.


David Nolen

Oct 20, 2011, 9:19:08 PM
to clo...@googlegroups.com
On Thu, Oct 20, 2011 at 4:11 PM, nathanmarz <natha...@gmail.com> wrote:
> I'm not sure we're arguing about the same thing. I think that Clojure
> only supporting 64 bit primitive arithmetic is fine, and I'm not
> proposing that it support 32 bit primitive arithmetic. The sole point
> of contention is what Clojure does when it has to box a primitive int.
> I think this is orthogonal to primitive args/return, but correct me if
> I'm wrong.

If 32-bit ints are allowed to exist then the various numeric operators must handle them. If the numeric operators handle them then primitive args and returns should probably be supported. But that would exponentially increase the number of interfaces required for primitive arg and return support.
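
The growth in interface count can be estimated with a little arithmetic. The sketch below is a back-of-the-envelope count under stated assumptions, not a description of Clojure's actual source: it assumes one interface per combination of argument and return types, that primitive signatures are supported up to four arguments, and that the all-Object signature needs no dedicated interface. The class and method names are hypothetical.

```java
// Hypothetical counting sketch; the arity limit and the counting rule are assumptions.
class SignatureCountDemo {
    // Signatures with `arity` arguments plus one return slot, each slot drawn from
    // `kinds` types, excluding the single all-Object signature.
    static long signatures(int kinds, int arity) {
        long total = 1;
        for (int slot = 0; slot <= arity; slot++) {
            total *= kinds;
        }
        return total - 1;
    }

    // Sum over arities 0..maxArity.
    static long upToArity(int kinds, int maxArity) {
        long sum = 0;
        for (int arity = 0; arity <= maxArity; arity++) {
            sum += signatures(kinds, arity);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(upToArity(3, 4)); // long, double, Object
        System.out.println(upToArity(5, 4)); // adding int and float
    }
}
```

Under these assumptions, three type kinds give 358 signatures and five give 3900, which is the order-of-magnitude jump being described.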

David 

nathanmarz

Oct 21, 2011, 12:11:41 AM
to Clojure
Now I'm confused. So when I do this:

(def i (Integer/parseInt "1"))

Is "i" a primitive int, a primitive long, or a Long object?

I was under the impression that it was a primitive int based on
Justin's test code, but when I run "(primitive-type i)" in the REPL it
tells me :object.

If "i" is a primitive int, then the only change I'm proposing is that
if Clojure needs to box that value later on, that it box it as an
Integer instead of a Long. This change in behavior would not affect
primitive number performance since it's at a point when Clojure is
already boxing.

If "i" is a primitive long (which is what I thought was happening
originally), I propose that Clojure box the value as an Integer unless
you wrap the form in a "(long ...)" form. In the latter case Clojure
would do what it's doing currently so you can still get the
performance if you need it. The difference is that you're being
explicit about the type changing so there's no possible confusion in
that regard.

Finally, if "i" is a Long object, I propose that it instead be boxed
as an Integer object.

Note that I am not saying:

1. That Clojure always box primitives into an object form
2. That Clojure implement 32 bit arithmetic

In all these cases, you can still get maximum performance without
Clojure changing ints to longs. Please correct me if there's something
I'm missing here.

Stu's argument from above is that Clojure boxes ints to Longs instead
of Integer to avoid weirdness with hashcode/equality in collections.
This is a reasonable point, but consider this code example:

user=> (def m1 {(Integer/valueOf "1") 2})
#'user/m1
user=> (def m2 {(Integer/parseInt "1") 2})
#'user/m2
user=> (map class (keys m1))
(java.lang.Integer)
user=> (map class (keys m2))
(java.lang.Long)

Clojure doesn't prevent you from putting Integer objects in
collections. So there are cases where you still need to do type
coercion yourself. Given that Clojure can't hide this problem
completely from you, I think it's better that it treat "int" and
"Integer" consistently by boxing ints as Integers. Then there's no
weirdness like I ran into with getting ClassCastExceptions because the
type changed.

-Nathan








Alan Malloy

Oct 21, 2011, 12:35:36 AM
to Clojure
It is a Long object. Vars hold objects, so it has to be boxed.
However, if instead of def'ing it you immediately called some java
method that will accept either a primitive int or a primitive long, my
understanding is that Clojure would arrange for the int version to be
called, because no boxing would happen.
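
A rough Java analogue of this point, for readers more at home on the Java side. Java picks overloads at compile time from the static type of the argument, which is only an analogy for how Clojure selects interop methods, so treat this as a sketch. The class and method names are hypothetical and mirror the primitive-type tester posted earlier in the thread.

```java
// Hypothetical Java analogue of the primitive-type tester from earlier in the thread.
class OverloadDemo {
    static String kind(int x)    { return "int"; }
    static String kind(long x)   { return "long"; }
    static String kind(Object x) { return "object"; }

    // The argument expression has static type int, so the int overload is chosen.
    static String viaPrimitiveInt() { return kind(Integer.parseInt("5")); }

    // A long literal selects the long overload.
    static String viaPrimitiveLong() { return kind(5L); }

    // Once the value is held through an Object reference (as a var would hold it),
    // only the Object overload applies.
    static String viaObjectReference() {
        Object held = Long.valueOf(5L);
        return kind(held);
    }

    public static void main(String[] args) {
        System.out.println(viaPrimitiveInt());
        System.out.println(viaPrimitiveLong());
        System.out.println(viaObjectReference());
    }
}
```

The three calls print int, long and object respectively: the primitive type is visible only while the value has not yet been stored somewhere that holds objects.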

nathanmarz

Oct 21, 2011, 12:57:49 AM
to Clojure
Thanks Alan, that makes sense. This code example illustrates that
Clojure values can already be primitive ints:

user=> (let [i 1] (primitive-type i))
:long
user=> (let [i (Integer/parseInt "1")] (primitive-type i))
:int

So it appears that Clojure's behavior is case #2 from my last comment.
All I'm proposing is that when Clojure needs to box a primitive int,
that Clojure box it as an Integer rather than a Long. Then this code
example:

(let [m {:a (Integer/parseInt "1")}]
  (map class (vals m)))

will behave the same as this one:

(let [m {:a (Integer/valueOf "1")}]
  (map class (vals m)))


-Nathan

Luc Prefontaine

Oct 21, 2011, 1:27:51 AM
to clo...@googlegroups.com

The "weirdness" here is that you seem to confuse the Java context and the Clojure
context. They are not the same. Clojure has to satisfy performance and consistency
criteria. It's a language of its own, not a Java offspring.

user=> (class (Integer/parseInt "1"))
java.lang.Long
user=>

Integer/parseInt returns a primitive type. Not a boxed Integer object.
If used as a key in a map or anything else in Clojure, it will get promoted to a long value as per the math
promotion rules (long/double representation). Obviously needed if it is to be used later in a computation
otherwise it would break math operations consistency by allowing mixed int/long operands.

If passed as an interop parameter it will retain its int type.

user=> (class (Integer/valueOf 1))
java.lang.Integer

Integer/valueOf returns an Integer object, not a primitive type.
It's an object, not a primitive type, Clojure will not change it.
If used as a key in a Clojure map or any Clojure data structure, it will retain its object status.

Just cast your keys accordingly if you want Integer objects as keys.
In your short example, 1 as a key will not do it, it gets promoted to primitive long.

You may not recall, but in Java, int used not to be compatible with Integer objects.
It's only since Java 5 that you can assign an Integer object to a primitive int.
That's the compiler tricking things to allow you to do that. In the JVM they're still
not represented in the same way.

The above Integer member functions and their behavior have nothing to do with Clojure.
They result from bad decisions made years ago when designing Java and the JVM and you are blaming
Clojure for not handling them according to some patch implemented afterward in the Java compiler.

You ran into the ClassCastException by yourself. Clojure did not push you into it.
When using Java interop you have to obey Java rules and bend accordingly.
It's not Clojure that needs to bend; it's you who must adapt to the interop
restrictions/conventions.

If Java expects an Integer object somewhere make sure you are providing it.

Luc P.


nathanmarz

Oct 21, 2011, 3:52:50 AM
to Clojure
Luc, what you're saying sounds to me like "this is the way it is so
deal with it". Can you give me some concrete code snippets showing why
it's better to box ints as Longs? Do you really think the following is
at all intuitive?

user=> (class (Integer/parseInt "1"))
java.lang.Long
user=> (class (Integer/valueOf "1"))
java.lang.Integer


-Nathan




Sean Corfield

Oct 21, 2011, 4:04:36 AM
to clo...@googlegroups.com
On Fri, Oct 21, 2011 at 12:52 AM, nathanmarz <natha...@gmail.com> wrote:
> user=> (class (Integer/parseInt "1"))
> java.lang.Long

(Integer/parseInt "1") returns an int - which Clojure promotes to long
(since it only has 64-bit primitives) and class takes an Object so
long is boxed to Long.

> user=> (class (Integer/valueOf "1"))
> java.lang.Integer

(Integer/valueOf "1") returns an Integer - which is already an Object
so (class Integer-value) returns Integer.

You're only going to get 32-bit ints if you are calling into Java and
that API expects an int - and you coerce the (64-bit primitive long)
Clojure primitive to a 32-bit int (again, as I understand the 1.3
numerics).

If Java gives Clojure an int, it will be treated as a 64-bit long. If
Java gives Clojure an Integer, it will be treated as an Object. I
rather like the simplicity of 1.3's numeric handling: there are only
longs and doubles - but you can coerce them to whatever you need for
interop. It's performant by default, it's simple and consistent (in my
eyes) and yet still flexible.
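
The two paths described above can be spelled out step by step in Java. This is only a sketch of the sequence as explained in this post (widen the int to long, then box), with the widening written explicitly; it is not Clojure's implementation, and the names are hypothetical.

```java
// Hypothetical step-by-step version of "int is promoted to long, then boxed".
class WidenThenBoxDemo {
    // Path taken by a primitive int result such as Integer.parseInt.
    static String boxedClassAfterWidening(String s) {
        int i = Integer.parseInt(s); // primitive int from Java
        long widened = i;            // promoted to a 64-bit primitive
        Object boxed = widened;      // boxing a long yields java.lang.Long
        return boxed.getClass().getName();
    }

    // Path taken by a value that is already an object, such as Integer.valueOf.
    static String classOfExistingObject(String s) {
        Object already = Integer.valueOf(s); // already an Integer; nothing to box
        return already.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(boxedClassAfterWidening("1"));
        System.out.println(classOfExistingObject("1"));
    }
}
```

The first call reports java.lang.Long and the second java.lang.Integer, matching the REPL output quoted above.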

Stuart Halloway

Oct 21, 2011, 7:24:16 AM
to clo...@googlegroups.com
> Luc, what you're saying sounds to me like "this is the way it is so
> deal with it". Can you give me some concrete code snippets showing why
> it's better to box ints as Longs? Do you really think the following is
> at all intuitive?
>
> user=> (class (Integer/parseInt "1"))
> java.lang.Long
> user=> (class (Integer/valueOf "1"))
> java.lang.Integer
>
>
> -Nathan


If you box Ints and Longs separately then collection keys stop working, as I said at the start of this thread.

There is no such thing as intuitive. The behavior above follows from a clear rule that can be followed locally in all cases. What you propose, in addition to breaking collections, requires more contextual reasoning to understand code in a bunch of scenarios.

Could we meet on IRC and discuss this later today? Feel like we are going in circles.

Stu

Luc Prefontaine

Oct 21, 2011, 8:36:42 AM
to clo...@googlegroups.com
Like Stu says, this conversation is going in circles.

"Concrete code examples" cannot be a replacement for consistent rules when
designing software and especially a prog. language.

Since the beginning of this thread, you have been exposed to two of these:

a) make collections consistent
b) make computations efficient in Clojure

These are 2 major reasons why promoting to long is rational. No need for code snippets.

"Intuitive" is relative; it's not an objective criterion. What may seem
intuitive to you can be counterintuitive to others. It all depends on the background
of individuals.

If you know the rules relative to numerics in Clojure and why they have been
chosen, then it's perfectly rational.

Read carefully:

http://dev.clojure.org/display/doc/Documentation+for+1.3+Numerics

There is nothing else to say about this subject.

Luc

Chris Perkins

Oct 21, 2011, 9:54:54 AM
to clo...@googlegroups.com
Perhaps I can clarify why the 1.3 behavior is confusing. For those who have focused on issues like "primitives need to be boxed, therefore you get a long" - I think you are missing Nathan's point.  Here is what changed about boxing in 1.3:

Clojure 1.2:

(class (Long/parseLong "1"))  =>  java.lang.Long
(class (Integer/parseInt "1"))  =>  java.lang.Integer
(class (Short/parseShort "1"))  =>  java.lang.Short
(class (Byte/parseByte "1"))  =>  java.lang.Byte
(class (Float/parseFloat "1"))  =>  java.lang.Float
(class (Double/parseDouble "1"))  =>  java.lang.Double

Clojure 1.3:

(class (Long/parseLong "1"))  =>  java.lang.Long
(class (Integer/parseInt "1"))  =>  java.lang.Long
(class (Short/parseShort "1"))  =>  java.lang.Short
(class (Byte/parseByte "1"))  =>  java.lang.Byte
(class (Float/parseFloat "1"))  =>  java.lang.Float
(class (Double/parseDouble "1"))  =>  java.lang.Double

So the issue is not "why do primitives get boxed at all?" - it is "why are primitive ints, uniquely amongst all primitive types, singled out and boxed as a wrapper type that is not the analogue of their primitive type?"

I suspect that this is what Nathan is objecting to.

- Chris


nathanmarz

Oct 21, 2011, 7:07:30 PM
to Clojure
Yea let's chat on IRC. I'll ping you when I see you online.

-Nathan

Paul Stadig

Oct 22, 2011, 7:13:08 AM
to clo...@googlegroups.com
On Wednesday, October 19, 2011 10:38:56 AM UTC-4, stuart....@gmail.com wrote:
>Integers and longs are going to be painful no matter what because they are broken in Java, e.g.

It is incorrect to say that "Integers and longs...are broken in Java."

user=> (.hashCode (Integer. -1))
-1
user=> (.hashCode (Long. -1))
0
user=> (.equals (Integer. -1) (Long. -1))
false

This is consistent with the contract for hashCode. Java would be broken only if equals returned true but the hashCodes were different. If anything, Clojure was (and in fact still is) broken, since Clojure makes Longs and Integers in the same range equal, but does not make their hashCodes equal:

user=> (hash (Integer. -1))
-1
user=> (hash (Long. -1))
0
user=> (= (Integer. -1) (Long. -1))
true

Henceforth referred to as "the hashCode problem".
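
A minimal sketch of the contract being discussed, for readers who want to check it themselves. The helper name contract-holds? is made up for illustration; it only uses plain Java interop calls (.equals, .hashCode) and Clojure's numeric ==, so it does not depend on how any particular Clojure version implements hash.

```clojure
;; Hypothetical helper: true when the pair (a, b) respects the hashing
;; contract "equal values must have equal hashes" for the given functions.
(defn contract-holds? [eq-fn hash-fn a b]
  (or (not (eq-fn a b))
      (= (hash-fn a) (hash-fn b))))

(def i (Integer/valueOf (int -1)))
(def l (Long/valueOf -1))

;; Java's own pairing: .equals together with .hashCode.
;; The two objects are not .equals, so nothing is required of the hashes.
(contract-holds? #(.equals %1 %2) #(.hashCode %) i l)
;; => true

;; Numeric equality paired with Java's .hashCode.
;; The values are ==, but the hash codes are -1 and 0.
(contract-holds? == #(.hashCode %) i l)
;; => false
```

The second call is the situation described above: the equality is widened to treat the two values as the same number, while the hash function still distinguishes them.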


On Thursday, October 20, 2011 9:00:23 AM UTC-4, stuart....@gmail.com wrote:
>Somebody has to work hard: either users of collections, or interop callers. The current behavior makes things "just work" for collections, at the cost of having to be explicit for some interop scenarios.
>
>There are two reasons to favor collection users over interop users:
>
>   (1) Interop problems are local, and can be resolved by checking the type signature at the point of the problem. Collection key problems are global and break the composability of collections. It is a *huge* benefit of Clojure that collections are sane.

Munging the data as it goes into a collection does not fix the hashCode problem.

PersistentArrayMaps don't have the hashCode problem, because they don't actually bother with hashCodes:

user=> (get {(Long. -1) :here} (Integer. -1))
:here

But boxing ints as Long doesn't actually fix the hashCode problem for PersistentHashMaps.  Big 'I' Integers still hash differently than big 'L' Longs, yet Clojure considers Longs in the Integer range to be equal to Integers, and this is the fundamental problem with Clojure's collections. E.g.

user=> (get (clojure.lang.PersistentHashMap/create {(Long. -1) :here}) (Integer. -1))
nil
user=> (get (clojure.lang.PersistentHashMap/create {(Long. 0) :here}) (Integer. 0))
:here

Since Clojure isn't making the hashCodes for Integers and Longs the same, the collection experience is still broken.  One could say, "Yes, Paul, but it is less broken now, because you will only see this issue if you explicitly create a big 'I' Integer."

Then I could say, "Yes, One, that may be true, but in that case presumably I have a reason to explicitly ask for a big 'I' Integer, and I should understand the implications. Similarly, I probably have a reason for asking for a little 'i' int.  Clojure may think it knows best by boxing ints as Longs, but I'm pretty sure I know what's best in this particular situation in my code."

Then One could say, "But using only longs makes math much faster, and makes the collection experience more consistent."

Then I could say, "One, you are complecting two different issues. Making Clojure literals always longs is fine, it's great.  Making the Clojure compiler generate fast code for little 'l' longs is great. That means that you should only run into this collection brokenness if you are explicitly asking for and creating big 'I' Integers. However, the collection experience not being consistent is a problem with the collection implementation.  PersistentHashMap should not be using Integer's hashCode method if it is not using Integer's equals method."
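
One way to picture a hash function that agrees with numeric equality is sketched below. This is only an illustration under stated assumptions: the names integral? and congruent-hash are invented here, and nothing in this thread says Clojure does or will implement it this way. The idea is simply to widen every boxed integer type to a long before hashing.

```clojure
;; Hypothetical predicate: is x one of the boxed integer types?
(defn integral? [x]
  (or (instance? Byte x) (instance? Short x)
      (instance? Integer x) (instance? Long x)))

;; Hypothetical hash: boxed integers are widened to long first, so that
;; values which compare as the same number also hash the same.
;; Everything else falls back to the object's own .hashCode.
(defn congruent-hash [x]
  (cond
    (nil? x)      0
    (integral? x) (.hashCode (Long/valueOf (.longValue ^Number x)))
    :else         (.hashCode ^Object x)))

(= (congruent-hash (Integer/valueOf (int -1)))
   (congruent-hash (Long/valueOf -1)))
;; => true
```

With a hash like this, an Integer -1 and a Long -1 would land in the same bucket, which is what the equality used by the map already assumes.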


>   (2) There are a lot more lines of code working with collections than doing interop.

I think the issue with interop is that I am explicitly asking for ints and/or Integers, and when I'm doing interop I expect that Java semantics be preserved, which means that ints get boxed into Integers.  I don't believe that boxing ints as Integers should harm any of the primitive math enhancements, nor would it harm the concept of Clojure "as a language unto itself" having only 64-bit math.  Those are orthogonal issues.


Paul

Chas Emerick

Oct 22, 2011, 9:48:17 AM
to clo...@googlegroups.com
If Clojure's primary objective were Java interop, I might agree with you.  However, it's not, and it's bizarre to see someone argue that this is not broken:

user=> (.equals (Integer. -1) (Long. -1))
false

Sure, not broken according to the Java object model and its equality semantics, but damn well broken if your standard is something other than what was canonized in java.lang.Object 20 years ago.  1 == 1 all the time — or, it should — regardless of the containers such integers incidentally find themselves within.

Thus, Clojure's notion of equivalence, which leads to pleasantly consistent behaviour, e.g. (== (Integer. 1) 1 1N). Which, of course, doesn't preclude one using .equals if you truly want Java math semantics instead of = or == (neither of which have ever been advertised as adhering to the hashcode/.equals contract, at least since Clojure 1.0 IIRC).

If there are some common rough spots in the interop for certain use cases, perhaps those can be smoothed out with a library, maybe contributed by someone that acutely feels that pain.

- Chas

Paul Stadig

Oct 22, 2011, 10:48:00 AM
to clo...@googlegroups.com
On Sat, Oct 22, 2011 at 9:48 AM, Chas Emerick <ceme...@snowtide.com> wrote:
> If Clojure's primary objective were Java interop, I might agree with you.  However, it's not, and it's bizarre to see someone argue that this is not broken:
>
> user=> (.equals (Integer. -1) (Long. -1))
> false
>
> Sure, not broken according to the Java object model and its equality semantics, but damn well broken if your standard is something other than what was canonized in java.lang.Object 20 years ago.  1 == 1 all the time - or, it should - regardless of the containers such integers incidentally find themselves within.

From the beginning Clojure's story has been, "why reinvent the wheel when there's this great JVM with a million man-months of engineering," and I do believe interop has been a huge objective. There are lots of existing libraries that can be used, and the whole "Clojure integers are java.lang.Integers, and Clojure Strings are java.lang.Strings" always seemed to me to be about interop and being a good, integrated citizen on the JVM.

Of course I was not saying that 1 should not equal 1. I was saying that to be on the JVM you should adhere to the hashCode contract. And it's not the java.lang.Object equality semantics that are broken. The hashCode contract is a mathematical contract that you must follow if you want to implement a hash table in any language. Sure, Integer and Long seem to be weird in that they are not equal to each other when they are in the same range, but that's a problem with Integer and Long semantics, not java.lang.Object semantics. And you can't fix that problem by essentially rewriting/overriding the equals method for Integer and Long, and not also rewriting/overriding the hashCode method for those same classes. If you don't also override hashCode, then you get broken behavior as I demonstrated.

> Thus, Clojure's notion of equivalence, which leads to pleasantly consistent behaviour, e.g. (== (Integer. 1) 1 1N). Which, of course, doesn't preclude one using .equals if you truly want Java math semantics instead of = or == (neither of which have ever been advertised as adhering to the hashcode/.equals contract, at least since Clojure 1.0 IIRC).

Clojure PersistentHashMaps are java.util.Maps, and to whatever extent Clojure defines new types on the JVM and implements an equals method for those types, it should also implement a hashCode method that adheres to the contract.
 
> If there are some common rough spots in the interop for certain use cases, perhaps those can be smoothed out with a library, maybe contributed by someone that acutely feels that pain.

I don't intend to muddle the discussion, but only to point out that there are two separate issues:

1) the way collections behave when you use Longs that are in the Integer range. This is a problem with the implementation of PersistentHashMap, and unrelated to boxing ints as Longs. Boxing ints as Longs only hides the underlying issue: PersistentHashMap should not be combining the default implementation of hashCode with its own implementation of equals.

2) ints being boxed as Longs. When you look at Chris Perkins's post it certainly seems broken that ints are the *only* primitive that is not boxed into its java.lang equivalent. Also, AFAICT boxing ints as Integers would have no effect on the faster numeric maths.


Paul

Luc Prefontaine

Oct 22, 2011, 1:49:38 PM
to clo...@googlegroups.com

Java != JVM.

That's an all too common mistake. Integer vs Long, Byte, ... are Java creations.
They have nothing to do with the JVM primitive data types.

Clojure implements a semantic different than Java on top of the JVM, why not ?
That's the whole idea of having the JVM around. Abstracting the metal.

Clojure reuses Java strings as is but it could have implemented its own on top of the
char primitive type at the expense of less transparent interop. This is an implementation choice.
It does not tie Clojure to Java.

These are Clojure centric decisions. Let's get Java out of this discussion.
Clojure is not Java and even if it provides a "soft" bridge
to reuse Java code, its feature set is certainly not Java centric.

A Clojure persistent map is a ... Clojure data structure, not a Java data structure.
Interfaces like java.util.Map have nothing to do with the content of the map itself.
If they help make interop calls smoother fine. But do not tie their Java semantic to
the Clojure semantic. It's unrelated outside of the interop domain.

I do not care about Java centric stuff. I adopted Clojure to get away from Java
ASAP.

Luc P.


Paul Stadig

Oct 22, 2011, 3:04:33 PM
to clo...@googlegroups.com
On Sat, Oct 22, 2011 at 1:49 PM, Luc Prefontaine <lprefo...@softaddicts.ca> wrote:

> Java != JVM.
>
> That's an all too common mistake. Integer vs Long, Byte, ... are Java creations.
> They have nothing to do with the JVM primitive data types.
>
> Clojure implements a semantic different than Java on top of the JVM, why not ?
> That's the whole idea of having the JVM around. Abstracting the metal.
>
> Clojure reuses Java strings as is but it could have implemented its own on top of the
> char primitive type at the expense of less transparent interop. This is an implementation choice.
> It does not tie Clojure to Java.

Um...I guess I don't understand how what you're saying is relevant. Are you saying that Clojure should implement its own Byte, Short, Integer, and Long? If you are, then the hashCode contract should be obeyed. If you're not, then it's fine for PersistentHashMap to redefine equals for java.lang.{Byte,Short,Integer,Long}, but hashCode should also be redefined.

The hashCode contract is not a Java thing, it is a JVM thing, and in fact (as I mentioned before) it is a mathematical contract that you must obey to implement a hash table in any language and on any platform.

Python has a similar contract
http://docs.python.org/reference/datamodel.html#object.__hash__

C# has a similar contract
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx

Common Lisp has a similar contract
http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm#sxhash

The brokenness of PersistentHashMap with respect to the hashCode problem has nothing to do with Java (or even the JVM).
 
> These are Clojure centric decisions. Let's get Java out of this discussion.
> Clojure is not Java and even if it provides a "soft" bridge
> to reuse Java code, its feature set is certainly not Java centric.
>
> A Clojure persistent map is a ... Clojure data structure, not a Java data structure.
> Interfaces like java.util.Map have nothing to do with the content of the map itself.
> If they help make interop calls smoother fine. But do not tie their Java semantic to
> the Clojure semantic. It's unrelated outside of the interop domain.
>
> I do not care about Java centric stuff. I adopted Clojure to get away from Java
> ASAP.

The reality is that PersistentHashMap does implement j.u.Map, and as much as possible Clojure tries to live at peace with other classes/objects on the JVM. There will always be some level of interop and semantics that must be matched with the platform. If you want to totally avoid Java, then I don't think Clojure is going to help you. It's not just a coincidence that Clojure strings are java.lang.Strings, and there are probably many people who would not have found Clojure as compelling if it didn't have a great interop story, and the ability to access a huge set of existing libraries. I feel like this is drifting off topic though.

Coming back to the original issues:

1) PersistentHashMap should be using a hashing function that is congruent with the equals function it uses.

2) Boxing ints as Longs sticks out when every other primitive is boxed into its java.lang.* equivalent.

3) Boxing ints as Integers would not have any adverse effect on the improvements to primitive maths.

I'd be glad to help out with any of this.


Paul

Luc Prefontaine

Oct 22, 2011, 3:40:58 PM
to clo...@googlegroups.com

a) Clojure does not implement Integer, Byte, ... or any of the number related Java classes.
It uses native JVM data types. The Integer class has nothing to do with the JVM primitive types.
These are Java concepts. It has nothing to do with Clojure itself. It's alien stuff.
Dunno why you insist on these. Clojure has not been designed to be a Java superset.
It's a language of its own.

b) The way Java interprets equality based on the hash code is a Java specific behavior.
It's defined by the Java API.
There's nothing in the jvm spec that defines how a hash code should be used. It's a reference.
Nothing more.

All the contracts you mention are language centric, each of them defined their contract according
to their own needs. Clojure should have the right to do so.

Clojure prefers to avoid having different rules than Java regarding number handling ? So be it, it's legitimate.

If people like you were free to take these decisions we would end up with three different languages, one on the
jvm, one on CLR and one on JS. Nonsense. Having to deal with three different interops and trying to
unify them a bit is enough work by itself.

Interop stuff is low level and should remain there. If a single interop implementation starts to influence
the Clojure language and result in such short term and narrow scope decisions, we will all have a problem
in the future.

Luc P.


Paul Stadig

Oct 22, 2011, 4:06:04 PM
to clo...@googlegroups.com
Luc,


On Sat, Oct 22, 2011 at 3:40 PM, Luc Prefontaine <lprefo...@softaddicts.ca> wrote:
> All the contracts you mention are language centric, each of them defined their contract according
> to their own needs. Clojure should have the right to do so.

The contract is required for implementing any kind of hash map anywhere. This is not Java or the JVM influencing Clojure to do something it wouldn't have otherwise. The references were examples to show that widely varied languages/platforms agree: if you want to implement a hash map in any language on any platform, then when two objects are equal their hashCodes should be equal.

I'm fine with changing the Java semantics with respect to Integer and Long equality, BUT if we're going to change equality, then the hashing function has to be congruent with that equality function. If equals and hashCode are not congruent, on any platform, in any language, anywhere, then you do not have a hash map, you have a broken hash map. You can see the brokenness in the example code I posted.
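
To see why the mismatch loses keys, here is a toy bucketed table, written as a sketch only. It is not how PersistentHashMap is actually structured; the names bucket-index, toy-put, toy-get and empty-table are invented for this illustration. The table picks a bucket from the hash and then compares keys inside that bucket with the equality function.

```clojure
;; Toy table: a vector of 8 buckets, each bucket a vector of [key value] pairs.
(def empty-table (vec (repeat 8 [])))

(defn bucket-index [hash-fn n k]
  (mod (hash-fn k) n))

(defn toy-put [table hash-fn k v]
  (update-in table [(bucket-index hash-fn (count table) k)] conj [k v]))

(defn toy-get [table hash-fn eq-fn k]
  (let [bucket (nth table (bucket-index hash-fn (count table) k))]
    (some (fn [[k2 v]] (when (eq-fn k k2) v)) bucket)))

;; Java's hashCode paired with numeric equality (==).
(def java-hash #(.hashCode %))

(def t (-> empty-table
           (toy-put java-hash (Long/valueOf -1) :here)
           (toy-put java-hash (Long/valueOf 0) :zero)))

(toy-get t java-hash == (Long/valueOf -1))          ;; => :here
(toy-get t java-hash == (Integer/valueOf (int -1))) ;; => nil, looked in the wrong bucket
(toy-get t java-hash == (Integer/valueOf (int 0)))  ;; => :zero, hashes happen to agree
```

The lookup for Integer -1 never reaches the equality check, because the hash sends it to a different bucket than the one holding Long -1. The lookup for 0 works only because both types hash 0 to the same value.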
 
> Clojure prefers to avoid having different rules than Java regarding number handling ? So be it, it's legitimate.

I'm not arguing against that. I'm saying make the equality different, BUT you also have to make the hashCode function congruent.

> If people like you were free to take these decisions we would end up with three different languages, one on the
> jvm, one on CLR and one on JS. Nonsense. Having to deal with three different interops and trying to
> unify them a bit is enough work by itself.
>
> Interop stuff is low level and should remain there. If a single interop implementation starts to influence
> the Clojure language and result in such short term and narrow scope decisions, we will all have a problem
> in the future.

I mean, again, I don't understand why you're saying this. I agree interop needs to exist and to be at a low level. The question is, given that some form of interop must exist, how should it work? Right. The discussion that we're having here is about how Java interop should work. I don't think it makes sense to come in and say, that we can't let Java and the JVM influence how we interop with Java and the JVM. Perhaps you are misunderstanding me, or I you.

I'm saying given that there must be some form of interop, it does not make sense to box ints as Longs. And the decision to box them as Integers instead should not have any effect on the semantics of anything. Two objections have been raised against boxing ints as Integers so far: 1) it breaks Clojure's collections, and 2) it would have bad effects on the new faster primitive maths.

I'm saying: 1) Clojure's PersistentHashMap is broken because it is using incongruent equals and hashCode methods, auto-promoting ints to Longs only hides this, and if you explicitly create an Integer (or get one from Java) PersistentHashMap will still behave badly. Promoting ints to Longs only masks the issue.

2) autoboxing ints to Integers would not have any bad effects on the new faster primitive maths.

And as a third point, ints being boxed to Longs stands out as inconsistent with the way all the other primitive integer types are handled.


Paul

Luc Prefontaine

Oct 22, 2011, 4:31:29 PM
to clo...@googlegroups.com

The contract in Clojure is clear, they are only long integer values except if you cast
them accordingly for Java interop purposes. Anything else is a long so there are no
contract breaches.

It's not a platform issue, it's a Java issue. Equality is implemented by Java for objects.
It has nothing to do with Clojure which uses primitive types like long for numeric representation.

user=> (= 1 (int 1))
true
user=> (.equals 1 (int 1))
true
user=> (.equals (long 1) (int 1))
true
user=> (.equals (Long. "1") (Integer. "1"))
false

Where's the contract breach here ? Don't mention breaking the Java contract in the two first
examples. You are in Clojure's playground here, arithmetic primitive types (not classes) are getting promoted
according to Clojure's rules.

The third one starts in Clojure's playground where the same promotion occurs and
both values get boxed to Long for interop purposes (.equals on java objects) and the result
satisfies the Java contract.

The fourth one is not in Clojure's playground, it's pure Java and it also respects the Java contract
between comparing objects, not primitive jvm types.

Where's the problem ? The Java contract is respected as soon as you dive in Java.

It's the same here:

user=> (class (key (first { (int 1) :a})))
java.lang.Long

You create a Clojure map with a numeric key, the key gets promoted from primitive int to primitive long.
As soon as you jump in interop, you get a Long object key. Obviously, the map comes from Clojure's
playground but java cannot cope with primitive types in keys, you need an object so Clojure boxes
accordingly to a Long object.

user=> (class (key (first { (Integer. 1) :a})))
java.lang.Integer
user=>

Here you decide to create a key type from the Java world (a Java object) and it gets preserved so you
ship the map to a Java call for interop.

Where are the contract breaches ? Aside from the fact that Clojure implements a contract of its own
and respects it, there are no breaches.

You have been mixing Java objects with primitive types defined by the JVM since you entered this
discussion. It's two different things.


Chris Perkins

Oct 22, 2011, 5:23:24 PM
to clo...@googlegroups.com
On Saturday, October 22, 2011 4:31:29 PM UTC-4, Luc wrote:

> Where's the contract breach here ?

Glad you asked. Consider the following clojure session (1.3), shortened for your reading pleasure:

map-1  =>  {-1 :yo}
map-2  =>  {-1 :yo}
key-1  =>  -1
key-2  =>  -1

Just some simple maps and values, right?

(= map-1 map-2)  =>  true
(= key-1 key-2 -1)  =>  true

Yup, they're the same. But:

(map-1 key-1)  =>  :yo
(map-2 key-1)  =>  :yo
(map-1 key-2)  =>  :yo
(map-2 key-2)  =>  nil

Oops! Despite being "equal", the two maps behave differently. Why? 

(class map-1)  =>  clojure.lang.PersistentArrayMap
(class map-2)  =>  clojure.lang.PersistentHashMap
(class key-1)  =>  java.lang.Integer
(class key-2)  =>  java.lang.Long

Unless I am mistaken, the difference between an ArrayMap and a HashMap is supposed to be an implementation detail - an optimization. I'm sure that they shouldn't have different semantics. But when hashCodes and equality do not agree, this is the sort of thing that can happen.
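
For contrast, a lookup that never consults a hash can be sketched as a plain linear scan. This is an illustration only, with an invented name (scan-get), and is not taken from the PersistentArrayMap source; it just shows why a map that compares keys directly finds the value regardless of how the keys hash.

```clojure
;; Hypothetical linear-scan lookup over [key value] pairs:
;; only the equality function decides whether a key matches.
(defn scan-get [pairs eq-fn k]
  (some (fn [[k2 v]] (when (eq-fn k k2) v)) pairs))

(scan-get [[(Long/valueOf -1) :yo]] == (Integer/valueOf (int -1)))
;; => :yo
```

Since no bucket is chosen from a hash code, the Integer key and the Long key only need to compare as equal for the lookup to succeed.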

Note that I'm not claiming to have any deep insights into what's broken and what's not, either in Clojure or in Java. All I'm saying is that claiming anything along the lines of "Clojure is not Java, so we can do whatever we want - contracts do not apply" does not lead to sane map behavior. Those contracts were created for a reason.

To be honest, I've sort-of lost the plot of how this is related to the boxing-ints-as-Longs issue, but that's probably due to both my lack of expertise in this area and to the generous glass of whiskey I had while watching Megamind with my kids this afternoon. But I digress. The point I think I was trying to back up is "if clojure changes equality semantics, it should change hashcodes to match". That sounds right to me.

- Chris
 

Luc Prefontaine

Oct 22, 2011, 5:42:42 PM
to clo...@googlegroups.com
Your example is so short that I cannot replicate it:

user=> (def a (hash-map -1 :a))
#'user/a
user=> (def b (array-map -1 :a))
#'user/b
user=> (= a b)
true
user=> (= (key (first a)) (key (first b)) -1)
true

I said to myself, "Ok he's been throwing some ints in there":

user=> (def a (hash-map (int -1) :a))
#'user/a
user=> (def b (array-map -1 :a))
#'user/b
user=> (= a b)
true
user=> (= (key (first a)) (key (first b)) -1)
true

Still ok.

Now if you have been using Integer and Long objects as keys, of course maps will not match.
You are using Java objects as keys, not primitive types. You're not in Clojure's playground
anymore, half of your map is in Java's own sandbox.

What's missing from your shortened example ?


Stuart Halloway

Oct 22, 2011, 5:51:37 PM
to clo...@googlegroups.com
> Note that I'm not claiming to have any deep insights into what's broken and what's not, either in Clojure or in Java. All I'm saying is that claiming anything along the lines of "Clojure is not Java, so we can do whatever we want - contracts do not apply" does not lead to sane map behavior. Those contracts were created for a reason.

Clojure defines equiv separately from dot-equals. dot-equals respects Java's rules.

> To be honest, I've sort-of lost the plot of how this is related to the boxing-ints-as-Longs issue, but that's probably due to both my lack of expertise in this area and to the generous glass of whiskey I had while watching Megamind with my kids this afternoon. But I digress. The point I think I was trying to back up is "if clojure changes equality semantics, it should change hashcodes to match". That sounds right to me.

Mmm, whiskey.

I am dropping off this thread now. At this point I think it would be more useful for me (or someone) to expand the notes about numerics into better documentation, rather than continuing this rambling point-by-point treatment without getting all of the considerations into play at once. I hope to get that done by conj.

Stu

Paul Stadig

Oct 22, 2011, 6:55:52 PM
to clo...@googlegroups.com
On Sat, Oct 22, 2011 at 5:42 PM, Luc Prefontaine <lprefo...@softaddicts.ca> wrote:
> What's missing from your shortened example ?

I think what you want is the example I posted originally:


user=> (get {(Long. -1) :here} (Integer. -1))
:here

That works fine because you are actually creating a PersistentArrayMap, which does not care about hash codes. However, when you use a PersistentHashMap you see where things break down because the hashing function and the equality function that PersistentHashMap is using are not congruent (i.e. they break the hashing contract):


user=> (get (clojure.lang.PersistentHashMap/create {(Long. -1) :here}) (Integer. -1))
nil
user=> (get (clojure.lang.PersistentHashMap/create {(Long. 0) :here}) (Integer. 0))
:here

This happens because PersistentHashMap does not use .equals to compare keys, however it does use .hashCode to hash the keys. So it's fine to not use .equals and define Clojurey semantics for integer comparisons, but if we're not using .equals, then we should not be using .hashCode, and instead redefine .hashCode with Clojurey semantics as well. The contract that is being broken is the contract for hashing, not equality.

This problem has nothing to do with Java interop. It has nothing to do with the Java language or the JVM. It has nothing to do with whether ints are boxed as Integers or Longs. What is happening is PersistentHashMap is supposed to be an implementation of an abstract Computer Science data structure called a hash table, and for a hash table to work correctly the following must be true: if two keys are equal, then the computed hash values for those keys must be equal.

The reason we wandered into this is because one of the objections that has been raised against boxing ints as Integers is that doing so would break Clojure's collections. What I have been trying (unsuccessfully I gather) to communicate is that PersistentHashMap is broken in and of itself, and boxing ints as Longs only hides the issue. Boxing ints as Longs makes it less likely that you would actually be using an Integer as a key, because you have to explicitly ask for an Integer. However, if you explicitly ask for an Integer you still get the broken behavior, because PersistentHashMap needs to be fixed.

Bottom line: changing Clojure to box ints as Integers would not break Clojure's collections, but Clojure's collections need to be fixed to use a hashing function that is congruent with their equality function.


Paul

Luc Prefontaine

Oct 22, 2011, 7:53:05 PM
to clo...@googlegroups.com

Ha ! Ok, I missed the digression here and I now understand the issue.
Considering that a PersistentArrayMap may eventually become a PersistentHashMap
this opens the door to *funny* bugs.

Is this the only known case ?

Luc


Paul Stadig

Oct 23, 2011, 6:36:45 AM
to clo...@googlegroups.com
On Sat, Oct 22, 2011 at 7:53 PM, Luc Prefontaine <lprefo...@softaddicts.ca> wrote:

> Ha ! Ok, I missed the digression here and I now understand the issue.
> Considering that a PersistentArrayMap may eventually become a PersistentHashMap
> this opens the door to *funny* bugs.
>
> Is this the only known case ?

The bug in PersistentHashMap also infects PersistentHashSet. I've created a Jira bug about it; you can see the details there:

http://dev.clojure.org/jira/browse/CLJ-861


Paul

Paul Stadig

Oct 23, 2011, 7:19:41 AM
to clo...@googlegroups.com
On Sat, Oct 22, 2011 at 5:51 PM, Stuart Halloway <stuart....@gmail.com> wrote:
> I am dropping off this thread now.  At this point I think it would be more useful for me (or someone) to expand the notes about numerics into better documentation, rather than continuing this rambling point-by-point treatment without getting all of the considerations into play at once. I hope to get that done by conj.

So you are still thinking that the current behavior is OK and just needs to be documented better? Or are you saying that we need to collect the various pros and cons to decide whether the current behavior should change or remain the same?

Having reviewed the thread there is lots of confusion, but from the points made it seems clear to me that the behavior should change.

CON (The "we should box ints as Longs" (or "we should keep things as they are") camp):
1) If we box ints as Integers it will break Clojure's collections (Stu Halloway)
2) Boxing ints as Integers would make Clojure's design inconsistent (David Nolen)
3) Clojure now only has 64-bit primitives (David Nolen/Kevin Downey)
4) If 32-bit ints are allowed to exist, then Clojure's numeric operators would have to handle them (David Nolen)

CON1 is a bug in PersistentHashMap, and I opened a Jira bug for it (http://dev.clojure.org/jira/browse/CLJ-861).
CON2 is false. The way primitives are boxed for interop doesn't and shouldn't have any effect on Clojure's design as such. This is a discussion about interop consistency, and if you look at the PRO section you will see Clojure is already inconsistent with respect to interop. Nathan and others are arguing that it should be made consistent.
CON3 is false. 32-bit primitives do exist in Clojure (at least Java Clojure), they are just not the optimized case. They may get immediately converted to longs or boxed in some way, but we cannot deny their existence, especially around interop.
CON4 Again, 32-bit integers do exist, and are already handled by the numeric operators. When you compile a function with primitive args, Clojure also generates a method that takes Objects. If you pass in anything other than a long it gets boxed, cast to a java.lang.Number, has its longValue method called, and that value gets passed to the primitive arg version. This is slow (as expected) because you are not using the optimized case (64-bit primitives). Absolutely none of that would have to change/get slower because ints were boxed as Integers instead of Longs.
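
The widening step described in CON4 can be pictured with a small sketch. It is only an illustration of "cast to java.lang.Number and call longValue"; the function name widen is invented here and the code does not reproduce what the compiler actually generates.

```clojure
;; Hypothetical helper: treat any boxed number as a Number and take its long value.
(defn widen [x]
  (.longValue ^Number x))

;; Every boxed integer type ends up as the same long.
(map widen [(Byte/valueOf (byte 1))
            (Short/valueOf (short 1))
            (Integer/valueOf (int 1))
            (Long/valueOf 1)])
;; => (1 1 1 1)

;; Arithmetic then proceeds on longs, whatever the box was.
(+ (widen (Integer/valueOf (int 2))) 3)
;; => 5
```

Under this picture, whether an int arrives boxed as an Integer or as a Long makes no difference to the arithmetic that follows, which is the point being argued.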

I think the problem with all of these CONs is that they confuse boxing for interop with either a bug in PersistentHashMap, or fast primitive maths, and neither of those has anything to do with how ints are boxed.

PRO (The "we should box ints as Integers" camp):
1) Clojure is inconsistent in how it boxes primitive data (Chris Perkins)

Clojure 1.3:

(class (Long/parseLong "1"))  =>  java.lang.Long
(class (Integer/parseInt "1"))  =>  java.lang.Long
(class (Short/parseShort "1"))  =>  java.lang.Short
(class (Byte/parseByte "1"))  =>  java.lang.Byte
(class (Float/parseFloat "1"))  =>  java.lang.Float
(class (Double/parseDouble "1"))  =>  java.lang.Double


Paul

Luc Prefontaine

Oct 23, 2011, 11:16:09 AM
to clo...@googlegroups.com
CON1 - I'm buying your argument about consistency in Clojure maps and
fixing them. Integer OBJECTS (as opposed to int primitives) should be
handled as objects consistently, not as primitive values promoted to long.

CON2, CON3 and CON4 - No way, the current design choice is the good one.

So many languages have been plagued with numbers of different sizes/formats for ints and floating point values,
it's not a direction that Clojure should follow.
These distinct types are a source of many problems (overflow handling, precision problems, ...).

The need for Clojure to support these things is similar to calling assembler
from C. You care about bytes, shorts and similar things at the frontier:
when it's time to call a low level service, you need to be able to pass
these values.

By no means does this imply that you have to support them in your language runtime.
It complects (;) everything including computations and makes your runtime much harder to port.

It's an interop centric thing and interop is by essence not portable.
It does not belong to the core of Clojure. It's better to rely on cast operators
to call interop than to expect Clojure to box numeric values according to some interop
convention that may vary according to the platform Clojure runs on.
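
As a sketch of what "rely on cast operators" could look like at the boundary: the value is converted to a java.lang.Integer explicitly just before it is handed to Java code. The helper name ->java-int and the map key "v" are invented for illustration; java.util.HashMap stands in for whatever Java object consumes the value.

```clojure
;; Hypothetical helper: box a number as java.lang.Integer explicitly,
;; instead of relying on how Clojure boxes a primitive int.
(defn ->java-int [v]
  (Integer/valueOf (int v)))

;; A Java-side consumer that expects an Integer under the key "v".
(def m (doto (java.util.HashMap.)
         (.put "v" (->java-int 42))))

(instance? Integer (.get m "v"))
;; => true
```

This relies on the behavior noted at the start of the thread, that Integer objects (as opposed to primitive ints) are passed through unchanged.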

Luc P.


Ivan Koblik

Oct 23, 2011, 2:31:51 PM
to clo...@googlegroups.com
Hello Luc,

In all fairness I don't see how converting ints to Integers returned by class methods would break the abstraction. If you start talking about portability of Clojure code, then Long is as portable as Integer is. (In general they are not.)

Could you explain your position on the fact that shorts get converted to Short? Why is it not possible to do the same for ints?

I don't think that there was anyone in this thread that would suggest keeping 32bit math in Clojure. For what it's worth, Integer can be converted to Long the first time it is used in any computation.

Cheers,
Ivan.



Luc Prefontaine

Oct 23, 2011, 4:01:02 PM
to clo...@googlegroups.com

On Sun, 23 Oct 2011 20:31:51 +0200
Ivan Koblik <ivank...@gmail.com> wrote:

> Hello Luc,
>
> In all fairness I don't see how converting ints to Integers returned
> by class methods would break the abstraction. If you start talking
> about portability of Clojure code, then Long is as portable as
> Integer is. (In general they are not.)

It's simpler to use one representation to port the core. You can choose the
fastest/efficient one. You do not have to carry all these intermediate types
with you.

The day a 128-bit primitive type becomes available, there are few changes to make to support
it. If you keep mixed types, that adds another one to the Tower of Babel.

The problem is not to choose between ints or longs, it has to do with carrying
all these intermediate types. Frankly aside from interop, how many are using
short ints in Clojure ? That's a leftover from the PDP-11 era.

>
> Could you explain your position on the fact that shorts get converted
> to Short? Why is it not possible to do the same for ints?

This should disappear. I think all the small primitive types including ints
should be promoted to long except when doing an interop call.
Rich can explain why it's been kept. Maybe a question of priority/effort
or something else.

>
> I don't think that there was anyone in this thread that would suggest
> keeping 32bit math in Clojure. For what it's worth, Integer can be
> converted to Long first time it is used in any computation.
>

That is unnecessary overhead; again, let's separate boxed values from primitive types.
If you compute in Clojure, keeping primitive ints/shorts/bytes around has no value.
You end up having type conversion to do depending on what is specified in the
expression.

When doing an interop call, this is when you need to be specific. Elsewhere
I see no value in keeping this scheme.

This way of thinking about primitive types has been sticking around for at least
35 years carrying 64/32/16/8 bit unsigned/signed int values. Maybe it's time we toss this away.

I have written a couple of hundred thousand lines of assembly code in my
professional life and I understand this model. Of course when you deal with
hardware in a device driver you need these things, but in Clojure ?

And with today's hardware, why stick with these data types ? To reduce memory footprint ?
Ha ! Ha !, I used to work on computers with 256K of physical memory.
This concern was legitimate in this prehistoric era. But today ?

If you need bit manipulation in Clojure, better write a lib for this than mangling with
these data types.

Paul Stadig

Oct 23, 2011, 4:55:49 PM
to clo...@googlegroups.com
On Sun, Oct 23, 2011 at 4:01 PM, Luc Prefontaine <lprefo...@softaddicts.ca> wrote:
> It's simpler to use one representation to port the core. You can choose the
> fastest/efficient one. You do not have to carry all these intermediate types
> with you.

There are already at least two numeric types: long and BigInt. If you want to try to be blissfully unaware of any of this, then you can use promoting math (+' and friends). Adding more numeric types to the tower doesn't seem to make things more complicated in the general case, only in the interop case, or in the case where you are trying to optimize your code because it is too slow or uses too much memory. Which is what we're talking about here.
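
For readers who have not used promoting math, here is a small sketch of the difference between + and +' on longs. It assumes Clojure 1.3 or later; the var names overflow-error and promoted are only illustrative.

```clojure
;; Default arithmetic on longs is checked: overflow throws instead of wrapping.
(def overflow-error
  (try
    (+ Long/MAX_VALUE 1)
    (catch ArithmeticException e
      (.getMessage e))))

;; The primed operators promote to BigInt when a long would overflow.
(def promoted (+' Long/MAX_VALUE 1))

(class promoted)   ; clojure.lang.BigInt
(class (+' 1 2))   ; java.lang.Long, no promotion needed
```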

You have said before that you grant there are interop cases at the edges of Clojure, and they should be kept at the edge. What we are discussing in this thread are exactly those edge/interop cases. You would never have an int or Integer unless you asked for one or got one from some Java code. It doesn't make sense to come into a discussion about interop, and say that we shouldn't let interop determine the core of the language. This thread is not about the language core.
 
> When doing an interop call, this is when you need to be specific. Elsewhere
> I see no value in keeping this scheme.

Exactly, we're assuming in this thread that we're already at the edge doing interop, or trying to optimize our code. So any comments that assume we're not doing interop are out of scope.

> And with today's hardware, why stick with these data types ? To reduce memory footprint ?
> Ha ! Ha !, I used to work on computers with 256K of physical memory.
> This concern was legitimate in this prehistoric era. But today ?

There are good reasons at both ends of the computing spectrum to want to be efficient with memory. Embedded systems and mobile platforms don't necessarily have terabytes of memory to access. And at the other end of the spectrum, at work we process terabytes of data using byte arrays and byte streams; if we suddenly needed 8 times the memory to do the same job, it would probably be a deal killer.

Similarly, we have some native JNI libraries we use that limit us to using a 32-bit JVM on some of our nodes, and we are constantly fighting OOMEs in those restricted heaps. Which is an interop case, which is the context of this thread. The core of the language can use only longs, which is fine.
 
> If you need bit manipulation in Clojure, better write a lib for this than mangling with
> these data types.

I'd rather write that code in Clojure than write it in Java and use it from Clojure (if that's what you're saying). And if I'm dealing with data formats (cf. the gloss library), it would be really inconvenient to always have things converted to longs on me. I prefer not to have a language/platform that thinks it knows better than I do what is good for me.


Paul

Rich Hickey

Oct 23, 2011, 5:21:52 PM
to clo...@googlegroups.com
Hi all,

This reply is to the thread, not Luc specifically.

Thanks everyone for your feedback and input.

I have pushed 3 commits:

1) Fixes the inconsistency between the hash function used by Clojure maps (was .hashCode) and =. Thanks Paul for the report.

2) Changes core/hash to return the result of this hashing function. Thus, it returns a different value than does .hashCode for Integers, Shorts, Bytes and Clojure collections. Feedback welcome.

3) Only due to the first fix, it now becomes possible to box ints to Integers without much grief. This commit implements that for evaluation purposes, and is not a commitment to that policy. Note well that while in the first commit the answer is clear, on this point there is always going to be a tradeoff and there is no 'right' answer.
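
A small sketch of what the first two commits mean in practice, assuming a Clojure release that includes them (the var names boxed-int and boxed-long are only illustrative). The value -1 is chosen because Java's own hashCode happens to differ between Integer and Long for it.

```clojure
(def boxed-int  (Integer/valueOf (int -1)))
(def boxed-long (Long/valueOf -1))

;; Clojure's = compares integers by value, across boxed types.
(= boxed-int boxed-long)                ; true

;; Java's hashCode differs for the two boxes:
(.hashCode boxed-int)                   ; -1
(.hashCode boxed-long)                  ; 0

;; clojure.core/hash agrees with =, so both boxes hash alike.
(= (hash boxed-int) (hash boxed-long))  ; true

;; As a result, an Integer key can be looked up with a long.
(get {boxed-int :found} -1)             ; :found
```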

Here are the issues as I see them:

First, note there is no 'following' of Java semantics as an objective. Java semantics are that Integers are never equal to Longs, and I presume no one wants to go back to that.

Second, boxing is a change of type, period. There is no valid complaint that 'you changed my type'. int != Integer either.

Third, there are 2 scenarios in consuming things you box in Clojure from Java:

a) You control the Java. In this case, having Clojure make everything uniform (Longs) makes things easier for you. There is no heterogeneity regardless of the source or manipulation of numbers, and you can always expect Longs.

b) You don't control the Java. In this case you must match the consumer's expectations, i.e. conform to Java promotion, types of generics etc. ***This will *always* require vigilance and explicitness due to arithmetic conversions etc***. Auto promotion is only one part. Note that this is true in Java as well - while the type checker may scold you, you still have to cast/coerce on mismatch.

Even with the auto box change, you are only an arithmetic operation away from having the problem again. For instance in the original report, wrapping .getValue with dec generates an interop mismatch again:

(let [amap {1 (dec (.getValue obj))}] …)

There is no way we are going to 'fix' that by adopting Java's numeric tower, which is dangerous and requires static types. The bottom line is that specific type requirements on the Java side require explicit boxing in order to have correct and non-brittle code.
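
The dec case can be sketched at the REPL as follows. Here v is a stand-in for the value returned by .getValue (the real object is not shown in the thread), and Integer/valueOf is just one way to box explicitly.

```clojure
;; Stand-in for a value that a Java getter returned as an int.
(def v (Integer/valueOf (int 5)))

;; Arithmetic is done on longs, so the result is a Long again.
(class (dec v))       ; java.lang.Long

;; Boxing explicitly after the arithmetic gives the Java side what it expects.
(def amap {1 (Integer/valueOf (int (dec v)))})

(class (get amap 1))  ; java.lang.Integer
```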

The final consideration is collection equality. When Clojure autoboxes to Longs, you get homogeneous collection contents, and thus .equals is still true for the collection on the Java side, vs random - 'depends on where I got the contents from and what I did with them'.
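
A sketch of the collection-equality point, using java.util.ArrayList as an example of a Java-side collection (the var names are only illustrative):

```clojure
(def all-longs-a (java.util.ArrayList. [1 2]))
(def all-longs-b (java.util.ArrayList. [1 2]))

;; Same numeric contents, but the first element is an Integer, not a Long.
(def mixed (java.util.ArrayList. [(Integer/valueOf (int 1)) 2]))

;; Java's .equals compares elements with .equals, and Long never equals Integer.
(.equals all-longs-a all-longs-b)        ; true
(.equals all-longs-a mixed)              ; false

;; Clojure's = compares the numbers by value, so it does not care.
(= [1 2] [(Integer/valueOf (int 1)) 2])  ; true
```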

FYI - there are the RT/box functions that box as per Java. These could be exposed in Clojure.

-----
In short, having autoboxing match Java does not really free you from your responsibility to create specific boxed types when you need them on the Java side. I.e., Clojure can't help you.

On the flip side, when you are in charge of the Java code, Clojure's being more consistent makes things more consistent on the other side and *does* give you less to do to make sure things work.

I prefer what we had (auto box to Longs), but I think it matters somewhat less now with = consistent hashing. If we decide to revert to that we can discuss making auto boxing of short and byte consistent.
-----

In any case, those of you who still know how to use Clojure from Git can try these commits, and please provide feedback as to its actual effects on actual code. I think the opinion phase of this is now over :)

Thanks again for the feedback,

Rich

1) https://github.com/clojure/clojure/commit/b5f5ba2e15dc2f20e14e05141f7de7c6a3d91179
2) https://github.com/clojure/clojure/commit/b4a2216d78173bb81597f267b6025c74a508bd03
3) https://github.com/clojure/clojure/commit/a2e4d1b4eaa6dad26a1a96b9e9af129cccca9d10

Stuart Sierra

Oct 23, 2011, 9:52:20 PM
to clo...@googlegroups.com
As a reminder, you don't need Git to use the latest development version of Clojure. Just set your Clojure dependency version to "1.4.0-master-SNAPSHOT" and add Sonatype to your Maven repositories.

Detailed instructions here: http://dev.clojure.org/display/doc/Maven+Settings+and+Repositories

-Stuart Sierra
clojure.com

Rich Hickey

Oct 24, 2011, 7:04:07 AM
to clo...@googlegroups.com
How can people toggle between the various commits I mentioned using Maven?

Rich

Kevin Downey

Oct 24, 2011, 1:46:30 PM
to clo...@googlegroups.com
;; lein for all 3 commits
[org.clojure/clojure "1.4.0-master-20111023.210239-5"]

and I imagine you can do something similar with maven, the main thing
is you need to add the sonatype snapshot repo.

but you can't access individual commits because the build machine
polls and gathers the latest commits together and does a build.

and the readme.txt has build instructions for ant and maven for those
who don't know how to build clojure from git.

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Stuart Sierra

Oct 24, 2011, 9:06:28 PM
to clo...@googlegroups.com
You can't jump around at a per-commit level (unless there's one build for each commit) but you can jump around among individual builds.

You can see a list of all completed builds on our Hudson server:
http://build.clojure.org/view/Clojure/job/clojure/

The "module builds" pages show the Git commit messages and corresponding snapshot version number:
http://build.clojure.org/view/Clojure/job/clojure/318/org.clojure$clojure/

With Git post-commit hooks, we could theoretically ensure there is always a snapshot build corresponding to each commit.

-S

Paul Stadig

Mar 7, 2012, 6:28:55 PM
to clo...@googlegroups.com


Rich,
In Clojure 1.4.0-beta3 ints are boxed as Integers.

Clojure 1.4.0-beta3
user=> (map type [(byte 1) (short 1) (int 1) (long 1)])
(java.lang.Byte java.lang.Short java.lang.Integer java.lang.Long)

Based on the above and my conversation with you at the Conj you seemed to be pretty convinced that ints should be boxed as Longs. You made a temporary commit to box them as Integers (https://github.com/clojure/clojure/commit/a2e4d1b4eaa6dad26a1a96b9e9af129cccca9d10), then Stu Halloway reverted it (https://github.com/clojure/clojure/commit/abfa803838a1884d0c5112bc6b876cf33a8a05cc), then he reverted the revert (https://github.com/clojure/clojure/commit/798a98bc1b844b0fe08e9309886823cf7ca92604).

Are we still in the temporary period for evaluation purposes? Have you changed your mind? If so, I'd be interested to hear why. Should we expect this behavior from beta3 to change any time soon?


Paul
