Enhanced primitive support - redux

13 views
Skip to first unread message

Rich Hickey

unread,
Jun 25, 2010, 3:04:15 PM6/25/10
to clo...@googlegroups.com
equiv, the revenge of num

The latest revision of primitive support is in the equiv branch. It
takes the approach of num, with no auto-promotion, and bigint
contagion, and adds several things to better support contagion. I
think contagion is the best way to continue to support polymorphic
numeric code, something I consider important.

Contagion has several issues. First and foremost, it means that it it
will be possible for 42 and 42N to be produced in the course of normal
operations, so strict type-specific equality is not a good match. The
equiv branch brings back equivalence-based =, with a slightly tighter
notion of =, supporting only similar categories of numbers, so (= 42
42.0) => false, but (= 42 42N) => true. == is still available.

The second problem is the use of numbers (and collections of numbers)
as keys in hash maps and members of hash sets. We already had an issue
here, as there wasn't completely uniform boxing, and there will always
be the possibility of numbers from the outside. The equal branch tried
to use consistent boxing and type-specific =, but it was still open to
mismatch with numbers from outside. The equiv branch extends = to keys
and set members when used from Clojure, i.e. Clojure get and contains
will use = logic, while the Java get and containsKey will
use .equals(). I.e. we will still satisfy the semantics of Java when
used through the Java APIs, but nothing said the Clojure API must
match those semantics. When combined with the new BigInt class (see
below), this will allow you to look up (and find!) the integer 42 with
either 42 or 42N.

The equiv branch also has a new BigInt type. This is strictly
necessary for this scheme, as Java's BigIntegers and Longs can produce
different hashCodes for the same values. In addition to matching
hashCode support, the BigInt class opens the door to better
performance when used with numbers long-or-smaller. These performance
enhancements have not yet been made.

More details have been added to the top here:

https://www.assembla.com/wiki/show/b4-TTcvBSr3RAZeJe5aVNr/Enhanced_Primitive_Support

Code is here:

http://github.com/richhickey/clojure/commits/equiv


Feedback welcome,

Rich

Garth Sheldon-Coulson

unread,
Jun 25, 2010, 4:18:01 PM6/25/10
to clo...@googlegroups.com
This looks excellent.

I'd like to re-raise a question I had a few days ago: Will this ultimately come to affect floats, too?

In particular:

1) If there is going to be BigInt contagion, why not BigDecimal contagion?

user=> (* 4 5)
20
user=> (* 4 5N)
20N

but

user=> (* 4.0 5.0)
20.0
user=> (* 4.0 5.0M) ;no contagion
20.0
user=> (* 4.0M 5.0M)
20.00M
user=> (* 4.0 5.0000000000000000001M) ;not even when it's needed
20.0
user=> (* 4.0M 5.0000000000000000001M)
20.00000000000000000040M

2) Will the same reasoning that produced BigInt give us a BigDec with a better hashCode() than that of BigDecimal?

user=> (hash 4343434343)
48467046
user=> (hash (BigInteger. "4343434343")) ;not very nice
48467078
user=> (hash 4343434343N) ;nice as of equiv branch
48467046

but

user=> (hash 4343.4343)
1861754824
user=> (hash 4343.4343M) ;not very nice
1346464637

user=> (defn -hashCode [this]
  (let [dv (.doubleValue this)]
    (if (== this dv)
      (.hashCode dv)
      (.hashCode this))))
#'user/-hashCode

user=> (hash 4343.4343)
1861754824
user=> (-hashCode 4343.4343M) ;nicer
1861754824

Garth


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Nicolas Oury

unread,
Jun 25, 2010, 4:33:14 PM6/25/10
to clo...@googlegroups.com
Amazingly good ideas.
Will try it tomorrow.

Garth Sheldon-Coulson

unread,
Jun 25, 2010, 4:33:54 PM6/25/10
to clo...@googlegroups.com
Following up, point (2) in my previous message would be helpful only if = were changed so that it looked at the stripTrailingZeros() value of BigDecimals. I would advocate this in any case, because the following behavior appears silly to me (is it useful to anyone?):

user=> (== 434343.00M 434343.0M) ;BigDecimal works this way, apparently
false
user=> (= 434343.00M 434343.0M) ;odd, however
false
user=> (= 434343.00M 434343.0)
false
user=> (== 434343.00M 434343.0) ;even odder, in my opinion, given the above
true
user=> (= (.stripTrailingZeros 434343.00M) (.stripTrailingZeros 434343.0M)) ;shouldn't = default to this?
true

Garth

Mark Engelberg

unread,
Jun 25, 2010, 4:35:30 PM6/25/10
to clo...@googlegroups.com
floats and doubles are "inexact" numbers and ratios, ints, longs,
bigints, and bigdecimals are "exact" numbers.
The expected behavior is that inexact things contaminate exact things,
because now the result is inexact. This is what Clojure does.

On Fri, Jun 25, 2010 at 1:18 PM, Garth Sheldon-Coulson <ga...@mit.edu> wrote:
> 1) If there is going to be BigInt contagion, why not BigDecimal contagion?

doubles contaminate bigdecimals, not the other way around. That's how
it should be.

> 2) Will the same reasoning that produced BigInt give us a BigDec with a
> better hashCode() than that of BigDecimal?

You don't want the hashCode of bigdecimal to match the corresponding
double. They aren't equal, so they shouldn't have the same hashcode.
They aren't equal because one is exact and one is inexact.

Now, one interesting question is whether the hashCode of 40.0M should
match the hashCode for 40N, because they are two exact numbers that
represent the same value.

Mark Engelberg

unread,
Jun 25, 2010, 4:43:36 PM6/25/10
to clo...@googlegroups.com
A couple points of clarification:

On Fri, Jun 25, 2010 at 1:35 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> You don't want the hashCode of bigdecimal to match the corresponding
> double.  They aren't equal, so they shouldn't have the same hashcode.
> They aren't equal because one is exact and one is inexact.

I should have said "they shouldn't be equal" based on Rich Hickey's
explanation that from now on (= 1 1.0) will return false. I think by
this logic (= 1.0M 1.0) should also be false. I have no idea what the
current branch actually does though -- haven't tried it yet.

>
> Now, one interesting question is whether the hashCode of 40.0M should
> match the hashCode for 40N, because they are two exact numbers that
> represent the same value.
>

The more I think about it, the more I think that big decimals are sort
of their own universe and we really wouldn't want the hashCodes of an
integral bigdecimal to match the integer hashcode. I mean, if 40.0M
and 40.00M are considered different by Java, it seems fruitless to try
to unify these with their integer counterparts.

Garth Sheldon-Coulson

unread,
Jun 25, 2010, 5:39:31 PM6/25/10
to clo...@googlegroups.com
I should have said "they shouldn't be equal" based on Rich Hickey's
explanation that from now on (= 1 1.0) will return false.  I think by
this logic (= 1.0M 1.0) should also be false.  I have no idea what the
current branch actually does though -- haven't tried it yet.


Ah, yeah, fair enough. I would merely like to have floating point hash codes that match floating point equality.

In other words, if we were to stick with the status quo where (= 0.3 0.3M) => true and (= 3/10 0.3) => true, then it would be nice to get BigDec hash codes that match Double hash codes.

But you're right that Rich's message suggests we're moving to a world where (= 0.3 0.3M) => false. I hadn't realized the (good) consequences for floating point equality. In that world, you're right about hash codes. I also see what you mean about contagion.

Personally, I think (= 3 3M) => true and (= 3/10 0.3M) => true would be nice to have, given that (= 3 3N) => true. Both are currently false in equiv. I also think (= 3.0M 3.00M) => true would be nice. As Rich said, there's no particular reason to have Clojure = work exactly like Java .equals(). I think all it would take is a call to .stripTrailingZeros() on each = comparison of a BigDecimal.

Garth

Mark Engelberg

unread,
Jun 25, 2010, 5:55:48 PM6/25/10
to clo...@googlegroups.com
On Fri, Jun 25, 2010 at 2:39 PM, Garth Sheldon-Coulson <ga...@mit.edu> wrote:
> Personally, I think (= 3 3M) => true and (= 3/10 0.3M) => true would be nice
> to have, given that (= 3 3N) => true. Both are currently false in equiv. I
> also think (= 3.0M 3.00M) => true would be nice. As Rich said, there's no
> particular reason to have Clojure = work exactly like Java .equals(). I
> think all it would take is a call to .stripTrailingZeros() on each =
> comparison of a BigDecimal.

Yeah, if it's technically feasible, this definitely makes the most
mathematical sense.

Mike Meyer

unread,
Jun 25, 2010, 6:47:21 PM6/25/10
to clo...@googlegroups.com, richh...@gmail.com
I'll reiterate a question I asked a while back, but was never answered:

Are the bit-bashing operators (bit-*) going to get the same treatment
as the arithmetic operators? I would expect that many of the fields
that benefit if the latter to be fast would also benefit from the former
being fast, and I know that fast algorithms in combinatorics and crypto
tend to make use of them.

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Andrzej

unread,
Jun 25, 2010, 11:58:43 PM6/25/10
to clo...@googlegroups.com
On Sat, Jun 26, 2010 at 4:04 AM, Rich Hickey <richh...@gmail.com> wrote:
> equiv, the revenge of num

Has it already been decided that some sort of this new numeric tower
will find its way into Clojure?

Personally, I think this change will open a can of worms and make
programming in Clojure more difficult and error prone.

I don't think there is a nice and clean solution to the numeric
problem (Clojure is not the first language tackling it) but there are
two possibilities that could produce an acceptable trade off:
- using boxed math everywhere and optimizing performance by storing
preallocated Integers in a cache,
- using both boxed and primitive math but keeping the boundary between
them as explicit as possible (different operators, no automatic
conversion etc.). Existing operators should default to boxed math (for
runtime safety and compatibility).

Andrzej

B Smith-Mannschott

unread,
Jun 26, 2010, 2:59:26 AM6/26/10
to clo...@googlegroups.com
On Sat, Jun 26, 2010 at 05:58, Andrzej <ndrw...@googlemail.com> wrote:
> On Sat, Jun 26, 2010 at 4:04 AM, Rich Hickey <richh...@gmail.com> wrote:
>> equiv, the revenge of num
>
> Has it already been decided that some sort of this new numeric tower
> will find its way into Clojure?
>
> Personally, I think this change will open a can of worms and make
> programming in Clojure more difficult and error prone.
>
> I don't think there is a nice and clean solution to the numeric
> problem (Clojure is not the first language tackling it) but there are
> two possibilities that could produce an acceptable trade off:
> - using boxed math everywhere and optimizing performance by storing
> preallocated Integers in a cache,

This was suggested on the previous thread on this topic as well, but
I don't think it was pointed out that *Java already does this*.

See the inner class IntegerCache at line 608:

http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/3956cdee6712/src/share/classes/java/lang/Integer.java

And for Long as well (line 543):

http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/3956cdee6712/src/share/classes/java/lang/Long.java


> - using both boxed and primitive math but keeping the boundary between
> them as explicit as possible (different operators, no automatic
> conversion etc.). Existing operators should default to boxed math (for
> runtime safety and compatibility).
>
> Andrzej
>

Stefan Kamphausen

unread,
Jun 26, 2010, 5:06:30 AM6/26/10
to Clojure
Hi,

On 25 Jun., 21:04, Rich Hickey <richhic...@gmail.com> wrote:
> equiv, the revenge of num
[...]
> Feedback welcome,

sorry, no feedback, but one question which is rather important to me:
will this make it into Clojure 1.2?

Kind regards,
Stefan

Nicolas Oury

unread,
Jun 26, 2010, 5:12:48 AM6/26/10
to clo...@googlegroups.com
I think I pointed it out, and I reiterate it will probably not improve performance a lot (Except if you use always the 5 same numbers).

Mike Meyer

unread,
Jun 26, 2010, 9:04:27 AM6/26/10
to clo...@googlegroups.com
[Format recovered from top posting.]

"Nicolas Oury" <nicola...@gmail.com> wrote:
>On Sat, Jun 26, 2010 at 7:59 AM, B Smith-Mannschott
><bsmit...@gmail.com>wrote:
>
>> This was suggested on the previous thread on this topic as well, but
>> I don't think it was pointed out that *Java already does this*.

Doesn't matter if clojure is boxing them as well. Yes, all the boxes will point to the cache, but you still have different boxes, and allocating those is the problem.

>I think I pointed it out, and I reiterate it will probably not improve
>performance a lot (Except if you use always the 5 same numbers).

Reiteration won't make it true.
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Andrzej

unread,
Jun 26, 2010, 9:24:21 AM6/26/10
to clo...@googlegroups.com
On Sat, Jun 26, 2010 at 3:59 PM, B Smith-Mannschott
<bsmit...@gmail.com> wrote:
>
> This was suggested on the previous thread on this topic as well, but
> I don't think it was pointed out that *Java already does this*.
>
> See the inner class IntegerCache at line 608:
>
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/3956cdee6712/src/share/classes/java/lang/Integer.java

Thank you. I wasn't aware of it.

It is not exactly what I meant, though. The above Java code
preallocates some low Integers, that are likely to be encountered in
actual programs. It obviously doesn't help at all when you happen to
use other numbers.

What I'd rather like to have is an array of N preallocated objects
waiting to be assigned values and used. This way an allocation cycle
could be triggered every N Integer constructor calls and all boxes
used in a single procedure would be gathered in one place.

Andrzej

Nicolas Oury

unread,
Jun 26, 2010, 9:31:58 AM6/26/10
to clo...@googlegroups.com
http://tech.puredanger.com/2007/02/01/valueof/

Apparently, a program that only access 1 integer (0) is only twice as fast when caching.
We are very far from the *10/20 that occurs.

Nicolas Oury

unread,
Jun 26, 2010, 9:45:34 AM6/26/10
to clo...@googlegroups.com
And, by the way, after having looked into the compiler source, the current Master already uses the valeOf function.

David Powell

unread,
Jun 26, 2010, 11:36:04 AM6/26/10
to Mike Meyer

Hi,

Re: caching boxed ints:

>>I think I pointed it out, and I reiterate it will probably not improve
>>performance a lot (Except if you use always the 5 same numbers).

> Reiteration won't make it true.

At about 10m - 12m into this video, Cliff Click suggests that Java's
caching of Integer objects might do some harm to performance because
it prevents the JIT from being able to do inlining and escape
analysis.

http://www.infoq.com/presentations/click-fast-bytecodes-funny-languages


--
Dave

Daniel

unread,
Jun 25, 2010, 8:45:44 PM6/25/10
to Clojure
Thirded.

On Jun 25, 4:55 pm, Mark Engelberg <mark.engelb...@gmail.com> wrote:

Todd

unread,
Jun 26, 2010, 3:51:57 PM6/26/10
to clo...@googlegroups.com
(running clojure 1.2 snapshot)

Q1: Why does pmap use the number of available processors + 2? I would
have thought it would just use the number of avail processors...

Q2: Could someone clear up my misunderstanding of pmap w/ respect to the
code snippets below? Pmap doesn't seem to be limiting the number of
threads to the number of processors + 2...

I've created an anon func that does't return, so I wouldn't expect the
pmap step function to advance beyond 4 (I have 2 processors):

#1: Limit pmap to # of processors
----------------------------------

user=> (pmap #(while true (do (println "Thread: " (.getId
(Thread/currentThread)) "item: " %)(Thread/sleep 500))) (range
(.availableProcessors (Runtime/getRuntime))))
Thread: 12 item: 0
(Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0
Thread: 13 item: 1
Thread: 12 item: 0

--> just two threads running, as expected


#2: Limit pmap to # of processors * 10
--------------------------------------

user=> (pmap #(while true (do (println "Thread: " (.getId
(Thread/currentThread)) "item: " %)(Thread/sleep 500))) (range (* 10
(.availableProcessors (Runtime/getRuntime)))))
Thread: Thread: 12 item: 0
(Thread: 25 item: 13
Thread: 24 item: 12
Thread: 23 item: 11
Thread: 22 item: 10
Thread: 21 item: 9
Thread: 20 item: 8
Thread: 19 item: 7
Thread: 18 item: 6
Thread: 17 item: 5
Thread: 16 item: 4
Thread: 15 item: 3

--> In this short snippet, you can see > 4 threads running...expected?


toddg=> (source pmap)
(defn pmap
"Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required. Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead."
{:added "1.0"}
([f coll]
(let [n (+ 2 (.. Runtime getRuntime availableProcessors))
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets))))
<snip>

-Todd

Daniel

unread,
Jun 26, 2010, 7:27:24 PM6/26/10
to Clojure
If there is some sort of new numeric tower on the plate, what would be
some desirable properties?

On Jun 25, 10:58 pm, Andrzej <ndrwr...@googlemail.com> wrote:
> On Sat, Jun 26, 2010 at 4:04 AM, Rich Hickey <richhic...@gmail.com> wrote:
> > equiv, the revenge of num
>
> Has it already been decided that some sort of this new numeric tower
> will find its way into Clojure?
>
> Andrzej

LauJensen

unread,
Jun 28, 2010, 7:27:14 AM6/28/10
to Clojure
Hi Todd,

I'll recommend you to open this up as a separate issue, seeing how
this thread is meant to specifically discuss changes in the 'equiv'
branch. Your examples makes sense, as a 2 item collection cannot open
up more than 2 threads, and a 20 item collection, will open n + 2
threads as promised by pmap.

Lau
Reply all
Reply to author
Forward
0 new messages