# Enhanced Primitive Support


### Rich Hickey

Jun 17, 2010, 4:13:54 PM
to Clojure
I've been doing some work to enhance the performance, and unify the
semantics, of primitives, in three branches. I've started to document
this work here:

https://www.assembla.com/wiki/show/clojure/Enhanced_Primitive_Support

Feedback welcome,

Rich

### Rob Lachlan

Jun 17, 2010, 5:10:15 PM
to Clojure
I think the enhanced support for primitives is fantastic. I'm looking
forward to doing more numerical work in clojure.

Quibble: Using a multiple-recursive algorithm for calculating
Fibonacci values.

user> (defn fib-2 [n]
        (if (>= n 1)
          (loop [i 0 f0 0 f1 1]
            (if (== i n)
              f0
              (recur (inc i) f1 (+ f0 f1))))))
#'user/fib-2
user> (time (fib-2 38))
"Elapsed time: 0.12 msecs"
39088169

I understand that it's just for demonstration purposes, but
impressionable people (like me!) might not know better.
(Of course, we can calculate fibonacci values using the closed form
formula, but that's neither here nor there.)

Other than that, looks great.

Rob

### Rich Hickey

Jun 17, 2010, 5:15:57 PM
to Clojure

On Jun 17, 5:10 pm, Rob Lachlan <robertlach...@gmail.com> wrote:
> I think the enhanced support for primitives is fantastic.  I'm looking
> forward to doing more numerical work in clojure.
>
> Quibble:  Using a multiple-recursive algorithm for calculating
> Fibonacci values.
>
> user> (defn fib-2 [n] (if (>= n 1)
>                   (loop [i 0 f0 0 f1 1]
>                     (if (== i n) f0 (recur (inc i)
>                                            f1
>                                            (+ f0 f1))))))

Naive fib is often used as a benchmark, the point being not to do a
fast fib, but a comparative one: 'what is the perf on naive fib?' Not
being tail-recursive, it tests function-calling overhead, and thus
here demonstrates what we gain by not having to box args and returns.
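For reference, the doubly-recursive "naive fib" being described looks like this (a standard sketch, not code from the thread):

```clojure
;; naive, non-tail-recursive fib: two recursive calls per invocation,
;; so function-call (and, without primitive support, boxing) overhead dominates
(defn naive-fib [n]
  (if (<= n 1)
    n
    (+ (naive-fib (- n 1)) (naive-fib (- n 2)))))

(naive-fib 38) ; => 39088169
```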

Rich

### Rob Lachlan

Jun 17, 2010, 5:20:08 PM
to Clojure
Ah -- well that makes sense then. Bravo!

Rob

### Mark Engelberg

Jun 17, 2010, 9:55:10 PM
It's great to be discussing the possibility of enhanced primitive support.

I like most of the proposals. But this part troubles me:

- For bigints, specify bigints
  * new literal bigint format 42N
- bigints contagious, no more auto-reductions
  * (+ 0N 21 21) => 42N

One of the big reasons I prefer Clojure over languages like Scala/F#
is that in Clojure, numbers "just work". I don't have to worry about
overflow errors, and the use of numbers in sets/hash tables is
generally pretty intuitive (it gets a little wonky if you're using
numbers produced by Java methods, but at least you're good if you stay
within Clojure).

As a case in point, consider writing a naive recursive factorial function.

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

In Clojure, (fact 3) produces the Integer 6. (fact 40) produces the
BigInteger 815915283247897734345611269596115894272000000000. It just
works. The answers produced by fact can be used in collections,
compared for equality with other numbers.

In F# or Scala, if you want factorial to not break on an input like
40, you essentially have to use the bigint literal syntax like what
you're proposing:

(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))

Now, the results of this function are "contaminating bigints" which
can't be used as effectively with other Clojure data structures. As
you pointed out, it causes problems when the same number can have two
different type representations depending on the sequence of
computations used to get to that number. This will always be somewhat
of a problem due to Java interop, but it's important to me that all
numbers arrived at through Clojure computations just work with respect
to equality and use in sets/hash tables. The lack of auto-reduction,
in conjunction with the proposal to make equality take numeric type
into account, will really cripple the utility of bigints.

Even more frustrating, in my experience with languages like F# and
Scala, functions written like this are way, way slower. The reason is
that Java's bigint arithmetic is sloooooow. To protect yourself from
overflow, you've forced the function to use bigint arithmetic on
1N*2N*3N*..., well before you even reach a bigint. For this
particular function, it reaches bigint status quite quickly, so the
performance degradation is fairly minimal. But I have found that in
general, requiring programmers to explicitly use bigint arithmetic for
small numbers in order to protect from overflow is an absolute
performance killer. Clojure's technique of auto-promotion to bigint
upon overflow, and then auto-reduction when appropriate, is way, way
faster than using bigints all the time.
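The behaviour being described can be seen at a Clojure 1.2 REPL (a sketch assuming 1.2 semantics, where + promotes on overflow and results are reduced back down when they fit):

```clojure
;; Clojure 1.2 (pre-change) semantics, per the discussion above:
user=> (class (+ Long/MAX_VALUE 1))
java.math.BigInteger   ; auto-promoted on overflow
user=> (class (- (+ Long/MAX_VALUE 1) 1))
java.lang.Long         ; auto-reduced once it fits again
```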

In summary:
No auto-reduction would make bigints second-class citizens and make
them difficult to use in other functions and collections.
No auto-promotion to bigints means that programmers who want to
protect from overflow have to use bigints everywhere, which destroys
performance.

So what would be the downside of implementing most of these proposals
(e.g., 1 is a literal for long 1), but retain auto-promotion and
auto-reduction rather than throwing an error for primitive long math?
Seems to me like this would be the best of both worlds. What am I
missing?

### Rich Hickey

Jun 17, 2010, 11:00:57 PM

You raise several points, I'll take them in turn. First, things are
not as dire as you state. Certainly one could write a version of fact
that hardwired bigints as you showed. But you need not - here's the
original naive definition:

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

Such logic that doesn't specify one of the components (in this case,
n) is still polymorphic, as are operations with mixed components. So
the power is in the hands of the supplier of n:

user=> (fact 10)
3628800

user=> (fact 40)
java.lang.ArithmeticException: integer overflow

user=> (fact 40N)
815915283247897734345611269596115894272000000000N

I think you can always write code in this way, such that it will work
with any of the numeric types, without dictating.

----
It is true that Java's BigIntegers are slow, especially for numbers
small enough to fit in Long or Integer, so using them for smaller
numbers has overhead. But there are other options, not yet pursued
here. For instance, Kawa has a higher performing biginteger that uses
an internal int component until it overflows. I'm sure no one has a
particular attachment to BigInteger should we find a better
alternative. It would be easy to create a big integer class based on
the Kawa concept, perhaps with a long or int basis overflowing to
BigInteger itself.
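A minimal sketch of that concept (hypothetical code, not Kawa's actual implementation): do the arithmetic on longs, and fall back to BigInteger only when the result overflows.

```clojure
(defn add* [a b]
  ;; unchecked-add wraps on overflow; the classic two's-complement test:
  ;; overflow occurred iff both operands differ in sign from the result
  (let [a (long a) b (long b)
        r (unchecked-add a b)]
    (if (neg? (bit-and (bit-xor a r) (bit-xor b r)))
      (.add (BigInteger/valueOf a) (BigInteger/valueOf b))
      r)))

(add* 1 2)               ; => 3
(add* Long/MAX_VALUE 1)  ; => 9223372036854775808 (falls back to BigInteger)
```

A real implementation would of course be a type holding either representation, but the overflow test above is the heart of it.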

----------
As far as "performance killer", that depends on your perspective. It is
my belief that the overwhelming majority of code *never* uses numbers
that exceed long, and those people are suffering a 'performance killer'
on the order of 10-20x by having numbers be boxed all the time.

------
The interaction between equality and the BigInteger schism is
something I am still thinking about. That is why the branches are
staged. Only the 'equal' branch has the type-strict =.

--------
The thing that you are missing is that there is no single
representation on the JVM* that can hold either a primitive or a
reference to an Object like a BigInteger. That is where the current
semantic difference creeps in - a primitive op sourced by a primitive
must have as its target a primitive - should it produce an object
there would be nowhere to put it. And any interface points (like the
math operations themselves) cannot dynamically take either a primitive
or an object. So the auto-promotion has an unavoidable cost of the use
of objects to hold all values, and the allocation thereof. Opting out
of the objects means a change in semantics. Auto-promotion cannot be
made a uniform semantic for both primitives and objects.

---------
Another alternative is to provide a second set of operations with
auto-promoting, object-based semantics. These would never operate on
primitives. Since a key argument for auto-promotion is the poor
behavior of small bigints, making a better bigint seems preferable.
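Such a second set is, in fact, what the primed operators (+', *', inc', ...) in Clojure 1.3 became: ordinary functions over boxed numbers that promote on overflow, leaving the plain operators free to stay primitive and throw. A sketch of the two behaviours:

```clojure
;; plain + on longs throws on overflow (wrapped here so the sketch runs);
;; primed +' auto-promotes to a BigInt instead
(try (+ Long/MAX_VALUE 1)
     (catch ArithmeticException e :overflow))   ; => :overflow
(+' Long/MAX_VALUE 1)                           ; => 9223372036854775808N
```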

--------
In the end, this does put the interests of two sides of the community
at odds. Only one set of interests can be the default. Part of the
point of this discussion is to measure the sides (so speak up people!).

I think the current default doesn't serve the majority well, and has a
semantic rift. Making ordinary programs that use longs and doubles run
fast requires extensive and tedious annotation that is easy to get
wrong. This work proves it could be completely otherwise.

Rich

* There are proposals:

http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

Note that even on mature platforms like CL with very good tagging and
fixnum systems, actually getting at full-width 32- and 64-bit hardware
operations requires extensive, tedious and error-prone annotation.
I.e. a tagged number is not a hardware number.

### David Nolen

Jun 17, 2010, 11:47:36 PM
On Thu, Jun 17, 2010 at 11:00 PM, Rich Hickey wrote:

> In the end, this does put the interests of two sides of the community at odds. Only one set of interests can be the default. Part of the point of this discussion is to measure the sides (so speak up people!).

I am squarely on the side of prim/num/equals. The number of questions on the ML, #clojure, StackOverflow, etc. for the past two years about proper type-hinting shows how much incidental complexity the current system brings to the table.


In order to write decently fast numeric code in Clojure up until now, you had to be Christophe Grand ;) or have a fairly good understanding of macros that walk code to tag primitive types. And even so, the inability to return primitives made many nested recursive algorithms perform poorly. The only way to work around this was to convert your functions to macros and inline everything, resulting of course in a considerable loss of modularity. Dirty hacks.

For those that need BigInts (the majority of whom are solving Project Euler problems, god bless them), telling them to add a single N is so much simpler than explaining the current complexities to newcomers who don't understand why floating point arithmetic is so darn slow.

More importantly, I think that deftype/defrecord/protocols/static/prim/num/equals represent an astounding work of *unification*. You no longer need to resort to Java - all the tools for building fast code are right there.

My gut feeling is that many Clojure devs won't think too much about this work - they've got their hands full with the rest of the language, or are working in domains where they are under the impression that this unification work does not affect them. But I think this kind of work is paving the way for the community to give more of a helping hand in converting core data structures from Java to Clojure, extending them, patching them, and contributing new ones - all in a language that (unlike Java) naturally guides you to "do the right thing".

So though I don't know how much of a stir this thread will create (because I think it's more of a large under-the-surface wave), I have no doubt it will pay off in a big, big way.

### Mark Engelberg

Jun 17, 2010, 11:57:28 PM
Thanks for the responses.

Going back to the naive factorial function:

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

Right now,
user=> (fact 40)
815915283247897734345611269596115894272000000000

Under the proposed changes,

user=> (fact 40)
java.lang.ArithmeticException: integer overflow

So the same code now risks an arithmetic exception. What have I, the
programmer, gained from this new level of risk? The answer, if I'm
the kind of programmer who has no interest in putting type
annotations into the header of the function, is nothing. There is no
speed improvement without the additional type annotations. So all
I've gained is the /potential/ for a speed improvement.

I assume that most Clojure users really like its dynamic nature. If
this is true, then for most of us, the common case is to NOT annotate
our code with types. Certainly I like the idea of making it as easy
as possible to write fast code in Clojure. I want my dynamically
typed code to be as fast as it can be, and it's nice to know that
there is a way to write statically typed code that is even faster.
But I really don't want to incur a penalty (i.e., possibility of
Arithmetic Exception) when I'm not using type annotations.

I agree that the burden on those who want to write type-annotated code
is too high. I agree that this should be made easier. But
ultimately, I still want the burden to be on them (or me, the 1% of
the time I need it), not on those of us who prefer unannotated code. I
think that some of the changes you have proposed for moving the
annotations into the header, along with a literal notation for long
(i.e., 0L rather than (long 0)), would be preferable to the bigint
dichotomy.

I agree that an improved bigint type would address the performance
aspect of my argument. On the other hand, it wouldn't really address
the problem of using these things in sets and hash tables. I think
it's very important that Clojure have one unique canonical
representation for numbers arrived at through Clojure-based
computations. It is very difficult to use numbers sanely if that's
not the case.

I see your point that there are a couple of sides to this argument. I
hope I can persuade some more people to speak up for my side :)

### David Nolen

Jun 18, 2010, 12:20:00 AM
On Thu, Jun 17, 2010 at 11:57 PM, Mark Engelberg wrote:

> I assume that most Clojure users really like its dynamic nature.  If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types.  Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure.

The problem is that it is distinctly *not* easy to write fast numeric code in Clojure. It requires expert Clojure knowledge. On the other hand, the mental burden for someone who wants BigInts in the new system is very low - you'll get a trivial-to-track exception.

And the new system doesn't remove any of the lovely dynamic stuff. There's absolutely no compromise there. In fact, in the new system you can write your code initially sans ^:static and fn type hints entirely, work to get your algorithm right, play around at the REPL as usual, then add ^:static and type-hint only the arglist and return value ...

BOOM, you get raw JVM speed.

But of course I'm biased, as you are :) In two years I've never once been interested in BigInts. I have, however, mangled and munged my code since 2008 trying to get basic numeric algorithms to perform even remotely in the vicinity of their Java counterparts, to no avail.

David

### Daniel Gagnon

Jun 18, 2010, 12:22:27 AM
> The problem is that it is distinctly *not* easy to write fast numeric code in Clojure. It requires expert Clojure knowledge. On the other hand, the mental burden for someone who wants BigInts in the new system is very low - you'll get a trivial-to-track exception.

Also, when done, it doesn't look like Clojure anymore.

+1 on the new behaviour.

### Antony Blakey

Jun 18, 2010, 12:24:55 AM

On 18/06/2010, at 12:30 PM, Rich Hickey wrote:

> You raise several points, I'll take them in turn. First, things are not as dire as you state. Certainly one could write a version of fact that hardwired bigints as you showed. But you need not - here's the original naive definition:
>
> (defn fact [n]
>   (if (zero? n) 1 (* n (fact (dec n)))))
>
> Such logic that doesn't specify one of the components (in this case, n) is still polymorphic, as are operations with mixed components. So the power is in the hands of the supplier of n:
>
> user=> (fact 10)
> 3628800
>
> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>
> user=> (fact 40N)
> 815915283247897734345611269596115894272000000000N

That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?

Should I just use 'N' on every number to every function? This proposal destroys the black-box nature of functions, and seems to me to be the easy solution rather than the best solution - which, I acknowledge, might be very hard work, possibly involving every such function having 2 polymorphic variants, one generic boxed form and one unboxed form, automatically selected and call-site-cached like OO dispatch.

Antony Blakey
-------------
Ph: 0438 840 787

If you pick up a starving dog and make him prosperous, he will not bite you. This is the principal difference between a man and a dog.
-- Mark Twain

### David Nolen

Jun 18, 2010, 12:56:59 AM
On Fri, Jun 18, 2010 at 12:24 AM, Antony Blakey wrote:

> That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?

If the library isn't communicating that to you, isn't that just a broken API that you don't want to participate in? Tag information on arglists is available:

(defn foo [^BigInteger a b ^BigInteger c]
  (+ a b))

(print (map (fn [xs] (map meta xs)) (:arglists (meta #'foo))))

;(({:tag BigInteger} nil {:tag BigInteger}))

IDEs can easily (and should) expose this.

David

### Jason Wolfe

Jun 18, 2010, 1:03:59 AM
to Clojure
At first read this all looks awesome to me, and I'd love to see
prim/num/equals in 1.2!

Type hinting math has been a pain, I rarely need bigints and would be
happy being explicit about them when I do, and I can't think of
obvious cases where I'd need (equals? [2.0] [2]), which I gather will
no longer be possible in the equals branch (for any value of
"equals?").

Moreover, the distinction between = and .equals (and, hence, = and
set/map lookups) was a significant source of complexity and confusion
for me learning Clojure, and I'd be happy to see that go away.

I guess it'll take some experience to see if similar difficulties
arise with collections & boxing in the new setting, but at first
glance the new design seems very well thought-out.

One minor question: with this change, will it still be possible to do
primitive math on floats and ints? On modern machines I suppose
there's little reason to want this (other than perhaps saving memory),
but for other hosts like 32-bit Dalvik or JavaScript I'd imagine this
might present a performance problem?

Thanks!

### Richard Newman

Jun 18, 2010, 1:22:39 AM
I don't want to commit to a full-fledged response on this issue (yet), but:

> On the other hand, the mental burden for someone who wants BigInts in the new system is very low - you'll get a trivial-to-track exception.

Assuming that I understand your implication (that ArithmeticExceptions are trivial to track): that might be true in Java, with checked exceptions, but it wouldn't be true in Clojure. If some operations overflow some of the time, you need thorough tests… and for all of the libraries you use to have thorough tests. It's not a small mental burden to constantly fear that any library call at all could suddenly overflow at any time. What are we, C programmers?

Automatic type promotion is a good thing, if not without its quirks (map behavior being one of them). One of the fantastic things about Common Lisp is its rich numeric tower.

That said, if we could have speed and safety…

### Mark Engelberg

Jun 18, 2010, 1:23:35 AM
On Thu, Jun 17, 2010 at 9:20 PM, David Nolen <dnolen...@gmail.com> wrote:
> The problem is that it is distinctly *not* easy to write fast numeric
> code in Clojure. It requires expert Clojure knowledge.

Right, but with just the prim branch and a shorthand for long literals, you get:
(defn ^:static fib ^long [^long n]
  (if (<= n 1L)
    1L
    (+ (fib (dec n)) (fib (- n 2L)))))

which really isn't that bad. So my claim is that the prim branch gets
you far enough along the road to easy fast numeric code, without
screwing with the numeric tower.

### Mark Engelberg

Jun 18, 2010, 1:33:17 AM
Hmmm, here's an idea:

How about inside of a static function with type annotations, all
literals are automatically converted into long and doubles, otherwise
behavior stays the same.

So:

(defn ^:static fib ^long [^long n]
  (if (<= n 1)
    1
    (+ (fib (dec n)) (fib (- n 2)))))

is automatically converted to

(defn ^:static fib ^long [^long n]
  (if (<= n (long 1))
    (long 1)
    (+ (fib (dec n)) (fib (- n (long 2))))))

whereas
(defn fib [n]
  (if (<= n 1)
    1
    (+ (fib (dec n)) (fib (- n 2)))))
works the way it always did.

This behavior makes a lot of sense because (I think) the only time you
really gain from literals being primitives is when the variables
around it are also hinted. Otherwise, the polymorphic nature of the
variable will cause the literal to get boxed anyway.

### Richard Newman

Jun 18, 2010, 1:43:44 AM
> ... which, I acknowledge, might be very hard work, possibly
> involving every such function having 2 polymorphic variants, one
> generic boxed form and one unboxed form, automatically selected and
> call-site-cached like OO dispatch.

That's pretty much what Common Lisp compilers do. The compiler (or
your own compiler macro, if you so choose) can examine the input forms
and the compiler environment to decide what code to produce.

The following is from Allegro Common Lisp, but all of the input is
ANSI CL -- part of the spec.

Observe how the compiler switches between different implementations of
trivial addition as it gets a progressively looser or tighter bound on
x. (I've elided some of the irrelevant preamble.)

What would be nice is if Clojure and/or the JVM could produce safe
code/bytecode that wouldn't overflow, but could also be JIT-optimized
into unboxed primitive arithmetic for known input ranges.

;; Range known, optimizing for speed: assembly add instruction.
cl-user(7): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (type (integer 0 255) x)
                                 (optimize (speed 3)))
                        (+ x 2))))
...
 15: 48 83 c7 10    add    rdi,$16
 19: f8             clc
 20: 4c 8b 74 24 10 movq   r14,[rsp+16]
 25: c3             ret

;; We know that x is a fixnum: the arithmetic doesn't need a function
;; call, but the result may be a boxed bignum, because fixnum + 2 might
;; not fit in a fixnum. If this function were inlined elsewhere, the
;; compiler might avoid that decision.
cl-user(8): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (type fixnum x)
                                 (optimize (speed 3)))
                        (+ x 2))))
...
 21: 74 01          jz     24
 23: 17             (pop ss)       ; sys::trap-signal-hit
 24: 48 c1 ff 03    sar    rdi,$3
 28: 49 c7 c5 02 00 movq   r13,$2
     00 00
 35: 49 03 fd       addq   rdi,r13
 38: 4c 8b ef       movq   r13,rdi
 41: 49 d1 e5       sall   r13,$1
 44: 70 19          jo     71
 46: 49 d1 e5       sall   r13,$1
 49: 70 14          jo     71
 51: 49 d1 e5       sall   r13,$1
 54: 70 0f          jo     71
 56: 49 8b fd       movq   rdi,r13
 59: f8             clc
 60: 48 8d 64 24 78 leaq   rsp,[rsp+120]
 65: 4c 8b 74 24 10 movq   r14,[rsp+16]
 70: c3             ret
 71: 41 ff 57 47    call   *[r15+71]     ; sys::box-to-new-bignum
 75: eb ef          jmp    60
 77: 90             nop

;; Nothing known: a call to excl::+_2op, the two-arg addition function.
cl-user(9): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (optimize (speed 3)))
                        (+ x 2))))
...
 21: 74 01          jz     24
 23: 17             (pop ss)       ; sys::trap-signal-hit
 24: 40 f6 c7 07    testb  dil,$7
 28: 75 12          jnz    48
 30: 48 83 c7 10    add    rdi,$16
 34: f8             clc
 35: 70 1e          jo     67
 37: 48 8d 64 24 78 leaq   rsp,[rsp+120]
 42: 4c 8b 74 24 10 movq   r14,[rsp+16]
 47: c3             ret
 48: 49 8b af 2f ff movq   rbp,[r15-209]  ; excl::+_2op
     ff ff
 55: 48 c7 c6 10 00 movq   rsi,$16        ; 2
     00 00
 62: ff 53 d0       call   *[rbx-48]      ; sys::tramp-two
 65: eb e2          jmp    37
 67: 49 c7 c5 10 00 movq   r13,$16        ; 2
     00 00
 74: 49 2b fd       subq   rdi,r13
 77: eb e1          jmp    48
 79: 90             nop

### Antony Blakey

Jun 18, 2010, 1:44:33 AM

On 18/06/2010, at 2:26 PM, David Nolen wrote:

> On Fri, Jun 18, 2010 at 12:24 AM, Antony Blakey <antony...@gmail.com> wrote:
>
> That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?
>
> If the library isn't communicating that to you, isn't that just a broken API that you don't want to participate in? Tag information on arglists is available:

No. How do I know whether I need to pass a bigint literal or not? For what possible input values will overflow be an issue? It's easy to see with fact, but that's because the function implementation is clearly visible and has well-known growth properties of a single argument. And it's not just literals: when functions are composed from other functions that have this same problem and return primitive types (hence using the static versions of functions), the question in general becomes impossible to answer, not only because there might be a number of parameters with interdependencies, but because the structure of intermediate results may not be visible to the consumer. And what if you don't have the source to the function? This proposal not only puts the burden of promotion and representation choice onto the user of a function, it requires that promotion be done before the function is executed, in a precautionary way, rather than iff needed.

This proposal is IMO a very bad idea.

Antony Blakey
--------------------------

Ph: 0438 840 787

Lack of will power has caused more failure than lack of intelligence or ability.
-- Flower A. Newhouse

### David Nolen

Jun 18, 2010, 2:01:08 AM
On Fri, Jun 18, 2010 at 1:44 AM, Antony Blakey wrote:

> This proposal is IMO a very bad idea.

Why do you need to know? Your assumption is built on someone writing a bad library (one that doesn't handle long & BigInt when it should). The tools are there for the library to behave correctly regardless of the input.

(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))

(fact 40)
(fact 40N)

both work. The burden is not on the consumer but the designer of the library. What's the problem?

David

### Mark Engelberg

Jun 18, 2010, 2:10:32 AM
On Thu, Jun 17, 2010 at 11:01 PM, David Nolen <dnolen...@gmail.com> wrote:
> What's the problem?
> David

It's a composability issue.
Elaborating on Antony's explanation, let's say you call (fact (foo n)).

Do you know what values of n, when passed to foo, produce a value
large enough that fact will produce an exception? If you do know
this, do you know enough about the internals of foo to know the best
way to fix the problem? If you pass a bigint to foo, are you
guaranteed to get a bigint as an output, or do you need to explicitly
wrap a protective call to bigint between the two calls? In other
words, do you call (fact (foo 20N)) or (fact (bigint (foo 20)))?

This imposes too high a burden on any programmer who cares about safety.

### David Nolen

Jun 18, 2010, 2:15:16 AM
On Fri, Jun 18, 2010 at 2:10 AM, Mark Engelberg wrote:

> This imposes too high a burden on any programmer who cares about safety.

Rich Hickey's branch work boils down to:

1) double/long crowd get to stop eating bark
2) BigInt crowd loses free lunch

I'd say this is a bona fide community compromise and Rich Hickey saw it coming :)

David

### David Nolen

Jun 18, 2010, 2:35:37 AM
On Fri, Jun 18, 2010 at 2:10 AM, Mark Engelberg wrote:

> Elaborating on Antony's explanation, let's say you call (fact (foo n)).
>
> This imposes too high a burden on any programmer who cares about safety.

Don't buy it. That's the whole point of BigInt contagion. If fact and foo are correctly written, this will work:

(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))

(defn foo [n]
  (inc n))

(fact (foo 40))
(fact (foo 40N))

Both work.

### Richard Newman

Jun 18, 2010, 2:58:08 AM
> This imposes too high a burden on any programmer who cares about safety.

> Don't buy it. That's the whole point of BigInt contagion. If fact and foo are correctly written, this will work.

It only takes one library to screw up, and the whole stack falls down.

Screwing up can occur because of omission (fact being written with a "1", not "1N", and not being tested with large inputs), or by design.

This whole debate arose because people care about the speed of their numeric functions! The safety bignums (which make things slower) will be omitted for performance reasons, or by accident, and people won't obey the contract of the function… or won't be able to because of the complex interactions between libraries. I can't control where every value in my program goes.

In Clojure as it stands now, those edge cases would see a small performance penalty as bignums occurred (and then it would go away if they got demoted!). After this change, a one-in-a-million collision of numbers and performance-oriented programming would throw an exception with no way to recover from higher up the stack.

A programmer who really cares about safety would thus indeed have to shoulder a massive burden: either review and get complete test coverage over every library in their dependency stack, or ensure that no non-bignum can ever enter a library function. I guess that also means no strings, no file sizes, no URLs, no values which might get recombined… because any of those values could produce a non-bignum and enter an unsafe Clojure library function, causing an overflow with no possibility of continuing. (This ain't Common Lisp.)

Having digested Rich's notes, pretty much the only thing that I disagree with is the lack of automatic bignum promotion/demotion. It seems like too much of a bad tradeoff, sacrificing one of Clojure's selling points for a little numeric performance. Thus far, Clojure has done pretty well in lexically scoping its "please make this fast and unsafe" bits — primitive calls, loop, transients. This would break that pattern.

I wonder: is there a middle ground, where primitives are automatic almost everywhere, but bignum promotion on overflow is still possible?

### Antony Blakey

Jun 18, 2010, 3:12:11 AM

+1, what I've been trying to say.

> I wonder: is there a middle ground, where primitives are automatic almost everywhere, but bignum promotion on overflow is still possible?

That's what I was suggesting by having two versions of functions, one with primitives and one with boxed values. I did some investigation on this for a Smalltalk JIT using LLVM, in the form of partial specialisation, e.g. dynamically generating type-specialised versions of functions (methods in ST) and using the types available at the call site to dispatch - which is more obvious in Clojure because we already have multi-method dispatch. For Clojure you wouldn't test all types; it could just be a binary choice: all primitives where expected, or not. Of course, where a type-specialised function calls another type-specialised function, you can avoid the dispatch entirely because you have the type information at the call site. This way, you pay for the type-driven dispatch once at the start of a computation and end up doing minimal testing (basically dynamically checking that type invariants are maintained, e.g. that args haven't been promoted in the case under discussion). IMO this is speed + safety.

Antony Blakey
--------------------------
Ph: 0438 840 787

Reflecting on W.H. Auden's contemplation of 'necessary murders' in the Spanish Civil War, George Orwell wrote that such amorality was only really possible, 'if you are the kind of person who is always somewhere else when the trigger is pulled'.
-- John Birmingham, "Appeasing Jakarta"

### Chris Dean

Jun 18, 2010, 3:36:39 AM
Richard Newman <holy...@gmail.com> writes:
> Having digested Rich's notes, pretty much the only thing that I
> disagree with is the lack of automatic bignum promotion/demotion. It

This nicely summarizes my thoughts. The other parts of Rich's notes
seem very elegant, but I can see the lack of promotion leading to
problems in my own code. Some external input causing an intermediate
result to overflow would certainly happen. Probably at the least
convenient time possible.

(Have we discarded the concept of a set of non-promoting math functions
that are only used during optimization? prim-+ prim-/ etc.)

Cheers,
Chris Dean

### Nicolas Oury

Jun 18, 2010, 4:44:14 AM6/18/10
Dear all,

I've just finished reading the proposals.

- prim is great and would allow us to keep performance in more idiomatic code.
- I am pro-num.
It allows us to get rid of plenty of annotations or Java methods.
Annotations are difficult for beginners and make code hard to read, maintain and modify, even for experts.
Moreover, in these days of 64-bit integers, the need for bigints is quite rare, except for Fibonacci or factorial functions ;-b.
That's a bit provocative, but I think most code that needs bigints is "number-centric" code.
I mean code whose primary interest is the treatment of numbers.

Whereas every single program suffers from annotations or boxing.

Currently, all programs have to pay (annotations, speed, and/or Java code).

I feel it is fairer that the few programs that have high expectations of numbers should pay.
I think, however, it is important we ensure they have a way to pay, and that it is not too expensive.

Maybe it could be interesting to keep the old behaviour in safe-+, safe-* and the like?

Or, as Rich said, to make a good big-number implementation that scales well down to small integers.

Maybe there should be a compiler option for default arithmetic:
* choose from long (the default default), int (for people wanting to run Clojure on a mobile phone), or FastAndCleverBigNums (for people caring about safety and having unusually big numbers)
* the same could be done for default decimal arithmetic.

One of the great benefits of num is that it allows:

- equal, which is a big step forward.

Altogether, I very much like the fact that these proposals, together with protocols and datatypes, give a way to obtain high-performance Clojure code in a Java-agnostic way.
That seems a good thing for Clojure-in-Clojure and for porting to other platforms.

Best regards,

Nicolas.

### Nicolas Oury

Jun 18, 2010, 4:48:49 AM6/18/10
On a totally different point:

Is there a way of having the equivalent of #^final/#^static for defs?
Sometimes you know a def won't be dynamically bound/redefined.
Would there be a benefit for the JITed code (fewer memory barriers or more asm reordering) in giving a way to pass this information
to the JVM?

Best,

Nicolas.

### Christophe Grand

Jun 18, 2010, 5:44:52 AM6/18/10
On Fri, Jun 18, 2010 at 8:58 AM, Richard Newman <holy...@gmail.com> wrote:
> It only takes one library to screw up, and the whole stack falls down.
> Screwing up can occur because of omission (fact being written with a "1",
> not "1N", and not being tested with large inputs), or by design.

It can already happen: a library can screw you by design.

(defn fact [n] (loop [r (int 1) n (int n)] (if (zero? n) r (recur (* r n) (dec n)))))
user=> (fact 40)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:0)

Actually, people who want numerical performance have to do very
invasive type hinting and other interface workarounds because
primitives are... primitives (not objects).

With contagious bigints (let's nickname them "safeints" and assume
they are not BigInteger but something à la Kawa), a single N on
literals, or having all your inputs go through a safeint conversion,
would trigger safe computations (unless you try to call a static
primitive-hinted fn, but the compiler would then bark).

Of course a rogue fn can still internally convert to primitives and
overflow, but you can't do anything to protect yourself from that.

As for mixing several number representations in a single collection, I
think it's a problem of normalizing data, but most of the time you are
either in safeland or in fastland, so your representations will be
homogeneous (safeints or (boxed) primitives).

A middle ground may be to keep the semantics proposed by Rich but:
* integer literals default to safeint (unless suffixed by L for long)
* scientific notation and decimal literals default to double (unless
suffixed by M or N)
* implement a better bigint.

Christophe

--
European Clojure Training Session: Brussels, 23-25/6 http://conj-labs.eu/
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.cgrand.net/ (en)

### Michał Marczyk

Jun 18, 2010, 6:35:01 AM6/18/10
On 18 June 2010 05:57, Mark Engelberg <mark.en...@gmail.com> wrote:
> I assume that most Clojure users really like its dynamic nature.  If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types.  Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure.  I want my dynamically
> typed code to be as fast as it can be, and it's nice to know that
> there is a way to write statically typed code that is even faster.
> But I really don't want to incur a penalty (i.e., possibility of
> Arithmetic Exception) when I'm not using type annotations.
>
> I agree that the burden on those who want to write type annotated code
> is too high.  I agree that this should be made easier.  But
> ultimately, I still want the burden to be on them (or me, the 1% time
> I need it), not those of us who prefer unannotated code.  I that some
> of the changes you have proposed for moving the annotations into the
> header, along with literal notation for long (i.e., 0L rather than
> (long 0)) would be preferable to the bigint dichotomy.

QFT -- I'm in absolute agreement.

It seems to me that in its current shape, the new design -- actually
the "no automatic promotion/demotion" part -- would amount to
sacrificing correctness for performance *by default*; that doesn't
strike me as a good way to go. Not a single language I like using does
that... this of course includes my favourite language, Clojure, as it
currently stands on the master branch. :-)

Part of the reason why I think this is a bad idea is that while it is
all very well to say that people should know what they're doing,
libraries should be well-designed and have excellent test coverage
etc., this is not what happens in practice. (It's also perfectly
possible to write correct concurrent programmes with locks...). And
not necessarily through lack of diligence; I know I have learned many
a gotcha by getting bitten by it (and rather expect to accumulate
plenty more "bites" in the years to come).

I suppose that's fine as long as one is learning something useful in
the process, but I should think that top numeric performance is
something required only by a certain part of the community -- a
sizeable one, no doubt, but hardly a majority share -- in fact
comprising the people most likely to know what they're doing when
dealing with numbers (why would someone *not* doing involved numeric
computations know what s/he's doing with numbers on a comparable level
to someone who is?). Making things significantly easier for them would
be welcome, of course, but perhaps not at the cost of making
*everyone* deal with entirely new kinds of complexity to support that.

I wouldn't know if the "good" solution Antony Blakey mentions is
feasible currently, but if it isn't, I'd prefer the language to err on
the side of caution, while perhaps offering a better system for
hand-annotating things for top performance (some people have been
using pretty ingenious macros to ease things a little -- perhaps these
could be improved upon to produce an acceptable close-to-universal
solution).

Right, so these are my initial impressions. I'll continue thinking
(^:static is very cool indeed!) and I'm curious to see if I might be
able to bring myself to see more positive sides to the long/bigint
split (as well as come to an opinion on the equals issue).

All the best,
Michał

### Carson

Jun 18, 2010, 6:47:31 AM6/18/10
to Clojure
Just tried out the num branch and I really like how easy it is to be
fast! However...

Consider:

(defn fact [n] (if (zero? n) 1 (* n (fact (dec n)))))
(defn twice-fact [n] (fact (fact n)))
(defn bad-twice-fact [n] (fact (-> n fact range last inc)))

user=> (fact (fact 5))
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:3)
user=> (fact (fact 5N))
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000N
user=> (twice-fact 5)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:5)
user=> (twice-fact 5N)
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000N
user=> (bad-twice-fact 5)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:7)
user=> (bad-twice-fact 5N)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:8)

> Don't buy it. That's the whole point of BigInt contagion. If fact and foo
> are correctly written this will work.

BigInt contagion doesn't help if in some convoluted manner a BigInt's
value is used to construct a primitive sufficiently large that later
causes an overflow.

It'd sure be nice if I could just write ((safe bad-twice-fact) 5) and
get back a BigInt. Maybe ((safe foo) x) can ensure all arguments and
return values of function foo are boxed, and recursively, ensures
boxing of arguments and return values for all functions called by foo,
etc. Is that possible? Would that guarantee safety?
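A minimal sketch of that hypothetical `safe` wrapper, coercing only the top-level arguments to BigInt - which, per the examples above, is not a full guarantee, since inner calls can still round-trip through primitives:

```clojure
(defn safe
  "Hypothetical: wrap f so all its arguments are coerced to BigInt,
  making the outermost computation contagious."
  [f]
  (fn [& args]
    (apply f (map bigint args))))

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

((safe fact) 40)
;; => 815915283247897734345611269596115894272000000000N
```

Note that ((safe bad-twice-fact) 5) would still overflow, since range/last/inc strip the contagion internally - exactly the recursive-boxing problem raised here.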

Carson

### Nicolas Oury

Jun 18, 2010, 7:22:21 AM6/18/10
On Fri, Jun 18, 2010 at 11:47 AM, Carson wrote:

>> Don't buy it. That's the whole point of BigInt contagion. If fact and foo
>> are correctly written this will work.
>
> BigInt contagion doesn't help if in some convoluted manner a BigInt's
> value is used to construct a primitive sufficiently large that later
> causes an overflow.

I would like to see that in real examples, and not in something that tends to be mainly a number-crunching program.
(I think a program "about numbers" could take a bit of complexity to handle numbers.
If you write a program that generates web pages, you agree to have an HTML library and not have "every structure in Clojure is HTML".)

On the other hand, programs that have few number computations - mainly to handle a few stats and indices - but that are spread across a lot of code
(I think most programs are like that) should benefit from good performance without annotating all the code.

### Carson

Jun 18, 2010, 7:54:41 AM6/18/10
to Clojure
On Jun 18, 2:44 am, Christophe Grand <christo...@cgrand.net> wrote:
> With contagious bigints (let's nickname them "safeints" and assume
> they are not BigInteger but something à la kawa) a single N on
> literals or having all your inputs going through a safeint conversion
> would trigger safe computations (unless you try to call a static
> primitive-hinted fn but the compiler would then bark).

I think the example
(defn bad-twice-fact [n] (fact (-> n fact range last inc)))
shows it takes more than safeint conversion of arguments to the
"surface" function to trigger safe computations. All functions called
by the "surface" function needs to have their arguments converted to
bigints too, and their return values probably, just in case. Although
I'm not sure if even that's enough without auto-promotion (if I
understand auto-promotion correctly...). But note the example doesn't
even use any type hints or explicit conversion to primitives (the
point is the conversion may occur as a legitimate side effect of doing
something else).

*Still*, I'm in favour of fast by default, with an *easy* way to make
things safe as an option.

### Carson

Jun 18, 2010, 8:10:01 AM6/18/10
to Clojure
On Jun 18, 4:22 am, Nicolas Oury <nicolas.o...@gmail.com> wrote:
> On Fri, Jun 18, 2010 at 11:47 AM, Carson <c.sci.b...@gmail.com> wrote:
>
> >> Don't buy it. That's the whole point of BigInt contagion. If fact and foo
> >> are correctly written this will work.
> >
> > BigInt contagion doesn't help if in some convoluted manner a BigInt's
> > value is used to construct a primitive sufficiently large that later
> > causes an overflow.
>
> I would like to see that in real examples. And not in something that tends
> to be mainly a program that crunches numbers.
> (I think a program "about numbers" could take a bit of complexity to handle
> numbers.
> If you write a program that generates web pages, you agree to have an
> html library and not have "every structure in clojure is html".)

I agree! Sorry, I have no real examples, just my contrived example.
But note the example I used isn't *that* contrived, in the sense that
it doesn't use any type hints or explicit conversion to primitives to
cause an overflow on purpose. The point is that the overflow may occur
as a legitimate side effect of doing something else, whereas it
currently will not.

And *still*, I'm in favour of fast by default, with an *easy* way to
make things safe as an option. The key is to make the safe option
*easy*.

### Laurent PETIT

Jun 18, 2010, 8:10:46 AM6/18/10
I'm no expert in this field.

What I would like to avoid is having to become an expert in the field
in order to prevent more:

* nightmare debug sessions because of my own code or third-party
library code
* crashes in production code

which might derive from these choices.

I tend to think it seems wrong to require more knowledge from
people wanting to write *more correct* code than from people wanting
to write faster/more performant code. This is where I would place the
cursor.

2010/6/18 Carson <c.sci...@gmail.com>:

### Laurent PETIT

Jun 18, 2010, 8:12:49 AM6/18/10
2010/6/18 Carson <c.sci...@gmail.com>:

> And *still*, I'm in favour of fast by default, with an *easy* way to
> make things safe as an option. The key is to make the safe option
> *easy*.

The solution is easy: write java code by default, and clojure code
when you want to be safe :-p

### Lee Spector

Jun 18, 2010, 8:39:12 AM6/18/10

I too think that "correct by default" is a much better idea than "fast by default."

One way I think about this is that the meaning of +, for example, is the sum of its numeric arguments. The performance characteristics are not part of this core meaning, although of course everyone will be happier the better the performance is. Nonetheless, I think that the meaning should be correctly maintained as a first priority, and that means giving the right answer regardless of how many bits are needed.

I frequently produce BigIntegers, even in code that's not particularly math-heavy. It has always galled me that in most programming languages any such code breaks because of low-level implementation issues that I don't want to have to think about most of the time. I usually just want the right answer, and I certainly don't want to have to think about the binary representations every time I write code that handles numbers. If I find that my code is too slow then I'll have to think about it, and that's when it makes sense to have to deal with declarations or type hints or special faster unsafe arguments or whatever. Ideally this will take as little work as possible, but I think it'd be a bad approach to make fast free (default) at the cost of saying "but the meaning of + may not actually be numeric sum, and this depends on the number of bits required to represent your numbers, so you better always be thinking about low level representation stuff whenever you use numbers."

-Lee

--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspe...@hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438

Check out Genetic Programming and Evolvable Machines:
http://www.springer.com/10710 - http://gpemjournal.blogspot.com/

### David Nolen

Jun 18, 2010, 8:45:45 AM6/18/10
On Fri, Jun 18, 2010 at 6:47 AM, Carson wrote:
(defn fact [n] (if (zero? n) 1 (* n (fact (dec n)))))
(defn twice-fact [n] (fact (fact n)))
(defn bad-twice-fact [n] (fact (-> n fact range last inc)))

Not only is it contrived; under the proposal, this implementation of fact is broken. It needs to be fixed to:

(defn fact [n] (if (zero? n) 1N (* n (fact (dec n)))))

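With that 1N base case, contagion flows up from the innermost multiplication, so even a plain long argument gets a BigInt result instead of an overflow:

```clojure
;; 1N makes the base case a BigInt; every multiplication above it
;; is then contagious, regardless of the type of n.
(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))

(fact 40)
;; => 815915283247897734345611269596115894272000000000N
```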
As Christophe Grand has mentioned, one of the main things the BigInt supporters are complaining about *is already a problem* in Clojure. Nothing new to see here.

Finally, having actually tried out the new branches, this is a good change for me. I'd like to hear more from the BigInt crowd about whether getting their code to work with the new branches actually causes problems.

So far it just seems like talk.

### ajuc

Jun 18, 2010, 8:48:25 AM6/18/10
to Clojure
+1 to safe-by-default.

I use clojure for games, but if it will be possible to easily make
computations fast, safe-by-default is more important.

### Rich Hickey

Jun 18, 2010, 8:49:41 AM6/18/10

This L suffix idea is a non-starter. Here's why:

When the reader reads 42, it must return an object. It cannot return an int or
a long primitive. Right now, it returns an Integer. I've also
conducted experiments where it returned a Long. Certainly 42000000000
will be a Long. What different thing would the reader return given
42L, or 42000000000L? Given an int 42 what should the printer print?
And long 42, Long 42? Would (= 42L (read-string (pr-str 42L)))?

------
The second thing I don't see being acknowledged is that the real
problem here is the semantic difference. It's not a matter of getting
at a primitive operation when handed a boxed number - that already
happens, and pretty efficiently. It is the fact that the results have
to go somewhere, and the difference in 'where' creates a difference in
semantics. In master right now:

(+ x y)

and

(+ (identity x) y)

can produce different results. We already have bug reports around
this, and user confusion. Fixing it requires adopting a single
semantic for +. Choosing auto-promotion precludes the use of + on
primitives, a huge loss IMO. Choosing throw-on-overflow precludes auto-
promotion, but preserves polymorphism otherwise, and gives the
compiler free reign to optimize since it will never be altering the
semantics by doing so.

I don't see a way around this fundamental dichotomy. The semantics for
+ should be unified, and the operator that makes the other choice
needs to be called something else, or placed somewhere else.
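One concrete rendering of the dichotomy, with the promoting operation under a different name (`+'` is used here as one possible such name; it is the pair Clojure later shipped):

```clojure
;; throw-on-overflow: + keeps a single semantic, usable on primitives
(+ Long/MAX_VALUE 1)
;; => ArithmeticException: integer overflow

;; auto-promotion lives under a different name
(+' Long/MAX_VALUE 1)
;; => 9223372036854775808N
```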

Rich

Jun 18, 2010, 8:56:35 AM6/18/10
On 18.06.2010, at 14:49, Rich Hickey wrote:

> I don't see a way around this fundamental dichotomy. The semantics for + should be unified, and the operator that makes the other choice needs to be called something else, or placed somewhere else.

My preference would be "placed somewhere else". We have namespaces, so why not profit from them?

My vote would be for:

1) clojure.core/+ doing bigint promotion and reduction under all circumstances (no optimizations that would break the semantics)

2) clojure.fastmath/+ doing type-specific and optimized addition.

This would retain the "safe by default" approach that I think fits Clojure best, and yet make it easy to use the "fast" version everywhere with nothing to change but the ns form at the top of a module.
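The namespace idea might look like this (clojure.fastmath does not exist; the mechanism is sketched with a stand-in namespace whose + wraps via unchecked-add instead of checking - the real fast ops would differ):

```clojure
;; Hypothetical fast-math namespace supplying its own +.
(ns demo.fastmath
  (:refer-clojure :exclude [+]))

(defn + ^long [^long a ^long b]
  (unchecked-add a b))  ;; wraps on overflow, no check

;; A consumer opts in by editing nothing but its ns form:
(ns demo.app
  (:refer-clojure :exclude [+])
  (:use [demo.fastmath :only [+]]))

(+ 1 2)              ;; => 3
(+ Long/MAX_VALUE 1) ;; => -9223372036854775808 (wraps, no check)
```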

### Rich Hickey

Jun 18, 2010, 9:11:03 AM6/18/10

------
While range could be written differently, this is tantamount to saying:
if I use a bigint to look up another number in some mapping, it might
not be a bigint. Yes, OK, but not a real issue.

-------------
Note - I don't mean to single you out, as others are using the same
language, but I find this whole characterization of throw-on-overflow
being somehow 'unsafe' to be a bit specious.

I want it to be perfectly clear to everyone, this is not a choice
between good math and bad math, i.e. C/Java style truncating/wrapping
of results. That will never be the default. Such silent truncation/
wrap is truly unsafe, and often the proposed alternative. It was a
surprising result that optimizations of hot spot, combined with
pipelining and speculative execution on modern processors, makes the
overflow checking a very minor overhead above the raw operations, and,
in my mind, a third, new, viable and safe high-performance option.

Is a program that might throw an exception unsafe? If so, no Java
program is safe.

Is a vector that throws an exception when asked to store something at
an index out of range, rather than grow to accommodate it, unsafe?
This is in many ways similar - you are creating a result that doesn't
fit, and need to be using a bigger container.
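The vector analogy, concretely, at the REPL:

```clojure
(assoc [1 2 3] 3 :x) ;; => [1 2 3 :x] (append at the next index is fine)
(assoc [1 2 3] 9 :x) ;; IndexOutOfBoundsException (doesn't grow to fit)
```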

So, let's go easy on the hyperbole. The behavior might not be what you
desire, but it is not unsafe.

Rich

### Carson

Jun 18, 2010, 9:19:18 AM6/18/10
to Clojure
On Jun 18, 5:45 am, David Nolen <dnolen.li...@gmail.com> wrote:
> On Fri, Jun 18, 2010 at 6:47 AM, Carson <c.sci.b...@gmail.com> wrote:
> > (defn fact [n] (if (zero? n) 1 (* n (fact (dec n)))))
> > (defn twice-fact [n] (fact (fact n)))
> > (defn bad-twice-fact [n] (fact (-> n fact range last inc)))
>
> Not only is it contrived, under the proposal, this implementation of fact is
> broken. It needs to be fixed to:
>
> (defn fact [n] (if (zero? n) 1N (* n (fact (dec n)))))
>

I appreciate what you're saying, and the example above is not broken.
Recall from here:

On Jun 17, 8:00 pm, Rich Hickey <richhic...@gmail.com> wrote:
...
> Certainly one could write a version of fact
> that hardwired bigints as you showed. But you need not - here's the
> original naive definition:
>
> (defn fact [n]
> (if (zero? n) 1 (* n (fact (dec n)))))
>
> Such logic that doesn't specify one of the components (in this case,
> n) is still polymorphic, as are operations with mixed components. So
> the power is in the hands of the supplier of n:
>
> user=> (fact 10)
> 3628800
>
> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>
> user=> (fact 40N)
> 815915283247897734345611269596115894272000000000N
>
> I think you can always write code in this way, such that it will work
> with any of the numeric types, without dictating.

I believe the example (which I have tried on the num branch) shows
providing a BigInt argument does not always make things safe. That is
all.

Carson

### Rich Hickey

Jun 18, 2010, 9:20:35 AM6/18/10

Which then raises the questions:

- how will someone 'protect' themselves from libraries written using
fastmath?

- similar-looking code will change semantics when the ns switch is
made - seems dangerous as it might violate the presumptions of the
original authors.

Rich

### Nicolas Oury

Jun 18, 2010, 9:31:06 AM6/18/10
In this situation, I would rather have some kind of toggle *use-bigints-by-default* (defaulting to false) used by the compiler.
Then, if someone needs big integers, they can flip the toggle and recompile everything.
You wouldn't have to change the libraries' ns.

As for being predictable: if you have numbers that can grow bigger than 10^19, you would probably know it in advance.
(Indices, counters and the like don't grow that big.)

### Laurent PETIT

Jun 18, 2010, 9:32:19 AM6/18/10
2010/6/18 Rich Hickey <richh...@gmail.com>:

> I want it to be perfectly clear to everyone, this is not a choice between
> good math and bad math, i.e. C/Java style truncating/wrapping of results.
> That will never be the default. Such silent truncation/wrap is truly unsafe,

> and often the proposed alternative. It was a surprising result that
> optimizations of hot spot, combined with pipelining and speculative
> execution on modern processors, makes the overflow checking a very minor
> overhead above the raw operations, and, in my mind, a third, new, viable and
> safe high-performance option.
>
> Is a program that might throw an exception unsafe? If so, no Java program is
> safe.
>
> Is a vector that throws an exception when asked to store something at an
> index out of range, rather than grow to accommodate it, unsafe? This is in
> many ways similar - you are creating a result that doesn't fit, and need to
> be using a bigger container.
>
> So, let's go easy on the hyperbole. The behavior might not be what you
> desire, but it is not unsafe.
>

Fair enough.

### Laurent PETIT

Jun 18, 2010, 9:35:53 AM6/18/10
2010/6/18 Nicolas Oury <nicola...@gmail.com>:

No please, not even in my worst nightmares!
Semantics-changing compilation flags? So I use lib A, which needs lib B
compiled with flags C and D, but I also use lib E, which needs lib B
compiled with other values for the flags... I think this is just a
recipe for disaster.

### Carson

Jun 18, 2010, 9:39:49 AM6/18/10
to Clojure
hmm...didn't see this earlier.
You're right, it's not unsafe, it's undesired behaviour. My mistake
describing it that way.

### Nicolas Oury

Jun 18, 2010, 9:41:12 AM6/18/10

On Fri, Jun 18, 2010 at 2:35 PM, Laurent PETIT wrote:

> No please, not even in my worst nightmares!
> Semantics-changing compilation flags? So I use lib A, which needs lib B
> compiled with flags C and D, but I also use lib E, which needs lib B
> compiled with other values for the flags... I think this is just a
> recipe for disaster.

Seen like this, this does not look so great.

However, I would like to note that, in this specific situation, one flag subsumes the other.
I can't see someone relying on overflow (as opposed to promotion) for their behaviour...
It would be a bit sick.

### Rich Hickey

Jun 18, 2010, 10:04:31 AM6/18/10

On Jun 17, 2010, at 11:57 PM, Mark Engelberg wrote:

> Thanks for the responses.
>
> Going back to the naive factorial function:

> (defn fact [n]
> (if (zero? n) 1 (* n (fact (dec n)))))
>

> Right now,
> user=> (fact 40)
> 815915283247897734345611269596115894272000000000
>
> Under the proposed changes,

> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>

> So the same code now risks an arithmetic exception. What have I, the
> programmer, gained from this new level of risk? The answer, if I'm
> the kind of programmer who has no interest in putting in type
> annotations into the header of the function, is nothing. There is no
> speed improvement without the additional type annotations. So all
> I've gained is the /potential/ for a speed improvement.

I fully acknowledge this is a breaking change that doesn't suit your
preferences. But again, you are over- and misstating things.

The benefit does not only accrue to those annotating their arguments.

Because many functions and methods return primitives (and more will,
now that fns can return primitives), any operations on their results
and/or literals can be optimized. E.g. things like (if (< (dec (count
x)) 100) ...) can now be optimized into primitives.
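For instance (hinting style per the branches, names hypothetical), a fn can declare a primitive return, so arithmetic and comparisons on its result need no boxing:

```clojure
;; last-index returns a primitive long, so dec/count/< below compose
;; without boxing any intermediate results.
(defn last-index ^long [coll]
  (dec (count coll)))

(if (< (last-index (vec (range 10))) 100) :small :big)
;; => :small
```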

>
> I assume that most Clojure users really like its dynamic nature. If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types. Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure. I want my dynamically
> typed code to be as fast as it can be, and it's nice to know that
> there is a way to write statically typed code that is even faster.
> But I really don't want to incur a penalty (i.e., possibility of
> Arithmetic Exception) when I'm not using type annotations.
>
> I agree that the burden on those who want to write type annotated code
> is too high. I agree that this should be made easier. But
> ultimately, I still want the burden to be on them (or me, the 1% time
> I need it), not those of us who prefer unannotated code. I think that some
> of the changes you have proposed for moving the annotations into the
> header, along with literal notation for long (i.e., 0L rather than
> (long 0)) would be preferable to the bigint dichotomy.
>

> I agree that an improved bigint type would address the performance
> aspect of my argument. On the other hand, it wouldn't really address
> the problem of using these things in sets and hash tables. I think
> it's very important that Clojure have one unique canonical
> representation for numbers arrived at through Clojure-based
> computations. It is very difficult to use numbers sanely if that's
> not the case.
>

Agreed that this equality bit is a separate issue.

Rich

### Stuart Halloway

Jun 18, 2010, 10:08:25 AM6/18/10
While I enjoy a theoretical debate as much as the next person, it
would be nice to see

(1) examples of real code that fail on reasonable inputs, if this
change were made. And how difficult it would be to fix them.

(2) examples of real code that get way faster, and the ease of making
those changes (if any).

WRT #1, we have tested a few dozen libs and found only a single
trivial breakage in their test suites. But I would say that is still
just plural anecdotes. More data welcome!

### Christophe Grand

Jun 18, 2010, 10:09:02 AM6/18/10
On Fri, Jun 18, 2010 at 2:49 PM, Rich Hickey <richh...@gmail.com> wrote:
> This L suffix idea is a non-starter. Here's why:
>
> reader reads 42, it must return an object. It cannot return an int or a long
> primitive. Right now, it returns an Integer. I've also conducted experiments
> where it returned a Long. Certainly 42000000000 will be a Long. What
> different thing would the reader return given 42L, or 42000000000L? Given an
> int 42 what should the printer print? And long 42, Long 42? Would (= 42L

But what if the reader read 42 as a BigInteger (or an alternative
implementation) and 42L as a Long?
Then you would be able to tell them apart.

I agree with the goals of the num branch, I'm just unconvinced by the
reader defaulting to Long for integer literals.

Christophe

### Heinz N. Gies

Jun 18, 2010, 10:11:27 AM6/18/10

On Jun 18, 2010, at 15:11 , Rich Hickey wrote:
> So, let's go easy on the hyperbole. The behavior might not be what you desire, but it is not unsafe.
I agree with both, actually: it isn't unsafe, but in my eyes it is undesired. Let me explain.

In generally it seems we have two options:

1) We make fast the default and lose auto-promotion in the number stack.

2) We make it more difficult to be fast (as in extra work) but keep promotion.

Let's hope that I got this right, otherwise the following argumentation can be skipped, since it's based on this understanding. (Of course there might be a way to accomplish both, which would be - without question - the best way to go ;)

Now, auto-promotion has one very nice effect: it just works. That simple, and this is in my eyes hugely important, because it gives people a happy and fluffy feeling to be able to write normal code without special considerations and deep knowledge of types and hinting. Let's face it: for 90% of people it does not matter whether it is a long or a bigint or even a double; they expect math to just work. This is one of the reasons why people enjoy languages like Ruby or buy Apple hardware - because it just works (yes, we don't play on the same ground, but it is a good principle). If we get to the point where people need to know that they have to pass 42N instead of 42 or get an exception, it is just plain ugly; it is javaish, and that things are not javaish is one thing people love about Clojure. And, just so as not to be tossed in that bucket: it is not insecure, just horribly inconvenient and annoying.

But let's look at the other side of the coin: performance. We all want it - of course, otherwise we'd put a lot of (Thread/sleep 1000) in our code. But, again, let's face reality: most of us don't need it. Most of the time we don't need to do fac or fib or very fast math; in 90% of the cases what Clojure offers is just fast enough, since the bottleneck isn't arithmetic. For the remaining 10% of cases - well, if you are in that case you likely have a complex algorithm. That again means you belong to the people who really know what they are doing and how their algorithm works. Someone in this position should know where to put performance optimizations, so the fact that it needs to be explicit isn't that horrible in my eyes. Usually the code that has to be made more complex is some kind of focus code where you do way more than just toss a few statics on it and hope that it is fast - you try every trick to get it fast - so you'll have the work anyway.

Now for my conclusion: let's keep the 'it just works' behaviour and sacrifice some out-of-the-box performance gains that 90% of users won't notice anyway; of the people who do notice, half will only do so because they benchmark for exactly this. Especially since numeric arithmetic is rarely the bottleneck: database lookups, string manipulation, IO, network, and object allocation are the things that eat the most performance in most cases.

Just my two (European!) cents :)

That said, I find the possible performance gain extraordinary and brilliant, and I'm all for it, just not at the cost of 'it just works'. I'd rather write (long 1) in the two functions where it is really needed than make sure I write 42N or ^BigInt in every function where I'm not 100% certain (which will likely be all but the two functions where I'd write (long ...)).

Regards,
Heinz
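Heinz's preference, sketched in code: everyday code stays hint-free and auto-promoting, and only a profiled hot spot gets explicit primitive treatment. The ^long argument and return hints follow the syntax proposed in the branches; the function names are invented for illustration.

```clojure
;; Everyday code: no hints, expected to "just work".
(defn sum-naive [n]
  (reduce + (range n)))

;; Hot-spot code: explicit primitive hints, only where profiling justifies it.
(defn sum-fast ^long [^long n]
  (loop [i 0, acc 0]
    (if (< i n)
      (recur (inc i) (+ acc i))
      acc)))
```

Both compute the same result; under the proposed semantics the second avoids boxing in the loop, while the first remains the code most users would write.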

### Stuart Halloway

Jun 18, 2010, 12:06:46 AM6/18/10
> So the same code now risks an arithmetic exception. What have I, the
> programmer, gained from this new level of risk? The answer, if I'm
> the kind of programmer who has no interest in putting in type
> annotations into the header of the function, is nothing. There is no
> speed improvement without the additional type annotations. So all
> I've gained is the /potential/ for a speed improvement.

It will be much easier for people to write fast library code (including in Clojure and contrib). So if you ever use Clojure libraries, you will get a speed boost.

### Nicolas Oury

Jun 18, 2010, 10:24:02 AM6/18/10

On Fri, Jun 18, 2010 at 3:11 PM, Heinz N. Gies wrote:

> Most of the time we don't need to do fac or fib or very fast math; in 90% of the cases what Clojure offers is just fast enough since the bottleneck isn't arithmetic. [...]

I am not sure (= (* 20 not-a-bottleneck) not-a-bottleneck)

or more precisely:
(= (* 20 not-a-bottleneck-1 ...  not-a-bottleneck-250) not-a-bottleneck)

### Nicolas Oury

Jun 18, 2010, 10:24:56 AM6/18/10
I meant:

(= (* 20 (+ not-a-bottleneck-1 ...  not-a-bottleneck-250)) not-a-bottleneck)

### Rich Hickey

Jun 18, 2010, 10:26:36 AM6/18/10

On Jun 18, 2010, at 10:09 AM, Christophe Grand wrote:

> On Fri, Jun 18, 2010 at 2:49 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> This L suffix idea is a non-starter. Here's why:
>>
>> When the reader reads 42, it must return an object. It cannot return an int or a long
>> primitive. Right now, it returns an Integer. I've also conducted
>> experiments
>> where it returned a Long. Certainly 42000000000 will be a Long. What
>> different thing would the reader return given 42L, or 42000000000L?
>> Given an
>> int 42 what should the printer print? And long 42, Long 42? Would
>> (= 42L
>
> But what if the reader reads 42 as a BigInteger (or an alternative
> implementation) and 42L as a Long? Then you should be able to tell
> them apart.
>

And keeps them that way? Hrm. Then most programs, that currently have
no bigints, will now be full of them. And the boxed math will be
slower still, as it will be the (possibly improved, but still more
complex) logic of bigints. And the interop will be trickier. The
current boxing matches the boxing you will get from things from the
outside world. This wouldn't. And if this is just a 'way to
communicate with the compiler' thing, then read/print round-tripping
will break.

> I agree with the goals of the num branch, I'm just unconvinced by the
> reader defaulting to Long for integer literals.
>

It's important not to get focused on the literals. There are
primitives coming from other places as well. The semantic dichotomy is
the driver. The choice of the semantics of + can't be based upon the
boxed-ness of the argument, as it is now.

Rich
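The dichotomy Rich describes can be made concrete as two distinctly named operators, which is roughly where Clojure eventually landed: a checked + that throws on long overflow, and a promoting +' that returns a BigInt. At the time of this thread the naming was still open, so treat this sketch as illustrative of the semantics, not of any decided API.

```clojure
;; Checked +: the result stays a long; overflow throws ArithmeticException.
;; Promoting +': the result is promoted to a BigInt when it no longer fits.
(defn overflow-demo []
  [(try (+ Long/MAX_VALUE 1)
        (catch ArithmeticException _ :overflow))
   (+' Long/MAX_VALUE 1)])
```

The point is that the choice between the two behaviours lives in the operator's name, not in whether an argument happens to arrive boxed.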

### Heinz N. Gies

Jun 18, 2010, 10:47:17 AM6/18/10

On Jun 18, 2010, at 16:24 , Nicolas Oury wrote:
> I am not sure (= (* 20 not-a-bottleneck) not-a-bottleneck)
>
> or more precisely:
> (= (* 20 not-a-bottleneck-1 ... not-a-bottleneck-250) not-a-bottleneck)

Point is, no one will write 250 variables, much less values, by hand; it will be a map, reduce, or whatever, and then it's pretty easy to type hint it if performance is an issue at that point. And if you write this code, you likely know that it is a bottleneck. (I ignore the * there since you corrected it; otherwise it'd be a nice place to point out that it would likely become an overflow and an exception w/o automatic promotion :P)

Regards,
Heinz

### Nicolas Oury

Jun 18, 2010, 10:50:00 AM6/18/10

> Point is, no one will write 250 variables, much less values, by hand; it will be a map, reduce, or whatever, and then it's pretty easy to type hint it if performance is an issue at that point. [...] (I ignore the * there since you corrected it; otherwise it'd be a nice place to point out that it would likely become an overflow and an exception w/o automatic promotion :P)

It would actually have been much better to get an exception there, as it was a bug :-b.

### Nicolas Oury

Jun 18, 2010, 11:23:47 AM6/18/10
Experiment results :

I switched a program to the num branch.
It was quite straightforward.
Has unchecked-add changed name? (I use it for indices when iterating over a Java array; if there were an overflow, you would know it beforehand.)
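The pattern in question, sketched: indices into a Java array are bounded by its length, so overflow cannot occur and the unchecked increment is safe. This assumes unchecked-add keeps its current name in the branch, which is exactly what is being asked; the function itself is invented for illustration.

```clojure
;; Fill a double array using an unchecked index increment; the (< i n)
;; bound guarantees i never comes anywhere near long overflow.
(defn fill-ones! [^doubles a]
  (let [n (alength a)]
    (loop [i 0]
      (when (< i n)
        (aset a i 1.0)
        (recur (unchecked-add i 1))))
    a))
```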

The only funny event is that it was much, much, much slower at first, mainly doing reflection stuff.
I used *warn-on-reflection* and found:

(< val 0) where val is a double. Ugly, bad, and wrong. I am not proud.
(* some-int some-double), which I changed to (* (double some-int) some-double)
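Those two fixes, reconstructed as a sketch; the surrounding defns and names are invented for illustration, and only the coercions come from the post.

```clojure
;; Surface boxed/reflective call sites at compile time.
(set! *warn-on-reflection* true)

;; An untyped val made (< val 0) go through boxed comparison;
;; coercing once keeps the comparison primitive.
(defn negative-val? [val]
  (< (double val) 0))

;; Mixing an int with a double forced slow mixed-type math; coercing
;; the int to a double first keeps the multiply primitive.
(defn scaled [some-int ^double some-double]
  (* (double some-int) some-double))
```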

Then I got some results on the benchmark I use to test my change. (I subtract 14 seconds from each, because that is the compilation time. As this program is made to run for a long time and I want to test the steady-state speed, I set a very low threshold for the JIT. I don't know if that is a good idea or not.)
Initial program:                92 seconds - 14 = 78 s
Corrected program with master:  86 seconds - 14 = 72 s
Corrected program with num:     80 seconds - 14 = 66 s

Note that it is quite heavy on the GC; there are about 5 s of GC in that, which widens the difference still more.
Altogether it is around 10% faster, without touching any annotation or static function.
Moreover, and this might be difficult to believe if you read the beginning of my mail, I had already spent a lot of time adding annotations where I thought they were useful.
That's difficult, and still there is a nice speedup.
Most of all, I could start adding annotations to functions. The problem is that I use a lot of protocols.
Are there plans for primitive support in protocols?

Last thing: there is no single bottleneck that I could rewrite easily. This 10% is spread over the whole program.
Result of profiling:

Interpreted + native   Method
0.1%     8  +     0    distribution.FastProductWithinComp.non_local_draw
0.1%     7  +     0    clojure.core$vector.invoke
0.1%     7  +     0    java.util.WeakHashMap.get
...
2.8%   177  +    32    Total interpreted (including elided)

Compiled + native   Method
2.4%   183  +     0    clojure.lang.RT.seqFrom
2.2%   164  +     1    distribution.FastProductWithinComp.non_local_draw
1.9%    61  +    81    java.util.WeakHashMap.getTable
1.4%   103  +     0    clojure.lang.RestFn.invoke
0.8%    62  +     1    clojure.core.protocols$fn__5284.invoke
0.7%    51  +     0    clojure.lang.RT.next
....
14.8%  1019  +   100    Total compiled (including elided)

Stub + native   Method
81.3%     0  +  6154    java.lang.Object.hashCode

And yes, this 10% is obtained in code that mainly uses HashMaps to retrieve already-computed values (81.3% seems to be spent there).
So I would expect a far better speed-up in a more typical program.

(Game : guess what the program does from its profile...)

### Hugo Duncan

Jun 18, 2010, 12:48:57 PM6/18/10
On Fri, 18 Jun 2010 09:20:35 -0400, Rich Hickey <richh...@gmail.com>
wrote:

>
> On Jun 18, 2010, at 8:56 AM, Konrad Hinsen wrote:
>
>> On 18.06.2010, at 14:49, Rich Hickey wrote:
>>
>>> I don't see a way around this fundamental dichotomy. The semantics for
>>> + should be unified, and the operator that makes the other choice
>>> needs to be called something else, or placed somewhere else.

If you are doing a lot of work with whichever semantic requires decorated
use, then it is going to seem awkward. A namespace-based solution would
allow "natural" use of whichever semantic is appropriate for the domain
you are coding in. It would still require an explicit choice: pulling in a
different namespace.
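A minimal sketch of the namespace-based idea. No such fastmath namespace exists; here it is faked with today's unchecked-* ops, so the operator names look ordinary but the semantics are non-promoting. Everything below is hypothetical illustration, not a proposed API.

```clojure
;; Hypothetical "fastmath" namespace: shadow a few core operators with
;; non-promoting (unchecked) variants.
(ns my.fastmath
  (:refer-clojure :exclude [+ * inc]))

(def + clojure.core/unchecked-add)
(def * clojure.core/unchecked-multiply)
(def inc clojure.core/unchecked-inc)

;; A consumer then opts in explicitly, as suggested:
;; (ns my.numerics
;;   (:refer-clojure :exclude [+ * inc])
;;   (:use my.fastmath))
```

Within such a namespace, + means the fast operator; everywhere else it keeps the default semantics, so the choice is visible at the top of the file.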

> - how will someone 'protect' themselves from libraries written using
> fastmath?

If I understand correctly, that is still an issue, whatever syntax is
chosen.

> - similar looking code will change semantics when the ns switch is made

The switch is explicit.

> - seems dangerous as it might violate the presumptions of the original
> authors.

Not sure I understand this. Presumably if you change your selected
semantics within a namespace, you had better be sure of how that affects
all the functions in that namespace.

--
Hugo Duncan

### Mark Engelberg

Jun 18, 2010, 1:51:51 PM6/18/10
On Thu, Jun 17, 2010 at 9:06 PM, Stuart Halloway
<stuart....@gmail.com> wrote:
> It will be much easier for people to write fast library code (including in Clojure and contrib). So if you ever use Clojure libraries, you will get a speed boost.

I am skeptical of this claim. Since static functions lose some of
their dynamism, I don't think library writers are going to rush to
convert all their code over to the "fast" format. I think library
writers will care the most about ensuring their functions play well
with Clojure's dynamic ecosystem, and so, as before, type-optimized
code will be written only sparingly. (I'll admit though, I don't
fully understand yet which dynamic features are lost by static
functions, and how much of an impact it will have, so I could be wrong