Of course, this better performance is not needed "in the common case",
IMO, but only in hotspots that do number crunching, where people
already optimize using primitive locals, coercion, and unchecked-foo.
> The implementation makes the assumptions that Java long is big enough for
> nearly all cases
It also makes the assumption that the Java long is as fast as native
arithmetic. Which, on a lot of 32-bit hardware, it won't be.
> and that auto-promotion to BigInteger (and the resulting performance hit) is
> rarely desirable.
Debatable. I, for one, prefer to have unadorned arithmetic be correct
at the expense of a little speed, while still having a way to get the
speed in performance-critical parts of my code.
> On Jan 14, 2:40 pm, Stuart Sierra <the.stuart.sie...@gmail.com> wrote:
>> Debatable it is, endlessly. ;) So Clojure committers made a choice.
>> Hopefully, they have made a choice that has:
I agree that they've made a choice, and I really don't want to be too critical here. But since Clojure 1.3 is still in an alpha stage, maybe this discussion can still contribute something. I'd like to stay constructive, and there's still time to be constructive.
>> * a small positive effect (better performance with less effort) on a
>> majority of users
>> * a small negative effect (worse performance, extra effort) on a minority
These goals are good, but I don't know that the approach taken achieves them.
In my experience, errors are the problem and we should be avoiding them, almost at all costs. Numbers are confusing to people. Numbers approximated on a computer are far more confusing. How many times do you see threads discussing how a compiler is buggy because it can't divide two numbers and get the right answer? I've been doing this stuff for years and I can come up with an awful lot of amusing and/or horribly nasty examples. But I don't think this needs to be re-established.
Given my experience I *strongly* lean towards not making a 'mistake' due to compiler optimisations. In other words, I'd be very annoyed, and I'd expect others to be annoyed too, if a numerical error was introduced to one of my programs because of an unexpected, silent, compiler optimisation.
Secondly, Clojure has already established that we will use type annotations to signal to the compiler what's what. When we annotate, we are relaxing our requirements on the compiler to not make a mistake by assuming that responsibility ourselves.
I would suggest the following:
1) if there's a type annotation on both operands of, say, an addition, then the optimised version can be used. If there isn't, or the compiler isn't sure, then use safe operations.
2) if the compiler isn't cooperating (because it isn't sure what's going on), we should help it by again assuming the responsibility of being right and marking the operator, say with a tick.
And yes, this likely has problems too. I'm not saying that this is an issue with easy solutions.
We're heading for a hodgepodge of annotation purposes: some for optimisation, some for correctness (and one of these days I'll mention what I think of the @/deref thing :-). And now we're pretty much guaranteed ugly code no matter what. Though I'd prefer no ugliness at all, I'd trade ugly code for speed; I'd rather not trade it for correctness.
And there's a practical problem with mixed annotation purposes. If you want to track down a bug you can't just remove all annotations temporarily; you'd have to remove some and add others. Not looking forward to that. Maybe a make-this-safe macro could be written. Hmm. Maybe a 'defn-safe' would be something to think about?
This is also the kind of thing that you just can't fix later. Imagine how we'll feel in ten or twenty years about this decision.
On 2011-01-14, at 8:40 PM, Armando Blancas wrote:
> They used to give you compile switches for that kind of stuff, not
> hope and wholesome wishes. Seems like every performance improvement
> makes the language more complex, uglier, or both.
Compiler switches were/are problematic too, but at least they are explicit and have to be *added*.
> In other words, I'd be very
> annoyed, and I'd expect others to be annoyed too, if a numerical
> error was introduced to one of my programs because of an unexpected, silent, compiler optimisation.
Just to be clear, Clojure 1.3-alpha does not introduce numerical
errors unless you explicitly ask for them; it throws a
RuntimeException instead - which I guess is analogous to a
dynamically-typed language throwing RuntimeExceptions to signal type
errors at runtime:
user=> (* 100000000000 100000000000)
ArithmeticException integer overflow  clojure.lang.Numbers.throwIntOverflow (Numbers.java:1583)
user=> (*' 100000000000 100000000000)
10000000000000000000000N
user=> (* 100000000000 100000000000N)
10000000000000000000000N
user=> (unchecked-multiply 100000000000 100000000000)
1864712049423024128
> It would help people like me understand the debate if some mainstream
> examples of applications requiring (seamless) BigInteger support could
> be identified.
I doubt that many will consider this "mainstream," but I evolve programs using genetic programming techniques and I've found that in this context BigIntegers can arise in all sorts of unexpected and weird and wonderful and sometimes adaptive ways.
I still haven't figured out exactly what the 1.3 changes will mean for this work -- maybe it'll be fine or even better -- but I've liked not having to think about integer sizes much at all previously (as in Common Lisp, where the handling of complex numbers is also nice).
This debate always starts by conflating three things into two, and then goes downhill from there. :-( It isn't
(a) safe/slow vs.
(b) unsafe/fast.
It is
(a) unsafe/incorrect value on overflow/fastest/unifiable* vs.
(b) safe/error on overflow/fast/unifiable vs.
(c) safe/promoting on overflow/slow/not-unifiable
*unifiable: able to deliver same semantics for primitives and objects
We have thought about this quite a bit, and an argument from one axis only (e.g. safe/unsafe) that doesn't even mention some of the other axes is not likely to be persuasive. It would be more interesting to see a new axis we haven't thought of...
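The three options can be sketched in host-Java terms. This is a hedged illustration of the semantics only; the method names are hypothetical, and Clojure's actual implementation (in clojure.lang.Numbers) is more elaborate:

```java
import java.math.BigInteger;

public class OverflowSemantics {
    // (a) unsafe: plain long multiply silently wraps on overflow
    static long uncheckedMul(long x, long y) {
        return x * y;
    }

    // (b) safe/error: detect overflow with a division check and throw
    // (the Long.MIN_VALUE / -1 edge case is omitted for brevity)
    static long checkedMul(long x, long y) {
        long r = x * y;
        if (x != 0 && r / x != y)
            throw new ArithmeticException("integer overflow");
        return r;
    }

    // (c) safe/promoting: fall back to BigInteger when long would overflow
    static Number promotingMul(long x, long y) {
        long r = x * y;
        if (x != 0 && r / x != y)
            return BigInteger.valueOf(x).multiply(BigInteger.valueOf(y));
        return r;   // autoboxed Long on the non-overflow path
    }
}
```

With x = y = 100000000000, (a) silently returns a wrapped long, (b) throws, and (c) returns 10^22 as a BigInteger, matching the three REPL behaviours shown earlier in the thread. Note that (c) cannot return a primitive, which is what "not-unifiable" refers to.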
The spreadsheet example may be a useful one. Clojure is a language you could write a spreadsheet in. Would you want to use a spreadsheet written entirely in a JVM language that did not provide access to the primitives?
> Now we face the choice of putting limits on what our users can do, use
> a different set of operators, or decide that we "want" BigInts. Again,
> this isn't about wanting bigints, that's a red herring. Part of the
> problem is the complex, confusing, and sometimes quite mysterious Type
> System that's been creeping into Clojure for the sake of performance.
Clojure is not getting a type system, nor is the behavior in 1.3 complex. It can be confusing, because it addresses a multifaceted problem, and it certainly is mysterious, because (1) we haven't spent enough effort documenting it, and (2) lots of people have misdocumented it.
I'll make a documentation update higher priority; hopefully that will help.
> Another is the conflicted attitude of being an untyped language: a
> kind of guilty pleasure with the remorse it brings of all those
> reflective calls and boxing/unboxing, whose negative effects on
> performance supposedly makes the language lose credibility.
The Clojure design process is not about achieving credibility, it is about solving problems. Credibility has followed, and will continue to follow, to the extent that Clojure solves problems well.
Here's an axis that hasn't gotten much discussion: how evident is the
behavior of Clojure code, and what features help or hinder this?
Clojure is a dynamically typed language, which means that generally
speaking, it is not obvious what the type of a given variable is,
since there aren't annotations immediately prior to the variable
telling you what it must be. Similarly, a Clojure IDE does not offer
any way to "hover" over a variable and see what the type is. There
is a lot of freedom that comes with this, but the cost is that a
dynamic programmer must be careful to document in some way what kinds
of things are acceptable inputs, and what kinds of promises are made
of the outputs. The compiler can't check this, so it's up to the
programmer. As Clojure programmers, we take on the responsibility of
tracking a certain amount of "unseen information" that isn't readily
evident from the code itself, but there's a limit to how much
responsibility programmers can take on before programs become brittle,
so new features should take this "axis" into account.
Primitives are especially problematic because there is no good way to
determine whether something is a primitive or not. Consider the
following interactions in the 1.1 REPL:
user> (type 1)
java.lang.Integer
user> (type (int 1))
java.lang.Integer
Any features involving primitives should be assessed from the
standpoint that it is extremely difficult to know from looking at code
whether something is a primitive or not. Many of the new features
(e.g., static functions can now return primitives; literals are
primitives; but numbers stored in collections or passed across
certain kinds of function boundaries are not) mean that you'll
frequently end up with a mixture of primitives and non-primitives, and
it won't always be obvious which is which.
When designing math operators that behave one way for longs and
another for bigints, one question that needs to be asked is: "How
apparent will it be whether a variable represents a long or a bigint?
If it's not apparent, how will the programmer know which behavior to
expect? Is there any tooling that can help make this more
discoverable?" One possibility is that Clojure programmers will need
to evolve ways to track this information, perhaps by explicitly
commenting in code whether a function can gracefully handle both longs
and bigints. On the other hand, there's already a history in Clojure
and similar languages of just documenting certain vars as "numbers"
without needing to get more precise than that, so this could be a
painful transition for programmers who are not used to specifying
their numeric types in any greater detail.
Because it's difficult to do "typeflow analysis" in a
dynamically-typed language like Clojure, this clarity axis also comes
into play when thinking about what sorts of burdens are going to be
placed on library developers. As a case in point, I developed the
expt function in clojure.contrib.math because I was surprised when I
first came to Clojure that no generic exponentiation operator existed
in the language. The expt in contrib handles all of Clojure's numeric
types seamlessly. But what am I supposed to do with expt in Clojure
1.3? New expectations are being created with the new model -- some
people will expect expt with primitives to return primitives; some
will expect computation with longs to return bigints when necessary,
since exponentiation frequently overflows. Do I need to provide an
expt and expt' function to make both camps happy? (For that matter,
is there even a way to overload expt for both primitive longs and
primitive doubles, or do I need to make separate expt-long and
expt-double functions?) Are we going to see a proliferation of
variations for all mathematical functions once we start going down
this path?
This is an axis I think about a lot, and I hope this is something that
the Clojure dev team is carefully considering as well.
> This is an axis I think about a lot, and I hope this is something that
> the Clojure dev team is carefully considering as well.
+1 to all of that.
> I'll make a documentation update higher priority; hopefully that will help.
This should help. I feel like the discussion is going in circles because there's no single, official source that summarizes exactly what is happening with numerics in 1.3. (I know about http://www.assembla.com/wiki/show/clojure/Enhanced_Primitive_Support, but it's terse and a bit confusing.)
When Clojure compiles your function, it emits JVM bytecode for a new
class which is then loaded by the classloader. That JVM bytecode
defines a function (well, a method as far as the JVM is concerned)
which returns either a primitive type or Object. Your suggestion
would involve redefining the class while it is executing. That's not
possible on the JVM. Even if it were possible -- your function now
returns Object instead of long. But the variable that the result of
your function is about to be assigned to is a long, because that's
what your function used to be defined to return, and the next bytecode
operation in the calling code is the one that subtracts a primitive
long from a primitive long. Now what?
Fundamentally, Clojure has to contend with the fact that the JVM as a
platform distinguishes between primitives and Objects. The bytecode
operations which the Clojure compiler emits have to differ based on
that distinction. Essentially, the distinction cannot be magicked
away, because JVM bytecode is going to statically enforce that we're
either working with a (long or double) or a (Long or BigInt or
BigDecimal or whatever), and never the twain shall meet. So if we
ever want to be able to access the speed of the primitive bytecode
operations, then the primitive/Object distinction has to leak into
Clojure. (Or we have to redefine or move away from the JVM.
Obviously not really an option, but I mention it to point out that the
reason Clojure has to make this decision is that it's a hosted
language. That has a lot of benefits; this is one of the trade-offs
we have to contend with in return.)
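A minimal Java sketch of that enforced split (hypothetical methods, purely illustrative): the JVM treats these two add methods as entirely separate, with different bytecode descriptors, and every call site must commit to one or the other at compile time.

```java
import java.math.BigInteger;

public class PrimVsObject {
    // Descriptor (JJ)J: compiles down to a primitive LADD instruction --
    // fast, but silently wraps on overflow.
    static long add(long x, long y) {
        return x + y;
    }

    // Descriptor (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;:
    // the boxed path, which can promote but pays for dispatch and allocation.
    static Object add(Object x, Object y) {
        BigInteger a = BigInteger.valueOf(((Number) x).longValue());
        BigInteger b = BigInteger.valueOf(((Number) y).longValue());
        return a.add(b);
    }
}
```

The primitive overload computes Long.MAX_VALUE + 1 as Long.MIN_VALUE, while the Object overload returns the correct 9223372036854775808; no single method body can do both, which is exactly the choice Clojure's compiler faces at every arithmetic call site.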
I think everyone agrees that it's important to make the speed of the
primitive bytecode operations available in Clojure (whether or not
it's the default), so that rules out the option of always doing
auto-promotion. I think it's probably also agreed that allowing
unchecked overflow by default is not good (at least, no one seems to
be arguing for it).
So we're left with option (b) and a choice about the default behaviour
of the core library functions. If we default to boxing and treating
everything as an Object then we get the nice comfy numeric tower that
we never have to worry about, but the default case suffers in
performance. Otherwise, we default to primitive, and accept that if
we're dealing with numbers which might get bigger than Long.MAX_VALUE,
then we might need to explicitly use a BigInt to get contagion, or use
an operator like +' which will always deal with Objects.
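The "contagion" behaviour can be sketched in Java (a hedged approximation; Clojure's real dispatch in clojure.lang.Numbers is considerably more elaborate): if either operand is already a BigInteger, the whole operation proceeds in BigInteger; otherwise it stays on the checked long path.

```java
import java.math.BigInteger;

public class Contagion {
    static BigInteger toBig(Number n) {
        return (n instanceof BigInteger) ? (BigInteger) n
                                         : BigInteger.valueOf(n.longValue());
    }

    // One BigInteger operand "infects" the whole operation, the way
    // 100000000000N does in the REPL example; otherwise use the checked
    // long path, which throws rather than returning a wrong answer.
    static Number multiply(Number x, Number y) {
        if (x instanceof BigInteger || y instanceof BigInteger) {
            return toBig(x).multiply(toBig(y));
        }
        long a = x.longValue(), b = y.longValue();
        long r = a * b;
        if (a != 0 && r / a != b)   // overflow check (MIN_VALUE/-1 case omitted)
            throw new ArithmeticException("integer overflow");
        return r;
    }
}
```

So a single BigInt literal anywhere in the data flow quietly buys correctness for the whole expression, at the cost of leaving the primitive fast path.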
By choosing to make speed the default preference in the core library
functions, I suppose there's more for Clojure programmers to think
about, because whenever you're dealing with numbers, you need to have
in the back of your mind the question of whether this might ever need
to be bigger than Long.MAX_VALUE, and so whether you might need +'
instead of +. Then again, how often do you write code that might be
doing maths with numbers that big and not realise it? For that
matter, how often do you write code that might be doing maths with
numbers that big and not spend time thinking carefully about its
numeric behaviour anyway?
The problem is that if you have an arbitrary form that can operate
entirely in primitives (some loop/recur perhaps) and you allow
primitives to magically convert to Objects in that code, then the
entire piece of code has to handle both primitives AND Objects and
every single sub-form must be capable of handling primitives as input
AND Objects as input and returning primitives if possible...
You can't have automatic promotion to Object from primitive and expect
any reasonable code to be generated that can maintain primitive
performance across arbitrary expressions. Either everything can work
with Objects - and you lose performance - or everything must be able
to work within primitives (and at most throw exceptions) and remain fast.
Sean A Corfield -- (904) 302-SEAN
Railo Technologies, Inc. -- http://getrailo.com/
An Architect's View -- http://corfield.org/
"If you're not annoying somebody, you're not really alive."
-- Margaret Atwood
There are a lot of reasons why it is not possible:
- it would mean coordinating two implementations/implementers.
- it would prevent porting to platforms for which there is no support in
the other language.
- a type checker would not be really happy dealing with a lot of
Object -> Object functions...
- it would be ugly.
Having a bit of (optional) type inference for performance and
compile-time safety in Clojure could be interesting though.
On 2011-01-15, at 4:06 PM, Stuart Halloway wrote:
>> In my experience, errors are the problem and we should be avoiding them, almost at all costs.
> This debate always starts by conflating three things into two, and then goes downhill from there. :-( It isn't
> (a) safe/slow vs.
> (b) unsafe/fast.
That's how we outsiders are left to look at it.
> It is
> (a) unsafe/incorrect value on overflow/fastest/unifiable* vs.
> (b) safe/error on overflow/fast/unifiable vs.
> (c) safe/promoting on overflow/slow/not-unifiable
> *unifiable: able to deliver same semantics for primitives and objects
This doesn't really help me understand your argument.
It looks to me as though Clojure is trying to steer itself through the middle of something. The trouble is that I don't know where the edges of the middle are.
Maybe it is just a documentation problem. But I'd also suggest that there's a bit of a sales job necessary here.
> We have thought about this quite a bit,
Nobody doubts that, certainly I don't. And I'm not trying to minimise or dismiss what you've done. And I'm not claiming that I've thought about it better or more or deeper. But I do have concerns and I don't see them being addressed, and I'd like it if they weren't minimised either. Maybe my concerns are completely addressed. Maybe not. I don't know, and I'd like to be convinced.
> and an argument from one axis only (e.g safe/unsafe) that doesn't even mention some of the other axes is not likely to be persuasive. Would be more interesting to see a new axis we haven't thought of...
Numerical correctness, for some of us, is an overwhelming issue. This is purely from experience... bad experience... 30+ years of bad experience in my case :-) From my point of view, the approach Clojure is taking isn't persuasive, not to say it couldn't be made persuasive.
I think I did add what might be considered an additional axis. Syntax. Specifically what annotations are needed and for what purpose. I don't think this should be dismissed out of hand.
> Stuart Halloway
Under the current proposal, you should never get an incorrect answer.
You might get an error, though. It's a subtle difference, but this is
the main reason why the developers don't see it as a "correctness
issue". If your program runs, and gives you back an answer, it will
be correct. If it crashes, you convert to the overflow version of
arithmetic, or typecast some of your numbers to bigints, and you'll
get the right answer. I think a lot of the argument from both sides
boils down to how much you fear runtime crashes.