I like most of the proposals. But this part troubles me:
# For bigints, specify bigints
* new literal bigint format 42N
# bigints contagious, no more auto-reductions
* (+ 0N 21 21) => 42N
One of the big reasons I prefer Clojure over languages like Scala/F#
is that in Clojure, numbers "just work". I don't have to worry about
overflow errors, and the use of numbers in sets/hash tables is
generally pretty intuitive (it gets a little wonky if you're using
numbers produced by Java methods, but at least you're good if you stay
within Clojure).
As a case in point, consider writing a naive recursive factorial function.
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
In Clojure, (fact 3) produces the Integer 6. (fact 40) produces the
BigInteger 815915283247897734345611269596115894272000000000. It just
works. The answers produced by fact can be used in collections and
compared for equality with other numbers.
In F# or Scala, if you want factorial to not break on something like
an input of 40, you essentially have to use the bigint literal syntax
like what you're proposing:
(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))
Now, the results of this function are "contaminating bigints" which
can't be used as effectively with other Clojure data structures. As
you pointed out, it causes problems when the same number can have two
different type representations depending on the sequence of
computations used to get to that number. This will always be somewhat
of a problem due to Java interop, but it's important to me that all
numbers arrived at through Clojure computations just work with respect
to equality and use in sets/hash tables. The lack of auto-reduction,
in conjunction with the proposal to make equality take numeric type
into account, will really cripple the utility of bigints.
Even more frustrating, in my experience with languages like F# and
Scala, functions written like this are way, way slower. The reason is
that Java's bigint arithmetic is sloooooow. To protect yourself from
overflow, you've forced the function to use bigint arithmetic on
1N*2N*3N*..., well before you even reach a bigint. For this
particular function, it reaches bigint status quite quickly, so the
performance degradation is fairly minimal. But I have found that in
general, requiring programmers to explicitly use bigint arithmetic for
small numbers in order to protect from overflow is an absolute
performance killer. Clojure's technique of auto-promotion to bigint
upon overflow, and then auto-reduction when appropriate, is way, way
faster than using bigints all the time.
In summary:
* No auto-reduction would make bigints second-class citizens and make
them difficult to use in other functions and collections.
* No auto-promotion to bigints means that programmers who want to
protect from overflow have to use bigints everywhere, which destroys
performance.
So what would be the downside of implementing most of these proposals
(e.g., 1 is a literal for long 1), but retain auto-promotion and
auto-reduction rather than throwing an error for primitive long math?
Seems to me like this would be the best of both worlds. What am I
missing?
You raise several points; I'll take them in turn. First, things are
not as dire as you state. Certainly one could write a version of fact
that hardwired bigints as you showed. But you need not - here's the
original naive definition:
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
Such logic that doesn't specify one of the components (in this case,
n) is still polymorphic, as are operations with mixed components. So
the power is in the hands of the supplier of n:
user=> (fact 10)
3628800
user=> (fact 40)
java.lang.ArithmeticException: integer overflow
user=> (fact 40N)
815915283247897734345611269596115894272000000000N
I think you can always write code in this way, such that it will work
with any of the numeric types, without dictating.
----
It is true that Java's BigIntegers are slow, especially for numbers
small enough to fit in Long or Integer, so using them for smaller
numbers has overhead. But there are other options, not yet pursued
here. For instance, Kawa has a higher performing biginteger that uses
an internal int component until it overflows. I'm sure no one has a
particular attachment to BigInteger should we find a better
alternative. It would be easy to create a big integer class based on
the Kawa concept, perhaps with a long or int basis overflowing to
BigInteger itself.
http://groups.google.com/group/jvm-languages/msg/0413ed05d1371907
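The idea can be sketched in a few lines (names invented here; Kawa's actual IntNum is considerably richer): do the arithmetic in a long, and fall back to BigInteger only when the result overflows.

```clojure
;; Hypothetical sketch of a Kawa-style multiply: stay in a long until
;; overflow, then fall back to BigInteger. Not Kawa's actual code.
(defn safe-mul [a b]
  (let [a (long a)
        b (long b)
        r (unchecked-multiply a b)]
    ;; overflow iff the division round-trip fails, plus the one
    ;; two's-complement corner case (-1 * Long/MIN_VALUE)
    (if (or (and (not (zero? a)) (not= (quot r a) b))
            (and (= a -1) (= b Long/MIN_VALUE)))
      (.multiply (BigInteger/valueOf a) (BigInteger/valueOf b))
      r)))
```

(safe-mul 2 3) stays a long; (safe-mul Long/MAX_VALUE 2) comes back as a BigInteger. Auto-reduction would be the analogous check on the BigInteger side.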
----------
As far as "performance killer" goes, that depends on your perspective.
It is my belief that the overwhelming majority of code *never* uses
numbers that exceed long. And those people are suffering a
'performance killer' on the order of 10-20x by having numbers be
boxed all the time.
------
The interaction between equality and the BigInteger schism is
something I am still thinking about. That is why the branches are
staged. Only the 'equal' branch has the type-strict =.
--------
The thing that you are missing is that there is no single
representation on the JVM* that can hold either a primitive or a
reference to an Object like a BigInteger. That is where the current
semantic difference creeps in - a primitive op sourced by a primitive
must have as its target a primitive - should it produce an object
there would be nowhere to put it. And any interface points (like the
math operations themselves) cannot dynamically take either a primitive
or an object. So the auto-promotion has an unavoidable cost of the use
of objects to hold all values, and the allocation thereof. Opting out
of the objects means a change in semantics. Auto-promotion cannot be
made a uniform semantic for both primitives and objects.
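With the prim-branch syntax under discussion, the constraint is visible directly: a fn declared to return a primitive has no slot that could hold a BigInteger, so on overflow its only options are to wrap or to throw. A sketch:

```clojure
;; The ^long return type is a primitive slot: an object result
;; (e.g. a promoted BigInteger) is simply not expressible here.
(defn times ^long [^long a ^long b]
  (* a b))

(times 3 14)                ;; fits in a long
;; (times Long/MAX_VALUE 2) ;; cannot promote: must wrap or throw
```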
---------
Another alternative is to provide a second set of operations with auto-
promoting, object based semantics. These would never operate on
primitives. Since a key argument for auto-promotion is the poor
behavior of small bigints, making a better bigint seems preferable.
--------
In the end, this does put the interests of two sides of the community
at odds. Only one set of interests can be the default. Part of the
point of this discussion is to measure the sides (so speak up people!).
I think the current default doesn't serve the majority well, and has a
semantic rift. Making ordinary programs that use longs and doubles run
fast requires extensive and tedious annotation that is easy to get
wrong. This work proves it could be completely otherwise.
Rich
* There are proposals:
http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
Note that even on mature platforms like CL with very good tagging and
fixnum systems, actually getting at full-width 32- and 64-bit hardware
operations requires extensive, tedious and error-prone annotation.
I.e. a tagged number is not a hardware number.
Going back to the naive factorial function:
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
Right now,
user=> (fact 40)
815915283247897734345611269596115894272000000000
Under the proposed changes,
user=> (fact 40)
java.lang.ArithmeticException: integer overflow
So the same code now risks an arithmetic exception. What have I, the
programmer, gained from this new level of risk? The answer, if I'm
the kind of programmer who has no interest in putting in type
annotations into the header of the function, is nothing. There is no
speed improvement without the additional type annotations. So all
I've gained is the /potential/ for a speed improvement.
I assume that most Clojure users really like its dynamic nature. If
this is true, then for most of us, the common case is to NOT annotate
our code with types. Certainly I like the idea of making it as easy
as possible to write fast code in Clojure. I want my dynamically
typed code to be as fast as it can be, and it's nice to know that
there is a way to write statically typed code that is even faster.
But I really don't want to incur a penalty (i.e., possibility of
Arithmetic Exception) when I'm not using type annotations.
I agree that the burden on those who want to write type annotated code
is too high. I agree that this should be made easier. But
ultimately, I still want the burden to be on them (or me, the 1% time
I need it), not those of us who prefer unannotated code. I think some
of the changes you have proposed for moving the annotations into the
header, along with a literal notation for long (i.e., 0L rather than
(long 0)), would be preferable to the bigint dichotomy.
I agree that an improved bigint type would address the performance
aspect of my argument. On the other hand, it wouldn't really address
the problem of using these things in sets and hash tables. I think
it's very important that Clojure have one unique canonical
representation for numbers arrived at through Clojure-based
computations. It is very difficult to use numbers sanely if that's
not the case.
I see your point that there are a couple of sides to this argument. I
hope I can persuade some more people to speak up for my side :)
> I assume that most Clojure users really like its dynamic nature. If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types. Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure.
The problem is that it is distinctly *not* easy to write fast numeric code in Clojure. It requires expert Clojure knowledge. On the other hand, the mental burden for someone who wants BigInts in the new system is very low - you'll get a trivial-to-track exception.
> You raise several points, I'll take them in turn. First, things are not as dire as you state. Certainly one could write a version of fact that hardwired bigints as you showed. But you need not - here's the original naive definition:
>
> (defn fact [n]
> (if (zero? n) 1 (* n (fact (dec n)))))
>
> Such logic that doesn't specify one of the components (in this case, n) is still polymorphic, as are operations with mixed components. So the power is in the hands of the supplier of n:
>
> user=> (fact 10)
> 3628800
>
> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>
> user=> (fact 40N)
> 815915283247897734345611269596115894272000000000N
That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?
Should I just use 'N' on every number to every function? This proposal destroys the black-box nature of functions, and seems to me to be the easy solution rather than the best solution. The best solution, I acknowledge, might be very hard work, possibly involving every such function having 2 polymorphic variants (one generic boxed form and one unboxed form), automatically selected and call-site cached like OO dispatch.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
If you pick up a starving dog and make him prosperous, he will not bite you. This is the principal difference between a man and a dog.
-- Mark Twain
> That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?
> On the other hand, the mental burden for someone who wants BigInts in the new system is very low - you'll get a trivial-to-track exception.
Right, but with just the prim branch and a shorthand for long literals, you get:
(defn ^:static fib ^long [^long n]
  (if (<= n 1L)
    1L
    (+ (fib (dec n)) (fib (- n 2L)))))
which really isn't that bad. So my claim is that the prim branch gets
you far enough along the road to easy fast numeric code, without
screwing with the numeric tower.
How about this: inside a static function with type annotations, all
literals are automatically converted into longs and doubles;
otherwise behavior stays the same.
So:
(defn ^:static fib ^long [^long n]
  (if (<= n 1)
    1
    (+ (fib (dec n)) (fib (- n 2)))))
is automatically converted to
(defn ^:static fib ^long [^long n]
  (if (<= n (long 1))
    (long 1)
    (+ (fib (dec n)) (fib (- n (long 2))))))
whereas
(defn fib [n]
  (if (<= n 1)
    1
    (+ (fib (dec n)) (fib (- n 2)))))
works the way it always did.
This behavior makes a lot of sense because (I think) the only time you
really gain from literals being primitives is when the variables
around them are also hinted. Otherwise, the polymorphic nature of the
variable will cause the literal to get boxed anyway.
That's pretty much what Common Lisp compilers do. The compiler (or
your own compiler macro, if you so choose) can examine the input forms
and the compiler environment to decide what code to produce.
The following is from Allegro Common Lisp, but all of the input is
ANSI CL -- part of the spec.
Observe how the compiler switches between different implementations of
trivial addition as it gets a progressively looser or tighter bound on
x. (I've elided some of the irrelevant preamble.)
What would be nice is if Clojure and/or the JVM could produce safe
code/bytecode that wouldn't overflow, but could also be JIT-optimized
into unboxed primitive arithmetic for known input ranges.
;; Range known, optimizing for speed: assembly add instruction.
cl-user(7): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (type (integer 0 255) x)
                                 (optimize (speed 3)))
                        (+ x 2))))
...
15: 48 83 c7 10 add rdi,$16
19: f8 clc
20: 4c 8b 74 24 10 movq r14,[rsp+16]
25: c3 ret
;; We know that x is a fixnum: the arithmetic doesn't need a function
;; call, but the result may be a boxed bignum, because fixnum + 2
;; won't always fit in a fixnum. If this function were inlined
;; elsewhere, the compiler might avoid that decision.
cl-user(8): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (type fixnum x)
                                 (optimize (speed 3)))
                        (+ x 2))))
...
21: 74 01 jz 24
23: 17 (pop ss) ; sys::trap-signal-hit
24: 48 c1 ff 03 sar rdi,$3
28: 49 c7 c5 02 00 00 00    movq r13,$2
35: 49 03 fd addq rdi,r13
38: 4c 8b ef movq r13,rdi
41: 49 d1 e5 sall r13,$1
44: 70 19 jo 71
46: 49 d1 e5 sall r13,$1
49: 70 14 jo 71
51: 49 d1 e5 sall r13,$1
54: 70 0f jo 71
56: 49 8b fd movq rdi,r13
59: f8 clc
60: 48 8d 64 24 78 leaq rsp,[rsp+120]
65: 4c 8b 74 24 10 movq r14,[rsp+16]
70: c3 ret
71: 41 ff 57 47 call *[r15+71] ; sys::box-to-new-bignum
75: eb ef jmp 60
77: 90 nop
;; Nothing known: call to excl::+_2op, the two-arg addition function.
cl-user(9): (disassemble
             (compile nil
                      (lambda (x)
                        (declare (optimize (speed 3)))
                        (+ x 2))))
...
21: 74 01 jz 24
23: 17 (pop ss) ; sys::trap-signal-hit
24: 40 f6 c7 07 testb dil,$7
28: 75 12 jnz 48
30: 48 83 c7 10 add rdi,$16
34: f8 clc
35: 70 1e jo 67
37: 48 8d 64 24 78 leaq rsp,[rsp+120]
42: 4c 8b 74 24 10 movq r14,[rsp+16]
47: c3 ret
48: 49 8b af 2f ff ff ff    movq rbp,[r15-209] ; excl::+_2op
55: 48 c7 c6 10 00 00 00    movq rsi,$16 ; 2
62: ff 53 d0 call *[rbx-48] ; sys::tramp-two
65: eb e2 jmp 37
67: 49 c7 c5 10 00 00 00    movq r13,$16 ; 2
74: 49 2b fd subq rdi,r13
77: eb e1 jmp 48
79: 90 nop
> On Fri, Jun 18, 2010 at 12:24 AM, Antony Blakey <antony...@gmail.com> wrote:
>
> That's fine for fact, but as a consumer of library functions, how do I know when I should pass in bigints? How do I know when an intermediate value, for my particular combination of parameter values, is going to overflow?
>
> If the library isn't communicating that to you, isn't that just a broken API that you don't want to participate in? Tag information on arglists is available:
No. How do I know whether I need to pass a bigint literal or not? For what possible input values will overflow be an issue? It's easy to see with fact, but that's because the function implementation is clearly visible and has well-known growth properties of a single argument. And it's not just literals: when functions are composed from other functions that have this same problem and return primitive types (hence using the static versions of functions), the question in general becomes impossible to answer - not only because there might be a number of parameters with interdependencies, but because the structure of intermediate results may not be visible to the consumer. And what if you don't have the source to the function? This proposal not only puts the burden of promotion and representation choice onto the user of a function, it requires that promotion be done before the function is executed, in a precautionary way, rather than iff needed.
This proposal is IMO a very bad idea.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Lack of will power has caused more failure than lack of intelligence or ability.
-- Flower A. Newhouse
> On Fri, Jun 18, 2010 at 12:24 AM, Antony Blakey <antony...@gmail.com> wrote:
> This proposal is IMO a very bad idea.
It's a composability issue.
Elaborating on Antony's explanation, let's say you call (fact (foo n)).
Do you know what values of n, when passed to foo, produce a value
large enough that fact will produce an exception? If you do know
this, do you know enough about the internals of foo to know the best
way to fix the problem? If you pass a bigint to foo, are you
guaranteed to get a bigint as an output, or do you need to explicitly
wrap a protective call to bigint between the two calls? In other
words, do you call (fact (foo 20N)) or (fact (bigint (foo 20)))?
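To make the hazard concrete, here is a hypothetical foo (invented purely for illustration) that silently drops bigint-ness: count hands back a machine-sized integer no matter what flows in, so the contagion chain is broken before the outer fact ever runs.

```clojure
;; Even when called with a bigint, foo returns a plain machine
;; integer, because count returns one unconditionally. The caller's
;; protective N is lost at this boundary.
(defn foo [n]
  (count (range n)))
```

So (fact (foo 20N)) does long arithmetic inside fact despite the caller's N, and only the wrapped form (fact (bigint (foo 20))) is protected.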
This imposes too high a burden on any programmer who cares about safety.
> This imposes too high a burden on any programmer who cares about safety.
Don't buy it. That's the whole point of BigInt contagion. If fact and foo are correctly written this will work.
+1, what I've been trying to say.
> I wonder: is there's a middle ground, where primitives are automatic almost everywhere, but bignum promotion on overflow is still possible?
That's what I was suggesting by having two versions of functions, one with primitives and one with boxed values. I did some investigation on this for a Smalltalk JIT using LLVM, in the form of partial specialisation, e.g. dynamically generating type-specialised versions of functions (methods in ST) and using the types available at the call site to dispatch - which is more obvious in Clojure because we already have multi-method dispatch. For Clojure you wouldn't test all types; it could just be a binary choice of all-primitives-where-expected or not. Of course, where a type-specialised function calls another type-specialised function, you can avoid the dispatch entirely because you have the type information at the call site. This way, you pay for the type-driven dispatch once at the start of a computation and end up doing minimal testing (basically dynamically checking that type invariants are maintained, e.g. that args haven't been promoted in the case under discussion). IMO this is speed + safety.
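A toy rendering of the two-variant idea, using ordinary multimethod dispatch (invented names; a real implementation would select the variant per call site in the compiler rather than per call at runtime):

```clojure
;; Dispatch once on argument representation: machine longs take the
;; unboxed-style path, everything else takes the safe boxed path.
(defmulti add2 (fn [a b]
                 (if (and (instance? Long a) (instance? Long b))
                   :fast
                   :safe)))

(defmethod add2 :fast [a b]
  (+ (long a) (long b)))      ;; long addition

(defmethod add2 :safe [a b]
  (+ (bigint a) (bigint b)))  ;; bigint addition, never overflows
```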
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Reflecting on W.H. Auden's contemplation of 'necessary murders' in the Spanish Civil War, George Orwell wrote that such amorality was only really possible, 'if you are the kind of person who is always somewhere else when the trigger is pulled'.
-- John Birmingham, "Appeasing Jakarta"
This nicely summarizes my thoughts. The other parts of Rich's notes
seem very elegant, but I can see the lack of promotion leading to
problems in my own code. Some external input causing an intermediate
result to overflow would certainly happen. Probably at the least
convenient time possible.
(Have we discarded the concept of a set of non-promoting math functions
that are only used during optimization? prim-+ prim-/ etc.)
Cheers,
Chris Dean
It can already happen: a library can screw you by design.
(defn fact [n] (loop [r (int 1) n (int n)] (if (zero? n) r (recur (* r n) (dec n)))))
user=> (fact 40)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:0)
Actually, people who want numerical performance have to do very
invasive type hinting and other interface workarounds because
primitives are... primitives (not objects).
With contagious bigints (let's nickname them "safeints" and assume
they are not BigInteger but something à la Kawa), a single N on
literals, or having all your inputs go through a safeint conversion,
would trigger safe computations (unless you try to call a static
primitive-hinted fn, but the compiler would then bark).
Of course a rogue fn can still internally convert to primitives and
overflow, but you can't do anything to protect yourself from that.
As for mixing several number representations in a single collection,
I think it's a problem of normalizing data, but most of the time you
are either in safeland or in fastland, so your representations will
be homogeneous (safeints or (boxed) primitives).
A middle ground may be to keep the semantics proposed by Rich but:
* integer literals default to safeint (unless suffixed by L for long)
* scientific notation and decimal literals default to double (unless
suffixed by M or N)
* implement a better bigint.
Christophe
--
European Clojure Training Session: Brussels, 23-25/6 http://conj-labs.eu/
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.cgrand.net/ (en)
QFT -- I'm in absolute agreement.
It seems to me that in its current shape, the new design -- actually
the "no automatic promotion/demotion" part -- would amount to
sacrificing correctness for performance *by default*; that doesn't
strike me as a good way to go. Not a single language I like using does
that... this of course includes my favourite language, Clojure, as it
currently stands on the master branch. :-)
Part of the reason why I think this is a bad idea is that while it is
all very well to say that people should know what they're doing,
libraries should be well-designed and have excellent test coverage
etc., this is not what happens in practice. (It's also perfectly
possible to write correct concurrent programmes with locks...). And
not necessarily through lack of diligence; I know I have learned many
a gotcha by getting bitten by it (and rather expect to accumulate
plenty more "bites" in the years to come).
I suppose that's fine as long as one is learning something useful in
the process, but I should think that top numeric performance is
something required only by a certain part of the community -- a
sizeable one, no doubt, but hardly a majority share -- in fact
comprising the people most likely to know what they're doing when
dealing with numbers (why would someone *not* doing involved numeric
computations know what s/he's doing with numbers on a comparable level
to someone who is?). Making things significantly easier for them would
be welcome, of course, but perhaps not at the cost of making
*everyone* deal with entirely new kinds of complexity to support that.
I wouldn't know if the "good" solution Antony Blakey mentions is
feasible currently, but if it isn't, I'd prefer the language to err on
the side of caution, while perhaps offering a better system for
hand-annotating things for top performance (some people have been
using pretty ingenious macros to ease things a little -- perhaps these
could be improved upon to produce an acceptable close-to-universal
solution)?
Right, so these are my initial impressions. I'll continue thinking
about this, the new developments do seem attractive in many ways
(^:static is very cool indeed!) and I'm curious to see if I might be
able to bring myself to see more positive sides to the long/bigint
split (as well as come to an opinion on the equals issue).
All the best,
Michał
> Don't buy it. That's the whole point of BigInt contagion. If fact and foo
> are correctly written this will work.
BigInt contagion doesn't help if in some convoluted manner a BigInt's
value is used to construct a primitive sufficiently large that later
causes an overflow.
What I would like to avoid is having to become an expert in the field
in order to prevent:
* nightmare debug sessions because of my own code or third-party
library code
* crashes in production code
which might derive from these choices.
I tend to think it seems wrong to require more knowledge from people
wanting to write *more* correct code than from people wanting to
write faster/more performant code. This is where I would place the
cursor.
The solution is easy: write java code by default, and clojure code
when you want to be safe :-p
One way I think about this is that the meaning of +, for example, is the sum of its numeric arguments. The performance characteristics are not part of this core meaning, although of course everyone will be happier the better the performance is. Nonetheless, I think that the meaning should be correctly maintained as a first priority, and that means giving the right answer regardless of how many bits are needed.
I frequently produce BigIntegers, even in code that's not particularly math-heavy. It has always galled me that in most programming languages any such code breaks because of low-level implementation issues that I don't want to have to think about most of the time. I usually just want the right answer, and I certainly don't want to have to think about the binary representations every time I write code that handles numbers. If I find that my code is too slow then I'll have to think about it, and that's when it makes sense to have to deal with declarations or type hints or special faster unsafe arguments or whatever. Ideally this will take as little work as possible, but I think it'd be a bad approach to make fast free (default) at the cost of saying "but the meaning of + may not actually be numeric sum, and this depends on the number of bits required to represent your numbers, so you better always be thinking about low level representation stuff whenever you use numbers."
-Lee
--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspe...@hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438
Check out Genetic Programming and Evolvable Machines:
http://www.springer.com/10710 - http://gpemjournal.blogspot.com/
(defn fact [n] (if (zero? n) 1 (* n (fact (dec n)))))
(defn twice-fact [n] (fact (fact n)))
;; contagion breaks down here: range/last/inc hand back machine-sized
;; numbers, so the outer fact overflows even when n is passed as a bigint
(defn bad-twice-fact [n] (fact (-> n fact range last inc)))
This L suffix idea is a non-starter. Here's why:
We're in a reader-based language. The reader reads objects. When the
reader reads 42, it must return an object. It cannot return an int or
a long primitive. Right now, it returns an Integer. I've also
conducted experiments where it returned a Long. Certainly 42000000000
will be a Long. What different thing would the reader return given
42L, or 42000000000L? Given an int 42 what should the printer print?
And long 42, Long 42? Would (= 42L (read-string (pr-str 42L)))?
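The round-trip property at stake can be written down directly (a sketch; which boxed class comes back from the reader is exactly the open question):

```clojure
;; print/read round-trip: the reader can only ever return a boxed
;; object, so an L suffix would have no distinct primitive to denote.
(defn round-trips? [x]
  (let [y (read-string (pr-str x))]
    (and (= x y) (= (class x) (class y)))))
```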
------
The second thing I don't see being acknowledged is that the real
problem here is the semantic difference. It's not a matter of getting
at a primitive operation when handed a boxed number - that already
happens, and pretty efficiently. It is the fact that the results have
to go somewhere, and the difference in 'where' creates a difference in
semantics. In master right now:
(+ x y)
and
(+ (identity x) y)
can produce different results. We already have bug reports around
this, and user confusion. Fixing it requires adopting a single
semantic for +. Choosing auto-promotion precludes the use of + on
primitives, a huge loss IMO. Choosing throw-on-overflow precludes
auto-promotion, but preserves polymorphism otherwise, and gives the
compiler free rein to optimize, since it will never be altering the
semantics by doing so.
I don't see a way around this fundamental dichotomy. The semantics for
+ should be unified, and the operator that makes the other choice
needs to be called something else, or placed somewhere else.
Rich
> I don't see a way around this fundamental dichotomy. The semantics for + should be unified, and the operator that makes the other choice needs to be called something else, or placed somewhere else.
My preference would be "placed somewhere else". We have namespaces, so why not profit from them?
My vote would be for:
1) clojure.core/+ doing bigint promotion and reduction under all circumstances (no optimizations that would break the semantics)
2) clojure.fastmath/+ doing type-specific and optimized addition.
This would retain the "safe by default" approach that I think fits Clojure best, and yet make it easy to use the "fast" version everywhere with nothing to change but the ns form at the top of a module.
Konrad.
------
While range could be written differently, this is tantamount to
saying: if I use a bigint to look up another number in some mapping,
it might not be a bigint. Yes, ok, but not a real issue.
-------------
Note - I don't mean to single you out, as others are using the same
language, but I find this whole characterization of throw-on-overflow
being somehow 'unsafe' to be a bit specious.
I want it to be perfectly clear to everyone, this is not a choice
between good math and bad math, i.e. C/Java style truncating/wrapping
of results. That will never be the default. Such silent truncation/
wrap is truly unsafe, and often the proposed alternative. It was a
surprising result that optimizations of HotSpot, combined with
pipelining and speculative execution on modern processors, make the
overflow checking a very minor overhead above the raw operations,
and, in my mind, a third, new, viable and safe high-performance
option.
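The check in question is just a couple of ALU operations plus a well-predicted branch, which is why the overhead is minor. A sketch of throw-on-overflow addition (the same test Java later standardized as Math/addExact):

```clojure
;; Throw-on-overflow long addition. Overflow occurred iff both
;; operands share a sign and the result's sign differs from it.
(defn checked-add [a b]
  (let [a (long a)
        b (long b)
        r (unchecked-add a b)]
    (if (and (= (neg? a) (neg? b))
             (not= (neg? r) (neg? a)))
      (throw (ArithmeticException. "integer overflow"))
      r)))
```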
Is a program that might throw an exception unsafe? If so, no Java
program is safe.
Is a vector that throws an exception when asked to store something at
an index out of range, rather than grow to accommodate it, unsafe?
This is in many ways similar - you are creating a result that doesn't
fit, and need to be using a bigger container.
So, let's go easy on the hyperbole. The behavior might not be what you
desire, but it is not unsafe.
Rich
Which then raises the questions:
- how will someone 'protect' themselves from libraries written using
fastmath?
- similar looking code will change semantics when the ns switch is
made - seems dangerous as it might violate the presumptions of the
original authors.
Rich
Fair enough.
No please, not even in my worst nightmares!
Semantic-changing compilation flags? So I use lib A which needs lib B
compiled with flags C and D, but I also use lib E which needs lib B
compiled with other values for the flags... I think this is just a
recipe for disaster.
> Thanks for the responses.
>
> Going back to the naive factorial function:
> (defn fact [n]
> (if (zero? n) 1 (* n (fact (dec n)))))
>
> Right now,
> user=> (fact 40)
> 815915283247897734345611269596115894272000000000
>
> Under the proposed changes,
> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>
> So the same code now risks an arithmetic exception. What have I, the
> programmer, gained from this new level of risk? The answer, if I'm
> the kind of programmer who has no interest in putting in type
> annotations into the header of the function, is nothing. There is no
> speed improvement without the additional type annotations. So all
> I've gained is the /potential/ for a speed improvement.
I fully acknowledge this is a breaking change that doesn't suit your
preferences. But again, you are over- and misstating things.
The benefit does not only accrue to those annotating their arguments.
Because many functions and methods return primitives (and more will,
now that fns can return primitives), any operations on their results
and/or literals can be optimized. E.g. things like (if (< (dec (count
x)) 100) ...) can now be optimized into primitives.
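A minimal sketch of that kind of expression (the function name is illustrative): count returns a primitive, so dec and < on its result and a literal can stay primitive with no hints at all.

```clojure
;; No annotations anywhere, yet the arithmetic and comparison can
;; compile to primitive ops, because count returns a primitive:
(defn near-full? [x]
  (< (dec (count x)) 100))
```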
>
> I assume that most Clojure users really like its dynamic nature. If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types. Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure. I want my dynamically
> typed code to be as fast as it can be, and it's nice to know that
> there is a way to write statically typed code that is even faster.
> But I really don't want to incur a penalty (i.e., possibility of
> Arithmetic Exception) when I'm not using type annotations.
>
> I agree that the burden on those who want to write type annotated code
> is too high. I agree that this should be made easier. But
> ultimately, I still want the burden to be on them (or me, the 1% time
> I need it), not those of us who prefer unannotated code. I think some
> of the changes you have proposed for moving the annotations into the
> header, along with literal notation for long (i.e., 0L rather than
> (long 0)) would be preferable to the bigint dichotomy.
>
> I agree that an improved bigint type would address the performance
> aspect of my argument. On the other hand, it wouldn't really address
> the problem of using these things in sets and hash tables. I think
> it's very important that Clojure have one unique canonical
> representation for numbers arrived at through Clojure-based
> computations. It is very difficult to use numbers sanely if that's
> not the case.
>
Agreed that this equality bit is a separate issue.
Rich
(1) examples of real code that fail on reasonable inputs, if this
change were made. And how difficult it would be to fix them.
(2) examples of real code that get way faster, and the ease of making
those changes (if any).
WRT #1, we have tested a few dozen libs, and found only a single
trivial breakage in their test suites. But I would say that is still
just plural anecdotes. More data welcome!
But what if the reader reads 42 as a BigInteger (or an alternative
implementation) and 42L as a Long?
then you should be able to tell them apart.
I agree with the goals of the num branch, I'm just unconvinced by the
reader defaulting to Long for integer literals.
Christophe
In general it seems we have two options:
1) We make fast the default and lose auto-promotion in the numeric stack.
2) We make it more difficult to be fast (as in extra work) but keep promotion.
Let's hope I got this right, otherwise the following argument can be skipped, since it's based on this understanding. (Of course there might be a way to accomplish both, which would without question be the best way to go ;)
Now let's see: auto-promotion has one very nice effect - it just works. That simple, and this is in my eyes hugely important, because it gives people a happy and fluffy feeling of being able to write normal code without special considerations and deep knowledge of types and hinting. Let's face it: for 90% of people it does not matter whether it is a long or a bigint or even a double - they expect math to just work. This is one of the reasons people enjoy languages like Ruby or buy Apple hardware: because it just works (yes, we don't play on the same ground, but it is a good principle). If we get to the point where people need to know that they have to pass a 42N instead of 42 or get an exception, it is just plain ugly; it is javaish, and not being javaish is one thing people love about Clojure. And just so it isn't tossed in the wrong bucket: it is not unsafe, just horribly inconvenient and annoying.
But let's look at the other side of the coin, performance. We all want it - of course, otherwise we'd put a lot of (Thread/sleep 1000) in our code - but again, let's face reality: most of us don't need it. Most of the time we don't need to compute fact or fib or do very fast math; in 90% of the cases what Clojure offers is just fast enough, since the bottleneck isn't arithmetic. For the remaining 10% of cases, well, if you are in that situation you likely have a complex algorithm. That in turn means you belong to the people who really know what they are doing and how their algorithm works, and someone in that position should know where to put performance optimizations. So the fact that it needs to be explicit isn't that horrible in my eyes: the code that has to be made more complex is usually some kind of focus code, where you do far more than just toss a few statics on it and hope it is fast - you try every trick to get it fast, so you'll have the work anyway.
Now what is my conclusion: let's keep the 'it just works' and sacrifice some out-of-the-box performance gains, which 90% of users won't notice anyway - and half of those who do notice will only do so because they benchmark for exactly this. Especially since arithmetic is rarely the bottleneck: database lookups, string manipulation, IO, network and object allocation are what eat the most performance in most cases.
Just my two (European!) cents :)
That said, I find the possible performance gain extraordinary and brilliant, and I'm all for it - just not at the sacrifice of 'it just works'. I'd rather write (long 1) in the 2 functions where it is really needed than make sure I write 42N or ^BigInt in every function where I'm not 100% certain (which will likely be all but the two functions where I'd write long).
Regards,
Heinz
It will be much easier for people to write fast library code (including in Clojure and contrib). So if you ever use Clojure libraries, you will get a speed boost.
> On Fri, Jun 18, 2010 at 2:49 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> This L suffix idea is a non-starter. Here's why:
>>
>> We're in a reader based language. The reader reads objects. When
>> the reader
>> reads 42, it must return an object. It cannot return an int or a long
>> primitive. Right now, it returns an Integer. I've also conducted
>> experiments
>> where it returned a Long. Certainly 42000000000 will be a Long. What
>> different thing would the reader return given 42L, or 42000000000L?
>> Given an
>> int 42 what should the printer print? And long 42, Long 42? Would
>> (= 42L
>> (read-string (pr-str 42L)))?
>
> But what if the reader reads 42 as a BigInteger (or an alternative
> implementation) and 42L as a Long?
> then you should be able to tell them apart.
>
And keeps them that way? Hrm. Then most programs, that currently have
no bigints, will now be full of them. And the boxed math will be
slower still, as it will be the (possibly improved, but still more
complex) logic of bigints. And the interop will be trickier. The
current boxing matches the boxing you will get from things from the
outside world. This wouldn't. And if this is just a 'way to
communicate with the compiler' thing, then read/print round-tripping
will break.
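The round-tripping concern can be made concrete; today this invariant holds for integer literals (a sketch):

```clojure
;; Today printing and re-reading a number preserves it exactly:
(= 42 (read-string (pr-str 42)))   ;; => true
;; If the reader returned a BigInteger for 42 merely as a signal to
;; the compiler, this print/read round-trip could no longer hold.
```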
> I agree with the goals of the num branch, I'm just unconvinced by the
> reader defaulting to Long for integer literals.
>
It's important not to get focused on the literals. There are
primitives coming from other places as well. The semantic dichotomy is
the driver. The choice of the semantics of + can't be based upon the
boxed-ness of the argument, as it is now.
Rich
Point is, no one will write 250 variables - even less so values - by hand; it will be a map, reduce or whatever, and then it's pretty easy to type hint it if performance is an issue at that point - and if you write this code you will likely know that it is a bottleneck. (I ignore the * there since you corrected it; otherwise it'd be a nice place to point out that it would likely become an overflow and an exception w/o automatic promotion :P)
Regards,
Heinz
If you are doing a lot of work with whichever semantic requires decorated
use, then it is going to seem awkward. A namespace based solution would
allow "natural" use of whichever semantic was appropriate for the domain
you are coding. It would still require an explicit choice - pulling in a
different namespace.
> - how will someone 'protect' themselves from libraries written using
> fastmath?
If I understand correctly, that is still an issue, whatever syntax is
chosen.
> - similar looking code will change semantics when the ns switch is made
The switch is explicit.
> - seems dangerous as it might violate the presumptions of the original
> authors.
Not sure I understand this. Presumably if you change your selected
semantics within a namespace, you had better be sure of how that affects
all the functions in that namespace.
--
Hugo Duncan
I am skeptical of this claim. Since static functions lose some of
their dynamism, I don't think library writers are going to rush to
convert all their code over to the "fast" format. I think library
writers will care the most about ensuring their functions play well
with Clojure's dynamic ecosystem, and so, as before, type-optimized
code will be written only sparingly. (I'll admit though, I don't
fully understand yet which dynamic features are lost by static
functions, and how much of an impact it will have, so I could be wrong
about this. Just wanted to throw this out as something to consider.)
Docs (see update section at the top)
https://www.assembla.com/wiki/show/b4-TTcvBSr3RAZeJe5aVNr/Enhanced_Primitive_Support
Code:
http://github.com/richhickey/clojure/commit/c79d28775e06b196ae1426f6c1446d00b621d2e1
Thanks to all for the feedback, keep it coming!
Rich
On Jun 17, 2010, at 4:13 PM, Rich Hickey wrote:
> I've been doing some work to enhance the performance, and unify the
> semantics, of primitives, in three branches. I've started to document
> this work here:
>
> https://www.assembla.com/wiki/show/clojure/Enhanced_Primitive_Support
>
> Feedback welcome,
>
> Rich
>
>
> I've revised and enhanced the strategy, based upon the feedback here. I think it is a nice compromise.
I agree this is a very nice compromise :) great work again and it is really cool to see how a community effort shapes clojure itself :D
Regards,
Heinz
I'm new to clojure, and while I have an interest in fast number
crunching, and in easily using big numbers, I don't have a valid
opinion on the whole matter.
However, I know that some time back, Python changed in precisely the
opposite direction - from separate small and big numbers to a unified
numeric type. Their situation is completely different (not least
because all Python objects are effectively boxed, so the speed benefit
of primitive arithmetic isn't relevant) but there may be real-world
examples of code that demonstrates (1) in the Python discussions. If
the information might be useful, I'd be happy to do some searching of
the archives.
Paul.
Actually the new idea with the + / +' split addresses this, so the
Eulerians (?) among us may be happy again. I take it you've posted
without reading the bottom of the thread or your message was stuck in
the moderation queue...
Incidentally, at first glance, I love the new tip of the equal branch. Really
exciting stuff! :-)
Sincerely,
Michał
Daniel, the decision was to keep things working and add speed as an option by using +' (and friends) specifically. So actually what happened is exactly what you wished for :).
Just tested with the newest code:
user=> (defn fact [n] (if (zero? n) 1 (*' n (fact (dec' n)))))
#'user/fact
user=> (fact 42)
java.lang.ArithmeticException: integer overflow (NO_SOURCE_FILE:3)
user=> (defn fact [n] (if (zero? n) 1 (* n (fact (dec n)))))
#'user/fact
user=> (fact 42)
1405006117752879898543142606244511569936384000000000N
Regards,
Heinz
Looks good to me.
Apparently loop/recur locals are now primitive by default:
(defn fact [n]
(loop [n n r 1]
(if (zero? n)
r
;; note the regular * on the next line
(recur (dec n) (* r n)))))
will throw ArithmeticException because of integer overflow, because r
is a long. The solution provided by Rich is to "hint" r to be a boxed
number: (num r) => gives old behaviour. I'm vaguely in favour of
switching around to boxed-by-default locals, but that's just a very
early initial impression... Just wanted to point this out for the
consideration of others here.
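For reference, a sketch of the (num ...) workaround described above, applied to the same function:

```clojure
;; Boxing the loop initializer with (num ...) keeps r a boxed Number,
;; so plain * is free to promote instead of overflowing:
(defn fact [n]
  (loop [n n r (num 1)]
    (if (zero? n)
      r
      (recur (dec n) (* r n)))))
;; (fact 40) then yields a big integer rather than throwing
```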
Sincerely,
Michał
How often does the warning go off if you go back to
arbitrary-precision by default, but make literals return boxed numbers
again? I suspect your comparison is unfair because lots of the loop
initializers are probably literals.
---
A concern I have is that interop forms that return primitives can
cause a loop var to have a primitive type.
(loop [procs (.availableProcessors Runtime/getRuntime)] ; Current
clojure makes procs a primitive, do people really expect that?
Would it be possible for loop to use boxed numbers always by default,
and only use primitives if some specific metadata is on the
initializer?
So in that world:
(loop [i 5] ; Boxing is always done by the loop statement, even if
initialized with a primitive
(loop [procs (.availableProcessors Runtime/getRuntime)] ; Boxing is
done by the loop statement
(loop [^:primitive procs (.availableProcessors Runtime/getRuntime)] ;
Primitive is used, good because we are expecting it
I'm not sure which order the metadata should go in such a scheme.
(loop [^:primitive i 5]
(loop [i ^:primitive 5]
(loop [i (prim 5)]
--Aaron
How about keeping the arbitrary-precision default, but add a loop'
construct to the family of +',-',*',inc', and dec' for primitive
optimization? The loop' construct would bind primitive literals,
whereas the loop construct would keep the literals boxed, so people
who don't want to analyze types rigorously in their loop/recur
constructs won't get mismatch errors.
I see that using primitive literals by default in loop/recur offers
speed benefits for the common case (recur type matches the
loop-declared type), but I worry about how hard it might be to analyze
and make sure that I'm always recurring with a type that exactly
matches what is in the loop.
Any other ideas on how to solve the problem of mismatch errors for
"casual coders"?
>> I've also temporarily enabled a diagnostic (in both) that tells you
>> when you have a mismatch between a loop initializer and its recur
>> form. It goes off over a hundred times in Clojure itself, when using
>> the arbitrary precision default. In each case, the recur value is
>> needlessly being boxed, every iteration of a loop, for loops that
>> will
>> never be bigints; indexes, counters etc. I expect this will be very
>> typical of most code. But removing that useless overhead would be a
>> lot of tedious work.
>>
>> With the defaults swapped, only 2 warnings.
>>
>
> How often does the warning go off if you go back to
> arbitrary-precision by default, but make literals return boxed numbers
> again? I suspect your comparison is unfair because lots of the loop
> initializers are probably literals.
>
It would be easy to make it go away. The point is not the mismatches
themselves, but they serve as a counter - how many loops had primitive
init and needed only primitive arithmetic? Each warning is a case
where it could be automatically fast but isn't. Making the warning go
away will leave the inefficiency but make it silent. It is just a
measurable loss to counter all the theoretical arguments being made.
> ---
>
> A concern I have is that interop forms that return primitives can
> cause a loop var to have a primitive type.
>
> (loop [procs (.availableProcessors Runtime/getRuntime)] ; Current
> clojure makes procs a primitive, do people really expect that?
>
> Would it be possible for loop to use boxed numbers always by default,
> and only use primitives if some specific metadata is on the
> initializer?
>
> So in that world:
>
> (loop [i 5] ; Boxing is always done by the loop statement, even if
> initialized with a primitive
>
> (loop [procs (.availableProcessors Runtime/getRuntime)] ; Boxing is
> done by the loop statement
>
> (loop [^:primitive procs (.availableProcessors Runtime/getRuntime)] ;
> Primitive is used, good because we are expecting it
>
>
> I'm not sure which order the metadata should go in such a scheme.
>
> (loop [^:primitive i 5]
> (loop [i ^:primitive 5]
> (loop [i (prim 5)]
Yes, it's easy to imagine a world where people who want efficient code
have to jump through hoops to get it. OTOH, you can just say (num some-
expr) to force it to be boxed, if you want assurance of an Object
initializer. Which will be the more common need?
I have to say I'm in the 'pay for what you use' camp - you need a box,
you ask for one. If I don't (and neither do any of those loops), why
should I have to do extra work to avoid it?
Rich
> An idea to consider:
>
> How about keeping the arbitrary-precision default, but add a loop'
> construct to the family of +',-',*',inc', and dec' for primitive
> optimization? The loop' construct would bind primitive literals,
> whereas the loop construct would keep the literals boxed, so people
> who don't want to analyze types rigorously in their loop/recur
> constructs won't get mismatch errors.
>
> I see that using primitive literals by default in loop/recur offers
> speed benefits for the common case (recur type matches the
> loop-declared type), but I worry about how hard it might be to analyze
> and make sure that I'm always recurring with a type that exactly
> matches what is in the loop.
>
Did you try it? The compiler does that analysis and reports any
mismatch:
user=> (defn foo [] (loop [x 42] (recur 4.2)))
NO_SOURCE_FILE:2 recur arg for primitive local: x is not matching
primitive, had: double, needed: long
user=> (defn foo [] (loop [x 42] (recur false)))
NO_SOURCE_FILE:5 recur arg for primitive local: x is not matching
primitive, had: java.lang.Boolean, needed: long
user=> (defn foo [] (loop [x 4.2] (recur (meta #'first))))
NO_SOURCE_FILE:15 recur arg for primitive local: x is not matching
primitive, had: java.lang.Object, needed: double
> Any other ideas on how to solve the problem of mismatch errors for
> "casual coders"?
>
Those could become hard errors again. But there are legitimate correct
cases that would then need annotation:
(defn foo [] (loop [x 42] (recur (:y z))))
Rich
> So far most of the action has concerned arithmetic ops (+, -, *, /).
> Will these new semantics include the bit-shift operators? I vote yes.
> My use cases for bit ops would benefit from primitive ops.
>
> On a related note, my use cases call for silent overflow of bit shifts
> (pseudo random number generators). Will there be a way to disable
> overflow exceptions (either via a binding or through unchecked-*
> functions)?
>
Yes, bit-ops will be made to align with whatever is done here.
Rich
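On the silent-overflow use case, a sketch of the existing unchecked-* route (behaviour as in the long-based unchecked ops):

```clojure
;; unchecked ops wrap around modulo 2^64 instead of throwing -
;; exactly what PRNG-style code wants:
(unchecked-add Long/MAX_VALUE 1)   ;; wraps to Long/MIN_VALUE
(unchecked-multiply 6364136223846793005 1442695040888963407)
```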
I think what troubles me the most about this loop/recur issue is that
the semantics of recur depend completely on whether a variable is
assigned a primitive or boxed numeric type. To know what the recur
will do, you must know the type of the assigned value. This has
always been the case, but in the current version of Clojure, provided
you aren't using Java interop, it's relatively straightforward to
know. Unless the variable or literal is explicitly cast to a
primitive, it's not a primitive. But now, with the advent of static
functions which can return primitives, it becomes much harder to
determine. If I say (loop [x (foo 2)] ...)
I have no way of knowing what's going to happen when I recur. Yes, I
realize that you can run the function and use warnings/errors to find
out. But I find it unsettling to have a commonly used code form with
semantics I can't predict just by looking at it.
It's attractive that the current proposal gives speed benefits to
common cases with no additional annotation. But I personally place a
higher value on being able to understand the semantics of my code when
reading through it. I would prefer a version where loop requires some
sort of explicit annotation on a variable if you want it to require a
primitive upon recur. (Ideally, I'd probably want the annotation on
the variable using a syntax that mirrors static functions, rather than
the old technique of typecasting the initializer, since the whole
typecasting system makes less sense now that literals and certain
return values are already primitives. Unifying the syntax and
semantics of using primitives in loops with the syntax and semantics
of using primitives as inputs to static functions makes a lot of sense
to me.)
Mark Engelberg wrote:
> Thanks for the responses.
>
> Going back to the naive factorial function:
> (defn fact [n]
> (if (zero? n) 1 (* n (fact (dec n)))))
>
> Right now,
> user=> (fact 40)
> 815915283247897734345611269596115894272000000000
>
> Under the proposed changes,
> user=> (fact 40)
> java.lang.ArithmeticException: integer overflow
>
> So the same code now risks an arithmetic exception. What have I, the
> programmer, gained from this new level of risk? The answer, if I'm
> the kind of programmer who has no interest in putting in type
> annotations into the header of the function, is nothing. There is no
> speed improvement without the additional type annotations. So all
> I've gained is the /potential/ for a speed improvement.
>
> I assume that most Clojure users really like its dynamic nature. If
> this is true, then for most of us, the common case is to NOT annotate
> our code with types. Certainly I like the idea of making it as easy
> as possible to write fast code in Clojure. I want my dynamically
> typed code to be as fast as it can be, and it's nice to know that
> there is a way to write statically typed code that is even faster.
> But I really don't want to incur a penalty (i.e., possibility of
> Arithmetic Exception) when I'm not using type annotations.
>
> I agree that the burden on those who want to write type annotated code
> is too high. I agree that this should be made easier. But
> ultimately, I still want the burden to be on them (or me, the 1% time
> I need it), not those of us who prefer unannotated code. I think some
> of the changes you have proposed for moving the annotations into the
> header, along with literal notation for long (i.e., 0L rather than
> (long 0)) would be preferable to the bigint dichotomy.
>
> I agree that an improved bigint type would address the performance
> aspect of my argument. On the other hand, it wouldn't really address
> the problem of using these things in sets and hash tables. I think
> it's very important that Clojure have one unique canonical
> representation for numbers arrived at through Clojure-based
> computations. It is very difficult to use numbers sanely if that's
> not the case.
>
> I see your point that there are a couple of sides to this argument. I
> hope I can persuade some more people to speak up for my side :)
>
>
> Which then begs the questions:
>
> - how will someone 'protect' themselves from libraries written using fastmath?
Using fastmath or not would be part of the library specification. Or, in general, of the documentation for each individual function. It may not matter at all for the user of some functions, but be very important for others.
> - similar looking code will change semantics when the ns switch is made - seems dangerous as it might violate the presumptions of the original authors.
That is true, but it is true for any strategy to introduce new maths semantics into Clojure. As soon as there are two, there is a risk for confusion. And even if there is only one but different from the one in the past, there is a risk for confusion as well.
Konrad.
> In this situation, I would rather have some kind of toggle *use-bigints-by-default* (defaulted to false) used by the compiler.
> Then, if someone needs big integers, he can switch the toggle and recompile everything.
> You wouldn't have to change the libraries ns.
I don't think it's a good idea to have a compiler switch change the semantics of number handling.
> As for being predictable, if you have numbers that can be bigger than 10^19, you would probably know it in advance.
> (indices, counters and the like don't grow that big)
True, but that's on a variable-by-variable basis.
Konrad.
>On the other hand, having boxed by default is a very significant slowdown
>(10% on the strange program I tried, that had already a lot of annotations,
>probably 20% or more on most programs), that can never be addressed : you
>can't write annotations on every single line of your program.
Were those real world programs, or arithmetic benchmarks? Most real world programs I see spend so little of their time doing arithmetic that making the math an order of magnitude slower wouldn't make a noticeable difference in overall runtime.
Which puts making numbers primitives by default squarely in the realm of premature optimization.
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>I tried it on my program. Very few arithmetic, most of it with annotation.
>10%
Can we see the source?
- 42
I totally and wholeheartedly disagree. As long as there is a chance something goes wrong - and here there is a lot of it - the default cannot be that we force people to know more, for example that they explicitly need to box things. And it is not even only about longs and BigInts; doubles and floats, and what the Java version means, also come into play. How often do you write mixed calculations because initializing with 1 is shorter and more natural for you than 1.0? This will now break your loop, make it crumble into dust and give you odd, unexpected and wrong results.
Let's get away from the arguments already made and bring a new one. It is an impossible situation that the literal 1 in one place has a different meaning and behavior than in another place. This is in fact a show stopper - I'm serious, this would be worse than Java.
(* 1 0.2) -> works fine
but
(loop [n 1 r 1](if (zero? n) r (recur (dec n) (* r 0.2))))
does not, because the 1 there is not the 1 in the upper example. That is so utterly wrong and horrible; please think about it, and think about how you'd sell to someone that he has to know where a literal means what. I can guarantee you this will keep many people from this wonderful language :(.
The default case for a language has to be the most intuitive version, otherwise you end up with a mess of complex rules and special cases. It was said before: this is premature optimization at its worst, since it is forced on users whether they want it or not. The priority has to be 'it works'; later, if it matters, we can worry about 'it is fast' - but this isn't working any more.
Regards,
Heinz
The issue of which behaviour should be the default aside, I think that
I'd expect the *, + etc. ops to go with unannotated literals and the
primed variants to go with explicit hinting, whereas now this seems
not to be the case:
(defn fact [n]
(loop [n (num n) r (num 1)]
(if (zero? n)
r
(recur (dec n) (* r n)))))
This works for (fact 40), whereas the version with dec' and *' (and
the (num ...) still in place) doesn't; I'd expect this to be reversed.
In general, I'd expect the primed ops to be there for those who know
what they're doing; same goes for explicit hinting.
On 19 June 2010 04:12, Rich Hickey <richh...@gmail.com> wrote:
> I have to say I'm in the 'pay for what you use' camp - you need a box, you
> ask for one. If I don't (and neither do any of those loops), why should I
> have to do extra work to avoid it?
The question is which desirable commodity Clojure programmers can
expect to get for free and which is to be obtainable through voluntary
banter: the sturdy boxes or the Potion of Speed +5. I'm used to paying
for the PoS, but a sudden rise in the price of boxes seems unwarranted
to me, especially if it only serves to lower the price of the PoS from
a (prim ...) hint where the local is introduced to zero. I mean, this
is no longer about "the free lunch", it's more about "the free
dessert" which the power number crunchers surely do not need.
I have a hunch that if (num ...) stays in and primitive is the
default, people (I mean just about *everybody*) will only box things
in the face of the first crash or *maybe* after developing a great
intuitive feel for when that might be required. Of course there's
going to be a period of frantic learning about numeric representations
on the JVM etc. in between the crash and the boxing. And, um, whatever
for...?
Interestingly, the introductory materials on Clojure I've read so far
tout the "don't worry about it" quality of Clojure's numerics as a
benefit of the language. (The "don't worry about it" phrasing comes
from "Programming Clojure".) I just mention this because I tend to
think it reasonable to present that as a benefit. The idea of making a
U-turn on this is surprising to me... Which is partly to say that I'm
still evaluating it, though so far my intuition stays pretty firmly on
the "don't worry about it"-is-good side.
TLDR summary: I'm very excited about the latest tip of the equal branch; I'd definitely
want the primed arithmetic ops to go with the hinted locals, whereas
non-primed (basic) variants would do the right thing for the simplest,
unhinted code (so which ops should get the primes in my eyes depends
on whether boxing of locals is the default); my heart is still on the
side of free sturdy boxes while I'm expecting to pay for the Potion of
Speed.
Sincerely,
Michał
Personally, I have no real interest in bigints, but I'm glad that they
are there, and that arithmetic code supports them polymorphically.
I'm not sure what I think regarding non-promoting numeric operators.
They are ok, but they seem to add complexity to the language, and they
don't look very newbie friendly. I think if we had them, promoting ops
should be the primed ones, and they would mostly be used in library
functions where performance is important.
> but
> (loop [n 1 r 1](if (zero? n) r (recur (dec n) (* r 0.2))))
> does not [work]
This seems to be particularly nasty. Clojure is statically typing r on
the assumption that, just because the init value of r fits a primitive
int, it can never be anything else. Promoting operators do no
good, because the variable slot for the loop var has already been
assumed to be a primitive, so it isn't possible to promote it.
Is there any possibility that loop variables could be assumed to be
boxed numerics, unless type hinted to something else?
So:
(loop [^int n 1 r 1](if (zero? n) r (recur (dec n) (* r 0.2))))
Numeric literals would still produce primitives where possible, but
would get auto-boxed in loop initialisers.
I think this option wouldn't have any nasty surprises.
--
Dave