[ANN] Clojure 1.8.0-RC4 is now available


Alex Miller

Dec 16, 2015, 4:45:21 PM12/16/15
to Clojure
Clojure 1.8.0-RC4 is now available. This build is a "release candidate"! We would appreciate any and all testing you can do on your own libraries or internal projects to find problems. 

Of particular note, CLJ-1861 removes the interning of unused vars. This change reduces compiled class size (the clojure jar is 8% smaller), which reduces classloading time and thus improves startup time (~10-15% faster in some tests). Code that uses direct linking (like Clojure core) sees the greatest benefit, as most direct-linked vars are unused, but you may see some benefit with code that is not direct linked as well. Feedback on startup time or other impacts in actual projects (with or without direct linking) is appreciated.
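For anyone who wants to gather that feedback, a minimal sketch of a wall-clock measurement helper (the `elapsed-ms` name is invented here, and a single unwarmed sample is only a rough signal, not a proper benchmark):

```clojure
;; Rough sketch: time a thunk once, e.g. a namespace load, in milliseconds.
;; One sample is noisy; repeat and compare across Clojure versions.
(defn elapsed-ms
  "Runs thunk f once and returns the wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; e.g. (elapsed-ms #(require 'clojure.string))
```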

Try it via
Below are the changes since 1.8.0-RC3. See the full 1.8 change log here: https://github.com/clojure/clojure/blob/master/changes.md.
  • CLJ-1861 - Remove unused var interning
  • CLJ-1161 - Clojure -sources.jar includes a bad properties file in release builds
  • Commit ae7ac - Unrolls the remainder of the Tuple changes from earlier in the release cycle, most significantly rolling back the addition of IMapEntry to APersistentVector

Mikera

Dec 16, 2015, 8:34:50 PM12/16/15
to Clojure
Thanks Alex, working well for me and startup times certainly seem a bit snappier.

I also agree that APersistentVector should not implement IMapEntry. It always seemed like a bad idea to me, so I'm glad to see it rolled back.

What's the plan with Tuples more broadly? I worked on this many months ago along with Zach T and a couple of others, and we demonstrated some promising performance improvements (see e.g. https://gist.github.com/ztellman/3701d965228fb9eda084). I remain convinced we are still leaving some fairly big wins on the table here.
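A crude sketch of the shape such a micro-benchmark takes (illustrative only; a serious run would use something like criterium to control for JIT warm-up, and `ns-per-op` is a name invented here):

```clojure
;; Crude ns/op estimate; no warm-up or dead-code control, so treat the
;; numbers as indicative only.
(defn ns-per-op [^long n f]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (double (/ (- (System/nanoTime) start) n))))

;; Small vectors are the Tuple candidates; larger arities stay PersistentVector.
(ns-per-op 1000000 #(vector 1 2))
(ns-per-op 1000000 #(vector 1 2 3 4 5 6 7))
```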

Alex Miller

Dec 16, 2015, 11:34:58 PM12/16/15
to clo...@googlegroups.com
On Wed, Dec 16, 2015 at 7:34 PM, Mikera <mike.r.an...@gmail.com> wrote:

What's the plan with Tuples more broadly?

Don't know.

Herwig Hochleitner

Dec 17, 2015, 9:59:37 AM12/17/15
to clo...@googlegroups.com

Am 17.12.2015 02:35 schrieb "Mikera" <mike.r.an...@gmail.com>:

> What's the plan with Tuples more broadly?

Speaking as a kibitzer to the process: suppose somebody were to carry this along, I'd like to see these points addressed:

IIRC, the breaking factor for the proposal was slow-downs in real-world programs, likely due to pollution of the JVM's polymorphic inline caches. It seems necessary to have a benchmark exercising the data-structure part of clojure.core with real-world degrees of polymorphism, replicating the slow-downs Rich saw with the proposal. Once we have such a realistic baseline, to which we can add expected best and worst cases, it's much easier to have a conversation about expected performance benefits and drawbacks.

The second thing bothering me about the proposal: to me (as a non-authority on the matter), checking in generated files is borderline unacceptable. I'd much rather see such classes generated as part of the build process, e.g. by:
- using Ant or Maven plugins to generate Java source, or
- using macros to generate bytecode as part of AOT compilation

====

So, while the second point would certainly make a proposal more appealing, the first one is mandatory due diligence. I'm really glad that Cognitect acted as a gatekeeper there and saved us from microbenchmark hell.

I'd really love to write some more about my ideas and alternatives to generating tuple arities 1-8, but I also think we ought to have that benchmark before discussing this point any further.

kind regards

Mikera

Dec 18, 2015, 7:02:13 AM12/18/15
to Clojure
On Thursday, 17 December 2015 14:59:37 UTC, Herwig Hochleitner wrote:

Am 17.12.2015 02:35 schrieb "Mikera" <mike.r.an...@gmail.com>:

> What's the plan with Tuples more broadly?

Speaking as a kibitzer to the process: suppose somebody were to carry this along, I'd like to see these points addressed:

IIRC, the breaking factor for the proposal was slow-downs in real-world programs, likely due to pollution of the JVM's polymorphic inline caches. It seems necessary to have a benchmark exercising the data-structure part of clojure.core with real-world degrees of polymorphism, replicating the slow-downs Rich saw with the proposal. Once we have such a realistic baseline, to which we can add expected best and worst cases, it's much easier to have a conversation about expected performance benefits and drawbacks.


I don't actually recall seeing any benchmarks showing slow-downs in real-world programs. Rich made an apparently unsubstantiated assertion that these exist but didn't provide his analysis (see CLJ-1517). 

On the other hand Zach ran some benchmarks on JSON decoding and found a roughly 2x speedup. That's a pretty big deal for code implementing JSON APIs (which is probably a reasonable example of real world, nested-data-structure heavy code).

Does anyone have any actual evidence of this supposed slowdown? I.e. is there a standard benchmark that is considered acceptable for general-purpose / real-world performance in Clojure applications? If so I'm happy to run it and figure out why any slowdown with Tuples is happening. My strong suspicion is that the following is true:
1) The Tuples generally provide a noticeable speedup (as demonstrated by the various micro-benchmarks)
2) There are a few hotspots where Tuples *don't* make sense because of PIC pressure / megamorphic call sites (repeated conj on vectors might be an example). These cases can be revealed by more macro-level benchmarking.
3) We should be able to identify the cases in 2) and revert to generating regular PersistentVectors (or switch to transients). In that case the Tuple patches may develop from a debatable patch with some problematic trade-offs into a pretty clear all-round improvement (in both micro and macro benchmarks).
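To make the PIC concern in 2) concrete, here is a toy illustration (not a benchmark): the `nth` below is a single call site, and the set of concrete vector classes flowing through it determines whether the JIT sees it as monomorphic or megamorphic. With several Tuple arities plus PersistentVector in the mix, one site can see many receiver classes.

```clojure
;; One call site (the `nth` inside the reducing fn) sees every concrete
;; class in `vs`. With tuples of arities 2, 3, 4... plus PersistentVector,
;; that one site can exceed the inline-cache limit and go megamorphic.
(defn sum-firsts [vs]
  (reduce (fn [acc v] (+ acc (nth v 0))) 0 vs))

(sum-firsts [[1 2] [3 4 5] [6]])  ;; => 10
```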

The key point regarding 3): code that is performance sensitive (certainly in core, maybe in some libs) should consider whether a Tuple is a good idea or not (for any given call-site). These may need addressing individually, but this is incremental to the inclusion of Tuples themselves. The performance comparison isn't as simple as "current vs. tuples patch", it should be "current vs. tuples patch + related downstream optimisation" because that is what you are going to see in the released version.

Also it should be remembered that JVMs are getting smarter (escape analysis allowing allocation of small objects on the stack etc.) and the Clojure compiler is also getting smarter (direct linking etc.). Tuples could potentially give further upside in these cases, so there is a broader context to be considered. My view is that the balance will shift more in favour of Tuples over time as the respective runtime components get smarter at taking advantage of type specialisation (happy to hear other views, of course).

 

The second thing bothering me about the proposal: to me (as a non-authority on the matter), checking in generated files is borderline unacceptable. I'd much rather see such classes generated as part of the build process, e.g. by:
- using Ant or Maven plugins to generate Java source, or
- using macros to generate bytecode as part of AOT compilation


I agree checking in generated files is a bad idea; that was why I actually created hand-coded variants of Zach's original Tuple code as part of CLJ-1517. My reasoning for this was as follows:
1) You do in fact want some hand-coded differences (e.g. making the 2-Tuple work as a MapEntry, having a single immutable instance of Tuple0, etc.). It is annoying to handle these special cases in a code generator.
2) Class generation at compile time is fiddly and would complicate the build / development process (definitely not a good thing!)
3) It is simpler to maintain a small, fixed number of concrete Java source files than it is to maintain a code generator for the same (which may be fewer lines of code, but has much higher conceptual overhead)
 

====

So, while the second point would certainly make a proposal more appealing, the first one is mandatory due diligence. I'm really glad that Cognitect acted as a gatekeeper there and saved us from microbenchmark hell.


Really? I think this CLJ-1517 issue is an example of how *not* to do OSS development.
a) Substantial potential improvements (demonstrated with numerous benchmarks) sitting unresolved for well over a year with limited / very slow feedback
b) Motivated, skilled contributors initially being encouraged to work on this but then finding themselves ignored / annoyed with the process / confused by the lack of communication (certainly myself, and I suspect I also speak for Zach here)
c) Rich commits his own patch, to the surprise of contributors. I provided some (admittedly imperfect, but hopefully directionally correct) evidence that Zach's approach is better. Rich's patch subsequently gets reverted, but we are just back to square one.
d) Lack of clarity on process / requirements for ultimately getting a patch accepted. What benchmark of "real world usage" is actually wanted? I've seen little / no communication on this despite multiple requests.

This is all meant as honest constructive criticism, I hope Cognitect can learn from it. If anyone from Cognitect wants more detailed feedback on how I think the process could be improved, happy to provide. To be clear I'm not angry about this, nor am I the kind of person to demand that my patches get accepted, I am just a little sad that my favourite language appears to be held back by the lack of a fully collaborative, open development process.

I also have a related philosophical point about the "burden of proof" for accepting patches that may cause regressions. For functional / API changes the right standard is "beyond reasonable doubt" because any regression is a breaking change to user code and therefore usually unacceptable. For performance-related patches the standard should be "on the balance of probabilities" because regressions in less common cases are acceptable providing the overall performance impact (for the average real world user) is expected to be positive.
 

I'd really love to write some more about my ideas and alternatives to generating tuple arities 1-8, but I also think we ought to have that benchmark before discussing this point any further.

kind regards


Interested to hear your views, Herwig. It's always worth discussing ideas and alternatives; this can help inform the ultimate solution. FWIW I think most of the wins for Tuples are at the very small arities (0-4); larger sizes than that are probably much more marginal in value.

I agree macro-level benchmarks would be great to inform the debate, but just to repeat my point d) above: different contributors have asked multiple times what sort of real-world benchmark would be considered informative, but these requests seem to have been ignored so far. It would be great if the core team could provide some guidance here (Alex? Rich?)

Mikera

Dec 18, 2015, 7:08:39 AM12/18/15
to Clojure
I'm willing to take another shot at a patch for this, as I believe there is a decent performance win still on the table.

But I need some guidance first from the core team on:
a) What "real world" benchmark(s) would be required to demonstrate an overall improvement?
b) If I can demonstrate an overall improvement on said benchmark(s), will the patch be accepted (otherwise I'm wasting my time)?

Alex Miller

Dec 18, 2015, 9:39:36 AM12/18/15
to Clojure
I haven't talked to Rich about it recently, but I expect tuples will be reassessed at some point. I don't think more patches would be helpful at this time.

Herwig Hochleitner

Dec 18, 2015, 6:01:08 PM12/18/15
to clo...@googlegroups.com
Apologies for the incoming wall of text, as well as for co-opting the -RC4 thread

TL;DR:
- -RC4 LGTM, I enjoy the startup-speed boost
- we need a benchmark before further evaluating the work on tuples

2015-12-18 13:02 GMT+01:00 Mikera <mike.r.an...@gmail.com>:

I don't actually recall seeing any benchmarks showing slow-downs in real-world programs. Rich made an apparently unsubstantiated assertion that these exist but didn't provide his analysis (see CLJ-1517). 

I don't remember any benchmarks showing the slowdown either, but I'm taking Rich's word for it.

On the other hand Zach ran some benchmarks on JSON decoding and found a roughly 2x speedup. That's a pretty big deal for code implementing JSON APIs (which is probably a reasonable example of real world, nested-data-structure heavy code).

Well, I'm also taking your word for that, and for the speedups that you saw in other benchmarks. Whether it represents real-world usage depends on the shape of your test data. If the test data is just (repeat [:test :vector]), then no, it doesn't represent real-world usage, because it exercises just one arity.
Is the benchmark you're speaking of posted somewhere the wider community can review it? I'd like to play with it, see the speedup, and try to break it by adding polymorphism. I'd also be happy to help develop the test cases.

I think it's good to have your tuple proposal around so that we have something to benchmark stock Clojure against, but before making a serious push into core, we should have a test suite that allows running many different permutations of enabled test cases (exercising various arities), with Clojure + various proposal patches, ideally on various JVMs. Only then can we get serious about discussing performance trade-offs.

Does anyone have any actual evidence of this supposed slowdown? i.e. is there a standard benchmark that is considered acceptable for general purpose / real world performance in Clojure applications?

If there were, somebody would probably have pointed it out. Right now, I feel that any work is best spent on developing such a benchmark, to help the community evaluate the situation.
 
If so I'm happy to run it and figure out why any slowdown with Tuples is happening. My strong suspicion is that the following is true:
1) The Tuples generally provide a noticeable speedup (as demonstrated by the various micro-benchmarks)

(IMHO) Clojure has always been a big-picture language, and reliable end-to-end performance in a multi-tenant setup is more important than looking good on Alioth.
 
2) There are a few hotspots where Tuples *don't* make sense because of PIC pressure / megamorphic call sites (repeated conj on vectors might be an example). These cases can be revealed by more macro-level benchmarking.

There are many possible caveats:
- is the morphism degree of a protocol-call local to the callsite or global to the protocol dispatch fn?
- does the gc take advantage of objects being uniformly sized and how much of that will we lose?
- how much do the hot-spots shift amongst various programs?

3) We should be able to identify these cases of 2) and revert to generating regular PersistentVectors (or switching to Transients....). In that case the Tuple patches may develop from being a debatable patch with some problematic trade-offs to a pretty clear all-round improvement (in both micro and macro benchmarks).

Well, before we have a comprehensive set of benchmarks, all we can really do is throw code at the wall and see if it sticks.

The key point regarding 3): code that is performance sensitive (certainly in core, maybe in some libs) should consider whether a Tuple is a good idea or not (for any given call-site). These may need addressing individually, but this is incremental to the inclusion of Tuples themselves. The performance comparison isn't as simple as "current vs. tuples patch", it should be "current vs. tuples patch + related downstream optimisation" because that is what you are going to see in the released version.

To be really honest, this sounds a bit like: if only Cognitect shoved tuples down the community's throat, people would start optimizing for them. Which is true. It's also probable that, after the dust settles, we'd end up with somewhat better performance than we have now. We still shouldn't do it that way.

Why not start with a tuple library that we can use when we want increased tuple performance? That certainly worked for cljx, even if people used to complain about it.
(ns my.lib.awesome-ns
  (:refer-clojure :exclude [into conj vector vec])
  (:require [mikera.awesome.vectors :refer [into conj vector vec]]))

You could even include a flag in your tuple library to revert to core functions, in order to benchmark against core without rewriting anything.
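Such a flag could be as simple as routing construction through a rebindable var. Everything below is a hypothetical sketch of that idea, not an existing library:

```clojure
;; Hypothetical: the library exposes its constructor through a dynamic var,
;; so users can rebind it to clojure.core/vector to benchmark against core
;; without rewriting any call sites.
(def ^:dynamic *vector-ctor* clojure.core/vector)

(defn vec2
  "Builds a 2-element vector via whatever constructor is currently bound."
  [a b]
  (*vector-ctor* a b))

(binding [*vector-ctor* clojure.core/vector]  ;; "revert to core" mode
  (vec2 1 2))  ;; => [1 2]
```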

Also it should be remembered that JVMs are getting smarter (escape analysis allowing allocation of small objects on the stack etc.) and the Clojure compiler is also getting smarter (direct linking etc.). Tuples could potentially give further upside in these cases, so there is a broader context to be considered. My view is that the balance will shift more in favour of Tuples over time as the respective runtime components get smarter at taking advantage of type specialisation (happy to hear other views, of course).

Look at it this way: you're telling me that the JVM's JIT will get smarter in the future, and that the Clojure compiler will too. Ok. Right there are two things that I'd have to take somebody's word for, and that criticism doesn't even touch on the hell of a lot of maybes that you ask us to base decisions on.
It's just the simple fact that the only way to falsify your hypothesis is to wait it out.

I agree checking in generated files is a bad idea; that was why I actually created hand-coded variants of Zach's original Tuple code as part of CLJ-1517. My reasoning for this was as follows:
1) You do in fact want some hand-coded differences (e.g. making the 2-Tuple work as a MapEntry, having a single immutable instance of Tuple0, etc.). It is annoying to handle these special cases in a code generator.

OTOH, it's worth it, because the generator would make it easy to benchmark many different permutations, maybe even to generate per-application variants based on profiling.
 
2) Class generation at compile time is fiddly and would complicate the build / development process (definitely not a good thing!)

Committing a generated blob with the generator lost to tribal knowledge is also not a good trade-off, complexity-wise.
 
3) It is simpler to maintain a small, fixed number of concrete Java source files than it is to maintain a code generator for the same (which may be fewer lines of code, but has much higher conceptual overhead)

That would be true if performance were a fixed point to optimize against. Alas, as you argued yourself, performance is a moving target, with the constraints changing even from machine to machine.
That means that any, however small, set of classes dedicated to optimizing performance will also be a moving target. It certainly will be until we agree on a sweet spot in our test suite.

So, while the second point would certainly make a proposal more appealing, the first one is mandatory due diligence. I'm really glad that Cognitect acted as a gatekeeper there and saved us from microbenchmark hell.

Really?

Yes 
 
I think this CLJ-1517 issue is an example of how *not* to do OSS development.

Let me tell you: I'm also active in the NixOS community, and while I love the community as well as the system (almost as much as Clojure's :-), the thing that annoys me the most is collaborators just hitting that "Merge" button without proper evaluation, blindly relying on the CI server. It works out, because it's still a functional system with deep immutability, and I don't think there is much of an alternative with 1000s of upstream packages, but it still made me appreciate the zen-like pace of Clojure, especially since the language is so extensible. Stuart Halloway's remarks on that topic in the recent Cognicast episode struck a chord with me.

a) Substantial potential improvements (demonstrated with numerous benchmarks) sitting unresolved for well over a year with limited / very slow feedback

Well, the same is also true for some bugs, and I agree that Cognitect still has bottleneck problems, even though things have gotten a lot better since Alex Miller became the community spokesperson.
But Cognitect is only one half of the problem. The other is a community where fantastic large-scale efforts to advance the language, like dunaj, go largely undiscussed, while every second thread about a new URI-parsing library gets a double-digit post count.
 
b) Motivated, skilled contributors initially being encouraged to work on this but then finding themselves ignored / annoyed with the process / confused by the lack of communication (certainly myself, and I suspect I also speak for Zach here)

I agree and I hope that Cognitect will find more ways to transfer Rich's sense of direction into the community.
 
c) Rich commits his own patch, to the surprise of contributors. I provided some (admittedly imperfect, but hopefully directionally correct) evidence that Zach's approach is better. Rich's patch subsequently gets reverted, but we are just back to square one.

Full ack. While Rich certainly has the privilege to commit however he likes, it would be good to see him use the bug tracker for his own patches, if only to appreciate how crappy the workflow for creating a ticket and attaching a patch really is in JIRA ;-)
 
d) Lack of clarity on process / requirements for ultimately getting a patch accepted. What benchmark of "real world usage" is actually wanted? I've seen little / no communication on this despite multiple requests.

I suspect that my requirements for a benchmark (mainly extensibility and ease of testing many permutations) would happily coincide with advancing the state of data-structure-based optimizations, and thus with said requirements. (Yes, Rich, I'm putting words in your mouth; if you don't like it, come over and discuss it ;-)

This is all meant as honest constructive criticism, I hope Cognitect can learn from it. If anyone from Cognitect wants more detailed feedback on how I think the process could be improved, happy to provide. To be clear I'm not angry about this, nor am I the kind of person to demand that my patches get accepted, I am just a little sad that my favourite language appears to be held back by the lack of a fully collaborative, open development process.

To be fair, such a process need not be provided by Rich, or even Cognitect. Why not create a community-supported upstream fork of Clojure, where promising patches from JIRA (maybe even PRs) are collected and distributed as clojure.next, without any promise of eventual inclusion in Clojure proper? A place where speculative work can prove itself out and mature a bit before needing to deal with Cognitect's processes. A bit like wine-staging does.

I also have a related philosophical point about the "burden of proof" for accepting patches that may cause regressions. For functional / API changes the right standard is "beyond reasonable doubt" because any regression is a breaking change to user code and therefore usually unacceptable. For performance-related patches the standard should be "on the balance of probabilities" because regressions in less common cases are acceptable providing the overall performance impact (for the average real world user) is expected to be positive.

I hear you; yet I'd also weigh in implementation complexity, and hacks tend to stick once they're in, especially in the face of those "downstream optimizations" that you mentioned.
Let's not forget that it's such a joy to work with the code base exactly because Rich tends to hold back when faced with the option of committing non-essential stuff.

Interested to hear your views, Herwig. It's always worth discussing ideas and alternatives; this can help inform the ultimate solution. FWIW I think most of the wins for Tuples are at the very small arities (0-4); larger sizes than that are probably much more marginal in value.

Well, if you must know, among the permutations I'd try are:
- generating tuple sizes at powers of two: 1, 2, 4, 8, 16
  - with a separate length field
  - or with the unused slots holding a sentinel object
- replacing array-map with tuple-map
- specializing IFn for being applied to tuples
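As a rough illustration of the first permutation (purely hypothetical; the `Tup4` name is invented here, and a real tuple would implement far more of the collection interfaces), a power-of-two tuple with a separate length field might start like:

```clojure
;; Hypothetical power-of-two tuple: four slots, logical length in cnt,
;; unused slots left nil. Only Counted and Indexed are sketched.
(deftype Tup4 [a b c d cnt]
  clojure.lang.Counted
  (count [_] (int cnt))
  clojure.lang.Indexed
  (nth [_ i]
    (cond (== i 0) a (== i 1) b (== i 2) c (== i 3) d
          :else (throw (IndexOutOfBoundsException.))))
  (nth [_ i not-found]
    (if (and (<= 0 i) (< i cnt))
      (cond (== i 0) a (== i 1) b (== i 2) c :else d)
      not-found)))

(nth (Tup4. :x :y nil nil 2) 1)  ;; => :y
```

The sentinel-object variant would replace the nil padding with a private marker so that nil remains a legal element.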

Anyway, before having some kind of extensible, permutable benchmark, I'd rather not sink time into the implementation.

I agree macro-level benchmarks would be great to inform the debate, but just to repeat my point d) above: different contributors have asked multiple times what sort of real-world benchmark would be considered informative, but these requests seem to have been ignored so far. It would be great if the core team could provide some guidance here (Alex? Rich?)

I'm not core, but if you accept (or criticize) my guidance for a good real-world benchmark, we could already be two vocal contributors agreeing on a benchmark; they can't ignore us forever ;-) It should

- be reviewable
- be extensible
- solicit representative cases from all corners of the community
- support basic combinatorics on
  - the tested clojure versions (i.e. various patchsets)
  - the set of routines within a single run (to monitor effect of patches with various PIC loadouts)
  - used jvms
- generate machine-readable reports

James Elliott

Dec 20, 2015, 11:07:33 PM12/20/15
to Clojure
This release candidate appears to be working fine for me, as have the previous ones.

Robin Heggelund Hansen

Dec 21, 2015, 12:47:39 AM12/21/15
to Clojure
Been running with this in production for two days now. Working fine.