On 17.12.2015 at 02:35, "Mikera" <mike.r.an...@gmail.com> wrote:
> What's the plan with Tuples more broadly?
Speaking as a kibitzer to the process: if somebody were to carry this forward, I'd like to see these points addressed:
IIRC, the breaking factor for the proposal was slow-downs in real-world programs, likely due to pollution of the JVM's polymorphic inline caches. It seems necessary to have a benchmark that exercises the data-structure part of clojure.core with real-world degrees of polymorphism and replicates the slow-downs Rich saw for the proposal. Once we have such a realistic baseline, to which we can add expected best and worst cases, it is much easier to have a conversation about expected benefits and drawbacks, performance-wise.
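To make the inline-cache concern concrete, here is a minimal Java sketch (class names are hypothetical stand-ins, not the CLJ-1517 code) of the mechanism: HotSpot typically inlines a call site that sees one or two receiver types, but once several specialized tuple classes flow through the same site it goes megamorphic and falls back to dynamic dispatch.

```java
// Hypothetical sketch of PIC pollution. "Seq", "Vec" and the "TupN"
// classes are illustrative stand-ins for a general vector type and
// the generated small-arity tuple classes.
interface Seq {
    int count();
}

class Vec implements Seq { public int count() { return 32; } }
class Tup1 implements Seq { public int count() { return 1; } }
class Tup2 implements Seq { public int count() { return 2; } }
class Tup3 implements Seq { public int count() { return 3; } }

public class PicDemo {
    // If only Vec ever reaches this loop, the call site is monomorphic
    // and inlines well. Mixing in Tup1..Tup3 makes the same site
    // megamorphic -- the kind of slow-down only a macro-level benchmark
    // with realistic polymorphism would expose.
    static long sumCounts(Seq[] seqs) {
        long total = 0;
        for (Seq s : seqs) total += s.count(); // polymorphic call site
        return total;
    }

    public static void main(String[] args) {
        Seq[] mixed = { new Vec(), new Tup1(), new Tup2(), new Tup3() };
        System.out.println(sumCounts(mixed)); // 32 + 1 + 2 + 3 = 38
    }
}
```

This is not a benchmark, only an illustration of why micro-benchmarks (which keep call sites monomorphic) can look great while whole-program behaviour regresses.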
The second thing bothering me about the proposal: to me (as a non-authority on the matter), checking in generated files is borderline unacceptable. I'd much rather see such classes generated as part of the build process, e.g. by:
- using Ant or Maven plugins to generate Java source, or,
- using macros to generate bytecode as part of AOT compilation
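The first option above could look something like the following minimal sketch: a tiny generator, run by a build plugin, that emits the TupleN sources instead of having them checked in. The emitted class shape is illustrative only, not the actual CLJ-1517 code.

```java
// Hypothetical build-time source generator for small tuple classes.
// An Ant/Maven plugin would invoke this and write the output to a
// generated-sources directory; nothing generated gets committed.
public class TupleGen {
    static String generate(int arity) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class Tuple").append(arity).append(" {\n");
        // one final field per element
        for (int i = 0; i < arity; i++)
            sb.append("    public final Object f").append(i).append(";\n");
        // constructor taking all elements
        sb.append("    public Tuple").append(arity).append("(");
        for (int i = 0; i < arity; i++)
            sb.append(i == 0 ? "" : ", ").append("Object f").append(i);
        sb.append(") {\n");
        for (int i = 0; i < arity; i++)
            sb.append("        this.f").append(i).append(" = f").append(i).append(";\n");
        sb.append("    }\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A build plugin would write each of these to TupleN.java.
        for (int n = 1; n <= 8; n++)
            System.out.println(generate(n));
    }
}
```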
====
So, while the second point would certainly make a proposal more appealing, the first one is mandatory due diligence. I'm really glad that Cognitect acted as a gatekeeper there and saved us from microbenchmark hell.
I'd really love to write more about my ideas and alternatives to generating tuple arities 1-8, but I also think we ought to have that benchmark before discussing this point any further.
kind regards
I don't actually recall seeing any benchmarks showing slow-downs in real-world programs. Rich made an apparently unsubstantiated assertion that these exist but didn't provide his analysis (see CLJ-1517).
On the other hand, Zach ran some benchmarks on JSON decoding and found a roughly 2x speedup. That's a pretty big deal for code implementing JSON APIs (which is probably a reasonable example of real-world, nested-data-structure-heavy code).
Does anyone have any actual evidence of this supposed slowdown? i.e. is there a standard benchmark that is considered acceptable for general purpose / real world performance in Clojure applications?
If so, I'm happy to run it and figure out why any slowdown with Tuples is happening. My strong suspicion is that the following is true:
1) The Tuples generally provide a noticeable speedup (as demonstrated by the various micro-benchmarks)
2) There are a few hotspots where Tuples *don't* make sense because of PIC pressure / megamorphic call sites (repeated conj on vectors might be an example...). These cases can be revealed by more macro-level benchmarking.
3) We should be able to identify the cases in 2) and revert to generating regular PersistentVectors (or switch to transients...). In that case the Tuple patches may go from being a debatable patch with some problematic trade-offs to a pretty clear all-round improvement (in both micro and macro benchmarks).
The key point regarding 3): code that is performance sensitive (certainly in core, maybe in some libs) should consider whether a Tuple is a good idea or not (for any given call-site). These may need addressing individually, but this is incremental to the inclusion of Tuples themselves. The performance comparison isn't as simple as "current vs. tuples patch", it should be "current vs. tuples patch + related downstream optimisation" because that is what you are going to see in the released version.
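Point 3) could be sketched as a single construction entry point that specializes small arities but lets individual hot call sites opt back into the general vector type. This is a hypothetical illustration: `SmallTuple` stands in for the generated tuple classes and `ArrayList` for PersistentVector; none of these names come from the actual patch.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class VecFactory {
    static final int TUPLE_MAX = 4; // the small arities where tuples clearly win

    // Flat, fixed-size stand-in for the specialized tuple classes.
    static final class SmallTuple extends AbstractList<Object> {
        private final Object[] items;
        SmallTuple(Object[] items) { this.items = items; }
        public Object get(int i) { return items[i]; }
        public int size() { return items.length; }
    }

    // One construction entry point: small arities get the specialized
    // representation, while large arities -- or call sites known to be
    // megamorphic hotspots -- fall back to the general vector type
    // (ArrayList here stands in for PersistentVector).
    static List<Object> create(boolean hotMegamorphicSite, Object... items) {
        if (!hotMegamorphicSite && items.length <= TUPLE_MAX)
            return new SmallTuple(items.clone());
        return new ArrayList<>(List.of(items));
    }

    public static void main(String[] args) {
        System.out.println(create(false, 1, 2).getClass().getSimpleName());
        System.out.println(create(false, 1, 2, 3, 4, 5).getClass().getSimpleName());
        System.out.println(create(true, 1, 2).getClass().getSimpleName());
    }
}
```

The `hotMegamorphicSite` flag is just a placeholder for whatever per-call-site decision (manual annotation, compiler analysis) the downstream optimisation would use.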
Also it should be remembered that JVMs are getting smarter (escape analysis allowing allocation of small objects on the stack etc.) and the Clojure compiler is also getting smarter (direct linking etc.). Tuples could potentially give further upside in these cases, so there is a broader context to be considered. My view is that the balance will shift more in favour of Tuples over time as the respective runtime components get smarter at taking advantage of type specialisation (happy to hear other views, of course).
I agree checking in generated files is a bad idea; that was why I actually created hand-coded variants of Zach's original Tuple code as part of CLJ-1517. My reasoning for this was as follows:
1) You do in fact want some hand-coded differences (e.g. making the 2-Tuple work as a MapEntry, having a single immutable instance of Tuple0, etc.). It is annoying to handle these special cases in a code generator
2) Class generation at compile time is fiddly and would complicate the build / development process (definitely not a good thing!)
3) It is simpler to maintain a small, fixed number of concrete Java source files than it is to maintain a code generator for the same (which may be fewer lines of code, but has much higher conceptual overhead)
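The MapEntry special case from 1) is the kind of thing a uniform generator handles awkwardly. A minimal hand-coded sketch (illustrative names only, not the actual clojure.lang code, and omitting the equals/hashCode the real Map.Entry contract requires):

```java
import java.util.Map;

// Hypothetical hand-coded 2-arity tuple that doubles as a map entry,
// so map iteration can hand out tuples directly. Immutable: setValue
// is unsupported, matching persistent-data-structure semantics.
public final class Tuple2 implements Map.Entry<Object, Object> {
    private final Object key, val;

    public Tuple2(Object key, Object val) {
        this.key = key;
        this.val = val;
    }

    public Object getKey() { return key; }
    public Object getValue() { return val; }

    public Object setValue(Object v) { // immutable: mutation unsupported
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Map.Entry<Object, Object> e = new Tuple2("a", 1);
        System.out.println(e.getKey() + "=" + e.getValue()); // a=1
    }
}
```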
> So, while the second point would certainly make a proposal more appealing, the first one is mandatory due diligence. I'm really glad that Cognitect acted as a gatekeeper there and saved us from microbenchmark hell.
Really?
I think this CLJ-1517 issue is an example of how *not* to do OSS development.
a) Substantial potential improvements (demonstrated with numerous benchmarks) sitting unresolved for well over a year with limited / very slow feedback
b) Motivated, skilled contributors initially being encouraged to work on this, but then finding themselves ignored, annoyed with the process, and confused by the lack of communication (certainly myself, and I suspect I also speak for Zach here)
c) Rich commits his own patch, to the surprise of contributors. I provided some (admittedly imperfect, but hopefully directionally correct) evidence that Zach's approach is better. Rich's patch subsequently gets reverted, but we are just back to square one.
d) Lack of clarity on process / requirements for ultimately getting a patch accepted. What benchmark of "real world usage" is actually wanted? I've seen little / no communication on this despite multiple requests.
This is all meant as honest constructive criticism, I hope Cognitect can learn from it. If anyone from Cognitect wants more detailed feedback on how I think the process could be improved, happy to provide. To be clear I'm not angry about this, nor am I the kind of person to demand that my patches get accepted, I am just a little sad that my favourite language appears to be held back by the lack of a fully collaborative, open development process.
I also have a related philosophical point about the "burden of proof" for accepting patches that may cause regressions. For functional / API changes the right standard is "beyond reasonable doubt" because any regression is a breaking change to user code and therefore usually unacceptable. For performance-related patches the standard should be "on the balance of probabilities" because regressions in less common cases are acceptable providing the overall performance impact (for the average real world user) is expected to be positive.
Interested to hear your views, Herwig - it's always worth discussing ideas and alternatives; this can help inform the ultimate solution. FWIW, I think most of the wins for Tuples are for the very small arities (0-4); larger sizes than that are probably much more marginal in value.
I agree macro-level benchmarks would be great to inform the debate, but just to repeat point d) above: different contributors asked multiple times what sort of real-world benchmark would be considered informative, and these requests seem to have been ignored so far. It would be great if the core team could provide some guidance here (Alex? Rich?)