I can't speak for Cliff Click, but he mentioned in one of his presentations that the JIT (HotSpot's, at least) is roughly equivalent to GCC -O2 in terms of optimizations; perhaps that's the motivation there.
I've also encountered C++ devs who were under the impression that -O3 is for experimental/unstable optimizations and wouldn't enable it. That's not the case, though, so IMHO -O3 is the right comparison for peak performance (e.g., -O3 is where a lot of the aggressive vectorization takes place).
sent from my phone
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
In my experience, for a lot of applications, -O2 beats -O3 significantly in terms of code size. In the non-hot parts of the code, this matters more than fast straight-line performance, since it avoids instruction cache misses, etc.
At least in the project I'm currently working on, building all my third-party dependencies with -O2 resulted in a couple percent speedup in the resulting binary. Selectively enabling -O3 on the hot code paths, on the other hand, does make sense to get vectorization, etc.
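With GCC the selective approach doesn't even require separate translation units; a per-function attribute works. A minimal sketch (GCC-specific; the function here is made up for illustration):

```cpp
#include <cstddef>

// The whole file can build at -O2, while this hot kernel gets -O3 (and with
// it the more aggressive unrolling/auto-vectorization). GCC-specific
// attribute; with Clang you'd typically split the hot code into its own TU.
__attribute__((optimize("O3")))
int dot(const int* a, const int* b, std::size_t n) {
    int sum = 0;
    // A straight-line integer reduction: a classic auto-vectorization
    // candidate at -O3.
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```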
Todd
It's true that -O3 will almost always increase code size, primarily due to more aggressive unrolling and other loop transforms (and subsequent vectorization). For microbenchmarks this is almost always a win, since the code is hot by definition and there's not a lot of it; in a "real" app it could cause issues, very true. My default is -O3, selectively decreasing when warranted.
The ideal way to avoid instruction bloat from expansion of non-hot paths is to either use PGO or at least manually mark unlikely paths (when known a priori). The benchmarks benefiting from vectorization will be interesting to re-check with Java 9, given a few Intel superword enhancements.
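Concretely, the manual marking on GCC/Clang is `__builtin_expect`. A minimal sketch (the function and error path are made up for illustration):

```cpp
#include <stdexcept>

// Tell the compiler the error branch is cold, so it keeps the hot path as
// straight-line code and moves the throw out of the way. GCC/Clang builtin;
// C++20 spells the same hint [[unlikely]].
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int parse_digit(char c) {
    if (UNLIKELY(c < '0' || c > '9'))
        throw std::invalid_argument("not a digit");
    return c - '0';
}
```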
sent from my phone
On Jul 5, 2015 1:11 PM, "Todd Lipcon" <to...@lipcon.org> wrote:
On Jul 5, 2015 9:41 AM, "Vitaly Davidovich" <vit...@gmail.com> wrote:
On Jul 5, 2015 2:14 AM, <rick.ow...@gmail.com> wrote:
So I am writing equivalent Java and C++ programs to compare both languages for speed. In his excellent article, Cliff Click did the same, but he compiled with -O2 instead of -O3. On my various benchmarks, Java loses to C++ code compiled with -O3 but wins against C++ code compiled with -O2. Which one should I use to reach a reasonable conclusion? It looks like Cliff Click chose -O2. Does anyone know why?
Hand-selecting optimizations per method (I'm assuming you're referring to -XX:CompileCommand) is very cumbersome in HotSpot, and I've yet to see it widely used; I've only seen it used to turn off compilation of a method entirely when a JIT bug is suspected.
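For reference, that cumbersome spelling looks something like this (the class, method, and jar names here are made up):

```
# Exclude one suspect method from JIT compilation entirely; it stays interpreted.
java -XX:CompileCommand=exclude,com/example/Parser.parseLine -jar app.jar
```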
sent from my phone
Yeah, this also makes intuitive sense; most large workloads are dominated by branchy logic and/or cache misses, and there may not even be much code to vectorize. This is of course not true of certain types of libs, but it generally holds IME. Inevitably, Java-vs-C++ microbenchmarks tend to contain lots of loops over arrays of primitives, i.e. lots of opportunity for vectorizing. IMHO, the difference maker (wrt Java vs C++) for typical real-world apps is C++'s better locality and lower abstraction costs.
sent from my phone
Ditto here. -O3 is almost always a better outcome for microbenchmarks, since the code size, number of branches, etc. don't saturate the limits of the machine one is running on. But many "real world" applications I have worked on run better with a mix of -O3 and -Os/-O2.
Yeah, that's fairly new, and maybe it works for Oracle SQE :). JEP 165 may make this a bit more accessible for end users, but I don't think that's its stated goal.
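For the curious, a JEP 165 (Compiler Control, JDK 9+) directives file looks roughly like this, passed via -XX:CompilerDirectivesFile=directives.json; the method pattern here is made up:

```
[
  {
    // apply to one method; patterns support wildcards
    match: "com/example/Parser.parseLine",
    c2: {
      Exclude: true
    }
  }
]
```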
sent from my phone
You can see what GCC selects for -O3: https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/opts.c?view=markup#l522. Most of it is for vectorization (loops) and shouldn't generally have a negative effect on other code shapes (it may not improve them either, of course). HotSpot is actually pretty aggressive about inlining frequent code (the bytecode-size limit and node budget are large), and more often than not inlining helps when it's on a hot code path. The big advantage HotSpot has over typical static compilation is profile info (though the way it's collected can create perf problems sometimes), but if you at least highlight to static compilers which paths are uncommon (a good chunk of these are known a priori by the developer), they won't bloat code unnecessarily. Best case is you use PGO and have a representative and consistent profile, but that's problematic in big apps.
As for FIX handling, how do you know your Java impl is faster due to better icache utilization? I suspect it's really down to other reasons.
sent from my phone