Came across this today: http://sebastiansylvan.com/2015/04/13/why-most-high-level-languages-are-slow/

Nothing that hasn't been said before, but what's entertaining is the ensuing reddit discussion: http://www.reddit.com/r/programming/comments/32f4as/why_most_high_level_languages_are_slow/. I must say it's somewhat annoying to hear people "talk up" the JVM as if it's some magic pixie dust, without ever mentioning its practical, existing limitations.
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I think high-level languages do not need to be slow. I don't think well-written code in Java or C# is what you would call slow at all. I see so much C and C++ that is very slow because of bad design and poor data structures. With a good GC, object-graph layout semantics, and value objects, Java could be really fast and usable. The bigger issue I see far more often is library and API design. We need to treat data-friendliness as a first-class design concern.
Oh, the other current limitation that irks me is that the JIT doesn't trust final fields and so doesn't treat them as constants (a prime example where this stinks is Enum.ordinal usage). I know the rationale behind it, but it could've been done differently.
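A minimal sketch of what "not trusting finals" means in practice (illustration only, not JDK code; `Config` and `clamp` are made up for the example):

```java
// Sketch: `limit` is final, so in principle the JIT could fold it into
// compiled code as a per-instance constant. In practice HotSpot re-loads
// the field on each use, because final fields can still be rewritten via
// reflection and deserialization.
class Config {
    final int limit;
    Config(int limit) { this.limit = limit; }
    // If finals were trusted, this compare could specialize against a
    // known constant; today it emits a real field load every time.
    int clamp(int v) { return v < limit ? v : limit; }
}
```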
sent from my phone
Even if you manage to avoid GC you're still paying some price to support it (e.g. write barrier/card marking, some bits in the mark word of an object header).
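To make the card-marking cost concrete, here is a toy model of the idea (illustration only, not the JVM's actual implementation; the 512-byte card size matches HotSpot's default):

```java
// Toy card-marking write barrier: the heap is divided into 512-byte
// "cards", and every reference store also dirties the card covering the
// written slot, so a young-gen GC can find old-to-young pointers without
// scanning the whole old generation. The extra store is paid on every
// reference write, even if a collection never runs.
class CardTable {
    static final int CARD_SHIFT = 9;              // 2^9 = 512-byte cards
    final byte[] cards;
    CardTable(int heapBytes) { cards = new byte[heapBytes >>> CARD_SHIFT]; }

    // Conceptually invoked on every reference store: obj.field = ref
    void onReferenceStore(int slotAddress) {
        cards[slotAddress >>> CARD_SHIFT] = 1;    // the barrier: one extra store
    }
    boolean isDirty(int slotAddress) {
        return cards[slotAddress >>> CARD_SHIFT] != 0;
    }
}
```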
Yeah, the whole "a sufficiently advanced VM could do this" line is such a cop-out. You cannot even keep a few gigs of objects cached in memory without massive pauses. Most high performance Java code I see rarely allocates objects; otherwise it resorts to flyweights over ByteBuffers or Unsafe (not without its own performance caveats). Even Gil, who keeps talking about how cheap allocations are, has zero allocations in HdrHistogram. I totally understand the "good enough performance with short lead times" promise of Java, but the heroic efforts people make to get it to do ungodly things just seem like the wrong choice of technology.

I think a lot of people have been burnt by the poor debugging and compiler infrastructure for C/C++. I am also really excited about Rust because it has started from scratch: it is not handicapped by the header-file expansion issues of C, nor by the 100-page template errors of C++. Because the compiler rules out mutable aliasing, it can actually generate better code than C++ compilers, which must always assume aliasing.
On Apr 15, 2015, at 1:01 AM, Rajiv Kurian <geet...@gmail.com> wrote:
> Yeah the whole "A sufficiently advanced VM could do this" is such a cop out. You cannot even keep a few gigs of objects cached in memory without massive pauses . Most high performance Java code I see rarely allocates objects. Otherwise they resort to flyweights over ByteBuffers or Unsafe (not without its own performance caveats). Even Gil who keeps talking about how cheap allocations are, has zero allocations in HdrHistogram.
Quite frankly, Gil is right about many, many things - but in this case he's wrong. Allocations are not expensive, but they are also not cheap. If you take in the life-cycle costs, then it can get downright expensive.
+1
There's probably no debate that GC allocation throughput (especially bump-the-pointer allocation in a thread-local buffer) will beat native allocators. But the people who make this statement as an argument for GC fail to realize that even semi performance-conscious native code doesn't allocate at the same rate as typical Java code. The real issue is that Java is a very heap-heavy language - there's very little useful work that can be done without causing allocations (assuming one doesn't go out of their way and contort the code). The one glaring difference is that native code uses the stack quite a bit for temp allocations.
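The bump-the-pointer fast path can be sketched in a few lines (a toy in the spirit of a TLAB, not HotSpot's implementation - real TLABs hand out raw heap words, not array offsets):

```java
// Toy bump-the-pointer allocator: the fast path is one bounds check plus
// one addition, which is why GC allocation throughput is so hard for
// general-purpose native allocators to beat.
class Tlab {
    final byte[] buffer;
    int top;
    Tlab(int size) { buffer = new byte[size]; }

    // Returns the offset of the new "object", or -1 to signal the slow
    // path (refill the TLAB from the shared eden space).
    int allocate(int bytes) {
        if (top + bytes > buffer.length) return -1;
        int offset = top;
        top += bytes;            // bump the pointer
        return offset;
    }
}
```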
The other thing is that typical C++ code isn't written in the same manner as Java. In particular, templates allow for compile-time polymorphism, whereas Java relies on the JIT and profile/classload info to devirtualize, which is much more brittle and subject to subtle performance-regressing changes (e.g. going from one loaded subclass of an abstract class to two).
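The one-subclass-to-two regression can be sketched directly (hypothetical classes, just to illustrate the shape of the problem):

```java
// While Circle is the only loaded subclass, the JIT can devirtualize (and
// inline) shape.area() at call sites typed Shape. The moment a second
// subclass such as Square is loaded anywhere in the process, that
// single-implementor assumption is invalidated: compiled code is
// deoptimized and recompiled with real virtual dispatch - a regression
// C++ templates sidestep by resolving the call at compile time.
abstract class Shape { abstract double area(); }

final class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

// Merely loading this class perturbs call sites that never see a Square.
final class Square extends Shape {
    final double s;
    Square(double s) { this.s = s; }
    double area() { return s * s; }
}
```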
The "frustrating" thing is that Java IS pretty fast, and it does have some advantages over AOT native compilers (even ones compiled with PGO), and yet it's not quite where it could be. For example, having a tiered compilation system is great in that it doesn't put unnecessary pressure on the highest-tier compiler to finish compilation quickly. But then there are implementation artifacts that reduce some of that potential; one example that comes to mind is the whole "profile pollution" problem.
The other "dropped the ball" aspect is generics - the incessant type checks in the generated code is just silly. I'll leave it at that as I'm sure we're all well aware of this issue, and hopefully Valhalla fixes it (and value types).
As for Rust, I also quite like that performance is a paramount requirement for its engineers/community. They also have a performance sensitive complex "dogfooding" project in Servo, which should help to keep the performance train on track.
> The other "dropped the ball" aspect is generics - the incessant type checks in the generated code is just silly. I'll leave it at that as I'm sure we're all well aware of this issue, and hopefully Valhalla fixes it (and value types).
Modern Intel chips will fuse cmp+jmp into one uop, if I recall correctly. But yes, irrespective of fusion, it's a waste in the instruction stream and uses up a BTB entry. This is particularly annoying when the actual code using the object is cheaper than the type check.
The other issue is that if you end up reading data from the loaded object that's a cacheline away from the header, the type check will either cause an unnecessary cache miss or keep a cacheline resident that may not otherwise be needed.
On 15 April 2015 at 14:00, Vitaly Davidovich <vit...@gmail.com> wrote:
> There's probably no debate that GC allocation throughput (especially bump the pointer in a thread local buffer) will beat native allocators. The real issue is that java is a very heap heavy language - there's very little useful work that can be done without causing allocations. The one glaring difference is native code uses the stack quite a bit for temp allocations.
I find that I often end up allocating, or contorting my own code, in Java due to the lack of value types, just so I can return a multi-value structure on the stack. We can hopefully get this addressed for a big bump in a future release. I've spent some time over the last few years tracking the perf issues I came across due to restrictions in Java vs C. Value types are right up there, next to the ability to co-locate a small object inside a larger one as an aggregate, and to have arrays of objects that are not arrays of references. Most other issues I tracked in real-world apps are simple API issues, like ByteBuffer and String missing key methods.
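One common contortion in the absence of value types is packing a multi-value result into a single primitive to avoid the allocation (hypothetical helper, purely to illustrate the pattern):

```java
// Returning (min, max) from one method without allocating, by packing two
// ints into one long - the kind of contortion value types would remove.
final class MinMax {
    static long minMax(int[] a) {
        int min = a[0], max = a[0];
        for (int v : a) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return ((long) min << 32) | (max & 0xFFFF_FFFFL);
    }
    static int min(long packed) { return (int) (packed >> 32); }
    static int max(long packed) { return (int) packed; }
}
```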
> The other "dropped the ball" aspect is generics - the incessant type checks in the generated code is just silly. I'll leave it at that as I'm sure we're all well aware of this issue, and hopefully Valhalla fixes it (and value types).
The type checks are an interesting problem. They are the wrong way round in some ways. For example, we don't check when putting something into a HashMap due to erasure but we do when getting it out via a check cast. Maps are often read much more than written so the cost feels wrongly apportioned.
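What that asymmetry looks like in code (sketch; `ErasureDemo` is a made-up class, and the bytecode comment describes javac's usual output under erasure):

```java
// Under erasure, Map<String, String> is just Map at runtime: put() needs
// no check because the map only ever stores Object references, while every
// get() has a compiler-inserted checkcast before the value can be used.
import java.util.HashMap;
import java.util.Map;

class ErasureDemo {
    static int length(Map<String, String> m, String key) {
        // Roughly: invokeinterface Map.get, then `checkcast String`
        // before length() can be called - the read side pays the check.
        String v = m.get(key);
        return v == null ? 0 : v.length();
    }
}
```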
Even without generics, there's a bit of trivia around java.util.HashMap: the boot process of the OpenJDK VM loads at least one LinkedHashMap (which inherits from HashMap), and that defeats CHA. So each time you do a get/put on a HashMap, the VM first has to check at runtime whether the HashMap is actually a HashMap or a LinkedHashMap.
Oracle will never make such a change, as it breaks backwards compatibility. And really, they'd have to make HashMap final, or else someone else (the JDK or another library) could load their own subclass of HashMap, which would deopt your uses and recompile them with the type check.
I suspect though that if final fields were trusted, enough occurrences of this would be squashed to make the remainder irrelevant.
The reason most high level languages are slow is usually because of two reasons:
- They don’t play well with the cache.
- They have to do expensive garbage collections.
But really, both of these boil down to a single reason: the language heavily encourages too many allocations.
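The cache point above can be made concrete (sketch with hypothetical classes): an array of objects in Java is an array of references to individually allocated, scattered heap objects, while the same data as a primitive array is contiguous.

```java
// Traversing Point[] is one dependent pointer load per element (plus the
// object headers dragged into cache); traversing double[] is a linear,
// prefetch-friendly scan over contiguous memory.
class Layout {
    static final class Point {
        double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double sumXObjects(Point[] pts) {   // pointer chase per element
        double s = 0;
        for (Point p : pts) s += p.x;
        return s;
    }

    static double sumXFlat(double[] xs) {      // contiguous, cache friendly
        double s = 0;
        for (double x : xs) s += x;
        return s;
    }
}
```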
On top of that, WPO (whole-program optimization) will unroll loops, monomorphize functions, and inline across dependencies, among many other things - all very good for cache friendliness.
Unless you are doing hard real-time, GC isn't your biggest trouble. It's actually quite easy to write programs that operate in constant space, so I don't see where the problem is.
I agree with a lot of what you say, but those are somewhat general and subjective statements, such as "find a balance between performance and maintenance". Certainly, if 100ms+ GC pauses don't breach SLAs or lose you money (likewise, if 100+ns cache-miss penalties don't add up to a number that hurts), the conversation shifts purely to "what language/platform with good-enough baseline perf suits my non-perf requirements"; for some it's Java/Scala/Clojure/another JVM language, for others it's Go, for someone else it's Haskell/OCaml, etc.
The original blog though is talking about a performance pain in the domain where the above things hurt; you'll find similar thinking in any domain where request/frame response time budgets are in the millis/micros. Simply different things.
As for Rust, I have high hopes for it. It seems to have a well thought out design, performance is a paramount requirement (not a nice to have), and is being shaped by a perf sensitive project (servo). There's going to be a learning curve for people as it's got some novel techniques (borrow checker, lifetimes, etc) and depending on which background people are coming from, ramp up time will vary. There's nothing wrong with that. The zero cost I was referring to was runtime cost, not anything else.
There's also the common saying/belief that simple languages move complexity into the application code. Java being simple has its upsides, but it's definitely true that this simplicity makes some code/design more complex and bug-prone: overuse of primitives to avoid heap objects, flyweight patterns over plain bytes, GC tuning wizardry (people make a business out of consulting in this space!), custom collections implemented a million times, off-heap storage, various hacks for erasure, Unsafe usage, cacheline padding via inheritance hacks, custom layout proposals such as ObjectLayout - never mind the complexity of the JVM/JIT itself.
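The "cacheline padding via inheritance" hack mentioned above looks roughly like this (a sketch of the pre-@Contended idiom; class names are made up):

```java
// Surround a hot field with enough long fields that it occupies a 64-byte
// cache line by itself, so two threads hammering adjacent counters don't
// false-share. Inheritance is used because the JVM lays out superclass
// fields before subclass fields, which keeps the padding from being
// reordered away within a single class.
class PadLeft { long p1, p2, p3, p4, p5, p6, p7; }
class Hot extends PadLeft { volatile long value; }
class PaddedCounter extends Hot {
    long q1, q2, q3, q4, q5, q6, q7;   // padding on the other side
    void inc() { value++; }
    long get() { return value; }
}
```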
Well, shared data-structures don't necessarily create a lot of coherence traffic, as they can exploit hierarchies that perfectly cancel out cache/memory/NUMA/distributed hierarchies; I did the math. It gets better if the DB runs in-process and actually assists in scheduling the logic (move code to data etc).
On the CPU side we have of course multi-core and larger L3 caches.
> On the CPU side we have of course multi-core and larger L3 caches.

Does anyone have a nice presentation on how the size of L3 caches has evolved and where it is headed? My impression is that we are nowhere near exponential growth, especially on a per-core basis.