--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Other than that, you’re in the same pointer-chasing world that *all* GC’ed managed memory systems suffer from. So unless you accidentally get predictable strides, you’re not going to see the benefits of prefetching. I don’t think there is much you can do about this other than pack the data yourself into buffers.
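A rough sketch of what "pack the data yourself" means in practice (the `Node`/`PackedSum` names are hypothetical, just for illustration): the same values stored as a chain of objects vs. packed into a flat primitive array, where only the latter gives the hardware prefetcher a predictable stride.

```java
// Pointer-chasing layout: each hop is a dependent load to a potentially
// arbitrary heap address.
final class Node {
    final long value;
    Node next;
    Node(long value) { this.value = value; }
}

final class PackedSum {
    // Packed layout: values are contiguous, so traversal has a constant
    // stride the prefetcher can run ahead of.
    static long sumPacked(long[] values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    static long sumLinked(Node head) {
        long sum = 0;
        for (Node n = head; n != null; n = n.next) sum += n.value;
        return sum;
    }
}
```

Same result either way; only the memory layout (and hence prefetch-friendliness) differs.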
What about moving the object into permgen by making it a field of a ClassLoader? (I assume there's only one instance, and it has a known size, so you could increase permgen space by that known size.)
-Heath Borders
Sent from my Nexus 5
Besides what Richard mentioned, not *all* GC'd languages suffer the same way; C#, for example, has had value types for 10+ years, allowing some level of abstraction without incurring indirection costs. Hopefully that will be addressed in Java with value types.
Not the intent of Java value types AFAIU. And C# value types aren’t really what I’d like to see show up in Java, performance or not. We have a struct; it’s called a class. We just need better mechanisms for packing them (hence ObjectLayout from Gil et al.).
The goal of this effort is to explore how small immutable user-defined aggregate types without identity can be surfaced in the language and the JVM instruction set to support memory- and locality-efficient programming idioms without sacrificing encapsulation.
As for Object Layout vs .NET struct vs Value Types, we've had this discussion on this list with Gil already :). I don't want to get into that again, but my point is that irrespective of how java's VTs will differ from .NET, it *is* a goal to be more memory efficient.
They want VTs to be immutable to avoid a class of errors whereby you modify a copy of a VT (think method argument). C# allows passing structs by reference so one can avoid this if careful, but java has no plans to allow passing args by reference.
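To make the "modified a copy" hazard concrete, here's a hedged sketch (the `ImmutablePoint` class and `withX` method are made up for illustration, not part of any value-types proposal): with copy semantics, mutating a value received as a method argument would change only the copy; an immutable aggregate makes that mistake impossible to write at all, because "mutation" must return a new value.

```java
// Hypothetical immutable aggregate: the only way to "change" a field is to
// construct a new value, so a caller's instance can never be silently
// modified through a copy.
final class ImmutablePoint {
    final int x, y;

    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }

    // Returns a new value instead of mutating in place.
    ImmutablePoint withX(int newX) { return new ImmutablePoint(newX, y); }
}
```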
Sent from my phone
On 24 Dec 2014, at 01:40, Gil Tene <g...@azulsystems.com> wrote:

Given your question, the actual title of the thread should be "How does object graph layout impact GC pause time".
GC performance outside of a pause mostly deals with the rate at which the collector is able to do concurrent [not during pause] work, and the amount of CPU the GC will spend per allocation unit. But those are usually of much lesser concern than pause durations, because most GC setups have tons of performance headroom. As in: they are only active for a tiny percentage of the overall wall clock time, and are therefore far from caring about efficiency outside of a pause.

The answer to your actual question ends up depending (dramatically) on the GC mechanism used. E.g. focusing on oldgen pause times (and ignoring newgen for now):

[Assumption: you are comparing an array of refs to a linked list going *through* the objects. Not a linked list "on top of" the objects (i.e. not LinkedList<Object>).]
Marking/Tracing:

1. Concurrent and mostly-concurrent oldgen markers/tracers will incur the cost of traversing the references during their concurrent phases, but that will generally not increase the pause times (at the edges of the concurrent phases). This applies to CMS and G1 (and Shenandoah when it comes) as long as their markers remain mostly-concurrent (i.e. the collector did not fail over to a full GC). It also applies to C4, obviously. It does not apply to ParallelGC, SerialGC, or to a Full-GC fallback of any mostly-concurrent collector.

2. Stop-the-world marking (e.g. ParallelGC) will incur the cost of traversing the various references *during* a pause. To first order, the cost is dominated by the number of reference fields that need to be visited, regardless of "shape" (flat array of refs vs. linked list). A second-order effect will be the number of those references that are non-null, and the number that are found to refer to already-marked-live objects (e.g. working on a reference to a popular object that has already been marked live is cheaper than working on a reference to a not-yet-marked-live object).

3. Parallelism during STW marking: another stop-the-world marking consideration is the parallelizability of the GC work during the pause. The refs in an array of refs *could* be chunked up and given to different GC threads to mark down in parallel [I don't know if ParallelGC or FullGC fallbacks actually do this when ParallelOldGen is on or not. C4 does, but it is not STW so who cares...]. But a linked list will not be parallelizable from a tracing perspective. So *if* your oldgen marker/tracer is STW, *and* it can use parallel threads for marking, *and* it knows how to chunk up arrays of refs to distribute their marking work across parallel threads, *then* a linked list layout may present a much longer STW pause than a flat array of refs would.

Sweeping:

1. For oldgen collectors that sweep and manage free lists (e.g. CMS as long as it stays in its mostly-concurrent mode), the sweep cost will not be dramatically different between a layout that has a huge array of refs and one that has a linked list going *through* the objects [the number of objects to skip over is +/-1]. But the sweeper in such collectors is usually concurrent, so no STW-affecting behavior either way.

Compaction:

1. For STW compact-in-place collectors (e.g. ParallelGC, CMS when it eventually has to compact, and G1 when it falls back to a Full GC), the compaction work is not going to be significantly different between the array-of-refs and the linked-list-through-the-objects cases. The number of objects to forward refs for and move remains the same (+/- 1). The amount of memory to move remains the same (+/- the array header). The number of refs that need fixing up remains the same (+/- 1). It's a big bad STW pause, but it doesn't change much with the shape variants you describe.
2. For incremental-STW compactors (e.g. G1) the variation between an array of refs and a linked-list-through-the-objects can be huge if the data structure is 100s of MB in size (as you indicate) and it is not lucky enough to have been allocated and compacted in order, and to have retained this grouping even through repeated evacuations (e.g. it was populated over time, interleaved with other work, and/or had elements replaced or shuffled early on). The reason for this big impact is that the work done to evacuate one region with pointed-to objects in it depends primarily on the number of other regions that contain references to that region. In the huge-array-of-refs case, the references pointing to the various objects would be concentrated in the array (only a few regions), while in the linked-list-through-the-objects case the references to the objects will be spread evenly all over the 100s of MB (and 100s of regions) that the structure occupies. Each region that contains many (100s of) individual member objects of the overall data structure will therefore have many (100s of) regions referring to it, leading to much higher incremental compaction cost compared to the array-of-refs case. Incremental compaction cost can end up growing by N^2, where N is the number of objects in the overall data structure.
3. For concurrent compactors (C4, future Shenandoah) the compaction/evacuation cost will not be significantly impacted by the choice of layout (same number of objects to move, same number of refs to eventually fix up). And since the compaction is not STW, we don't really care.

Moving to newgen collection behavior:

For the other side (newgen pauses), several people have already commented on the card-table scanning cost incurred during a newgen pause in a pausing newgen collector. But here are some thoughts on the variants (assuming the data structure in question is "old and stable" as described):

1. Pausing, copying newgen (ParallelGC, CMS, G1, etc.):

1.1 The cleanliness (or lack thereof) of the card table will make a difference to newgen pause times. And different collectors end up cleaning up the card table in different ways. E.g. a STW compactor can (and generally will) produce a clean, no-false-positives card table on every compaction. Other collectors may perform some amount of concurrent card table cleaning [I'm not sure which do and which don't]. Without cleaning the card table periodically, the length of newgen pauses will gradually increase over time.

1.2 The means and logic by which dirty cards are processed can end up with cost that dramatically depends on the choice of layout. E.g. when a dirty card is detected in a non-precise card-marking table (in HotSpot a card corresponds to 512 bytes of the heap), the first step in finding the references to scan is to look for the start of the first object that falls within the card (and then scan forward from there). In the linked-list-through-the-objects case, this is never a big deal. But in the array-of-refs case the start of the huge array will be "far far away" from a dirty card when it is encountered. Depending on whether the GC has a special-case optimization for card-scanning a large array of refs, this can end up with a much higher newgen pause effect in the array-of-refs case vs. the linked-list case.

2. Concurrent newgen collectors (e.g. C4): card scanning is not done in a STW pause, so who cares...

-- Gil.
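The parallelism point above (a flat array of refs can be chunked across GC threads during a STW pause; a linked list cannot) can be sketched in plain Java. This is illustrative only — the `MarkingShape` class is made up, and the per-reference "marking" work is simulated by a sum; real marking happens inside the VM:

```java
import java.util.stream.LongStream;

final class MarkingShape {
    // Array of refs: the index space can be split into chunks and handed to
    // parallel worker threads up front.
    static long markArray(long[] refs) {
        return LongStream.of(refs).parallel().sum();
    }

    static final class Cell {
        final long ref;
        Cell next;
        Cell(long ref) { this.ref = ref; }
    }

    // Linked list: each step is a dependent load on the previous node, so
    // the traversal is inherently sequential and cannot be chunked.
    static long markList(Cell head) {
        long sum = 0;
        for (Cell c = head; c != null; c = c.next) sum += c.ref;
        return sum;
    }
}
```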
On Monday, December 22, 2014 10:35:55 AM UTC-8, Chris Vest wrote:
Thanks, Gil. This is the sort of guidance for the intuition I was after.
...
Our newgen pauses, whenever I’ve probed them in our testing, have tended to hover between 1 and 2 ms. We haven’t been modifying the references in the array, only fields of the referenced objects, so I don’t think this will make any difference to us.
Cheers,
Chris
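For intuition on the card granularity Gil described, here is a toy sketch of the arithmetic — a hypothetical `CardTable` class using byte offsets, not real JVM internals: with 512-byte cards, the card index is just the offset shifted right by 9, and for a huge array of refs the object's start can be hundreds of cards before the dirty card being processed.

```java
final class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card, as in HotSpot

    final boolean[] dirty;

    CardTable(long heapBytes) {
        dirty = new boolean[(int) (heapBytes >>> CARD_SHIFT)];
    }

    // A reference store at the given heap offset dirties exactly one card.
    void markDirty(long offset) {
        dirty[(int) (offset >>> CARD_SHIFT)] = true;
    }

    // How many cards lie between an object's start and a dirty card inside
    // it. For a 100s-of-MB array of refs this can be very large, which is
    // what makes "find the first object in this card" expensive without a
    // special case for big arrays.
    static long cardsBetween(long objectStart, long dirtyOffset) {
        return (dirtyOffset >>> CARD_SHIFT) - (objectStart >>> CARD_SHIFT);
    }
}
```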