Explanation of some LVB subtleties in the garbage collector


Marko Topolnik

Apr 21, 2015, 4:42:07 AM
to mechanica...@googlegroups.com
I'm reading the paper on the C4 collector. I got the message that the cornerstone is the Loaded Value Barrier (LVB). However, I'm still struggling to understand how the second invariant in the paper ("a ref is always valid") is guaranteed.

Let's take this specific example: 

- There is an object to be moved.
- There are two threads, both with large call stacks (say, a million pointers live on each).
- On each stack, somewhere, there is a pointer to this particular object.

I cannot escape the conclusion that both of these threads will have to arrive at a safepoint together, and that the object would have to be moved while they are parked in it. Is my thinking correct? I'm familiar with the concept of ragged safepoints and their performance advantage, but I can't visualize how they could be employed in this case.

I'm also confused as to how this updating is performed efficiently in such cases where there are many loaded refs per thread.

--
Marko Topolnik
Hazelcast

Jean-Philippe BEMPEL

Apr 21, 2015, 9:09:31 AM
to mechanica...@googlegroups.com
Hello

I am sure Gil will answer it with all the gory details, but what I think you are missing is the self-healing that happens as a consequence of the LVB.
The LVB, for each ref load, ensures the ref is valid and not in the process of being moved. If the LVB test is not passed, the reference is no longer valid and you need to follow the forwarding information to reach the correct location of the object; in the process you will "heal" your reference to point to the new location, so all subsequent accesses will reach the correct location of the moved object.
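To make the self-healing idea concrete, here is a minimal toy sketch in Java. The names, the forwarding map, and the long-encoded refs are all invented for illustration, and it only covers the "object already moved" case; it is not Azul's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

// Toy model of LVB self-healing. Refs are plain longs; the forwarding map stands in for
// whatever forwarding information the collector publishes for already-moved objects.
final class LvbSelfHealSketch {
    static final AtomicLongArray heap = new AtomicLongArray(1024);        // heap slots holding refs
    static final Map<Long, Long> forwarding = new ConcurrentHashMap<>();  // stale ref -> healed ref

    // LVB: applied to every reference loaded from the heap.
    static long lvbLoad(int slot) {
        long ref = heap.get(slot);
        Long healed = forwarding.get(ref);         // null means the ref passes the LVB test
        if (healed != null) {                      // slow path, taken rarely
            heap.compareAndSet(slot, ref, healed); // self-heal the slot we loaded from; if another
                                                   // thread healed it first, the CAS harmlessly fails
            ref = healed;                          // the mutator only ever sees the corrected ref
        }
        return ref;
    }
}
```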

Cheers

Marko Topolnik

Apr 21, 2015, 9:13:33 AM
to mechanica...@googlegroups.com
What worries me is that the reference is validated on load and after that it's free to be used any number of times. So how does the system keep the ref valid throughout its lifespan?

Vitaly Davidovich

Apr 21, 2015, 10:30:04 AM
to mechanica...@googlegroups.com

You're asking how it handles concurrent evacuation/movement of a reference that's already passed the initial LVB, right? I may be wrong, but I thought C4 uses memory protection to cause traps if a reference is used from a page that's being evacuated; the LVB logic kicks in and corrects the reference (and/or performs the evacuation as well, IIRC).

FWIW, Redhat is working on a similar GC called Shenandoah, which uses Brooks forwarding pointers.

sent from my phone


Gil Tene

Apr 21, 2015, 10:52:28 AM
to mechanica...@googlegroups.com
The invariant is maintained very simply: the LVB validates each reference as it is loaded from the heap, which assures that all references that enter a thread's stack (registers and stack locations) are always "valid" for the current GC phase. When combined with independent scanning of each thread's stack once per phase flip, a thread that has had its stack scanned is unable to observe or experience any non-valid references, and therefore cannot propagate such references to the heap.

Each time the definition of "valid" changes (e.g. at the beginning of a mark phase, or at the beginning of a relocation phase), all thread stacks do need to be scanned. But algorithmically, this does not require a global safepoint. Instead, it can be done using checkpoints (what others refer to as ragged safepoints or barriers), where each thread independently crosses a thread-local safepoint to perform some required work, but no global all-threads-are-stopped-at-the-same-time safepoint pause is required. A checkpoint-based phase flip does involve a thread-local notion of what "valid" is: each thread that has crossed the checkpoint uses the new (post-checkpoint) meaning of "valid" in its LVB and stack scanning, while threads that have not yet crossed the checkpoint still use the previous (pre-checkpoint) meaning. GC operations that depend on global agreement on what "valid" means obviously don't start until the phase flip is complete and all threads have independently crossed the checkpoint, at which point all threads agree on the same stable meaning. This does mean that until all threads cross the checkpoint, threads on opposite sides of the checkpoint can end up "fighting" over the meaning of "valid" if they communicate shared references to each other via the heap, but that's a small and transient concern, as each thread will simply continue to correct its incoming references to its own notion of what "valid" means until the checkpoint is crossed by all.
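A rough way to picture the per-thread notion of "valid" during such a checkpoint (a hypothetical Java sketch, not how Zing actually encodes or implements it): the collector publishes a new phase number, each mutator adopts it at its own thread-local safepoint poll, and the collector merely waits for everyone to have done so.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy checkpoint-based phase flip: the collector bumps a global phase number and then waits
// for every mutator to adopt it at its own thread-local safepoint. Nothing is stopped globally.
final class PhaseFlipSketch {
    static final AtomicInteger globalPhase = new AtomicInteger(0);
    static final List<MutatorState> mutators = new CopyOnWriteArrayList<>();

    static final class MutatorState {
        volatile int localPhase = 0;      // the meaning of "valid" this thread's LVB currently uses

        void safepointPoll() {            // executed at loop/call safepoint opportunities
            int g = globalPhase.get();
            if (localPhase != g) {
                // "crossing the checkpoint": adopt the new meaning of "valid".
                // A real VM would also scan/correct this thread's own stack here if needed.
                localPhase = g;
            }
        }
    }

    // Collector side: flip the phase, then wait for all threads to cross on their own.
    static void phaseFlipCheckpoint() throws InterruptedException {
        int target = globalPhase.incrementAndGet();
        for (MutatorState m : mutators) {
            while (m.localPhase != target) {
                Thread.sleep(1);          // mutators keep running while we wait
            }
        }
        // From here on, all threads agree on the new meaning of "valid".
    }
}
```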

It is worth mentioning that while the C4 algorithm does not require any global safepoints, the current C4 implementation in Zing does involve global safepoints at certain places (and uses checkpoints only in some of the phase shifts). This is primarily driven by engineering concerns and runtime implementation details that go beyond the basic GC algorithm. E.g. system dictionary scanning, class unloading, code cache cleaning, lock deflation, and various other operations are "more complex" to perform concurrently, and some are still done in a global safepoint phase change as a result. While we continue to work to reduce the number of GC-related safepoints and the amount of non-concurrent work performed in them (potentially down to 0), these current GC phase change pauses are already small enough that, had they not been clearly logged in our GC logs, they would be hard to detect or observe when compared to OS-related stalls (which tend to be larger by an order of magnitude or more on most systems). We currently spend a lot more time battling Linux thread stalling effects than we do on reducing work-in-safepoint items like the above. Since none of the work performed in the current global GC safepoints grows with the heap size, live set, allocation rate, mutation rate, or use of soft/weak/phantom references or finalizers, those timings remain insensitive to program scale and behavior.

To your question/example about threads with deep call stacks (a million refs on a stack require at least an 8MB deep stack): While not necessary for C4's algorithm implementation, there are well established ways (orthogonal to the C4 algorithm) for GCs to both incrementally and concurrently clean most of the stack, instead of doing so in a single thread-stalling operation. In GC parlance, such techniques often use the term "stacklets" in their description, so that's a good keyword to google for if you want to see examples of other algorithms doing that. The basis for using stacklets in an LVB-based algorithm would be that only the stack contents the thread can actually interact with need to be "cleaned" such that they are all known to be "valid" after each phase change. Stacklets that guarantee non-interaction with shallower parts of the stack are fairly simple to apply in Java Virtual Machine semantics, since in-stack referencing across frames is precluded semantically: a reference held in a shallower frame is not visible to a thread until it actually "pops" its way into that frame. As a result, the top few frames of a thread's stack can be scanned to guarantee "valid" references, while a "poisoned" return address can be installed at the boundary between the scanned and not-yet-scanned parts of the stack. Incremental self-cleaning can then be done when the thread pops a frame that crosses that boundary: a few more frames will get cleaned, the boundary will move, and the thread will continue its operation. Concurrent cleaning of the stack can also be done, since the thread will not directly interact with the "not yet cleaned" part of its stack. This allows concurrent collector threads to safely scan and correct that "not yet cleaned" part of a thread's stack while the thread is running, covering most of the stack frames without incremental mutator scanning work, such that the thread will not actually have to clean them itself when it crosses the boundary frames.
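A toy illustration of the lazy/incremental stack cleaning idea (an invented model: the stack as a list of frames with a "clean depth" watermark; the real mechanism of poisoned return addresses and concurrent collector scanning is only hinted at in comments):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of stacklet-style lazy stack cleaning: at a phase flip only the top few frames are
// made "valid"; deeper frames get cleaned either when the thread pops down to them or (not shown
// here) concurrently by a collector thread, which is safe since the mutator can't touch them yet.
final class StackletSketch {
    static final int EAGER_FRAMES = 4;                 // arbitrary number of frames cleaned eagerly

    final Deque<long[]> frames = new ArrayDeque<>();   // top of stack first; each frame = its ref slots
    int cleanDepth = 0;                                // frames [0, cleanDepth) are known "valid"

    void atCheckpoint() {                              // run at this thread's phase-flip checkpoint
        int cleaned = 0;
        for (long[] frame : frames) {
            if (cleaned == EAGER_FRAMES) break;        // a real VM would "poison" the return address
            healFrame(frame);                          // of the boundary frame instead of counting
            cleaned++;
        }
        cleanDepth = cleaned;
    }

    void popFrame() {                                  // called when the thread returns from a frame
        frames.pop();
        if (cleanDepth > 0) cleanDepth--;
        if (cleanDepth == 0 && !frames.isEmpty()) {    // crossed the boundary: clean a bit more,
            healFrame(frames.peek());                  // move the boundary, and keep running
            cleanDepth = 1;
        }
    }

    static void healFrame(long[] refSlots) {
        for (int i = 0; i < refSlots.length; i++) {
            refSlots[i] = healRef(refSlots[i]);        // make each ref "valid" (see LVB sketch above)
        }
    }
    static long healRef(long ref) { return ref; }      // placeholder for the actual heal step
}
```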

Vitaly Davidovich

Apr 21, 2015, 11:02:48 AM
to mechanica...@googlegroups.com
Gil,

How does LVB interact with JIT'd code? Specifically, what precludes the JIT from putting a calculated address into a register (after LVB is passed) and then using that register as the base for field accesses (say, in a loop)? What exactly corrects/patches that address?

Jean-Philippe BEMPEL

Apr 21, 2015, 11:03:48 AM
to mechanica...@googlegroups.com
And here are the gory details! Thanks Gil.


Jean-Philippe BEMPEL

Apr 21, 2015, 11:06:47 AM
to mechanica...@googlegroups.com
I think that even for field access you will need the LVB.
Loading the address into the register does not need to be protected by LVB, but dereferencing it (with the offset) requires it.



Vitaly Davidovich

Apr 21, 2015, 11:13:48 AM
to mechanica...@googlegroups.com
So every time you read a field of a reference, even a field of primitive type, there's an LVB inserted? What does this look like in pseudo-x86 assembly?

Gil Tene

Apr 21, 2015, 11:15:43 AM
to mechanica...@googlegroups.com


On Tuesday, April 21, 2015 at 10:30:04 AM UTC-4, Vitaly Davidovich wrote:

You're asking how it handles concurrent evacuation/movement of a reference that's already passed the initial LVB, right? I may be wrong, but I thought C4 uses memory protection to cause traps if a reference is used from a page that's being evacuated; the LVB logic kicks in and corrects the reference (and/or performs the evacuation as well, IIRC).

The algorithm requires the LVB to detect and logically "trap" on any non-valid ref state. The same LVB test is used to support concurrent marking and concurrent compaction. The actual implementation of the LVB can vary widely, can (and does) evolve over time, and tends to be highly tuned to the specific processor and OS capabilities in use. E.g. the specific LVB implementation, state encoding, and actual checks made can vary greatly between Vega, x86, ARM, or Power, but they would all achieve the same algorithmic meaning and enforcement.

Specifically on x86, C4 implementations only use memory protection traps for correctness assertions and for supporting partial moves of very large objects (e.g. 4KB at a time), neither of which is part of the LVB's semantics. In implementation, C4's LVB "traps" on x86 are currently all done via conditional branching: an LVB's fast-path test devolves to one or two x86 u-ops that tend to completely hide in the superscalar execution pipelines of modern x86 processors. This, combined with the self-healing nature that dramatically improves the dynamic occurrence of the fast path even during GC activity, makes the LVB an extremely efficient read barrier: it has orders of magnitude lower dynamic execution costs than all prior read barriers, which would continue to "trigger" their slower paths during entire phases of GC execution.
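For a rough feel of what that fast path might look like, here is a purely illustrative Java rendering with a made-up NMT-bit encoding and no relocation-region check; the real encoding and checks differ per platform. The point is just that the test is an xor/test plus a rarely taken branch, which is roughly what it compiles down to on x86.

```java
// Purely illustrative fast-path shape: an invented NMT bit in the reference word must match the
// phase's expected value; a mismatch (or a ref into a being-relocated region, omitted here)
// takes the self-healing slow path.
final class LvbFastPathSketch {
    static final long NMT_BIT = 1L << 62;     // invented bit position

    static volatile long expectedNmt;         // flips at each mark-phase start (per thread in reality)

    static long lvb(long ref, int sourceSlot) {
        if (((ref ^ expectedNmt) & NMT_BIT) != 0) {   // ~ "test + jne slow_path" on x86
            ref = lvbSlowPath(ref, sourceSlot);       // mark, relocate and/or heal, return a valid ref
        }
        return ref;
    }

    static long lvbSlowPath(long ref, int sourceSlot) {
        // Self-healing: correct the ref and CAS it back into sourceSlot (see the earlier sketch).
        return (ref & ~NMT_BIT) | (expectedNmt & NMT_BIT);
    }
}
```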
 

FWIW, Redhat is working on a similar GC called Shenandoah, which uses Brooks forwarding pointers.

I wouldn't call it "similar". No more than ParallelGC and G1 are "similar". Some of the goals for C4 and Shenandoah are similar (low pause, and concurrent compaction), but the mechanisms are wildly different. E.g. a Brooks-style barrier is very different from an LVB, and the two have dramatically different algorithmic effects and dynamic behavior characteristics. As another example, the Brooks-style barrier in Shenandoah deals only with supporting concurrent relocation (Shenandoah currently uses a SATB mostly-concurrent marker that uses a separate write barrier to enforce its invariants). In contrast, C4 uses the same LVB barrier to support both concurrent marking and concurrent relocation. The marking phase in C4 leverages the LVB's strong invariants to avoid the need to track any mutation, which makes for a steady, single-pass marker that is oblivious to mutation rate or application throughput.
 


Gil Tene

Apr 21, 2015, 11:18:22 AM
to mechanica...@googlegroups.com, jean-p...@bempel.fr
No... That's backwards. Only loading a reference into a register is protected by LVB. De-referencing that reference does not require an LVB, since all de-referenceable references are known to be "valid".
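In other words, for Vitaly's loop example, barrier placement would look something like this hypothetical sketch (invented Node class; comments mark where an LVB would conceptually be emitted):

```java
// Hypothetical barrier placement for a field-access loop. Only heap loads of *references* get an
// LVB; primitive reads through an already-LVB'd reference are plain loads.
final class BarrierPlacementSketch {
    static final class Node { long value; Node next; }

    static long sumValues(Node head) {                 // 'head' was LVB'd by whoever loaded it from the heap
        long sum = 0;
        for (Node n = head; n != null; n = n.next) {   // 'n.next' is a ref load from the heap -> LVB here
            sum += n.value;                            // primitive read through a valid ref -> no LVB
        }
        return sum;
    }
}
```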

Jean-Philippe BEMPEL

Apr 21, 2015, 11:21:13 AM
to mechanica...@googlegroups.com
Ok, but isn't Vitaly's point that the reference could become invalid between the two, i.e. after the address is loaded into the register but before you dereference it?


Gil Tene

Apr 21, 2015, 11:23:16 AM
to mechanica...@googlegroups.com
Such derived pointers are either tracked in oopmaps [in which case they would be corrected by stack scanning in a phase flip just like any other reference], or they do not survive across thread-local safepoint opportunities, in which case they are known to not exist on any stacks after a checkpoint is complete. Either way, all derived pointers are known to be "valid" once a checkpoint has been crossed.

Gil Tene

Apr 21, 2015, 11:27:56 AM
to mechanica...@googlegroups.com, jean-p...@bempel.fr
The only thing that will make a previously "valid" reference "not valid" is a phase flip that changes the meaning of "valid". And before completing, each phase flip performs a checkpoint (or equivalent) that scans all thread stacks to make all de-referenceable references in them "valid". After a phase flip, the only "not valid" references that exist are in the heap, and none of those can make it into registers or stack locations because the LVB will intercept them (and correct them to be "valid") first.

Jean-Philippe BEMPEL

Apr 21, 2015, 11:30:21 AM
to mechanica...@googlegroups.com
Ok makes sense now, thanks for the explanation.


Vitaly Davidovich

Apr 21, 2015, 11:31:16 AM
to mechanica...@googlegroups.com, jean-p...@bempel.fr

Ok, thanks. So this effectively requires a global-safepoint-like thing so you can reliably scan all thread stacks? That is, relocating objects pauses mutators at least long enough to check their stacks for the moved reference.

sent from my phone


Vitaly Davidovich

Apr 21, 2015, 11:34:25 AM
to mechanica...@googlegroups.com

I meant Shenandoah is similar in principle/goals, not implementation, hence the mention of Brooks fwd pointers.

sent from my phone


Gil Tene

Apr 21, 2015, 11:49:04 AM
to mechanica...@googlegroups.com, jean-p...@bempel.fr
No global safepoint is *required* by the algorithm, even if some implementations choose to use them... E.g. depending on implementation, the collector could incrementally designate references to a set of regions that are being evacuated as "invalid", perform a checkpoint (no global safepoint) and then [concurrently] relocate the objects from those regions before moving on to the next set of regions to be evacuated. Mutator threads that encounter references to the being-evacuated regions will either fix them to refer to their new locations (if the object has already been moved) or participate in moving the object (if the object has not yet been moved by the collector or some other mutator thread). During each checkpoint, access by threads that have not yet crossed the checkpoint remains safe because moving of the evacuated objects (by either the collector or mutators) will only happen after the checkpoint is complete...
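A collector-side sketch of that incremental loop (hypothetical method names standing in for real VM machinery; not the actual C4 implementation):

```java
import java.util.List;

// Incremental evacuation sketch: per batch of regions, flip the "refs into these regions are
// invalid" state, run a checkpoint, then relocate concurrently. All names are invented.
final class IncrementalEvacuationSketch {
    interface Region { }

    static void evacuate(List<List<Region>> batches) throws InterruptedException {
        for (List<Region> batch : batches) {
            for (Region r : batch) {
                markRefsIntoRegionInvalid(r); // from now on the LVB traps on refs into r
            }
            checkpoint();                     // every mutator crosses a thread-local safepoint;
                                              // nothing in this batch moves before this completes
            for (Region r : batch) {
                relocateLiveObjects(r);       // concurrent; a mutator that traps on a ref into r
                                              // either heals it or helps move the object
            }
        }
    }

    static void markRefsIntoRegionInvalid(Region r) { /* flip the region state the LVB tests */ }
    static void checkpoint() throws InterruptedException { /* see the checkpoint sketches above */ }
    static void relocateLiveObjects(Region r) { /* copy live objects, publish forwarding info */ }
}
```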


Vitaly Davidovich

Apr 21, 2015, 12:02:31 PM
to mechanica...@googlegroups.com
How would threads know that their oopmaps need patching if they're not brought to a safepoint? So the GC designates a region as invalid; how does a mutator thread see this if it has an address from that region in its oopmap and is actively using that reference?


Gil Tene

Apr 21, 2015, 12:18:13 PM
to mechanica...@googlegroups.com
There is overlap in some principles and goals, but not in most. Specifically not in the driving ones.

E.g. C4's "prime directives" are "don't be in a hurry" and "always do the same thing". Everything else in it derives from those two.

"Don't be in a hurry" means that nothing the collector does ever gets "worse" or "harder" if the collector doesn't finish it quickly. Nothing stays "slow" or "bad" until something important happens, and there are no phases of operations during which things are persistently slow (e.g. during which barriers continually trigger their slow paths). The only thing the collector has to pace to "keep up with" is allocation rate, and that pace can be ultimately relaxed with empty heap space. 

"Always do the same thing" means that no attempt is made to delay or avoid "rare" operations. Each mechanism always works in exactly the same way in each GC cycle, and the collector is designed to have mutators "stay happy" during this normal and predictable operation rather than keeping them "happy most of the time" by pushing hard/annoying work to the future.

These two key qualities lead to C4's trivial tuning: changing the heap size (for a given workload) changes the pace at which the collector has to act. Larger heaps mean a slower, more relaxed pace, while smaller heaps (for the same live set) mean a faster, more active pace. Nothing else needs to be tuned because nothing else really matters... [well, to be accurate: on low-latency systems, people sometimes also tune down the number of GC threads in order to avoid CPU contention, but that's about it...]

These two key qualities (not being in a hurry and always working the same way) also mean that virtually all tunables in other collectors become irrelevant and unneeded with C4: no newgen size, no survivor ratios, no promotion thresholds, no occupancy thresholds, and none of the other tens of flags that you all know and "love". They pretty much all deal with fine-grained control of various "hurry up or things will get worse" or "let's try to avoid doing this terrible thing too often" motivations. With C4, both those motivations are gone by design...

C4's "prime directives" are what drives it's other choices. E.g. single pass, mutation-oblivious concurrent marking is important because having to do work to detect and track mutation during marking would mean things get worse if you don't mark quickly. Similarly, avoiding ANY fallbacks to alternate ways of doing some of the GC work in a STW pause means that we don't have to worry about the trying to delay or avoid things like compaction, o matter what the heap topology, allocation behavior, or mutation behavior looks like. Since we don't need to worry about "what would happen if we can't delay it any longer" things get much simpler to implement, and much nicer for the application and its users.

sent from my phone


Marko Topolnik

Apr 21, 2015, 12:22:04 PM
to mechanica...@googlegroups.com
I think here we're coming the closest to the essence of my question. Let me single out key points from your post as I read them:

1. A thread which has not yet crossed the checkpoint keeps mutating the old object location.
2. A thread which has crossed the checkpoint mutates the new location.

How can these two be reconciled? I just can't find the kink in my understanding!

--
Marko



Gil Tene

Apr 21, 2015, 12:24:43 PM
to mechanica...@googlegroups.com
The threads know this because a checkpoint makes them know it, but it does so without resorting to a STW pause. A checkpoint concurrently moves all threads through thread-local safepoints without going through a global safepoint (where all threads would have been stopped at the same time, and none of them could move forward until all had reached the safepoint and whatever operations needed to be done there were complete).

After a checkpoint is complete, the collector knows that all threads have gone through at least one safepoint opportunity (and as a result, for example, no pre-checkpoint references that are not described in oopmaps exist in any thread), and that [if the checkpoint operation needed it] all stacks contain only "valid" references.
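A toy sketch of that crossing protocol (invented thread states; a blocked thread is crossed on its behalf with a CAS, a runnable thread crosses at its next poll; not the Zing implementation):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy crossing protocol. A blocked thread is already at a safepoint, so the collector crosses
// the checkpoint for it with a CAS; a runnable thread crosses at its next safepoint poll.
final class CheckpointCrossingSketch {
    static final int RUNNING = 0, BLOCKED = 1, CROSSED = 2;

    static final class ThreadState {
        final AtomicInteger state = new AtomicInteger(RUNNING);

        void safepointPoll() {                        // mutator side, at loop/call polls
            if (state.get() != CROSSED) {
                // perform whatever this checkpoint requires (e.g. scan own stack), then:
                state.set(CROSSED);
            }
        }
    }

    static final List<ThreadState> threads = new CopyOnWriteArrayList<>();

    static void crossCheckpoint() throws InterruptedException {
        for (ThreadState t : threads) {
            t.state.compareAndSet(BLOCKED, CROSSED);  // blocked == already at a safepoint
        }
        for (ThreadState t : threads) {
            while (t.state.get() != CROSSED) {
                Thread.sleep(1);                      // wait; nobody is stopped globally
            }
        }
        // Checkpoint crossed by all: every thread is known to have observed the new condition.
    }
}
```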

Gil Tene

Apr 21, 2015, 12:38:23 PM
to mechanica...@googlegroups.com
Only one object location is ever accessible (for each object instance), globally. There is no "two copies" concept in C4. A thread that has crossed the checkpoint will only be able to observe a reference to the object in its new location after it has been moved there (by the collector, by some other thread, or otherwise by the observing thread itself). And an object will only be moved after the checkpoint associated with declaring the region set it is in "not valid" has been crossed by all threads.

This does mean that a thread that has crossed the checkpoint and attempts to observe a reference to an object that is to be relocated as a result of that checkpoint will wait until all other threads cross the checkpoint. This cross-checkpoint interaction is one of the reasons to perform relocation in multiple small increments (each proceeding through its own non-STW checkpoint and concurrent object relocation). Alternatively, an "easy shortcut" is to use a global safepoint and simply relocate all objects in a single flip instead of moving through many small region increments.

Vitaly Davidovich

Apr 21, 2015, 12:45:10 PM
to mechanica...@googlegroups.com
So each thread goes through its own local safepoint/checkpoint, and while each thread is doing that, the GC is waiting for all threads to complete their thread-local checkpoints? Once they complete, the collector can proceed. If so, effectively all threads will pause anyway to perform the checkpoint, it's just that each thread walks through its checkpoint independent of others, and thus no threads wait for others to stop before performing safepoint operations. Is that about right?


Marko Topolnik

Apr 21, 2015, 1:16:00 PM
to mechanica...@googlegroups.com

Great, I think everything is clear now. In the laziest setup, a thread entering a checkpoint with the intent to mutate an invalid object will block until all the threads are on or past the checkpoint. Only at that point will it be allowed to relocate the object, then proceed with mutation.

--
Marko


Gil Tene

Apr 21, 2015, 5:28:07 PM
to mechanica...@googlegroups.com
Yes, that's about right. Except that I don't think of non-blocking operations that a thread does on its own stack as a "pause" (no more than I think of handling a timer interrupt as a pause, anyway). A thread-local safepoint is extremely short, and generally does not need to block for anything (when it is part of a checkpoint). In the most common case, it boils down to taking a branch mis-prediction on the safepoint poll instruction and performing whatever work is required by the specific checkpoint type. This is usually shorter than the time it would take the OS to handle a timer interrupt (which involves taking an exception, context switching to the kernel, etc.), and timer interrupts happen 1000s of times per second on every CPU in your system... E.g. we regularly profile time-to-safepoint by making each thread get to a safepoint 1000 times per [cpu] second, where each safepoint just records the latency from when the safepoint was requested until the thread actually arrived at the safepoint handling code. Typical TTSP times are in the low tens of usec, and that includes taking the safepoint.
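A minimal sketch of that kind of TTSP probe (hypothetical, not Zing's profiler): a profiler thread stamps a request, and the mutator records the latency at its next safepoint poll. In practice you would record into a histogram rather than keeping just a running maximum.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy time-to-safepoint (TTSP) probe: a profiler thread "requests" a safepoint by stamping the
// request time; the mutator records the latency at its next safepoint poll opportunity.
final class TtspProbeSketch {
    final AtomicLong requestedAtNanos = new AtomicLong(0);  // 0 means no request pending
    volatile long worstTtspNanos = 0;

    // Profiler side: e.g. called ~1000 times per second per runnable thread.
    void requestSafepoint() {
        requestedAtNanos.compareAndSet(0, System.nanoTime());
    }

    // Mutator side: called at every safepoint poll (loops, calls, long runtime paths).
    void safepointPoll() {
        long t = requestedAtNanos.get();
        if (t != 0 && requestedAtNanos.compareAndSet(t, 0)) {
            long ttsp = System.nanoTime() - t;              // time from request to arrival
            if (ttsp > worstTtspNanos) worstTtspNanos = ttsp;
        }
    }
}
```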

When actual stack frames need scanning as part of a checkpoint, scanning work is generally very quick as well (10s of usec). But as mentioned earlier in this thread, scanning time can be capped using stacklet-style techniques if needed.


Vitaly Davidovich

Apr 21, 2015, 9:33:42 PM
to mechanica...@googlegroups.com

Got it, thanks.

sent from my phone


Marko Topolnik

Apr 22, 2015, 4:22:12 AM
to mechanica...@googlegroups.com
On 21 Apr 2015, at 18:45, Vitaly Davidovich <vit...@gmail.com> wrote:

each thread walks through its checkpoint independent of others, and thus no threads wait for others to stop before performing safepoint operations.

Any thread entering the LVB while loading a reference to an object flagged for relocation will stay within the LVB and pause until all the threads are past the checkpoint. In the worst case, all threads will attempt to load an invalid reference and there will be a global STW pause. That's how I got the story.

--
Marko

Gil Tene

Apr 22, 2015, 9:35:04 AM
to mechanica...@googlegroups.com
The worst case is certainly a global synchronization between all threads. Relocation granularity (per increment) can help reduce the occurrence or likelihood of such a global synchronization, but even if the relocation granularity was one-object-at-a-time, the worst case would be having all threads loading a reference to that one object at the same time, and before it was relocated.

You could call this global synchronization a STW pause, I guess, but then you'd want to call all global synchronizations that (including, for example, the ones found in concurrent hash maps, or encountered when page faulting on code). The distinction I draw is that this pause/synchronization involves no work, just waiting for an observation line to be crossed by all.

The worst case global synchronization for C4 will always involve a situation where all threads are trying to do something that in itself requires all other threads to have crossed a checkpoint. Waiting for all threads to reach and cross the checkpoint doesn't mean that they need to do any work: it means that we know that they have observed whatever it is we need them to observe, and will not proceed without taking that observation into account. The time does not include doing any operations as a result of the checkpoint, which means that it solely depends on time-to-checkpoint-observation. How long that can take depends on the VM's implementation of the mechanisms that ensure the observation of the checkpoint condition.

When using a mechanism that relies purely on per-thread safepointing (which is what we currently do, for example, in Zing), time-to-checkpoint-observation is dominated by time-to-safepoint. Any blocked thread is already at a thread-local safepoint and will have its checkpoint crossed immediately by the VM (which amounts to setting a flag with a CAS). It's the runnable threads that are likely to not be at safepoints, and all of them need to get to run across a safepoint opportunity for the checkpoint to be crossed by all.

That's why time-to-safepoint management in all code paths is important, even if no global safepoint is attempted. Controlling time-to-safepoint in interpreted or JIT'ed code is relatively easy: if polling opportunities are placed on all loops and calls, a safepoint opportunity is never more than a few instructions away. JNI calls (and the arbitrary user code in them) are also easy, because all JNI code is already at a safepoint. It's the numerous "other" runtime paths that end up needing careful attention, such that long runtime code paths (like array copying, memory zeroing, or other potentially-long-running logic) include regular safepoint opportunities within them. With the right work and coverage, those paths can be kept to a capped amount of time (e.g. a few usec). A time-to-safepoint profiler is critical for coverage here. For example, in Zing we use one to exercise all runnable threads 1000 times per second, asking the "how long would it take if I asked you to go to a safepoint *now*" question and looking for any paths that take longer than a threshold of our choosing. We use that to find targets that need to have safepoint opportunities added to their loops. Currently, our time-to-safepoint is dominated by OS artifacts, with the worst ones being things like page faulting on a mapped file access.
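A sketch of the "add safepoint opportunities to long runtime paths" idea (chunk size and names invented for illustration): do the work in bounded chunks with a poll between chunks, which caps that path's contribution to time-to-safepoint.

```java
// Capping time-to-safepoint in a long runtime path: work in bounded chunks with a safepoint
// poll between chunks. The chunk size and helper names are hypothetical.
final class ChunkedCopySketch {
    static final int CHUNK = 4096;                     // elements per chunk, chosen arbitrarily

    static void copyWithPolls(long[] src, int srcPos, long[] dst, int dstPos, int len) {
        while (len > 0) {
            int n = Math.min(len, CHUNK);
            System.arraycopy(src, srcPos, dst, dstPos, n);
            srcPos += n; dstPos += n; len -= n;
            safepointPoll();                           // TTSP is now bounded by one chunk's copy time
        }
    }

    static void safepointPoll() { /* check the per-thread poll word; run checkpoint work if asked */ }
}
```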

Note that algorithmically (as opposed to in a specific implementation), all that is required by the C4 algorithm is a checkpoint crossing (which means guaranteed observation of something by the threads that crossed it). And there are ways to achieve a checkpoint crossing without requiring any code execution by the affected threads, or any safepoint polling by threads. E.g. virtual memory manipulation can be used to guarantee such observation, requiring a synchronization that is no longer than the time it takes to do a global TLB invalidate in the process (this tends to be on the order of 1-20 usec on modern systems). The synchronization is still global (any thread that has had its CPU's TLB invalidated and attempts to access whatever it is that is affected and requires the checkpoint to be fully crossed by others will still need to wait for the TLB invalidate to be completed), but it is VERY short, and will not block for anything other than disabled interrupts on the various CPUs. Most OS kernels are full of global synchronizations that are longer than this...