Attention is currently required from: Andre Kempe, Benoit Lize.
1 comment:
Patchset:
This is the next step in my suggested refactoring plan from http://doc/1sA2pnvpRJGejfSoCxkQOo2lv2Fr6vH5Yn8JeUW1IFcw. Andre can build on this for the remaining steps. What do you think?
To view, visit change 4140359. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Benoit Lize, Joe Mason.
3 comments:
Patchset:
I do not have powers to vote here, but looks mostly good to me, except for two nits and one question: In https://source.chromium.org/chromium/chromium/src/+/main:base/sampling_heap_profiler/poisson_allocation_sampler_unittest.cc;l=13 we have a test which verifies that "ScopedMuteHookedSamplesForTesting updates the allocator hooks". The code which does so has just been removed. I wonder if we should adjust the test (I'd remove it).
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #5, Line 193: // Fast, thread-safe access to the current profiling state. This is static
Nit: Is the problem really to access before initialization of instance_? Or to access without dereferencing instance_?
Patch Set #5, Line 200: friend class ScopedMuteHookedSamplesForTesting;
Nit: Do we need this? ScopedMuteHookedSamplesForTesting doesn't access any of the private members.
Attention is currently required from: Benoit Lize, Joe Mason.
1 comment:
Patchset:
One more thing to be aware of: Installing the hooks that early in https://chromium-review.googlesource.com/c/chromium/src/+/4140359/5/base/sampling_heap_profiler/poisson_allocation_sampler.cc#203 will lead to a scenario where we have no instance yet, but the hooks are already being called. Currently the newly introduced profiler_state_ protects us from dereferencing the nullptr-instance, but this must be addressed in the next step.
Attention is currently required from: Andre Kempe, Benoit Lize.
4 comments:
Patchset:
I do not have powers to vote here, but looks mostly good to me, except for two nits and one question […]
Thanks for pointing that out! I'll leave it in for now since it can't hurt to continue testing that scenario, but I added it to the list of code to clean up later in http://doc/1sA2pnvpRJGejfSoCxkQOo2lv2Fr6vH5Yn8JeUW1IFcw.
I'll update the comment to be more accurate before submitting. (I'm editing through the web interface right now so changing files that aren't already in the patch is awkward.)
One more thing to be aware of: Installing the hooks that early in https://chromium-review. […]
I think that will automatically work itself out in the next step. Since the new hook installer in https://crrev.com/c/4031766 needs to pass a pointer to the PoissonAllocationSampler, the caller will need to call `PoissonAllocationSampler::Get()` which will create the singleton as early as it's needed.
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #5, Line 193: // Fast, thread-safe access to the current profiling state. This is static
Nit: Is the problem really to access before initialization of instance_? Or to access without derefe […]
Currently the allocator hooks are static functions, so in theory they could be called when `instance_` is unset, if some other code calls `InstallStandardAllocatorHooks`. (No other code actually does that AFAIK so maybe this comment is excessively cautious.)
So I think "before initialization of instance" is more accurate. (It's not JUST for performance of not having to dereference.)
But when we switch to the new hooks, `instance_` will always exist so the comment will become obsolete. I'll just remove it so we don't need to update it later.
Patch Set #5, Line 200: friend class ScopedMuteHookedSamplesForTesting;
Nit: Do we need this? ScopedMuteHookedSamplesForTesting doesn't access any of the private members.
Good catch! This was part of a change that didn't work out, I forgot to remove it. Thanks.
Attention is currently required from: Andre Kempe, Benoit Lize.
1 comment:
Patchset:
Thanks for pointing that out! I'll leave it in for now since it can't hurt to continue testing that […]
Updated the comment.
Attention is currently required from: Andre Kempe, Joe Mason.
Patch set 9: Code-Review +1
2 comments:
Patchset:
lgtm
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #9, Line 193: // Fast, thread-safe access to the current profiling state.
What is thread-safe about this? Since you're only reading it with relaxed atomics, the only guarantee you get is that you will read either true or false, not a garbage intermediate value.
Attention is currently required from: Gabriel Charette.
1 comment:
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #9, Line 193: // Fast, thread-safe access to the current profiling state.
What is thread-safe about this? Since you're only reading it with relaxed atomics, the only guarante […]
Good point. The new use of `profiling_state_` introduces an ordering dependency between threads - if thread A calls RecordAlloc, thread B calls RecordFree, and thread C updates `profiling_state_` from kNeverStarted to kRunning, it would be possible for thread A to read kRunning and thread B to read kNeverStarted, meaning the free would be missed. I think we now need to use Release-Consume ordering. (Before this patch the state was only checked in RecordAlloc, not RecordFree, so at worst one extra allocation could be recorded after turning off profiling. There was no chance of an Alloc / Free mismatch.)
`g_mute_hooked_samples` also needs to be Release-Consume. It used to only be read in test assertions, but now it's written on the main thread of unit tests (in the ScopedSuppressRandomnessForTesting constructor/destructor) and can be read on any thread. If thread A creates a ScopedSuppressRandomnessForTesting and then adds an observer, and thread B reads `g_mute_hooked_samples` as false, that could lead to test flakiness as the observer incorrectly sees some unexpected real memory allocations. (On thread B, the writes `g_mute_hooked_samples = true` and `profiling_state_ = kRunning` need to be seen in the same order as on thread A.)
+gab, can you verify the use of atomics here?
`profiling_state_` is written in AddSamplesObserver and RemoveSamplesObserver, inside a mutex, and the possible state transitions are `kNeverStarted -> kRunning <-> kStopped`. It's read in RecordFree(), which early returns on `kNeverStarted`, and RecordAlloc(), which also early returns on `kNeverStarted` or does a bit more work and then branches on the other two states.
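For reference, the reads and transitions described above can be sketched roughly like this (a minimal, hypothetical stand-in for the real poisson_allocation_sampler code, not the actual patch; the memory order is deliberately left at the default while the right choice is being discussed):

```cpp
#include <atomic>

enum class ProfilingState { kNeverStarted, kRunning, kStopped };

// Written in AddSamplesObserver / RemoveSamplesObserver under a mutex;
// the only transitions are kNeverStarted -> kRunning <-> kStopped.
std::atomic<ProfilingState> profiling_state{ProfilingState::kNeverStarted};

bool MaybeRecordFree() {
  // RecordFree() early-returns on kNeverStarted.
  if (profiling_state.load() == ProfilingState::kNeverStarted)
    return false;
  // ... record the free ...
  return true;
}

bool MaybeRecordAlloc() {
  // RecordAlloc() also early-returns on kNeverStarted, then branches on
  // the other two states after doing a bit more work.
  ProfilingState state = profiling_state.load();
  if (state == ProfilingState::kNeverStarted)
    return false;
  // ... compute the sample, then branch on kRunning vs kStopped ...
  return state == ProfilingState::kRunning;
}
```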
Attention is currently required from: Gabriel Charette, Joe Mason.
2 comments:
Patchset:
Just my 5 Cent / 2 Pence on reading/storing of profiling_state_
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #9, Line 193: // Fast, thread-safe access to the current profiling state.
Good point. […]
So, std::atomics use std::memory_order_seq_cst by default for store and read operations, see https://en.cppreference.com/w/cpp/atomic/atomic/store and https://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering
So
> [...] if thread A calls RecordAlloc, thread B calls RecordFree, and thread C updates `profiling_state_` from kNeverStarted to kRunning, it would be possible for thread A to read kRunning and thread B to read kNeverStarted, meaning the free would be missed [...]
is NOT possible if the value is set before the reads. Both threads would read the same value.
So, IMHO from a functional POV we're completely fine without specifying the memory order in detail. It may not be top-notch performance-wise, but it is certainly correct.
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #9, Line 193: // Fast, thread-safe access to the current profiling state.
So, std::atomics use std::memory_order_seq_cst by default for store and read operations, see https:/ […]
Yes, using the full `memory_order_seq_cst` would definitely be correct, but I want to use the more performant semantics if possible. gab@ is our atomics expert. I'll defer to him.
Attention is currently required from: Joe Mason.
9 comments:
Patchset:
I strongly recommend this talk https://youtu.be/ZQFzMfHIxng on atomics
The allocator is a very subtle place and, on first look, it's not doing atomics correctly. It should at the very least be explicit on all memory barriers (never use operators - even the C++ committee agrees it was a mistake to add them).
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #10, Line 208: memory_order_consume
1) First time I see std::memory_order_consume (I'm used to std::memory_order_acquire), looks like it's deprecated in C++17:
```
The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged.
```
https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Consume_ordering
Looks like the difference is that consume only synchronizes memory related to the state being consumed instead of everything the producing thread wrote.
2) In either case, I don't think you need acquire or consume here. Atomics (even when read with `std::memory_order_relaxed`) always guarantee that you see the whole variable. Your comment seems to hint at a desire for `std::memory_order_seq_cst` (i.e. total ordering of writes across threads) but that's rarely needed.
3) Your comment also implies you want some kind of a flush/guarantee that other threads see writes right away. No memory order guarantees that. Cache coherency is what makes memory eventually consistent. What memory orders guarantee is that *once* you see the value written by thread A, you also see other (atomic and non-atomic) memory written by thread A (for release-acquire; not necessarily for relaxed).
For state-like variables, relaxed is often correct, unless it's a one-way flag where upon seeing it signaled you assume the producer has set some key properties in memory that you may now also consume (e.g. base::AtomicFlag).
File base/sampling_heap_profiler/poisson_allocation_sampler.cc:
Patch Set #10, Line 161: g_mute_hooked_samples = true;
ditto
Patch Set #10, Line 171: DCHECK(g_mute_hooked_samples);
.load(std::memory_order_relaxed)
Patch Set #10, Line 177: g_mute_hooked_samples = false;
Let's use an explicit memory order here. Operators on atomics are bad form because they always use the strictest memory order and fail at expressing the intent.
Here's a good talk on atomics and direct link to a specific section on why explicit memory barriers: https://youtu.be/ZQFzMfHIxng?t=2988
Patch Set #10, Line 182: std::atomic<PoissonAllocationSampler::ProfilingState>
`ABSL_CONST_INIT` is a nice addition here to be explicit (though this currently has to be initialized as a POD, or the static-initializer checks on the bots would complain).
`ABSL_CONST_INIT` is like `constexpr` but without `const`
Patch Set #10, Line 215: return g_mute_hooked_samples;
.load(std::memory_order_relaxed)
(unless this method guarantees some properties to callers based on returning true, and that property is set by thread A before it calls `g_mute_hooked_samples.store(true, std::memory_order_release)`)
Patch Set #10, Line 438: memory_order_acquire
`memory_order_acquire` inside a `DCHECK` doesn't make sense as it's altering the state of *other* memory for this thread based on whether this is running in a DCHECK build. `std::memory_order_relaxed` is always sufficient when reading the state of a variable without acting upon its content.
Patch Set #10, Line 439: profiling_state_ = ProfilingState::kRunning;
Explicit .store() and memory barrier. The question to ask yourself is whether a reader seeing kRunning expects any of the memory previously set by this thread to also be visible? If not, memory_order_relaxed is sufficient; if so, memory_order_release is required (https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering).
Since the state modified above is only `observers_` which is `GUARDED_BY(mutex_)` (and future reader/modifier of `observers_` will therefore have a happens-after relationship with it), I suspect there isn't memory state that's expected by a reader based on reading kRunning?
Attention is currently required from: Gabriel Charette.
3 comments:
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #10, Line 208: memory_order_consume
1) First time I see std::memory_order_consume (I'm used to std::memory_order_acquire), looks like it […]
Ah, I didn't notice the deprecation warning. I'll either go back to relaxed or switch to Release/Acquire.
Other threads don't need to see writes right away. Which comment implies that? I'll reword it.
Here's an example situation:
Thread A calls malloc(), which calls RecordAlloc(p) and then returns `p`. Thread B calls free(p), which calls RecordFree(p).
It's important that when `profiling_state_` changes from kNeverStarted to anything else, then IF Thread A sees the new value in RecordAlloc, Thread B must NOT see kNeverStarted in RecordFree. Does relaxed guarantee that?
I think in this case the RecordAlloc inter-thread happens-before RecordFree, because RecordAlloc(p) is sequenced-before the return from malloc on Thread A, and free(p) / RecordFree(p) consume the return value of malloc. Is that enough to guarantee that IF thread A sees the write to `profiling_state_`, so will thread B?
File base/sampling_heap_profiler/poisson_allocation_sampler.cc:
Patch Set #10, Line 280: memory_order_consume
I need to avoid this situation:
Thread A sets `g_mute_hooked_samples` to true, then changes `profiling_state_` from `kNeverStarted` to `kRunning`.
Thread B in this code reads `profiling_state_` as `kRunning` and `g_mute_hooked_samples` as false.
If I understand right, `g_mute_hooked_samples` needs a release/acquire memory barrier to guarantee this, so that the compiler doesn't reorder the `profiling_state_` write before the `g_mute_hooked_samples` write, or reorder the `profiling_state_` read after the `g_mute_hooked_samples` read. `profiling_state_` itself can use relaxed. Is that correct?
Patch Set #10, Line 439: profiling_state_ = ProfilingState::kRunning;
Explicit .store() and memory barrier. […]
The only reason this would need `memory_order_acquire` is if that's required to ensure that `RecordAlloc(p)` and `RecordFree(p)` from different threads see the same value of `profiling_state_`, as described above.
Attention is currently required from: Joe Mason.
3 comments:
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #10, Line 208: memory_order_consume
Ah, I didn't notice the deprecation warning. I'll either go back to relaxed or switch to Release/Acquire.
Other threads don't need to see writes right away. Which comment implies that? I'll reword it.
```
// Load with release-consume ordering to make sure RecordAlloc() and
// RecordFree() in different threads see the same writes to
// `profiling_state_`.
```
Here's an example situation: Thread A calls malloc(), which calls RecordAlloc(p) and then returns `p`. Thread B calls free(p), which calls RecordFree(p).
It's important that when `profiling_state_` changes from kNeverStarted to anything else, then IF Thread A sees the new value in RecordAlloc, Thread B must NOT see kNeverStarted in RecordFree. Does relaxed guarantee that?
The choice of a memory order model doesn't change which value this load sees. What it does is change what *other* memory is visible as a result of the value read.
So no memory order on *this* atomic variable guarantees that, but 👇...
> I think in this case the RecordAlloc inter-thread happens-before RecordFree, because RecordAlloc(p) is sequenced-before the return from malloc on Thread A, and free(p) / RecordFree(p) consume the return value of malloc. Is that enough to guarantee that IF thread A sees the write to `profiling_state_`, so will thread B?
Correct. What guarantees that RecordFree sees the value set in RecordAlloc is that (short of a rogue thread freeing random memory) there's a happens-after relationship between the return of malloc() and the call to free().
In other words, as far as this relationship is concerned, `profiling_state_` doesn't even need to be atomic. It only needs to be atomic if other threads (which don't care about such guarantees) are racing to read it.
File base/sampling_heap_profiler/poisson_allocation_sampler.cc:
Patch Set #10, Line 280: memory_order_consume
I need to avoid this situation: […]
A memory barrier on X doesn't affect which value X.load() returns, it affects the visibility of *other* memory.
In this case, if g_mute_hooked_samples is set before and read after setting/reading profiling_state_, the release-acquire relationship needs to be on profiling_state_.
`profiling_state_.store(kRunning, std::memory_order_release)` means "make all the memory side-effects I have caused visible to any thread that reads-with-acquire-semantics the value I've set"
A future thread B which calls `profiling_state_.load(std::memory_order_acquire)` and sees `kRunning` will also see anything (atomic or non-atomic) written before the matching release.
Now. This only works if thread B knows **which** instance of kRunning this is (e.g. a one-time transition, or Thread B somehow knew it was previously kNeverStarted). This is why we don't allow resetting `base::AtomicFlag`; the relationship only works the first time.
Subsequent times have an ABA problem.
For instance, if thread A can do the following:
T0 {g_mute_hooked_samples = false; profiling_state_ = kNeverStarted}
T1 {g_mute_hooked_samples = false; profiling_state_ = kRunning}
T2 {g_mute_hooked_samples = false; profiling_state_ = kNeverStarted}
T3 {g_mute_hooked_samples = true; profiling_state_ = kRunning}
Now, when thread B sees kRunning at T4, there's no guarantee that it's seeing the kRunning set at T3; it could be reading T1's value, at which point the acquire-release relationship is meaningless.
`std::memory_order_seq_cst` is the solution to this as it enforces a total ordering across all threads between all atomics.
https://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering
I've never seen a case of this before in Chromium but this might be one? Hopefully kNeverStarted=>kRunning is a one-way transition and acquire-release is sufficient?
Note that this also brings the question of whether g_mute_hooked_samples needs to be atomic? It doesn't w.r.t. this relationship but it might if threads that don't have ordering requirements are also poking at it? Ideally only threads that have a happens-after relationship with it being set are reading it, in which case it doesn't need to be atomic.
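To illustrate the total-ordering property being weighed here, a minimal sketch with seq_cst on both variables (hypothetical stand-ins for the real globals, not the actual patch; in this simplified one-transition version release/acquire would also suffice, and seq_cst only becomes necessary once the values can toggle as in the timeline above):

```cpp
#include <atomic>

enum class ProfilingState { kNeverStarted, kRunning };

std::atomic<bool> g_mute_hooked_samples{false};
std::atomic<ProfilingState> profiling_state{ProfilingState::kNeverStarted};

void MuteAndStart() {
  // seq_cst stores participate in a single total order across threads...
  g_mute_hooked_samples.store(true, std::memory_order_seq_cst);
  profiling_state.store(ProfilingState::kRunning, std::memory_order_seq_cst);
}

bool HooksMutedIfRunning() {
  // ...so a seq_cst reader that observes the second store cannot then
  // observe the value of g_mute_hooked_samples from before the first.
  if (profiling_state.load(std::memory_order_seq_cst) !=
      ProfilingState::kRunning) {
    return true;  // not running yet; nothing to check
  }
  return g_mute_hooked_samples.load(std::memory_order_seq_cst);
}
```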
Patch Set #10, Line 439: profiling_state_ = ProfilingState::kRunning;
The only reason this would need `memory_order_acquire` is if that's required to ensure that `RecordA […]
Not required but may need seq_cst for ordering w.r.t. g_mute_hooked_samples as discussed
Attention is currently required from: Gabriel Charette.
Patch set 12: Commit-Queue +1
10 comments:
Patchset:
This is the next step in my suggested refactoring plan from http://doc/1sA2pnvpRJGejfSoCxkQOo2lv2Fr6 […]
Done
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #10, Line 208: memory_order_consume
> Ah, I didn't notice the deprecation warning. […]
Great! Updated to relaxed then.
File base/sampling_heap_profiler/poisson_allocation_sampler.cc:
Patch Set #10, Line 161: g_mute_hooked_samples = true;
ditto
Done
Patch Set #10, Line 171: DCHECK(g_mute_hooked_samples);
.load(std::memory_order_relaxed)
Changed it to compare_exchange since it's more obviously correct than a load and then a store.
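A sketch of the compare_exchange form described here, with hypothetical names rather than the actual patch: a single compare_exchange both verifies the expected old value (standing in for the DCHECK) and writes the new one atomically.

```cpp
#include <atomic>
#include <cassert>

std::atomic<bool> g_mute_hooked_samples{false};  // illustrative stand-in

void MuteHookedSamples() {
  bool expected = false;
  bool ok = g_mute_hooked_samples.compare_exchange_strong(
      expected, true, std::memory_order_relaxed);
  assert(ok && "hooks were already muted");  // stands in for a DCHECK
}

void UnmuteHookedSamples() {
  bool expected = true;
  bool ok = g_mute_hooked_samples.compare_exchange_strong(
      expected, false, std::memory_order_relaxed);
  assert(ok && "hooks were not muted");  // stands in for a DCHECK
}
```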
Patch Set #10, Line 177: g_mute_hooked_samples = false;
Let's use an explicit memory order here. […]
Done
Patch Set #10, Line 182: std::atomic<PoissonAllocationSampler::ProfilingState>
`ABSL_CONST_INIT` is a nice addition here to be explicit (though this is currently initialized as a […]
Done
Patch Set #10, Line 215: return g_mute_hooked_samples;
.load(std::memory_order_relaxed) […]
Done
Patch Set #10, Line 280: memory_order_consume
A memory barrier on X doesn't affect which value X. […]
Argh, I got it backwards. I really don't want to impose a total ordering just for a test-only feature. The only point of g_mute_hooked_samples is to prevent tests from flaking.
Instead I moved all the state into one atomic, so there's no ordering dependency between two variables. That makes updating the state more complicated, but that's fine: updates are much, much rarer than reads, so they can afford to be heavier.
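The single-atomic approach might look roughly like this (a hypothetical sketch, not the actual patch): the mute flag becomes a state value, so one relaxed load yields a self-consistent view and no cross-variable ordering is needed.

```cpp
#include <atomic>

enum class ProfilingState {
  kNeverStarted,
  kRunning,
  kRunningMutedForTesting,
  kStopped,
};

std::atomic<ProfilingState> g_profiling_state{ProfilingState::kNeverStarted};

// Updating is heavier (a CAS loop that recomputes the new state from the
// old one), which is fine: updates are far rarer than reads.
void MuteHookedSamplesForTesting() {
  ProfilingState expected = g_profiling_state.load(std::memory_order_relaxed);
  ProfilingState desired;
  do {
    desired = (expected == ProfilingState::kRunning)
                  ? ProfilingState::kRunningMutedForTesting
                  : expected;
  } while (!g_profiling_state.compare_exchange_weak(
      expected, desired, std::memory_order_relaxed));
}
```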
Patch Set #10, Line 438: memory_order_acquire
`memory_order_acquire` inside a `DCHECK` doesn't make sense as it's altering the state of *other* me […]
Done
Patch Set #10, Line 439: profiling_state_ = ProfilingState::kRunning;
Not required but may need seq_cst for ordering w.r.t. […]
Ack
Attention is currently required from: Gabriel Charette.
📍 Job complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1563903d3a0000
File base/sampling_heap_profiler/poisson_allocation_sampler.h:
Patch Set #14, Line 122: enum class ProfilingState {
In order to avoid cases where a caller checks `== kRunning` and forgets to also check `== kRunningMutedForTesting`, it's preferable to hide this packing inside a class whose only member is an atomic bitpack.
e.g. base::internal::TaskTracker::State
This also removes the need for helpers like AtomicallyUpdateProfilingState
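Such a wrapper might be sketched as follows (hypothetical, in the spirit of base::internal::TaskTracker::State): the bit layout stays private and callers can only ask semantic questions, so nobody can compare against kRunning and forget the muted variant.

```cpp
#include <atomic>
#include <cstdint>

class ProfilingStateBits {
 public:
  void SetRunning() { bits_.fetch_or(kRunningBit, std::memory_order_relaxed); }
  void SetMutedForTesting() {
    bits_.fetch_or(kMutedBit, std::memory_order_relaxed);
  }
  // True whether or not the muted-for-testing bit is also set, so callers
  // can't accidentally treat "running but muted" as "not running".
  bool IsRunning() const {
    return bits_.load(std::memory_order_relaxed) & kRunningBit;
  }
  bool IsMutedForTesting() const {
    return bits_.load(std::memory_order_relaxed) & kMutedBit;
  }

 private:
  static constexpr uint32_t kRunningBit = 1 << 0;
  static constexpr uint32_t kMutedBit = 1 << 1;
  std::atomic<uint32_t> bits_{0};
};
```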
File base/sampling_heap_profiler/poisson_allocation_sampler.cc:
Patch Set #14, Line 192: ABSL_CONST_INIT std::atomic<PoissonAllocationSampler::ProfilingState>
// static
Patch Set #14, Line 210: g_sampled_addresses_set.store(sampled_addresses, std::memory_order_release);
Let's move changes to atomics not directly related to this CL to another CL? Makes things easier in case it breaks anything?
Attention is currently required from: Joe Mason.
Joe Mason removed Benoit Lize from this change.