PTAL.
Implemented multiple ThreadCache support. Remaining work: additional testing and three TODO comments about optimization, which will be addressed in subsequent CLs.
// ThreadCaches at a differenct, preferably dedicated index and swap only when
I'll fix the typo later.
options.thread_cache_index = 1;
This is for test purposes. I'll revert this change after reviews.
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 has `CreateCustomTestRoot` that returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it, or
2) add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope.
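Option 2) can be sketched in isolation. Everything below (`FakeRoot`, `FakeThreadCache`, `RootDeleter`, `CreateTestRoot`) is a hypothetical stand-in for the real PartitionAlloc types, just to show the deleter shape:

```cpp
#include <memory>

// Hypothetical stand-ins for PartitionRoot and the thread-local cache slot.
struct FakeThreadCache {};
thread_local FakeThreadCache* g_thread_cache = nullptr;

struct FakeRoot {};

// Custom deleter: nulls out the thread cache before destroying the root,
// so a test root cleans up after itself when it goes out of scope.
struct RootDeleter {
  void operator()(FakeRoot* root) const {
    g_thread_cache = nullptr;  // analogous to swapping out the cache at the root's index
    delete root;
  }
};

using ScopedTestRoot = std::unique_ptr<FakeRoot, RootDeleter>;

ScopedTestRoot CreateTestRoot() {
  return ScopedTestRoot(new FakeRoot);
}
```

The deleter runs exactly once when the unique_ptr goes out of scope, so the cleanup cannot be forgotten by an individual test.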
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition.
static ThreadCache* Get(size_t index) {
Should we provide a templated version of this, e.g. `ThreadCache::Get<0>()`? Or is SchedulerLoop the only remaining user of an explicit index, so long term it doesn't make sense?
Looking through, only SchedulerLoopQuarantine and the extended_api for tests know that it was <0>, so it's probably not worth adding.
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
tcache->AccumulateStats(stats);
Should this be updated to have some sort of index to track stats per thread cache?
📍 Job mac-m1_mini_2020-perf/speedometer3 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/13006667310000
📍 Job mac-m1_mini_2020-perf/speedometer3 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/17582db7310000
const size_t another_thread_cache_index = 1;
nit: `constexpr`?
size_t thread_cache_index = 0;
Put this near `thread_cache`?
ThreadCache* tcache = ThreadCache::Get(0);
Can you explicitly create a constant for this, or create a function like `GetForQuarantine`?
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
But do we want bounds checking every time? This is the fast path.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Is this calling `ThreadCache::operator new`? Or the global one?
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
// ThreadCaches at a differenct, preferably dedicated index and swap only when
I'll fix the typo later.
Done
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
I've added this check. It should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
const size_t another_thread_cache_index = 1;
Ayumi Ono: nit: `constexpr`?
Done
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 has `CreateCustomTestRoot` that returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it, or
2) add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope.
Created `CreateCustomTestRootWithThreadCache`, which returns a unique_ptr whose deleter nulls out the thread cache.
size_t thread_cache_index = 0;
Ayumi Ono: ditto
Done
size_t thread_cache_index = 0;
Put this near `thread_cache`?
Done
ThreadCache* tcache = ThreadCache::Get(0);
Can you explicitly create a constant for this, or create a function like `GetForQuarantine`?
Created `internal::kThreadCacheQuarantineIndex` in thread_cache.h.
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Mikihito Matsuura: can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
Ayumi Ono: But do we want bounds checking every time? This is the fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
tcache->AccumulateStats(stats);
Should this be updated to have some sort of index to track stats per thread cache?
Thank you, I added an index to the `DumpStats` arguments and made it accumulate stats of the thread caches of the same root.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Is this calling `ThreadCache::operator new`? Or the global one?
It's calling the class-specific one. Should I create Alloc/Free functions and use them instead?
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
`PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I believe it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/124999c0b10000
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Ayumi Ono: Should there be a check to see if we've registered too many thread caches?
I've added this check. It should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well.
Though I also see you check it in partition_root, so perhaps it's fine.
ThreadCache::SwapForTesting(nullptr, root->settings.thread_cache_index);
Can you add a comment about why this exists so it doesn't confuse future readers?
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Ayumi Ono: Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 has `CreateCustomTestRoot` that returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it, or
2) add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope.
Created `CreateCustomTestRootWithThreadCache`, which returns a unique_ptr whose deleter nulls out the thread cache.
Thanks, looks a lot cleaner!
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers, and
2) they have the correct pointers in their free lists? I.e. root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Mikihito Matsuura: can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
Ayumi Ono: But do we want bounds checking every time? This is the fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect that, given this is a constexpr-sized array, most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance, then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). In this case, however, I suspect the access at `kThreadCacheQuarantineIndex` will have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out because the value never changes.
tcache->AccumulateStats(stats);
Ayumi Ono: Should this be updated to have some sort of index to track stats per thread cache?
Thank you, I added an index to the `DumpStats` arguments and made it accumulate stats of the thread caches of the same root.
Acknowledged
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Mikihito Matsuura: can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
Ayumi Ono: But do we want bounds checking every time? This is the fast path.
Stephen Nusko: By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect that, given this is a constexpr-sized array, most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance, then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). In this case, however, I suspect the access at `kThreadCacheQuarantineIndex` will have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out because the value never changes.
Oh, and of course I'm not mandating it. I defer to @mikt; if there is a legit reason to suspect this is going to cause a large issue, then of course keep it as a ThreadCache*, but if benchmarks don't move we could just move it to std::array. We use std::array in plenty of locations already.
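For reference, a minimal sketch of the `std::array` variant under discussion; `kMaxThreadCacheIndex` and `kThreadCacheQuarantineIndex` below are placeholder values for this sketch, not the real constants from thread_cache.h:

```cpp
#include <array>
#include <cstddef>

struct ThreadCache;  // opaque for this sketch

// Placeholder values; the real constants live in thread_cache.h.
constexpr std::size_t kMaxThreadCacheIndex = 4;
constexpr std::size_t kThreadCacheQuarantineIndex = 0;

// std::array variant of `ThreadCache* g_thread_caches[kMaxThreadCacheIndex]`.
// thread_local zero-initializes the pointers on each thread.
thread_local std::array<ThreadCache*, kMaxThreadCacheIndex> g_thread_caches{};

ThreadCache* GetCache(std::size_t index) {
  // operator[] (not .at()) never throws; under libc++ hardening it may still
  // trap on out-of-bounds, but with a constexpr in-bounds index the check is
  // typically optimized away.
  return g_thread_caches[index];
}
```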
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1737a4bb310000
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Ayumi Ono: Should there be a check to see if we've registered too many thread caches?
Stephen Nusko: I've added this check. It should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well.
Though I also see you check it in partition_root, so perhaps it's fine.
I see. I'll allow multiple thread caches from now on, and I've fixed the checks as follows:
if the default partition uses the thread cache, and thus index 0 will be initialized in `EnableThreadCacheIfSupported`, I prohibit `opts.thread_cache_index == 0 && opts.thread_cache == PartitionOptions::kEnabled`.
In any case, `thread_cache_index` should be less than `kMaxThreadCacheIndex`.
```
#if PA_BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC)
PA_CHECK(opts.thread_cache_index != 0 ||
opts.thread_cache == PartitionOptions::kDisabled)
<< "Cannot use a thread cache at default index when PartitionAlloc is "
"malloc().";
#endif
PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex)
<< "Thread cache index must be less than kMaxThreadCacheIndex";
```
ThreadCache::SwapForTesting(nullptr, root->settings.thread_cache_index);
Can you add a comment about why this exists so it doesn't confuse future readers?
Done
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers, and
2) they have the correct pointers in their free lists? I.e. root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
Thank you, those look good for verifying the behavior of PartitionRoot and ThreadCache.
However, in `PartitionAllocTest`, `root0 = allocator.root()` does not have a thread cache; instead, the default partition (of the process running the test) has the thread cache, so I found it's dangerous to use `ThreadCache::Get(0)` for this test.
So I created two roots (with indices 1 and 2) and used them for the test you suggested.
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition.
I filed a feature request bug to track the multiple thread cache project and put the number in the TODO (because this TODO will be handled soon):
https://g-issues.chromium.org/issues/467243745
static bool IsValid(ThreadCache* tcache) {
I found that we don't need the `tcache->root_` check if `THREAD_CACHE_FAST_TLS` is set and the tcache pointer is taken from `g_thread_caches[index]`, so I fixed these parts.
(It's possible to change `ThreadCache* g_thread_caches[kMaxThreadCacheIndex]` to a single `ThreadCache*` that holds the same pointer as `PartitionTls`, but I think the current strategy is better in that we can eliminate such checks.)
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Ayumi Ono: Should there be a check to see if we've registered too many thread caches?
Stephen Nusko: I've added this check. It should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Ayumi Ono: Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well.
Though I also see you check it in partition_root, so perhaps it's fine.
I see. I'll allow multiple thread caches from now on, and I've fixed the checks as follows:
if the default partition uses the thread cache, and thus index 0 will be initialized in `EnableThreadCacheIfSupported`, I prohibit `opts.thread_cache_index == 0 && opts.thread_cache == PartitionOptions::kEnabled`.
In any case, `thread_cache_index` should be less than `kMaxThreadCacheIndex`.
```
#if PA_BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC)
PA_CHECK(opts.thread_cache_index != 0 ||
opts.thread_cache == PartitionOptions::kDisabled)
<< "Cannot use a thread cache at default index when PartitionAlloc is "
"malloc().";
#endif
PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex)
<< "Thread cache index must be less than kMaxThreadCacheIndex";
```
Acknowledged
internal::SlotStart::Unchecked(ptr1).Untag(), bucket_index, pos1));
nit:
```
EXPECT_EQ(pos1, 0u);
EXPECT_TRUE(tcache2->IsInFreelist(
internal::SlotStart::Unchecked(ptr2).Untag(), bucket_index, pos2));
EXPECT_EQ(pos2, 0u);
```
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Ayumi Ono: Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition.
I filed a feature request bug to track the multiple thread cache project and put the number in the TODO (because this TODO will be handled soon):
https://g-issues.chromium.org/issues/467243745
| Code-Review | +1 |
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Mikihito Matsuura: can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
Ayumi Ono: But do we want bounds checking every time? This is the fast path.
Stephen Nusko: By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
Stephen Nusko: In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect that, given this is a constexpr-sized array, most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance, then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). In this case, however, I suspect the access at `kThreadCacheQuarantineIndex` will have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out because the value never changes.
Oh, and of course I'm not mandating it. I defer to @mikt; if there is a legit reason to suspect this is going to cause a large issue, then of course keep it as a ThreadCache*, but if benchmarks don't move we could just move it to std::array. We use std::array in plenty of locations already.
I don't have a strong preference. Maybe the ongoing effort to rewrite raw pointers to `std::array` will rewrite this anyway.
My suggestion is to leave this as-is and let that effort do the rewrite, delegating the perf-impact measurement to it.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Ayumi Ono: Is this calling `ThreadCache::operator new`? Or the global one?
It's calling the class-specific one. Should I create Alloc/Free functions and use them instead?
No, it should be fine. Thanks ;)
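The lookup rule behind this answer can be demonstrated standalone. `ThreadCacheLike` below is a hypothetical stand-in, but the behavior it shows is standard C++: an unqualified `operator new(n)` call inside a member function finds the class-specific allocation function before the global one.

```cpp
#include <cstdlib>
#include <new>

inline bool g_class_new_called = false;

struct ThreadCacheLike {
  // Class-specific allocation function (implicitly static).
  static void* operator new(std::size_t size) {
    g_class_new_called = true;
    return std::malloc(size);
  }
  static void operator delete(void* p) { std::free(p); }

  // Unqualified call: member lookup finds ThreadCacheLike::operator new.
  // Writing ::operator new(n) would select the global one instead.
  static void* AllocRaw(std::size_t n) { return operator new(n); }
};
```

So in the reviewed code, `operator new(array_size)` inside a ThreadCache member function resolves to the class-specific allocator; prefixing `::` would be needed to reach the global one.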
operator delete(tcaches);
ditto
Done
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Ayumi Ono: Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
`PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I believe it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
(Sorry, I meant to explain this in the comments.)
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers, and
2) they have the correct pointers in their free lists? I.e. root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
Thank you, those look good for verifying the behavior of PartitionRoot and ThreadCache.
However, in `PartitionAllocTest`, `root0 = allocator.root()` does not have a thread cache; instead, the default partition (of the process running the test) has the thread cache, so I found it's dangerous to use `ThreadCache::Get(0)` for this test.
So I created two roots (with indices 1 and 2) and used them for the test you suggested.
Done
internal::SlotStart::Unchecked(ptr1).Untag(), bucket_index, pos1));
nit:
```
EXPECT_EQ(pos1, 0u);
EXPECT_TRUE(tcache2->IsInFreelist(
internal::SlotStart::Unchecked(ptr2).Untag(), bucket_index, pos2));
EXPECT_EQ(pos2, 0u);
```
Done, thank you
I found that we don't need the `tcache->root_` check if `THREAD_CACHE_FAST_TLS` is set and the tcache pointer is taken from `g_thread_caches[index]`, so I fixed these parts.
(It's possible to change `ThreadCache* g_thread_caches[kMaxThreadCacheIndex]` to a single `ThreadCache*` that holds the same pointer as `PartitionTls`, but I think the current strategy is better in that we can eliminate such checks.)
Done
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex];
Mikihito Matsuura: can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO.
Ayumi Ono: But do we want bounds checking every time? This is the fast path.
Stephen Nusko: By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
Stephen Nusko: In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect that, given this is a constexpr-sized array, most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance, then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). In this case, however, I suspect the access at `kThreadCacheQuarantineIndex` will have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out because the value never changes.
Mikihito Matsuura: Oh, and of course I'm not mandating it. I defer to @mikt; if there is a legit reason to suspect this is going to cause a large issue, then of course keep it as a ThreadCache*, but if benchmarks don't move we could just move it to std::array. We use std::array in plenty of locations already.
I don't have a strong preference. Maybe the ongoing effort to rewrite raw pointers to `std::array` will rewrite this anyway.
My suggestion is to leave this as-is and let that effort do the rewrite, delegating the perf-impact measurement to it.
Looks good, I'll leave PA_UNSAFE_TODO for now.
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Ayumi Ono: Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
Mikihito Matsuura: `PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I believe it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
(Sorry, I meant to explain this in the comments.)
Oh, I've added brief comments in the code.
options.thread_cache_index = 1;
This is for test purposes. I'll revert this change after reviews.
| Code-Review | +1 |
| Code-Review | +1 |
| Code-Review | +1 |
static bool IsValidPtr(ThreadCache* tcache) {
// Do not MTE-untag, as it'd mess up the sentinel value.
return reinterpret_cast<uintptr_t>(tcache) & kTombstoneMask;
}
static bool IsValid(ThreadCache* tcache) {
#if PA_CONFIG(THREAD_CACHE_FAST_TLS)
// `g_thread_caches[index]` has valid pointers only if the ThreadCache
// object is initialized.
return IsValidPtr(tcache);
#else
// Even if the array of ThreadCache is allocated, the ThreadCache object
// may not be initialized, and thus check `root_` to know if initialized.
return tcache && tcache->root_;
#endif
}
nit: comment the difference and use cases between `IsValidPtr` and `IsValid`.
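As a standalone sketch of the sentinel scheme behind `IsValidPtr` (the concrete tombstone value and mask below are assumptions for this sketch; the real constants live in thread_cache.h): the tombstone is a small non-pointer value, and masking its bit away makes both nullptr and the tombstone test as invalid in one expression.

```cpp
#include <cstdint>

struct ThreadCache;  // opaque for this sketch

// Assumed sentinel scheme: the tombstone is the value 1, which no real
// ThreadCache allocation (aligned to at least 2 bytes) can alias.
constexpr uintptr_t kTombstone = 1;
constexpr uintptr_t kTombstoneMask = ~kTombstone;

// Do not MTE-untag here, as that would mess up the sentinel value.
bool IsValidPtr(ThreadCache* tcache) {
  // Clearing the tombstone bit maps both nullptr (0) and the tombstone (1)
  // to zero, so a single test rejects both.
  return (reinterpret_cast<uintptr_t>(tcache) & kTombstoneMask) != 0;
}

ThreadCache* Tombstone() {
  return reinterpret_cast<ThreadCache*>(kTombstone);
}
```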
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/170d9bbcb10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10a2e6cab10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10b494eab10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1696b2d6b10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/108077c4b10000