PTAL.
Implemented multiple ThreadCache support; remaining work includes additional testing and three TODO comments about optimization, which will be handled in subsequent CLs.
// ThreadCaches at a differenct, preferably dedicated index and swap only when
I'll fix the typo later
options.thread_cache_index = 1;
This is for test purposes.
I'll revert this change after reviews.
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 comes from `CreateCustomTestRoot`, which returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it
2) Add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope (see the sketch below).
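For illustration, option (2) might look like this minimal sketch; `PartitionRoot` here is a stand-in type, and `ThreadCacheClearingDeleter`/`TestRootPtr` are made-up names (the CL's `CreateCustomTestRootWithThreadCache` is the real counterpart):
```
#include <cstddef>
#include <memory>

// Stand-in for the real PartitionRoot; only the index matters here.
struct PartitionRoot {
  std::size_t thread_cache_index = 0;
};

// Deleter that clears the thread cache slot before destroying the root, so
// a test root tears down cleanly when the unique_ptr goes out of scope.
struct ThreadCacheClearingDeleter {
  void operator()(PartitionRoot* root) const {
    // The real code would call
    // ThreadCache::SwapForTesting(nullptr, root->thread_cache_index) here.
    delete root;
  }
};

using TestRootPtr = std::unique_ptr<PartitionRoot, ThreadCacheClearingDeleter>;
```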
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition
static ThreadCache* Get(size_t index) {
Should we provide a templated version of this?
`ThreadCache::Get<0>()`? Or is the scheduler loop the only user of an explicit index anymore, so long term it doesn't make sense?
Looking through it, I guess only SchedulerLoopQuarantine and the extended_api for tests know that it was <0>. Therefore probably not worth adding.
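For reference, the templated accessor being floated might have looked like the following standalone sketch (the constant's value and the stubbed runtime accessor are illustrative, not the CL's code):
```
#include <cstddef>

constexpr std::size_t kMaxThreadCacheIndex = 4;  // illustrative value

struct ThreadCache {
  // Existing runtime accessor (stubbed out here).
  static ThreadCache* Get(std::size_t index) { return nullptr; }

  // Templated variant: rejects an out-of-range index at compile time,
  // e.g. ThreadCache::Get<0>().
  template <std::size_t kIndex>
  static ThreadCache* Get() {
    static_assert(kIndex < kMaxThreadCacheIndex, "index out of range");
    return Get(kIndex);
  }
};
```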
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
tcache->AccumulateStats(stats);
Should this be updated to have some sort of index to track stats per thread cache?
📍 Job mac-m1_mini_2020-perf/speedometer3 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/13006667310000
📍 Job mac-m1_mini_2020-perf/speedometer3 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/17582db7310000
const size_t another_thread_cache_index = 1;
nit: `constexpr`?
size_t thread_cache_index = 0;
Put this near `thread_cache`?
ThreadCache* tcache = ThreadCache::Get(0);
Can you explicitly create a constant for this, or create a function like `GetForQuarantine`?
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Is this calling the `ThreadCache::operator new`? Or the global one?
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
// ThreadCaches at a differenct, preferably dedicated index and swap only when
I'll fix the typo later
Done
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
I've added this check. This should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
const size_t another_thread_cache_index = 1;
nit: `constexpr`?
Done
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 comes from `CreateCustomTestRoot`, which returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it
2) Add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope.
Created `CreateCustomTestRootWithThreadCache`, which returns a unique_ptr with a deleter that nulls out the thread cache.
size_t thread_cache_index = 0;
ditto
Done
size_t thread_cache_index = 0;
Put this near `thread_cache`?
Done
ThreadCache* tcache = ThreadCache::Get(0);
Can you explicitly create a constant for this, or create a function like `GetForQuarantine`?
Created `internal::kThreadCacheQuarantineIndex` in thread_cache.h
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
tcache->AccumulateStats(stats);
Should this be updated to have some sort of index to track stats per thread cache?
Thank you. I added an index to the `DumpStats` arguments and made it accumulate stats across the thread caches of the same root.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Is this calling the `ThreadCache::operator new`? Or the global one?
It's calling the class-specific one. Should I create Alloc/Free functions and use them instead?
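As a standalone illustration of the lookup rule in question (not the CL code): inside a member function, an unqualified `operator new(size)` call finds a class-specific overload before the global one, while `::operator new(size)` forces the global allocator.
```
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

struct CacheLike {
  static void* operator new(std::size_t n) {
    std::printf("class-specific operator new\n");
    return std::malloc(n);  // simplified; real overloads must handle failure
  }
  static void operator delete(void* p) { std::free(p); }

  static void* AllocRaw(std::size_t n) {
    return operator new(n);  // resolves to CacheLike::operator new
    // return ::operator new(n);  // would call the global allocator instead
  }
};
```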
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
`PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I guess it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
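To make the argument above concrete, here is a self-contained sketch of the guard shape being described; the constant values and function names are illustrative, and only the structure of the check matters:
```
#include <cstddef>
#include <cstdint>

struct ThreadCache;  // only pointers are handled below

constexpr std::size_t kMaxThreadCacheIndex = 4;        // illustrative
constexpr std::size_t kThreadCacheTombstoneIndex = 0;  // illustrative
constexpr std::uintptr_t kTombstone = 1;

thread_local ThreadCache* g_thread_caches[kMaxThreadCacheIndex] = {};

// Only the tombstone slot is inspected, which is why writing the sentinel
// there once covers every creation path that performs this check.
bool IsTombstone() {
  return reinterpret_cast<std::uintptr_t>(
             g_thread_caches[kThreadCacheTombstoneIndex]) == kTombstone;
}

ThreadCache* MaybeCreateThreadCache(std::size_t index) {
  if (IsTombstone()) {
    return nullptr;  // Thread teardown in progress: do not (re)create.
  }
  // ... allocate and register a ThreadCache at `index` ...
  return g_thread_caches[index];
}
```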
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/124999c0b10000
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
I've added this check. This should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well
Though I also see you check it in partition_root so perhaps fine.
ThreadCache::SwapForTesting(nullptr, root->settings.thread_cache_index);
Can you add a comment about why this exists so it doesn't confuse future readers?
ThreadCache::SwapForTesting(nullptr, another_thread_cache_index);
root1->settings.with_thread_cache = false;
Is this to properly clean up the test for destruction?
I think we should consider trying to move this into something automated.
For example, root1 comes from `CreateCustomTestRoot`, which returns a unique_ptr; we could either
1) make the PartitionRoot destructor handle it
2) Add a custom deleter to the unique_ptr that nulls out the thread cache when it goes out of scope.
Created `CreateCustomTestRootWithThreadCache`, which returns a unique_ptr with a deleter that nulls out the thread cache.
Thanks, looks a lot cleaner!
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers
2) that they have the correct pointer in their free lists? I.e., root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect, given this is a constexpr-sized array, that most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). However, in this case I suspect that when accessing at `kThreadCacheQuarantineIndex` we'll have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out, because the value never changes.
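To make the trade-off concrete, a sketch of the `std::array` variant under discussion might look like this (the declaration, names, and values are illustrative, not the CL's):
```
#include <array>
#include <cstddef>

struct ThreadCache;  // illustrative forward declaration
constexpr std::size_t kMaxThreadCacheIndex = 4;         // illustrative
constexpr std::size_t kThreadCacheQuarantineIndex = 0;  // illustrative

// With libc++ hardening, operator[] carries a bounds check, but for a
// constexpr-sized array indexed by a constant the compiler can usually
// prove the index in range and elide the check.
thread_local std::array<ThreadCache*, kMaxThreadCacheIndex> g_thread_caches{};

ThreadCache* GetQuarantineCache() {
  return g_thread_caches[kThreadCacheQuarantineIndex];
}
```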
tcache->AccumulateStats(stats);
Should this be updated to have some sort of index to track stats per thread cache?
Thank you. I added an index to the `DumpStats` arguments and made it accumulate stats across the thread caches of the same root.
Acknowledged
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect, given this is a constexpr-sized array, that most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). However, in this case I suspect that when accessing at `kThreadCacheQuarantineIndex` we'll have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out, because the value never changes.
Oh, and of course I'm not mandating it. I defer to @mikt: if there is a legit reason to suspect this is going to cause a large issue, then of course keep it as a ThreadCache*, but if benchmarks don't move we could just move it to std::array; we use std::array in plenty of locations already.
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1737a4bb310000
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
I've added this check. This should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well
Though I also see you check it in partition_root so perhaps fine.
I see. I'll allow multiple thread caches from now on, and I've fixed the checks to the following:
If the default partition uses the thread cache, index 0 will be initialized in `EnableThreadCacheIfSupported`, so I prohibit `opts.thread_cache_index == 0 && opts.thread_cache == PartitionOptions::kEnabled`.
Anyway, `thread_cache_index` should be less than `kMaxThreadCacheIndex`.
```
#if PA_BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC)
PA_CHECK(opts.thread_cache_index != 0 ||
opts.thread_cache == PartitionOptions::kDisabled)
<< "Cannot use a thread cache at default index when PartitionAlloc is "
"malloc().";
#endif
PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex)
<< "Thread cache index must be less than kMaxThreadCacheIndex";
```
ThreadCache::SwapForTesting(nullptr, root->settings.thread_cache_index);
Can you add a comment about why this exists so it doesn't confuse future readers?
Done
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers
2) that they have the correct pointer in their free lists? I.e., root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
Thank you, those checks look good for verifying the behavior of PartitionRoot and ThreadCache.
However, in `PartitionAllocTest`, `root0 = allocator.root()` does not have a thread cache; instead the default partition (of the process running the test) has the thread cache, so I found it's dangerous to use `ThreadCache::Get(0)` for this test.
So I created two roots (with index 1 and 2) and used them for the test you suggested.
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition
I filed a feature request bug to track the multiple thread cache project and put the number in the TODO (because this TODO will be handled soon):
https://g-issues.chromium.org/issues/467243745
static bool IsValid(ThreadCache* tcache) {
I found that we don't need the `tcache->root_` check if `THREAD_CACHE_FAST_TLS` is set and the tcache pointer is taken from `g_thread_caches[index]`, so I fixed these parts.
(It's possible to change `ThreadCache* g_thread_caches[kMaxThreadCacheIndex]` to a single `ThreadCache*` that holds the same pointer as `PartitionTls`, but I think the current strategy is better in that we can eliminate such checks.)
PA_CHECK(opts.thread_cache == PartitionOptions::kDisabled)
Should there be a check to see if we've registered too many thread caches?
I've added this check. This should be removed once multiple thread caches are in use.
```
PA_CHECK(opts.thread_cache_index == 0)
<< "Cannot use a non-default thread cache index.";
```
Sorry, what I meant is something like
`PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);`
Obviously == 0 is fine for now as well
Though I also see you check it in partition_root so perhaps fine.
I see. I'll allow multiple thread caches from now on, and I've fixed the checks to the following:
If the default partition uses the thread cache, index 0 will be initialized in `EnableThreadCacheIfSupported`, so I prohibit `opts.thread_cache_index == 0 && opts.thread_cache == PartitionOptions::kEnabled`.
Anyway, `thread_cache_index` should be less than `kMaxThreadCacheIndex`.
```
#if PA_BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC)
PA_CHECK(opts.thread_cache_index != 0 ||
opts.thread_cache == PartitionOptions::kDisabled)
<< "Cannot use a thread cache at default index when PartitionAlloc is "
"malloc().";
#endif
PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex)
<< "Thread cache index must be less than kMaxThreadCacheIndex";
```
Acknowledged
internal::SlotStart::Unchecked(ptr1).Untag(), bucket_index, pos1));
nit:
```
EXPECT_EQ(pos1, 0u);
EXPECT_TRUE(tcache2->IsInFreelist(
internal::SlotStart::Unchecked(ptr2).Untag(), bucket_index, pos2));
EXPECT_EQ(pos2, 0u);
```
// TODO(ayumiohno): Once `ThreadCache::largest_active_bucket_index_` becomes a
Can you file a bug (crbug/####) and put all the details there?
You can use component chromium > Blink > MemoryAllocator > Partition
I filed a feature request bug to track the multiple thread cache project and put the number in the TODO (because this TODO will be handled soon):
https://g-issues.chromium.org/issues/467243745
| Code-Review | +1 |
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect, given this is a constexpr-sized array, that most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). However, in this case I suspect that when accessing at `kThreadCacheQuarantineIndex` we'll have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out, because the value never changes.
Oh and of course not mandating it, I defer to @mikt if there is a legit reason to suspect this is going to cause a large issue then of course keep it as a ThreadCache* but if benchmarks don't move we could just move it to std::array, we use std::array in plenty of locations already.
I don't have a strong preference. Maybe the ongoing movement to rewrite raw pointers to `std::array` will rewrite this anyway.
My suggestion is to leave this as-is and let that effort do the rewrite, delegating the perf impact measurement to it.
tcaches = reinterpret_cast<ThreadCache*>(operator new(array_size));
Is this calling the `ThreadCache::operator new`? Or the global one?
It's calling the class-specific one. Should I create Alloc/Free functions and use them instead?
No, it should be fine. Thanks ;)
operator delete(tcaches);
ditto
Done
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
`PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I guess it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
(Sorry, I meant to explain this in the comments.)
root0->Free(ptr0);
root1->Free(ptr1);
Should we check that
1) the thread caches are different pointers
2) that they have the correct pointer in their free lists? I.e., root0's thread cache's freelist should have ptr0 and root1's should have ptr1?
Thank you, those checks look good for verifying the behavior of PartitionRoot and ThreadCache.
However, in `PartitionAllocTest`, `root0 = allocator.root()` does not have a thread cache; instead the default partition (of the process running the test) has the thread cache, so I found it's dangerous to use `ThreadCache::Get(0)` for this test.
So I created two roots (with index 1 and 2) and used them for the test you suggested.
Done
internal::SlotStart::Unchecked(ptr1).Untag(), bucket_index, pos1));
nit:
```
EXPECT_EQ(pos1, 0u);
EXPECT_TRUE(tcache2->IsInFreelist(
internal::SlotStart::Unchecked(ptr2).Untag(), bucket_index, pos2));
EXPECT_EQ(pos2, 0u);
```
Done, thank you
I found that we don't need the `tcache->root_` check if `THREAD_CACHE_FAST_TLS` is set and the tcache pointer is taken from `g_thread_caches[index]`, so I fixed these parts.
(It's possible to change `ThreadCache* g_thread_caches[kMaxThreadCacheIndex]` to a single `ThreadCache*` that holds the same pointer as `PartitionTls`, but I think the current strategy is better in that we can eliminate such checks.)
Done
PARTITION_ALLOC) thread_local ThreadCache* g_thread_caches
[kMaxThreadCacheIndex];
can this be a `thread_local std::array<ThreadCache>`?
Then you don't need as many PA_UNSAFE_TODO
But do we want bounds checking every time? This is fast path.
By accessing the array with [] (not .at()), can we remove PA_UNSAFE_TODO while avoiding bounds checks?
In Chrome, [] triggers bounds checking because Chromium compiles with [libc++ hardening](https://libcxx.llvm.org/Hardening.html).
I suspect, given this is a constexpr-sized array, that most if not all of the bounds checks would be optimized away.
If bounds checks show up in our performance then of course we can opt out (that is what the PA_UNSAFE_TODO/PA_UNSAFE_BUFFERS macros are for). However, in this case I suspect that when accessing at `kThreadCacheQuarantineIndex` we'll have no bounds check, and for most of the roots, especially if we assign the index in a way that is clearly valid, the checks will be optimized out, because the value never changes.
Oh, and of course I'm not mandating it. I defer to @mikt: if there is a legit reason to suspect this is going to cause a large issue, then of course keep it as a ThreadCache*, but if benchmarks don't move we could just move it to std::array; we use std::array in plenty of locations already.
I don't have a strong preference. Maybe the ongoing movement to rewrite raw pointers to `std::array` will rewrite this anyway.
My suggestion is to leave this as-is and let that effort do the rewrite, delegating the perf impact measurement to it.
Looks good, I'll leave PA_UNSAFE_TODO for now.
internal::g_thread_caches[internal::kThreadCacheTombstoneIndex]) =
Can you explain why setting this on `internal::kThreadCacheTombstoneIndex` is sufficient to mitigate the issue commented above?
`PartitionRoot::MaybeInitThreadCache`, `ForceInitThreadCache`, and `EnableThreadCacheIfSupported` are the only places where `ThreadCache::Create` is called.
In `MaybeInitThreadCache` and `ForceInitThreadCache` (which are called when newly allocating a thread cache), the `ThreadCache::IsTombstone()` check is always performed, and `IsTombstone` only looks at `internal::kThreadCacheTombstoneIndex`, so it's sufficient.
Regarding `EnableThreadCacheIfSupported`, I guess it's not called during thread/process teardown (which is why the `IsTombstone` check does not exist there), so I think it's okay.
(Sorry, I meant to explain this in the comments.)
Oh, and I've added brief comments in the code.
options.thread_cache_index = 1;
This is for test purposes.
I'll revert this change after reviews.
| Code-Review | +1 |
| Code-Review | +1 |
| Code-Review | +1 |
static bool IsValidPtr(ThreadCache* tcache) {
// Do not MTE-untag, as it'd mess up the sentinel value.
return reinterpret_cast<uintptr_t>(tcache) & kTombstoneMask;
}
static bool IsValid(ThreadCache* tcache) {
#if PA_CONFIG(THREAD_CACHE_FAST_TLS)
// `g_thread_caches[index]` has valid pointers only if the ThreadCache
// object is initialized.
return IsValidPtr(tcache);
#else
// Even if the array of ThreadCache is allocated, the ThreadCache object
// may not be initialized, and thus check `root_` to know if initialized.
return tcache && tcache->root_;
#endif
}
nit: comment difference and use cases between `IsValidPtr` and `IsValid`.
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/170d9bbcb10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10a2e6cab10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10b494eab10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1696b2d6b10000
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/108077c4b10000
static bool IsValidPtr(ThreadCache* tcache) {
// Do not MTE-untag, as it'd mess up the sentinel value.
return reinterpret_cast<uintptr_t>(tcache) & kTombstoneMask;
}
static bool IsValid(ThreadCache* tcache) {
#if PA_CONFIG(THREAD_CACHE_FAST_TLS)
// `g_thread_caches[index]` has valid pointers only if the ThreadCache
// object is initialized.
return IsValidPtr(tcache);
#else
// Even if the array of ThreadCache is allocated, the ThreadCache object
// may not be initialized, and thus check `root_` to know if initialized.
return tcache && tcache->root_;
#endif
}
nit: comment difference and use cases between `IsValidPtr` and `IsValid`.
Done
options.thread_cache_index = 1;
This is for test purposes.
I'll revert this change after reviews.
Done
Overall looks good, only very minor comments about comments.
I like the pragmatic approach. You may want to add an explanation of the per-thread overhead of this (a few kiB, I suspect, assuming the thread caches are empty, and then more as they "trap" memory, but that should still be fine).
Also, can you estimate locally how much that translates into? For the malloc() partition, since all threads are going to use malloc(), there is not much we can do. But other partitions are going to be mostly unused on most threads, I suspect, so maybe there is a future optimization where we can do something like a very simple ThreadCache instance allocator?
Anyway, that's a problem for later.
size_t thread_cache_index = 0;
nit: Can you change this into a signed value (int would be enough) and have a kInvalid set to -1?
This feels less error-prone, since 0 is a valid value here.
EnableToggle thread_cache = kDisabled;
nit: Might be a good time to put a comment here explaining that partitions with a thread cache cannot be destroyed. This is checked at runtime, but better to be explicit here as well.
// Global registry of all ThreadCache instances.
May want to update this comment. This is no longer very precise, since this is a registry of all the arrays, not all the ThreadCache instances.
I think your design is fine, but let's make sure the comments are explicit to avoid confusion.
Overall looks good, only very minor comments about comments.
I like the pragmatic approach. You may want to add an explanation of the per-thread overhead of this (a few kiB, I suspect, assuming the thread caches are empty, and then more as they "trap" memory, but that should still be fine).
Also, can you estimate locally how much that translates into? For the malloc() partition, since all threads are going to use malloc(), there is not much we can do. But other partitions are going to be mostly unused on most threads, I suspect, so maybe there is a future optimization where we can do something like a very simple ThreadCache instance allocator?
Anyway, that's a problem for later.
Thank you for the review.
The "per-thread overhead" is a bit discussed in design doc in Performance & Memory
section, but need numeric investigations.
For thread cache utilization: custom partitions are only used in the renderer process; 22 threads use both the default and buffer partitions' caches, and 25 threads only use the default partition's cache. If we enable the thread cache for more custom partitions, we might need a thread cache instance allocator as you mention.
size_t thread_cache_index = 0;
nit: Can you change this into a signed value (int would be enough) and have a kInvalid set to -1?
This feels less error-prone, since 0 is a valid value here.
I created `kInvalidThreadCacheIndex` as a size_t:
```
constexpr inline size_t kInvalidThreadCacheIndex = static_cast<size_t>(-1);
```
We have the `PA_CHECK(opts.thread_cache_index < internal::kMaxThreadCacheIndex);` check, and it detects `kInvalidThreadCacheIndex` as well.
Using a signed value complicates the casts between indexes and array sizes, so is using a big number in size_t okay?
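As a sanity check, the sentinel composes with the existing range check precisely because `static_cast<size_t>(-1)` is the maximum `size_t` value (a tiny sketch with an illustrative bound):
```
#include <cstddef>

constexpr std::size_t kInvalidThreadCacheIndex = static_cast<std::size_t>(-1);
constexpr std::size_t kMaxThreadCacheIndex = 4;  // illustrative value

// The invalid sentinel can never pass the `< kMaxThreadCacheIndex` check.
static_assert(!(kInvalidThreadCacheIndex < kMaxThreadCacheIndex),
              "the sentinel always fails the range check");
```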
nit: Might be a good time to put a comment here explaining that partitions with a thread cache cannot be destroyed. This is checked at runtime, but better to be explicit here as well.
Done
// Global registry of all ThreadCache instances.
May want to update this comment. This is no longer very precise, since this is a registry of all the arrays, not all the ThreadCache instances.
I think your design is fine, but let's make sure the comments are explicit to avoid confusion.
I didn't change the ThreadCacheRegistry implementation except that it now iterates over all the indexes in the purge/delete methods. ThreadCacheRegistry manages the ThreadCache instances of all threads "and all indexes" as a linked list.
So I think "Global registry of all ThreadCache instances" is still precise. What do you think?
options.thread_cache_index = 1;
This is for test purposes.
I'll revert this change after reviews.
Done
Marked as unresolved.