📍 Job mac-m1_mini_2020-perf/speedometer3 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/15ee5c5a510000
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
📍 Job mac-m1_mini_2020-perf/jetstream2 complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/11975d08510000
I ran some pinpoints for you, cf links in the change log of the CL.
FYI, the quicksort-wasm regression in the JetStream2 pinpoint is very likely just noise; I wouldn't worry too much about it.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I ran some pinpoints for you, cf links in the change log of the CL.
FYI, the quicksort-wasm regression in the JetStream2 pinpoint is very likely just noise; I wouldn't worry too much about it.
Hi Darius, thank you very much for that. When I open either https://pinpoint-dot-chromeperf.appspot.com/job/11975d08510000 and https://pinpoint-dot-chromeperf.appspot.com/job/15ee5c5a510000 I keep getting 403 and can't see the reports. I'm wondering if I'm missing any permission to see them, given this status code.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Caio LimaI ran some pinpoints for you, cf links in the change log of the CL.
FYI, the quicksort-wasm regression in the JetStream2 pinpoint is very likely just noise; I wouldn't worry too much about it.
Hi Darius, thank you very much for that. When I open either https://pinpoint-dot-chromeperf.appspot.com/job/11975d08510000 and https://pinpoint-dot-chromeperf.appspot.com/job/15ee5c5a510000 I keep getting 403 and can't see the reports. I'm wondering if I'm missing any permission to see them, given this status code.
Mmh, that's annoying.
I'll give you the TL;DR: seems perfectly neutral on both Speedometer3 and Jestream2. Maybe a slight regression on bomb-workers.First, but I wouldn't be surprised if this is just noise.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hey Dominik, since a lot of code that I was reading to understand how to implement this counter you were as the last author, I'm picking you as possible reviewer here. Sorry if you are not the best reviewer here.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if (origin != AllocationOrigin::kGC) {
Should this be folded into the previous conditional? If we're not on the main thread, surely this is not an allocation that could go towards the counters. Also, what kinds of allocations go to SHARED_LO_SPACE?
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if (origin != AllocationOrigin::kGC) {
Should this be folded into the previous conditional? If we're not on the main thread, surely this is not an allocation that could go towards the counters. Also, what kinds of allocations go to SHARED_LO_SPACE?
what kinds of allocations go to SHARED_LO_SPACE?
Objects that are both shared (per https://github.com/tc39/proposal-structs or https://github.com/WebAssembly/shared-everything-threads) and too large for non-LO spaces.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if (origin != AllocationOrigin::kGC) {
Jakob KummerowShould this be folded into the previous conditional? If we're not on the main thread, surely this is not an allocation that could go towards the counters. Also, what kinds of allocations go to SHARED_LO_SPACE?
what kinds of allocations go to SHARED_LO_SPACE?
Objects that are both shared (per https://github.com/tc39/proposal-structs or https://github.com/WebAssembly/shared-everything-threads) and too large for non-LO spaces.
If we fold this into the condition above, we will not count the allocations of those shared objects that Jakob mentioned. I can see that a JSSharedArray always allocate with `AllocationType::kSharedOld` to allocate its storage in `Factory::NewJSSharedArray`.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
}
If allocation was successful, we could just run `total_allocated_bytes_ += size_in_bytes` here.
if (origin != AllocationOrigin::kGC) {
We shouldn't really have large object allocations in the GC? That's probably the reason we don't have the AllocationOrigin here.
heap()->isolate()->AddTotalAllocatedBytes(object_size);
Here we probably should report back to HeapAllocator instead with `local_heap->allocator()->AddTotalAllocatedBytes(..)`. That would make this operation thread-local and the HeapAllocator would know about all allocations.
Although thinking about this more: What about adding this in HeapAllocator::AllocateRawLargeInternal? That's the bottleneck for all large allocations, so you would only need to do it once and not once per space like currently.
if (!allocator_->in_gc()) {
Can we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
uint64_t Space::GetTotalAllocatedBytesInLAA() const {
I think this method should go to HeapAllocator instead. The idea here is that the Space doesn't know about LABs anymore.
Ideally we also wouldn't report a value per Space because afaics we only care about the total number in the end, right? With `HeapAllocator::GetTotalAllocatedBytes()`, supporting background threads would be quite simple as well: `for (LocalHeap* local_heap: xx) total += local_heap->allocator()->GetTotalAllocatedBytes()` (although we would need a safepoint).
total_bytes += space_allocator->top() - space_allocator->start();
I believe as part of GetHeapStatistics/GetHeapSpaceStatistics we already free the main-thread LABs, see [0]. So in theory at least this should be always 0 here and we shouldn't need to handle this here.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Sorry, apparently I reviewed your changes but didn't submit my comments until just now. Please look over them but some of them might be either outdated or not show up at the right code location anymore.
Caio LimaI ran some pinpoints for you, cf links in the change log of the CL.
FYI, the quicksort-wasm regression in the JetStream2 pinpoint is very likely just noise; I wouldn't worry too much about it.
Darius MercadierHi Darius, thank you very much for that. When I open either https://pinpoint-dot-chromeperf.appspot.com/job/11975d08510000 and https://pinpoint-dot-chromeperf.appspot.com/job/15ee5c5a510000 I keep getting 403 and can't see the reports. I'm wondering if I'm missing any permission to see them, given this status code.
Mmh, that's annoying.
I'll give you the TL;DR: seems perfectly neutral on both Speedometer3 and Jestream2. Maybe a slight regression on bomb-workers.First, but I wouldn't be surprised if this is just noise.
If allocation was successful, we could just run `total_allocated_bytes_ += size_in_bytes` here.
I've changed the patch to do count allocation here.
if (origin != AllocationOrigin::kGC) {
We shouldn't really have large object allocations in the GC? That's probably the reason we don't have the AllocationOrigin here.
Sorry I missed this. I reverted those changes and now I'm counting on `HeapAllocator::AllocateRawLargeInternal`. Thanks a lot for pointing it out.
heap()->isolate()->AddTotalAllocatedBytes(object_size);
Here we probably should report back to HeapAllocator instead with `local_heap->allocator()->AddTotalAllocatedBytes(..)`. That would make this operation thread-local and the HeapAllocator would know about all allocations.
Although thinking about this more: What about adding this in HeapAllocator::AllocateRawLargeInternal? That's the bottleneck for all large allocations, so you would only need to do it once and not once per space like currently.
Acknowledged
if (!allocator_->in_gc()) {
Can we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
I moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
uint64_t Space::GetTotalAllocatedBytesInLAA() const {
I think this method should go to HeapAllocator instead. The idea here is that the Space doesn't know about LABs anymore.
Ideally we also wouldn't report a value per Space because afaics we only care about the total number in the end, right? With `HeapAllocator::GetTotalAllocatedBytes()`, supporting background threads would be quite simple as well: `for (LocalHeap* local_heap: xx) total += local_heap->allocator()->GetTotalAllocatedBytes()` (although we would need a safepoint).
I moved most of the logic to `HeapAllocator` instead, together with the counter. Looks like it makes more sense things be there instead of in the Isolate.
Additionally I've changed how `GetTotalAllocatedBytes` gets calculated and I'm now iterating over LocalHeaps with safepoint logic inside a `Heap::GettotalAllocatedBytes()`. Let me know if this is what you had in mind.
total_bytes += space_allocator->top() - space_allocator->start();
I believe as part of GetHeapStatistics/GetHeapSpaceStatistics we already free the main-thread LABs, see [0]. So in theory at least this should be always 0 here and we shouldn't need to handle this here.
That's true. In my current implementation I also want to take into account allocations that happens on non-GC workers (e.g compiler threads) and I'm wondering about their LABs. They seem to get their own LocalHeap and IIUC their LABs aren't freed when GetHeapStatistics gets called. Does it still make sense to keep such logic of inspecting LABs there?
Additionally, after looking deeper into the code, I think LocalHeaps aren't created for GC workers, so when I iterate over `Heap::safepoint()->IterateLocalHeaps`, I won't get any allocation for GC. Is this correct?
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
void AddTotalAllocatedBytes(size_t size) { total_allocated_bytes_ += size; }
I suppose we don't need to expose this anymore.
Caio LimaIf allocation was successful, we could just run `total_allocated_bytes_ += size_in_bytes` here.
I've changed the patch to do count allocation here.
Acknowledged
if (new_space_allocator_) {
The LABs should be all empty here, you can add DHECKs like `DCHECK(!old_space_allocator->IsLabValid())` or `DHECK_IMPLIES(new_space_allocator_, !new_space_allocator->IsLabValid()` that this indeed holds. Then this method is just `return total_allocated_bytes_` with some additional DCHECKs.
SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);
We currently don't have a safepoint in GetHeapStatistics(). I need to check whether this is okay. I would move the safepoint to GetHeapStatistics() though. Here it is a bit hidden.
if (!allocator_->in_gc()) {
Caio LimaCan we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
I moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
AdvanceAllocationObservers() doesn't feel like the right location for this. Why not add this to ResetLab() instead? In there you would simply increment your counter by `end-start`.
total_bytes += space_allocator->top() - space_allocator->start();
Caio LimaI believe as part of GetHeapStatistics/GetHeapSpaceStatistics we already free the main-thread LABs, see [0]. So in theory at least this should be always 0 here and we shouldn't need to handle this here.
That's true. In my current implementation I also want to take into account allocations that happens on non-GC workers (e.g compiler threads) and I'm wondering about their LABs. They seem to get their own LocalHeap and IIUC their LABs aren't freed when GetHeapStatistics gets called. Does it still make sense to keep such logic of inspecting LABs there?
Additionally, after looking deeper into the code, I think LocalHeaps aren't created for GC workers, so when I iterate over `Heap::safepoint()->IterateLocalHeaps`, I won't get any allocation for GC. Is this correct?
With a safepoint you can just invoke `FreeLinearAllocationAreas()` to free the LABs for background threads as well. Because we don't have a safepoint yet, we only free LABs them on the main thread atm. I don't think it's really necessary to account for `top-start` anywhere, we should just free the LABs.
In theory I guess we could avoid the safepoint but I would like to avoid that.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thank you very much for the review so far, Dominik.
SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);
We currently don't have a safepoint in GetHeapStatistics(). I need to check whether this is okay. I would move the safepoint to GetHeapStatistics() though. Here it is a bit hidden.
This is nice, mainly when combined to your comment bellow asking to always free LienarAllocationAreas.
I'm curious to understand how you'd check if it's ok to place a safepoint in `GetHeapStatistics`. I'd like to understand it and it's likely that I misunderstand how safepoints work. My understanding so far is that when I call `SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);`, it will wait for all workers to report `Safepoint()` and Park their thread until `~SafepointScope` is called. Once all workers for this Isolate are parked, the rest of the code continues and I can be sure that when calling `local_heap->allocator()->GetTotalAllocatedBytes()` I won't get a race. Is it correct?
Based on that, I thought that moving it to `GetHeapStatistics` should just work, but from your comment, I'm overlooking something else.
if (!allocator_->in_gc()) {
Caio LimaCan we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
Dominik InführI moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
AdvanceAllocationObservers() doesn't feel like the right location for this. Why not add this to ResetLab() instead? In there you would simply increment your counter by `end-start`.
The issue that I'm getting here is because by the end of the `AdvanceAllocationObservers` we call `MarkLabInitialized`[1]. The `AllocationInfo` get's its `start` changed to `top`, which makes all calculation after `AdvanceAllocationObservers` miss some bytes. I've got this problem while implementing my patch, since I had some accounting happening after `AdvanceAllocationObservers`. Given that, if I place the logic to count on ResetLab, I'll get 0 in when this path is executed here[2], for example. That's the reason why I initially implemented the `AddTotalAllocatedBytes` in three places before, and always calling before `AdvanceAllocationObservers`. I can do it in `RestLab` if this isn't a blocker, but I'm not sure why we call a `MarkLabStartInitialized` by the end of `AdvanceAllocationObservers`. Do you know why it's called there? TBH, I'm quite in favor to move that from there, because it took me quite a while to debug why my counter was quite low and I was very surprised to see that `AdvanceAllocationObservers` changes `AllocationInfo`'s start.
[1] -https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=130
[2]- https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=526
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);
Caio LimaWe currently don't have a safepoint in GetHeapStatistics(). I need to check whether this is okay. I would move the safepoint to GetHeapStatistics() though. Here it is a bit hidden.
This is nice, mainly when combined to your comment bellow asking to always free LienarAllocationAreas.
I'm curious to understand how you'd check if it's ok to place a safepoint in `GetHeapStatistics`. I'd like to understand it and it's likely that I misunderstand how safepoints work. My understanding so far is that when I call `SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);`, it will wait for all workers to report `Safepoint()` and Park their thread until `~SafepointScope` is called. Once all workers for this Isolate are parked, the rest of the code continues and I can be sure that when calling `local_heap->allocator()->GetTotalAllocatedBytes()` I won't get a race. Is it correct?
Based on that, I thought that moving it to `GetHeapStatistics` should just work, but from your comment, I'm overlooking something else.
I think I've found an issue here. From this comment[1], `GetHeapStatistics` not necessarily comes from main thread, and this would make the `SafepointScope` fail `DCHECK(LocalHeap::Current()->is_main_thread());` when it's called out of main thread. It seems a breaking change if it's required that `GetHeapStatistics` needs to be called from Main Thread. I see a couple of directions here:
1. For our use cases, it's probably fine to only count allocations that happens in main thread. This way, we would keep this new field to `HeapStatistics`. However I feel that this isn't very nice approach, because I think the API extension should not be specialized for those use cases.
2. We could actually change the API a bit and create a public `Isolate::GetTotalAllocatedBytes` where it's required to be called from main thread. This probably would allow the use Safepoint and the counting should work fine.
I prefer the second option, because it seems a better way to extend the API supporting allocations from workers and not introducing any break changes in current API.
SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);
Caio LimaWe currently don't have a safepoint in GetHeapStatistics(). I need to check whether this is okay. I would move the safepoint to GetHeapStatistics() though. Here it is a bit hidden.
Caio LimaThis is nice, mainly when combined to your comment bellow asking to always free LienarAllocationAreas.
I'm curious to understand how you'd check if it's ok to place a safepoint in `GetHeapStatistics`. I'd like to understand it and it's likely that I misunderstand how safepoints work. My understanding so far is that when I call `SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);`, it will wait for all workers to report `Safepoint()` and Park their thread until `~SafepointScope` is called. Once all workers for this Isolate are parked, the rest of the code continues and I can be sure that when calling `local_heap->allocator()->GetTotalAllocatedBytes()` I won't get a race. Is it correct?
Based on that, I thought that moving it to `GetHeapStatistics` should just work, but from your comment, I'm overlooking something else.
I think I've found an issue here. From this comment[1], `GetHeapStatistics` not necessarily comes from main thread, and this would make the `SafepointScope` fail `DCHECK(LocalHeap::Current()->is_main_thread());` when it's called out of main thread. It seems a breaking change if it's required that `GetHeapStatistics` needs to be called from Main Thread. I see a couple of directions here:
1. For our use cases, it's probably fine to only count allocations that happens in main thread. This way, we would keep this new field to `HeapStatistics`. However I feel that this isn't very nice approach, because I think the API extension should not be specialized for those use cases.
2. We could actually change the API a bit and create a public `Isolate::GetTotalAllocatedBytes` where it's required to be called from main thread. This probably would allow the use Safepoint and the counting should work fine.I prefer the second option, because it seems a better way to extend the API supporting allocations from workers and not introducing any break changes in current API.
No, sorry. You are not overlooking anything. I was just wondering whether it is okay to do this because of performance - not because of correctness. At some point iirc we unintentionally had such calls during benchmarks. Safepointing shouldn't be too costly but I still wanted to check.
I think we already rely on GetHeapStatistics only being used from the main thread (see [0]). So adding a safepoint there shouldn't be an issue from that point-of-view. Having the safepoint in GetHeapStatistics would make this a lot simpler, we could then simply free all LABs (not just main thread) and simply sum up all per-thread counters (no atomics or synchronization required).
if (origin != AllocationOrigin::kGC) {
Caio LimaWe shouldn't really have large object allocations in the GC? That's probably the reason we don't have the AllocationOrigin here.
Sorry I missed this. I reverted those changes and now I'm counting on `HeapAllocator::AllocateRawLargeInternal`. Thanks a lot for pointing it out.
Acknowledged
if (!allocator_->in_gc()) {
Caio LimaCan we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
Dominik InführI moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
Caio LimaAdvanceAllocationObservers() doesn't feel like the right location for this. Why not add this to ResetLab() instead? In there you would simply increment your counter by `end-start`.
The issue that I'm getting here is because by the end of the `AdvanceAllocationObservers` we call `MarkLabInitialized`[1]. The `AllocationInfo` get's its `start` changed to `top`, which makes all calculation after `AdvanceAllocationObservers` miss some bytes. I've got this problem while implementing my patch, since I had some accounting happening after `AdvanceAllocationObservers`. Given that, if I place the logic to count on ResetLab, I'll get 0 in when this path is executed here[2], for example. That's the reason why I initially implemented the `AddTotalAllocatedBytes` in three places before, and always calling before `AdvanceAllocationObservers`. I can do it in `RestLab` if this isn't a blocker, but I'm not sure why we call a `MarkLabStartInitialized` by the end of `AdvanceAllocationObservers`. Do you know why it's called there? TBH, I'm quite in favor to move that from there, because it took me quite a while to debug why my counter was quite low and I was very surprised to see that `AdvanceAllocationObservers` changes `AllocationInfo`'s start.
[1] -https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=130
[2]- https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=526
AllocationObservers and their accounting is indeed quite tricky. For better or worse it uses the size of [start,top] for accounting. If I could redo this from scratch, it would probably look quite differently today. That's why I was thinking about using ResetLab() for this feature instead. That method should be called every time we set up a LAB. At that point we still have start==top in the MainAllocator's `allocation_info_`. The boundaries are passed to `ResetLab(start, end)`, so we can use them to directly bump the counter there before even filling the LAB.
E.g. I believe this should already work:
```
void MainAllocator::ResetLab(..) {
...
total_allocated_bytes_ += (end-start);
allocation_info().Reset(start, end);
....
}
```
I have to admit that this has the disadvantage that we might overapproximate allocations a bit if we don't end up using the whole LAB but in relation to all the other flakiness/background threads/etc. it should still be negligible relative to how much is allocated in general. I would prefer going with this first to land the CL sooner. And if necessary we could refine this as a follow-up.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
void AddTotalAllocatedBytes(size_t size) { total_allocated_bytes_ += size; }
I suppose we don't need to expose this anymore.
I'm not sure if I missed anything here, but since I'm counting from `MainAllocator::ResetLab` and the counter is in `HeapAllocator`, I think I'd still need it here, no?
The LABs should be all empty here, you can add DHECKs like `DCHECK(!old_space_allocator->IsLabValid())` or `DHECK_IMPLIES(new_space_allocator_, !new_space_allocator->IsLabValid()` that this indeed holds. Then this method is just `return total_allocated_bytes_` with some additional DCHECKs.
Acknowledged
SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);
Caio LimaWe currently don't have a safepoint in GetHeapStatistics(). I need to check whether this is okay. I would move the safepoint to GetHeapStatistics() though. Here it is a bit hidden.
Caio LimaThis is nice, mainly when combined to your comment bellow asking to always free LienarAllocationAreas.
I'm curious to understand how you'd check if it's ok to place a safepoint in `GetHeapStatistics`. I'd like to understand it and it's likely that I misunderstand how safepoints work. My understanding so far is that when I call `SafepointScope safe_scope(isolate_, SafepointKind::kIsolate);`, it will wait for all workers to report `Safepoint()` and Park their thread until `~SafepointScope` is called. Once all workers for this Isolate are parked, the rest of the code continues and I can be sure that when calling `local_heap->allocator()->GetTotalAllocatedBytes()` I won't get a race. Is it correct?
Based on that, I thought that moving it to `GetHeapStatistics` should just work, but from your comment, I'm overlooking something else.
Dominik InführI think I've found an issue here. From this comment[1], `GetHeapStatistics` not necessarily comes from main thread, and this would make the `SafepointScope` fail `DCHECK(LocalHeap::Current()->is_main_thread());` when it's called out of main thread. It seems a breaking change if it's required that `GetHeapStatistics` needs to be called from Main Thread. I see a couple of directions here:
1. For our use cases, it's probably fine to only count allocations that happens in main thread. This way, we would keep this new field to `HeapStatistics`. However I feel that this isn't very nice approach, because I think the API extension should not be specialized for those use cases.
2. We could actually change the API a bit and create a public `Isolate::GetTotalAllocatedBytes` where it's required to be called from main thread. This probably would allow the use Safepoint and the counting should work fine.I prefer the second option, because it seems a better way to extend the API supporting allocations from workers and not introducing any break changes in current API.
No, sorry. You are not overlooking anything. I was just wondering whether it is okay to do this because of performance - not because of correctness. At some point iirc we unintentionally had such calls during benchmarks. Safepointing shouldn't be too costly but I still wanted to check.
I think we already rely on GetHeapStatistics only being used from the main thread (see [0]). So adding a safepoint there shouldn't be an issue from that point-of-view. Having the safepoint in GetHeapStatistics would make this a lot simpler, we could then simply free all LABs (not just main thread) and simply sum up all per-thread counters (no atomics or synchronization required).
Acknowledged
Jakob KummerowShould this be folded into the previous conditional? If we're not on the main thread, surely this is not an allocation that could go towards the counters. Also, what kinds of allocations go to SHARED_LO_SPACE?
Caio Limawhat kinds of allocations go to SHARED_LO_SPACE?
Objects that are both shared (per https://github.com/tc39/proposal-structs or https://github.com/WebAssembly/shared-everything-threads) and too large for non-LO spaces.
If we fold this into the condition above, we will not count the allocations of those shared objects that Jakob mentioned. I can see that a JSSharedArray always allocate with `AllocationType::kSharedOld` to allocate its storage in `Factory::NewJSSharedArray`.
Done
if (!allocator_->in_gc()) {
Caio LimaCan we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
Dominik InführI moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
Caio LimaAdvanceAllocationObservers() doesn't feel like the right location for this. Why not add this to ResetLab() instead? In there you would simply increment your counter by `end-start`.
Dominik InführThe issue that I'm getting here is because by the end of the `AdvanceAllocationObservers` we call `MarkLabInitialized`[1]. The `AllocationInfo` get's its `start` changed to `top`, which makes all calculation after `AdvanceAllocationObservers` miss some bytes. I've got this problem while implementing my patch, since I had some accounting happening after `AdvanceAllocationObservers`. Given that, if I place the logic to count on ResetLab, I'll get 0 in when this path is executed here[2], for example. That's the reason why I initially implemented the `AddTotalAllocatedBytes` in three places before, and always calling before `AdvanceAllocationObservers`. I can do it in `RestLab` if this isn't a blocker, but I'm not sure why we call a `MarkLabStartInitialized` by the end of `AdvanceAllocationObservers`. Do you know why it's called there? TBH, I'm quite in favor to move that from there, because it took me quite a while to debug why my counter was quite low and I was very surprised to see that `AdvanceAllocationObservers` changes `AllocationInfo`'s start.
[1] -https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=130
[2]- https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=526
AllocationObservers and their accounting is indeed quite tricky. For better or worse it uses the size of [start,top] for accounting. If I could redo this from scratch, it would probably look quite differently today. That's why I was thinking about using ResetLab() for this feature instead. That method should be called every time we set up a LAB. At that point we still have start==top in the MainAllocator's `allocation_info_`. The boundaries are passed to `ResetLab(start, end)`, so we can use them to directly bump the counter there before even filling the LAB.
E.g. I believe this should already work:
```
void MainAllocator::ResetLab(..) {
...
total_allocated_bytes_ += (end-start);
allocation_info().Reset(start, end);
....
}
```I have to admit that this has the disadvantage that we might overapproximate allocations a bit if we don't end up using the whole LAB but in relation to all the other flakiness/background threads/etc. it should still be negligible relative to how much is allocated in general. I would prefer going with this first to land the CL sooner. And if necessary we could refine this as a follow-up.
I've implemented and tested this version. From the toy example I'm using here, the overestimation is about ~1%, which seems ok.
total_bytes += space_allocator->top() - space_allocator->start();
Caio LimaI believe as part of GetHeapStatistics/GetHeapSpaceStatistics we already free the main-thread LABs, see [0]. So in theory at least this should be always 0 here and we shouldn't need to handle this here.
Dominik InführThat's true. In my current implementation I also want to take into account allocations that happens on non-GC workers (e.g compiler threads) and I'm wondering about their LABs. They seem to get their own LocalHeap and IIUC their LABs aren't freed when GetHeapStatistics gets called. Does it still make sense to keep such logic of inspecting LABs there?
Additionally, after looking deeper into the code, I think LocalHeaps aren't created for GC workers, so when I iterate over `Heap::safepoint()->IterateLocalHeaps`, I won't get any allocation for GC. Is this correct?
With a safepoint you can just invoke `FreeLinearAllocationAreas()` to free the LABs for background threads as well. Because we don't have a safepoint yet, we only free LABs them on the main thread atm. I don't think it's really necessary to account for `top-start` anywhere, we should just free the LABs.
In theory I guess we could avoid the safepoint but I would like to avoid that.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
void AddTotalAllocatedBytes(size_t size) { total_allocated_bytes_ += size; }
Caio LimaI suppose we don't need to expose this anymore.
I'm not sure if I missed anything here, but since I'm counting from `MainAllocator::ResetLab` and the counter is in `HeapAllocator`, I think I'd still need it here, no?
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
In general this CL already LGTM but my colleagues prefer the non-safepoint variant. Added a few comments.
i::SafepointKind safepoint_kind = i_isolate->is_shared_space_isolate()
Sorry to have misled but my colleagues would prefer to keep the safepoint out of this method. It should not be too hard to change this though.
void AddTotalAllocatedBytes(size_t size) { total_allocated_bytes_ += size; }
This one now needs to increment the counter in heap. This should be something like `heap()->total_allocated_bytes.fetch_add(size, std::memory_order_relaxed)`.
safepoint()->IterateLocalHeaps([&](LocalHeap* local_heap) {
Let's add an atomic Heap::total_allocated_bytes_ field and return that field relaxed here.
if (!allocator_->in_gc()) {
Caio LimaCan we do that in MainAllocator directly? E.g. in ResetLab(). That way we wouldn't need to do this 3x but only once. The "AllocatorPolicy" stuff should really be only for logic in the MainAllocator that's space-dependent.
Dominik InführI moved this to `MainAllocator::AdvanceAllocationObservers`. I placed there because it calls `MainAllocator::MarkLabStartInitialized` and it resets AllocationInfo's start to top. So when I do `allocator_->top() - allocator_->start()` inside `ResetLab` it quite often results in 0.
Caio LimaAdvanceAllocationObservers() doesn't feel like the right location for this. Why not add this to ResetLab() instead? In there you would simply increment your counter by `end-start`.
Dominik InführThe issue that I'm getting here is because by the end of the `AdvanceAllocationObservers` we call `MarkLabInitialized`[1]. The `AllocationInfo` get's its `start` changed to `top`, which makes all calculation after `AdvanceAllocationObservers` miss some bytes. I've got this problem while implementing my patch, since I had some accounting happening after `AdvanceAllocationObservers`. Given that, if I place the logic to count on ResetLab, I'll get 0 in when this path is executed here[2], for example. That's the reason why I initially implemented the `AddTotalAllocatedBytes` in three places before, and always calling before `AdvanceAllocationObservers`. I can do it in `RestLab` if this isn't a blocker, but I'm not sure why we call a `MarkLabStartInitialized` by the end of `AdvanceAllocationObservers`. Do you know why it's called there? TBH, I'm quite in favor to move that from there, because it took me quite a while to debug why my counter was quite low and I was very surprised to see that `AdvanceAllocationObservers` changes `AllocationInfo`'s start.
[1] -https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=130
[2]- https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/main-allocator.cc;l=526
Caio LimaAllocationObservers and their accounting is indeed quite tricky. For better or worse it uses the size of [start,top] for accounting. If I could redo this from scratch, it would probably look quite differently today. That's why I was thinking about using ResetLab() for this feature instead. That method should be called every time we set up a LAB. At that point we still have start==top in the MainAllocator's `allocation_info_`. The boundaries are passed to `ResetLab(start, end)`, so we can use them to directly bump the counter there before even filling the LAB.
E.g. I believe this should already work:
```
void MainAllocator::ResetLab(..) {
...
total_allocated_bytes_ += (end-start);
allocation_info().Reset(start, end);
....
}
```I have to admit that this has the disadvantage that we might overapproximate allocations a bit if we don't end up using the whole LAB but in relation to all the other flakiness/background threads/etc. it should still be negligible relative to how much is allocated in general. I would prefer going with this first to land the CL sooner. And if necessary we could refine this as a follow-up.
I've implemented and tested this version. From the toy example I'm using here, the overestimation is about ~1%, which seems ok.
Great!
i::SafepointKind safepoint_kind = i_isolate->is_shared_space_isolate()
Sorry to have misled but my colleagues would prefer to keep the safepoint out of this method. It should not be too hard to change this though.
Acknowledged
safepoint()->IterateLocalHeaps([&](LocalHeap* local_heap) {
Let's add an atomic Heap::total_allocated_bytes_ field and return that field relaxed here.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
void AddTotalAllocatedBytes(size_t size) { total_allocated_bytes_ += size; }
This one now needs to increment the counter in heap. This should be something like `heap()->total_allocated_bytes.fetch_add(size, std::memory_order_relaxed)`.
Acknowledged
uint64_t Space::GetTotalAllocatedBytesInLAA() const {
I think this method should go to HeapAllocator instead. The idea here is that the Space doesn't know about LABs anymore.
Ideally we also wouldn't report a value per Space because afaics we only care about the total number in the end, right? With `HeapAllocator::GetTotalAllocatedBytes()`, supporting background threads would be quite simple as well: `for (LocalHeap* local_heap: xx) total += local_heap->allocator()->GetTotalAllocatedBytes()` (although we would need a safepoint).
I moved most of the logic to `HeapAllocator` instead, together with the counter. Looks like it makes more sense things be there instead of in the Isolate.
Additionally I've changed how `GetTotalAllocatedBytes` gets calculated and I'm now iterating over LocalHeaps with safepoint logic inside a `Heap::GettotalAllocatedBytes()`. Let me know if this is what you had in mind.
Acknowledged
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Code-Review | +1 |
// concurrent optimizations, this seems tolerable.
Nit: "concurrent optimization" (compilation to generated code) doesn't allocate much, so maybe not the best example. Let's just say that the leftover unused part of the LAB is quite small compared to the LAB itself.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks a lot for the review.
Could someone trigger pinpoints to get numbers from JetStream and Speedometer? From my local tests, there's no regression on benchmarks, but it would be good to get those numbers from bots as well.
Also, looks like a need another +1 from reviewers, but I'm not sure who can I ask for review here. Do you have any pointer?
Nit: "concurrent optimization" (compilation to generated code) doesn't allocate much, so maybe not the best example. Let's just say that the leftover unused part of the LAB is quite small compared to the LAB itself.
What I had in mind here is an optimization like allocation sink. This can also make the total allocated bytes fluctuate a bit. But I've changed the comment since it seems better suited.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
@mlippautz: Can you ptal as well? We need an additional (api) reviewer.
- Total allocation counter is stored in each HeapAllocator, to make the counter thread local
Nit: Please also add some line breaks to stay within bounds.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I've added another reviewer. Bots results look good as well - within usual fluctuation (either neutral or slightly positive).
(1) https://pinpoint-dot-chromeperf.appspot.com/job/10226588d10000
(2) https://pinpoint-dot-chromeperf.appspot.com/job/11b13c6b510000.
Code-Review | +1 |
lgtm
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
- Total allocation counter is stored in each HeapAllocator, to make the counter thread local
Nit: Please also add some line breaks to stay within bounds.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
It looks like I need the r+ from Dominik again, because I changed the comment in one in last Patchset. Also since I'm not a committer, I can't ask for CQ to land it yet. Could you help me on that, please?
Code-Review | +1 |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Commit-Queue | +2 |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
[api] Adding total allocated bytes in HeapStatistics
This change exposes total allocated bytes in v8::HeapStatistics API by
introducing a new total_allocated_bytes() method that tracks all heap
allocations since an Isolate creation.
The implementation adds:
- uint64_t total_allocated_bytes_ field to HeapStatistics.
- An atomic total allocation counter is stored in the Heap class.
- The counter is incremented whenever a RestLab is called. This approach can overestimate the total allocation for cases where the LAB is not fully used, but the leftover compared to the LAB itself is quite small, so it seems tolerable.
Design doc reference:
https://docs.google.com/document/d/1O4JPsoaxTQsX_7T5Fz4rsGeHMiM16jUrvDuq9FrtbNM
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |