On Tue, Aug 27, 2024 at 4:01 PM TAMURA, Kent <tk...@chromium.org> wrote:On Thu, Aug 15, 2024 at 5:44 PM Michael Lippautz <mlip...@chromium.org> wrote:Thanks for keeping performance a top priority here.On Thu, Aug 15, 2024 at 3:04 AM TAMURA, Kent <tk...@chromium.org> wrote:Thank you for the feedback!I reconsider alternative options.Alternative 1: Adding minimal CHECKs to the current iteratorI tried to add CHECKs to * -> and [] operators.
44.679 ms +0.1% 47.839 ms +0.4% 24.150 ms +0.3% 78.013 ms +0.3% 87.927 ms +0.4% 74.510 ms +0.3% 34.905 ms -0.1% 82.670 ms +0.9% 25.561 -0.3% 33.982 ms +0.7% 28.726 ms +0.1% 49.120 ms +0.1% 57.223 ms +0.7% 120.165 ms +0.4% 22.915 ms +0.7% 15.526 ms -0.1% 34.814 ms -0.4% 35.314 ms +0.1% 13.690 ms +0.4% 24.184 ms +0.1% 20.882 ms +0.3%IMO, this performance result is not acceptable.This is not great as is. However, have you tried a local build w/ PGO/LTO as well?For better or worse, this is an area where PGO/LTO and good inlining changes profiles. In the past I noticed that a bunch of base::span CHECKs that would show up in profiles for plain release builds that were all gone on PGO/LTO.I know it's a bit cumbersome to experiment with locally but it may allow you to ship the checked iterators everywhere. E.g., Oilpan pointer compression had >1% regression (on Speedometer2) without PGO that was almost (~0.2%) gone when compiling with PGO.We don't have high repetition counts on our PGO perf bots though (resource constraints), so we would need to verify locally that PGO indeed helps here.Thank you for the suggestion! I tested PGO builds on MacBook Pro (M1 Pro). I ran Speedometer3 128 times for each configuration."base": ToT, PGO build with locally-generated profile
"checked": The CL is applied, PGO build with locally-generated profileI think the result below is still unacceptable unfortunately.
42.833 ms +1.0% 41.956 ms +1.7% 22.060 ms +1.3% 71.760 ms +0.2% 71.426 ms -0.4% 56.899 ms +0.2% 31.001 ms +1.7% 72.251 ms +1.8% 31.505 -1.0% 29.201 ms +1.1% 19.864 ms +2.0% 35.568 ms +0.7% 45.357 ms +0.8% 96.584 ms +0.6% 17.555 ms +1.5% 11.695 ms +2.3% 31.544 ms +0.1% 29.516 ms +0.8% 10.727 ms +0.7% 18.643 ms +1.8% 13.499 ms +0.2%Alternative 2: Keep the current unsafe iterator as is, but Blink developers can avoid using the unsafe iteratorC++ compilers produce code calling `Vector::begin()` `Vector::end()` `iterator::operator++` and `iterator::operator*` for range-based `for` loops, and we don't want to add CHECKs to them to keep the current performance. However, Blink developers should use a safer iterator in hand-written code. So,
- Don't change Vector::begin(), Vector::end(), and the current iterator implementation
- Add Vector::CheckedBegin() and Vector::CheckedEnd(), which return base::CheckedContiguousIterator instances. Blink developers should use them in hand-written code.
- Other methods taking an iterator argument or returning an iterator are not changed. We can have implicit converters between the unsafe iterator and base::CheckedContiguousIterator.
Blink developers replace `begin()` with `CheckedBegin() to resolve unsafe-buffer-manipulation warnings for existing code.How big is the fallout of changes required here? I personally find DX okay here as it's all guided by warnings.I guess we can always exempt certain hot code in case it comes up on profiles.Currently 764 files in third_party/blink/renderer have "#pragma allow_unsafe_buffers". Not all of them are due to WTF::Vector iterator, so I think the number of files involved is much less than 764. We can proceed this incrementally, as a part of the spanification effort.If no one objects, I'll proceed with this approach.// Code producing a unsafe-buffer-manipulation warningfor (auto it = v.begin(); it != v.end(); ++it) {if (it->...) {v.erase(it);break;}}// Resolve the warning by CheckedBegin() and CheckedEnd()On Wed, Aug 14, 2024 at 3:04 PM Michael Lippautz <mlip...@chromium.org> wrote:On Wed, Aug 14, 2024 at 6:01 AM Kentaro Hara <har...@chromium.org> wrote:Thanks Kent-san for the idea!I do understand the motivation but I have a few concerns about introducing yet another type of Vector: 1) the extra type regresses developer ergonomics, 2) it's not obvious when developers should use Vector vs. UnsafeVector; e.g., even if it doesn't regress Speedometer3, it may regress the real-world LCP because # of elements stored in the Vector will depend on the workload.+1, Speedometer is a proxy for performance and we should not just work around it.Also, wrt to 1) we are still violating one of our key principles with GC in HeapVector (the type allows on-stack and on-heap usage). We wanted to harden this case many years ago (because it causes real confusion and discussions) but held off on it because of not wanting to introduce any additional types around Vector.My concerns are gone if we can reduce the performance overhead of Vector enough to say "You can use Vector by default. Only when it is extremely performance-critical, please use UnsafeVector -- the usage should be limited".+1 (see below)A few questions:- What is our plan for iterators of std::vector, std::map etc?- Can we consider a (less qualified but) more lightweight check? e.g., only check whether |it| is not pointing to the end element before incrementing. I think @Daniel Cheng experimented with this idea in the past maybe?On Wed, Aug 14, 2024 at 10:48 AM TAMURA, Kent <tk...@chromium.org> wrote:Hi WTF owners,As a part of Spanification project, I'd like to do the following changes:* Rename all existing Vector and HeapVector types to UnsafeVector and HeapUnsafeVector.* Introduce Vector and HeapVector, which use base::CheckedContiguousIterator.* Switch back UnsafeVector/HeapUnsafeVector instances to the new Vector/HeapVector if they are not performance-critical.Isn't one of the promises that spans and friends with their checked ops are free if they are used e.g. range-based for loops when then the compiler can prove that there's no OOB? Shouldn't this be our regular case?I am concerned about switching something as broadly used as Vector to a different version to only incrementally convert it back then. I am a bit surprised that a checked iterator here would actually regress anything because we already have a CHECK() in at() that is also used by operator[].Do you have any feedback on this plan?Background:Iterators of WTF::Vector are almost raw pointers, and even code like `++it` triggers unsafe-buffer-operation errors. Using base::CheckedContiguousIterator resolves the compilation errors, but I confirmed it regressed Speedometer3. So we should have Vector with the current unsafe iterators and Vector with base::CheckedContiguousIterator, and should use the former only in hot code paths.--TAMURA Kent
Software Engineer, Google--Kentaro Hara, Tokyo--TAMURA Kent
Software Engineer, Google--TAMURA Kent
Software Engineer, Google--TAMURA Kent
Software Engineer, Google
Speedometer score should not be changed.
If a CL using CheckedBegin()/CheckedEnd() regressed Speedometer, it should be reverted to begin()/end() with UNSAFE_BUFFERS().
Speedometer score should not be changed.Would you elaborate on why you can say this?If a CL using CheckedBegin()/CheckedEnd() regressed Speedometer, it should be reverted to begin()/end() with UNSAFE_BUFFERS().My concern is that a small regression might accumulate to a significant regression. I want to see some evidence to believe that we can end up with a state where most begin()/end() are replaced with CheckedBegin()/CheckedEnd() without a significant regression.
+1 to Dana's question about PGO (and probably double-checking that). I have seen a lot of iterator CHECKs being optimized away on PGO (yay!) which is why I asked.
On Tue, Sep 3, 2024 at 10:50 AM Kentaro Hara <har...@chromium.org> wrote:Speedometer score should not be changed.Would you elaborate on why you can say this?If a CL using CheckedBegin()/CheckedEnd() regressed Speedometer, it should be reverted to begin()/end() with UNSAFE_BUFFERS().My concern is that a small regression might accumulate to a significant regression. I want to see some evidence to believe that we can end up with a state where most begin()/end() are replaced with CheckedBegin()/CheckedEnd() without a significant regression.+1 about the incremental regressions creeping in that we cannot fix as they don't show up.
On Wed, Sep 4, 2024 at 2:35 AM Michael Lippautz <mlip...@chromium.org> wrote:+1 to Dana's question about PGO (and probably double-checking that). I have seen a lot of iterator CHECKs being optimized away on PGO (yay!) which is why I asked.I generated profile data after building a binary with the change, and rebuilt a binary with the profile data.On Tue, Sep 3, 2024 at 10:50 AM Kentaro Hara <har...@chromium.org> wrote:Speedometer score should not be changed.Would you elaborate on why you can say this?If a CL using CheckedBegin()/CheckedEnd() regressed Speedometer, it should be reverted to begin()/end() with UNSAFE_BUFFERS().My concern is that a small regression might accumulate to a significant regression. I want to see some evidence to believe that we can end up with a state where most begin()/end() are replaced with CheckedBegin()/CheckedEnd() without a significant regression.+1 about the incremental regressions creeping in that we cannot fix as they don't show up.Ok, I understand your concern.If we changed begin()/end() so that they return base::CheckedContiguousIterator, we would have 0.2% regression of Speedometer3 score. The impact of the latest proposal would be smaller than it because we would update a part of begin()/end() usages.