Aldo CulquicondorCan you add a commit message? It might be good to just make this CL focus on the bfcache bits which is the majority of the CL, and then actually enable the feature in tests in another CL.
Yeah... I was still in prototyping mode. Split into a few CLs now.
// this time, as it would violate the invariant that frames entering the
// BFCache can be restored.Aldo CulquicondorI think this can mention the case that Alex mentioned in the other thread, where restoring would set the lifecycle state to active again for all frames, which we don't want here.
Reworded
if (rfh && rfh->lifecycle_state() == LifecycleStateImpl::kActive) {Rakina Zata AmniI'm not sure we can do this, since it'll mean that some subframes of a bfcached RFH might be in pending deletion state, rather than in kInBackForwardCache, which breaks an existing invariant. It also means that if we were to restore a bfcached RFH before those subframes get a chance to complete unloading, these subframes may get restored with an kActive state, potentially going from kPendingDeletion to kActive, which is doubly weird and will probably fail our invariants on allowed state transitions.
@rak...@chromium.org do you know what the expected behavior is if we try to bfcache a RFH that has some subframes in pending deletion state? Do we actually allow that, or is this supposed to not be eligible for bfcache? It would seem that we should disallow it, because we assume the cached RFH and all its subframes will be restored in the same kActive state [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18337;drc=a416f377ead3d8c361b97126daf78381344937dd).
And also, do you know if anything around bfcache eligibility or RFH unloading has changed in the past week or so? This CL used to be green a few days ago, but started failing on GpuNormalTermination_NewWebGLNotBlocked (see failures in PS5). (The feature here postpones subframe RFH destruction by posting a task to do it, which means that even subframe RFHs without unload handlers might now be in pending deletion state until that task runs).
Aldo CulquicondorOh that's an interesting case! I don't think we disallow BFCache right now when that happens, but probably we should make that make the page ineligible, as it might break some more invariants as you point out. I'm not aware of any recent BFCache changes, but hopefully you can just fix this by adding a new check for the lifecycyle state and add a new NotRestoredReason for the pending deletion case somewhere [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/back_forward_cache_impl.cc;l=1117;drc=7189da78436702acd8f20043b0f394b04f588093).
Aldo CulquicondorThanks Rakina, it looks like we reached similar conclusions.
Indeed, I am puzzled as to why this only started failing on Friday, even though I've been running these tests for weeks now. And it's failing very consistently. I don't see any code that has recently changed or any new experiment enabled in field trials. The only exception is this cleanup that I already tried reverting https://chromium-review.googlesource.com/c/chromium/src/+/7064190, but it still fails.
I agree with the proposed solution of making the page ineligible. Is this a temporary state? Would the page be considered for storage again once the pending subframes are deleted?
Alex MoshchukAlternatively, what if I change [this loop](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18336;drc=a416f377ead3d8c361b97126daf78381344937dd) to skip RFHs that are kPendingDeletion or kRunningUnloadHandlers?
Rakina Zata AmniI agree with the proposed solution of making the page ineligible. Is this a temporary state? Would the page be considered for storage again once the pending subframes are deleted?
Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.
Alternatively, what if I change this loop to skip RFHs that are kPendingDeletion or kRunningUnloadHandlers?
While that could work, I think it would mean that we need to start supporting some new frame states (a bfcached main frame having children that are pending deletion, which could then be activated). I'm a bit worried that this will make it harder to reason about navigation invariants. Making the page ineligible for bfcache in this state seems like a simpler fix, if it works.
One more question here for Rakina: presumably we need to run pagehide handlers on a page entering bfcache. Does pagehide get to run for subframes too? Do we wait for it to finish before updating any browser process state on the bfcached page? (I wonder if the roundtrip to run pagehide would imply that any existing pending deletion subframes will finish running their unload handlers and get destroyed.)
Aldo Culquicondorpresumably we need to run pagehide handlers on a page entering bfcache. Does pagehide get to run for subframes too? Do we wait for it to finish before updating any browser process state on the bfcached page? (I wonder if the roundtrip to run pagehide would imply that any existing pending deletion subframes will finish running their unload handlers and get destroyed.)
We do run pagehide on the subframes, but that is triggered on the same time we set the page lifecycle state to frozen on the renderer (so after we call DidCommitand put the page in bfcache on the browser side). Is it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
Alex MoshchukIs it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
There are no guarantees as to when the deletions will run. In the long run, we might want to delay them until a moment of quiescence (nothing is loading and there is no input). They are still immediately deleted if the WebContents is destroyed.
Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.
I prototyped the proposed solution and added a test. The test reveals that the main RFH is deleted when not allowed to enter the cache. Is this because of how I constructed the test or would this always happen? Could this cause a regression on BFCache hits?
Rakina Zata AmniIs it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
There are no guarantees as to when the deletions will run. In the long run, we might want to delay them until a moment of quiescence (nothing is loading and there is no input). They are still immediately deleted if the WebContents is destroyed.
Interesting, so I'm guessing that without your new kDelayRfhDestructionsOnUnloadAndDetach feature, the IPC to trigger unload handlers in the subframe would've necessarily been processed by the renderer prior to processing the IPC to run pagehide handlers, meaning that by the time we finished running pagehide handlers after entering bfcache, the browser probably finished removing any pending deletion subframes. So it was probably difficult/impossible to restore a page from bfcache which still had pending deletion subframes.
But if we want to delay the posted task for completing the RFH destruction, the duration of the pending deletion state can get longer and it becomes more probable that we could restore a page from bfcache which still had pending deletion subframes - so yeah, this just confirms that we probably don't want to allow pending deletion subframes on pages entering the bfcache in the first place.
> Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.I prototyped the proposed solution and added a test. The test reveals that the main RFH is deleted when not allowed to enter the cache. Is this because of how I constructed the test or would this always happen? Could this cause a regression on BFCache hits?
Yes, I think it's expected that the old main RFH would be destroyed if it doesn't enter bfcache.
It's true that this could affect bfcache hit rate. Rakina, do you think that can be a problem? I hope it's a rare enough case, but we should have metrics to observe the impact of it, and if it turns out to be larger than expected, we can revisit one of the alternatives (e.g., add the special casing to allow pending delete frames to enter and leave the cache while keeping their state).
Aldo CulquicondorYeah I think it's fine to add this case, I'm assuming it would be rare, and you can add "BackForwardCache.HistoryNavigationOutcome" to your histograms list when experimenting to watch for the impact on BFCache there. cc-ing @fer...@chromium.org just in case though.
Acknowledged. I will keep the histogram in mind.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
This approach LGTM if rakina@ and fergal@ are ok with it, with some comments below.
A possibly simpler alternative to the 10s delay (which might be flaky on some slower bots) is using [DoNotDeleteForTesting](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=2716;drc=8900162810d457b64286ce653cb61bc2009c5068) on the subframe's RFH here. Then we also may not need the PostDelayedTask and the delay feature param.
// 1) Delete RFH.minor nit: insert "subframe"
}Is there a way to revert these numerous bracket fixes? It's made it difficult to focus on the actual changes for reviewing, and it'd be nice to avoid polluting the blame history with your username in all these places. :)
// reached
// the pending commit stage.nit: fix wrapping
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Should this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I didn't look at the full details of this but I am fine with us blocking BFCache if a frame is in `kRunningUnloadHandlers" or `kPendingDeletion` as long as we are mindful of its impact on the BFCache hit-rate.
That said, what bad thing happens if we just allow the page to be BFCached?
A page cannot enter BFCache if any subframes have `unload` handlers. When considering BFCache, `pagehide` and `visibilitychange` handlers are run *before* making the final decision on BFCache. So I don't think it's possible to be in `kRunningUnloadHandlers` AND be going into BFCache.
So we would
So our invariant would be that a frame in `kInBackForewardCache` can have subframes that are in `kInBackForewardCache` or `kPendingDeletion`.
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Should this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
I believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
| Auto-Submit | +1 |
I didn't look at the full details of this but I am fine with us blocking BFCache if a frame is in `kRunningUnloadHandlers" or `kPendingDeletion` as long as we are mindful of its impact on the BFCache hit-rate.
That said, what bad thing happens if we just allow the page to be BFCached?
A page cannot enter BFCache if any subframes have `unload` handlers. When considering BFCache, `pagehide` and `visibilitychange` handlers are run *before* making the final decision on BFCache. So I don't think it's possible to be in `kRunningUnloadHandlers` AND be going into BFCache.
So we would
- Wait for the non-unload handlers to complete (as we already do).
- Leave the lifecycle state of the subframe as `kPendingDeletion` on entering BFCache.
- Leave the lifecycle state of the subframe as `kPendingDeletion` on restoring from BFCache.
- Allow cleanup of pending-deletion frames while in BFCache or after restore.
So our invariant would be that a frame in `kInBackForewardCache` can have subframes that are in `kInBackForewardCache` or `kPendingDeletion`.
Thanks Fergal, that was exactly my other suggestion. I think we can start with what we have and, if we see a drop in the hit rate, we can implement this other approach.
A possibly simpler alternative to the 10s delay (which might be flaky on some slower bots) is using [DoNotDeleteForTesting](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=2716;drc=8900162810d457b64286ce653cb61bc2009c5068) on the subframe's RFH here. Then we also may not need the PostDelayedTask and the delay feature param.
Ah, easy to miss that method in such a big class 😊
In any case, I want to keep the param. I think it would be valuable to have some value in the study from the beginning.
// 1) Delete RFH.Aldo Culquicondorminor nit: insert "subframe"
Done
Is there a way to revert these numerous bracket fixes? It's made it difficult to focus on the actual changes for reviewing, and it'd be nice to avoid polluting the blame history with your username in all these places. :)
Oops, I didn't realize there were so many. The new tool I'm using, jj, doesn't know yet how to apply format to the changed lines only. I created a new CL just for that https://crrev.com/c/7086847
// reached
// the pending commit stage.Aldo Culquicondornit: fix wrapping
Done
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Fergal DalyShould this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
I believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Aldo CulquicondorA possibly simpler alternative to the 10s delay (which might be flaky on some slower bots) is using [DoNotDeleteForTesting](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=2716;drc=8900162810d457b64286ce653cb61bc2009c5068) on the subframe's RFH here. Then we also may not need the PostDelayedTask and the delay feature param.
Ah, easy to miss that method in such a big class 😊
In any case, I want to keep the param. I think it would be valuable to have some value in the study from the beginning.
It actually doesn't work, because the check happens before we set the status to ready to be deleted https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=5815;drc=6699d721a3a4738c9c21ee86f9c77bde0ebe88f1
I'll just use a very long time.
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Fergal DalyShould this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
Aldo CulquicondorI believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
Added `CHECK_NE`.
The `CHECK_NE` actually fails in existing tests (with my feature disabled) https://chromium-review.googlesource.com/c/chromium/src/+/7028393/14?tab=checks
Even though the test artificially makes the unload handlers slower, I think nothing prevents the main frame from entering the BFCache, other than this setting https://source.chromium.org/chromium/chromium/src/+/main:content/browser/site_per_process_unload_browsertest.cc;l=1753;drc=dc95463e2c8910a667aa570372b9d7ca275ce23d
So, could it happen in production? Should I implement the other approach?
// 1) Detach subframe RFH, but prevent its deletion.nit: maybe point out that the deletion is prevented by kDelayRfhDestructionsOnUnloadAndDetach with an artificically long delay.
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Fergal DalyShould this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
Aldo CulquicondorI believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
Aldo CulquicondorAdded `CHECK_NE`.
The `CHECK_NE` actually fails in existing tests (with my feature disabled) https://chromium-review.googlesource.com/c/chromium/src/+/7028393/14?tab=checks
Even though the test artificially makes the unload handlers slower, I think nothing prevents the main frame from entering the BFCache, other than this setting https://source.chromium.org/chromium/chromium/src/+/main:content/browser/site_per_process_unload_browsertest.cc;l=1753;drc=dc95463e2c8910a667aa570372b9d7ca275ce23d
So, could it happen in production? Should I implement the other approach?
That's interesting that the CHECK_NE fails. Fergal, where is the code that ensures that a page can't enter BFCache if one of its subframes is currently executing an unload/pagehide/visibilitychange handler as part of getting destroyed (i.e., it actually is in the kRunningUnloadHandlers state)? Maybe looking at that code might help figure out how the CHECK_NE can fail in practice.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Aldo CulquicondorI didn't look at the full details of this but I am fine with us blocking BFCache if a frame is in `kRunningUnloadHandlers" or `kPendingDeletion` as long as we are mindful of its impact on the BFCache hit-rate.
That said, what bad thing happens if we just allow the page to be BFCached?
A page cannot enter BFCache if any subframes have `unload` handlers. When considering BFCache, `pagehide` and `visibilitychange` handlers are run *before* making the final decision on BFCache. So I don't think it's possible to be in `kRunningUnloadHandlers` AND be going into BFCache.
So we would
- Wait for the non-unload handlers to complete (as we already do).
- Leave the lifecycle state of the subframe as `kPendingDeletion` on entering BFCache.
- Leave the lifecycle state of the subframe as `kPendingDeletion` on restoring from BFCache.
- Allow cleanup of pending-deletion frames while in BFCache or after restore.
So our invariant would be that a frame in `kInBackForewardCache` can have subframes that are in `kInBackForewardCache` or `kPendingDeletion`.
Thanks Fergal, that was exactly my other suggestion. I think we can start with what we have and, if we see a drop in the hit rate, we can implement this other approach.
Unless it's really hard, I would very much prefer to not introduce a new blocking reason. Even if things look good now, we're leaving a door open. All we need is for ads or google tag manager to start doing something like deleting an iframe in their `pagehide` handler to suddenly have this impact BFCache for a non-trivial % of pages.
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Fergal DalyShould this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
Aldo CulquicondorI believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
Aldo CulquicondorAdded `CHECK_NE`.
Alex MoshchukThe `CHECK_NE` actually fails in existing tests (with my feature disabled) https://chromium-review.googlesource.com/c/chromium/src/+/7028393/14?tab=checks
Even though the test artificially makes the unload handlers slower, I think nothing prevents the main frame from entering the BFCache, other than this setting https://source.chromium.org/chromium/chromium/src/+/main:content/browser/site_per_process_unload_browsertest.cc;l=1753;drc=dc95463e2c8910a667aa570372b9d7ca275ce23d
So, could it happen in production? Should I implement the other approach?
That's interesting that the CHECK_NE fails. Fergal, where is the code that ensures that a page can't enter BFCache if one of its subframes is currently executing an unload/pagehide/visibilitychange handler as part of getting destroyed (i.e., it actually is in the kRunningUnloadHandlers state)? Maybe looking at that code might help figure out how the CHECK_NE can fail in practice.
I was mistaken. Despite what it says in the start of the comment on [GetCurrentBackForwardCacheEligibility](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/back_forward_cache_impl.h;l=241;drc=c96a878b1cb45f60aac2285ffbdbd6b53dc92415), we call this at the point where `pagehide` handlers are still running. However at that point, the state is `kActive` not `kIsRunningUnloadHandlers`.
So the `CHECK_NE` is violated by the case where a subframe is part-way through deleting when we enter BFCache. This is `DoesNotCacheSubframeReadyToBeDeleted` which you are adding and perhaps a couple of other tests hitting it accidentally.
So I think we already have an existing bad case where if we have a subframe with a slow `pagehide` handler being deleted from page which then goes into BFCache, the submframe would go into `kIsRunningUnloadHandlers` and from there go to `kInBackForwardCache` and then even worse, potentially go to `kActive`.
So
We have visitors this week, with a workshop for the next couple of days. Do you think you could add a test for the above "slow pagehide" case too?
We could fix it by adding a blocking reason or by making it work.