| Auto-Submit | +1 |
Aldo CulquicondorI didn't look at the full details of this but I am fine with us blocking BFCache if a frame is in `kRunningUnloadHandlers" or `kPendingDeletion` as long as we are mindful of its impact on the BFCache hit-rate.
That said, what bad thing happens if we just allow the page to be BFCached?
A page cannot enter BFCache if any subframes have `unload` handlers. When considering BFCache, `pagehide` and `visibilitychange` handlers are run *before* making the final decision on BFCache. So I don't think it's possible to be in `kRunningUnloadHandlers` AND be going into BFCache.
So we would
- Wait for the non-unload handlers to complete (as we already do).
- Leave the lifecycle state of the subframe as `kPendingDeletion` on entering BFCache.
- Leave the lifecycle state of the subframe as `kPendingDeletion` on restoring from BFCache.
- Allow cleanup of pending-deletion frames while in BFCache or after restore.
So our invariant would be that a frame in `kInBackForewardCache` can have subframes that are in `kInBackForewardCache` or `kPendingDeletion`.
Fergal DalyThanks Fergal, that was exactly my other suggestion. I think we can start with what we have and, if we see a drop in the hit rate, we can implement this other approach.
Aldo CulquicondorUnless it's really hard, I would very much prefer to not introduce a new blocking reason. Even if things look good now, we're leaving a door open. All we need is for ads or google tag manager to start doing something like deleting an iframe in their `pagehide` handler to suddenly have this impact BFCache for a non-trivial % of pages.
It wasn't that hard 😊
PTAL
I think I can fix the test by adding an observer for the lifecycle state change in the subframe. But since the timezones are so tricky, PTAL even if there are failures, for the general direction of the change.
// 1) Detach subframe RFH, but prevent its deletion.Aldo Culquicondornit: maybe point out that the deletion is prevented by kDelayRfhDestructionsOnUnloadAndDetach with an artificically long delay.
Done
RenderFrameHostImpl::LifecycleStateImpl::kReadyToBeDeleted) {Fergal DalyShould this use IsPendingDeletion() which would include LifecycleStateImpl::kRunningUnloadHandlers too? I feel like kRunningUnloadHandlers has the same problem where the frame is pending async deletion and we don't want to later restore it as an active frame.
Aldo CulquicondorI believe `kRunningUnloadHandlers` means we are not going to BFCache. I think that could be better documented [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.h;l=1375;drc=fdfda9825390eca49ac469aecd3d80a689eef23c) because when `pagehide` and `visibilitychange` handlers run in the BFCacheable case, we do not enter this state. There is [no allowed transition](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18258;drc=5f6ce76066823f714b109b61aafc16ab29d598eb) from `kRunningUnloadHandlers` to anything except deletion.
So it kinda makes sense to me to also include that case, however I think if we are in `kRunningUnloadHandlers` then I think we are already blocked from BFCache by having unload handlers.
So perhaps a `CHECK_NE` is actually the appropriate thing.
Aldo CulquicondorAdded `CHECK_NE`.
Alex MoshchukThe `CHECK_NE` actually fails in existing tests (with my feature disabled) https://chromium-review.googlesource.com/c/chromium/src/+/7028393/14?tab=checks
Even though the test artificially makes the unload handlers slower, I think nothing prevents the main frame from entering the BFCache, other than this setting https://source.chromium.org/chromium/chromium/src/+/main:content/browser/site_per_process_unload_browsertest.cc;l=1753;drc=dc95463e2c8910a667aa570372b9d7ca275ce23d
So, could it happen in production? Should I implement the other approach?
Fergal DalyThat's interesting that the CHECK_NE fails. Fergal, where is the code that ensures that a page can't enter BFCache if one of its subframes is currently executing an unload/pagehide/visibilitychange handler as part of getting destroyed (i.e., it actually is in the kRunningUnloadHandlers state)? Maybe looking at that code might help figure out how the CHECK_NE can fail in practice.
Aldo CulquicondorI was mistaken. Despite what it says in the start of the comment on [GetCurrentBackForwardCacheEligibility](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/back_forward_cache_impl.h;l=241;drc=c96a878b1cb45f60aac2285ffbdbd6b53dc92415), we call this at the point where `pagehide` handlers are still running. However at that point, the state is `kActive` not `kIsRunningUnloadHandlers`.
So the `CHECK_NE` is violated by the case where a subframe is part-way through deleting when we enter BFCache. This is `DoesNotCacheSubframeReadyToBeDeleted` which you are adding and perhaps a couple of other tests hitting it accidentally.
So I think we already have an existing bad case where if we have a subframe with a slow `pagehide` handler being deleted from page which then goes into BFCache, the submframe would go into `kIsRunningUnloadHandlers` and from there go to `kInBackForwardCache` and then even worse, potentially go to `kActive`.
So
- This CHECK_NE is not correct.
- I think we already have a problem.
- Your work is widening this window for this race.
We have visitors this week, with a workshop for the next couple of days. Do you think you could add a test for the above "slow pagehide" case too?
We could fix it by adding a blocking reason or by making it work.
Implemented skipping recovery of terminating subframes.
if (rfh && rfh->lifecycle_state() == LifecycleStateImpl::kActive) {Rakina Zata AmniI'm not sure we can do this, since it'll mean that some subframes of a bfcached RFH might be in pending deletion state, rather than in kInBackForwardCache, which breaks an existing invariant. It also means that if we were to restore a bfcached RFH before those subframes get a chance to complete unloading, these subframes may get restored with an kActive state, potentially going from kPendingDeletion to kActive, which is doubly weird and will probably fail our invariants on allowed state transitions.
@rak...@chromium.org do you know what the expected behavior is if we try to bfcache a RFH that has some subframes in pending deletion state? Do we actually allow that, or is this supposed to not be eligible for bfcache? It would seem that we should disallow it, because we assume the cached RFH and all its subframes will be restored in the same kActive state [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18337;drc=a416f377ead3d8c361b97126daf78381344937dd).
And also, do you know if anything around bfcache eligibility or RFH unloading has changed in the past week or so? This CL used to be green a few days ago, but started failing on GpuNormalTermination_NewWebGLNotBlocked (see failures in PS5). (The feature here postpones subframe RFH destruction by posting a task to do it, which means that even subframe RFHs without unload handlers might now be in pending deletion state until that task runs).
Aldo CulquicondorOh that's an interesting case! I don't think we disallow BFCache right now when that happens, but probably we should make that make the page ineligible, as it might break some more invariants as you point out. I'm not aware of any recent BFCache changes, but hopefully you can just fix this by adding a new check for the lifecycyle state and add a new NotRestoredReason for the pending deletion case somewhere [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/back_forward_cache_impl.cc;l=1117;drc=7189da78436702acd8f20043b0f394b04f588093).
Aldo CulquicondorThanks Rakina, it looks like we reached similar conclusions.
Indeed, I am puzzled as to why this only started failing on Friday, even though I've been running these tests for weeks now. And it's failing very consistently. I don't see any code that has recently changed or any new experiment enabled in field trials. The only exception is this cleanup that I already tried reverting https://chromium-review.googlesource.com/c/chromium/src/+/7064190, but it still fails.
I agree with the proposed solution of making the page ineligible. Is this a temporary state? Would the page be considered for storage again once the pending subframes are deleted?
Alex MoshchukAlternatively, what if I change [this loop](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=18336;drc=a416f377ead3d8c361b97126daf78381344937dd) to skip RFHs that are kPendingDeletion or kRunningUnloadHandlers?
Rakina Zata AmniI agree with the proposed solution of making the page ineligible. Is this a temporary state? Would the page be considered for storage again once the pending subframes are deleted?
Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.
Alternatively, what if I change this loop to skip RFHs that are kPendingDeletion or kRunningUnloadHandlers?
While that could work, I think it would mean that we need to start supporting some new frame states (a bfcached main frame having children that are pending deletion, which could then be activated). I'm a bit worried that this will make it harder to reason about navigation invariants. Making the page ineligible for bfcache in this state seems like a simpler fix, if it works.
One more question here for Rakina: presumably we need to run pagehide handlers on a page entering bfcache. Does pagehide get to run for subframes too? Do we wait for it to finish before updating any browser process state on the bfcached page? (I wonder if the roundtrip to run pagehide would imply that any existing pending deletion subframes will finish running their unload handlers and get destroyed.)
Aldo Culquicondorpresumably we need to run pagehide handlers on a page entering bfcache. Does pagehide get to run for subframes too? Do we wait for it to finish before updating any browser process state on the bfcached page? (I wonder if the roundtrip to run pagehide would imply that any existing pending deletion subframes will finish running their unload handlers and get destroyed.)
We do run pagehide on the subframes, but that is triggered on the same time we set the page lifecycle state to frozen on the renderer (so after we call DidCommitand put the page in bfcache on the browser side). Is it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
Alex MoshchukIs it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
There are no guarantees as to when the deletions will run. In the long run, we might want to delay them until a moment of quiescence (nothing is loading and there is no input). They are still immediately deleted if the WebContents is destroyed.
Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.
I prototyped the proposed solution and added a test. The test reveals that the main RFH is deleted when not allowed to enter the cache. Is this because of how I constructed the test or would this always happen? Could this cause a regression on BFCache hits?
Rakina Zata AmniIs it guaranteed that the pending deletion subframes would've gotten deleted after a roundtrip IPC, or is it possible for the delayed deletion to be longer?
There are no guarantees as to when the deletions will run. In the long run, we might want to delay them until a moment of quiescence (nothing is loading and there is no input). They are still immediately deleted if the WebContents is destroyed.
Interesting, so I'm guessing that without your new kDelayRfhDestructionsOnUnloadAndDetach feature, the IPC to trigger unload handlers in the subframe would've necessarily been processed by the renderer prior to processing the IPC to run pagehide handlers, meaning that by the time we finished running pagehide handlers after entering bfcache, the browser probably finished removing any pending deletion subframes. So it was probably difficult/impossible to restore a page from bfcache which still had pending deletion subframes.
But if we want to delay the posted task for completing the RFH destruction, the duration of the pending deletion state can get longer and it becomes more probable that we could restore a page from bfcache which still had pending deletion subframes - so yeah, this just confirms that we probably don't want to allow pending deletion subframes on pages entering the bfcache in the first place.
> Yes, I'd expect that once any pending deletion subframes are deleted, the page would become eligible for bfcache again.I prototyped the proposed solution and added a test. The test reveals that the main RFH is deleted when not allowed to enter the cache. Is this because of how I constructed the test or would this always happen? Could this cause a regression on BFCache hits?
Yes, I think it's expected that the old main RFH would be destroyed if it doesn't enter bfcache.
It's true that this could affect bfcache hit rate. Rakina, do you think that can be a problem? I hope it's a rare enough case, but we should have metrics to observe the impact of it, and if it turns out to be larger than expected, we can revisit one of the alternatives (e.g., add the special casing to allow pending delete frames to enter and leave the cache while keeping their state).
Aldo CulquicondorYeah I think it's fine to add this case, I'm assuming it would be rare, and you can add "BackForwardCache.HistoryNavigationOutcome" to your histograms list when experimenting to watch for the impact on BFCache there. cc-ing @fer...@chromium.org just in case though.
Aldo CulquicondorAcknowledged. I will keep the histogram in mind.
Resolving
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Auto-Submit | +1 |
| Commit-Queue | +1 |
I think I can fix the test by adding an observer for the lifecycle state change in the subframe. But since the timezones are so tricky, PTAL even if there are failures, for the general direction of the change.
This didn't work on Android. See comment below.
"f.parentNode.removeChild(f);"));using child_rfh->DoNotDeleteForTesting() and checking for kRunningUnloadHandlers didn't work on Android, even if I added an observer. Any ideas why?
This is still a valuable test, I think, but any problems would only show as flakiness and not reliable failures.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
"f.parentNode.removeChild(f);"));using child_rfh->DoNotDeleteForTesting() and checking for kRunningUnloadHandlers didn't work on Android, even if I added an observer. Any ideas why?
This is still a valuable test, I think, but any problems would only show as flakiness and not reliable failures.
I'm guessing that on Android, a.com and b.com will end up in the same process due to no site isolation, so I think the detach and the running of unload handlers will all happen on the renderer side before the browser is notified - so effectively we'll just end up detaching the subframe that already ran its unload handlers, [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=5834-5835;drc=0a4bbd7daba492283f6b453c0b0a7d934f2b2ecd). (If the b.com subframe had any grandchild frames in OOPIFs, those would enter the kRunningUnloadHandlers check at this point, but there aren't any.)
You could do something like this to ensure that a.com and b.com end up in different processes, which would be needed for the browser process to drive the unload handler execution here: https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl_browsertest.cc;l=7982-7985;drc=1f707187191ead988b5bc4eb3442569051344acc
Ack. Given that we already have a problem in the case where a page is entering bfcache while some of its subframes are pending deletion, seems ok to modify the state change logic so that at least those subframes won't get restored as `kActive`.
// getting resored from the BFCache."restored"
| Code-Review | +1 |
// The frame is been restored from the BackForwardCache.has?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks!
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Auto-Submit | +1 |
Alex Moshchukusing child_rfh->DoNotDeleteForTesting() and checking for kRunningUnloadHandlers didn't work on Android, even if I added an observer. Any ideas why?
This is still a valuable test, I think, but any problems would only show as flakiness and not reliable failures.
I'm guessing that on Android, a.com and b.com will end up in the same process due to no site isolation, so I think the detach and the running of unload handlers will all happen on the renderer side before the browser is notified - so effectively we'll just end up detaching the subframe that already ran its unload handlers, [here](https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl.cc;l=5834-5835;drc=0a4bbd7daba492283f6b453c0b0a7d934f2b2ecd). (If the b.com subframe had any grandchild frames in OOPIFs, those would enter the kRunningUnloadHandlers check at this point, but there aren't any.)
You could do something like this to ensure that a.com and b.com end up in different processes, which would be needed for the browser process to drive the unload handler execution here: https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl_browsertest.cc;l=7982-7985;drc=1f707187191ead988b5bc4eb3442569051344acc
That was it, and forcing site-isolation worked 😊
// The frame is been restored from the BackForwardCache.Aldo Culquicondorhas?
This is called by `BackForwardCacheImpl::RestoreEntry`, so `is` makes sense.
// getting resored from the BFCache.Aldo Culquicondor"restored"
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Thanks, LGTM with a couple of last fixes below.
{"b.com"});Looks like you need to do this after starting the embedded_test_server on the next line, as this can actually navigate. (Hopefully this fixes the current trybot failures)
// The frame is been restored from the BackForwardCache.nit: "being" to keep the grammar working.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Auto-Submit | +1 |
Looks like you need to do this after starting the embedded_test_server on the next line, as this can actually navigate. (Hopefully this fixes the current trybot failures)
I also needed an extra check in `RenderFrameHostImpl::OnSubframeDeletionUnloadTimeout`
// The frame is been restored from the BackForwardCache.nit: "being" to keep the grammar working.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
{"b.com"});Aldo CulquicondorLooks like you need to do this after starting the embedded_test_server on the next line, as this can actually navigate. (Hopefully this fixes the current trybot failures)
I also needed an extra check in `RenderFrameHostImpl::OnSubframeDeletionUnloadTimeout`
That seems fine. An alternative might've been to use [SetSubframeUnloadTimeoutForTesting]( https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl_browsertest.cc;l=5082;drc=1f707187191ead988b5bc4eb3442569051344acc) with a large value, but this seems cleaner.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Auto-Submit | +1 |
{"b.com"});Aldo CulquicondorLooks like you need to do this after starting the embedded_test_server on the next line, as this can actually navigate. (Hopefully this fixes the current trybot failures)
Alex MoshchukI also needed an extra check in `RenderFrameHostImpl::OnSubframeDeletionUnloadTimeout`
That seems fine. An alternative might've been to use [SetSubframeUnloadTimeoutForTesting]( https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/render_frame_host_impl_browsertest.cc;l=5082;drc=1f707187191ead988b5bc4eb3442569051344acc) with a large value, but this seems cleaner.
My approach didn't work because there is a test that depends on deletion to happen in that exact scenario. We probably should change that test, but I think I can follow up on that.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
24 is the latest approved patch-set.
The change was submitted with unreviewed changes in the following files:
```
The name of the file: content/browser/renderer_host/render_frame_host_impl.cc
Insertions: 0, Deletions: 3.
@@ -12305,9 +12305,6 @@
void RenderFrameHostImpl::OnSubframeDeletionUnloadTimeout() {
DCHECK(IsPendingDeletion());
- if (do_not_delete_for_testing_) {
- return;
- }
MaybeLogMissingUnloadAck();
parent_->RemoveChild(GetFrameTreeNodeForUnload());
}
```
```
The name of the file: content/browser/back_forward_cache_basics_browsertest.cc
Insertions: 2, Deletions: 0.
@@ -7,6 +7,7 @@
#include "base/strings/string_number_conversions.h"
#include "base/strings/stringprintf.h"
#include "base/test/bind.h"
+#include "base/time/time.h"
#include "content/browser/back_forward_cache_browsertest.h"
#include "content/browser/web_contents/web_contents_impl.h"
#include "content/common/content_navigation_policy.h"
@@ -1480,6 +1481,7 @@
// 1) Detach subframe RFH, but act as as if there was an infinite unload
// handler in the OOPIF.
child_rfh->DoNotDeleteForTesting();
+ child_rfh->SetSubframeUnloadTimeoutForTesting(base::Hours(1));
EXPECT_TRUE(ExecJs(main_rfh.get(),
"var f = document.querySelector('iframe');"
"f.parentNode.removeChild(f);"));
```
Avoid restoring subframes pending deletion from BFCache
When DelayRfhDestructionsOnUnloadAndDetach is enabled or when unload
handlers are delayed, it is possible that some subframes are pending
deletion at the time when the main frame is considered for the BFCache.
The main frame can be admitted, but the subframes should never be
restored.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |