Auditing on WebContentsImpl::FromRenderFrameHostID callers.

62 views
Skip to first unread message

Yoshisato Yanagisawa

unread,
Oct 19, 2021, 6:02:28 AM10/19/21
to navigat...@chromium.org
Hi,

As a part of https://bugs.chromium.org/p/chromium/issues/detail?id=1259521, I did auditing on WebContentsImpl::FromRenderFrameHostID callers.  (I want to thank Tal for helping me to understand the issue itself, and helping me for auditing)

tl;dr there are 5 unsure cases who use WebContents instance stored with DownloadItemUtils::AttachInfo.

WebContentsImpl::FromRenderFrameHostID has 9 callers.  It is difficult for me to explain general tendency of the code because it scatters.  Let me break them down:
  • 1 calls an overloaded WebContentsImpl::FromRenderFrameHostId.  We can ignore this.
  • 3 indirect usages of WebContentsImpl instance got by the method.  We need investigation on such indirect callers.
    • 1 returns WebContents instance got in the function. (*1)
    • 2 stores WebContents instance to SupportsUserData with a static string key via DownloadItemUtils::AttachInfo. (*2)
  • 1 unsure case.  I made it easily found with https://chromium-review.googlesource.com/c/chromium/src/+/3218840.
  • 1 confirmed deprecation.  It is the code inside AppCache.
  • 3 safe cases.
Let me break down (*1) and (*2).
(*1) https://source.chromium.org/chromium/chromium/src/+/main:content/browser/storage_partition_impl.cc;l=419;drc=e882b8e4a8272f65cb14c608d3d2bc4f0512aa20?q=FromRenderFrameHostID&ss=chromium%2Fchromium%2Fsrc has 5 callers, all of them might be fine.  They use WebContents to handle per tab data not per-frame data.  All of callers are in the same file because it is calling an anonymous function.

There are 21 callers, and 16 of them looks fine.  I was not sure for following 5 GetWebContents callers:
I have updated rows between 9969 and 10003 to explain this auditing in https://docs.google.com/spreadsheets/d/1q6Cs8rebBo__KVb8DUTeg2hxRfrAkfu1e9MMj5c7WvM/view#gid=0.

I want somebody to take a look at the cases I was not sure.  Cases I was not sure often handles refer chain, and I am not sure it s valid use of web_contents or not.

Thank you,
Yoshisato.


Kevin McNee

unread,
Oct 19, 2021, 2:02:57 PM10/19/21
to Yoshisato Yanagisawa, navigation-dev
The first one appears to largely be for showing prompts to the user which is likely fine. One of the callers is for logging a console message, which could perhaps be better targeted to a specific frame/document, but wouldn't be a major issue.

In the case of DownloadItemUtils::AttachInfo, it looks like the callers with a non-null WebContents also have a RFH id, so it looks like we could store a GlobalRenderFrameHostId with the DownloadItemData*, if any callers of DownloadItemUtils::GetWebContents need per-RFH info. Of the ones you've listed, just accessing the Profile is fine, the one showing a dialog is likely fine, the plugin one should probably use RFH. I'm not sure about the referrer chain.

* Or possibly store the GlobalRenderFrameHostId in lieu of the WC.

--
You received this message because you are subscribed to the Google Groups "navigation-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to navigation-de...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/navigation-dev/CAPNB-6XD70E_UH4UyKKEJ9cHj%3DbfHsJQtmBjggrzANS9YutrhQ%40mail.gmail.com.

Yoshisato Yanagisawa

unread,
Oct 19, 2021, 11:03:02 PM10/19/21
to Kevin McNee, navigation-dev
Thank you for the advice,

2021年10月20日(水) 3:02 Kevin McNee <mc...@chromium.org>:
The first one appears to largely be for showing prompts to the user which is likely fine. One of the callers is for logging a console message, which could perhaps be better targeted to a specific frame/document, but wouldn't be a major issue.

I feel I failed to catch up on this.  Do you mean this?
If yes, thank you for verifying.
 
In the case of DownloadItemUtils::AttachInfo, it looks like the callers with a non-null WebContents also have a RFH id, so it looks like we could store a GlobalRenderFrameHostId with the DownloadItemData*, if any callers of DownloadItemUtils::GetWebContents need per-RFH info. Of the ones you've listed, just accessing the Profile is fine, the one showing a dialog is likely fine, the plugin one should probably use RFH. I'm not sure about the referrer chain.


* Or possibly store the GlobalRenderFrameHostId in lieu of the WC.

This sounds like a broad change.  Can I ask you to tell me more on why storing the GlobalRenderFrameHostId is better than storing the WC?
 
Thank you,
Yoshisato.

Kevin McNee

unread,
Oct 20, 2021, 12:38:22 PM10/20/21
to Yoshisato Yanagisawa, navigation-dev
Yes, that matches my understanding.

Regarding WC vs GlobalRenderFrameHostId, the RFH id would be a more specific attribution of what triggered a download. And if you have a RFH, you could look up its WC. Although replacing wouldn't work if this code relies on the download item outliving the RFH, since then the RFH->WC lookup couldn't be done. In that case, we'd need the download item data to have both the WC and the RFH id.

Yoshisato Yanagisawa

unread,
Oct 20, 2021, 10:56:55 PM10/20/21
to Kevin McNee, navigation-dev
Thanks for the reply,
Then the storing WebContents instance here sounds like a legacy when WebCountents was tightly coupled with FrameTreeNode.
We might need to make DownloadItemData::Attach to take GlobalRenderFrameHostId in addition to WebContents and BrowserContext, and prefer to use WebContens coming from GlobalRenderFrameHostId if it is available.  Since it requires always looking up for getting WebContents from GlobalRenderFrameHostId, I am not sure it is acceptable from a performance perspective.

I am new to the team and do not know the procedure, but how do people usually proceed with these kinds of changes?  Do you usually write a simple design / planning doc etc?  Or, just sending CL to do the change is fine?

2021年10月21日(木) 1:38 Kevin McNee <mc...@chromium.org>:

Kevin McNee

unread,
Oct 21, 2021, 12:36:04 PM10/21/21
to Yoshisato Yanagisawa, navigation-dev
If DownloadItemData will store both WebContents and GlobalRenderFrameHostId, then doing the lookup wouldn't really be necessary, since RFHs don't change WebContents. If you like, you could DCHECK that the lookup would have produced the same WebContents.

As for whether to write a design doc, since these appear to be cases of https://bugs.chromium.org/p/chromium/issues/detail?id=1061899 , I think referring to that in a CL description along with describing how adding the RFH id to download item data addresses the attribution issues should be enough.

Yoshisato Yanagisawa

unread,
Oct 24, 2021, 9:32:23 PM10/24/21
to Kevin McNee, navigation-dev
Thanks for the reply, but to be honest, I am feeling confused about your answer.
I was thinking of the flow like:
1. Both WebContents and GlobalRenderFrameHostId are stored with DownloadItemData::Attach.
2. In case of GetWebContents call, WebContents will be obtained from stored GlobalRenderFrameHostId.
3. Step 2. failed to get the WebContents, GetWebContents returns WebContents stored in Step 1.

If WebContents stored in the Step 1 and WebContents obtained from GlobalRenderFrameHostId always match, why do we need to store RFH id in addition to WC?  I may misunderstand what you meant with "In that case, we'd need the download item data to have both the WC and the RFH id.", though.

As far as I understand, the linked crbug might want to prevent people from using GetMainFrame to get the active RFH.  To make this CL aligned with the crbug, I wonder if we can keep GetWebContents function itself because it may lead to misuse of WebContents by getting RFH from given WebContents.  With the scenario, GetWebContents function will be deprecated and GetGlobalRenderFrameHostId function would be newly made.  Callers of GetGlobalRenderFrameHostId can convert returned RFH id to WC if they want.  In that case, we break code that relies on the download item outliving the RFH.

Can I ask again what is expected to be stored with DownloadItemData::Attach and why?
Thank you,
Yoshisato.

2021年10月22日(金) 1:36 Kevin McNee <mc...@chromium.org>:

Kevin McNee

unread,
Oct 25, 2021, 3:34:19 PM10/25/21
to Yoshisato Yanagisawa, navigation-dev
In step 2, we could just return the stored WebContents. There's no need for an additional RFH->WC lookup, since that lookup would either fail or return the same WebContents that's being stored.

> If WebContents stored in the Step 1 and WebContents obtained from GlobalRenderFrameHostId always match, why do we need to store RFH id in addition to WC?
The RFH id would be used for the cases where the WC would not be specific enough for attribution. That is, the plugin and possibly the referrer chain call sites would be changed to call a new method: DownloadItemUtils::GetRenderFrameHost() (or GetRenderFrameHostId if preferred) which would use the RFH id.

Keeping DownloadItemUtils::GetWebContents seems appropriate for the call sites where use of the WebContents is correct.

> Can I ask again what is expected to be stored with DownloadItemData::Attach and why?
The way I'm imagining this is that it would store a BrowserContext, WebContents, and RenderFrameHost id, in order to implement DownloadItemUtils::GetBrowserContext, DownloadItemUtils::GetWebContents, and DownloadItemUtils::GetRenderFrameHost respectively.

Yoshisato Yanagisawa

unread,
Oct 25, 2021, 9:29:13 PM10/25/21
to Kevin McNee, navigation-dev
Thank you, I feel I started to understand what you meant.  The goal is preventing people from using WebContents for getting RenderFrameHost.  However, the current DownloadItemUtils does provide the way to get RenderFrameHost directly, and callers who need RenderFrameHost should get the RenderFrameHost via WebContents.  If DownloadItemUtils store RenderFrameHostId with WebContents and BrowserContext, the callers can use the valid way to get RenderFrameHost with the newly created function GetRenderFrameHost or GetRenderFrameHostId.  (For allowing the caller to guess the execution cost on looking up RFH from the Id, I prefer the latter signature if we record RFH Id)
Let me implement the function and use the functions where RenderFrameHost is obtained via WebContents.

2021年10月26日(火) 4:34 Kevin McNee <mc...@chromium.org>:

Yoshisato Yanagisawa

unread,
Oct 26, 2021, 1:18:44 AM10/26/21
to Kevin McNee, navigation-dev
I have written the CL: https://chromium-review.googlesource.com/c/chromium/src/+/3245015

While investigating the caller side of the function, I noticed that some of them use web_contents->GetBrowserContext() to get browser context instead of using the BrowserContext stored in DownloadItemData.  I do not know why WebContent's BrowserContext is preferred to DownloadManagerImpl's BrowserContext, but let me leave it as-is.

2021年10月26日(火) 10:28 Yoshisato Yanagisawa <yyana...@chromium.org>:

Kevin McNee

unread,
Oct 26, 2021, 2:39:25 PM10/26/21
to Yoshisato Yanagisawa, navigation-dev
> The goal is preventing people from using WebContents for getting RenderFrameHost.
Yeah, call sites assuming that they can use WebContents::GetMainFrame when the event they're handling was triggered by another RenderFrameHost is a problem (issue 1061899).

> ... If DownloadItemUtils store RenderFrameHostId with WebContents and BrowserContext, the callers can use the valid way to get RenderFrameHost ...
Yeah the callers of DownloadItemUtils::AttachInfo have access to the correct RenderFrameHost, so we can pass that through to AttachInfo, as you've done in the CL.

Yoshisato Yanagisawa

unread,
Oct 27, 2021, 7:54:40 AM10/27/21
to Kevin McNee, navigation-dev
Thank you for the reply,

Since I assume RenderFrameHost::FromID may not work for web_contents->GetMainFrame()->GetGlobalId(), whose web_contents is generated with WebContentsTester::CreateTestWebContents, I decided to add the special function for tests to set web_contents directly.
Let me ask you a review.

2021年10月27日(水) 3:39 Kevin McNee <mc...@chromium.org>:

Yoshisato Yanagisawa

unread,
Nov 4, 2021, 9:22:28 PM11/4/21
to Kevin McNee, navigation-dev
Let me close the loop.
There are several unsure cases for DownloadItemUtils::GetWebContents usage.  Others were already fixed or confirmed not affected.
We worked on mitigating the case with tracking RenderFrameHostId instead of WebContents and using it in case.  However, during the code review for https://chromium-review.googlesource.com/c/chromium/src/+/3245015, we decided to reset stored WebContents on primary page change instead of keeping RenderFrameHostId.
I believe it also prevents misuse of WebContents for other cases, and we are sure callers of WebContentsImpl::FromRenderFrameHostID use returned WebContents for per-tab purposes.

2021年10月27日(水) 20:54 Yoshisato Yanagisawa <yyana...@chromium.org>:
Reply all
Reply to author
Forward
0 new messages