How to debug flaky Mojo binding

Marshall Greenblatt

Feb 7, 2022, 5:38:32 PM
to chromium-mojo
Hi All,

I have an application running on macOS (based on Chromium M97) where I'm initiating a Mojo binding from the renderer process and finding that my associated implementation in the browser process is never executed. Referring to the example code below [1], I've verified that GetInterface is called in the renderer process, but the MyInterfaceImpl object is not created in the browser process. Complicating matters, the failure appears to be timing-related: the binding succeeds most of the time but fails more frequently when the system is under heavy CPU load. I've also been unable to reproduce the failure on Windows using the same code.

Is there a recommended way to debug why a specific Mojo connection is failing? Am I doing anything wrong in the code below that might explain this behavior (wrong usage, wrong callback methods, etc.)?

Thanks,
Marshall

[1] Example code:

From ContentRendererClient::RenderFrameCreated in the renderer process:

mojo::Remote<my::mojom::MyInterface> remote;
render_frame->GetBrowserInterfaceBroker()->GetInterface(
    remote.BindNewPipeAndPassReceiver());
remote->ExecuteFoo();  // This message never arrives.

From ContentBrowserClient::RegisterBrowserInterfaceBindersForFrame in the browser process:

map->Add<my::mojom::MyInterface>(base::BindRepeating(
    [](content::RenderFrameHost* frame_host,
       mojo::PendingReceiver<my::mojom::MyInterface> receiver) {
      // This object is never created.
      new MyInterfaceImpl(frame_host, std::move(receiver));
    }));


Marshall Greenblatt

Feb 8, 2022, 4:18:39 PM
to chromium-mojo
The issue appears to be a new navigation that (in M98) results in the following log message:

WARNING:render_frame_host_impl.cc(1196)] InterfaceRequest was dropped, the document is no longer active: my.mojom.MyInterface

From the code (later removed in this commit):
// Logs interface requests that arrive after the frame has already committed a
// non-same-document navigation, and has already unbound
// |broker_receiver_| from the interface connection that had been used to
// service RenderFrame::GetBrowserInterfaceBroker for the previously active
// document in the frame.
The problem, for me, is that ContentRendererClient::RenderFrameCreated is not called a second time after this new navigation.

Is there a more appropriate renderer process callback for initiating (or retrying) the Mojo binding? For example, should I use something like RunScriptsAtDocumentStart instead?

Thanks,
Marshall

K. Moon

Feb 8, 2022, 4:47:46 PM
to Marshall Greenblatt, chromium-mojo, blink-dev
Not an expert on any of this, but it sounds to me like the lifetime of your Mojo interface vis-à-vis the state of the renderer isn't well-defined, which is why you're running into problems with frame creation vs. navigation. Maybe it'd help to link to a WIP change or design or something, and someone can suggest an appropriate pattern?

+blink-dev, since this sounds like something a Blink developer would know about.

John Abd-El-Malek

Feb 8, 2022, 4:57:58 PM
to Marshall Greenblatt, navigation-dev, Rakina Zata Amni, chromium-mojo
On Tue, Feb 8, 2022 at 1:18 PM Marshall Greenblatt <magree...@gmail.com> wrote:
> The issue appears to be a new navigation that (in M98) results in the following log message:


To narrow this down it'd be helpful for you to bisect when you started seeing a failure.
 

Dave Tapuska

Feb 8, 2022, 5:06:56 PM
to John Abd-El-Malek, Marshall Greenblatt, navigation-dev, Rakina Zata Amni, chromium-mojo
You can also call set_disconnect_handler on the Remote, if that is useful. I wonder if it is creating a speculative RenderFrame that is then tossed away before your message arrives.

dave.

Marshall Greenblatt

Feb 8, 2022, 5:07:16 PM
to K. Moon, chromium-mojo, blink-dev
On Tue, Feb 8, 2022 at 4:47 PM K. Moon <km...@chromium.org> wrote:
> Not an expert on any of this, but it sounds to me like the lifetime of your Mojo interface vis-à-vis the state of the renderer isn't well-defined, which is why you're running into problems with frame creation vs. navigation. Maybe it'd help to link to a WIP change or design or something, and someone can suggest an appropriate pattern?
>
> +blink-dev, since this sounds like something a Blink developer would know about.

Thanks for bringing in the larger audience. One existing example of my usage pattern (not my code specifically) is CastContentRendererClient::RenderFrameCreated, which could presumably experience a similar problem with mojom::ApplicationMediaCapabilities if a navigation occurred.

I'm basically trying to accomplish (in the renderer process) what DocumentService provides in the browser process, but without terminating the Mojo connection when the document has finished loading. In other words, a frame-based connection that remains live until that frame is navigated or destroyed.

Marshall Greenblatt

Feb 8, 2022, 5:29:42 PM
to John Abd-El-Malek, navigation-dev, Rakina Zata Amni, chromium-mojo
On Tue, Feb 8, 2022 at 4:57 PM John Abd-El-Malek <j...@chromium.org> wrote:
> On Tue, Feb 8, 2022 at 1:18 PM Marshall Greenblatt <magree...@gmail.com> wrote:
>> The issue appears to be a new navigation that (in M98) results in the following log message:

Thanks for the suggestion. It looks like that particular change landed in M98 and I'm seeing the issue back to at least M87.

> To narrow this down it'd be helpful for you to bisect when you started seeing a failure.

Unfortunately my current reproduction case involves a whole third-party application. I'll try to bisect in broad strokes and see if I can at least identify the milestone where the issue started.

Marshall Greenblatt

Feb 8, 2022, 6:08:10 PM
to Dave Tapuska, John Abd-El-Malek, navigation-dev, Rakina Zata Amni, chromium-mojo
On Tue, Feb 8, 2022 at 5:06 PM Dave Tapuska <dtap...@chromium.org> wrote:
> You can also set_disconnect_handler on the Remote if that is useful. I wonder if it is creating a speculative RenderFrame and then that is tossed away before your message arrives.

Thanks for the suggestion. I am getting the disconnect callback in the binding failure case, so it may be possible to implement retry logic. I'll see if I can find a reliable way to identify when a frame in the renderer process is navigating vs getting destroyed (so as not to retry in the latter case).
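
The retry idea above could be sketched roughly as follows. This is a minimal, self-contained illustration rather than Chromium code: FakeRemote is a hypothetical stand-in for mojo::Remote<my::mojom::MyInterface> that models only set_disconnect_handler and reset, and FrameConnection, OnFrameDestroyed, max_retries and SimulateDisconnect are names invented for the example. In a real renderer, the frame-destroyed signal would have to come from whatever frame-lifetime callback turns out to be reliable, which is the open question in this thread.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for mojo::Remote<my::mojom::MyInterface>.
// It models only the two operations this sketch needs; it is NOT the Mojo API.
class FakeRemote {
 public:
  void set_disconnect_handler(std::function<void()> handler) {
    handler_ = std::move(handler);
  }

  // Drops the (simulated) connection and any pending disconnect handler.
  void reset() { handler_ = nullptr; }

  // Test hook (assumption): simulates the browser dropping the receiver.
  // The handler fires at most once per binding.
  void SimulateDisconnect() {
    if (!handler_) return;
    std::function<void()> handler = std::move(handler_);
    handler_ = nullptr;
    handler();
  }

 private:
  std::function<void()> handler_;
};

// Hypothetical per-frame helper: rebinds after a disconnect, but only while
// the frame is still alive and the retry budget is not exhausted.
class FrameConnection {
 public:
  explicit FrameConnection(int max_retries) : retries_left_(max_retries) {}

  // In real code this is where GetInterface(BindNewPipeAndPassReceiver())
  // would be called; here it only counts attempts.
  void Connect() {
    ++bind_attempts_;
    remote_.set_disconnect_handler([this] { OnDisconnect(); });
  }

  // Call when the frame is going away, so that no retry is attempted.
  void OnFrameDestroyed() {
    frame_alive_ = false;
    remote_.reset();
  }

  int bind_attempts() const { return bind_attempts_; }
  FakeRemote& remote_for_testing() { return remote_; }

 private:
  void OnDisconnect() {
    if (!frame_alive_ || retries_left_ == 0) return;
    --retries_left_;
    Connect();
  }

  FakeRemote remote_;
  bool frame_alive_ = true;
  int retries_left_;
  int bind_attempts_ = 0;
};
```

Capping the number of retries is a design choice made for the sketch: if the browser keeps dropping the request because the document is no longer active, an unbounded retry would loop until the frame is destroyed.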