Identifying frame interface disconnect reasons


Marshall Greenblatt

Nov 21, 2024, 12:53:48 PM
to chromium-mojo
Hi All,

I have a Chromium-based application based on current master (M132) where I'm initiating a frame-based Mojo connection from the renderer process when a RenderFrame is created (ContentRendererClient::RenderFrameCreated callback), like so:

// In the frame-specific .h file:
mojo::Remote<mojom::MyInterface> remote_;

// In the frame-specific .cc file:
render_frame->GetBrowserInterfaceBroker().GetInterface(
    remote_.BindNewPipeAndPassReceiver());
remote_.set_disconnect_with_reason_handler(
    base::BindOnce(&MyObject::OnDisconnect, this));

I've observed the disconnect handler being called in the following cases:
  1. The browser-side RFH no longer exists so the initial connection fails.
  2. The browser-side RFH is destroyed which closes the existing connection.
I also believe that the connection can fail or close for other reasons (possibly unrelated to RFH destruction), but I don't have sufficient data currently to prove that.

In all of these cases the "reason" associated with the disconnect in the renderer process appears to be 0/empty. I would like to differentiate between these cases so that I can handle them differently. For example, in the "other reasons" case I would like to retry the connection after a delay.

Is it possible to differentiate these disconnect cases currently? If not, could we add "reason" values for at least cases 1 and 2?

Thanks,
Marshall

Dave Tapuska

Nov 21, 2024, 1:07:13 PM
to Marshall Greenblatt, chromium-mojo
It's not clear why you need to retry. You can treat GetInterface as being reliable. What situation are you encountering that it isn't disconnecting? Since this is a non-associated interface, don't rely on other messages like RenderFrameObserver::OnDestruct to occur in the same logical sequence; i.e., you might get the disconnect before (or after) the RenderFrameObserver::OnDestruct.

What you need to think about is prerendering. The default policy is to delay them. I imagine you aren't prerendering with CEF so it might not be an issue.

If you have more source code we might be able to give further suggestions.

dave.


Daniel Cheng

Nov 21, 2024, 1:22:49 PM
to Dave Tapuska, Marshall Greenblatt, chromium-mojo
The "reason" comes from the `ResetWithReason()` method that you can call on Mojo endpoints: https://source.chromium.org/search?q=ResetWithReason%20f:%5Emojo%20lang:c%2B%2B&sq=&ss=chromium

If you don't call those, you just get the default reason of 0. I don't think there's a good way to retrofit something for 1 or 2 in.

In general, Mojo pipes don't disconnect randomly. They disconnect because one endpoint goes away. This can happen:
- if the endpoint is explicitly closed (with reset() or ResetWithReason(), as noted above)
- if the endpoint is implicitly closed (by never being bound by the receiving end or by its owner going away)
- if the process the endpoint is in goes away (e.g. it is killed or terminated)

Daniel

Marshall Greenblatt

Nov 21, 2024, 1:25:33 PM
to Dave Tapuska, chromium-mojo
Thanks for the response!

On Thu, Nov 21, 2024 at 1:07 PM Dave Tapuska <dtap...@chromium.org> wrote:
It's not clear why you need to retry. You can treat GetInterface as being reliable. What situation are you encountering that it isn't disconnecting?

We're using long-lived Mojo connections for frame-specific message passing. By "retry" I mean calling GetInterface again for the same RenderFrame after a disconnect. The problem that we're trying to identify/solve is disconnects not related to RFH destruction. We have anecdotal evidence to suggest that initial connections sometimes fail (maybe related to resource contention), and that long-lived connections sometimes disconnect (maybe related to wake from hibernation, etc). These are the connections that we want to retry.
 
Since this is a non-associated interface, don't rely on other messages like RenderFrameObserver::OnDestruct to occur in the same logical sequence; i.e., you might get the disconnect before (or after) the RenderFrameObserver::OnDestruct.

Yes, I've noticed that too.
 

What you need to think about is prerendering. The default policy is to delay them. I imagine you aren't prerendering with CEF so it might not be an issue.

If you have more source code we might be able to give further suggestions.

Sure :) Renderer side starts here, browser side starts here. We have additional logic to handle bfcache and prerender (I'll need to test if the prerender case is working 100%).

Marshall Greenblatt

Nov 21, 2024, 1:36:07 PM
to Daniel Cheng, Dave Tapuska, chromium-mojo
On Thu, Nov 21, 2024 at 1:22 PM Daniel Cheng <dch...@chromium.org> wrote:
In general, Mojo pipes don't disconnect randomly. They disconnect because one endpoint goes away. This can happen:
- if the endpoint is explicitly closed (with reset() or ResetWithReason(), as noted above) 
- if the endpoint is implicitly closed (by never being bound by the receiving end or by its owner going away)

Right, I'm interested primarily in the implicit cases that are not directly handled by my code: "receiver never bound" and "mojo::Receiver<Interface> destroyed while bound". Is there any way we could add a non-0 default "reason" for these cases?
 
- if the process the endpoint is in goes away (e.g. it is killed or terminated)

This case should be OK. When the main process dies the renderer processes should also be terminated.

Daniel Cheng

Nov 21, 2024, 1:44:15 PM
to Marshall Greenblatt, Dave Tapuska, chromium-mojo
On Thu, 21 Nov 2024 at 10:36, Marshall Greenblatt <magree...@gmail.com> wrote:
Right, I'm interested primarily in the implicit cases that are not directly handled by my code: "receiver never bound" and "mojo::Receiver<Interface> destroyed while bound". Is there any way we could add a non-0 default "reason" for these cases?

For the latter, you can add that yourself... something has to own a mojo::Receiver<Interface>, and whatever owns it can just ResetWithReason() in its destructor.
For the former... it's not implemented, and I don't think anyone has plans to implement it. And at this point, it'd be difficult/annoying to retrofit without potentially stomping on a numeric reason that some bit of code is already using for some other reason.

Daniel
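
To illustrate the suggestion above (the owner of the receiver calling ResetWithReason() in its destructor), here is a minimal, self-contained sketch. It deliberately does not use the real Mojo classes: ToyReceiver, FrameHostImpl, DisconnectHandler and kReasonOwnerDestroyed are all hypothetical stand-ins invented for this example, and only the general shape (a numeric reason plus a description delivered to the peer's disconnect handler) is meant to mirror what the thread describes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Toy stand-in for the peer's disconnect-with-reason handler.
using DisconnectHandler =
    std::function<void(uint32_t reason, const std::string& description)>;

// Toy stand-in for a bound receiver endpoint. This is NOT mojo::Receiver.
class ToyReceiver {
 public:
  explicit ToyReceiver(DisconnectHandler peer_handler)
      : peer_handler_(std::move(peer_handler)) {}

  // Implicit close: the peer observes the default reason of 0.
  ~ToyReceiver() { Close(0, ""); }

  // Explicit close with a caller-supplied reason.
  void ResetWithReason(uint32_t reason, const std::string& description) {
    Close(reason, description);
  }

 private:
  void Close(uint32_t reason, const std::string& description) {
    if (!peer_handler_) return;  // Already closed; notify the peer only once.
    DisconnectHandler handler = std::move(peer_handler_);
    peer_handler_ = nullptr;
    handler(reason, description);
  }

  DisconnectHandler peer_handler_;
};

// Hypothetical reason code; the value itself is arbitrary.
constexpr uint32_t kReasonOwnerDestroyed = 1000001;

// Hypothetical owner of the receiver. Its destructor body runs before the
// member destructors, so the peer sees the explicit reason rather than 0.
class FrameHostImpl {
 public:
  explicit FrameHostImpl(DisconnectHandler peer_handler)
      : receiver_(std::move(peer_handler)) {}

  ~FrameHostImpl() {
    receiver_.ResetWithReason(kReasonOwnerDestroyed, "owner destroyed");
  }

 private:
  ToyReceiver receiver_;
};

// Destroys an owner and returns the reason the peer observed.
uint32_t ObservedReasonAfterOwnerDestroyed() {
  uint32_t observed = 0;
  {
    FrameHostImpl host(
        [&observed](uint32_t reason, const std::string&) { observed = reason; });
  }
  return observed;
}
```

In this toy model, destroying a bare ToyReceiver reports 0, while destroying a FrameHostImpl reports kReasonOwnerDestroyed, which is the distinction the renderer-side handler would key on.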

Marshall Greenblatt

Nov 21, 2024, 1:50:17 PM
to Daniel Cheng, Dave Tapuska, chromium-mojo
On Thu, Nov 21, 2024 at 1:44 PM Daniel Cheng <dch...@chromium.org> wrote:
For the latter, you can add that yourself... something has to own a mojo::Receiver<Interface>, and whatever owns it can just ResetWithReason() in its destructor.

Good point! I'm looking into this now.
 
For the former... it's not implemented, and I don't think anyone has plans to implement it. And at this point, it'd be difficult/annoying to retrofit without potentially stomping on a numeric reason that some bit of code is already using for some other reason.

Makes sense. If I wanted to try implementing this for myself (as a further debugging step), where in the code should I look?

Dave Tapuska

Nov 21, 2024, 2:01:12 PM
to Marshall Greenblatt, Daniel Cheng, chromium-mojo
If you are doing prerendering, it is likely related to the Mojo binder policy applying a delayed application. That is, the connection is requested from the renderer but isn't applied on the browser side until the page actually moves from the prerender state to the active state. You can certainly add a policy to always allow the connection, but you need to make sure that it is safe to do so; i.e., the page won't be the active page but a page that isn't visible to the user.

dave.

Marshall Greenblatt

Nov 21, 2024, 2:04:09 PM
to Daniel Cheng, Dave Tapuska, chromium-mojo
Just FYI I took a quick look at all existing users of ResetWithReason. They all appear to be using low numbers outside of unit tests. Choosing a high number should probably be safe for my use case.
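
A rough sketch of how a renderer-side handler might act on such a reason code, assuming a single application-defined high value. The names kReasonUnspecified, kReasonFrameHostDestroyed, DisconnectAction and ClassifyDisconnect are hypothetical and not part of Mojo or CEF. Note that with this scheme the "receiver never bound" case still arrives as 0 and so cannot be told apart from other unexplained disconnects.

```cpp
#include <cassert>
#include <cstdint>

// Default reason observed when the other side did not supply one.
constexpr uint32_t kReasonUnspecified = 0;

// Hypothetical application-defined code, chosen high to stay clear of the
// low values that existing ResetWithReason callers appear to use.
constexpr uint32_t kReasonFrameHostDestroyed = 1000001;

enum class DisconnectAction { kGiveUp, kRetryAfterDelay };

// Decides what to do after a disconnect, based only on the reason code.
constexpr DisconnectAction ClassifyDisconnect(uint32_t reason) {
  switch (reason) {
    case kReasonFrameHostDestroyed:
      // The browser-side owner went away on purpose; retrying is pointless.
      return DisconnectAction::kGiveUp;
    case kReasonUnspecified:
    default:
      // Unknown cause (includes the "never bound" case): retry later.
      return DisconnectAction::kRetryAfterDelay;
  }
}

// Compile-time sanity checks for the two cases discussed in the thread.
static_assert(ClassifyDisconnect(kReasonFrameHostDestroyed) ==
                  DisconnectAction::kGiveUp,
              "explicit owner destruction should not trigger a retry");
static_assert(ClassifyDisconnect(kReasonUnspecified) ==
                  DisconnectAction::kRetryAfterDelay,
              "unexplained disconnects should be retried");
```

The retry itself (calling GetInterface again after a delay) is left out here, since it depends on the application's task scheduling.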

Marshall Greenblatt

Nov 21, 2024, 6:37:13 PM
to Daniel Cheng, Dave Tapuska, chromium-mojo
On Thu, Nov 21, 2024 at 1:44 PM Daniel Cheng <dch...@chromium.org> wrote:
For the latter, you can add that yourself... something has to own a mojo::Receiver<Interface>, and whatever owns it can just ResetWithReason() in its destructor.
For the former... it's not implemented, and I don't think anyone has plans to implement it. And at this point, it'd be difficult/annoying to retrofit without potentially stomping on a numeric reason that some bit of code is already using for some other reason.

It looks like I can't provide a "reason" from the Receiver side for the "receiver never bound" case because the objects underlying the mojo::PendingReceiver type (e.g. MessagePipeHandle) only provide a Close() method. However, I do receive a MOJO_RESULT_FAILED_PRECONDITION value in Connector::OnHandleReadyInternal when the peer is closed, and this is distinct from the "other fatal errors" mentioned in that method. Can I somehow access this MojoResult value (or related state) from inside my disconnect handler without routing it as an argument via the |connection_error_handler_|?

Here's my current call stack for the disconnect on Windows:

> libcef.dll!CefFrameImpl::OnBrowserFrameDisconnect(unsigned int custom_reason, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & description) Line 548 C++
  libcef.dll!base::internal::DecayedFunctorTraits<void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),CefFrameImpl *&&>::Invoke<void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),scoped_refptr<CefFrameImpl>,unsigned int,const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &>(void(CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &) method, scoped_refptr<CefFrameImpl> && receiver_ptr, unsigned int && args, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & args) Line 738 C++
  libcef.dll!base::internal::InvokeHelper<0,base::internal::FunctorTraits<void (CefFrameImpl::*&&)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),CefFrameImpl *&&>,void,0>::MakeItSo<void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),std::__Cr::tuple<scoped_refptr<CefFrameImpl>>,unsigned int,const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &>(void(CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &) && functor, std::__Cr::tuple<scoped_refptr<CefFrameImpl>> && bound, unsigned int && args, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & args) Line 930 C++
  libcef.dll!base::internal::Invoker<base::internal::FunctorTraits<void (CefFrameImpl::*&&)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),CefFrameImpl *&&>,base::internal::BindState<1,1,0,void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),scoped_refptr<CefFrameImpl>>,void (unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &)>::RunImpl<void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),std::__Cr::tuple<scoped_refptr<CefFrameImpl>>,0>(void(CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &) && functor, std::__Cr::tuple<scoped_refptr<CefFrameImpl>> && bound, std::__Cr::integer_sequence<unsigned long long,0>, unsigned int && unbound_args, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & unbound_args) Line 1067 C++
  libcef.dll!base::internal::Invoker<base::internal::FunctorTraits<void (CefFrameImpl::*&&)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),CefFrameImpl *&&>,base::internal::BindState<1,1,0,void (CefFrameImpl::*)(unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &),scoped_refptr<CefFrameImpl>>,void (unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &)>::RunOnce(base::internal::BindStateBase * base, unsigned int unbound_args, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & unbound_args) Line 980 C++
  mojo_public_cpp_bindings.dll!base::OnceCallback<void (unsigned int, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> &)>::Run(unsigned int args, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & args) Line 157 C++
  mojo_public_cpp_bindings.dll!mojo::InterfaceEndpointClient::NotifyError(const std::__Cr::optional<mojo::DisconnectReason> & reason) Line 768 C++
  mojo_public_cpp_bindings.dll!mojo::internal::MultiplexRouter::ProcessNotifyErrorTask(mojo::internal::MultiplexRouter::Task * task, mojo::internal::MultiplexRouter::ClientCallBehavior client_call_behavior, base::SequencedTaskRunner * current_task_runner) Line 1036 C++
  mojo_public_cpp_bindings.dll!mojo::internal::MultiplexRouter::ProcessTasks(mojo::internal::MultiplexRouter::ClientCallBehavior client_call_behavior, base::SequencedTaskRunner * current_task_runner) Line 946 C++
  mojo_public_cpp_bindings.dll!mojo::internal::MultiplexRouter::OnPipeConnectionError(bool force_async_dispatch) Line 858 C++
  mojo_public_cpp_bindings.dll!base::internal::DecayedFunctorTraits<void (mojo::internal::MultiplexRouter::*)(bool),mojo::internal::MultiplexRouter *,bool &&>::Invoke<void (mojo::internal::MultiplexRouter::*)(bool),mojo::internal::MultiplexRouter *,bool>(void(mojo::internal::MultiplexRouter::*)(bool) method, mojo::internal::MultiplexRouter * && receiver_ptr, bool && args) Line 738 C++
  mojo_public_cpp_bindings.dll!base::internal::InvokeHelper<0,base::internal::FunctorTraits<void (mojo::internal::MultiplexRouter::*&&)(bool),mojo::internal::MultiplexRouter *,bool &&>,void,0,1>::MakeItSo<void (mojo::internal::MultiplexRouter::*)(bool),std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool>>(void(mojo::internal::MultiplexRouter::*)(bool) && functor, std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool> && bound) Line 930 C++
  mojo_public_cpp_bindings.dll!base::internal::Invoker<base::internal::FunctorTraits<void (mojo::internal::MultiplexRouter::*&&)(bool),mojo::internal::MultiplexRouter *,bool &&>,base::internal::BindState<1,1,0,void (mojo::internal::MultiplexRouter::*)(bool),base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool>,void ()>::RunImpl<void (mojo::internal::MultiplexRouter::*)(bool),std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool>,0,1>(void(mojo::internal::MultiplexRouter::*)(bool) && functor, std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool> && bound, std::__Cr::integer_sequence<unsigned long long,0,1>) Line 1067 C++
  mojo_public_cpp_bindings.dll!base::internal::Invoker<base::internal::FunctorTraits<void (mojo::internal::MultiplexRouter::*&&)(bool),mojo::internal::MultiplexRouter *,bool &&>,base::internal::BindState<1,1,0,void (mojo::internal::MultiplexRouter::*)(bool),base::internal::UnretainedWrapper<mojo::internal::MultiplexRouter,base::unretained_traits::MayNotDangle,0>,bool>,void ()>::RunOnce(base::internal::BindStateBase * base) Line 980 C++
  mojo_public_cpp_bindings.dll!base::OnceCallback<void ()>::Run() Line 157 C++
  mojo_public_cpp_bindings.dll!mojo::Connector::HandleError(bool force_pipe_reset, bool force_async_handler) Line 691 C++
  mojo_public_cpp_bindings.dll!mojo::Connector::OnHandleReadyInternal(unsigned int result) Line 443 C++
  mojo_public_cpp_bindings.dll!mojo::Connector::OnWatcherHandleReady(const char * interface_name, unsigned int result) Line 418 C++

Thanks,
Marshall