PSA: LLCPP now supports per-call error handling

161 views

Skip to first unread message

Yifei Teng

unread,

Aug 20, 2021, 6:42:23 AM8/20/21

to fidl-dev, anno...@fuchsia.dev

~ If you don't use the low-level C++ bindings you may stop reading now ~

TLDR: Handling errors specific to a particular FIDL call just got easier in LLCPP asynchronous clients, as we are adding a method overload designed for this.

Tracking bug: fxbug.dev/75324

Motivation

The function signature for an asynchronous two-way call looks like the following today:

// |FooMethod| is a zero-argument two-way call.

// If an error happens during sending, the error is returned synchronously.

fidl::Result FooMethod(fit::callback<void(fidl::WireResponse<FooMethod>*)>);

When using this method, we often see the following pattern used to detect errors:

fidl::Result result = client->FooMethod(

[] (fidl::WireResponse<FooMethod>* response) {

// Process |response| here...

});

if (!result.ok()) {

FX_LOGS(ERROR) << "Something bad happened: " << result.error();

return result;

}

It's tempting to assume doing the above catches all possible errors, but that is not the case. For instance, if the server crashed while the call was in-flight, or if the server sent an invalid reply, the FooMethod callback will be silently dropped since there is no reply to present to the user. Worse, such behavior could happen racily: if the server crashed just before making the FooMethod call, the call would return an error writing to the channel, but not if the server crashed right after.

There is a way to get consistently notified about all errors on the connection: you may override the void on_fidl_error(fidl::UnbindInfo info); virtual method in the fidl::WireAsyncEventHandler<Protocol> event handler specified when creating the client. Unfortunately, that isn't as flexible as we'd like it to be:

There is no way to find out which call caused the error: we might want to know if the server sent an invalid reply to FooMethod and not BarMethod.
If the code making the call is from a different part of the application than the code which originally created the client (e.g. it could be temporarily borrowing the client), then it cannot add new logic into the event handler.

With those in mind, we present the new addition to LLCPP client API: result callbacks.

Result Callback

We've introduced a new overload for two-way calls:

// Existing: response callback version.

// The callback is invoked only if the FIDL call succeeds.

fidl::Result FooMethod(fit::callback<void(fidl::WireResponse<FooMethod>*)>);

// New: result callback version.

// The callback is guaranteed to always run at some point, either with

// a success or with an error.

void FooMethod(fit::callback<void(fidl::WireUnownedResult<FooMethod>&&)>);

fidl::WireUnownedResult is a result type similar to those used in synchronous calls: it supports .ok(), .status(), and logging integration among other things. In particular, calling .Unwrap() on the fidl::WireUnownedResult<T> returns a fidl::WireResponse<T>* if the FIDL call was successful. In other words, the result callback adds a level of error information to the response callback.

Using the result callback, there is no need to synchronously handle errors. We also get rid of the aforementioned race condition by shifting all error handling code paths into the callback.

When the central error handler that is fidl::WireAsyncEventHandler<Protocol>::on_fidl_error(fidl::UnbindInfo) is also overridden, both the result callback and the on_fidl_error handler will be notified of errors.

When multiple result callbacks are used, they may receive different errors that are tailored to their specific call. For example, if the user calls FooMethod and BarMethod, and the server replies to BarMethod with invalid data, the BarMethod result callback will receive a decoding error while the FooMethod result callback will receive a generic cancellation error since it wasn't associated with the invalid BarMethod reply.

Scheduling and use-after-free

Unless the dispatcher is shutting down, the callbacks are always scheduled asynchronously. This implies that a synchronous error (e.g. failed to encode) will be delivered asynchronously, at the next iteration of the event loop. When using the client it is usually safe to assume no re-entrancy, leading to simpler code.

When a client binding is torn down, the cancellation error will also be delivered to the result callbacks. Coupled with asynchronous delivery, care is needed to ensure any lambda captures are still alive. For example, if an object contains a fidl::WireClient and captures this in result callbacks, then manipulating the captured this within the result callbacks after destroying the object will lead to use-after-free. One way to avoid it is to return immediately if the error was due to destroying the fidl::WireClient (you may identify it by checking that the error reason is fidl::Reason::kUnbind).

Action Required

Switching to result callbacks can greatly simplify error handling if done appropriately. We would be migrating ones that seem good candidates over the next weeks, and you're welcome to try this out within your team's maintained code base!

As a related follow-up migration, we intend to also remove the return value of the response callback overloads. This will get rid of race conditions in synchronous errors across the entire client API - relying on the synchronous return value is already fragile as discussed above. If your application prefers to use response callbacks instead of result callbacks, then overriding on_fidl_error is the only reliable way to catch errors.

Please let us know on fidl...@fuchsia.dev if you have questions or suggestions. Thanks.

Cheers,

Yifei

Yifei Teng

unread,

Sep 13, 2021, 7:37:40 PM9/13/21

to fidl-dev, Yifei Teng, anno...@fuchsia.dev

A minor follow-up: we've received suggestions that a callback taking r-value reference (`fidl::WireUnownedResult<Foo>&&`) is not familiar to users and discouraged by the style guide. As a result, we'll be changing it to a single ampersand (`fidl::WireUnownedResult<Foo>&`) in fxrev.dev/577310. It shouldn't have any impact on the semantics of these calls, but just one aspect to note while using the result callbacks.

Cheers,

Yifei

Reply all

Reply to author

Forward

0 new messages