Rich error patterns

Adam Lesinski

Mar 3, 2021, 3:35:01 PM

to fidl-dev, Gary Bressler

Hi FIDL team,

In Component Framework, we have several FIDL protocols that return our domain-specific error code when a failure occurs. We find that the error code alone is not enough to debug thorny issues. For instance, the implementor of a protocol (e.g., a ComponentResolver) is called by component_manager and returns a failure. Detailed information about the failure is not available in the logs because something in the system may be broken and logs aren't coming back from the implementor.

Is there an existing pattern that offers richer error information?

I was thinking of blending the fuchsia.io approach with regular error codes, where the protocol would receive an optional channel as an argument, on which an event would be sent with detailed error information. The protocol method would still return an error code, which would be used to determine whether the passed-in channel should be listened to.

Your guidance would be most appreciated.

Adam

Adam Barth

Mar 4, 2021, 2:08:25 AM

to Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Gary Bressler

[Fixed api-council email address]

On Thu, Mar 4, 2021 at 6:39 AM Adam Barth <aba...@google.com> wrote:

+Fuchsia API Council

The reason we prefer to use error codes rather than more elaborate error data is that the elaborate error data becomes ABI surface for the platform. In thinking about this use case, I would try to find a way to provide more detailed error information to the developer without making that error information part of the system ABI. That's why the rubric suggests logging as a mechanism: it's a way to communicate with developers that doesn't expand the system ABI.

Adam

--
All posts must follow the Fuchsia Code of Conduct https://fuchsia.dev/fuchsia-src/CODE_OF_CONDUCT or may be removed.
---
You received this message because you are subscribed to the Google Groups "fidl-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fidl-dev+u...@fuchsia.dev.
To view this discussion on the web visit https://groups.google.com/a/fuchsia.dev/d/msgid/fidl-dev/9aa6ff2b-d8f6-4e39-bbe0-9090987d8805n%40fuchsia.dev.

Shai Barack

Mar 4, 2021, 2:19:32 AM

to Adam Barth, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Gary Bressler

Developers consistently give us poor feedback about providing error causes in logs. A recent developer study found that while we were providing additional information in logs to accommodate errors with more elaborate reasons, developers were unable to find these logs for a variety of reasons (didn't know they were there, saw the logs but were unable to associate them in context, looked at the wrong logsink, there were too many logs to look at).

Even if we fixed all these problems somehow, emitting an error reason as a side effect (such as to a log or any other side effect) leaves something to be desired - it doesn't propagate.

Consider this beautiful error:

[Image: a screenshot from Google Chrome that says "your clock is ahead", with additional details.]

The user is alerted that their TLS connection failed because the server's certificate is no longer valid according to the local system clock, which is actually inaccurate (as determined by checking against a reference time). Armed with this information, Chrome can offer the user a one-click fix. In addition, the entire error experience is subject to localization.

This can only happen if the many layers between the user interface and the TLS implementation propagate errors in context.


Yifei Teng

Mar 4, 2021, 3:08:11 AM

to Shai Barack, Adam Barth, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Gary Bressler

I'd be interested in the particular kind of "extended error" that you're hoping to expose here, adamlesinski@.

I suppose different kinds of detailed error information have different trade-offs. IIRC we've shied away from providing textual reasons (error strings) as error return values, due to a variety of drawbacks including ABI fragility and translation.

On the other hand, I suppose it could be reasonable to expose some structured information in a principled way. Without knowing the exact component FIDL protocol involved, I can think of a contrived example: if there's a FIDL call `CheckNumbers(vector<int> a)` with the requirement that all the numbers must be greater than 5, it would help developers if this call returned an error that also carries information about which number failed that check. We'd be changing the error from a `zx.status` to `union CheckError { 1: int which_number_was_wrong; }`.

FIDL only allows returning an integer error code in the `error T` method syntax, so you can't use a union there. A workaround is to manually define a union containing both the success variant, and a number of error variants, which may then individually link to detailed structural information. Here is an example I've found in-tree.

Would like to hear the API Council's opinion on this pattern too.
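As a rough illustration of the pattern described in this message (it is not the in-tree example, and it is Python rather than FIDL), the contrived `CheckNumbers` call could be modeled as a result that is either a success variant or an error variant carrying structured detail. The names `CheckOk`, `CheckResult`, and `check_numbers` are hypothetical; only `CheckError` and `which_number_was_wrong` come from the message.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class CheckOk:
    """Success variant: carries no payload in this sketch."""


@dataclass(frozen=True)
class CheckError:
    """Error variant: structured detail instead of a bare status code."""
    which_number_was_wrong: int


# Hand-rolled "result union": exactly one of the two variants is returned.
CheckResult = Union[CheckOk, CheckError]


def check_numbers(numbers: List[int]) -> CheckResult:
    """Every number must be greater than 5; report the first offender."""
    for n in numbers:
        if n <= 5:
            return CheckError(which_number_was_wrong=n)
    return CheckOk()
```

A caller would branch on the variant it receives. Every field placed in the error variant becomes something callers can depend on, which is the ABI concern raised earlier in the thread.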


Gary Bressler

Mar 4, 2021, 12:44:23 PM

to Adam Barth, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Would it be possible to pass a VMO that is in some way closely associated with the error or epitaph, but is explicitly not part of the system ABI? Perhaps this could be restricted to debug builds only.

On Thu, Mar 4, 2021 at 9:34 AM Adam Barth <aba...@google.com> wrote:

Yes, I think we all agree on the problem statement. The missing piece is an idea for how to do better given the constraints. For example, the approach in the first message of this thread violates the constraint of not expanding the platform's ABI.

FWIW, the clock example you cite is a good example of a solution that meets the same constraints we're discussing. The browser provides the text to the user in a way that does not increase the ABI surface of the web platform. The error is shown to the user, but if a web page makes a programmatic HTTPS connection (e.g., using XMLHttpRequest), the browser does not give them this detailed textual information.

Adam

Adam Barth

Mar 4, 2021, 12:44:23 PM

to Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Gary Bressler

Yes, I think we all agree on the problem statement. The missing piece is an idea for how to do better given the constraints. For example, the approach in the first message of this thread violates the constraint of not expanding the platform's ABI.

FWIW, the clock example you cite is a good example of a solution that meets the same constraints we're discussing. The browser provides the text to the user in a way that does not increase the ABI surface of the web platform. The error is shown to the user, but if a web page makes a programmatic HTTPS connection (e.g., using XMLHttpRequest), the browser does not give them this detailed textual information.

Adam


Shai Barack

Mar 4, 2021, 12:44:23 PM

to Adam Barth, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Gary Bressler

Oh, I definitely agree that we shouldn't return text strings from errors. Aside from being bad ABIs and hostile to l10n they also don't have the propagation and context properties that I want.

I do want more structure and hierarchy in errors though.


Adam Barth

Mar 4, 2021, 12:50:42 PM

to Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

What if we improved our developer tools in some way? Conceptually, rather than returning the error information to the software that made the request, we should somehow find a way to return the error information to the developer of that software. For example, what if there was a way to associate a log message containing error information with the request that triggered the error in a way that our developer tools understood and presented to the developer in some useful way...

Adam

Adam Barth

Mar 4, 2021, 12:53:31 PM

to Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Here's a more concrete version of that idea:

When I work on a piece of software, I often set my logger to filter log messages to only those with my component's tag (i.e., fx log --tag <foo>). If a service my component uses generates an error, I won't see that because the log will have the tag of the component that recognizes the error instead of the tag of the requestor. What if there was a way to cause that error message to be displayed even when the developer is filtering on their own component's log tag?

Adam
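One way to picture the tag-filtering idea in the message above is a log filter that also matches records attributed to the requesting component. The sketch below is plain Python, not the `fx log` implementation; the `on_behalf_of` field is a hypothetical attribute that a service would attach when logging an error caused by a request, and the component names and messages are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    tag: str                            # component that emitted the message
    message: str
    on_behalf_of: Optional[str] = None  # hypothetical: tag of the requestor


def filter_by_tag(records: List[LogRecord], tag: str) -> List[LogRecord]:
    """Keep records emitted by `tag` or emitted on behalf of `tag`."""
    return [r for r in records if r.tag == tag or r.on_behalf_of == tag]


records = [
    LogRecord("foo", "sending request"),
    LogRecord("resolver", "bad URL scheme", on_behalf_of="foo"),
    LogRecord("resolver", "unrelated message"),
]
visible = filter_by_tag(records, "foo")
```

With this filter, a developer watching only the `foo` tag would still see the resolver's error for their request, while unrelated resolver output stays hidden.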

Seth Ladd

Mar 4, 2021, 12:55:21 PM

to Adam Barth, Amit Uttamchandani, Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 9:53 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:

Here's a more concrete version of that idea:

When I work on a piece of software, I often set my logger to filter log messages to only those with my component's tag (i.e., fx log --tag <foo>). If a service my component uses generates an error, I won't see that because the log will have the tag of the component that recognizes the error instead of the tag of the requestor. What if there was a way to cause that error message to be displayed even when the developer is filtering on their own component's log tag?

fyi @Amit Uttamchandani


Adam Barth

Mar 4, 2021, 12:56:10 PM

to Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

There's no such thing as a "debug" build of the entire system. In-tree we have the "is_debug" GN arg, but that's not a thing that exists in the real world (i.e., out-of-tree). We have "eng" builds, but having a different system ABI in eng builds is a recipe for making debugging and fixing bugs from production difficult.

Adam


Chase Latta

Mar 4, 2021, 1:05:22 PM

to Adam Barth, Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Would it be possible for us to include extra metadata in our SDK which can be consumed by ffx to help developers diagnose these problems? For example, when a fidl library is written the developer can include some metadata file that says what error codes can be returned for individual methods. We can then update ffx to have a command like `ffx error describe fuchsia.io.foo 37` which prints out the developer defined log statement. We could add a watch type function as well which automatically looks up this information for you.
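The lookup half of this proposal amounts to a table keyed by library and error code. The Python sketch below is purely illustrative: the metadata layout, the `describe_error` helper, and the description text are assumptions, and `ffx error describe` is the command proposed in the message above, not an existing one.

```python
from typing import Dict

# Hypothetical metadata shipped alongside a FIDL library in the SDK:
# library name -> error code -> developer-written description.
ERROR_METADATA: Dict[str, Dict[int, str]] = {
    "fuchsia.io.foo": {
        37: "Placeholder text a library author might write for code 37.",
    },
}


def describe_error(library: str, code: int) -> str:
    """Return the author-provided description, or a fallback if none exists."""
    description = ERROR_METADATA.get(library, {}).get(code)
    if description is None:
        return f"{library}: no description recorded for error code {code}"
    return f"{library} error {code}: {description}"
```

Because the descriptions live in tooling metadata rather than in the FIDL response, they can be reworded without touching the wire format.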


Carlos Pizano

Mar 4, 2021, 1:24:08 PM

to Chase Latta, Adam Barth, Gary Bressler, Shai Barack, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

In other systems we use debuggers. They were developed to exactly provide the context necessary to understand errors.

-cpu


Shai Barack

Mar 4, 2021, 1:26:41 PM

to Carlos Pizano, Chase Latta, Adam Barth, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Following IPCs with debuggers is hard. The most success I've had was to anticipate where the other end of the IPC will land, attach to that process and prepare a breakpoint there, and then keep tracing the flow. I would often make mistakes, lose track of the control flow, and have to start all over again. It was so painful that I developed the habit of keeping a text file open where I'd jot down breadcrumbs and other notes on my progress.

Adam Barth

Mar 4, 2021, 1:54:01 PM

to Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

Adam

Gary Bressler

Mar 4, 2021, 1:55:52 PM

to Shai Barack, Carlos Pizano, Chase Latta, Adam Barth, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:26 AM Shai Barack <sha...@google.com> wrote:

Following IPCs with debuggers is hard. The most success I've had was to anticipate where the other end of the IPC will land, attach to that process and prepare a breakpoint there, and then keep tracing the flow. I would often make mistakes, lose track of the control flow, and have to start all over again. It was so painful that I developed the habit of keeping a text file open where I'd jot down breadcrumbs and other notes on my progress.

On Thu, Mar 4, 2021 at 10:23 AM Carlos Pizano <c...@google.com> wrote:
In other systems we use debuggers. They were developed to exactly provide the context necessary to understand errors.

-cpu

On Thu, Mar 4, 2021 at 9:56 AM 'Chase Latta' via api-council <api-c...@fuchsia.dev> wrote:
Would it be possible for us to include extra metadata in our SDK which can be consumed by ffx to help developers diagnose these problems? For example, when a fidl library is written the developer can include some metadata file that says what error codes can be returned for individual methods. We can then update ffx to have a command like `ffx error describe fuchsia.io.foo 37` which prints out the developer defined log statement. We could add a watch type function as well which automatically looks up this information for you.

On Thu, Mar 4, 2021 at 9:50 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:
What if we improved our developer tools in some way? Conceptually, rather than returning the error information to the software that made the request, we should somehow find a way to return the error information to the developer of that software. For example, what if there was a way to associate a log message containing error information with the request that triggered the error in a way that our developer tools understood and presented to the developer in some useful way...

Adam

It sounds like some flavor of this is what I'm looking for, but I'm unsure of the right mechanism. I'd like the error info to be clearly attributed to the request, at the time the request was made. One idea is to wrap the FIDL call in a macro that would fetch the detailed info from somewhere and make it visible in a place the user knows where to look.

Seth Ladd

Mar 4, 2021, 1:57:13 PM

to Adam Barth, Francois Rousseau, Brett Wilson, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:

Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

cc @Francois Rousseau @Brett Wilson @Amit Uttamchandani


Dave Schuyler

Mar 4, 2021, 1:58:11 PM

to Shai Barack, Carlos Pizano, Chase Latta, Adam Barth, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Christopher Johnson

Brainstorming:

What if (and this might be an ABI change, IIUC) each message included a context (ID number(s)) that is then included in log output?

I'm thinking of a pair of numbers: The originating thread KOID with an arbitrary value (such as an incrementing message count number).

This would allow tools* to display the path of messages through the system and any errors in context (different meaning of context, sorry).

Pros:

- Big advantage in debugging
- Possible to do without an API change (I'm about 80% sure of that)

Cons:

- A rather invasive change
- Some extra overhead per message (could be reduced with tradeoffs)
- Additional log output (or extra fields in structured logging)

The above idea is related to an idea crjohns@ (or someone on that team?) was kicking around, so +crjohns@. This is also related to the work I'd done years ago developing/debugging distributed systems, where I found high performance logging to be the only viable way for me to debug these systems. Let me suggest that Fuchsia is essentially a 'distributed system in a box' (i.e. not actually distributed but has similar qualities/debugging challenges).

*Or a few find-in-text/greps to select the log lines.
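A minimal sketch of the context-ID idea from this message, in Python for brevity: each outgoing message is stamped with a (thread KOID, message counter) pair, every log line repeats the pair, and a tool (or a grep) selects the lines for one request. All names here are hypothetical, and the KOID is just an integer supplied by the caller.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContextId:
    thread_koid: int  # KOID of the originating thread
    sequence: int     # incrementing per-thread message count


class ContextIdSource:
    """Hands out context IDs for messages sent by one thread."""

    def __init__(self, thread_koid: int) -> None:
        self._thread_koid = thread_koid
        self._counter = itertools.count(1)

    def next_id(self) -> ContextId:
        return ContextId(self._thread_koid, next(self._counter))


def log_line(ctx: ContextId, component: str, text: str) -> str:
    """Format a log line that carries the context ID of the request."""
    return f"[ctx={ctx.thread_koid}:{ctx.sequence}] {component}: {text}"


def lines_for(ctx: ContextId, log: List[str]) -> List[str]:
    """Select every log line, from any component, that belongs to `ctx`."""
    prefix = f"[ctx={ctx.thread_koid}:{ctx.sequence}]"
    return [line for line in log if line.startswith(prefix)]


source = ContextIdSource(thread_koid=4242)
first, second = source.next_id(), source.next_id()
log = [
    log_line(first, "client", "sending Resolve"),
    log_line(second, "client", "sending Resolve"),
    log_line(first, "resolver", "error: package not found"),
]
```

Calling `lines_for(first, log)` picks out the client line and the resolver error for the first request only, even though both requests produced identical client-side log text.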


Suraj Malhotra

Mar 4, 2021, 1:58:41 PM

to Shai Barack, Carlos Pizano, Chase Latta, Adam Barth, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

My understanding is that we want event tracing, but always enabled and "cheap". I believe structured logging plus improved ergonomics for sending around and using flow_ids (and maybe an idiom of using them in all FIDL messages and structured logs) will mostly get us that. The log viewer can then intelligently relate logs from unrelated components via this mechanism.


Dave Schuyler

Mar 4, 2021, 1:59:40 PM

to Adam Barth, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:

Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

IMO, this would be greatly aided by a context ID in the message itself. E.g., zxdb could 'contact the recipient' ahead of time (err, after the user asked to "step through" and before actually sending it) and set a conditional breakpoint in that process for when message context ID == NNNN.


Jody Sankey

Mar 4, 2021, 2:09:34 PM

to Dave Schuyler, Shai Barack, Carlos Pizano, Chase Latta, Adam Barth, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Christopher Johnson

I certainly don't claim it would fix the entire problem, but I wonder if we should support chaining the errors that are already a part of our ABI surface. I'm finding it quite common to return error X over FIDL protocol A because the component implementing A received error Y over component B. Could we include error Y as a cause when sending X?


Jody Sankey

Mar 4, 2021, 2:09:34 PM

to Dave Schuyler, Shai Barack, Carlos Pizano, Chase Latta, Adam Barth, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, Christopher Johnson

<Sigh>, caught the typo too late. That should have read:

I'm finding it quite common to return error X over FIDL protocol A because the component implementing A received error Y over FIDL protocol B.
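Chaining of this kind could be pictured as an error value that optionally carries the error that caused it. The Python sketch below is an assumption-laden illustration, not a FIDL feature: `ChainedError`, the `chain` helper, and the protocol and error names are all made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChainedError:
    protocol: str                           # protocol that reported the error
    code: str                               # error code from that protocol's enum
    cause: Optional["ChainedError"] = None  # downstream error, if any


def chain(error: ChainedError) -> List[str]:
    """Flatten an error and its causes, outermost first."""
    out = []
    current: Optional[ChainedError] = error
    while current is not None:
        out.append(f"{current.protocol}.{current.code}")
        current = current.cause
    return out


# Protocol A returns X because its implementation received Y over protocol B.
y = ChainedError("ProtocolB", "Y")
x = ChainedError("ProtocolA", "X", cause=y)
```

Here `chain(x)` lists the outer error followed by its cause. Note that a cause carried this way would travel on the wire too, so the ABI concern raised earlier in the thread would still need weighing.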

Adam Perry

Mar 4, 2021, 2:18:27 PM

to Dave Schuyler, Adam Barth, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev, TQ Diagnostics Team

Recent discussions about following IPC flows using structured logs and/or the tracing subsystem headed in other directions, so I think it's most likely that tooling for this use case would need to be built on explicit context IDs in the FIDL protocol.

The structured logging format is pretty amenable to this kind of change, and we've already discussed the possibility of generalizing PID/TID into the "root context" ID. Once we do that it should be easy enough to make context IDs for arbitrary IPC transactions and to then build tooling which can filter based on it.


--

Adam Perry

Brett Wilson

Mar 4, 2021, 2:27:31 PM

to Seth Ladd, Adam Barth, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:57 AM Seth Ladd <seth...@google.com> wrote:

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:
Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

cc @Francois Rousseau @Brett Wilson @Amit Uttamchandani

Yes, though it will require some tricky kernel and library cooperation. This feature is perennially on the wish list but the debugger is currently not staffed enough to be able to do this type of difficult cross-functional project.

Jeremy Manson

Mar 4, 2021, 2:28:33 PM

to Seth Ladd, Adam Barth, Francois Rousseau, Brett Wilson, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:57 AM 'Seth Ladd' via api-council <api-c...@fuchsia.dev> wrote:

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:
Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

cc @Francois Rousseau @Brett Wilson @Amit Uttamchandani

We've talked about this extensively for fidlcat, but it requires support from the Zircon team: you have to be able to stop the peer process when it gets the message.

We've talked to the zx folks about it at recurring intervals, but they're busy. fidlcat and zxdb share infrastructure, so if one can do it, the other can.

fidlcat has an approximation where you can attach to the processes at both ends and it will try hard to guess what the connection is.

Jeremy


Amit Uttamchandani

Mar 4, 2021, 3:05:21 PM

to Jeremy Manson, Jordon Wing, Seth Ladd, Adam Barth, Francois Rousseau, Brett Wilson, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 9:53 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:

Here's a more concrete version of that idea:

When I work on a piece of software, I often set my logger to filter log messages to only those with my component's tag (i.e., fx log --tag <foo>). If a service my component uses generates an error, I won't see that because the log will have the tag of the component that recognizes the error instead of the tag of the requestor. What if there was a way to cause that error message to be displayed even when the developer is filtering on their own component's log tag?

> fyi @Amit Uttamchandani

+Jordon Wing as well who is working on ffx target log features and improvements. Additional context in filters is something we have discussed. Note that because we are now leveraging structured logs we have better ability to present logging information with context. I'll explore this with Jordon on what is feasible.

Jeremy Manson

Mar 4, 2021, 3:05:33 PM

to Dave Schuyler, Adam Barth, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 10:59 AM 'Dave Schuyler' via api-council <api-c...@fuchsia.dev> wrote:

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:
Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

IMO, this would be greatly aided by a context ID in the message itself. E.g., zxdb could 'contact the recipient' ahead of time (err, after the user asked to "step through" and before actually sending it) and set a conditional breakpoint in that process for when message context ID == NNNN.

The issue is that there's no way for zxdb to know who the recipient is. We have a plan with the kernel team for them to implement a peer_owner_koid field for peered objects. When we know that, we can attach to the peer owner process (and also follow the other end of the peered object if it is passed across to yet other processes).

If you are monitoring both ends of the channel, you wouldn't *really* need the context id to keep it straight - you can see that a message went in one end and an identical message came out the other end.

The only problem a context id would solve would be if the client end sent lots of identical messages and you didn't know which one specifically to trace, but the chances of getting that wrong when you are stepping through the system at the speed of typing should be pretty minimal.

Jeremy
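The matching described in the message above can be sketched as pairing each message written at one end of a channel with the first unmatched identical message read at the other end. This Python model is a simplification under stated assumptions: messages are plain byte strings, capture order is preserved, and `match_messages` is a hypothetical helper, not fidlcat or zxdb code.

```python
from typing import List, Optional, Tuple


def match_messages(
    sent: List[bytes], received: List[bytes]
) -> List[Tuple[int, Optional[int]]]:
    """Pair each sent message index with the index of its received copy.

    Identical payloads are matched in order, which assumes the channel
    delivers messages in the order they were written.
    """
    used = set()
    pairs = []
    for i, payload in enumerate(sent):
        match = None
        for j, candidate in enumerate(received):
            if j not in used and candidate == payload:
                match = j
                used.add(j)
                break
        pairs.append((i, match))
    return pairs
```

A sent message with no counterpart yet is paired with `None`. Repeated identical payloads are the one case where the pairing rests purely on ordering, which is where a context ID would remove the guesswork.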


Dale Sather

Mar 4, 2021, 4:34:45 PM

to Jeremy Manson, Dave Schuyler, Adam Barth, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

Addressing the original question, the chosen solution for this problem is to use the logs. The fact that this solution is currently problematic seems soluble to me.

I think we should:

1) formalize the way that 'holding it wrong' errors are logged

2) address the discoverability problem where those log messages are concerned

Off the top of my head, we could address 2) by providing an ffx view that shows a developer what's happening with their component(s), including logs for just that component and special presentation of this type of error.


Seth Ladd

Mar 4, 2021, 4:36:18 PM

to Brett Wilson, Chris Osborn, Adam Barth, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, Adam Lesinski, api-c...@fuchsia.dev, fidl-dev

On Thu, Mar 4, 2021 at 11:24 AM Brett Wilson <bre...@google.com> wrote:

On Thu, Mar 4, 2021 at 10:57 AM Seth Ladd <seth...@google.com> wrote:

On Thu, Mar 4, 2021 at 10:54 AM 'Adam Barth' via api-council <api-c...@fuchsia.dev> wrote:
Could we teach zxdb to do all of that automatically (and correctly)? I'm imagining in addition to step into, step over, and step out, we could have a "step through" that advanced to the point at which the next zx::channel message sent by the current thread was received by another process in the system.

cc @Francois Rousseau @Brett Wilson @Amit Uttamchandani

Yes, though it will require some tricky kernel and library cooperation. This feature is perennially on the wish list but the debugger is currently not staffed enough to be able to do this type of difficult cross-functional project.

cc @Chris Osborn

There is a debugger workgroup with both zxdb and fidlcat; it's a good place to bring this up and see if we can add to the strategy doc and seek funding.

Adam Lesinski

Mar 4, 2021, 4:58:29 PM

to Seth Ladd, Brett Wilson, Chris Osborn, Adam Barth, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, api-c...@fuchsia.dev, fidl-dev

Thanks for all your replies everyone! This is obviously a rich topic with plenty of opinions and potential solutions.

Unfortunately I was seeking more of an O(days) solution and a current best practice recommendation.

As I mentioned in the beginning, logs are useful but in my situation I can't always rely on there being logs (aside from the kernel log). Granted, this may be unique in that the only people experiencing this are people working on early boot.

I think for my situation I DO want some error details to be part of the ABI. There's a balance between too little information and too much, but I think erring on providing very little has its own price: PEER_CLOSED for every error case.

First, instead of reusing an existing Error enum, I think using a domain-specific error will allow me to represent errors better, while still just using an integer code to do that.

Second, if I need to wrap a downstream cause of the error, I think what Yifei mentioned is a decent approach: return my own union that represents the success and error states (like Result in Rust), which allows me to place more error information (part of the ABI) into a table in the error variant.

Adam Barth

Mar 4, 2021, 5:04:16 PM

to Adam Lesinski, Seth Ladd, Brett Wilson, Chris Osborn, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, api-c...@fuchsia.dev, fidl-dev

I don't think we're going to find much of a solution to this problem in O(days) other than to return an enumerated error value.

On Thu, Mar 4, 2021 at 9:57 PM Adam Lesinski <adamle...@google.com> wrote:

Thanks for all your replies everyone! This is obviously a rich topic with plenty of opinions and potential solutions.

Unfortunately I was seeking more of an O(days) solution and a current best practice recommendation.

As I mentioned in the beginning, logs are useful but in my situation I can't always rely on there being logs (aside from the kernel log). Granted, this may be unique in that the only people experiencing this are people working on early boot.

I think for my situation I DO want some error details to be part of the ABI. There's a balance between too little information and too much, but I think erring on providing very little has its own price: PEER_CLOSED for every error case.

First, instead of reusing an existing Error enum, I think using a domain-specific error will allow me to represent errors better, while still just using an integer code to do that.

That sounds like a good step.

Second, if I need to wrap a downstream cause of the error, I think what Yifei mentioned is a decent approach: return my own union that represents the success and error states (like Result in Rust), which allows me to place more error information (part of the ABI) into a table in the error variant.

I'd be skeptical of that approach for the reasons discussed in this thread and in the FIDL API Rubric. Please don't rush ahead and add system ABI that will need to be maintained forever because we do not have a complete solution for this complex problem in O(days).

Chris Osborn

Mar 4, 2021, 9:04:16 PM

to Adam Barth, Adam Lesinski, Seth Ladd, Brett Wilson, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, Gary Bressler, api-c...@fuchsia.dev, fidl-dev

Ack that Adam is looking for a short-term solution, but I wanted to also note that Forensics is funding a Q2 project that allows fidlcat and zxdb to be used at the same time. From what I understand, this project will get us closer to being able to step across channels.

Gary Bressler

Mar 5, 2021, 12:19:08 PM

to Chris Osborn, Adam Barth, Adam Lesinski, Seth Ladd, Brett Wilson, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, api-c...@fuchsia.dev, fidl-dev

Does FIDL have any recommendations for the "scope" of an error type? We currently have a single error type, fuchsia.component.Error for all Component Framework FIDLs. But one could also define one type per protocol, or even per method (the latter seems like it probably goes too far). I expect that there's some overlap in terms of what errors each protocol returns, but certain errors may only make sense for certain protocols.

Adam Barth

Mar 5, 2021, 12:37:57 PM

to Gary Bressler, Chris Osborn, Adam Lesinski, Seth Ladd, Brett Wilson, Francois Rousseau, Amit Uttamchandani, Shai Barack, Carlos Pizano, Chase Latta, api-c...@fuchsia.dev, fidl-dev

I don't see any advice on that topic in https://fuchsia.dev/fuchsia-src/concepts/api/fidl

Seems like a trade-off between being precise about what errors can arise from each operation and the awkwardness of proliferating many similar declarations. Generally, if there is a high degree of overlap between the errors returned from different methods, we should use a common error type. If the error domains are largely distinct, then we should use distinct types.

Adam
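One rough way to reason about the trade-off in the message above is to measure how much the error sets of two protocols overlap. The Python sketch below uses Jaccard similarity with an arbitrary 0.5 threshold; the metric, the threshold, and the error names are illustrative assumptions, not guidance from the FIDL API rubric.

```python
from typing import Set


def error_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared errors divided by all distinct errors."""
    union = a | b
    if not union:
        return 1.0  # two empty sets are trivially identical
    return len(a & b) / len(union)


def suggest_shared_type(a: Set[str], b: Set[str], threshold: float = 0.5) -> bool:
    """Suggest a common error type when the overlap reaches the threshold."""
    return error_overlap(a, b) >= threshold


# Hypothetical error sets for two protocols.
realm_errors = {"INTERNAL", "INVALID_ARGUMENTS", "INSTANCE_NOT_FOUND"}
resolver_errors = {"INTERNAL", "INVALID_ARGUMENTS", "PACKAGE_NOT_FOUND"}
```

In this made-up case two of four distinct errors are shared, so the heuristic leans toward a common type; in practice the decision would also weigh how the protocols are expected to evolve.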
