Proxying unknown data in FIDL

Yifei Teng

Jul 28, 2021, 11:00:06 AM
to api-c...@fuchsia.dev, fidl-dev

Hello API Council,


We would like to hear whether there are any current or future use cases that call for proxying unknown envelopes in a FIDL message, i.e. an unknown union variant or an unknown table field. For example, a component may receive a FIDL table, optionally modify certain fields, then resend the same table object to another component.


A majority of FIDL bindings support this form of proxying today by storing unknown bytes and handles within the generated domain objects. However, as we evolved the FIDL bindings and ecosystem, preserving unknown fields (as opposed to simply ignoring them) frequently came up as a complicating factor, which is prompting us to revisit this support:


  • It makes the ABI of a component harder to reason about: a proxying component may send messages containing data that it does not understand, proxied from another component.

  • It complicates migrating the FIDL wire format itself: a sender cannot reasonably "upgrade" or "downgrade" the wire format of a FIDL message that contains unknown data (one could upgrade the known parts, but the result would be an inconsistent message mixing two wire format revisions). Unknown data also precludes certain wire format optimizations.

  • Unknown fields introduce shadow APIs and hidden values into the generated domain objects: operations that update an object may inadvertently drop the unknown fields unless written carefully.

  • As a more minor point, other design aspects of the LLCPP bindings prevent proxying unknown fields, which makes them inconsistent with the other bindings.
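A minimal Python sketch of the bookkeeping described above, for illustration only. It assumes a table on the wire can be modeled as a dict from ordinal to an envelope (opaque bytes plus handle numbers); `Envelope`, `DomainTable`, `decode`, `encode`, and `KNOWN_ORDINALS` are hypothetical names, not the API of any real FIDL binding. It shows a proxy preserving an unknown envelope (including its handle) when it edits the decoded object in place, and how rebuilding the object from its known fields alone silently drops that envelope.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Envelope:
    """Hypothetical wire envelope: opaque payload bytes plus any handles it carries."""
    data: bytes
    handles: tuple = ()

@dataclass
class DomainTable:
    """Hypothetical generated domain object for a FIDL table."""
    known: dict                                   # ordinal -> decoded value (str here)
    unknown: dict = field(default_factory=dict)   # ordinal -> Envelope, stored verbatim

# Hypothetical schema compiled into this proxy's bindings: the ordinals it understands.
KNOWN_ORDINALS = {1, 2}

def decode(wire: dict) -> DomainTable:
    """Split incoming envelopes into decoded known fields and preserved unknown ones."""
    table = DomainTable(known={})
    for ordinal, env in wire.items():
        if ordinal in KNOWN_ORDINALS:
            table.known[ordinal] = env.data.decode("utf-8")
        else:
            table.unknown[ordinal] = env  # unknown bytes *and* handles are kept as-is
    return table

def encode(table: DomainTable) -> dict:
    """Re-encode the known fields and re-emit the unknown envelopes untouched."""
    wire = {o: Envelope(v.encode("utf-8")) for o, v in table.known.items()}
    wire.update(table.unknown)
    return wire

# A proxy receives a table whose field 7 is unknown to it and carries a handle.
incoming = {1: Envelope(b"touch"), 2: Envelope(b"3"), 7: Envelope(b"\x00\x01", handles=(42,))}
table = decode(incoming)
table.known[2] = "4"        # modify a known field in place
forwarded = encode(table)   # field 7, including handle 42, goes out again

# Building a fresh object from the known fields alone silently drops field 7.
rebuilt = encode(DomainTable(known=dict(table.known)))
```

The forwarded table carries field 7 and handle 42 even though the proxy has no idea what they mean, while the rebuilt one quietly loses them.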


Experience from gRPC/protobuf has some bearing, but FIDL operates in a different enough domain that user stories/use cases from our own FIDL APIs would inform our next steps much better :)


Thanks,

Yifei


Shai Barack

Jul 28, 2021, 5:38:26 PM
to Yifei Teng, api-c...@fuchsia.dev, fidl-dev
I'm not on the API council, just a lurker.
I came here to say something that I imagine you've already heard from protobuf/gRPC people, but just in case you haven't:

In proto/gRPC land, the unknown field set has been hugely important for resilience to client/server version skew, and to skew in complex systems with multiple hops.
Furthermore, the unknown field set behavior is something that I exploited on one of my old teams (go/appreduce). We were able to eliminate a lot of proto-generated code in apps this way, typically making bindings ~30% lighter, which was hugely impactful for business metrics (app size, performance, installs/uninstalls, update uptake ratio). The trick was to treat proto fields that the app code never gets or sets as unknown fields. This let us, for example, round-trip a field from the server back to the server without any app code that knows about it.
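A rough Python sketch of the trick described above, assuming a message is modeled as a dict from field number to raw payload bytes. `FULL_SCHEMA`, `prune`, `decode`, and `encode` are hypothetical illustrations, not protobuf APIs, and the field names are made up. The app's schema is pruned to the fields its code actually gets or sets; everything else lands in the unknown set and goes back to the server byte-for-byte.

```python
# Hypothetical full schema known to the server: field number -> field name.
FULL_SCHEMA = {1: "user_id", 2: "display_name", 3: "server_cursor", 4: "experiment_flags"}

def prune(schema: dict, accessed: set) -> dict:
    """Keep only the fields that the app code actually gets or sets."""
    return {num: name for num, name in schema.items() if name in accessed}

def decode(wire: dict, schema: dict) -> tuple:
    """Split a message (field number -> raw bytes) into known fields and the unknown set."""
    known = {schema[num]: raw for num, raw in wire.items() if num in schema}
    unknown = {num: raw for num, raw in wire.items() if num not in schema}
    return known, unknown

def encode(known: dict, unknown: dict, schema: dict) -> dict:
    """Serialize known fields by number and append the unknown set unchanged."""
    number_of = {name: num for num, name in schema.items()}
    wire = {number_of[name]: raw for name, raw in known.items()}
    wire.update(unknown)
    return wire

# The app only ever touches display_name, so its pruned schema knows a single field.
APP_SCHEMA = prune(FULL_SCHEMA, {"display_name"})

from_server = {1: b"u1", 2: b"Ada", 3: b"cursor-9"}
known, unknown = decode(from_server, APP_SCHEMA)
known["display_name"] = b"Ada L."
to_server = encode(known, unknown, APP_SCHEMA)  # fields 1 and 3 round-trip untouched
```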

--
You received this message because you are subscribed to the Google Groups "api-council" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-council...@fuchsia.dev.
To view this discussion on the web visit https://groups.google.com/a/fuchsia.dev/d/msgid/api-council/CANbn4XDw%3DMy6HJsMRnxKMPppmJTovJj3mhtKTTuTYPpmPH%3D%3Diw%40mail.gmail.com.

Jaeheon Yi

Jul 30, 2021, 5:15:48 PM
to Shai Barack, Yifei Teng, api-c...@fuchsia.dev, fidl-dev
Yifei, 

I recently had a use case where allowing unknown fields would have led to a better FIDL design, but in a different context. The fuchsia.ui.pointer.TouchEvent table carries ordinary data and goes to ordinary clients. I wanted upgrade protocols in fuchsia.ui.pointer.augment to allow carrying more-privileged data in the same TouchEvent table over the same protocol, but the privileged types would remain unknown and uninterpretable by ordinary clients. 

Instead, we had to settle for creating a *new* protocol and table type for each augmentation, e.g., fuchsia.ui.pointer.augment.TouchEventWithLocalHit. Each augmentation requires embedding the ordinary TouchEvent in a new type, so we have:
- 1 new protocol and 1 new table type for each augmentation
- no composition of multiple augmentations; each C(C(T)) case must be handled ad hoc
- server must know how to process each new augment protocol
- client must know how to process each new augment protocol, instead of obtaining a higher-resolution table type that allows picking out data intended for it. 

There's also an expressivity question: how can TouchEvent be defined without the augmented members for ordinary clients, while still allowing each augmentation protocol to define additional members?

Jaeheon

Ian McKellar

Aug 3, 2021, 5:27:37 PM
to Jaeheon Yi, Shai Barack, Yifei Teng, api-c...@fuchsia.dev, fidl-dev
Hi all,

I'm worried that blindly proxying unknown parts of messages can harm the security and privacy goals of Fuchsia. Our declarative component topology, along with features like the recently introduced handle rights constraints, allows us to reason about what capabilities are exposed to each component in the system. If intermediary services can be induced to pass along arbitrary data and handles, that's a huge side channel. I'm not a security engineer, but that seems bad.

The problems Shai and Jaeheon raise, which could perhaps be solved with proxying, are real, but I think other solutions are better.

Ian

Bryan Henry

Aug 3, 2021, 5:46:57 PM
to api-council, Ian McKellar, Shai Barack, Yifei Teng, api-c...@fuchsia.dev, fidl-dev, Jaeheon Yi
Thanks Ian, I had the same thought when scanning this thread earlier. Definitely proxying unknown handles (or, really, doing *anything* with unknown handles besides closing them) would be undesirable.

I feel less strongly about unknown data, though. Is there a useful middle ground here? For example, what if we supported preserving unknown fields, but on an opt-in basis (not sure where that opt-in would live: the protocol, a runtime opt-in on the server side, something else), so that the default configuration would be the "most secure"?
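One way such an opt-in could look, sketched in Python under several assumptions: the opt-in is a hypothetical `preserve_unknown_data` decoder option that defaults to off, envelopes are modeled as bytes plus handle numbers, and `close_handle` is a hypothetical stand-in for the runtime's handle-close call (on Fuchsia that bottoms out in `zx_handle_close`). Unknown handles are closed regardless of the option, so at most unknown bytes are ever proxied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Hypothetical wire envelope: opaque payload bytes plus handle numbers."""
    data: bytes
    handles: tuple = ()

closed = []  # records closed handles so the behavior is observable in this sketch

def close_handle(handle: int) -> None:
    """Hypothetical stand-in for the runtime's handle-close call."""
    closed.append(handle)

def decode_unknowns(wire: dict, known_ordinals: set, preserve_unknown_data: bool = False) -> dict:
    """Return the unknown envelopes this decoder keeps; the default keeps nothing."""
    kept = {}
    for ordinal, env in wire.items():
        if ordinal in known_ordinals:
            continue
        for handle in env.handles:
            close_handle(handle)  # unknown handles are always closed, never proxied
        if preserve_unknown_data:
            kept[ordinal] = Envelope(env.data)  # keep the bytes, strip the handles
    return kept

# Default ("most secure") decoding versus an explicit opt-in, on two separate messages.
default_kept = decode_unknowns({1: Envelope(b"a"), 5: Envelope(b"opaque", (10,))}, {1})
opt_in_kept = decode_unknowns(
    {1: Envelope(b"a"), 5: Envelope(b"opaque", (11,))}, {1}, preserve_unknown_data=True
)
```

Putting the switch on the decoder is just one of the placements mentioned above; it could equally hang off the protocol declaration.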

Yifei Teng

Aug 3, 2021, 6:05:22 PM
to fidl-dev, Bryan Henry, Ian McKellar, Shai Barack, Yifei Teng, api-c...@fuchsia.dev, fidl-dev, Jaeheon Yi
By gathering use cases for proxying unknown data, we're hoping to draw commonalities from which we could generalize this "opt-in" feature, if needed.

A potential path we might end up taking is that FIDL doesn't preserve unknowns by default, but offers the proxying feature via different APIs or libraries. We may end up retracing protobuf's footsteps and adding unknown proxying back everywhere some years down the line, but that would be guided by concrete needs.

Just adding my thoughts to the two use cases (more or less) proposed above:

- Eliminating parsing/serialization code for unneeded fields (from Shai): I wonder if this is solved by the partial update pattern. Specifically, instead of replacing the object on the server side wholesale, the server would merge the object sent by the client into the one it has in storage. This way, it's fine if fields the client doesn't understand are absent from the object it sends, since their absence does not overwrite or delete the corresponding fields stored on the server side.

- Representing privileged/sensitive data as unknown fields (from Jaeheon): IIUC, the hypothetical "better FIDL design" involves sending privileged data to unprivileged clients, and assuming that they cannot do anything with it since they're not compiled with the code needed to parse it. From a security perspective, is that a sound guarantee though? It seems that a malicious client could reverse engineer the schema to a certain extent from the unknown data, and thus sift out potentially sensitive information. IMO the only safe way to prevent leaking info to unprivileged clients is to never send those fields in the first place.
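The partial update pattern from the first point can be sketched in a few lines of Python, assuming tables are modeled as dicts from ordinal to value where an absent key means "unset". `replace_wholesale` and `merge_update` are hypothetical names used only for the contrast. A client whose bindings drop unknowns sends back only the fields it knows; merging keeps the server's field 3, while wholesale replacement loses it.

```python
def replace_wholesale(stored: dict, sent: dict) -> dict:
    """Server overwrites its record with whatever the client sent (stored is ignored)."""
    return dict(sent)

def merge_update(stored: dict, sent: dict) -> dict:
    """Partial update: fields present in `sent` win; fields absent from it are left alone."""
    merged = dict(stored)
    merged.update(sent)
    return merged

# Server-side record; field 3 was added after the client was built, so the client can't decode it.
stored = {1: "Ada", 2: "ada@example.com", 3: "2fa-enabled"}

# A client that drops unknowns sends back only the fields it knows, with field 2 edited.
sent = {1: "Ada", 2: "ada@example.org"}

after_replace = replace_wholesale(stored, sent)  # field 3 is lost
after_merge = merge_update(stored, sent)         # field 3 survives
```

One consequence of this design: because an absent field means "leave it alone", clearing a field needs some separate, explicit mechanism.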

Cheers,
Yifei

Jaeheon Yi

Aug 3, 2021, 10:23:24 PM
to Yifei Teng, fidl-dev, Bryan Henry, Ian McKellar, Shai Barack, api-c...@fuchsia.dev
On Tue, Aug 3, 2021 at 3:05 PM Yifei Teng <yif...@google.com> wrote:
- Representing privileged/sensitive data as unknown fields (from Jaeheon): IIUC, the hypothetical "better FIDL design" involves sending privileged data to unprivileged clients, and assuming that they cannot do anything with it since they're not compiled with the code needed to parse it. From a security perspective, is that a sound guarantee though? It seems that a malicious client could reverse engineer the schema to a certain extent from the unknown data, and thus sift out potentially sensitive information. IMO the only safe way to prevent leaking info to unprivileged clients is to never send those fields in the first place.

Correct, unprivileged clients would never receive privileged data, and the server would guarantee that. But more precisely, for each privilege P, only clients cleared for P would receive data defined in P.  

Here, the desire was to use one common table to define the data carried to both unprivileged and privileged clients, where unprivileged clients would compile against a limited set of definitions and remain ignorant of the rest, while privileged clients (along N dimensions of privilege) could compile against an additional slice of the table. That's very nice for auditing and evolution: when we update a privilege P, an unprivileged client (or a client with an unrelated privilege Q) does not need to be recompiled or managed through an API transition; only the affected clients of P need to be dealt with.
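A small Python sketch of the server-side guarantee described above ("for each privilege P, only clients cleared for P would receive data defined in P"), assuming each field of the common table is tagged with the privilege it requires, with None for ordinary data. `FIELD_PRIVILEGE`, `project_for_client`, and the field contents are hypothetical. The server strips every field the client is not cleared for before sending, so nothing privileged ever travels as unknown data; fields with no declared privilege are dropped too (fail closed).

```python
# Hypothetical common table: ordinal -> privilege required to receive it (None = ordinary).
FIELD_PRIVILEGE = {1: None, 2: "P", 3: "Q"}

def project_for_client(event: dict, clearances: frozenset) -> dict:
    """Return only the fields this client is cleared to receive."""
    out = {}
    for ordinal, value in event.items():
        if ordinal not in FIELD_PRIVILEGE:
            continue  # fail closed: never forward a field without a declared privilege
        required = FIELD_PRIVILEGE[ordinal]
        if required is None or required in clearances:
            out[ordinal] = value
    return out

event = {1: "touch", 2: "local-hit", 3: "q-data", 9: "undeclared"}
ordinary_view = project_for_client(event, frozenset())
p_view = project_for_client(event, frozenset({"P"}))
```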

Yifei Teng

Aug 6, 2021, 4:42:34 AM
to fidl-dev, Jaeheon Yi, fidl-dev, Bryan Henry, Ian McKellar, Shai Barack, api-c...@fuchsia.dev, Yifei Teng
On Tuesday, August 3, 2021 at 7:35:00 PM UTC-7 Jaeheon Yi wrote:
Here, the desire was to use one common table to define the data carried to both unprivileged and privileged clients, where unprivileged clients would compile against a limited set of definitions and remain ignorant of the rest, while privileged clients (along N dimensions of privilege) could compile against an additional slice of the table. That's very nice for auditing and evolution: when we update a privilege P, an unprivileged client (or a client with an unrelated privilege Q) does not need to be recompiled or managed through an API transition; only the affected clients of P need to be dealt with.

Got it. IIUC, the "additional slice" hypothetical feature still works if FIDL does not proxy unknown data. A strawman (with obvious downsides) could be...

// In library fuchsia.ui.pointer.augmentations
// One common table to define all the data
type TouchEventComplete = table {
  1: data TouchData;
  2: foo FooAugmentation;
  3: bar BarAugmentation;
};

type TouchEventWithFoo = table_slice TouchEventComplete { 1, 2 };  // TouchEventWithFoo has fields 1 and 2 (data and foo)
type TouchEventWithBar = table_slice TouchEventComplete { 1, 3 };  // TouchEventWithBar has fields 1 and 3 (data and bar)

// In library fuchsia.ui.pointer
type TouchEvent = table_slice TouchEventComplete { 1 };  // A regular TouchEvent has just field 1 (data)

Though, having written it out, it doesn't seem to be much better than the existing protocol. In particular, we'd still need to define separate protocols to carry the separate TouchEventWithFoo/TouchEventWithBar/etc types.

If we went one step further and made TouchEvent sometimes have all three fields and sometimes only one, depending on some per-library build configuration, things get complicated really quickly when libraries disagree on which fields are present in a TouchEvent. Those libraries might end up living inside one process and give rise to conflicts.

Back to the question of proxying unknown data, it seems that in this particular case we benefit from actively dropping unknown data that is otherwise sensitive/privileged :)