Reflection-driven schema-free RPC and ser/deser


Christopher Wheeler

Jun 25, 2019, 10:29:13 AM
to Cap'n Proto
I'm reaching out because I want to collect thoughts on this, but I'm going to enumerate the core points up front:

1: Code generation is better, but it's not helpful when you don't have control of the data structures being exchanged
2: It seems that the capnproto protocol could be used to encode information about a data structure that was not generated using a capnproto schema
3: If 2 is true, it seems that promise-driven capnproto RPC would be achievable for those same structures

The question is: would it be worthwhile to construct a system that uses capnproto to exchange data between data structures that weren't built using a capnproto schema?

Longer version:

I'm working on a project that needs to exchange calls both inter- and intra-host, and needs to be able to do RPC on data structures that were not built to support RPC. For reasons, it's really hard to get the project owners to adjust their data structures (and it's not clear that doing so would provide a better customer experience anyway).

If we were to imagine reflecting over any in-memory data structure, we could also imagine a system that enumerated its properties by name and type and was capable of building an in-memory parse tree for that structure. With that, you could imagine constructing an in-memory data structure that performs the mapping that generated code would. If you had those systems, the encoder and decoder could exchange data between processes, and that data could be mapped onto another object that was "close enough" (meaning that the property enumeration was the same and the types lined up).
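As a rough sketch of that idea (Python purely for illustration; the types and the `enumerate_properties` / `map_close_enough` helpers are hypothetical, not Cap'n Proto APIs):

```python
from dataclasses import dataclass, fields

@dataclass
class SourcePoint:   # a structure we don't control
    x: float
    y: float

@dataclass
class TargetPoint:   # a different type whose enumeration happens to line up
    x: float
    y: float

def enumerate_properties(cls):
    """Build the (name, type) enumeration a code generator would emit."""
    return {f.name: f.type for f in fields(cls)}

def map_close_enough(src, target_cls):
    """Copy fields onto the target where names and types line up."""
    src_props = enumerate_properties(type(src))
    dst_props = enumerate_properties(target_cls)
    common = {name: getattr(src, name)
              for name, typ in src_props.items()
              if dst_props.get(name) == typ}
    return target_cls(**common)

p = map_close_enough(SourcePoint(1.0, 2.0), TargetPoint)
```

Real reflection over arbitrary C++ structures would of course be much harder; the sketch only shows the shape of the mapping step.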

With packing, and assuming empty buffer space, it seems like your enumeration strategy could be pretty sparse (let's assume a configurable key space of an 8-24 bit integer). If the properties weren't enumerated sequentially, but instead using a hash code accounting for property name, property type, and nesting level, this would allow a lot of flexibility for type-to-type mapping across processes.
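The hash-based key idea could be sketched like this (illustrative Python; `property_key` is a hypothetical helper, and the choice of SHA-256 and the exact bit widths are assumptions, not anything from Cap'n Proto):

```python
import hashlib

def property_key(name: str, type_name: str, nesting_level: int,
                 key_bits: int = 16) -> int:
    """Stable hash of (name, type, nesting level), masked to key_bits."""
    assert 8 <= key_bits <= 24, "configurable key space of 8-24 bits"
    digest = hashlib.sha256(
        f"{name}|{type_name}|{nesting_level}".encode()).digest()
    return int.from_bytes(digest[:3], "big") & ((1 << key_bits) - 1)

# Both endpoints can compute the same key independently, so properties can
# be matched without exchanging a schema -- at the cost of possible
# collisions in a small key space.
k = property_key("x", "float", 0)
```

Note the trade-off the sketch makes visible: a smaller key space packs tighter on the wire but raises the collision probability between unrelated properties.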

Assuming that the decoder was able to explicitly specify which properties on the receiving type were candidates for data received from the message, it seems like this would be pretty secure.
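A minimal sketch of that receive-side whitelist, with hypothetical names and plain dicts standing in for real structures:

```python
# The receiver, not the sender, decides which properties are candidates.
ALLOWED = {"x", "y"}

def decode_into(target: dict, message: dict) -> dict:
    """Apply only whitelisted keys from the wire; drop everything else."""
    for name, value in message.items():
        if name in ALLOWED:
            target[name] = value
    return target

state = decode_into({}, {"x": 1, "y": 2, "is_admin": True})
# "is_admin" never reaches the target, which is the security property
# being claimed.
```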

Ok, so now the question: does this make sense? It seems to me like this would provide a pretty generic and straightforward RPC and ser/deser solution, but am I missing something? The up-front cost of establishing the pointer maps using reflection might be high, and that cost might be incurred again whenever the key space in the messages changed, but it still seems to help a client accomplish a goal that was previously unachievable: sharing data stored in a structure it didn't own with a system that also may not own the data structure.

Do people know of another solution that achieves this?  Can anyone think of something I'm just plain missing here?

Kenton Varda

Jun 26, 2019, 4:39:33 AM
to Christopher Wheeler, Cap'n Proto
Hi Christopher,

Cap'n Proto defines a compiled format for schemas -- itself expressed as Cap'n Proto -- in schema.capnp.

This is actually the input format for code generators, so it definitely contains all information that code generators have.

The C++ implementation (and perhaps others, I'm not sure) also provides a "dynamic" API which can consume schemas for arbitrary types at runtime, then operate on instances of those schemas.

This seems like most of what you're describing. It is indeed useful to use the dynamic API to implement operations that apply over any general structure, such as stringification. One can also use annotations in order to provide extra information in a schema file that's useful for a particular operation. For example, the JSON converter supports annotations to control how the structure should be represented in JSON.

At some point I would like to extend the RPC protocol with a way to query schemas. I think this would be most useful for implementing developer tools that can dynamically connect to an arbitrary RPC server and let the developer interactively explore the API it provides and issue commands. Maybe it could also allow the creation of client libraries in dynamic languages that do not require schemas (though they'd be slow compared to generated code).

One part about your e-mail that I'm not sure I understand correctly: It sounds like you're also looking for the ability to operate on data structures with ad hoc layouts, not originally designed as Cap'n Proto. You might be able to do a little bit of this: the compiled format has the offsets of primitive fields already computed. So, if you had a C struct containing only primitive fields (integers, floats, booleans -- no pointers), you could conceivably construct a schema.capnp description of it. However, schema.capnp was not really designed to be used this way, and I would not be excited about trying to extend it for such use.
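To make the "offsets of primitive fields" point concrete, here is an illustrative sketch (Python's ctypes standing in for a real C struct; `describe` is a hypothetical helper, and the exact offsets depend on the platform's alignment and padding rules):

```python
import ctypes

class Sample(ctypes.Structure):
    """A flat struct of primitive fields only -- no pointers."""
    _fields_ = [
        ("flag",  ctypes.c_bool),
        ("count", ctypes.c_int32),
        ("ratio", ctypes.c_double),
    ]

def describe(struct_cls):
    """Produce (name, type name, byte offset) triples -- the raw material
    a schema-style description of a flat struct would need."""
    return [(name, typ.__name__, getattr(struct_cls, name).offset)
            for name, typ in struct_cls._fields_]

layout = describe(Sample)
```

This only goes as far as Kenton's caveat allows: once the struct contains pointers, strings, or nested containers, a fixed-offset description like this no longer suffices.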

Another approach you could take is to develop an annotation-driven converter similar to JsonCodec. Then, you can support whatever specific features you need without having to modify the Cap'n Proto library.

-Kenton
