Dynamically Creating Schemas


Hassan Syed

Oct 30, 2013, 2:38:59 PM
to capn...@googlegroups.com
Hello Kenton,

I'm wondering if it would be possible to get a high-level API to create/alter schemas dynamically at runtime. I want to store the schemas in a database and manage the IDs myself. I also want to be able to pull the schemas out of the database, alter them, and write them back -- or have my applications do it dynamically based on runtime needs.

I've managed to find a way to do it, but it isn't pretty. The only way to reach the offset calculations is to go through the compiler by generating a Declaration AST schema. So:

1. Generate a capnp chunk from a Node Schema.
2. Include all dependencies, or provide a loader that masquerades as the filesystem module :(.
3. Lex and parse it.
4. Alter the declaration AST
5. Recompile it.

Is that correct? Is there a way to run the offset calculations against the Node schema?

For the higher-level API we use a user-defined notion of scope that maps to filename-id pairs. I'm sticking with the notion of namespaces; any nested schema in a namespace requiring an ID gets it as follows: "(namespace_id << 32) + highest_schema_id++".

I'd only need API calls to alter the schemas in a non-breaking way, plus a call to recalculate the offsets. I do plan on writing a tool to populate a database from capnp files, at least to start with. In the long run I might come up with a dynamic DDL to fit in with my project (CREATE, DROP, ALTER), so more fleshed-out support would be great.

Dynamic interfaces might be yummy as well, but I won't need that :D 

I've got a few more days till I go back to work, but I do not want to try to get my head around node-translator.c++.

- Hassan

Kenton Varda

Oct 30, 2013, 5:30:39 PM
to Hassan Syed, capnproto
Hi Hassan,

I'll have to think about this.

Currently the only public API to the struct layout code is through SchemaParser.  It sounds like you are using internal APIs from the "compiler" subdirectory -- note that these may change in future versions, which is why these headers aren't normally installed to /usr/include.

Can you describe what kind of alterations you intend to make programmatically?  Does the schema start out defined in a .capnp file, or is it completely auto-generated?  If an alteration is made, does it have to be compatible with the schema that you'd get if you had hand-modified the .capnp file?

-Kenton



Hassan Syed

Oct 31, 2013, 5:31:30 AM
to capn...@googlegroups.com, Hassan Syed

God, this turned out to be quite a long one. I had been meaning to write an email directly to you with some of my thoughts / motivations, but since we've begun tackling this stuff here, I've done a brain dump of everything I wanted to get across. The short version addresses your question; the rest is my brain dump :D


The short version and current needs:


My current use-case would be satisfied by the ability to alter an existing node in a non-breaking/compatible way. I might be covered by just the ability to add fields to structs. For now I will be using capnp files to generate the basic types.


However, I'm hoping to convince you to build a whole API for building schemas dynamically :D


The long version. 


Premise


protobuf and capnp have a very general description model that can cover pretty much any data-modelling need. Combined with their respective DSLs, they do this well at compile time, and they can work with existing messages dynamically at runtime. But they cannot form the backbone of a system that wants to use them without the underlying description language / filesystem.


Motivation


If you remove the notion that schemas are backed by files (in the later stages of compilation), and relax the notion that the capnp DSL defines the AST, the AST/compilation model simplifies to the point where it is easy to provide an API for client tools.


This leads to capnp providing a low-level programmatic AST for general type schemas that can be manipulated and stored by client tools while still interoperating with the compile-time concepts of capnp.


If capnp catered to the above style of being a unified AST divorced from capnp files, it would open up the following use-cases:


0. As a central AST for translating between data formats.


Perhaps the most important use-case: BSON, protobuf, msgpack, thrift, JSON, and a zillion other data formats out there need to be interoperated with. This functionality would have advantages for capnp uptake.


My motivations for this differ, though: if I can't find a capnp implementation for a certain language, or if I can't handle the verbose and fiddly nature of constructing capnp messages in certain languages, I should be able to fall back onto BSON, especially since it's so easy to work with. Perhaps users of my system would feel the same way.


The protobuf wire format also has some advantages compared to capnp in terms of how messages are constructed, as well as the encoding format. I think capnp could do with a protobuf-style message-building layer. I will make another post about this.


1. A fully dynamic DDL for use in a system with a database as a backing store for schemas:


Integration with a processing engine or a database, i.e., CREATE, DROP, and ALTER commands for building structs that are enclosed in a database's concept of a file (a database name?). Clients will probably need an AST for a data-representation model as well as a data model; capnp can do both. An example of a data-model need: DROP-ing a struct should lead to cascading drops.


2. Embedded schemas. 


I came up with the following way of writing an embedded schema a few days ago (https://gist.github.com/hsyed/7246093).


Generating an embeddable representation of the schema is highly desirable for what I am doing. It removes the need to parse a capnp file, or to compile C++ into message sources. An additional performance advantage is removing the step of acquiring field types via string lookups against the dynamic schema: if you can embed the schema, you can declare the fields a priori. The end goal of what I am working on will send a schema to message-source providers; embedding schemas is an interim solution.


3. Dynamic Languages / metaprogramming - Dynamic data, Dynamic interfaces. 


This goes beyond my data-oriented needs. However, being able to generate data structures and interfaces at runtime, at a low level, would be a big win / game changer. Sure, you can do this by piggybacking off a level of indirection built on top of capnp. However, with the right tools to create this stuff dynamically in a low-level tool, the possibilities are endless, both at compile time and at runtime.


1. A template-expression-based DSL in C++/Scala for specifying an RPC system and interchange format without having to rely on a capnp file? The client runtime would infer everything from the server.


Conclusion and my work.


I'd like to work on some of this stuff; I've got my head around the majority of the codebase. I read somewhere that you are considering tackling storage concerns with capnp. I'm highly motivated by what you are planning to do with that.


I'm exploring reimplementing a fully dynamic complex-event-processing system / toolchain with automaton-based pattern detection, active-database functionality, and traditional storage of events in a relational manner, in a highly symbolised compact format.


This should give you some insight into why I want a dynamic capnp that can alter schemas at runtime. The whole system is a processing pipeline; being able to add stuff during the stages is vital. You specify what an event looks like at the source, and the rest of the system builds on that definition. You also need as much flexibility as possible while bootstrapping the system, which is why I mentioned points 0 and 2 :D

Kenton Varda

Nov 7, 2013, 2:03:31 AM
to Hassan Syed, capnproto
Hi Hassan,

Sorry for the delay in replying.  I've been hoping I'd come up with a clear answer for you, but it doesn't seem to be happening.  :)  So I'll just write some scattered thoughts...

On Thu, Oct 31, 2013 at 2:31 AM, Hassan Syed <h.a....@gmail.com> wrote:

My current use-case would be satisfied by the ability to alter an existing node in a non-breaking/compatible way. I might be covered by just the ability to add fields to structs. For now I will be using capnp files to generate the basic types.


I guess the key question here is whether there's any need for these dynamically-constructed structs to be compatible with things one might define using the schema language.  If not, then you can assign offsets yourself using whatever algorithm you want, so you can just build schema.capnp messages and feed them to SchemaLoader directly -- that is to say, the API is already there.

But if you want users to be able to switch between your "dynamic" schemas and hand-written ones without breaking compatibility, things get complicated.  The code which decides how to lay out a struct is currently hidden in the compiler, and there's no way to use it programmatically except to actually generate textual .capnp files.

If we did create an API for this, I don't know that simply exposing grammar.capnp (the compiler's internal AST) would be the right way to do it.  That AST is kind of ugly, meant for internal use.  I would want to spend some time coming up with a clean API, but I don't think I have time for that right now.  Generating .capnp files is probably the way to go for the moment (if you need to ensure that what you generate is actually compatible with a hand-written schema).

Note that these auto-generated .capnp files do not have to exist on the filesystem.  The SchemaParser API lets you provide the content as a char array.

Premise


protobuf and capnp have a very general description model that can cover pretty much any data-modelling need. Combined with their respective DSLs, they do this well at compile time, and they can work with existing messages dynamically at runtime. But they cannot form the backbone of a system that wants to use them without the underlying description language / filesystem.


Both systems are intended to allow for dynamic schema construction.  In protocol buffers, you can create DescriptorProtos by hand -- although the required grouping into FileDescriptorProtos is awkward.  Cap'n Proto schemas (schema.capnp) are easier to construct, but there's the layout-compatibility issue mentioned above.

0. As a central AST for translating between data formats.


Perhaps the most important use-case: BSON, protobuf, msgpack, thrift, JSON, and a zillion other data formats out there need to be interoperated with. This functionality would have advantages for capnp uptake.


I strongly agree with this, and have been saying all along that I intend to provide a library for transcoding to JSON.

I'm not sure how this relates to dynamic schemas, though.  I don't think you'd want to dynamically create a schema from one particular instance of a JSON message.  If you don't have a schema at all, then you might as well leave the data in JSON (or BSON), or encode it in Cap'n Proto as a list of name/value pairs.
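The name/value-pair fallback mentioned here might look like the following minimal schema sketch. It is illustrative only: the struct names and the placeholder file ID are invented, not from the thread.

```capnp
@0xbf5147cbbecf40c1;  # placeholder file ID (IDs must have the high bit set)

# Untyped data carried as name/value pairs, per the suggestion above.
struct NameValuePair {
  name @0 :Text;
  value @1 :Text;   # could instead be a union over primitive types
}

struct Document {
  pairs @0 :List(NameValuePair);
}
```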

The protobuf wire format also has some advantages compared to capnp in terms of how messages are constructed, as well as the encoding format. I think capnp could do with a protobuf-style message-building layer. I will make another post about this.


Please do; I don't think I understand what you're getting at here. 

1. A fully dynamic DDL for use in a system with a database as a backing store for schemas:

2. Embedded schemas.  

3. Dynamic Languages / metaprogramming - Dynamic data, Dynamic interfaces. 


I don't think I'm following what you're getting at with these three points.  What purpose does all this dynamicness serve in practice?  Can you give an example of a practical application that would use this?

-Kenton