Hi Carter,

I think generics should look something like this:

    struct Foo(Bar, Baz) {
      # Foo is generic. Bar and Baz are type parameters.
      a @0 :Int32;
      b @1 :Bar;
      c @2 :Text;
      d @3 :List(Baz);
    }

    struct Qux {
      e @0 :Foo(Text, SomeStruct);
      # Use of Foo().
    }

Notice that we use function call syntax for generics. `Foo` is, after all, a type function. I hate that C++ and Java use <> because they introduce all sorts of ambiguity with arithmetic operators, and there really is no reason to distinguish from value functions.

There are two important requirements I want to highlight:

1) Generic Cap'n Proto schemas should be implementable using the generics available in common programming languages -- that is, for the declarations above, the C++ code generator should generate a template class Foo<Bar, Baz>, a Java code generator should use Java generics, etc.

2) It should be possible to take an existing specific type and "genericize" it without breaking wire compatibility, and vice-versa. That is, if I defined:

    struct Foo2 {
      a @0 :Int32;
      b @1 :Text;
      c @2 :Text;
      d @3 :List(SomeStruct);
    }

This type has exactly the same wire format as `Foo(Text, SomeStruct)`.

Unfortunately, I think the above requirements imply a sad limitation: we can only allow type parameters to bind to pointer types, not primitives, much like in Java. The reason for this is that the algorithm to compute offsets of primitives within the struct's data section is complicated, with every field's offset depending on the sizes of all the fields before it. It seems unfeasible to compute the offset dynamically in template or runtime code.

If we were willing to give up either of the two requirements, then we could possibly get around this. For instance, if we dropped (1) and instead said that code would only be generated for specific instances used in the schema files, then we could compute all offsets in advance the same way we do now. Or, if we were willing to drop (2) then we could say that fields of generic type always occupy a whole word in each of the sections, which is enough space to hold any kind of value.

However, I think requirements (1) and (2) are both more important to me than the ability to parameterize non-pointer types. The pointer-only requirement can be worked around, after all, by "boxing" primitive types as one would in Java, which at worst wastes 127 bits (which is much better than, say, boxing in Java).

If we agree that this is an acceptable limitation, then I think the remaining details fall into place intuitively. For any given generic type, we generate one version of the type where all the type parameters are bound to `AnyPointer`, and then we generate a C++ template/Java generic/whatever wrapper around that type which interprets all of the AnyPointers according to the specific parameterization. The most complicated implementation issues are around the schema and dynamic API in C++, but that's only because there are some unusual implementation requirements in C++ (mostly around avoiding heap allocation); I'm guessing C++-specific details don't interest you. :)

Thoughts?

-Kenton

On Sat, Feb 15, 2014 at 10:16 PM, Carter Schonwald <carter.s...@gmail.com> wrote:
> Not yet subscribed to the mailing list (amidst travel this week).
> Anywho, please feel welcome to brain dump away. Having a nice story for "template" / monomorphized-at-run-time generics (or whatever you want to call 'em) would settle one of the bigger design smells I think I'm (seemingly) stuck on wrt the protocol. (I like it overall, mind you.)
> Anywho, I'd love to hear your brain dump on this topic.
> -Carter
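
For readers following along, the "one untyped version plus a typed wrapper" idea from the message above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not Cap'n Proto's generated code: `AnyPointer` is modelled here as opaque bytes, and `PointerCodec`, `FooUntyped`, and `demoTypedWrapper` are hypothetical names invented for the sketch. Only the `a` and `b` fields of `Foo` are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an untyped pointer field: opaque bytes only.
struct AnyPointer {
  std::string bytes;
};

// Hypothetical trait describing how to read and write a concrete type
// through an AnyPointer. Only Text (modelled as std::string) is provided.
template <typename T>
struct PointerCodec;

template <>
struct PointerCodec<std::string> {
  static AnyPointer encode(const std::string& value) { return AnyPointer{value}; }
  static std::string decode(const AnyPointer& ptr) { return ptr.bytes; }
};

// The single non-generic version of Foo: the type parameter Bar is bound
// to AnyPointer, so the layout does not depend on the parameterization.
struct FooUntyped {
  std::int32_t a = 0;  // a @0 :Int32
  AnyPointer b;        // b @1 :Bar, stored untyped
};

// Thin typed wrapper. It adds no storage of its own; it only interprets
// the untyped pointer field according to the chosen type parameter.
template <typename Bar>
class Foo {
 public:
  explicit Foo(FooUntyped& raw) : raw_(raw) {}

  std::int32_t getA() const { return raw_.a; }
  void setA(std::int32_t value) { raw_.a = value; }

  Bar getB() const { return PointerCodec<Bar>::decode(raw_.b); }
  void setB(const Bar& value) { raw_.b = PointerCodec<Bar>::encode(value); }

 private:
  FooUntyped& raw_;
};

// Usage: the typed wrapper and the untyped struct share the same storage.
inline void demoTypedWrapper() {
  FooUntyped raw;
  Foo<std::string> typed(raw);
  typed.setA(7);
  typed.setB("hello");
  assert(raw.a == 7);
  assert(raw.b.bytes == "hello");
  assert(typed.getB() == "hello");
}
```

Because every parameterization shares `FooUntyped`, nothing about the stored layout changes when a concrete type such as `Foo2` is later "genericized", which is the point of requirement (2).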
> If we were willing to give up either of the two requirements, then we could possibly get around this. For instance, if we dropped (1) and instead said that code would only be generated for specific instances used in the schema files, then we could compute all offsets in advance the same way we do now. Or, if we were willing to drop (2) then we could say that fields of generic type always occupy a whole word in each of the sections, which is enough space to hold any kind of value.
> If we agree that this is an acceptable limitation, then I think the remaining details fall into place intuitively. For any given generic type, we generate one version of the type where all the type parameters are bound to `AnyPointer`, and then we generate a C++ template/Java generic/whatever wrapper around that type which interprets all of the AnyPointers according to the specific parameterization. The most complicated implementation issues are around the schema and dynamic API in C++, but that's only because there are some unusual implementation requirements in C++ (mostly around avoiding heap allocation); I'm guessing C++-specific details don't interest you. :)
This seems like a pretty big limitation. I imagine a lot of people (i.e., me) would want to use this in the context of numerics, and be unable to without paying the cost of boxing. I haven't given it too much thought, but is there any way to keep (1), but use template specialization to deal with pre-computing offsets and whatnot? I'm guessing this would be a C++-only solution though, since most other languages' generics don't allow for specialization.
Would there be any big changes to how the dynamic API looks? Tangentially, what kind of changes would have to happen at the parsed schema level?
Regarding numerics, would it be fair to say that most of the problem is with lists of numbers, not individual fields? As it turns out, the overhead of boxing magically goes away if we're talking about a list, because lists of structs are flattened and compacted. E.g. if you have a struct containing a single UInt16 field, then a list of this struct will in fact only use two bytes per element (rounded up to one 64-bit word, plus a one-word tag).
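
The arithmetic in this message and in the earlier "127 bits" remark can be written out as a small worked example. It is a sketch under assumptions: the function names are hypothetical, the list cost follows one reading of the description above (elements packed back to back, the total rounded up to whole 64-bit words, plus a one-word tag), and the boxing cost assumes one pointer word plus one data word per boxed value.

```cpp
#include <cstddef>

constexpr std::size_t kWordBytes = 8;  // one 64-bit word

// Bytes used by a compacted list: element payloads packed back to back,
// rounded up to whole words, plus a one-word tag (assumed reading).
constexpr std::size_t compactedListBytes(std::size_t count, std::size_t elementBytes) {
  return kWordBytes + (count * elementBytes + kWordBytes - 1) / kWordBytes * kWordBytes;
}

// Bits wasted by boxing one value on its own: a pointer word plus a data
// word (128 bits in total), minus the bits the value actually needs.
constexpr std::size_t boxedWastedBits(std::size_t valueBits) {
  return 2 * kWordBytes * 8 - valueBits;
}

// One boxed UInt16 in a list: 2 bytes rounded up to a word, plus the tag.
static_assert(compactedListBytes(1, 2) == 16, "single element costs two words");
// 1000 boxed UInt16 values: 2000 payload bytes plus the 8-byte tag.
static_assert(compactedListBytes(1000, 2) == 2008, "overhead is amortized");
// A boxed Bool is the worst case mentioned earlier in the thread.
static_assert(boxedWastedBits(1) == 127, "boxing a Bool wastes 127 bits");
```

Under these assumptions the per-element overhead for a long list approaches zero, while a single boxed field pays the full two words.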
Does this compaction still occur if there's a nested struct?
> However, I think requirements (1) and (2) are both more important to me than the ability to parameterize non-pointer types. The pointer-only requirement can be worked around, after all, by "boxing" primitive types as one would in Java, which at worst wastes 127 bits (which is much better than, say, boxing in Java).
C++ templates are Turing complete, so C++ can, in theory, have the best of both worlds.
/me cowers :)
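
As a rough illustration of what compile-time offset computation could look like, here is a C++14 `constexpr` sketch. It is a deliberately simplified first-fit layout working in bytes, and it is not the actual Cap'n Proto layout algorithm; `computeOffsets`, `Offsets`, and the 64-byte cap are assumptions made for the example. Each field is placed at the lowest free offset aligned to its own size, so every offset depends on the sizes of the fields before it.

```cpp
#include <cstddef>

constexpr std::size_t kMaxDataBytes = 64;  // arbitrary cap for this sketch

template <std::size_t N>
struct Offsets {
  std::size_t at[N];
};

// Simplified first-fit layout. Sizes are in bytes and assumed to be powers
// of two (1, 2, 4, 8). Fields are placed in declaration order at the lowest
// free offset that is a multiple of their size.
template <std::size_t N>
constexpr Offsets<N> computeOffsets(const std::size_t (&sizes)[N]) {
  Offsets<N> result{};
  bool used[kMaxDataBytes] = {};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t size = sizes[i];
    for (std::size_t off = 0; off + size <= kMaxDataBytes; off += size) {
      bool isFree = true;
      for (std::size_t b = 0; b < size; ++b) {
        if (used[off + b]) isFree = false;
      }
      if (isFree) {
        for (std::size_t b = 0; b < size; ++b) used[off + b] = true;
        result.at[i] = off;
        break;
      }
    }
  }
  return result;
}

// Same struct shape, but the second field is 1 byte in one instantiation
// and 8 bytes in the other, as if it were a primitive type parameter.
constexpr std::size_t kWithByte[] = {4, 1, 2, 8};
constexpr std::size_t kWithWord[] = {4, 8, 2, 8};
constexpr Offsets<4> kByteLayout = computeOffsets(kWithByte);
constexpr Offsets<4> kWordLayout = computeOffsets(kWithWord);

// Changing only the second field's size moves the fields declared after it.
static_assert(kByteLayout.at[2] == 6 && kByteLayout.at[3] == 8, "layout with 1-byte field");
static_assert(kWordLayout.at[2] == 4 && kWordLayout.at[3] == 16, "layout with 8-byte field");
```

The two layouts disagree on the offsets of the later fields, which shows why a primitive type parameter cannot simply be swapped without affecting the wire format of the rest of the struct.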
2014-02-21 23:17 GMT+01:00 Kenton Varda <temp...@gmail.com>:
> [...]
> However, I think requirements (1) and (2) are both more important to me than the ability to parameterize non-pointer types. The pointer-only requirement can be worked around, after all, by "boxing" primitive types as one would in Java, which at worst wastes 127 bits (which is much better than, say, boxing in Java).

Why is (1) so important?
Also, what about languages that don't have generics in the first place? There, point (1) doesn't even apply (at least not in the way it was stated; there may still be some limitations needed to fulfill point 1).
I feel (albeit vaguely) that it ought to be enough to have generated code for every specialization used in the schema, rather than to provide a generic schema type.
OK, that's super reasonable. I'm honestly still new to dealing with those sorts of issues. :)

-Carter

On Tuesday, February 25, 2014 at 1:44 AM, Kenton Varda wrote:
On Mon, Feb 24, 2014 at 6:18 PM, Carter Schonwald <carter.s...@gmail.com> wrote:
As a point of order, I'm totally OK with the case where making a preexisting field generic breaks wire compatibility, if that enables some flexibility in how much unboxing can be done on generic fields. Totally, totally OK with it. You'd better have versioning anyway. :)
Well, I'm not. :)

Cap'n Proto inherits the Protobuf philosophy that eschews "versioning" in favor of a more fluid kind of compatibility where old code simply ignores new fields it doesn't know about and so there's no need to update the world after every change. If updating everyone is necessary then people will often choose to implement ugly hacks where they shoe-horn one piece of data into some other type that wasn't designed to hold it and eventually the whole protocol descends into madness.

-Kenton
Is there a way to programmatically test two schemas to check if the new one is compatible with the old one?
--
greg