Self-contained structures in C++

Nathan Hourt

Mar 3, 2016, 6:34:31 PM
to Cap'n Proto
Hi, all. I'm looking for a way to have a capnp data structure which has standard value semantics (move, copy, etc), but owns its storage (i.e. is self-contained), which will be deallocated when the object is destroyed. Basically, I want it to act like a kj::Array: I can use it as-is, I can move it around between functions, but when it gets destroyed, its storage goes with it.

I've thought about something like std::pair<MallocMessageBuilder, Foo::Builder> where the builder references the MallocMessageBuilder, but doing that requires separate APIs for self-contained builders vs. normal builders which reference storage elsewhere. I can easily create my own solution (one trivial solution would be std::pair<kj::Maybe<MallocMessageBuilder>, Foo::Builder>), but if kj or capnp already has something for this, I might as well use that.
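
A minimal sketch of the "owning pair" idea, using only the standard library. `Arena`, `FooView`, `OwnedFoo`, and `consume` are hypothetical stand-ins for `MallocMessageBuilder`, `Foo::Builder`, and the wrapper being asked about; none of them are capnp or kj types. The storage is held on the heap so that moving the wrapper does not invalidate the view's pointer into it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for MallocMessageBuilder: owns the raw storage.
struct Arena {
  std::vector<std::uint64_t> words = std::vector<std::uint64_t>(8, 0);
};

// Hypothetical stand-in for Foo::Builder: a non-owning view into an Arena.
struct FooView {
  Arena* arena;
  void setValue(std::uint64_t v) { arena->words[0] = v; }
  std::uint64_t getValue() const { return arena->words[0]; }
};

// Self-contained wrapper: the storage is destroyed together with the wrapper.
class OwnedFoo {
public:
  OwnedFoo() : arena_(std::make_unique<Arena>()), view_{arena_.get()} {}

  // Move-only, like kj::Array. The Arena itself never moves, so the
  // view's pointer stays valid after the wrapper is moved.
  OwnedFoo(OwnedFoo&&) noexcept = default;
  OwnedFoo& operator=(OwnedFoo&&) noexcept = default;

  FooView* operator->() { return &view_; }

private:
  std::unique_ptr<Arena> arena_;
  FooView view_;
};

// Takes ownership; the storage is freed when `foo` goes out of scope.
inline std::uint64_t consume(OwnedFoo foo) { return foo->getValue(); }
```

The same `operator->` surface works whether the caller built the object locally or received it by move, which is the "one API" property the question is after.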

Kenton Varda

Mar 4, 2016, 8:58:40 PM
to Nathan Hourt, Cap'n Proto
I think you want something like:

#include <capnp/message.h>
#include <kj/refcount.h>

template <typename T>
class RcBuilder {
  // Refcounted builder.

public:
  typename T::Builder* operator->() { return &builder; }

  template <typename U>
  RcBuilder<capnp::FromBuilder<U>> child(U subBuilder) {
    // Given `subBuilder` which is a child of this object, return a new
    // RcBuilder wrapper that also holds a refcount.

    return RcBuilder<capnp::FromBuilder<U>>(kj::addRef(*message), subBuilder);
  }

private:
  class RefcountedMallocMessageBuilder: public kj::Refcounted, public capnp::MallocMessageBuilder {};

  kj::Own<RefcountedMallocMessageBuilder> message;
  typename T::Builder builder;
};

(Above is incomplete, but you get the idea.)

Note that any reference to any part of the message of course causes the entire message to remain resident in memory. If that's a problem, you'll need to copy the sub-object into a new MessageBuilder and wrap that.
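
The "entire message stays resident" point can be illustrated with the standard library's aliasing `std::shared_ptr` constructor, which behaves much like the refcounted child wrapper above. This is only an analogy: `Message`, `Child`, and `childOf` are hypothetical names, not capnp or kj API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Child { int value = 0; };

// Hypothetical stand-in for a whole message: one allocation holding a
// child object plus other payload.
struct Message {
  Child child;
  std::vector<char> payload = std::vector<char>(1024);
};

// Return a handle to the child that shares ownership of the whole message
// (aliasing constructor: owns `msg`, points at `msg->child`).
inline std::shared_ptr<Child> childOf(const std::shared_ptr<Message>& msg) {
  return std::shared_ptr<Child>(msg, &msg->child);
}
```

As long as any child handle is alive, the 1024-byte payload is alive too. Releasing the rest requires copying the child out into its own storage, as the note above says.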

-Kenton

Branislav Katreniak

Mar 11, 2016, 9:59:41 AM
to Cap'n Proto, nat....@gmail.com
Hi Kenton / all

I am evaluating capnproto and I love it! But I am worried about the lack of self-contained classes.

I would like to discuss how to extend capnpc-c++ to generate not only Reader and Builder classes but also a self-contained class.
I will call it Mutable for now.

When a Mutable is constructed from a Reader, a Builder, or another Mutable, all data is copied into one new segment owned by the Mutable.
This makes construction a fairly fast operation without many allocations.
The price is that this initial memory is never released as the Mutable is modified, but that cost is well bounded.

When a Mutable is modified and the modification needs to allocate memory, it allocates a new segment for each chunk.
These separate segments can easily be released when they are disowned.

Would it be possible to extend the List encoding to hold both size and capacity?

Then Mutable could just be a Builder subclass owning a MutableMessageBuilder.

Does it make sense?

Brano

Kenton Varda

Mar 11, 2016, 2:18:27 PM
to Branislav Katreniak, Cap'n Proto, Nathan Hourt
Hi Branislav,

I have actually been planning for a while to add "POCS support" ("Plain Old C++ Struct"), which is slightly different from what you describe but solves similar problems.

Currently for a struct Foo you get Foo::Builder and Foo::Reader. The type `Foo` itself is a "namespace struct"; it only exists to contain the other names. My plan is to make `Foo` actually be a plain-old C++ struct matching the declared type. It would have a constructor that copies from a Reader and a method for copying itself into a Builder.

Using the POCS would of course entail a copy and some allocation, but the benefit is now you have a data structure that can be used in the ways that we're all used to using C++. It can even be mutated arbitrarily without the memory leaking problem. People would likely use POCS in non-performance-sensitive cases and then could use the zero-copy APIs when performance really matters.

There is a slight difference between what I'm proposing and what you're proposing in that a POCS would do more allocation of sub-objects since it's not using Cap'n Proto format at all, but I think that's the right choice since when performance matters people can optimize based on the zero-copy classes.
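
A rough sketch of what such a POCS could look like, written against the standard library so it stands alone. `PersonReader` and `PersonBuilder` are hypothetical stand-ins for generated `Person::Reader` and `Person::Builder` classes, and `copyTo` is an assumed method name; the thread does not fix any of these.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-ins for generated Person::Reader / Person::Builder.
struct PersonReader {
  std::string_view name;
  std::uint32_t age;
  std::string_view getName() const { return name; }
  std::uint32_t getAge() const { return age; }
};

struct PersonBuilder {
  std::string name;
  std::uint32_t age = 0;
  void setName(std::string_view n) { name = std::string(n); }
  void setAge(std::uint32_t a) { age = a; }
};

// The "plain old C++ struct": owns its data and can be mutated freely.
struct Person {
  std::string name;
  std::uint32_t age = 0;

  Person() = default;

  // Copy out of the zero-copy view (one allocation per text field).
  explicit Person(const PersonReader& reader)
      : name(reader.getName()), age(reader.getAge()) {}

  // Copy back into a builder when it is time to serialize.
  void copyTo(PersonBuilder& builder) const {
    builder.setName(name);
    builder.setAge(age);
  }
};
```

The copy in and the copy out are the whole cost of the convenience; code that cannot afford them keeps using the Reader and Builder directly.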

Thoughts?

-Kenton

Nathan Hourt

Mar 11, 2016, 2:36:16 PM
to Kenton Varda, Branislav Katreniak, Cap'n Proto

I like that design. Having a POCS (which easily copies to/from a builder/reader) which I can use in my frontend code would be more convenient. The frontend isn't very performance sensitive, and it's a bit trickier to convince Qt to play nicely with the builders and readers.

--
Nathan Hourt

The Truth will set you free

Branislav Katreniak

Mar 11, 2016, 3:41:46 PM
to Kenton Varda, Cap'n Proto, Nathan Hourt
A POCS class would definitely solve my problem functionally, and if it were available now, I would likely not even think about this.

I must say that I have had a bad experience with parsing JSON messages into Variants. There were so many allocations that it became a bottleneck. A POCS class can be much better, depending on what is in the message. If somebody encodes Hash<Text, Text>, it is a lot of allocations. But I have a few reasons against a simple POCS class:
* it will drop any information that is encoded in a newer version of the protocol.
* kj::String allocates even small strings on the heap. That is costly. A small-size optimization in kj::String would solve this.
* POCS means another chunk of code generated per capnproto class.

My idea can be realized much more simply. If the arena can track how much memory is used and how much is leaked, it can rebuild the message from scratch if it wastes too much memory. That solves the whole problem of leaking. The arena can also start to reuse memory.

The remaining problem is to find the right moment when the message can be rebuilt. The arena would need to track all Readers and Builders. Then it is possible to implement different strategies by providing different MessageBuilder implementations:
* ensure (at runtime) that no readers exist while a builder is active, and rebuild the message when the writer exits.
* ref counted arena - either thread safe locked or single threaded without locking. Qt-like implicit sharing (copy on write) is possible.
* A sub-Mutable can reference the parent arena if it is bigger than some ratio of the whole arena. Otherwise it makes a deep copy.

API-wise ... can the Mutable class be a superset of the POCS class? I think so.

But this approach really needs an extension to the List encoding to hold both size and capacity.

Is it sane?

Brano

Kenton Varda

Mar 11, 2016, 4:06:36 PM
to Branislav Katreniak, Cap'n Proto, Nathan Hourt
On Fri, Mar 11, 2016 at 12:41 PM, Branislav Katreniak <katr...@gmail.com> wrote:
> A POCS class would definitely solve my problem functionally, and if it were available now, I would likely not even think about this.
>
> I must say that I have had a bad experience with parsing JSON messages into Variants. There were so many allocations that it became a bottleneck. A POCS class can be much better, depending on what is in the message. If somebody encodes Hash<Text, Text>, it is a lot of allocations. But I have a few reasons against a simple POCS class:
> * it will drop any information that is encoded in a newer version of the protocol.

FWIW, I would definitely design this such that it preserves "unknown fields".
 
> * kj::String allocates even small strings on the heap. That is costly. A small-size optimization in kj::String would solve this.

Hmm, "small size optimization" (inlining small strings into the struct) would change semantics in that any pointers into a kj::String would be invalidated when the string is moved. We'd probably need to introduce a new type, but that might not be so bad.
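
The semantic change can be made concrete with two toy string classes. Both are hypothetical illustrations, not kj types: `HeapString` models the always-on-heap behavior described above, and `InlineString` models a small-size-optimized string that stores its characters inside the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Heap-backed string: moving transfers the buffer, so pointers into the
// text remain valid after the move.
class HeapString {
public:
  explicit HeapString(const char* s)
      : buf_(std::make_unique<char[]>(std::strlen(s) + 1)) {
    std::strcpy(buf_.get(), s);
  }
  HeapString(HeapString&&) noexcept = default;
  const char* data() const { return buf_.get(); }

private:
  std::unique_ptr<char[]> buf_;
};

// Hypothetical small-size-optimized string: the text lives inside the
// object, so moving the object moves the characters to a new address.
class InlineString {
public:
  explicit InlineString(const char* s) {
    std::strncpy(buf_, s, sizeof(buf_) - 1);
    buf_[sizeof(buf_) - 1] = '\0';
  }
  InlineString(InlineString&& other) noexcept {
    std::memcpy(buf_, other.buf_, sizeof(buf_));
  }
  const char* data() const { return buf_; }

private:
  char buf_[16];
};
```

A pointer taken from `HeapString::data()` before a move still points at the text afterwards; the same pointer taken from `InlineString` refers to the moved-from object's buffer, not the new one.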

> * POCS means another chunk of code generated per capnproto class.

That is a real problem, yes.
 
> My idea can be realized much more simply. If the arena can track how much memory is used and how much is leaked, it can rebuild the message from scratch if it wastes too much memory. That solves the whole problem of leaking. The arena can also start to reuse memory.
>
> The remaining problem is to find the right moment when the message can be rebuilt. The arena would need to track all Readers and Builders.

My intuition is that tracking all readers and builders would be way too expensive, and you'll end up with a net loss here. Currently these classes are trivially copyable; to track them you'd need every constructor/destructor call to manipulate a linked list, and you'd need to store extra pointers.
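
To make the cost visible, here is a sketch contrasting an untracked view with one that registers itself in an intrusive list. `PlainView`, `TrackedView`, and `TrackingArena` are hypothetical names chosen for this illustration; they are not how capnp Readers or Builders are actually implemented.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical untracked view: just a pointer, copied like an int.
struct PlainView {
  const void* data;
};

struct TrackingArena;

// Hypothetical tracked view: registers itself in an intrusive doubly linked
// list owned by the arena, so every copy and destruction does extra work.
class TrackedView {
public:
  TrackedView(TrackingArena& arena, const void* data);
  TrackedView(const TrackedView& other);
  TrackedView& operator=(const TrackedView&) = delete;
  ~TrackedView();
  const void* data() const { return data_; }

private:
  void link();
  TrackingArena* arena_;
  const void* data_;
  TrackedView* prev_ = nullptr;
  TrackedView* next_ = nullptr;
};

struct TrackingArena {
  TrackedView* head = nullptr;
  int liveViews = 0;
};

inline void TrackedView::link() {
  next_ = arena_->head;
  if (next_ != nullptr) next_->prev_ = this;
  arena_->head = this;
  ++arena_->liveViews;
}
inline TrackedView::TrackedView(TrackingArena& arena, const void* data)
    : arena_(&arena), data_(data) { link(); }
inline TrackedView::TrackedView(const TrackedView& other)
    : arena_(other.arena_), data_(other.data_) { link(); }
inline TrackedView::~TrackedView() {
  if (prev_ != nullptr) prev_->next_ = next_; else arena_->head = next_;
  if (next_ != nullptr) next_->prev_ = prev_;
  --arena_->liveViews;
}

static_assert(std::is_trivially_copyable<PlainView>::value, "plain view is free to copy");
static_assert(!std::is_trivially_copyable<TrackedView>::value, "tracked view is not");
static_assert(sizeof(TrackedView) > sizeof(PlainView), "tracking costs extra pointers");
```

The three `static_assert`s capture the objection: tracking removes trivial copyability and adds per-view pointers, before any locking is even considered.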
 
> Then it is possible to implement different strategies by providing different MessageBuilder implementations:
> * ensure (at runtime) that no readers exist while a builder is active, and rebuild the message when the writer exits.
> * ref counted arena - either thread safe locked or single threaded without locking. Qt-like implicit sharing (copy on write) is possible.
> * A sub-Mutable can reference the parent arena if it is bigger than some ratio of the whole arena. Otherwise it makes a deep copy.
>
> API-wise ... can the Mutable class be a superset of the POCS class? I think so.
>
> But this approach really needs an extension to the List encoding to hold both size and capacity.
>
> Is it sane?

I think what you'll end up with is another API that's maybe somewhat easier to manipulate than Builders today but still much harder to manipulate than POCS. I think we will still want the POCS API for maximum ease of use, in which case I am not sure this in-between API adds a lot of value. Meanwhile it sounds pretty complex to implement. I'm also not convinced it would perform better, given the extra bookkeeping needed. E.g. reusing memory would require writing something very much like malloc(), so I wouldn't expect it to perform better than malloc(). Doing GC/compaction adds lots of complexity and may have its own performance issues.

-Kenton

M. Taha

Mar 12, 2016, 9:27:17 PM
to Cap'n Proto
Hi All,
     I agree with Kenton. Please consider implementing POCS in the near future (hopefully in the next release). It would be very helpful in my use case.
About the performance issue: I think it would perform better in some use cases, such as exchanging messages between processes running on the same machine using shared memory mapping.
And the generated code will be very simple and much more readable and understandable to a wider audience.
I think it would bring a new use for the library as a code generator independent of serialization and RPC (even lighter than CAPNP_LITE).

Keep up the good work.

Branislav Katreniak

Mar 14, 2016, 4:17:33 AM
to Cap'n Proto, katr...@gmail.com, nat....@gmail.com
> I think what you'll end up with is another API that's maybe somewhat easier to manipulate than Builders today but still much harder to manipulate than POCS. I think we will still want the POCS API for maximum ease of use, in which case I am not sure this in-between API adds a lot of value. Meanwhile it sounds pretty complex to implement. I'm also not convinced it would perform better, given the extra bookkeeping needed. E.g. reusing memory would require writing something very much like malloc(), so I wouldn't expect it to perform better than malloc(). Doing GC/compaction adds lots of complexity and may have its own performance issues.

You are totally right. It was a stupid idea. Thank you for naming it so clearly!

Do you already have ideas for the POCS class design?
* What about integration with std classes, or KJ style? Nothrow move constructors? Disabled copy constructor? Using std::string, std::vector for strings and lists?
* How do you plan to preserve "unknown fields"? Reuse the capnp encoding as the internal representation for everything except pointers? That could lead to high code reuse and fast serialization.

I wonder how capnp is typically used within a bigger project without POCS classes. Do you use capnp only at the application edge for RPC and hold the state in a different set of internal classes? Do you do the memory management explicitly by copying and passing MallocMessageBuilder instances around?

Kind regards
  Brano

Kenton Varda

Mar 18, 2016, 4:30:16 PM
to Branislav Katreniak, Cap'n Proto, Nathan Hourt
On Mon, Mar 14, 2016 at 1:17 AM, Branislav Katreniak <katr...@gmail.com> wrote:
> Do you already have ideas for the POCS class design?
> * What about integration with std classes, or KJ style? Nothrow move constructors? Disabled copy constructor? Using std::string, std::vector for strings and lists?

Naturally I'd prefer KJ over std. :) Probably text fields will be kj::String and lists will be kj::Vector.

I'm fine with marking move constructors nothrow (though I think std's insistence on this is misguided).

I think I would disable the copy constructor, but provide a .clone().
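
A small sketch of the "no copy constructor, explicit `.clone()`" shape. `Document` is a hypothetical type, and `std::string` / `std::vector` are used here only so the example stands alone; the plan stated above is to use `kj::String` and `kj::Vector`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical POCS type: moves freely, copies only on explicit request.
struct Document {
  std::string title;
  std::vector<int> pages;

  Document() = default;
  Document(Document&&) noexcept = default;
  Document& operator=(Document&&) = default;
  Document(const Document&) = delete;
  Document& operator=(const Document&) = delete;

  // Deep copy, spelled out at the call site.
  Document clone() const {
    Document copy;
    copy.title = title;
    copy.pages = pages;
    return copy;
  }
};

static_assert(!std::is_copy_constructible<Document>::value, "no implicit copies");
static_assert(std::is_nothrow_move_constructible<Document>::value, "moves do not throw");
```

With this shape an accidental pass-by-value fails to compile, and every deep copy is visible as a `.clone()` call.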
 
> * How do you plan to preserve "unknown fields"? Reuse the capnp encoding as the internal representation for everything except pointers? That could lead to high code reuse and fast serialization.

It would be cool to lay out the data fields such that they can be memcpy()d from the serialized format, although I was also thinking it would be nice if you didn't have to go through accessor methods and instead these fields were simply public member variables. I don't think these two things are compatible, due to endianness issues. Will have to think about which is better.

Unknown fields would need to be preserved in separate arrays of bytes and pointers (for the respective sections). I guess preserving data fields is tricky if we don't use the memcpy()able layout since they could be interleaved with the known fields. To preserve a pointer, we'd actually serialize it and its target in "flat" format and store a word array. We should of course optimize for the case that any extra fields are zero/null and so need not be preserved.
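
A sketch of the zero-optimization for the data section only, under a simplifying assumption: the unknown bytes all trail the bytes this schema version knows about. The interleaved case mentioned above is not handled. `unknownDataBytes` is a hypothetical helper, not part of capnp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the data section of a struct as received on the wire and the number
// of bytes this program's schema version knows about, return the bytes that
// must be kept so they can be written back out later. An empty result means
// nothing needs to be preserved.
inline std::vector<std::uint8_t> unknownDataBytes(
    const std::vector<std::uint8_t>& wireData, std::size_t knownSize) {
  if (wireData.size() <= knownSize) return {};

  auto first = wireData.begin() + static_cast<std::ptrdiff_t>(knownSize);
  // Common case: the extra region is all zero, so there is nothing to keep.
  if (std::all_of(first, wireData.end(), [](std::uint8_t b) { return b == 0; })) {
    return {};
  }
  return std::vector<std::uint8_t>(first, wireData.end());
}
```

Messages from a peer on the same schema version, or a newer peer that left the new fields unset, take one of the two early returns and allocate nothing.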
 
> I wonder how capnp is typically used within a bigger project without POCS classes. Do you use capnp only at the application edge for RPC and hold the state in a different set of internal classes? Do you do the memory management explicitly by copying and passing MallocMessageBuilder instances around?

Well, I'm probably personally the largest-scale user of Cap'n Proto (within Sandstorm.io). I use a variety of styles depending on the situation. Simple data (with only a couple fields) is easy to translate into a struct internally. For complicated data I will sometimes copy into a MallocMessageBuilder, yes. Though I'd say most of the time our RPC calls consume their data directly and produce their results directly, without really copying it elsewhere. Still, the API is wonky and can get cumbersome, making me personally excited about the POCS solution.

-Kenton

Geoffrey Romer

Mar 18, 2016, 4:53:08 PM
to Kenton Varda, Branislav Katreniak, Cap'n Proto, Nathan Hourt
On Fri, Mar 18, 2016 at 1:29 PM, Kenton Varda <ken...@sandstorm.io> wrote:
> It would be cool to lay out the data fields such that they can be memcpy()d from the serialized format, although I was also thinking it would be nice if you didn't have to go through accessor methods and instead these fields were simply public member variables. I don't think these two things are compatible, due to endianness issues. Will have to think about which is better.

It might be possible to do both at once, by making those public member variables special types that used little-endian representation internally, regardless of the host endianness, while providing a native-integer-like interface through operator overloads etc. See e.g. the Boost.Endian arithmetic types.
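
A minimal sketch of such a proxy type, assuming a single 32-bit field. `LittleEndianU32` and `MyStruct` are hypothetical names written for this illustration; this is not the Boost.Endian implementation, only the same general idea.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical proxy: always stores a uint32_t as little-endian bytes,
// whatever the host byte order, and converts on every read and write.
class LittleEndianU32 {
public:
  LittleEndianU32() = default;
  LittleEndianU32(std::uint32_t v) { *this = v; }

  LittleEndianU32& operator=(std::uint32_t v) {
    for (int i = 0; i < 4; ++i) bytes_[i] = static_cast<unsigned char>(v >> (8 * i));
    return *this;
  }

  operator std::uint32_t() const {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(bytes_[i]) << (8 * i);
    return v;
  }

  const unsigned char* raw() const { return bytes_; }

private:
  unsigned char bytes_[4] = {0, 0, 0, 0};
};

struct MyStruct {
  LittleEndianU32 bar;  // public member whose bytes match the wire order
};
```

Assignment and arithmetic such as `s.bar = 5; s.bar + 1` go through the conversions, while `auto copy = s.bar;` deduces `LittleEndianU32` rather than `std::uint32_t`.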
 

Kenton Varda

Mar 18, 2016, 6:45:30 PM
to Geoffrey Romer, Branislav Katreniak, Cap'n Proto, Nathan Hourt
On Fri, Mar 18, 2016 at 1:53 PM, 'Geoffrey Romer' via Cap'n Proto <capn...@googlegroups.com> wrote:
> It might be possible to do both at once, by making those public member variables special types that used little-endian representation internally, regardless of the host endianness, while providing a native-integer-like interface through operator overloads etc. See e.g. the Boost.Endian arithmetic types.

Right, proxy types. My beef with them is that they produce surprising results when combined with type inference.

    auto foo = myStruct.bar;
    // Turns out `foo` now has type WeirdProxyThing<int>.

    int& iref = myStruct.bar;
    // Unexpectedly doesn't work.

    long n = myStruct.bar;
    someFuncOverloadedOnEveryIntType(myStruct.bar);
    // It's hard to make sure that neither of the above lines complains about ambiguity.

At one time I was thinking about proposing a C++ language change that would allow you to mark a proxy type as "always convert to type T when copying" -- and hence `foo`'s type above would be inferred as `int`, and ambiguity is solved by treating the type as `int`. References are still a problem, though.

Another problem with trying to lay out POCS types precisely is that boolean values would become bitfields. Fortunately we don't need any kind of proxy for booleans (though of course references won't work). Unfortunately the layout of bitfields differs by compiler, so we'll need some #ifdefs.

Yet another problem is that groups are going to be really weird to implement while matching Cap'n Proto layout, since a group's fields can be interleaved with non-grouped fields. The best implementation I can think of would be "slightly UB":

    struct Outer {
      struct SomeGroup {
        Proxy<int> baz;
        Padding<int> _outer_foo;
        Proxy<int> qux;
        Padding<int> _outer_bar;
        Proxy<int> corge;
      };

      union {
        SomeGroup group;
        struct {
          Padding<int> _group_baz;
          Proxy<int> foo;
          Padding<int> _group_qux;
          Proxy<int> bar;
          Padding<int> _group_corge;
        };
      };
    };

Now you might write code like:

    Outer outer;
    outer.foo = 123;
    outer.group.baz = 234;
    assert(outer.foo == 123);  // UB?

I think this is technically UB because accessing any one field of the union technically de-initializes all the others.

Then again, assuming we know exactly how the compiler will pack structs is already a pretty big assumption. Can we assume compilers won't actually have a problem with the above?

-Kenton

Geoffrey Romer

Mar 18, 2016, 7:38:13 PM
to Kenton Varda, Branislav Katreniak, Cap'n Proto, Nathan Hourt
Not necessarily. The standard says that "In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2." So I think you're OK so long as SomeGroup and your anonymous struct are standard-layout and layout-compatible (i.e. their "common initial sequence" consists of all data members), and superficially that seems pretty doable.
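
A small example of the common-initial-sequence rule quoted above. `Circle`, `Square`, `Shape`, and `kindOf` are hypothetical names for this illustration only.

```cpp
#include <cassert>
#include <type_traits>

struct Circle {
  int kind;     // the common initial sequence starts here...
  int id;       // ...and includes this member too
  float radius;
};

struct Square {
  int kind;
  int id;
  double side;  // first member that differs: not part of the common sequence
};

static_assert(std::is_standard_layout<Circle>::value, "rule needs standard layout");
static_assert(std::is_standard_layout<Square>::value, "rule needs standard layout");

union Shape {
  Circle circle;
  Square square;
};

// Reads `kind` through `square` even when `circle` is the active member.
// Permitted because `kind` lies in the common initial sequence of both structs.
inline int kindOf(const Shape& s) { return s.square.kind; }
```

Reading `s.square.side` while `circle` is active would fall outside the rule, since `side` is past the point where the two member lists diverge.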
 

> Then again, assuming we know exactly how the compiler will pack structs is already a pretty big assumption. Can we assume compilers won't actually have a problem with the above?

If the struct is standard-layout, its members are guaranteed to be laid out in the order they're declared, and there's guaranteed to be no padding before the first member. Padding between members is implementation-defined, but from what I can tell, in practice it's always the minimum amount of padding necessary to satisfy alignment requirements. Furthermore, you can use offsetof() to validate your guesses about the layout of the struct, so that if you get it wrong you get a build error rather than silent runtime corruption.
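
The `offsetof()` check could look like the following. `Header` and its expected offsets are made up for this sketch; a real generator would emit one assertion per field from the schema's computed layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical struct meant to mirror a 16-byte wire layout.
struct Header {
  std::uint32_t id;      // expected at byte 0
  std::uint16_t kind;    // expected at byte 4
  std::uint16_t flags;   // expected at byte 6
  std::uint64_t length;  // expected at byte 8
};

static_assert(std::is_standard_layout<Header>::value, "offsetof requires standard layout");
static_assert(offsetof(Header, id) == 0, "id misplaced");
static_assert(offsetof(Header, kind) == 4, "kind misplaced");
static_assert(offsetof(Header, flags) == 6, "flags misplaced");
static_assert(offsetof(Header, length) == 8, "length misplaced");
static_assert(sizeof(Header) == 16, "unexpected trailing padding");
```

If a compiler inserts padding that the generator did not anticipate, the build stops at the first failing assertion instead of producing a binary that silently misreads messages.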
 

Kenton Varda

Mar 18, 2016, 7:42:14 PM
to Geoffrey Romer, Branislav Katreniak, Cap'n Proto, Nathan Hourt
Oh, of course, things like struct sockaddr already rely on aliasing between structs in a union, so that has to work. Great! Maybe this will work.

-Kenton

Alexander S.

Mar 20, 2016, 12:37:18 AM
to Kenton Varda, Geoffrey Romer, Branislav Katreniak, Cap'n Proto, Nathan Hourt
I'm probably missing the point here (and I admittedly know little about capnproto), but could we simply generate a class with byte arrays for primary storage and then expose the POCS fields as references (or pointers) into those arrays?
--ap

Kenton Varda

Mar 20, 2016, 7:08:09 AM
to Alexander S., Geoffrey Romer, Branislav Katreniak, Cap'n Proto, Nathan Hourt
On Sat, Mar 19, 2016 at 9:37 PM, Alexander S. <alex...@thequery.net> wrote:
> I'm probably missing the point here (and I admittedly know little about capnproto), but could we simply generate a class with byte arrays for primary storage and then expose the POCS fields as references (or pointers) into those arrays?

Not really. The pointers would be larger than the data they point to, the extra indirection would be slow, and copy constructors would have to be carefully-written since you wouldn't want to end up with pointers into the wrong object.

Better to go with accessor methods at that point.

Branislav Katreniak

Mar 21, 2016, 4:54:47 AM
to Kenton Varda, Cap'n Proto, Nathan Hourt

> Naturally I'd prefer KJ over std. :) Probably text fields will be kj::String and lists will be kj::Vector.

I understand that KJ types are a must. But I am curious: would you accept patches to use std types in POCS classes if controlled by a global define, like CAPNP_LITE?

> * How do you plan to preserve "unknown fields"? Reuse the capnp encoding as the internal representation for everything except pointers? That could lead to high code reuse and fast serialization.
>
> It would be cool to lay out the data fields such that they can be memcpy()d from the serialized format, although I was also thinking it would be nice if you didn't have to go through accessor methods and instead these fields were simply public member variables. I don't think these two things are compatible, due to endianness issues. Will have to think about which is better.

Getters and setters will make the API more consistent with Reader and Builder. The prefixes `set` and `get` are great in generated code, because they move all fields into a kind of namespace. No collisions with language keywords, no collisions with generic methods like `clone()`.

> I guess preserving data fields is tricky if we don't use the memcpy()able layout since they could be interleaved with the known fields.

Internal padding is limited to 7 bytes. The compiler should know the positions. So this is not a big problem.

What about the API for groups? Can they be pure references into the parent POCS class without owned memory? If not, I have a hard time imagining how to effectively reuse the serialized format for a group POCS class. For me, this looks like a show stopper for the serialized format in POCS classes.

Kind regards
 Brano

Branislav Katreniak

Mar 25, 2016, 3:46:04 PM
to Kenton Varda, Cap'n Proto
As I am trying to implement a generator for POCS classes, I have a few questions.

What is a good name for the POCS class in source code? The generated code is placed in the class that has been called the "outer" class until now. It would be good to come up with terminology that can also be used in documentation.

What is the API for primitive fields? It is not possible to return a reference to a primitive type, because it can have a different byte order and it may be XORed with the default value. It needs either a setter and getter or a proxy type. I would start simple with a setter and getter. A proxy type can be introduced later.

    uint32_t getNumber() const;
    void setNumber(uint32_t);
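
A sketch of why a plain reference cannot be handed out when the stored bits are XORed with the default. `Config`, `rawNumber`, and the default of 1000 are hypothetical, chosen only for this illustration, and the byte swapping needed on big-endian hosts is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generated class for a schema field `number :UInt32 = 1000`.
class Config {
public:
  std::uint32_t getNumber() const { return raw_ ^ kNumberDefault; }
  void setNumber(std::uint32_t value) { raw_ = value ^ kNumberDefault; }

  // The bits as they would sit in the data section on a little-endian host.
  std::uint32_t rawNumber() const { return raw_; }

private:
  static constexpr std::uint32_t kNumberDefault = 1000;
  std::uint32_t raw_ = 0;  // all-zero storage means "default value"
};
```

Because `raw_` never holds the logical value unless the default is zero, a `uint32_t&` to it would expose the wrong number, which is what forces a getter and setter (or a proxy).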

What is the API for a string field? The minimal approach is intrusive to the string class:

    kj::String& getName();
    const kj::String& getName() const { return _name; }

kj::String would need to be extended with a null flag.

The second approach:

    bool hasName() const;
    kj::String& getName();
    const kj::String& getName() const;

The non-const getter sets name to non-null. The const getter can return a reference to a global instance if hasName() == false. This second approach is non-intrusive and also works with std::string.
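
One way the second approach might be written with `std::string`, shown as a sketch. `Contact` is a hypothetical generated class, and `std::optional` is just one possible way to hold the null state.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical generated POCS class with one text field `name`.
class Contact {
public:
  bool hasName() const { return name_.has_value(); }

  // Non-const access makes the field non-null (empty string if it was null).
  std::string& getName() {
    if (!name_) name_.emplace();
    return *name_;
  }

  // Const access never modifies the object; a null field reads as a shared
  // empty string.
  const std::string& getName() const {
    static const std::string empty;
    return name_ ? *name_ : empty;
  }

private:
  std::optional<std::string> name_;
};
```

One consequence worth noting: merely calling the non-const `getName()` flips the field to non-null, so code that only wants to read should go through a const reference.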

Can we afford not to support a NULL flag for strings in the POCS class and encode an empty string as a null string on the wire? That simplifies the API and generated code.

Do we need an option to set the NULL flag?

It is also easy to add a convenience setter for consistency with builders and primitive types.

    void setName(kj::String &&);

I believe structs and lists can stick to the same API as strings.

Groups can be just a view into the POCS class without value semantics. Internally a group will be a pointer / reference to the owning POCS instance. If the POCS instance is deleted or if the group sits in a union section that is invalidated, the group simply points to invalid memory. We could track group instances from the POCS class and clear the pointers, trading speed for nice exceptions instead of crashes. But the first version can simply do without unions and groups.

Any suggestions?

Kind regards
 Brano

Kenton Varda

Mar 25, 2016, 6:37:12 PM
to Branislav Katreniak, Cap'n Proto
Hi Brano,
 
> Getters and setters will make the API more consistent with Reader and Builder. The prefixes `set` and `get` are great in generated code, because they move all fields into a kind of namespace. No collisions with language keywords, no collisions with generic methods like `clone()`.

That's a good point. Though, we already work around language keyword collisions by appending a trailing underscore to conflicting names.
 
> What about the API for groups? Can they be pure references into the parent POCS class without owned memory? If not, I have a hard time imagining how to effectively reuse the serialized format for a group POCS class. For me, this looks like a show stopper for the serialized format in POCS classes.

I think it would be reasonable for group accessors to return a reference.

For that matter, sub-message accessors would probably return a reference too, except that presumably there'd be a method you can call to have the sub-message disowned which would then return Own<T>. The disown method would not be available for groups.

On Fri, Mar 25, 2016 at 12:46 PM, Branislav Katreniak <katr...@gmail.com> wrote:
> As I am trying to implement a generator for POCS classes, I have a few questions.

It's cool that you're working on this. Note that I tend to have strong opinions on APIs, so it might be a good idea to write up a doc or something with a proposed API before going too far into implementation. :)
 
> What is a good name for the POCS class in source code? The generated code is placed in the class that has been called the "outer" class until now. It would be good to come up with terminology that can also be used in documentation.

I think the outer class itself (which currently behaves only as a namespace) should be the POCS type. This makes the POCS interface very natural to use, which is its goal in life after all.
 
> What is the API for primitive fields? It is not possible to return a reference to a primitive type, because it can have a different byte order and it may be XORed with the default value. It needs either a setter and getter or a proxy type. I would start simple with a setter and getter. A proxy type can be introduced later.
>
>     uint32_t getNumber() const;
>     void setNumber(uint32_t);

I think it's important to settle on one API now -- I don't want to end up with multiple ways of doing things.

I think the "plain old fields" approach is a nicer API than accessors if we can make it work, so the question is: is there any reason we can't make it work? I think we have to go through all of the features and think about whether there is an issue.

I just realized: We don't need proxies. We can use regular integer types, and we say that endianness is translated during the copy between wire format and POCS. Since almost all CPUs are little-endian, on almost all CPUs we'll still be able to use a memcpy() -- only on big-endian CPUs will the translation have to fall back to handling each field. I think this is a far better plan than using proxy types in POCS since proxies have so many problems.
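
A sketch of "translate during the copy", assuming a struct of two 32-bit fields. `Point`, `hostIsLittleEndian`, `loadLE32`, and `pointFromWire` are hypothetical names; a real implementation would more likely select the path at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Point {
  std::uint32_t x;
  std::uint32_t y;
};

static_assert(sizeof(Point) == 8, "sketch assumes no padding in Point");

// True when the host stores the least significant byte first.
inline bool hostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

inline std::uint32_t loadLE32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Copy 8 little-endian wire bytes into a plain struct with ordinary integers.
inline Point pointFromWire(const unsigned char* wire) {
  Point p;
  if (hostIsLittleEndian()) {
    std::memcpy(&p, wire, sizeof(p));  // fast path: layouts already match
  } else {
    p.x = loadLE32(wire);              // slow path: translate field by field
    p.y = loadLE32(wire + 4);
  }
  return p;
}
```

After the copy, `p.x` and `p.y` are ordinary integers, so `auto`, references, and overload resolution all behave as usual.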

So let's list out some things:

Primitives:
* Void: C++ does not let you declare zero-width fields, but since voids don't affect the ultimate encoding we could pull the void fields out to the beginning or end of the structure.
* Boolean: Use bitfields.
* Integers/Floats: Use regular types.
* Enums: Use C++11 enum classes with uint16_t as the base type. In fact the enum types we're already generating should work for this.
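
As a rough picture of the primitive mappings listed above, here is a hypothetical struct with two Bool fields, an enum, an integer, and a float. `Settings`, `Color`, and `defaultSettings` are invented for this sketch, and the field order shown is illustrative rather than a real Cap'n Proto layout.

```cpp
#include <cassert>
#include <cstdint>

// Enum with the 16-bit base type proposed in the list above.
enum class Color : std::uint16_t { RED, GREEN, BLUE };
static_assert(sizeof(Color) == 2, "enum occupies 16 bits");

// Hypothetical POCS; bit order within the bitfield is compiler-specific.
struct Settings {
  bool enabled : 1;
  bool verbose : 1;
  Color color;
  std::uint32_t retries;
  double timeout;
};

inline Settings defaultSettings() {
  Settings s{};  // value-initialize: every field starts at zero/false
  return s;
}
```

Bitfield members can be read and assigned like ordinary `bool`s, but `bool& r = s.enabled;` does not compile, which is the "references won't work" caveat raised earlier in the thread.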

Pointers: No need for perfect alignment here since we obviously can't memcpy() them anyway.
* Text: kj::String
* Data: kj::Array<kj::byte>
* List(T): kj::Vector<T>. This implies bool lists will expand to byte-per-element in POCS format, not bit-per-element as they are on the wire. That's probably OK; these aren't used very often. (Of course, std::vector<bool> would keep them as bits but everyone seems to agree that was a disaster.)
* Structs: kj::Own<T>
* Capabilities: T::Client (I wonder if we should move T::Client's members into T and make T::Client be an alias for backwards-compatibility?)

Null pointers: I suppose that String, Array, Vector, and Own will all need to support comparison with nullptr. String and Array do already, but they are equivalent to comparing with the empty string / array. This is arguably incorrect, although in practice I'd say programs should avoid distinguishing between null and empty. If we decide this needs to be distinguishable, then we probably need to introduce new types here. Probably, we'd use capnp::Text and capnp::List<T>, which is arguably more consistent with the rest of the API anyhow.

Groups: Described previously. The whole struct would be an anonymous union containing an anonymous struct of the top-level fields as well as named structs for each group, carefully padded to align with each other.

Unions: This is tricky. Possibilities:
- Use a proxy type. Since a union member could itself be a struct, and there's no way to proxy arbitrary members, we'd probably need to use operator->(): foo.unionField->member. But then it's hard for us to tell whether the union member is being accessed for read or for write, which is important because on read we want to throw an exception if it's not the active member and on write we want to make it active. We'd probably have to assume the latter (unless the struct is const).
- Use an accessor method returning a reference. This is similar to the previous point but instead of foo.unionField->member you'd have foo.unionField().member. Unclear whether this is more or less confusing.
- Revert all the way to getFoo(), setFoo(), initFoo(). Sadness.

Another issue with Unions is how to handle "which". It seems like this should be a method too, since we don't want people directly overwriting the union discriminant.

Structs containing unions will need to have non-default constructors, destructors, copy, and assignment to deal with any unioned pointers.

AnyPointer: We can have capnp::AnyPointer be a class with some arbitrary interface for this. It would probably need to store an encoded Cap'n Proto blob behind the scenes, which it could decode on-demand.

AnyList/AnyStruct: kj::Own<AnyList>/kj::Own<AnyStruct>, otherwise similar to AnyPointer.

Generics: I think this is straightforward. We'll need a template typedef Pointer<T> which expands to:
* Pointer<List<T>> -> kj::Vector<T>
* Pointer<T> | T is a capability -> T::Client
* Pointer<T> | T is a struct -> kj::Own<T>
* Pointer<AnyPointer> -> AnyPointer
* Pointer<Text> -> kj::String
* Pointer<Data> -> kj::Array<kj::byte>
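The mapping above could be expressed with template specialization roughly as follows. This is only a sketch under assumptions: std types stand in for kj::Vector, kj::Own, kj::String and kj::Array so that it compiles without KJ, the tag types are stand-ins for the capnp ones, and "T is a capability" is detected by the presence of a nested `Client` type, which is a guess at how a generator could do it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in schema types (hypothetical; the real ones live in capnp).
struct Text {};
struct Data {};
template <typename T> struct List {};
struct AnyPointer { std::vector<unsigned char> encoded; };

// C++11 substitute for std::void_t.
template <typename...> struct MakeVoid { using Type = void; };

// Default: T is a struct, held by an owning pointer (stand-in for kj::Own<T>).
template <typename T, typename = void>
struct PointerImpl { using Type = std::unique_ptr<T>; };

// T has a nested Client type: treat it as a capability.
template <typename T>
struct PointerImpl<T, typename MakeVoid<typename T::Client>::Type> {
  using Type = typename T::Client;
};

template <typename T>
struct PointerImpl<List<T>, void> { using Type = std::vector<T>; };
template <> struct PointerImpl<Text, void> { using Type = std::string; };
template <> struct PointerImpl<Data, void> { using Type = std::vector<unsigned char>; };
template <> struct PointerImpl<AnyPointer, void> { using Type = AnyPointer; };

template <typename T>
using Pointer = typename PointerImpl<T>::Type;

// Hypothetical generated types used to exercise the mapping.
struct Point { int x; };
struct Calculator { struct Client { int id; }; };

static_assert(std::is_same<Pointer<Point>, std::unique_ptr<Point>>::value, "struct");
static_assert(std::is_same<Pointer<Calculator>, Calculator::Client>::value, "capability");
static_assert(std::is_same<Pointer<Text>, std::string>::value, "text");
```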

So the most difficult parts are groups (seem to work, but weird) and unions (API is inconsistent). There is also an open question about whether to use kj::String, kj::Array, and kj::Vector vs. defining new types capnp::Text, capnp::Data, and capnp::List<T>. I'm actually starting to lean towards the latter for consistency's sake. We could design them to mostly reuse the KJ types under the hood.

I think overall this strategy seems doable.
 
What is the API for a string field? The minimal approach needs to be intrusive to the string class:

    kj::String& getName();
    const kj::String& getName() const { return _name; }

kj::String needs to be extended with a null field.

FWIW if we use accessors, I'd want to have the same set of accessors that builders have today, so get, set, etc:

    bool hasName() const;
    kj::StringPtr getName() const;
    void setName(kj::StringPtr);
    void setName(kj::String&&);
    kj::String disownName();

I would not want methods that return references, since returning a reference defeats a lot of the purpose of accessors -- it allows someone external to subsequently change the value of the struct (through the reference) without any chance for the class to detect this change.
 

Second approach

    bool hasName() const;
    kj::String& getName();
    const kj::String& getName() const;

Non-const getter sets name to non-null. Const getter can return reference to global instance if hasName() == false. This second approach is non-intrusive and works also with std::string.
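A minimal sketch of this second approach, using std::string since it is described as non-intrusive. The class name `Person` and its private members are made up for illustration.

```cpp
#include <cassert>
#include <string>

class Person {
public:
  bool hasName() const { return hasName_; }

  // Non-const getter: marks the field as set and returns a mutable reference.
  std::string& getName() {
    hasName_ = true;
    return name_;
  }

  // Const getter: returns the stored value, or a shared empty string when
  // the field has never been set.
  const std::string& getName() const {
    static const std::string empty;
    return hasName_ ? name_ : empty;
  }

private:
  std::string name_;
  bool hasName_ = false;
};
```

Note that merely calling the non-const getter flips `hasName()` to true, even if the caller only wanted to read the value.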

Can we afford to not support NULL flag for strings in POCS class and encode empty string as null string on wire? That simplifies the API and generated code.

Do we need an option to set NULL flag?

It is easy to add also convenience setter for consistency with builders and primitive types.

    void setName(kj::String &&);

I believe structs and lists can stick to the same API as strings.

Groups can be just a view into POCS class without value semantics. Internally it will be pointer / reference to owning POCS instance. If POCS instance is deleted or if group sits in union section that is invalidated, group simply points to invalid memory. We could track group instances from POCS class and clear the pointers to trade speed / nice exceptions instead of crash. But first version can be simply without unions and groups.

Any suggestions?

Kind regards
 Brano


Branislav Katreniak

Mar 28, 2016, 4:12:05 PM3/28/16
to Kenton Varda, Cap'n Proto
Hi Kenton

I would not want methods that return references, since returning a reference defeats a lot of the purpose of accessors

I fully agree with this. But is there any difference between returning a reference to a member variable and publishing the member variable directly? I personally have no experience with proxy types. I prefer a method returning a reference over a proxy type just because there is less C++ dark magic involved.

I love your point that byte order and XORing default value is best handled at conversion to / from reader and builder. 

The proposal is nice. I don't really understand ANY pointers in capnproto yet, so those parts will sink in later. But I have a proposal for unions:
* UnionEnum which() const - like Reader
* bool isFoo() const - like Reader
* FooType& foo() - throws if which is not FOO. Returns reference to foo.
* const FooType& foo() const - throws if which is not FOO. Returns const reference to foo.
* FooType& initFoo() - sets which to foo, returns reference.
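The union proposal above can be sketched as follows. `Value`, `count` and `label` are hypothetical names, the exception type is an arbitrary choice, and the two members are stored side by side for simplicity, whereas a real implementation would presumably overlap them.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical plain struct for a schema union with members `count` and `label`.
class Value {
public:
  enum class Which : uint16_t { COUNT, LABEL };

  Which which() const { return which_; }
  bool isCount() const { return which_ == Which::COUNT; }
  bool isLabel() const { return which_ == Which::LABEL; }

  // Accessors throw if the requested member is not the active one.
  uint32_t& count() { check(Which::COUNT); return count_; }
  const uint32_t& count() const { check(Which::COUNT); return count_; }
  std::string& label() { check(Which::LABEL); return label_; }
  const std::string& label() const { check(Which::LABEL); return label_; }

  // init* makes the member active, resets it and returns a reference.
  uint32_t& initCount() { which_ = Which::COUNT; count_ = 0; return count_; }
  std::string& initLabel() { which_ = Which::LABEL; label_.clear(); return label_; }

private:
  void check(Which expected) const {
    if (which_ != expected) throw std::logic_error("inactive union member accessed");
  }

  Which which_ = Which::COUNT;
  uint32_t count_ = 0;
  std::string label_;
};
```

Here const and non-const objects behave the same way: only `init*` ever changes the discriminant.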

I really dislike acting differently on const and non-const objects. That is hard to reason about.

> I'd say programs should avoid distinguishing between null and empty.

I would like to not give an option to distinguish between NULL and empty for lists, strings and data. I consider it bad design. If it is important, it is better to express the difference in separate bool.

> FWIW if we use accessors, I'd want to have the same set of accessors that builders have today, so get, set, etc:

Let's stick to naked public member fields for now. It generates little code and it is the fastest solution.

Can you, please, make API proposal for conversion from reader and to builder?

My attack plan is to start with code that has the right API but is as simple as possible to be correct. Optimizations as phase 2. 

I am still curious about your position for an option to use std types in POCS classes. I understand that you don't want to use it yourself. But these POCS classes infiltrate much deeper into application logic than reader and builder. And they will look weird if the team uses std classes everywhere else. Do you think that kj::Own, kj::String and kj::Vector provide real benefit for POCS classes over std::unique_ptr, std::string and std::vector? I believe that it may help capnproto adoption if it plays more nicely with std code. 

Kind regards
 Brano

Branislav Katreniak

Mar 31, 2016, 11:00:51 AM3/31/16
to Kenton Varda, Cap'n Proto
I think the outer class itself (which currently behaves only as a namespace) should be the POCS type. This makes the POCS interface very natural to use, which is its goal in life after all.

Outer class is very nice place. But I consider this capnp declaration:

struct A {
  bi @0 : B.I;
  interface I {
  }
}

struct B {
  ai @0 : A.I;
  interface I {
  }
}

How to compile this into C++ classes? Class A declaration needs B.I fully declared before. Class B declaration needs A.I fully declared before. But there is no way to declare nested class in C++ before parent class declaration. My interpretation is that outer class must stay only as namespace. That leads to question how to name the POCS class.

Kind regards
 Brano


Kenton Varda

Apr 1, 2016, 7:06:50 PM4/1/16
to Branislav Katreniak, Cap'n Proto
On Mon, Mar 28, 2016 at 1:12 PM, Branislav Katreniak <katr...@gmail.com> wrote:
Hi Kenton

I would not want methods that return references, since returning a reference defeats a lot of the purpose of accessors

I fully agree with this. But is there any difference between returning a reference to a member variable and publishing the member variable directly? I personally have no experience with proxy types. I prefer a method returning a reference over a proxy type just because there is less C++ dark magic involved.

I love your point that byte order and XORing default value is best handled at conversion to / from reader and builder. 

The proposal is nice. I don't really understand ANY pointers in capnproto yet, so those parts will sink in later. But I have a proposal for unions:
* UnionEnum which() const - like Reader
* bool isFoo() const - like Reader
* FooType& foo() - throws if which is not FOO. Returns reference to foo.
* const FooType& foo() const - throws if which is not FOO. Returns const reference to foo.
* FooType& initFoo() - sets which to foo, returns reference.

I really dislike acting differently on const and non-const objects. That is hard to reason about.

Let's focus on non-unions for now while we keep thinking about this.

I have some crazy ideas forming for how we could make proxies work for unions that I'd like to play with. It occurs to me that my worry about struct members was not correct since a struct field would be Own<T> anyway.
 
> I'd say programs should avoid distinguishing between null and empty.

I would like to not give an option to distinguish between NULL and empty for lists, strings and data. I consider it bad design. If it is important, it is better to express the difference in separate bool.

Well, using null is common as a way to implement Maybe<T>. Adding a separate boolean is not great because it can be inconsistent -- I would actually recommend using a two-member union instead. But either of these approaches adds overhead, making relying on null pointers attractive.

Arguably, we should extend the language to support an explicit Maybe(T) type which we could then translate into kj::Maybe. Existing protocols which rely on null would need to transition over to Maybe(T), but it would be a backwards-compatible change.

> FWIW if we use accessors, I'd want to have the same set of accessors that builders have today, so get, set, etc:

Let's stick to naked public member fields for now. It generates little code and it is the fastest solution.

Can you, please, make API proposal for conversion from reader and to builder?

Builders should have field setters from POCS types. MessageBuilder should also allow POCS as an input to setRoot().

For Reader/Builder -> POCS, two options:

1) Support copy constructor / assignment operator. However, this is a little weird since the copy would do allocation, and we prefer to avoid implicit allocation in KJ/capnp code.

2) Have an asNative() or asPocs() method on Reader/Builder which creates POCS objects. Requires more typing but makes allocation explicit.
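A small sketch of what option 2 could look like. `PersonPocs` and `PersonReader` are stand-ins invented for illustration; the reader here is just a non-owning view and does not reflect how generated Reader classes are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical plain struct; std::string stands in for the text type.
struct PersonPocs {
  std::string name;
  uint32_t age = 0;
};

// Stand-in for a generated Reader: a non-owning view over message data.
class PersonReader {
public:
  PersonReader(const char* name, std::size_t nameSize, uint32_t age)
      : name_(name), nameSize_(nameSize), age_(age) {}

  // Option 2: the copy, and its allocation, only happens when the caller
  // explicitly asks for it.
  PersonPocs asPocs() const {
    PersonPocs result;
    result.name.assign(name_, nameSize_);  // deep copy out of the message
    result.age = age_;
    return result;
  }

private:
  const char* name_;
  std::size_t nameSize_;
  uint32_t age_;
};
```

The resulting object owns its data, so it can outlive the message the reader pointed into.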
 
My attack plan is to start with code that has the right API but is as simple as possible to be correct. Optimizations as phase 2. 

I am still curious about your position for an option to use std types in POCS classes. I understand that you don't want to use it yourself. But these POCS classes infiltrate much deeper into application logic than reader and builder. And they will look weird if the team uses std classes everywhere else. Do you think that kj::Own, kj::String and kj::Vector provide real benefit for POCS classes over std::unique_ptr, std::string and std::vector? I believe that it may help capnproto adoption if it plays more nicely with std code. 

I think there is quite a lot wrong with std::string, std::vector, and std::unique_ptr, which is why I wrote my own versions. For example:
- std::string is designed to support reference counting and copy-on-write, which is now broadly understood to have been a mistake, but it's impossible to eliminate the weird specification quirks now.
- std::unique_ptr uses template polymorphism to support custom allocators. It should have used virtual method dispatch polymorphism. Template polymorphism means your code must always declare exactly which allocator is allowed or be templatized itself. This has negative implications for our POCS types: it would be neat to implement an optimization in which Data, Text, and List(Primitive) fields inside the message could actually point back to the original input rather than make a copy, but that requires control over how they are deallocated.

Actually, thinking about it, I think that our POCS types should use neither std nor KJ types. Instead, we should have special types which we control:

- Pointer<T> for structs (not kj::Own).
- List<T> for lists (not kj::Vector).
- Text and Data for blobs (not kj::String nor kj::Array).

This way, we can potentially customize these interfaces if desired to support features like:
- Zero-copy blob references (pointers into the original message reader).
- Lazy parsing, i.e. only parse a sub-struct when first accessed. (We probably don't actually want this, but it's nice to be able to change our minds later.)
- Null pointer comparisons.
- Implicit conversions to appropriate std types for convenience (e.g. the way Text::Reader today can implicitly convert to std::string).
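
> How to compile this into C++ classes? [...] My interpretation is that outer class must stay only as namespace.

Ugh.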

This is probably exceedingly rare in practice. It would be sad to make the API harder for everyone just to cover this one obscure case.

What if we ignore this problem for now, but plan that if it comes up for real in the future, we will make the code generator resolve the cycle by injecting a proxy type?

-Kenton

Lee Clagett

Apr 2, 2016, 12:45:37 AM4/2/16
to capn...@googlegroups.com
On Fri, 1 Apr 2016 16:06:25 -0700
Kenton Varda <ken...@sandstorm.io> wrote:
> On Mon, Mar 28, 2016 at 1:12 PM, Branislav Katreniak
> <katr...@gmail.com> wrote:
>
> > Hi Kenton
> >
[...]
> > My attack plan is to start with code that has the right API
> > but is as simple as possible to be correct. Optimizations as phase
> > 2.
> >
> > I am still curious about your position for an option to use std
> > types in POCS classes. I understand that you don't want to use it
> > yourself. But these POCS classes infiltrate much deeper into
> > application logic than reader and builder. And they will look weird
> > if the team uses std classes everywhere else. Do you think that
> > kj::Own, kj::String and kj::Vector provide real benefit for POCS
> > classes over std::unique_ptr, std::string and std::vector? I
> > believe that it may help capnproto adoption if it plays more nicely
> > with std code.
>
> I think there is quite a lot wrong with std::string, std::vector, and
> std::unique_ptr, which is why I wrote my own versions. For example:
> - std::string is designed to support reference counting and
> copy-on-write, which is now broadly understood to have been a
> mistake, but it's impossible to eliminate the weird specification
> quirks now.

std::string reference counted and COW implementations are not permitted
in C++11 or newer. Gcc 5.1 has an ABI breakage for this and std::list
complexity changes [0]. Capnproto supports the Gcc 4.x versions, so
the inconsistency does stink.

> - std::unique_ptr uses template polymorphism to support custom
> allocators. It should have used virtual method dispatch polymorphism.
> Template polymorphism means your code must always declare exactly
> which allocator is allowed or be templatized itself. This has
> negative implications for our POCS types: it would be neat to
> implement an optimization in which Data, Text, and List(Primitive)
> fields inside the message could actually point back to the original
> input rather than make a copy, but that requires control over how
> they are deallocated.
>

The templated deleter allows for the empty-base-class optimization.
unique_ptr with the default deleter has identical size requirements to
a raw pointer. Any type-erased deleter must have storage, and therefore
will always take up more space. The unique_ptr deleter can be made
polymorphic with a simple two-liner:

template<typename T>
using unique_poly_ptr = std::unique_ptr<T, std::function<void(T*)>>;

Although the move constructor of std::function is _not_ `noexcept`, so
resource leaks are possible without a wrapper that uses swap. And a
`make_poly_ptr` function would help prevent resource leaks in case the
std::function initially throws on construction. Halfway to writing an
entirely new unique_ptr anyway. Would be dead-simple if std::function
had `noexcept` moves.
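A sketch of the points above, under some assumptions. The size check for the default deleter holds on the common standard library implementations but is not guaranteed by the standard, and `make_poly_ptr` is a hypothetical helper that only addresses the case where constructing the std::function throws; it does not provide the swap-based wrapper mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

template <typename T>
using unique_poly_ptr = std::unique_ptr<T, std::function<void(T*)>>;

// With the default (empty) deleter, unique_ptr is pointer-sized on the
// common implementations; the type-erased deleter needs extra storage.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*), "default deleter is free");
static_assert(sizeof(unique_poly_ptr<int>) > sizeof(int*), "erased deleter costs space");

// Hypothetical helper: build the type-erased deleter first, so that if its
// construction throws, the object has not been allocated yet.
template <typename T, typename Deleter, typename... Args>
unique_poly_ptr<T> make_poly_ptr(Deleter deleter, Args&&... args) {
  std::function<void(T*)> erased(std::move(deleter));  // may throw; nothing to leak
  T* raw = new T(std::forward<Args>(args)...);
  return unique_poly_ptr<T>(raw, std::move(erased));
}
```

A call such as `make_poly_ptr<int>(deleterLambda, 5)` then yields a pointer whose deleter can differ per instance without changing the pointer's type.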

Lee

[0]https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html

Branislav Katreniak

Apr 6, 2016, 4:16:30 AM4/6/16
to Kenton Varda, Cap'n Proto
Arguably, we should extend the language to support an explicit Maybe(T) type which we could then translate into kj::Maybe. Existing protocols which rely on null would need to transition over to Maybe(T), but it would be a backwards-compatible change.

It would be great to make it explicit in IDL that given Text / Data can be optional. If not set, empty text / data can be serialized as null pointer.

If I read the code correctly, kj::Maybe(T) allocates T on heap, forcing extra pointer lookup.

- List<T> for lists (not kj::Vector).

As of now, List<T> cannot be used with forward declared T.

- Text and Data for blobs (not kj::String nor kj::Array).

As of now, it is possible to just make capnp::Text subclass of kj::String and capnp::Data subclass of kj::Array and it works.
 
Outer class is very nice place. But I consider this capnp declaration:

struct A {
  bi @0 : B.I;
  interface I {
  }
}

struct B {
  ai @0 : A.I;
  interface I {
  }
}

How to compile this into C++ classes? Class A declaration needs B.I fully declared before. Class B declaration needs A.I fully declared before. But there is no way to declare nested class in C++ before parent class declaration. My interpretation is that outer class must stay only as namespace. That leads to question how to name the POCS class.

Ugh.

This is probably exceedingly rare in practice. It would be sad to make the API harder for everyone just to cover this one obscure case.

What if we ignore this problem for now, but plan that if it comes up for real in the future, we will make the code generator resolve the cycle by injecting a proxy type?

It is not that easy to ignore this problem now. It comes up in many places; I highlighted just one case here. 

POCS type needs for its declaration
* forward declaration of Foo to declare member of struct type Foo
* full declaration of Foo::Client to declare member of interface type Foo
* forward declaration of enum Foo to declare member of enum type Foo

This is also a problem:

struct A {
  bx @0 : B.X;
}

struct B {
  ax @0 : A.X;
  struct X {
  }
}


What does it mean to ignore this problem? Do we compile only POCS classes that use only previously declared types? Everything else will use proxy type?

Kind regards
 Brano

Kenton Varda

Apr 8, 2016, 3:56:18 PM4/8/16
to Branislav Katreniak, Cap'n Proto
On Wed, Apr 6, 2016 at 1:16 AM, Branislav Katreniak <katr...@gmail.com> wrote:
If I read the code correctly, kj::Maybe(T) allocates T on heap, forcing extra pointer lookup.

No, it doesn't. It uses placement-new.
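To show what "uses placement-new" means in practice, here is a minimal optional-value sketch. It is not kj::Maybe itself, and `InlineMaybe` is a made-up name: the value lives in storage inside the object and is constructed in place, so no heap allocation or extra pointer hop is involved.

```cpp
#include <cassert>
#include <new>
#include <utility>

template <typename T>
class InlineMaybe {
public:
  InlineMaybe() : isSet_(false) {}
  InlineMaybe(const InlineMaybe&) = delete;
  InlineMaybe& operator=(const InlineMaybe&) = delete;
  ~InlineMaybe() { reset(); }

  // Construct the value directly inside this object's own storage.
  template <typename... Args>
  T& emplace(Args&&... args) {
    reset();
    ::new (static_cast<void*>(&value_)) T(std::forward<Args>(args)...);
    isSet_ = true;
    return value_;
  }

  // Destroy the value, if any, without releasing any memory.
  void reset() {
    if (isSet_) {
      value_.~T();
      isSet_ = false;
    }
  }

  bool isSet() const { return isSet_; }
  T& value() { assert(isSet_); return value_; }

private:
  union { T value_; };  // raw storage; lifetime is managed manually
  bool isSet_;
};
```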

- List<T> for lists (not kj::Vector).

As of now, List<T> cannot be used with forward declared T.

Is this solved by specifying explicitly List<T, Kind::STRUCT>? We could easily do that in generated code.

- Text and Data for blobs (not kj::String nor kj::Array).

As of now, it is possible to just make capnp::Text subclass of kj::String and capnp::Data subclass of kj::Array and it works.

That seems reasonable.

What does it mean to ignore this problem? Do we compile only POCS classes that use only previously declared types? Everything else will use proxy type?

Never mind. Better solution to the problem: Declare all classes at the top level, then make the inner type names be typedefs. This is what protobuf does, actually.

    struct A;
    struct A_X;
    ...

    struct A {
      typedef A_X X;
      ...
    };
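Applied to the earlier A/B example with nested interfaces, the idea might look like this. It is only a sketch: the `Client` bodies are placeholders, and the names `A_I` and `B_I` follow the `A_X` convention shown above.

```cpp
#include <cassert>
#include <type_traits>

// 1. Forward-declare everything at the top level.
struct A;
struct A_I;
struct B;
struct B_I;

// 2. Fully define the formerly nested types first; they do not depend on
//    A or B, so there is no cycle.
struct A_I { struct Client { int id = 0; }; };
struct B_I { struct Client { int id = 0; }; };

// 3. Define the outer types; the nested names become typedefs.
struct A {
  typedef A_I I;
  B_I::Client bi;   // schema: bi @0 : B.I
};

struct B {
  typedef B_I I;
  A::I::Client ai;  // schema: ai @0 : A.I
};

static_assert(std::is_same<A::I, A_I>::value, "A::I is an alias for A_I");
static_assert(std::is_same<B::I, B_I>::value, "B::I is an alias for B_I");
```

User code can keep writing `A::I::Client`, while headers that only need a declaration can forward-declare `A_I`.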

-Kenton

raven...@gmail.com

Apr 9, 2016, 3:36:39 AM4/9/16
to Cap'n Proto
I would be pleased to see a POCS implementation (reading the Cap'n docs I was a bit dismayed by the lack of ability to practically use it to define a schema for your "live" data - it seems like that way you're saving serialize/deserialize time only hypothetically, while in practice you're just making it so you're serializing your data into a Cap'n message manually every time you approach a wire / process boundary / storage medium)

On the other hand, POCS support makes it more likely that you end up doing actual serialize-deserialize operations again, and weakens Cap'n's claim of being infinitely faster. :)

I'm mostly chiming in here, though, to suggest that if you're going to be generating code for POCS conversions, it would be nice to make this something you enable explicitly per file or per message. One of the selling points of Cap'n is the faster build times and reduced code vs. Protobuf, and this change reduces that advantage. If you make it only do POCS based on a flag or annotation of some kind, it makes it possible to keep, eg. RPC messages in their pure builder/reader implementation, but also have POCS translators for those structs that you plan to use for internal state. (Also, perhaps specified per-language - one might want a native struct in C++, but only need the builder/reader model on the Python side or whatever.)

For bonus points it would be nice if it was possible to use per-field annotations to specify special alternative conversions. It's always frustrating with Protobufs to have a map-style list of key-value pairs where the key isn't a primitive type, so it can't be translated into a std::map or similar. (Also, eg. distinguishing intent between std::map or std::multimap.) Being able to specify "include these header files in the generated POCS code, and use these functions for converting this field" would neatly cover all such annoying cases. Even better, if it was implemented with this support then the standard primitive POCS conversion could be built upon this mechanism, with the standard conversion functions just being the default values for the conversion annotations.

Branislav Katreniak

Apr 11, 2016, 7:02:54 AM4/11/16
to Kenton Varda, Cap'n Proto

Looking at where these POCS types are heading, I would like to step back in this discussion. Thinking about my requirements, I don't really need POCS classes. All I need is classes usable for mutable state.

I see two problems why current code is not really usable for mutable state:
1. arena allocations are leaked when memory is released
2. builders have no way to effectively resize lists. Lists need to be extended with concept of capacity

The 1st problem can be solved by introducing a special MallocArena that uses malloc for each allocation. Builders don't own the memory they point to, but capnp already has a concept for owning memory for builders and readers: Orphan. Orphans cannot outlive their arena, but MallocArena never goes out of scope. MallocArena introduces complications because it allocates from a big address space, but that should be workable. A special class and support methods for Orphan in MallocArena can be introduced.

The 2nd problem requires tweaks in the list layout. It is possible to restrict this new layout to MallocArena-allocated builders and to let resize operations assert / always realloc for non-MallocArena builders. But it is also possible to push this to all builders at the moment the first reallocation happens. It allows optimizations where replacing a string can be done in place. Actually, a list of structs (list pointer block C set to 7) can support capacity by storing the capacity in list pointer block D and the real size in the content prefix "tag".

Using these classes for purely mutable state will not be as fast as true POCS classes. But it generates little new code. And the mutable state can be passed to any existing reader and builder without conversion. 

A bit off topic, but I am talking about generated code size ... would it make sense to make struct Builder subclass of struct Reader? The reader methods would be reused in Builder.

Thoughts?

Kind regards
 Brano

Kenton Varda

Apr 14, 2016, 11:00:52 PM4/14/16
to Branislav Katreniak, Cap'n Proto
On Mon, Apr 11, 2016 at 4:02 AM, Branislav Katreniak <katr...@gmail.com> wrote:

Looking at where these POCS types are heading, I would like to step back in this discussion. Thinking about my requirements, I don't really need POCS classes. All I need is classes usable for mutable state.

I see two problems why current code is not really usable for mutable state:
1. arena allocations are leaked when memory is released
2. builders have no way to effectively resize lists. Lists need to be extended with concept of capacity

The 1st problem can be solved by introducing a special MallocArena that uses malloc for each allocation. Builders don't own the memory they point to, but capnp already has a concept for owning memory for builders and readers: Orphan. Orphans cannot outlive their arena, but MallocArena never goes out of scope. MallocArena introduces complications because it allocates from a big address space, but that should be workable. A special class and support methods for Orphan in MallocArena can be introduced.

A simple start to this is to create a MessageBuilder subclass that always allocates the minimum size passed to the allocateSegment() method. This will effectively force every allocation to create a new segment.

You could then implement an optimization where whenever an object is deleted that happens to be the last object in a segment, SegmentBuilder::tryTruncate() is used to free the space. This would be a reasonable optimization to have in general.

Then, as one more optimization: if tryTruncate() truncates the segment to zero-size, perhaps the whole segment can simply be deleted. And perhaps, later, when a new segment is allocated, it can replace a previously-deleted segment instead of being appended to the end.
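The three steps above can be modelled with a toy pool like the one below. This is not capnp code: `SegmentPerObjectPool` and its methods are hypothetical, and it only illustrates one segment per allocation, releasing a segment once it is truncated to zero, and reusing the freed slot for a later allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class SegmentPerObjectPool {
public:
  // Allocate a segment of exactly `minimumWords` words, reusing the slot of
  // a previously deleted segment when one is available.
  std::size_t allocateSegment(std::size_t minimumWords) {
    assert(minimumWords > 0);
    std::unique_ptr<uint64_t[]> words(new uint64_t[minimumWords]());
    for (std::size_t id = 0; id < segments_.size(); ++id) {
      if (!segments_[id].words) {
        segments_[id] = Segment{std::move(words), minimumWords};
        return id;
      }
    }
    segments_.push_back(Segment{std::move(words), minimumWords});
    return segments_.size() - 1;
  }

  // Truncating a segment to zero size releases its memory and frees the slot.
  void truncateToZero(std::size_t id) {
    segments_.at(id) = Segment{nullptr, 0};
  }

  std::size_t sizeInWords(std::size_t id) const { return segments_.at(id).size; }
  std::size_t slotCount() const { return segments_.size(); }

private:
  struct Segment {
    std::unique_ptr<uint64_t[]> words;
    std::size_t size;
  };
  std::vector<Segment> segments_;
};
```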

Now you've solved the arena problem, with only some minor changes that are reasonable optimizations as-is.

However, this approach will suffer from the fact that all pointers will be far pointers, which use more space and are slower to dereference.

I'm not sure there's any better option, though. I really don't want Cap'n Proto to grow a whole internal implementation of malloc() that applies specifically within a message.
 
The 2nd problem requires tweaks in the list layout. It is possible to restrict this new layout to MallocArena-allocated builders and to let resize operations assert / always realloc for non-MallocArena builders. But it is also possible to push this to all builders at the moment the first reallocation happens. It allows optimizations where replacing a string can be done in place. Actually, a list of structs (list pointer block C set to 7) can support capacity by storing the capacity in list pointer block D and the real size in the content prefix "tag".

Perhaps you could exploit the fact that INLINE_COMPOSITE-type lists separately specify "total words" and "number of elements", where the former could in fact be much larger than is needed by the latter. You could over-allocate space and then increase the element count incrementally.
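As a toy model of that idea (hypothetical names, not capnp's internal layout code), the reserved word count and the element count can be tracked separately, so that appending within the reserved space needs no reallocation:

```cpp
#include <cassert>
#include <cstdint>

struct CompositeListModel {
  uint32_t totalWords;       // words reserved for the list content
  uint32_t wordsPerElement;  // size of one struct element in words
  uint32_t elementCount;     // elements currently in use

  // How many elements fit into the reserved space.
  uint32_t capacity() const {
    assert(wordsPerElement > 0);
    return totalWords / wordsPerElement;
  }

  // Grow by one element in place if the reserved space allows it.
  bool tryAppend() {
    if (elementCount >= capacity()) return false;  // caller must reallocate
    ++elementCount;
    return true;
  }
};
```

For example, a list with 6 reserved words and 2-word elements can grow from 1 to 3 elements in place before `tryAppend()` reports that a reallocation is needed.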

Note that list builders do not keep track of the locations of the pointer to the list nor the list tag, and I'd rather not add any new fields to this type as it is supposed to be a pass-by-value type. So, you'll need some sort of ResizeableList which contains both the list builder and an AnyPointer::Builder pointing back to the list's pointer. That doesn't seem too bad, though.
 
Using these classes for purely mutable state will not be as fast as true POCS classes. But it generates little new code. And the mutable state can be passed to any existing reader and builder without conversion. 

Note that if this feature is going to be wholly obsoleted by POCS then that strongly argues against implementing it at all. I don't want to add something to the library that turns out to be totally useless a few months later when we implement POCS. And I do think POCS is likely to get implemented within a few months -- I've been itching to do it for a while, and I anticipate having some breathing room in my workload soon.
 
A bit off topic, but I am talking about generated code size ... would it make sense to make struct Builder subclass of struct Reader? The reader methods would be reused in Builder.

This would be a very large refactoring and I'm pretty sure there's a good reason I didn't do it that way in the first place, though I don't know the reason off the top of my head.

Keep in mind that the get() methods of a Builder have slightly different semantics from those of a Reader -- for struct-typed fields, get() will initialize the pointer to be non-null.

-Kenton

Branislav Katreniak

Apr 15, 2016, 4:07:10 AM4/15/16
to Kenton Varda, Cap'n Proto
However, this approach will suffer from the fact that all pointers will be far pointers, which use more space and are slower to dereference.

I'm not sure there's any better option, though. I really don't want Cap'n Proto to grow a whole internal implementation of malloc() that applies specifically within a message.

Copying from Reader can happen into one segment. Further modifications will lead to one segment per allocation. This looks like good compromise to me. 
 
Note that list builders do not keep track of the locations of the pointer to the list nor the list tag, and I'd rather not add any new fields to this type as it is supposed to be a pass-by-value type. So, you'll need some sort of ResizeableList which contains both the list builder and an AnyPointer::Builder pointing back to the list's pointer. That doesn't seem too bad, though.

It is a good idea to separate List and ResizeableList (like kj::Array and kj::Vector). ResizeableList can be limited to the case where the list is in its own segment, so the segment size easily holds the list capacity.
 
Note that if this feature is going to be wholly obsoleted by POCS then that strongly argues against implementing it at all. I don't want to add something to the library that turns out to be totally useless a few months later when we implement POCS. And I do think POCS is likely to get implemented within a few months -- I've been itching to do it for a while, and I anticipate having some breathing room in my workload soon.

Great! I am not against POCS classes, I like them. But I realized that POCS classes are not a good task for me. It is a lot of work and there is too much design work to create something acceptable to you :)

On the other hand, I have learned my lesson to never count on non-existent code from a third party. So I am happy to have my own attack plan. Looking at current priorities, I am likely to look at adding Promise to the IDL next.

Thank you for your great feedback!

Kind regards
 Brano

Branislav Katreniak

Jul 7, 2016, 5:29:18 AM7/7/16
to Kenton Varda, Cap'n Proto
Hi Kenton

I would like to forward declare capnp generated client class. AFAIK this is not possible because it is generated as inner class. But this part of your earlier post would solve my problem:

> * Capabilities: T::Client (I wonder if we should move T::Client's members into T and make T::Client be an alias for backwards-compatibility?)

Are you willing to review & merge this change if I implement it?

Kind regards
 Brano

Kenton Varda

Jul 8, 2016, 1:21:38 AM7/8/16
to Branislav Katreniak, Cap'n Proto
Hi Brano,

Sorry, I'm uncomfortable carrying out that change without more research / experimentation.

A change I would be more comfortable with is to declare T_Client as a top-level class with T::Client then being an alias to T_Client. This is what Protobuf does with nested types. This way you can then forward-declare T_Client.

Thoughts?

-Kenton

Branislav Katreniak

Jul 8, 2016, 3:09:50 AM7/8/16
to Kenton Varda, Cap'n Proto
Hi Kenton.

That works for me and I like it. Moreover, it will be possible to consistently apply to all nested classes.

I am likely to look at this in August.

Thank you!
 Brano

maier...@gmail.com

Oct 4, 2018, 12:09:54 PM10/4/18
to Cap'n Proto
Hi everyone,

Is there any news on this topic? Is anybody working on a PODS implementation?

I'm looking forward to this feature.

Best regards
Thomas

Kenton Varda

Oct 4, 2018, 3:28:02 PM10/4/18
to maier...@gmail.com, Cap'n Proto
Hi Thomas,

At the moment, no one is actively working on this. It's on my list of features that I really want to build, but I seem to have too many projects and not enough time. :/

-Kenton
