Hi, all. I'm looking for a way to have a capnp data structure which has standard value semantics (move, copy, etc.), but owns its storage (i.e. is self-contained), which will be deallocated when the object is destroyed. Basically, I want it to act like a kj::Array: I can use it as-is, I can move it around between functions, but when it gets destroyed, its storage goes with it.

I've thought about something like std::pair<MallocMessageBuilder, Foo::Builder> where the builder references the MallocMessageBuilder, but doing that requires separate APIs for self-contained builders vs. normal builders which reference storage elsewhere. I can easily create my own solution (one trivial solution would be std::pair<kj::Maybe<MallocMessageBuilder>, Foo::Builder>), but if kj or capnp already has something for this, I might as well use that.
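For illustration, a minimal sketch of the "storage travels with the object" idea. It deliberately does not use the capnp headers: `Arena`, `FooBuilder` and `OwnedMessage` are hypothetical stand-ins for MallocMessageBuilder, a generated Foo::Builder and the wrapper being asked about. The only point shown is that keeping the arena on the heap leaves the builder's pointer valid when the wrapper is moved.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message arena such as MallocMessageBuilder.
struct Arena {
  std::vector<std::uint64_t> words;
};

// Hypothetical stand-in for a generated Foo::Builder: a non-owning view.
class FooBuilder {
public:
  explicit FooBuilder(Arena& arena) : arena_(&arena) {
    if (arena_->words.empty()) arena_->words.resize(1);
  }
  void setValue(std::uint64_t v) { arena_->words[0] = v; }
  std::uint64_t getValue() const { return arena_->words[0]; }

private:
  Arena* arena_;
};

// Owns the arena on the heap, so the builder's pointer stays valid when the
// wrapper itself is moved. Move-only, in the spirit of kj::Array.
template <typename Builder>
class OwnedMessage {
public:
  OwnedMessage() : arena_(std::make_unique<Arena>()), builder_(*arena_) {}
  OwnedMessage(OwnedMessage&&) noexcept = default;
  OwnedMessage& operator=(OwnedMessage&&) noexcept = default;

  Builder& get() { return builder_; }
  const Builder& get() const { return builder_; }

private:
  std::unique_ptr<Arena> arena_;  // destroyed together with the wrapper
  Builder builder_;               // view into *arena_
};
```

Code that only needs a builder can keep taking `FooBuilder&`, so a single API serves both self-contained and externally stored messages.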
--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
--
I like that design. Having a POCS (which easily copies to/from a builder/reader) which I can use in my frontend code would be more convenient. The frontend isn't very performance sensitive, and it's a bit trickier to convince Qt to play nicely with the builders and readers.
A POCS class would definitely solve my problem functionally, and if it were available now, I would likely not even think about this.

I must say that I have had bad experiences with parsing JSON messages into Variants. There were so many allocations that it became a bottleneck. A POCS class can be much better, depending on what is in the message. If somebody encodes Hash<Text, Text>, it is a lot of allocations. But I have a few reasons against a simple POCS class:
* it will drop any information that is encoded in a newer version of the protocol.
* kj::String allocates even small strings on heap. That is costly. Small size optimization in kj::String would solve this.
* POCS means another chunk of code generated per capnproto class.
My idea can be realized much more simply. If the arena can track how much memory is used and how much is leaked, it can rebuild the message from scratch when it wastes too much memory. That solves the whole problem of leaking. The arena can also start to reuse memory.

The remaining problem is to find the right moment when the message can be rebuilt. The arena would need to track all Readers and Builders.

Then it is possible to implement different strategies by providing different MessageBuilder implementations:
* ensure (at runtime) that no readers exist while a builder is active, and rebuild the message when the writer exits.
* ref-counted arena, either thread-safe with locking or single-threaded without locking. Qt-like implicit sharing (copy on write) is possible.
* a sub-Mutable can reference the parent arena if it is bigger than some ratio of the whole arena. Otherwise it makes a deep copy.

API-wise, can the Mutable class be a superset of the POCS class? I think so.

But this approach really needs an extension to the List encoding to hold both size and capacity.

Is it sane?
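To make the bookkeeping part of this idea concrete, here is a rough sketch of the counters such an arena would need. `ArenaStats` and all of its method names are hypothetical; nothing like this exists in capnp today, and the hard part (knowing when no Reader or Builder is live) is not shown.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for an arena that never frees individual objects.
class ArenaStats {
public:
  void onAllocate(std::size_t bytes) { allocated_ += bytes; }

  // Called when an object is overwritten or detached: its bytes become a hole.
  void onRelease(std::size_t bytes) { dead_ += bytes; }

  std::size_t liveBytes() const { return allocated_ - dead_; }

  // Rebuild once more than `maxWasteRatio` of the arena is dead space.
  bool shouldRebuild(double maxWasteRatio) const {
    return allocated_ > 0 &&
           static_cast<double>(dead_) >
               maxWasteRatio * static_cast<double>(allocated_);
  }

  // After copying the live objects into a fresh arena, only they remain.
  void onRebuilt() {
    allocated_ = liveBytes();
    dead_ = 0;
  }

private:
  std::size_t allocated_ = 0;
  std::size_t dead_ = 0;
};
```

The rebuild itself would be a deep copy of the root into a new message, which is only safe at a point where no outstanding Reader or Builder refers to the old segments.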
I think what you'll end up with is another API that's maybe somewhat easier to manipulate than Builders today but still much harder to manipulate than POCS. I think we will still want the POCS API for maximum ease of use, in which case I am not sure this in-between API adds a lot of value. Meanwhile it sounds pretty complex to implement. I'm also not convinced it would perform better, given the extra bookkeeping needed. E.g. reusing memory would require writing something very much like malloc(), so I wouldn't expect it to perform better than malloc(). Doing GC/compaction adds lots of complexity and may have its own performance issues.
Do you already have ideas for the POCS class design?
* What about integration with std classes / or KJ style? Nothrow move constructors? Disabled copy constructor? Using std::string, std::vector for strings and lists?
* How do you plan to preserve "unknown fields"? Reuse capnp encoding for internal representation for everything except pointers? That could lead to high code reuse and fast serialization.
I wonder how capnp is typically used within a bigger project without POCS classes. Do you use capnp only at the application edge for RPC and hold the state in a different set of internal classes? Do you do the memory management explicitly by copying and passing MallocMessageBuilder instances around?
On Mon, Mar 14, 2016 at 1:17 AM, Branislav Katreniak <katr...@gmail.com> wrote:
> Do you have already ideas for POCS classes design?
> * What about integration with std classes / or KJ style? Nothrow move constructors? Disabled copy constructor? Using std::string, std::vector for strings and lists?

Naturally I'd prefer KJ over std. :) Probably text fields will be kj::String and lists will be kj::Vector.

I'm fine with marking move constructors nothrow (though I think std's insistence on this is misguided).

I think I would disable the copy constructor, but provide a .clone().

> * How do you plan to preserve "unknown fields"? Reuse capnp encoding for internal representation for everything except pointers? That could lead to high code reuse and fast serialization.

It would be cool to lay out the data fields such that they can be memcpy()d from the serialized format, although I was also thinking it would be nice if you didn't have to go through accessor methods and instead these fields were simply public member variables. I don't think these two things are compatible, due to endianness issues. Will have to think about which is better.

Unknown fields would need to be preserved in separate arrays of bytes and pointers (for the respective sections). I guess preserving data fields is tricky if we don't use the memcpy()able layout since they could be interleaved with the known fields. To preserve a pointer, we'd actually serialize it and its target in "flat" format and store a word array. We should of course optimize for the case that any extra fields are zero/null and so need not be preserved.

> I wonder how capnp is typically used within bigger project without POCS classes. How you use capn only on application edge for RPC and do you hold the state in different set of internal classes? Do you do the memory management explicitly by copying & passing MallocMessageBuilder instances around?

Well, I'm probably personally the largest-scale user of Cap'n Proto (within Sandstorm.io). I use a variety of styles depending on the situation. Simple data (with only a couple fields) is easy to translate into a struct internally. For complicated data I will sometimes copy into a MallocMessageBuilder, yes. Though I'd say most of the time our RPC calls consume their data directly and produce their results directly, without really copying it elsewhere. Still, the API is wonky and can get cumbersome, making me personally excited about the POCS solution.

-Kenton
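As a sketch of what "move-only with an explicit .clone()" could look like for a generated type: `Person` and its fields are hypothetical, and std::string / std::vector stand in for kj::String / kj::Vector only so that the example compiles without KJ.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generated POCS type with plain public member fields.
struct Person {
  std::string name;
  std::vector<std::uint32_t> luckyNumbers;

  Person() = default;
  Person(Person&&) noexcept = default;
  Person& operator=(Person&&) = default;

  // Implicit copies are disabled so that deep copies are always visible
  // at the call site.
  Person(const Person&) = delete;
  Person& operator=(const Person&) = delete;

  // Explicit deep copy.
  Person clone() const {
    Person copy;
    copy.name = name;
    copy.luckyNumbers = luckyNumbers;
    return copy;
  }
};
```

With this shape, `Person q = p;` fails to compile, while `Person q = p.clone();` and `Person q = std::move(p);` both work.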
--
It might be possible to do both at once, by making those public member variables special types that used little-endian representation internally, regardless of the host endianness, while providing a native-integer-like interface through operator overloads etc. See e.g. the Boost.Endian arithmetic types.
Then again, assuming we know exactly how the compiler will pack structs is already a pretty big assumption. Can we assume compilers won't actually have a problem with the above?
-Kenton
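A rough sketch of such a member type, for unsigned integers only. `LittleEndian` and `Point` are hypothetical names; Boost.Endian's arithmetic types are the complete version of this idea. The static_asserts check the struct-packing assumption at compile time instead of trusting it.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stores an unsigned integer as little-endian bytes regardless of host byte
// order, while converting to and from the native type implicitly.
template <typename T>
class LittleEndian {
  static_assert(std::is_unsigned<T>::value,
                "this sketch covers unsigned integers only");

public:
  LittleEndian() = default;
  LittleEndian(T value) { store(value); }
  LittleEndian& operator=(T value) { store(value); return *this; }

  operator T() const {
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      value |= static_cast<T>(static_cast<T>(bytes_[i]) << (8 * i));
    }
    return value;
  }

  LittleEndian& operator+=(T rhs) {
    store(static_cast<T>(T(*this) + rhs));
    return *this;
  }

  // Raw wire bytes, least significant byte first.
  const unsigned char* data() const { return bytes_; }

private:
  void store(T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      bytes_[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
  }

  unsigned char bytes_[sizeof(T)] = {};
};

// Hypothetical POCS struct whose data section could be memcpy()d directly.
struct Point {
  LittleEndian<std::uint32_t> x;
  LittleEndian<std::uint32_t> y;
};

// The layout assumption is verified rather than assumed.
static_assert(sizeof(LittleEndian<std::uint32_t>) == 4, "unexpected padding");
static_assert(sizeof(Point) == 8, "unexpected padding");
```

Because the storage is a byte array, the members have alignment 1, which sidesteps padding but may cost something on reads compared with a native integer.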
I'm probably missing the point here (and I admittedly know little about capnproto), but could we simply generate a class with byte arrays for primary storage and then expose the POCS fields as references (or pointers) into those arrays?
> Naturally I'd prefer KJ over std. :) Probably text fields will be kj::String and lists will be kj::Vector.

> * How do you plan to preserve "unknown fields"? Reuse capnp encoding for internal representation for everything except pointers? That could lead to high code reuse and fast serialization.
> It would be cool to lay out the data fields such that they can be memcpy()d from the serialized format, although I was also thinking it would be nice if you didn't have to go through accessor methods and instead these fields were simply public member variables. I don't think these two things are compatible, due to endianness issues. Will have to think about which is better.

> I guess preserving data fields is tricky if we don't use the memcpy()able layout since they could be interleaved with the known fields.
Getters and setters will make the API more consistent with Reader and Builder. The prefixes `set` and `get` are great in generated code, because they move all fields into a kind of namespace. No collisions with language keywords, no collisions with generic methods like `clone()`.
What about the API for groups? Can they be pure references into the parent POCS class without owned memory? If not, I have a hard time imagining how to effectively reuse the serialized format for a group POCS class. For me, this looks like a show stopper for the serialized format in POCS classes.
As I am trying to implement a generator for POCS classes, I have a few questions.
What is a good name for the POCS class in source code? The generated code is placed in the class that is currently called the "outer" class. It would be good to come up with terminology that can also be used in documentation.
What is the API for primitive fields? It is not possible to return a reference to a primitive type, because it can have a different byte order and it may be XORed with the default value. It needs either a setter and getter or a proxy type. I will start simple with a setter and getter. A proxy type can be introduced later.
    uint32_t getNumber() const;
    void setNumber(uint32_t);
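A small sketch of why a plain reference does not work when the wire representation is kept: the stored word is the value XORed with the schema default. `Settings`, `rawNumber()` and the default of 42 are made up for the example, and byte order is ignored (a little-endian host is assumed).

```cpp
#include <cstdint>

// Hypothetical generated class for a struct with `number @0 :UInt32 = 42;`.
class Settings {
public:
  // The stored word is the value XORed with the default, so an all-zero
  // struct reads back as the default value.
  std::uint32_t getNumber() const { return raw_ ^ kNumberDefault; }
  void setNumber(std::uint32_t value) { raw_ = value ^ kNumberDefault; }

  // Exposed only to show what would be kept in the data section.
  std::uint32_t rawNumber() const { return raw_; }

private:
  static constexpr std::uint32_t kNumberDefault = 42;
  std::uint32_t raw_ = 0;
};
```

A `uint32_t&` into `raw_` would hand the caller the XORed word, which is why this layout needs either accessors or a proxy type.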
What is the API for a string field? The minimal approach is intrusive into the string class:
    kj::String& getName();
    const kj::String& getName() const { return _name; }
kj::String needs to be extended with a null field.
Second approach:
    bool hasName() const;
    kj::String& getName();
    const kj::String& getName() const;
The non-const getter sets name to non-null. The const getter can return a reference to a global instance if hasName() == false. This second approach is non-intrusive and also works with std::string.

Can we afford not to support the NULL flag for strings in the POCS class, and encode an empty string as a null string on the wire? That simplifies the API and the generated code. Do we need an option to set the NULL flag?

It is also easy to add a convenience setter for consistency with builders and primitive types:
    void setName(kj::String &&);
I believe structs and lists can stick to the same API as strings.

Groups can be just a view into the POCS class without value semantics. Internally a group will be a pointer / reference to the owning POCS instance. If the POCS instance is deleted, or if the group sits in a union section that is invalidated, the group simply points to invalid memory. We could track group instances from the POCS class and clear the pointers, trading speed for nice exceptions instead of a crash. But the first version can simply be without unions and groups.

Any suggestions?

Kind regards
Brano
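The second string approach above (hasName() plus a const getter that falls back to a shared empty instance) could be sketched roughly like this. `Contact` is a hypothetical generated class and std::string is used in place of kj::String to show that the approach is non-intrusive.

```cpp
#include <string>
#include <utility>

class Contact {
public:
  bool hasName() const { return hasName_; }

  // Non-const access materializes the field, making it non-null.
  std::string& getName() {
    hasName_ = true;
    return name_;
  }

  // Const access never mutates; an unset field reads as a shared empty string.
  const std::string& getName() const {
    static const std::string kEmpty;
    return hasName_ ? name_ : kEmpty;
  }

  // Convenience setter, consistent with builders and primitive fields.
  void setName(std::string&& value) {
    name_ = std::move(value);
    hasName_ = true;
  }

private:
  bool hasName_ = false;
  std::string name_;
};
```

Note the asymmetry this creates: merely calling the non-const getter flips hasName() to true, even if nothing is written.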
I think the outer class itself (which currently behaves only as a namespace) should be the POCS type. This makes the POCS interface very natural to use, which is its goal in life after all.
> I would not want methods that return references, since returning a reference defeats a lot of the purpose of accessors
I fully agree with this. But is there any difference between returning a reference to a member variable and publishing the member variable directly? I personally have no experience with proxy types. I prefer a method returning a reference over a proxy type just because there is less C++ dark magic involved.

I love your point that byte order and XORing the default value are best handled at conversion to / from reader and builder.

The proposal is nice. I don't really understand ANY pointers in capnproto yet, so those parts will sink in later. But I have a proposal for unions:
* UnionEnum which() const - like Reader
* bool isFoo() const - like Reader
* FooType& foo() - throws if which is not FOO. Returns reference to foo.
* const FooType& foo() const - throws if which is not FOO. Returns const reference to foo.
* FooType& initFoo() - sets which to foo, returns reference.

I really dislike acting differently on const and non-const objects. That is hard to think about.
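A sketch of the union accessors listed above, for a hypothetical two-member union (`number`, `text`). `Value` and its member names are invented for the example, and a real generator would presumably overlay the storage rather than keep both members side by side.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical POCS type for `union { number @0 :UInt32; text @1 :Text; }`.
class Value {
public:
  enum class Which { NUMBER, TEXT };

  Which which() const { return which_; }
  bool isNumber() const { return which_ == Which::NUMBER; }
  bool isText() const { return which_ == Which::TEXT; }

  // Accessors throw when the requested member is not the active one,
  // identically for const and non-const objects.
  std::uint32_t& number() { require(Which::NUMBER); return number_; }
  const std::uint32_t& number() const { require(Which::NUMBER); return number_; }
  std::string& text() { require(Which::TEXT); return text_; }
  const std::string& text() const { require(Which::TEXT); return text_; }

  // init* switches the active member and resets the contents.
  std::uint32_t& initNumber() {
    which_ = Which::NUMBER;
    text_.clear();
    number_ = 0;
    return number_;
  }
  std::string& initText() {
    which_ = Which::TEXT;
    number_ = 0;
    text_.clear();
    return text_;
  }

private:
  void require(Which expected) const {
    if (which_ != expected) {
      throw std::logic_error("inactive union member accessed");
    }
  }

  Which which_ = Which::NUMBER;
  std::uint32_t number_ = 0;
  std::string text_;
};
```

Usage follows the list directly: `v.initText() = "hi";` selects the member, and a later `v.number()` throws until `initNumber()` is called.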
> I'd say programs should avoid distinguishing between null and empty.
I would like to not give an option to distinguish between NULL and empty for lists, strings and data. I consider it bad design. If it is important, it is better to express the difference in separate bool.
> FWIW if we use accessors, I'd want to have the same set of accessors that builders have today, so get, set, etc:
Let's stick to naked public member fields for now. It generates little code and it is the fastest solution.

Can you please make an API proposal for conversion from reader and to builder?

My plan of attack is to start with code that has the right API but is as simple as possible while still being correct. Optimizations come in phase 2.

I am still curious about your position on an option to use std types in POCS classes. I understand that you don't want to use it yourself. But these POCS classes infiltrate much deeper into application logic than reader and builder. And they will look weird if the team uses std classes everywhere else. Do you think that kj::Own, kj::String and kj::Vector provide a real benefit for POCS classes over std::unique_ptr, std::string and std::vector? I believe that it may help capnproto adoption if it plays more nicely with std code.
Arguably, we should extend the language to support an explicit Maybe(T) type which we could then translate into kj::Maybe. Existing protocols which rely on null would need to transition over to Maybe(T), but it would be a backwards-compatible change.
- List<T> for lists (not kj::Vector).
- Text and Data for blobs (not kj::String nor kj::Array).
> Outer class is very nice place. But I consider this capnp declaration:
>
>     struct A {
>       bi @0 : B.I;
>       interface I {}
>     }
>     struct B {
>       ai @0 : A.I;
>       interface I {}
>     }
>
> How to compile this into C++ classes? Class A declaration needs B.I fully declared before. Class B declaration needs A.I fully declared before. But there is no way to declare nested class in C++ before parent class declaration. My interpretation is that outer class must stay only as namespace. That leads to question how to name the POCS class.

Ugh.

This is probably exceedingly rare in practice. It would be sad to make the API harder for everyone just to cover this one obscure case.

What if we ignore this problem for now, but plan that if it comes up for real in the future, we will make the code generator resolve the cycle by injecting a proxy type?
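For reference, one conceivable way a generator could break such a cycle, shown only as an assumption and not necessarily the proxy type meant above: hoist the nested types to namespace scope under mangled names and alias them back into the outer classes. `A_I` and `B_I` are hypothetical names, and raw pointers stand in for whatever handle type an interface field would get.

```cpp
#include <type_traits>

// Hoisted declarations: every nested type gets a namespace-scope name first,
// because C++ cannot forward-declare B::I before B itself is defined.
struct A_I;
struct B_I;

struct A {
  using I = A_I;      // A::I still resolves for users
  B_I* bi = nullptr;  // only needs B_I declared, not B defined
};

struct B {
  using I = B_I;
  A_I* ai = nullptr;
};

struct A_I { int id = 1; };
struct B_I { int id = 2; };

static_assert(std::is_same<A::I, A_I>::value, "alias keeps the nested name");
static_assert(std::is_same<B::I, B_I>::value, "alias keeps the nested name");
```

The cost is that the mangled names leak into compiler diagnostics, which is one reason to apply this only when a cycle is actually detected.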
If I read the code correctly, kj::Maybe(T) allocates T on heap, forcing extra pointer lookup.
> - List<T> for lists (not kj::Vector).
As of now, List<T> cannot be used with forward declared T.
> - Text and Data for blobs (not kj::String nor kj::Array).
As of now, it is possible to just make capnp::Text subclass of kj::String and capnp::Data subclass of kj::Array and it works.
What does it mean to ignore this problem? Do we compile only POCS classes that use only previously declared types? Everything else will use proxy type?
Looking at where these POCS types are heading, I would like to step back in this discussion. Thinking about my requirements, I don't really need POCS classes. All I need is classes usable for mutable state.

I see two problems that make the current code not really usable for mutable state:
1. arena allocations are leaked when memory is released
2. builders have no way to resize lists efficiently. Lists need to be extended with the concept of capacity

The 1st problem can be solved by introducing a special MallocArena that uses malloc for each allocation. Builders don't own the memory they point to, but capnp already has a concept for owning memory for builders and readers: Orphan. Orphans cannot outlive their arena, but the MallocArena never goes out of scope. MallocArena introduces complications because it allocates from a big address space, but that should be workable. A special class and support methods for Orphan in MallocArena can be introduced.
The 2nd problem requires tweaks to the list layout. It is possible to restrict this new layout to MallocArena-allocated builders and to let resize operations assert / always realloc for non-MallocArena builders. But it is also possible to push this to all builders at the moment when the first reallocation happens. It allows optimizations where replacing a string can be done in place. Actually, a List of structs (list pointer block C set to 7) can support capacity by storing the capacity in list pointer block D and the real size in the content prefix "tag".
Using these classes for purely mutable state will not be as fast as true POCS classes. But it generates little new code. And the mutable state can be passed to any existing reader and builder without conversion.
A bit off topic, but since I am talking about generated code size: would it make sense to make struct Builder a subclass of struct Reader? The reader methods would be reused in Builder.
However, this approach will suffer from the fact that all pointers will be far pointers, which use more space and are slower to dereference.
I'm not sure there's any better option, though. I really don't want Cap'n Proto to grow a whole internal implementation of malloc() that applies specifically within a message.
Note that list builders do not keep track of the locations of the pointer to the list nor the list tag, and I'd rather not add any new fields to this type as it is supposed to be a pass-by-value type. So, you'll need some sort of ResizeableList which contains both the list builder and an AnyPointer::Builder pointing back to the list's pointer. That doesn't seem too bad, though.
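To make the shape of that wrapper concrete, here is a sketch under heavy simplification. `Arena` and `ListPointer` are hypothetical stand-ins (an append-only word buffer and a pointer record with size and capacity), not capnp types; the real thing would hold a list builder plus an AnyPointer::Builder instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the list pointer stored in the parent struct.
struct ListPointer {
  std::size_t offset = 0;    // where the elements start in the arena
  std::size_t size = 0;      // elements in use
  std::size_t capacity = 0;  // elements reserved
};

// Append-only arena: abandoned ranges are never reclaimed.
struct Arena {
  std::vector<std::uint32_t> words;
  std::size_t allocate(std::size_t n) {
    std::size_t offset = words.size();
    words.resize(offset + n);
    return offset;
  }
};

// Keeps a reference back to the list pointer so growth can repoint it.
class ResizeableList {
public:
  ResizeableList(Arena& arena, ListPointer& pointer)
      : arena_(arena), pointer_(pointer) {}

  void add(std::uint32_t value) {
    if (pointer_.size == pointer_.capacity) grow();
    arena_.words[pointer_.offset + pointer_.size++] = value;
  }

  std::uint32_t get(std::size_t i) const {
    assert(i < pointer_.size);
    return arena_.words[pointer_.offset + i];
  }

  std::size_t size() const { return pointer_.size; }

private:
  void grow() {
    std::size_t newCapacity = pointer_.capacity == 0 ? 4 : pointer_.capacity * 2;
    std::size_t newOffset = arena_.allocate(newCapacity);
    for (std::size_t i = 0; i < pointer_.size; ++i) {
      arena_.words[newOffset + i] = arena_.words[pointer_.offset + i];
    }
    pointer_.offset = newOffset;  // the old range becomes a hole in the arena
    pointer_.capacity = newCapacity;
  }

  Arena& arena_;
  ListPointer& pointer_;
};
```

Appends within capacity touch only the size; each growth step abandons the old range, which is the leak discussed earlier in the thread.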
Note that if this feature is going to be wholly obsoleted by POCS then that strongly argues against implementing it at all. I don't want to add something to the library that turns out to be totally useless a few months later when we implement POCS. And I do think POCS is likely to get implemented within a few months -- I've been itching to do it for a while, and I anticipate having some breathing room in my workload soon.