Implicit conversion to/from std::string


Allan Odgaard

Aug 17, 2013, 5:18:32 AM
to capnproto
According to the documentation Text::Reader / Text::Builder can be
implicitly converted to/from std::string.

In practice, though, this does not seem to be the case, nor do the
headers indicate it should be.

Implicit conversion would be nice, not just to avoid calls to c_str()
and cStr(), but also so that higher-level APIs can be used when
converting containers, for example using std::copy to copy a C++
container to a capnp List<Text>, or using the iterator-pair constructor
of C++ containers when constructing these from List<Text>.
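
For illustration, here is a minimal sketch of what the explicit route looks like when no implicit conversion is available. TextLike is a stand-in that only mimics the cStr() and size() accessors (it is not the Cap'n Proto type), and toStdString / toStdStrings are hypothetical helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Stand-in for a text reader: a non-owning view with cStr() and size().
// This is NOT the Cap'n Proto type; it only mimics the accessors used below.
struct TextLike {
  const char* ptr;
  std::size_t len;
  const char* cStr() const { return ptr; }
  std::size_t size() const { return len; }
};

// Hypothetical helper: explicit conversion of a single element.
inline std::string toStdString(const TextLike& t) {
  assert(t.cStr() != nullptr);
  return std::string(t.cStr(), t.size());
}

// Hypothetical helper: convert a whole range. std::transform does the work
// that the iterator-pair constructor would do if the conversion were implicit.
template <typename InputIt>
std::vector<std::string> toStdStrings(InputIt first, InputIt last) {
  std::vector<std::string> out;
  std::transform(first, last, std::back_inserter(out),
                 [](const TextLike& t) { return toStdString(t); });
  return out;
}
```

With an implicit conversion, the last helper would collapse to std::vector<std::string>(first, last).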

Kenton Varda

Aug 17, 2013, 1:40:04 PM
to Allan Odgaard, capnproto
Eek, sorry, that documentation is outdated.  Looks like I missed a spot when updating it.

The trouble is that I don't actually want to #include <string>, because it's an enormous header and the Cap'n Proto code doesn't otherwise use it.  So what I had done before was to provide templated constructors and conversion operators that were designed with std::string in mind.  The problem with that is that it tended to confuse the compiler in many other places, causing overloads to unexpectedly become ambiguous, producing ugly compiler errors, etc.  So, I had to give up on that approach.
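
To illustrate the kind of trouble described above, here is a sketch of a view type with an unconstrained templated conversion operator. It is an assumption about the general technique, not the code Cap'n Proto actually had; GreedyText, asStdString and consume are made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Illustration only: the conversion operator never names std::string, so the
// header defining this type would not need to include <string>.
struct GreedyText {
  const char* ptr;
  std::size_t len;

  template <typename T>
  operator T() const { return T(ptr, len); }
};

// The operator is intended for std::string...
static_assert(std::is_convertible<GreedyText, std::string>::value,
              "intended conversion");
// ...but overload resolution only looks at the declaration, so the type also
// advertises conversions whose body could never compile.
static_assert(std::is_convertible<GreedyText, std::vector<int>>::value,
              "unintended conversion is advertised too");

// The intended use works.
inline std::string asStdString(const GreedyText& t) {
  std::string s = t;  // selects operator std::string
  assert(s.size() == t.len);
  return s;
}

// Two unrelated overloads become ambiguous for a GreedyText argument.
inline void consume(const std::string&) {}
inline void consume(const std::vector<int>&) {}
// consume(GreedyText{"abc", 3});  // error: call of overloaded 'consume' is ambiguous
```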

Unfortunately C++ just doesn't give me the power to make this work correctly without introducing overhead for those users that don't want to use std::string, which happens to include myself.

I'll make a note to fix the docs.

-Kenton





Derrick Johnson

Sep 15, 2013, 7:01:01 PM
to capn...@googlegroups.com, Allan Odgaard
Hi Kenton -- Using a compiled-in schema, how can I parse a textual instance of a capnp object?

For example, if I have a std::string that contains a textual instance of a capnp object (not a schema, but an object), how do I turn it into a MyMessage::Reader capnp object?

I see that the schema is available with capnp::Schema::from<MyMessage>(), and I looked at dynamic.h, but it's not apparent what the next step is....  help!

thx


Derrick Johnson

Sep 15, 2013, 7:26:01 PM
to capn...@googlegroups.com, Allan Odgaard
Hi - I got your email on the lack of a text parser.  Makes sense.  So what is the proper way to parse a _binary_ object from a char * to a MyMessage::Reader, given a precompiled schema for MyMessage?

Kenton Varda

Sep 15, 2013, 7:38:36 PM
to Derrick Johnson, capnproto
For those who didn't see it, this is how I responded to Derrick's question about text parsing when he e-mailed it directly to me:

Hi Derrick,

Right now, the capnp library includes the ability to convert *to* text format, but only the capnp command-line tool has the ability to *parse* it.

Partially this limitation exists because the parsing code actually uses the schema parser, since the text format is actually the format used to specify constants and default values in the schema language, so reusing that parser was easy.  The schema parser is pretty heavy and not written in a way that would make it convenient to parse text format based on a binary schema.  So, creating a proper runtime text format parser library would take some work.

At the same time, I'm not sure if it's the right thing to do philosophically.  With Protobufs, I noticed a lot of people tended to abuse runtime text-format parsing in cases where the right thing to do would have been to convert the text format message to binary ahead-of-time (as you can do with the capnp tool) and then load the binary message at runtime.  Or, worse, people would send machine-to-machine messages in text format, even when no human was involved (text format makes no sense if neither the producer nor the consumer is human).  These people would then tend to demand that the text format parser do things like ignore unknown fields, which is just not a good idea when interacting with humans who can easily make typos.

So, I'm debating whether or not to ever support this.

On the other hand, at some point there will be a JSON transcoder, and since JSON is (unfortunately, IMO) commonly used machine-to-machine, it will definitely support runtime parsing and ignore unknown fields.

All that said, I'd like to hear your opinion on this, even if (actually, especially if) you disagree.  What do you think?

-Kenton

Kenton Varda

Sep 15, 2013, 7:41:09 PM
to Derrick Johnson, capnproto
On Sun, Sep 15, 2013 at 4:26 PM, Derrick Johnson <badma...@gmail.com> wrote:
Hi - I got your email on the lack of a text parser.  Makes sense.  So what is the proper way to parse a _binary_ object from a char * to a MyMessage::Reader, given a precompiled schema for MyMessage?

Use the APIs in capnp/serialize.h, per the docs.  If your char* array is aligned to an 8-byte boundary, you can use FlatArrayMessageReader to read it without copying.  Otherwise you'll need to wrap it in a kj::ArrayInputStream and use InputStreamMessageReader.
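
The alignment condition can be checked before choosing a reader. The sketch below deliberately avoids the capnp headers so that it stands alone: isWordAligned and copyToAlignedWords are hypothetical helpers, and copying into aligned storage is shown only as a simple substitute for the stream-based path, not as the recommended API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Cap'n Proto messages are made of 8-byte words.
using Word = std::uint64_t;

// True if `p` sits on an 8-byte boundary, i.e. the buffer could be treated
// as an array of words and read in place without copying.
inline bool isWordAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % sizeof(Word) == 0;
}

// Hypothetical fallback for unaligned input: copy the bytes into
// word-aligned storage. `size` must be a whole number of words.
inline std::vector<Word> copyToAlignedWords(const char* data, std::size_t size) {
  assert(size % sizeof(Word) == 0);
  std::vector<Word> words(size / sizeof(Word));
  if (size != 0) std::memcpy(words.data(), data, size);
  return words;
}
```

If isWordAligned(ptr) holds, the zero-copy reader mentioned above applies; otherwise some form of copy is unavoidable.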

Geoffrey Romer

Sep 16, 2013, 10:59:10 AM
to Kenton Varda, Derrick Johnson, capnproto
I once spent a couple months cleaning up an inappropriate use of text-format protos (after it brought down my service while I was on call), so I certainly sympathize with the motivation. However, in my experience the text format is invaluable for unit testing, since it lets you specify the test inputs concisely, in-line in the test code. You can accomplish the same thing by setting individual fields one by one via the message API, but that's tedious and annoying, and the result is harder to read. I'm in favor of just about anything that decreases the friction of writing tests, so I think it's worth providing a text format parser for that purpose.

One qualification: it's arguably possible that a sufficiently clean fluent-style builder API could allow you to populate a message via that API in a way that's just as concise and readable as the text format. I don't have any hands-on experience with Cap'n Proto, so I'm not sure if its Builders would fit the bill (the example code at the top of http://kentonv.github.io/capnproto/cxx.html is not encouraging).

Kenton Varda

Sep 16, 2013, 12:27:36 PM
to Geoffrey Romer, Derrick Johnson, capnproto
On Mon, Sep 16, 2013 at 7:59 AM, Geoffrey Romer <gro...@google.com> wrote:
I once spent a couple months cleaning up an inappropriate use of text-format protos (after it brought down my service while I was on call), so I certainly sympathize with the motivation. However, in my experience the text format is invaluable for unit testing, since it lets you specify the test inputs concisely, in-line in the test code. You can accomplish the same thing by setting individual fields one by one via the message API, but that's tedious and annoying, and the result is harder to read. I'm in favor of just about anything that decreases the friction of writing tests, so I think it's worth providing a text format parser for that purpose.

That's a really good point.  Or rather, I'd say that this was by far the least-bad way of testing protobufs -- I always hated the fact that errors in my test string weren't detected until runtime, but the alternatives were far, far worse.
 
One qualification: it's arguably possible that a sufficiently clean fluent-style builder API could allow you to populate a message via that API in a way that's just as concise and readable as the text format. I don't have any hands-on experience with Cap'n Proto, so I'm not sure if its Builders would fit the bill (the example code at the top of http://kentonv.github.io/capnproto/cxx.html is not encouraging).

You may be on to something here.  The builder interfaces cannot be very literate because of the allocation constraints (you can't just allocate a Cap'n Proto object from thin air; you have to specify what message to put it in).  Perhaps, though, this is an important enough use case to generate code specifically for it.  To avoid bloating builds too much, the code could live in a separate file included only by tests.

I wish C++ supported C99's named aggregate initializer syntax, since this would be a perfect use case for those.  You could write:

  EXPECT_CAPNP_EQ({ .foo = 123, .bar = "abc", .baz = { .qux = {456, 789} } }, myMessage);

Alas, it is not to be.

The best syntax I can come up with requires specifying the type name, which would have to have a suffix like "Init" to distinguish from the main types:

  EXPECT_CAPNP_EQ(MyTypeInit().foo(123).bar("abc").baz(BazTypeInit().qux({456, 789})), myMessage);

Thoughts?

-Kenton
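
A rough sketch of how the chained "Init" syntax above could be backed by ordinary C++. All type and member names here are hypothetical; no such code is generated by capnp:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical heap-backed value type for a nested struct.
struct BazTypeInit {
  std::vector<std::int32_t> quxValue;

  // Each setter stores the value and returns *this so calls can be chained.
  BazTypeInit& qux(std::vector<std::int32_t> v) {
    quxValue = std::move(v);
    return *this;
  }
};

// Hypothetical value type for the outer struct.
struct MyTypeInit {
  std::int32_t fooValue = 0;
  std::string barValue;
  BazTypeInit bazValue;

  MyTypeInit& foo(std::int32_t v) { fooValue = v; return *this; }
  MyTypeInit& bar(std::string v) { barValue = std::move(v); return *this; }
  MyTypeInit& baz(BazTypeInit v) { bazValue = std::move(v); return *this; }
};

// The chained expression from the post, written against these types.
inline MyTypeInit exampleInit() {
  MyTypeInit init =
      MyTypeInit().foo(123).bar("abc").baz(BazTypeInit().qux({456, 789}));
  assert(init.fooValue == 123);
  return init;
}
```

These objects own their data on the heap, so comparing against or populating a real message would still involve a copy.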

Kenton Varda

Sep 16, 2013, 2:19:36 PM
to Geoffrey Romer, Derrick Johnson, capnproto
On Mon, Sep 16, 2013 at 9:27 AM, Kenton Varda <temp...@gmail.com> wrote:
I wish C++ supported C99's named aggregate initializer syntax, since this would be a perfect use case for those.  You could write:

  EXPECT_CAPNP_EQ({ .foo = 123, .bar = "abc", .baz = { .qux = {456, 789} } }, myMessage);

Sadness:  Clang actually supports this syntax in C++, but GCC does not.  If they both did, I would just go ahead and use it.

If the struct in question had its fields declared in order by field number, then you could use regular (non-designated) struct initializers and at least not have to worry about your code silently breaking due to a change in field ordering.  Readability would suffer from the lack of names, though you can always insert comments for that purpose.  Meanwhile, supporting that much would be pretty cheap in terms of generated code, and then people comfortable with requiring Clang (= me, in my next project) will be able to use designated initializers and get the best possible syntax.

Actually, this approach could be a nice "easy-mode" Cap'n Proto API where you basically just have structs allocated on the heap and you do a copy on read/write.  For people who really don't care about speed and just want to bash something out, it might be nice.

-Kenton
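
As a sketch of the plain-struct idea, assuming hypothetical structs MyValue and BazValue whose members are declared in field-number order (nothing here is generated by capnp):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plain structs; members appear in field-number order so that
// positional initializers keep matching the schema.
struct BazValue {
  std::vector<std::int32_t> qux;  // field 0
};

struct MyValue {
  std::int32_t foo;  // field 0
  std::string bar;   // field 1
  BazValue baz;      // field 2
};

inline MyValue makeExample() {
  // Regular (non-designated) aggregate initialization; the comments stand in
  // for the field names that the syntax cannot express.
  MyValue v{/* foo = */ 123, /* bar = */ "abc", /* baz = */ {{456, 789}}};
  assert(v.baz.qux.size() == 2);
  return v;
}
```

With a C++20 compiler the same struct could be initialized with designators, provided they appear in declaration order.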

Geoffrey Romer

Sep 17, 2013, 10:49:51 AM
to Kenton Varda, Derrick Johnson, capnproto
On Mon, Sep 16, 2013 at 9:27 AM, Kenton Varda <temp...@gmail.com> wrote:
On Mon, Sep 16, 2013 at 7:59 AM, Geoffrey Romer <gro...@google.com> wrote:
I once spent a couple months cleaning up an inappropriate use of text-format protos (after it brought down my service while I was on call), so I certainly sympathize with the motivation. However, in my experience the text format is invaluable for unit testing, since it lets you specify the test inputs concisely, in-line in the test code. You can accomplish the same thing by setting individual fields one by one via the message API, but that's tedious and annoying, and the result is harder to read. I'm in favor of just about anything that decreases the friction of writing tests, so I think it's worth providing a text format parser for that purpose.

That's a really good point.  Or rather, I'd say that this was by far the least-bad way of testing protobufs -- I always hated the fact that errors in my test string weren't detected until runtime, but the alternatives were far, far worse.

Yes, getting build-time error checking would be a substantial improvement. Doubly so since the parse errors are in terms of the location in the string literal, not the location in the source file (there are macro hacks you can use to mitigate that, but yuck).
 
 
One qualification: it's arguably possible that a sufficiently clean fluent-style builder API could allow you to populate a message via that API in a way that's just as concise and readable as the text format. I don't have any hands-on experience with Cap'n Proto, so I'm not sure if its Builders would fit the bill (the example code at the top of http://kentonv.github.io/capnproto/cxx.html is not encouraging).

You may be on to something here.  The builder interfaces cannot be very literate because of the allocation constraints (you can't just allocate a Cap'n Proto object from thin air; you have to specify what message to put it in).  Perhaps, though, this is an important enough use case to generate code specifically for it.  To avoid bloating builds too much, the code could live in a separate file included only by tests.

I'm ambivalent about this. Having a special syntax that's only used in tests has its own readability costs, since familiarity with the ordinary Builder API doesn't carry over to the test code (and vice-versa). On the other hand, I kind of suspect that people will use the "test" initialization syntax in production code as well, since programmers tend to unconsciously optimize for their own convenience. That mitigates the readability concern, but is arguably a problem in itself. So long as this only happens in local code, it's probably not terrible (unless they misattribute performance problems to Cap'n Proto, rather than to their choice to use the less-efficient API); what would be more worrisome is if people start passing around Init types in APIs, thereby locking each other into the inefficient initialization API.

 

I wish C++ supported C99's named aggregate initializer syntax, since this would be a perfect use case for those.

+1. Convenient syntax for structured literals is a seriously underappreciated language feature.
 
 You could write:

  EXPECT_CAPNP_EQ({ .foo = 123, .bar = "abc", .baz = { .qux = {456, 789} } }, myMessage);

Alas, it is not to be.

The best syntax I can come up with requires specifying the type name, which would have to have a suffix like "Init" to distinguish from the main types:

  EXPECT_CAPNP_EQ(MyTypeInit().foo(123).bar("abc").baz(BazTypeInit().qux({456, 789})), myMessage);

Thoughts?

"Init" somehow doesn't convey the right meaning to me. Maybe "Value"? In any event, I suggest using "MyType" as a namespace, not just a prefix (i.e. MyType::Init or MyType::Value).

Unserious alternative: make MyType itself a proper type rather than just a namespace, and give these semantics to it.