C++'s most vexing parse strikes again

Michi Henning

Sep 5, 2013, 2:28:20 AM

to capn...@googlegroups.com

<rant>

The following compiles just fine:

kj::ArrayPtr<kj::byte const> p(static_cast<kj::byte const*>(data), size);
capnp::StreamFdMessageReader message(0);
message.getRoot<capnproto::Request>();

Not very sensible, but it compiles.

The following does not compile:

kj::ArrayPtr<kj::byte const> p(static_cast<kj::byte const*>(data), size);
capnp::InputStreamMessageReader message(kj::ArrayInputStream(p));
message.getRoot<capnproto::Request>();

The most helpful error message from gcc is:

error: request for member ‘getRoot’ in ‘message’, which is of non-class type ‘capnp::InputStreamMessageReader(kj::ArrayInputStream)’

Of course, I have to write it like so, and then it works just fine:

kj::ArrayPtr<kj::byte const> p(static_cast<kj::byte const*>(data), size);
kj::ArrayInputStream is(p);
capnp::InputStreamMessageReader message(is);
message.getRoot<capnproto::Request>();

Interestingly, the usual trick of using extra parentheses doesn't work:

capnp::InputStreamMessageReader message( (kj::ArrayInputStream(p)) );

Now gcc complains that it can't find a matching InputStreamMessageReader constructor. The only candidate takes three arguments, of which the last two are defaulted, so passing a single argument ought to be fine. What gcc actually objects to is the first argument: a temporary kj::ArrayInputStream cannot bind to the non-const kj::InputStream& parameter:

error: no matching function for call to ‘capnp::InputStreamMessageReader::InputStreamMessageReader(kj::ArrayInputStream)’
ObjectAdapter.cpp:433:84: note: candidate is:
In file included from ObjectAdapter.cpp:21:0:
/usr/include/capnp/serialize.h:77:3: note: capnp::InputStreamMessageReader::InputStreamMessageReader(kj::InputStream&, capnp::ReaderOptions, kj::ArrayPtr<capnp::word>)
/usr/include/capnp/serialize.h:77:3: note: no known conversion for argument 1 from ‘kj::ArrayInputStream’ to ‘kj::InputStream&’
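
The last note in that gcc output suggests the mismatch is about binding a temporary to a non-const lvalue reference rather than about the defaulted arguments. A minimal sketch of the same situation, using hypothetical stand-in types (Stream, Options and Reader are assumptions for illustration, not the real kj/capnp classes):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; only the constructor shape mirrors the gcc note.
struct Options {};
struct Stream {};

struct Reader {
    // Non-const lvalue reference first, then two defaulted parameters.
    explicit Reader(Stream& s, Options o = Options(), int scratch = 0) {
        (void)s; (void)o; (void)scratch;
    }
};

// A temporary (rvalue) Stream cannot bind to Stream&, so this is rejected
// even though the remaining parameters have defaults.
inline bool temporaryIsRejected() {
    return !std::is_constructible<Reader, Stream>::value;
}

// A named Stream object (lvalue) binds fine, which matches the version
// with a separate `is` variable.
inline bool namedObjectIsAccepted() {
    return std::is_constructible<Reader, Stream&>::value;
}
```

So the extra parentheses do cure the parse; the expression then fails for an unrelated reason.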

Man, there are still days when I hate this language :-(

</rant>

Thanks for listening :-)

Michi.

Stanislav Ivochkin

Sep 5, 2013, 5:51:39 AM

to Michi Henning, capnproto

2013/9/5 Michi Henning <michij....@gmail.com>

> The following does not compile:
>
> kj::ArrayPtr<kj::byte const> p(static_cast<kj::byte const*>(data), size);
> capnp::InputStreamMessageReader message(kj::ArrayInputStream(p));
> message.getRoot<capnproto::Request>();
>
> The most helpful error message from gcc is:
>
> error: request for member ‘getRoot’ in ‘message’, which is of non-class type ‘capnp::InputStreamMessageReader(kj::ArrayInputStream)’
>
> Of course, I have to write it like so, and then it works just fine:
>
> kj::ArrayPtr<kj::byte const> p(static_cast<kj::byte const*>(data), size);
> kj::ArrayInputStream is(p);
> capnp::InputStreamMessageReader message(is);
> message.getRoot<capnproto::Request>();

Herb Sutter mentioned the C++11 solution to the "C++ most vexing parse" problem in his GotW #1: http://herbsutter.com/2013/05/09/gotw-1-solution/ and in GotW #94 too: http://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/.
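
For readers who have not met it, here is a small sketch of the parse and of the brace-initialization fix. Stream and Reader are hypothetical stand-ins, and this Reader takes a const reference so that a temporary is acceptable. The real InputStreamMessageReader takes a non-const kj::InputStream&, so there braces alone would not be enough and a named stream object is still needed.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins, not the real kj/capnp classes.
struct Stream {
    explicit Stream(int v) : value(v) {}
    int value;
};

struct Reader {
    explicit Reader(const Stream& s) : first(s.value) {}  // const&: accepts temporaries
    int first;
};

// Parenthesized form: the compiler reads the line as the declaration of a
// function named `message` that takes a parameter `Stream p`.
inline bool parenthesizedFormIsFunction() {
    int p = 7;
    (void)p;  // never used: the `p` below is a parameter name
    Reader message(Stream(p));
    return std::is_function<decltype(message)>::value;
}

// Braced form: braces cannot start a parameter list, so this is an object.
inline int bracedFormReadsValue() {
    int p = 7;
    Reader message{Stream{p}};
    return message.first;
}
```

Compilers typically warn about the first form (Clang with -Wvexing-parse), but it is valid C++ and compiles.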

--

Regards,

Stas.

Michi Henning

Sep 5, 2013, 6:25:39 AM

to Stanislav Ivochkin, Michi Henning, capnproto

On 05/09/2013, at 19:51 , Stanislav Ivochkin <i...@extrn.org> wrote:

> Herb Sutter mentioned the C++11 solution to the "C++ most vexing parse" problem in his GotW #1: http://herbsutter.com/2013/05/09/gotw-1-solution/ and in GotW #94 too: http://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/.

Thanks for that Stas!

I've known about this ever since Scott first wrote about it. But that doesn't mean that the vexing parse can't still turn around and bite me occasionally :-)

One of the problems I have with C++ is that, after more than twenty years of coding in C++ pretty much every day, I still have to spend a significant percentage of my wetware cycles watching my own back. After all this time, it's still awesomely easy to shoot myself in the foot.

When K&R did C, the first time they found that they needed to look at the symbol table to decide how to make progress in the parser, they should have stopped and said "hang on, maybe that's not such a good idea." The advantages of context-free grammars were well understood by then, and C's declaration syntax is probably the single most-maligned (mis-)feature of the language.

Water under the bridge, I know…

Cheers,

Michi.

Kenton Varda

Sep 5, 2013, 2:59:06 PM

to Michi Henning, capnproto

For this, I highly recommend replacing GCC with Clang. Its error messages are much nicer. For instance, your code would produce this:

warning: parentheses were disambiguated as a function declaration [-Wvexing-parse]

BTW, you really do need to construct ArrayInputStream explicitly on the stack here. It has to outlive the InputStreamMessageReader.

Then again, you probably should be using FlatArrayMessageReader instead. That way you avoid a copy. ArrayInputStream is actually rarely useful in Cap'n Proto. It's mainly there for testing.


Michi Henning

Sep 5, 2013, 11:06:22 PM

to capn...@googlegroups.com, Michi Henning

On Friday, September 6, 2013 4:59:06 AM UTC+10, Kenton Varda wrote:

> For this, I highly recommend replacing GCC with Clang. Its error messages are much nicer. For instance, your code would produce this:
>
> warning: parentheses were disambiguated as a function declaration [-Wvexing-parse]

Yes, need to get around to adding clang to my build env.

> BTW, you really do need to construct ArrayInputStream explicitly on the stack here. It has to outlive the InputStreamMessageReader.

OK, yes, that makes sense, thanks! Would it make sense to add a move constructor to InputStreamMessageReader and similar?

> Then again, you probably should be using FlatArrayMessageReader instead. That way you avoid a copy. ArrayInputStream is actually rarely useful in Cap'n Proto. It's mainly there for testing.

Ah, OK, I didn't know that, thanks! The doc is still a little thin on the ground on what to use when (no accusation here!).

ZeroMQ is not terribly outspoken about what alignment guarantee it provides for its data pointer. From browsing around, it looks like it's guaranteed to be on a 64-bit boundary on a 64-bit machine, but only a 32-bit boundary on a 32-bit machine. I guess this means that, on 32-bit, I'd have to check the alignment of the data pointer I get and, if it isn't on a 64-bit boundary, pay the price of the copy.

Michi.

Michi Henning

Sep 6, 2013, 4:12:00 AM

to capn...@googlegroups.com, Michi Henning

On Friday, September 6, 2013 1:06:22 PM UTC+10, Michi Henning wrote:

> ZeroMQ is not terribly outspoken about what alignment guarantee it provides for its data pointer. From browsing around, it looks like it's guaranteed to be on a 64-bit boundary on a 64-bit machine, but only a 32-bit boundary on a 32-bit machine. I guess this means that, on 32-bit, I'd have to check the alignment of the data pointer I get and, if it isn't on a 64-bit boundary, pay the price of the copy.

No such luck. The data buffer pointer I get from ZeroMQ is byte-aligned only :-( So, a copy it is...
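
The check-then-copy step can be sketched with the standard library alone. The helper names are made up for illustration, and std::uint64_t stands in for capnp::word on the assumption that a word is 8 bytes; the copy lands in a std::vector<std::uint64_t>, whose storage is aligned for std::uint64_t (8 bytes on typical 64-bit platforms).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if `p` sits on an 8-byte boundary (the size assumed for one word).
inline bool isWordAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint64_t) == 0;
}

// Copy `size` bytes from an arbitrarily aligned buffer into word storage.
// `size` must be a whole number of words.
inline std::vector<std::uint64_t> copyToWords(const void* data, std::size_t size) {
    assert(size % sizeof(std::uint64_t) == 0);
    std::vector<std::uint64_t> words(size / sizeof(std::uint64_t));
    if (size != 0) {
        std::memcpy(words.data(), data, size);  // memcpy has no alignment requirement
    }
    return words;
}
```

A caller would test isWordAligned() first and only fall back to copyToWords() when the buffer is misaligned.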

But now I've run into a different problem. Here it is in a nutshell, having whittled it down to the bare bones:

capnp::MallocMessageBuilder b;
auto request = b.initRoot<capnproto::Request>();
request.setMode(capnproto::RequestMode::TWOWAY);
request.setId("id");
request.setOpName("operation_name");
auto segments = b.getSegmentsForOutput();

When I look at the segments, there is exactly one, with 8 words in it, so that's a message of 64 bytes.

However, when I continue with

capnp::writeMessageToFd(some_fd, b);

I find that 72 bytes are written.

On the receiving end, unmarshaling fails with the 64-byte message but, if I take the contents of the file produced by writeMessageToFd() and write that to my socket instead, the receiving FlatArrayMessageReader works fine, and decodes the parameters correctly.

I have a few questions:

- Shouldn't getSegmentsForOutput() return an array list with the first element containing 9 words instead of 8?

- How do I get a buffer that I can pass to my send() method from a MallocMessageBuilder?

Clearly, there is something I'm missing. I had a look at the code for FdOutputStream, and it just writes the pieces one after the other, as you would expect. But those pieces are arrays of bytes, not arrays of words, as returned by getSegmentsForOutput. But a sledgehammer cast from word* to byte* causes the bytes to be written in a different order from what is written by writeMessageToFd().

So, it seems that once I have a MallocMessageBuilder, there is no obvious way to get a buffer out of it for passing to some send() method.

Sorry for the stupid questions, maybe I'm just missing one crucial bit that's bleedingly obvious in the doc...

Thanks,

Michi.

Kenton Varda

Sep 6, 2013, 2:35:00 PM

to Michi Henning, capnproto

On Thu, Sep 5, 2013 at 8:06 PM, Michi Henning <michij....@gmail.com> wrote:

> Would it make sense to add a move constructor to InputStreamMessageReader and similar?

Can't -- the struct Reader objects contain pointers back to some of MessageReader's internal state. And anyway, it's generally considered bad form for a class with a vtable to be movable or copyable. You can always allocate it on the heap (preferably using kj::heap<T>() to get a kj::Own<T>, defined in <kj/memory.h>).
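
A rough illustration of the heap-allocation suggestion, using std::unique_ptr as a stand-in for kj::Own<T> so that the sketch needs only the standard library. Reader here is a hypothetical class that keeps a pointer to itself, to mimic internal state that would dangle if the object were moved.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical non-copyable, non-movable class with a pointer into itself.
class Reader {
public:
    explicit Reader(int v) : value_(v), self_(this) {}
    Reader(const Reader&) = delete;             // also suppresses the implicit move
    Reader& operator=(const Reader&) = delete;

    bool selfPointerValid() const { return self_ == this; }
    int value() const { return value_; }

private:
    int value_;
    const Reader* self_;  // would dangle if the object itself were relocated
};

// The object lives on the heap; only the owning pointer is handed around.
inline std::unique_ptr<Reader> makeReader(int v) {
    return std::unique_ptr<Reader>(new Reader(v));
}
```

Moving the std::unique_ptr transfers ownership without relocating the Reader, so the internal pointer stays valid.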

> ZeroMQ is not terribly outspoken about what alignment guarantee it provides for its data pointer. From browsing around, it looks like it's guaranteed to be on a 64-bit boundary on a 64-bit machine, but only a 32-bit boundary on a 32-bit machine. I guess this means that, on 32-bit, I'd have to check the alignment of the data pointer I get and, if it isn't on a 64-bit boundary, pay the price of the copy.

I think 32-bit alignment is OK on 32-bit systems. At least, I've noticed that my message segments sometimes end up 32-bit aligned (but not 64-bit aligned) in 32-bit builds, which either means that malloc() sometimes returns 32-bit aligned buffers or the compiler doesn't enforce 64-bit alignment on the stack (I haven't tracked down which). In any case, it doesn't appear to cause a problem.

> No such luck. The data buffer pointer I get from ZeroMQ is byte-aligned only :-( So, a copy it is...

Doh. That is a problem.

> But now I've run into a different problem. Here it is in a nutshell, having whittled it down to the bare bones:
>
> capnp::MallocMessageBuilder b;
> auto request = b.initRoot<capnproto::Request>();
> request.setMode(capnproto::RequestMode::TWOWAY);
> request.setId("id");
> request.setOpName("operation_name");
> auto segments = b.getSegmentsForOutput();
>
> When I look at the segments, there is exactly one, with 8 words in it, so that's a message of 64 bytes.
>
> However, when I continue with
>
> capnp::writeMessageToFd(some_fd, b);
>
> I find that 72 bytes are written.

Yes, the issue here is that the segments are not self-delimiting, so a table of segment sizes must be written at the start. In your case this table is 8 bytes. This is all implemented in capnp/serialize.c++ -- see writeMessage() in particular. Without the table, it's not possible to determine the segment boundaries on the receiving end.
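
To make the 64-versus-72 arithmetic concrete, here is a sketch of the table layout as described in the Cap'n Proto encoding spec for stream framing: a 32-bit little-endian (segment count minus one), then one 32-bit little-endian size in words per segment, padded to a multiple of 8 bytes. The function names are invented for this example and are not part of serialize.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size in bytes of the segment table for `segmentCount` segments.
inline std::size_t segmentTableBytes(std::size_t segmentCount) {
    std::size_t raw = 4 * (1 + segmentCount);  // count field plus one size per segment
    return (raw + 7) / 8 * 8;                  // pad to a word boundary
}

// Build the table itself; `segmentWords` holds each segment's size in words
// and must not be empty.
inline std::vector<std::uint8_t> buildSegmentTable(
        const std::vector<std::uint32_t>& segmentWords) {
    assert(!segmentWords.empty());
    std::vector<std::uint8_t> table;
    auto put32 = [&table](std::uint32_t v) {   // append little-endian 32-bit value
        for (int shift = 0; shift < 32; shift += 8) {
            table.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
        }
    };
    put32(static_cast<std::uint32_t>(segmentWords.size() - 1));
    for (std::uint32_t w : segmentWords) put32(w);
    while (table.size() % 8 != 0) table.push_back(0);
    return table;
}

// Total bytes on the wire: table plus all segments (8 bytes per word).
inline std::size_t framedMessageBytes(const std::vector<std::uint32_t>& segmentWords) {
    std::size_t total = segmentTableBytes(segmentWords.size());
    for (std::uint32_t w : segmentWords) total += static_cast<std::size_t>(w) * 8;
    return total;
}
```

For a single 8-word segment the table is 8 bytes, which accounts for the 72 bytes written.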

Does 0mq require a message to be in contiguous memory, or does it have a "writev"-like interface? If it has writev, then you can implement your own kj::OutputStream that only implements the array-of-array-of-bytes form of write() and pass that to capnp::writeMessage() -- it will only make a single call to this method. If 0mq doesn't offer a writev interface then you'll need to call capnp::messageToFlatArray().

Of course, you can always invent your own framing format. As long as you can produce an ArrayPtr<const ArrayPtr<const word>> on the receiving end that matches exactly what getSegmentsForOutput() returned on the sending end, you can use SegmentArrayMessageReader and skip serialize.h entirely.

-Kenton

Michi Henning

Sep 6, 2013, 5:47:21 PM

to Kenton Varda, Michi Henning, capnproto

Hi Kenton.

> Yes, the issue here is that the segments are not self-delimiting, so a table of segment sizes must be written at the start. In your case this table is 8 bytes. This is all implemented in capnp/serialize.c++ -- see writeMessage() in particular. Without the table, it's not possible to determine the segment boundaries on the receiving end.

OK, I get it.

> Does 0mq require a message to be in contiguous memory, or does it have a "writev"-like interface? If it has writev, then you can implement your own kj::OutputStream that only implements the array-of-array-of-bytes form of write() and pass that to capnp::writeMessage() -- it will only make a single call to this method. If 0mq doesn't offer a writev interface then you'll need to call capnp::messageToFlatArray().

OK, I'll have a look at that, thanks. There is a way to write messages in parts with zmq. Basically, when sending a message, you can set a flag that says "not complete yet, more to follow". You can set the flag on as many messages as you like, and the final chunk has a flag to say "complete". But that's not quite the same as writev(), which takes care of re-assembly on the receiving end based on the position of each chunk in the vector of chunks. With zmq, order for reassembly is determined by the order in which the chunks are written, that is, there is no API that would accept a vector, only an API that can send a message in parts, one part after another.

> Of course, you can always invent your own framing format. As long as you can produce an ArrayPtr<const ArrayPtr<const word>> on the receiving end that matches exactly what getSegmentsForOutput() returned on the sending end, you can use SegmentArrayMessageReader and skip serialize.h entirely.

I'll try this, thanks!

One problem I see is that the Cap'n Proto API appears somewhat asymmetric right now. I can easily put my parameters into a Cap'n Proto message and write that message to a file descriptor, and read it again on the other side. But I can't easily put my parameters into a message and write the message *without* a file descriptor. (I don't get access to an fd from zmq.) But, on the receiving end, there *is* a FlatArrayMessageReader that *can* unpack a message that's in a buffer.

I guess the shortcoming right now is that I can't go to Cap'n Proto and say "give me a pointer to a buffer that contains the message in a form suitable for transmission" whereas, on the receiving side, there *is* such a thing.

What's missing is something in between the level of a message and the Fd* classes. I should be able to read and write without having to provide an fd, and without having to create my own framing.

But your advice about skipping serialize.h will do the trick for now, thanks very much for that!

Cheers,

Michi.

Kenton Varda

Sep 6, 2013, 6:52:55 PM

to Michi Henning, Michi Henning, capnproto

On Fri, Sep 6, 2013 at 2:47 PM, Michi Henning <mi...@triodia.com> wrote:

> OK, I'll have a look at that, thanks. There is a way to write messages in parts with zmq. Basically, when sending a message, you can set a flag that says "not complete yet, more to follow". You can set the flag on as many messages as you like, and the final chunk has a flag to say "complete". But that's not quite the same as writev(), which takes care of re-assembly on the receiving end based on the position of each chunk in the vector of chunks. With zmq, order for reassembly is determined by the order in which the chunks are written, that is, there is no API that would accept a vector, only an API that can send a message in parts, one part after another.

Well, that should work just as well. In fact, if the receiving end gets the chunks delimited at the same points as when they were sent, then you don't need to send a separate segment table at all -- the 0mq chunk boundaries can serve as the segment boundaries, and there you have your framing.
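
The idea can be modelled without any zmq dependency. The sketch below is only an in-memory stand-in: Part, sendSegments and receiveSegments are hypothetical names, and the `more` flag plays the role of the "more to follow" flag described above. It shows that segment boundaries survive as long as part boundaries do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one message part on the wire; not the zmq API.
struct Part {
    std::string bytes;
    bool more;  // true: further parts of the same message follow
};

// Sender: one part per segment, with `more` set on all but the last.
inline void sendSegments(std::vector<Part>& wire,
                         const std::vector<std::string>& segments) {
    assert(!segments.empty());
    for (std::size_t i = 0; i < segments.size(); ++i) {
        wire.push_back(Part{segments[i], i + 1 < segments.size()});
    }
}

// Receiver: collect parts starting at `pos` until one arrives without `more`.
// Each collected part is exactly one segment, so no size table is needed.
inline std::vector<std::string> receiveSegments(const std::vector<Part>& wire,
                                                std::size_t& pos) {
    std::vector<std::string> segments;
    bool more = true;
    while (more && pos < wire.size()) {
        segments.push_back(wire[pos].bytes);
        more = wire[pos].more;
        ++pos;
    }
    return segments;
}
```

Two messages sent back to back come out as two separate segment lists, each with its original segment sizes.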

OTOH, if the chunk boundaries aren't necessarily communicated to the receiver, then you can simply implement a custom kj::OutputStream which writes each byte array it receives as a separate chunk.

> One problem I see is that the Cap'n Proto API appears somewhat asymmetric right now. I can easily put my parameters into a Cap'n Proto message and write that message to a file descriptor, and read it again on the other side. But I can't easily put my parameters into a message and write the message *without* a file descriptor. (I don't get access to an fd from zmq.) But, on the receiving end, there *is* a FlatArrayMessageReader that *can* unpack a message that's in a buffer.

capnp::messageToFlatArray() is the write-time analog to FlatArrayMessageReader. The problem is that it makes a redundant copy of the content.

> I guess the shortcoming right now is that I can't go to Cap'n Proto and say "give me a pointer to a buffer that contains the message in a form suitable for transmission" whereas, on the receiving side, there *is* such a thing.

Well, this is exactly the purpose of getSegmentsForOutput(). But if you want Cap'n Proto to build the segment table for you, then it wants to write using a callback (a kj::OutputStream) so that it can allocate the segment table on the stack. I suppose we could add a method to serialize.h like "Array<word> makeSegmentTableFor(MessageBuilder&)" which would allocate the segment table on the heap, and then you'd have all the parts you need. But it seems more appropriate to use kj::OutputStream here.

> What's missing is something in between the level of a message and the Fd* classes. I should be able to read and write without having to provide an fd, and without having to create my own framing.

Right, this is why kj::OutputStream is an abstract interface -- so you can support targets other than file descriptors.

-Kenton

Michi Henning

Sep 8, 2013, 4:10:17 AM

to Kenton Varda, Michi Henning, capnproto

> Well, that should work just as well. In fact, if the receiving end gets the chunks delimited at the same points as when they were sent, then you don't need to send a separate segment table at all -- the 0mq chunk boundaries can serve as the segment boundaries, and there you have your framing.

Just got confirmation from Pieter that yes, indeed, zmq preserves the chunk boundaries, so it looks like I'm good to go after all :-)

Cheers,

Michi.

Kenton Varda

Sep 8, 2013, 3:43:03 PM

to Michi Henning, Michi Henning, capnproto

Great, so we can use 0mq's framing. I wonder if it makes sense to define this as the canonical way to send Cap'n Proto messages over 0mq, and maybe provide some reference code (which I imagine is not very long).

Cheers,

Michi.

Michi Henning

Sep 9, 2013, 2:32:49 AM

to Kenton Varda, capnproto

On 09/09/13 05:43, Kenton Varda wrote:
> Great, so we can use 0mq's framing. I wonder if it makes sense to
> define this as the canonical way to send Cap'n Proto messages over
> 0mq, and maybe provide some reference code (which I imagine is not
> very long).

Here is something that does the job:

ZmqSender::ZmqSender(zmqpp::socket& s) :
    s_(s)
{
}

// Send a message provided as a capnp segment list. Each segment is sent
// as a separate zmq message part.

void ZmqSender::send(kj::ArrayPtr<kj::ArrayPtr<capnp::word const> const> segments)
{
    auto it = segments.begin();
    auto i = segments.size();
    assert(i != 0);
    while (--i != 0)
    {
        s_.send_raw(reinterpret_cast<char const*>(&(*it)[0]),
                    it->size() * sizeof(capnp::word), zmqpp::socket::send_more);
        ++it;
    }
    s_.send_raw(reinterpret_cast<char const*>(&(*it)[0]),
                it->size() * sizeof(capnp::word), zmqpp::socket::normal);
}

The receiver is a bit messy, due to a bit of impedance mismatch with the
zmq API (I have to unmarshal into a std::string). The check for a
mis-aligned string buffer is there because the standard doesn't
guarantee that a std::string buffer has any particular alignment, as far
as I know. Obviously, the ZmqReceiver instance must remain in scope
until after unmarshaling is complete.

class ZmqReceiver final : private util::NonCopyable
{
public:
    ZmqReceiver(zmqpp::socket& s);

    kj::ArrayPtr<kj::ArrayPtr<capnp::word const> const> receive();

private:
    zmqpp::socket& s_;
    // std::deque (from <deque>) rather than std::vector: segments_ holds
    // pointers into these strings, and growing a vector relocates its
    // elements, which can invalidate pointers into short strings that are
    // stored inline. A deque leaves existing elements in place on push_back.
    std::deque<std::string> parts_;
    std::vector<std::unique_ptr<capnp::word[]>> copied_parts_;
    std::vector<kj::ArrayPtr<capnp::word const>> segments_;
};

ZmqReceiver::ZmqReceiver(zmqpp::socket& s) :
    s_(s)
{
}

// Receive a message (as a single message or in parts) and convert to a
// capnp segment list.

kj::ArrayPtr<kj::ArrayPtr<capnp::word const> const> ZmqReceiver::receive()
{
    // Clear previously received content, if any.
    parts_.clear();
    copied_parts_.clear();
    segments_.clear();

    do
    {
        parts_.push_back(std::string());
        std::string& str = parts_.back();
        s_.receive(str);

        // Received message must contain an integral number of words.
        assert(str.size() % sizeof(capnp::word) == 0);
        auto num_words = str.size() / sizeof(capnp::word);
        char* buf = &str[0];

        if (reinterpret_cast<uintptr_t>(buf) % sizeof(capnp::word) == 0)
        {
            // String buffer is word-aligned, point directly at the
            // start of the string.
            segments_.push_back(kj::ArrayPtr<capnp::word const>(
                reinterpret_cast<capnp::word const*>(buf), num_words));
        }
        else
        {
            // String buffer is not word-aligned, make a copy and point
            // at that.
            std::unique_ptr<capnp::word[]> words(new capnp::word[num_words]);
            memcpy(words.get(), buf, str.size());
            segments_.push_back(kj::ArrayPtr<capnp::word const>(&words[0], num_words));
            copied_parts_.push_back(std::move(words));
        }
    }
    while (s_.has_more_parts());

    return kj::ArrayPtr<kj::ArrayPtr<capnp::word const>>(&segments_[0],
                                                         segments_.size());
}

Cheers,

Michi.

bmco...@gmail.com

Jun 20, 2016, 12:40:38 PM

to Cap'n Proto, temp...@gmail.com

I guess identifiers have changed since 2013. Is there an up-to-date example of code that does this? 0mq + capnp seems like a worthwhile combination to me.
