From array<word> to array<byte> and back: strict aliasing and undefined behavior

Daniel Hofmann

Nov 28, 2014, 5:11:44 AM
to capn...@googlegroups.com
I want to pass my message to a (bytes, size)-tuple expecting function.
The messageToFlatArray() function returns an array<word>.

Right now, I've implemented it like this:

> const auto serialized = messageToFlatArray(message);
>
> kj::ArrayPtr<const char> byte_view{
> reinterpret_cast<const char*>(std::begin(serialized)),
> reinterpret_cast<const char*>(std::end(serialized))};
>
> assert(byte_view.size() % sizeof(capnp::word) == 0);
> fn(std::begin(byte_view), byte_view.size()); // (bytes, size)-tuple

And for the situation where I have bytes and want to read my message
from it, I do the following:

> std::vector<char> in;
> gn(std::back_inserter(in)); // fill with bytes
> assert(in.size() % sizeof(capnp::word) == 0);
>
> const kj::ArrayPtr<const capnp::word> view{
> reinterpret_cast<const capnp::word*>(in.data()),
> reinterpret_cast<const capnp::word*>(in.data() + in.size())};
>
> capnp::FlatArrayMessageReader message{view};


Now I was wondering if this is safe to do, especially regarding all
those nasty reinterpret casts. I was talking to someone who mentioned
strict aliasing rules and that this may be Undefined Behavior without an
extra copy.

That's the reason I was wondering if this is the most idiomatic way and
especially if this is safe to do.

I also checked the word class's documentation:
> https://github.com/kentonv/capnproto/blob/d6a45490341521fd9d9eaade5b312488555b57b0/c%2B%2B/src/capnp/common.h#L261-L270

and it does mention reinterpret casts as necessary for accessing the contents.


Cheers,
Daniel

Kenton Varda

Nov 28, 2014, 6:14:35 PM
to Daniel Hofmann, capnproto
There's no violation of aliasing rules because C/C++ aliasing rules make an explicit exception for casting to char* (including signed/unsigned variants).

But I think we should make this easier by providing an ArrayPtr::asByteArray() method.
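
As a rough illustration of the words-to-bytes direction, here is a minimal sketch that does not depend on Cap'n Proto itself. `Word` is only a stand-in for capnp::word (assumed here to be an opaque 8-byte, 8-byte-aligned unit), and `asByteView` is a hypothetical helper name, not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Stand-in for capnp::word (assumption: opaque, 8 bytes, 8-byte aligned).
struct alignas(8) Word { std::uint64_t content; };
static_assert(sizeof(Word) == 8, "stand-in word must be 8 bytes");

// Hypothetical helper: view a range of words as a (bytes, size) pair.
// Reading any object through unsigned char* is covered by the char
// exception in the aliasing rules, so no copy is needed here.
inline std::pair<const unsigned char*, std::size_t>
asByteView(const Word* begin, const Word* end) {
  assert(begin <= end);
  const auto* first = reinterpret_cast<const unsigned char*>(begin);
  return {first, static_cast<std::size_t>(end - begin) * sizeof(Word)};
}
```

The returned pair can be handed straight to a function expecting a pointer and a length; the view stays valid only as long as the underlying word array does.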

-Kenton


--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at http://groups.google.com/group/capnproto.

Andrew Lutomirski

Nov 28, 2014, 6:20:50 PM
to Kenton Varda, Daniel Hofmann, capnproto
On Fri, Nov 28, 2014 at 3:14 PM, Kenton Varda <ken...@sandstorm.io> wrote:
> There's no violation of aliasing rules because C/C++ aliasing rules make an
> explicit exception for casting to char* (including signed/unsigned
> variants).

I'm not 100% sure this is always correct.

For output (treating a just-written message as an array of char), I
agree. For input, if you allocate an array of, say, int, fill it in,
cast to unsigned char *, and feed it to capnproto, I think you've
broken the rules.

My not-quite-half-written nanocnp avoids this at (I think) no
performance cost, by only ever reading with memcpy.

I think it's entirely absurd that C offers no way to say "effectively
copy this memory block to itself for aliasing purposes".
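
The "only ever read with memcpy" approach can be sketched roughly as follows. This is not code from nanocnp; `loadU64` is a hypothetical helper, and the value is returned in native byte order with no endianness handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load a 64-bit value from any byte offset in a buffer.
// memcpy copies bytes, so this is valid regardless of the buffer's original
// type or alignment.
inline std::uint64_t loadU64(const unsigned char* buffer, std::size_t size,
                             std::size_t offset) {
  assert(offset <= size && size - offset >= sizeof(std::uint64_t));
  std::uint64_t value;
  std::memcpy(&value, buffer + offset, sizeof(value));
  return value;  // native byte order, no endianness conversion
}
```

Optimizing compilers commonly turn a fixed-size memcpy like this into a single load instruction, which is presumably why it is expected to cost nothing.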

--Andy

Kenton Varda

Nov 28, 2014, 7:16:27 PM
to Andrew Lutomirski, Daniel Hofmann, capnproto
On Fri, Nov 28, 2014 at 3:20 PM, Andrew Lutomirski <an...@luto.us> wrote:
> I'm not 100% sure this is always correct.

Can you cite a source? Because I'm pretty sure it is correct.

The rule is intended to make "read(fd, &obj, sizeof(obj))" and "write(fd, &obj, sizeof(obj))" legal, which is pretty much exactly what Cap'n Proto is doing.
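
The same pattern, sketched with C++ streams instead of read()/write() on a file descriptor. `Header`, `writeObject` and `readObject` are hypothetical names used only for illustration, and the sketch assumes a trivially copyable type.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical record type, for illustration only.
struct Header { std::uint32_t id; std::uint32_t length; };
static_assert(std::is_trivially_copyable<Header>::value,
              "byte-wise I/O needs a trivially copyable type");

// Write the object's bytes; the cast to const char* is the permitted alias.
inline void writeObject(std::ostream& out, const Header& h) {
  out.write(reinterpret_cast<const char*>(&h), sizeof(h));
}

// Read the bytes back into an object of the same type.
inline Header readObject(std::istream& in) {
  Header h{};
  in.read(reinterpret_cast<char*>(&h), sizeof(h));
  assert(in.gcount() == static_cast<std::streamsize>(sizeof(h)));
  return h;
}
```

Writing a Header into a std::stringstream and reading it back yields the same field values; the I/O layer only ever touches the object as bytes.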
 
> For input, if you allocate an array of, say, int, fill it in,
> cast to unsigned char *, and feed it to capnproto, I think you've
> broken the rules.

Well, we don't do that. We do I/O on bytes, not ints.
 
> My not-quite-half-written nanocnp avoids this at (I think) no
> performance cost, by only ever reading with memcpy.

Indeed, memcpy operates on bytes, therefore it is legal.

Andrew Lutomirski

Nov 28, 2014, 7:20:51 PM
to Kenton Varda, Daniel Hofmann, capnproto
On Fri, Nov 28, 2014 at 4:16 PM, Kenton Varda <ken...@sandstorm.io> wrote:
> On Fri, Nov 28, 2014 at 3:20 PM, Andrew Lutomirski <an...@luto.us> wrote:
>>
>> I'm not 100% sure this is always correct.
>
>
> Can you cite a source? Because I'm pretty sure it is correct.
>
> The rule is intended to make "read(fd,&obj, sizeof(obj))" and "write(fd,
> &obj, sizeof(obj))" legal, which is pretty much exactly what Cap'n Proto is
> doing.
>
>>
>> For input, if you allocate an array of, say, int, fill it in,
>> cast to unsigned char *, and feed it to capnproto, I think you've
>> broken the rules.
>
>
> Well, we don't do that. We do I/O on bytes, not ints.

This would only apply to rather strange code that does its own I/O. I
doubt it would affect any of the Cap'n Proto RPC stuff.

--Andy

Kenton Varda

Nov 28, 2014, 7:25:59 PM
to Andrew Lutomirski, Daniel Hofmann, capnproto
On Fri, Nov 28, 2014 at 4:20 PM, Andrew Lutomirski <an...@luto.us> wrote:
> This would only apply to rather strange code that does its own I/O. I
> doubt it would affect any of the Cap'n Proto RPC stuff.

(I'm parsing this statement as: Cap'n Proto RPC is fine, since it reads bytes from a socket, but people could in theory write their own I/O code which does not use bytes.)

Sure. Any code which does I/O in units other than bytes is technically breaking the rules. But I can't really imagine who would do that. (And it applies equally to both input and output.)

-Kenton

Kenton Varda

Nov 28, 2014, 9:00:34 PM
to Daniel Hofmann, capnproto
On Fri, Nov 28, 2014 at 3:14 PM, Kenton Varda <ken...@sandstorm.io> wrote:
> But I think we should make this easier by providing an ArrayPtr::asByteArray() method.
Makes the code a lot nicer in a lot of places!

-Kenton

Tobias Hahn

Dec 1, 2014, 6:31:11 AM
to Kenton Varda, capnproto
I believe Kenton is correct concerning aliasing rules.

The question that (optimizing) compilers are concerned with is, given two pointers, can they alias? To be able to answer this question easily, it shouldn't matter how you arrived at these pointer types (e.g. casting from int* to char* to X*) because this would most likely require complicated static analysis, which would make the whole aliasing rules a lot less useful.

FYI:

> If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
> — [...]
> — a char or unsigned char type.

> The intent of this list is to specify those circumstances in which an object may or may not be aliased.


3.10, paragraph 10, ISO International Standard ISO/IEC 14882:2011(E) – Programming Language C++

Cheers!
Tobias


Daniel Hofmann

Dec 3, 2014, 4:43:17 AM
to capn...@googlegroups.com
Reading the rules regarding type aliasing, there is an exception for
char/unsigned char as target type:

> http://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing

So the asBytes() function should be fine.

But what about the other direction, i.e. from an array<char> to an
array<word>, as I mentioned in the initial post:

> const kj::ArrayPtr<const capnp::word> view{
> reinterpret_cast<const capnp::word*>(bytes.data()),
> reinterpret_cast<const capnp::word*>(bytes.data() + bytes.size())};

Under strict aliasing rules this is not valid, is it? What is the
idiomatic solution here? How do I get Cap'n Proto to interact with bytes?

Kenton Varda

Dec 3, 2014, 5:04:24 AM
to Daniel Hofmann, capnproto
The aliasing rules are not directional. Either two pointers are allowed to alias, or they aren't -- it doesn't matter which pointer came first, or is used first, or is derived from the other.

That said, a problem you may have with casting byte* to word* is that the byte array might not be aligned on a word boundary, in which case you might crash or harm performance. To be safe, you have to make a copy, i.e.:

    memcpy(words.begin(), bytes.begin(), bytes.size());

You could try to carefully check for alignment and only make the copy if the bytes aren't aligned, but that's going to be a bit ugly.
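
A sketch of what the check-then-copy variant might look like, without using Cap'n Proto itself. `Word` is a stand-in for capnp::word (assumed 8 bytes, 8-byte aligned), and `isWordAligned` and `alignedWords` are hypothetical helpers. The aligned branch relies on the reasoning in this thread that viewing the bytes as words is acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for capnp::word (assumption: opaque, 8 bytes, 8-byte aligned).
struct alignas(8) Word { std::uint64_t content; };

// True if p could be used as a Word* without a misaligned access.
inline bool isWordAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(Word) == 0;
}

// Hypothetical helper: returns a word pointer for the given bytes.
// Aligned input is used in place; unaligned input is copied into scratch,
// which must then outlive the returned pointer.
inline const Word* alignedWords(const char* bytes, std::size_t size,
                                std::vector<Word>& scratch) {
  assert(size % sizeof(Word) == 0);
  if (isWordAligned(bytes)) {
    return reinterpret_cast<const Word*>(bytes);
  }
  scratch.resize(size / sizeof(Word));
  if (size != 0) {
    std::memcpy(scratch.data(), bytes, size);
  }
  return scratch.data();
}
```

The caller has to keep both the original buffer and the scratch vector alive while the words are in use, which is part of what makes this approach a bit ugly.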

Ideally, you should try to write your code such that it reads directly into an Array<word> in the first place. If you are using a third-party library for I/O, see if it lets you provide a buffer to use.
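
For example, if the I/O layer accepts a caller-provided buffer, the data can land in word-aligned storage from the start. The sketch below uses a std::istream as the I/O source; `Word` is again a stand-in for capnp::word and `readWords` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <vector>

// Stand-in for capnp::word (assumption: opaque, 8 bytes, 8-byte aligned).
struct alignas(8) Word { std::uint64_t content; };

// Hypothetical helper: read wordCount words from a stream directly into
// word-aligned storage, so no byte-to-word cast or extra copy is needed later.
inline std::vector<Word> readWords(std::istream& in, std::size_t wordCount) {
  std::vector<Word> words(wordCount);
  const auto byteCount = static_cast<std::streamsize>(wordCount * sizeof(Word));
  // The stream fills the buffer as bytes; char* access to any object is allowed.
  in.read(reinterpret_cast<char*>(words.data()), byteCount);
  assert(in.gcount() == byteCount);
  return words;
}
```

The resulting vector is already aligned for words, so only the char* view used while filling it is needed.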

-Kenton