relaxing alignment requirements in capnproto-rust: am I missing anything?

David Renshaw

Jan 9, 2020, 11:03:57 PM
to capnproto
I want to make it easy and safe for users of capnproto-rust to read messages from unaligned buffers without copying.  (See this github issue.)

Currently, a user must pass their unaligned buffer through unsafe fn bytes_to_words(), asserting that they believe their hardware to be okay with unaligned reads. In other words, we require that the user understand some tricky low-level processor details, and that the user preclude their software from running on many platforms.

(With libraries like sqlite, zmq, redis, and many others, there simply is no way to request that a buffer be aligned -- you are just given an array of bytes. You can copy the bytes into an aligned buffer, but that has a performance cost and a complexity cost (who owns the new buffer?).)

I believe that it would be better for capnproto-rust to work natively on unaligned buffers. In fact, I have a work-in-progress branch that achieves this, essentially by changing a bunch of direct memory accesses into tiny memcpy() calls. This C++ godbolt snippet captures the main idea, and shows that, on x86_64 at least, the extra indirection gets optimized away completely. Indeed, my performance measurements so far support the hypothesis that there will be no performance cost in the x86_64 case. For processors that don't support unaligned access, the extra copy will still be there (e.g. https://godbolt.org/z/qgsGMT), but I hypothesize that it will be fast.
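
For reference, here is a minimal Rust sketch of the same idea (illustrative only, not the code from the branch): read a u64 through a small fixed-size copy instead of dereferencing the pointer directly, and let the optimizer collapse the copy into a single load on targets that allow unaligned access.

    use std::ptr;

    /// Direct load: only sound when `p` is 8-byte aligned.
    unsafe fn direct(p: *const u64) -> u64 {
        *p
    }

    /// Load via a small fixed-size copy: works for any pointer. On targets
    /// that support unaligned loads, the copy optimizes to a single load.
    unsafe fn indirect(p: *const u64) -> u64 {
        let mut buf = [0u8; 8];
        ptr::copy_nonoverlapping(p as *const u8, buf.as_mut_ptr(), 8);
        u64::from_le_bytes(buf)
    }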

All in all, this change seems to me like a big usability win. So I'm wondering: have I missed anything in the above analysis? Are there good reasons I shouldn't make the change?

- David

Kenton Varda

Jan 10, 2020, 10:25:36 AM
to David Renshaw, capnproto
First, make sure you add the -O2 compiler option in godbolt, so that these are actually optimized. If you do that, `direct()` becomes two instructions (on both architectures), while `indirect()` on ARM is still 9 instructions.

It's true that on x86_64, this change will have no negative impact, as you observed. But that's specifically because x86_64 supports unaligned reads and writes, and so on this platform you don't actually need to change anything to support unaligned buffers.

On ARM, your example is generating an out-of-line function call to memcpy. I could be wrong, but I think this will be heavier than you are imagining. There are three issues:

- The function call itself takes several instructions.
- An out-of-line function call will force the compiler to be more conservative about optimizations around it. When a getter is inlined into a larger function body, this could lead to a lot more overhead than is visible in the godbolt example. For example, caller-saved registers used by that outer function would need to be saved and restored around each call.
- The glibc implementation of memcpy() itself needs to be designed to handle any size of memcpy, and is optimized for larger, variable-sized copies, since small fixed copies would normally be inlined. Several branches will be needed even for a small copy. See the code: https://github.com/lattera/glibc/blob/master/string/memcpy.c and the macros it depends on: https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h


It's hard to say how much effect all this would really have, but it would make me uncomfortable.

But it might not be too hard to convince the compiler to generate a fixed sequence of byte copies, rather than a memcpy call. That could be a lot better. I'm kind of surprised that GCC doesn't optimize it this way automatically, TBH.
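
In Rust terms, one way that might work (a sketch, not a recommendation for the actual implementation) is to assemble the value from individual bytes with a fixed-count loop, which the optimizer will typically unroll into a short fixed sequence rather than an out-of-line call:

    /// Assemble a little-endian u64 from 8 possibly-unaligned bytes.
    /// With optimizations on, the fixed-count loop is typically unrolled,
    /// so no memcpy call is emitted even on strict-alignment targets.
    unsafe fn load_u64_le(p: *const u8) -> u64 {
        let mut v: u64 = 0;
        for i in 0..8 {
            v |= (*p.add(i) as u64) << (8 * i);
        }
        v
    }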

BTW it looks like arm64 gets optimized to an unaligned load just like x86_64. So the future seems to be one where we don't need to worry about alignment anymore. Maybe that's a good argument for going ahead with this approach now.

-Kenton


David Renshaw

Jan 11, 2020, 11:12:06 AM
to Kenton Varda, capnproto
Thanks for the feedback!

I figured out how to get rustc to emit assembly for a variety of targets. Results are in this blog post: https://dwrensha.github.io/capnproto-rust/2020/01/11/unaligned-memory-access.html
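
(For anyone who wants to reproduce this locally, one possible setup is sketched below; it is an illustrative invocation, not necessarily the one used for the post, and it assumes the target's standard library has been installed via rustup.)

    // unaligned.rs -- inspect the generated assembly for a strict-alignment
    // target, for example:
    //
    //   rustup target add armv7-unknown-linux-gnueabihf
    //   rustc -O --crate-type=lib --emit=asm \
    //         --target=armv7-unknown-linux-gnueabihf unaligned.rs
    //
    // (hypothetical example, not necessarily the invocation behind the post)

    pub unsafe fn load_u64_le(p: *const u8) -> u64 {
        let mut buf = [0u8; 8];
        std::ptr::copy_nonoverlapping(p, buf.as_mut_ptr(), 8);
        u64::from_le_bytes(buf)
    }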

I don't think there's any case in which the extra copy will actually be an out-of-line memcpy function call.

- David

Ian Denhardt

Jan 11, 2020, 11:41:05 AM
to David Renshaw, Kenton Varda, capnproto
I'm generally supportive of this, but it's also worth considering, if the
change doesn't land: as an alternative to the current unsafe
bytes_to_words, you could provide a version that returns a Result, which
is Err unless the argument is 8-byte aligned or the CPU architecture
is known to be able to handle unaligned access.
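
A rough sketch of what that could look like (illustrative only; the function name, the error type, and the exact signature of bytes_to_words are assumptions rather than the crate's actual API):

    use capnp::Word;

    #[derive(Debug)]
    pub struct UnalignedBufferError; // hypothetical error type

    /// Checked variant of `bytes_to_words`: Ok if the buffer is 8-byte
    /// aligned, or if the target architecture is known to tolerate
    /// unaligned reads; Err otherwise.
    pub fn try_bytes_to_words(bytes: &[u8]) -> Result<&[Word], UnalignedBufferError> {
        let aligned = bytes.as_ptr() as usize % core::mem::align_of::<Word>() == 0;
        let arch_ok = cfg!(any(target_arch = "x86", target_arch = "x86_64"));
        if aligned || arch_ok {
            // Assumed signature: unsafe fn bytes_to_words(&[u8]) -> &[Word]
            Ok(unsafe { capnp::bytes_to_words(bytes) })
        } else {
            Err(UnalignedBufferError)
        }
    }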

-Ian

David Renshaw

Jan 14, 2020, 6:13:25 PM
to Kenton Varda, capnproto
Ralf Jung, who knows a lot about software verification, suggests that capnproto-c++'s UnalignedFlatArrayMessageReader might cause undefined behavior even on x86_64:

Kenton Varda

Jan 14, 2020, 6:33:47 PM
to David Renshaw, capnproto
UnalignedFlatArrayMessageReader's own doc comment mentions this fact. :)

FWIW I don't actually recommend using that class, but I was convinced to add it when enough people demanded it.

-Kenton

David Renshaw

Jan 14, 2020, 6:45:39 PM
to Kenton Varda, capnproto
Ah, indeed it does! Now I’m feeling silly for failing at reading comprehension. :)

David Renshaw

Jan 19, 2020, 4:11:05 PM
to Kenton Varda, capnproto