To be clear, I would like to support this. I'm trying to figure out if there's a way to do it without too much trouble.
So, Cap'n Proto messages are composed of multiple "segments", each of which is composed of a number of "objects", where an object is e.g. a struct, a list, or a byte blob. Each segment is a contiguous block of memory. Objects are word aligned within their segments, where "word" always means 64-bit for Cap'n Proto.
So, in order to support higher alignment, we need to do two things:
1) Make sure objects are aligned relative to the segment start.
2) Make sure segments are aligned. This has two sub-problems:
   a. Make sure segments of a newly-built message are aligned.
   b. Make sure segments read off the wire are aligned.
These turn out to be very different problems.
2a is the easiest to solve: just use posix_memalign() instead of calloc() in MallocMessageBuilder and specify 32-byte alignment (zeroing the memory explicitly, since posix_memalign() does not do that the way calloc() does). We could probably just do this for everyone whether they need it or not. However, if people choose to pass a scratch array to MallocMessageBuilder's constructor, they will need to be responsible for aligning that array if it matters to them. And, of course, anyone writing a custom MessageBuilder subclass has to deal with alignment.
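For concreteness, here is a rough sketch of what the allocation in 2a might look like. The helper name allocateAlignedZeroedWords is made up for illustration and is not part of Cap'n Proto; the only real API used is POSIX posix_memalign(), plus a memset() to restore the zero-fill that calloc() provided.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>   // std::memset
#include <stdlib.h>  // posix_memalign, free (POSIX, not standard C++)

// Hypothetical helper (not a Cap'n Proto API): allocate `wordCount` 64-bit
// words, zero-filled, at an address that is a multiple of `alignBytes`.
// Returns nullptr on failure. Release the result with free().
inline void* allocateAlignedZeroedWords(std::size_t wordCount,
                                        std::size_t alignBytes = 32) {
  assert(wordCount > 0);
  // posix_memalign() requires a power of two that is a multiple of sizeof(void*).
  assert((alignBytes & (alignBytes - 1)) == 0 && alignBytes % sizeof(void*) == 0);

  void* ptr = nullptr;
  const std::size_t bytes = wordCount * sizeof(std::uint64_t);
  if (posix_memalign(&ptr, alignBytes, bytes) != 0) {
    return nullptr;
  }
  std::memset(ptr, 0, bytes);  // calloc() zeroed for us; posix_memalign() does not
  return ptr;
}
```

A caller supplying its own scratch array would have to guarantee the same alignment itself, e.g. by declaring the buffer with alignas(32).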
2b is trickier: A serialized message starts out with a table of segment sizes. This table is always a whole number of words, but isn't aligned beyond that -- in fact, the most common table size is 1 word. The first segment begins immediately after the table.
Now, InputStreamMessageReader actually reads the table separately from the content, so it could still allocate 32-byte aligned segments. But there is also FlatArrayMessageReader, which takes a user-provided buffer containing the entire serialized message and references it directly. FlatArrayMessageReader is particularly useful together with mmap(). Unfortunately, in the common case of a 1-word segment table, the first segment of an mmap'd file starts 8 bytes past a page-aligned address, so it can never be 16- or 32-byte aligned.
One way we could fix this would be to automatically pad out the segment table to 16 or 32 bytes when writing a message which we know contains any content requiring alignment. The table would claim that the message has more segments than it really does, but the extra segments would all be zero-size. So, compatibility with the existing protocol can be preserved.
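To make the padding arithmetic concrete, here is a small sketch. It assumes the table layout from the published stream framing (a 32-bit "segment count minus one", then one 32-bit size per segment, padded to a whole word), which is not spelled out above; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size in bytes of the segment table for `segmentCount` segments, assuming
// the layout: uint32 (count - 1), then one uint32 size per segment, then
// padding up to the next 8-byte word boundary.
inline std::size_t segmentTableBytes(std::uint32_t segmentCount) {
  const std::size_t raw = 4 * (static_cast<std::size_t>(segmentCount) + 1);
  return (raw + 7) & ~static_cast<std::size_t>(7);
}

// Hypothetical helper: the smallest segment count >= `realSegments` whose
// table size is a multiple of `alignBytes`. The extra entries would be
// written as zero-size segments.
inline std::uint32_t paddedSegmentCount(std::uint32_t realSegments,
                                        std::size_t alignBytes) {
  // Tables are whole words, so only power-of-two alignments >= 8 make sense.
  assert(alignBytes >= 8 && (alignBytes & (alignBytes - 1)) == 0);
  std::uint32_t n = realSegments;
  while (segmentTableBytes(n) % alignBytes != 0) {
    ++n;
  }
  return n;
}
```

Under that assumed layout, a single-segment message would claim 2 segments for 16-byte alignment or 6 segments for 32-byte alignment. Note that this only aligns the first segment; later segments stay aligned only if every preceding segment's length is itself a multiple of the alignment, which this sketch does not address.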
Finally, we come to point (1). Complexifying the allocation code to support alignment is not a big deal in itself. The bigger problem is deciding when to align.
I think that it probably makes the most sense to support higher alignment only for Data blobs. We will not support higher-aligned fields within structs. Here's why:
- The struct layout code is already too complex.
- If there are any systems where unaligned reads cause SIGBUS (or otherwise crash the app), then we'd need to verify alignment at the time the struct pointer is traversed. However, we wouldn't want a protocol to lose backwards-compatibility when adding a new higher-aligned field to a struct that didn't have any higher-aligned fields before. E.g. if I add a Float64x4 field to my struct, then go back and read an old message created before that field existed, the struct may not be aligned there. That's actually OK, because that old struct won't contain the field anyway, but it will be complex to validate -- we'll need to keep track of the offset of the first aligned field and allow non-alignment if the struct is smaller than that offset.
- Struct lists are prefixed with a one-word tag, which could throw off alignment for the whole list. We could support allocation with offset alignment -- e.g. "please allocate 17 words of memory aligned 1 word before a 4-word alignment boundary" -- but this is getting pretty weird.
- Cross-platformness. If Cap'n Proto is going to explicitly support a Float64x4 type, it is going to have to be little-endian IEEE-754. On any system where that is not acceptable to the vector processor (PPC, maybe?), all of this work on alignment is for naught, and we have to deal with byte-swapping. On the other hand, if we don't offer an explicit Float64x4 type at all, and just say "you can allocate aligned byte blobs, but it's up to you what's in them", then we wash our hands of this problem -- it's up to the application to decide what format is appropriate for its needs.
- It seems to me that typical use cases for vector processing involve a large list of vectors anyway, so a Float64x4 field type may be a complete waste of time.
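To illustrate why the struct route gets weird, here is a sketch of the two extra pieces of logic mentioned in the bullets above: the backwards-compatible alignment check and the "offset alignment" allocation for tagged struct lists. Both function names and signatures are invented for illustration, and treating "struct size equal to the field's offset" as "field absent" is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical check at pointer-traversal time. A struct that ends at or
// before the offset of the first higher-aligned field cannot contain that
// field (e.g. an old message), so misalignment is tolerated for it.
inline bool structAlignmentOk(std::uintptr_t addr,
                              std::size_t structWords,
                              std::size_t firstAlignedFieldOffsetWords,
                              std::size_t alignBytes) {
  assert(alignBytes > 0);
  if (structWords <= firstAlignedFieldOffsetWords) {
    return true;  // the aligned field is not present in this struct
  }
  return addr % alignBytes == 0;
}

// Hypothetical "offset alignment": the smallest word position >= `pos` such
// that the content following a `prefixWords`-word prefix (e.g. the one-word
// struct list tag) lands on an `alignWords` boundary.
inline std::size_t alignWithPrefix(std::size_t pos,
                                   std::size_t prefixWords,
                                   std::size_t alignWords) {
  assert(alignWords > 0);
  const std::size_t rem = (pos + prefixWords) % alignWords;
  return rem == 0 ? pos : pos + (alignWords - rem);
}
```

For example, alignWithPrefix(0, 1, 4) yields 3: the tag goes at word 3 so that the list elements start at word 4.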
So, given all this, I'd propose just adding two new types: DataAligned128 and DataAligned256. These two types work exactly like Data, but are guaranteed to be allocated aligned on a 128-bit/256-bit boundary from the start of the segment. Moreover, allocating such a buffer anywhere in your message will set a flag which causes the segment table to be padded out so that all segments end up aligned relative to the start of the serialized message. The system will always allocate 32-byte aligned segments, but if you are doing any allocation yourself (either because you are providing scratch space or because you are using FlatArrayMessageReader) then it's up to you to deal with alignment. When reading a DataAligned128 or DataAligned256 pointer, the system will throw an exception if the target data is not actually aligned (whether because the sender failed to align it or because the segment is not aligned in local memory).
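As a rough idea of the read-side check, something like the following would do. The name checkedAlignedData is hypothetical, and std::runtime_error stands in for whatever exception mechanism the library would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical read-side check for a DataAligned128/DataAligned256 target:
// throws if the blob's address in local memory is not a multiple of
// `alignBytes` (16 for 128-bit, 32 for 256-bit). This catches both a sender
// that failed to align the data and a segment that is misaligned locally.
inline const std::uint8_t* checkedAlignedData(const void* data,
                                              std::size_t alignBytes) {
  assert(alignBytes > 0 && (alignBytes & (alignBytes - 1)) == 0);
  if (reinterpret_cast<std::uintptr_t>(data) % alignBytes != 0) {
    throw std::runtime_error("aligned Data blob is not aligned in memory");
  }
  return static_cast<const std::uint8_t*>(data);
}
```

The check is on the absolute address, not the offset within the segment, which is why a user-provided buffer (scratch space or a FlatArrayMessageReader input) has to be aligned by the caller for the read to succeed.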
Thoughts?
-Kenton