Question about Alignment and crypto/subtle

265 views
Skip to first unread message

not...@google.com

unread,
Jun 23, 2016, 2:52:51 PM6/23/16
to golang-dev
(Context for this question is speeding up crypto/subtle)

When Go allocates strings/byte arrays, are there any implicit guarantees about alignment?  The Spec has fairly weak assertions about the alignment of arrays, but I was more probing here to see if the implementation does alignment under the hood anyways.  

The reason I was interested in this was the possibility of speeding up subtle.ConstantTimeCompare, which compares the arrays byte by byte.  I am fully aware that there may be dragons here, but the performance difference between byte-by-byte and int32-by-int32 checking is too good to not investigate.  

An additional reason for bringing up alignment is how array placement interacts with pages.   Accessing the individual bytes of an array is not constant time if they span across page boundaries, which would leak timing information about array placement (though, I think it should still take the same amount regardless of contents).  The question is whether being able to ignore page faults because they don't leak relevant timing info implies that doing other tricks is acceptable.  I have a hacky, broken (but passing) attempt at what I am exploring here: https://play.golang.org/p/3TZnmzUays

Before:
BenchmarkConstantTimeCompare-12       500000       2307 ns/op

After:
BenchmarkConstantTimeCompare-12      2000000        938 ns/op 



I am not a cryptographer, I have only heard about such things from tales of others.

Matthew Dempsky

unread,
Jun 23, 2016, 3:09:08 PM6/23/16
to Carl Mastrangelo, golang-dev
On Thu, Jun 23, 2016 at 11:52 AM, notcarl via golang-dev <golan...@googlegroups.com> wrote:
When Go allocates strings/byte arrays, are there any implicit guarantees about alignment?  The Spec has fairly weak assertions about the alignment of arrays, but I was more probing here to see if the implementation does alignment under the hood anyways.

I'd hesitate to call them even "implicit guarantees," but you can find a lot of the details about alignment handling for allocations in src/runtime/malloc.go and src/runtime/msize.go.

The relevant bits to your inquiry are:

1. The "tiny allocator" (used for allocs <16 bytes), will align based on the size of the allocation.  See lines 614--621 in malloc.go.

2. Larger allocations are sized according to msize.go's size classes.  The docs at the top (lines 43--46) explain: "All objects are 8-aligned, so the first array is indexed by the size divided by 8 (rounded up).  Objects >= 1024 bytes are 128-aligned, so the second array is indexed by the size divided by 128 (rounded up)."  But at line 78, it looks like objects >= 16 (i.e., all objects that don't use the tiny allocator) are actually 16-byte aligned.

Cheers

Matthew Dempsky

unread,
Jun 23, 2016, 3:19:54 PM6/23/16
to Carl Mastrangelo, golang-dev
Oh, I should also point out that code is for alignment of top-level heap allocations.  Of course, if you do something like

    x := new(struct {
        a [1]byte
        b [16]byte
     })

then *x will be 16-byte aligned, but x.b will only be 1-byte aligned.

Similar caveats apply if the compiler determines that a local variable does not escape and can be allocated on the stack instead of the heap, but you can explicitly use "new" to force heap allocation.

Matthew Dempsky

unread,
Jun 23, 2016, 4:51:06 PM6/23/16
to Carl Mastrangelo, golang-dev
On Thu, Jun 23, 2016 at 12:19 PM, Matthew Dempsky <mdem...@google.com> wrote:
but you can explicitly use "new" to force heap allocation.

Oops, it was pointed out to me privately that I misspoke: even variables allocated with new might end up stack allocated if the compiler determines they don't escape.

Jakob Borg

unread,
Jun 23, 2016, 5:10:52 PM6/23/16
to Matthew Dempsky, Carl Mastrangelo, golang-dev
2016-06-23 21:19 GMT+02:00 'Matthew Dempsky' via golang-dev
<golan...@googlegroups.com>:
> you can explicitly use "new" to force heap allocation

This doesn't seem to be the case in regular user code for trivial
things like new(int) that aren't otherwise forced to the heap by
escaping their scope. Do you mean that it should be, or that it is for
other more complex types, or that it should be in some compiler
version different from what I'm running?

//jb

not...@google.com

unread,
Jun 24, 2016, 12:23:21 PM6/24/16
to golang-dev, not...@google.com
Thank you for your response Matthew.  While I doubt this CL will actually get submitted, I I'd still like to understand it: https://go-review.googlesource.com/#/c/24422/1

The code is basically taking two byte arrays, trying to find if they are mutually aligned, and using word comparison on them.  However, there is something tricky going on:  In the case of both arrays being aligned on 4 byte (but not 8 byte) boundaries, the code is dramatically faster.  The benchmarks show going from 468ns to 320ns. This doesn't seem possible, since the 8byte alignment case should be the fastest on an amd64 processor. Looking at the addresses in both cases, the only difference is that the slice (which enters constant_time.go:44) ends up being aligned on a 16 byte boundary instead.

Will the Go runtime align the two 4096 byte arrays on a 8 byte boundary but not a 16, which would explain the speed discrepancy?

Austin Clements

unread,
Jun 24, 2016, 3:45:34 PM6/24/16
to Carl Mastrangelo, golang-dev
On Fri, Jun 24, 2016 at 10:23 AM, notcarl via golang-dev <golan...@googlegroups.com> wrote:
Thank you for your response Matthew.  While I doubt this CL will actually get submitted, I I'd still like to understand it: https://go-review.googlesource.com/#/c/24422/1

The code is basically taking two byte arrays, trying to find if they are mutually aligned, and using word comparison on them.  However, there is something tricky going on:  In the case of both arrays being aligned on 4 byte (but not 8 byte) boundaries, the code is dramatically faster.  The benchmarks show going from 468ns to 320ns. This doesn't seem possible, since the 8byte alignment case should be the fastest on an amd64 processor. Looking at the addresses in both cases, the only difference is that the slice (which enters constant_time.go:44) ends up being aligned on a 16 byte boundary instead.

Will the Go runtime align the two 4096 byte arrays on a 8 byte boundary but not a 16, which would explain the speed discrepancy?

Currently (no guarantees, obviously), a 4096 byte allocation will in fact be 4096 byte aligned. Though whether or not a 4096 byte array is a 4096 byte allocation depends on other things that have been discussed in this thread. The "natural" alignment of a byte array is one byte, so if it's on the stack or in a struct with other fields, there is no alignment guarantee at all.

For testing, you can force a 4096 byte allocation with something like:

var globalSink interface{}

func f() {
    x := new([4096]byte)
    globalSink = x
    ...
}

Reply all
Reply to author
Forward
0 new messages