16-byte alignment


Charlie Dorian

Jun 14, 2012, 3:27:28 PM
to golan...@googlegroups.com
Is there a way with 6a to lay down data that is 16-byte aligned? I'd like an array of pairs-of-float64s (e.g., complex128) to be aligned so that the xxxPD (e.g., ADDPD mem,X0) instructions could be used. Thanks.

Russ Cox

Jun 15, 2012, 3:17:45 PM
to Charlie Dorian, golan...@googlegroups.com
You'd have to arrange that not just for array of pairs of float64, but for any complex128 in any data structure. You'd have to extend the alignment logic at the bottom of ld/data.c and also set t->align = 16 in case TCOMPLEX128 in gc/align.c's dowidth. I think that would be enough.

Russ
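
For readers who want to see the gap Russ describes, here is a minimal sketch in Go. It only reports what the toolchain guarantees for complex128; the helper name complexLayout is made up for this illustration. With the gc toolchain on amd64 the value is 16 bytes wide but is typically only guaranteed 8-byte alignment, which is why the compiler and linker changes would be needed.

```go
package main

import (
	"fmt"
	"unsafe"
)

// complexLayout (hypothetical helper) reports the size of complex128 and the
// alignment the compiler guarantees for variables of that type.
func complexLayout() (size, align uintptr) {
	var c complex128
	return unsafe.Sizeof(c), unsafe.Alignof(c)
}

func main() {
	size, align := complexLayout()
	fmt.Printf("complex128: size=%d bytes, guaranteed alignment=%d bytes\n", size, align)
}
```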

Charlie Dorian

Jun 15, 2012, 7:23:38 PM
to golan...@googlegroups.com, Charlie Dorian, r...@golang.org
Thanks for the reply. That suggestion would work for .go code, but how about .s code? Does ld know from the size to align constants in DATA statements? If so, how do I get the second float64 into a statement like DATA p+0(SB)/16,$(1.0,1.0)

Russ Cox

Jun 21, 2012, 1:21:45 AM
to Charlie Dorian, golan...@googlegroups.com
On Fri, Jun 15, 2012 at 4:23 PM, Charlie Dorian <cldo...@gmail.com> wrote:
> Thanks for the reply. That suggestion would work for .go code, but how about
> .s code? Does ld know from the size to align constants in DATA statements?
> If so, how do I get the second float64 into a statement like DATA
> p+0(SB)/16,$(1.0,1.0)

I don't think DATA knows about complex. You'd have to do

DATA p+0(SB)/8, $(1.0)
DATA p+8(SB)/8, $(1.0)
GLOBL p(SB), $16

The two DATA statements each lay out a single float64, and the GLOBL
defines p to be 16 bytes long. You'd have to change ld to align any
data >= 16 bytes long on a 16-byte boundary.

Russ

Charlie Dorian

Jun 21, 2012, 3:22:19 PM
to golan...@googlegroups.com, Charlie Dorian, r...@golang.org
Thanks.

In addition to GLOBL p+0(SB),$16, there seems to be another form, e.g., GLOBL p+0(SB),16,$16. What does the middle number do? In cmd/6a/a.y, it says, "$$.from.scale = $3;", but I don't know bison well enough to trace it further.

(What I'm actually doing is calculating two similar polynomials in parallel, in the upper and lower halves of the SSE2 registers. It works, and I'd like to ensure that the coefficient arrays start on 16-byte boundaries. There's a speed-up [on my 2.53 GHz Core 2 Duo, xatan drops to 11 ns from 15 ns] roughly reflecting the reduction in the number of instructions.)


Anthony Martin

Jun 21, 2012, 4:14:48 PM
to Charlie Dorian, golan...@googlegroups.com, r...@golang.org
Charlie Dorian <cldo...@gmail.com> once said:
> In addition to GLOBL p+0(SB),$16, there seems to be another form, e.g.,
> GLOBL p+0(SB),16,$16. What does the middle number do? In cmd/6a/a.y, it
> says, "$$.from.scale = $3;", but I don't know bison well enough to trace it
> further.

The middle number is a bit field for passing
options to the linker. It reads better in hex:
16 is 0x10 which means the NOPTR flag is set.

$ cd src/cmd
$ grep '[^S]NOPTR' 6?/*.[ch]
6g/gsubr.c: p->from.scale |= NOPTR;
6l/6.out.h:#define NOPTR (1<<4)
6l/obj.c: else if(p->from.scale & NOPTR)
$

Cheers,
Anthony
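
A small sketch of the bit test Anthony describes, written in Go rather than the linker's C. The constant mirrors the `(1<<4)` definition shown in the grep output; the names noptr and hasNoptr are invented for this example.

```go
package main

import "fmt"

// noptr mirrors the NOPTR definition quoted from 6l/6.out.h: (1<<4).
const noptr = 1 << 4

// hasNoptr (hypothetical helper) reports whether the NOPTR bit is set in the
// flag word given as the middle operand of a GLOBL statement.
func hasNoptr(flags int) bool {
	return flags&noptr != 0
}

func main() {
	flags := 16 // the middle operand in GLOBL p+0(SB),16,$16
	fmt.Printf("flags=%#x NOPTR set: %v\n", flags, hasNoptr(flags))
}
```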

Jesse van den Kieboom

Aug 9, 2013, 3:25:05 PM
to golan...@googlegroups.com, Charlie Dorian, r...@golang.org
On Friday, June 15, 2012 9:17:45 PM UTC+2, Russ Cox wrote:
> You'd have to arrange that not just for array of pairs of float64, but for any complex128 in any data structure. You'd have to extend the alignment logic at the bottom of ld/data.c and also set t->align = 16 in case TCOMPLEX128 in gc/align.c's dowidth. I think that would be enough.

I've recently been interested in having some data 16-byte aligned, for the purpose of using SSE instructions that require 16-byte alignment (e.g. MOVAPS). It's more academic at this point than anything else, but when I tried increasing the alignment of TCOMPLEX128 (and raising the check on max size in rnd), I got a fatal error in reflect:

panic: reflect: Bits of non-arithmetic Type.
goroutine 1 [running]:
reflect.(*rtype).Bits(0x210469150, 0x41)
/Users/jesse/go/src/pkg/reflect/type.go:460 +0x105

Any pointers towards fixing this?

Dmitry Vyukov

Aug 10, 2013, 6:16:02 AM
to Jesse van den Kieboom, golang-nuts, Charlie Dorian, Russ Cox
Allocate your data on heap, it will be 16-byte aligned.
SSE support is a deliberate feature of memory allocator.
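
If you would rather not depend on what the allocator happens to do, one common workaround is to over-allocate by one element and skip it when the start is misaligned. The sketch below assumes float64 data is at least 8-byte aligned (as on amd64) and that the heap does not move objects; alignedFloat64s and isAligned16 are names made up for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned16 reports whether the first element of xs is on a 16-byte boundary.
// An empty slice has no first element and is treated as aligned.
func isAligned16(xs []float64) bool {
	if len(xs) == 0 {
		return true
	}
	return uintptr(unsafe.Pointer(&xs[0]))%16 == 0
}

// alignedFloat64s returns n float64s whose first element is 16-byte aligned.
// It allocates one spare element and drops it if the buffer starts 8 bytes
// past a 16-byte boundary.
func alignedFloat64s(n int) []float64 {
	buf := make([]float64, n+1)
	if !isAligned16(buf) {
		// Skipping one 8-byte element reaches the next 16-byte boundary.
		return buf[1 : n+1 : n+1]
	}
	return buf[:n:n]
}

func main() {
	xs := alignedFloat64s(6)
	fmt.Println(len(xs), isAligned16(xs))
}
```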





--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Jesse van den Kieboom

Aug 12, 2013, 5:37:33 AM
to golan...@googlegroups.com, Jesse van den Kieboom, Charlie Dorian, Russ Cox
On Saturday, August 10, 2013 12:16:02 PM UTC+2, Dmitry Vyukov wrote:
> Allocate your data on heap, it will be 16-byte aligned.
> SSE support is a deliberate feature of memory allocator.

Ok, that seems to work. However, if you are developing a library you can't really guarantee that someone is going to call your function with memory allocated on the heap, right? Also, the go spec does not really differentiate between what goes on the heap and what goes on the stack. As far as I understand it, code should be agnostic about stack vs. heap.

Dmitry Vyukov

Aug 12, 2013, 6:22:36 AM
to Jesse van den Kieboom, golang-nuts, Charlie Dorian, Russ Cox
On Mon, Aug 12, 2013 at 1:37 PM, Jesse van den Kieboom <jess...@gmail.com> wrote:
> On Saturday, August 10, 2013 12:16:02 PM UTC+2, Dmitry Vyukov wrote:
>> Allocate your data on heap, it will be 16-byte aligned.
>> SSE support is a deliberate feature of memory allocator.
>
> Ok, that seems to work. However, if you are developing a library you can't really guarantee that someone is going to call your function with memory allocated on the heap, right? Also, the go spec does not really differentiate between what goes on the heap and what goes on the stack. As far as I understand it, code should be agnostic about stack vs. heap.


Yes, that's true.

But SSE can operate on uint8 vectors, so total SSE support would require bumping the alignment of everything to 16 bytes (and 32 bytes in the future).

If memory goes into assembly routines, it must be heap allocated. Probably you can just assert proper alignment and document it as a requirement for calling your code.
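
The "assert proper alignment" suggestion can be sketched in Go as below. This is only an illustration: isAligned and mustAligned16 are hypothetical names, and a real package would call such a check before handing the slice to its assembly routine.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned reports whether the first element of xs lies on an align-byte
// boundary. An empty slice has no first element and is treated as aligned.
func isAligned(xs []float64, align uintptr) bool {
	if len(xs) == 0 {
		return true
	}
	return uintptr(unsafe.Pointer(&xs[0]))%align == 0
}

// mustAligned16 panics if xs does not start on a 16-byte boundary, turning
// the documented requirement into a checked precondition.
func mustAligned16(xs []float64) {
	if !isAligned(xs, 16) {
		panic("input slice must be 16-byte aligned")
	}
}

func main() {
	buf := make([]float64, 4)
	// With 8-byte-aligned float64 data, exactly one of these is true.
	fmt.Println(isAligned(buf, 16), isAligned(buf[1:], 16))
}
```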

bdtex...@gmail.com

Jan 3, 2015, 9:33:40 PM
to golan...@googlegroups.com, jess...@gmail.com, cldo...@gmail.com, r...@golang.org
Just a note for future readers:

Slicing can make alignment difficult. For example, even if the slice a := []float64{stuff}  is 16 or 32 bit aligned, the variable b :=  a[1:] is not. We ran into this issue in gonum attempting to make dot-product and matrix-multiply use SSE instructions.

Rob Pike

Jan 4, 2015, 4:24:09 PM
to bdtex...@gmail.com, golan...@googlegroups.com, jess...@gmail.com, cldo...@gmail.com, Russ Cox
I don't understand. If the first n*m-sized element of the array is n-aligned, any subsequent element is n-aligned. In this case, n is 16 or 32, m is 4 or 2.

-rob



Fazlul Shahriar

Jan 4, 2015, 5:44:56 PM
to Rob Pike, bdtex...@gmail.com, golan...@googlegroups.com, jess...@gmail.com, Charlie Dorian, Russ Cox
He meant 16 or 32 bytes aligned, not bits.

Rob Pike

Jan 4, 2015, 6:04:33 PM
to Fazlul Shahriar, bdtex...@gmail.com, golan...@googlegroups.com, Jesse van den Kieboom, Charlie Dorian, Russ Cox
Oh. Well in that case, what did he expect to happen?

-rob

minux

Jan 4, 2015, 7:05:32 PM
to golang-nuts
On Sat, Jan 3, 2015 at 9:33 PM, <bdtex...@gmail.com> wrote:
> Just a note for future readers:
>
> Slicing can make alignment difficult. For example, even if the slice a := []float64{stuff}  is 16 or 32 bit aligned, the variable b :=  a[1:] is not. We ran into this issue in gonum attempting to make dot-product and matrix-multiply use SSE instructions.

I don't understand why this is worthy of a note.

Given a is []float64 and &a[0] is 16-byte or 32-byte aligned, and each float64 is 8-byte,
it's quite obvious that &a[1] won't be 16-byte or 32-byte aligned anymore (its address
must be 8 mod 16)

Am I missing something?
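
The arithmetic is easy to check from Go. The sketch below prints where a slice and its a[1:] reslice start relative to a 16-byte boundary; offsetMod16 is a name invented for this example, and the actual offset of a depends on the allocator, so no particular output is claimed. Whatever a's offset is, b's is 8 bytes further along.

```go
package main

import (
	"fmt"
	"unsafe"
)

// offsetMod16 returns the address of the first element of xs modulo 16.
// xs must be non-empty.
func offsetMod16(xs []float64) uintptr {
	return uintptr(unsafe.Pointer(&xs[0])) % 16
}

func main() {
	a := make([]float64, 8)
	b := a[1:] // shares a's backing array, starting one float64 (8 bytes) later
	fmt.Println("a starts at offset", offsetMod16(a), "mod 16")
	fmt.Println("b starts at offset", offsetMod16(b), "mod 16")
}
```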

Dan Kortschak

Jan 4, 2015, 7:11:26 PM
to minux, golang-nuts
On Sun, 2015-01-04 at 19:04 -0500, minux wrote:
> I don't understand why this is worthy of a note.

That could be said of many of the things that get posted here.

> Am I missing something?
>
It seems he was trying to help other people who might also get caught by
this - people are at a wide variety of learning stages here.

minux

Jan 4, 2015, 7:29:20 PM
to Dan Kortschak, golang-nuts
OK, sorry, I probably have overreacted. I was surprised by the fact that
the OP cc'ed rsc, and was incorrectly assuming that he tried to describe
a runtime bug or something.

bdtex...@gmail.com

Jan 5, 2015, 11:01:21 PM
to golan...@googlegroups.com, dan.ko...@adelaide.edu.au
Sorry, I used the 'reply all' button on the thread (that started 2 years ago). It's not surprising behavior based on the Go spec, nor any form of bug. I was just trying to note for future readers that while []float64 are allocated to be useful for SSE, as a package designer I cannot assume that a []float64 coming into my function is so aligned. I thought it would be a useful note to people considering SSE who are looking for possible implications with the rest of the Go ecosystem. Sorry for raising extra alarm bells unnecessarily.

minux

Jan 5, 2015, 11:56:57 PM
to bdtex...@gmail.com, golang-nuts, Dan Kortschak
On Mon, Jan 5, 2015 at 11:01 PM, <bdtex...@gmail.com> wrote:
> Sorry, I used the 'reply all' button on the thread (that started 2 years ago). It's not surprising behavior based on the Go spec, nor any form of bug. I was just trying to note for future readers that while []float64 are allocated to be useful for SSE, as a package designer I cannot assume that a []float64 coming into my function is so aligned. I thought it would be a useful note to people considering SSE who are looking for possible implications with the rest of the Go ecosystem. Sorry for raising extra alarm bells unnecessarily.

OK, now I understand your points more clearly.

Essentially, you are saying that if you have a function taking args []float64, you can't assume
&args[0] is aligned to more than 8-byte.

Yes, that's true, and if you want to use SSE, either use unaligned loads always, or detect
alignment and write code to handle both alignment conditions.
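
As a rough sketch of the "detect alignment and handle both conditions" option, the Go code below peels off a scalar prologue until it reaches a 16-byte boundary, processes the aligned middle two elements at a time, and finishes with a scalar tail. The paired loop is a pure-Go stand-in for what would be an aligned SSE2 kernel in assembly; the names alignedStart and sum are made up here, and the code assumes float64 data is at least 8-byte aligned (as on amd64).

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignedStart returns the index of the first element of xs that lies on a
// 16-byte boundary: 0 or 1 when the data is 8-byte aligned.
func alignedStart(xs []float64) int {
	if len(xs) == 0 || uintptr(unsafe.Pointer(&xs[0]))%16 == 0 {
		return 0
	}
	return 1
}

// sum adds the elements of xs using a prologue/body/tail split.
func sum(xs []float64) float64 {
	i := alignedStart(xs)
	total := 0.0
	for _, x := range xs[:i] { // scalar prologue up to the boundary
		total += x
	}
	body := xs[i:]
	n := len(body) &^ 1 // whole pairs only
	var lo, hi float64  // stand-ins for the two lanes of an SSE2 register
	for j := 0; j < n; j += 2 {
		lo += body[j]
		hi += body[j+1]
	}
	total += lo + hi
	for _, x := range body[n:] { // scalar tail for a leftover element
		total += x
	}
	return total
}

func main() {
	buf := []float64{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(sum(buf), sum(buf[1:]))
}
```

Always using unaligned loads, the other option mentioned above, avoids the split entirely at the cost of whatever penalty unaligned access carries on the target CPU.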

I think it's the case for other languages, though. (e.g. in C/C++, if you get a float64 *x, you
can't assume x is properly aligned to 16-byte. In fact, the problem is even worse in C/C++,
because technically, malloc doesn't need to guarantee 16-byte alignment if the maximum
alignment for C types is only 8-byte; but the memory allocator in Go does guarantee 16-byte
alignment if the object is > 8 bytes but <= 16 bytes, and 32-byte alignment if the object is
larger than 16 bytes)

Ian Lance Taylor

Jan 6, 2015, 10:27:30 AM
to minux, bdtex...@gmail.com, golang-nuts, Dan Kortschak
On Mon, Jan 5, 2015 at 8:56 PM, minux <mi...@golang.org> wrote:
>
> I think it's the case for other languages, though. (e.g. in C/C++, if you get a float64 *x, you
> can't assume x is properly aligned to 16-byte. In fact, the problem is even worse in C/C++,
> because technically, malloc doesn't need to guarantee 16-byte alignment if the maximum
> alignment for C types is only 8-byte; but the memory allocator in Go does guarantee 16-byte
> alignment if the object is > 8 bytes but <= 16 bytes, and 32-byte alignment if the object is
> larger than 16 bytes)

All true but unimportant in practice. People who want to write their
own vectorized code in C use the aligned attribute on their variables
and use posix_memalign to dynamically allocate memory. Compilers like
GCC that support auto-vectorization force alignment of variables as
required. When alignment can not be forced, GCC generates code to
test the alignment at runtime and use the auto-vectorized path only
when possible.

Ian