[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

235 views
Skip to first unread message

Liu, Chen3 via llvm-dev

unread,
Apr 14, 2021, 2:07:03 PM4/14/21
to llvm...@lists.llvm.org, Luo, Yuanke, Maslov, Sergey V

Hi all.

 

We want to make a patch to always emit unaligned vector move instructions on AVX machine with option control. We do this for the following reason:

 

  1. With AVX the performance for aligned vector move and unaligned vector move on X86 are the same if the address is aligned. In this case we prefer to use unaligned move because it can avoid some run time exceptions;
  2. This fixes an inconsistency in optimization: suppose a load operation was merged into another instruction (e.g., load and add becomes `add [memop]'). If a misaligned pointer is passed to the two-instruction sequence, it will

raise an exception. If the same pointer is passed to the memop instruction, it will work. Thus, the behavior of misalignment depends upon what optimization levels and passes are applied, and small source changes could cause

issues to appear and disappear. It's better for the user to consistently use unaligned load/store to improve the debug experience;

  1. Makes good use of HW that is capable of handling misaligned data gracefully. It is not necessarily a bug in users code but a third-part library. For example it would allow using a library built in old ages where stack alignment was 4-byte only.
  2. Compatible with ICC so that users can easily use llvm;

 

Roman Lebedev is worried that this patch will hide UB. In our opinions, UB doesn't have to mean raise an exception. The example code(https://godbolt.org/z/43bYPraoa) does have UB behavior but it is still valid (and reasonable) to interpret that UB as `go slower',

instead of `raise exception'. Besides, as default we still emit aligned instructions as before,  but we provide an option for users with this need.

 

We have two patches discussing this issue, one of which has been abandoned:

https://reviews.llvm.org/D88396 (abandoned)

https://reviews.llvm.org/D99565 (in review)

 

Thanks.

Chen Liu.

James Y Knight via llvm-dev

unread,
Apr 14, 2021, 2:58:24 PM4/14/21
to Liu, Chen3, llvm...@lists.llvm.org, Luo, Yuanke, Maslov, Sergey V
This is not a principled change -- it avoids a problem arising from one use of alignment information, but there are other uses of alignment in LLVM, and those will still cause problems, potentially less clearly. So, I think that this will not be a useful option to provide to users, in this form.

What I suspect you actually want here is an option to tell Clang not to infer load/store alignments based on object types or alignment attributes -- instead treating everything as being potentially aligned to 1 unless the allocation is seen (e.g. global/local variables). Clang would still need to use the usual alignment computation for variable definitions and structure layout, but not memory operations. If clang emits "load ... align 1" instructions in LLVM IR, the right thing would then happen in the X86 backend automatically.

My initial inclination is that this feature is also not particularly worthwhile to implement, but I'm open to being convinced that this is indeed valuable enough to be worthwhile. It should actually work reliably, and is somewhat in line with other such "not-quite-C" flags we provide (e.g. -fno-delete-null-pointer-checks). Of course, even with such an implementation, you can still have a problem with user code depending on alignof() returning a reliable answer (e.g., llvm::PointerUnion). Not much can be done about that.


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Craig Topper via llvm-dev

unread,
Apr 14, 2021, 3:42:05 PM4/14/21
to James Y Knight, llvm...@lists.llvm.org, Liu, Chen3, Luo, Yuanke, Maslov, Sergey V
When would you recommend a user should use this flag as proposed? Anytime they moved code from icc? Or after they encounter an exception, should they use this flag to get rid of the exception rather than using tools like ubsan to find the bug in their code? Where would we document recommendations so that users know the tradeoffs and risks?

Your patch is only doing this for AVX and not SSE because folding loads requires alignment with SSE, but not AVX. So users that need to support non-AVX CPUs have to fix their bugs and can't sweep them away with this flag.

I'm somewhat ok with unconditionally using unaligned instructions on AVX because then there is no command line option to explain to users. But then there's probably a group of people out there that want the alignment check.

~Craig

Philip Reames via llvm-dev

unread,
Apr 14, 2021, 6:43:56 PM4/14/21
to James Y Knight, Liu, Chen3, llvm...@lists.llvm.org, Luo, Yuanke, Maslov, Sergey V

+1 to what James said.  My reaction to the original proposal is a strong -1, and James did a good job of explaining why.

Philip

Liu, Chen3 via llvm-dev

unread,
Apr 15, 2021, 4:43:43 AM4/15/21
to James Y Knight, llvm...@lists.llvm.org, Luo, Yuanke, Maslov, Sergey V

Hi, James Y Knight.

 

I'm not sure if you misunderstood this patch. This patch won’t change any alignment information in IR and MI, which means ‘load…align 32’ will always keep the alignment information but select ‘vmovups’ instead of ‘vmovaps’ during ISEL. It can be simply considered that the only thing this patch does is to replace the aligned-move mnemonic with the unaligned-move mnemonic (in fact, we shouldn’t call it replace but emit unaligned). I think there is no impact on optimization or code layout.

 

After discussion, we think this option more like changing the behavior when process with unaligned memory: raising exception or accepting performance degradation.  Maybe the option is more like “no-exception-on-unalginedmem”. We do have some users want this feature. They can accept “run slow” but do not want exception.

 

Thanks.

Chen Liu.

James Y Knight via llvm-dev

unread,
Apr 15, 2021, 9:09:55 AM4/15/21
to Liu, Chen3, llvm...@lists.llvm.org, Luo, Yuanke, Maslov, Sergey V
On Thu, Apr 15, 2021 at 4:43 AM Liu, Chen3 <chen...@intel.com> wrote:

Hi, James Y Knight.

 

I'm not sure if you misunderstood this patch. This patch won’t change any alignment information in IR and MI, which means ‘load…align 32’ will always keep the alignment information but select ‘vmovups’ instead of ‘vmovaps’ during ISEL. It can be simply considered that the only thing this patch does is to replace the aligned-move mnemonic with the unaligned-move mnemonic (in fact, we shouldn’t call it replace but emit unaligned). I think there is no impact on optimization or code layout.


Yes -- I understood that, and that is exactly why this patch is not OK. Giving LLVM incorrect information about the alignment of objects causes problems other than just the emission of movaps instructions -- that alignment information is correct gets relied upon throughout the optimization pipeline.

So, a command-line option to "fix" only that one instruction is not something which we can reasonably provide, because it will not reliably fix users' problems. A program which is being "mis"-compiled due to the use of misaligned objects might still be miscompiled by LLVM when using your proposed patch. ("mis" in quotes, since the compiler is correctly compiling the code according to the standard, even if not according to the user's expectations).

The second paragraph of my original email describes an alternative patch that you could write, which would reliably fix such miscompilation -- effectively creating a variant of C where creating and accessing misaligned objects has fully defined behavior. (And, just to reiterate, my initial feeling is that creating such an option is not a worthwhile endeavor, but I could be persuaded otherwise.)

Luo, Yuanke via llvm-dev

unread,
Apr 15, 2021, 10:08:06 AM4/15/21
to James Y Knight, Liu, Chen3, llvm...@lists.llvm.org, Maslov, Sergey V

Yes, replacing aligned move instruction with unaligned move instruction doesn’t solve all the issue that happens in optimization pipeline, but it doesn’t make things worse. One advantage for unaligned move is that it makes the behavior the same no matter the mov instruction is folded or not. Do you think it is worth to support this feature if compiler can help users avoid changing their complex legacy code?

 

Thanks

Yuanke

Philip Reames via llvm-dev

unread,
Apr 15, 2021, 11:37:07 AM4/15/21
to Luo, Yuanke, James Y Knight, Liu, Chen3, llvm...@lists.llvm.org, Maslov, Sergey V

IMO, no.  We should encourage sanitizers instead.

From experience, any code base where porting trips across this probably also has a bunch of other undefined behavior which is causing less obvious miscompiles, and also need found and fixed.  That's why we have sanitizers. 

Philip

via llvm-dev

unread,
Apr 15, 2021, 11:55:07 AM4/15/21
to yuank...@intel.com, jykn...@google.com, chen...@intel.com, llvm...@lists.llvm.org, sergey....@intel.com

I’ve debated whether to chime in, and decided it can’t hurt.

 

Sony had to do a similar downstream patch for PS4.  Our use-case is pretty constrained, though.  There’s only one toolchain, there’s only one target chip, so we don’t have any portability considerations to think about.  What we do have are games shipping on DVD that can’t be re-released and can’t even necessarily be patched, and a strict backward compatibility requirement.  So, if there’s a game out there that didn’t happen to follow all the alignment requirements, and it worked on console version 1.00, it still has to be working on version 100.00.  (FTR, we’re currently on about 8.00.)

 

I don’t think we ever seriously considered upstreaming our patch.  The circumstances where it’s really necessary do exist, but are pretty limited.

 

I don’t think arguments of the form “it’s okay because X Y and Z” are going to be persuasive.  “We have this situation in the following circumstances” might help people understand.

--paulr

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Luo, Yuanke via llvm-dev
Sent: Thursday, April 15, 2021 10:07 AM
To: James Y Knight <jykn...@google.com>; Liu, Chen3 <chen...@intel.com>
Cc: llvm...@lists.llvm.org; Maslov, Sergey V <sergey....@intel.com>
Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

 

Yes, replacing aligned move instruction with unaligned move instruction doesn’t solve all the issue that happens in optimization pipeline, but it doesn’t make things worse. One advantage for unaligned move is that it makes the behavior the same no matter the mov instruction is folded or not. Do you think it is worth to support this feature if compiler can help users avoid changing their complex legacy code?

 

Thanks

Yuanke

 

Craig Topper via llvm-dev

unread,
Apr 15, 2021, 12:12:19 PM4/15/21
to Robinson, Paul, Luo, Yuanke, Liu, Chen3, Maslov, Sergey V, llvm-dev
Paul, was your patch attached to a command line option, if so what was the default, or did you just always use unaligned instructions?

~Craig


_______________________________________________

James Y Knight via llvm-dev

unread,
Apr 15, 2021, 12:23:45 PM4/15/21
to Luo, Yuanke, llvm...@lists.llvm.org, Liu, Chen3, Maslov, Sergey V
I believe strongly that we should not add an option which makes it sound like it makes unaligned access work, when we know for a fact that optimization passes make use of the alignment information and will also break such misaligned-object-using code. Worse, we also can predict that even more such optimizations will be added in future versions of llvm, and break such code more. Offering such an option which seems like it would do what they want, but which doesn't actually, is a perfect recipe for creating unhappy users.

That's why I've been saying over and over that if we do end up providing some "make unaligned access work" option, it needs to make it actually work, reliably, both now and in the future.

via llvm-dev

unread,
Apr 15, 2021, 12:27:58 PM4/15/21
to craig....@gmail.com, yuank...@intel.com, chen...@intel.com, sergey....@intel.com, llvm...@lists.llvm.org

We conditioned it on the PS4 target; no option.  So, PS4 consistently uses unaligned instructions for (temporal) vector load/store.

We get occasional downstream test failures because of this, which we deal with.

--paulr

Reid Kleckner via llvm-dev

unread,
Apr 15, 2021, 12:58:55 PM4/15/21
to James Y Knight, llvm...@lists.llvm.org, Liu, Chen3, Luo, Yuanke, Maslov, Sergey V
On Wed, Apr 14, 2021 at 11:58 AM James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
What I suspect you actually want here is an option to tell Clang not to infer load/store alignments based on object types or alignment attributes -- instead treating everything as being potentially aligned to 1 unless the allocation is seen (e.g. global/local variables). Clang would still need to use the usual alignment computation for variable definitions and structure layout, but not memory operations. If clang emits "load ... align 1" instructions in LLVM IR, the right thing would then happen in the X86 backend automatically.

This sounds like the -fmax-type-align flag:
Explicit alignment attributes are still honored, so some aligned vector instructions may be generated. However, the documentation describes essentially this exact use case.

via llvm-dev

unread,
Apr 15, 2021, 2:54:36 PM4/15/21
to r...@google.com, jykn...@google.com, yuank...@intel.com, chen...@intel.com, llvm...@lists.llvm.org, sergey....@intel.com

| This sounds like the -fmax-type-align flag:

 

Well, no, at least not for the PS4 case.  In our case, the type had an alignment attribute but the caller didn’t make sure the allocated memory was aligned properly.  The -fmax-type-align flag explicitly doesn’t do anything in that case, if I’m reading it correctly.  (Yes, it’s a bug.  Yes, sanitizers or other testing could have found it.  No, there is no opportunity to do any of the things that would have fixed it correctly.)

 

Really what we did was effectively this:  Pretend X86 doesn’t have a VMOVAPS opcode.  That’s all.  Nothing about memory/operand alignment attributes was modified, IR is unchanged.  Pretend that one machine opcode is missing.  Can’t possibly affect anything about IR optimizations, *maybe* something post-ISel would be different but even that is hard to imagine.  (As best I can remember, the only test updates we had to make were to change things like “vmovaps” to “vmov{{u|a}}ps” and done.)  It’s like we did s/movaps/movups/g on the assembly output.

 

I still can’t say I think it should be appropriate to do upstream—no real info yet on Intel’s problem case--but I hope this explains why the bigger hammer (i.e., get Clang involved) doesn’t seem necessary or appropriate.

--paulr

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Reid Kleckner via llvm-dev
Sent: Thursday, April 15, 2021 12:59 PM
To: James Y Knight <jykn...@google.com>
Cc: llvm...@lists.llvm.org; Liu, Chen3 <chen...@intel.com>; Luo, Yuanke <yuank...@intel.com>; Maslov, Sergey V <sergey....@intel.com>
Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

 

On Wed, Apr 14, 2021 at 11:58 AM James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:

Reid Kleckner via llvm-dev

unread,
Apr 15, 2021, 4:09:41 PM4/15/21
to Robinson, Paul, Luo, Yuanke, Liu, Chen3, Maslov, Sergey V, llvm-dev
Right, I get that this doesn't match what you are doing for PS4, and it doesn't match what Chen3 Liu proposed. To James's point, the -fmax-type-align flag is more principled because it powers down all the other LLVM optimizations that assume aligned pointers have zeros in the low bits.

As for how to handle explicit alignment attributes that don't come from type information, maybe we could revisit that behavior, or conditionalize it with a flag. I just mean to say that there is prior art for this direction. We should continue in the direction of a complete solution from the frontend, rather than adding a workaround in the backend.

Craig Topper via llvm-dev

unread,
Apr 15, 2021, 4:50:50 PM4/15/21
to Reid Kleckner, Luo, Yuanke, Liu, Chen3, llvm-dev, Maslov, Sergey V
What if we didn't use aligned instructions by default like what PS4 did. And then had a command line option that would "enable alignment exceptions" if someone wants them. Maybe that option should also disable memory folding since memory folding never checks alignment with AVX? Do other targets that have vectors have alignment exceptions like this? We're not obligated to emit code that detects alignment errors. And we already don't if the load gets folded. It seems the problem with the current proposal is that once you have the exception, setting a flag to make it go away is the wrong response.

~Craig


_______________________________________________

via llvm-dev

unread,
Apr 15, 2021, 6:03:39 PM4/15/21
to craig....@gmail.com, r...@google.com, yuank...@intel.com, chen...@intel.com, llvm...@lists.llvm.org, sergey....@intel.com

Reid, I’m not clear why anyone would want to “power down” the alignment-aware optimizations?  How does that benefit anyone?  For example…

 

Let’s postulate a target that has only non-trapping load/store instructions; maybe they go faster on aligned addresses, but don’t trap on unaligned addresses.  It has been a few decades but I think VAX worked this way.

Would you insist we should power-down the alignment-aware optimizations for this target?  Just because the hardware couldn’t require aligned data?  I hope not.

 

The conclusion must be, then, that there is no relationship between the existence of trapping/non-trapping instruction behavior for a given target, and how the frontend and middle-end should behave.

 

Therefore, we can’t insist on the front-end slapping “align 1” on everything just because the target doesn’t trap a non-aligned load.

 

Therefore, the choice of trapping/non-trapping instruction behavior in the X86 target specifically, has no necessary relationship to how alignment is thought of in the front-end/middle-end.

 

HTH,

--paulr

Maslov, Sergey V via llvm-dev

unread,
Apr 15, 2021, 6:38:35 PM4/15/21
to Reid Kleckner, Robinson, Paul, Luo, Yuanke, Liu, Chen3, llvm-dev

Before we completely float away from adding this to LLVM, consider that MSVC behavior is already like ICC: https://godbolt.org/z/o4eaqGv9v

And GCC folks are saying they could add an option for compatibility.

The Intel’s cases where this is needed is exactly for robustness of interoperability with the already released software.

 

From: Reid Kleckner <r...@google.com>
Sent: Thursday, April 15, 2021 1:09 PM
To: Robinson, Paul <paul.r...@sony.com>

Cc: James Y Knight <jykn...@google.com>; Liu, Chen3 <chen...@intel.com>; Luo, Yuanke <yuank...@intel.com>; Maslov, Sergey V <sergey....@intel.com>; llvm-dev <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

James Y Knight via llvm-dev

unread,
Apr 16, 2021, 5:35:59 PM4/16/21
to Robinson, Paul, Liu, Chen3, Luo, Yuanke, llvm-dev, Maslov, Sergey V
On Thu, Apr 15, 2021, 6:03 PM via llvm-dev <llvm...@lists.llvm.org> wrote:

Reid, I’m not clear why anyone would want to “power down” the alignment-aware optimizations?  How does that benefit anyone?  For example…

 

Let’s postulate a target that has only non-trapping load/store instructions; maybe they go faster on aligned addresses, but don’t trap on unaligned addresses.  It has been a few decades but I think VAX worked this way.

Would you insist we should power-down the alignment-aware optimizations for this target?  Just because the hardware couldn’t require aligned data?  I hope not.


The conclusion must be, then, that there is no relationship between the existence of trapping/non-trapping instruction behavior for a given target, and how the frontend and middle-end should behave.

 

Therefore, we can’t insist on the front-end slapping “align 1” on everything just because the target doesn’t trap a non-aligned load.


Certainly, it's entirely valid for a target to not trap on an unaligned load. We have many such targets. A target trapping on misaligned loads isn't a required feature. (If users want to reliably diagnose misalignment bugs, -fsanitize=alignment is the way to do so.)

Therefore, the choice of trapping/non-trapping instruction behavior in the X86 target specifically, has no necessary relationship to how alignment is thought of in the front-end/middle-end.


If the proposal here had been: "We should switch X86 from using movaps (alignment-checking) to movups (non-alignment-checking), because movups has a smaller encoding size (or is faster to execute on new microarchitectures, or ...), there'd be no problem.

But, that is not what's being proposed here. This proposal is to switch to movups as a workaround for software that has undefined behavior due to misaligned objects. That is misguided, because the proposed change does not fix such code! That the movaps instruction traps in such programs is like a proverbial "canary in a coal mine". It's a result of your program containing alignment-related UB. Removing the canary prevents you from having a dead canary, but it doesn't prevent the mine from exploding.

I have the feeling folks aren't understanding what exactly I'm talking about w.r.t. alignment-related breakage. There's at least three things LLVM can do with alignment information today.
1. Most obviously, it allows generation of hardware load instructions that require a certain alignment (MOVAPS on X86, LDM on ARM, etc.).
2. It enables known-bits analysis on pointers: "ptr & 0x3" is optimized to 0 if ptr is known to have alignment >= 4. Example: `int foo(int& x) { return ((uintptr_t)&x) & 0x3; }` 
3. It can assist with alias analysis: if both addr1 and addr2 have align 8, then a 4-byte load from (addr1 + 0) cannot possibly alias a 4-byte load from (addr2 + 4). This is true even without TBAA, and even if know nothing else about the relationship between addr1 and addr2. (I don't have an example of this -- it looks like llvm may not be doing as good a job here as it could, but I definitely recall reading code which purported to implement this.)

The initial proposal only addresses the first issue, leaving users who depend on this are in an extremely precarious position -- liable to be broken by any future optimization improvement.

James Y Knight via llvm-dev

unread,
Apr 16, 2021, 5:59:48 PM4/16/21
to Reid Kleckner, llvm...@lists.llvm.org, Liu, Chen3, Luo, Yuanke, Maslov, Sergey V
Wow, thanks! Somehow I've missed that this flag has existed all this time. ISTM that it would be reasonable to modify -fmax-type-align to override even an explicit alignment attribute on the type (or typedef).

It looks like -fmax-type-align is barely used in the wild, except that -fmax-type-align=16 is _default_ for Darwin platforms (since commit bcd82afad64a22b15000de350d075b10f2de273a). It's unclear to me what purpose that default is really serving, however, given that the only types with greater "native" alignment than 16 are vector types, and typically used vector typedefs already have an alignment specified, such as `typedef float __m256 __attribute__ ((__vector_size__ (32), __aligned__(32)));`. So the most-commonly-used vector types are exempted from the effect of the flag, anyways...

Wang, Pengfei via llvm-dev

unread,
Apr 16, 2021, 10:23:26 PM4/16/21
to James Y Knight, Robinson, Paul, Luo, Yuanke, Liu, Chen3, llvm-dev, Maslov, Sergey V
  • If the proposal here had been: "We should switch X86 from using movaps (alignment-checking) to movups (non-alignment-checking), because movups has a smaller encoding size (or is faster to execute on new microarchitectures, or ...), there'd be no problem. … This proposal is to switch to movups as a workaround for software that has undefined behavior due to misaligned objects.

I think we can consider movaps is a limitation for legacy microarchitectures which gets better performance for aligned memory load/store. It does to be feature for new microarchitectures, i.e. movups is faster to execute on new microarchitectures when aligned.

  • The initial proposal only addresses the first issue, leaving users who depend on this are in an extremely precarious position -- liable to be broken by any future optimization improvement.

If users depend on exceptions on alignment tricks, they should explicitly use proposed option like “-exception-on-unalginedmem”, which is not only keep to use movaps but also block existing memory folding. Does it make more sense?

 

Thanks

Pengfei

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of James Y Knight via llvm-dev
Sent: Saturday, April 17, 2021 5:35 AM
To: Robinson, Paul <paul.r...@sony.com>

via llvm-dev

unread,
Apr 19, 2021, 10:48:10 AM4/19/21
to jykn...@google.com, chen...@intel.com, yuank...@intel.com, llvm...@lists.llvm.org, sergey....@intel.com
|
> If the proposal here had been: "We should switch X86 from using
> movaps (alignment-checking) to movups (non-alignment-checking),
> because movups has a smaller encoding size (or is faster to
> execute on new microarchitectures, or ...), there'd be no problem.
>
> But, that is not what's being proposed here. This proposal is to
> switch to movups as a workaround for software that has undefined
> behavior due to misaligned objects. That is misguided, because
> the proposed change does not fix such code! That the movaps
> instruction traps in such programs is like a proverbial "canary
> in a coal mine". It's a result of your program containing
> alignment-related UB. Removing the canary prevents you from
> having a dead canary, but it doesn't prevent the mine from
> exploding.

Hi James,

It's apparent from your reply that you misunderstand one thing:
The mine has *already* exploded.

I still don't know exactly what Intel is facing, but at Sony we
have games already shipped that CANNOT BE FIXED because they are
embedded in DVD. It is literally physically impossible to fix the
buggy software, and we have a moral contract with users that their
games will continue to run on all future releases of the console.

I understand your goal is to find and fix bugs in software that is
still under development and CAN be fixed. I fully endorse that
goal. However, that is not the situation that Sony has, and likely
not what Intel has. Your proposal will NOT solve our problem.

HTH,
--paulr

Roman Lebedev via llvm-dev

unread,
Apr 19, 2021, 10:58:24 AM4/19/21
to Robinson, Paul, Yuanke, chen...@intel.com, sergey....@intel.com, llvm...@lists.llvm.org
On Mon, Apr 19, 2021 at 5:48 PM via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> |
> > If the proposal here had been: "We should switch X86 from using
> > movaps (alignment-checking) to movups (non-alignment-checking),
> > because movups has a smaller encoding size (or is faster to
> > execute on new microarchitectures, or ...), there'd be no problem.
> >
> > But, that is not what's being proposed here. This proposal is to
> > switch to movups as a workaround for software that has undefined
> > behavior due to misaligned objects. That is misguided, because
> > the proposed change does not fix such code! That the movaps
> > instruction traps in such programs is like a proverbial "canary
> > in a coal mine". It's a result of your program containing
> > alignment-related UB. Removing the canary prevents you from
> > having a dead canary, but it doesn't prevent the mine from
> > exploding.
>
> Hi James,
>
> It's apparent from your reply that you misunderstand one thing:
> The mine has *already* exploded.

> I still don't know exactly what Intel is facing, but at Sony we
> have games already shipped that CANNOT BE FIXED because they are
> embedded in DVD. It is literally physically impossible to fix the
> buggy software, and we have a moral contract with users that their
> games will continue to run on all future releases of the console.

Are they JIT'ed? If not, i'm not really sure how this change
to the X86 backend would retroactively "fix" already-compiled code.

> I understand your goal is to find and fix bugs in software that is
> still under development and CAN be fixed. I fully endorse that
> goal. However, that is not the situation that Sony has, and likely
> not what Intel has. Your proposal will NOT solve our problem.
>
> HTH,
> --paulr

Roman

via llvm-dev

unread,
Apr 19, 2021, 11:02:36 AM4/19/21
to lebed...@gmail.com, yuank...@intel.com, chen...@intel.com, sergey....@intel.com, llvm...@lists.llvm.org
> > Hi James,
> >
> > It's apparent from your reply that you misunderstand one thing:
> > The mine has *already* exploded.
>
> > I still don't know exactly what Intel is facing, but at Sony we
> > have games already shipped that CANNOT BE FIXED because they are
> > embedded in DVD. It is literally physically impossible to fix the
> > buggy software, and we have a moral contract with users that their
> > games will continue to run on all future releases of the console.
> Are they JIT'ed? If not, i'm not really sure how this change
> to the X86 backend would retroactively "fix" already-compiled code.

No; the actual problem is that buggy game code uses a type that is
tagged as 32-byte aligned but allocates data that is 16-byte aligned.
The problem is when the (immutable) game calls (updated) system
software that expects 32-byte alignment, and doesn't get it.

The backend change allows our system software not to trap on the
misaligned data that the caller gives to it.
--paulr

Luo, Yuanke via llvm-dev

unread,
Apr 19, 2021, 11:08:52 AM4/19/21
to paul.r...@sony.com, lebed...@gmail.com, llvm...@lists.llvm.org, Liu, Chen3, Towner, Daniel, Maslov, Sergey V
So the application software is unchangeable, right?

via llvm-dev

unread,
Apr 19, 2021, 11:20:39 AM4/19/21
to yuank...@intel.com, lebed...@gmail.com, llvm...@lists.llvm.org, chen...@intel.com, daniel...@intel.com, sergey....@intel.com
> So the application software is unchangeable, right?

Exactly right. The application software works fine within itself.

The system software, which we update roughly twice a year, also
accepted the misaligned data, until Clang was modified to emit the
aligned (trapping) opcodes. We had to fix that so the system
software would continue to allow the (buggy but unchangeable)
application software to continue to work.

Yes, it is indeed the case that we can update the system software
but not the game software. I think it would be a distraction to
spell out the scenarios but please accept that it is the case.

Philip Reames via llvm-dev

unread,
Apr 19, 2021, 12:22:00 PM4/19/21
to paul.r...@sony.com, yuank...@intel.com, lebed...@gmail.com, llvm...@lists.llvm.org, chen...@intel.com, daniel...@intel.com, sergey....@intel.com

On 4/19/21 8:20 AM, via llvm-dev wrote:
>> So the application software is unchangeable, right?
> Exactly right. The application software works fine within itself.
>
> The system software, which we update roughly twice a year, also
> accepted the misaligned data, until Clang was modified to emit the
> aligned (trapping) opcodes. We had to fix that so the system
> software would continue to allow the (buggy but unchangeable)
> application software to continue to work.
>
> Yes, it is indeed the case that we can update the system software
> but not the game software. I think it would be a distraction to
> spell out the scenarios but please accept that it is the case.

Out of curiosity, why do you solve this in the backend rather than
patching your frontend/headers to not specify alignment?  To my
knowledge (which isn't great in this area), alignment of pointer
arguments isn't part of the ABI.  Wouldn't tweaking the headers and
simply recompiling your system libraries get you the same effect?

p.s. The more you explain about your use case, the less motivating I
find it for upstream.  This sounds like a weird situation you've created
for yourselves and should bear the cost of maintaining the mitigation
for.  Just as other downstream distributions do for other issues.  Just
my 2 cents.

via llvm-dev

unread,
Apr 19, 2021, 1:18:54 PM4/19/21
to list...@philipreames.com, yuank...@intel.com, lebed...@gmail.com, llvm...@lists.llvm.org, chen...@intel.com, daniel...@intel.com, sergey....@intel.com
> >> So the application software is unchangeable, right?
> > Exactly right. The application software works fine within itself.
> >
> > The system software, which we update roughly twice a year, also
> > accepted the misaligned data, until Clang was modified to emit the
> > aligned (trapping) opcodes. We had to fix that so the system
> > software would continue to allow the (buggy but unchangeable)
> > application software to continue to work.
> >
> > Yes, it is indeed the case that we can update the system software
> > but not the game software. I think it would be a distraction to
> > spell out the scenarios but please accept that it is the case.
>
> Out of curiosity, why do you solve this in the backend rather than
> patching your frontend/headers to not specify alignment?  To my
> knowledge (which isn't great in this area), alignment of pointer
> arguments isn't part of the ABI.  Wouldn't tweaking the headers and
> simply recompiling your system libraries get you the same effect?

This is about the alignment of the pointed-to data... surely that is
a requirement that must be preserved? I don't *think* the ABI says
anything to indicate pointed-to data cannot have alignment more than X.

I am not familiar with the component that is using this 32-byte-aligned
data structure, and can't say whether modifying it would be appropriate.
Toolchain was asked to address it; perhaps we could have pushed back,
but we didn't.

> p.s. The more you explain about your use case, the less motivating I
> find it for upstream.  This sounds like a weird situation you've created
> for yourselves and should bear the cost of maintaining the mitigation
> for.  Just as other downstream distributions do for other issues.  Just
> my 2 cents.

And in fact we decided not to upstream it, and we've lived with it (and
its fallout) for years. The only reason I brought it up was because the
Intel folks were suggesting the same thing, and I thought it might be
worthwhile to describe motivating cases from Sony.

I would love to hear Intel's motivation, because it might be something
very different that would change all our minds.

James Y Knight via llvm-dev

unread,
Apr 19, 2021, 2:30:49 PM4/19/21
to Robinson, Paul, Luo, Yuanke, Liu, Chen3, daniel...@intel.com, llvm-dev, Maslov, Sergey V

I understand your goal is to find and fix bugs in software that is
still under development and CAN be fixed.  I fully endorse that
goal.  However, that is not the situation that Sony has, and likely
not what Intel has.  Your proposal will NOT solve our problem.

No, that's not it at all! I'm afraid you've totally misunderstood my concern.

My goal is that if we add a compiler feature to address this problem -- so that you can compile code with under-aligned objects, and have it work as the author expected it to --  that the feature reliably addresses the problem, and makes such code no longer exhibit Undefined Behavior. The proposed backend change does not accomplish that, but we can implement a feature which will.

As Reid said, -fmax-type-align=N appears to be almost that feature, and something like this little patch (along with documentation update) may be all that's needed (but this is totally untested).

diff --git clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.cpp
index b23d995683bf..3aef166a690e 100644
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -6280,8 +6280,7 @@ CharUnits CodeGenModule::getNaturalTypeAlignment(QualType T,
   // Cap to the global maximum type alignment unless the alignment
   // was somehow explicit on the type.
   if (unsigned MaxAlign = getLangOpts().MaxTypeAlign) {
-    if (Alignment.getQuantity() > MaxAlign &&
-        !getContext().isAlignmentRequired(T))
+    if (Alignment.getQuantity() > MaxAlign)
       Alignment = CharUnits::fromQuantity(MaxAlign);
   }
   return Alignment;


James Y Knight via llvm-dev

unread,
Apr 19, 2021, 2:33:51 PM4/19/21
to Robinson, Paul, Luo, Yuanke, Liu, Chen3, daniel...@intel.com, llvm-dev, Maslov, Sergey V
(And while I did initially also express skepticism about whether this feature is necessary at all -- even if properly implemented -- I think the fact that Intel, Apple, and Sony have all proposed or implemented something along these lines is good evidence that it is sufficiently useful. I'm convinced on that point!)

via llvm-dev

unread,
Apr 19, 2021, 4:42:44 PM4/19/21
to jykn...@google.com, yuank...@intel.com, chen...@intel.com, daniel...@intel.com, llvm...@lists.llvm.org, sergey....@intel.com

We might still not be fully understanding one another, because this:

so that you can compile code with under-aligned objects, and have it work as the author expected it to

sounds like you’re expecting us to recompile the client code that creates the under-aligned objects.  That is literally not possible.  If you do understand that part, great, it’s just not obvious to me from how you’re phrasing things.

 

I (still) don’t know what Intel is facing.  For Sony’s problem, we would be much more likely to try to do something specific to the APIs that are being abused, rather than something draconian like eliminating alignment requirements for everyone.  But of course we have a solution that works for us, so there’s that much more inertia to overcome.

--paulr

 

From: James Y Knight <jykn...@google.com>

Sent: Monday, April 19, 2021 2:30 PM
To: Robinson, Paul <paul.r...@sony.com>

Luo, Yuanke via llvm-dev

unread,
Apr 19, 2021, 10:30:09 PM4/19/21
to paul.r...@sony.com, jykn...@google.com, llvm...@lists.llvm.org, Liu, Chen3, Towner, Daniel, Maslov, Sergey V

 

I collected the feedback/requirement from Intel customer as below.

 

Our software runs in an embedded environment and is processing buffers which are unaligned. Sometimes this misalignment is simply because the buffer allocation is beyond the immediate control of our software but  it can also be because we are processing blocks of data which are not multiples of the vector size (e.g., 6, 12 or 24). We can’t just fix our buffers to make them aligned. Our code is complicated and we support multiple instruction sets operating using the same algorithms by using templated code. For example:

 

template<typename DVEC_TYPE>

void doSomething(DVEC_TYPE* data)

{

  // Trivial example – reality would be something much more substantial, possibly with loops or other function calls.

  *data += 1.0f;

}

 

Note that we use dvec to help us abstract the ISA, but other similar header-only vector overloading libraries also exist.

 

We would then instantiate our function above multiple times for each ISA or data type we care about:

 

template void doSomething<float>(float* data); // Scalar type useful for debugging algorithm and doing basic testing

template void doSomething<F32vec8>(F32vec8* data); // Different AVX widths

template void doSomething<F32vec16>(F32vec16* data);

template void doSomething<I32vec16>(I32vec16* data); // Different element type

 

The functions are sufficiently large that we don’t want to have to write a different version for each ISA. We know that the incoming data may be mis-aligned and that accessing it directly is UB, so we could modify our code to explicitly handle misalignment. Something like:

 

template<typename DVEC_TYPE>

void doSomething(DVEC_TYPE* data)

{

  DVEC_TYPE t;

  loadu(t, data);

  t += 1.0f;

  storeu(data, t);

}

 

The code has become more verbose, less readable (maintainable, debuggable, etc), and it no longer works with plain scalar types which don’t have loadu/storeu defined unless we start defining overloaded helper functions. Also, if `data’ pointed at an array, we’d have to throw some pointer arithmetic into the mix, rather than just using plain `data[IDX]’ syntax. We can certainly write code which could cope with the misalignment explicitly but it just ends up becoming messy. Or, we could leverage the hardware to manage this misalignment for us letting the compiler emit the movups instruction, instead of movaps.  

 

Until now we have only been using the Intel Compiler, so we have written our code to use ICC’s unaligned operations and hardware support to make our code cleaner. We are looking at porting our code to LLVM, but LLVM is making this harder than it needs to be.

 

Thanks

Yuanke

Craig Topper via llvm-dev

unread,
Apr 19, 2021, 10:50:35 PM4/19/21
to Luo, Yuanke, llvm...@lists.llvm.org, Liu, Chen3, Towner, Daniel, Maslov, Sergey V
I don't think it's mentioned in Yuanke's mail. dvec.h is a header file that is included with icc that provides C++ wrapper classes around the SSE/AVX vector types that provide operator overloading.

~Craig


_______________________________________________

Roman Lebedev via llvm-dev

unread,
Apr 20, 2021, 3:27:54 AM4/20/21
to Luo, Yuanke, llvm...@lists.llvm.org, Liu, Chen3, Towner, Daniel, Maslov, Sergey V
On Tue, Apr 20, 2021 at 5:30 AM Luo, Yuanke <yuank...@intel.com> wrote:
>
>
>
> I collected the feedback/requirement from Intel customer as below.
>
>
>
> Our software runs in an embedded environment and is processing buffers which are unaligned. Sometimes this misalignment is simply because the buffer allocation is beyond the immediate control of our software but it can also be because we are processing blocks of data which are not multiples of the vector size (e.g., 6, 12 or 24). We can’t just fix our buffers to make them aligned. Our code is complicated and we support multiple instruction sets operating using the same algorithms by using templated code. For example:
>
>
>
> template<typename DVEC_TYPE>
>
> void doSomething(DVEC_TYPE* data)
>
> {
>
> // Trivial example – reality would be something much more substantial, possibly with loops or other function calls.
>
> *data += 1.0f;
>
> }
>
>
>
> Note that we use dvec to help us abstract the ISA, but other similar header-only vector overloading libraries also exist.
>
>
>
> We would then instantiate our function above multiple times for each ISA or data type we care about:
>
>
>
> template void doSomething<float>(float* data); // Scalar type useful for debugging algorithm and doing basic testing
>
> template void doSomething<F32vec8>(F32vec8* data); // Different AVX widths
>
> template void doSomething<F32vec16>(F32vec16* data);
>
> template void doSomething<I32vec16>(I32vec16* data); // Different element type
>
>
>
> The functions are sufficiently large that we don’t want to have to write a different version for each ISA. We know that the incoming data may be mis-aligned and that accessing it directly is UB, so we could modify our code to explicitly handle misalignment. Something like:
>
>
>
> template<typename DVEC_TYPE>
>
> void doSomething(DVEC_TYPE* data)
>
> {
>
> DVEC_TYPE t;
>
> loadu(t, data);
>
> t += 1.0f;
>
> storeu(data, t);
>
> }
>
>
>
> The code has become more verbose, less readable (maintainable, debuggable, etc), and it no longer works with plain scalar types which don’t have loadu/storeu defined unless we start defining overloaded helper functions. Also, if `data’ pointed at an array, we’d have to throw some pointer arithmetic into the mix, rather than just using plain `data[IDX]’ syntax. We can certainly write code which could cope with the misalignment explicitly but it just ends up becoming messy.

How about:
https://godbolt.org/z/vsj9raaqM

> Or, we could leverage the hardware to manage this misalignment for us letting the compiler emit the movups instruction, instead of movaps.

I guess people are intentionally ignoring all mentions that the code
will *still* be miscompiled in other ways.
That's sad.

> Until now we have only been using the Intel Compiler, so we have written our code to use ICC’s unaligned operations and hardware support to make our code cleaner. We are looking at porting our code to LLVM, but LLVM is making this harder than it needs to be.
>
>
>
> Thanks
>
> Yuanke

Roman

Reply all
Reply to author
Forward
0 new messages