[llvm-dev] [X86] How do I set just the low byte of an x86_64 register?


Mat Hostetter via llvm-dev

Feb 11, 2021, 8:29:52 AM
to llvm...@lists.llvm.org

I work on a compiler that uses LLVM for its back end. I'm interested in setting just the low byte of a register, leaving the other bits alone, for some GC tag bit shenanigans, e.g.:

 

long replace_low_byte_with_37(long* a) {
  return (*a & ~0xFFL) | 37;
}
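To make the intended semantics concrete, here is a small sketch of the same operation with a couple of spot checks. The helper name `set_low_byte` and the check function are hypothetical (they are not from the original post), and the example assumes a 64-bit `long` as on typical x86_64 Linux targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): replace the low 8 bits
   of `word` with `tag`, leaving the upper 56 bits untouched. */
static uint64_t set_low_byte(uint64_t word, uint8_t tag) {
    return (word & ~UINT64_C(0xFF)) | tag;
}

/* The function from the post, expressed through the helper. */
static long replace_low_byte_with_37(const long *a) {
    return (long)set_low_byte((uint64_t)*a, 37);
}

/* Spot checks: 37 is 0x25, so only the low byte changes. */
static void check_low_byte_examples(void) {
    long v = 0x1234;
    assert(replace_low_byte_with_37(&v) == 0x1225);

    long w = -1; /* all bits set */
    assert(replace_low_byte_with_37(&w) == (~0xFFL | 37));
}
```

Written this way, the operation is a read-modify-write of a single byte, which is the pattern a byte-sized register move would implement directly.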

 

x86_64 has a movb instruction that does exactly this, but I can't get clang (or any other compiler) to use movb for this purpose, even at -Os.

 

Here is the -Os -march=sandybridge compiler output for gcc-10.2, icc-21.1.9, and clang-11.0.1 (all different!), as well as how a simple movb assembles:

 

0000000000000000 <gcc>:
   0:   48 8b 07                mov    (%rdi),%rax
   3:   30 c0                   xor    %al,%al
   5:   48 83 c8 25             or     $0x25,%rax
   9:   c3                      retq

000000000000000a <icc>:
   a:   48 8b 07                mov    (%rdi),%rax
   d:   48 25 00 ff ff ff       and    $0xffffffffffffff00,%rax
  13:   48 83 c0 25             add    $0x25,%rax
  17:   c3                      retq

0000000000000018 <clang>:
  18:   48 c7 c0 00 ff ff ff    mov    $0xffffffffffffff00,%rax
  1f:   48 23 07                and    (%rdi),%rax
  22:   48 83 c8 25             or     $0x25,%rax
  26:   c3                      retq

0000000000000027 <simple_movb_by_hand>:
  27:   48 8b 07                mov    (%rdi),%rax
  2a:   b0 25                   mov    $0x25,%al
  2c:   c3                      retq

 

As you can see, movb would be the smallest at 6 bytes, versus 10 bytes for gcc, 14 for icc, and 15 for clang (LLVM's is the biggest). Size is important for my use case.

 

So why don't these compilers generate movb? Perhaps the concern is partial register stalls and how %rax and %al interact with the register renamer. As I understand the background from Peter Cordes referenced by #34707, the punchline is that since Sandy Bridge, and especially Skylake, the partial register stall is no big deal for an actual RMW operation like this.

 

But even on CPUs where there is a stall that's worse than the added instructions from not using movb, -Os should still prefer movb.

 

I'm not advocating using this for %ah (etc.), which is famously incorrect in some Skylake and Kaby Lake CPUs without a microcode patch.

 

Is there a way to get LLVM to generate movb to set just the low byte?
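One possible workaround, separate from anything LLVM's instruction selection does on its own, is GCC-style extended inline assembly, which both gcc and clang accept. The sketch below is an assumption-laden illustration rather than a recommended fix: the function names are hypothetical, the `%b0` operand modifier asks for the low-byte name of whichever register the compiler picks, and the asm statement is opaque to the optimizer (no constant folding, and the load cannot be merged into it). On non-x86_64 targets it falls back to the portable mask-and-or form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set the low byte of `word` to 37 (0x25). */
static uint64_t set_low_byte_37(uint64_t word) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "+r": `word` is both read and written, in a general-purpose register.
       %b0 prints that register's 8-bit name (e.g. %al for %rax).
       Writing an 8-bit register leaves the upper 56 bits unchanged. */
    __asm__("movb $37, %b0" : "+r"(word));
    return word;
#else
    /* Portable fallback with the same result. */
    return (word & ~UINT64_C(0xFF)) | 37;
#endif
}

/* Spot checks for both code paths. */
static void check_set_low_byte_37(void) {
    assert(set_low_byte_37(0x1234) == 0x1225);
    assert(set_low_byte_37(UINT64_C(0xFFFFFFFFFFFFFFFF)) ==
           UINT64_C(0xFFFFFFFFFFFFFF25));
}
```

Whether this ends up smaller in practice depends on the surrounding code, since the compiler can no longer reason about the value produced by the asm statement, and the partial-register considerations discussed above still apply.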

 

Tim Northover via llvm-dev

Feb 11, 2021, 1:13:59 PM
to Mat Hostetter, llvm...@lists.llvm.org
On Thu, 11 Feb 2021 at 13:29, Mat Hostetter via llvm-dev
<llvm...@lists.llvm.org> wrote:
> But even on CPUs where there is a stall that's worse than the added instructions from not using movb, -Os should still prefer movb.

It doesn't really change anything (we still don't do movb), but
Clang's real size optimization option is "-Oz". -Os is much closer to
-O2 with just a hint of caring about size.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
