[llvm-dev] x86: How to Force 2-byte `jmp` instruction in lowering


Dean Michael Berris via llvm-dev

Jun 22, 2016, 3:10:35 AM
to llvm-dev
I have a bit of a riddle:

In http://reviews.llvm.org/D19904 I'm trying to spell the following assembly:

  .p2align 2, 0x90
  jmp +0x9
  nopw 512(%rax,%rax,1)
  // rest of the code

I try the following snippet to accomplish this:

  OutStreamer->EmitLabel(CurSled);
  OutStreamer->EmitCodeAlignment(4);
  auto Target = OutContext.createLinkerPrivateTempSymbol();

  // Use a two-byte `jmp`. This version of JMP takes an 8-bit relative offset as
  // an operand (computed from the end of the jmp instruction).
  OutStreamer->EmitInstruction(
      MCInstBuilder(X86::JMP_1)
          .addExpr(MCSymbolRefExpr::create(Target, OutContext)),
      getSubtargetInfo());
  EmitNops(*OutStreamer, 9, Subtarget->is64Bit(), getSubtargetInfo());
  OutStreamer->EmitLabel(Target);

Which turns into:

.Lxray_sled_0:
  .p2align 2, 0x90
  jmp .Ltmp0
  nopw 512(%rax,%rax,1)
.Ltmp0:
  // rest of the code

Is there a way of forcing the lowered JMP instruction to turn into a two-byte jump that does a short relative jump (one that fits within 8 bits)? When I run the binary and disassemble the function I'm seeing it turn into a 5-byte jump (jmpq <32-bit offset>) instead of a 2-byte jump (jmp <8-bit offset>).
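
For context, the two encodings in question differ only in opcode and displacement width. The sketch below is a standalone illustration, not LLVM code; `encode_jmp` and its `force_near` parameter are hypothetical names, with `force_near` standing in for whatever makes the assembler pick the long form.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an x86 unconditional relative jump (hypothetical helper).
// disp is measured from the end of the jump instruction itself.
// force_near selects the 5-byte form even when the short form would fit.
std::vector<std::uint8_t> encode_jmp(std::int64_t disp, bool force_near = false) {
  if (!force_near && disp >= -128 && disp <= 127) {
    // Short form: opcode EB followed by a signed 8-bit displacement.
    return {0xEB, static_cast<std::uint8_t>(disp)};
  }
  // Near form: opcode E9 followed by a little-endian signed 32-bit displacement.
  assert(disp >= INT32_MIN && disp <= INT32_MAX);
  const auto u = static_cast<std::uint32_t>(disp);
  return {0xE9,
          static_cast<std::uint8_t>(u & 0xFF),
          static_cast<std::uint8_t>((u >> 8) & 0xFF),
          static_cast<std::uint8_t>((u >> 16) & 0xFF),
          static_cast<std::uint8_t>((u >> 24) & 0xFF)};
}
```

Under these assumptions, `encode_jmp(9)` yields the two bytes `eb 09`, while `encode_jmp(9, true)` yields `e9 09 00 00 00`.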

Thanks in advance!

Nirav Davé

Jun 22, 2016, 9:05:27 AM
to Dean Michael Berris, llvm-dev
This appears to work. Replace

auto Target = OutContext.createLinkerPrivateTempSymbol();

with 

auto Target = OutContext.createTempSymbol();

-Nirav


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Dean Michael Berris via llvm-dev

Jun 22, 2016, 12:14:29 PM
to Nirav Davé, llvm-dev
On Wed, Jun 22, 2016 at 6:05 AM Nirav Davé <nir...@google.com> wrote:
This appears to work. Replace

auto Target = OutContext.createLinkerPrivateTempSymbol();

with 

auto Target = OutContext.createTempSymbol();

-Nirav


Thanks Nirav -- I tried this, but I'm still getting a "jmpq <address>" with this incantation when I load and disassemble from gdb. I'm seeing a 5-byte jump, followed by the nops.

If I disassemble with llvm-objdump though I see the following:

_Z3foov:
  400c10:       e9 09 00 00 00  jmp     9 <_Z3foov+0xE>
  400c15:       66 0f 1f 84 00 00 02 00 00      nopw    512(%rax,%rax)

I'm not sure whether the extra 0's after '0xe9 0x09' are alignment padding (though I was expecting 0x90 to show up if this was an alignment issue).
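
One way to check what those bytes mean is to decode them by hand. In the sketch below (`DecodedJmp` and `decode_jmp` are made-up helper names, and only the two unconditional `jmp` opcodes are handled), opcode `0xe9` is followed by a 4-byte little-endian displacement, so the three zero bytes would be the upper bytes of the displacement 9 rather than padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DecodedJmp {
  std::size_t length;         // encoded size in bytes, 0 if not a recognised jmp
  std::int32_t displacement;  // signed offset from the end of the instruction
};

// Decode a relative jmp at the start of `bytes` (hypothetical helper).
DecodedJmp decode_jmp(const std::uint8_t *bytes, std::size_t size) {
  assert(bytes != nullptr);
  if (size >= 2 && bytes[0] == 0xEB) {
    // Short form: one signed displacement byte.
    return {2, static_cast<std::int8_t>(bytes[1])};
  }
  if (size >= 5 && bytes[0] == 0xE9) {
    // Near form: four displacement bytes, least significant first.
    const std::uint32_t u = static_cast<std::uint32_t>(bytes[1]) |
                            (static_cast<std::uint32_t>(bytes[2]) << 8) |
                            (static_cast<std::uint32_t>(bytes[3]) << 16) |
                            (static_cast<std::uint32_t>(bytes[4]) << 24);
    return {5, static_cast<std::int32_t>(u)};
  }
  return {0, 0};
}
```

Applied to the bytes at 400c10, this gives a 5-byte instruction with displacement 9, i.e. a target of 0x400c10 + 5 + 9 = 0x400c1e, which matches the `_Z3foov+0xE` shown by llvm-objdump.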

Is there anything else I can try here?

Thanks in advance!

Nirav Davé

Jun 22, 2016, 12:37:08 PM
to Dean Michael Berris, llvm-dev
Hmm. Odd. I just rebuilt from scratch and it seems to work with the test/CodeGen/X86/xray-attribute-instrumentation.ll test case outputting straight to obj:

   llc -filetype=obj -o ~/a.o -mtriple=x86_64-apple-macosx < test/CodeGen/X86/xray-attribute-instrumentation.ll

What test case are you using? 

In any case, the issue appears to be that llvm doesn't realize that the target address is resolved and erroneously applies branch relaxation to the jump. I don't know why a linker private symbol would make a difference. 

-Nirav


Dean Michael Berris via llvm-dev

Jun 22, 2016, 1:41:09 PM
to Nirav Davé, llvm-dev
Thanks Nirav,

I can confirm that this works when I do the compile with llc, but then when linking to an executable with clang (patched with http://reviews.llvm.org/D20352 and compiler-rt patched with http://reviews.llvm.org/D21612) on Linux, I'm getting something different. Here's a sample of the transcript, and what I'm seeing:

--->8 clang invocation 8<---
[16-06-23 3:33:42] dberris@dberris: ~/xray/llvm-build% ./bin/clang -fxray-instrument -x c++ -std=c++11 -o test.bin test.cc -g --verbose
clang version 3.9.0 (http://llvm.org/git/clang.git 3ae26ac8b1c9c5db65f3dc0236139448b8b0520a) (http://llvm.org/git/llvm.git 8fd5dd6aa8a633eeb03b245cd0060479371fc521)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/google/home/dberris/xray/llvm-build/./bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
 "/usr/local/google/home/dberris/xray/llvm-build/bin/clang-3.9" -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name test.cc -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -v -dwarf-column-info -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -fxray-instrument -resource-dir /usr/local/google/home/dberris/xray/llvm-build/bin/../lib/clang/3.9.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward -internal-isystem /usr/local/include -internal-isystem /usr/local/google/home/dberris/xray/llvm-build/bin/../lib/clang/3.9.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /usr/local/google/home/dberris/xray/llvm-build -ferror-limit 19 -fmessage-length 272 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-03d46e.o -x c++ test.cc
clang -cc1 version 3.9.0 based upon LLVM 3.9.0svn default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8
 /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
 /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward
 /usr/local/include
 /usr/local/google/home/dberris/xray/llvm-build/bin/../lib/clang/3.9.0/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
 "/usr/bin/ld" -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test.bin /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. -L/usr/local/google/home/dberris/xray/llvm-build/bin/../lib -L/lib -L/usr/lib -whole-archive /usr/local/google/home/dberris/xray/llvm-build/bin/../lib/clang/3.9.0/lib/linux/libclang_rt.xray-x86_64.a -no-whole-archive /tmp/test-03d46e.o --no-as-needed -lpthread -lrt -lm -latomic -ldl -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o
--->8 clang invocation 8<---

The test.cc is simply:

--->8 test.cc 8<---
#include <cstdio>
#include <cassert>

[[clang::xray_always_instrument]] void foo() { std::printf("Hello, XRay!\n"); }

void bar() { std::printf("Not instrumented\n"); }

extern "C" {
extern int __xray_patch();
}

int main(int argc, char* argv[]) {
  printf("main has started.\n");
  bar();
  foo();
  __xray_patch();
  foo();
}
--->8 test.cc 8<---

A snippet of the disassembly (llvm-objdump -disassemble test.bin) looks like:

--->8 disassembly 8<---
_Z3foov:
  400cb0:       e9 09 00 00 00  jmp     9 <_Z3foov+0xE>
  400cb5:       66 0f 1f 84 00 00 02 00 00      nopw    512(%rax,%rax)
  400cbe:       55      pushq   %rbp
  400cbf:       48 89 e5        movq    %rsp, %rbp
  400cc2:       48 83 ec 10     subq    $16, %rsp
  400cc6:       48 bf c5 0e 40 00 00 00 00 00   movabsq $4198085, %rdi
  400cd0:       b0 00   movb    $0, %al
  400cd2:       e8 a9 f9 ff ff  callq   -1623 <.plt+0x30>
  400cd7:       89 45 fc        movl    %eax, -4(%rbp)
  400cda:       48 83 c4 10     addq    $16, %rsp
  400cde:       5d      popq    %rbp
  400cdf:       c3      retq
  400ce0:       2e 66 0f 1f 84 00 00 02 00 00   nopw    %cs:512(%rax,%rax)
  400cea:       66 0f 1f 44 00 00       nopw    (%rax,%rax)

--->8 disassembly 8<---

Having looked at this a bit, I think you're right that the jumps are being relaxed, due to the -mrelax-all option being used by clang. The question becomes whether it's possible to inhibit relaxation for specific instructions at the LLVM level.

Cheers

Dean Michael Berris via llvm-dev

Jun 22, 2016, 4:36:55 PM
to Nirav Davé, Peter Collingbourne, llvm-dev
Peter suggested just writing out '.byte 0xeb, 0x09' and that allowed the jump instruction to bypass the relaxation, so that fixes my immediate problem. The question still stands though whether it should be possible to do through the instruction builder interface.
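
To make the arithmetic behind `.byte 0xeb, 0x09` concrete, here is a small standalone sketch (the helper names `make_sled` and `landing_offset` are invented for illustration, and the 9-byte nop encoding is copied from the disassembly earlier in the thread). It lays out the 11-byte sled and computes where the short jump lands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Raw bytes of the sled: a short jmp followed by a 9-byte nop.
std::vector<std::uint8_t> make_sled() {
  return {
      0xEB, 0x09,                                            // jmp +9
      0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x02, 0x00, 0x00,  // nopw 512(%rax,%rax,1)
  };
}

// Offset, relative to the start of the sled, that the leading short jmp lands on.
std::ptrdiff_t landing_offset(const std::vector<std::uint8_t> &sled) {
  assert(sled.size() >= 2 && sled[0] == 0xEB);
  const auto disp = static_cast<std::int8_t>(sled[1]);
  // The displacement is counted from the end of the 2-byte jmp.
  return 2 + disp;
}
```

Here `landing_offset(make_sled())` equals the sled size of 11, so the jump skips exactly the nop and continues with the rest of the function.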

Cheers

Rafael Espíndola

Jun 28, 2016, 10:06:40 PM
to Dean Michael Berris, llvm-dev, Peter Collingbourne
On 22 June 2016 at 16:36, Dean Michael Berris via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Peter suggested just writing out '.byte 0xeb, 0x09' and that allowed the
> jump instruction to bypass the relaxation, so that fixes my immediate
> problem. The question still stands though whether it should be possible to
> do through the instruction builder interface.
>

I don't think so. When the relax-all flag is on MC will relax all instructions.

Cheers,
Rafael

Dean Michael Berris via llvm-dev

Jun 28, 2016, 10:14:39 PM
to Rafael Espíndola, llvm-dev, Peter Collingbourne
On Wed, Jun 29, 2016 at 12:06 PM Rafael Espíndola <rafael.e...@gmail.com> wrote:
On 22 June 2016 at 16:36, Dean Michael Berris via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Peter suggested just writing out '.byte 0xeb, 0x09' and that allowed the
> jump instruction to bypass the relaxation, so that fixes my immediate
> problem. The question still stands though whether it should be possible to
> do through the instruction builder interface.
>

I don't think so. When the relax-all flag is on MC will relax all instructions.


I see.

So the question becomes what's the advantage (if any) of Clang passing the 'relax-all' flag down to MC? Is there a good reason for this behaviour at all?

Thanks Rafael!

Rafael Espíndola

Jun 28, 2016, 10:17:16 PM
to Dean Michael Berris, llvm-dev, Peter Collingbourne

Speed, but it has probably been years since anyone benchmarked that.

Dean Michael Berris via llvm-dev

Jun 29, 2016, 2:50:20 AM
to Rafael Espíndola, llvm-dev, Peter Collingbourne
On Wed, Jun 29, 2016 at 12:17 PM Rafael Espíndola <rafael.e...@gmail.com> wrote:
On 28 June 2016 at 22:14, Dean Michael Berris <dbe...@google.com> wrote:
> On Wed, Jun 29, 2016 at 12:06 PM Rafael Espíndola
> <rafael.e...@gmail.com> wrote:
>>
>> On 22 June 2016 at 16:36, Dean Michael Berris via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>> > Peter suggested just writing out '.byte 0xeb, 0x09' and that allowed the
>> > jump instruction to bypass the relaxation, so that fixes my immediate
>> > problem. The question still stands though whether it should be possible
>> > to
>> > do through the instruction builder interface.
>> >
>>
>> I don't think so. When the relax-all flag is on MC will relax all
>> instructions.
>>
>
> I see.
>
> So the question becomes what's the advantage (if any) of Clang passing the
> 'relax-all' flag down to MC? Is there a good reason for this behaviour at
> all?

Speed, but it has probably been years since anyone benchmarked that.


Okay. Any objections to just removing that from the clang side?

Rafael Espíndola

Jun 29, 2016, 10:38:22 AM
to Dean Michael Berris, llvm-dev, Peter Collingbourne


Depends on what the benchmarks show.

Reid Kleckner via llvm-dev

Jun 29, 2016, 12:27:14 PM
to Nirav Davé, llvm-dev
On Wed, Jun 22, 2016 at 9:36 AM, Nirav Davé <llvm...@lists.llvm.org> wrote:
In any case, the issue appears to be that llvm doesn't realize that the target address is resolved and erroneously applies branch relaxation to the jump. I don't know why a linker private symbol would make a difference. 

Relaxation is the process of *shortening* jumps that can be shortened, and then re-running instruction layout to discover more relaxations until fixpoint. Removing the -relax-all flag in clang won't help here, it would hurt.

I'm not exactly sure what the semantics of linker private symbols are, but using a normal assembler temporary label is probably the way to go anyway.

Craig Topper via llvm-dev

Jun 29, 2016, 1:05:52 PM
to Reid Kleckner, llvm-dev
I thought jumps start short and relaxation widens them as needed until fixpoint. So relax-all causes them all to be widened unconditionally.


--
~Craig

Reid Kleckner via llvm-dev

Jun 29, 2016, 1:19:23 PM
to Craig Topper, llvm-dev
On Wed, Jun 29, 2016 at 10:05 AM, Craig Topper <craig....@gmail.com> wrote:
I thought jumps start short and relaxation widens them as needed until fixpoint. So relax-all causes them all to be widened unconditionally.

My mistake, you're right.

I've been reading that code for years and assuming that it goes large-to-small, but I guess the process is the same regardless of which direction you go. :)
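
The small-to-large process described here can be sketched as a toy model. This is not LLVM's MC implementation: `Inst`, `relax`, and the `relax_all` parameter are hypothetical names, and only 2-byte and 5-byte unconditional jumps are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Inst {
  bool is_jump;        // true for a relative jmp, false for anything else
  std::size_t size;    // current encoded size in bytes
  std::size_t target;  // for jumps: index of the instruction jumped to
};

constexpr std::size_t kShortJmp = 2;  // EB rel8
constexpr std::size_t kNearJmp = 5;   // E9 rel32

// Start every jump short (or long if relax_all), then widen until nothing changes.
void relax(std::vector<Inst> &code, bool relax_all) {
  for (Inst &inst : code) {
    if (inst.is_jump) {
      assert(inst.target <= code.size());
      inst.size = relax_all ? kNearJmp : kShortJmp;
    }
  }
  bool changed = true;
  while (changed) {
    changed = false;
    // Lay out the code: offsets[i] is the start of instruction i.
    std::vector<std::int64_t> offsets(code.size() + 1, 0);
    for (std::size_t i = 0; i < code.size(); ++i)
      offsets[i + 1] = offsets[i] + static_cast<std::int64_t>(code[i].size);
    for (std::size_t i = 0; i < code.size(); ++i) {
      Inst &inst = code[i];
      if (!inst.is_jump || inst.size == kNearJmp)
        continue;
      // Displacement is measured from the end of the jump.
      const std::int64_t disp = offsets[inst.target] - offsets[i + 1];
      if (disp < -128 || disp > 127) {
        inst.size = kNearJmp;  // widening may push other jumps out of range
        changed = true;
      }
    }
  }
}
```

With the sled from this thread (a jump over a 9-byte nop), the toy model keeps the jump at 2 bytes normally and makes it 5 bytes under `relax_all`, which is consistent with the `-mrelax-all` behaviour observed above.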

Dean Michael Berris via llvm-dev

Jun 29, 2016, 8:17:39 PM
to Rafael Espíndola, llvm-dev, Peter Collingbourne
Good point. Is there a set of benchmarks I can run to make sure that this is a mostly benign change? Something that comes with the test suite?

Cheers

Dean Michael Berris via llvm-dev

Jun 29, 2016, 8:19:57 PM
to Reid Kleckner, Craig Topper, llvm-dev
Okay, so then removing -mrelax-all should make this widening/shortening work more effectively? i.e. we wouldn't be unconditionally widening short jumps, and we'd be allowing the instruction layout process to "work as intended"?