[LLVMdev] ARM Jump table pcrelative relaxation in clang / llc

Eric Bentura

Jul 5, 2015, 10:29:11 AM
to llv...@cs.uiuc.edu
Hi,

I have written a PassManager (IR) pass that substantially increases the size of the original IR code.
As a result, the generated machine code appears to be incorrect (as of LLVM 3.5). The AsmPrinter generates the following instruction:
    adr r2, .LJTI4_0_0
When this goes through the MC streamer, I get "fatal error: error in backend: out of range pc-relative fixup".
Apparently, the fixup does not fit in the 12 bits available. I would have expected clang to relax this instruction in that case. Using the flag -mrelax-all does not help.
Is there a way in PassManager::runOnFunction to anticipate this, so that I can generate IR code that will fit when converted to machine code?
Strangely enough, this does not happen when using llc to generate the code from the bc file; I get the object file.
The target is armv5e-none-linux-androideabi (I used -mtriple with llc).
I have seen a similar thread from 2012, "Questions on MachineFunctionPass and relaxation of pcrel calls (ARM/thumb2)". Even though there have been improvements since then, I am concerned about the difference in behavior between the two tools.
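
A note on the "12 bits" mentioned above: in ARM mode, adr is assembled as an add or sub on the PC, and the 12-bit immediate field holds an 8-bit value plus a 4-bit rotation, so the reachable offsets are not a contiguous 12-bit range. The following is a minimal sketch of that encodability test. The function names are illustrative and are not LLVM APIs, and the sketch assumes the PC reads as the instruction address plus 8.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits (n in 0..31).
static uint32_t rotl32(uint32_t v, unsigned n) {
  return (v << n) | (v >> ((32u - n) & 31u));
}

// True if v is an ARM-mode "modified immediate": an 8-bit value rotated
// right by an even amount (0, 2, ..., 30).
bool isArmModifiedImmediate(uint32_t v) {
  for (unsigned rot = 0; rot < 32; rot += 2) {
    if (rotl32(v, rot) <= 0xFFu) return true;
  }
  return false;
}

// True if an ARM-mode ADR at insnAddr could materialize targetAddr.
// Assumption: the PC reads as the instruction address plus 8 (ARM state).
bool armAdrCanReach(uint32_t insnAddr, uint32_t targetAddr) {
  int64_t delta = int64_t(targetAddr) - (int64_t(insnAddr) + 8);
  uint64_t magnitude = delta < 0 ? uint64_t(-delta) : uint64_t(delta);
  return magnitude <= 0xFFFFFFFFull &&
         isArmModifiedImmediate(uint32_t(magnitude));
}
```

Under this model a jump table 0x400 bytes past the PC is reachable, while one 0x1004 bytes away is not, because its set bits do not fit in any 8-bit window at an even rotation.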

Thanks for your help.

Eric

Tim Northover

Jul 5, 2015, 11:57:32 AM
to Eric Bentura, LLVM Developers Mailing List
Hi Eric,

On 5 July 2015 at 07:22, Eric Bentura <eben...@gmail.com> wrote:
> As a result it seems that the generated machine code is incorrect (as of
> LLVM 3.5): The AsmPrinter generates the following instruction :
> adr r2, .LJTI4_0_0
> when going through the MC streamer, I get a "fatal error: error in backend:
> out of range pc-relative fixup" .

We've fairly recently fixed a bug that looks very similar (r238680,
which was well after 3.6).

> Apparently, the fixup does not hold in the 12 bits we have available. I
> would have expected clang to perform relaxation on this instruction on that
> particular case.

Agreed, whatever's going on it's a bug in the ARM backend.

> Is there a way in the PassManager::runOnFunction to anticipate that so that
> I can generate a IR code that would fit when converted to machine code?

Not as far as I'm aware; since the bug is further back there's no
reason to try and provide such information to earlier passes. The
backend is expected to cope with whatever you give it.

> Strangely enough, this is not happening when using llc to generate the code
> from the bc file, I get the object file.

That's weird. Even with "-filetype=obj" (the bug only occurs when
directly writing an object file)? Not that it really affects anything;
getting the same backend options with llc can be a bit tricky.

> Even though there have been
> improvements since then, I am concerned with the difference of behavior of
> the two tools.

The most common one I find perturbing output is "-mcpu" (even with a
triple), but really there are so many options front-ends can twiddle
that you just have to know what it's doing and copy that.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Eric Bentura

Jul 6, 2015, 2:23:03 AM
to Tim Northover, LLVM Developers Mailing List
Hi Tim,
Thank you for your answer.

> We've fairly recently fixed a bug that looks very similar (r238680,
> which was well after 3.6)

If I wanted to backport that to 3.5, where should I look? Where in the ARM backend is the decision to relax an instruction taken?

> That's weird. Even with "-filetype=obj" (the bug only occurs when
> directly writing an object file)? Not that it really affects anything,
> getting the same backend options with llc can be a bit tricky.

This passes even with -filetype=obj. The transformations I apply are in the optimizer, so I must build the new bc to create the object file.

Thanks for your help

Eric

Renato Golin

Jul 6, 2015, 8:45:16 AM
to Eric Bentura, LLVM Developers Mailing List
On 6 July 2015 at 07:18, Eric Bentura <eben...@gmail.com> wrote:
> We've fairly recently fixed a bug that looks very similar (r238680,
> which was well after 3.6)
>
> If I wanted to back port that to 3.5 where should I look at? Where in the
> ARM backend the decision to relax an instruction is taken?

Hi Eric,

First, I'd check whether Tim's fix works for you. If you can't forward
port your pass to trunk, try to backport Tim's patch into your tree.


> This is passing even with -filetype=obj. The transformation I apply are in
> the optimizer so I must build the new bc to create the object file.

This is good news; it means that the problem is probably not in the
asm/obj emitters. Differences in behaviour between llc and clang
are normally due to target description issues, as Tim mentioned.

I'd encourage you to check on llc's object file and see how the jump
table is being lowered. It's possible that the lack of a few flags
clang passes to the back-end made that instruction not be selected
during ISel.

Essentially, "clang -target armv5t" is *not* the same as "llc -mtriple armv5t".

I'm guessing you're hitting the same bug Tim found earlier...

cheers,
--renato

Eric Bentura

Jul 6, 2015, 2:16:26 PM
to Renato Golin, LLVM Developers Mailing List
It is certainly helping - Thanks Renato.

2015-07-06 18:39 GMT+03:00 Renato Golin <renato...@linaro.org>:
On 6 July 2015 at 16:32, Eric Bentura <eben...@gmail.com> wrote:
> I tried to build the object file using clang 3.7 and it fails with the same
> error.
> Where should I look at in the ARM backend to understand what happens?
> Where the jump table instruction is generated and supposed to be relaxed?

Have a look at lib/Target/ARM/ARMConstantIslandPass.cpp, especially
where Tim's patch touches:

http://llvm.org/viewvc/llvm-project?view=revision&revision=238680

Instruction relaxation rules should be in the TableGen files, I think,
but that means it could be in a number of places.

Step through lib/Target/ARM/ARMAsmPrinter.cpp, at
ARMAsmPrinter::EmitJumpTableInsts and see what the operand is.

You'd expect that it would be already relaxed by that point. If it is,
the bug is in the printer. If not, it could be in the instruction
selection process, either ARMISelLowering or during validation, at
ARMISelDAGToDAG.

Hope that helps.

cheers,
--renato

Eric Bentura

Jul 7, 2015, 9:10:11 AM
to Renato Golin, LLVM Developers Mailing List
I have created a small ll file to reproduce the problem.
I used the intrinsic function llvm.arm.space to introduce space between the beginning of the code and the jump table.
If the first argument of llvm.arm.space is higher than INT_MAX (2147483647), then the bug is hit. At or below that value, it passes. It looks like a precision issue. Does this sound familiar to anyone?

; ModuleID = 'test.c'
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv5e-none-linux-androideabi"

declare i32 @llvm.arm.space(i32, i32)

; Function Attrs: nounwind
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %a = alloca i32, align 4
  store i32 0, i32* %retval
  store i32 0, i32* %a, align 4
  %0 = load i32* %a, align 4
  call i32 @llvm.arm.space(i32 2147483647, i32 undef)
  switch i32 %0, label %sw.default [
    i32 0, label %sw.bb
    i32 1, label %sw.bb1
    i32 2, label %sw.bb2
    i32 3, label %sw.bb3
  ]

sw.bb:                                            ; preds = %entry
  store i32 1, i32* %retval
  br label %return

sw.bb1:                                           ; preds = %entry
  store i32 2, i32* %retval
  br label %return

sw.bb2:                                           ; preds = %entry
  store i32 3, i32* %retval
  br label %return

sw.bb3:                                           ; preds = %entry
  store i32 4, i32* %retval
  br label %return

sw.default:                                       ; preds = %entry
  br label %sw.epilog

sw.epilog:                                        ; preds = %sw.default
  store i32 0, i32* %retval
  br label %return

return:                                           ; preds = %sw.epilog, %sw.bb3, %sw.bb2, %sw.bb1, %sw.bb
  %2 = load i32* %retval
  ret i32 %2
}

; Function Attrs: nounwind
declare i32 @rand() #0

attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="true" }
attributes #1 = { nounwind }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"min_enum_size", i32 4}
!2 = !{i32 1, !"PIC Level", i32 1}
!3 = !{!"clang version 3.7.0 (trunk 229364)"}
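
On the INT_MAX observation above: one plausible reading, not confirmed anywhere in this thread, is that the 32-bit size ends up in a signed integer somewhere, so values above INT_MAX wrap to negative sizes. A minimal sketch of that wraparound follows. The helper name is hypothetical, and it uses plain arithmetic so the result does not depend on the compiler's conversion rules.

```cpp
#include <cstdint>

// Reinterpret a 32-bit pattern as a signed two's-complement value.
// Hypothetical helper for illustration; not taken from LLVM.
int64_t asSigned32(uint32_t bits) {
  return bits >= 0x80000000u ? int64_t(bits) - 4294967296LL
                             : int64_t(bits);
}
```

With this helper, 2147483647 stays positive, while 2147483648 becomes -2147483648, which would match a threshold sitting exactly at INT_MAX.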

Tim Northover

Jul 7, 2015, 11:12:37 AM
to Eric Bentura, LLVM Developers Mailing List
On 7 July 2015 at 06:06, Eric Bentura <eben...@gmail.com> wrote:
> I have created a small ll file to reproduce the problem.
> I used the intrinsic function llvm.arm.space to introduce space between the
> beginning of the code and the jump table.

It does look like the value in @llvm.arm.space is interpreted
incorrectly if it's bigger than INT_MAX, but that's well outside its
intended range and could inevitably be used to break ConstantIslands
(the longest ARM immediate branch is 26 bits; no indivisible entity in
.text can be bigger than that). It's probably an unrelated issue.
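
To make the 26-bit figure concrete, here is a minimal sketch of the reach of an ARM-mode immediate branch. The function name is illustrative, not an LLVM API, and it assumes the usual ARM-state convention that the PC reads as the instruction address plus 8.

```cpp
#include <cstdint>

// True if an ARM-mode B/BL at insnAddr can branch directly to targetAddr.
// The encoding holds a signed 24-bit word offset, i.e. a signed 26-bit byte
// offset relative to the PC (assumed here to be insnAddr + 8).
bool armBranchCanReach(uint32_t insnAddr, uint32_t targetAddr) {
  int64_t offset = int64_t(targetAddr) - (int64_t(insnAddr) + 8);
  const int64_t minOffset = -(int64_t(1) << 25);     // -32 MiB
  const int64_t maxOffset = (int64_t(1) << 25) - 4;  // +32 MiB - 4
  return offset % 4 == 0 && offset >= minOffset && offset <= maxOffset;
}
```

Under this check a branch cannot cross a single 2 GB block, which is why a region of that size leaves no way to place islands or hop over it.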

Also, I know I said the backend should accept any size code, but 2GB
is definitely going to trigger more edge cases than average.

Cheers.

Tim.

Eric Bentura

Jul 13, 2015, 8:59:55 AM
to Tim Northover, Renato Golin, LLVM Developers Mailing List
Hi, 
I have kept working on this and found the following (as of LLVM 3.5):
1) In the function MCObjectStreamer::EmitInstruction there is a check for whether the instruction is relaxable:
    if (!Assembler.getBackend().mayNeedRelaxation(Inst)) {
       EmitInstToData(Inst, STI);
       return;
    }
   At this stage, the instruction has already been selected as ARM::ADR.
   The call to mayNeedRelaxation() resolves to ARMAsmBackend::getRelaxedOpcode().
   There is no handling in there for ARM::ADR. I added the following line:
           case ARM::ADR:        return ARM::t2ADR;
   As a result, if relaxation or bundling is enabled, the instruction is relaxed,
   and compilation to an object file passes.
   I am not familiar enough with this to understand why there is a condition for entering the relaxation step: I had to manually set Assembler.setRelaxAll(true) to get into it.
2) It seems that fast instruction selection is enabled by default (even when using -O3). The problem does not appear when fast isel is not used (again, I used a hack in the code), although the same ADR instruction is selected, since the offset to apply to the fixup is then small enough.

I am not sure I am on the right track, but as far as I understand:
1) ARM::ADR is not handled by relaxation
2) Relaxation happens under some condition in the ObjectStreamer that I don't fully understand
 
What do you think?

Thanks,

Eric

Tim Northover

Jul 13, 2015, 12:24:12 PM
to Eric Bentura, LLVM Developers Mailing List
> There is no processing in there for ARM::ADR. I added the following line:
> case ARM::ADR: return ARM::t2ADR;
> As a result, if relaxation is enabled or bundling is enabled then the instruction is relaxed.

Unfortunately, that's not going to work at runtime, for a couple of reasons:

1. An ARM::ADR instruction is ARM-mode, but ARM::t2ADR is Thumb-mode.
They can't be mixed in the same function (to a first approximation).
It'll be interpreted as an entirely different ARM instruction if a CPU
ever sees it.
2. Even if it did what you were hoping, it only staves off the issue:
t2ADR has a limited range too, it's just longer than ADR.
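
The two points above can be sketched with a toy model of the mayNeedRelaxation / getRelaxedOpcode pair Eric describes. This is an illustration under stated assumptions, not the LLVM source: the enum and both functions are hypothetical stand-ins, and the table entries are chosen only to show the shape of the mapping.

```cpp
// Toy opcode set; the names mirror LLVM's ARM opcodes but this is not LLVM code.
enum class Opcode { ADR, tADR, t2ADR, tB, t2B };

// Narrow Thumb encodings map to their wide Thumb-2 forms. Everything else,
// including ARM-mode ADR, maps to itself: a Thumb opcode is not a valid
// replacement for an ARM-mode instruction.
Opcode relaxedOpcode(Opcode op) {
  switch (op) {
    case Opcode::tADR: return Opcode::t2ADR;
    case Opcode::tB:   return Opcode::t2B;
    default:           return op;
  }
}

// An instruction is a relaxation candidate only if a different opcode exists.
bool mayNeedRelaxation(Opcode op) { return relaxedOpcode(op) != op; }
```

In this model ARM-mode ADR is never a candidate, and a t2ADR has nothing further to relax to. Since the Thumb-2 adr.w still encodes only a 12-bit byte offset, a sufficiently distant jump table stays out of reach either way.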

> I am not sure I am on the right track, but as far as I understand:
> 1) ARM::ADR is not handled by relaxation
> 2) Relaxation happens under some condition in the ObjectStreamer that I don't fully understand

As suggested by the second problem above, relaxation is not the
correct approach. There is no instruction that we can guarantee will
reach the jump table. There are two plausible ways to fix it (that I
could think of):

1. Enhance ARMConstantIslands.cpp to move the jump table in range if
needed (this is what we did on trunk, see r238680).
2. Fuse the ADR to the jump-table with a pseudo-instruction when
they're first created and expand them much later. This is uglier, but
might be a simpler way to do it.
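
The idea behind the first option can be sketched as a toy layout decision: if the jump table's default position is out of reach of the instruction that addresses it, emit the table next to that instruction instead. The data model and function below are hypothetical and far simpler than the real ARMConstantIslands pass. In particular, the ADR is assumed to be the last instruction of its block, and reach is modeled as a single maximum byte distance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical; not the ARMConstantIslands data structures).
// blockSizes[i] is the byte size of basic block i, and the ADR is assumed to
// be the last instruction of block adrBlock. Requires a non-empty blockSizes
// and adrBlock < blockSizes.size().
// Returns the index of the block after which the jump table is emitted.
std::size_t placeJumpTable(const std::vector<uint64_t>& blockSizes,
                           std::size_t adrBlock, uint64_t maxReach) {
  // Default placement: after the last block of the function.
  uint64_t distance = 0;
  for (std::size_t i = adrBlock + 1; i < blockSizes.size(); ++i)
    distance += blockSizes[i];
  if (distance <= maxReach) return blockSizes.size() - 1;
  // Out of range: move the table to sit right after the block with the ADR.
  return adrBlock;
}
```

For example, with blocks of 100, 5000 and 300 bytes and a reach of 4095 bytes, an ADR in block 0 cannot see a table at the end of the function, so the sketch places the table directly after block 0.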

Of course, the real solution is the usual recommendation to track
trunk wherever possible. Getting stuck on 3.5 is a recipe for ongoing
pain.