[LLVMdev] Bug in ARM Thumb inline asm?

63 views
Skip to first unread message

Richard Pennington

unread,
Feb 10, 2015, 4:16:34 PM2/10/15
to LLVM Developers Mailing List, cf >> "cfe-dev@cs.uiuc.edu Developers"
I'm porting the musl C library to ARM Thumb. It looks like inline asm is
failing in some cases. Here's one:

The lseek system call looks like this:
...
off_t result;
return syscall(SYS__llseek, fd, offset>>32, offset, &result,
whence) ? -1 : result;
...

Which eventually goes through this macro:

static inline long __syscall5(long n, long a, long b, long c, long d,
long e)
{
register long r7 __asm__("r7") = n;
register long r0 __asm__("r0") = a;
register long r1 __asm__("r1") = b;
register long r2 __asm__("r2") = c;
register long r3 __asm__("r3") = d;
register long r4 __asm__("r4") = e;
__asm_syscall("r"(r7), "0"(r0), "r"(r1), "r"(r2), "r"(r3),
"r"(r4));
}

And then this macro:
#define __asm_syscall(...) do { \
__asm__ __volatile__ ( "svc 0" \
: "=r"(r0) : __VA_ARGS__ : "memory"); \
return r0; \
} while (0)

Gives:
Disassembly of section .text:

00000000 <lseek>:
0: b590 push {r4, r7, lr}
2: af01 add r7, sp, #4
4: b083 sub sp, #12
6: 278c movs r7, #140 ; 0x8c
8: 46ec mov ip, sp
a: 4619 mov r1, r3
c: 68bc ldr r4, [r7, #8] ; XXX r7 is clobbered here.
e: 4663 mov r3, ip
10: df00 svc 0
12: f7ff fffe bl 0 <__syscall_ret>
16: 9a00 ldr r2, [sp, #0]
18: 9901 ldr r1, [sp, #4]
1a: 2800 cmp r0, #0
1c: bf1c itt ne
1e: f04f 32ff movne.w r2, #4294967295 ; 0xffffffff
22: f04f 31ff movne.w r1, #4294967295 ; 0xffffffff
26: 4610 mov r0, r2
28: b003 add sp, #12
2a: bd90 pop {r4, r7, pc}

The question is, does the line

register long r7 __asm__("r7") = n;

make any guarantee about the value of r7 in the asm block?

Adding the -fomit-frame-pointer flag fixes it, but is the bug in the
code or in the compiler?

-Rich
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

James Molloy

unread,
Feb 10, 2015, 5:42:27 PM2/10/15
to Richard Pennington, LLVM Developers Mailing List, cf >> "cfe-dev@cs.uiuc.edu Developers", Renato Golin
Hi Richard,

My belief is that we don't fully support this syntax. I believe, and I'm sure Renato on CC will tell me if I'm wrong as he implemented it, that we only support this for non-allocatable registers, such as SP or FP.

Cheers,

James

Renato Golin

unread,
Feb 10, 2015, 9:55:54 PM2/10/15
to James Molloy, cf >> cfe-dev@cs.uiuc.edu Developers, LLVM Developers Mailing List
On 11 February 2015 at 06:39, James Molloy <ja...@jamesmolloy.co.uk> wrote:
> My belief is that we don't fully support this syntax. I believe, and I'm
> sure Renato on CC will tell me if I'm wrong as he implemented it, that we
> only support this for non-allocatable registers, such as SP or FP.

Hi James,

We do support that syntax, and that's the only syntax for local
allocatable registers we support. But I'm not sure if we guarantee
that the register won't be clobbered in between, especially if you're
using macros in inline functions.

The main issue here is that such a macro would be expanded inside a
function that is already using a lot of registers for whatever reasons
and even though the compiler can try to guarantee it will assign a
specific register inside the inline asm, it won't have much effect
outside of it, and this is what I think it's going on.


>> 6: 278c movs r7, #140 ; 0x8c
>> 8: 46ec mov ip, sp
>> a: 4619 mov r1, r3
>> c: 68bc ldr r4, [r7, #8] ; XXX r7 is clobbered
>> e: 4663 mov r3, ip
>> 10: df00 svc 0
>> 12: f7ff fffe bl 0 <__syscall_ret>

That code seems very odd... It may be a result of the outside function
(lseek) intermixing code with the macro after some heavy optimisation.


>> register long r7 __asm__("r7") = n;
>> make any guarantee about the value of r7 in the asm block?

I believe the guarantee is IFF the inline asm immediately follows,
which is what you have, so I'm not sure why r7 is being clobbered in
this case. It may be a side effect of some other optimisation or just
that the guarantee is not being kept in that special case.


>> Adding the -fomit-frame-pointer flag fixes it,

That sounds like a coincidence... :)


>> but is the bug in the code or in the compiler?

Hard to say. Inline asm is a poorly documented GNU extension that was
not designed in any meaningful way but evolved from whatever was out
there pretty much like a genetic algorithm. Clang/LLVM tries to
rationalise the implementation and, for obvious reasons, will get
different behaviour, not necessarily wrong. Basically, YMMV.

The first thing is to duplicate the macro code inside the static
inline function and see if the problem disappears. Also, try with
lower optimisation levels and if that fixes it, run bugpoint on a
reduced case to spot what pass "breaks" it.

I think we have to approach the problem in a different way, though. Is
there a better way of doing this?

cheers,
--renato

Richard Pennington

unread,
Feb 18, 2015, 2:32:47 PM2/18/15
to llv...@cs.uiuc.edu
On 02/10/2015 03:10 PM, Richard Pennington wrote:
> I'm porting the musl C library to ARM Thumb. It looks like inline asm
> is failing in some cases. Here's one:

I've put together a simple test file and am puzzled by the results.
If I compile this for Thumb:

typedef long long off_t;

#if 1
long foo(long);
#else
#define foo(x) (x)
#endif

#if 1
inline long __syscall5(n, a, b, c, d, e)
{
register long r7 __asm__("r7") = n;
register long r0 __asm__("r0") = a;
register long r1 __asm__("r1") = b;
register long r2 __asm__("r2") = c;
register long r3 __asm__("r3") = d;
register long r4 __asm__("r4") = e;
do { __asm__ __volatile__ ( "svc 0" : "=r"(r0) : "r"(r7), "0"(r0),
"r"(r1), "r"(r2), "r"(r3), "r"(r4) : "memory"); return r0; } while (0);
}
#else

#define __syscall5(n, a, b, c, d, e) \
({ \
register long r7 __asm__("r7") = n; \
register long r0 __asm__("r0") = a; \
register long r1 __asm__("r1") = b; \
register long r2 __asm__("r2") = c; \
register long r3 __asm__("r3") = d; \
register long r4 __asm__("r4") = e; \
do { __asm__ __volatile__ ( "svc 0" : "=r"(r0) : "r"(r7), "0"(r0),
"r"(r1), "r"(r2), "r"(r3), "r"(r4) : "memory"); return r0; } while (0); \
;r0; })
#endif

off_t lseek(int fd, off_t offset, int whence)
{

off_t result;
return foo(__syscall5(140,((long) (fd)),((long) (offset>>32)),
((long) (offset)),((long) (&result)),((long) (whence)))) ? -1 : result;
}

I get

.globl lseek
.align 2
.type lseek,%function
.code 16 @ @lseek
.thumb_func
lseek:
.fnstart
.Leh_func_begin0:
@ BB#0: @ %entry
push {r4, r7, lr}
add r7, sp, #4
sub sp, #12
movs r7, #140
mov r12, sp
mov r1, r3
ldr r4, [r7, #8]
mov r3, r12
@APP
svc #0
@NO_APP
bl foo
ldr r2, [sp]
ldr r1, [sp, #4]
cmp r0, #0
itt ne
movne.w r2, #-1
movne.w r1, #-1
mov r0, r2
add sp, #12
pop {r4, r7, pc}
.Ltmp0:
.size lseek, .Ltmp0-lseek
.cantunwind
.fnend

where the "ldr r4, [r7, #8]" instruction uses r7 as a frame pointer even
though it was over-written earlier.

If I use -fomit-frame-pointer or any other combination if the
conditionals (macro vs. inline, used as a function parameter vs. not)
the code emitted is correct.

Is there some in the original code that causes it to break while the
other forms do not? Or are the other forms working just because of luck?
Should the original code work?
Reply all
Reply to author
Forward
0 new messages