RISC-V GCC problems - division by constant Mersenne prime - strange asm

55 views
Skip to first unread message

Michael Clark

unread,
Mar 25, 2016, 4:42:41 PM3/25/16
to RISC-V Software Developers
Hi All,

I have found an interesting case where an optimisation is not being applied by GCC on RISC-V. And also some strange assembly output from GCC on RISC-V.

Both GCC and Clang appear to optimise division by a constant Mersenne prime on x86_64 however GCC on RISC-V is not applying this optimisation.

See test program and assembly output for these platforms:

* GCC -O3 on RISC-V
* GCC -O3 on x86_64
* LLVM/Clang -O3 on x86_64

Another strange observation is GCC on RISC-V is moving a1 to a5 via a stack store followed by a stack load. Odd? GCC 5 also seems to be doing odd stuff with stack ‘moves' on x86_64, moving esi to ecx via the stack (I think recent x86 micro-architecture treats tip of the stack like an extended register file so this may only have a small penalty on x86).

See GCC on RISC-V is emitting this:

test_0:
add sp,sp,-16
sw a1,12(sp)
lw a5,12(sp)
add sp,sp,16
remuw a0,a0,a5
jr ra

instead of this:

test_0:
remuw a0,a0,a1
jr ra

Compiler devs, please read Test program and assembly output. I have not yet tested LLVM/Clang on RISC-V yet… I will do that next… I have not had time to dig into compiler code yet...

Regards,
Michael.


/* Test program */

#include <stdio.h>
#include <limits.h>

static const int p = 8191;
static const int s = 13;

int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
{
return k % p;
}

int __attribute__ ((noinline)) test_1(unsigned int k)
{
return k % p;
}

int __attribute__ ((noinline)) test_2(unsigned int k)
{
int i = (k&p) + (k>>s);
i = (i&p) + (i>>s);
if (i>=p) i -= p;
return i;
}

int main()
{
test_0(1, 8191); /* control */
for (int i = INT_MIN; i < INT_MAX; i++) {
int r1 = test_1(i), r2 = test_2(i);
if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
}
}



/* RISC-V GCC */

$ riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 5.2.0

test_0:
add sp,sp,-16
sw a1,12(sp)
lw a5,12(sp)
add sp,sp,16
remuw a0,a0,a5
jr ra
test_1:
li a5,8192
addw a5,a5,-1
remuw a0,a0,a5
ret
test_2:
li a3,8192
addw a2,a3,-1
and a4,a0,a2
srlw a0,a0,13
addw a5,a4,a0
and a0,a5,a2
sraw a5,a5,13
addw a0,a0,a5
addw a3,a3,-2
ble a0,a3,.L5
subw a0,a0,a2
.L5:
ret


/* Linux x86_64 GCC */

$ gcc --version
gcc (Debian 5.2.1-23) 5.2.1 20151028

test_0:
mov DWORD PTR [rsp-4], esi
mov ecx, DWORD PTR [rsp-4]
mov eax, edi
cdq
idiv ecx
mov eax, edx
ret
test_1:
mov eax, edi
mov rcx, rax
mov rdx, rax
sal rcx, 6
sal rdx, 19
add rdx, rcx
add rax, rdx
mov edx, edi
shr rax, 32
sub edx, eax
shr edx
add eax, edx
shr eax, 12
mov edx, eax
sal edx, 13
sub edx, eax
sub edi, edx
mov eax, edi
ret
test_2:
mov eax, edi
shr edi, 13
and eax, 8191
add eax, edi
mov edx, eax
sar eax, 13
and edx, 8191
add eax, edx
lea edx, [rax-8191]
cmp eax, 8191
cmovge eax, edx
ret


/* Darwin x86_64 LLVM Clang */

$ cc --version
Apple LLVM version 7.3.0 (clang-703.0.29)

_test_0:
mov dword ptr [rsp - 4], esi
xor edx, edx
mov eax, edi
div dword ptr [rsp - 4]
mov eax, edx
ret
_test_1:
mov eax, edi
imul rax, rax, 524353
shr rax, 32
mov ecx, edi
sub ecx, eax
shr ecx
add ecx, eax
shr ecx, 12
imul eax, ecx, 8191
sub edi, eax
mov eax, edi
ret
_test_2:
mov eax, edi
and eax, 8191
mov ecx, edi
shr ecx, 13
add eax, ecx
add ecx, edi
and ecx, 8191
shr eax, 13
lea edx, [rcx + rax]
cmp edx, 8190
lea eax, [rcx + rax - 8191]
cmovbe eax, edx
ret

Michael Clark

unread,
Mar 25, 2016, 8:43:37 PM3/25/16
to RISC-V Software Developers
Now considering I have no idea how many cycles it takes for an integer divide on the Rocket so the optimisation may not be a win.

Trying to read MuDiv in multiplier.scala, and will at some point run some timings in the cycle-accurate simulator.

In either case, the spurious stack moves emitted by GCC are curious...
> --
> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
> To post to this group, send email to sw-...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/2600D96D-94BC-4259-9D39-DE4993859281%40mac.com.

signature.asc

Andrew Waterman

unread,
Mar 26, 2016, 9:32:57 PM3/26/16
to Michael Clark, RISC-V Software Developers
It would be good to figure out how to get rid of the spurious register spills.

The strength reduction optimization isn't always profitable on Rocket,
as it increases instruction count and code size. The divider has an
early out and for small numbers is quite fast.
> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/9F3C9DE6-F00B-4402-A83B-354455DEAFFA%40mac.com.

Michael Clark

unread,
Mar 27, 2016, 1:58:11 AM3/27/16
to g...@gcc.gnu.org, cfe...@lists.llvm.org, Andrew Waterman, RISC-V Software Developers
Seems I had misused volatile. I removed ‘volatile’ from the function argument on test_0 and it prevented the spill through the stack.

I added volatile because I was trying to avoid the compiler optimising away the call to test_0 (as it has no side effects) but it appeared that volatile was unnecessary and was a misuse of volatile (intended to indicate storage may change outside of the control of the compiler). However it is an interesting case… as a register arguments don’t have storage.

GCC, Clang folk, any ideas on why there is a stack spill for a volatile register argument passed in esi? Does volatile force the argument to have storage allocated on the stack? Is this a corner case in the C standard? This argument in the x86_64 calling convention only has a register, so technically it can’t change outside the control of the C "virtual machine” so volatile has a vague meaning here. This seems to be a case of interpreting the C standard in such a was as to make sure that a volatile argument “can be changed” outside the control of the C "virtual machine” by explicitly giving it a storage location on the stack. I think volatile scalar arguments are a special case and that the volatile type label shouldn’t widen the scope beyond the register unless it actually *needs* storage to spill. This is not a volatile stack scoped variable unless the C standard interprets ABI register parameters as actually having ‘storage’ so this is questionable… Maybe I should have gotten a warning… or the volatile type qualifier on a scalar register argument should have been ignored…

volatile for scalar function arguments seems to mean: “make this volatile and subject to change outside of the compiler” rather than being a qualifier for its storage (which is a register).

# gcc
test_0:
mov DWORD PTR [rsp-4], esi
mov ecx, DWORD PTR [rsp-4]
mov eax, edi
cdq
idiv ecx
mov eax, edx
ret

# clang
test_0:
mov dword ptr [rsp - 4], esi
xor edx, edx
mov eax, edi
div dword ptr [rsp - 4]
mov eax, edx
ret

/* Test program compiled on x86_64 with: cc -O3 -fomit-frame-pointer -masm=intel -S test.c -o test.S */

Paul-Antoine ARRAS

unread,
Mar 29, 2016, 4:56:25 AM3/29/16
to Michael Clark, sw-...@groups.riscv.org
The C99 standard reads ($6.7.3.6):
"An object that has volatile-qualified type may be modified in ways
unknown to the implementation or have other unknown side effects.
Therefore any expression referring to such an object shall be evaluated
strictly according to the rules of the abstract machine, as described in
5.1.2.3. Furthermore, at every sequence point the value last stored in
the object shall agree with that prescribed by the abstract machine,
except as modified by the unknown factors mentioned previously. 116)
What constitutes an access to an object that has volatile-qualified type
is implementation-defined."

My take on this is that a volatile-qualified object will not undergo any
optimization so as to adhere strictly to the semantics. As a result, a
volatile object will systematically be allocated on the stack (in main
memory) and thus subject to modifications unknown to the implementation.
Hence the register spilling on each access.

Furthermore, the GCC 4.9.2 doc ($4.10) reads:
"Such an object is normally accessed by pointers and used for accessing
hardware."

Generally speaking, there is little sense in declaring an object
volatile. Only pointers to such objects should be declared.

Le 27/03/2016 07:57, Michael Clark a écrit :
> Seems I had misused volatile. I removed ‘volatile’ from the function argument on test_0 and it prevented the spill through the stack.
>
> I added volatile because I was trying to avoid the compiler optimising away the call to test_0 (as it has no side effects) but it appeared that volatile was unnecessary and was a misuse of volatile (intended to indicate storage may change outside of the control of the compiler). However it is an interesting case… as a register arguments don’t have storage.
>
> GCC, Clang folk, any ideas on why there is a stack spill for a volatile register argument passed in esi? Does volatile force the argument to have storage allocated on the stack? Is this a corner case in the C standard? This argument in the x86_64 calling convention only has a register, so technically it can’t change outside the control of the C "virtual machine” so volatile has a vague meaning here. This seems to be a case of interpreting the C standard in such a was as to make sure that a volatile argument “can be changed” outside the control of the C "virtual machine” by explicitly giving it a storage location on the stack. I think volatile scalar arguments are a special case and that the volatile type label shouldn’t widen the scope beyond the register unless it actually *needs* storage to spill. This is not a volatile stack scoped variable unless the C standard interprets ABI register parameters as actually having ‘storage’ so this is questionable… Ma!
ybe I shou
--
Paul-Antoine Arras
CEA LIST
Reply all
Reply to author
Forward
0 new messages