unnecessary spills around function call


Pengvado

Mar 17, 2011, 12:34:25 PM
to asmjit-dev
Is it possible to mark a variable as used up to and including a
function call, but unused afterward, so that the call is allowed to
clobber it? AsmJit thinks my temporary variables (in which I computed
the function args) need to be preserved. Explicitly unuse()ing them
afterward doesn't help.

In some cases, the immediate form of ECall::setArgument() would work.
(Though it looks like that's unimplemented? I see TODOs in the
source.) But this question applies to all arguments of all functions.

e.g.
Compiler c;
FileLogger logger(stderr);
c.setLogger(&logger);
c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder0<int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
GPVar x(c.newGP());
c.mov(x, 42);
ECall* ctx = c.call((void*)&abs);
ctx->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder1<int, int>());
ctx->setArgument(0, x);
c.unuse(x);
c.endFunction();
c.make();

compiles to:
L.0:
; Prolog
push rbx
; Body
mov rbx, 42
; Function Call
mov edi, ebx
call 0x400df0
L.1:
; Epilog
pop rbx
ret
; Trampoline from 0x7fcba15d600b -> 0x400df0

whereas I want (modulo aligning the stack):
mov edi, 42
call 0x400df0
ret

Petr Kobalíček

Mar 17, 2011, 3:20:48 PM
to asmji...@googlegroups.com
I'm going to fix this unnecessary spill. I thought it was already
implemented, but there is probably some issue in the ECall() code.

Thank you
Petr Kobalicek

Petr Kobalíček

Mar 17, 2011, 3:46:59 PM
to asmji...@googlegroups.com
Please try the latest revision; I think the unnecessary move is
gone, at least in this simple case.

Best regards
Petr Kobalicek

Petr Kobalíček

Mar 17, 2011, 4:13:56 PM
to asmji...@googlegroups.com
Hi,

you might also want to try rev. #338; I disabled stack adjustment when
calling a simple function (one with no stack arguments).

Best regards
Petr Kobalicek


Pengvado

Mar 17, 2011, 11:36:41 PM
to asmjit-dev
Still spills if there's more than one function call. Also still spills
the callee address for function pointers.
And it now spills to the redzone, which isn't preserved across
function calls. And uses an unbounded amount of redzone if there's
lots to spill, whereas only 128 bytes are guaranteed to exist.

(When I said "modulo stack alignment", I didn't mean there was
anything about stack alignment that should change. I guess my previous
example should have been
sub rsp, 8
mov edi, 42
call 0x400df0
add rsp, 8
ret
)

r338 + this patch:
---
AsmJit-1.0/Test/testfunccall2.cpp | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/AsmJit-1.0/Test/testfunccall2.cpp b/AsmJit-1.0/Test/testfunccall2.cpp
index f34868f..9b3b480 100644
--- a/AsmJit-1.0/Test/testfunccall2.cpp
+++ b/AsmJit-1.0/Test/testfunccall2.cpp
@@ -68,6 +68,22 @@ int main(int argc, char* argv[])
ctx->setPrototype(CALL_CONV_COMPAT_FASTCALL, FunctionBuilder1<Void, int>());
ctx->setArgument(0, argument);

+ c.unuse(argument);
+ GPVar argument2(c.newGP(VARIABLE_TYPE_GPD));
+ c.mov(argument2, imm(2));
+
+ ECall* ctx2 = c.call(address);
+ ctx2->setPrototype(CALL_CONV_COMPAT_FASTCALL, FunctionBuilder1<Void, int>());
+ ctx2->setArgument(0, argument2);
+
+ c.unuse(argument2);
+ GPVar argument3(c.newGP(VARIABLE_TYPE_GPD));
+ c.mov(argument3, imm(3));
+
+ ECall* ctx3 = c.call(address);
+ ctx3->setPrototype(CALL_CONV_COMPAT_FASTCALL, FunctionBuilder1<Void, int>());
+ ctx3->setArgument(0, argument3);
+
c.endFunction();
// ==========================================================================

--
1.7.4.1

produces:
; Function Prototype:
;
;
; Variables:
;
; ID | Type     | Sz | Home           | Register Access   | Memory Access     |
; ---+----------+----+----------------+-------------------+-------------------+
; 0  | GP.Q     | 8  | [None]         | r=3 w=1 x=0       | r=0 w=0 x=0       |
; 1  | GP.D     | 4  | [rsp - 0xC]    | r=1 w=1 x=0       | r=0 w=0 x=0       |
; 2  | GP.D     | 4  | [rsp - 0x10]   | r=1 w=1 x=0       | r=0 w=0 x=0       |
; 3  | GP.D     | 4  | [None]         | r=1 w=1 x=0       | r=0 w=0 x=0       |
;
; Modified registers (2):
; GP : rbx, rdi
; MM :
; XMM:

L.0:
; Prolog
push rbx
; Body
mov rbx, 0x401614
mov edi, 0x1
; Function Call
mov [rsp - 0xC], edi ; Spill var_1
call rbx
mov edi, 0x2
; Function Call
mov [rsp - 0x10], edi ; Spill var_2
call rbx
mov edi, 0x3
; Function Call
call rbx
L.1:
; Epilog
pop rbx
ret
*** COMPILER SUCCESS - Wrote 39 bytes, code: 39, trampolines: 0.

Petr Kobalíček

Mar 18, 2011, 5:21:36 AM
to asmji...@googlegroups.com
Hi Pengvado,

I'd like to solve the red-zone issue first. I thought it was
correctly implemented, but now I'm not sure. Could you please send me
the asm of a correctly implemented function?

I'm sure I broke it yesterday.

The other issue (the unnecessary spill) can be worked around by using a
new variable. AsmJit knows the scope of each variable, but there is
currently no mechanism to tell it that the variable will be
overwritten. I'd like to fix this, but I don't know how yet (the
solution must be fast and not complicated).

Best regards
Petr Kobalicek

Petr Kobalíček

Mar 18, 2011, 5:30:34 AM
to asmji...@googlegroups.com
OK, I tried to improve the isEspAdjusted flag; please try rev #340 and
see whether it fixes the red-zone issue.

Best regards
Petr Kobalicek

Pengvado

Mar 18, 2011, 7:07:29 AM
to asmjit-dev
r340 fixed the issue of using redzone in non-leaf functions.

Remaining redzone issues:
* x86_32 and win64 don't have a redzone at all. On those systems, you
simply can't safely access negative offsets of esp/rsp. I don't have a
win64 system to test, but AsmJit does try to use redzone on x86_32.
* unix x86_64 has 128 bytes of redzone. You can't safely access stack
addresses below [rsp-128].
By "unsafe" I don't mean that violating these constraints will
necessarily fail, just that the OS is allowed to clobber memory below
the stack whenever it wants to.
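
The platform rules above can be condensed into a small predicate. This is a minimal sketch, not AsmJit code: the Abi enum and both function names are hypothetical, and the only facts it encodes are the ones stated in this post (128 bytes of red zone on unix x86_64, none on x86_32 or win64) plus the earlier observation that the red zone is not preserved across calls.

```cpp
#include <cstdint>

// Hypothetical ABI tags for this sketch; they are not AsmJit identifiers.
enum class Abi { X86_32, Win64, SysV64 };

// Bytes below the stack pointer that a leaf function may use without
// adjusting esp/rsp. Only the unix (System V) x86_64 ABI provides any.
constexpr std::int32_t redZoneBytes(Abi abi) {
    return abi == Abi::SysV64 ? 128 : 0;
}

// True if a spill slot that starts `bytesBelowSp` bytes below the stack
// pointer (24 for [rsp - 24]) may be used without adjusting esp/rsp.
// Non-leaf functions never qualify, because a call overwrites that area.
constexpr bool slotIsSafe(Abi abi, bool isLeaf, std::int32_t bytesBelowSp) {
    return isLeaf && bytesBelowSp > 0 && bytesBelowSp <= redZoneBytes(abi);
}
```

Under these assumptions, a slot at [rsp - 120] in a leaf function on unix x86_64 is accepted, while [rsp - 136] on the same platform and any negative offset on x86_32 or win64 are rejected.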

(The following example is long because it has to be in order to
generate lots of spills in a leaf function.)

======= redzone.cpp =======
#include <stdio.h>
#include <AsmJit/AsmJit.h>
using namespace AsmJit;

float a[N];

int main()
{
    Compiler c;
    FileLogger logger(stderr);
    c.setLogger(&logger);
    c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder0<float>());
    c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
    XMMVar x[N];
    for(int i=N-1; i>=0; i--)
        c.movss((x[i] = c.newXMM()), ptr_abs(&a[i]));
    for(int i=N-1; i>0; i--)
        c.addss(x[i-1], x[i]);
    c.ret(x[0]);
    c.endFunction();

    for(int i=0; i<N; i++)
        a[i] = i+1;
    float (*func)() = (float (*)())c.make();
    printf("result: %f (expected %f)\n", func(), (float)(N*(N+1)/2));
    return 0;
}

======= wrong output on x86_64 (though I got lucky and it ran anyway)
~> g++ -DN=25 redzone.cpp -o redzone -lAsmJit && ./redzone
; Function Prototype:
;
;
; Variables:
;
; ID | Type     | Sz | Home           | Register Access   | Memory Access     |
; ---+----------+----+----------------+-------------------+-------------------+
; 0  | XMM      | 16 | [None]         | r=1 w=1 x=0       | r=0 w=0 x=0       |
; 1  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 2  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 3  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 4  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 5  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 6  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 7  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 8  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 9  | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 10 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 11 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 12 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 13 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 14 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 15 | XMM      | 16 | [rsp - 24]     | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 16 | XMM      | 16 | [rsp - 40]     | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 17 | XMM      | 16 | [rsp - 56]     | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 18 | XMM      | 16 | [rsp - 72]     | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 19 | XMM      | 16 | [rsp - 88]     | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 20 | XMM      | 16 | [rsp - 104]    | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 21 | XMM      | 16 | [rsp - 120]    | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 22 | XMM      | 16 | [rsp - 136]    | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 23 | XMM      | 16 | [rsp - 152]    | r=1 w=1 x=1       | r=0 w=0 x=0       |
; 24 | XMM      | 16 | [None]         | r=1 w=1 x=1       | r=0 w=0 x=0       |
;
; Modified registers (16):
; GP :
; MM :
; XMM: xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15

L.0:
; Prolog
; Body
movss xmm0, [602180]
movss xmm1, [60217C]
movss xmm2, [602178]
movss xmm3, [602174]
movss xmm4, [602170]
movss xmm5, [60216C]
movss xmm6, [602168]
movss xmm7, [602164]
movss xmm8, [602160]
movss xmm9, [60215C]
movss xmm10, [602158]
movss xmm11, [602154]
movss xmm12, [602150]
movss xmm13, [60214C]
movss xmm14, [602148]
movss xmm15, [602144]
movdqa [rsp - 24], xmm15 ; Spill var_15
movss xmm15, [602140]
movdqa [rsp - 40], xmm15 ; Spill var_16
movss xmm15, [60213C]
movdqa [rsp - 56], xmm15 ; Spill var_17
movss xmm15, [602138]
movdqa [rsp - 72], xmm15 ; Spill var_18
movss xmm15, [602134]
movdqa [rsp - 88], xmm15 ; Spill var_19
movss xmm15, [602130]
movdqa [rsp - 104], xmm15 ; Spill var_20
movss xmm15, [60212C]
movdqa [rsp - 120], xmm15 ; Spill var_21
movss xmm15, [602128]
movdqa [rsp - 136], xmm15 ; Spill var_22
movss xmm15, [602124]
movdqa [rsp - 152], xmm15 ; Spill var_23
movss xmm15, [602120]
addss xmm1, xmm0
addss xmm2, xmm1
addss xmm3, xmm2
addss xmm4, xmm3
addss xmm5, xmm4
addss xmm6, xmm5
addss xmm7, xmm6
addss xmm8, xmm7
addss xmm9, xmm8
addss xmm10, xmm9
addss xmm11, xmm10
addss xmm12, xmm11
addss xmm13, xmm12
addss xmm14, xmm13
movdqa xmm0, [rsp - 24] ; Alloc var_15
addss xmm0, xmm14
movdqa xmm1, [rsp - 40] ; Alloc var_16
addss xmm1, xmm0
movdqa xmm0, [rsp - 56] ; Alloc var_17
addss xmm0, xmm1
movdqa xmm1, [rsp - 72] ; Alloc var_18
addss xmm1, xmm0
movdqa xmm0, [rsp - 88] ; Alloc var_19
addss xmm0, xmm1
movdqa xmm1, [rsp - 104] ; Alloc var_20
addss xmm1, xmm0
movdqa xmm0, [rsp - 120] ; Alloc var_21
addss xmm0, xmm1
movdqa xmm1, [rsp - 136] ; Alloc var_22
addss xmm1, xmm0
movdqa xmm0, [rsp - 152] ; Alloc var_23
addss xmm0, xmm1
addss xmm15, xmm0
movdqa xmm0, xmm15
L.1:
; Epilog
ret
*** COMPILER SUCCESS - Wrote 482 bytes, code: 482, trampolines: 0.

result: 325.000000 (expected 325.000000)


======= desired output on x86_64 =======
L.0:
sub rsp, 152
movss xmm0, [602180]
movss xmm1, [60217C]
movss xmm2, [602178]
movss xmm3, [602174]
movss xmm4, [602170]
movss xmm5, [60216C]
movss xmm6, [602168]
movss xmm7, [602164]
movss xmm8, [602160]
movss xmm9, [60215C]
movss xmm10, [602158]
movss xmm11, [602154]
movss xmm12, [602150]
movss xmm13, [60214C]
movss xmm14, [602148]
movss xmm15, [602144]
movdqa [rsp + 128], xmm15
movss xmm15, [602140]
movdqa [rsp + 112], xmm15
movss xmm15, [60213C]
movdqa [rsp + 96], xmm15
movss xmm15, [602138]
movdqa [rsp + 80], xmm15
movss xmm15, [602134]
movdqa [rsp + 64], xmm15
movss xmm15, [602130]
movdqa [rsp + 48], xmm15
movss xmm15, [60212C]
movdqa [rsp + 32], xmm15
movss xmm15, [602128]
movdqa [rsp + 16], xmm15
movss xmm15, [602124]
movdqa [rsp], xmm15
movss xmm15, [602120]
addss xmm1, xmm0
addss xmm2, xmm1
addss xmm3, xmm2
addss xmm4, xmm3
addss xmm5, xmm4
addss xmm6, xmm5
addss xmm7, xmm6
addss xmm8, xmm7
addss xmm9, xmm8
addss xmm10, xmm9
addss xmm11, xmm10
addss xmm12, xmm11
addss xmm13, xmm12
addss xmm14, xmm13
movdqa xmm0, [rsp + 128]
addss xmm0, xmm14
movdqa xmm1, [rsp + 112]
addss xmm1, xmm0
movdqa xmm0, [rsp + 96]
addss xmm0, xmm1
movdqa xmm1, [rsp + 80]
addss xmm1, xmm0
movdqa xmm0, [rsp + 64]
addss xmm0, xmm1
movdqa xmm1, [rsp + 48]
addss xmm1, xmm0
movdqa xmm0, [rsp + 32]
addss xmm0, xmm1
movdqa xmm1, [rsp + 16]
addss xmm1, xmm0
movdqa xmm0, [rsp]
addss xmm0, xmm1
addss xmm15, xmm0
movdqa xmm0, xmm15
add rsp, 152
ret

=======
The same program with N=23 is ok as-is, since that doesn't use too
much redzone.
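
The difference between N=23 and N=25 can be checked with a little arithmetic. The sketch below assumes the layout seen in the log for N=25 generalizes to other N: everything beyond the 16 XMM registers is spilled, each spill takes a 16-byte slot, and the first slot sits at [rsp - 24], that is, 8 bytes of padding plus one slot. All names are hypothetical and the pattern is inferred from the single log above, not from AsmJit's source.

```cpp
// Constants read off the N=25 log above (assumed to hold for other N).
constexpr int kXmmRegs = 16;      // xmm0..xmm15 on x86_64
constexpr int kSlotBytes = 16;    // one XMM spill slot
constexpr int kFirstSlotPad = 8;  // first slot is [rsp - 24] = 8 + 16
constexpr int kRedZoneBytes = 128;

// Number of variables that do not fit in registers.
constexpr int spilledVars(int n) {
    return n > kXmmRegs ? n - kXmmRegs : 0;
}

// Distance below rsp of the deepest spill slot, or 0 if nothing is spilled.
constexpr int deepestSpillOffset(int n) {
    return spilledVars(n) == 0 ? 0
                               : kFirstSlotPad + kSlotBytes * spilledVars(n);
}

// True if every spill slot stays inside the 128-byte red zone.
constexpr bool fitsInRedZone(int n) {
    return deepestSpillOffset(n) <= kRedZoneBytes;
}
```

With these assumptions N=25 gives 9 spills and a deepest slot at 8 + 9*16 = 152 bytes below rsp, matching [rsp - 152] in the log, while N=23 gives 7 spills and 8 + 7*16 = 120 bytes, which stays inside the red zone.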

Petr Kobalíček

Mar 18, 2011, 7:24:45 AM
to asmji...@googlegroups.com
Hi Pengvado,

I read the Unix 64-bit ABI (including the calling convention) and you
are right. Please verify my understanding:

- A leaf function is a function that doesn't call any other function
- The red zone is 128 bytes
- A leaf function can use only 128 bytes of stack memory without adjusting rsp
- If I need more memory, I need to adjust rsp in the prolog/epilog.

Correct?

Best regards
Petr Kobalicek

Pengvado

Mar 18, 2011, 7:32:16 AM
to asmjit-dev
On Mar 18, 11:24 am, Petr Kobalíček <kobalicek.p...@gmail.com> wrote:
> - Leaf function is function that doesn't call other function
> - Red-zone means 128 bytes
> - Leaf function can use only 128 bytes of stack mem
> - If I need more memory, I need to adjust rsp in prolog/epilog.

Correct.

Petr Kobalíček

Mar 18, 2011, 7:33:11 AM
to asmji...@googlegroups.com
So,

I always need to adjust esp/rsp when the function's calling convention
is unix-amd64 and the stack size is larger than 128 bytes.

This should solve the issue:

if (_functionPrototype.getCallingConvention() == CALL_CONV_X64U &&
    cc._memBytesTotal >= 128)
  _isEspAdjusted = true;

Please try the latest rev, it should fix the red-zone issue.

Best regards
Petr Kobalicek

Petr Kobalíček

Mar 18, 2011, 7:46:22 AM
to asmji...@googlegroups.com
And the remaining issue:

> Remaining redzone issues:
> * x86_32 and win64 don't have a redzone at all. On those systems, you
> simply can't safely access negative offsets of esp/rsp. I don't have a
> win64 system to test, but AsmJit does try to use redzone on x86_32.

Where is it mentioned that I can't use the stack below esp/rsp under
Windows? I studied these calling conventions, but didn't notice anything
about the stack being discarded. I understand the red-zone concept, but I
thought it was possible to access roughly 4096 bytes (the page size)
below the stack pointer under Windows/Linux.

But I'm going to fix it, no problemo :)

Best regards
Petr Kobalicek


Pengvado

Mar 18, 2011, 7:50:42 AM
to asmjit-dev
On Mar 18, 11:33 am, Petr Kobalíček <kobalicek.p...@gmail.com> wrote:
>   if (_functionPrototype.getCallingConvention() == CALL_CONV_X64U &&
> cc._memBytesTotal >= 128)
>     _isEspAdjusted = true;

That fixes unix x86_64. To fix the other OSes, it also needs:

if (_functionPrototype.getCallingConvention() != CALL_CONV_X64U &&
    cc._memBytesTotal > 0)
  _isEspAdjusted = true;
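
Taken together, the two conditions quoted in this exchange amount to the following decision. This is a minimal sketch with hypothetical names (CallConv, mustAdjustStackPointer); it mirrors the thresholds exactly as posted, including the >= 128 comparison, and does not claim to reproduce AsmJit's actual implementation or its separate handling of non-leaf functions.

```cpp
// Hypothetical stand-in for the calling-convention check in the thread:
// X64Unix plays the role of CALL_CONV_X64U, Other covers everything else.
enum class CallConv { X64Unix, Other };

// Returns true when the prolog/epilog should adjust esp/rsp so that spill
// slots live at non-negative offsets from the stack pointer.
bool mustAdjustStackPointer(CallConv cc, int memBytesTotal) {
    if (cc == CallConv::X64Unix) {
        // Threshold as posted; the unix x86_64 red zone is 128 bytes.
        return memBytesTotal >= 128;
    }
    // No red zone on the other conventions: any stack memory needs a frame.
    return memBytesTotal > 0;
}
```

With 152 bytes of spill space under the unix x86_64 convention the function returns true, and under the other conventions it returns true for any nonzero amount.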

Petr Kobalíček

Mar 18, 2011, 7:53:34 AM
to asmji...@googlegroups.com
Try the latest rev, this should be fixed;)

Thank you!
Petr Kobalicek

Pengvado

Mar 18, 2011, 7:58:11 AM
to asmjit-dev
On Mar 18, 11:46 am, Petr Kobalíček <kobalicek.p...@gmail.com> wrote:
> Where is mentioned that I can't use stack below esp/rsp under windows?
> I studied these calling conventions, but didn't notice something about
> discarding stack.

The first reasonably-official doc I found:
http://msdn.microsoft.com/en-us/library/x4ea06t0.aspx

> Try the latest rev, this should be fixed;)

Works.

Petr Kobalíček

Mar 18, 2011, 8:02:02 AM
to asmji...@googlegroups.com
Ok, I missed it :)

Petr Kobalíček

Mar 18, 2011, 9:35:00 AM
to asmji...@googlegroups.com
Hi,

try rev #345; I strengthened unuseVar(), and it is now able
to prevent the unnecessary spill. However, it's experimental :)

Best regards
Petr Kobalicek

Pengvado

Mar 18, 2011, 11:38:20 AM
to asmjit-dev
On Mar 18, 1:35 pm, Petr Kobalíček <kobalicek.p...@gmail.com> wrote:
> try rev #345, I added better strenght to unuseVar() and now it is able
> to prevent unnecessary spill. However, its experimental:)

Thanks. As far as I can tell, this generates correct and optimal code
in all the cases I'm using so far in my application. So I'm happy with
it.

I notice, however, that the callee address for a function pointer is
still unnecessarily allocated to a callee-saved register if possible,
although it isn't spilled if it happens to already be in a caller-
saved register.
testfunccall1:
L.0:
; Prolog
push rbp
mov rbp, rsp
push rbx
; Body
shl edi, 1
shl esi, 1
shl edx, 1
mov rbx, 4200132 ; <-- this could be rax
; Function Call
xchg rdi, edx
call rbx
L.1:
; Epilog
pop rbx
mov rsp, rbp
pop rbp
ret
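
The choice of register for the callee address can be sketched as a search over caller-saved registers that are not carrying an argument. The register lists are those of the unix (System V) x86_64 calling convention; the function name, the string representation of registers, and the busy list are hypothetical and only illustrate the idea. Note that for variadic callees al carries the vector-register count, so rax would have to be excluded there.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Integer argument registers of the System V x86_64 convention, in order.
const std::vector<std::string> kIntArgRegs = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Caller-saved general-purpose registers under the same convention.
const std::vector<std::string> kCallerSaved = {
    "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"};

// Picks a register for an indirect call target: the first caller-saved
// register that neither carries one of the first `numIntArgs` arguments nor
// appears in `busy` (registers holding other live values).
// Returns an empty string if no such register exists.
std::string pickCallTargetReg(std::size_t numIntArgs,
                              const std::vector<std::string>& busy) {
    const std::size_t used = std::min(numIntArgs, kIntArgRegs.size());
    const auto argsEnd =
        kIntArgRegs.begin() + static_cast<std::ptrdiff_t>(used);
    for (const std::string& reg : kCallerSaved) {
        const bool holdsArg =
            std::find(kIntArgRegs.begin(), argsEnd, reg) != argsEnd;
        const bool isBusy =
            std::find(busy.begin(), busy.end(), reg) != busy.end();
        if (!holdsArg && !isBusy) return reg;
    }
    return std::string();
}
```

For the three-argument call in testfunccall1 this picks rax, so no callee-saved register (and no push/pop of rbx) would be needed under these assumptions.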

Pengvado

Mar 20, 2011, 12:32:38 PM
to asmjit-dev
More unnecessary spills:
Compiler decides whether to prefer caller-saved or callee-saved regs
on a per function basis, depending on whether the function is a leaf.
This should be a per variable decision, depending on whether the
variable's lifetime crosses any subroutine calls.

e.g.
void nop() {}
int x[2];
Compiler c;
c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder0<int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
GPVar y(c.newGP(VARIABLE_TYPE_GPD));
c.mov(y, ptr_abs(&x[0]));
c.mov(ptr_abs(&x[1]), y);
c.unuse(y);
c.call((void*)nop)->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder0<Void>());
c.endFunction();
c.make();

compiles to:
push ebx
mov ebx, [0x804B0A4]
mov [0x804B0A8], ebx
call 0x8048C73
pop ebx
ret

whereas I want:
mov ecx, [0x804B0A4]
mov [0x804B0A8], ecx
call 0x8048C73
ret
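
The per-variable decision described in this post can be sketched with live ranges expressed as instruction indices. This is an illustration only: LiveRange, crossesCall and preferredClass are hypothetical names, and the model ignores control flow, which a real allocator would have to handle.

```cpp
#include <vector>

// Inclusive range of instruction indices: where the variable is first
// written and where it is last read.
struct LiveRange {
    int def;
    int lastUse;
};

enum class RegClass { CallerSaved, CalleeSaved };

// A value has to survive a call only if it is written before the call and
// still read after it. A variable whose last use is the call itself (for
// example a call argument) does not cross that call.
bool crossesCall(const LiveRange& r, const std::vector<int>& callSites) {
    for (int call : callSites) {
        if (r.def < call && call < r.lastUse) return true;
    }
    return false;
}

// Prefer a callee-saved register only when the value must outlive a call.
RegClass preferredClass(const LiveRange& r, const std::vector<int>& callSites) {
    return crossesCall(r, callSites) ? RegClass::CalleeSaved
                                     : RegClass::CallerSaved;
}
```

In the example above, y is written at instruction 0 and last read at instruction 1, and the call is at instruction 2, so the sketch prefers a caller-saved register such as ecx and no push/pop of ebx is required.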

Petr Kobalíček

Mar 20, 2011, 4:22:32 PM
to asmji...@googlegroups.com
Hi,

I think that this is possible with a little modification, I'm going to try it.

Best regards
Petr Kobalicek

Petr Kobalíček

Mar 20, 2011, 11:49:38 PM
to asmji...@googlegroups.com
Try rev #350; I made some improvements, so your problem might be solved.

I'm now thinking about how to improve the register allocator. It's still
not smart in some cases, but I think there has been an improvement over
the last 2 days :)

Pengvado

Mar 21, 2011, 2:10:55 PM
to asmjit-dev
On Mar 21, 3:49 am, Petr Kobalíček <kobalicek.p...@gmail.com> wrote:
> Try rev #350, I did some improvements so your problem might be solved.

Variables allocated after the last subroutine call still get callee-
saved regs.

e.g.
void nop() {}
int x[2];
Compiler c;
c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder0<int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
c.call((void*)nop)->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder0<Void>());
GPVar y(c.newGP(VARIABLE_TYPE_GPD));
c.mov(y, ptr_abs(&x[0]));
c.mov(ptr_abs(&x[1]), y);
c.endFunction();
c.make();

compiles to:
push ebx
call 0x8048C73
mov ebx, [0x804B0A4]
mov [0x804B0A8], ebx
pop ebx
ret

whereas I want:
call 0x8048C73
mov ecx, [0x804B0A4]
mov [0x804B0A8], ecx
ret

Petr Kobalíček

Mar 21, 2011, 3:54:04 PM
to asmji...@googlegroups.com
Hi,

please try rev #356; there was a mistake in a condition, and now it should work.

Best regards
Petr Kobalicek

Pengvado

Mar 29, 2011, 6:18:48 PM
to asmjit-dev
Here's a case of unnecessary spilling, even though Compiler correctly
decides that it never needs to unspill the variable:

c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
Label L0 = c.newLabel();
Label L1 = c.newLabel();
GPVar x(c.newGP());
c.jnz(L0);
ECall *ctx = c.call((void*)callee);
ctx->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
ctx->setArgument(0, c.argGP(0));
ctx->setReturn(x);
c.jmp(L1);
c.bind(L0);
c.mov(x, c.argGP(0));
c.bind(L1);
c.ret(x);
c.endFunction();

compiles to:
L.0:
; Prolog
sub rsp, 0x18
; Body
short jnz L.2
; Function Call
mov [rsp], edi ; Spill arg_0
call 0x4014E4
short jmp L.3
L.2:
mov rax, edi
L.3:
L.1:
; Epilog
add rsp, 0x18
ret

whereas I want:
L.0:
; Prolog
; Body
short jnz L.2
; Function Call
call 0x4014E4
short jmp L.3
L.2:
mov rax, edi
L.3:
L.1:
; Epilog
ret



Here's a case of unnecessarily using a callee-saved reg. And even
given that, it also uses a spill where a reg-to-reg move would work:

c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
Label L0(c.newLabel());
Label L1(c.newLabel());
GPVar x(c.newGP());
GPVar y(c.newGP());
c.mov(x, 0);
c.jnz(L0);
c.mov(y, c.argGP(0));
c.jmp(L1);
c.bind(L0);
ECall *ctx = c.call((void*)callee);
ctx->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
ctx->setArgument(0, c.argGP(0));
ctx->setReturn(y);
c.bind(L1);
c.add(x, y);
c.endFunction();

compiles to:
L.0:
; Prolog
push rbx
push r12
sub rsp, 0x18
; Body
mov rbx, 0x0
short jnz L.2
mov r12, edi
short jmp L.3
L.2:
; Function Call
call 0x401464
mov [rsp], rax ; Spill var_1
mov r12, [rsp] ; Alloc var_1
L.3:
add rbx, r12
L.1:
; Epilog
add rsp, 0x18
pop r12
pop rbx
ret

whereas I want:
L.0:
; Prolog
push rbx
; Body
mov rbx, 0x0
short jnz L.2
mov rax, edi
short jmp L.3
L.2:
; Function Call
call 0x401464
L.3:
add rbx, rax

Petr Kobalíček

Apr 2, 2011, 4:29:08 PM
to asmji...@googlegroups.com
Hi,

I will look at the second issue when I get some spare time. The first
issue is harder, because the value is still in scope. You can try the
hints in cases where you want to unuse or spill a variable.

For example, this should work, but I didn't test it :)

c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
Label L0 = c.newLabel();
Label L1 = c.newLabel();
GPVar x(c.newGP());
c.jnz(L0);

c.unuse(x);


ECall *ctx = c.call((void*)callee);
ctx->setPrototype(CALL_CONV_DEFAULT, FunctionBuilder1<int,int>());
ctx->setArgument(0, c.argGP(0));
ctx->setReturn(x);
c.jmp(L1);
c.bind(L0);
c.mov(x, c.argGP(0));
c.bind(L1);
c.ret(x);
c.endFunction();

Best regards
Petr Kobalicek

Pengvado

Apr 11, 2011, 5:56:49 PM
to asmjit-dev
Another case of unnecessarily using a callee-saved reg (or failing to
find eax when looking for a caller-saved reg?)

c.newFunction(CALL_CONV_DEFAULT, FunctionBuilder0<Void>());
c.getFunction()->setHint(FUNCTION_HINT_NAKED, true);
GPVar x(c.newGP());
GPVar y(c.newGP());
GPVar z(c.newGP());
c.mov(x, 0);
c.mov(y, 1);
c.mov(z, 2);
c.add(y, x);
c.add(z, y);
c.endFunction();

compiles to:
push ebx
mov ecx, 0x0
mov edx, 0x1
mov ebx, 0x2
add edx, ecx
add ebx, edx
pop ebx
ret

whereas I want:
mov ecx, 0x0
mov edx, 0x1
mov eax, 0x2
add edx, ecx
add eax, edx
ret