Unused stack space

michael_...@hotmail.com

Sep 30, 2005, 1:45:53 AM

In a C module that I have compiled, GCC seems to generate object code
that wastes stack space. The C file is as follows:

#include <stdio.h>

/* Test function: passes str to printf as the format string. */
void print1(char *str)
{
    printf(str);
}

Yes, I know it is useless; it was created only so that I could examine
how GCC generates asm code, which is below:

>objdump -d print1.o
00000013 <print1>:
  13:   55                      push   %ebp
  14:   89 e5                   mov    %esp,%ebp
  16:   83 ec 08                sub    $0x8,%esp
  19:   8b 45 08                mov    0x8(%ebp),%eax
  1c:   89 04 24                mov    %eax,(%esp)
  1f:   e8 fc ff ff ff          call   20 <print1+0xd>
  24:   c9                      leave
  25:   c3                      ret

The code is relatively simple, but I don't understand the third
instruction (sub $0x8,%esp). Obviously it allocates 8 bytes on the
stack, but why does it allocate this much? It only uses 4 bytes of it
(to pass the str parameter to printf), and leaves 4 bytes unused.
Why is this? Or am I confusing something?

My mental image of the stack just before the call to printf is as
follows. Is this accurate?

|Param to print1 |
|--------------------|
|Addr of caller |
|--------------------|
|Caller EBP |
|--------------------|
|UNUSED DWORD |
|--------------------|
|Param to printf |
|--------------------|

cheers
MQ

"Nils O. Selåsdal"

Sep 30, 2005, 1:52:46 AM

michael_...@hotmail.com wrote:
> In a C module that I have compiled, it seems to generate object files
> which waste stack space. The C file is as follows
>
> print1(char * str)
> {
> printf(str);
> }
>
> yes I know it is useless, it was simply created so that I could examine
> how GCC generates asm code, which is below
>
>
>>objdump -d print1.o
>
> 00000013 <print1>:
> 13: 55 push %ebp
> 14: 89 e5 mov %esp,%ebp
> 16: 83 ec 08 sub $0x8,%esp
> 19: 8b 45 08 mov 0x8(%ebp),%eax
> 1c: 89 04 24 mov %eax,(%esp)
> 1f: e8 fc ff ff ff call 20 <print1+0xd>
> 24: c9 leave
> 25: c3 ret
>
>
> The code is relatively simple, but I don't understand the third
> instruction (sub 0x8, esp). obviously it allocates 8 bytes to the
> stack, but why does it allocate this much? It only uses 4 bytes of it
> (to pass the str parameter to printf), and leaves four bytes unused.
> Why is this? Or am I confusing something??

Yes, you shouldn't care squat what the compiler does in this regard.
Also, the return address and the frame pointer are normally pushed
onto the stack when a function is called.
You could also play with e.g. -fomit-frame-pointer and -O2.

Basile Starynkevitch [news]

Sep 30, 2005, 1:53:41 AM

On 2005-09-30, michael_...@hotmail.com <michael_...@hotmail.com> wrote:
> In a C module that I have compiled, it seems to generate object files
> which waste stack space. The C file is as follows
>
> print1(char * str)
> {
> printf(str);
> }

Better to use puts; calling print1 with a string like "a%s" will do
harm, since printf then looks for an argument that was never passed.

> yes I know it is useless, it was simply created so that I could examine
> how GCC generates asm code, which is below
>
>>objdump -d print1.o
> 00000013 <print1>:
> 13: 55 push %ebp
> 14: 89 e5 mov %esp,%ebp
> 16: 83 ec 08 sub $0x8,%esp
> 19: 8b 45 08 mov 0x8(%ebp),%eax
> 1c: 89 04 24 mov %eax,(%esp)
> 1f: e8 fc ff ff ff call 20 <print1+0xd>
> 24: c9 leave
> 25: c3 ret

This depends on the compiler, and mostly on the requested optimisation
level. With -O3 using gcc 3.4.5 I am getting the following (gcc -S -O3
print1.c; I replaced printf with puts):

        .file   "print1.c"
        .text
        .p2align 4,,15
.globl print1
        .type   print1, @function
print1:
        pushl   %ebp
        movl    %esp, %ebp
        popl    %ebp
        jmp     puts
        .size   print1, .-print1
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.5 20050821 (prerelease) (Debian 3.4.4-8)"

>
> The code is relatively simple, but I don't understand the third
> instruction (sub 0x8, esp). obviously it allocates 8 bytes to the
> stack, but why does it allocate this much? It only uses 4 bytes of it
> (to pass the str parameter to printf), and leaves four bytes unused.

The stack frame is required to always be 8-byte aligned (to make local
double variables well enough aligned). So the compiler expects it to
be so aligned on call, and provides code respecting this alignment.

Details are probably in the x86 ABI.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile(at)starynkevitch(dot)net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

Josef Moellers

Sep 30, 2005, 5:15:21 AM

It could very well be that the compiler aligns stack frames such that
they are always a multiple of 16.

This is what gcc version 3.3.1 (SuSE Linux) makes of it:
0804833c <main>:
 804833c:       55                      push   %ebp
 804833d:       89 e5                   mov    %esp,%ebp
 804833f:       83 ec 08                sub    $0x8,%esp
 8048342:       83 e4 f0                and    $0xfffffff0,%esp
 8048345:       83 ec 0c                sub    $0xc,%esp
 8048348:       68 78 84 04 08          push   $0x8048478
 804834d:       e8 02 ff ff ff          call   8048254 <_init+0x28>
 8048352:       c9                      leave
 8048353:       c3                      ret

It wastes quite a lot of stack space, but this instruction:
        and $0xfffffff0,%esp
is what leads me to that assumption.

You could retry with various numbers of arguments.

Josef
--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett

Kasper Dupont

Oct 1, 2005, 5:37:38 AM

"Basile Starynkevitch [news]" wrote:
>
> pushl %ebp
> movl %esp, %ebp
> popl %ebp

Interesting that it wasn't able to optimize away those three
instructions. Maybe a few new peephole patterns could help there.

>
> The stack frame is required to always be 8 byte aligned (to make local
> double variables well aligned enough). So the compiler expects it to
> be so aligned on call, and provide code respecting this alignment.

That makes some sense, but I thought it was only required to be
aligned on four-byte boundaries. Why eight? Is there any performance
to gain? AFAIK x86 is always able to handle accesses to unaligned
addresses, though performance decreases if an access crosses a
four-byte boundary.

--
Kasper Dupont
Note to self: Don't try to allocate
256000 pages with GFP_KERNEL on x86.

Basile Starynkevitch [news]

Oct 1, 2005, 7:28:36 AM

On 2005-10-01, Kasper Dupont <kas...@daimi.au.dk> wrote:
> "Basile Starynkevitch [news]" wrote:

I did post something, but the three instructions below are
compiler-produced. I'm only guilty of running the compiler!

>>
>> pushl %ebp
>> movl %esp, %ebp
>> popl %ebp
>
> Interesting that it wasn't able to optimize those three instructions.
> Maybe a few new peephole patterns could help there.
>
>>
>> The stack frame is required to always be 8 byte aligned (to make local
>> double variables well aligned enough). So the compiler expects it to
>> be so aligned on call, and provide code respecting this alignment.
>
> That makes some sense, but I thought it was only required to be
> aligned on four byte boundaries. Why eight?

Because on x86 doubles (64-bit floating point) are accessed faster
when aligned to 8 bytes, so in order to be sure that double locals
(inside a call frame) are suitably 8-byte aligned, the simplest way is
to ensure (by convention) that sp is aligned to 8 bytes.

AFAIK, PowerPC has, for a similar reason, the convention that sp is
16-byte aligned. And having well-aligned local data is important in
practice, for performance (and also cache alignment) reasons.

The alternative would be for each function prologue to allocate a
stack frame with its own alignment constraint; this would probably be
much slower (it requires pointer arithmetic on the frame pointer to
achieve this).