Simple question about RET 4 and the stack

catcalls

Apr 21, 2011, 6:30:23 PM

Hi asm-junkies!

Been doing some research on Wikipedia about the RET command.
Basically, as I understand it, a ret command with a value after it
pops an ADDITIONAL 16-bit value from the stack. This is in addition to
the pushed return call address.

So, if I were to have the following code, for example:

foo:

push ebp
mov ebp, esp

mov eax, [ebp + 8]
...
mov esp, ebp
pop ebp
ret 4

This will adjust the stack for the two DWORDs pushed onto the stack
before calling foo. Am I right in thinking this?

The reason I ask is that I have some code that calls foo (for
example) three times in a row, each time pushing two DWORDs onto the
stack before the call, and using a plain RET without the 4 does not
affect the behaviour of the program?

However, as I understand it, it takes special circumstances to
actually expose a RET stack bug. So I'd just like confirmation from
people here: is it really a 16-bit unit that gets removed from the
stack with a RET 1, so that to remove two DWORDs I would need a RET 4?

TIA,

Cat

Frank Kotler

Apr 21, 2011, 7:14:39 PM

catcalls wrote:
> Hi asm-junkies!
>
> Been doing some research on Wikipedia about the RET command.
> Basically, as I understand it, a ret command with a value after it
> pops an ADDITIONAL 16-bit value from the stack. This is in addition to
> the pushed return call address.
>
> So, if I were to have the following code for example;
>
> foo:
>
> push ebp
> mov ebp, esp
>
> mov eax, [ebp + 8]
> ...
> mov esp, ebp
> pop ebp
> ret 4
>
> This will adjust the stack for 2x DWORDS pushed onto stack prior to
> calling foo. Am I right in thinking this?

Not quite. The parameter following "ret" (if any) refers to the number
of bytes to remove from the stack, so to remove 2 dwords, you'd want
"ret 8". There are two different "calling conventions" you might
encounter. "cdecl" expects the caller to clean up...

push A
push B
call foo ; foo just ends in "ret"
add esp, 8 ; clean up stack

Or, "stdcall" expects the callee to clean up - Windows APIs (for
example) use this...

push A
push B
call foo
; callee has cleaned up stack

foo:
; do some stuff
ret 8
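To make the byte counting concrete, here is a minimal Python sketch that tracks only the stack pointer of a hypothetical 32-bit stack (memory contents and EIP are not modelled, and the helper names `one_call` and `bytes_left_after` are made up for illustration). It runs the push/push/call sequence three times with cdecl cleanup, with `ret 8`, with `ret 4`, and with a plain `ret` and no cleanup:

```python
# Toy model of the 32-bit x86 stack pointer only (no memory contents).
# PUSH and CALL move ESP down by 4; "RET n" pops the 4-byte return
# address and then adds n to ESP.

def one_call(esp, ret_bytes, caller_cleanup=0):
    """Push two dword arguments, call foo, return with 'ret ret_bytes',
    then let the caller add caller_cleanup bytes (cdecl: add esp, 8)."""
    esp -= 4              # push A
    esp -= 4              # push B
    esp -= 4              # call foo: pushes the return address
    esp += 4              # ret: pop the return address into EIP ...
    esp += ret_bytes      # ... then discard ret_bytes bytes of arguments
    esp += caller_cleanup # caller-side cleanup, if any
    return esp

def bytes_left_after(calls, ret_bytes, caller_cleanup=0, start=0x1000):
    """Bytes of stale arguments still on the stack after `calls` calls."""
    esp = start
    for _ in range(calls):
        esp = one_call(esp, ret_bytes, caller_cleanup)
    return start - esp

print("cdecl   (ret; add esp, 8):", bytes_left_after(3, 0, 8))  # 0
print("stdcall (ret 8):          ", bytes_left_after(3, 8))     # 0
print("ret 4:                    ", bytes_left_after(3, 4))     # 12
print("plain ret, no cleanup:    ", bytes_left_after(3, 0))     # 24
```

The last two cases leave stale arguments behind (4 and 8 bytes per call). In this model that drift only matters once something later depends on ESP being balanced, such as a POP of a saved register or the caller's own RET, which may be why a plain RET did not visibly change the program's behaviour.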

This ASSumes 32-bit code. In your next message (or two), you do:

mov eax, [esp+2] ; Compare high dwords of a and b
cmp eax, [esp + 6]

This is probably not where your parameters are! Most likely, you want
"[esp + 4]" and "[esp + 8]"...

Best,
Frank

catcalls

Apr 22, 2011, 1:04:59 AM

On Apr 22, 12:14 am, Frank Kotler

Thanks, Frank!

Actually, this code has started to get a little more complex than I
first imagined.

In the first example (I'll post my real code now), there is this
routine I am working on, taken from an asm.x86 OP:

Cmp32To32: ; a:int64, b:int64
; push ebp
; mov ebp, esp

mov eax, [esp + 2] ;2 Compare high dwords of a and b
cmp eax, [esp + 6] ;6
seta cl
movzx ecx, cl ; Fill rest of ECX with 0s
sbb edx, edx ; If a < b, EDX = -1, else 0
or edx, ecx

mov eax, edx

; mov esp, ebp
; pop ebp

ret 4

Now, I get the result as follows when I call the following code;

push DWORD 0x00000003 ; (a)
push DWORD 0x00000002 ; (b)
call Cmp32To32
mov esi, Message_Out
add eax, 66
mov [esi+10], al ; (A)

push DWORD 0x00000001 ; (a)
push DWORD 0x00000002 ; (b)
call Cmp32To32
add eax, 66
mov [esi+21], al ; (C)

push DWORD 0x00000002 ; (a)
push DWORD 0x00000002 ; (b)
call Cmp32To32
add eax, 66
mov [esi+32], al ; (B)
mov bx, Message_Out_Length
call SecPutC

The message that SecPutC puts out is;

A > B : A
A < B : C
A = B : B

Where, I've 'added' a value of 66 to the result from Cmp32To32 to get
a letter instead of numerical output.
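A minimal Python model of the compare-and-set sequence in Cmp32To32 and of the +66 letter mapping, assuming unsigned 32-bit operands. `cmp32_branchless` and `letter` are hypothetical names, and only the flag effects the sequence actually uses are modelled. In the real-mode version, [esp+2] holds b (pushed last) and [esp+6] holds a, so the routine effectively compares b against a:

```python
MASK32 = 0xFFFFFFFF

def cmp32_branchless(x, y):
    """Model of: cmp x, y / seta cl / movzx ecx, cl / sbb edx, edx / or edx, ecx.
    x is the value in EAX, y the memory operand; both unsigned 32-bit.
    Returns -1, 0 or 1 (EDX read as a signed 32-bit value)."""
    x &= MASK32
    y &= MASK32
    cf = 1 if x < y else 0        # CMP sets CF on an unsigned borrow (x < y)
    ecx = 1 if x > y else 0       # SETA: CF == 0 and ZF == 0; MOVZX clears the rest
    edx = (0 - cf) & MASK32       # SBB EDX, EDX = EDX - EDX - CF -> 0 or 0xFFFFFFFF
    edx |= ecx                    # OR: 1 if x > y, 0xFFFFFFFF if x < y, 0 if equal
    return edx - (1 << 32) if edx & 0x80000000 else edx

def letter(a, b):
    """[esp+2] holds b and [esp+6] holds a, so the routine compares b with a."""
    return chr(cmp32_branchless(b, a) + 66)   # 65 'A', 66 'B', 67 'C'

for a, b in [(3, 2), (1, 2), (2, 2)]:
    print(f"a={a} b={b} -> {letter(a, b)}")
# a=3 b=2 -> A
# a=1 b=2 -> C
# a=2 b=2 -> B
```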

Read Cmp64bit Value to 64Bit Value thread in asm.x86 for more
information.

Well, returning to the original code above:

When I include the mov ebp, esp and push ebp stuff at the top, I get a
different result from SecPutC. It seems my values are not being
located correctly.

Is this because I'm using a 32-bit stack in 16-bit Real Mode? Using my
16-bit bootloader, I write and test all my code in 16-bit Real Mode
first!

My next 'test' will be to use BP and SP instead of EBP and ESP.

So, how do I 'convert' the 16-bit values of [esp + 2] and [esp + 6] to
32-bit Protected Mode values? Will I require that RET 8?

Kind regards,

Cat

catcalls

Apr 22, 2011, 1:17:22 AM

Final Code:

Cmp32To32: ; a:int64, b:int64

push ebp
mov ebp, esp

mov eax, [ebp+10] ;2 Compare high dwords of a and b
mov ebx, 16
call writeInt
cmp eax, [ebp+6] ;6

seta cl
movzx ecx, cl ; Fill rest of ECX with 0s
sbb edx, edx ; If a < b, EDX = -1, else 0
or edx, ecx

mov eax, edx

mov esp, ebp
pop ebp

ret 8

This code basically used the writeInt sub I wrote to obtain the
location on the stack of the two DWORDs.

The first was at [ebp+6] and the second at [ebp+10]. Interesting,
hey?

Does this solve the 32-bit riddle? I hope so. Do not forget...this
code is currently running in 16-bit Real Mode.

Kind regards,

cAt

wolfgang kern

Apr 22, 2011, 5:06:35 AM

"catcalls" posted:

> Cmp32To32: ; a:int64, b:int64

...
with two 64-bit arguments on the stack I'd expect RET 16 instead of 8.
OTOH, you named it CMP 32 to 32 and use this 'procedure' twice?

It could be done more directly and faster with just a single branch
(this saves the push/call/ret and the cache penalties from abusing the stack):

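mov eax,[a+4]   ;high dword of a
mov edx,[a]     ;low dword of a
cmp eax,[b+4]   ;compare high dwords first
jnz L1
cmp edx,[b]     ;compare lowparts only if highparts are equal.
L1:
; jcc/setcc/cmovcc ;the flag register tells everything here
;                  ;(use unsigned conditions: JA/JAE/JB/JBE/JE/JNE)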
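The idea behind the single-branch version (decide on the high dwords, and fall through to the low dwords only when they are equal) can be checked against a native 64-bit comparison with a small Python model. This is a sketch assuming unsigned operands, with both halves compared in the same a-versus-b order; `cmp64_by_halves` is a hypothetical name:

```python
MASK32 = 0xFFFFFFFF

def cmp64_by_halves(a, b):
    """Compare unsigned 64-bit a and b using only 32-bit halves:
    the high dwords decide unless they are equal, in which case the
    low dwords decide. Returns -1, 0 or 1."""
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32
    if a_hi != b_hi:              # cmp on the high dwords, jnz taken
        return -1 if a_hi < b_hi else 1
    if a_lo != b_lo:              # high dwords equal: unsigned cmp on the low dwords
        return -1 if a_lo < b_lo else 1
    return 0

# Cross-check against Python's own integer comparison.
for a, b in [(1 << 32, MASK32), (0x100000000, 0x100000001), (5, 5)]:
    print(hex(a), hex(b), cmp64_by_halves(a, b), (a > b) - (a < b))
```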

> This code basically used the writeInt sub I wrote to obtain the
> location on the stack of the two DWORDs.
>
> The first was at [ebp+6] and the second at [ebp+10] : Interesting,
> hey?
>
> Does this solve the 32-bit riddle? I hope so.
> Do not forget...this code is currently running in 16-bit Real Mode.

In 16-bit real mode you may find that the high part of ESP is always
zero, and the return address on the stack is, of course, only 16 bits wide.

I just wonder why you take this detour of passing parameters over the
stack in an environment where EBP can be used as a general-purpose
register, besides the fact that you don't actually use the stack or EBP
in this routine yet.

Either way, the stack is always filled in the order of occurrence,
and it grows downwards. Any CALL (near/far/16/32/64) also PUSHes the
return address as the last item of an HLL 'procedure' call, and the
last pushed item is the one that (E)SP points to on entry.
(AFAIR, the ancient 8080 did this differently.)

So a "RET Number" first POPs the return address into (E)IP and then
ADDs Number to (E)SP to get rid of the pushed argument bytes.
__
wolfgang

catcalls

Apr 22, 2011, 11:15:43 AM

Dear Wolfgang,

Thank you for your experience and insights into the code I presented
here.

In fact, I personally find this code rather _fun_ because of its
branchless nature, and because it uses some instructions I need under
my belt as reference.

So, I'm throwing them into the OS I'm writing as future reference.

Oh, btw... I tested the routine in 32-bit PMode and did not need to
change any of the stack locations at all, which I found interesting.

So, now I have a function that returns either A, B, or C depending on
a comparison of two 32-bit stack values.

Neat, eh?

One point I wished to raise with the author of the original branchless
64-bit CMP function: even if you have -1, 1, and 0 as the result of
comparing two 64-bit values... surely you'll still need a Jxx
(e.g. JE, Jump if Equal) instruction to use these values in some basic
C-style switch statement later on in the code?

But, nevertheless I'm always open to new ideas.

Ah, I just re-read your closing statement on RET... and I now think
that when I write [esp+12], for example, the +12 takes me past the
return address, toward higher addresses, and into the realm of the
values pushed onto the stack.

O.K. Thank you for your time...perhaps I should really modify those
misleading code comments!

Cat

DSF

Apr 22, 2011, 9:46:01 PM

On Fri, 22 Apr 2011 11:06:35 +0200, "wolfgang kern" <now...@never.at>
wrote:

>
>"catcalls" posted:
>
>> Cmp32To32: ; a:int64, b:int64
>...
>with two 64-bit arguments on the stack I'd expect RET 16 instead 8.
>OTOH you named it CMP 32 to 32 and use this 'procedure' twice ?

The comments on the above line come from my original code for
Cmp64To64. He's passing 32 bit values.

DSF

Apr 22, 2011, 10:40:27 PM

On Fri, 22 Apr 2011 08:15:43 -0700 (PDT), catcalls
<obrzu...@nospicedham.gmail.com> wrote:

>On Apr 22, 10:06 am, "wolfgang kern" <nowh...@never.at> wrote:
>> "catcalls" posted:

>One point I wished to raise with the author of the original 64-bit CMP

>function that is also branchless, was that even if you have -1, 1, and
>0 as a result for
>
>the comparison of two 64-bit values...surely you'll require a Jxx
>(E.g. Jump if Equal) instruction to utilise these values in some base
>C-style switch statement later
>
>on in the code?
>

>Cat

Probably. A comparison may be needed or the return value may be
used directly as you did. My original goal was to get rid of the
branches just to see if I could do it. The fact that it worked out to
be -1, 0, 1 instead of (some negative number), 0, (some positive
number) wasn't a goal, but a byproduct of the algorithm.

One thing that puzzles me is that you are operating in 16-bit real
mode, yet using 32-bit registers. (?) If you have access to 32-bit
registers, a simple CMP followed by JA, JAE, JB, JBE, etc. will handle
32-bit comparisons and not require the overhead of CALL/RET. If my C
compiler supported it, I would love to inline Cmp64To64 and save
the CALL/RET myself.

>In fact, personally find this code rather _fun_ because of the
>branchless nature and also some of the instructions I need under my
>belt as reference.

I too am trying to increase the number of instructions I'm familiar
with well enough that I don't need to look them up every time I need
them. This gave me some experience with the SETcc instructions, plus a
little refresher on the very useful fact that SBBing a register with
itself sets all bits of the register to 0 or 1 depending on the carry
flag.

As most ASM I write interfaces with C, I don't have a lot of
experience with using RET x to clean up the stack, but it seems
logical that x should represent the total size in bytes of any
variables pushed on the stack before the call.

In 32-bit mode, the last variable pushed on the stack can be found
at [ESP+4] ([ESP] contains the 32-bit return address). The variable
pushed before it can be found at [ESP+4+size of the last variable], etc.
In 16-bit mode, the return address at (E)SP would only be two bytes,
so I assume the last variable pushed would be at [(E)SP+2].
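The offset rule above can be written as a short calculation: skip the return address (and the saved (E)BP, if the routine sets up a frame), then walk back through the arguments starting with the last one pushed. A minimal Python sketch, assuming near calls (a far call would also push CS) and the hypothetical helper name `arg_offsets`; the four cases reproduce offsets that come up in this thread:

```python
def arg_offsets(arg_sizes, ret_addr_size, saved_bp_size=0):
    """Byte offsets of stacked arguments, relative to (E)SP on entry, or to
    (E)BP after 'push (e)bp / mov (e)bp, (e)sp' when saved_bp_size > 0.
    arg_sizes lists argument sizes in push order; the result starts with the
    LAST pushed argument, which sits right above the return address."""
    offsets = []
    off = saved_bp_size + ret_addr_size   # skip saved (E)BP and return address
    for size in reversed(arg_sizes):      # last pushed argument comes first
        offsets.append(off)
        off += size
    return offsets

two_dwords = [4, 4]
print(arg_offsets(two_dwords, ret_addr_size=4))                   # [4, 8]   32-bit, [esp+..]
print(arg_offsets(two_dwords, ret_addr_size=4, saved_bp_size=4))  # [8, 12]  32-bit, [ebp+..]
print(arg_offsets(two_dwords, ret_addr_size=2))                   # [2, 6]   16-bit, [esp+..]
print(arg_offsets(two_dwords, ret_addr_size=2, saved_bp_size=4))  # [6, 10]  16-bit, 32-bit push ebp
```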

DSF

catcalls

Apr 23, 2011, 2:36:07 AM

On Apr 23, 3:40 am, DSF <notava...@address.here> wrote:
> On Fri, 22 Apr 2011 08:15:43 -0700 (PDT), catcalls
>

Hi DSF,

Yes, I absolutely loved your branchless code. It really blew my mind
when I saw what you did in converting the original code into the
branchless version.

Thank you for taking the time to write about your motivations, which,
I must admit, are similar to my own: expanding our knowledge and
ability in ASM.

I totally understand that the return address is pushed onto the stack
when making a call to a subroutine... but I found the first occurrence
of my pushed stack variable at ESP+6, not ESP+2 or ESP+4, and that is
true for both Real Mode and PMode?

Perhaps this is something to do with my code... or VMware Player, since
I've only tested the code in a virtual machine.

When I try the stack locations where I pretty much assumed the values
would be (ESP+4 and ESP+8) in Protected Mode, the subroutine throws a
wobbly. Let me actually test +4 and +8 again to get the _true_ results
of my code...

Yes, when I use the code below, I get 'C' (a < b) returned for each
comparison!

Cmp32: ; a:int64, b:int64

push ebp
mov ebp, esp

mov eax, [ebp+4] ;6 Compare dwords of a and b
cmp eax, [ebp+8] ;10
;

seta cl
movzx ecx, cl ; Fill rest of ECX with 0s
sbb edx, edx ; If a < b, EDX = -1, else 0

or edx, ecx ; If a > b, EDX = 1

mov eax, edx

mov esp, ebp
pop ebp

ret 8

Now, when I use +6 and +10 with the same code, I get the correct A, B,
or C returned. However, I've only tested single-digit numbers; I'll
report back on further testing in the future...

Cat

catcalls

Apr 23, 2011, 2:42:34 AM

On Apr 23, 3:40 am, DSF <notava...@address.here> wrote:

> On Fri, 22 Apr 2011 08:15:43 -0700 (PDT), catcalls
>

Hi DSF,

It is as I suspected: when I used comparisons with values starting
with 0xF0000000, I got false results, which is annoying!

It seems the code can only 'handle' the low four hex digits of the
comparison, as in 0000xxxxh, where the x's are the digits available
for comparison using this code.

If I use anything in the top digit (x0000000h), the code fails to
work. I'll take a look into it after I get back from the city today.

Peace,

Cat

DSF

Apr 23, 2011, 5:58:41 PM

On Fri, 22 Apr 2011 23:42:34 -0700 (PDT), catcalls
<obrzu...@nospicedham.gmail.com> wrote:

>Hi DSF,
>
>It is as I suspected, when I used comparisons that started with
>0xF0000000, I got false results, which is annoying!
>
>It seems the code can only 'handle' the first four digits of the
>comparison. Such as in 0000xxxxh where the x's are the
>available numbers to compare using this code.
>
>Use anything with the last digit (x0000000h) then the code fails to
>work. I'll take a look into it after I get back from the city today.
>
>Peace,
>
>Cat

A little blind leading the blind, here... :o)

(If any of the following is wrong, anyone *please* feel free to step
in and correct me!)

Maybe it relates to the fact that you're in 16 bit mode. Have you
looked at the generated code, in a debugger, for instance?
For example:

mov eax, ecx
mov ax, cx

The two MOV instructions above use the same opcode, but in 32-bit
mode an operand-size override prefix (66h) indicates that 16-bit
registers are used.

00000006 8B C1 mov eax, ecx
00000008 66| 8B C1 mov ax, cx

In 16-bit mode, the prefix is used to indicate 32-bit registers(?)
(This is what I've read; I haven't actually assembled code like the
above in 16-bit mode to test it.)
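The prefix rule described above fits in a few lines. This is a hypothetical Python helper (`encode_mov_reg_reg` is not a real assembler API) that models only the MOV case shown, opcode 8Bh with ModRM C1h; it agrees with the guess above that in 16-bit code the 66h prefix selects 32-bit operands:

```python
def encode_mov_reg_reg(mode_bits, operand_bits):
    """Bytes for 'mov eax, ecx' (operand_bits=32) or 'mov ax, cx'
    (operand_bits=16) in code whose default operand size is mode_bits."""
    if mode_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("only 16- and 32-bit sizes are modelled")
    body = [0x8B, 0xC1]  # MOV r, r/m; ModRM C1h = reg EAX/AX, r/m ECX/CX
    # 66h switches the operand size away from the segment's default size
    return [0x66] + body if operand_bits != mode_bits else body

for mode in (32, 16):
    for size in (32, 16):
        encoded = " ".join(f"{b:02X}" for b in encode_mov_reg_reg(mode, size))
        print(f"{mode}-bit code, {size}-bit operands: {encoded}")
```

On a 386 or later, such prefixed instructions also work in real mode and operate on the full 32-bit registers. The sketch says nothing about what a particular assembler actually emits; a listing file or a debugger would confirm that.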

So you may be writing EAX and getting AX from the assembler, or 16-bit
mode may just ignore the high word of 32-bit values; either would
explain the results you're getting. As I said before, I didn't even
know that code with 32-bit registers in it would assemble in 16-bit
mode. (If only we could access all of the 64-bit registers in 32-bit
mode. Sigh!)

DSF

catcalls

Apr 23, 2011, 9:23:20 PM

Hi DSF,

Simple answer, really. Basically, you were right that the stack offset
_starts_ at +4... however, because I was using a DWORD, I required +8
and +12 as my offsets. When I think about stacks growing downwards,
and memory offsets etc... it starts to make sense, no?

Thanks all the same (btw...your code rocks!)

Cat

DSF

Apr 25, 2011, 6:17:24 PM

On Sat, 23 Apr 2011 18:23:20 -0700 (PDT), catcalls
<obrzu...@nospicedham.gmail.com> wrote:

>
>Simple answer really. Basically, you were right that the stack offset
>_starts_ at +4...how ever...
>because I was using a DWORD, I required +8 and +12 as my offsets. When
>I think about stacks
>growing downwards, and memory offsets etc...it starts to make sense,
>no?

Yes. I was just puzzled regarding the +6 and +10 working. But if
the return address is two bytes, then the offsets wouldn't be
multiples of four. That makes sense.

I'm just confused (and curious) as to how 32-bit registers act in
16-bit real mode. From your previous example, it sounds like maybe the
high words are ignored?

>
>Thanks all the same (btw...your code rocks!)

Thanks for the compliment!

DSF