MOV EAX versus PUSH/POP

Bob Masta

Jun 5, 2003, 8:38:12 AM

To move a dword from one memory location to another
I've always used
MOV EAX,VAR1
MOV VAR2,EAX
instead of
PUSH VAR1
POP VAR2
since the former has always been faster up
through 486 at least, as well as being shorter.
But does the speed difference still hold up on
modern CPUs? I'm wondering if back-to-back
use of EAX causes a stall, and/or maybe the
stack operations have been optimized such
that the advantages are now reversed.
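On the "shorter" part of the question, here is a minimal sketch in C that hand-assembles both sequences and counts the bytes. It assumes 32-bit flat-mode code, absolute addresses, and no prefixes; the helper names (`emit_abs32`, `encode_mov_pair`, `encode_push_pop_pair`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the opcode bytes, then append a little-endian 32-bit address.
 * Returns the number of bytes written. */
size_t emit_abs32(uint8_t *out, const uint8_t *op, size_t op_len, uint32_t addr)
{
    assert(op_len >= 1 && op_len <= 2);
    memcpy(out, op, op_len);
    for (size_t i = 0; i < 4; i++)
        out[op_len + i] = (uint8_t)(addr >> (8 * i));
    return op_len + 4;
}

/* MOV EAX,[var1] / MOV [var2],EAX using the short accumulator forms. */
size_t encode_mov_pair(uint8_t *out, uint32_t var1, uint32_t var2)
{
    const uint8_t load[]  = { 0xA1 };  /* MOV EAX, moffs32 */
    const uint8_t store[] = { 0xA3 };  /* MOV moffs32, EAX */
    size_t n = emit_abs32(out, load, sizeof load, var1);
    return n + emit_abs32(out + n, store, sizeof store, var2);
}

/* PUSH dword [var1] / POP dword [var2] with a ModRM byte and disp32. */
size_t encode_push_pop_pair(uint8_t *out, uint32_t var1, uint32_t var2)
{
    const uint8_t push[] = { 0xFF, 0x35 };  /* FF /6, mod=00 rm=101 */
    const uint8_t pop[]  = { 0x8F, 0x05 };  /* 8F /0, mod=00 rm=101 */
    size_t n = emit_abs32(out, push, sizeof push, var1);
    return n + emit_abs32(out + n, pop, sizeof pop, var2);
}
```

Under these assumptions the MOV pair is 10 bytes and the PUSH/POP pair is 12. The 2-byte saving comes from the A1/A3 accumulator encodings; with any register other than EAX the MOV pair would need a ModRM byte too and both sequences would be 12 bytes.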

Thanks!

Bob Masta
tech(AT)daqarta(DOT)com

D A Q A R T A
Data AcQuisition And Real-Time Analysis
Shareware from Interstellar Research
www.daqarta.com

Shill

Jun 5, 2003, 11:04:49 AM

Bob Masta wrote:
> To move a dword from one memory location to another
> I've always used
> MOV EAX,VAR1
> MOV VAR2,EAX
> instead of
> PUSH VAR1
> POP VAR2
> since the former has always been faster up
> through 486 at least, as well as being shorter.
> But does the speed difference still hold up on
> modern CPUs? I'm wondering if back-to-back
> use of EAX causes a stall, and/or maybe the
> stack operations have been optimized such
> that the advantages are now reversed.

Interesting question...

I don't have a definitive answer, only a few comments:

PUSH mem32 is functionally equivalent to:
SUB ESP, 4
MOV [ESP], mem32

POP mem32 is functionally equivalent to:
MOV mem32, [ESP]
ADD ESP, 4

Note: I said *functionally* equivalent; I know MOV can't be used with
two memory references.
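The equivalence above can be sketched as a toy model in C. This is only an illustration of the semantics, not of timing: the `Machine` struct, the 256-byte memory, and all function names are assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256u

/* Toy machine: a small flat memory plus a stack pointer (byte offset). */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;
} Machine;

uint32_t load32(const Machine *m, uint32_t addr)
{
    uint32_t v;
    assert(addr + 4 <= MEM_SIZE);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

void store32(Machine *m, uint32_t addr, uint32_t v)
{
    assert(addr + 4 <= MEM_SIZE);
    memcpy(&m->mem[addr], &v, 4);
}

/* PUSH mem32: read the source, SUB ESP,4, then store at [ESP]. */
void push_mem32(Machine *m, uint32_t src)
{
    uint32_t tmp = load32(m, src);
    assert(m->esp >= 4);
    m->esp -= 4;
    store32(m, m->esp, tmp);
}

/* POP mem32: read [ESP], ADD ESP,4, then store to the destination. */
void pop_mem32(Machine *m, uint32_t dst)
{
    uint32_t tmp = load32(m, m->esp);
    m->esp += 4;
    store32(m, dst, tmp);
}

/* PUSH VAR1 / POP VAR2: the POP reads the slot the PUSH just wrote. */
void copy_via_stack(Machine *m, uint32_t var1, uint32_t var2)
{
    push_mem32(m, var1);
    pop_mem32(m, var2);
}
```

The model makes the extra memory traffic visible: the pair does two loads and two stores (one of each to the stack slot), where MOV EAX,VAR1 / MOV VAR2,EAX does one load and one store.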

1) The stall will happen whether you use a register or memory; you can't
run from a read-after-write dependency.

In the case of the PUSH/POP pair, your load/store buffers will detect
that the POP instruction is attempting to read from the same memory
location ([ESP]) that the PUSH instruction has just written to.

2) In the Athlon, PUSH/POP mem32 are VectorPath instructions, i.e. you
can only decode one per cycle, whereas MOV reg32/mem32 and its twin
brother are DirectPath instructions, i.e. you can decode 3 per cycle.

This is only relevant if you have other instructions around your pair of
instructions, of course.
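To make point 2 concrete, here is a deliberately crude decode-slot counter in C. It assumes only what is stated above (up to 3 DirectPath instructions per cycle, a VectorPath instruction occupying a cycle by itself) and ignores everything else about the real decoder; `DecodeKind` and `decode_cycles` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DIRECT_PATH, VECTOR_PATH } DecodeKind;

/* Count decode cycles for an instruction stream under the toy rule:
 * up to 3 consecutive DirectPath ops share one cycle, while each
 * VectorPath op takes a cycle alone. */
unsigned decode_cycles(const DecodeKind *ops, size_t n)
{
    unsigned cycles = 0;
    size_t i = 0;

    assert(ops != NULL || n == 0);
    while (i < n) {
        if (ops[i] == VECTOR_PATH) {
            i++;                      /* VectorPath decodes alone */
        } else {
            size_t group = 0;         /* pack up to 3 DirectPath ops */
            while (i < n && ops[i] == DIRECT_PATH && group < 3) {
                i++;
                group++;
            }
        }
        cycles++;
    }
    return cycles;
}
```

With one DirectPath neighbour, the MOV pair (DP, DP, DP) decodes in 1 cycle in this model, while the PUSH/POP pair (VP, VP, DP) takes 3. That is decode bandwidth only; it says nothing about execute latency.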

Koen van der Meij

Jun 5, 2003, 11:30:00 AM

> To move a dword from one memory location to another
> I've always used
> MOV EAX,VAR1
> MOV VAR2,EAX
> instead of
> PUSH VAR1
> POP VAR2

I could be wrong, but weren't the push/pop instructions for registers only
(at least on the 8086 they were, right?) So if I'm right (I know, it's a BIG
if), the assembler translates your 'push var1' so it would
become slower.
I could be terribly wrong, but hey, if I would be, I would learn, right?

Koen

Dennis Bliefernicht

Jun 5, 2003, 2:21:54 PM

Koen van der Meij wrote:
> I could be terribly wrong, but hey, if I would be, I would learn, right?

You just learned :)

Shill

Jun 5, 2003, 5:16:03 PM

>>To move a dword from one memory location to another
>>I've always used
>> MOV EAX,VAR1
>> MOV VAR2,EAX
>>instead of
>> PUSH VAR1
>> POP VAR2
>
> I could be wrong, but weren't the push/pop instructions for registers
> only (at least on the 8086 they were, right?)

http://web.archive.org/web/20020219153200/www.quantasm.com/opcode_i.html
http://www.cs.tut.fi/~siponen/upros/intel/instr/push.html

PUSH mem16 was, indeed, available in the 8086 (opcode 0xFF). You can
assemble PUSH [1234] in MSDOS Debug.

PUSH imm, however, was only introduced later (opcodes 0x68 and 0x6A).
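For reference, a small C sketch of the byte encodings involved, assuming 16-bit code and a direct (disp16) address. The function names are invented for the example. Debug reads numbers as hex, so PUSH [1234] there means address 0x1234.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PUSH word [addr]: FF /6 with ModRM mod=00 rm=110 (direct disp16).
 * This form exists on the 8086. */
size_t encode_push_mem16(uint8_t *out, uint16_t addr)
{
    assert(out != NULL);
    out[0] = 0xFF;
    out[1] = 0x36;                    /* 00 110 110b */
    out[2] = (uint8_t)(addr & 0xFF);  /* little-endian displacement */
    out[3] = (uint8_t)(addr >> 8);
    return 4;
}

/* PUSH imm16: opcode 68 iw. Not available on the 8086. */
size_t encode_push_imm16(uint8_t *out, uint16_t imm)
{
    assert(out != NULL);
    out[0] = 0x68;
    out[1] = (uint8_t)(imm & 0xFF);
    out[2] = (uint8_t)(imm >> 8);
    return 3;
}

/* PUSH imm8 (sign-extended): opcode 6A ib. Not available on the 8086. */
size_t encode_push_imm8(uint8_t *out, int8_t imm)
{
    assert(out != NULL);
    out[0] = 0x6A;
    out[1] = (uint8_t)imm;
    return 2;
}
```

So PUSH [1234] comes out as FF 36 34 12, a real single instruction rather than something the assembler has to expand.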

> So if I'm right (I know, it's a BIG if), the assembler
> translates your 'push var1' so it would become slower.

One of the MIPS R3000 assemblers I've used had a pseudo-instruction which
translated to two instructions to load a 32-bit immediate, but I didn't
know x86 assemblers had pseudo-instructions. Which assembler are you
talking about?

Matthew Taylor

Jun 5, 2003, 9:58:19 PM

No, pop is VP. The push instruction is DP, but it has a latency of 2 clocks.
The esp register is updated in 1 cycle, so dependencies can partially
overlap. Theory was that Opteron might play games with the push/pop
instructions to reduce latency and increase performance, but AMD has been
very hush hush about it.

As you said, the mov instructions will hit the L/S queue and execute with a
latency of 2 clocks and a throughput of 2/3 clocks. In general this is the
way to do it. The push/pop solution is never faster and may in some cases be
slower (Athlon, Pentium 4).

BTW, for Athlon the pair of mov instructions will be unconditionally faster.
The push instruction is 2 clocks, and pop is also 2 clocks IIRC (too lazy to
look it up). Each mov instruction is DP and 1 clock, so the total latency is
equivalent to a single push instruction regardless of whether or not you can
pair stuff. Also, since Athlon is OOOE, you can't be absolutely certain that
nothing will pair with them unless you contrive an example with cpuid.

-Matt

"Shill" <nob...@example.com> wrote in message
news:bbnm6h$g3r$1...@biggoron.nerim.net...

Shill

Jun 6, 2003, 4:39:40 AM

> No, pop is VP. The push instruction is DP, but it has a latency of
> 2 clocks.

In the Athlon, PUSH mem32 is a VectorPath instruction with an execute
latency of 3 cycles. However, ESP is available one clock earlier than
the specified latency, i.e. after 2 cycles, I assume.

> The push instruction is 2 clocks, and pop is also 2 clocks IIRC
> (too lazy to look it up). Each mov instruction is DP and 1 clock,
> so the total latency is equivalent to a single push instruction
> regardless of whether or not you can pair stuff.

MOV mem32, reg32 (opcode 0x89)
MOV reg32, mem32 (opcode 0x8B)
PUSH mem32 (opcode 0xFF)
POP mem32 (opcode 0x8F)

In the Athlon, these 4 instructions have an execute latency of 3 cycles.
The first two are DirectPath, the last two are VectorPath.

I double checked in Appendix F ;)

Ben Peddell

Jun 6, 2003, 7:01:00 AM

I know that A86 had a pseudo-instruction (MOV sreg2, sreg1) which it
converted to:
PUSH AX
MOV AX, sreg1
MOV sreg2, AX
POP AX

asbe abi

Jun 6, 2003, 4:34:58 AM

>The push/pop solution is never faster and may in some cases be
>slower (Athlon, Pentium 4).

Genius!!

GiM

Jun 6, 2003, 2:10:06 PM

Someone saying he's: Ben Peddell wrote:
>
> I know that A86 had a pseudo-instruction (MOV sreg2, sreg1) which it
> converted to:
> PUSH AX
> MOV AX, sreg1
> MOV sreg2, AX
> POP AX
>

hey this seems senseless to me :)
why to do mov sreg2, ax if you then do pop ax, huh ?

cya GiM
--
as* ss `$s_s$' RLU:261015 ++++++[>+++[>+++++++>+++>+++++>++++++<<<<-
$l'__ ____ $`$'$ ]>-->->+<<<<-]>>+>-->+.>.<<<+.-->.<--.->>>.<+<.>+.<<.
`$_`| `||' _$s s$_ --.+++.->.<.++++.>.>-->----.+++<.++.>.--.+++++.<++++.

Frank Kotler

Jun 6, 2003, 3:32:40 PM

GiM wrote:
> Someone saying he's: Ben Peddell wrote:
>
>>I know that A86 had a pseudo-instruction (MOV sreg2, sreg1) which it
>>converted to:
>>PUSH AX
>>MOV AX, sreg1
>>MOV sreg2, AX
>>POP AX
>>
>
> hey this seems senseless to me :)
> why to do mov sreg2, ax if you then do pop ax, huh ?

The actual code A86 assembles for "mov ds, cs" is "push cs"/"pop ds".

Best,
Frank

Shill

Jun 7, 2003, 9:33:00 AM

>> I know that A86 had a pseudo-instruction (MOV sreg2, sreg1)
>> which it converted to:
>> PUSH AX
>> MOV AX, sreg1
>> MOV sreg2, AX
>> POP AX
>
> hey this seems senseless to me :)
> why to do mov sreg2, ax if you then do pop ax, huh ?

In case you want to preserve the value in AX.

PUSH AX ; save a copy of AX on the stack
MOV AX, sreg1 ; overwrite AX with the value in sreg1
MOV sreg2, AX ; then move that value to sreg2
POP AX ; now we can restore the old AX
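Both expansions mentioned in this thread can be checked against a toy register model in C: the four-instruction sequence through AX, and the "push cs"/"pop ds" form. The `Cpu` struct, its tiny upward-growing stack, and the function names are all assumptions made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy CPU state: just the registers the example needs. */
typedef struct {
    uint16_t ax, cs, ds;
    uint16_t stack[STACK_WORDS];
    int sp;  /* index of the next free slot (toy stack grows upward) */
} Cpu;

void push16(Cpu *c, uint16_t v)
{
    assert(c->sp < STACK_WORDS);
    c->stack[c->sp++] = v;
}

uint16_t pop16(Cpu *c)
{
    assert(c->sp > 0);
    return c->stack[--c->sp];
}

/* PUSH AX / MOV AX,CS / MOV DS,AX / POP AX */
void mov_ds_cs_via_ax(Cpu *c)
{
    push16(c, c->ax);   /* save AX */
    c->ax = c->cs;      /* AX = CS */
    c->ds = c->ax;      /* DS = AX */
    c->ax = pop16(c);   /* restore AX */
}

/* PUSH CS / POP DS */
void mov_ds_cs_via_stack(Cpu *c)
{
    push16(c, c->cs);
    c->ds = pop16(c);
}
```

In this model both versions leave DS equal to CS with AX and the stack pointer unchanged; the second just gets there in two instructions without touching AX at all.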

Matt Taylor

Jun 7, 2003, 9:44:59 PM

Shill <nob...@example.com> wrote in message news:<bbpk0c$1o8p$1...@biggoron.nerim.net>...

Yes, I'm not sure why but I was using the reg-reg numbers which I have
memorized. Using reg-mem forms, the mov/mov will -always- be faster
post-486.

-Matt