Thanks!
Bob Masta
tech(AT)daqarta(DOT)com
D A Q A R T A
Data AcQuisition And Real-Time Analysis
Shareware from Interstellar Research
www.daqarta.com
Interesting question...
I don't have a definitive answer, only a few comments:
PUSH mem32 is functionally equivalent to:
SUB ESP, 4
MOV [ESP], mem32
POP mem32 is functionally equivalent to:
MOV mem32, [ESP]
ADD ESP, 4
Note: I said *functionally* equivalent; I know MOV can't be used with
two memory references.
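To make the equivalence above concrete, here is a minimal sketch in C of a
toy stack machine. The Machine struct, STACK_BYTES, and the function names
are hypothetical, made up for illustration; this only models the
architectural effect of PUSH mem32 / POP mem32, not timing or decoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u   /* arbitrary size for the toy stack (assumption) */

/* Hypothetical toy machine: a byte-addressed stack plus an ESP offset into it. */
typedef struct {
    uint8_t  stack[STACK_BYTES];
    uint32_t esp;   /* byte offset of the top of stack; the stack grows downward */
} Machine;

static void machine_init(Machine *m) {
    memset(m->stack, 0, sizeof m->stack);
    m->esp = STACK_BYTES;   /* empty stack */
}

/* PUSH mem32: SUB ESP, 4 and then store the operand at [ESP]. */
static void push_mem32(Machine *m, const uint32_t *mem32) {
    assert(m->esp >= 4u);                    /* toy overflow check */
    m->esp -= 4u;                            /* SUB ESP, 4 */
    memcpy(&m->stack[m->esp], mem32, 4u);    /* MOV [ESP], mem32 */
}

/* POP mem32: load the operand from [ESP] and then ADD ESP, 4. */
static void pop_mem32(Machine *m, uint32_t *mem32) {
    assert(m->esp + 4u <= STACK_BYTES);      /* toy underflow check */
    memcpy(mem32, &m->stack[m->esp], 4u);    /* MOV mem32, [ESP] */
    m->esp += 4u;                            /* ADD ESP, 4 */
}
```

Calling push_mem32 on one variable and pop_mem32 on another copies the
value through the stack slot and leaves esp where it started, which is the
memory-to-memory copy being discussed.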
1) The stall will happen whether you use a register or memory; you can't
run from a read-after-write dependency.
In the case of the PUSH/POP pair, your load/store buffers will detect
that the POP instruction is attempting to read from the same memory
location (the one addressed by ESP) that the PUSH instruction has just
written to.
2) In the Athlon, PUSH/POP mem32 are VectorPath instructions, i.e. you
can only decode one per cycle, whereas MOV reg32/mem32 and its twin
brother are DirectPath instructions, i.e. you can decode 3 per cycle.
This is only relevant if you have other instructions around your pair of
instructions, of course.
I could be wrong, but weren't the push/pop instructions for registers only
(at least on the 8086 they were, right?) So if I'm right (I know, it's a BIG
if), if I'm right, the assembler translates your 'push var1' so it would
become slower.
I could be terribly wrong, but hey, if I would be, I would learn, right?
Koen
You just learned :)
http://web.archive.org/web/20020219153200/www.quantasm.com/opcode_i.html
http://www.cs.tut.fi/~siponen/upros/intel/instr/push.html
PUSH mem16 was, indeed, available in the 8086 (opcode 0xFF). You can
assemble PUSH [1234] in MSDOS Debug.
PUSH imm, however, was only introduced later (opcodes 0x68 and 0x6A).
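As a rough illustration of those opcodes, the sketch below builds the
16-bit-mode byte sequences by hand. The function names are hypothetical;
the encodings assume a direct 16-bit address (ModRM mod=00, r/m=110) with
/6 in the reg field for PUSH, and a full 16-bit immediate for opcode 0x68.

```c
#include <stddef.h>
#include <stdint.h>

/* PUSH word [disp16] in 16-bit mode: opcode FF, ModRM with reg field /6,
   mod=00 and r/m=110 (direct address), then the address low byte first. */
static size_t encode_push_mem16_direct(uint16_t disp, uint8_t out[4]) {
    out[0] = 0xFF;
    out[1] = (uint8_t)((0u << 6) | (6u << 3) | 6u);   /* ModRM = 0x36 */
    out[2] = (uint8_t)(disp & 0xFFu);
    out[3] = (uint8_t)(disp >> 8);
    return 4;
}

/* PUSH imm16 in 16-bit mode: opcode 68 followed by the immediate, low byte
   first. Not available on the 8086 itself. (Opcode 6A is the short form
   that takes a sign-extended 8-bit immediate.) */
static size_t encode_push_imm16(uint16_t imm, uint8_t out[3]) {
    out[0] = 0x68;
    out[1] = (uint8_t)(imm & 0xFFu);
    out[2] = (uint8_t)(imm >> 8);
    return 3;
}
```

For the Debug example, PUSH [1234] (Debug reads the number as hex) comes
out as the bytes FF 36 34 12.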
> So if I'm right (I know, it's a BIG if), if I'm right, the assembler
> translates your 'push var1' so it would become slower.
One of the MIPS R3000 assemblers I've used had a pseudo-instruction which
translated to two instructions to load a 32-bit immediate, but I didn't
know x86 assemblers had pseudo-instructions. Which assembler are you
talking about?
As you said, the mov instructions will hit the L/S queue and execute with a
latency of 2 clocks and a throughput of 2/3 clocks. In general this is the
way to do it. The push/pop solution is never faster and may in some cases be
slower (Athlon, Pentium 4).
BTW, for Athlon the pair of mov instructions will be unconditionally faster.
The push instruction is 2 clocks, and pop is also 2 clocks IIRC (too lazy to
look it up). Each mov instruction is DP and 1 clock, so the total latency is
equivalent to a single push instruction regardless of whether or not you can
pair stuff. Also, since Athlon is OOOE, you can't be absolutely certain that
nothing will pair with them unless you contrive an example with cpuid.
-Matt
"Shill" <nob...@example.com> wrote in message
news:bbnm6h$g3r$1...@biggoron.nerim.net...
In the Athlon, PUSH mem32 is a VectorPath instruction with an execute
latency of 3 cycles. However, ESP is available one clock earlier than
the specified latency, i.e. after 2 cycles, I assume.
> The push instruction is 2 clocks, and pop is also 2 clocks IIRC
> (too lazy to look it up). Each mov instruction is DP and 1 clock,
> so the total latency is equivalent to a single push instruction
> regardless of whether or not you can pair stuff.
MOV mem32, reg32 (opcode 0x89)
MOV reg32, mem32 (opcode 0x8B)
PUSH mem32 (opcode 0xFF)
POP mem32 (opcode 0x8F)
In the Athlon, these 4 instructions have an execute latency of 3 cycles.
The first two are DirectPath, the last two are VectorPath.
I double checked in Appendix F ;)
I know that A86 had a pseudo-instruction (MOV sreg2, sreg1) which it
converted to:
PUSH AX
MOV AX, sreg1
MOV sreg2, AX
POP AX
Genius!!
cya GiM
The actual code A86 assembles for "mov ds, cs" is "push cs"/"pop ds".
Best,
Frank
In case you want to preserve the value in AX.
PUSH AX ; save a copy of AX on the stack
MOV AX, sreg1 ; overwrite AX with the value in sreg1
MOV sreg2, AX ; then move that value to sreg2
POP AX ; now we can restore the old AX
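For comparison, here is a small C sketch that models both sequences
mentioned in this thread for "mov ds, cs": the four-instruction form that
goes through AX, and the push cs / pop ds form. The Cpu16 struct and the
helper names are hypothetical, and only register and stack contents are
modeled, not instruction timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CPU state: just the registers this example needs. */
typedef struct {
    uint16_t ax, cs, ds;
    uint16_t stack[8];
    int      sp;          /* number of words currently on the toy stack */
} Cpu16;

static void push16(Cpu16 *c, uint16_t v) {
    assert(c->sp < 8);
    c->stack[c->sp++] = v;
}

static uint16_t pop16(Cpu16 *c) {
    assert(c->sp > 0);
    return c->stack[--c->sp];
}

/* PUSH AX / MOV AX, CS / MOV DS, AX / POP AX */
static void copy_cs_to_ds_via_ax(Cpu16 *c) {
    push16(c, c->ax);     /* save a copy of AX on the stack */
    c->ax = c->cs;        /* overwrite AX with the value in CS */
    c->ds = c->ax;        /* then move that value to DS */
    c->ax = pop16(c);     /* restore the old AX */
}

/* PUSH CS / POP DS */
static void copy_cs_to_ds_via_stack(Cpu16 *c) {
    push16(c, c->cs);
    c->ds = pop16(c);
}
```

Both helpers leave DS equal to CS with AX and the stack depth unchanged;
the second simply never touches AX, so there is nothing to save and
restore.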
Yes, I'm not sure why but I was using the reg-reg numbers which I have
memorized. Using reg-mem forms, the mov/mov will -always- be faster
post-486.
-Matt