Why are mem-to-mem moves disallowed?

Tony

Dec 22, 2010, 7:21:08 PM

The subject is the question.

Robert Redelmeier

Dec 22, 2010, 9:40:27 PM

Tony <nos...@nospicedham.myisp.net> wrote in part:
> The subject is the question.

They're not disallowed -- x86 architecture has always
had MOVS instructions that move mem-to-mem, and very
conveniently for blocks.

More general forms were probably never added since they
would result in much longer instructions in some cases,
and require more mod r/m bits.

Code density is actually quite important, since once-through
code takes much longer to fetch than to execute.
As quirky as it is, x86 is one of the best for density.

-- Robert


wolfgang kern

Dec 23, 2010, 6:28:28 AM

Tony asked:

...
Who said that mem to mem is not allowed ?

x86 has only a few instructions that do it:

MOVSB/W/D/Q for 8/16/32/64-bit moves

and even though this instruction isn't one of the fastest,
it is quite fast and handy when used with REP.

historical DMA also had a mem2mem capability, perhaps
you mean this feature, because it isn't there anymore?

Sure, your compiler will probably cry out loud if you
try something like:

MOV [var1],[var2]

there are just no direct instructions available for it.
So you'll need to write:

MOV esi,var2 ;MASM may need 'OFFSET' for this
MOV edi,var1
MOVSD ;copy [esi] to [edi] and advance both regs by 4
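For anyone who wants to see the effect spelled out, here is a minimal Python sketch of what MOVSD does. It is only a model under stated assumptions: memory is a flat bytearray, the direction flag is clear (so both pointers advance), and segments are ignored. The names `movsd` and `rep_movsd` are made up for this illustration.

```python
def movsd(mem, esi, edi):
    """Model MOVSD with DF=0: copy 4 bytes from [esi] to [edi], advance both by 4."""
    mem[edi:edi + 4] = mem[esi:esi + 4]
    return esi + 4, edi + 4


def rep_movsd(mem, esi, edi, ecx):
    """Model REP MOVSD: repeat the 4-byte copy ecx times, counting ecx down to 0."""
    while ecx > 0:
        esi, edi = movsd(mem, esi, edi)
        ecx -= 1
    return esi, edi, ecx


# Usage: var2 lives at offset 0, var1 at offset 8 (hypothetical layout).
memory = bytearray(16)
memory[0:4] = b"\x78\x56\x34\x12"
new_esi, new_edi = movsd(memory, 0, 8)
```

Note that the addresses come from the registers, not from the instruction bytes, which is why MOVSD can touch two memory locations without needing two address fields in its encoding.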

__
wolfgang

Tony

Dec 23, 2010, 4:25:00 PM

That common scenario is actually what I meant by mem-to-mem moves. Why is
that not allowed?

Lasse Reichstein Nielsen

Dec 23, 2010, 7:45:59 PM

"Tony" <nos...@nospicedham.myisp.net> writes:

> wolfgang kern wrote:
...

>> Sure, your compiler will probably cry out loud if you
>> try something like:
>>
>> MOV [var1],[var2]

> That common scenario is actually what I meant by mem-to-mem moves. Why is
> that not allowed?

Because there is no instruction that does it.

Now you could ask: Why has Intel/ARM not added such an instruction?
Since I don't have insider information from the CPU designers, I can, at
most, guess at that.
Probably because of a combination of the following:
1. There is no need. You can do a read followed by a write to get the
same effect.
2. It would complicate instruction decoding. Currently all instructions
have at most one memory operand. An instruction with two would require
special-case circuitry/microcode for decoding that isn't currently present.
3. It wouldn't be any faster anyway. Since it's executed by the CPU,
it will still have to do one read followed by one write. It would probably
be a complex operation that is decoded into several microops - the same
that a load followed by a store would create. It would only have a chance
at being faster if it was possible to make the memory do the copy without
roundtripping the CPU, but that's a memory controller operation, not really
a CPU operation.

/L
--
Lasse Reichstein Holst Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

BGB

Dec 23, 2010, 8:02:07 PM

yeah, and there is always, say:
mov eax, [something]
mov [something2], eax

which usually works well enough...

BGB

Dec 23, 2010, 8:16:30 PM

maybe because the opcode encoding is based on bit-twiddling and there is
no good way to encode a direct "opcode mem, mem" operation.

so, the issue is not really that it is actively disallowed, but rather
there is no good way for it to be allowed without some alternate way of
encoding the operation...

it all has to do with the Mod/RM and SIB byte encoding, which only allow
a certain number of configurations.

newer features like x86-64 and AVX came at the cost of great hackery to
the instruction set, and in the case of the former, a partial loss of
backwards compatibility...

it is not likely worthwhile to do this simply to allow using a single
instruction for what can be done easily enough with 2 instructions.

Tony

Dec 23, 2010, 11:39:03 PM

Lasse Reichstein Nielsen wrote:
> "Tony" <nos...@nospicedham.myisp.net> writes:
>
>> wolfgang kern wrote:
> ...
>>> Sure, your compiler will probably cry out loud if you
>>> try something like:
>>>
>>> MOV [var1],[var2]
>
>> That common scenario is actually what I meant by mem-to-mem moves.
>> Why is that not allowed?
>
> Because there is no instruction that does it.
>
> Now you could ask: Why has Intel/ARM not added such an instruction?

That is exactly what I was asking, though I didn't clearly express it in the
question, I guess.

> Since I don't have insider information from the CPU designers, I can,
> at most, guess at that.
> Probably because of a combination of the following:
> 1. There is no need. You can do a read followed by a write to get the
> same effect.

That misses the point entirely. Surely instructions are chosen partly for
the pervasiveness of their use. I wonder if var-to-var is being
discouraged and anything to/with a register is being encouraged (perhaps
for "product security" reasons?).

> 2. It would complicate instruction decoding. Currently all
> instructions have at most one memory operand.

Well that, again, is my question.

> An instruction with two
> would require special-case circuitry/microcode for decoding that
> isn't currently present.

I'd be curious to see the detailed statistics and design notes for what
implementing such an instruction would mean.

> 3. It wouldn't be any faster anyway. Since it's executed by the CPU,
> it will still have to do one read followed by one write. It would
> probably be a complex operation that is decoded into several microops
> - the same that a load followed by a store would create. It would
> only have a chance at being faster if it was possible to make the
> memory do the copy without roundtripping the CPU, but that's a memory
> controller operation, not really a CPU operation.

It takes 2 moves to do something that is done a zillion times. It just
looked like a curious omission of sorts, from my layman perspective (not
that I've analyzed the entire set of instructions for similar
curiosities, yet).

Rod Pemberton

Dec 24, 2010, 2:18:23 AM

"Tony" <nos...@nospicedham.myisp.net> wrote in message
news:CtVQo.619$231...@newsfe14.iad...

>
> Surely instructions are chosen partly for
> the pervasiveness of their use.
>

I'm sure the x86 design is far more historical than anything else. You have
to remember the 8086 was designed a long time ago. Most of those early
microprocessors were load-store machine models. The data was loaded into a
specific register called the accumulator. Arithmetic was performed on the
accumulator. Then, the data was stored back to memory.

Did they have the understanding about the optimal instruction operations
back then? Could they foresee what you know today? Did those
microprocessors have as many transistors or as much logic or functionality
as they do today? No.

There were many different perspectives on how to correctly design a cpu:
load-store, then CISC, then cpu's redefined as RISC, etc. They were fought
in academia and the market place.

The designers of the 8086 had to choose a predominant machine model:
register, stack, memory. The 8086 is register based. That is, the
instruction set performs most operations on registers. x86 also performs
many on memory too. So, it's not purely register based, just predominantly.
If you want to see a predominantly memory based machine model, you can look
at the 6502. Even though the 6502 is memory based, it performs most math
and bitwise operations on the accumulator or index registers. If you want
to see a stack based machine model, you can look at the early Forth micro's.

> I wonder if var-to-var is being
> discouraged and anything to/with a register is being
> encouraged (perhaps for "product security" reason?).
>

I doubt that. I don't see anything nefarious going on. With x86
instructions, you can do:
register to register
register to stack, or vice-versa
memory to stack, or vice-versa
memory to register, or vice-versa

With one exception, you need multiple instructions to do:
stack to stack
memory to memory

String instructions can do memory to memory but they require some register
setup, i.e., the memory operands are not encoded in the instruction.

So, ISTM, you can do most data transfers with one instruction, or two for
less used situations.

> > 3. It wouldn't be any faster anyway. Since it's executed by the CPU,
> > it will still have to do one read followed by one write.
>

> It takes 2 moves to do something that is done a zillion times.
>

Due to the design of the x86 instruction set, that's not really the case.
You must move data into the registers much of the time, either to perform an
operation, or to keep the data in a register. You need to keep data in the
registers to either reduce loading from and saving to memory, or to perform
additional operations against it before saving it back to memory. I.e.,
data doesn't move directly from memory to memory as much.

Rod Pemberton

Bob Masta

Dec 24, 2010, 8:25:11 AM

There might not be much "improvement" from such an
instruction. Consider that the direct 2-move approach
(using EAX) takes up to 5 bytes each (1 byte for the MOV EAX
opcode and 4 for the address), for 10 bytes total. To
combine these into a single instruction would still require
the same 8 address bytes. Then the best we could hope for
would be a 9-byte total, assuming we could have a
single-byte opcode for the double-move. But single-byte
opcodes are a precious resource in an instruction set; if we
used one here we would have to take it away from some other
use, which might be more valuable in the big picture. And
all this to save 1 byte out of 10!

That's for direct addressing. If we allow indirect
addressing, we'd need an additional byte to specify the
other memory operand.
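The byte counting above can be checked with a small Python sketch. It assumes 32-bit mode and the short EAX-only forms (opcode A1 for the load, A3 for the store, each followed by a 4-byte absolute address). The single-instruction form does not exist on x86, so `hypothetical_single_size` and its `opcode_bytes` parameter are purely illustrative.

```python
ADDR32 = 4  # bytes in a 32-bit absolute address


def two_move_size():
    """Size of 'mov eax,[src]' + 'mov [dst],eax' using the short EAX forms."""
    load = 1 + ADDR32   # opcode A1 + 4-byte address
    store = 1 + ADDR32  # opcode A3 + 4-byte address
    return load + store


def hypothetical_single_size(opcode_bytes=1):
    """Size of an imagined 'mov [dst],[src]': opcode plus two 4-byte addresses."""
    return opcode_bytes + 2 * ADDR32


def bytes_saved(opcode_bytes=1):
    """How many bytes the imagined instruction would save over the two-move pair."""
    return two_move_size() - hypothetical_single_size(opcode_bytes)
```

With a one-byte opcode the saving is a single byte; if the new instruction needed a two-byte opcode, `bytes_saved(2)` shows the saving disappearing entirely.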

Best regards,

Bob Masta

DAQARTA v5.10
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, FREE Signal Generator
Pitch Track, Pitch-to-MIDI
DaqMusic - FREE MUSIC, Forever!
(Some assembly required)
Science (and fun!) with your sound card!

Michael Foukarakis

Dec 24, 2010, 5:20:27 AM

On Thursday, December 23, 2010 11:25:00 PM UTC+2, Tony wrote:
>
> That common scenario is actually what I meant by mem-to-mem moves. Why is
> that not allowed?

If you're using mem-to-mem moves, you're doing it wrong.

Bluemoon

Dec 24, 2010, 9:45:01 AM

Agreed. There is not much to gain from squeezing the two moves into a
single instruction.
In fact, combining the two moves would make it slower due to memory
stalls.
I would prefer to break it down into a mov reg,mem and mov mem,reg pair,
and put some other code in between while waiting for the memory fetch.

wolfgang kern

Dec 24, 2010, 11:41:30 AM

Tony asked:
...

>> Sure, your compiler will probably cry out loud if you
>> try something like:

>> MOV [var1],[var2]

>> there are just no direct instructions available for it.
>> So you'll need to write:

>> MOV esi,var2 ;MASM may need 'OFFSET' for this
>> MOV edi,var1
>> MOVSD ;copy [esi] to [edi] and advance both regs by 4

> That common scenario is actually what I meant by mem-to-mem moves.
> Why is that not allowed?

Your compiler may not be able (thank all gods for that) to
convert your wanted:

MOV [var1],[var2]

into:

PUSH edi
PUSH esi
mov edi,var1
mov esi,var2
MOVSD ;assumes DS=ES=data,
POP esi
POP edi

or less worse:

PUSH eax
mov eax,[var2]
mov [var1],eax
POP eax

feel free to create your own M2M-macro for it :)

Why the CPU doesn't have a MEM2MEM instruction is found in
the general OPCODE-design, where only one immediate value
can be a pointer into memory.
Another reason is the maximal opcode length.
32-bit Memory addressing can be up to:

2 mod + 3 R/M-bits + SIB + 32bit displacement = 45 bits, where the
first 5 bits are part of the opcode and a second SIB also won't fit.

Sure there could have been a ten byte x86 instruction like:

XX XX aa aa aa aa bb bb bb bb MOV [ptr1],[ptr2]

but there is just no decoder for the second pointer implemented.
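To make the bit counting concrete, here is a small Python sketch of the ModRM layout. The field widths (2-bit mod, 3-bit reg, 3-bit r/m, 8-bit SIB, 32-bit displacement) are the standard 32-bit ones; the function names are made up for this illustration.

```python
MOD_BITS, RM_BITS, SIB_BITS, DISP_BITS = 2, 3, 8, 32


def memory_operand_bits():
    """Worst-case bits needed to describe one 32-bit memory operand."""
    return MOD_BITS + RM_BITS + SIB_BITS + DISP_BITS


def encode_modrm(mod, reg, rm):
    """Pack the ModRM byte: mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0."""
    assert 0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8
    return (mod << 6) | (reg << 3) | rm


# mov [disp32], eax is opcode 89 followed by ModRM with mod=00, reg=EAX(000), r/m=101
modrm_store_eax = encode_modrm(0b00, 0b000, 0b101)
# mov [ebx], ecx is opcode 89 followed by ModRM with mod=00, reg=ECX(001), r/m=EBX(011)
modrm_store_ecx = encode_modrm(0b00, 0b001, 0b011)
```

The byte has one mod/r/m pair, which can name memory, and one reg field, which can only name a register. A second memory operand would need a second mod/r/m (plus possibly SIB and displacement) that the format has no slot for.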
__
wolfgang

Nathan

Dec 24, 2010, 12:25:55 PM

On Dec 24, 11:41 am, "wolfgang kern" <nowh...@never.at> wrote:
>
> or less worse:
>
> PUSH eax
> mov eax,[var2]
> mov [var1],eax
> POP eax
>

Couldn't this be shortened to two instructions??

PUSH [var2]
POP [var1]

Nathan.

Frank Kotler

Dec 24, 2010, 2:16:12 PM

Right. That's *two* mem-to-mem instructions (stack is "memory", too!).

As for your compiler eating it, I give you HLA:

mov (ebx,ebxstore); //store ebx address
mov ([ebx],fhcrclist);
mov ([ebx+4],crcbuff);
mov ([ebx+8],crcend);
mov ([ebx+12],crccount);

I'm pretty sure HLA makes a push/pop out of each line but the first. As
I recall, Eric Isaacson's A86 provided the same "help".

The reason I was told we didn't have mem-to-mem moves is that "we only
have one address bus", but "push [mem]" and "movsb/w/d" do it, so... Go
figure...

A Splendid Solstice and associated holidays to all!

Best,
Frank

Tony

Dec 24, 2010, 11:57:52 PM

I wasn't coming at it from a performance perspective though, but rather a
programming one. Who says assembly can't be readable?

Tony

Dec 24, 2010, 11:58:50 PM

Bluemoon wrote:
> On Dec 24, 9:25 pm, N0S...@daqarta.com (Bob Masta) wrote:
>> On Thu, 23 Dec 2010 22:39:03 -0600, "Tony"

>>> It takes 2 moves to do something that is done a zillion times. It

>>> just looked like a curious omission of sorts, from my layman
>>> perspective (not that I've analyzed the entire set of instructions
>>> for similar curiosities, yet).
>>
>> There might not be much "improvement" from such an
>> instruction. Consider that the direct 2-move approach
>> (using EAX) takes up to 5 bytes each (1 byte for the MOV EAX
>> opcode and 4 for the address), for 10 bytes total. To
>> combine these into a single instruction would still require
>> the same 8 address bytes. Then the best we could hope for
>> would be a 9-byte total, assuming we could have a
>> single-byte opcode for the double-move. But single-byte
>> opcodes are a precious resource in an instruction set; if we
>> used one here we would have to take it away from some other
>> use, which might be more valuable in the big picture. And
>> all this to save 1 byte out of 10!
>>
>> That's for direct addressing. If we allow indirect
>> addressing, we'd need an additional byte to specify the
>> other memory operand.
>>
>

> Agree. there is not much gain to squeeze the 2 moves into single
> instruction.

Clearer code.

Tony

Dec 25, 2010, 12:08:29 AM

wolfgang kern wrote:

>
> PUSH eax
> mov eax,[var2]
> mov [var1],eax
> POP eax
>
> feel free to create your own M2M-macro for it :)
>

Did you push and pop around the moves just to make it macro-safe?

Tony

Dec 25, 2010, 12:05:13 AM

I currently use inline assembly, so macros at that level I don't do. I
create macros (CPP macros) such as FuncProlog.

>
> Why the CPU doesn't have a MEM2MEM instruction is found in
> the general OPCODE-design, where only one immediate value
> can be a pointer into memory.
> Another reason is the maximal opcode length.
> 32-bit Memory addressing can be up to:
>
> 2 mod + 3 R/M-bits + SIB + 32bit displacement = 45 bits, where the
> first 5 bits are part of the opcode and a second SIB also wont fit.
>
> Sure there could have been a ten byte x86 instruction like:
>
> XX XX aa aa aa aa bb bb bb bb MOV [ptr1],[ptr2]
>
> but there is just no decoder for the second pointer implemented.

I'm sure all that makes sense. I'm sure to put up more wishful thinking
like my desire for a one instruction var-to-var move in the future. I can
live with whatever is there, because I only have to create this stuff
"once". Else I'd probably consider developing my own macro/high-level
assembler (not to the degree R. Hyde did though).

pe...@nospam.demon.co.uk

Dec 25, 2010, 1:47:42 AM

In article <if2rla$p2i$1...@speranza.aioe.org>
fbko...@nospicedham.myfairpoint.net "Frank Kotler" writes:

> Nathan wrote:
> > On Dec 24, 11:41 am, "wolfgang kern" <nowh...@never.at> wrote:
> >> or less worse:
> >>
> >> PUSH eax
> >> mov eax,[var2]
> >> mov [var1],eax
> >> POP eax

For readability this could be defined as a macro, though of course
macros can hide performance-critical code.

> > Couldn't this be shortened to two instructions??
> >
> > PUSH [var2]
> > POP [var1]

I had this thought too...

> Right. That's *two* mem-to-mem instructions (stack is "memory", too!).

...but discounted it for this reason! There are however occasions
when one might elect to do this, e.g. a shortage of registers. Not
quite the same thing, but code like

push ds
pop es

is quite often seen as an alternative to the more usual

mov ax, ds
mov es, ax

in cases where all the GP regs are tied up.

> As for your compiler eating it, I give you HLA:
>
> mov (ebx,ebxstore); //store ebx address
> mov ([ebx],fhcrclist);
> mov ([ebx+4],crcbuff);
> mov ([ebx+8],crcend);
> mov ([ebx+12],crccount);
>
> I'm pretty sure HLA makes a push/pop out of each line but the first. As
> I recall, Eric Isaacson's A86 provided the same "help".
>
> The reason I was told we didn't have mem-to-mem moves is that "we only
> have one address bus", but "push [mem]" and "movsb/w/d" do it, so... Go
> figure...

One can only guess that the CPU has one or more internal "scratch"
regs that code "push [mem]" into something like

load scratch_reg, [mem]
push scratch_reg

and if so it wouldn't be a huge leap to allow a "mov [mem2],[mem1]"
instruction like

load scratch_reg, [mem1]
stor [mem2], scratch_reg

As you say, go figure...

> A Splendid Solstice and associated holidays to all!
>
> Best,
> Frank

And the same to you, Frank and Nathan, for your efforts in keeping
clax86 the excellent newsgroup that it is. Thanks!

Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."

wolfgang kern

Dec 25, 2010, 8:06:14 AM

Tony asked:

> I wrote:

Yes, the one who makes it dirty has to clean up after ... :)

__
wolfgang

wolfgang kern

Dec 25, 2010, 8:01:02 AM

Nathan found out:

> or less worse:
>
> PUSH eax
> mov eax,[var2]
> mov [var1],eax
> POP eax
>

|Couldn't this be shortened to two instructions??

|PUSH [var2]
|POP [var1]

Absolutely right Nate!
But as Frank already figured, this ends up as four memory accesses.
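The access count can be illustrated with a toy Python model. It is only a sketch: it counts architectural reads and writes and ignores caches, store forwarding and everything else a real CPU does. The class and function names, and the starting stack pointer value, are invented for the example.

```python
class CountingMemory:
    """Toy memory that counts every read and write of a 32-bit cell."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


def copy_via_register(mem, src, dst):
    """mov eax,[src] / mov [dst],eax: one read, one write."""
    eax = mem.read(src)
    mem.write(dst, eax)


def copy_via_push_pop(mem, src, dst, esp=0x1000):
    """push [src] / pop [dst]: the value also makes a round trip through the stack."""
    value = mem.read(src)   # push reads the source operand
    esp -= 4
    mem.write(esp, value)   # ... and writes it to the stack
    value = mem.read(esp)   # pop reads it back from the stack
    esp += 4
    mem.write(dst, value)   # ... and writes the destination operand
    return esp
```

Running both on the same source value gives the same result in the destination, with 2 counted accesses for the register pair and 4 for the push/pop pair.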

__
wolfgang
Merry X-mas!

Nathan Baker

Dec 27, 2010, 12:53:10 PM

"Frank Kotler" <fbko...@nospicedham.myfairpoint.net> wrote in message
news:if2rla$p2i$1...@speranza.aioe.org...

>
> The reason I was told we didn't have mem-to-mem moves is that "we only
> have one address bus", but "push [mem]" and "movsb/w/d" do it, so... Go
> figure...
>

Perhaps we can experiment with 'mem-to-mem moves' by hacking Intel's CPU
emulator??

http://freshmeat.net/projects/intelsde

Or temporarily substitute the microcode in our chip to accept a new opcode??

http://freshmeat.net/projects/intelp6microcodeupdateutility

> A Splendid Solstice and associated holidays to all!
>

Happy Holidays everyone!

Nathan.