
CMOV store in x86


Andi Kleen

Mar 14, 2003, 3:10:01 PM

Recently while writing some x86 assembly I discovered to my great
annoyance that x86 CMOV does not support stores, only loads.

Does anybody know why that is the case?

-Andi

Terje Mathisen

Mar 14, 2003, 6:52:00 PM

Easy:

Doing CMOVcc towards memory would turn it into a read-modify-write
opcode, since the opcode seems to always write something to the target:

Either the old or the new value.

It seems to me that this is a dataflow requirement: an instruction
cannot be totally suppressed without blowing away the entire pipeline,
so CMOVcc will always generate _something_ that can be forwarded to the
next consumer.
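A minimal C model of the semantics described above, for illustration only: `cmov_store_rmw` is a hypothetical name, and x86 has no such instruction. What it shows is that the memory operand is read and then written back whether or not the condition holds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "CMOVcc [mem], reg" modeled as a read-modify-write:
   memory is always read and always written, receiving either the new
   value or its own old value. */
static void cmov_store_rmw(bool cond, uint32_t *mem, uint32_t src)
{
    uint32_t old = *mem;                 /* read: unconditional  */
    uint32_t result = cond ? src : old;  /* select new or old    */
    *mem = result;                       /* write: unconditional */
}
```

One consequence of this model: because the write-back happens even when the condition is false, the target must be writable, and since this plain C read-modify-write is not atomic, a store from another thread landing between the read and the write could be overwritten with the stale value.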

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Norbert Juffa

Mar 15, 2003, 3:23:12 AM
"Andi Kleen" <fre...@alancoxonachip.com> wrote in message news:m3fzppr...@averell.firstfloor.org...

>
> Recently while writing some x86 assembly I discovered to my great
> annoyance that x86 CMOV does not support stores, only loads.

The way one typically does conditional stores where required is
to keep two pointers: one to the location that should potentially
be written to, the other permanently pointing to a piece of scratch
pad memory that serves as a bit bucket. The CMOV is used to select
between the two pointers, and the store is performed to either
location as appropriate.
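A sketch of the bit-bucket technique in C. `bit_bucket` and `conditional_store` are illustrative names, not from the post; the pointer select is written as a ternary, which x86 compilers often, but not always, turn into a CMOV.

```c
#include <stdint.h>

/* Scratch location that absorbs the stores we don't want. */
static uint32_t bit_bucket;

/* Store value to *dst only if cond is nonzero, without a branch:
   select the target pointer, then store unconditionally. */
static void conditional_store(int cond, uint32_t *dst, uint32_t value)
{
    uint32_t *target = cond ? dst : &bit_bucket;  /* candidate for CMOV */
    *target = value;                              /* store always runs  */
}
```

The store itself is never suppressed; it is just redirected, so it always has a valid, writable address.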

While a true conditional integer store would cause a few complications,
I see no fundamental reason that prevents one from being implemented
in x86. For example, in certain exceptional cases that are detected late,
during the execution phase, floating-point stores already have to be
squashed, so integer stores could piggy-back onto that mechanism (a store
cancel of some sort). Hm, that squashing of floating-point stores may
involve "blowing away the machine", which would not be conducive to high
performance but is OK for exception handling.


-- Norbert


Bernd Paysan

Mar 15, 2003, 5:21:11 PM
Terje Mathisen wrote:
> Easy:
>
> Doing CMOVcc towards memory would turn it into a read-modify-write
> opcode, since the opcode seems to always write something to the target:
>
> Either the old or the new value.

However, this does not explain why cmovcc [dest], source is not possible:
x86 implements a lot of read-modify-write instructions, so the pipeline can
already handle those.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Terje Mathisen

Mar 16, 2003, 1:10:45 PM
Bernd Paysan wrote:
> Terje Mathisen wrote:
>
>>Easy:
>>
>>Doing CMOVcc towards memory would turn it into a read-modify-write
>>opcode, since the opcode seems to always write something to the target:
>>
>>Either the old or the new value.
>
>
> However, this does not explain why cmovcc [dest], source is not possible:
> x86 implements a lot of read-modify-write instructions, so the pipeline can
> already handle those.

You're right, it might simply be that they didn't want to take the extra
overhead of decoding CMOV into a variable number of micro-ops?

I.e. add [mem],reg turns into multiple micro-ops, while add reg,reg is a
single op.

Andy Glew

Mar 16, 2003, 9:39:08 PM
"Andi Kleen" <fre...@alancoxonachip.com> wrote in message
news:m3fzppr...@averell.firstfloor.org...
>

Mea culpa. We were on an opcode diet. I only wanted to use one
row (group) of 8 MODRM opcodes,
so my choices were reg,reg and one of load or store, but not all three.
I chose load.

Using the existing MODRM format instead of mandating a new format,
and using 8 "opcodes" to specify the conditions were motivated by
trying to reuse existing decoder hardware. In this format,
16 opcodes would have been consumed to get CMOV reg,reg,
load, and store.

---

By the way, you will note that it is not "conditional load",
but an unconditional load followed by a conditional move.

i.e.

tmp := load(mem)
dest := cond ? tmp : old_dest

not

if( cond )
    dest := load(mem)
else
    dest := old_dest
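To make the distinction concrete, here is a small C sketch with hypothetical function names. The first function mirrors the unconditional-load-then-move behavior; the second is a true conditional load, written with a branch since C has no portable conditional-load primitive. Only the second is safe to call with an invalid pointer when the condition is false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CMOVcc reg, [mem] as described: the load happens regardless of cond,
   so a bad address faults even when cond is false. */
static uint32_t cmov_load_actual(bool cond, const uint32_t *mem, uint32_t old_dest)
{
    uint32_t tmp = *mem;            /* unconditional load */
    return cond ? tmp : old_dest;   /* conditional move   */
}

/* A true conditional load: memory is touched only when cond is true,
   so mem may be NULL when cond is false. */
static uint32_t conditional_load(bool cond, const uint32_t *mem, uint32_t old_dest)
{
    if (cond)
        return *mem;
    return old_dest;
}
```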

There really should be a conditional load and a conditional store,
but I was not allowed to add such micro-operations to the vocabulary.
(I can't remember why; probably just the usual "don't add anything
until justified by Hennessy and Patterson" reluctance.)
I tried not to define the memory form of CMOV, so that a conditional
load (or store) form could be added later, but the compiler people
found this sufficiently annoying that I later gave in and added the
unconditional load/conditional move form.

Also, that latter form falls out almost for free given x86 MODRM decoding.


---


You can get conditional store out of MASKMOV*. Of course,
its conditions are in a different format...

There's no real conditional load, but you can get one by using CMOV
on the address, selecting between a known safe (and useless) address
and the actual address you may conditionally want to load from,
which may cache miss or produce a fault.
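A C sketch of the address-select trick, assuming a hypothetical always-readable dummy location `safe_dummy`. The load always executes, but when the condition is false it reads the dummy instead of `mem`, so `mem` may be NULL or otherwise unmapped. Whether the compiler actually emits CMOV for the pointer select is up to the compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Always-valid dummy location; its contents are never used. */
static const uint32_t safe_dummy = 0;

/* Conditional load from an address select plus an unconditional load. */
static uint32_t load_if(bool cond, const uint32_t *mem, uint32_t old_dest)
{
    const uint32_t *addr = cond ? mem : &safe_dummy;  /* select the address  */
    uint32_t tmp = *addr;                             /* always safe to load */
    return cond ? tmp : old_dest;                     /* keep old if !cond   */
}
```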

---

Oh, yeah, more trivia. The condition evaluation logic lives only in one
pipe, the pipe that executes branches (Jcc) and CMOVcc reg,reg.
Some varieties of conditional load or store CMOV would have
required that logic be duplicated in one or both of the memory pipes.

Also, the memory address pipes already used all (both) of their input
operands, so the effective address would have had to be computed as a
separate uop.

Worst case, if we could not have gotten the conditional stuff added to
the memory pipelines, CMOVcc reg,mem would look like:

tmpCond := evaluateCondCC( flags )
tmpEA := LEA( memargs )
tmpLD := loadCC( tmpCond, tmpEA )
dest := select( tmpLD, dest )

where loadCC would have been a new widget load that
produced the load data, or some garbage value if condition
false, with the condition as upper bits beyond 32 or 64 (or 80).
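The worst-case sequence can be modeled in C, purely as an illustration. Everything beyond the uop names is an assumption: memory is an array indexed by the effective address, only the E/NE conditions (which test ZF, bit 6 of EFLAGS) are modeled, and the condition travels next to the load data in a struct rather than in extra upper bits. `select` is renamed `select_uop` to avoid clashing with the POSIX `select` function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZF_BIT (1u << 6)                 /* zero flag position in EFLAGS */

typedef enum { CC_E, CC_NE } cond_code;  /* only two conditions modeled */

/* Output of the hypothetical loadCC uop: data plus the condition. */
typedef struct { uint64_t data; bool cond; } ld_cc;

static bool evaluateCondCC(cond_code cc, uint32_t flags)
{
    bool zf = (flags & ZF_BIT) != 0;
    return cc == CC_E ? zf : !zf;
}

/* Effective address, as an index into the modeled memory array. */
static size_t LEA(size_t base, size_t index, size_t scale, size_t disp)
{
    return base + index * scale + disp;
}

static ld_cc loadCC(bool cond, const uint64_t *memory, size_t ea)
{
    ld_cc r;
    r.cond = cond;
    /* Memory is read only when cond holds; otherwise data is garbage. */
    r.data = cond ? memory[ea] : UINT64_C(0xBAD);
    return r;
}

static uint64_t select_uop(ld_cc ld, uint64_t dest)
{
    return ld.cond ? ld.data : dest;
}

/* CMOVcc reg, mem as the four-uop sequence. */
static uint64_t cmovcc_reg_mem(cond_code cc, uint32_t flags, const uint64_t *memory,
                               size_t base, size_t index, size_t scale, size_t disp,
                               uint64_t dest)
{
    bool tmpCond = evaluateCondCC(cc, flags);
    size_t tmpEA = LEA(base, index, scale, disp);
    ld_cc tmpLD = loadCC(tmpCond, memory, tmpEA);
    return select_uop(tmpLD, dest);
}
```

With the condition false, the out-of-range address is never dereferenced, which is exactly what distinguishes this from the unconditional-load form that was actually shipped.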

Similarly, CMOVcc mem, reg would have looked like

tmpCond := evaluateCondCC( flags )
tmpEA := LEA( memargs )
sta( tmpCond, tmpEA )
std( src )
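A similarly hypothetical C model of the store case, assuming the condition has already been evaluated into `tmpCond`. The store-address uop records the condition next to the address in a simplified store-buffer entry, the store-data uop supplies the value, and the entry writes memory at commit only if the condition held. `std` is spelled `std_uop` here only for readability.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified store-buffer entry filled in by the sta and std uops. */
typedef struct {
    size_t   addr;
    bool     cond;   /* hypothetical condition carried by sta */
    uint64_t data;
} store_entry;

static void sta(store_entry *e, bool tmpCond, size_t tmpEA)
{
    e->cond = tmpCond;
    e->addr = tmpEA;
}

static void std_uop(store_entry *e, uint64_t src)
{
    e->data = src;
}

/* At commit, a false condition makes the entry a no-op. */
static void commit(const store_entry *e, uint64_t *memory)
{
    if (e->cond)
        memory[e->addr] = e->data;
}
```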

With the CC support added to the memory pipelines, this would look like

tmpEA := LEA( memargs )
tmpLD := loadCC( flags, tmpEA )
dest := select( tmpLD, dest )

and
tmpEA := LEA( memargs )
staCC( flags, tmpEA )
std( src )
