Atomic write of 64 bits

Mike

Sep 25, 2000, 3:00:00 AM
I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
read or write a 64 bit value atomically. By that, I mean given:

mov EAX, dword ptr [x]
mov EDX, dword ptr 4[x]

has a problem. If another thread changes x in between those two
instructions, then [EDX,EAX] will contain a qword value that never was in x.

I do know how to make this work by bracketing it in an
EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
to do it?
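
To make the failure concrete, here is a minimal single-threaded C sketch that replays the bad interleaving by hand. It is only an illustration: the names split64, store64 and torn_read_demo are invented for this example, and the two halves of x are modelled as separate 32-bit fields so that the "other thread" can be slotted in between the two loads.

```c
#include <stdint.h>

/* x modelled as two separate 32-bit halves, the way 32-bit code sees it. */
typedef struct { uint32_t lo, hi; } split64;

static void store64(split64 *x, uint64_t v) {
    x->lo = (uint32_t)v;
    x->hi = (uint32_t)(v >> 32);
}

/* Replays the interleaving in one thread: read the low half, let the
   "other thread" store a new value, then read the high half. */
uint64_t torn_read_demo(void) {
    split64 x;
    store64(&x, 0x00000000FFFFFFFFull);   /* old value of x           */
    uint32_t lo = x.lo;                   /* mov EAX, dword ptr [x]   */
    store64(&x, 0x0000000100000000ull);   /* other thread runs here   */
    uint32_t hi = x.hi;                   /* mov EDX, dword ptr 4[x]  */
    return ((uint64_t)hi << 32) | lo;     /* the [EDX,EAX] pair       */
}
```

The returned pair mixes the old low half with the new high half, so it matches neither of the two values that were ever stored in x.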


Frans Morsch

Sep 25, 2000, 3:00:00 AM

"Mike" <geo...@myob.com> wrote in message
news:pmIz5.18912$nk3.9...@newsread03.prod.itd.earthlink.net...


How about using mmx instructions:

.mmx
movd mm0,eax
movd mm1,edx
psllq mm1,32
por mm0,mm1
movq [x],mm0
emms


The emms instruction is not cheap, but it will be faster than a
critical section.
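
The same pack-then-store idea can be sketched in C. This is an illustration under assumptions, not the code from the post: store_halves is a hypothetical name, and C11 atomic_store only promises that the store is indivisible; which instruction the compiler picks on a 32-bit target is up to the compiler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Combine the two 32-bit halves first, then publish them with one store. */
void store_halves(_Atomic uint64_t *x, uint32_t lo, uint32_t hi) {
    uint64_t packed = ((uint64_t)hi << 32) | lo;   /* psllq mm1,32 / por mm0,mm1 */
    atomic_store(x, packed);                       /* movq [x],mm0               */
}
```

The point of the design is that no reader can ever observe a state in which only one of the halves has been written.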


Frans Morsch


nju...@my-deja.com

Sep 25, 2000, 3:00:00 AM
In article <pmIz5.18912$nk3.9...@newsread03.prod.itd.earthlink.net>,

"Mike" <geo...@myob.com> wrote:
> I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
> read or write a 64 bit value atomically. By that, I mean given:
>
> mov EAX, dword ptr [x]
> mov EDX, dword ptr 4[x]
>
> has a problem. If another thread changes x in between those two
> instructions, then [EDX,EAX] will contain a qword value that never was in x.
>
> I do know how to make this work by bracketing it in an
> EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
> to do it?

Depending on what exactly you are trying to do, the CMPXCHG8B
instruction may help here:

CMPXCHG8B [mem64]

if (EDX:EAX == [mem64]) {
    ZF = 1
    [mem64] = ECX:EBX
} else {
    ZF = 0
    EDX:EAX = [mem64]
}

You may have to use a LOCK prefix in addition to the instruction
itself.

If you use this instruction in your code, make sure that the CPU
supports CMPXCHG8B. There is a feature flag for it in bit 8 of the
feature flags returned in EDX by CPUID function 1. As far as I know it
is supported on P6 compatible CPUs.
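
As a rough C illustration of how one compare-and-exchange primitive can serve as both an atomic read and an atomic write: the sketch below uses C11 atomic_compare_exchange_strong. That compilers turn this into LOCK CMPXCHG8B on 32-bit x86 is an assumption about code generation, not something the C standard promises, and the function names cas_read64 and cas_write64 are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Read: compare against 0 and "replace" with 0. If *x is 0 nothing
   changes; otherwise the exchange fails and copies the current value
   into expected. Either way expected ends up holding the value of *x. */
uint64_t cas_read64(_Atomic uint64_t *x) {
    uint64_t expected = 0;
    atomic_compare_exchange_strong(x, &expected, (uint64_t)0);
    return expected;
}

/* Write: retry until the exchange succeeds. A failed attempt refreshes
   expected with the current value, so the next attempt can match. */
void cas_write64(_Atomic uint64_t *x, uint64_t value) {
    uint64_t expected = 0;
    while (!atomic_compare_exchange_strong(x, &expected, value)) {
        /* expected now holds the latest value of *x; try again */
    }
}
```

Note that even the read performs a (possibly redundant) write cycle, so the variable has to live in writable memory.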

-- Norbert


Sent via Deja.com http://www.deja.com/
Before you buy.


Dr John Stockton

Sep 25, 2000, 3:00:00 AM
JRS: In article <pmIz5.18912$nk3.9...@newsread03.prod.itd.earthlink.net>
of Mon, 25 Sep 2000 13:24:37 seen in news:comp.lang.asm.x86, Mike
<geo...@myob.com> wrote:
>I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
>read or write a 64 bit value atomically. By that, I mean given:
>
> mov EAX, dword ptr [x]
> mov EDX, dword ptr 4[x]
>
>has a problem. If another thread changes x in between those two
>instructions, then [EDX,EAX] will contain a qword value that never was in x.
>
>I do know how to make this work by bracketing it in an
>EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
>to do it?

Either "x" is known to increase or decrease monotonically, or it is not.

In the former case, for example if "x" represents time, it should at
least sometimes be sufficient to next compare EAX with dword ptr [x],
and repeat the lot until success has been attained. I think; check!
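
A small C sketch of that re-read loop, under two loudly stated assumptions: every update of x is indivisible from the reader's point of view (for example, it happens in a timer interrupt handler on the same CPU), and x moves by fewer than 2^32 steps between the two reads of the low half. The hook type interloper_fn and the helpers tick_once and monotonic_demo are invented here purely so the interleaving can be replayed deterministically.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { volatile uint32_t lo, hi; } split64;

/* Hypothetical hook standing in for whatever runs between two loads. */
typedef void (*interloper_fn)(split64 *x);

/* Read low, read high, then re-check low; repeat until low is stable. */
uint64_t read_monotonic64(split64 *x, interloper_fn between) {
    for (;;) {
        uint32_t lo = x->lo;
        if (between != NULL) between(x);
        uint32_t hi = x->hi;
        if (x->lo == lo)
            return ((uint64_t)hi << 32) | lo;
        /* low half moved while we were reading: start over */
    }
}

static int ticks_left = 1;

/* Demo interloper: bumps the counter by one, but only once. */
static void tick_once(split64 *x) {
    if (ticks_left > 0) {
        uint64_t v = (((uint64_t)x->hi << 32) | x->lo) + 1;
        x->hi = (uint32_t)(v >> 32);
        x->lo = (uint32_t)v;
        ticks_left--;
    }
}

/* Starts one tick away from a carry into the high half. */
uint64_t monotonic_demo(void) {
    split64 x = { 0xFFFFFFFFu, 0x00000000u };   /* lo, hi */
    ticks_left = 1;
    return read_monotonic64(&x, tick_once);
}
```

In the demo the first pass sees the low half change and is thrown away; the second pass returns the value after the tick. Without the re-check, the first pass would have combined the old low half with the new high half.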

--
© John Stockton, Surrey, UK. j...@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
<URL: http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
<URL: http://www.merlyn.demon.co.uk/clpb-faq.txt> Pedt Scragg: c.l.p.b. mFAQ;
<URL: ftp://garbo.uwasa.fi/pc/link/tsfaqp.zip> Timo Salmi's Turbo Pascal FAQ.


Richard Pavlicek

Sep 25, 2000, 8:53:17 PM

"Mike" <geo...@myob.com> wrote:

> I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
> read or write a 64 bit value atomically. By that, I mean given:
>
> mov EAX, dword ptr [x]
> mov EDX, dword ptr 4[x]
>
> has a problem. If another thread changes x in between those two
> instructions, then [EDX,EAX] will contain a qword value that never
> was in x.

Well, this would certainly do it:

readit:
mov eax,dword ptr[x]
mov edx,dword ptr[x+4]
cmp dword ptr[x],eax
jne readit
cmp dword ptr[x+4],edx
jne readit
...

But, there may be a better way. :)

--
Richard Pavlicek
Web site: http://www.rpbridge.net


Mike

Sep 26, 2000, 12:12:53 AM

Richard Pavlicek wrote in message ...


Your idea is a great one! But alas, it cannot work. If some thread tickles
x+4 just before the 2nd instruction, tickles it back before the 3rd, and
tickles it *the same way* before the 5th instruction, the test succeeds, but
it's an invalid value.

Richard Pavlicek

Sep 26, 2000, 2:07:54 AM

Mike <geo...@myob.com> wrote in message news:st08i5m...@corp.supernews.com...

I see your point, but is this really possible? Assuming a single CPU, it
would mean that when my thread was active (after 1st instruction) _three_
interrupts occurred, and between these interrupts my thread was returned
control twice for an infinitesimally short period. I don't know a lot
about CPU architecture, but this would be a hopeless waste of resources
considering the overhead in switching processes.

I suppose with multiple CPUs working on the same data, it would be a
real concern, but I know nothing about that neck of the woods.

Mike

Sep 26, 2000, 2:07:56 AM

That looks like it should work! Thanks for the suggestion.

Frans Morsch wrote in message ...
>
>"Mike" <geo...@myob.com> wrote in message
>news:pmIz5.18912$nk3.9...@newsread03.prod.itd.earthlink.net...


>> I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
>> read or write a 64 bit value atomically. By that, I mean given:
>>
>> mov EAX, dword ptr [x]
>> mov EDX, dword ptr 4[x]
>>
>> has a problem. If another thread changes x in between those two
>> instructions, then [EDX,EAX] will contain a qword value that never was in x.
>>
>> I do know how to make this work by bracketing it in an
>> EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
>> to do it?
>
>

Randall Hyde

Sep 26, 2000, 3:00:00 AM

in article st0f9qs...@corp.supernews.com, Richard Pavlicek at
ric...@rpbridge.net wrote on 9/26/00 2:07 PM:

>
>

>> Your idea is a great one! But alas, it cannot work. If some thread tickles
>> x+4 just before the 2nd instruction, tickles it back before the 3rd, and
>> tickles it *the same way* before the 5th instruction, the test succeeds, but
>> it's an invalid value.
>
> I see your point, but is this really possible? Assuming a single CPU, it
> would mean that when my thread was active (after 1st instruction) _three_
> interrupts occurred, and between these interrupts my thread was returned
> control twice for an infinitesimally short period. I don't know a lot
> about CPU architecture, but this would be a hopeless waste of resources
> considering the overhead in switching processes.
>
> I suppose with multiple CPUs working on the same data, it would be a
> real concern, but I know nothing about that neck of the woods.

Improbable, but certainly possible on a single CPU. That means that
this is one of those defects that's nearly impossible to find when
it does rear its ugly head.
Of course, with the number of CPUs per system increasing on a yearly
basis, one can expect $1,000 systems to have two CPUs (all on one die)
by the year 2010. Hopefully the Y2K problem taught us not to believe
that "no one will be using this program in 10 years." Therefore,
always write your software assuming a multiprocessor environment.
Randy Hyde


Paul Hsieh

Sep 26, 2000, 3:00:00 AM

ric...@rpbridge.net says...

Indeed, but it's not just a fanciful dream. This is really a plausible
scenario.

Imagine a scenario where the task switch interrupt occurred between the
two mov instructions. The new task then changes the value at [x] and
[x+4], then say it fetches some data which causes the location of [x] and
[x+4] to be paged out.

Then once the task switches back the attempted access to [x] is trapped
by the page fault handler which re-loads the page first, then attempts to
return control back to the program by jumping back to it. In the (long)
time that it takes for this to happen the task can again be switched to
the new task which changes [x] back to the same value that's in eax and
[x+4] to some other value. Again it pages out [x] and [x+4]. Then as
the task returns the first cmp causes a page fault, the page is reloaded
again and it resumes execution at the cmp instruction which passes. Then
after this time has passed the task is switched again, [x+4] might be
changed to the value that's currently in edx and the false verification
scenario is complete.
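
The sequence above can be replayed step by step in a single-threaded C sketch. Everything here is illustrative: split64, store64 and false_verification_demo are made-up names, the constants are arbitrary, and the last write by the other task is assumed to change the low dword as well, so that the pair the reader ends up with was never stored in x at any point.

```c
#include <stdint.h>

typedef struct { uint32_t lo, hi; } split64;

static void store64(split64 *x, uint32_t hi, uint32_t lo) {
    x->hi = hi;
    x->lo = lo;
}

/* Returns 1 when both compares pass; *out receives the edx:eax pair. */
int false_verification_demo(uint64_t *out) {
    split64 x;
    store64(&x, 0x00000001u, 0xAAAAAAAAu);   /* initial value              */
    uint32_t eax = x.lo;                     /* mov eax,dword ptr[x]       */
    store64(&x, 0x00000002u, 0xBBBBBBBBu);   /* other task: both halves    */
    uint32_t edx = x.hi;                     /* mov edx,dword ptr[x+4]     */
    store64(&x, 0x00000003u, 0xAAAAAAAAu);   /* other task: low restored   */
    int low_ok = (x.lo == eax);              /* cmp dword ptr[x],eax       */
    store64(&x, 0x00000002u, 0xCCCCCCCCu);   /* other task: high == edx    */
    int high_ok = (x.hi == edx);             /* cmp dword ptr[x+4],edx     */
    *out = ((uint64_t)edx << 32) | eax;
    return low_ok && high_ok;
}
```

Both compares pass, yet the accepted pair 00000002:AAAAAAAA is not among the four values that x actually held.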

Norbert's CMPXCHG8B is the most typical solution for implementing a mutex
which is the usual reason why someone wants to do this kind of thing.
Other possibilities include:

cli
mov eax,[x]
mov edx,[x+4]
sti

; Does not work under Windows NT or multiprocessor scenarios.

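push offset critsec ; critsec: an initialized CRITICAL_SECTION (name assumed)
call EnterCriticalSection
mov eax,[x]
mov edx,[x+4]
push eax ; eax/edx are not preserved across the call
push edx
push offset critsec
call LeaveCriticalSection
pop edx
pop eax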

; Only works under Windows, and is ridiculously slow.

fild qword ptr [x]
fistp qword ptr [temp]
mov eax,[temp]
mov edx,[temp+4]

; In a multiprocessor situation, this only works under the assumption
; that the access is a single 64 bit access, which is only true if [x]
; is 64-bit aligned. Also assumes the FP stack is available to play
; with.

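movq mm0, qword ptr [x]
movd eax, mm0
psrlq mm0, 32 ; shift the high dword down (psllq would zero the low dword)
movd edx, mm0
emms ; leave MMX state so the FPU is usable again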

; Same concerns as above. Requires MMX enabled processor as well.

If the lock prefix can be used in conjunction with the last two solutions
then you are set. But I have to admit that I have not looked too deeply
into the rules about when you can or cannot use the lock prefix.
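
Since the FILD and MOVQ variants both depend on [x] being 64-bit aligned, a quick way to check or force that from C may be useful. This is only a sketch: is_qword_aligned and aligned_qword are hypothetical names, and alignas comes from C11 <stdalign.h>.

```c
#include <stdalign.h>
#include <stdint.h>

/* True when p sits on an 8-byte boundary. */
int is_qword_aligned(const void *p) {
    return ((uintptr_t)p & 7u) == 0;
}

/* A 64-bit slot that the compiler is told to place on an 8-byte boundary
   (a plain uint64_t may get only 4-byte alignment under some 32-bit ABIs). */
typedef struct {
    alignas(8) uint64_t value;
} aligned_qword;
```

A debug build could assert is_qword_aligned(&x) once at startup instead of trusting the linker or the allocator.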

--
Paul Hsieh
http://www.pobox.com/~qed/


Mike

Sep 26, 2000, 3:00:00 AM


The trouble is, if it is possible for it to happen it will happen! And these
kinds of multithreading bugs can be incredibly nasty to root out. I do agree
with you, however, that it is much more likely to happen in a multi-cpu
machine.


Mike

Sep 26, 2000, 8:29:28 PM

Thanks for pointing out that the data must be 64 bit aligned in order for
the qword instructions to work. Alas, there seems to be no really good
general solution for this, so I'm probably stuck with the criticalSection
bloat.

I have another interesting question for you <g>. I can't find the answer in
any of my asm books. Suppose I am reading and writing 10 byte floating point
values. What's the best alignment to use? 4 byte or 2 byte?


Paul Hsieh wrote in message ...
>
>ric...@rpbridge.net says...

Terje Mathisen

Sep 27, 2000, 3:00:00 AM

Mike wrote:
>
> Thanks for pointing out that the data must be 64 bit aligned in order for
> the qword instructions to work. Alas, there seems to be no really good
> general solution for this, so I'm probably stuck with the criticalSection
> bloat.
>
> I have another interesting question for you <g>. I can't find the answer in
> any of my asm books. Suppose I am reading and writing 10 byte floating point
> values. What's the best alignment to use? 4 byte or 2 byte?

Follow Intel's docs and use 16-byte alignment!
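
In C, one way to get that layout is to wrap the 10 bytes in a type with an explicit 16-byte alignment. This is a sketch with a made-up type name (tbyte_slot); the 10-byte payload is kept as raw bytes so the example does not depend on how a particular compiler defines long double.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Storage for one 80-bit extended value, padded and aligned to 16 bytes. */
typedef struct {
    alignas(16) unsigned char bytes[10];
} tbyte_slot;
```

Because the struct size is rounded up to its alignment, every element of an array of tbyte_slot starts on a 16-byte boundary, at the cost of 6 bytes of padding per value.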

Terje

--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"


Arargh!

Sep 28, 2000, 3:00:00 AM
On Wed, 27 Sep 2000 00:29:28 -0000, "Mike" <geo...@myob.com> wrote:

>
>Thanks for pointing out that the data must be 64 bit aligned in order for
>the qword instructions to work. Alas, there seems to be no really good
>general solution for this, so I'm probably stuck with the criticalSection
>bloat.

It is possible that an old fashioned spin loop might work, but this is
somewhat outside of my area. (such as was used in OS360/MVT for the
MP65).

something like:

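LockByte db 0 ; 0 = free, 1 = held
data64 dq 0 ; (or whatever it is)

Loop1:
xor al,al ; expected value: lock is free
mov cl,1 ; value to store if it is
lock cmpxchg LockByte,cl ; if LockByte == al then LockByte = cl, ZF = 1
jnz Loop1 ; lock not obtained, try again

; (your processing here)

mov LockByte,0 ; release the lock

--------------
I was under the impression that CMPXCHG would always complete without
allowing an interrupt. That covers a single CPU; the LOCK prefix is there
for the multiprocessor case.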

Does someone who knows more about this want to comment?
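
For comparison, here is roughly what such a spin loop looks like in C11. It is a sketch under assumptions: the type and function names (guarded64, acquire, release, guarded_read, guarded_write) are invented, and atomic_compare_exchange_weak stands in for the CMPXCHG on the lock byte. A real implementation would probably also want a pause or back-off inside the loop.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uchar lock_byte;   /* 0 = free, 1 = held (LockByte in the post) */
    uint64_t     data;        /* the protected 64-bit value (data64)       */
} guarded64;

static void acquire(guarded64 *g) {
    unsigned char expected = 0;
    /* Swap 0 -> 1; a failed attempt overwrites expected, so reset it. */
    while (!atomic_compare_exchange_weak(&g->lock_byte, &expected,
                                         (unsigned char)1)) {
        expected = 0;
    }
}

static void release(guarded64 *g) {
    atomic_store(&g->lock_byte, (unsigned char)0);
}

/* Both halves are read while the lock is held, so no writer that uses
   guarded_write can slip in between them. */
uint64_t guarded_read(guarded64 *g) {
    acquire(g);
    uint64_t v = g->data;
    release(g);
    return v;
}

void guarded_write(guarded64 *g, uint64_t v) {
    acquire(g);
    g->data = v;
    release(g);
}
```

This only protects the value if every reader and writer goes through the same lock byte.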

--
arargh (at enteract period com) http://www.arargh.com
(Reply address points nowhere in an attempt to foil e-mail spammers.)


David Vago

Sep 29, 2000, 3:00:00 AM
"Mike" <geo...@myob.com> wrote in message
news:pmIz5.18912$nk3.9...@newsread03.prod.itd.earthlink.net...

> I'm curious if anyone knows how to, with the 32 bit x86 instruction set,
> read or write a 64 bit value atomically. By that, I mean given:
>
> mov EAX, dword ptr [x]
> mov EDX, dword ptr 4[x]
>
> has a problem. If another thread changes x in between those two
> instructions, then [EDX,EAX] will contain a qword value that never was in x.
>
> I do know how to make this work by bracketing it in an
> EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
> to do it?
Maybe my idea sounds strange and relatively inefficient, but anyway, here it
is. You could use floating-point FLD QWORD PTR ... and then put the value to
a memory area which is used by only that thread with FST QWORD PTR, and then
read from there with two MOVs. I hope that helps.

David

Terje Mathisen

Sep 29, 2000, 3:00:00 AM

David Vago wrote:
> > I do know how to make this work by bracketing it in an
> > EnterCriticalSection() and LeaveCriticalSection(), but is there a cheap way
> > to do it?
> Maybe my idea sounds strange and relatively inefficient, but anyway, here it
> is. You could use floating-point FLD QWORD PTR ... and then put the value to
> a memory area which is used by only that thread with FST QWORD PTR, and then
> read from there with two MOVs. I hope that helps

This has been suggested previously. It works fine as long as you use
either FILD or an MMX MOVQ operation on a 64-bit aligned element.

Doing an atomic update of such a 64-bit variable will still require the
use of a LOCKed CMPXCHG8B, otherwise you'll need some other way to
protect the access.

All the alternatives would however require a much longer time when the
variable is locked, so this will not perform as well on a multi-cpu
system.
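
A read-modify-write of that kind is usually written as a compare-and-exchange retry loop. The C11 sketch below shows the shape for a 64-bit add; atomic_add64 is a hypothetical name, and the expectation that a 32-bit x86 compiler emits LOCK CMPXCHG8B for the exchange is an assumption about code generation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Adds delta to *x as one indivisible update and returns the new value. */
uint64_t atomic_add64(_Atomic uint64_t *x, uint64_t delta) {
    uint64_t old = atomic_load(x);
    /* If *x still equals old, replace it with old + delta. Otherwise the
       exchange fails, old is refreshed with the current value, and the
       sum is recomputed on the next pass. */
    while (!atomic_compare_exchange_weak(x, &old, old + delta)) {
        /* retry with the refreshed old */
    }
    return old + delta;
}
```

The variable is only "busy" for the duration of the single exchange, which is the property that makes this approach scale better than holding a lock around the whole update.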

Sean Stanek

Sep 29, 2000, 3:00:00 AM

You could use either a movq (mmx) or an fpu load to get all 64 bits first,
and then put that in your registers. If you're using mmx, you can movd
the low 32-bits of some mmx register into eax, and then move the upper
32-bits into edx in a variety of different ways. If you use the fpu, you'll
have to store it somewhere else, which might not be what you want. ;)

- vulture a.k.a. Sean Stanek

