__sync_bool_compare_and_swap function

Pawel

Mar 3, 2009, 1:42:01 PM

to Cocotron Developers, pawel....@gmail.com

Hi!

I'm trying to figure out where the "__sync_bool_compare_and_swap()"
function is defined or declared.
GCC seems to translate it (in my case) into a call to a builtin named
"__sync_bool_compare_and_swap_4", but that ends up as an unresolved
symbol... I can't quite figure out where the actual code for this
function is supposed to come from. Any hints? :)

(Googling this suggests that I wasn't able to compile the builtins
for my platform, but I still have no idea how to actually fix this, i.e.
where the actual code for these builtins is...)

Thanks!
Pawel.

Pawel

Mar 3, 2009, 2:19:26 PM

to Cocotron Developers

And from what I've understood since, I would need to add the code for
this function either somewhere in the Cocotron source itself or to the
gcc for my platform, but I don't have a clue as to where the actual code
for these functions is in GCC, not that I didn't look... So any hints on
that would be really appreciated; otherwise, I'll stick it somewhere in
Cocotron...

Thanks!
Pawel.

Johannes Fortmann

Mar 3, 2009, 2:44:38 PM

to Cocotron Developers

On Mar 3, 20:19, Pawel <pawel.vese...@gmail.com> wrote:
> And from what I've understood since, I would need to add the code for
> this function either somewhere in the Cocotron source itself or to the
> gcc for my platform, but I don't have a clue as to where the actual code
> for these functions is in GCC, not that I didn't look... So any hints on
> that would be really appreciated; otherwise, I'll stick it somewhere in
> Cocotron...

The function is a gcc intrinsic. Depending on your platform, you may
need to use gcc 4.2; you also have to tell gcc that it may use non-386
machine code via "-march=i686". "-march=i486" should also work (that's
where the cmpxchg instruction was introduced).

Cheers,
Johannes
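
For reference, a minimal sketch of calling the builtin directly. The
helper name try_claim and the values used are purely illustrative and
are not taken from Cocotron. On 32-bit x86 this is assumed to be built
with something like "gcc -march=i686"; when the target has no inline
expansion for the builtin, GCC falls back to emitting a call to
"__sync_bool_compare_and_swap_4", which is the unresolved symbol
described at the top of the thread.

```c
#include <stdint.h>

/* Hypothetical helper: atomically change *flag from 0 to 1.
 * Returns 1 if this caller made the change, 0 otherwise. */
int try_claim(volatile int32_t *flag)
{
    /* GCC builtin: if *flag == 0, store 1 and return true;
     * otherwise leave *flag untouched and return false. */
    return __sync_bool_compare_and_swap(flag, 0, 1);
}
```

The first caller to reach try_claim() on a zeroed flag gets 1; every
later caller gets 0 until something resets the flag.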

Pawel

Mar 3, 2009, 3:56:48 PM

to Cocotron Developers

Oh, thanks!

I've looked a bit deeper, and accidentally figured out that
these functions are defined as "sync_compare_and_swap",
per architecture, somewhere in GCC's config directory.

I'm actually trying to target the ARM platform, and there doesn't
seem to be any support for those on ARM.

So I'll have to write them myself, I guess... (I hate GNU assembler)

Thanks!
Pawel.

Supreme Cocotron Committee

Mar 3, 2009, 11:46:54 PM

to Cocotron Developers

Hi, you only need a full implementation of the function if you're
planning on using threads. There might also be an equivalent, or similar
enough, function available on the target OS (Windows? Linux? Yeah, I'm
curious :) that we could use until gcc supports the builtin.

Chris

Pawel

Mar 4, 2009, 12:14:24 AM

to Cocotron Developers

Hi,

It's a Linux platform on the ARMv5 architecture.
For now, I've put in a placeholder that just does this:

if (*ptr == test) { *ptr = other; return 1; } return 0;

I will be using threads, so I will probably need a proper
implementation.
I wanted to do it right away, but as I said, I hate GNU assembler, and
I'd need to manipulate the CPSR, since ARMv5 doesn't have LDREX/STREX
yet, so I got too lazy.

Thanks!
Pawel.

P.S. Right now my other problem is that GCC keeps injecting references
to gnu_personality_v0 for exception handling,
even though I combed the sucker to enforce the SJLJ exception model.

Pawel Veselov

Mar 4, 2009, 12:24:24 AM

to cocotr...@googlegroups.com

> and there might be an equivalent or similar
> enough function available on the target OS

So, is there something like that in, say, standard libc?
I didn't find anything on pages like this one:
http://www.kernel.org/doc/man-pages/online/dir_all_alphabetic.html

Thanks!
Pawel.

Supreme Cocotron Committee

Mar 20, 2009, 9:18:40 AM

to Cocotron Developers

I ran across this thread:

http://www.nabble.com/Using-__kuser_cmpxchg-td13777242.html

It talks about implementing compare-and-swap on ARM processors, and
what I am reading is that it is basically impossible to make it fast,
since it requires a kernel call to disable interrupts. We might need a
different code path for ARM that just uses plain locks.

Chris
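
The "plain locks" code path suggested above could be sketched roughly
as follows. This is only an illustration, not Cocotron code: the
function name locked_bool_compare_and_swap and the single global mutex
are assumptions, and the scheme only holds if every writer to the
affected words goes through this function.

```c
#include <pthread.h>
#include <stdint.h>

/* One global lock guarding every emulated compare-and-swap.
 * Assumption: nothing writes to these words without taking it. */
static pthread_mutex_t cas_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical fallback with the same contract as
 * __sync_bool_compare_and_swap for 32-bit values:
 * returns 1 if *ptr was equal to oldval and has been set to newval. */
int locked_bool_compare_and_swap(volatile int32_t *ptr,
                                 int32_t oldval, int32_t newval)
{
    int swapped = 0;

    pthread_mutex_lock(&cas_lock);
    if (*ptr == oldval) {       /* same test as the placeholder ... */
        *ptr = newval;          /* ... but now nobody can interleave */
        swapped = 1;
    }
    pthread_mutex_unlock(&cas_lock);

    return swapped;
}
```

The body is the same compare-then-store as the earlier placeholder; the
mutex is what keeps another thread from slipping in between the compare
and the store. A single global lock serializes all callers, so this
trades speed for simplicity.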

Pawel Veselov

Mar 20, 2009, 5:08:51 PM

to cocotr...@googlegroups.com

My understanding was that:

For V6 and above: You should use LDREX/STREX, because LDREX "locks"
the memory word to this CPU only.

For V5 and below: You can use SWP, or whatever else, but you have to
disable the interrupts to guarantee safety of your memory contents.

However, for V5, disabling interrupts will not be enough in the SMP
case. The thinking seems to be that disabling interrupts is a bad idea
anyway, but I'm not sure whether any other synchronization method would
be more efficient, at least for the thread that's executing now.
(Probably, other synchronization would be slower for the current
thread, while disabling interrupts has more impact on other system
events.)

Also, for V6, I can't find anything that says the memory access is
guaranteed to be exclusive across multiple threads. The ARM doc says
that LDREX "...marks the physical address as exclusive access for the
executing processor in a shared monitor...", but I don't see what stops
code executed after a context switch, or in any other external
interrupt on the SAME CPU, from modifying that memory word... (Which
would mean that interrupts need to be disabled even if LDREX/STREX is
used.)

Also, I am concerned about the logic diagrams in the ARM doc, as it
seems that a conditional LDREX will lock the memory if its condition is
true, but a conditional STREX will not unlock the memory if its
condition is false. I would expect the most common way to use
LDREX/STREX to be:

    LDREX   R3, [R0]       ; [R0] -> R3, always load
    CMP     R3, R1         ; do we need to swap?
    STREXNE R2, R1, [R0]   ; R1 -> [R0], status in R2, store only if they weren't equal

I can make the LDREX conditional on the same comparison done before the
load, but even then there is no guarantee that I would have to do the
store. And it doesn't make sense to unlock the memory (anyway, I don't
know how) using an instruction other than STREX...

Thanks,
Pawel.

--
With best of best regards
Pawel S. Veselov

Brent Priddy

Mar 20, 2009, 5:49:28 PM

to cocotr...@googlegroups.com

I am not experienced with ARMv6 LDREX/STREX, but I am with the PPC and MIPS instructions that serve a similar purpose.

A tiny bit of searching turned up http://www.doulos.com/knowhow/arm/Hints_and_Tips/Implementing_Semaphores/ (look under "Exclusive Load and store").

LDREX/STREX work similarly to the "stronger" implementations of PPC lwarx/stwcx: they just set and check a reservation on the memory address. If any other instruction accesses that memory (even mistakenly, within the tight LDREX/STREX loop), the reservation is broken and the STREX will fail. The "lock" that you mention is not a read/write lock; it is more like a whistleblower, a tattletale if you will. It lets the STREX instruction know that someone messed with the memory since the last LDREX, so you have to loop and try your operation all over again. The link above even has a good example of a semaphore implementation, which is very close to the cmpxchg you are looking for.

You have to be careful about how the OS implements and uses these instructions: PPC requires that you do a "dummy" lwarx/stwcx when you handle interrupts or task switch. This is mainly because they use the reservation and the cache line of the memory that you are doing an atomic operation on. I don't know if ARM has a requirement like this, but PPC has a very long application note detailing the use of these instructions.
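
The "loop and try again" pattern described above has the same shape
when written against GCC's compare-and-swap builtin. The sketch below
is only an analogy in C, not LDREX/STREX itself, and the function name
cas_add is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical example: add delta to *counter using nothing but
 * compare-and-swap. Returns the value that was stored. */
int32_t cas_add(volatile int32_t *counter, int32_t delta)
{
    int32_t seen, wanted;

    do {
        seen = *counter;        /* read the current value (the LDREX step) */
        wanted = seen + delta;  /* compute the update privately */
        /* The store only lands if *counter still holds what we read
         * (the STREX step); otherwise someone else got there first
         * and we go around again with a fresh read. */
    } while (!__sync_bool_compare_and_swap(counter, seen, wanted));

    return wanted;
}
```

One difference worth keeping in mind: compare-and-swap only checks that
the value is the same as before, whereas a reservation is described
above as noticing any intervening access to the memory.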

Pawel Veselov

Mar 20, 2009, 9:11:49 PM

to cocotr...@googlegroups.com

Ah, I see about the "reservation" thing. Thank you, that really
explains it to me.

I would, though, imagine that in times of high contention, such code
will spin a lot, since you are as likely to break someone else's
reservation as to have your own reservation broken at the same time.

Because I'm on ARMv5, LDREX/STREX won't work for me, unfortunately. On
V6, I would imagine that the code would look like:

    ; r0 = address, r1 = old value, r2 = new value
    ; move address to r3, since r0 is used as the return register
        MOV     r3, r0
    spin:
        LDREX   r0, [r3]
        TEQ     r1, r0
        MOVNE   r0, #1
        STREXEQ r0, r2, [r3]
        TEQEQ   r0, #1
    ; if Z was 0 after the first TEQ, it is still 0 here: nothing was stored, we need to leave
    ; if Z was 1 after the first TEQ, it is now 1 to mean retry, or 0 to mean leave
        BEQ     spin
    ; r0 is 1 if the comparison was negative
    ; r0 is 0 if the STREX succeeded
    ; we need to return 1 if the value was changed, though, so reverse it
        RSB     r0, r0, #1
    ; ABI return code here

For V5, as the page says, I don't see how SWP could help, so the only
thing left is interrupt locking, which won't work in SMP...

Thanks!
Pawel.
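
For a uniprocessor ARMv5 target running Linux, the __kuser_cmpxchg
helper named in the thread linked earlier is one possible route. The
sketch below assumes the prototype and fixed address (0xffff0fc0) given
in the Linux kernel's documentation of its ARM user helpers, and
assumes the running kernel actually provides the helper. The wrapper
name kuser_bool_compare_and_swap is made up here, and the non-ARM
branch exists only so the sketch builds on other machines.

```c
#include <stdint.h>

#if defined(__arm__) && defined(__linux__)
/* Linux maps small helper routines at fixed addresses near the top of
 * the ARM address space. Per the kernel documentation, this one
 * returns zero if *ptr was changed and non-zero otherwise. */
typedef int (kuser_cmpxchg_fn)(int32_t oldval, int32_t newval,
                               volatile int32_t *ptr);
#define kuser_cmpxchg (*(kuser_cmpxchg_fn *)0xffff0fc0)

int kuser_bool_compare_and_swap(volatile int32_t *ptr,
                                int32_t oldval, int32_t newval)
{
    /* Flip the helper's "zero means swapped" convention into the
     * builtin's "nonzero means swapped" convention. */
    return kuser_cmpxchg(oldval, newval, ptr) == 0;
}
#else
/* Other targets: assume the compiler can expand the builtin itself. */
int kuser_bool_compare_and_swap(volatile int32_t *ptr,
                                int32_t oldval, int32_t newval)
{
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
#endif
```

With a wrapper like this, the interrupt handling is left to the kernel
rather than done by manipulating the CPSR from Cocotron's own code.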

Brent Priddy

Mar 22, 2009, 10:07:59 PM

to cocotr...@googlegroups.com

It is amusing that the Linux kernel #errors out for ARM architectures < v6 with SMP support enabled... they said "SMP FAIL"

Another link, mostly regarding < v6:
http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka4175.html?resultof=%22%53%57%50%22%20%22%73%77%70%22%20%22%6d%75%6c%74%69%22%20
SWP blocks bus-master access, as the page mentions. I am not familiar enough with the ARM architecture to say that SWP is OK for SMP because of the bus locking. I wonder why Linux did not implement the SMP-safe atomic functions using SWP...
What ARMv5 multicore processor are you using (or the question might just be: how are the UP ARMv5 cores connected)? Do they have any application notes about synchronization primitives? Do you have access to an FAE from the company that makes/sells the processor? I would ask the FAE some of these questions.

Regarding contention: if you have high contention, then task switching is a lot more CPU-intensive than spinning. This is why OS X, FreeBSD and others spin on a mutex when another CPU is running the thread that owns the lock you are about to block on. The idea/assumption is that the other CPU is probably going to finish the critical section faster than it would take to task switch the blocked thread out. You do have the possibility of livelock, though; very fun indeed.
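
A much-simplified sketch of the spin-before-giving-up-the-CPU idea, in
C. It does not check whether the lock owner is currently running on
another CPU, as the systems mentioned above are said to do; it just
spins a bounded number of times and then yields. The type and function
names and the SPIN_LIMIT value are invented for illustration.

```c
#include <sched.h>

#define SPIN_LIMIT 1000   /* arbitrary illustrative bound */

typedef struct {
    volatile int locked;  /* 0 = free, 1 = held */
} spin_yield_lock;

void spin_yield_acquire(spin_yield_lock *l)
{
    for (;;) {
        int i;

        /* Spin for a while, hoping the owner finishes soon. */
        for (i = 0; i < SPIN_LIMIT; i++) {
            if (__sync_bool_compare_and_swap(&l->locked, 0, 1))
                return;   /* we flipped 0 -> 1, the lock is ours */
        }

        /* The owner is taking a long time: let another thread run
         * instead of burning more cycles, then try again. */
        sched_yield();
    }
}

void spin_yield_release(spin_yield_lock *l)
{
    /* GCC builtin: writes 0 to the lock word with release semantics. */
    __sync_lock_release(&l->locked);
}
```

Tuning SPIN_LIMIT is the whole trade-off: too low and every contended
acquire pays for a yield, too high and waiters burn CPU while the
owner is descheduled.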
