atom and lock


Warren Lynn

unread,
Jul 17, 2012, 6:57:13 PM7/17/12
to clo...@googlegroups.com
I have a hard time understanding why there is a need to retry when doing "swap!" on an atom. Why doesn't Clojure just lock the atom up front and do the update? I ask because I don't see any benefit in the current "just try, and retry if needed" (STM-style?) approach for atoms (it may be fine for refs, because you cannot attach a lock to an arbitrary combination of refs in a "dosync" block). Right now I have an atom in my program with two "swap!" functions on it. One may take a (relatively) long time, and the other is short. I don't want the long "swap!" function to retry just because at the last minute the short one sneaked in and changed the atom's value. I can do the up-front lock myself, but I wonder why the language does not already work this way. Thank you for any enlightenment.
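[Editor's note: for reference, the retry being asked about can be sketched in JVM terms. This mirrors the CAS loop in Clojure's Atom.java, which the thread discusses below; the class and method names here are illustrative, not Clojure's actual code.]

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// A minimal sketch of what swap! does: read the current value, compute a
// new one, and commit with compare-and-set, retrying if another thread
// changed the value in the meantime.
public class SwapSketch {
    static <T> T swap(AtomicReference<T> ref, UnaryOperator<T> f) {
        while (true) {
            T oldVal = ref.get();        // snapshot the current value
            T newVal = f.apply(oldVal);  // may run more than once under contention
            if (ref.compareAndSet(oldVal, newVal)) {
                return newVal;           // nobody changed it in between: committed
            }
            // lost the race: loop and recompute from the fresh value
        }
    }
}
```

Note that the update function `f` is called again on each retry, which is exactly the cost Warren is asking about.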

Kevin Downey

unread,
Jul 17, 2012, 7:50:10 PM7/17/12
to clo...@googlegroups.com
if you do it as a lock, then readers must block writers (think it
through). Clojure's reference types + immutable data structures, and the
views on perception that underlie them, are strongly opposed to readers
interfering with writers.

http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey around
27:48 Rich discusses perception
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en



--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Warren Lynn

unread,
Jul 17, 2012, 7:58:27 PM7/17/12
to clo...@googlegroups.com


On Tuesday, July 17, 2012 7:50:10 PM UTC-4, red...@gmail.com wrote:
if you do it as a lock, then readers must block writers (think it
through). Clojure's reference types + immutable data structures, and the
views on perception that underlie them, are strongly opposed to readers
interfering with writers.



Why is that so? Doesn't the reader just get a snapshot of the atom's state, without caring who writes to the original atom? If a lock is needed, it is only needed for a very short commit window (you cannot read while a writer is committing), not for the whole "swap!" function. That still sounds a lot better than retrying to me.


Timothy Baldridge

unread,
Jul 17, 2012, 8:03:28 PM7/17/12
to clo...@googlegroups.com
> Why is that so? Doesn't the reader just get a snapshot of the atom's state,
> without caring who writes to the original atom? If a lock is needed, it is
> only needed for a very short commit window (you cannot read while a writer
> is committing), not for the whole "swap!" function. That still sounds a lot
> better than retrying to me.


Well let's examine the very common case:

(def a (atom {}))

(defn add-kv [k v]
  (swap! a assoc k v))

If I call add-kv from multiple threads, how can I assume that the map
won't be modified in the middle of the assoc? Sure, I could lock, but
read this article first:
http://en.wikipedia.org/wiki/Non-blocking_algorithm

The idea is that with atoms there will always be some progress. Even
if one updater takes an hour to complete, the other threads can
continue as needed. True, that single thread will not complete until
it gets an uninterrupted one-hour window, but that's where other
synchronization methods come into play (worker queues, for example).

Timothy

Kevin Downey

unread,
Jul 17, 2012, 8:06:23 PM7/17/12
to clo...@googlegroups.com
Finish the thought: what happens when there is "contention", i.e. a thread
reads and then writes before you acquire the lock to commit? You can try
to make locking work, but you'll just end up with CAS built on top of a lock.

Warren Lynn

unread,
Jul 17, 2012, 8:20:03 PM7/17/12
to clo...@googlegroups.com

(def a (atom {}))

(defn add-kv [k v]
  (swap! a assoc k v))

If I call add-kv from multiple threads, how can I assume that the map
won't be modified in the middle of the assoc? Sure, I could lock, but
read this article first:
http://en.wikipedia.org/wiki/Non-blocking_algorithm

The idea is that with atoms there will always be some progress. Even
if one updater takes an hour to complete, the other threads can
continue as needed. True, that single thread will not complete until
it gets an uninterrupted one-hour window, but that's where other
synchronization methods come into play (worker queues, for example).

Timothy

The "making progress" seems like an illusion to me here. Sure, you can make progress in one thread while another thread takes an hour to finish its part. But the cost is that the "long" thread finally finds out: "oops, I have to start my one-hour job over."

Being lockless seems useful in certain cases (like the real-time systems mentioned in the Wikipedia article). But I still cannot grasp how it can increase *real* work throughput, when the problem itself mandates that part of the work can only be done serially.




Warren Lynn

unread,
Jul 17, 2012, 8:23:56 PM7/17/12
to clo...@googlegroups.com

Finish the thought: what happens when there is "contention", i.e. a thread
reads and then writes before you acquire the lock to commit? You can try
to make locking work, but you'll just end up with CAS built on top of a lock.



I am not saying to throw away the "swap!" syntax. "swap!" guarantees that if you want to change an atom's value, you must start from its latest, current value. I am just saying we had better exclude any other writer (swap!) from starting once one writer has already started working.

Warren Lynn

unread,
Jul 17, 2012, 11:53:41 PM7/17/12
to clo...@googlegroups.com
For people who are interested, here is my own version of atom updating functions:

;; A wrapped up atom that can be used in those lock-* functions
(deftype LockAtom
  [atom]
  clojure.lang.IDeref
  (deref [this]
    @(.atom this)))

;; (lock-atom (+ 4 5)) => #<LockAtom@4e5497cb: 9>
(defmacro lock-atom
  "Like ATOM, but create a lockable atom."
  [expr]
  `(LockAtom. (atom ~expr)))

(defn lock-swap!
  "Like swap!, but first locks the atom so that no other thread can change the
  atom at the same time, meaning no retry will happen. This is useful for atom
  update functions that have side effects, or for high-contention situations
  where retries cause performance degradation. Works only with lock-atoms."
  [lock-atom & rest-args]
  (locking lock-atom
    (apply swap! (.atom lock-atom) rest-args)))

(defn lock-reset!
  "Like reset!, but works on lock-atoms."
  [lock-atom new-val]
  (locking lock-atom
    (reset! (.atom lock-atom) new-val)))
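[Editor's note: for comparison, the same idea sketched in JVM terms, with illustrative names. Updates are serialized by a monitor, so the update function runs exactly once and is never retried, while reads stay lock-free, matching Warren's Clojure version where deref goes straight to the inner atom.]

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of a "lock atom": updates run under a monitor, so the update
// function is invoked exactly once. Readers just read the reference
// without taking the lock.
public class LockAtom<T> {
    private final AtomicReference<T> ref;

    public LockAtom(T initial) {
        this.ref = new AtomicReference<>(initial);
    }

    public T deref() {
        return ref.get();  // lock-free read
    }

    public synchronized T lockSwap(UnaryOperator<T> f) {
        T newVal = f.apply(ref.get());  // runs once, under the lock
        ref.set(newVal);
        return newVal;
    }
}
```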

Ulises

unread,
Jul 18, 2012, 3:21:56 AM7/18/12
to clo...@googlegroups.com
Please excuse my ignorance and my late comment, but can you make your
1hr operation shorter?

The general advice I've always been given is "whenever you need
to use a resource that might cause contention, do it quickly, in and
out in a blink".

I know the argument then would be "why do I need to change my
code to fit the tool?", but in general terms I'd think that having such
a long-running operation is a bad thing anyway.

Unless the 1hr operation is just an exaggeration to get your point
across, in which case just ignore my comment :)

U

Andrew Rafas

unread,
Jul 18, 2012, 6:39:13 AM7/18/12
to clo...@googlegroups.com


On Wednesday, July 18, 2012 1:20:03 AM UTC+1, Warren Lynn wrote:
The "making progress" seems like an illusion to me here. Sure, you can make progress in one thread while another thread takes an hour to finish its part. But the cost is that the "long" thread finally finds out: "oops, I have to start my one-hour job over."

Being lockless seems useful in certain cases (like the real-time systems mentioned in the Wikipedia article). But I still cannot grasp how it can increase *real* work throughput, when the problem itself mandates that part of the work can only be done serially.


Being lockless will not introduce more parallelism into an algorithm, so it will not beat Amdahl's law no matter how hard you try. In practice, because of the retries, performance will degrade to a level (sometimes much lower) that could also be reached by using locks. The point of Clojure's threading support, as I see it, is that you do not have to deal with lock hierarchies and deadlocks. On the other hand, you cannot really control performance degradation under contention (too many retries), so in the end it is just a trade-off. For some problems it is worth making; for others you are better off using locks or work queues.
Hope that helps.
Andrew

Timothy Baldridge

unread,
Jul 18, 2012, 7:48:17 AM7/18/12
to clo...@googlegroups.com
>> Being lockless seems useful in certain cases (like the real-time systems
>> mentioned in the Wikipedia article). But I still cannot grasp how it can
>> increase *real* work throughput, when the problem itself mandates that part
>> of the work can only be done serially.


Well first of all, your solution is horribly slow:

user=> (time (dotimes [x 1000000] (lock-swap! la (fn [o n] n) x)))
"Elapsed time: 43289.909709 msecs"
nil
user=> (def a (atom 0))
#'user/a
user=> (time (dotimes [x 1000000] (swap! a (fn [o n] n) x)))
"Elapsed time: 475.085593 msecs"
nil

But that's probably made worse by the use of reflection, so let's try this:

user=> (defn lock-swap! [^LockAtom lock-atom & rest-args] (locking
lock-atom (apply swap! (.atom lock-atom) rest-args)))
#'user/lock-swap!
user=> (time (dotimes [x 1000000] (lock-swap! la (fn [o n] n) x)))
"Elapsed time: 7596.568038 msecs"
nil

So, still it's 20x slower than straight old swap!

In all the multithreaded code I've worked on (and I've worked on quite
a lot in Clojure), I've never had code inside a swap! that would take
more than ~1ms to execute. In fact, I would say that by using delay,
promise, and future, you can implement your "1 hour task" with much
less touching of locks. For instance, a naive implementation of an
image cache might look like this:


(def icache (atom {}))

(defn cache-image [url]
  (swap! icache assoc url (download-image url)))


However, this is going to hit some of the issues you mentioned above:
we're going to re-download images if there is a swap! retry. Here's
the correct way to go about this (or one of them):

(def icache (atom {}))

(defn cache-image [url]
  (let [f (future (download-image url))]
    (swap! icache assoc url f)))

I'll admit we'll still try to download the image twice if two callers
try to cache the same image at the exact same time, but we could get
around that using promise or agents. So in general, your attitude when
using atoms, refs, or agents should be "get in and get out". Don't do
your work inside of swap!, send, or alter.

Timothy
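[Editor's note: Timothy's store-a-future pattern, sketched in Java under assumed names (`downloadImage` is a stand-in for the real fetch). The point is that the contended operation is the cheap map update; the slow download runs outside it.]

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class ImageCache {
    // The map holds futures, not bytes: racing threads contend only on the
    // cheap put, never on the slow download itself.
    private final Map<String, CompletableFuture<byte[]>> cache = new ConcurrentHashMap<>();

    // Stand-in for the real (slow) network fetch.
    static byte[] downloadImage(String url) {
        return url.getBytes();
    }

    public CompletableFuture<byte[]> cacheImage(String url) {
        CompletableFuture<byte[]> f =
            CompletableFuture.supplyAsync(() -> downloadImage(url));
        cache.put(url, f);  // like the swap! version, two racing callers may both download
        return f;
    }

    public CompletableFuture<byte[]> lookup(String url) {
        return cache.get(url);
    }
}
```

As Timothy notes, this still permits a duplicate download when two callers race on the same URL; the fix for that appears later in the thread.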

Brian Hurt

unread,
Jul 18, 2012, 7:55:13 AM7/18/12
to clo...@googlegroups.com
Accesses to atoms are just wrappers around atomic compare-and-swap instructions at the hardware level.  Locking an object also uses an atomic compare-and-swap, but piles other stuff on top of it, making it more expensive.  So atoms are useful in situations where there is likely not going to be much contention (multiple writes happening at the same time), so there won't be many retries, and where the cost of redoing the computation is low enough that it's cheaper to simply redo an occasional update than to pay the cost of locking.  A classic example of a good use of an atom is assigning ids, like:

(def counter (atom 0))

(defn get-id [] (swap! counter inc))

Incrementing an integer is about as cheap as computations get.  Even if some unlucky thread had to redo the computation dozens of times before "winning", that isn't that high of a cost.  And even if you're getting millions of ids a second, the number of collisions you have will be low.  So this is a good use for atoms.  Any more advanced locking behavior would simply slow things down.
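[Editor's note: Brian's id counter maps directly onto the JVM primitive underneath; a sketch:]

```java
import java.util.concurrent.atomic.AtomicLong;

// Java equivalent of (def counter (atom 0)) and (defn get-id [] (swap! counter inc)):
// incrementAndGet is an atomic hardware-level update, so every caller gets a
// distinct id even under heavy contention, with no user-visible lock.
public class IdGen {
    private final AtomicLong counter = new AtomicLong(0);

    public long getId() {
        return counter.incrementAndGet();
    }
}
```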

Holding hash maps in atoms is a borderline case.  I tend to only do it when I know the update rate will be low, and thus collisions very rare.

For situations where collisions are a lot more common, or where the update is a lot more expensive, I'd use a ref cell and a dosync block.  Of course, if you need atomic updates to multiple different cells, there is no replacement for dosync blocks.

Brian

On Tue, Jul 17, 2012 at 6:57 PM, Warren Lynn <wrn....@gmail.com> wrote:
I have a hard time understanding why there is a need to retry when doing "swap!" on an atom. Why doesn't Clojure just lock the atom up front and do the update? I ask because I don't see any benefit in the current "just try, and retry if needed" (STM-style?) approach for atoms (it may be fine for refs, because you cannot attach a lock to an arbitrary combination of refs in a "dosync" block). Right now I have an atom in my program with two "swap!" functions on it. One may take a (relatively) long time, and the other is short. I don't want the long "swap!" function to retry just because at the last minute the short one sneaked in and changed the atom's value. I can do the up-front lock myself, but I wonder why the language does not already work this way. Thank you for any enlightenment.

--

Marshall T. Vandegrift

unread,
Jul 18, 2012, 7:59:02 AM7/18/12
to clo...@googlegroups.com
Warren Lynn <wrn....@gmail.com> writes:

> I have a hard time understanding why there is a need to retry when
> doing "swap!" on an atom. Why does not Clojure just lock the atom
> up-front and do the update?

This is just my two cents, but I think the (or one) big reason is that
Clojure atoms just *are* non-locking, STM-style synchronized references.
They let you handle "shared, synchronous, independent state," but
that's their typical use case, not their definition. The Clojure atom
is defined in terms of mechanism, not application.

If you want locks, the Java standard library has locks aplenty. My
understanding of the Clojure idiom is that when "traditional"
threads-locks-and-queues concurrency is the best fit for the job, then
just use it. My reading of the standard library suggests a strong
preference for using higher-level constructs which leverage shared
thread pools (futures and agents) over raw threads, but absolutely no
shame in using e.g. LinkedBlockingQueue.

Here's my quick stab at something implemented atop Java read-write
locks:

https://gist.github.com/3135772

-Marshall
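[Editor's note: in the spirit of Marshall's gist, though not its actual code, a read-write-locked reference might look like this sketch. Many readers can deref concurrently, and a writer holds the write lock for the entire update, so updates never retry.]

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

public class RwRef<T> {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private T value;

    public RwRef(T initial) {
        this.value = initial;
    }

    public T deref() {
        rw.readLock().lock();  // many readers may hold this at once
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }

    public T swap(UnaryOperator<T> f) {
        rw.writeLock().lock();  // excludes readers and other writers; no retry
        try {
            value = f.apply(value);
            return value;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

Note the trade-off Kevin raised earlier: here a slow update does block readers for its whole duration, which is exactly what Clojure's design avoids.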

Meikel Brandmeyer (kotarak)

unread,
Jul 18, 2012, 8:12:09 AM7/18/12
to clo...@googlegroups.com
Hi,


Am Mittwoch, 18. Juli 2012 00:57:13 UTC+2 schrieb Warren Lynn:
I have a hard time understanding why there is a need to retry when doing "swap!" on an atom. Why does not Clojure just lock the atom up-front and do the update? I have this question because I don't see any benefit of the current "just try and then re-try if needed" (STM?) approach for atom (maybe OK for refs because you cannot attach a lock to unknown ref combinations in a "dosync" clause). Right now I have an atom in my program and there are two "swap!" functions on it. One may take a (relatively) long time, and the other is short. I don't want the long "swap!" function to retry just because in the last minute the short one sneaked in and changed the atom value. I can do the up-front lock myself, but I wonder why this is not already so in the language. Thank you for any enlightenment.

Clojure is not optimised for one use case. As the answers of the other participants in this thread show, there is a plethora of use cases where the lock is not necessary and can, on the contrary, actually be harmful. If Clojure used the lock directly as you propose, everyone would have to pay the performance penalty the locking imposes, and there would be no way to remedy that with on-board means.

By not using a lock to protect the update, Clojure can be fast where the lock is not necessary, and it is still possible for you to build a quite simple LockAtom on top of the existing implementation, as you have proven yourself. If you feel the need for LockAtom very often, put it in a library on Clojars. If you get a lot of "Hey, dude! That is exactly what I needed. You saved my life. You are my hero!", then propose it for inclusion in Clojure proper (or some contrib library) so it is also in the official distribution.

Kind regards
Meikel

Alan Malloy

unread,
Jul 18, 2012, 3:34:48 PM7/18/12
to clo...@googlegroups.com
Sorta off-topic from the main discussion, but in reference to the error you pointed out, one clever fix for this is to add a delay around the future:

(defn cache-image [icache url]
  (let [task (delay (future (download-image url)))]
    (doto (swap! icache update-in [url] #(or % task))
      (-> (get url) (force)))))

(defn wait-for-image [icache url]
  (deref (force (get @icache url))))

As long as your swap! never overwrites an existing delay, only one will get added, and thus only one gets forced.
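[Editor's note: Alan's delay trick has a close JVM analogue worth comparing: ConcurrentHashMap.computeIfAbsent runs the mapping function at most once per absent key, so only one future per URL is ever created. A sketch with an assumed `fetch` function:]

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OnceOnlyCache {
    // Counts real fetches so we can observe that the work happens only once.
    static final AtomicInteger fetches = new AtomicInteger();
    static final ConcurrentHashMap<String, CompletableFuture<String>> cache =
        new ConcurrentHashMap<>();

    static String fetch(String url) {  // stand-in for the real download
        fetches.incrementAndGet();
        return "bytes of " + url;
    }

    // computeIfAbsent plays the role of the delay: losers of the race receive
    // the winner's future instead of starting their own download.
    static CompletableFuture<String> cacheImage(String url) {
        return cache.computeIfAbsent(
            url, u -> CompletableFuture.supplyAsync(() -> fetch(u)));
    }
}
```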

Warren Lynn

unread,
Jul 18, 2012, 5:10:38 PM7/18/12
to clo...@googlegroups.com
Thanks for the discussion. This is not a reply to any particular post; here is my thinking on the various points raised.

1. The length of the critical section is irrelevant to this discussion. Even with locks, people agree that the critical section should be as short as possible. So the limiting factor is the algorithm itself, with or without a lock.

2. I assume that even with an STM-style atom, some kind of lock is taken internally for the final commit, because when committing you still need to coordinate access to some state with other threads. Maybe it is not called a lock, but it is something similar. So that same mechanism could be used to lock the whole swap! and should not add any overhead.

3. Is a lock so expensive? Let's find out. tbc++ showed that my implementation is much slower. That's true, because I wanted a separate type so there is no chance of mixing swap! and lock-swap! on one object, and that "apply" is slow too. Let's make a fairer comparison by changing the original lock-swap! to this:

(defn lock-swap!
  [lock-atom f x]
  (locking lock-atom
    (swap! lock-atom f x)))

user> (time (dotimes [x 1000000] (swap! a (fn [o n] n) x)))
"Elapsed time: 41.417993 msecs"
nil
user> (time (dotimes [x 1000000] (lock-swap! a (fn [o n] n) x)))
"Elapsed time: 136.342714 msecs"
nil

The difference is 3.3 times. But considering that swap! is already doing some kind of internal locking at commit time, as I mentioned before, I am doing (unnecessary) double locking here, so the actual difference is smaller (maybe half, if we assume the update fn is super fast). There might be more room for optimization if we moved the lock into the Clojure core. But the point is, the lock is not that expensive here at all.

4. Some may argue that the STM-style atom swap is not always better than a lock-based swap, but that sometimes it is (the claim that it is not optimized for all use cases). Given that a lock is not expensive, can anyone give an example of a case where the STM-style atom swap is better than a lock-based swap? Basically, I am thinking the STM-style atom is never better than a lock-based atom. Even for real-time systems, the fact that you don't know when (and how many) retries may happen reduces its usefulness a lot.

Timothy Baldridge

unread,
Jul 18, 2012, 5:28:35 PM7/18/12
to clo...@googlegroups.com
>> But consider swap! is already doing some kind of internal locking at commit time as I mentioned before
>> I assume even with STM style atom, some kind of lock is happening internally, for the final commit, because when committing,
>> you still need to coordinate the access to some state with other threads. Maybe it is not called a lock, but something similar.
>> So that same mechanism can be used to lock the whole swap! and should not increase any overhead.

It's not. Locks are created by using CAS, not the other way around.
On an x86 machine the swap basically compiles down to a single assembly
instruction:

http://jsimlo.sk/docs/cpu/index.php/cmpxchg.html

On a normal x86 machine, every lock in the system will boil down to
using this single instruction. x86 has no concept of "locks". Locks
are simply a construct created by the operating system, implemented
with a series of cmpxchg instructions. This is the reason this
instruction exists in the first place. Every type of
lock/semaphore/mutex we need in an operating system can be built from
this single instruction. This is also why Clojure includes atoms.
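[Editor's note: Timothy's point that locks are built from CAS, and not the reverse, can be made concrete with a toy spinlock. This is a sketch; real OS locks add wait queues and kernel parking on top.]

```java
import java.util.concurrent.atomic.AtomicBoolean;

// The entire lock is one CAS-able flag: this is the "series of cmpxchg
// instructions" that a lock boils down to on x86.
public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();  // busy-wait until the CAS wins
        }
    }

    public void unlock() {
        held.set(false);
    }

    public boolean isHeld() {
        return held.get();
    }
}
```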

>> And that "apply" is slow too.

Atom's swap! uses apply as well:

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Atom.java


Plus, all these tests are worthless without introducing a few threads
into the mix:

user=> (time (doall (pmap (fn [x] (dotimes [x 1000000] (lock-swap! a
(fn [o n] n) x))) (range 10))))
"Elapsed time: 5323.677006 msecs"
(nil nil nil nil nil nil nil nil nil nil)
user=> (time (doall (pmap (fn [x] (dotimes [x 1000000] (swap! a (fn
[o n] n) x))) (range 10))))
"Elapsed time: 708.689155 msecs"
(nil nil nil nil nil nil nil nil nil nil)


Timothy

Timothy Baldridge

unread,
Jul 18, 2012, 5:31:09 PM7/18/12
to clo...@googlegroups.com
> It's not. Locks are created by using CAS, not the other way around.
> On a x86 machine the swap basically compiles down to a single assembly
> code instruction:
>

Eh, let me clarify that... locks do exist on x86; it's just that they
only lock a single assembly instruction. The complete list of
instructions that can take the LOCK prefix is here:

http://jsimlo.sk/docs/cpu/index.php/lock.html

Warren Lynn

unread,
Jul 18, 2012, 7:48:09 PM7/18/12
to clo...@googlegroups.com

It's not. Locks are created by using CAS, not the other way around.
On an x86 machine the swap basically compiles down to a single assembly
instruction:

http://jsimlo.sk/docs/cpu/index.php/cmpxchg.html

On a normal x86 machine, every lock in the system will boil down to
using this single instruction. x86 has no concept of "locks". Locks
are simply a construct created by the operating system, implemented
with a series of cmpxchg instructions. This is the reason this
instruction exists in the first place. Every type of
lock/semaphore/mutex we need in an operating system can be built from
this single instruction. This is also why Clojure includes atoms.



OK, now I understand. So atoms want to take advantage of this highly efficient hardware CAS. Then I agree that if the load is not high, an atom will be faster than locking. Thanks for pointing this out.

Now I have a broader question: why is CAS hardware-supported, but locks are not (i.e., why is it not the other way around)? I used to work on some firmware, and we had a hardware mutex. Why is this not generally the case for general-purpose CPUs?

Timothy Baldridge

unread,
Jul 18, 2012, 8:38:21 PM7/18/12
to clo...@googlegroups.com
> Now I have a broader question: why is CAS hardware-supported, but locks are
> not (i.e., why is it not the other way around)? I used to work on some
> firmware, and we had a hardware mutex. Why is this not generally the case
> for general-purpose CPUs?

There are several issues at work here; I'll try to cover them, but this
comes with a massive disclaimer: I'm going to be radically
oversimplifying.

Two factors at work here are that modern CPUs have several layers of
caches (L3, L2, and L1 in most modern multi-core CPUs), and that modern
systems can have cores that share none, some, or all of these caches.
All of this has to be handled in some way.

In addition, most systems only support loading memory in cache lines.
IIRC, today most cache lines are 16KB. So when you read a single byte,
the 16KB around that memory location is loaded as well.

So, when you do a swap on x86, the core first tells all other cores in
the system to flush the cache lines that correspond to the memory
location the swap is about to happen on. This causes those CPUs to
stall if they attempt to read from that cache line. From there the core
is free to update the cache line, write it out to memory, and finally
release it back to the other CPUs.

If this all sounds expensive, it really is. Part of this, too, is that
CPUs buffer reads/writes in queues; before a swap, the pending
reads/writes must be flushed. Frankly, I'm amazed that CAS works as
well as it does.

So there's an odd side effect here. Notice how it locks a whole cache
line (16KB)? This means that if you allocated 4K atoms from the same
cache line, swapping one would cause the others to lock during the
CAS.

You can probably imagine how bad this would all get if we were allowed
to just randomly start and stop locks.

So yeah, there it is, simplified (and probably partly wrong). Hope
that helps ;-)

Timothy

Softaddicts

unread,
Jul 18, 2012, 9:10:26 PM7/18/12
to clo...@googlegroups.com
With multiple CPUs, for cost and design-complexity reasons, coordination uses
test-and-set instructions in shared memory.

When the bit is set, other contenders can either try later or enter a spin loop
(a spin lock), retrying the operation until it succeeds.

Implementing complex hardware atomic instructions coordinated between all CPUs
was the norm before RISC chips came out.

Simulation proved that this approach was too slow given the speed boost
of RISC chips: memory-access speed was impaired when implementing
complex coordination between all CPUs, and RISC chips were spending
too much time waiting for the memory subsystem to keep pace.

From then on, test-and-set in shared memory became the norm.

Luc P.
--
Softaddicts<lprefo...@softaddicts.ca> sent by ibisMail from my ipad!

Timothy Baldridge

unread,
Jul 19, 2012, 2:46:15 PM7/19/12
to clo...@googlegroups.com
> The cost of
> retrying a CAS operation a few times is relatively trivial.

Not to mention that most of the time locks, thread sleeping, etc. all
involve a context switch into the kernel, whereas a CAS is done in
userspace.

Timothy

Stefan Ring

unread,
Jul 26, 2012, 2:28:31 AM7/26/12
to clo...@googlegroups.com
> In addition, most systems only support loading memory in cache lines.
> IIRC, today most cache lines are 16KB. So when you read a single byte,
> the 16KB around that memory location is loaded as well.

The cache line size on x86 is 32 bytes on 32 bit systems, not 16KB. On
64 bit systems, it's 64 bytes.

> So there's a odd side-effect here. Notice how it locks a whole cache
> line (16KB)? This means that if you allocated 4K atoms from the same
> cache line, swapping one would cause the others to lock during the
> CAS.

Sort of, but this false sharing and the subsequent performance
degradation happen whether or not you use CAS to access the items in a
cache line.

Timothy Baldridge

unread,
Jul 26, 2012, 9:57:21 AM7/26/12
to clo...@googlegroups.com
On Thu, Jul 26, 2012 at 1:28 AM, Stefan Ring <stef...@gmail.com> wrote:
> In addition, most systems only support loading memory in cache lines.
> IIRC, today most cache lines are 16KB. So when you read a single byte,
> the 16KB around that memory location is loaded as well.

The cache line size on x86 is 32 bytes on 32 bit systems, not 16KB. On
64 bit systems, it's 64 bytes.

Eh, you're right... I knew I was mixing something up there. I dug up CPU-Z on my Windows machine and you're right: 32KB L1 cache, 64-byte line size.

> So there's a odd side-effect here. Notice how it locks a whole cache
> line (16KB)? This means that if you allocated 4K atoms from the same
> cache line, swapping one would cause the others to lock during the
> CAS.

Sort of, but this false sharing and the subsequent performance
degradation happen whether or not you use CAS to access the items in a
cache line.

How so? If I have 4 cores all reading from the same cache line, with no writers, there won't be a cache-degradation issue, will there?

Ah, I just looked up "false sharing". You're saying the issue is multiple writers to the same cache line, but different elements in the line. Yes, I agree, that's a performance problem. I guess in the original context we were discussing LOCK vs. CAS. It seems to me that hardware deadlocks could be a big issue if two cores tried to acquire exclusive access to elements in the same cache line.

As a side note, I'm super psyched about the new Hardware Transactional Memory being implemented by Intel and AMD. It'll probably be a long time before we get it in Java, but it's pretty cool nonetheless. http://www.realworldtech.com/haswell-tm/

The Intel approach has a bad limitation in that it tracks all memory modified in a transaction (instead of letting the programmer mark certain reads as non-critical), so the AMD approach may end up being the better solution here.

Timothy
 