Transactional memory mutex

110 views
Skip to first unread message

Graham Miller

unread,
Feb 10, 2016, 4:12:30 AM2/10/16
to lock...@googlegroups.com
I am interested in exploring the transactional memory features of the new Intel processors (using the GCC transactional features described in [1]).  To do so I created a simple implementation of a mutex using this post about futexes [2] as a guide.

I'm looking for feedback on the implementation, if anyone has a moment.  It is just the mutex implementation, so comprises only about 100 lines of code:


I would appreciate any comments you have, though I am particularly focused on correctness at this point, not performance.

Thank you in advance.

graham

[1] https://3f993110-a-62cb3a1a-s-sites.googlegroups.com/site/tmforcplusplus/C%2B%2BTransactionalConstructs-1.1.pdf
[2] http://locklessinc.com/articles/mutex_cv_futex/

Dmitry Vyukov

unread,
Feb 15, 2016, 4:10:17 PM2/15/16
to Scalable Synchronization Algorithms
On Wednesday, February 10, 2016 at 10:12:30 AM UTC+1, Graham Miller wrote:
I am interested in exploring the transactional memory features of the new Intel processors (using the GCC transactional features described in [1]).  To do so I created a simple implementation of a mutex using this post about futexes [2] as a guide.

I'm looking for feedback on the implementation, if anyone has a moment.  It is just the mutex implementation, so comprises only about 100 lines of code:


I would appreciate any comments you have, though I am particularly focused on correctness at this point, not performance.


Hi Graham,


The transactions are meant to simplify concurrent programming, but I am actually not sure that they succeed in this case. This seems to be more difficult to read and understand and plain old atomics-based code. It would be simpler if you surround whole functions with a single transaction block, but then I guess you would have issues with blocking on futex...

You use __transaction_atomic, atomic transactions can't interfere with memory accesses in other threads. So can't the following loop actually abort concurrent attempts to lock the mutex, defeating its whole purpose?

  /* Spin and hope someone takes the lock */
  for (i = 0; i < 200; i++) {
    if (m->b.locked) return 0;
    cpu_relax();
  }

Also I guess the read inside of FUTEX_WAIT_PRIVATE can abort a concurrent attempt to unlock the mutex.

Also I've noticed that it's trickier than it appears on first sight because futexes are subject to spurious wakeups, so "m->b.contended = 0;" can actually reset contended as set by the _next_ thread (the one that was unblocked by a spurious wakeup). I am not sure whether it can lead to a deadlock or not.

The /* Spin and hope someone takes the lock */ optimization is usually useful only for synthetic benchmarks and harmful for real workloads (that loop wastes CPU time). What's usually useful in real program is futile wakeup throttling optimization.

But in general I am not sure that using transnational memory for mutex implementation is a good idea at all. TM is meant for situations when transactions can proceed in parallel. Here is no chance that two transactions don't conflict. An implementation based on CAS-loop that atomically moves the mutex state through well defined states should be cleaner and faster.

What could make sense is transaction lock _elision_ (when you turn the whole critical section into a transaction). Such critical sections can proceed in parallel for e.g. hash tables. See e.g. https://lwn.net/Articles/534758/

Reply all
Reply to author
Forward
0 new messages