
portable lock-based doubly linked-list mutation techniques...


Chris Thomasson
Apr 8, 2008, 3:19:22 PM
Here is some pseudo-code that "attempts" to show a lock-based method for
removing a node from a doubly linked-list while still allowing for
concurrent mutations on "other" parts of the list:
_______________________________________________________________
struct dlnode {
    dlnode* next;
    dlnode* prev;
};


void dlnode_init(dlnode* const _this) {
    /* next/prev pointing to self is the "empty" list */
    _this->next = _this->prev = _this;
}


void sort(pthread_mutex_t* _this[], size_t depth) {
    /* simple sort & NULL out the duplicates */
}


void sort_and_lock(pthread_mutex_t* _this[], size_t depth) {
    size_t i = 0;
    sort(_this, depth);
    for (; i < depth; ++i) {
        if (_this[i]) {
            pthread_mutex_lock(_this[i]);
        }
    }
}


void unlock(pthread_mutex_t* _this[], size_t depth) {
    int i = (int)depth - 1;
    for (; i > -1; --i) {
        if (_this[i]) {
            pthread_mutex_unlock(_this[i]);
        }
    }
}


void pop(dlnode volatile* const _this) {
    pthread_mutex_t* locks[3] = { NULL };
    assert(_this->next != _this && _this->prev != _this);
    for (;;) {
        dlnode cmp = *_this;
        locks[0] = hash_lock_from_ptr(_this);
        locks[1] = hash_lock_from_ptr(cmp.next);
        locks[2] = hash_lock_from_ptr(cmp.prev);
        sort_and_lock(locks, 3);
        if (cmp.next == _this->next &&
            cmp.prev == _this->prev &&
            cmp.next->prev == _this &&
            cmp.prev->next == _this) {

            /* we have consistency! ;^) */

            /* mark node as removed */
            _this->next = _this->prev = _this;

            /* unlink it */
            cmp.next->prev = cmp.prev;
            cmp.prev->next = cmp.next;

            /* unlock everything because we are done */
            unlock(locks, 3);
            break;
        }
        unlock(locks, 3);
        /* use your choice of back-off algorithm... */
        sched_yield();
    }
}
_______________________________________________________________


This quick sample code assumes a global locking-table with a function that
can hash pointers into indexes. It basically reads a snapshot of the node to
be removed and then atomically locks it, along with the nodes its next/prev
pointers refer to. The locks are always pre-sorted, and duplicates are set to
NULL, so there is no risk of deadlock and no need for a recursive-lock
attribute. After the locking is completed, it double-checks what it read and
ensures that the next/prev nodes are properly linked to it. If everything is
consistent, it marks the node as removed, unlinks it from the list, unlocks,
and returns. Otherwise, it unlocks, executes a back-off algorithm (e.g.,
exponential), and tries again.
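
[For concreteness, a minimal sketch of what such a global locking-table might
look like, assuming a fixed-size array of mutexes and a trivial pointer hash;
the table size and the names 'g_lock_table' and 'lock_table_init' are
illustrative, and the 'sort' body merely fleshes out the placeholder above:]
_______________________________________________________________
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 128 /* illustrative size */

static pthread_mutex_t g_lock_table[LOCK_TABLE_SIZE];

/* call once before any list operation */
void lock_table_init(void) {
    size_t i = 0;
    for (; i < LOCK_TABLE_SIZE; ++i) {
        pthread_mutex_init(&g_lock_table[i], NULL);
    }
}

/* hash a pointer to one of the global mutexes */
pthread_mutex_t* hash_lock_from_ptr(void const volatile* ptr) {
    uintptr_t h = (uintptr_t)ptr;
    h ^= h >> 9; /* mix the high bits down; the low bits are mostly alignment */
    return &g_lock_table[h % LOCK_TABLE_SIZE];
}

/* insertion-sort the locks by address, then NULL out duplicates so
   no mutex is ever acquired twice */
void sort(pthread_mutex_t* _this[], size_t depth) {
    size_t i, j;
    for (i = 1; i < depth; ++i) {
        pthread_mutex_t* key = _this[i];
        for (j = i; j > 0 && _this[j - 1] > key; --j) {
            _this[j] = _this[j - 1];
        }
        _this[j] = key;
    }
    for (i = 1; i < depth; ++i) {
        if (_this[i] && _this[i] == _this[i - 1]) {
            _this[i - 1] = NULL;
        }
    }
}
_______________________________________________________________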

The only benefit that comes along with this type of technique is that you
can potentially have several threads removing nodes concurrently, which
should scale better than using a single lock for the whole list. Also, there
is not really a non-blocking algorithm that works efficiently on doubly
linked-lists; locks to the rescue! ;^)

Anyway, this is kind of like Software Transactional Memory, without the need
for a full-blown STM library. One other thing... It works using PThreads,
which means it's very portable indeed. Can you notice any potential problems?


Any comments/suggestions are welcome!


Thanks.


--
Chris M. Thomasson
http://appcore.home.comcast.net

Chris Thomasson
Apr 8, 2008, 3:41:14 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:dp-dnRSiNt9uWWba...@comcast.com...

> Here is some pseudo-code that "attempts" to show a lock-based method for
> removing a node from a doubly linked-list while still allowing for
> concurrent mutations on "other" parts of the list:
> _______________________________________________________________
[...]
> _______________________________________________________________
>
>
[...]

>
> The only benefit that comes along with this type of technique is that you
> can potentially have several threads removing nodes concurrently, which
> should scale better than using a single lock for the whole list. Also,
> there is not really a non-blocking algorithm that works efficiently on
> doubly linked-lists; locks to the rescue! ;^)
[...]

This raises a question... How do you allow readers to traverse the list
without having to take a lock on each node visited (e.g., lock-chaining)? We
can't use a rw-mutex because that would only allow a single writer to mutate
the list, which defeats the purpose of allowing concurrent mutations. We
cannot use non-blocking algorithms because they would not be portable.
Therefore, we need a synchronization primitive that can allow multiple
readers _and_ multiple writers, but not both at the same time. Luckily,
PThreads provides everything we need to construct such a thing:

<stupid simple code-sketch>
_______________________________________________________________
struct rdbarrier {
    int reads; /* = 0 */
    int writes; /* = 0 */
    pthread_mutex_t mtx;
    pthread_cond_t cond;
};


void rdbarrier_lock(rdbarrier* const _this) {
    pthread_mutex_lock(&_this->mtx);
    while (_this->writes) {
        pthread_cond_wait(&_this->cond, &_this->mtx);
    }
    ++_this->reads;
    pthread_mutex_unlock(&_this->mtx);
}


void rdbarrier_unlock(rdbarrier* const _this) {
    int reads;
    pthread_mutex_lock(&_this->mtx);
    assert(! _this->writes);
    reads = --_this->reads;
    pthread_mutex_unlock(&_this->mtx);
    if (! reads) {
        pthread_cond_broadcast(&_this->cond);
    }
}


void rdbarrier_block(rdbarrier* const _this) {
    pthread_mutex_lock(&_this->mtx);
    while (_this->reads) {
        pthread_cond_wait(&_this->cond, &_this->mtx);
    }
    ++_this->writes;
    pthread_mutex_unlock(&_this->mtx);
}


void rdbarrier_unblock(rdbarrier* const _this) {
    int writes;
    pthread_mutex_lock(&_this->mtx);
    assert(! _this->reads);
    writes = --_this->writes;
    pthread_mutex_unlock(&_this->mtx);
    if (! writes) {
        pthread_cond_broadcast(&_this->cond);
    }
}
_______________________________________________________________

We can use the above synchronization primitive, which I named a read-barrier,
to allow multiple readers to traverse the list while still allowing multiple
writers to concurrently mutate it. We can now rewrite the pop function shown
in the previous post as:
_______________________________________________________________


void pop(dlnode volatile* const _this) {
    pthread_mutex_t* locks[3] = { NULL };
    assert(_this->next != _this && _this->prev != _this);

    rdbarrier_block(...);

    for (;;) {
        dlnode cmp = *_this;
        locks[0] = hash_lock_from_ptr(_this);
        locks[1] = hash_lock_from_ptr(cmp.next);
        locks[2] = hash_lock_from_ptr(cmp.prev);
        sort_and_lock(locks, 3);
        if (cmp.next == _this->next &&
            cmp.prev == _this->prev &&
            cmp.next->prev == _this &&
            cmp.prev->next == _this) {

            /* we have consistency! ;^) */

            /* mark node as removed */
            _this->next = _this->prev = _this;

            /* unlink it */
            cmp.next->prev = cmp.prev;
            cmp.prev->next = cmp.next;

            /* unlock everything because we are done */
            unlock(locks, 3);
            break;
        }
        unlock(locks, 3);
        /* use your choice of back-off algorithm... */
        sched_yield();
    }

    rdbarrier_unblock(...);
}


/* a read-only traversal would do: */
int read_only(dlnode* const _this) {
    int i = 0;
    dlnode* node = _this;
    rdbarrier_lock(...);
    do {
        ++i;
        node = node->next;
    } while (node != _this);
    rdbarrier_unlock(...);
    return i;
}
_______________________________________________________________


I know that SenderX would say that this is 100% total crap and will move
1000x slower than a slug climbing a very steep mountain of sea-salt... But,
I want to keep this within the PThread Standard...

;^)


Any thoughts?

Dmitriy V'jukov
Apr 9, 2008, 3:37:43 PM
On Apr 8, 11:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> The only benefit that comes along with this type of technique is that you
> can potentially have several threads removing nodes concurrently, which
> should scale better than using a single lock for the whole list. Also, there
> is not really a non-blocking algorithm that works efficiently on doubly
> linked-lists; locks to the rescue! ;^)


See this "Lock-Free and Practical Deques and Doubly Linked Lists using
Single-Word Compare-And-Swap":
http://www.cs.chalmers.se/~dcs/ConcurrentDataStructures/phd_chap7.pdf

IIRC, this algorithm always maintains the 'next' pointers in a consistent
state. The 'prev' pointers can be temporarily inconsistent; they are used as
'helper' links. This allows lock-free 'forward' iteration over the doubly-
linked list.


Dmitriy V'jukov

Dmitriy V'jukov
Apr 9, 2008, 3:40:41 PM
On Apr 8, 11:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> We can use the above synchronization primitive, which I named a read-barrier,
> to allow multiple readers to traverse the list while still allowing multiple
> writers to concurrently mutate it. We can now rewrite the pop function shown
> in the previous post as:


This is called 'Room synchronization':
http://en.wikipedia.org/wiki/Room_synchronization
http://www.cs.cmu.edu/~guyb/papers/rooms2001.pdf


Dmitriy V'jukov

Dmitriy V'jukov
Apr 9, 2008, 4:12:05 PM
On Apr 8, 11:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> I know that SenderX would say that this is 100% total crap and will move
> 1000x slower than a slug climbing a very steep mountain of sea-salt...


Yeah, that's right. I wonder at how precisely you described the operation
of the algorithm :)


> But, I want to keep this within the PThread Standard...


It seems that POSIX doesn't allow efficient synchronization. Even the
simplest operation requires atomic operations and cache-line transfers,
making the algorithm negatively scalable. The smarter the algorithm you
think up, the more atomic operations and cache-line transfers occur :(

If you want to write portable, efficient synchronization algorithms, you
had better stick to Java. It provides support for atomic operations,
volatile stores and loads, etc. At least this allows you to create the
lock-free reader pattern.
Or wait for the upcoming C++0x, which will have unprecedented support for
synchronization.


Dmitriy V'jukov

Dmitriy V'jukov
Apr 9, 2008, 4:20:38 PM
On Apr 8, 11:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> <stupid simple code-sketch>


What do you think about this:


struct rdbarrier {
    int reads; /* = 0 */
    int writes; /* = 0 */
    int pending_reads; /* = 0 */ //<---------------
    int pending_writes; /* = 0 */ //<---------------
    pthread_mutex_t mtx;
    pthread_cond_t cond;
};

void rdbarrier_lock(rdbarrier* const _this) {
    pthread_mutex_lock(&_this->mtx);
    ++_this->pending_reads; //<---------------
    while (_this->writes) {
        pthread_cond_wait(&_this->cond, &_this->mtx);
    }
    --_this->pending_reads; //<---------------
    ++_this->reads;
    pthread_mutex_unlock(&_this->mtx);
}

void rdbarrier_unblock(rdbarrier* const _this) {
    int writes;
    int pending_reads;
    pthread_mutex_lock(&_this->mtx);
    assert(! _this->reads);
    writes = --_this->writes;
    pending_reads = _this->pending_reads; //<---------------
    pthread_mutex_unlock(&_this->mtx);
    if (! writes && pending_reads) { //<---------------
        pthread_cond_broadcast(&_this->cond);
    }
}


Dmitriy V'jukov

Dmitriy V'jukov
Apr 9, 2008, 4:24:33 PM
On Apr 8, 11:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> Here is some pseudo-code that "attempts" to show a lock-based method for
> removing a node from a doubly linked-list while still allowing for
> concurrent mutations on "other" parts of the list:

> Any comments/suggestions are welcome!


You can consider embedding mutexes into the nodes. This will:
- eliminate the need for sorting; just lock in 'next-link' order
- allow more concurrency
- improve locality of reference


I'm not sure what size a typical pthread mutex has...
But you know that, potentially, the sizeof a mutex is 1 word/byte/bit.


Dmitriy V'jukov

Dmitriy V'jukov
Apr 9, 2008, 4:30:15 PM
On Apr 8, 11:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> void pop(dlnode volatile* const _this) {
>     pthread_mutex_t* locks[3] = { NULL };
>     assert(_this->next != _this && _this->prev != _this);
>     for (;;) {
>         dlnode cmp = *_this;
>         locks[0] = hash_lock_from_ptr(_this);
>         locks[1] = hash_lock_from_ptr(cmp.next);
>         locks[2] = hash_lock_from_ptr(cmp.prev);
>         sort_and_lock(locks, 3);

What do you think about this:

void pop(dlnode volatile* const _this) {
    pthread_mutex_t* locks[4] = { NULL };
    assert(_this->next != _this && _this->prev != _this);
    for (;;) {
        dlnode cmp = *_this;
        locks[0] = hash_lock_from_ptr(&cmp.next);
        locks[1] = hash_lock_from_ptr(&cmp.prev);
        locks[2] = hash_lock_from_ptr(&cmp.next->prev);
        locks[3] = hash_lock_from_ptr(&cmp.prev->next);
        sort_and_lock(locks, 4);

?


Then you can do the following:


void pop(dlnode volatile* const _this) {
    pthread_mutex_t* locks[4] = { NULL };
    assert(_this->next != _this && _this->prev != _this);
    for (;;) {
        dlnode cmp = *_this;
        locks[0] = hash_lock_from_ptr(&cmp.next);
        locks[1] = hash_lock_from_ptr(&cmp.prev);
        locks[2] = hash_lock_from_ptr(&cmp.next->prev);
        locks[3] = hash_lock_from_ptr(&cmp.prev->next);
        if (! sort_and_try_lock(locks, 4)) { // <------------------------------
            /* use your choice of back-off algorithm... */
            sched_yield();
            continue;
        }
        if (cmp.next == _this->next &&
            cmp.prev == _this->prev &&
            cmp.next->prev == _this &&
            cmp.prev->next == _this) {

            /* we have consistency! ;^) */

            /* mark node as removed */
            _this->next = _this->prev = _this;

            /* unlink it */
            cmp.next->prev = cmp.prev;
            cmp.prev->next = cmp.next;

            /* unlock everything because we are done */
            unlock(locks, 4);
            break;
        }
        unlock(locks, 4);
        /* use your choice of back-off algorithm... */
        sched_yield();
    }
}


Why must one wait for locks just to see that the links have changed?
Failure to *try*-lock at least one mutex unconditionally means that the
links have changed.


Dmitriy V'jukov

Chris Thomasson
Apr 10, 2008, 8:09:04 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:98049c20-8bfb-4b66...@a5g2000prg.googlegroups.com...

I am not sure I like their iteration logic. The "cursor" functions seem a
bit expensive for a simple traversal. I would also scrap the way they are
caching nodes; PDR would be a better solution here. Need to think some more
on this. Anyway, I was trying to come up with something that follows the
PThread standard.

Chris Thomasson
Apr 10, 2008, 8:12:18 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:ce3ecdae-9d0e-4097...@1g2000prf.googlegroups.com...

> On Apr 8, 11:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> Here is some pseudo-code that "attempts" to show a lock-based method for
>> removing a node from a doubly linked-list while still allowing for
>> concurrent mutations on "other" parts of the list:
>
>> Any comments/suggestions are welcome!
>
>
> You can consider embedding mutexes into the nodes. This will:
> - eliminate the need for sorting; just lock in 'next-link' order
> - allow more concurrency
> - improve locality of reference

There might be a problem with destroying nodes. You may need to be able to
atomically release a lock and destroy the object which contains the mutex.
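
[A common workaround, sketched here under assumptions not in the original
post: each node carries a reference count ('refs', guarded by the node's own
mutex), references are only acquired while the node is still reachable under
its neighbors' locks, and the last release destroys the node after unlocking;
the 'dlnode_rc' names are hypothetical:]
_______________________________________________________________
#include <pthread.h>
#include <stdlib.h>

struct dlnode_rc {
    struct dlnode_rc* next;
    struct dlnode_rc* prev;
    pthread_mutex_t lock;
    int refs; /* guarded by 'lock' */
};

/* drop a reference; the node is freed only after its mutex has been
   released and no other thread can still reach or touch it */
void dlnode_rc_release(struct dlnode_rc* node) {
    int refs;
    pthread_mutex_lock(&node->lock);
    refs = --node->refs;
    pthread_mutex_unlock(&node->lock);
    if (! refs) {
        pthread_mutex_destroy(&node->lock);
        free(node);
    }
}
_______________________________________________________________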


> I'm not sure what size a typical pthread mutex has...
> But you know that, potentially, the sizeof a mutex is 1 word/byte/bit.

If you use a try-lock scheme, you can simply use a spinlock per node, which
can be small.
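
[For illustration, a minimal sketch of that idea using POSIX spinlocks; the
'dlnode_s' layout and the 'try_lock_nodes' helper are hypothetical, and the
node-destruction race raised above is ignored:]
_______________________________________________________________
#include <pthread.h>
#include <stddef.h>

struct dlnode_s {
    struct dlnode_s* next;
    struct dlnode_s* prev;
    pthread_spinlock_t lock; /* pthread_spin_init() on node creation */
};

/* try-lock up to 'depth' nodes (duplicates already NULLed out by the
   caller); on the first failure, release what we hold and report it */
int try_lock_nodes(struct dlnode_s* nodes[], size_t depth) {
    size_t i, j;
    for (i = 0; i < depth; ++i) {
        if (nodes[i] && pthread_spin_trylock(&nodes[i]->lock)) {
            for (j = 0; j < i; ++j) {
                if (nodes[j]) {
                    pthread_spin_unlock(&nodes[j]->lock);
                }
            }
            return 0; /* caller backs off and retries */
        }
    }
    return 1;
}
_______________________________________________________________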

Chris Thomasson
Apr 10, 2008, 8:19:41 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:f6aabd1f-114c-4e48...@z24g2000prf.googlegroups.com...

'locks[2]' and 'locks[3]' both have a high probability of pointing to
'_this'. Even though the sort will NULL out the duplicates, it seems cleaner
to just have:

locks[0] = hash_lock_from_ptr(_this);
locks[1] = hash_lock_from_ptr(cmp.next);
locks[2] = hash_lock_from_ptr(cmp.prev);

?

>         if (! sort_and_try_lock(locks, 4)) { // <------------------------------

If you're using try-locks, I really don't see a need to sort. The only thing
you would need is a function to detect and NULL out the duplicate locks.


>             /* use your choice of back-off algorithm... */
>             sched_yield();
>             continue;
>         }
>         if (cmp.next == _this->next &&
>             cmp.prev == _this->prev &&
>             cmp.next->prev == _this &&
>             cmp.prev->next == _this) {
>
>             /* we have consistency! ;^) */
>
>             /* mark node as removed */
>             _this->next = _this->prev = _this;
>
>             /* unlink it */
>             cmp.next->prev = cmp.prev;
>             cmp.prev->next = cmp.next;
>
>             /* unlock everything because we are done */
>             unlock(locks, 4);
>             break;
>         }
>         unlock(locks, 4);
>         /* use your choice of back-off algorithm... */
>         sched_yield();
>     }
> }
>
>
> Why must one wait for locks just to see that the links have changed?
> Failure to *try*-lock at least one mutex unconditionally means that the
> links have changed.

I think you could do the try-lock scheme using a maximum of three locks
instead of four.


void pop(dlnode volatile* const _this) {
    pthread_mutex_t* locks[3] = { NULL };
    assert(_this->next != _this && _this->prev != _this);
    for (;;) {
        dlnode cmp = *_this;
        locks[0] = hash_lock_from_ptr(_this);
        locks[1] = hash_lock_from_ptr(cmp.next);
        locks[2] = hash_lock_from_ptr(cmp.prev);

        if (clear_duplicates_and_trylock(locks, 3)) {

            if (cmp.next == _this->next &&
                cmp.prev == _this->prev &&
                cmp.next->prev == _this &&
                cmp.prev->next == _this) {
                /* we have consistency! ;^) */

                /* mark node as removed */
                _this->next = _this->prev = _this;

                /* unlink it */
                cmp.next->prev = cmp.prev;
                cmp.prev->next = cmp.next;

                /* unlock everything because we are done */
                unlock(locks, 3);
                break;
            }
            unlock(locks, 3);
        }
        /* use your choice of back-off algorithm... */
        sched_yield();
    }
}
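
[A minimal sketch of the 'clear_duplicates_and_trylock' helper assumed above;
hypothetical, returning non-zero only when every distinct lock was acquired:]
_______________________________________________________________
#include <pthread.h>
#include <stddef.h>

int clear_duplicates_and_trylock(pthread_mutex_t* locks[], size_t depth) {
    size_t i, j;
    /* NULL out duplicate entries so no mutex is try-locked twice */
    for (i = 0; i < depth; ++i) {
        for (j = i + 1; j < depth; ++j) {
            if (locks[j] == locks[i]) {
                locks[j] = NULL;
            }
        }
    }
    /* try-lock everything; on the first failure, undo and bail out */
    for (i = 0; i < depth; ++i) {
        if (locks[i] && pthread_mutex_trylock(locks[i])) {
            while (i--) {
                if (locks[i]) {
                    pthread_mutex_unlock(locks[i]);
                }
            }
            return 0;
        }
    }
    return 1;
}
_______________________________________________________________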


Any thoughts?

Chris Thomasson
Apr 10, 2008, 8:25:49 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:d2507c09-b8ff-4171...@r9g2000prd.googlegroups.com...

> On Apr 8, 11:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> I know that SenderX would say that this is 100% total crap and will move
>> 1000x slower than a slug climbing a very steep mountain of sea-salt...
>
>
> Yeah, that's right. I wonder at how precisely you described the operation
> of the algorithm :)
>
>
>> But, I want to keep this within the PThread Standard...
>
>
> It seems that POSIX doesn't allow efficient synchronization. Even the
> simplest operation requires atomic operations and cache-line
> transfers, making the algorithm negatively scalable. The smarter the
> algorithm you think up, the more atomic operations and cache-line
> transfers occur :(

Humm... About the best thing I could do in POSIX would probably be
distributed rw-mutexes:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a23aeb712e8dbdf9

Also, you could do a per-thread memory allocator and use a mutex for passing
mem-blocks back to the owner thread during remote deallocation.
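
[A minimal sketch of that remote-deallocation scheme; all of the names
('per_thread_heap', 'heap_free', etc.) are hypothetical, and the owner-side
allocation path that drains 'remote_frees' is omitted:]
_______________________________________________________________
#include <pthread.h>

struct per_thread_heap;

struct mem_block {
    struct mem_block* next;
    struct per_thread_heap* owner; /* heap this block came from */
};

struct per_thread_heap {
    struct mem_block* free_list;    /* owner thread only; no locking */
    struct mem_block* remote_frees; /* guarded by 'remote_mtx' */
    pthread_mutex_t remote_mtx;
};

void heap_free(struct per_thread_heap* self, struct mem_block* blk) {
    if (blk->owner == self) {
        /* local free: push onto the private free-list, no lock needed */
        blk->next = self->free_list;
        self->free_list = blk;
    } else {
        /* remote free: hand the block back to its owner under a mutex;
           the owner drains 'remote_frees' on a later allocation */
        struct per_thread_heap* o = blk->owner;
        pthread_mutex_lock(&o->remote_mtx);
        blk->next = o->remote_frees;
        o->remote_frees = blk;
        pthread_mutex_unlock(&o->remote_mtx);
    }
}
_______________________________________________________________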


> If you want to write portable, efficient synchronization algorithms, you
> had better stick to Java. It provides support for atomic operations,
> volatile stores and loads, etc. At least this allows you to create the
> lock-free reader pattern.

IMHO, the Java memory-model is too strong. It's not fine-grained at all.


> Or wait for the upcoming C++0x, which will have unprecedented support for
> synchronization.

That's going to be very cool indeed: standardized fine-grained memory
barriers and atomic operations. I can re-write vZOOM and make it fully
standard, except for the version with automatic epoch detection.

Chris Thomasson
Apr 10, 2008, 8:27:34 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:100f0cd0-cdd2-414b...@z24g2000prf.googlegroups.com...

> On Apr 8, 11:41 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> <stupid simple code-sketch>
>
>
> What do you think about this:
>
>
[...]

I think it will perform better on platforms that have a poor condvar
implementation. Even then, it still makes sense to try to avoid calling the
condvar signal/broadcast functions when you can.

Chris Thomasson
Apr 10, 2008, 8:31:39 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:516ce46b-6f9b-4c94...@v26g2000prm.googlegroups.com...

In the example they show, AFAICT, you can have concurrent pushes and
concurrent pops, but not both at the same time.

Chris Thomasson
Apr 11, 2008, 1:47:41 AM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:9fGdndF1kL2BLGPa...@comcast.com...

This could replace 'rdbarrier'.

Dmitriy V'jukov
Apr 11, 2008, 10:50:02 AM
On Apr 11, 4:25 am, "Chris Thomasson" <cris...@comcast.net> wrote:


> > It seems that POSIX doesn't allow efficient synchronization. Even the
> > simplest operation requires atomic operations and cache-line
> > transfers, making the algorithm negatively scalable. The smarter the
> > algorithm you think up, the more atomic operations and cache-line
> > transfers occur :(
>
> Humm... About the best thing I could do in POSIX would probably be
> distributed rw-mutexes:
>

> http://groups.google.com/group/comp.programming.threads/browse_frm/th...


But you can't do an asymmetric rw-mutex ;)


> Also, you could do a per-thread memory allocator and use a mutex for passing
> mem-blocks back to the owner thread during remote deallocation.


I know that you know that, beyond POSIX, this can be implemented w/o
mutexes, atomic operations, or heavy memory barriers at all ;)


> > If you want to write portable, efficient synchronization algorithms, you
> > had better stick to Java. It provides support for atomic operations,
> > volatile stores and loads, etc. At least this allows you to create the
> > lock-free reader pattern.
>
> IMHO, the Java memory-model is too strong. It's not fine-grained at all.


As compared with POSIX? I don't think so; it's far more fine-grained
and permissive than POSIX.


Dmitriy V'jukov

Dmitriy V'jukov
Apr 11, 2008, 11:15:06 AM
On Apr 11, 4:19 am, "Chris Thomasson" <cris...@comcast.net> wrote:

> If you're using try-locks, I really don't see a need to sort. The only thing
> you would need is a function to detect and NULL out the duplicate locks.
>

> I think you could do the try-lock scheme using a maximum of three locks
> instead of four.

> Any thoughts?


With 4 locks and sorting, failure to lock a mutex unconditionally means
that somebody has changed the links, so the operation must be retried.

With 3 locks, or without sorting, failure to lock a mutex does not
necessarily mean that somebody has changed the links.

So 4 locks + sorting gives a "lock-free" solution;
3 locks, or no sorting, gives an "obstruction-free" solution.


Dmitriy V'jukov

Chris Thomasson
Apr 12, 2008, 9:37:03 AM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:550b0bcb-cb01-4f65...@w4g2000prd.googlegroups.com...

> On Apr 11, 4:19 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> If you're using try-locks, I really don't see a need to sort. The only thing
>> you would need is a function to detect and NULL out the duplicate locks.
>>
>> I think you could do the try-lock scheme using a maximum of three locks
>> instead of four.
>
>> Any thoughts?
>
>
> With 4 locks and sorting, failure to lock a mutex unconditionally means
> that somebody has changed the links, so the operation must be retried.

Failure to lock... You mean that a successful try-lock scheme with 4 mutexes
means everything is 100% coherent, and there is no need to double-check
anything?


> With 3 locks, or without sorting, failure to lock a mutex does not
> necessarily mean that somebody has changed the links.
>
> So 4 locks + sorting gives a "lock-free" solution;
> 3 locks, or no sorting, gives an "obstruction-free" solution.

Need to think on this some more...

Chris Thomasson
Apr 12, 2008, 9:40:15 AM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:3cbdbf7b-85a9-4473...@q27g2000prf.googlegroups.com...

> On Apr 11, 4:25 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>
>> > It seems that POSIX doesn't allow efficient synchronization. Even the
>> > simplest operation requires atomic operations and cache-line
>> > transfers, making the algorithm negatively scalable. The smarter the
>> > algorithm you think up, the more atomic operations and cache-line
>> > transfers occur :(
>>
>> Humm... About the best thing I could do in POSIX would probably be
>> distributed rw-mutexes:
>>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/th...
>
>
> But you can't do an asymmetric rw-mutex ;)

:^)


>> Also, you could do a per-thread memory allocator and use a mutex for passing
>> mem-blocks back to the owner thread during remote deallocation.
>
>
> I know that you know that, beyond POSIX, this can be implemented w/o
> mutexes, atomic operations, or heavy memory barriers at all ;)

Indeed.


>> > If you want to write portable, efficient synchronization algorithms, you
>> > had better stick to Java. It provides support for atomic operations,
>> > volatile stores and loads, etc. At least this allows you to create the
>> > lock-free reader pattern.
>>
>> IMHO, the Java memory-model is too strong. It's not fine-grained at all.
>
>
> As compared with POSIX? I don't think so; it's far more fine-grained
> and permissive than POSIX.

Well, I was thinking along the lines of Java compared to the "next" C++...
POSIX is great because it's a well-respected Standard, but it's not optimal.
Comparing platform-specific techniques to POSIX, well, humm... Not fair... ;^)

Chris Thomasson
Apr 12, 2008, 9:58:59 AM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:Sd6dnSjhEZsDMZ3V...@comcast.com...

> "Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
> news:550b0bcb-cb01-4f65...@w4g2000prd.googlegroups.com...
>> On Apr 11, 4:19 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>>
>>> If you're using try-locks, I really don't see a need to sort. The only
>>> thing
>>> you would need is a function to detect and NULL out the duplicate locks.
>>>
>>> I think you could do the try-lock scheme using a maximum of three locks
>>> instead of four.
>>
>>> Any thoughts?
>>
>>
>> With 4 locks and sorting, failure to lock a mutex unconditionally means
>> that somebody has changed the links, so the operation must be retried.
>
> Failure to lock... You mean that a successful try-lock scheme with 4 mutexes
> means everything is 100% coherent, and there is no need to double-check
> anything?
[...]

Ahh. I see what you're getting at.

Dmitriy V'jukov
Apr 12, 2008, 12:07:02 PM
On 12 Apr, 17:37, "Chris Thomasson" <cris...@comcast.net> wrote:

> > With 4 locks and sorting, failure to lock a mutex unconditionally means
> > that somebody has changed the links, so the operation must be retried.
>
> Failure to lock... You mean that a successful try-lock scheme with 4 mutexes
> means everything is 100% coherent, and there is no need to double-check
> anything?


No, I mean the opposite: a failed try-lock means that everything is 0%
coherent.
It's like lock-free: if a CAS fails, somebody else has made forward
progress, and you must retry.
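
[For comparison, the lock-free retry pattern being alluded to, sketched with
GCC's '__sync' compare-and-swap builtin; non-POSIX and purely illustrative:]
_______________________________________________________________
struct node {
    struct node* next;
};

/* push onto a lock-free stack: a failed CAS means another thread made
   forward progress in the meantime, so re-read the head and retry */
void stack_push(struct node* volatile* head, struct node* n) {
    struct node* cmp;
    do {
        cmp = *head;
        n->next = cmp;
    } while (! __sync_bool_compare_and_swap(head, cmp, n));
}
_______________________________________________________________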


Dmitriy V'jukov

Chris Thomasson
Apr 12, 2008, 1:40:38 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:635aaa7e-fdce-4817...@y21g2000hsf.googlegroups.com...

I was only worried about the extra lock. I see what you're doing, and I think
it makes sense, especially given the following thread:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8b267ed14711edbc

Chris Thomasson
Apr 13, 2008, 1:13:58 PM

"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:f6aabd1f-114c-4e48...@z24g2000prf.googlegroups.com...

> On Apr 8, 11:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> void pop(dlnode volatile* const _this) {
>>     pthread_mutex_t* locks[3] = { NULL };
>>     assert(_this->next != _this && _this->prev != _this);
>>     for (;;) {
>>         dlnode cmp = *_this;
>>         locks[0] = hash_lock_from_ptr(_this);
>>         locks[1] = hash_lock_from_ptr(cmp.next);
>>         locks[2] = hash_lock_from_ptr(cmp.prev);
>>         sort_and_lock(locks, 3);
>
>
> What do you think about this:
>
>
> void pop(dlnode volatile* const _this) {
>     pthread_mutex_t* locks[4] = { NULL };
>     assert(_this->next != _this && _this->prev != _this);
>     for (;;) {
>         dlnode cmp = *_this;
>         locks[0] = hash_lock_from_ptr(&cmp.next);
>         locks[1] = hash_lock_from_ptr(&cmp.prev);
>         locks[2] = hash_lock_from_ptr(&cmp.next->prev);
>         locks[3] = hash_lock_from_ptr(&cmp.prev->next);
>         sort_and_lock(locks, 4);
>
> ?
[...]

What if the objects pointed to by 'cmp.next' and/or 'cmp.prev' are unlinked
and destroyed before you use them? You cannot use 'cmp.next/prev->*' simply
because those nodes can be freed right before you make use of them. This is a
race-condition that the three-lock version does not suffer from. Am I
missing something here?

Dmitriy V'jukov
Apr 13, 2008, 4:21:46 PM
On 13 Apr, 21:13, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy V'jukov" <dvyu...@gmail.com> wrote in message


Yes, you are right; I missed that point.
Lock-based algorithms are hard. I'd better get back to lock-free
algorithms :)


Dmitriy V'jukov
