experimental multi-mutex algorithm...


Chris M. Thomasson

Nov 26, 2018, 1:19:43 AM
This code allows a thread to dynamically lock multiple addresses in a
totally atomic fashion, and free of deadlock. It is based on hashing
pointers into a table of locks. A thread can create a little array of
hashed pointers, sort them, remove duplicates, then take all of the
locks in the table. The sorting and uniqueness factor prevents deadlock.
_________________
// pseudo-code

mtx_table lock_table;


// some vars
int a = 3;
short b = 6;
double x = 1.234;

// a thread
{
locker locks(lock_table); // our hashed locks
locks.push(&a);
locks.push(&x);
locks.push(&b);

locks.lock();
// a, x, and b are all locked!
locks.unlock();
}
_________________


Here is my code, without using threads for now. Take a look at the main
function for a real example of how to use it. The code is crude, so please
try to take some pity...
_____________________________
// Experimental Multi-Mutex Algorithm By Chris M. Thomasson

#include <iostream>
#include <functional>
#include <algorithm>
#include <mutex>
#include <thread>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>


// Allows one to lock multiple addresses
// at once, and never hit a deadlock.
// It is an experiment.
namespace ct_multex
{
// A table of locks
struct mutex_table
{
std::vector<std::mutex> m_locks;

mutex_table(std::size_t size) : m_locks(size) { assert(size > 0); }

// totally contrived simple hash...
std::size_t hash(void const* ptr)
{
std::uintptr_t ptr_uint = (std::uintptr_t)ptr;
return (std::size_t)(((ptr_uint << 9) * 103) % m_locks.size());
}
};

// A thread's local indices into a table of locks
struct local_locks
{
std::vector<std::size_t> m_lock_idxs;
mutex_table& m_mtx_tbl;

local_locks(mutex_table& mtx_tbl) :
m_lock_idxs(), m_mtx_tbl(mtx_tbl) {}

// Add an address to be locked
void push_ptr(void const* ptr)
{
std::size_t idx = m_mtx_tbl.hash(ptr);
m_lock_idxs.push_back(idx);
}

// Deadlock free baby!
void ensure_locking_order()
{
// sort and remove duplicates
std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
m_lock_idxs.erase(std::unique(m_lock_idxs.begin(),
m_lock_idxs.end()), m_lock_idxs.end());
}

// Take all of the locks
void lock()
{
// there can be a flag to minimize this...
ensure_locking_order();

std::size_t n = m_lock_idxs.size();

for (std::size_t i = 0; i < n; ++i)
{
m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
}
}

// Unlock everything
void unlock()
{
std::size_t n = m_lock_idxs.size();

for (std::size_t i = 0; i < n; ++i)
{
m_mtx_tbl.m_locks[m_lock_idxs[n - i - 1]].unlock();
}
}
};


// RAII scoped lock: Allows a thread to actually take the locks
// It locks a thread's local lock indices
struct scoped_lock
{
local_locks& m_locks;

scoped_lock(local_locks& locks) : m_locks(locks)
{
m_locks.lock();
}

~scoped_lock() throw()
{
m_locks.unlock();
}
};
}


int main(void)
{
{
// Our lock table
ct_multex::mutex_table mtx_tbl(42);

int data_0 = 123;
long data_1 = 456;
short data_2 = 789;

// Usage...
{
// Our threads local locks
ct_multex::local_locks locks(mtx_tbl);

// Add some addresses
locks.push_ptr(&data_1);
locks.push_ptr(&data_0);

{
ct_multex::scoped_lock slock(locks);

// data_1 and data_0 are locked!
}

// unlocked... Add data_2 to the locks...
locks.push_ptr(&data_2);

{
ct_multex::scoped_lock slock(locks);

// Now, data_1, data_0 and data_1 are locked!
}
}
}

return 0;
}
_____________________________


Will post more info sometime tomorrow.

Chris M. Thomasson

Nov 26, 2018, 1:36:51 AM
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
> This code allows a thread to dynamically lock multiple addresses in a
> totally atomic fashion, and free of deadlock. It is based on hashing
> pointers into a table of locks. A thread can create a little array of
> hashed pointers, sort them, remove duplicates, then take all of the
> locks in the table. The sorting and uniqueness factor prevents deadlock.
[...]
> Here is my code, without using threads for now. Take a look at the main
> function for a real example on how to use it. The code is crude, please
> try to take some pity...
> _____________________________
> // Experimental Multi-Mutex Algorithm By Chris. M. Thomasson
[...]
> int main(void)
> {
>     {
>         // Our lock table
>         ct_multex::mutex_table mtx_tbl(42);
>
>         int data_0 = 123;
>         long data_1 = 456;
>         short data_2 = 789;
>
>         // Usage...
>         {
>             // Our threads local locks
>             ct_multex::local_locks locks(mtx_tbl);
>
>             // Add some addresses
>             locks.push_ptr(&data_1);
>             locks.push_ptr(&data_0);
>
>             {
>                 ct_multex::scoped_lock slock(locks);
>
>                 // data_1 and data_0 are locked!
>             }
>
>             // unlocked... Add data_2 to the locks...
>             locks.push_ptr(&data_2);
>
>             {
>                 ct_multex::scoped_lock slock(locks);
>
>                 // Now, data_1, data_0 and data_1 are locked!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Error in a comment. Funny.

I meant: // Now, data_1, data_0 and data_2 are locked!

Sorry about that. Fwiw, there are three addresses to be locked here,
data_[0...2]. However, there might be fewer locks in a thread's local lock
vector because of hash collisions; everything is sorted and duplicates
are always removed.

Think if data_0 hashed to index 41, data_1 went to 3, and data_2 went
back to 41. Humm... We have

41, 3, 41

After a call to my ct_multex::local_locks::ensure_locking_order... It
looks like:

3, 41

So, only two locks for three addresses. The hash function can be a lot
better. ;^o
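
Fwiw, a sketch of what a better hash could look like, as a drop-in
replacement for mutex_table::hash above (a Fibonacci-style multiplicative
mix; the constant and shifts are just illustrative, not from the posted
code):
_____________________________
// Sketch of a less collision-prone pointer hash for mutex_table.
// The multiplier is the 64-bit golden-ratio constant, purely illustrative.
std::size_t hash(void const* ptr)
{
    std::uint64_t x = (std::uint64_t)(std::uintptr_t)ptr;
    x >>= 4;                     // drop low bits that are mostly alignment
    x *= 0x9E3779B97F4A7C15ULL;  // Fibonacci multiplicative mixing
    x ^= x >> 32;                // fold the high bits back down
    return (std::size_t)(x % m_locks.size());
}
_____________________________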

[...]

Chris M. Thomasson

Nov 27, 2018, 1:00:08 AM
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
> This code allows a thread to dynamically lock multiple addresses in a
> totally atomic fashion, and free of deadlock. It is based on hashing
> pointers into a table of locks. A thread can create a little array of
> hashed pointers, sort them, remove duplicates, then take all of the
> locks in the table. The sorting and uniqueness factor prevents deadlock.
[...]

Hey now, this crude example code actually uses threads. ;^)

Deadlock free hashed address based locking, dead simple wrt a locking
hierarchy. It can be improved upon for sure. I am calling it the Multex.

Also, keep in mind that the locks are different from the memory pointed
to by the hashed pointers. So, this can be used to simulate atomic
operations on a system that only has semaphores. It easily solves the
Dining Philosophers problem.

The main and ct_thread functions show an example of how to use it:
____________________________
/*
The Multex, simple deadlock free locking abstraction
By Chris M. Thomasson
____________________________________________________________*/


#include <iostream>
#include <functional>
#include <algorithm>
#include <mutex>
#include <thread>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>


#define THREADS 7
#define N 123456


// Allows one to lock multiple addresses
// at once, and never hit a deadlock.
// It is an experiment.
namespace ct_multex
{
// A table of locks
struct mutex_table
{
std::vector<std::mutex> m_locks;

mutex_table(std::size_t size) : m_locks(size) { assert(size > 0); }

// totally contrived simple hash...
std::size_t hash(void const* ptr)
{
std::uintptr_t ptr_uint = (std::uintptr_t)ptr;
return (std::size_t)(((ptr_uint << 9) * 103) % m_locks.size());
}
};

// A thread's local indices into a table of locks
struct local_locks
{
std::vector<std::size_t> m_lock_idxs;
mutex_table& m_mtx_tbl;

local_locks(mutex_table& mtx_tbl) :
m_lock_idxs(), m_mtx_tbl(mtx_tbl) {}

// Add an address to be locked
void push_ptr(void const* ptr)
{
std::size_t idx = m_mtx_tbl.hash(ptr);
m_lock_idxs.push_back(idx);
}

// Deadlock free, baby! ;^)
void ensure_locking_order()
{
// sort and remove duplicates...
std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
m_lock_idxs.erase(std::unique(m_lock_idxs.begin(),
m_lock_idxs.end()), m_lock_idxs.end());
}

// Take all of the locks
void lock()
{
// there can be a flag to minimize this...
ensure_locking_order();

std::size_t n = m_lock_idxs.size();

for (std::size_t i = 0; i < n; ++i)
{
m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
}
}

// Unlock everything
void unlock()
{
std::size_t n = m_lock_idxs.size();

for (std::size_t i = 0; i < n; ++i)
{
m_mtx_tbl.m_locks[m_lock_idxs[n - i - 1]].unlock();
}
}
};


// RAII scoped lock: Allows a thread to actually take the locks
// It locks a thread's local lock indices
struct scoped_lock
{
local_locks& m_locks;

scoped_lock(local_locks& locks) : m_locks(locks)
{
m_locks.lock();
}

~scoped_lock() throw()
{
m_locks.unlock();
}
};
}




// Test program...
//______________________________________

// Shared data
struct ct_shared
{
ct_multex::mutex_table& m_multex;

ct_shared(ct_multex::mutex_table& multex)
: m_multex(multex), data_0(1), data_1(2), data_2(3), data_3(4) { }

unsigned long data_0;
unsigned long data_1;
unsigned long data_2;
unsigned long data_3;
};



// a thread
void ct_thread(ct_shared& shared)
{
// Create our local locks
ct_multex::local_locks locks(shared.m_multex);

// Add some addresses
locks.push_ptr(&shared.data_2);
locks.push_ptr(&shared.data_0);

// Do some work
for (unsigned long i = 0; i < N / 2; ++i)
{
{
ct_multex::scoped_lock slock(locks);
// locked for data_0 and data_2
shared.data_0 += i;
shared.data_2 += i;
}

std::this_thread::yield();

{
ct_multex::scoped_lock slock(locks);
// locked for data_0 and data_2
shared.data_0 -= i;
std::this_thread::yield(); // for fun...
shared.data_2 -= i;
}
}

// Add some other addresses
locks.push_ptr(&shared.data_1);
locks.push_ptr(&shared.data_3);

// Do some more work...
for (unsigned long i = 0; i < N / 2; ++i)
{
{
ct_multex::scoped_lock slock(locks);
// locked for data_0, data_1, data_2 and data_3
shared.data_0 += i;
std::this_thread::yield(); // for fun...
shared.data_1 += i;
shared.data_2 += i;
shared.data_3 += i;
}

std::this_thread::yield();

{
ct_multex::scoped_lock slock(locks);
// locked for data_0, data_1, data_2 and data_3
shared.data_0 -= i;
shared.data_1 -= i;
shared.data_2 -= i;
std::this_thread::yield(); // for fun...
shared.data_3 -= i;
}
}
}


int main(void)
{
{
// Our mutex table
ct_multex::mutex_table multex_tbl(42);

// Our shared data
ct_shared shared(multex_tbl);

// Launch...
{
std::thread threads[THREADS];

// Create threads...
for (unsigned long i = 0; i < THREADS; ++i)
{
threads[i] = std::thread(ct_thread, std::ref(shared));
}

std::cout << "processing...\n\n";
std::cout.flush();

// Join threads...
for (unsigned long i = 0; i < THREADS; ++i)
{
threads[i].join();
}
}

// Verify the shared data...
std::cout << "shared.data_0 = " << shared.data_0 << "\n";
std::cout << "shared.data_1 = " << shared.data_1 << "\n";
std::cout << "shared.data_2 = " << shared.data_2 << "\n";
std::cout << "shared.data_3 = " << shared.data_3 << "\n";

assert(shared.data_0 == 1);
assert(shared.data_1 == 2);
assert(shared.data_2 == 3);
assert(shared.data_3 == 4);
}

std::cout << "\n\nfin!\n\n";

return 0;
}
____________________________


Can you get it to run?

thanks.

Melzzzzz

Nov 27, 2018, 2:24:24 AM
On 2018-11-27, Chris M. Thomasson <invalid_chr...@invalid.invalid> wrote:
> Can you get it to run?
>
> thanks.
~/News >>> g++ -std=c++17 -O3 -march=native -pthread mmtex.cpp -o mmtex 2> errors
~/News >>> vim errors
~/News >>> ./mmtex
processing...

shared.data_0 = 1
shared.data_1 = 2
shared.data_2 = 3
shared.data_3 = 4


fin!


--
press any key to continue or any other to quit...

Chris M. Thomasson

Nov 27, 2018, 2:26:06 AM
[...]
>         // Take all of the locks
>         void lock()
>         {
>             // there can be a flag to minimize this...
>             ensure_locking_order();
>
>             std::size_t n = m_lock_idxs.size();
>
>             for (std::size_t i = 0; i < n; ++i)
>             {
>                 m_mtx_tbl.m_locks[m_lock_idxs[i]].lock();
>             }
>         }
[...]

Fwiw, there is another way to take the locks, using a try_lock and
backoff scheme, something like:
_____________________________
// Take all of the locks
void lock() // optimistic
{
// there can be a flag to minimize this...
ensure_locking_order();

std::size_t n = m_lock_idxs.size();

for (std::size_t i = 0; i < n; ++i)
{
if (! m_mtx_tbl.m_locks[m_lock_idxs[i]].try_lock())
{
// Unlock our previous addresses...
for (std::size_t u = 0; u < i; ++u)
{
m_mtx_tbl.m_locks[m_lock_idxs[i - u - 1]].unlock();
}

// backoff algorithm, or whatever...
std::this_thread::yield();

// Try again...
i = SIZE_MAX;

continue;
}
}
}
_____________________________

Remember that SIZE_MAX comes from <cstdint>, which the listing above
already includes. The try_lock version does not even need to be sorted,
just unique.
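
Fwiw, the "there can be a flag to minimize this" comment could be handled
roughly like the sketch below. Only the changed members of local_locks are
shown, and m_dirty is my own addition, not something in the posted code:
_____________________________
// Sketch: re-sort/dedup only when the index set actually changed.
// (drop-in changes to the local_locks struct posted above)
bool m_dirty = false; // set by push_ptr, cleared by ensure_locking_order

void push_ptr(void const* ptr)
{
    m_lock_idxs.push_back(m_mtx_tbl.hash(ptr));
    m_dirty = true;
}

void ensure_locking_order()
{
    if (! m_dirty) return; // nothing new pushed since the last sort
    std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
    m_lock_idxs.erase(std::unique(m_lock_idxs.begin(), m_lock_idxs.end()),
                      m_lock_idxs.end());
    m_dirty = false;
}
_____________________________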

Chris M. Thomasson

Nov 27, 2018, 3:57:06 AM
On 11/26/2018 11:24 PM, Melzzzzz wrote:
> On 2018-11-27, Chris M. Thomasson <invalid_chr...@invalid.invalid> wrote:
>> Can you get it to run?
>>
>> thanks.
> ~/News >>> g++ -std=c++17 -O3 -march=native -pthread mmtex.cpp -o mmtex 2> errors
> ~/News >>> vim errors
> ~/News >>> ./mmtex
> processing...
>
> shared.data_0 = 1
> shared.data_1 = 2
> shared.data_2 = 3
> shared.data_3 = 4
>
>
> fin!
>
>

Thanks for trying it out Melzzzzz. :^)

The basic algorithm can be useful for many things involving locking.
There is another more "optimistic" version I am tinkering around with
that can be used for n-address based software transactional memory.

One note, this is not recursive!
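
To make that concrete (my own illustration, assuming the ct_multex code
above): the table holds plain std::mutex, so nesting the scoped lock over
the same lock set self-deadlocks:
_____________________________
// Sketch of the non-recursive pitfall.
void oops(ct_multex::local_locks& locks)
{
    ct_multex::scoped_lock outer(locks);
    // ...
    ct_multex::scoped_lock inner(locks); // BAD: relocks std::mutex objects
                                         // this thread already owns ->
                                         // undefined behavior / deadlock
}
_____________________________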

Chris M. Thomasson

Nov 29, 2018, 3:55:49 PM
On 11/25/2018 10:19 PM, Chris M. Thomasson wrote:
> This code allows a thread to dynamically lock multiple addresses in a
> totally atomic fashion, and free of deadlock. It is based on hashing
> pointers into a table of locks. A thread can create a little array of
> hashed pointers, sort them, remove duplicates, then take all of the
> locks in the table. The sorting and uniqueness factor prevents deadlock.
[...]
> Will post more info sometime tomorrow.

This allows one to lock addresses without having to create a mutex for
each one. Just pass the address of a variable to a thread local lock
set, lock it, and the variable is now locked wrt mutual exclusion.

Now, to make this more fine-grained, what about a multex that uses
read-write locks? A thread can have two classes of lock sets: reads and
writes.
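
A rough sketch of what that might look like, assuming C++17
std::shared_mutex. The names rw_mutex_table and rw_local_locks are mine,
and the hash is again just a placeholder:
_____________________________
#include <algorithm>
#include <cstdint>
#include <shared_mutex>
#include <vector>

// Sketch: a read-write multex. Readers take shared ownership, writers take
// exclusive ownership, and everything is acquired in one global order.
struct rw_mutex_table
{
    std::vector<std::shared_mutex> m_locks;
    explicit rw_mutex_table(std::size_t n) : m_locks(n) {}

    std::size_t hash(void const* p) const
    {
        return (std::size_t)(((std::uintptr_t)p >> 4) % m_locks.size());
    }
};

struct rw_local_locks
{
    rw_mutex_table& m_tbl;
    std::vector<std::size_t> m_read;   // indices taken shared
    std::vector<std::size_t> m_write;  // indices taken exclusive

    explicit rw_local_locks(rw_mutex_table& t) : m_tbl(t) {}

    void push_read(void const* p)  { m_read.push_back(m_tbl.hash(p)); }
    void push_write(void const* p) { m_write.push_back(m_tbl.hash(p)); }

    void lock()
    {
        sort_unique(m_read);
        sort_unique(m_write);
        // if a slot is needed for writing, drop it from the read set
        m_read.erase(std::remove_if(m_read.begin(), m_read.end(),
            [this](std::size_t i) {
                return std::binary_search(m_write.begin(), m_write.end(), i);
            }), m_read.end());

        // merge-walk both sorted sets so all locks are taken in one
        // global ascending order -> no deadlock
        std::size_t r = 0, w = 0;
        while (r < m_read.size() || w < m_write.size())
        {
            bool take_write = (r == m_read.size()) ||
                (w < m_write.size() && m_write[w] < m_read[r]);
            if (take_write) m_tbl.m_locks[m_write[w++]].lock();
            else            m_tbl.m_locks[m_read[r++]].lock_shared();
        }
    }

    void unlock()
    {
        for (std::size_t i = m_write.size(); i-- > 0;)
            m_tbl.m_locks[m_write[i]].unlock();
        for (std::size_t i = m_read.size(); i-- > 0;)
            m_tbl.m_locks[m_read[i]].unlock_shared();
    }

private:
    static void sort_unique(std::vector<std::size_t>& v)
    {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }
};
_____________________________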

Pavel

Nov 30, 2018, 2:18:33 PM
Chris, do you have a use case? Your approach of ordering multiple
resources that need locking in some global order (along with alternative
optimistic algorithms) has been successfully used by DBMSs for years to
transactionally alter multiple rows (or records or sets etc for
non-relational DBMSs) for concurrent transactions so the algorithm
itself is sound.

With regard to mutexes in a multi-threaded program, however, I cannot
recall when I saw a necessity to simultaneously lock more than one mutex
(I saw code that did it but always in the context where I would either
refactor it to not do it or at least wanted to refactor it so). It might
certainly be that my experience is skewed so I am curious to see a use case.

Melzzzzz

Nov 30, 2018, 3:01:54 PM
Refactor this:
{
AutoLock<Mutex> l(lstSvcM_);
AutoLock<Mutex> ll(availM_);
AutoLock<Mutex> guard(rtlock);
if(!lstSvc_.empty())
{
s= lstSvc_.front();
lstSvc_.pop_front();
} else if((running_threads < 100 || threadsAvail_ > 3) && !lstSvcLow_.empty()) {
s = lstSvcLow_.front();
lstSvcLow_.pop_front();
} else {
more = false;
s=0;
continue;

Pavel

Nov 30, 2018, 4:31:26 PM
What does "this" compute and why?

Chris M. Thomasson

Dec 1, 2018, 9:53:31 PM
On 11/30/2018 11:18 AM, Pavel wrote:
> Chris M. Thomasson wrote:
>> This code allows a thread to dynamically lock multiple addresses in a
>> totally atomic fashion, and free of deadlock. It is based on hashing
>> pointers into a table of locks. A thread can create a little array of
>> hashed pointers, sort them, remove duplicates, then take all of the
>> locks in the table. The sorting and uniqueness factor prevents deadlock.
[...]
>> _____________________________
>>
>>
>> Will post more info sometime tomorrow.
>
> Chris, do you have a use case? Your approach of ordering multiple
> resources that need locking in some global order (along with alternative
> optimistic algorithms) has been successfully used by DBMS s for years to
> transactionally alter multiple rows (or records or sets etc for
> non-relational DBMSs) for concurrent transactions so the algorithm
> itself is sound.

Wrt threads, the use cases are generally geared toward in memory
database type schemes. Or any time a thread needs to lock a couple of
objects, read/write, unlock. The n-ary dining philosophers problem is
solved by this. One interesting use case is implementing C++11 or C11
atomics. Every type that uses locks would have to take the following
into account wrt the value of the ATOMIC_*_LOCK_FREE macros, and the
atomic_is_lock_free function:

https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free

They need to be zero wrt a locking scheme. The multex can also be used
for the writer side of an RCU algorithm.

Big time point: this is NOT recursive! Must keep that in mind.

One point, we can augment the lock table for other fun things besides
lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or even 2.


> With regard to mutexes in a multi-threaded program, however, I cannot
> recall when I saw a necessity to simultaneously lock more than one mutex
> (I saw code that did it but always in the context where I would either
> refactor it to not do it or at least wanted to refactor it so). It might
> certainly be that my experience is skewed so I am curious to see a use case.

Hashed address locking can be used to implement DCAS, which can be used
to build other things. The fact that the atomic lib can be created with
address locks means it exists at a lower level to build higher level
objects with. This can be used for inter- or intra-process communication
using robust mutexes. The robust type is hard to simulate in user-space
without using a watchdog process, so to speak. C11 and C++11 do not offer
such things; they are POSIX and Windows features.
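
As a concrete sketch of the DCAS point (my own illustration, reusing the
ct_multex code from the first post):
_____________________________
#include <cstdint>

// Sketch: a lock-based DCAS built on the multex. Both target words are
// hashed into the lock table and locked atomically, then compared/swapped.
bool dcas(ct_multex::mutex_table& tbl,
          std::intptr_t* a, std::intptr_t cmp_a, std::intptr_t xchg_a,
          std::intptr_t* b, std::intptr_t cmp_b, std::intptr_t xchg_b)
{
    ct_multex::local_locks locks(tbl);
    locks.push_ptr(a);
    locks.push_ptr(b);

    ct_multex::scoped_lock slock(locks); // both addresses locked, no deadlock

    if (*a != cmp_a || *b != cmp_b)
        return false; // comparison failed; nothing written

    *a = xchg_a;
    *b = xchg_b;
    return true;
}
_____________________________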

Marcel Mueller

Dec 2, 2018, 3:43:37 AM
Am 02.12.18 um 03:53 schrieb Chris M. Thomasson:
>> Chris, do you have a use case? Your approach of ordering multiple
>> resources that need locking in some global order (along with alternative
>> optimistic algorithms) has been successfully used by DBMS s for years to
>> transactionally alter multiple rows (or records or sets etc for
>> non-relational DBMSs) for concurrent transactions so the algorithm
>> itself is sound.
>
> Wrt threads, the use cases are generally geared toward in memory
> database type schemes.

I wrote an in memory DB kernel some years ago and avoided this type of
locking. Snapshot isolation with immutable objects and optimistic
locking provided better performance and simplified locking significantly
with almost no risk of deadlocks.

> Or any time a thread needs to lock a couple of
> objects, read/write, unlock. The n-ary dining philosophers problem is
> solved by this.

I think that, especially considering NUMA and/or cluster servers,
traditional mutex locking will become less useful. It is a bottleneck due
to the limited speed of synchronization.

Of course, for now there are use cases where a mutex perfectly fits the
needs. But I strongly recommend avoiding holding more than one mutex at a
time. Sooner or later it will cause problems, or at least increased
effort.

> One interesting use case is implementing C++11 or C11
> atomics.

Dark chapter. Who needs atomics that /might/ be lock free?

OK, in practice most of the time they /are/ lock free, at least for
small types. But there is always the risk of running into a performance
bottleneck on the next platform. :-/

> One point, we can augment the lock table for other fun things besides
> lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or even 2.

?


Marcel

Pavel

Dec 2, 2018, 4:30:43 PM
Chris M. Thomasson wrote:
...

> Wrt threads, the use cases are generally geared toward in memory database type
> schemes. Or any time a thread needs to lock a couple of objects, read/write,
> unlock. The n-ary dining philosophers problem is solved by this.
...

Which variation did you solve? Any requirements on fairness / starvation avoidance?

(Asking in order to try to decompose it without locking multiple mutexes.
Obviously, locking of multiple chopsticks has to be done, as it is part of
the problem statement, but a chopstick does not have to have a mutex).

Thanks!
-Pavel

Chris M. Thomasson

Dec 2, 2018, 5:01:13 PM
On 12/2/2018 12:43 AM, Marcel Mueller wrote:
> Am 02.12.18 um 03:53 schrieb Chris M. Thomasson:
>>> Chris, do you have a use case? Your approach of ordering multiple
>>> resources that need locking in some global order (along with alternative
>>> optimistic algorithms) has been successfully used by DBMS s for years to
>>> transactionally alter multiple rows (or records or sets etc for
>>> non-relational DBMSs) for concurrent transactions so the algorithm
>>> itself is sound.
>>
>> Wrt threads, the use cases are generally geared toward in memory
>> database type schemes.
>
> I wrote an in memory DB kernel some years ago and avoided this type of
> locking. Snapshot isolation with immutable objects and optimistic
> locking provided better performance and simplified locking significantly
> with almost no risk of deadlocks.

That's fine. Any risk of livelock wrt the optimistic locking in your
scheme? Seqlocks can be pretty nice at getting a fast snapshot:

https://en.wikipedia.org/wiki/Seqlock


>> Or any time a thread needs to lock a couple of objects, read/write,
>> unlock. The n-ary dining philosophers problem is solved by this.
>
> I think especially considering NUMA and/or cluster servers traditional
> mutex locking will become less useful. It is a bottleneck due to the
> limited speed of synchronization.

This is just an example of address based locking. There are a lot of
faster, and much more scalable synchronization algorithms out there.


> Of course, for now there are use cases where a mutex perfectly fit the
> needs. But I strongly recommend to avoid to gather more than one mutex
> at a time. Sooner or later it will cause problems, at least increased
> effort.

There are places where it is required. For instance, if you have to
simulate something like DCAS with locks, then there are two addresses
involved.


>> One interesting use case is implementing C++11 or C11 atomics.
>
> Dark chapter. Who needs atomics that /might/ be lock free?

Very dark.


> OK, in practice most of the time they /are/ lock free, at least for
> small types. But there is always the risk of running into a performance
> bottleneck on the next platform. :-/

Well, if your program runs on such a system, it can warn the user that
the performance might not be up to par... ;^o


>> One point, we can augment the lock table for other fun things besides
>> lock-based algorithms... We can define ATOMIC_*_LOCK_FREE as 1, or
>> even 2.

The address based hashing can be used to distribute some algorithms.

Chris M. Thomasson

Dec 2, 2018, 5:09:07 PM
On 12/2/2018 1:30 PM, Pavel wrote:
> Chris M. Thomasson wrote:
> ...
>
>> Wrt threads, the use cases are generally geared toward in memory database type
>> schemes. Or any time a thread needs to lock a couple of objects, read/write,
>> unlock. The n-ary dining philosophers problem is solved by this.
> ...
>
> Which variation did you solve?

I did one a while back for fun where each diner was a different creature
that could have more than two hands.


> Any requirements on fairness / starvation avoidance?

It shall not deadlock. It shall provide forward progress.


> (Asking to try to decompose it without multiple lock of mutexes. Obviously,
> locking of multiple chopsticks have to be done as it's a part of the problem
> statement, but a chopstick does not have to have a mutex).

It depends on how fine of a grain one uses with the locking scheme. You
can lock the table. Lock each diner. Lock each chopstick.

Chris M. Thomasson

Dec 5, 2018, 8:36:19 PM
Are you thinking of more "exotic" methods when you mention that "a
chopstick does not have to have a mutex"?

Wrt the multex, each chopstick can be an address into it. There is no
need for each chopstick to contain its own mutex structure.
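
To put it in code (a sketch of mine, reusing the posted ct_multex code): a
chopstick can be a plain byte, and only its address matters.
_____________________________
// Sketch: chopsticks are plain objects; only their addresses matter.
// Assumes the ct_multex code posted earlier in this thread.
struct table_setting
{
    ct_multex::mutex_table multex{42};
    char chopsticks[5]; // five chopsticks, no per-chopstick mutex
};

void dine(table_setting& t, std::size_t diner) // diner in [0, 5)
{
    ct_multex::local_locks locks(t.multex);
    locks.push_ptr(&t.chopsticks[diner]);           // left chopstick
    locks.push_ptr(&t.chopsticks[(diner + 1) % 5]); // right chopstick

    ct_multex::scoped_lock slock(locks);
    // eat: both chopsticks held, acquired in a global order, so no deadlock
}
_____________________________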

Marcel Mueller

Dec 6, 2018, 2:55:42 AM
Am 02.12.18 um 23:01 schrieb Chris M. Thomasson:
>> I wrote an in memory DB kernel some years ago and avoided this type of
>> locking. Snapshot isolation with immutable objects and optimistic
>> locking provided better performance and simplified locking
>> significantly with almost no risk of deadlocks.
>
> That's fine. Any risk of livelock wrt the optimistic locking in your
> scheme? Seqlocks can be pretty nice at getting a fast snapshot:
>
> https://en.wikipedia.org/wiki/Seqlock

Eh, no, the snapshots require no locking at all. They are just very
precise timestamps that must not be in the future; in fact just a 64 bit
int with a revision number that acts like a sequential GUID. Using a
timestamp instead was just some sugar for serviceability and debugging.

Any object with a newer revision is ignored by the core, or replaced by an
older version if one exists, when the root object used for access wants an
older snapshot. All operations act on snapshots. So any object in the
snapshot is /immutable/ from the reader's point of view; again, no locking
required.
These snapshots also can be taken retroactively, e.g. to widen the
oplock to the user. If a user wants to complete a transaction, there
need to be no changes in any object touched by its transaction since the
revision at which the user opened the page, or at which the data visible to
the user was last updated. If this condition holds true, the
transaction is committed, otherwise the user gets an error that another
user (or himself) has changed relevant data in between. This is just a
"large scale oplock". It will even work if the user returns from
vacation and presses "save" in a page opened before and the application
server has been restarted in between, if and only if there are no
conflicts. Of course, it is not that likely that this transaction will
complete successfully. ;-)
Only for committing changes real locking is required. Read operations
are almost lock-free. Almost because some cache implementations use
classic mutexes. But they cannot be requested in nested context, so no
risk of a deadlock. And of course blocking on I/O when requesting
entities is not lock free. But the in memory approach avoids this most
of the time, and no risk of deadlock either.
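
Roughly sketched, such a commit-time validation might look like this (the
types and names here are purely illustrative, not the actual kernel's):
_____________________________
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of snapshot-isolation style commit validation: a transaction
// remembers the revision it read at; at commit time, under a short lock,
// it fails if any touched object has a newer revision, otherwise it
// publishes a new revision for everything it touched.
using revision_t = std::uint64_t;

struct versioned_object
{
    revision_t revision = 0;
    // ... payload ...
};

struct transaction
{
    revision_t snapshot = 0;                 // revision the txn read at
    std::vector<versioned_object*> touched;  // objects read or written

    bool commit(std::mutex& commit_lock, revision_t& global_revision)
    {
        std::lock_guard<std::mutex> g(commit_lock); // only commits serialize
        for (versioned_object* obj : touched)
            if (obj->revision > snapshot)
                return false;                // someone changed it: reject
        revision_t next = ++global_revision;
        for (versioned_object* obj : touched)
            obj->revision = next;            // publish the new versions
        return true;
    }
};
_____________________________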


>>> Or any time a thread needs to lock a couple of objects, read/write,
>>> unlock. The n-ary dining philosophers problem is solved by this.
>>
>> I think especially considering NUMA and/or cluster servers traditional
>> mutex locking will become less useful. It is a bottleneck due to the
>> limited speed of synchronization.
>
> This is just an example of address based locking. There are a lot of
> faster, and much more scalable synchronization algorithms out there.

I am not well enough versed in that topic, and the platforms used for my
business application (.NET, Java) didn't allow me to dig that deep.
Personally I prefer C++ which has very little restrictions but for
business I do not recommend it. You need very disciplined programmers if
you want to write C++ programs with good maintainability (i.e. no UB).
They are difficult to find. This is much easier in managed languages,
especially if you encapsulate any parallelism into a core library below
the model.
In particular .NET can be amazingly fast due to the support of custom
value types. In fact the above application rarely takes more than 0.5
CPU cores when used by about 100 concurrent users. No need to struggle
with C++.

>> Of course, for now there are use cases where a mutex perfectly fit the
>> needs. But I strongly recommend to avoid to gather more than one mutex
>> at a time. Sooner or later it will cause problems, at least increased
>> effort.
>
> There are places where it is required. For instance, if you have to
> simulate something like DCAS with locks, then there are two addresses
> involved.

Indeed. But "simulate" already implies a compromise. For business I
would avoid such elementary compromises and require adequate hardware.
This is no big deal nowadays. In general hardware is almost always
cheaper than workers (for a more flexible solution).

And in between there might be other options. E.g. packing the two words
of a DCAS into a single one. The hack with the stolen lower bits for an
intermediate reference counter to implement strong thread safety comes to
mind. I was never able to get this counter to overflow on a 64 bit
platform (i.e. 3 bits), not even with very hard test cases with dozens of
parallel CAS writes. So a single CAS is sufficient for strong thread
safety. The performance impact of the required push of the (small) outer
reference count to the inner one after each acquire access is negligible,
in particular compared to simulated DCAS.
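
Roughly sketched (illustrative only; it assumes the pointee is at least
8-byte aligned, and it omits the periodic push of the outer count to the
inner count that the full scheme needs):
_____________________________
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch: steal the low 3 bits of an 8-byte-aligned pointer for a small
// "outer" reference count, so pointer + count fit in one CAS-able word.
struct counted_ptr
{
    static constexpr std::uintptr_t COUNT_MASK = 0x7;

    static std::uintptr_t pack(void* p, unsigned count)
    {
        assert(((std::uintptr_t)p & COUNT_MASK) == 0); // alignment required
        assert(count <= COUNT_MASK); // the real scheme must avoid overflow
        return (std::uintptr_t)p | count;
    }

    static void* ptr(std::uintptr_t w)      { return (void*)(w & ~COUNT_MASK); }
    static unsigned count(std::uintptr_t w) { return (unsigned)(w & COUNT_MASK); }
};

// Acquire: bump the packed count with a single-word CAS loop.
inline void* acquire(std::atomic<std::uintptr_t>& anchor)
{
    std::uintptr_t old_word = anchor.load(std::memory_order_relaxed);
    for (;;)
    {
        std::uintptr_t new_word = counted_ptr::pack(
            counted_ptr::ptr(old_word), counted_ptr::count(old_word) + 1);
        if (anchor.compare_exchange_weak(old_word, new_word,
                std::memory_order_acquire, std::memory_order_relaxed))
            return counted_ptr::ptr(old_word);
        // old_word was reloaded by compare_exchange_weak; just retry
    }
}
_____________________________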


>>> One interesting use case is implementing C++11 or C11 atomics.
>>
>> Dark chapter. Who needs atomics that /might/ be lock free?
>
> Very dark.

Again a compromise to avoid to drop platforms.


>> OK, in practice most of the time they /are/ lock free, at least for
>> small types. But there is always the risk of running into a
>> performance bottleneck on the next platform. :-/
>
> Well, if your program runs on such a system, it can warn the user that
> the performance might not be up to par... ;^o

Hmm, a static assertion comes to mind. C++ might support all
platforms, a certain application might have minimum requirements.


Marcel

Chris M. Thomasson

Dec 6, 2018, 7:35:21 PM
On 12/5/2018 11:55 PM, Marcel Mueller wrote:
> Am 02.12.18 um 23:01 schrieb Chris M. Thomasson:
>>> I wrote an in memory DB kernel some years ago and avoided this type
>>> of locking. Snapshot isolation with immutable objects and optimistic
>>> locking provided better performance and simplified locking
>>> significantly with almost no risk of deadlocks.
>>
>> That's fine. Any risk of livelock wrt the optimistic locking in your
>> scheme? Seqlocks can be pretty nice at getting a fast snapshot:
>>
>> https://en.wikipedia.org/wiki/Seqlock
>
> Eh, no, the snapshots require no locking at all.

Seqlocks do not necessarily need writers to use locks. They just need to
update the version counts in the proper order. However, readers will
spin if they do not obtain a coherent view of all the memory locations
they are reading from. There is a distributed seqlock over here:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/improved-lock-free-seqlock

A lot of pressure is taken off the readers.
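
For reference, a bare-bones single-writer seqlock sketch along those lines
(just an illustrative minimal version, not the distributed one from the
link; the payload goes through relaxed atomics so a torn read is caught by
the sequence check instead of being undefined behavior):
_____________________________
#include <atomic>

// Sketch: single-writer seqlock. The writer bumps the sequence to odd
// before writing and to even after; a reader retries whenever it saw an
// odd sequence or the sequence moved while it was reading.
struct seqlock_pair
{
    std::atomic<unsigned> m_seq{0};
    std::atomic<int> m_a{0};
    std::atomic<int> m_b{0};

    void write(int a, int b) // single writer assumed
    {
        unsigned s = m_seq.load(std::memory_order_relaxed);
        m_seq.store(s + 1, std::memory_order_relaxed); // odd: write begins
        std::atomic_thread_fence(std::memory_order_release);
        m_a.store(a, std::memory_order_relaxed);
        m_b.store(b, std::memory_order_relaxed);
        m_seq.store(s + 2, std::memory_order_release); // even: stable again
    }

    void read(int& a, int& b) const
    {
        for (;;)
        {
            unsigned s0 = m_seq.load(std::memory_order_acquire);
            a = m_a.load(std::memory_order_relaxed);
            b = m_b.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            unsigned s1 = m_seq.load(std::memory_order_relaxed);
            if (s0 == s1 && !(s0 & 1))
                return; // coherent snapshot; otherwise spin and retry
        }
    }
};
_____________________________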


> They are just very
> precise timestamps that must not be in the future; in fact just a 64 bit
> int with a revision number that acts like a sequential GUID. Using a
> timestamp instead was just some sugar for serviceability and debugging.
>
> Any object with newer revision is ignored by the core or replaced by an
> older version if that exists when the root object used for access wants
> an older snapshot. All operations act on snapshots. So any object in the
> snapshot is /immutable/ from the readers point of view, again no locking
> required.

Kind of sounds something like:

https://pdfs.semanticscholar.org/f2c4/05920a129aa8df38ec7cb3d5135283498684.pdf

https://ieeexplore.ieee.org/document/4709153

Are you using a global version count to derive all of the revision numbers?


> These snapshots also can be taken retroactively, e.g. to widen the
> oplock to the user. If a user wants to complete a transaction, there
> need to be no changes in any object touched by its transaction since the
> revision when the user opened the page respectively the data visible to
> the user were updated lastly. If this condition holds true the
> transaction is committed, otherwise the user gets an error that another
> user (or himself) has changed relevant data in between. This is just a
> "large scale oplock". It will even work if the user returns from
> vacation and presses "save" in a page opened before and the application
> server has been restarted in between, if and only if there are no
> conflicts. Of course, it is not that likely that this transaction will
> complete successfully. ;-)
> Only for committing changes real locking is required. Read operations
> are almost lock-free. Almost because some cache implementations use
> classic mutexes. But they cannot be requested in nested context, so no
> risk of a deadlock. And of course blocking on I/O when requesting
> entities is not lock free. But the in memory approach avoids this most
> of the time, and no risk of deadlock either.

Need to think some more on this. Read it more carefully. Thanks.



>>>> Or any time a thread needs to lock a couple of objects, read/write,
>>>> unlock. The n-ary dining philosophers problem is solved by this.
>>>
>>> I think especially considering NUMA and/or cluster servers
>>> traditional mutex locking will become less useful. It is a bottleneck
>>> due to the limited speed of synchronization.
>>
>> This is just an example of address based locking. There are a lot of
>> faster, and much more scalable synchronization algorithms out there.
>
> I am not enough versed in that topic, and the platforms used for my
> business application (.NET, Java) didn't allow me to dig that deep.
> Personally I prefer C++ which has very little restrictions but for
> business I do not recommend it. You need very disciplined programmers if
> you want to write C++ programs with good maintainability (i.e. no UB).
> They are difficult to find. This is much easier in managed languages,
> especially if you encapsulate any parallelism into a core library below
> the model.
> In particular .NET can be amazingly fast due to the support of custom
> value types. In fact the above application rarely takes more than 0.5
> CPU cores when used by about 100 concurrent users. No need to struggle
> with C++.

.NET can work perfectly fine. Although, I would personally use C++ to
build a scalable server that had to handle tens of thousands of
concurrent connections.


>>> Of course, for now there are use cases where a mutex perfectly fit
>>> the needs. But I strongly recommend to avoid to gather more than one
>>> mutex at a time. Sooner or later it will cause problems, at least
>>> increased effort.
>>
>> There are places where it is required. For instance, if you have to
>> simulate something like DCAS with locks, then there are two addresses
>> involved.
>
> Indeed. But "simulate" already implies a compromise. For business I
> would avoid such elementary compromises and require adequate hardware.
> This is no big deal nowadays. In general hardware is almost always
> cheaper than workers (for a more flexible solution).

Well, the multex is just an example of how it can be done.


> And in between there might be other options. E.g. packing the two words
> of a DCAS into a single one.

How can we do that when DCAS operates on two separate memory locations
that are not contiguous? They can be on completely separate cache lines.


> The hack with the stolen lower bits for an
> intermediate reference counter to implement strong thread safety comes
> into my mind. I was never able to get this counter overflow on a 64 bit
> platform (i.e. 3 bits), even not with very hard test cases with dozens
> of parallel CAS writes. So single CAS is sufficient for strong thread
> safety. The performance impact of the required push of the (small) outer
> reference count to the inner one after each acquire access is
> neglectable, in particular compared to simulated DCAS.

I think you might be confusing DWCAS with DCAS.

DWCAS operates on a memory location that contains two contiguous words.
DW is double-word.

A DCAS operates on two separate memory locations that are not contiguous.
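
To make the distinction concrete (my own illustration): DWCAS is what
std::atomic gives you on a struct of two adjacent words, while DCAS has no
portable hardware primitive and has to be simulated, e.g. with the multex.
_____________________________
#include <atomic>
#include <cstdint>

// DWCAS: one atomic object holding two *contiguous* words.
// On x86-64 this typically maps to cmpxchg16b when it is lock-free
// (often needs -mcx16); otherwise the library falls back to a lock.
struct two_words
{
    std::intptr_t lo;
    std::intptr_t hi;
};

bool dwcas(std::atomic<two_words>& target, two_words& expected, two_words desired)
{
    return target.compare_exchange_strong(expected, desired);
}

// DCAS would take two *independent* addresses, e.g.:
//   bool dcas(std::atomic<std::intptr_t>& a, std::atomic<std::intptr_t>& b, ...);
// There is no portable primitive for that; it has to be simulated, for
// instance with the hashed-lock multex shown earlier in this thread.
_____________________________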

Wrt the stolen bits overflowing, well, they certainly can. I can think
of several scenarios. It basically depends on what the stolen bits are
"used for". A reference count? A version count to avoid ABA? Etc...


>>>> One interesting use case is implementing C++11 or C11 atomics.
>>>
>>> Dark chapter. Who needs atomics that /might/ be lock free?
>>
>> Very dark.
>
> Again a compromise to avoid to drop platforms.

;^)


>>> OK, in practice most of the time they /are/ lock free, at least for
>>> small types. But there is always the risk of running into a
>>> performance bottleneck on the next platform. :-/
>>
>> Well, if your program runs on such a system, it can warn the user that
>> the performance might not be up to par... ;^o
>
> Hmm, a static assertion comes into my mind. C++ might support all
> platforms, a certain application might have minimum requirements.

Sounds okay. :^)

Pavel

Dec 6, 2018, 11:51:44 PM
Chris M. Thomasson wrote:
> On 12/2/2018 1:30 PM, Pavel wrote:
>> Chris M. Thomasson wrote:
>> ...
>>
>>> Wrt threads, the use cases are generally geared toward in memory database type
>>> schemes. Or any time a thread needs to lock a couple of objects, read/write,
>>> unlock. The n-ary dining philosophers problem is solved by this.
>> ...
>>
>> Which variation did you solve?
>
> I did one a while back for fun where each diner was a different creature that
> could have more than two hands.
And all n had to be busy to eat? :-)

I guess my real question is then: is there still any incidence between
philosophers and chopsticks? In the classic problem, a philosopher should
grab the one from his left or his right but cannot grab just any chopstick
across the table, i.e. chopsticks are not fungible. With more than 2 hands,
what would be the "table topology"? :)

>
>
>> Any requirements on fairness / starvation avoidance?
>
> It shall not deadlock. It shall provide forward progress.
Yes, so no restriction on starvation (i.e. the correct solution can end up with
2 philosophers eating intermittently and the 3rd starving to death). This is an
easier problem then.
>
>
>> (Asking to try to decompose it without multiple lock of mutexes. Obviously,
>> locking of multiple chopsticks have to be done as it's a part of the problem
>> statement, but a chopstick does not have to have a mutex).
>
> It depends on how fine of a grain one uses with the locking scheme. You can lock
> the table. Lock each diner. Lock each chopstick.

Because in most real-life tasks every thread has to do more than one thing (in
other words, handle more than one type of event -- let's say, in this problem,
the "main event" is a random time at which a philosopher gets hungry; but still
there should be some "service events" such as "we are out of food (for body or
brain)" i.e. "terminate orderly" or "how are you" i.e. some diagnostics; etc), I
prefer an event loop model. In this case, a philosopher would serve its queue of
events, for the purpose of our example, guarded by a mutex (it actually does not
have to be a mutex; a semaphore would probably be better and in a non-kernel app
where the events are usually inspired by i/o it would be some
wait-for-multiple-event/select/poll/epoll/kqueue). Only one mutex needs to
be locked at a time: to post or receive a message from a thread's queue.

I wrote down one solution (also allowing starvation, resolving conflicts by
always yielding to a peer with lesser id) using per-philosopher
mutex-protected event queues. Tried to illustrate how I usually design without
locking multiple resources (this is a slight diversion, actually, as usually
there is some I/O so I would not use mutexes at all but instead use some select;
but philosophers could still be fed from an eventfd-driven queue (on Linux) so
the principle is the same). No simultaneous locking of mutexes is required and
achieved parallelism seems quite good: on old Linux machine with 4 very old CPU
cores (1.6 Ghz), 5 philosophers, it takes 3.2 times less real time than total
CPU time. A large chunk is spent in system calls; but this is because actual
meal does nothing at all of substance.

$ egrep 'processor\>|cpu MHz' /proc/cpuinfo
processor : 0
cpu MHz : 1600.195
processor : 1
cpu MHz : 1610.742
processor : 2
cpu MHz : 1613.554
processor : 3
cpu MHz : 1599.960
$ time ./dp.fast 10000000
idx:0 nPostponedMeals:0 nEatenMeals:1998464
idx:1 nPostponedMeals:0 nEatenMeals:1999544
idx:2 nPostponedMeals:0 nEatenMeals:1999008
idx:3 nPostponedMeals:0 nEatenMeals:2001252
idx:4 nPostponedMeals:0 nEatenMeals:2001732
TOTAL_N_MEALS:10000000 nEatenMeals:10000000

real 0m6.380s
user 0m7.628s
sys 0m12.884s
pavel@bahus:~/probes/DiningPhilosophers$ python -c 'print((7.628+12.884)/6.380)'
3.21504702194

The code and makefile are below:

--------------- code ------------------
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <iterator>
#include <mutex>
#include <numeric> // for std::accumulate
#include <queue>
#include <thread>

using namespace std;
using namespace std::chrono;

static constexpr int N_PHILOSOPHERS = 5;
static constexpr int MAX_IDX = N_PHILOSOPHERS - 1;

enum Event {
NoEvent,
GetHungry,
YieldToLeftPeer,
YieldToRightPeer,
LeftPeerYielded,
RightPeerYielded,
ReportWhenNotHungry,
NotHungry,
Stop
};

class Philosopher;

class EventQueue {
public:
EventQueue():
condVar_(),
mutex_(),
queue_() {}
Event getEvent() {
Event event(NoEvent);
unique_lock<mutex> queueUniqueLock(mutex_);
while(queue_.empty()) // 'while', not 'if': guards against spurious wakeups
condVar_.wait(queueUniqueLock);
assert(!queue_.empty());
event = queue_.front();
assert(event != NoEvent);
queue_.pop();
return event;
}
void postEvent(Event event) {
lock_guard<mutex> queueGuard(mutex_);
condVar_.notify_one();
queue_.push(event);
}
private:
condition_variable condVar_;
mutex mutex_;
queue<Event> queue_;
};

class Peer {
public:
explicit Peer(bool ownsChopstick):
eventQueue_(nullptr),
knowsToYield_(false),
ownsChopstick_(ownsChopstick) {}
void clearKnowsToYield() { knowsToYield_ = false; }
void clearOwnsChopstick() { ownsChopstick_ = false; }
Event getEvent() { return eventQueue_->getEvent(); }
int idx() const { return idx_; }
void init(EventQueue& eventQueue, int idx) {
eventQueue_ = &eventQueue;
idx_ = idx;
}
bool knowsToYield() const { return knowsToYield_; }
bool ownsChopstick() const { return ownsChopstick_; }
void postEvent(Event event) { eventQueue_->postEvent(event); }
void setKnowsToYield() { knowsToYield_ = true; }
void setOwnsChopstick() { ownsChopstick_ = true; }
private:
EventQueue* eventQueue_;
int idx_;
bool knowsToYield_;
bool ownsChopstick_;
};

class Philosopher {
public:
Philosopher():
eventQueue_(),
greaterPeerIsHungry_(false),
idx_(-1),
leftPeer_(false),
mainEventQueue_(nullptr),
nEatenMeals_(0),
needToReportNotHungry_(false),
nPostponedMeals_(0),
rightPeer_(true),
workerThread_(Work, ref(*this)) {}
void dump(ostream& os) const {
os << " idx:" << idx_ << " nPostponedMeals:" << nPostponedMeals_ <<
" nEatenMeals:" << nEatenMeals_;
}
void init(int idx, const Philosopher *peers, EventQueue& mainEventQueue);
void join() { workerThread_.join(); }
int nEatenMeals() const { return nEatenMeals_; }
void postEvent(Event event) { eventQueue_.postEvent(event); }
Event getEvent() { return eventQueue_.getEvent(); }
static void Work(Philosopher& philosopher) { philosopher.work(); }
void work();
private:
bool askPeerToYieldIfNeeded(Peer& peer, Event yieldEvent);
EventQueue& eventQueue() const { return eventQueue_; }
void eatPostponedMeals();
void eatTheOnlyMeal();
void grabChopstick(Peer& peer);
void handlePeerYielded(Peer& yieldedPeer, Peer& otherPeer,
Event otherPeerYieldEvent);
bool hungry() const { return !!nPostponedMeals_; }
bool lessThanPeer(const Peer& peer) { return idx_ < peer.idx(); }
void reportNotHungry();
void resetGreaterPeerIsHungry();
void yieldIfNeeded(Peer &peer, Event yieldedEvent);
void yieldToGreaterPeerIfNeeded();

mutable EventQueue eventQueue_;
bool greaterPeerIsHungry_;
int idx_;
Peer leftPeer_;
EventQueue* mainEventQueue_;
int nEatenMeals_;
bool needToReportNotHungry_;
int nPostponedMeals_;
Peer rightPeer_;
thread workerThread_;
};

ostream&
operator<<(ostream& os, const Philosopher& p)
{
p.dump(os);
return os;
}

int
main(int argc, char *argv[])
{
EventQueue mainEventQueue;
Philosopher philosophers[N_PHILOSOPHERS];
for (int i = 0; i < N_PHILOSOPHERS; ++i)
philosophers[i].init(i, philosophers, mainEventQueue);

constexpr int TOTAL_N_MEALS_DFLT = 1000 * 1000;
const int TOTAL_N_MEALS = argc == 1 ? TOTAL_N_MEALS_DFLT : atoi(argv[1]);

for(int nTestsLeft = TOTAL_N_MEALS; nTestsLeft--;) {
const int idx = rand() % N_PHILOSOPHERS;
Philosopher& philosopher = philosophers[idx];
philosopher.postEvent(GetHungry);
}

for(auto& philosopher: philosophers)
philosopher.postEvent(ReportWhenNotHungry);

for(int nToReport = N_PHILOSOPHERS; nToReport > 0; --nToReport) {
Event e = mainEventQueue.getEvent(); // wait for all to get not hungry
assert(e == NotHungry);
}

for(auto& philosopher: philosophers)
philosopher.postEvent(Stop);

for(auto& philosopher: philosophers)
philosopher.join();

for(auto& philosopher: philosophers) // print philosophers
cout << philosopher << endl;

const int nEatenMeals = accumulate(begin(philosophers), end(philosophers), 0,
[](int a, const Philosopher& p) { return a + p.nEatenMeals(); });

cout << "TOTAL_N_MEALS:" << TOTAL_N_MEALS << " nEatenMeals:" << nEatenMeals <<
endl;


return 0;
}

bool
Philosopher::askPeerToYieldIfNeeded(Peer& peer, Event yieldEvent)
{
if (!peer.ownsChopstick())
return false;
if (peer.knowsToYield())
return false;
peer.postEvent(yieldEvent);
peer.setKnowsToYield();
return true;
}

void
Philosopher::eatPostponedMeals()
{
assert(!!nPostponedMeals_);
nEatenMeals_ += nPostponedMeals_;
nPostponedMeals_ = 0;
if(needToReportNotHungry_) {
reportNotHungry();
needToReportNotHungry_ = false;
}
}

void
Philosopher::eatTheOnlyMeal()
{
assert(!nPostponedMeals_);
++nEatenMeals_;
}

void
Philosopher::grabChopstick(Peer& peer)
{
assert(hungry());
peer.clearOwnsChopstick();
}

void
Philosopher::handlePeerYielded(Peer& yieldedPeer, Peer& otherPeer,
Event otherPeerYieldEvent)
{
yieldedPeer.clearKnowsToYield();
grabChopstick(yieldedPeer);
if (askPeerToYieldIfNeeded(otherPeer, otherPeerYieldEvent)) {
assert(otherPeer.ownsChopstick());
return;
}
if (otherPeer.ownsChopstick())
return;
eatPostponedMeals();
yieldToGreaterPeerIfNeeded();
}

void
Philosopher::init(int idx, const Philosopher *peers, EventQueue& mainEventQueue)
{
assert(idx >= 0);
assert(idx <= MAX_IDX);
idx_ = idx;
leftPeer_.init(peers[idx == 0 ? MAX_IDX : idx - 1].eventQueue(), idx);
rightPeer_.init(peers[idx == MAX_IDX ? 0 : idx + 1].eventQueue(), idx);
mainEventQueue_ = &mainEventQueue;
}

void
Philosopher::work()
{
for(Event event = getEvent(); ; event = getEvent()) {
assert(event != NoEvent);
switch(event) {
case GetHungry:
if (askPeerToYieldIfNeeded(leftPeer_, YieldToRightPeer) ||
askPeerToYieldIfNeeded(rightPeer_, YieldToLeftPeer)) {
++nPostponedMeals_;
break;
}
if (rightPeer_.ownsChopstick() || leftPeer_.ownsChopstick()) {
++nPostponedMeals_;
break;
}
eatTheOnlyMeal();
yieldToGreaterPeerIfNeeded();
break;
case YieldToLeftPeer:
yieldIfNeeded(leftPeer_, RightPeerYielded);
break;
case YieldToRightPeer:
yieldIfNeeded(rightPeer_, LeftPeerYielded);
break;
case LeftPeerYielded:
handlePeerYielded(leftPeer_, rightPeer_, YieldToLeftPeer);
break;
case RightPeerYielded:
handlePeerYielded(rightPeer_, leftPeer_, YieldToRightPeer);
break;
case ReportWhenNotHungry:
if(!hungry())
reportNotHungry();
else
needToReportNotHungry_ = true;
break;
case Stop:
return;
default:
assert(false);
}
}
}

void
Philosopher::reportNotHungry()
{
mainEventQueue_->postEvent(NotHungry);
}

void
Philosopher::resetGreaterPeerIsHungry()
{
greaterPeerIsHungry_ = false;
}

void
Philosopher::yieldIfNeeded(Peer &peer, Event yieldedEvent)
{
if(lessThanPeer(peer)) {
if(hungry()) {
if(!greaterPeerIsHungry_)
greaterPeerIsHungry_ = true;
return; // don't yield
}
if(greaterPeerIsHungry_) {
resetGreaterPeerIsHungry();
}
}
if(peer.ownsChopstick()) // we yielded it before, after eating
return;
peer.setOwnsChopstick();
peer.postEvent(yieldedEvent);
}

void
Philosopher::yieldToGreaterPeerIfNeeded()
{
if(!greaterPeerIsHungry_)
return;
resetGreaterPeerIsHungry();
const bool leftPeerIsGreater = lessThanPeer(leftPeer_);
Peer& greaterPeer = leftPeerIsGreater ? leftPeer_ : rightPeer_;
const Event yieldedEvent = leftPeerIsGreater ? RightPeerYielded :
LeftPeerYielded;
greaterPeer.setOwnsChopstick();
greaterPeer.postEvent(yieldedEvent);
}

/* TODO:
* - could cache lessThanPeer for both peers;
*/

---------------------- makefile: -------------------
dp: dp.cpp
g++ -g -pthread -std=c++11 $< -o $@

dp.fast: dp.cpp
g++ -O3 -funroll-loops -g -pthread -std=c++11 $< -o $@


Chris M. Thomasson

Dec 7, 2018, 12:08:57 AM
On 12/6/2018 8:51 PM, Pavel wrote:
> Chris M. Thomasson wrote:
>> On 12/2/2018 1:30 PM, Pavel wrote:
>>> Chris M. Thomasson wrote:
>>> ...
>>>
>>>> Wrt threads, the use cases are generally geared toward in memory database type
>>>> schemes. Or any time a thread needs to lock a couple of objects, read/write,
>>>> unlock. The n-ary dining philosophers problem is solved by this.
>>> ...
>>>
>>> Which variation did you solve?
>>
>> I did one a while back for fun where each diner was a different creature that
>> could have more than two hands.
> And all n had to be busy to eat? :-)

Big time... For stress testing, yes... :^)


>
> I guess my real question is then: is there still any incidence between
> philosophers and chopsticks? In classic problem, a philosopher should grab the
> one from his left or his right but cannot get just any across the table I guess
> i.e. chopsticks are not fungible. With more than 2 hands, what would be the
> "table topology"? :)

Imagine a field line of the following rendering being a fractal hand as
it approaches the unit circle, which represents the table:

https://plus.google.com/101799841244447089430/posts/457Kzv21gFp

n-ary diners at a table:

https://plus.google.com/101799841244447089430/posts/gYcUXTN5zBb

Now, we can create a little network here. Tangents are sync points?



>>> Any requirements on fairness / starvation avoidance?
>>
>> It shall not deadlock. It shall provide forward progress.
> Yes, so no restriction on starvation (i.e. the correct solution can end up with
> 2 philosophers eating intermittently and the 3rd starving to death). This is an
> easier problem then.

Every diner shall complete their transaction without deadlock, or
livelock. It will complete. This basically handles starvation by
default. Each diner always gains forward progress. A diner starving to
death means it never got to participate in the forward progress, thus
violating the property.


>>> (Asking to try to decompose it without multiple lock of mutexes. Obviously,
>>> locking of multiple chopsticks have to be done as it's a part of the problem
>>> statement, but a chopstick does not have to have a mutex).
>>
>> It depends on how fine of a grain one uses with the locking scheme. You can lock
>> the table. Lock each diner. Lock each chopstick.
>
> Because in most real-life tasks every thread has to do more than one thing (in
> other words, handle more than one type of event

Agreed. This is a simple example of a lock-based sync primitive that can
be used.


> -- let's say, in this problem,
> the "main event" is a random time at which a philosopher gets hungry; but still
> there should be some "service events" such as "we are out of food (for body or
> brain)" i.e. "terminate orderly" or "how are you" i.e. some diagnostics; etc), I
> prefer an event loop model.

[...]

Event driven is a fun solution as well. Fwiw, check this out for a fast
MPMC queue:

https://groups.google.com/d/topic/lock-free/acjQ3-89abE/discussion
(read all...)

It is mod based, so, nice for a unitary angle formation of diners.