definitely no. 10^6 semaphores are never a good idea. In your case they
will likely consume much more resources than your data.
> or a sem_t, a pthread_cond_t, or maybe something else?
> Which alternatives are there?
prepare the modification as well as possible beforehand and apply it while the
entire collection is locked. Applying 50 bytes will be a negligible effort.
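A minimal sketch of that pattern (the names `Item', `compute_new_value()',
`vec' and `vec_lock' are hypothetical, not from the post above):

    // do the expensive part with no lock held...
    Item staged = compute_new_value(i);
    // ...then apply it in a very short exclusive section:
    pthread_rwlock_wrlock(&vec_lock); // entire collection locked
    vec[i] = staged;                  // copying ~50 bytes is negligible
    pthread_rwlock_unlock(&vec_lock);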
Marcel
> Suppose there is a vector with 1+ million elements (each ~ 50 Bytes).
> The vector is used by many threads (~10) on a multi-core CPU.
Is the vector expected to be heavily contended? If so, the solution is
probably to rethink the design. Odds are the same problem can be
solved without this.
> For adding and deleting items the vector must be write-locked,
> otherwise (for reading and modifying in-place) a read-lock
> must be used. A modification of an item must obviously be
> an atomic operation.
What is the expected ratio of object reads to object writes to object
add/deletes to traversing the vector? We don't need exact numbers, but
some idea of the ratios would be helpful.
The answer is very different if add/delete is rare and read is common
than it is if add/delete is common and read is rare.
> Ie. for modifying an item one needs also a lock on that item.
> Now the question is: how to achieve this.
> Should each item have a pthread_mutex_t embedded in it,
> or a sem_t, a pthread_cond_t, or maybe something else?
> Which alternatives are there?
> How would you solve this problem efficiently?
I am assuming all the things you haven't explained are pretty much
what they typically are: that reads outnumber writes, that adds/deletes
are relatively rare, and that redesigning is off the table. I am also
assuming that no operations involve two objects being locked at the
same time.
I would do it this way:
1) A single "master" reader/writer lock protects the vector itself.
Those searching for an object must hold this lock for read. Those
adding/deleting an object must hold this lock for write.
2) An array of 256 reader/writer locks for the objects themselves. A
hash of some kind on each object's pointer, object identifier, or just
a separate integer assigned sequentially at creation time determines
which object gets which lock.
So, here's how it works:
To Add An Object:
1) Allocate the object.
2) Write lock the "master" lock.
3) Insert the object.
4) Release the "master" write lock.
To Delete An Object:
1) Write lock the "master" lock.
2) Find the object in the collection if you don't already know where
it is.
3) Remove the object from the collection.
4) Release the "master" write lock.
5) Write lock and then release the lock corresponding to the object
(if needed).
6) Delete the object.
To Find An Object:
1) Read lock the "master" lock.
2) Find the object in the collection.
3) Read/Write lock the lock corresponding to the object as
appropriate.
4) Release the master read lock.
5) Operate on the object.
6) Release the object lock.
Note that you must never attempt to acquire the master lock while you
hold a lock on an object.
CAUTION: This approach is only appropriate if:
1) Removal of objects from the collection is relatively infrequent.
2) Objects are not ever kept locked, either read or write, for very
long. (Because this can block a thread that holds the master lock.)
Note that there's probably no point in using reader/writer locks for
the objects. Using 256 regular mutexes would probably work just as
well.
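Here is a minimal sketch of the scheme under those assumptions (the
`object' type, `lookup_in_vector()' and the shift-by-4 hash are mine,
purely illustrative):

    #include <pthread.h>
    #include <cstddef>

    struct object { /* ~50 bytes of payload */ };
    object* lookup_in_vector(long key); // hypothetical search helper, declared only

    static pthread_rwlock_t g_master = PTHREAD_RWLOCK_INITIALIZER;
    static pthread_rwlock_t g_obj_locks[256]; // pthread_rwlock_init() each at startup

    static pthread_rwlock_t* obj_lock(void const* p) {
        // shift past alignment bits, then map onto the 256 locks
        return &g_obj_locks[(((std::size_t)p) >> 4) % 256];
    }

    // "To Find An Object", following the numbered steps above;
    // the caller performs steps 5 and 6.
    object* find_and_wrlock(long key) {
        pthread_rwlock_rdlock(&g_master);          // 1) read lock the "master" lock
        object* o = lookup_in_vector(key);         // 2) find the object
        if (o) pthread_rwlock_wrlock(obj_lock(o)); // 3) lock the object's lock
        pthread_rwlock_unlock(&g_master);          // 4) release the master read lock
        return o;  // 5) operate on the object, then 6) unlock obj_lock(o)
    }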
DS
I would write a scheduler. Threads wanting access to the array would
declare to the scheduler which objects they plan to access, and how
(read or write), and would declare when they're done. The scheduler
would check for contention and arbitrate accesses. In most cases
(given the ratio of number of objects to number of threads), the scheduler
would leave the threads free to run. In the rare case where several
threads want concurrent access to the same object, the
scheduler would block them and let them proceed in good order.
So there would be O(1) semaphores/mutexes/conditions, while still allowing
concurrent accesses to the whole array, and concurrent read accesses to
any object.
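A very small sketch of such an arbiter (all names are mine; the post
above doesn't prescribe an implementation): one mutex and one condition
variable guard a table of per-object reader/writer counts, so only
genuinely contended threads ever block.

    #include <map>
    #include <pthread.h>

    class scheduler {
        struct state { int readers; bool writer; state() : readers(0), writer(false) {} };
        std::map<void const*, state> m_busy; // only active objects appear here
        pthread_mutex_t m_mtx;
        pthread_cond_t m_cond;
    public:
        scheduler() {
            pthread_mutex_init(&m_mtx, 0);
            pthread_cond_init(&m_cond, 0);
        }
        ~scheduler() {
            pthread_cond_destroy(&m_cond);
            pthread_mutex_destroy(&m_mtx);
        }
        void declare(void const* obj, bool write) {
            pthread_mutex_lock(&m_mtx);
            for (;;) {
                state& s = m_busy[obj];
                if (write ? (!s.writer && s.readers == 0) : !s.writer) {
                    if (write) s.writer = true; else ++s.readers;
                    break;
                }
                pthread_cond_wait(&m_cond, &m_mtx); // contended: wait in good order
            }
            pthread_mutex_unlock(&m_mtx);
        }
        void done(void const* obj, bool write) {
            pthread_mutex_lock(&m_mtx);
            state& s = m_busy[obj];
            if (write) s.writer = false; else --s.readers;
            if (!s.writer && s.readers == 0) m_busy.erase(obj); // keep the table small
            pthread_cond_broadcast(&m_cond);
            pthread_mutex_unlock(&m_mtx);
        }
    };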
--
__Pascal Bourguignon__
That's an interesting solution. It reminds me of Lamport's Bakery
algorithm.
http://research.microsoft.com/users/lamport/pubs/pubs.html#bakery
--George
I also wonder about the elevator algorithm that was suggested for disk
access. It involved a priority queue. In practice, it was not much good
because if you typically had more than one disk request outstanding, you'd
soon have two, and three, etc.
http://dbpubs.stanford.edu:8090/pub/1993-6
http://markmail.org/message/u7lxlurh5fekcjwt
Sorry, I forgot to mention this important requirement:
- A thread must be able to lock multiple items at the same time,
ie. as a set for a "transactional" modification.
I see 3 alternatives for Item modifications:
1.) store a pthread_mutex_t or sem_t with each item.
2.) write-lock the vector and use nothing else for locking the items in question.
3.) Develop an application-level Lock Manager for item locks in the collection.
Threads must consult this lock manager for getting item locks.
A write-lock on the collection (ie. the vector) for Add/Del must be granted only
if there are no locks held in this LockManager for the collection object.
Alternative 1 uses too many resources and is therefore impractical.
Alternative 2 is easy to implement but it blocks all the other threads unnecessarily.
Alternative 3 is IMO the best general-purpose solution in this case,
as it does not block the other threads that want to operate on the other items.
All locking operations for this collection object should be done
via this LockManager only (see the sketch below).
One should use such a LockManager for each such collection.
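A rough shape for Alternative 3 might be (interface only, all names
hypothetical; the implementation would order the ids internally to stay
deadlock-free):

    #include <cstddef>
    #include <set>

    class LockManager {
    public:
        // block until every item in `ids' is locked, shared or exclusive,
        // acquiring internally in a fixed global order to avoid deadlock
        void lock_items(std::set<std::size_t> const& ids, bool exclusive);
        void unlock_items(std::set<std::size_t> const& ids);
        // Add/Del on the vector: granted only when no item locks are held
        void lock_collection_for_write();
        void unlock_collection();
    };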
What if every N elements had 1 mutex associated with them?
mutex1 [e1 e2 e3 e4 .. e50]
mutex2 [e51 .. e100]
You could then conceivably have many operations occur in parallel, while
being able to tweak the granularity of your mutex range for optimal
behavior with regard to the memory cost vs. performance.
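A sketch of that striping idea (the class and names are mine; the stride
of 50 is just the figure from the example above):

    #include <pthread.h>
    #include <vector>
    #include <cstddef>

    template<typename T>
    class striped_vector {
        std::vector<T> m_items;
        std::vector<pthread_mutex_t> m_locks; // one lock per `stride' elements
        std::size_t m_stride;
    public:
        striped_vector(std::size_t n, std::size_t stride = 50)
            : m_items(n), m_locks((n + stride - 1) / stride), m_stride(stride) {
            for (std::size_t i = 0; i < m_locks.size(); ++i)
                pthread_mutex_init(&m_locks[i], 0);
        }
        ~striped_vector() {
            for (std::size_t i = 0; i < m_locks.size(); ++i)
                pthread_mutex_destroy(&m_locks[i]);
        }
        // lock the stripe covering element ix, mutate it in place, unlock
        template<typename F>
        void modify(std::size_t ix, F f) {
            pthread_mutex_t& m = m_locks[ix / m_stride];
            pthread_mutex_lock(&m);
            f(m_items[ix]);
            pthread_mutex_unlock(&m);
        }
    };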
--George
read this:
http://www.cis.uab.edu/hyatt/hashing.html
perhaps it will help you
> Sorry, I forgot to mention this important requirement:
>
> - A thread must be able to lock multiple items at the same time,
> ie. as a set for a "transactional" modification.
Then one big lock is probably the best solution, at least until you
can prove that you will really have a contention problem.
> I see 3 alternatives for Item modifications:
> 1.) store a pthread_mutex_t or sem_t with each item.
> 2.) write-lock the vector and use nothing else for locking the items in question.
> 3.) Develop an application-level Lock Manager for item locks in the collection.
> Threads must consult this lock manager for getting item locks.
> A write-lock on the collection (ie. the vector) for Add/Del must be granted only
> if there are no locks held in this LockManager for the collection object.
>
> Alternative 1 uses too many resources and is therefore impractical.
> Alternative 2 is easy to implement but it blocks all the other threads unnecessarily.
Right, but in compensation, it minimizes contention. It is contention
that consumes resources.
> Alternative 3 is IMO the best general-purpose solution in this case,
> as it does not block the other threads that want to operate on the other items.
> All locking operations for this collection object should be done
> via this LockManager only.
> One should use such a LockManager for each such collection.
This makes the most common case very expensive and should only be
considered in cases where there's a proven problem with excessive
context switches.
DS
Hmm. I would say the rule is exactly the contrary:
ie. the more a lock covers (the coarser the granularity), the more lock contention there will be.
OTOH, finer granularity reduces lock contention.
> > I see 3 alternatives for Item modifications:
> > 1.) store a pthread_mutex_t or sem_t with each item.
> > 2.) write-lock the vector and use nothing else for locking the items in question.
> > 3.) Develop an application-level Lock Manager for item locks in the collection.
> > Threads must consult this lock manager for getting item locks.
> > A write-lock on the collection (ie. the vector) for Add/Del must be granted only
> > if there are no locks held in this LockManager for the collection object.
> >
> > Alternative 1 uses too many resources and is therefore impractical.
>
> > Alternative 2 is easy to implement but it blocks all the other threads unnecessarily.
>
> Right, but in compensation, it minimizes contention. It is contention
> that consumes resources.
Hmm. I think your definition of "contention" is somewhat different from the mainstream.
Please take a look here:
http://en.wikipedia.org/wiki/Lock_(computer_science)#Granularity
> > Alternative 3 is IMO the best general-purpose solution in this case,
> > as it does not block the other threads that want to operate on the other items.
> > All locking operations for this collection object should be done
> > via this LockManager only.
> > One should use such a LockManager for each such collection.
>
> This makes the most common case very expensive and should only be
> considered in cases where there's a proven problem with excessive
> context switches.
I'll measure the performance of both versions, and of some variations.
It's not the size of the locked item which causes contention. It's how
often and how long the lock is needed. And how irregularly, in
particular if you are looking to minimize worst-case behavior. Now, if
you had a million threads all contending for 10 array elements...
> (...)
> I'll measure the performance of both versions, and of some variations.
Yes, check if you have a problem before spending time on a nontrivial
solution.
--
Hallvard
> "David Schwartz" write:
> > Then one big lock is probably the best solution, at least until you
> > can prove that you will really have a contention problem.
> Hmm. I would say the rule is exactly the contrary:
> ie. the more a lock covers (the coarser the granularity), the more lock contention there will be.
> OTOH, finer granularity reduces lock contention.
Smaller locks reduce contention but add cost. So you shouldn't use
them unless you can prove that you really will have a contention
problem.
There is also a common misunderstanding of what locks typically
actually do in the real world. Simply put, most application-level
locks typically deschedule threads that tend to contend so that
instead threads that tend not to contend (whether from the same
process or not) run in parallel.
For example, if you have four threads, A1, A2, B1, and B2, and two
cores, if threads A1 and A2 tend to contend and B1 and B2 tend to
contend, normal application locks will tend to make A1 and A2 not run
at the same time. Instead, A1 and B1 will tend to run at the same
time, and there will be very little contention. (This is true whether
A1 and A2 are one process and B1 and B2 another or not.)
Locks minimize contention by not allowing threads that tend to contend
to be concurrently scheduled. It is very critical that you understand
that when you analyze what locks do and how they work. (Of course,
this doesn't help you if every single ready-to-run thread contends
with another ready-to-run thread or there are more cores than threads
that tend not to contend with each other.)
DS
For efficiency, I would use fine-grain locking for writers, and RCU for
readers. Forget RCU for now... Anyway, here is one way you can begin to
implement the writer and/or reader locking; quick and dirty example code
which compiles:
___________________________________________________________________
#define _XOPEN_SOURCE 600
#include <set>
#include <cstdio>
#include <cstddef>
#include <pthread.h>
// map a pointer to an index into a lock table of mp_depth slots
#define HASH_PTR(mp_ptr, mp_depth) ((std::size_t) \
(((std::size_t)(mp_ptr)) % (mp_depth)) \
)
#define ARRAY_SIZE(mp_this) ( \
sizeof(mp_this) / sizeof((mp_this)[0]) \
)
#define RWMUTEX_PLACE_6 \
PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER, \
PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER, \
PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER
#define RWMUTEX_PLACE_48 \
RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, \
RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6
// RAII guards: acquire the lock in the constructor, release in the destructor
template<typename T>
class wrlock_guard {
T& m_mutex;
public:
wrlock_guard(T& mutex) : m_mutex(mutex) {
m_mutex.wrlock();
}
~wrlock_guard() {
m_mutex.unlock();
}
};
template<typename T>
class rdlock_guard {
T& m_mutex;
public:
rdlock_guard(T& mutex) : m_mutex(mutex) {
m_mutex.rdlock();
}
~rdlock_guard() {
m_mutex.unlock();
}
};
namespace rwmtxtbl {
namespace detail {
// 288 statically initialized rwlocks; object addresses hash onto these
static pthread_rwlock_t table[] = {
RWMUTEX_PLACE_48, RWMUTEX_PLACE_48, RWMUTEX_PLACE_48,
RWMUTEX_PLACE_48, RWMUTEX_PLACE_48, RWMUTEX_PLACE_48
};
}
class locker {
// std::set orders the locks by address (a total locking order) and
// drops duplicates; that is what makes multi-object locking deadlock-free
std::set<pthread_rwlock_t*> m_locks;
public:
typedef wrlock_guard<locker> wrguard;
typedef rdlock_guard<locker> rdguard;
void add(void const* ptr) {
m_locks.insert(&detail::table[HASH_PTR(ptr,
ARRAY_SIZE(detail::table))]);
}
void reset() {
m_locks.clear();
}
void wrlock() {
std::set<pthread_rwlock_t*>::iterator i;
for (i = m_locks.begin(); i != m_locks.end(); ++i) {
pthread_rwlock_wrlock(*i);
std::printf("write locked rdmutex %p\n", (void*)*i);
}
}
void rdlock() {
std::set<pthread_rwlock_t*>::iterator i;
for (i = m_locks.begin(); i != m_locks.end(); ++i) {
pthread_rwlock_rdlock(*i);
std::printf("read locked rdmutex %p\n", (void*)*i);
}
}
void unlock() {
// release in reverse order of acquisition
std::set<pthread_rwlock_t*>::reverse_iterator i;
for (i = m_locks.rbegin(); i != m_locks.rend(); ++i) {
pthread_rwlock_unlock(*i);
std::printf("unlocked rdmutex %p\n", (void*)*i);
}
}
};
}
struct foo1 {};
struct foo2 {};
static foo1 g_foo1_1, g_foo1_2, g_foo1_3, g_foo1_4;
static foo2 g_foo2_1, g_foo2_2, g_foo2_3, g_foo2_4;
int main() {
rwmtxtbl::locker locks;
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
locks.add(&g_foo1_1);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::wrguard lock(locks);
std::puts("-->[write critical-section for all g_foo1_N objects]");
}
std::puts("---------------------------------");
locks.reset();
locks.add(&g_foo2_1);
locks.add(&g_foo2_4);
locks.add(&g_foo2_2);
locks.add(&g_foo2_3);
{
rwmtxtbl::locker::rdguard lock(locks);
std::puts("-->[read critical-section for all g_foo2_N objects]");
}
std::puts("---------------------------------");
locks.add(&g_foo1_1);
locks.add(&g_foo1_2);
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
{
rwmtxtbl::locker::wrguard lock(locks);
std::puts("-->[write critical-section for all g_foo1/2_N objects]");
}
return 0;
}
___________________________________________________________________
This will allow you to acquire multiple read/write locks at once without
fear of deadlocking; all of the ordering is handled for you. You can create
lock-based transactional memory using that code. However, please take
David's advice to heart. You may not even need something like this. Also, it
can sometimes be fairly tricky to use correctly. You're probably
going to have to implement your own special vector class and directly
integrate this locking scheme into it; you can read here for some further
context:
http://msdn.microsoft.com/en-us/magazine/cc301503.aspx
Also, here is an algorithm that allows one to use the hashed locking scheme
to build portable POSIX-based doubly-linked lists:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/776f6842784072f2
You can definitely build portable concurrent doubly-linked lists using the
example C++ code above.
How are you planning to implement your transaction layer? What about safe
memory reclamation? The hashed locks can help with both...
Hi Chris,
can you comment on the hash stuff, because I don't understand
why you use an array of 288 mutices in detail::table.
I must say I'm suspicious about the correctness of the above code.
For example: a write lock should be possible only if there are no readlocks.
But I don't see where you have checked for this.
What about recursive locks? Is it possible or not?
What about write-locking and then read-locking the same object by the same thread?
And, what do you mean by "safe memory reclamation"?
Is this too part of the locking problem here?
What are the limits of your "hashed locks"? When can they fail?
Max how many threads can use it?
What about pthread_rwlock_XXlock() failures? Can't they happen? There is no error checking.
And regarding MS code: mostly it is nonportable and/or useless buggy crrrrap for the trashcan :-)
not worth studying or wasting your valuable time on...
This is the main mutex table. Pointers to objects get hashed to values which
are then used as indexes into said table.
> I must say I'm suspicious about the correctness of the above code.
It's a quick and dirty example; I omitted all error checking of pthread API
calls for brevity.
> For example: a write lock should be possible only if there are no
> readlocks.
Sure.
> But I don't see where you have checked for this.
Huh? I don't have to check for this because the `pthread_rwlock_rd/wrlock()'
APIs already take care of all the internal locking details:
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_rwlock_wrlock.html
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_rwlock_rdlock.html
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_rwlock_unlock.html
The code allows you to build lock sets containing multiple objects and
acquire read or write access on all of them in a single deadlock-free
operation. All ordering and duplication concerns are handled under the
covers.
> What about recursive locks? Is it possible or not?
AFAICT, there is no attribute one can set in order to make calls to
`pthread_rwlock_wrlock()' recursive. However, `pthread_rwlock_rdlock()' is
documented to be recursive by nature. Also, why are you worried about write
lock recursion? The need for recursion is usually a sign of a bad design.
> What about write-locking and then read-locking the same object by the same
> thread?
POSIX Threads do not allow that; it's documented to produce undefined
behavior. Or, an implementation may return `EDEADLK'; but it's a "may" clause,
which means an implementation does not have to honor it.
> And, what do you mean by "safe memory reclamation"?
This pertains to how you are freeing the memory which makes up the objects
in the vector.
> Is this too part of the locking problem here?
It completely depends on how you make use of it.
> What are the limits of your "hashed locks"? When can they fail?
The particular algorithm I posted is deadlock-free in the sense that it
follows a total locking order and removes duplicates from locking sets. You
just need to follow the rules of the POSIX Threads rwlock API: you can't
acquire recursive write locks.
> Max how many threads can use it?
Any number of threads can use it. However, POSIX Threads allows an
implementation to put a limit on the number of read locks that can be
acquired at any one time. This number is usually ULONG_MAX or something that
will handle a ridiculous amount of threads.
> What about pthread_rwlock_XXlock() failures? Can't they happen?
`pthread_rwlock_wrlock()' __may__ return `EDEADLK' if the calling thread
already owns read/write access. `pthread_rwlock_rdlock()' __may__ return
`EDEADLK' if the calling thread already owns write access; it also __may__
return `EAGAIN' if too many read locks are acquired. They both __may__
return `EINVAL' if you pass an uninitialized object to the APIs.
> There is no error checking.
I omitted it for brevity. You can certainly make it throw exceptions on
'EDEADLK' or 'EAGAIN'. For `EINVAL', which should never show up, I would
simply call `std::unexpected()'.
> And regarding MS code: mostly it is nonportable and/or useless buggy
> crrrrap for the trashcan :-)
> not worth studying or wasting your valuable time on...
If you had read the article, you would know what hashed locking is,
and how it can be applied to your problem. It's nothing new, and MS certainly
did not invent it.
By taking advantage of a hash locking scheme, one can amortize the number of
locks needed for any size collection. You can have a million entries, but
you will still only have 288 mutexes. It saves you from having to store a
mutex in each entry. To lock/unlock an entry you simply hash its address to
a value which is then used as an index into the mutex array. You can also
use the scheme to implement lock striping algorithms.
Also, hash locking has the advantage of keeping mutexes separate from the
objects they protect. This can be very useful wrt destroying objects, because
sometimes you find yourself needing to atomically unlock and destroy an
object.
Nice, but can you tell me why the following test code crashes with a SEGV:
//--------------------------------------------------------------------
// <...lib code...>
//--------------------------------------------------------------------
// TEST CODE:
struct foo1 {};
struct foo2 {};
static foo1 g_foo1_1, g_foo1_2, g_foo1_3, g_foo1_4;
static foo2 g_foo2_1, g_foo2_2, g_foo2_3, g_foo2_4;
struct TST
{
pthread_t tid;
//...
};
void* MyThread1(void* Ap)
{
TST S = *((TST*) Ap);
rwmtxtbl::locker locks;
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
locks.add(&g_foo1_1);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::rdguard lock(locks);
std::printf("-->[Thread %u can do its read operations here]\n", S.tid);
#if 1
// simulate some more work (for task-switching and testing the other threads...)
sleep(1);
#endif
}
return 0;
}
void* MyThread2(void* Ap)
{
TST S = *((TST*) Ap);
rwmtxtbl::locker locks;
locks.add(&g_foo1_3);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::rdguard lock(locks);
std::printf("-->[Thread %u can do its read operations here]\n", S.tid);
#if 1
// simulate some more work (for task-switching and testing the other threads...)
sleep(1);
#endif
}
return 0;
}
int main()
{
const int nThreadSets = 3;
const int nThreadSetSize = 2; // 2 threads per set
std::vector<TST> aS(nThreadSets * nThreadSetSize);
int i;
for (i = 0; i < nThreadSets; i += nThreadSetSize)
{
pthread_create(&aS[i + 0].tid, 0, MyThread1, &aS[i + 0]);
pthread_create(&aS[i + 1].tid, 0, MyThread2, &aS[i + 1]);
}
// wait for all threads finish:
for (i = 0; i < aS.size(); ++i)
pthread_join(aS[i].tid, NULL);
return 0;
}
/*
Sample output: I've added some debug code to the lib code
$ g++ -m32 chris.cpp -lpthread -lrt
$ ./a.out
N=288 ix=128 &table[ix]=0x804c6c0
N=288 ix=96 &table[ix]=0x804c2c0
N=288 ix=128 &table[ix]=0x804c6c0
N=288 ix=96 &table[ix]=0x804c2c0
N=288 ix=96 &table[ix]=0x804c2c0
N=288 ix=129 &table[ix]=0x804c6e0
N=288 ix=97 &table[ix]=0x804c2e0
Thread 4156795792: read locked rdmutex 0x804c2c0
Thread 4156795792: read locked rdmutex 0x804c2e0
Thread 4156795792: read locked rdmutex 0x804c6c0
Thread 4156795792: read locked rdmutex 0x804c6e0
N=288 ix=97 &table[ix]=0x804c2e0
N=288 ix=97 &table[ix]=0x804c2e0
-->[Thread 4156795792 can do its read operations here]
N=288 ix=96 &table[ix]=0x804c2c0
N=288 ix=129 &table[ix]=0x804c6e0
Thread 4131629968: read locked rdmutex 0x804c2c0
Thread 4131629968: read locked rdmutex 0x804c2e0
Thread 4148407184: read locked rdmutex 0x804c2c0
Thread 4148407184: read locked rdmutex 0x804c2e0
-->[Thread 4148407184 can do its read operations here]
N=288 ix=97 &table[ix]=0x804c2e0
-->[Thread 4131629968 can do its read operations here]
Thread 4140018576: read locked rdmutex 0x804c2c0
Thread 4140018576: read locked rdmutex 0x804c2e0
Thread 4140018576: read locked rdmutex 0x804c6c0
Thread 4140018576: read locked rdmutex 0x804c6e0
-->[Thread 4140018576 can do its read operations here]
Thread 4156795792: unlocked rdmutex 0x804c6e0
Thread 4156795792: unlocked rdmutex 0x804c6c0
Thread 4156795792: unlocked rdmutex 0x804c2e0
Thread 4156795792: unlocked rdmutex 0x804c2c0
Thread 4148407184: unlocked rdmutex 0x804c2e0
Thread 4148407184: unlocked rdmutex 0x804c2c0
Thread 4131629968: unlocked rdmutex 0x804c2e0
Thread 4131629968: unlocked rdmutex 0x804c2c0
Thread 4140018576: unlocked rdmutex 0x804c6e0
Thread 4140018576: unlocked rdmutex 0x804c6c0
Thread 4140018576: unlocked rdmutex 0x804c2e0
Thread 4140018576: unlocked rdmutex 0x804c2c0
Segmentation fault
$
*/
> int main()
> {
> const int nThreadSets = 3;
> const int nThreadSetSize = 2; // 2 threads per set
> std::vector<TST> aS(nThreadSets * nThreadSetSize);
> int i;
> for (i = 0; i < nThreadSets; i += nThreadSetSize)
must be:
for (i = 0; i < aS.size(); i += nThreadSetSize)
There is another problem. You simply cannot portably apply `printf()' to a
`pthread_t' opaque data-structure; I notice you're doing that here:
std::printf("-->[Thread %u can do its read operations here]\n", S.tid);
where `S' is an instance of the data-structure:
struct TST {
pthread_t tid;
//...
};
About the most portable thing you can do, if you really want to do it, would
be something nasty like:
union TST {
pthread_t tid;
unsigned uid;
};
Also, you have a nasty race condition. You assign the value to `TST::tid' in
`pthread_create()'; however, you pass it over to the thread function. Did
you know that the store to `TST::tid' can occur AFTER the thread starts?
Here, let me show you how I would re-create a quick and dirty version of
your program; I will use the nasty hack to `printf()' `pthread_t'
structures:
PLEASE NOTE!!!
I include <windows.h> in order to get `Sleep(1000)'. I don't want to boot
into Linux right now. I am using `pthread_win32' emulation:
http://sourceware.org/pthreads-win32
to resolve <pthread.h> on windows.
_____________________________________________________________________________
// snip [lib code]
/* Application Code
______________________________________________________________*/
#include <vector>
#include <cstdio>
#include <windows.h> // for Sleep(1000)
struct TST {
pthread_t tid;
//...
};
union TID {
pthread_t tid;
unsigned long uid;
};
struct foo1 {};
struct foo2 {};
static foo1 g_foo1_1, g_foo1_2, g_foo1_3, g_foo1_4;
static foo2 g_foo2_1, g_foo2_2, g_foo2_3, g_foo2_4;
void* MyThread1(void* Ap) {
rwmtxtbl::locker locks;
union TID tid;
tid.tid = pthread_self();
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
locks.add(&g_foo1_1);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::rdguard lock(locks);
std::printf("-->[Thread %lu can do its read operations here]\n",
tid.uid);
Sleep(1000);
}
std::printf("-->[Thread %lu has unlocked its read operations here]\n",
tid.uid);
return 0;
}
void* MyThread2(void* Ap) {
rwmtxtbl::locker locks;
union TID tid;
tid.tid = pthread_self();
locks.add(&g_foo1_3);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::rdguard lock(locks);
std::printf("-->[Thread %lu can do its read operations here]\n",
tid.uid);
Sleep(1000);
}
std::printf("-->[Thread %lu has unlocked its read operations here]\n",
tid.uid);
return 0;
}
int main() {
{
const int nThreadSets = 3;
const int nThreadSetSize = 2; // 2 threads per set
std::vector<TST> aS(nThreadSets * nThreadSetSize);
unsigned i;
for (i = 0; i < aS.size(); i += nThreadSetSize)
{
pthread_create(&aS[i + 0].tid, 0, MyThread1, NULL);
pthread_create(&aS[i + 1].tid, 0, MyThread2, NULL);
}
// wait for all threads finish:
for (i = 0; i < aS.size(); ++i)
pthread_join(aS[i].tid, NULL);
}
std::puts("\n\n\nPress <ENTER> To Exit...\n");
std::fflush(stdout);
std::getchar();
return 0;
}
_____________________________________________________________________________
Any thoughts?
_____________________________________________________________________
// snip lib code
struct foo1 {};
struct foo2 {};
static foo1 g_foo1_1, g_foo1_2, g_foo1_3, g_foo1_4;
void* MyThread1(void* Ap) {
rwmtxtbl::locker locks;
union TID tid;
tid.tid = pthread_self();
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
locks.add(&g_foo1_1);
locks.add(&g_foo1_2);
{
rwmtxtbl::locker::wrguard lock(locks);
std::printf("-->[Thread %lu can do its write operations here]\n",
tid.uid);
Sleep(1000);
}
std::printf("-->[Thread %lu has unlocked its write operations here]\n",
tid.uid);
return 0;
}
void* MyThread2(void* Ap) {
rwmtxtbl::locker locks;
union TID tid;
tid.tid = pthread_self();
locks.add(&g_foo1_2);
locks.add(&g_foo1_1);
locks.add(&g_foo1_4);
locks.add(&g_foo1_3);
{
rwmtxtbl::locker::wrguard lock(locks);
std::printf("-->[Thread %lu can do its write operations here]\n",
tid.uid);
Sleep(1000);
}
std::printf("-->[Thread %lu has unlocked its write operations here]\n",
tid.uid);
return 0;
}
int main() {
{
const int nThreadSets = 3;
const int nThreadSetSize = 2; // 2 threads per set
std::vector<TST> aS(nThreadSets * nThreadSetSize);
unsigned i;
for (i = 0; i < aS.size(); i += nThreadSetSize)
{
pthread_create(&aS[i + 0].tid, 0, MyThread1, NULL);
pthread_create(&aS[i + 1].tid, 0, MyThread2, NULL);
}
// wait for all threads finish:
for (i = 0; i < aS.size(); ++i)
pthread_join(aS[i].tid, NULL);
}
std::puts("\n\n\nPress <ENTER> To Exit...\n");
std::fflush(stdout);
std::getchar();
return 0;
}
_____________________________________________________________________
Now `MyThread2' takes write locks on all the objects that `MyThread1' takes,
but in a different order. The very crude and quick hash-locking code I
posted takes care of the ordering and just makes the locking work.
struct TSThreadPar
{
volatile bool fGoOn;
int mid; // my id: 1..n, 0=main thread
pthread_t tid;
//...
TSThreadPar()
{
Init();
}
void Init(int Amid = 0, pthread_t Atid = 0)
{
fGoOn = false;
mid = Amid;
tid = Atid;
}
};
void* MyThread(void* ApThreadPar)
{
TSThreadPar* pSThreadPar_tmp = (TSThreadPar*) ApThreadPar; // will be invalid when parent creates the next thread...
TSThreadPar SThreadPar = *pSThreadPar_tmp; // ...therefore we make a local copy
SThreadPar.fGoOn = pSThreadPar_tmp->fGoOn = true; // parent waits for this flag...
//...
std::printf("-->[Thread %d can do its xxx operations here]\n", SThreadPar.mid);
//...
and in main():
#include <unistd.h> // usleep()
int main()
{
const int nThreadSets = 3;
const int nThreadSetSize = 2; // 2 threads per set
std::vector<TSThreadPar> aS(nThreadSets * nThreadSetSize);
int i;
for (i = 0; i < aS.size(); i += nThreadSetSize)
{
TSThreadPar& S1 = aS[i];
S1.Init(1 + i);
pthread_create(&S1.tid, 0, MyThread1, &S1);
while (!S1.fGoOn)
usleep(10 * 1000); // 10 ms
TSThreadPar& S2 = aS[i + 1];
S2.Init(1 + i + 1);
pthread_create(&S2.tid, 0, MyThread2, &S2);
while (!S2.fGoOn)
usleep(10 * 1000); // 10 ms
}
//...
Another observation: since pointers are usually word-aligned (unless pragma pack was used),
only every sizeof(void*)'th slot of the hash table is effectively used.
Ie. your table has 288 elements, but only 1/4 (=72) of them get used in 32-bit mode,
and only 36 in 64-bit mode.
And: of course the minimum table size must be >= sizeof(void*)
And: one should call "pthread_rwlock_destroy(...)" for the table elements before quitting the program.
Need to read through that bird's nest.
> Another observation: since pointers are usually word-aligned (unless
> pragma pack was used),
> only every sizeof(void*)'th slot of the hash table is effectively
> used.
> Ie. your table has 288 elements, but only 1/4 (=72) of them get used in
> 32-bit mode,
> and only 36 in 64-bit mode.
YES. The please read m responses... The ode qas quick and dirty! You can
most certainly come up with a MUCH better hash algorithm. Spread the wealth
indeed!
> And: of course the minimum table size must be >= sizeof(void*)
Why? Don't lock into my example which made use of simple MOD on address
values!
> And: one should call "pthread_rwlock_destroy(...)" for the table
> elements before quitting the program.
Ummm, did you notice that I was using ___static___ rwlocks? On Windows
pthread-win32 _____emulation____, you're correct; however, you're totally wrong
at the same time; think about it for a moment...
I will get back to you later; good bye for now.
SHIT!!!! RETARD ME!!!!
The above should read as:
YES; please read _all_ of my responses... The Code Was quick and dirty! You
can most certainly come up with a MUCH better hash algorithm; spread the
One correction before I move on to actual work, `i' should be unsigned;
period.
Why do you ___SEEM___ to want to get into a pointless quarrel instead of
learning? I gave you a hash-lock scheme. This is NOTHING NEW. However, it
MIGHT possibly be able to help you solve your locking problem...
Please research hash locking, stripe locking, partition locking, etc....
please! Don't research RCU, I did not mean to mention that at this stage!
Sorry about that non-sense!!!!!!!
:^(
Bird's nest in the sense that my retarded newsreader mangled your post!
;^(....
If you all can please find it within your hearts to actually cut through
some of the non-sense posts I made, perhaps one can augment the following
CRUDE/DIRTY example code I posted here:
http://groups.google.com/group/comp.programming.threads/msg/3216191ef3caa87b
to a point where its "production" ready. Adem noticed that the hash
algorithm is __100%__ CRAP! It definitely can be improved upon in every way
shape, or form. Perhaps by multiplying by a couple of hundred and shifting
right by eight or so positions...
Adem, I am sorry for coming across as a complete as%hole in some of those
posts! I apologize.
Anyway, I really do think that some of the existing hash/partition locking
scheme(s) can help solve your problem.
No, no :-) I meant code from MS.
Regarding your code: although the hash algo has some limits, it is still very effective and useful.
> It definitely can be improved upon in every way, shape, or form.
> Perhaps by multiplying by a couple of hundred and shifting
> right by eight or so positions...
> Adem, I am sorry for coming across as a complete as%hole in some of those
> posts! I apologize.
No, Chris, your code, and especially the two ideas of a) hashing and b) using
a set<>, are clever and useful; they gave me interesting inspirations.
I apologize for my initial comments, your code is ok. Thanks for sharing it.
> Anyway, I really do think that some of the existing hash/partition locking
> scheme(s) can help solve your problem.
Locking is a very interesting, challenging, and tricky problem.
I like the idea behind rwlocks; as a next step I'll think about RCU.
The goal is of course to get the fastest locking method.
Unfortunately I found a major bug in the algorithm/method:
Since hashing can lead to hash collisions (ie. due to the use
of the modulo function, the same slot can get mapped
for 2 or more different addresses; I actually can even reproduce this),
adding the result to the set<> of course will fail!
One can of course replace the set<> by a multiset<>.
But then the following happens:
[4156918672] ptr=0x804b750 ix=0
[4156918672] ptr=0x804b754 ix=4
[4156918672] ptr=0x804b758 ix=8
[4156918672] ptr=0x804b75c ix=2
[4156918672] ptr=0x804b760 ix=6
[4156918672] ptr=0x804b764 ix=0 <--- the dupe slot
lock ix=0 rc=0
lock ix=0 rc=35 <--- returned by pthread_rwlock_wrlock()
lock ix=2 rc=0
lock ix=4 rc=0
lock ix=6 rc=0
lock ix=8 rc=0
If one could initialize the mutices for recursive rw-locking
then maybe it could work.
But I'm not sure if recursive rwlocks are possible at all. Does anyone know more?
But I must admit I no longer trust the use of hashing for this job (cf. subject)... :-)
AFAICT, it's not a problem, because pointers to the actual hashed rwlocks get
added to the set, and a set does not allow duplicates. The fact that the set
removes duplicates makes hash collisions a moot point. Think about it for a
moment. One way to test this would be to add the same object to a locker
multiple times in order to force hash collisions:
______________________________________________________________
// snip lib code
struct foo {};
static foo g_foo1, g_foo2;
int main() {
rwmtxtbl::locker locks;
locks.add(&g_foo1);
locks.add(&g_foo2);
locks.add(&g_foo1);
locks.add(&g_foo2);
locks.add(&g_foo1);
locks.add(&g_foo2);
locks.add(&g_foo1);
{
rwmtxtbl::locker::wrguard lock(locks);
}
std::getchar();
return 0;
}
______________________________________________________________
AFAICT, this is perfectly legitimate and will cause no problems.
Even if `&g_foo1' and `&g_foo2' hashed to the exact same rwlock, well, that
means they share the same lock. This is fine.
Nope, Chris,
you must replace the hash macro by the following
"perfect hash function" :-) of mine:
template <typename T>
class TCPerfectDynamicHash
{
std::vector<T> v;
public:
TCPerfectDynamicHash() {}
size_t val2ix(T& Aval) // the hash function :-)
{
int ix = find(Aval);
if (ix >= 0)
return ix;
v.push_back(Aval);
return v.size() - 1;
}
int find(T& Aval)
{
for (int i = 0; i < (int)v.size(); ++i)
if (v[i] == Aval)
return i;
return -1;
}
void clear()
{
v.clear();
}
};
typedef TCPerfectDynamicHash<void const*> TCPtrHash;
Just study what it does... :-)
With this modification the mutex pool is used 'evenly' and
no dupes can ever happen, as long as the number of locked objects
per thread stays below the capacity of the mutex pool. Ie. in our example
we have 288 mutices in the pool, therefore each thread can lock
at most 288 items at the same time.
You can change this number to any number you like (> 0 of course).
And: if you replace your set<> by a multiset<> then the above limitation even disappears.
That means: you can then lock any number of items using the mutex pool.
But I'm not yet finished testing this feature.
If two objects hash to the same rwlock then you only have to take read/write
access one time and _both_ objects are protected. The set sorts and removes
duplicates so you can get around recursion. I don't see why this is bad.
One rwlock serving more than one object is the end effect of a hash collision;
this is fine with me.
[...]
This would be called "false contention". AFAICT, it does not affect
correctness, it only decreases overall locking granularity. The number of
hash collisions is proportional to the locking granularity level. Poor
locking granularity does not result in incorrect locking algorithms.
So, basically it seems that your perfect hashing would have high
granularity, which can be a good thing wrt scalability and concurrency.
Your set<> works thread-locally, ie. each thread has its own set<>, doesn't it?...
Ie. what you say above works only within the same thread, doesn't it?
> One rwlock serving more than one object is the end effect of a hash collision;
> this is fine with me.
But consider this scenario:
Thread1:
ix = hash(obj1), writelockobjvia(ix)
...work...
unlockobjvia(ix)
Thread2:
ix = hash(obj1), writelockobjvia(ix)
...work...
unlockobjvia(ix)
Or this scenario where let's say hash(obj1) == hash(obj2):
Thread1:
ix = hash(obj1), writelockobjvia(ix)
...work...
unlockobjvia(ix)
Thread2:
ix = hash(obj2), writelockobjvia(ix)
...work...
unlockobjvia(ix)
What do you think will happen in these cases? :-)
BTW, by default rdlocks seem to be recursive.
But, wrlocks seem to be non-recursive.
The net effect is: no problems with rdlocks, but big problems with wrlocks.
(If someone knows whether recursive write-locks are possible
in pthreads/NPTL let me know pls)
Both cases are fine and deadlock-free. Nothing bad will happen. The latter
case has false contention between obj1 and obj2 because they share the same
lock. This is perfectly fine. BTW, both cases exhibit absolutely no recursion.
> BTW, by default rdlocks seem to be recursive.
> But, wrlocks seem to be non-recursive.
They are compatible with the algorithm because it removes duplicates, and
therefore there is no recursion. Keep in mind that recursion is on a
per-thread basis. A thread has to take a lock more than once. The set
eliminates that possibility because every rwlock in a set is unique, so a
thread never tries to lock the same lock more than once.
> The net effect is: no problems with rdlocks, but big problems with
> wrlocks.
I don't think so.
> (If someone knows whether recursive write-locks are possible
> in pthreads/NPTL let me know pls)
Yeah. The easiest way would be to do something like:
struct rwlock {
unsigned recurse; /* = 0 */
pthread_t owner; /* meaningful only while recurse > 0 */
pthread_mutex_t recurse_lock;
pthread_rwlock_t lock;
};
void rwlock_wrlock(struct rwlock* const self) {
pthread_t const me = pthread_self();
pthread_mutex_lock(&self->recurse_lock);
if (self->recurse && pthread_equal(self->owner, me)) {
/* this thread already holds write access; just bump the count */
++self->recurse;
pthread_mutex_unlock(&self->recurse_lock);
return;
}
pthread_mutex_unlock(&self->recurse_lock);
pthread_rwlock_wrlock(&self->lock);
pthread_mutex_lock(&self->recurse_lock);
self->owner = me;
self->recurse = 1;
pthread_mutex_unlock(&self->recurse_lock);
}
void rwlock_wrunlock(struct rwlock* const self) {
pthread_mutex_lock(&self->recurse_lock);
if (! --self->recurse) {
pthread_rwlock_unlock(&self->lock);
}
pthread_mutex_unlock(&self->recurse_lock);
}
void rwlock_rdlock(struct rwlock* const self) {
pthread_rwlock_wrlock(&self->lock);
}
void rwlock_unlock(struct rwlock* const self) {
pthread_rwlock_unlock(&self->lock);
}
This is 100% POSIX compliant.
should read as:
pthread_rwlock_rdlock(&self->lock);
of course!
Perhaps something like:
#define HASH_PTR(mp_ptr, mp_depth) ( \
((((std::size_t)(mp_ptr)) * 309) >> 8) % (mp_depth) \
)
That might be better.
There is another problem with your code:
> void* MyThread1(void* Ap)
> {
[...]
>
> void* MyThread2(void* Ap)
[...]
should be extern "C"
here is "proof" that there is nothing wrong:
_________________________________________________________________
/* Hash-Lock Library Code (tweaked)
______________________________________________________________*/
#define _XOPEN_SOURCE 600
#include <set>
#include <cstdio>
#include <cstddef>
#include <pthread.h>
// multiply-and-shift spreads word-aligned pointers over the table;
// the & mask requires mp_depth to be a power of two
#define HASH_PTR(mp_ptr, mp_depth) ( \
((((std::size_t)(mp_ptr)) * 309) >> 8) & ((mp_depth) - 1) \
)
#define ARRAY_SIZE(mp_this) ( \
sizeof(mp_this) / sizeof((mp_this)[0]) \
)
#define RWMUTEX_PLACE_2 \
PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER
#define RWMUTEX_PLACE_6 \
RWMUTEX_PLACE_2, RWMUTEX_PLACE_2, RWMUTEX_PLACE_2
#define RWMUTEX_PLACE_48 \
RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, \
RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6, RWMUTEX_PLACE_6
#define RWMUTEX_PLACE_288 \
RWMUTEX_PLACE_48, RWMUTEX_PLACE_48, RWMUTEX_PLACE_48, \
RWMUTEX_PLACE_48, RWMUTEX_PLACE_48, RWMUTEX_PLACE_48
template<typename T>
class wrlock_guard {
T& m_mutex;
public:
wrlock_guard(T& mutex) : m_mutex(mutex) {
m_mutex.wrlock();
}
~wrlock_guard() {
m_mutex.unlock();
}
};
template<typename T>
class rdlock_guard {
T& m_mutex;
public:
rdlock_guard(T& mutex) : m_mutex(mutex) {
m_mutex.rdlock();
}
~rdlock_guard() {
m_mutex.unlock();
}
};
namespace rwmtxtbl {
namespace detail {
#define RWMUTEX_PLACE RWMUTEX_PLACE_2
static pthread_rwlock_t table[] = {
RWMUTEX_PLACE
};
}
class locker {
std::set<pthread_rwlock_t*> m_locks;
public:
typedef wrlock_guard<locker> wrguard;
typedef rdlock_guard<locker> rdguard;
void add(void const* ptr) {
m_locks.insert(&detail::table[HASH_PTR(ptr,
ARRAY_SIZE(detail::table))]);
}
void reset() {
m_locks.clear();
}
void wrlock() {
std::set<pthread_rwlock_t*>::iterator i;
for (i = m_locks.begin(); i != m_locks.end(); ++i) {
pthread_rwlock_wrlock(*i);
std::printf("write locked rdmutex %p\n", (void*)*i);
}
}
void rdlock() {
std::set<pthread_rwlock_t*>::iterator i;
for (i = m_locks.begin(); i != m_locks.end(); ++i) {
pthread_rwlock_rdlock(*i);
std::printf("read locked rdmutex %p\n", (void*)*i);
}
}
void unlock() {
std::set<pthread_rwlock_t*>::reverse_iterator i;
for (i = m_locks.rbegin(); i != m_locks.rend(); ++i) {
pthread_rwlock_unlock(*i);
std::printf("unlocked rdmutex %p\n", (void*)*i);
}
}
};
}
/* Application Code
______________________________________________________________*/
#include <cassert>
#include <exception> // for std::unexpected()
#include <sched.h>
#if defined(WIN32) || defined(WIN64)
# define WIN32_LEAN_AND_MEAN
# include <windows.h> // for Sleep()
# define SLEEP(mp_seconds) Sleep((mp_seconds) * 1000)
#else
# include <unistd.h> // for sleep()
# define SLEEP(mp_seconds) sleep((mp_seconds))
#endif
#define RUNS 2U
#define READERS 4U
#define WRITERS 2U
#define THREADS (READERS + WRITERS)
#define ITERATIONS 128U
#define ITERATION_YIELD 2U
#define LOCK_YIELD 32U
struct foo {
unsigned i;
foo() : i(0) {}
};
static foo g_foo1_1, g_foo1_2, g_foo1_3, g_foo1_4;
static foo g_foo2_1, g_foo2_2, g_foo2_3, g_foo2_4;
static void mutate() {
++g_foo1_1.i;
++g_foo1_2.i;
++g_foo1_3.i;
++g_foo1_4.i;
++g_foo2_1.i;
++g_foo2_2.i;
++g_foo2_3.i;
++g_foo2_4.i;
}
static void verify() {
if (g_foo1_1.i % 2 || g_foo1_1.i != g_foo1_2.i ||
g_foo1_2.i % 2 || g_foo1_2.i != g_foo1_3.i ||
g_foo1_3.i % 2 || g_foo1_3.i != g_foo1_4.i ||
g_foo1_4.i % 2 ||
g_foo2_1.i % 2 || g_foo1_1.i != g_foo2_1.i ||
g_foo2_2.i % 2 || g_foo1_2.i != g_foo2_2.i ||
g_foo2_3.i % 2 || g_foo1_3.i != g_foo2_3.i ||
g_foo2_4.i % 2 || g_foo1_4.i != g_foo2_4.i) {
assert(! (g_foo1_1.i % 2) && g_foo1_1.i == g_foo1_2.i);
assert(! (g_foo1_2.i % 2) && g_foo1_2.i == g_foo1_3.i);
assert(! (g_foo1_3.i % 2) && g_foo1_3.i == g_foo1_4.i);
assert(! (g_foo1_4.i % 2));
assert(! (g_foo2_1.i % 2) && g_foo1_1.i == g_foo2_1.i);
assert(! (g_foo2_2.i % 2) && g_foo1_2.i == g_foo2_2.i);
assert(! (g_foo2_3.i % 2) && g_foo1_3.i == g_foo2_3.i);
assert(! (g_foo2_4.i % 2) && g_foo1_4.i == g_foo2_4.i);
std::unexpected();
}
}
extern "C" void* thread_entry(void* state) {
unsigned const id = (unsigned)(std::size_t)state; // thread index passed via the void* arg
rwmtxtbl::locker locks;
locks.add(&g_foo1_1);
locks.add(&g_foo2_2);
locks.add(&g_foo1_3);
locks.add(&g_foo2_4);
locks.add(&g_foo1_2);
locks.add(&g_foo2_1);
locks.add(&g_foo1_4);
locks.add(&g_foo2_3);
for (unsigned i = 0, y = 1; i < ITERATIONS; ++i, ++y) {
if (id < READERS) {
{
rwmtxtbl::locker::rdguard rdlock(locks);
std::printf("(%u)->thread_entry: read locked\n", id);
verify();
if (y <= LOCK_YIELD) {
sched_yield();
} else {
y = 0;
SLEEP(1);
}
verify();
}
std::printf("(%u)->thread_entry: read unlocked\n", id);
} else {
{
rwmtxtbl::locker::wrguard wrlock(locks);
std::printf("(%u)->thread_entry: write locked\n", id);
mutate();
if (y <= LOCK_YIELD) {
sched_yield();
} else {
y = (unsigned)-1;
SLEEP(1);
}
mutate();
}
std::printf("(%u)->thread_entry: write unlocked\n", id);
}
if (! (i % ITERATION_YIELD)) {
sched_yield();
}
}
return 0;
}
int main() {
pthread_t tid[THREADS];
for (unsigned r = 0; r < RUNS; ++r) {
for (unsigned i = 0; i < THREADS; ++i) {
pthread_create(&tid[i], NULL, thread_entry, (void*)(std::size_t)i);
}
for (unsigned i = 0; i < THREADS; ++i) {
pthread_join(tid[i], NULL);
}
}
std::puts("\n\n\n__________________\nhit <ENTER> to exit...");
std::fflush(stdout);
std::getchar();
return 0;
}
_________________________________________________________________
Notice how the `RWMUTEX_PLACE' macro is defined to `RWMUTEX_PLACE_2'? This
has the effect of making the rwlock table hold only 2 locks. This will
definitely create hash collisions. However, as you can see, there are no
deadlocks. You will only need to acquire read/write access on one or two
locks in order to protect all 8 of the `foo' objects. This is fine!
Any thoughts?
Come on! You MUST take a multiset instead of a set!
Just imagine what happens in case of a hash collision (ie. dupe) when you use a set...
Ergo: you must use a multiset.
BTW, you can get rid of the funny macros if you use the following mutex pool class:
class TCMutexPool
{
public:
std::vector<pthread_rwlock_t> vMtx;
TCMutexPool(size_t AuSize = 288)
{
if (AuSize < 1)
AuSize = 1;
vMtx.resize(AuSize);
pthread_rwlock_t initializer = PTHREAD_RWLOCK_INITIALIZER;
for (size_t i = 0; i < AuSize; ++i)
vMtx[i] = initializer;
}
~TCMutexPool()
{
for (int i = vMtx.size() - 1; i >= 0; --i)
pthread_rwlock_destroy(&vMtx[i]);
}
};
...
> Notice how the `RWMUTEX_PLACE' macro is defined to `RWMUTEX_PLACE_2'? This
> has the effect of making the rwlock table only hold 2 locks. This will
> definitely create hash collisions. However, as you can see, there is no
> deadlocks. You will only need to acquire read/write access on one or two
> locks in order to protect all of the 8 `foo' objects. This is fine!
>
> Any thoughts?
What about my 1 million container objects to manage? :-)
Are you going to manage this now with only 2 mutices at all? :-)
I don't need a multiset. A set is already perfectly fine, because it
automatically removes duplicates; this is exactly the behavior I need...
> BTW, you can get rid of the funny macros if you use the following mutex
> pool class:
>
[...]
I want statically created mutexes, not dynamic.
[...]
>> /* Application Code
> ...
>> Notice how the `RWMUTEX_PLACE' macro is defined to `RWMUTEX_PLACE_2'?
>> This
>> has the effect of making the rwlock table only hold 2 locks. This will
>> definitely create hash collisions. However, as you can see, there is no
>> deadlocks. You will only need to acquire read/write access on one or two
>> locks in order to protect all of the 8 `foo' objects. This is fine!
>>
>> Any thoughts?
>
> What about my 1 million container objects to manage? :-)
> Are you going to manage this now with only 2 mutices at all? :-)
You can manage 1 million objects with 2 mutexes perfectly fine. However, keep
in mind that you can set `RWMUTEX_PLACE' to whatever you want. This was an
example which forces hash collisions. The locking scheme is correct even
when hash collisions occur. As for 1 million objects, well:
#define RWMUTEX_PLACE \
RWMUTEX_PLACE_288, RWMUTEX_PLACE_288, RWMUTEX_PLACE_288, \
RWMUTEX_PLACE_288, RWMUTEX_PLACE_288, RWMUTEX_PLACE_288
;^D
This is totally busted. You cannot assign a `pthread_rwlock_t'. You can
dynamically initialize, or statically initialize; that's it.
> }
>
> ~TCMutexPool()
> {
> for (int i = vMtx.size() - 1; i >= 0; --i)
> pthread_rwlock_destroy(&vMtx[i]);
> }
> };
[...]
Please keep in mind that I am storing a pointer to the hashed lock in the
set, not pointers to objects.
This cannot happen with the current code because I store hashed values in
the set. So, for all of those object pointer values, the lock set would look
like:
ix=0
ix=2
ix=4
ix=6
ix=8
WRT this example, the objects whose addresses are (0x804b750) and
(0x804b764) simply share the same rwlock within the global rwlock table that
happens to reside at index (0); this is fine. It's false contention after
all, but it's still correct behavior.
The lock set stores hashed values and removes duplicates, so there is only
one index (0) in there no matter how many object address hash and collide
into index (0). It ultimately takes read/write access at index 0 _ONE_ time
and automatically and atomically protects objects at addrs (0x804b750) and
(0x804b764), and all others with addrs that hash into index (0), in a single
operation...
This can be used for lock-based STM.
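For instance, a tiny "transactional" two-object update on top of the
posted `rwmtxtbl::locker' (the `account' type and `transfer()' are
illustrative only):

    struct account { long balance; };

    void transfer(account& from, account& to, long amount) {
        rwmtxtbl::locker locks;
        locks.add(&from);
        locks.add(&to); // if both hash to the same rwlock, the set dedups it
        rwmtxtbl::locker::wrguard guard(locks); // acquires in total order
        from.balance -= amount;
        to.balance += amount;
    } // guard releases everything; no deadlock regardless of argument order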