I am in an unlucky situation of trying to fix a multithreaded
application that has many threads that insert/update the contents of a
C++ STL std::map. I also have a reader thread that either iterates
through the entire map or else reads an entry out of it.
My problem is compounded by the fact that said map, after the elements
are inserted for the first time, is constantly (almost non-stop)
updated throughout the life-time of the application. In this scenario
I am finding it hard to come up with a mechanism to make reads more
efficient (because reader threads have to be kept at bay when a write
is happening and writes pretty much happen all the time!)
I am looking for ideas on how to make these operations thread safe.
Any help will be greatly appreciated...
This would give you a starting point to experiment with locking and
access schemes.
--
Ian Collins.
Use caution when deriving a class from an STL container. Since the STL
containers don't have virtual destructors, you may inadvertently
introduce a memory leak if you use the wrapper class incorrectly.
Ian suggested using the STL container as a *private* base class - this
means that the possibility of forming a pointer to the base class (and
later deleting through it) is reduced. In fact, the code would need to
be severely broken to allow this.
The other option is to keep the STL container as a member and actually
there is nothing that makes derivation a better solution here.
Still, just wrapping each method *separately* will not necessarily make
the program thread safe, because thread safety cannot be guaranteed
without taking into account the logical granularity of operations that
are performed on any given data structure. It might help, but it might
as well be just solving the wrong problem.
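Either way - private inheritance or a member - the synchronized wrapper the posters describe might be sketched like this. A minimal sketch only, using C++11's std::mutex for brevity (the class and method names are illustrative, not from any of the posts); note that it only makes each *individual* call atomic, which per the caveat above may not match the logical granularity the application needs:

```cpp
#include <map>
#include <mutex>
#include <string>

// Sketch: the map is a private member, and every public operation
// takes the mutex.  Names (ThreadSafeMap, put, get) are invented here.
template <typename K, typename V>
class ThreadSafeMap {
public:
    void put(const K& key, const V& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = value;  // insert or update under one lock
    }

    // Returns true and copies the value out if the key exists.
    bool get(const K& key, V& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::map<K, V> map_;
};
```

Returning a copy of the value (rather than a reference or an iterator) is deliberate: handing out a reference to an element would let the caller touch the map after the lock is released.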
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Dilip wrote:
> I am in a unlucky situation of trying to fix a multithreaded
> application that has many threads that insert/update the contents of a
> C++ STL std::map. I also have a reader thread that either iterates
> through the entire map or else reads an entry out of it.
in addition to the tips already posted:
The insert, update and element lookup operations are quite easy to
handle, as long as the calling code does not make assumptions like
>> if map[key] does not exist insert(key, value) <<
The insert may fail because of a duplicate key if the map is not
locked across both operations.
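To avoid that race, the existence check and the insert have to happen inside one critical section. A sketch, assuming the caller owns a std::mutex guarding the map (the helper name insert_if_absent is invented); std::map::insert itself reports whether the key was already present, so no separate lookup is needed:

```cpp
#include <map>
#include <mutex>
#include <utility>

// Sketch: one locked operation instead of the racy
// "if (!contains(k)) insert(k, v)" pattern.  Returns false if the
// key already existed (the map is left unchanged in that case).
template <typename K, typename V>
bool insert_if_absent(std::map<K, V>& m, std::mutex& mx,
                      const K& key, const V& value) {
    std::lock_guard<std::mutex> lock(mx);
    return m.insert(std::make_pair(key, value)).second;
}
```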
> My problem is compounded by the fact the said map, after the elements
> are inserted for the first time, is constantly (almost non-stop)
> updated through out the life-time of the application. In this scenario
> I am finding it hard to come up with a mechanism to make reads more
> efficient (because reader threads have to be kept at bay when a write
> is happening and writes pretty much happen all the time!)
Single writes and reads are quite fast. Simply lock the entire map for
this time.
The more complicated part is the iteration.
Either you lock the map during the entire iteration. Since the iteration
is not as fast as read/write and contains code outside the scope of the
thread safe map this may block other threads for an unacceptable period.
The other way is to use special, thread-safe iterators which always
cache the current element's key and value. Key and value of the next or
previous element are read and cached while the map is locked, using the
upper_bound/lower_bound functions. This turns the complexity of ++ from
O(1) into O(log n). Furthermore the iterator may return keys and values
that are no longer in the map.
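This caching-iterator idea might be sketched as follows; the callback shape and the function name are assumptions, not Marcel's actual code. Each step locks the map only long enough to copy out the next key/value via upper_bound, then runs user code unlocked:

```cpp
#include <map>
#include <mutex>

// Sketch: instead of holding a live std::map iterator across unlocked
// code, remember only the last key visited and re-enter the map with
// upper_bound to find the successor.  Each step is O(log n), the lock
// is held only briefly, and K must be default-constructible here.
template <typename K, typename V, typename Fn>
void for_each_snapshot_step(std::map<K, V>& m, std::mutex& mx, Fn fn) {
    bool first = true;
    K last_key{};
    for (;;) {
        K key;
        V value;
        {
            std::lock_guard<std::mutex> lock(mx);
            auto it = first ? m.begin() : m.upper_bound(last_key);
            if (it == m.end()) break;   // lock released on break
            key = it->first;            // cache key and value while locked...
            value = it->second;
        }                               // ...then release the lock
        fn(key, value);                 // user code runs without the lock held
        last_key = key;
        first = false;
    }
}
```

As Marcel warns, the callback may see an element that another thread has already erased, and elements inserted behind the cursor are missed; that is the price of not holding the lock for the whole traversal.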
Marcel
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
You'll have to use locks to protect all your accesses to the map. For
iteration this means holding the lock for the duration of the iteration.
Lock-free won't be an option since all lock-free techniques are
proprietary and you'll have to pay for any implementation you want
to use. I didn't see a map in the NOBLE lock-free library. The
nearest thing they have is a dictionary. I don't know what's in
Intel's lock-free library except that it probably only works on Intel
processors.
For data structures with reasonable lock-free semantics, there's no
reason they can't be made lock-free. Lists, queues, stacks, hashmap,
and trees can be made lock-free. For money of course. Lock-free isn't
free. :)
You could also make a map using a lock-free skip-list. At least one
implementation was done as some university student's PhD (Hamilton
Ontario, I think). Which may or may not be proprietary, depends a lot
on the university.
Tony
> You could also make a map using a lock-free skip-list. At least one
> implementation was done as some university student's PhD (Hamilton
> Ontario, I think). Which may or may not be proprietary, depends a lot
> on the university.
I found a description of a lock-free skip list in this PhD thesis:
http://www.cs.chalmers.se/~phs/phd.pdf
I didn't read it, just scanned the table of contents essentially
--
Ben Pfaff
email: b...@cs.stanford.edu
web: http://benpfaff.org
I'm surprised nobody mentioned the standard trick for this case.
First, develop a hashing function that hashes your key value to a fixed
range, say 0-255 or 0-136. Make sure the function you use evenly
distributes the keys you need to hash. Mod is fine if they're
sequential.
Second, create an array of locks and an array of maps, one for each
possible value of the hashing function.
To add a value to a map, compute the hash value and take the
corresponding lock, then add it to the corresponding map. To locate a
value, compute the hash value and take the corresponding lock, then
search the corresponding map.
To traverse, loop through all hash values. For each hash value, take
the corresponding lock, traverse the corresponding map, release the
lock, and move on to the next.
This will permit concurrency until two or more threads collide on a
hash. Using 256 maps or 136 maps or even 31 should make this
negligible. The downside is increased cache footprint. So make sure you
really need the concurrency.
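A sketch of this lock-striping trick, using C++11 for brevity; the class name and the stripe count of 137 (a prime near the 136 mentioned above) are arbitrary illustrative choices:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>

// Sketch: an array of N maps, each protected by its own lock, indexed
// by hashing the key.  Two threads only contend when their keys land
// in the same stripe.
template <typename K, typename V, std::size_t N = 137>
class StripedMap {
public:
    void put(const K& key, const V& value) {
        std::size_t i = index(key);
        std::lock_guard<std::mutex> lock(locks_[i]);
        maps_[i][key] = value;
    }

    bool get(const K& key, V& out) {
        std::size_t i = index(key);
        std::lock_guard<std::mutex> lock(locks_[i]);
        auto it = maps_[i].find(key);
        if (it == maps_[i].end()) return false;
        out = it->second;
        return true;
    }

    // Traverse stripe by stripe, holding only one lock at a time.
    template <typename Fn>
    void for_each(Fn fn) {
        for (std::size_t i = 0; i < N; ++i) {
            std::lock_guard<std::mutex> lock(locks_[i]);
            for (auto& kv : maps_[i]) fn(kv.first, kv.second);
        }
    }

private:
    std::size_t index(const K& key) const {
        return std::hash<K>()(key) % N;
    }
    std::mutex locks_[N];
    std::map<K, V> maps_[N];
};
```

Note that the stripe-by-stripe traversal is not a consistent snapshot of the whole structure: stripes already released can change while later ones are being visited.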
DS
There are some not-so-well-known caveats that come along with this
technique. You can deadlock really quickly if you're not careful!
http://groups.google.com/group/comp.programming.threads/msg/56406d367ac85dcb
:O Crap!
Here is my generic solution to the nasty problem:
Basically, it boils down to lock-based STM via two-phase locking.
It's my lock-based answer to the following problem:
Any thoughts?
David
That sounds really interesting. (taken with that caveat from Chris)
No matter what I try in my current implementation I am constantly hurt
by the need to lock the map whenever anyone wants to do a lookup. We
have only one mega uber-map that stores everything. Maybe I can use a
similar algorithm and split it into maybe a dozen maps... The problem
with the hashing approach is that I don't know how to generate an even
distribution... my map can literally contain a couple of _million_
entries! Collisions will then become the norm :-(
Have you already considered using a database to hold the data? E.g.
PostgreSQL is freely available and can handle concurrent read and write
requests. If you have enough RAM that the whole table can be kept in
memory, performance may be good enough for you.
just a thought...
Rupert
> David
>
> That sounds really interesting. (taken with that caveat from Chris)
>
> No matter what I try in my current implementation I am constantly hurt
> by the need to lock the map whenever anyone wants to do a lookup. We
> have only one mega uber-map that stores everything. Maybe I can use a
> similar algorithm and split it into maybe a dozen maps... The problem
> with the hashing approach is that I don't know how to generate an even
> distribution... my map can literally contain a couple of _million_
> entries! Collisions will then become the norm :-(
The number of entries has almost no effect on the number of collisions.
Only the number of threads and the number of hash values affect the
number of collisions.
If ten threads are looking in ten places for ten out of a hundred
things, there will be no more collisions than if ten threads are
looking in ten places for ten out of a billion things. In each case,
each of the ten threads wants one of the ten places, and if two threads
want the same place, they'll collide.
DS
postgres uses a technique called "multiversion concurrency" which imho
is similar to read-copy-update (table entries are not changed, but a new
version is written for every change. Old versions stay around until all
transactions that use them have finished.)
Of course there is some locking, but there are locks at different levels
(whole database, whole tables, columns, ...) and the database engine
tries very hard to lock at the right level. Deadlocks are detected
automatically.
Postgres data structures have been developed for this kind of usage by
very competent people over a long time; I think it will be very
difficult to do better.
There is some overhead, and probably the database installation must be
tuned for optimal performance. On the other hand, you get persistence
for free.
cheers,
Rupert
Actually, possible approaches will highly depend upon what kind of
semantics you are looking for.
The things are:
[1] Do you want new elements to become visible as soon as they are
inserted? (delayed insert)
[2] Do you mind a slight hit in performance while searching?
(non-balanced trees)
[3] Are some elements searched for very often? (splay trees)
[4] Do you need persistence? (databases)
[5] Do inserts/updates outweigh the number of lookups, or vice-versa or
are both the same? (depends on the answer :P)
Once you answer these questions, you will probably be able to find a
solution yourself, or if people here know the answers, they would be
able to answer you better.
HTH,
-Dhruv.
RCU is lock-free. I doubt the implementation for postgres is.
>
> Of course there is some locking, but there are locks at different levels
> (whole database, whole tables, columns, ...) and the database engine
> tries very hard to lock at the right level. Deadlocks are detected
> automatically.
>
> Postgres data structures have been developed for this kind of usage by
> very competent people over a long time; I think it will be very
> difficult to do better.
>
Look, this is a nonsensical argument. Putting a hashtable inside a
database doesn't make the problems of properly synchronizing access
to the hashtable go away. What techniques for reducing synchronization
overhead are only available for use by database implementations?
Nobody claimed it is lock-free. However, I do claim it has much lower
lock contention than most ad-hoc data structures. Low enough for quite a
lot of concurrent readers and writers. And chances are, it has been
tested orders of magnitude more thoroughly than anything most of us will
code in our whole careers :-)
>>
>> Of course there is some locking, but there are locks at different
>> levels (whole database, whole tables, columns, ...) and the database
>> engine tries very hard to lock at the right level. Deadlocks are
>> detected automatically.
>>
>> Postgres data structures have been developed for this kind of usage by
>> very competent people over a long time; I think it will be very
>> difficult to do better.
>>
> Look, this is a nonsensical argument. Putting a hashtable inside a
> database doesn't make the problems of properly synchronizing access
> to the hashtable go away. What techniques for reducing synchronization
> overhead are only available for use by database implementations?
>
>
I would phrase it differently: what techniques for reducing
synchronization overhead cannot reasonably be implemented from scratch
as a part of an average software project? My answer: probably all but
the simplest.
Putting it the other way, what techniques for reducing synchronization
are _not_ available for use by database implementations?
btw, the OP stated he did not even try the hashing approach, so how do
you conclude that "properly synchronizing access to the hashtable" is
the bottleneck?
Rupert
Yes, but they're all available for c.p.t.
>
> Putting it the other way, what techniques for reducing synchronization
> are _not_ available for use by database implementations?
All the ones they don't know about.
>
> btw, the OP stated he did not even try the hashing approach, so how do
> you conclude that "properly synchronizing access to the hashtable" is
> the bottleneck?
>
This issue has been previously discussed in c.p.t. It's the reader/writer
problem with a large collection of objects. You can use per object locking
or locked reference counting at the object level, but you have to safely
access the object first. So big giant global mutex or rwlock for the collection.
If you have contention problems with the collection lock, you're pretty much
out of luck unless you go to lock-free which isn't an option at this point.
I have a hashed read/write lock solution but it's not very efficient for writes.
The only practical solution is to use a static hashtable (no resizing of
the hashtable allowed) with per-chain locking. You can use further
hashing to reduce the number of locks so you don't need one per chain.
Trees aren't an option. I don't think there are any lock-free trees out
there, and I don't have a complete lock-free implementation of my own
since I have yet to work out the rebalancing heuristics, just lock-free
insert, delete, and rotate.
:-)
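The "big rwlock for the collection" option mentioned above might be sketched like this. The original discussion predates it, but C++17's std::shared_mutex plays the role a pthreads rwlock would; names are illustrative:

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>

// Sketch: many readers may hold the lock concurrently; a writer
// excludes everyone.
template <typename K, typename V>
class RwLockedMap {
public:
    void put(const K& key, const V& value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive
        map_[key] = value;
    }

    bool get(const K& key, V& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<K, V> map_;
};
```

As Joe points out, with writes happening almost non-stop an rwlock buys little over a plain mutex, since every write still excludes all readers.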
On the other hand, they may know some other techniques we don't know
about...
>>
>> btw, the OP stated he did not even try the hashing approach, so how do
>> you conclude that "properly synchronizing access to the hashtable" is
>> the bottleneck?
>>
let me elaborate: without profiling the app we simply do not know. I
agree that there is a scalability problem with locking, but maybe the
performance is already limited by other factors:
- inefficient locking primitive (e.g. mutex vs. critical section on a
single-CPU Windows variant)
- cache thrashing when iterating over millions of entries
- false sharing of cache lines
- expensive key comparison (e.g. std::strings vs. atoms)
> This issue has been previously discussed in c.p.t. It's the reader/writer
> problem with a large collection of objects. You can use per object locking
> or locked reference counting at the object level, but you have to safely
> access the object first. So big giant global mutex or rwlock for the
> collection.
If you are interested in how postgres handles this, here is an overview:
http://forge.objectweb.org/docman/view.php/237/132/mvcc-survey.pdf
I don't know if this approach fits your definition of lock-free, but I
would be interested.
> If you have contention problems with the collection lock, you're pretty
> much out of luck unless you go to lock-free which isn't an option at
> this point.
>
> I have a hashed read/write lock solution but it's not very efficient
> for writes.
>
> The only practical solution is to use a static hashtable (no resizing
> of the hashtable allowed) with per-chain locking. You can use further
> hashing to reduce the number of locks so you don't need one per chain.
>
> Trees aren't an option. I don't think there are any lock-free trees
> out there, and I don't have a complete lock-free implementation of my
> own since I have yet to work out the rebalancing heuristics, just
> lock-free insert, delete, and rotate.
>
There is a lot of literature on concurrency control for B-tree variants
called B-link trees, so at least there are some people that see it as an
option :-)
Rupert
I mean lock-free with respect to the internal implementation. Since
databases are typically accessed via relatively slow network connections,
they're concerned with avoiding the use of range or table locking since
it can potentially block other clients and slow things down in general.
Hence the versioning and transaction stuff to avoid that. Reading (iteration)
is possibly lock-free depending on the GC used. It's not mentioned which
form of GC is used so it's a big question mark. If you're doing indexed
reads then you need a lock on the index data structure.
When you're doing lock-free, the form of GC that is used, and the
atomic reads and writes with release and/or acquire semantics, are
important. They don't mention those details so it's impossible to
tell whether they know what they are doing. Plus I'm not sure
anyone knows how to do atomicity of write transactions right. I've
looked at that problem and there are some subtle issues.
I'm waiting for one of the in memory OLAP database vendors to catch onto
lock-free. When they do, they'll probably get several orders of magnitude
improvement in performance, and they'll get rich since Wall Street wants really
fast db for their quant stuff. And the other OLAP vendors who didn't catch
on in time will be hurting.
No garbage collection is done by transactions; there is a dedicated
process for that.
As for the index locking, by default postgres uses btrees (the
Lehman-Yao variant). Hash tables are also available. The developers
claim that using btrees gives better performance.
> When you're doing lock-free, the form of GC that is used, and the
> atomic reads and write with release and/or acquire semantics are
> important. They don't mention those details so it's impossible to
> tell whether they know what they are doing. Plus I'm not sure
> anyone knows how to do atomicity of write transactions right. I've
> looked at that problem and there are some subtle issues.
>
> I'm waiting for one of the in memory OLAP database vendors to catch onto
> lock-free. When they do, they'll probably get several orders of magnitude
> improvement in performance, and they'll get rich since Wall Street wants
> really
> fast db for their quant stuff. And the other OLAP vendors who didn't catch
> on in time will be hurting.
>
and this is the moment they will face multi-zillion lawsuits for
violating some patents on CAS-based bubblesort, storing utf8-encoded
strings in dynamically allocated memory, using an extensible markup
language to store database configuration files, ...
Rupert
You may consider RCU (read/Copy/Update) mechanisms :
http://en.wikipedia.org/wiki/Read-copy-update
"Read-copy-update (RCU) is an operating system kernel technology which,
loosely speaking, can be thought of as a reader-writer lock with
extremely low overhead, wait-free reads..."
It's not a reader-writer lock. This "loosely speaking" is so loose as to
be misleading. A reader-writer lock provides mutual exclusion between
readers and writers; RCU does not.
--
David Hopwood <david.nosp...@blueyonder.co.uk>
I have changed it to:
'''Read-copy-update''' (RCU) is an _operating system_ _synchronization_
mechanism which can sometimes be used as an alternative to a _reader-writer lock_.
It allows extremely low overhead, _wait-free_ reads. [...]
--
David Hopwood <david.nosp...@blueyonder.co.uk>