I am using hash maps, etc. from the default STL library available on
Linux (libstdc++), as an in-memory cache for a server application.
Most of these caches use a string as the key and a structure as the
data. The flow through the application is roughly as follows:
Thread 1 gets data from the database with a select query, populates
the structures with that data, and pushes the structures into the cache.
Thread 2 then picks up a structure and uses it to fetch other live,
real-time data from another service. Once that real-time data is
available, Thread 2 keeps updating the structures in the cache with it.
Thread 3 reads the data from this cache and uses it to process
requests from the client application.
So all kinds of operations - insertions, erasures, updates, reads -
are happening on this cache of structures.
The cache holds more than 60000 elements (structures), and searching
for a required structure takes a long time. So while one thread is
looking up a structure, the others have to wait and cannot do any
productive work, because they also need data from the same cache
(even if for different kinds of actions).
How do we achieve more granular locking here?
Thoughts please.
Why don't you use an RW locking policy (many simultaneous readers, one
writer)? BTW, I don't find 60000 data elements all that large, so
maybe the bottleneck is somewhere else (the DB query?).
a+, ld.
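A minimal sketch of that reader/writer policy wrapped around an STL map,
assuming C++17's std::shared_mutex is available (the Cache and Record
names are only illustrative, not from the original post):
______________________________________________________________
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

struct Record { /* fields populated by thread 1, updated by thread 2 */ int value = 0; };

class Cache {
public:
    // writers (insert/erase/update) take the lock exclusively
    void put( const std::string &key, const Record &rec )
    {
        std::unique_lock<std::shared_mutex> lock( mutex_ );
        map_[key] = rec;
    }

    // readers run concurrently with each other
    std::optional<Record> get( const std::string &key ) const
    {
        std::shared_lock<std::shared_mutex> lock( mutex_ );
        auto it = map_.find( key );
        if ( it == map_.end() ) return std::nullopt;
        return it->second;   // copy out so the lock is not held during processing
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, Record> map_;
};
______________________________________________________________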
You'd need to describe the cache structure in a little more detail, as
the solution is likely to be data-structure dependent, or at least
influenced by it. You say the searching takes a long time, so it is
probably something sequential. Are the searches for a single item, or
for multiple items like an SQL query?
If the latter, there are commercial in-memory OLAP databases designed
for exactly this kind of stuff. There's even a streaming-based one.
You might be able to come up with a quasi-lock-free solution involving
the "thread-safe" version of shared_ptr, but you really should be familiar
with the lock-free design patterns, which affect the semantics of the
operations, not just the performance.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
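For what the "thread-safe" shared_ptr route could look like in portable
C++, here is a minimal sketch that publishes each cache entry through the
atomic shared_ptr free functions: readers obtain a consistent snapshot,
and the updating thread swaps in a freshly built one. The LiveData and
Entry names are illustrative, and this shows only the publish/read
hand-off, not a full lock-free map:
______________________________________________________________
#include <memory>   // std::atomic_load / std::atomic_store on shared_ptr (since C++11)

struct LiveData { double price = 0.0; long volume = 0; };

struct Entry {
    // accessed only through std::atomic_load / std::atomic_store
    std::shared_ptr<const LiveData> data;
};

// Thread 2: build a complete replacement, then publish it in one atomic step.
void update( Entry &e, double price, long volume )
{
    auto fresh = std::make_shared<const LiveData>( LiveData{ price, volume } );
    std::atomic_store( &e.data, fresh );
}

// Thread 3: grab a snapshot; it stays valid even if an update races past it,
// because the shared_ptr keeps the old object alive until the reader drops it.
std::shared_ptr<const LiveData> read( const Entry &e )
{
    return std::atomic_load( &e.data );
}
______________________________________________________________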
How critical is the correctness of the data?
I.e., thread 1 initiates the update process, and while the information is
being collected/prepared/formatted, threads 2 (3..10000) try to read the
data. Do all those threads NEED to wait for up-to-date data, or can
they fetch the current data and pick up the update later (next time)?
If you do NOT need absolutely up-to-the-millisecond data, I suggest
NOT using locks at all: in your update/add method, collect/prepare/
format ALL the data, assign it to tempVar, and then make a single
assignment yourStructureDataPointer = tempVar; - it is fast, and
corrupted/incomplete data should never become visible to readers.
If you DO need absolutely up-to-the-millisecond data, I suggest using a
lock (maybe your own data lock) in your update/add method, applied to
the data structure pointer.
In any case, DO NOT lock your entire data structure.
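A sketch of that "prepare everything first, then publish with a single
assignment" idea follows; note that in C++ the final assignment itself
should be an atomic release/acquire hand-off to be well-defined across
threads, and the old object can only be freed once no reader can still
be using it (which is where shared_ptr or the proxy-collector schemes
discussed later in the thread come in). The names are illustrative:
______________________________________________________________
#include <atomic>
#include <string>

struct Snapshot {
    std::string symbol;
    double      price;
    long        volume;
};

std::atomic<const Snapshot*> current{ nullptr };

// Writer: fully build the new snapshot, then publish it in one step.
void publish( std::string symbol, double price, long volume )
{
    const Snapshot *fresh = new Snapshot{ std::move( symbol ), price, volume };
    const Snapshot *old = current.exchange( fresh, std::memory_order_acq_rel );
    // 'old' must not be deleted here unless we know no reader still holds it.
    (void)old;
}

// Reader: sees either the previous complete snapshot or the new complete one,
// never a half-written structure.
const Snapshot *read_snapshot()
{
    return current.load( std::memory_order_acquire );
}
______________________________________________________________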
BTW: I built a similar-scale caching system in Java (about 50K data
structures @ 30 KB each).
I was just wondering what your performance figures are.
My stats: check() = 0.5-5, get() = 1-10, put() = 1-10, depending on
load. All times are in microseconds - that is, 1/1000 of a millisecond,
or 1/1000000 of a second.
If you go the latter route, you'd probably want to use a form of proxy PDR
like appc from http://atomic-ptr-plus.sourceforge.net/
You could also do it with shared_ptr, with a conventional lock for the writer
and for safely acquiring a proxy reference. This all assumes you know how to
implement your own lock-free data structures using PDR.
Just be careful with shared_ptr on linked lists. C++ dtors are recursive, and
you can easily cause a stack overflow if you're not careful.
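To make the recursive-destructor hazard concrete, a common workaround is
to have the list's own destructor unlink nodes iteratively, so each node
is destroyed with an empty next pointer instead of letting ~shared_ptr
recurse down the whole chain. A minimal sketch (Node and List are
illustrative names):
______________________________________________________________
#include <memory>

struct Node {
    int value = 0;
    std::shared_ptr<Node> next;
};

struct List {
    std::shared_ptr<Node> head;

    ~List()
    {
        // Detach nodes one at a time; without this loop, destroying 'head'
        // would recurse through every 'next' and can overflow the stack
        // on a long list.
        while ( head )
        {
            std::shared_ptr<Node> next = std::move( head->next );
            head = std::move( next );
        }
    }
};
______________________________________________________________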
Are you referring to the following:
http://groups.google.com/group/comp.programming.threads/msg/591d41afc27fab69
? Does appc still have that issue?
FWIW, I got around this problem in the following proxy collector
implementation:
http://home.comcast.net/~vzoom/demos/pc_sample.c
(refer to the pc_deferred_sys_dec function...)
______________________________________________________________
void pc_deferred_sys_dec( pc_deferred_t *_this )
{
    /* the private refs are zero; we now need to dec
       the back-link refs */
    if ( atomic_xaddword( &_this->back_refs, -1 ) == 1 )
    {
        /* this collector has dropped to zero,
           so queue it locally */
        pc_deferred_t *next, *cur = _this->next,
                      *dtorf = _this, *dtorb = _this;

        _this->next = 0;

        while ( cur )
        {
            /* we now need to dec the back-link refs
               of this collector */
            if ( atomic_xaddword( &cur->back_refs, -1 ) != 1 )
            {
                break;
            }

            next = cur->next;

            /* this collector has dropped to zero,
               so queue it locally */
            cur->next = 0;
            dtorb->next = cur;
            dtorb = cur;
            cur = next;
        }

        /* process the dropped collectors in FIFO order */
        cur = dtorf;

        while ( cur )
        {
            next = cur->next;
            pc_deferred_sys_free( cur );
            cur = next;
        }

        /* that's all folks! :) */
    }
}
______________________________________________________________
I gather all of the dropped collectors in the expired epoch into a
thread-local queue, then just loop through it and call the destructor
function (refer to the pc_deferred_sys_free function). I think I am going to
revive this code and stick a C++ wrapper around it. AFAICT, this will not
blow the stack, because I am using a simple loop instead of recursion.