I am using hash maps, etc. from the default STL library available on
Linux (libstdc++), as an in-memory cache for a server application.
Most of these caches use a string as the key and a structure as the
data. The flow through the application is roughly as follows:
Thread 1 gets data from the database with a select query, populates
the structures with that data, and pushes the structures into the cache.
Thread 2 then picks up a structure and uses it to fetch other live,
real-time data from another service. Once that real-time data is
available, Thread 2 keeps updating the structures in the cache with it.
Thread 3 reads the data from this cache and uses it to process
requests from the client application.
So all kinds of operations - insertions, erasures, updates, reads -
are happening on this cache of structures.
The cache holds more than 60000 elements (structures), and searching
for a required structure takes a long time. So while one thread is
looking up a structure, the others have to wait and cannot do any
productive work, because they also need data from the same cache
(even if for different kinds of actions).
How do we achieve more granular locking here?
Thoughts please.
Why don't you use an RW locking policy (many simultaneous readers, one
writer)? BTW, I don't find 60000 data elements all that large, so
maybe the bottleneck is somewhere else (the DB query?).
a+, ld.
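A minimal sketch of that reader/writer policy wrapped around an STL map,
assuming C++17's std::shared_mutex is available (the Cache and Record
names are only illustrative, not from the original post):
______________________________________________________________
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

struct Record { /* fields populated by thread 1, updated by thread 2 */ int value = 0; };

class Cache {
public:
    // writers (insert/erase/update) take the lock exclusively
    void put( const std::string &key, const Record &rec )
    {
        std::unique_lock<std::shared_mutex> lock( mutex_ );
        map_[key] = rec;
    }

    // readers run concurrently with each other
    std::optional<Record> get( const std::string &key ) const
    {
        std::shared_lock<std::shared_mutex> lock( mutex_ );
        auto it = map_.find( key );
        if ( it == map_.end() ) return std::nullopt;
        return it->second;   // copy out so the lock is not held during processing
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, Record> map_;
};
______________________________________________________________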
You'd need to describe the cache structure in a little more detail, as
the solution is likely to be data-structure dependent, or at least
influenced by it. You say the searching takes a long time, so it is
probably something sequential. Are the searches for a single item, or
for multiple items like an SQL query?
If the latter, there are commercial in-memory OLAP databases designed
for exactly this kind of stuff. There's even a streaming-based one.
You might be able to come up with a quasi-lock-free solution involving
the "thread-safe" version of shared_ptr, but you really should be familiar
with the lock-free design patterns, which affect the semantics of the
operations, not just the performance.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
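For what the "thread-safe" shared_ptr route could look like in portable
C++, here is a minimal sketch that publishes each cache entry through the
atomic shared_ptr free functions: readers obtain a consistent snapshot,
and the updating thread swaps in a freshly built one. The LiveData and
Entry names are illustrative, and this shows only the publish/read
hand-off, not a full lock-free map:
______________________________________________________________
#include <memory>   // std::atomic_load / std::atomic_store on shared_ptr (since C++11)

struct LiveData { double price = 0.0; long volume = 0; };

struct Entry {
    // accessed only through std::atomic_load / std::atomic_store
    std::shared_ptr<const LiveData> data;
};

// Thread 2: build a complete replacement, then publish it in one atomic step.
void update( Entry &e, double price, long volume )
{
    auto fresh = std::make_shared<const LiveData>( LiveData{ price, volume } );
    std::atomic_store( &e.data, fresh );
}

// Thread 3: grab a snapshot; it stays valid even if an update races past it,
// because the shared_ptr keeps the old object alive until the reader drops it.
std::shared_ptr<const LiveData> read( const Entry &e )
{
    return std::atomic_load( &e.data );
}
______________________________________________________________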
How critical is the correctness of the data?
I.e., thread 1 initiates the update process, and while the information is
being collected/prepared/formatted, threads 2 (3..10000) try to read the
data. Do all those threads NEED to wait for up-to-date data, or can
they fetch the current data and pick up the update later (next time)?
If you do NOT need absolutely up-to-the-millisecond data, I suggest
NOT using locks at all: in your update/add method, collect/prepare/
format ALL the data, assign it to tempVar, and then make a single
assignment yourStructureDataPointer = tempVar; - it is fast, and
corrupted/incomplete data should never become visible to readers.
If you DO need absolutely up-to-the-millisecond data, I suggest using a
lock (maybe your own data lock) in your update/add method, applied to
the data structure pointer.
In any case, DO NOT lock your entire data structure.
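A sketch of that "prepare everything first, then publish with a single
assignment" idea follows; note that in C++ the final assignment itself
should be an atomic release/acquire hand-off to be well-defined across
threads, and the old object can only be freed once no reader can still
be using it (which is where shared_ptr or the proxy-collector schemes
discussed later in the thread come in). The names are illustrative:
______________________________________________________________
#include <atomic>
#include <string>

struct Snapshot {
    std::string symbol;
    double      price;
    long        volume;
};

std::atomic<const Snapshot*> current{ nullptr };

// Writer: fully build the new snapshot, then publish it in one step.
void publish( std::string symbol, double price, long volume )
{
    const Snapshot *fresh = new Snapshot{ std::move( symbol ), price, volume };
    const Snapshot *old = current.exchange( fresh, std::memory_order_acq_rel );
    // 'old' must not be deleted here unless we know no reader still holds it.
    (void)old;
}

// Reader: sees either the previous complete snapshot or the new complete one,
// never a half-written structure.
const Snapshot *read_snapshot()
{
    return current.load( std::memory_order_acquire );
}
______________________________________________________________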
BTW: I built a similar-scale caching system in Java (about 50K data
structures @ 30 KB each).
I was just wondering what your performance figures are.
My stats: check() = 0.5-5, get() = 1-10, put() = 1-10, depending on
load. All times are in microseconds - that is, 1/1000 of a millisecond,
or 1/1000000 of a second.
If you go the latter route, you'd probably want to use a form of proxy PDR
like appc from http://atomic-ptr-plus.sourceforge.net/
You could also do it with shared_ptr, with a conventional lock for the writer
and for safely acquiring a proxy reference. This all assumes you know how to
implement your own lock-free data structures using PDR.
Just be careful with shared_ptr on linked lists. C++ dtors are recursive, and
you can easily cause a stack overflow if you're not careful.
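To make the recursive-destructor hazard concrete, a common workaround is
to have the list's own destructor unlink nodes iteratively, so each node
is destroyed with an empty next pointer instead of letting ~shared_ptr
recurse down the whole chain. A minimal sketch (Node and List are
illustrative names):
______________________________________________________________
#include <memory>

struct Node {
    int value = 0;
    std::shared_ptr<Node> next;
};

struct List {
    std::shared_ptr<Node> head;

    ~List()
    {
        // Detach nodes one at a time; without this loop, destroying 'head'
        // would recurse through every 'next' and can overflow the stack
        // on a long list.
        while ( head )
        {
            std::shared_ptr<Node> next = std::move( head->next );
            head = std::move( next );
        }
    }
};
______________________________________________________________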
Are you referring to the following:
http://groups.google.com/group/comp.programming.threads/msg/591d41afc27fab69
? Does appc still have that issue?
FWIW, I got around this problem in the following proxy collector
implementation:
http://home.comcast.net/~vzoom/demos/pc_sample.c
(refer to the pc_deferred_sys_dec function...)
______________________________________________________________
void pc_deferred_sys_dec( pc_deferred_t *_this )
{
    /* the private refs are zero; we now need to dec
       the back-link refs */
    if ( atomic_xaddword( &_this->back_refs, -1 ) == 1 )
    {
        /* this collector has dropped to zero,
           so queue it locally */
        pc_deferred_t *next, *cur = _this->next,
                      *dtorf = _this, *dtorb = _this;

        _this->next = 0;

        while ( cur )
        {
            /* we now need to dec the back-link refs
               of this collector */
            if ( atomic_xaddword( &cur->back_refs, -1 ) != 1 )
            {
                break;
            }

            next = cur->next;

            /* this collector has dropped to zero,
               so queue it locally */
            cur->next = 0;
            dtorb->next = cur;
            dtorb = cur;
            cur = next;
        }

        /* process the dropped collectors in FIFO order */
        cur = dtorf;

        while ( cur )
        {
            next = cur->next;
            pc_deferred_sys_free( cur );
            cur = next;
        }

        /* that's all folks! :) */
    }
}
______________________________________________________________
I gather all of the dropped collectors in the expired epoch into a
thread-local queue, then just loop through it and call the destructor
function (refer to the pc_deferred_sys_free function). I think I am going to
revive this code and stick a C++ wrapper around it. AFAICT, this will not
blow the stack, because I am using a simple loop instead of recursion.