Single writer multiple reader hash map of int keys and int values


Rajiv Kurian

Jan 7, 2014, 3:42:16 AM1/7/14
to lock...@googlegroups.com
I am trying to design a lock-free, or close to lock-free, hash map of int keys and int values. I have a single writer thread that writes to the hash map. My efforts to do better than an RW mutex around the map haven't yielded anything good, but my gut feeling says I should be able to, given there is only a single writer. I have the following requirements:

1) A single-writer multiple reader map of int keys and int values.
2) Keys are guaranteed to be monotonically increasing. These are just ids and are generated by incrementing an integer.
3) The operations I need are:
  1. append(key, value) where key is guaranteed to be higher than anything in the map so far - only called on the writer thread.
  2. delete(key) - only called on the writer thread.
  3. set(key, value) where key is required to be already in the map - only called on the writer thread.
  4. get(key) - could be called by writer/reader threads.
4) All the reader threads and the single writer thread are known during the creation of the map. Reader threads do not pop up or die. They stay constant for the lifetime of the application.
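For reference, the four operations could be pinned down by a trivial single-threaded implementation (names are illustrative, and `erase` stands in for delete, which is a C++ keyword; the whole question is how to get these semantics with concurrent readers):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Single-threaded reference semantics for the four operations.
// No concurrency control here - that is the open question.
struct RefMap {
    std::map<int32_t, int32_t> m;
    int32_t last_key = -1;

    // Writer only; key must exceed every key inserted so far.
    void append(int32_t key, int32_t value) {
        assert(key > last_key);
        last_key = key;
        m.emplace(key, value);
    }
    // Writer only.
    void erase(int32_t key) { m.erase(key); }
    // Writer only; key must already be present.
    void set(int32_t key, int32_t value) {
        assert(m.count(key) == 1);
        m[key] = value;
    }
    // Writer or any reader; returns false if key is absent.
    bool get(int32_t key, int32_t* value) const {
        auto it = m.find(key);
        if (it == m.end()) return false;
        *value = it->second;
        return true;
    }
};
```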

I am struggling to use anything cheaper than RW locks (adapted for a single writer) for the implementation, since the delete and append operations can require resizing buckets or moving elements around, and the reader threads could observe those changes in between.

Any pointers?

Thanks,
Rajiv




Dmitriy V'jukov

Jan 9, 2014, 10:02:36 AM1/9/14
to lock...@googlegroups.com
Hi!

The most efficient solution is always tailored to your particular
situation. Lots of details matter. In particular in this case: 1. what
is the number of reader threads? 2. what is the relative frequencies
of different operations?

Also, the fastest way to do something is to not do it at all.
Can you replace the integers with pointers? Even if you send them
across the network, you can verify that a received pointer is valid
before using it, e.g.:

#include <stdint.h>

#define MAX_OBJECT_IN_FLIGHT (1<<20)

object_t objects[MAX_OBJECT_IN_FLIGHT];

bool verify_pointer(object_t *o) {
  if (o < objects || o >= objects + MAX_OBJECT_IN_FLIGHT)
    return false;
  if (((uintptr_t)o - (uintptr_t)objects) % sizeof(object_t))
    return false;
  // OK, this is a valid pointer into objects.
  // A client can spoof it, but he can spoof the integer as well.
  return true;
}

--

Then, if you can bound the difference between the newest and the oldest
live object, you can use a ring buffer with direct indexing. If a
reader asks for a very old object (older than the threshold), then,
sorry, no luck.
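A sketch of this ring-buffer idea under the stated assumptions (names are illustrative; ids are assumed to start at 1 so a zeroed slot reads as empty; real code would use release/acquire atomics as noted in the comments):

```cpp
#include <cassert>
#include <cstdint>

const uint32_t CAPACITY = 1u << 20;   // must be a power of two

struct Slot {
    int64_t key;   // 0 == empty; ids are assumed to start at 1
    int32_t val;
};

static Slot ring[CAPACITY];   // zero-initialized

// Writer: insert or overwrite; entries more than CAPACITY behind the
// newest id are silently recycled.
void put(int64_t key, int32_t val) {
    Slot* s = &ring[key & (CAPACITY - 1)];
    s->val = val;
    s->key = key;   // real code: release store, after the value
}

// Any thread: returns false if the id was never inserted or is too old
// (its slot has been recycled by a newer id).
bool get(int64_t key, int32_t* val) {
    const Slot* s = &ring[key & (CAPACITY - 1)];
    if (s->key != key)   // real code: acquire load
        return false;
    *val = s->val;
    return true;
}
```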

--

If you cannot afford to reject requests for very old ids, but you
expect them to be rare, then you can have a fast path for recent ids
and a slow path for older ids. E.g.:

// We do not expect to receive requests for ids older than this
#define N (1<<20)

struct table_element_slow;   // overflow structure for old ids

struct table_element {
  int key;
  int val;
  table_element_slow* slow;
};

table_element table[N];

int find(int key) {
  int idx = key % N;
  table_element* e = &table[idx];
  if (e->key == key)
    return e->val;
  // consult e->slow and return from there
  // this can be slow, but we do not expect it to happen in the common case
}

This needs special care wrt atomic accesses to key/val, and also some
protection for slow, and other details. But I hope the idea is clear.
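One hedged way to handle the atomic key/val access just mentioned: since both are 32-bit, they can be packed into one 64-bit atomic word, so a reader can never observe a key from one update paired with a value from another. Names and the empty-slot convention are assumptions (ids start at 1, so key 0 marks an empty slot):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

const uint32_t N = 1u << 20;

// Zero-initialized; a packed key of 0 means the slot is empty.
std::atomic<uint64_t> table[N];

uint64_t pack(uint32_t key, uint32_t val) {
    return (uint64_t(key) << 32) | val;
}

// Writer thread: publish key and value in one atomic store.
void put(uint32_t key, uint32_t val) {
    table[key % N].store(pack(key, val), std::memory_order_release);
}

// Any thread. Returns false on a miss; a real implementation would
// fall back to the slow structure here instead.
bool find(uint32_t key, uint32_t* val) {
    uint64_t e = table[key % N].load(std::memory_order_acquire);
    if (uint32_t(e >> 32) != key)
        return false;   // consult the slow path
    *val = uint32_t(e);
    return true;
}
```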

--

There can be other "special" solutions.
In all these cases, accesses are extremely fast in the common case.
If that does not work for you, then I would consider taking a normal
hashmap, partitioning it into lots of partitions and protecting each
partition with its own rw mutex. On top of that you can have a mutex for
each thread/partition; readers lock their own mutex, but the writer needs
to lock the mutexes of all threads. This eliminates contention between
readers, but makes the writer slower. Since you know the number of readers
ahead of time, this should be really simple to implement.
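A sketch of this per-reader-mutex idea for one partition (names assumed): each reader locks only its own mutex, while the writer takes all of them in a fixed order.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// One mutex per reader thread. Reader lock acquisitions never contend
// with each other; only the (rare) writer touches every mutex.
struct PerReaderLock {
    std::vector<std::mutex> locks;
    explicit PerReaderLock(std::size_t num_readers) : locks(num_readers) {}

    // Called by reader `id` around its lookups.
    void reader_lock(std::size_t id)   { locks[id].lock(); }
    void reader_unlock(std::size_t id) { locks[id].unlock(); }

    // Called by the single writer around structural changes.
    void writer_lock() {
        for (auto& m : locks) m.lock();   // fixed order: no deadlock
    }
    void writer_unlock() {
        for (auto it = locks.rbegin(); it != locks.rend(); ++it)
            it->unlock();
    }
};
```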





--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Rajiv Kurian

Jan 9, 2014, 8:33:30 PM1/9/14
to lock...@googlegroups.com
Number of reader threads = num cores - 1, so all but one core should be reading.
Writes are vastly outnumbered by reads; that's all I know for now. I am trying to build a framework, so the real ratio depends on the application and is somewhat unpredictable.


> Also, the fastest way to do something is to not do it at all.
> Can you replace the integers with pointers? Even if you send them
> across the network, you can verify that a received pointer is valid
> before using it, e.g.:
>
> #define MAX_OBJECT_IN_FLIGHT (1<<20)
>
> object_t objects[MAX_OBJECT_IN_FLIGHT];
>
> bool verify_pointer(object_t *o) {
>   if (o < objects || o >= objects + MAX_OBJECT_IN_FLIGHT)
>     return false;
>   if (((uintptr_t)o - (uintptr_t)objects) % sizeof(object_t))
>     return false;
>   // OK, this is a valid pointer into objects.
>   // A client can spoof it, but he can spoof the integer as well.
>   return true;
> }

--
Hmm, I am not sure what you mean. Are you saying that since my int is the id of an object, I might as well deal with pointers to that object instead of doing extra lookups by id?

For more context: I am trying to design an actor system like Erlang. IDs rather than pointers help because there are extra requirements like persisting actor state and ids across runs of the application.

The hash map helps with scheduling actors onto threads. There are num cores - 1 dispatcher threads actually running the actor logic; these are the reader threads I talked about. They are connected by a mesh of SPSC queues/ring buffers. Each dispatcher thread (running actor logic) needs to check where an actor (id) is scheduled so that it can enqueue a message on the right queue. A single writer thread does the actual scheduling of actors onto dispatcher threads; this is the thread that writes to the hash map. Having pointers won't save me from doing the lookups, since I cannot use the pointer by itself to execute a function: I need to know where an object/actor is scheduled to run. The operations I need, in context, are:

1) Append(actorId, dispatcher-id). Called by the single writer thread when a new actor is created, hence the monotonically increasing appends.
2) Set(actorId, dispatcher-id). The single scheduler thread observes patterns like certain actor pairs communicating a lot, or dispatchers becoming imbalanced (queue length, queue drain rate, etc.). Based on these metrics it can choose to move one or more actors from one dispatcher to another. There are other steps involved here that I am omitting.
3) Delete(actorId). An actor has just died, so it is deleted from the central directory.

On your suggestion of specialization: since I already have communication channels between the writer thread and the reader threads (ring buffers that they poll), I could use them as an opportunity to coordinate updates such as resize/compaction. I am using a closed-addressing hash map implemented as 2^n buckets of key/value arrays. Since ids are monotonically increasing, an append never displaces another value. The only times the value of an existing cell changes are:

i) There is a compaction (too many deletions).
ii) A rebalance event occurs, resulting in Set(actorId, dispatcher-id) calls.
iii) An append causes a bucket to run out of space or pushes the load factor of the table too high. This causes wholesale restructuring of the hash map - allocation and freeing of one or more buckets, increasing the number of buckets, etc.

If I size my map appropriately (massively oversize it), then compaction and append-out-of-space events should be rare enough. Rebalancing in general should be rare and stabilize with time; it also requires other coordination in any case - dispatcher threads need to stop processing messages for the transitioning actors, etc. So any time the writer thread needs to change the structure of the hash map, it could wait for a safe point by signalling a map-about-to-change event to all the other threads and waiting for them to acknowledge. Once they acknowledge (meaning they are just busy-waiting for the end-of-map-change event), it could make as many changes as it wants to the internal structure of the map and then signal end-of-map-change. The reader threads will now be caught up with all the changes and can proceed. The nice thing is that all get calls can proceed without any locks. Append calls only ever touch previously unaccessed cells, so a plain load of the cell should be fine for get calls. All of this rests on the hypothesis that resize, compaction, and set calls will be infrequent.
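The signal/ack scheme just described could be sketched as a small epoch handshake (all names here are assumptions, and the map accesses themselves are omitted; the writer spins on `all_readers_parked` between signalling and restructuring):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t MAX_READERS = 64;

std::atomic<uint64_t> epoch{0};            // odd = change in progress
std::atomic<uint64_t> seen[MAX_READERS];   // last epoch each reader observed

// Reader `id` calls this between batches of get()s. Returns true if the
// map may be read; false means a change is in progress - keep polling
// (doing no map accesses) until it returns true again.
bool reader_poll(std::size_t id) {
    uint64_t e = epoch.load(std::memory_order_acquire);
    seen[id].store(e, std::memory_order_release);   // acknowledge
    return (e & 1) == 0;
}

// Writer: announce a structural change (epoch becomes odd).
void writer_signal_change() {
    epoch.fetch_add(1, std::memory_order_acq_rel);
}

// Writer: true once every reader has acknowledged the odd epoch and is
// therefore parked; the writer spins on this, then restructures the map.
bool all_readers_parked(std::size_t num_readers) {
    uint64_t e = epoch.load(std::memory_order_relaxed);
    for (std::size_t i = 0; i < num_readers; ++i)
        if (seen[i].load(std::memory_order_acquire) != e)
            return false;
    return true;
}

// Writer: change done (epoch becomes even again); readers resume.
void writer_finish_change() {
    epoch.fetch_add(1, std::memory_order_acq_rel);
}
```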
 


> Then, if you can bound the difference between the newest and the oldest
> live object, you can use a ring buffer with direct indexing. If a
> reader asks for a very old object (older than the threshold), then,
> sorry, no luck.
>
> --
>
> If you cannot afford to reject requests for very old ids, but you
> expect them to be rare, then you can have a fast path for recent ids
> and a slow path for older ids. E.g.:
>
> // We do not expect to receive requests for ids older than this
> #define N (1<<20)
>
> struct table_element {
>   int key;
>   int val;
>   table_element_slow* slow;
> };
>
> table_element table[N];
>
> int find(int key) {
>   int idx = key % N;
>   table_element* e = &table[idx];
>   if (e->key == key)
>     return e->val;
>   // consult e->slow
>   // this can be slow, but we do not expect it to happen in the common case
> }
>
> This needs special care wrt atomic accesses to key/val, and also some
> protection for slow, and other details. But I hope the idea is clear.
I think so. 

--

> There can be other "special" solutions.
> In all these cases, accesses are extremely fast in the common case.
> If that does not work for you, then I would consider taking a normal
> hashmap, partitioning it into lots of partitions and protecting each
> partition with its own rw mutex. On top of that you can have a mutex for
> each thread/partition; readers lock their own mutex, but the writer needs
> to lock the mutexes of all threads. This eliminates contention between
> readers, but makes the writer slower. Since you know the number of readers
> ahead of time, this should be really simple to implement.
Right, I saw your RW mutex solution based on one cell per core. 

Dmitriy V'jukov

Jan 12, 2014, 5:49:37 AM1/12/14
to lock...@googlegroups.com
For an actor system I would strongly consider using pointers as actor
identity. If you need to persist actors, you can map to an int/string id
only when you are persisting (this mapping is trivial - you just read a
field from the actor struct, a pointer to which you already have). If you
want to pin actors to dispatchers, you can operate on pointers there as
well: just add the pointer to the actor into the dispatcher queue. And so on.
If you operate on int ids, that will incur constant unnecessary overhead
on most hot paths.

Rajiv Kurian

Jan 12, 2014, 2:55:05 PM1/12/14
to lock...@googlegroups.com
Right, makes sense. However, I'd still need some lookup mechanism for the different threads to stay up to date on which thread an actor (now just a pointer) lives, so that they can dispatch messages to the right queue. Do you think my safe-point solution for the lookup still makes sense, with the int ids replaced by pointers?

Dmitriy V'jukov

Jan 13, 2014, 1:08:44 AM1/13/14
to lock...@googlegroups.com
I may be missing something, but this is just:
actor->dispatcher->enqueue(msg);
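Spelled out, the point is that the dispatcher assignment can live in a field of the actor itself, so the send path needs no map lookup at all (a sketch with illustrative names; the real queue would be the SPSC ring buffer owned by the target dispatcher thread):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Message { int64_t payload; };

struct Dispatcher {
    std::vector<Message> queue;   // stand-in for the real SPSC ring buffer
    void enqueue(const Message& msg) { queue.push_back(msg); }
};

struct Actor {
    Dispatcher* dispatcher;   // written only by the scheduler thread
    // ... actor state, persistent id, etc.
};

// Message send path: one pointer chase, no hash-map lookup.
void send(Actor* actor, const Message& msg) {
    actor->dispatcher->enqueue(msg);
}
```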

Rajiv Kurian

Jan 13, 2014, 2:22:45 AM1/13/14
to lock...@googlegroups.com
Aah, I see. Are you saying we should just keep a field internal to the actor that says which thread to dispatch it on?

Dmitriy V'jukov

Jan 18, 2014, 7:39:08 AM1/18/14
to lock...@googlegroups.com
Yes, exactly. That's what you usually do to associate data with an object.