Question about collisions in Redis

Thanh TRAN

Nov 23, 2011, 5:26:57 AM
to Redis DB
Hi!
I have set up a server that uses Redis as a cache for data of the
form <int, vector>.
So I have a question about the Redis cache: do collisions exist in
it? (I couldn't find anything about them.)

Josiah Carlson

Nov 23, 2011, 11:59:54 AM
to redi...@googlegroups.com

If you have two keys X and Y, and they are different, then there will
be no observable collision. Within Redis, they may be in the same hash
bucket, but they will both survive.

Regards,
- Josiah

Tran Thanh

Nov 23, 2011, 9:46:10 PM
to redi...@googlegroups.com

Hi. Thanks!
     In the function dictAdd() in "dict.c" I saw these three lines:

                          entry = zmalloc(sizeof(*entry));
                          entry->next = ht->table[index];
                          ht->table[index] = entry;

So it seems that Redis uses "open addressing" to handle collisions.
Do you have any experience with the hash functions in Redis? When
testing Redis, I saw many dictType declarations in "redis.c", so I
don't know which one is best for my case.






--

-------------------------------------------------------------
Tran Dac Thanh
Developer at Luvina Joint Stock Company
Ha Noi University Of Technology
-------------------------------------------------------------

Josiah Carlson

Nov 23, 2011, 10:00:40 PM
to redi...@googlegroups.com
What are you trying to do with Redis? Are you trying to add
functionality? Are you just studying the source to see how it works?
Are you looking for bugs? How are you "testing" Redis?

I ask these questions so I can understand what you are interested in
learning, which will make it a lot easier to tell you what you want to
know.

Regards,
- Josiah

Tran Thanh

Nov 23, 2011, 10:36:19 PM
to redi...@googlegroups.com
Hi!
   I have built a social network. My DB contains 45 million users. When a
user searches for friends, the results must be returned as fast as
possible, so the solution is a cache. At first, I built my own cache
using the following techniques:
1. Clock cache: when the cache is full, one item must be evicted before
the new item is added; the clock algorithm picks the victim (a rough
sketch follows this list).
2. Hash function: I use boost::hash to compute the hash value of each key.
3. Collisions: I use open addressing to resolve collisions in my cache.
4. Locking: I use tbb::atomic and tbb::mutex to protect the data
(10 threads set and get data).
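Roughly, the clock eviction step in point 1 looks like this (a simplified
sketch, not my real code; the names and capacity are made up):

    /* Each slot has a "referenced" bit that is set on every hit; the clock
     * hand sweeps the slots, clearing the bit, and evicts the first slot
     * whose bit is already 0. */
    #include <stddef.h>

    #define CACHE_SLOTS 1024            /* made-up capacity */

    typedef struct {
        int key;                        /* the integer key */
        void *value;                    /* pointer to the cached vector */
        int referenced;                 /* set to 1 on every cache hit */
    } slot_t;

    static slot_t slots[CACHE_SLOTS];
    static size_t hand = 0;             /* the clock hand */

    /* Pick a victim slot when the cache is full. */
    static size_t clock_evict(void) {
        for (;;) {
            slot_t *s = &slots[hand];
            if (!s->referenced) {       /* second chance used up: evict this slot */
                size_t victim = hand;
                hand = (hand + 1) % CACHE_SLOTS;
                return victim;
            }
            s->referenced = 0;          /* give it a second chance and move on */
            hand = (hand + 1) % CACHE_SLOTS;
        }
    }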

When it was finished, I built a client (using Thrift) to test it, but
the performance is not good: about 10,000 requests/second.
 
So I need another option, and I chose Redis.
OK, back to your questions:
1. What are you trying to do with Redis? ==> I only want to use Redis as
a cache, so I want to learn more about it and find a way to integrate it
into my project (written in C++).
2. Are you trying to add functionality? ==> No. Redis already has all the
functionality I need.
3. Are you just studying the source to see how it works? ==> Yes. Because
I want to integrate Redis into my project, I need to understand it.
4. Are you looking for bugs? ==> No.
5. How are you "testing" Redis? ==> I test it as follows: with my own
cache, I add (and get) a list of data (1 million to 10 million items) and
measure the total time. Then I compare that with the time Redis needs to
do the same.

Marc Byrd

Nov 23, 2011, 10:38:02 PM
to redi...@googlegroups.com
So Josiah, are you saying that locking, by which redis accomplishes atomicity, is at the key level?  

Is there any insight into how long a key is locked in such a situation?  (e.g. 1 ms, 1 microsecond?)  

How does this compare to, for example, mongodb's implementation of atomicity by doing a system-wide lock (see e.g. http://stackoverflow.com/questions/7506124/is-it-true-that-mongodb-has-one-global-read-write-lock ). 

Have there been any comparisons between mongodb and redis in very-high-concurrency situations?

Thanks,


m


Tran Thanh

Nov 23, 2011, 10:51:46 PM
to redi...@googlegroups.com
Hi Marc Byrd,
     I couldn't find any lock operations in Redis, and I don't know how
Redis can run as fast as the benchmarks say. As for MongoDB's
implementation, if it uses a global lock, the performance is limited as
if only one thread were running.
Regarding comparisons between MongoDB and Redis in very-high-concurrency
situations: I think in any situation, what people care about most is how
fast the system can run. If you use MongoDB in your system, you should
work out the maximum number of threads your project can run to reach its
best performance, and then compare that result with Redis.
 


kehai chen

Nov 23, 2011, 10:55:03 PM
to redi...@googlegroups.com
I would like to know about this question as well. Hope you can explain it
in detail.

2011/11/24 Marc Byrd <dr.mar...@gmail.com>

Josiah Carlson

Nov 24, 2011, 3:38:18 AM
to redi...@googlegroups.com
On Wed, Nov 23, 2011 at 7:36 PM, Tran Thanh <thanht...@gmail.com> wrote:
> Hi!
>    I have built a social network. My DB contains 45 million users. When
> a user searches for friends, the results must be returned as fast as
> possible, so the solution is a cache. At first, I built my own cache
> using the following techniques:
> 1. Clock cache: when the cache is full, one item must be evicted before
> the new item is added; the clock algorithm picks the victim.
> 2. Hash function: I use boost::hash to compute the hash value of each key.
> 3. Collisions: I use open addressing to resolve collisions in my cache.
> 4. Locking: I use tbb::atomic and tbb::mutex to protect the data
> (10 threads set and get data).
>
> When it was finished, I built a client (using Thrift) to test it, but
> the performance is not good: about 10,000 requests/second.

I agree, that isn't great.

> So I need another option, and I chose Redis.
> OK, back to your questions:
> 1. What are you trying to do with Redis? ==> I only want to use Redis as
> a cache, so I want to learn more about it and find a way to integrate it
> into my project (written in C++).

Hiredis is a client written in C, which you should be able to use in
your C++ project directly.
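For example, a minimal hiredis sketch might look like this (the host,
port, and key names here are just placeholders):

    #include <stdio.h>
    #include <hiredis/hiredis.h>

    int main(void) {
        /* connect to a local Redis instance (host/port are assumptions) */
        redisContext *c = redisConnect("127.0.0.1", 6379);
        if (c == NULL || c->err) {
            fprintf(stderr, "connection error\n");
            return 1;
        }

        /* cache a value under an integer-like key, then read it back */
        redisReply *r = redisCommand(c, "SET %s %s",
                                     "user:12345", "some serialized vector");
        if (r) freeReplyObject(r);

        r = redisCommand(c, "GET %s", "user:12345");
        if (r && r->type == REDIS_REPLY_STRING)
            printf("cached value: %s\n", r->str);
        if (r) freeReplyObject(r);

        redisFree(c);
        return 0;
    }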

> 2. Are you trying to add functionality? ==> No. Redis already has all
> the functionality I need.
> 3. Are you just studying the source to see how it works? ==> Yes.
> Because I want to integrate Redis into my project, I need to understand it.

Most people use it very successfully without understanding very much
of how it works deep down. Do you know all of the details of how your
keyboard works? Probably not, yet you use it as a tool every day to
get your job done.

To be clear: I'm not saying you shouldn't learn it, I'm just saying
that it may not be necessary to solve your performance problem.

> 4. Are you looking for bugs? ==> No.
> 5. How are you "testing" Redis? ==> I test it as follows: with my own
> cache, I add (and get) a list of data (1 million to 10 million items) and
> measure the total time. Then I compare that with the time Redis needs to
> do the same.

You don't need to read Redis source code to be able to do that. You
only need to read the command documentation: http://redis.io/commands
and learn how to use a client library with your language to access
Redis.

Regards,
- Josiah

Josiah Carlson

Nov 24, 2011, 3:58:42 AM
to redi...@googlegroups.com
On Wed, Nov 23, 2011 at 7:38 PM, Marc Byrd <dr.mar...@gmail.com> wrote:
> So Josiah, are you saying that locking, by which redis accomplishes
> atomicity, is at the key level?

I'm not saying that at all. Redis accomplishes atomicity by being
single-threaded. There is an illusion of simultaneous operations
within Redis, which is only technically the case for bgsave,
bgrewriteaof, vm (which is deprecated), and the discussed/implemented
background file rename; all of these exist so that the thread/process
doing the real work doesn't block.
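For example, if two clients send INCR against the same key at the same
moment, Redis simply runs one command and then the other, so the counter
always ends up incremented by two without any locking.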

> Is there any insight into how long a key is locked in such a situation?
>  (e.g. 1 ms, 1 microsecond?)

There is no lock.

> How does this compare to, for example, mongodb's implementation of atomicity
> by doing a system-wide lock (see
> e.g. http://stackoverflow.com/questions/7506124/is-it-true-that-mongodb-has-one-global-read-write-lock
> ).

No lock.

> Have there been any comparisons between mongodb and redis in
> very-high-concurrency situations?

Apples and Oranges. Both are fruit, but are substantially different in
design, implementation, intent, purpose, ... so as to be effectively
non-comparable.

More specifically:
Redis is an in-memory data structure server.
MongoDB is a document store.

One stores data structures, the other stores documents. One stores
data primarily in memory, the other caches data that is supposed to be
on disk. I could go on, but I won't. Different solutions for an
overlapping problem set.

But to answer your question: for any operation that they can do
equivalently, Redis will generally be many times faster. I can perform
50k read/write operations/second on a Redis instance in VirtualBox on
my 2 year old laptop. I've not heard of anyone coming even close to
that write performance with anything less than a cluster of 6+
overly-powerful MongoDB boxes (if someone has better Mongo results,
I'd love to hear them).
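If you want to reproduce that kind of measurement yourself, a rough
sketch using hiredis pipelining could look like this (host, port, key
pattern, and request count are all placeholders):

    #include <stdio.h>
    #include <time.h>
    #include <hiredis/hiredis.h>

    /* Rough timing sketch: pipeline N SET commands, then read the replies
     * and measure wall-clock time. */
    int main(void) {
        const int N = 100000;
        char key[32], val[32];
        redisContext *c = redisConnect("127.0.0.1", 6379);
        if (c == NULL || c->err) return 1;

        time_t start = time(NULL);

        for (int i = 0; i < N; i++) {
            snprintf(key, sizeof(key), "key:%d", i);
            snprintf(val, sizeof(val), "value:%d", i);
            redisAppendCommand(c, "SET %s %s", key, val);  /* queued, not sent yet */
        }

        void *reply;
        for (int i = 0; i < N; i++) {                      /* flush and read replies */
            if (redisGetReply(c, &reply) != REDIS_OK) break;
            freeReplyObject(reply);
        }

        printf("%d SETs in ~%ld seconds\n", N, (long)(time(NULL) - start));
        redisFree(c);
        return 0;
    }

(In practice you would batch the pipeline in smaller chunks, but this is
enough to get an order-of-magnitude number.)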

Regards,
- Josiah

Didier Spezia

Nov 24, 2011, 2:02:02 PM
to redi...@googlegroups.com
Hi,

to answer the original question of the OP:

Yes, there are collisions in the hash table implementation of Redis
(as in any other hash table). The hash function is Bernstein's
(for strings).

Redis does not use open addressing but separate chaining
(i.e. a linked list is associated to each bucket of the hash table).
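To make that concrete, here is a simplified sketch of separate chaining
(an illustration only, not the actual dict.c structures): a colliding
entry is simply pushed onto the head of its bucket's list, exactly like
the three lines quoted earlier from dictAdd().

    #include <stdlib.h>
    #include <string.h>

    typedef struct entry {
        char *key;
        void *val;
        struct entry *next;   /* chain of entries hashing to the same bucket */
    } entry;

    typedef struct {
        entry **table;        /* array of bucket heads (assumed pre-allocated) */
        unsigned long size;   /* number of buckets, a power of two */
    } hashtable;

    /* Bernstein's (djb2) string hash. */
    static unsigned long djb2(const char *s) {
        unsigned long h = 5381;
        while (*s) h = ((h << 5) + h) + (unsigned char)*s++;  /* h * 33 + c */
        return h;
    }

    /* Insert by pushing the new entry onto the head of its bucket's list. */
    static void ht_add(hashtable *ht, const char *key, void *val) {
        unsigned long idx = djb2(key) & (ht->size - 1);
        entry *e = malloc(sizeof(*e));
        e->key = strdup(key);
        e->val = val;
        e->next = ht->table[idx];   /* colliding entries survive on the chain */
        ht->table[idx] = e;
    }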

Collisions are not a problem in practice because Redis supports
transparent, incremental background rehashing, so the hash table
size is increased (doubled) to keep the number of collisions low.

In most situations, table expansion and rehashing are triggered when
the number of items becomes equal to the number of buckets (a 1:1 ratio).
In some situations (for instance while a bgsave operation is active),
expansion is only allowed once the ratio reaches 5:1, in order to be
more COW-friendly.

The hash tables can also be rehashed down to save memory
when the number of items gets lower than 10% of the buckets.
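For example, a table with 16 buckets is normally doubled to 32 as soon
as it holds 16 entries; while a bgsave is running, the expansion is
deferred until the table holds around 80 entries (5 per bucket).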

Now regarding the disappointing results with the OP's own cache
implementation, I would say the problem is probably more with the
thrift server than with the hash table. When something as
efficient as a hash table is used (any implementation),
the first performance bottleneck should be at the network
and communication code level.

Using Redis or memcached is probably safer and more efficient
than most home-made cache implementations. Actually, it is getting
harder and harder to beat them.

Regards,
Didier.

Tran Thanh

Nov 24, 2011, 9:19:26 PM
to redi...@googlegroups.com
To Josiah Carlson and Didier Spezia: thanks a lot for your helpful replies.


-------------------------------------------------------------
Tran Dac Thanh
R&D Vinagame Corp.