Re: Best maximum no of members within sets, in practice?

111 views
Skip to first unread message

Didier Spezia

unread,
Jul 16, 2012, 10:55:33 AM7/16/12
to redi...@googlegroups.com
Hi,

a set is implemented with a hash table so you have enough memory
you can store millions of items with no sweat in a Redis set. 

Now if you want to store sets in a compact way, you can imagine
leveraging intset or ziplist optimizations to store the data. This is what
the Instagram guys proposed here. 


To optimize larger sets, you could cut them into pieces so that each piece
is stored using a serialized data structure

See  http://stackoverflow.com/questions/10004565/redis-10x-more-memory-usage-than-data/10008222#10008222 

The best way to evaluate memory consumption is to try creating a smaller number of objects
and extrapolate. On my system, without any optimization, 10K sets of 8K numerical items
take about 3 GB, so 500K sets would take 150 GB. Now by leveraging intset, I can decrease
memory consumption to about 200 MB, resulting in 10 GB for 500K sets. Much better.

Regards,
Didier.

On Monday, July 16, 2012 1:56:32 PM UTC+2, Haluk AKIN wrote:
Hello,

I understand the OS defines the maximum number of members a "set" can hold.

How about practical usage limits?

For instance the following article mentions "1000" as a limit for "hash" type keys.
http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs

In my case, I'm trying to understand if it makes sense to hold 7-8k members in each set, for about 500k sets.

Thank you,
Haluk

Haluk AKIN

unread,
Jul 16, 2012, 4:12:05 PM7/16/12
to redi...@googlegroups.com
Hello Didier,

Thank you for your input.

I'm somewhat less worried about storage space.


Quoting from the article, "We found this setting was best around 1000; any higher and the HSET commands would cause noticeable CPU activity"
This sounds like, as the set grows performance degrades more than it was expected? Or is this just referring to the CPU activity it takes to compress sets?

My particular focus is more on the speed performance of the functions similar to the following:

SUNION,SCARD,SDIFF,SINTERSTORE
ZCOUNT,ZRANGE,ZINTERSTORE


Maybe I can rephrase the question like this, "shall I expect the "time complexity" information to be exactly correct for large sets, or would there be additional performance decay with larger sets?"
http://redis.io/commands/sinter

Thank you,
Haluk

Josiah Carlson

unread,
Jul 16, 2012, 4:20:44 PM7/16/12
to redi...@googlegroups.com
They were using the compact ziplist/zipmap representation by setting
the max zipmap/ziplist entries and size options. By setting those to
large numbers, then you can use the compact representation for larger
sets/hashes/zsets, but the larger they get, the slower the operations
(operations are typically O(n) for the compact representation).

If your set/hash/zset gets bigger than your configured ziplist limit,
then it will switch to the native structure, which will then follow
the expected performance characteristics for that structure (O(1)
inserts, updates, deletes on sets/hashes, O(logn) inserts, updates,
and deletes on zsets).

Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/I1xSXSrhImUJ.
>
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
Reply all
Reply to author
Forward
0 new messages