sorted set memory consumption optimization


Oleg Ruchovets

Sep 26, 2013, 12:16:20 PM
to redi...@googlegroups.com
Hi all,
   I am running a benchmark of the memory consumption of sorted sets in Redis.


My goal is to optimize memory usage for 1,000,000 sorted sets, each with 300 entries.

SortedSet:
   all:speed:year:1380210701349 (key)

      50:1380210701349 (entries)
      80:1380210701349

I calculated that each sorted set has size:
  28 bytes (key itself) + (4 bytes * 300) scores + (13 bytes * 300) entries = 5,128 bytes.

1,000,000 * 5,128 bytes ≈ 4.7 GB

But after inserting the 1,000,000 sorted sets, I saw memory usage of 40 GB!

Questions:
    1) Why do I get such high memory usage overhead? It is close to 10x.
    2) I changed these parameters in the conf file:
        zset-max-ziplist-entries 310
        zset-max-ziplist-value 6000
       but the memory consumption didn't change.

When I check the object:
   > DEBUG OBJECT ALL:speed:year:1380210701349
I get this result:

Value at:0x7f3c2ef22d10 refcount:1 encoding:skiplist serializedlength:5292 lru:1706190 lru_seconds_idle:50

Does this mean that after changing the ziplist settings my sorted sets are still not ziplist-encoded? If so, why do I still not get ziplist-encoded sorted sets after changing the conf parameters and restarting Redis? What should I do to resolve / debug this behavior?

I am using Redis 2.6.9 on CentOS, running on a VM.

Thanks
Oleg


Josiah Carlson

Sep 26, 2013, 2:04:39 PM
to redi...@googlegroups.com
Your zset max-ziplist settings must be in place before Redis starts up, and/or before the sorted set is created. You can force a restructuring by using a "ZUNIONSTORE key 1 key" call, which should get you a ziplist-encoded zset.
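For example, using the key from your first message (CONFIG SET should work for these parameters at runtime in 2.6, so no restart needed; the "ziplist" reply at the end is what I'd expect to see, not verified output):

   > CONFIG SET zset-max-ziplist-entries 310
   > CONFIG SET zset-max-ziplist-value 6000
   > ZUNIONSTORE all:speed:year:1380210701349 1 all:speed:year:1380210701349
   > OBJECT ENCODING all:speed:year:1380210701349
   "ziplist"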

 - Josiah



Oleg Ruchovets

Sep 27, 2013, 12:13:38 AM
to redi...@googlegroups.com
Yes, it works.
Thank you.
Using the ziplist encoding reduced memory consumption by about 10x.

Still, I don't understand why the sorted sets took ~10x the memory of the original data size.
Am I doing something wrong, or is that a reasonable ratio?

My sorted sets have 300 entries each.
I've read that the performance impact of the ziplist encoding depends on the sorted set size.
Does a size of 300 cause significant performance degradation?

Thanks
Oleg.




Josiah Carlson

Sep 27, 2013, 1:35:14 AM
to redi...@googlegroups.com
Your earlier estimates were also off (scores are stored in 8 bytes, plus there is header information for every entry in the ziplist).

The practical realities of storing data in data structures mean that, in order to be fast, they use more memory. A ZSET is not just a skiplist: it is a hash plus a skiplist. The hash is for accessing items by member, and the skiplist is for ordering the items by score. The combination of the two is what gives the ZSET its usefulness. As for why it takes up so much space, that is a longer story.
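You can see both halves in action from redis-cli (the key name here is just an illustration):

   > ZADD myzset 50 a 80 b
   > ZSCORE myzset a            (lookup by member: O(1), served by the hash)
   > ZRANGEBYSCORE myzset 0 100 (walk in score order, served by the skiplist)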

First, look up what a skiplist is: http://en.wikipedia.org/wiki/Skip_list. Now, understand that the more items are in your skiplist, the more memory it uses, because there are more levels that need previous/next pairs. You can see the structure layout here: https://github.com/antirez/redis/blob/unstable/src/redis.h#L512
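From memory, the node layout at that link looks roughly like this in the 2.6-era source (a sketch, not a verbatim copy):

   typedef struct zskiplistNode {
       robj *obj;                         /* the member */
       double score;                      /* 8 bytes, as noted above */
       struct zskiplistNode *backward;    /* previous node at level 0 */
       struct zskiplistLevel {
           struct zskiplistNode *forward; /* next node at this level */
           unsigned int span;             /* nodes skipped to reach it */
       } level[];                         /* one entry per level */
   } zskiplistNode;

On a 64-bit build that is 24 bytes of fixed fields per node, plus roughly 16 bytes per level, before counting the robj and sds headers for the member itself or allocator overhead.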

On the hash side of things, you get a similar thing going on. Each entry in a hash is not just a pointer to a value; it's a pointer to a structure that includes information about the key, the relevant value, and a pointer to the next entry in the bucket. You can read the details here: https://github.com/antirez/redis/blob/unstable/src/dict.h
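The entry structure in dict.h looks roughly like this (again a sketch from memory):

   typedef struct dictEntry {
       void *key;                 /* pointer to the member */
       union {
           void *val;             /* here, a pointer to the score */
           uint64_t u64;
           int64_t s64;
       } v;
       struct dictEntry *next;    /* chaining within a hash bucket */
   } dictEntry;

That's 24 bytes per entry on a 64-bit build, before the key and value objects, the bucket array itself, and allocator overhead.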

Long story short, in my experience you can expect an entry in a ZSET to account for 100-150 bytes of memory for ZSETs with at least a few thousand items (I saw roughly 150 bytes per entry for a ZSET with 150+ million entries).
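Taking the middle of that range, the arithmetic lines up with what you saw:

   1,000,000 sets * 300 entries * ~130 bytes/entry ≈ 39 GB

which is right around the 40 GB you measured with the skiplist encoding.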


Will that always be the case? I don't know; it's hard to say. As part of my full-time gig, I'm working on a data structure that I plan to initially integrate into a fork of Redis. It offers integer sets that use very little memory (less than 2% of memory used is pointers and structure overhead when there are more than a few hundred entries), while at the same time offering O(log(n)) insert, update, and delete. I've also got an O(n+m) union/intersect algorithm that can easily be swapped out for an O(n log(m)) algorithm, depending on which is expected to be faster (which is a function of memory latency and can be calculated at startup).

It wouldn't be difficult to tweak my structure's functionality to offer similar memory savings to replace skiplists, offering HASH/ZSET memory use that could be very close to each other (within 8 bytes/entry, compared to the 15-65 byte difference now).

It could also be trivially used to replace the existing hash-based expiration/random sampling and actually reduce memory use, while offering O(1) discovery of data that can be expired immediately.

Regards,
 - Josiah