redis-server taking 10 times more RAM to store data than the dump

Alexey Verkhovsky

unread,

Jun 1, 2009, 11:03:05 PM6/1/09

to redi...@googlegroups.com

Hi all,

Been playing with Redis quite a bit for the last couple of weeks. One thing that is bugging me right now looks like this:

02 Jun 02:46:20 . DB 0: 2092395 keys (0 volatile) in 2097152 slots HT.
02 Jun 02:46:20 . 4 clients connected (0 slaves), 310914 bytes in use

This is ~2 mln name-value pairs taking up 300 Mb of RAM (actually, RSS of that process is even bigger - 440 Mb). The name-value pairs in question map 8-byte strings to integers. And it takes 150-200 bytes of RAM to store each. The dump of the same database takes up 24 Mb unarchived - 12 bytes per record, which is a lot more to my liking. I realize that there are indexes and therefore hash values and pointers everywhere, but 200 bytes pf RAM to store 16 bytes of data in a big hash table still seems somewhat excessive.

So, is this a normal behavior, and is there a way to cut down the amount of RAM used by Redis in this scenario by a factor of 2-3?

--
Alexey Verkhovsky
http://alex-verkhovsky.blogspot.com/
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]

Salvatore Sanfilippo

unread,

Jun 3, 2009, 11:20:49 AM6/3/09

to redi...@googlegroups.com

On Tue, Jun 2, 2009 at 5:03 AM, Alexey Verkhovsky
<alexey.v...@gmail.com> wrote:
> Hi all,

Hello Alexey!

> Been playing with Redis quite a bit for the last couple of weeks. One thing
> that is bugging me right now looks like this:
>
> 02 Jun 02:46:20 . DB 0: 2092395 keys (0 volatile) in 2097152 slots HT.
> 02 Jun 02:46:20 . 4 clients connected (0 slaves), 310914 bytes in use
>
> This is ~2 mln name-value pairs taking up 300 Mb of RAM (actually, RSS of
> that process is even bigger - 440 Mb). The name-value pairs in question map
> 8-byte strings to integers. And it takes 150-200 bytes of RAM to store each.
> The dump of the same database takes up 24 Mb unarchived - 12 bytes per
> record, which is a lot more to my liking. I realize that there are indexes

There are two effects leading to this results:

1) The on-disk format is optimized pretty well.
2) The on-memory format hash a lot of overhead.

Starting from point 1: redis encodes strings on disk in an efficient
way using mulitple techniques:
a) len-prefixed strings don't use something like a 32bit integer in
network byte order, but an encoding so that for small string only 1
byte is needed as length, 2 for larget strings and finally 4 for 32
bit lengths.
b) strings that looks like numbers, that is, strings that Redis can
prove to itself that translated into a number, and then translated
again into a string, are bit by bit the same, are stored as numbers on
disk. This saves a lot of space.
c) if a string can be compressed using LZF, it gets stored compressed.

And now 2: there is *a lot* of overhead in storing stuff on Ram. we
have the hash table bucket, one for key in the best case, but 2 for
key in the average case. On 64 bit systems this alone is 48 bytes! (1
pointer for the key, 1 pointer for the value, 1 pointer for "next"
since dict.c uses chaining to resolve collisions).

Then there is the Redis object itself, another 16 bytes per object.
Every key needs two of this. 32 bytes more. And still we didn't stored
nothing inside... then there are the key/value values, sizeof(void*) +
string_length for strings,

> and therefore hash values and pointers everywhere, but 200 bytes pf RAM to
> store 16 bytes of data in a big hash table still seems somewhat excessive.
>
> So, is this a normal behavior, and is there a way to cut down the amount of
> RAM used by Redis in this scenario by a factor of 2-3?

As you can see from the Redis structures, the memory used is almost at
minimum, there are no additional fields. Expires are taken into a
secondary hash table so that no additional mem is used if expires are
not used.

I don't see a simple way to save a lot of memory... the only think
that is working is object sharing that is usually able to cut 20% of
memory with large dataset with some repeating data, but I've a problem
in turing this on by default: some day ago we got a report of Redis
crashing under load with object sharing enabled, and I'm trying to
reproduce this bug hard without results so far. But in the end this
will get fixed and we can use this strategy.

Another way to save memory is to compress large strings in memory.
With the redis architecture this is surprisingly simple. At the end of
the TODO list in the Git there are hints about how to do this. I'll
implement this stuff in After Redis 1.0 stable indeed. But when there
are a lot of short strings and values it is very very hard to save
memory.

What I'll do btw is to try to track how many memory is used in the
different parts of Redis: in the hash table, in objects allocation,
and in sds strings. Another interesting test is to try how many memory
memcached is using to store the same amount of data. I bet this will
be rougly the same, but it's absolutely worth to try it. If we get
values very similar to Redis we know that we can try hard to improve
another 10 or 20%, but it will be very hard to go over that.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

siculars

unread,

Jun 4, 2009, 12:10:16 PM6/4/09

to Redis DB

Thank you for the breakdown. This is definitely valuable to know and
could lend itself to an algorithm to determine overall overhead and
for each key under different circumstances. It is clear that percent
overhead drops as a function of key/value size which could shape the
type of data people decide to throw at redis.

Salvatore Sanfilippo

unread,

Jun 4, 2009, 12:25:56 PM6/4/09

to redi...@googlegroups.com

Hello Sinculars,

indeed for large objects the overhead starts to be much smaller. It's
for tiny objects that's huge.
Also I forgot to tell that every malloc() call uses zmalloc() in
Redis, that has a 4/8 bytes (32/64bit) overhead in order to be able to
take the count of the memory used. Without this maxmemory is not
possible.

What I plan to do btw is to #ifdef that for glibc since glibc malloc
should be able to return this kind of info without the wasted mem.

Salvatore Sanfilippo

unread,

Jun 4, 2009, 12:52:58 PM6/4/09

to redi...@googlegroups.com

On Thu, Jun 4, 2009 at 6:25 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> What I plan to do btw is to #ifdef that for glibc since glibc malloc
> should be able to return this kind of info without the wasted mem.

Just commited this exact change, but for macosx that exports an
interesting malloc_size() function. So from your next build of Redis
it will start using less memory. I hope glibc malloc exports a similar
funciton.

Cheers,
Slavatore

Reply all

Reply to author

Forward