redis-server taking 10 times more RAM to store data than the dump
From: Alexey Verkhovsky <alexey.verkhov...@gmail.com>
Date: Mon, 1 Jun 2009 21:03:05 -0600
Local: Mon, Jun 1 2009 11:03 pm
Subject: redis-server taking 10 times more RAM to store data than the dump

Hi all,

Been playing with Redis quite a bit for the last couple of weeks. One thing
that is bugging me right now looks like this:

02 Jun 02:46:20 . DB 0: 2092395 keys (0 volatile) in 2097152 slots HT.
02 Jun 02:46:20 . 4 clients connected (0 slaves), 310914 bytes in use

This is ~2 million name-value pairs taking up 300 MB of RAM (actually, the RSS of
that process is even bigger: 440 MB). The name-value pairs in question map
8-byte strings to integers, and it takes 150-200 bytes of RAM to store each one.
The dump of the same database takes up 24 MB unarchived, or 12 bytes per
record, which is a lot more to my liking. I realize that there are indexes,
and therefore hash values and pointers everywhere, but 200 bytes of RAM to
store 16 bytes of data in a big hash table still seems somewhat excessive.
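
The per-record figures above can be re-derived from the numbers in the post. This is only a back-of-the-envelope sketch: it assumes "MB" means 2**20 bytes, and the helper name `bytes_per_key` is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: "MB" in the post means 2**20 bytes


def bytes_per_key(total_megabytes, key_count):
    """Average number of bytes attributable to each key."""
    return total_megabytes * MIB / key_count


KEYS = 2_092_395  # key count taken from the log line in the post

in_use = bytes_per_key(300, KEYS)  # memory reported as in use
rss = bytes_per_key(440, KEYS)     # resident set size of the process
dump = bytes_per_key(24, KEYS)     # size of the dump file on disk

print(f"in use: {in_use:.0f} B/key, RSS: {rss:.0f} B/key, dump: {dump:.0f} B/key")
print(f"in-use memory is {in_use / dump:.1f}x the dump size")
```

Under that assumption the in-memory cost lands inside the 150-200 byte range quoted above, and the dump comes out at roughly 12 bytes per record.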

So, is this normal behavior, and is there a way to cut down the amount of
RAM used by Redis in this scenario by a factor of 2-3?

--
Alexey Verkhovsky
http://alex-verkhovsky.blogspot.com/
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]


 
From: Salvatore Sanfilippo <anti...@gmail.com>
Date: Wed, 3 Jun 2009 17:20:49 +0200
Local: Wed, Jun 3 2009 11:20 am
Subject: Re: redis-server taking 10 times more RAM to store data than the dump
On Tue, Jun 2, 2009 at 5:03 AM, Alexey Verkhovsky <alexey.verkhov...@gmail.com> wrote:
> Hi all,

Hello Alexey!

> Been playing with Redis quite a bit for the last couple of weeks. One thing
> that is bugging me right now looks like this:

> 02 Jun 02:46:20 . DB 0: 2092395 keys (0 volatile) in 2097152 slots HT.
> 02 Jun 02:46:20 . 4 clients connected (0 slaves), 310914 bytes in use

> This is ~2 million name-value pairs taking up 300 MB of RAM (actually, the RSS of
> that process is even bigger: 440 MB). The name-value pairs in question map
> 8-byte strings to integers, and it takes 150-200 bytes of RAM to store each one.
> The dump of the same database takes up 24 MB unarchived, or 12 bytes per
> record, which is a lot more to my liking. I realize that there are indexes

There are two effects leading to this result:

1) The on-disk format is optimized pretty well.
2) The in-memory format has a lot of overhead.

Starting from point 1: Redis encodes strings on disk in an efficient
way using multiple techniques:
a) length-prefixed strings don't use something like a 32-bit integer in
network byte order, but an encoding where only 1 byte is needed for the
length of small strings, 2 for larger strings, and finally 4 for 32-bit
lengths.
b) strings that look like numbers, that is, strings for which Redis can
prove to itself that converting them to a number and then back to a
string yields exactly the same bytes, are stored as numbers on
disk. This saves a lot of space.
c) if a string can be compressed using LZF, it gets stored compressed.
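
Techniques (a) and (b) can be sketched in a few lines. This is an illustration, not Redis's actual code: the function names are hypothetical, and the bit layout (two marker bits in the first byte, with the 32-bit form written as a marker byte followed by 4 length bytes) is an assumption about how such a scheme is typically laid out.

```python
import struct


def encode_length(n):
    """Encode a string length in as few bytes as possible (assumed layout)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < (1 << 6):
        # Marker bits 00: the remaining 6 bits hold the length.
        return bytes([n])
    if n < (1 << 14):
        # Marker bits 01: 6 bits here plus the next byte hold the length.
        return bytes([0x40 | (n >> 8), n & 0xFF])
    if n < (1 << 32):
        # Marker bits 10: the length follows as a 4-byte big-endian integer.
        return bytes([0x80]) + struct.pack(">I", n)
    raise ValueError("length does not fit in 32 bits")


def looks_like_integer(s):
    """True if s survives a string -> int -> string round trip unchanged."""
    try:
        return str(int(s)) == s
    except ValueError:
        return False


print(encode_length(8).hex(), encode_length(1000).hex(), encode_length(100000).hex())
print(looks_like_integer("12345"), looks_like_integer("007"), looks_like_integer("abc"))
```

The round-trip test is what rejects inputs such as "007" or "+5": they parse as numbers, but they would not come back byte for byte when converted to a string again.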

And now 2: there is *a lot* of overhead in storing stuff in RAM. We
have the hash table bucket, one per key in the best case, but 2 per
key in the average case. On 64-bit systems this alone is 48 bytes! (1
pointer for the key, 1 pointer for the value, 1 pointer for "next",
since dict.c uses chaining to resolve collisions).

Then there is the Redis object itself, another 16 bytes per object.
Every key needs two of these: 32 bytes more. And we still haven't stored
anything inside... Then there are the key/value payloads: sizeof(void*) +
string_length for strings,

> and therefore hash values and pointers everywhere, but 200 bytes of RAM to
> store 16 bytes of data in a big hash table still seems somewhat excessive.

> So, is this normal behavior, and is there a way to cut down the amount of
> RAM used by Redis in this scenario by a factor of 2-3?

As you can see from the Redis structures, the memory used is almost at
the minimum: there are no additional fields. Expires are kept in a
secondary hash table, so that no additional memory is used if expires are
not used.
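
The figures quoted in this reply can be added up as a rough per-key estimate. This is only a tally of the numbers given above for a 64-bit system; the function name and the example value length of 7 bytes are assumptions, and allocator overhead is not included.

```python
def per_key_bytes(key_len, value_len, buckets_per_key=2,
                  pointer_size=8, object_size=16):
    """Rough per-key memory estimate built from the figures in this reply."""
    # Each hash table entry holds 3 pointers: key, value, next.
    buckets = 3 * pointer_size * buckets_per_key
    # One Redis object for the key and one for the value.
    objects = 2 * object_size
    # Each string costs sizeof(void*) plus its length.
    strings = (pointer_size + key_len) + (pointer_size + value_len)
    return buckets + objects + strings


# 8-byte keys as in the original post; the 7-byte value is an assumed example.
print(per_key_bytes(8, 7))                     # average case, 2 buckets per key
print(per_key_bytes(8, 7, buckets_per_key=1))  # best case, 1 bucket per key
```

With those inputs the fixed parts (48 + 32 + 16 bytes of pointers and headers) dwarf the 15 bytes of actual key and value data.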

I don't see a simple way to save a lot of memory... the only thing
that is working is object sharing, which is usually able to cut 20% of
memory with large datasets that contain some repeating data, but I have a
problem with turning this on by default: a few days ago we got a report of
Redis crashing under load with object sharing enabled, and I'm trying hard
to reproduce this bug, without results so far. But in the end this
will get fixed and we can use this strategy.
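
The idea behind object sharing can be illustrated with a small interning pool: equal values are stored once and every user holds a reference to the same object. This is a toy sketch of the general technique, not Redis's implementation; the `SharedPool` class is hypothetical.

```python
class SharedPool:
    """Keeps one canonical copy of each distinct value (toy interning pool)."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # Return the copy already in the pool, or register this one.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = SharedPool()
first = pool.intern(bytes([49, 50, 51]))   # b"123", freshly allocated
second = pool.intern(bytes([49, 50, 51]))  # equal content, separate allocation
print(first is second, len(pool))
```

The saving depends entirely on how much of the data repeats, which is consistent with the remark above that it helps on datasets with some repeating data.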

Another way to save memory is to compress large strings in memory.
With the Redis architecture this is surprisingly simple. At the end of
the TODO list in the Git repository there are hints about how to do this.
I'll implement this after Redis 1.0 stable. But when there
are a lot of short strings and values it is very, very hard to save
memory.
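
A minimal sketch of what compressing only the large strings might look like. It uses zlib from the Python standard library as a stand-in, since LZF is not available there; the 64-byte threshold and the `pack`/`unpack` names are assumptions chosen for illustration.

```python
import zlib

MIN_COMPRESS_LEN = 64  # hypothetical threshold; shorter values are left alone


def pack(value):
    """Return (is_compressed, payload), compressing only when it pays off."""
    if len(value) >= MIN_COMPRESS_LEN:
        compressed = zlib.compress(value)
        if len(compressed) < len(value):
            return True, compressed
    return False, value


def unpack(is_compressed, payload):
    """Inverse of pack()."""
    return zlib.decompress(payload) if is_compressed else payload


short = pack(b"12345678")
large = pack(b"a" * 1000)
print(short[0], len(short[1]), large[0], len(large[1]))
```

Short values pass through untouched, which matches the point above: with many tiny strings there is nothing for compression to win.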

What I'll do btw is try to track how much memory is used in the
different parts of Redis: in the hash table, in object allocation,
and in sds strings. Another interesting test is to check how much memory
memcached uses to store the same amount of data. I bet it will
be roughly the same, but it's absolutely worth trying. If we get
values very similar to Redis, we know that we can try hard to improve by
another 10 or 20%, but it will be very hard to go beyond that.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay


 
From: siculars <sicul...@gmail.com>
Date: Thu, 4 Jun 2009 09:10:16 -0700 (PDT)
Local: Thurs, Jun 4 2009 12:10 pm
Subject: Re: redis-server taking 10 times more RAM to store data than the dump
Thank you for the breakdown. This is definitely valuable to know and
could lend itself to an algorithm to determine overall overhead and
for each key under different circumstances. It is clear that percent
overhead drops as a function of key/value size which could shape the
type of data people decide to throw at redis.
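
A rough way to see how the percentage drops with payload size, using the fixed costs from Salvatore's breakdown (48 bytes of hash table entries, 32 bytes of objects, and two pointer-sized string headers). The 96-byte constant and the function name are illustrative assumptions, not measured values.

```python
# 48 (hash table entries) + 32 (two objects) + 2 * 8 (string headers)
FIXED_OVERHEAD = 48 + 32 + 2 * 8


def overhead_percent(payload_bytes, fixed_overhead=FIXED_OVERHEAD):
    """Share of the total per-key memory that is not key/value data."""
    return 100.0 * fixed_overhead / (fixed_overhead + payload_bytes)


for payload in (16, 128, 1024, 16384):
    print(f"{payload:>6} payload bytes: {overhead_percent(payload):5.1f}% overhead")
```

Under these assumptions the overhead dominates for payloads of a few dozen bytes and becomes a small fraction once values reach the kilobyte range.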

 
From: Salvatore Sanfilippo <anti...@gmail.com>
Date: Thu, 4 Jun 2009 18:25:56 +0200
Local: Thurs, Jun 4 2009 12:25 pm
Subject: Re: redis-server taking 10 times more RAM to store data than the dump

On Thu, Jun 4, 2009 at 6:10 PM, siculars <sicul...@gmail.com> wrote:

> Thank you for the breakdown. This is definitely valuable to know and
> could lend itself to an algorithm to determine overall overhead and
> for each key under different circumstances. It is clear that percent
> overhead drops as a function of key/value size which could shape the
> type of data people decide to throw at redis.

Hello siculars,

indeed for large objects the overhead starts to be much smaller. It's
for tiny objects that it's huge.
Also I forgot to mention that every malloc() call in Redis goes through
zmalloc(), which has a 4/8 byte (32/64-bit) overhead in order to be able to
keep count of the memory used. Without this, maxmemory is not
possible.
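
The bookkeeping described here can be modelled with a toy allocator that stores each block's size in a hidden prefix, so that freeing a block can subtract the right amount from a running total. This is a simplified model in Python rather than the actual zmalloc() code; the class and attribute names are hypothetical.

```python
class CountingAllocator:
    """Toy model of an allocator that prefixes each block with its size."""

    def __init__(self, prefix_size=8):  # 8 bytes models a 64-bit size field
        self.prefix_size = prefix_size
        self.used_memory = 0

    def alloc(self, size):
        block = bytearray(self.prefix_size + size)
        # Record the requested size in the hidden prefix.
        block[:self.prefix_size] = size.to_bytes(self.prefix_size, "little")
        self.used_memory += self.prefix_size + size
        return block

    def free(self, block):
        # Read the size back from the prefix to know how much to subtract.
        size = int.from_bytes(block[:self.prefix_size], "little")
        self.used_memory -= self.prefix_size + size


allocator = CountingAllocator()
block = allocator.alloc(16)
print(allocator.used_memory)
allocator.free(block)
print(allocator.used_memory)
```

Every allocation pays the prefix, so a key that needs several small allocations pays it several times, which is why the cost is most visible for tiny objects.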

What I plan to do btw is to #ifdef that for glibc since glibc malloc
should be able to return this kind of info without the wasted mem.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay


 
From: Salvatore Sanfilippo <anti...@gmail.com>
Date: Thu, 4 Jun 2009 18:52:58 +0200
Local: Thurs, Jun 4 2009 12:52 pm
Subject: Re: redis-server taking 10 times more RAM to store data than the dump

On Thu, Jun 4, 2009 at 6:25 PM, Salvatore Sanfilippo <anti...@gmail.com> wrote:
> What I plan to do btw is to #ifdef that for glibc since glibc malloc
> should be able to return this kind of info without the wasted mem.

Just committed this exact change, but for Mac OS X, which exports an
interesting malloc_size() function. So from your next build of Redis
it will start using less memory. I hope glibc malloc exports a similar
function.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay


 