Guess the actual memory usage from the size of the dump file?


kuno

unread,
Apr 4, 2012, 9:55:06 PM4/4/12
to Redis DB
Hi all:
Glad to be here.
I am new to Redis, and I am wondering: is there any way to estimate the
actual memory usage of Redis based on the size of the dump file?
For example, if the current dump file is 10 MB, what would be a good
estimate of Redis's actual memory usage?


Best
--kuno

Josiah Carlson

unread,
Apr 4, 2012, 11:26:39 PM4/4/12
to redi...@googlegroups.com
On 64 bit machines, I found a good rule was that the dump would be
about 1/10 the size of the data in memory. So if you have a 10M dump,
I wouldn't expect Redis to use more than 100M.

Regards,
- Josiah


Sripathi Krishnan

unread,
Apr 5, 2012, 12:02:20 AM4/5/12
to redi...@googlegroups.com
I have a Python script that estimates the in-memory size of each object by parsing the dump.rdb file. See https://github.com/sripathikrishnan/redis-rdb-tools

If you run the script, sum up the size of all objects, and add about 15% for overheads, you will get a fair estimate of the in-memory size of your data.

The script parses the dump file into (key, value, expiry, data type) tuples. Depending on the data type, it estimates the size of the object. For example, for a hash, the logic is as follows - 
  • hash_overhead = 56 + 4*sizeof_pointer + next_power_of_two(number_of_elements) * sizeof_pointer * 2
  • entry_overhead = 3 * sizeof_pointer
  • hash_size = hash_overhead + number_of_elements * entry_overhead
This logic is reverse engineered from dict.h. The script has similar logic for each of the datatypes, plus some logic for estimating the overheads for storing key expiry information and so on.
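For readers who want to see the arithmetic, here is a minimal sketch of that hash estimate in Python (not the actual redis-rdb-tools code); it assumes a 64-bit build, i.e. sizeof_pointer = 8:

    def next_power_of_two(n):
        # smallest power of two >= n; dict.h grows its hash tables in powers of two
        p = 1
        while p < n:
            p *= 2
        return p

    def estimate_hash_size(number_of_elements, sizeof_pointer=8):
        # dict struct plus two hash tables sized to the next power of two
        hash_overhead = (56 + 4 * sizeof_pointer
                         + next_power_of_two(number_of_elements) * sizeof_pointer * 2)
        # each dictEntry carries key, value and next pointers
        entry_overhead = 3 * sizeof_pointer
        return hash_overhead + number_of_elements * entry_overhead

    print(estimate_hash_size(100))  # structure overhead only; key/value payloads are extra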

--Sri

Salvatore Sanfilippo

unread,
Apr 5, 2012, 4:05:17 AM4/5/12
to redi...@googlegroups.com
On Thu, Apr 5, 2012 at 5:26 AM, Josiah Carlson <josiah....@gmail.com> wrote:
> On 64 bit machines, I found a good rule was that the dump would be
> about 1/10 the size of the data in memory. So if you have a 10M dump,
> I wouldn't expect Redis to use more than 100M.

This used to work in the past, but it no longer holds starting from 2.4.

For example, common data sets are composed of small hashes. Now that RDB
uses the same representation for small hashes as the one Redis uses in
memory, the RDB size and the RAM used will be in the same order of
magnitude.

It's simply too dataset dependent...

Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Didier Spezia

unread,
Apr 5, 2012, 4:08:25 AM4/5/12
to redi...@googlegroups.com
Hi Sri,

I think the scripts you provide could be improved by:

- accounting for the robj structures (a follow-up to your comments on Stack Overflow).
  I believe robj objects are used on a per-entry basis in Redis set and hash objects:
  1 robj per entry for set and zset, 2 robj per entry for hash.
  See the setDictType, zsetDictType, and hashDictType structures in redis.c.

- adding an option to take internal fragmentation of the memory allocator into account.
  For instance, when 24 bytes are required, jemalloc will hand back 32 (because that is how its allocation classes are defined), and 8 bytes will remain unused and lost to Redis.
  This is platform- and allocator-dependent though; see the sketch below.
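To make the fragmentation point concrete, here is a rough sketch of that rounding; the size-class table below is a simplified, jemalloc-like assumption, and the real classes depend on the allocator version and build:

    # Simplified, jemalloc-like small size classes (an assumption, for illustration only).
    SIZE_CLASSES = [8, 16, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256]

    def allocated_size(requested):
        # size the allocator would actually hand back for a request
        for size_class in SIZE_CLASSES:
            if requested <= size_class:
                return size_class
        # beyond the small classes, crudely round up to a 4 KB page
        return (requested + 4095) // 4096 * 4096

    print(allocated_size(24))  # -> 32; the 8-byte difference is internal fragmentation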

Thanks for these RDB parsing scripts anyway, they are useful.

Regards,
Didier.

Sripathi Krishnan

unread,
Apr 7, 2012, 12:36:46 PM4/7/12
to redi...@googlegroups.com
Hi Didier,

Thanks for your pointers!

I fixed my scripts to account for the following - 
  1. robj wrappers for each element in a list/set/hash as you pointed out
  2. Each call to zmalloc has a fixed overhead of 8 bytes (one size_t); I wasn't accounting for this earlier.
With these changes, the reported and actual memory now agree to within 5% (a rough per-entry sketch of these overheads follows below).
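Purely as an illustration of how those two corrections shift a per-entry estimate, here is a sketch for a plain (non-zipmap) hash field on a 64-bit build; the sizes are my assumptions, not values taken from the script itself:

    SIZEOF_POINTER = 8
    ZMALLOC_HEADER = 8   # one size_t prepended to every zmalloc() allocation
    ROBJ_SIZE = 16       # assumed rough size of a robj wrapper on 64-bit

    def hash_entry_overhead():
        dict_entry = 3 * SIZEOF_POINTER       # key, value and next pointers
        robj_wrappers = 2 * ROBJ_SIZE         # one robj for the field, one for the value
        zmalloc_headers = 3 * ZMALLOC_HEADER  # dictEntry plus the two robj allocations
        return dict_entry + robj_wrappers + zmalloc_headers

    print(hash_entry_overhead())  # overhead only; the actual field/value strings are extra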

Next up: generating an HTML report with (a) the keys using the most memory, (b) the distribution of memory across data types, and (c) memory usage grouped by key prefix - for example, memory used by user:*.

--Sri



Andy

unread,
Apr 7, 2012, 7:15:39 PM4/7/12
to Redis DB


On Apr 5, 4:05 am, Salvatore Sanfilippo <anti...@gmail.com> wrote:
> This used to work in the past, but it no longer holds starting from 2.4.

> the RDB size and the RAM used will be in the
> same order of magnitude.

Does that mean starting from 2.4 the memory usage has gone down or the
RDB size has gone up?

Sripathi Krishnan

unread,
Apr 7, 2012, 11:09:57 PM4/7/12
to redi...@googlegroups.com
Does that mean starting from 2.4 the memory usage has gone down or the RDB size has gone up?

Memory usage has gone down.

Small hashes are encoded as a zipmap (or a ziplist in 2.6). This is a compact representation, but the cost of operations is O(N). For small hashes, this is a very good tradeoff. More information here: http://redis.io/topics/memory-optimization
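As a quick way to see the encoding switch in practice, here is a hedged example; it assumes redis-py and a local Redis 2.6 server, and the key and field names are made up:

    import redis

    r = redis.StrictRedis(host="localhost", port=6379)
    r.delete("user:1")
    r.hset("user:1", "name", "kuno")
    r.hset("user:1", "plan", "free")
    print(r.object("encoding", "user:1"))  # 'ziplist' while the hash stays small

    # Exceed hash-max-ziplist-entries and Redis converts it to a real hash table.
    for i in range(2000):
        r.hset("user:1", "field:%d" % i, i)
    print(r.object("encoding", "user:1"))  # 'hashtable': O(1) operations, more memory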

--Sri

