how to estimate RAM usage


Xavier Noria

Oct 16, 2012, 2:16:56 PM
to redi...@googlegroups.com
I am running some tests to try to estimate the RAM needed to store a certain amount of data. For example, N lists of M elements, where M < list-max-ziplist-entries.

In order to do that, I am running scripts like the one pasted below with a smaller n, with the intention to extrapolate to the real big N.

But I wonder about that comment in a recent post of Salvatore that says that write-intensive usage could double RAM. Since the script is (I'd say) write-intensive, I wonder if these numbers are what I am looking for, or should I correct them somehow.

What do you recommend?

Xavier

require 'redis'
require 'redis/connection/hiredis'

redis = Redis.new
redis.flushall

value = 2**63

10_000.times do |n|
  500.times do
    redis.lpush("foo_#{n}", value)
  end
  puts redis.info['used_memory_human'] if n % 1_000 == 0
end

puts redis.info['used_memory_human']

__END__
919.97K
11.22M
21.56M
31.90M
42.24M
52.60M
62.93M
73.26M
83.59M
93.98M
104.30M
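
A rough way to extrapolate from the samples above is to take the per-list cost from the first and last readings and scale it linearly. The sketch below assumes that used_memory_human uses binary units (1M = 1024K) and that growth stays linear up to the real N. TARGET_LISTS is a made-up figure for illustration, and the result does not include any fork-time overhead during saves.

```ruby
# [lists stored so far, used_memory in MiB], copied from the run above.
# The script prints after list 1, list 1001, ..., and once at the end
# (10,000 lists).
SAMPLES = [
  [1,      919.97 / 1024],
  [1_001,  11.22],
  [5_001,  52.60],
  [10_000, 104.30]
].freeze

# Average cost of one 500-element list, from the first and last samples.
def per_list_mib(samples)
  (n0, m0), (n1, m1) = samples.first, samples.last
  (m1 - m0) / (n1 - n0)
end

# Linear extrapolation to target_lists, anchored on the first sample.
def extrapolate_mib(samples, target_lists)
  n0, m0 = samples.first
  m0 + per_list_mib(samples) * (target_lists - n0)
end

TARGET_LISTS = 1_000_000 # hypothetical "real big N", not from the thread

puts format('%.2f KiB per list', per_list_mib(SAMPLES) * 1024)
puts format('%.1f GiB for %d lists',
            extrapolate_mib(SAMPLES, TARGET_LISTS) / 1024, TARGET_LISTS)
```

A two-point slope is enough here because the samples are almost perfectly linear; with noisier data a least-squares fit over all samples would be the safer choice.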

Josiah Carlson

Oct 16, 2012, 6:45:43 PM
to redi...@googlegroups.com
On Tue, Oct 16, 2012 at 11:16 AM, Xavier Noria <f...@hashref.com> wrote:
> But I wonder about that comment in a recent post of Salvatore that says that
> write-intensive usage could double RAM. Since the script is (I'd say)
> write-intensive, I wonder if these numbers are what I am looking for, or
> should I correct them somehow.

I'm sure that comment was only in relation to what happens during
BGSAVE or BGREWRITEAOF (or the automatic triggers of them).

And that memory doubling is only during the saving/rewriting
operation. Once they complete, memory usage goes back to normal.

Regards,
- Josiah

Xavier Noria

Oct 17, 2012, 5:54:51 AM
to redi...@googlegroups.com
On Wednesday, October 17, 2012 at 0:45:51 UTC+2, Josiah Carlson wrote:
Yes, but if you need to do an estimation to provision your servers, which figure do you base the estimation on?

My point is that *since the script is (I'd say) write-intensive*, does the reported memory usage account for those peaks on saving? Can I calculate RAM from them?

Note that the memory peak info is similar to the regular memory info, see script pasted below.

Xavier

require 'redis'
require 'redis/connection/hiredis'

# Print current and peak memory from a single INFO call.
def mem(redis)
  info = redis.info
  puts "%7s, %7s" % [info['used_memory_human'], info['used_memory_peak_human']]
end

redis = Redis.new
redis.flushall

value = 2**63

10_000.times do |n|
  500.times do
    redis.lpush("foo_#{n}", value)
  end
  mem(redis) if n % 1_000 == 0
end

mem(redis)

__END__
919.83K, 900.75K
 11.22M,  11.21M
 21.56M,  21.55M
 31.90M,  31.88M
 42.24M,  42.23M
 52.60M,  52.59M
 62.93M,  62.92M
 73.26M,  73.25M
 83.59M,  83.58M
 93.98M,  93.97M
104.30M, 104.27M
 

Josiah Carlson

Oct 17, 2012, 10:49:19 AM
to redi...@googlegroups.com
On Wed, Oct 17, 2012 at 2:54 AM, Xavier Noria <f...@hashref.com> wrote:
> On Wednesday, October 17, 2012 at 0:45:51 UTC+2, Josiah Carlson wrote:
>
>> On Tue, Oct 16, 2012 at 11:16 AM, Xavier Noria <f...@hashref.com> wrote:
>> > But I wonder about that comment in a recent post of Salvatore that says
>> > that
>> > write-intensive usage could double RAM. Since the script is (I'd say)
>> > write-intensive, I wonder if these numbers are what I am looking for, or
>> > should I correct them somehow.
>>
>> I'm sure that comment was only in relation to what happens during
>> BGSAVE or BGREWRITEAOF (or the automatic triggers of them).
>>
>> And that memory doubling is only during the saving/rewriting
>> operation. Once they complete, memory usage goes back to normal.
>
>
> Yes, but if you need to do an estimation to provision your servers, which
> figure do you base the estimation on?

This is the sequence that I do:
1. Put as much data in memory as I can to reasonably estimate reality
(1/10, 1/100, 1/2, etc., pick one), then multiply it out to get the
"true" number
2. Add 20% for memory fragmentation or data inconsistency.
3. Calculate my number of operations per second and how many keys are
volatile (have expiration times) during the rewrite
4. Calculate a worst-case increase in memory consumption at 4k *
(ops/s + volatile keys)
5. If the increase in memory consumption is > the memory use +
fragmentation, then call it double. If it's less, then use the lesser
number
6. If slaving is going to be done, calculate the size of the dump for
initial slaving, and multiply it out if you will have more than one
slave, and add that in
7. If your network is not fast, and sync times are large, you may need
to add in some memory for batched up commands during sync
8. If pubsub will be used, multiply the buffer size limit for pubsub
times the expected number of clients, and add that in

For example:
1. 4G
2. + 800M
3. dump takes 5 seconds, producing a 400M dump, at 10k writes/second,
20k volatile keys...
4. + (5 * 10k + 20k) * 4k = 280M
5. Not double, use 280M
6. 2 slaves, so + 2 * 400M
7. Fast network, not applicable
8. 50 pubsub clients, 8M each so +400M

Total: 4G(1) + 800M(2) + 280M(4) + 800M(6) + 400M(8) = 6.28G estimated
required memory, including rewrite/pubsub overhead, and key
modifications


Some of those factors matter more, some less. But if you want to get
the best estimate, follow those steps.
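
The steps above can be sketched as a small calculator. This is only an illustration: the method name, parameter names, and the use of decimal megabytes (4k treated as 4 KB, as in the worked example) are assumptions, step 7 (commands batched up during a slow sync) is not modelled, and the per-slave dump term from step 6 defaults to zero because it is questioned later in this thread.

```ruby
# Rough memory estimate in MB, following the numbered steps above.
# All names here are hypothetical; nothing below is a Redis API.
def estimate_mb(data_mb:, rewrite_seconds:, writes_per_sec:, volatile_keys:,
                slave_count: 0, dump_mb: 0,
                pubsub_clients: 0, pubsub_buffer_mb: 0,
                fragmentation: 0.20, page_kb: 4)
  # Steps 1-2: extrapolated data size plus fragmentation headroom.
  base = data_mb * (1 + fragmentation)

  # Steps 3-4: worst case, every write and every volatile key dirties
  # its own page while the child process is running.
  touched_pages = rewrite_seconds * writes_per_sec + volatile_keys
  cow = touched_pages * page_kb / 1000.0

  # Step 5: copy-on-write can at most double the base footprint.
  cow = [cow, base].min

  # Step 6 (disputed later in the thread) and step 8.
  slaves = slave_count * dump_mb
  pubsub = pubsub_clients * pubsub_buffer_mb

  base + cow + slaves + pubsub
end

# The worked example from the post: 4G of data, a 5 second dump of 400M,
# 10k writes/second, 20k volatile keys, 2 slaves, 50 pubsub clients at 8M.
total = estimate_mb(data_mb: 4000, rewrite_seconds: 5, writes_per_sec: 10_000,
                    volatile_keys: 20_000, slave_count: 2, dump_mb: 400,
                    pubsub_clients: 50, pubsub_buffer_mb: 8)
puts format('%.2fG', total / 1000.0)
```

Keeping each step as a named term makes it easy to drop or adjust one factor (for example the slave dump) without touching the rest.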

> My point is that *since the script is (I'd say) write-intensive*, does the
> reported memory usage account for those peaks on saving? Can I calculate RAM
> from them?

Apples and oranges. You aren't writing during a BGSAVE/BGREWRITEAOF,
and you aren't checking *system* memory use. The question is how many
memory pages are written to by the main process and the child process.
The moment a page is written to, it stops being shared after a fork,
increasing memory usage by 4k. If you are using 4G of memory, that is
about 1M pages (4G / 4k), so you must have at least 1M writes, each to
a different page, during the BGSAVE/BGREWRITEAOF in order to double
memory usage (something like "zinterstore keyX 1 keyX weights .5"
would actually perform a number of writes that is 2 times the number
of elements in the ZSET, as an example, so you have to be careful).
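
The page arithmetic can be written out as follows. This is a minimal sketch assuming a 4096-byte page size (the usual default on x86 Linux; huge pages would change the numbers) and the worst case where every write lands on a different page. The helper names are made up for illustration.

```ruby
PAGE_SIZE = 4096 # bytes; assumed, check with `getconf PAGESIZE`

# Number of distinct pages that must be dirtied after fork() before the
# parent's footprint has effectively been duplicated.
def pages_to_double(used_bytes, page_size = PAGE_SIZE)
  (used_bytes.to_f / page_size).ceil
end

# Extra memory caused by copy-on-write for a given number of dirtied
# pages, capped at a full copy of the data set.
def cow_overhead_bytes(distinct_pages_written, used_bytes, page_size = PAGE_SIZE)
  [distinct_pages_written * page_size, used_bytes].min
end

four_gib = 4 * 1024**3
puts pages_to_double(four_gib)            # 1048576, i.e. about 1M pages
puts cow_overhead_bytes(70_000, four_gib) # 286720000 bytes, roughly 280M
```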

> Note that the memory peak info is similar to the regular memory info, see
> script pasted below.

For your example definitely, because you aren't deleting information.
And actually, that number is relatively unrelated to actual memory
used by Redis during BGSAVE/BGREWRITEAOF, because Redis isn't
calculating (I don't know if it can) the amount of *shared* memory
between the main and child process during BGSAVE/BGREWRITEAOF.
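
On Linux, one way to look at shared versus private memory from outside Redis is to sum the per-mapping counters in /proc/<pid>/smaps for the parent and for the forked child. The sketch below only parses smaps-style text; the sample data is invented for illustration, and finding the pids (for example with ps) is left to the reader.

```ruby
# Sum every "<Field>: <n> kB" counter found in smaps-style text.
# Mapping header lines and non-kB lines are ignored.
def smaps_totals_kb(smaps_text)
  totals = Hash.new(0)
  smaps_text.each_line do |line|
    if (m = line.match(/\A(\w+):\s+(\d+) kB/))
      totals[m[1]] += m[2].to_i
    end
  end
  totals
end

# Invented sample in the format of /proc/<pid>/smaps (not real output).
SAMPLE_SMAPS = <<~SMAPS
  00400000-0040b000 r-xp 00000000 08:01 1234   /usr/local/bin/redis-server
  Rss:                  44 kB
  Shared_Clean:         44 kB
  Shared_Dirty:          0 kB
  Private_Clean:         0 kB
  Private_Dirty:         0 kB
  7f0000000000-7f0000400000 rw-p 00000000 00:00 0
  Rss:                4096 kB
  Shared_Clean:          0 kB
  Shared_Dirty:       3072 kB
  Private_Clean:         0 kB
  Private_Dirty:      1024 kB
SMAPS

totals = smaps_totals_kb(SAMPLE_SMAPS)
puts totals['Private_Dirty'] # => 1024
puts totals['Shared_Dirty']  # => 3072
```

With a real process the input would be File.read("/proc/#{pid}/smaps"). Pages still shared between parent and child show up under the Shared_* counters, while pages copied after a write move to Private_Dirty, which is the growth the INFO memory fields do not show.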


Regards,
- Josiah

Salvatore Sanfilippo

Oct 17, 2012, 3:59:36 PM
to redi...@googlegroups.com
I like this post Josiah, thank you, we need something like that at redis.io

However I'm not sure why you account for dump size if there are slaves.
Another thing about ops/sec for COW, it's a good idea and you probably
want to be very conservative about that, but in most cases it's ok to
just count writes per sec.

Cheers,
Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter

Josiah Carlson

Oct 17, 2012, 4:38:10 PM
to redi...@googlegroups.com
On Wed, Oct 17, 2012 at 12:59 PM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:
> I like this post Josiah, thank you, we need something like that at redis.io

Feel free to copy/paste and use it in the docs. Actually, with a bit
of extra time, a bit of javascript could be useful for keeping a
running total for the calculator :)

> However I'm not sure why you account for dump size if there are slaves.

To account for outgoing Redis buffers when writing to the slave. I was
under the impression that Redis copied it all into memory, am I
mistaken? Still, it makes sense to keep 1 copy in consideration, as
the OS will cache the file (if only temporarily), which will affect
usable system memory.

> Another thing about ops/sec for COW, it's a good idea and you probably
> want to be very conservative about that, but in most cases it's ok to
> just count writes per sec.

Indeed. It depends on what is an easy number to get :)

Regards,
- Josiah

Salvatore Sanfilippo

Oct 17, 2012, 4:53:03 PM
to redi...@googlegroups.com
On Wed, Oct 17, 2012 at 10:38 PM, Josiah Carlson
<josiah....@gmail.com> wrote:

> Feel free to copy/paste and use it in the docs. Actually, with a bit
> of extra time, a bit of javascript could be useful for keeping a
> running total for the calculator :)

Thank you :-) Currently I've very little bandwidth for a number of
reasons, but I want to get the following things done ASAP in a page
about Redis memory usage:

1) Information similar to what you provided about how to estimate
memory usage.
2) A memory usage benchmark / regression that takes into account how
many objects of a given type (one that maps very well to real-world
data) you can fit into 1 MB of Redis memory. For instance, classes of
objects can be: a) an object representing a user (hash); b) a small
string mapping a username to a counter; c) a bigger string containing
an average HTML page; d) a set of integers with a few hundred items,
which is a very common object used to map users to friends, tags, ...
3) Patterns involving memory usage. There are a lot of things that are
possible and that most users don't know about, from GETRANGE/SETRANGE
to bit operations, encoding objects in JSON *as a hash value* (see the
lamernews comments system), and so forth.

>> However I'm not sure why you account for dump size if there are slaves.
>
> To account for outgoing Redis buffers when writing to the slave. I was
> under the impression that Redis copied it all into memory, am I
> mistaken? Still, it makes sense to keep 1 copy in consideration, as
> the OS will cache the file (if only temporarily), which will affect
> usable system memory.

Redis does not take a copy in memory, and I think it is safe not to
account for the kernel caches, as the file is transmitted via TCP to
the other instance using a small user-space buffer. On the other side,
writing the file can block because, *if there is a lot of free memory*,
the kernel may retain pages in memory; to deal with that we call
fflush() from time to time, starting from 2.6.

I think it's pretty safe to remove this from the math.

Cheers,
Salvatore


M. Edward (Ed) Borasky

Oct 17, 2012, 5:01:09 PM
to redi...@googlegroups.com
I'm very interested in this on Linux, especially integrating it with
packages like 'iostat' / 'sar' and 'pmap'.

http://www.cyberciti.biz/tips/howto-find-memory-used-by-program.html

I can't help you on BSD or Windows, though - I'm strictly a Linux guy.

I'm in the process of resurrecting all the scripts I did back in 2008
- 2009, mostly on Gentoo, with PostgreSQL as the database. They're "up
on Github somewhere", but I pretty much put them aside until now.
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://znmeb.github.com/Computational-Journalism-Publishers-Workbench/

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?

Dbz Fan

Apr 5, 2015, 3:39:34 PM
to redi...@googlegroups.com
"calculate the size of the dump for initial slaving". What does that mean?
Also, I will be using Redis as transient storage, so the pub/sub part won't apply, right?