On Wed, Oct 17, 2012 at 2:54 AM, Xavier Noria <f...@hashref.com> wrote:
> On Wednesday, October 17, 2012 at 0:45:51 UTC+2, Josiah Carlson wrote:
>
>> On Tue, Oct 16, 2012 at 11:16 AM, Xavier Noria <f...@hashref.com> wrote:
>> > But I wonder about that comment in a recent post of Salvatore that says
>> > that
>> > write-intensive usage could double RAM. Since the script is (I'd say)
>> > write-intensive, I wonder if these numbers are what I am looking for, or
>> > should I correct them somehow.
>>
>> I'm sure that comment was only in relation to what happens during
>> BGSAVE or BGREWRITEAOF (or the automatic triggers of them).
>>
>> And that memory doubling is only during the saving/rewriting
>> operation. Once they complete, memory usage goes back to normal.
>
>
> Yes, but if you need to do an estimation to provision your servers, which
> figure do you base the estimation on?
This is the sequence I follow:
1. Put as much of the real data set into Redis as I reasonably can (a
known fraction: 1/10, 1/100, 1/2, etc., pick one), then multiply it
out to get the "true" number
2. Add 20% for memory fragmentation or data inconsistency
3. Calculate the number of write operations per second and how many
keys are volatile (have expiration times) during the rewrite
4. Calculate a worst-case increase in memory consumption at 4k *
(ops/s * rewrite duration in seconds + volatile keys)
5. If the increase in memory consumption is > the memory use +
fragmentation, then call it double. If it's less, then use the lesser
number
6. If slaving is going to be done, calculate the size of the dump for
initial slaving, and multiply it out if you will have more than one
slave, and add that in
7. If your network is not fast and sync times are long, you may need
to add in some memory for batched-up commands during sync
8. If pubsub will be used, multiply the buffer size limit for pubsub
times the expected number of clients, and add that in
For example:
1. 4G
2. + 800M
3. dump takes 5 seconds, producing a 400M dump, at 10k writes/second,
20k volatile keys...
4. + (5 * 10k + 20k) * 4k = 280M
5. Not double, use 280M
6. 2 slaves, so + 2 * 400M
7. Fast network, not applicable
8. 50 pubsub clients, 8M each so +400M
Total: 4G(1) + 800M(2) + 280M(4) + 800M(6) + 400M(8) = 6.28G estimated
required memory, including rewrite/pubsub overhead, and key
modifications
Some of those factors matter more, some less. But if you want to get
the best estimate, follow those steps.
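If it helps, here is that arithmetic written out as a quick Python
sketch using the numbers from the example above (the variable names
and the 4 KB copy-on-write page size are my own assumptions, not
anything Redis reports):

    # Rough provisioning estimate following the steps above.
    PAGE = 4 * 1024                      # assumed copy-on-write page size

    data = 4 * 1024**3                   # step 1: measured data, scaled up
    fragmentation = int(data * 0.20)     # step 2: +20% for fragmentation
    rewrite_secs = 5                     # step 3: observed dump time
    writes_per_sec = 10000
    volatile_keys = 20000
    dump_size = 400 * 1024**2

    cow = (rewrite_secs * writes_per_sec + volatile_keys) * PAGE  # step 4
    cow = min(cow, data + fragmentation)          # step 5: cap at "double"
    slave_buffers = 2 * dump_size                 # step 6: two slaves
    # step 7 skipped: fast network
    pubsub = 50 * 8 * 1024**2                     # step 8: 50 clients, 8M each

    total = data + fragmentation + cow + slave_buffers + pubsub
    print(total / 1024.0**3)   # roughly 6.2G, same ballpark as the 6.28G above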
> My point is that *since the script is (I'd say) write-intensive*, does the
> reported memory usage account for those peaks on saving? Can I calculate RAM
> from them?
Apples and oranges. You aren't writing during a BGSAVE/BGREWRITEAOF,
and you aren't checking *system* memory use. The question is how many
memory pages are written to by the main process and the child process.
The moment a page is written to after the fork, it stops being shared,
increasing memory usage by 4k. If you are using 4G of memory, writes
would have to touch at least 1M distinct pages during the
BGSAVE/BGREWRITEAOF in order to double memory usage (and something
like "zinterstore keyX 1 keyX weights .5" actually performs a number
of writes that is 2 times the number of elements in the ZSET, so you
have to be careful).
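To put a number on that (assuming the typical 4 KB Linux page size,
which is where the 4k above comes from):

    # Distinct page writes needed to fully un-share a 4G instance
    # after the fork, i.e. to double memory usage.
    PAGE = 4 * 1024
    dataset = 4 * 1024**3
    print(dataset // PAGE)   # 1048576, about 1M pages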
> Note that the memory peak info is similar to the regular memory info, see
> script pasted below.
For your example, definitely, because you aren't deleting information.
And actually, that number is relatively unrelated to the actual memory
used by Redis during a BGSAVE/BGREWRITEAOF, because Redis doesn't
calculate (I don't know if it even can) the amount of *shared* memory
between the main and child processes during the BGSAVE/BGREWRITEAOF.
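If you want to see the real cost, measure it at the system level while
a save is running. Here's a crude sketch with redis-py and
/proc/meminfo (Linux only; the helper and polling interval are my own,
and MemFree is affected by everything else on the box, so treat the
result as a rough indication):

    import time
    import redis  # redis-py

    def free_kb():
        # System-wide free memory; the copy-on-write cost of the fork
        # shows up here, not in Redis's own INFO counters.
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemFree:'):
                    return int(line.split()[1])

    r = redis.StrictRedis()
    before = free_kb()
    last = r.lastsave()
    r.bgsave()
    low = before
    while r.lastsave() == last:     # wait for the background save to finish
        low = min(low, free_kb())
        time.sleep(0.2)
    print(before - low)             # peak extra KB consumed during the BGSAVE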
Regards,
- Josiah