On Wed, Oct 3, 2012 at 3:33 AM, ticktricktrack <
rai...@incutio.com> wrote:
> We use resque in combination with redis, backed by AOF and RDB. I had two
> crashes so far caused by aof files hitting the disk limit. One even got up
> to 78GB.
>
> 1. Do you think AOF files are worth it? Technically they make full
> persistence available; in practice they cause more crashes.
It depends on your use-case. If you aren't automatically rewriting on
a regular basis, or are writing so much data that your system can't
rewrite it fast enough, then AOF may not be right for you.
Have you configured the trigger for bgrewriteaof?
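To make "the trigger" concrete: as I understand it, automatic rewrites are governed by two redis.conf directives, auto-aof-rewrite-percentage and auto-aof-rewrite-min-size. The sketch below is my reading of that check, not Redis's actual code; the function name and argument names are made up for illustration, and the defaults (100 and 64mb) are the ones I believe ship in the stock redis.conf.

```python
# Sketch of the automatic AOF rewrite trigger, assuming the redis.conf
# directives auto-aof-rewrite-percentage and auto-aof-rewrite-min-size.
# should_rewrite() is a hypothetical helper, not part of any Redis client.
MB = 1024 * 1024

def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Return True if an automatic BGREWRITEAOF would be expected to start.

    current_size: AOF size now, in bytes.
    base_size:    AOF size right after the last rewrite (or at startup).
    percentage:   auto-aof-rewrite-percentage (0 disables automatic rewrites).
    min_size:     auto-aof-rewrite-min-size, in bytes.
    """
    if percentage == 0:
        return False          # automatic rewrites are switched off
    if current_size <= min_size:
        return False          # file is still too small to bother with
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100   # percent growth since last rewrite
    return growth >= percentage
```

If the percentage is 0, or the minimum size is set very high, the file just keeps growing until the disk fills, which is consistent with what you saw.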
> 2. Is full persistence via slave a viable option? Advice from Felix last
> Sunday:
>>
>> The 'fastest' way to deal with this is to turn persistence off on your
>> production server, and set up a (networked, not on the same hypervisor)
>> slave to handle persistence. It looks like the iops on your disk is
>> relatively poor; are there any other VMs on this box that might be hogging
>> the disk or hitting it hard enough to confuse the situation?
>> The next-most-pragmatic solution would be to disable aof rewrite
>> completely and do it either offline or during non-peak hours.
Using a slave to handle persistence is an option, as is rewriting
during off-peak hours. It all depends on your use-case, and whether
your disk can store as much data as you will write before a rewrite.
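A rough way to check the disk side is to add up the AOF size after the last rewrite, what gets appended before the next one, and the temporary file the rewrite itself produces (the old and new files sit on disk together until the rewrite finishes). The numbers below are invented for illustration, not taken from your setup, and the helper name is hypothetical. It also ignores writes that arrive while the rewrite is running.

```python
# Back-of-the-envelope peak disk usage for an AOF between rewrites.
# All inputs are assumptions; measure your own write rate before trusting it.
GB = 1024 ** 3

def peak_aof_disk_bytes(base_bytes, write_bytes_per_sec,
                        seconds_between_rewrites, rewritten_bytes):
    """Old AOF (base + growth) plus the rewritten copy that coexists with it."""
    growth = write_bytes_per_sec * seconds_between_rewrites
    return base_bytes + growth + rewritten_bytes

# Hypothetical example: 2 GB AOF after the last rewrite, 200 KB/s appended,
# one rewrite per day, and a rewritten file of about 2 GB.
peak = peak_aof_disk_bytes(2 * GB, 200 * 1024, 24 * 3600, 2 * GB)
print(round(peak / GB, 1), "GB needed at the peak")
```

With these made-up inputs the peak is well above a 14 GB volume, so a once-a-day rewrite would not fit; either the rewrite has to happen more often or the disk has to be bigger.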
> 3. What disk space to memory ratio do you use?
> I have 7GB memory available for redis and prevent overcommit by no
> longer processing small jobs into big ones. Our system has small amounts of
> data coming in [id, id, options], but large amounts going out of resque.
> I think 14 GB for the aof file might not have been enough in the first
> place. Especially since a rewrite needs quite a bit itself. Freeing up 1 GB
> wasn't enough in my case.
For dumps, my experience and expectation is that in-memory will use
5-10x what the on-disk representation uses. For AOFs, that will depend
on your write volume before your bgrewriteaof triggers. In terms of
main memory to rewrite an AOF, you run into effectively the same issue
as for dumps. Worst case, you will use 2x your main memory, but as a
practical matter the overhead is typically much smaller. How much
memory a rewrite actually uses depends on your disk IO capacity, your
processor load, etc., at the time of the rewrite. There is no simple
formula, especially in the cloud. Test, benchmark, repeat, check
multiple instances of the same type, ...
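For intuition on where the 2x worst case comes from: the rewrite runs in a forked child, and the parent only pays for memory pages it modifies while the child is still working (copy-on-write). A very crude model, with a hypothetical function name and made-up rates, is that the extra memory is whatever gets dirtied during the rewrite, capped at the size of the dataset. It is a sketch to reason with, not a formula to size machines by.

```python
# Crude copy-on-write model for a fork-based rewrite or dump.
# dirtied_bytes_per_sec and rewrite_seconds are assumptions you would have
# to measure; real behavior depends on page size, access patterns, etc.
MB = 1024 ** 2
GB = 1024 ** 3

def rewrite_extra_memory_bytes(dataset_bytes, dirtied_bytes_per_sec,
                               rewrite_seconds):
    """Estimate extra memory used while the child process is writing."""
    dirtied = dirtied_bytes_per_sec * rewrite_seconds
    # You can never copy more than the whole dataset, hence the 2x ceiling.
    return min(dataset_bytes, dirtied)
```

A slow disk stretches rewrite_seconds, which is why poor IO makes the memory overhead worse as well as the disk usage.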
My general recommendation is to actually split your workloads onto
different Redis servers. If you have a resque Redis, don't use it for
caching. Start another Redis server up to run your caching, etc.
If you're already doing this, then my other recommendation is to use
the fastest spinning disk you can find for your AOF. I know, SSDs are
better in many ways, but my back-of-the-envelope calculations tell me
that constant writing under some loads may chew through write cycles
faster than expected (a commodity 256 GB SSD rated for 5k write
cycles will last about 1 year at 40 MB/second of sustained writes).
But if you've got the money to buy SSDs, or rent them from Amazon,
they may be a reasonable answer.
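The SSD estimate is just capacity times rated write cycles, divided by the sustained write rate. Spelled out, assuming decimal units (1 GB = 10^9 bytes), perfect wear leveling, and no write amplification, all of which flatter the SSD:

```python
# Rough SSD endurance estimate: total writable bytes / sustained write rate.
# Assumes ideal wear leveling and no write amplification (both optimistic).
SECONDS_PER_YEAR = 365 * 24 * 3600

def ssd_lifetime_years(capacity_bytes, write_cycles, write_bytes_per_sec):
    """Years until the rated write cycles are used up at a constant write rate."""
    total_writable_bytes = capacity_bytes * write_cycles
    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR

# 256 GB drive, 5k write cycles, 40 MB/second of sustained writes.
years = ssd_lifetime_years(256e9, 5000, 40e6)
print(f"{years:.2f} years")
```

That works out to roughly one year. Halve the write rate and you get about two, so it is worth measuring your real sustained write volume before ruling SSDs in or out.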
Regards,
- Josiah