appendonly.aof size

Vinodh Krishnamoorthy

Feb 28, 2012, 4:14:14 PM
to redi...@googlegroups.com
Hi Everyone,

Since the appendonly.aof file can grow indefinitely, what is the best practice to ensure that it doesn't grow larger than the system capacity? I was under the impression that BGREWRITEAOF would compress the file, but I am not sure that is really happening. I am taking backups of this file and uploading it to S3. Should I delete the AOF file on the machine once the upload is completed? In the event of a disaster or failure, should I just concatenate all the AOF files from S3 to get the latest AOF snapshot?

My main concern is that the AOF file can grow indefinitely, and I want to know what the best practices are out there. It seems like a common problem to solve, but I have not been successful in finding good solutions.

Thanks everyone.

-Vinodh

Vinodh Krishnamoorthy

Feb 28, 2012, 4:24:16 PM
to redi...@googlegroups.com
Should I just save the AOF file on my EBS volume to be on the safe side?

hemant adelkar

Feb 28, 2012, 5:26:37 PM
to redi...@googlegroups.com
Hi Vinodh,

If you are ready to lose some data, then dump.rdb is a better option than
appendonly.aof, because it takes roughly a tenth of the space (both in memory
and on disk). It is also a point-in-time snapshot of memory, which is faster
to produce than rewriting the AOF.

Thanks
Hemant
--
Hemant S. Adelkar
 Netcore Solutions Pvt. Ltd.
 
 Tel: +91 (22) 6662 8142
 Mob: +91 9773704063
 Email: hemant....@netcore.co.in
 Web: http://www.netcore.co.in


Vinodh Krishnamoorthy

Feb 28, 2012, 9:57:42 PM
to redi...@googlegroups.com
Will BGREWRITEAOF compress the AOF file? What are some best/common practices out there?

Thanks,
Vinodh

Sriharsha Setty

Feb 29, 2012, 3:06:14 AM
to redi...@googlegroups.com
Hello, 

On Wed, Feb 29, 2012 at 2:44 AM, Vinodh Krishnamoorthy <vinodh....@gmail.com> wrote:
> Hi Everyone,
>
> Since the appendonly.aof file can grow indefinitely, what is the best practice to ensure that it doesn't grow larger than the system capacity? I was under the impression that BGREWRITEAOF would compress the file, but I am not sure that is really happening. I am taking backups of this file and uploading it to S3. Should I delete the AOF file on the machine once the upload is completed? In the event of a disaster or failure, should I just concatenate all the AOF files from S3 to get the latest AOF snapshot?
 
We use Redis as an object cache, so losing data is not very critical to us, since the cache can be warmed again. But to ensure that the AOF does not grow indefinitely, we use the maxmemory directive to limit the size of the Redis database. Every cached object has an expiry set, and when maxmemory is reached, Redis automatically evicts keys based on an algorithm (which is described in the docs).
 
> My main concern is that the AOF file can grow indefinitely, and I want to know what the best practices are out there. It seems like a common problem to solve, but I have not been successful in finding good solutions.


Like I said, it depends on what kind of data you are storing in Redis. If you are worried about size, you can consider compressing the data (I believe this has been discussed in a separate thread). But if your data is something that you can rebuild, set a maxmemory limit and use EXPIRE.
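
For what it's worth, that setup can also be scripted against a running
instance. A rough, untested sketch with the redis-py client (the key name
and limits below are just placeholders; the same settings normally live in
redis.conf):

    import redis

    r = redis.Redis(host='localhost', port=6379)

    # Cap memory usage and evict keys that carry a TTL once the cap is hit.
    r.config_set('maxmemory', '2gb')
    r.config_set('maxmemory-policy', 'volatile-lru')

    # Cache an object with an expiry so it is eligible for eviction.
    r.set('cache:user:1234', '{"name": "example"}')
    r.expire('cache:user:1234', 3600)  # drop it after an hour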

Hope that helps, 
/harsha  

Josiah Carlson

Feb 29, 2012, 4:13:41 PM
to redi...@googlegroups.com
The rewritten file is compacted. More recent versions even use bulk
commands instead of single commands for adding values, which reduces
the size even further (there was a time when a rewritten AOF could be
smaller than the equivalent dump.rdb; that may or may not still be the
case).

As for "best practices", the same best practices apply as you would
for your database. If you need daily backups, store the daily backups
in S3 or somewhere else. If you only need the most recent, only store
the most recent. The file will compress further with gzip or bzip2
(which is also the case for dump.rdb), and you should do your
cost/benefit analysis WRT time spent compressing (this is actually my
favorite article comparing the commonly used algorithms:
http://stephane.lesimple.fr/wiki/blog/lzop_vs_compress_vs_gzip_vs_bzip2_vs_lzma_vs_lzma2-xz_benchmark_reloaded
).
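
If it helps, here is a rough, untested sketch of stream-compressing a
backup before the upload (standard-library Python; the paths are just
examples, and the S3 upload itself is left out):

    import gzip, shutil, time

    src = '/var/lib/redis/appendonly.aof'   # example path; adjust to yours
    dst = '/backups/appendonly-%s.aof.gz' % time.strftime('%Y%m%d-%H%M%S')

    with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)  # compress without loading the whole file

    # ...then upload dst to S3 with whatever client you already use.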

Regards,
- Josiah

Vinodh Krishnamoorthy

Feb 29, 2012, 8:30:45 PM
to redi...@googlegroups.com
Thank you Josiah. 

We are using Redis for data we cannot afford to lose. We have a cron task that backs up the data and issues the BGREWRITEAOF command every 15 minutes. As the data size increases, should I wait for a period of time to ensure that the command actually finished successfully? Should I look at redis.log to see that the execution completed successfully, or check that the child process it forked finished successfully? What are some common techniques?

Thanks so much.

-Vinodh

Josiah Carlson

Feb 29, 2012, 10:46:04 PM
to redi...@googlegroups.com
On Wed, Feb 29, 2012 at 5:30 PM, Vinodh Krishnamoorthy
<vinodh....@gmail.com> wrote:
> Thank you Josiah.
>
> We are using Redis for data we cannot afford to lose. We have a cron task
> that backs up the data and issues the BGREWRITEAOF command every 15 minutes.
> As the data size increases, should I wait for a period of time to
> ensure that the command actually finished successfully? Should I look at

You should check the output of the INFO command; it will tell you
whether an AOF rewrite or a background dump is in progress.
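
A rough, untested sketch of that check with the redis-py client (the exact
INFO field names differ between Redis versions, so verify them against your
own INFO output):

    import time
    import redis

    r = redis.Redis()
    r.bgrewriteaof()  # kick off the rewrite

    def rewrite_in_progress(info):
        # 'aof_rewrite_in_progress' on recent versions,
        # 'bgrewriteaof_in_progress' on older ones.
        return info.get('aof_rewrite_in_progress') or info.get('bgrewriteaof_in_progress')

    # Poll INFO until the rewrite child has finished.
    while rewrite_in_progress(r.info()):
        time.sleep(1)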


Cannot afford to lose, at all? As in, if you lose data from the most
recent 1/2 second, that would end your business? Or would it be okay
if you lost data from the most recent 1/2 second?

If you can't afford to lose *any* data, then you should buy a stack of
SSDs, set them up for RAID6 operation, set your file syncing options
to: "appendfsync always", and prepare yourself for replacing the SSDs
every 3-6 months. You'll want 2-3 slaves with that same setup.
Alternatively, set your syncing options to "appendfsync everysec", run
a few slaves (the SSDs are optional), then when you write to the
master, verify that the data you wrote is on the slaves, then wait an
additional second to ensure that the data got to disk.

If you can afford to lose up to 1 second's worth of data if something
horrible happens, set your aof file syncing options to: "appendfsync
everysec", run at least 1 slave.

In all cases, run BGREWRITEAOF only when the AOF gets too large
to be reasonable (every 15 minutes seems excessive to me; how much are
you writing?)
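
As a sketch of what that could look like (untested; the path and threshold
are placeholders, and newer Redis versions can also do this automatically
via the auto-aof-rewrite-* directives in redis.conf):

    import os
    import redis

    AOF_PATH = '/var/lib/redis/appendonly.aof'   # example path
    MAX_AOF_BYTES = 1 * 1024 ** 3                # rewrite once it passes ~1 GB

    r = redis.Redis()
    info = r.info()
    rewriting = info.get('aof_rewrite_in_progress') or info.get('bgrewriteaof_in_progress')

    if not rewriting and os.path.getsize(AOF_PATH) > MAX_AOF_BYTES:
        r.bgrewriteaof()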


If you want safe point-in-time backups to restore from, use regular
dumps and BGSAVE. Why? If you copy an AOF between fsync operations
(which I just recommended you set to once per second), you may end up
with a partial AOF, which you will then need to fix. That *may* be
better for getting the most data, but getting an error during a restore
operation is painful. With slaves, hopefully you won't run into needing
to start from a partial AOF; you should be able to copy a fully synced
slave's AOF to the master.
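
As a rough illustration of that (untested, redis-py; the paths are
placeholders, and error handling is left out):

    import shutil
    import time
    import redis

    r = redis.Redis()

    before = r.lastsave()   # time of the last completed save
    r.bgsave()              # kick off a background dump

    # Wait until LASTSAVE advances, i.e. the new dump.rdb is complete.
    while r.lastsave() == before:
        time.sleep(1)

    # dump.rdb is now a consistent point-in-time snapshot; copy it off-box.
    shutil.copy('/var/lib/redis/dump.rdb',
                '/backups/dump-%d.rdb' % int(time.time()))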

> redis.log to see that the execution completed successfully, or check that
> the child process it forked finished successfully? What are some common
> techniques?

When most people realize how difficult it is to literally guarantee
that their data is safe (appendfsync everysec, or waiting at least 1
second), they realize that they can deal with a little bit of loss.
Also, I usually recommend pushing back on management until they say
"it's okay if we lose *some* data", because everything else leads to
madness (and to needing to spend a lot of $ to ensure that your data is
in 5 places at the same time, is backed up to S3 and at least 2 other
non-Amazon data centers, etc.).

Regards,
- Josiah
