How does copy-on-write work?

Bread Pig

Apr 20, 2014, 10:00:05 PM
to redi...@googlegroups.com

I have Redis set up like this: a 12 GB VM running Ubuntu 12.04, with 100 GB of disk space and a 24 GB swap file. Only Redis 2.8.8 is running in the VM.

My redis.conf contains the settings below. Everything else is left at its default value.

save 2 500000
rdbchecksum no
appendonly yes
appendfsync no


As I continuously write data to Redis, the memory used by copy-on-write keeps increasing. Even though my program sleeps long enough for Redis to finish all of its background saves (the last log message reads 0 MB of memory used by copy-on-write), the next background save goes back to a high number.

Example,

1300MB of memory used by cow

1400MB of memory used by cow

0MB of memory used by cow

1500MB of memory used by cow


I am writing around 500000 items every 3.5 sec, using a Jedis pipeline. After 10*500000 successful writes, the program sleeps for 30 sec before continuing. I thought this would give Redis time to finish all of its saves.

But at around 200*500000 writes, a background save took 4 hrs to finish. After that, a save never finished even after more than 6 hrs. What exactly does all of this mean? Why does the memory used by COW keep increasing? Also, during each background save with high memory use, Redis seems non-functional: Jedis always hits a socket timeout exception.

Josiah Carlson

Apr 21, 2014, 7:12:29 PM
to redi...@googlegroups.com
It would seem that you are misunderstanding how Redis handles snapshotting, what the SAVE configuration does, and the meaning of log entries reading "memory used by copy-on-write". So let me try to explain.

First, the save configuration "SAVE X Y" means "if in the last X or more seconds you have at least Y changes, perform a BGSAVE if one is not already in process".
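As a rough sketch of that rule in Python (a simplified model, not Redis's actual code; the function names are made up for illustration), the check looks something like this:

```python
def parse_save_points(conf_text):
    """Return [(seconds, changes), ...] from the `save` lines of a redis.conf."""
    points = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].lower() == "save":
            points.append((int(fields[1]), int(fields[2])))
    return points


def should_start_bgsave(points, changes, secs_since_last_save, child_running):
    """Simplified save-point check: never start a second BGSAVE while one runs."""
    if child_running:
        return False
    # Any one save point being satisfied is enough to trigger a BGSAVE.
    return any(changes >= c and secs_since_last_save > s for s, c in points)


points = parse_save_points("save 2 500000\nrdbchecksum no\nappendonly yes\n")
print(should_start_bgsave(points, 500000, 3.5, child_running=False))
```

With your "save 2 500000" line, a single 500000-item batch that takes 3.5 seconds to arrive already satisfies the condition.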

What happens during a BGSAVE is that Redis forks itself, and while the parent process keeps accepting commands, the child process creates a new snapshot file and replaces the old one.
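To see the fork behaviour in isolation, here is a small Python sketch (Unix only, and unrelated to Redis's real implementation; the function name and the JSON "snapshot" are just illustrative assumptions). The child serializes the data as it looked at fork time, while the parent keeps modifying its own copy:

```python
import json
import os


def snapshot_while_writing(data):
    """Fork, mutate `data` in the parent, and return what the child saw."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write out the dataset as it looked at fork time, then exit.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(data, out)
        finally:
            os._exit(0)
    # Parent: keep "accepting writes" while the child does its work.
    os.close(write_fd)
    data["written-after-fork"] = "new"
    with os.fdopen(read_fd) as src:
        snapshot = json.load(src)
    os.waitpid(pid, 0)
    return snapshot


data = {"k1": "v1"}
print(snapshot_while_writing(data))  # the child's view lacks the later write
print(data)                          # the parent's view includes it
```

The operating system only copies a memory page when one side writes to it, which is where the "copy-on-write" memory comes from.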

After the child finishes replacing the old snapshot file, it prints a log message based on how much private data the child used. This basically counts how much modification the parent process performed compared to the child (including read buffers, write buffers, data modifications, ...).
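On Linux, one way for a process to measure its private (copied) memory is to sum the Private_Dirty fields in /proc/self/smaps. The sketch below does that in Python; it illustrates the idea rather than reproducing Redis's code, and the sample text is made up:

```python
def private_dirty_kb(smaps_text):
    """Sum every Private_Dirty entry (in kB) found in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Private_Dirty:"):
            # Lines look like "Private_Dirty:      1024 kB".
            total += int(line.split()[1])
    return total


def read_own_private_dirty_kb(path="/proc/self/smaps"):
    """Linux only: private dirty memory of the current process, in kB."""
    with open(path) as f:
        return private_dirty_kb(f.read())


# Made-up sample covering two memory mappings.
SAMPLE = """\
Private_Clean:         4 kB
Private_Dirty:      1024 kB
Shared_Dirty:          0 kB
Private_Dirty:       512 kB
"""
print(private_dirty_kb(SAMPLE))
```

The more pages change while the child is alive, the larger this number gets.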


So... when you configure your Redis with "SAVE 2 500000" and you start writing your chunks, what happens is that after the first chunk gets in, Redis forks and starts a BGSAVE. While Redis is doing the snapshot, you keep writing. When Redis finishes its snapshot, you are still writing, so it almost immediately starts snapshotting again. When you finally get around to sleeping for 30 seconds, Redis may just be starting or finishing another BGSAVE, etc.
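A toy timeline makes this concrete. The sketch below is a simplified model, not Redis's scheduler: it assumes each 500000-item batch lands as one lump every 3.5 seconds, that the writer sleeps after 10 batches, and that a snapshot takes a fixed, made-up number of seconds (snapshot_secs is a hypothetical parameter; your real snapshots took far longer).

```python
def bgsave_start_times(snapshot_secs, total_secs=65.0, step=0.5):
    """Return the times (in seconds) at which the model starts a BGSAVE."""
    save_seconds, save_changes = 2, 500000   # from "save 2 500000"
    batch_secs, batch_size = 3.5, 500000     # one pipeline batch per 3.5 s
    batches_before_sleep = 10                # then the writer goes quiet

    dirty = 0              # changes not yet covered by a finished snapshot
    dirty_at_fork = 0      # value of `dirty` when the current child forked
    last_save = 0.0
    child_done_at = None   # None means no BGSAVE child is running
    batches_written = 0
    starts = []

    for i in range(1, int(total_secs / step) + 1):
        now = i * step
        # Writer: one more batch has fully arrived.
        if (batches_written < batches_before_sleep
                and now >= (batches_written + 1) * batch_secs):
            dirty += batch_size
            batches_written += 1
        # Child finished: only the changes it captured are now persisted.
        if child_done_at is not None and now >= child_done_at:
            dirty -= dirty_at_fork
            last_save = now
            child_done_at = None
        # Save point check, skipped while a child is still running.
        if (child_done_at is None and dirty >= save_changes
                and now - last_save > save_seconds):
            starts.append(now)
            dirty_at_fork = dirty
            child_done_at = now + snapshot_secs
    return starts


print(bgsave_start_times(10.0))
```

With a hypothetical 10 second snapshot, this model starts a BGSAVE at 3.5, 16.0, 28.5 and 41.0 seconds: a child is running for 40 of the 65 seconds, and each new BGSAVE begins 2.5 seconds after the previous one ends.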

As for your timeouts once Redis holds enough data, along with the slow BGSAVE times, that's simply because you are exhausting your main memory and going into swap. When Redis starts swapping, basically all operations become slow, including client requests.


All of this said, why do you *need* to have a fresh snapshot as often as possible?

If you are interested in getting periodic snapshots with zero memory use on copy-on-write, then as long as you don't need to perform operations while you are waiting for the snapshot to occur, you can disable your save configuration and run an explicit "SAVE" command when you want/need to create a new snapshot. That will keep you from using extra memory. If you set your Jedis client timeout high enough that it doesn't time out, then the call should return properly when the SAVE is complete. Incidentally, running experiments on VMs with some 30 GB of memory in use, I saw BGSAVE times of ~20 minutes, but when using SAVE they were closer to ~3 minutes.
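For illustration only, here is a minimal Python sketch of that approach using a raw socket and the Redis wire protocol, so it needs no client library. The host, port and 600 second timeout are assumptions, error handling is omitted, and in your case you would do the equivalent through Jedis:

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def blocking_save(host="127.0.0.1", port=6379, timeout_secs=600.0):
    """Clear the save points, then run a blocking SAVE with a long timeout."""
    with socket.create_connection((host, port), timeout=timeout_secs) as conn:
        # CONFIG SET save "" removes all configured save points.
        conn.sendall(encode_command("CONFIG", "SET", "save", ""))
        conn.recv(1024)
        # SAVE blocks the server until the snapshot has been written.
        conn.sendall(encode_command("SAVE"))
        return conn.recv(1024)


print(encode_command("SAVE"))
```

Calling `blocking_save()` against a running server should return only once the snapshot is on disk, which is why the timeout has to be generous.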

 - Josiah





Bread Pig

Apr 22, 2014, 12:26:51 AM
to redi...@googlegroups.com
Hi, thanks for the detailed explanation. Indeed, I had misunderstood. Because the data does not have any expiration, and because the memory will eventually be exhausted, I thought I had to do the SAVE or BGSAVE.

So if I do not need persistence, I can disable SAVE and BGSAVE and just keep writing data? With save disabled, even though I have 10 GB of data and Redis has 6 GB, can I still write all of it to Redis and, at the end, still look up the very first data that was written? Because there is no expiration, Redis will not evict the very first data but swap it out, and swap it back in when I do the lookup?

Josiah Carlson

Apr 22, 2014, 4:27:39 AM
to redi...@googlegroups.com
I think you may misunderstand what Redis is. Redis stores *all* of its data in RAM. That is its primary design feature and requirement. While there is a fork of Redis called Redis-nds that offers on-disk storage with more or less reasonable performance depending on your needs, the standard version of Redis keeps everything in RAM.

Yes, if you want Redis to have your data after it restarts, you need to snapshot or write to the AOF... but once Redis goes past the available memory on your machine, you are going to have a bad time. It's going to be very slow, as with that 6 hour BGSAVE you mentioned before.

Want my advice? Pick one or more of:
* get a bigger server to host Redis
* find a way to shard your data and get more servers
* figure out whether you can sample your data
* try Redis-nds

Letting Redis get into swap is a recipe for pain.
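As a sketch of the sharding option, the simplest scheme is to hash each key and pick a server by modulo. The host names below are hypothetical, and this is only one of several ways to partition data:

```python
import zlib

# Hypothetical host names, for illustration only.
SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def shard_for(key, shards=SHARDS):
    """Pick a shard for `key` with a stable hash (CRC32 modulo shard count)."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


for key in ("user:1", "user:2", "user:3"):
    print(key, "->", shard_for(key))
```

The same key always maps to the same server, but changing the number of shards remaps most keys, so pick the shard count with some headroom.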

 - Josiah




Josiah Carlson

Apr 22, 2014, 4:29:02 AM
to redi...@googlegroups.com
Alternatively, depending on your particular problem, there may be a different data model in Redis, or a different system altogether, that can solve your problem better.

 - Josiah

Bre@dPiG

Apr 22, 2014, 4:51:38 AM
to redi...@googlegroups.com

Hi, I understand Redis is a memory store. But I am unsure what happens when I write more data than the available memory. Can you enlighten me on that?

If Redis has 6 GB and my data is 10 GB,

1) if the keys have an expiration set but have not expired, and Redis memory is maxed out, what will Redis do?

2) if the keys do not have an expiration and Redis memory is maxed out, what will Redis do?

In the above scenarios, my concern is: as data is written after Redis memory has maxed out, are the first few keys still retrievable if they have not expired or have no expiration defined?

Thanks!




Josiah Carlson

Apr 22, 2014, 11:01:29 AM
to redi...@googlegroups.com
On Tue, Apr 22, 2014 at 1:51 AM, Bre@dPiG <brea...@gmail.com> wrote:

> Hi, I understand Redis is a memory store. But I am unsure what happens when I write more data than the available memory. Can you enlighten me on that?

Modern operating systems (since the late 80's) use a technology called "virtual memory" to address more memory than you actually have, then move what they consider to be "unused" memory to your hard disk, into what is called (on Windows) a swap file or (just about everywhere else) a swap partition. This makes your application believe that it can keep operating without issue, though many applications, including Redis, have fairly severe performance issues in such a case.

> If Redis has 6 GB and my data is 10 GB,
>
> 1) if the keys have an expiration set but have not expired, and Redis memory is maxed out, what will Redis do?

If you have a 'maxmemory' limit configured and set your 'maxmemory-policy' to a valid policy that isn't 'noeviction', then Redis will delete old data using the algorithm defined for that policy. If Redis can't free any memory, it will stop accepting write operations.

> 2) if the keys do not have an expiration and Redis memory is maxed out, what will Redis do?

If you are using a 'maxmemory-policy' of 'allkeys-random' or 'allkeys-lru', Redis will delete old data. Otherwise Redis will stop accepting write operations.

> In the above scenarios, my concern is: as data is written after Redis memory has maxed out, are the first few keys still retrievable if they have not expired or have no expiration defined?

They might be; it depends on how lucky they get during the eviction process. This is why I recommended that you find a bigger machine, find a way to partition your data, try to reduce your data, or try Redis-nds. All are potentially workable solutions to your problem, at least as far as I can observe with the limited information you've given us.
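To make the difference between the policies concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the limit is counted in keys rather than bytes, the LRU order is exact (Redis only approximates it), and TinyStore is a made-up class, not anything in Redis:

```python
from collections import OrderedDict


class TinyStore:
    """Toy key store with a key-count limit and three eviction policies."""

    def __init__(self, max_keys, policy="noeviction"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()   # least recently used key comes first
        self.has_ttl = set()        # keys that were written with an expiration

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value, ttl=False):
        if key not in self.data and len(self.data) >= self.max_keys:
            if not self._evict_one():
                raise MemoryError("write rejected: nothing can be evicted")
        self.data[key] = value
        self.data.move_to_end(key)
        if ttl:
            self.has_ttl.add(key)
        else:
            self.has_ttl.discard(key)

    def _evict_one(self):
        if self.policy == "allkeys-lru":
            victim = next(iter(self.data), None)
        elif self.policy == "volatile-lru":
            # Only keys with an expiration are candidates.
            victim = next((k for k in self.data if k in self.has_ttl), None)
        else:                       # noeviction
            victim = None
        if victim is None:
            return False
        del self.data[victim]
        self.has_ttl.discard(victim)
        return True


store = TinyStore(max_keys=2, policy="allkeys-lru")
for name in ("first", "second", "third"):
    store.set(name, name.upper())
print(store.get("first"), store.get("third"))
```

Under 'allkeys-lru' the first key is gone once the limit is hit, while under 'noeviction' the first key survives and the new write is the one that fails.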

 - Josiah

Bre@dPiG

Apr 22, 2014, 1:02:42 PM
to redi...@googlegroups.com
Hi Josiah,

Thank you! I realized my understanding of eviction and swapping was wrong to begin with. It's embarrassing.
1) I thought that with noeviction and no expiration on the data, Redis would swap old data out to the swap file/partition.
You have shown me that these are 2 different things.

2) I thought AOF and saves worked like a conventional DB, where all data is persisted.
You have shown me that Redis persistence is just for reloading the data back into memory after a failure/restart.



Thanks! 
Is there no one else? Is there no one else!

Josiah Carlson

Apr 22, 2014, 1:47:42 PM
to redi...@googlegroups.com
Being wrong is part of learning. There is no embarrassment in learning something new or having been wrong in the past.

If you were giving advice or making claims based on incorrect information, that could be embarrassing, but politicians do it all the time, so at least it isn't fatal. ;)

 - Josiah