Redis server collapsing


Ity

Jan 27, 2011, 12:05:24 PM
to Redis DB
Hey, we are occasionally seeing a BGSAVE cause redis-server to collapse.

I'm running on an EC2 instance with 17GB of RAM and redis-server 2.0.2,
with persistence settings such that a save usually happens every 120
seconds.
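
For reference, the relevant part of our redis.conf looks more or less like
this (the change threshold here is illustrative, not copied from our actual
config):

  # BGSAVE if at least 1000 keys changed in the last 120 seconds
  save 120 1000
  dbfilename dump.rdb
  dir /var/lib/redis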

Once it forks, the child redis-server process shoots up to 100% CPU usage,
so while redis-server is running, every 120 seconds the forked process
saves to disk. But this behaviour is not consistent: redis-server will run
smoothly with the background saves occurring normally for hours, and then
suddenly there is a spike in memory usage (usually at the same time a
BGSAVE occurs) that causes Redis to collapse. This has been happening more
frequently as of late. I have attached a graph which shows this behaviour.

Obviously, I want to keep memory and CPU usage as low as possible, but I
need the data persisted.

The Redis data is about 1.4GB on disk and rotates completely every 24
hours (that is, the data from hour 1 should be completely gone by the end
of the 24 hours, due to expiration). Basically, what we are struggling
with is keeping the server running for a day without failing.
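
The rotation is done with plain per-key TTLs at write time, along these
lines (key and value here are made up for illustration):

  redis-cli SET somekey somevalue
  redis-cli EXPIRE somekey 86400   # expire after 24 hours
  redis-cli TTL somekey            # seconds left, as a sanity check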

Thanks!

Salvatore Sanfilippo

Jan 27, 2011, 12:26:49 PM
to redi...@googlegroups.com
Hello,

What does "collapsing" mean exactly? As in crashing, not replying to
queries for N seconds, and so forth?
What is the Redis process RSS?

How many queries per second does the server receive, and how long does a
BGSAVE take to complete?
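
If you are unsure, something along these lines will give a rough picture
(the grep pattern is just a sketch against the INFO field names I remember):

  redis-cli INFO | grep -E 'used_memory|last_save|bgsave'
  ps -o rss= -p $(pidof redis-server)   # resident set size, in KB
  redis-cli LASTSAVE                    # unix time of the last successful save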

It seems like you run out of memory during a BGSAVE.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Jeremy Zawodny

Jan 27, 2011, 12:37:12 PM
to redi...@googlegroups.com
I'd suggest looking at 2.2, since it's better with memory management and use, which makes it a bit friendlier during BGSAVE operations. My guess is that you're low on RAM and starting to swap, which means this will only get worse as your dataset grows. Eventually you'll need to either add RAM or bring a slave online and use it as your backup/failover mechanism.
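
Bringing up a slave is a single directive in the slave's redis.conf (the
address below is a placeholder):

  # Replicate the full dataset from the master; the slave can then take
  # the BGSAVE load, or step in as a failover copy.
  slaveof 10.0.0.1 6379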

Jeremy

Salvatore Sanfilippo

Jan 27, 2011, 12:38:55 PM
to redi...@googlegroups.com
Exactly. As Jeremy is implicitly suggesting, 2.2 will use little RAM
during a BGSAVE as long as you mostly read from the dataset, and this
could make a big difference.

Also, in general 2.2 will use less memory. Switching is highly recommended.

Cheers,
Salvatore

Ity

Jan 27, 2011, 12:44:57 PM
to redi...@googlegroups.com
By collapsing I meant that Redis is crashing; the process dies.
RSS for redis-server right now is 7.4g (it ranges anywhere between 7g and 12g).

Currently it receives around 2000 queries per second. The attached graph shows the peaks for bgsaves and the time they take. I have seen an average of 30 seconds per bgsave.

Yes, we do run out of memory, and we see this in the system log messages after Redis crashes.

Jan 27 16:20:35 redis0 kernel: [510936.576189] lowmem_reserve[]: 0 0 0 0
Jan 27 16:20:35 redis0 kernel: [510936.576191] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 4*4096kB = 16384kB
Jan 27 16:20:35 redis0 kernel: [510936.576196] DMA32: 433*4kB 339*8kB 184*16kB 49*32kB 10*64kB 4*128kB 4*256kB 15*512kB 15*1024kB 5*2048kB 3*4096kB = 56700kB
Jan 27 16:20:35 redis0 kernel: [510936.576201] Normal: 1147*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 12780kB
Jan 27 16:20:35 redis0 kernel: [510936.576206] 818 total pagecache pages
Jan 27 16:20:35 redis0 kernel: [510936.576208] 0 pages in swap cache
Jan 27 16:20:35 redis0 kernel: [510936.576209] Swap cache stats: add 0, delete 0, find 0/0
Jan 27 16:20:35 redis0 kernel: [510936.576210] Free swap  = 0kB
Jan 27 16:20:35 redis0 kernel: [510936.576211] Total swap = 0kB
Jan 27 16:20:35 redis0 kernel: [510936.619553] 4482048 pages RAM
Jan 27 16:20:35 redis0 kernel: [510936.619556] 139091 pages reserved
Jan 27 16:20:35 redis0 kernel: [510936.619557] 303396 pages shared
Jan 27 16:20:35 redis0 kernel: [510936.619558] 4018007 pages non-shared
Jan 27 16:20:35 redis0 kernel: [510936.776130] redis-server invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Jan 27 16:20:35 redis0 kernel: [510936.776135] redis-server cpuset=/ mems_allowed=0



21727 prod      20   0 7571m 7.4g  800 R    6 43.2   4:19.05 redis-server    
redis-memory.png

Salvatore Sanfilippo

Jan 27, 2011, 12:53:50 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 6:44 PM, Ity <ity...@gmail.com> wrote:
> By collapsing I meant that Redis is crashing; the process dies.
> RSS for redis-server right now is 7.4g (it ranges anywhere between 7g and
> 12g).

Ok, that is known as the OOM killer, I think.
First obvious question: what is the overcommit memory policy
currently set on the system?
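
You can check it with either of:

  cat /proc/sys/vm/overcommit_memory
  sysctl vm.overcommit_memory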

> Currently it receives around 2000 queries per second. The attached graph
> shows the peaks for bgsaves and the time they take. I have seen an average
> of 30 seconds per bgsave.

Mostly reads or mostly writes?

> Yes, we do run out of memory, and we see this in the system log messages
> after Redis crashes.

I think your problem is the OOM killer kicking in because the overcommit
setting is wrong; otherwise you would experience other symptoms first, like
the server getting slower and slower.

Setting overcommit to the value Redis suggests in the first lines of its log
once it is restarted will probably fix the problem, but upgrading to 2.2 will
help in general, especially if your dataset has small lists, small sets of
integers, or the like.
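
To be concrete, the startup warning points at vm.overcommit_memory = 1; if
it turns out yours is not set, the change is just:

  # Take effect immediately:
  sysctl vm.overcommit_memory=1
  # Persist it across reboots:
  echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf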

The more info you can provide on your dataset and queries, the more we can help.

Cheers,
Salvatore

Ity

Jan 27, 2011, 1:05:02 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 12:53 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Thu, Jan 27, 2011 at 6:44 PM, Ity <ity...@gmail.com> wrote:
>> By collapsing I meant that Redis is crashing; the process dies.
>> RSS for redis-server right now is 7.4g (it ranges anywhere between 7g and
>> 12g).
>
> Ok, that is known as the OOM killer, I think.
> First obvious question: what is the overcommit memory policy
> currently set on the system?

overcommit_memory is set to 1

>> Currently it receives around 2000 queries per second. The attached graph
>> shows the peaks for bgsaves and the time they take. I have seen an average
>> of 30 seconds per bgsave.
>
> Mostly reads or mostly writes?

In the last hour, we have had 250,000 reads and 100,000 writes from each of 8
different clients, which works out to roughly 2,000,000 reads and 800,000
writes in total, or about 800 qps.

Salvatore Sanfilippo

Jan 27, 2011, 1:10:05 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 7:05 PM, Ity <ity...@gmail.com> wrote:

> overcommit_memory is set to 1

So that is definitely not our problem.
Strange that the OOM killer kills you before you see bad
performance due to swapping.

> In the last hour, we have had 250,000 reads and 100,000 writes from each of 8
> different clients, which works out to roughly 2,000,000 reads and 800,000
> writes in total, or about 800 qps.

Do you have small lists, or small sets composed of integers?
If either of these is true, you can save tons of memory with 2.2,
bringing memory usage back down to a low value.
Given your read-heavy workload, 2.2 is going to work better anyway.
I strongly suggest upgrading; it is completely backward compatible
with 2.0 UNLESS you are using an old client that only supports the old
protocol.

Cheers,
Salvatore

Ity

Jan 27, 2011, 1:11:41 PM
to redi...@googlegroups.com
We are going to try to move to version 2.2 today. Hopefully that will ease our problems.

I forgot to mention that we are using Redis as a URL cache, so we store lists of values against keys which are URLs. Our dbsize ranges anywhere up to 13 million keys, but the server always crashes before reaching that limit. I have attached a graph of dbsize over a week.
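
Per key, the pattern is basically this (key and values invented for
illustration):

  RPUSH http://example.com/page?id=42 somevalue
  RPUSH http://example.com/page?id=42 12345
  EXPIRE http://example.com/page?id=42 86400
  LRANGE http://example.com/page?id=42 0 -1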

Is there any other information that you would need, or that might be useful?
dbsize.png

Ity

Jan 27, 2011, 1:16:39 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 1:10 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Thu, Jan 27, 2011 at 7:05 PM, Ity <ity...@gmail.com> wrote:
>
>> overcommit_memory is set to 1
>
> So that is definitely not our problem.
> Strange that the OOM killer kills you before you see bad
> performance due to swapping.
>
>> In the last hour, we have had 250,000 reads and 100,000 writes from each of
>> 8 different clients, which works out to roughly 2,000,000 reads and 800,000
>> writes in total, or about 800 qps.
>
> Do you have small lists, or small sets composed of integers?
> If either of these is true, you can save tons of memory with 2.2,
> bringing memory usage back down to a low value.

So our data is pretty much:
URL (key) -> list of values (which might be strings, integers, URLs again, etc.)

> Given your read-heavy workload, 2.2 is going to work better anyway.
> I strongly suggest upgrading; it is completely backward compatible
> with 2.0 UNLESS you are using an old client that only supports the old
> protocol.

We are using the most recent version of Jedis (1.5.1) as the client. I wanted to also point out that we are not using the vm option. As you suggested, we will try to move to 2.2 today and will keep you posted on how that goes.


Salvatore Sanfilippo

Jan 27, 2011, 1:19:17 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 7:16 PM, Ity <ity...@gmail.com> wrote:

>> Do you have small lists, or small sets composed of integers?
>> If either of these is true, you can save tons of memory with 2.2,
>> bringing memory usage back down to a low value.
>
> So our data is pretty much:
> URL (key) -> list of values (which might be strings, integers, URLs again,
> etc.)

Oh, very interesting. Are most of these lists less than a few hundred
elements, but more than, say, 5?
If so, Redis 2.2 will use something like just 1/5 of the memory.
Please tell me the average list size and I'll reply with the right
config option.

>> Given your read-heavy workload, 2.2 is going to work better anyway.
>> I strongly suggest upgrading; it is completely backward compatible
>> with 2.0 UNLESS you are using an old client that only supports the old
>> protocol.
>
> We are using the most recent version of Jedis (1.5.1) as the client. I wanted
> to also point out that we are not using the vm option. As you suggested, we
> will try to move to 2.2 today and will keep you posted on how that goes.

I think Jedis is fine.

Cheers,
Salvatore

Ity

Jan 27, 2011, 2:56:45 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 1:19 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Thu, Jan 27, 2011 at 7:16 PM, Ity <ity...@gmail.com> wrote:
>
>>> Do you have small lists, or small sets composed of integers?
>>> If either of these is true, you can save tons of memory with 2.2,
>>> bringing memory usage back down to a low value.
>>
>> So our data is pretty much:
>> URL (key) -> list of values (which might be strings, integers, URLs again,
>> etc.)
>
> Oh, very interesting. Are most of these lists less than a few hundred
> elements, but more than, say, 5?
> If so, Redis 2.2 will use something like just 1/5 of the memory.
> Please tell me the average list size and I'll reply with the right
> config option.

The list size at the moment is <= 7, but it might increase in the future. On another note, is Redis more efficient if we store the whole list serialized?
Also, I wanted to mention that we do not have swap on the EC2 machine we are running Redis on.

Salvatore Sanfilippo

Jan 27, 2011, 4:23:55 PM
to redi...@googlegroups.com
On Thu, Jan 27, 2011 at 8:56 PM, Ity <ity...@gmail.com> wrote:

> The list size at the moment is <= 7, but it might increase in the future. On
> another note, is Redis more efficient if we store the whole list serialized?

Oh, you don't need to do anything at all: just upgrade to 2.2 and
you'll see an impressive improvement in memory usage.
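
For the record, these are the 2.2 parameters that control the compact
encodings; the values below are the shipped defaults as far as I remember,
so your <= 7 element lists are covered without touching anything:

  # Lists with at most this many elements, each shorter than
  # list-max-ziplist-value bytes, are stored in the compact ziplist encoding.
  list-max-ziplist-entries 128
  list-max-ziplist-value 64
  # Sets containing only integers stay in the compact intset encoding
  # up to this many elements.
  set-max-intset-entries 512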

> Also, I wanted to mention that we do not have swap on the EC2 machine we are
> running Redis on.

Yes, and this is why the overcommit setting was irrelevant.
With swap, performance would suffer when Redis runs out of memory during a
BGSAVE, but the process would not be killed by the OOM killer.
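
If you ever want that safety net, adding a swap file on the EC2 instance is
straightforward (the size and path below are just an example):

  dd if=/dev/zero of=/swapfile bs=1M count=4096   # 4GB swap file
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile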

Ity

Jan 30, 2011, 5:43:43 PM
to redi...@googlegroups.com
I have great news, guys! The upgrade was a success: Redis survived the whole weekend and is still going strong. You might remember the previous memory usage graph I sent; the new one is attached, and as you can see the improvement is remarkable. With the same amount of data, the old version was using almost double the memory. CPU usage also never goes above 50%. I swear by the newer version of Redis now :)

Thanks a lot for all your help through this!

redis1.png