New issue 425 by barthazi.andras: 100% CPU usage, very slow response/timeout
http://code.google.com/p/redis/issues/detail?id=425
I'm not sure I can give you a proper bug report, but let me create this
issue, and if you have no idea at all, just close it.
What steps will reproduce the problem?
1. we have 2 SET-based queues and 7 SET-based caches (a rough sketch of the access pattern is below)
2. the queues are quite small, typically <10 items (each item <10 bytes)
3. the caches total about 800 MB, with values typically between 100 bytes and 8 KB
4. we do fewer than about 1000 queries per second according to the MONITOR command
5. sometimes the CPU usage of Redis goes up to 100% and stays there for 30-60s
6. while the CPU usage is around 100%, response times are very slow, sometimes over 30s, sometimes timing out; the CLI benchmark tool reports about 50 GET queries per second, whereas it typically reports 45,000
7. I thought the issue was related to background saving, but it does not seem to be
8. we're using the default factory settings, with no virtual memory or anything like that
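A hypothetical reconstruction of that access pattern (the key names and exact
commands below are made up; the report does not give them) would be something
like:

$ redis-cli SADD queue:jobs "job-1"       # SET-based queue, producer side
$ redis-cli SPOP queue:jobs               # SET-based queue, consumer side
$ redis-cli SADD cache:fragment:42 "<value of 100 bytes to 8 KB>"   # cache write
$ redis-cli SMEMBERS cache:fragment:42    # cache read

The INFO output from the affected instance: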
redis_version:2.0.2
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:29610
uptime_in_seconds:6006
uptime_in_days:0
connected_clients:20
connected_slaves:0
blocked_clients:0
used_memory:904343768
used_memory_human:862.45M
changes_since_last_save:715
bgsave_in_progress:0
last_save_time:1294150490
bgrewriteaof_in_progress:0
total_connections_received:18036
total_commands_processed:3992865
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=15,expires=0
It's an Ubuntu Linux machine with an 8-core processor. Redis' memory usage is
about 10% of system memory.
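A cheap way (not part of the original report) to correlate the CPU spikes with
saving activity or memory growth is to watch a few of the INFO fields above
while a spike is happening:

$ watch -n 1 'redis-cli info | egrep "used_memory_human|changes_since_last_save|bgsave_in_progress|connected_clients"'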
Hello, are you sure you don't run commands like KEYS from time to time? Or
other similarly slow commands like big set intersections or the like?
Btw, the simplest way to figure out what is happening is the following. When
it blocks, use the script you'll find at:
And post the output here. Run it a few times while Redis is stuck. This
script is able to tell us where Redis is spending so much time.
Thank you!
Salvatore
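One quick way (not mentioned in the thread) to verify whether KEYS or other
similarly heavy commands ever reach the server is to watch MONITOR output for
a few seconds; the pattern below is only a rough filter and will also match
key names containing those words:

$ redis-cli monitor | grep -iE 'keys|sinter|sort'

Press Ctrl-C after a short while; any match is a candidate O(N) command.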
I confirm the issue.
Redis 2.0.4
Linux RHEL5 2.6.18 SMP, 16 cores
CPU usage ~ 100%
==> redis-benchmark:
====== PING ======
10002 requests completed in 30.17 seconds
50 parallel clients
3 bytes payload
keep alive: 1
......
331.57 requests per second (60343.38 requests per second - when we disable
new services that use Redis intensively)
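For reference, the run above looks like a stock benchmark invocation along
these lines (the exact flags are an assumption, inferred from the "50 parallel
clients / 3 bytes payload" header):

$ redis-benchmark -n 10000 -c 50 -d 3 -q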
==> poormansprofiler output:
3 options:
1) 1 at,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
2) 1 "results:all:qZv9GAOK"),keysCommand,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
3) 1 dictNext,keysCommand,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
also it produces the following warnings:
warning: .dynamic section for "/lib64/libm.so.6" is not at the expected
address
warning: difference appears to be caused by prelink, adjusting expectations
warning: .dynamic section for "/lib64/libpthread.so.0" is not at the
expected address
warning: difference appears to be caused by prelink, adjusting expectations
warning: .dynamic section for "/lib64/libc.so.6" is not at the expected
address (wrong library or version mismatch?)
warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the
expected address
warning: difference appears to be caused by prelink, adjusting expectations
warning: no loadable sections found in added symbol-file system-supplied
DSO at 0x7fff177fc000
When Redis works fine, the profiler output is as follows:
1 __epoll_wait_nocancel,aeApiPoll,flags=3),aeMain,main
with the only warning
warning: no loadable sections found in added symbol-file system-supplied
DSO at 0x7fffa4ded000
As expected, your poor man's profiler output shows that you are using the
KEYS command, which should never be used in production code by clients, but
only for debugging.
As you can see in the documentation, KEYS is O(N) in both space and time,
and will trash Redis performance.
You can also check our documentation here:
You can read there: "Warning: consider KEYS as a command that should only
be used in production environments with extreme care. It may ruin
performance when it is executed against large databases. This command is
intended for debugging and special operations, such as changing your
keyspace layout. Don't use KEYS in your regular application code. If you're
looking for a way to find keys in a subset of your keyspace, consider using
sets.".
Therefore the issue is not valid.
P.S. Thank you for reporting anyway. While in this case the problem is
clear, in other cases the same macroscopic behavior may be due to something
actually wrong in the server.
Cheers,
Salvatore
Hi Salvatore,
Thanks for the prompt reply. We'll get rid of the KEYS command and report
back here with the results.
Cheers
Eugene
I was the original reporter. We had not used KEYS or commands like that,
but to be sure, I checked for it when I was trying to debug the problem.
Anyway, we had only about 10 keys, so I guess KEYS would be quite fast in
this situation. :)
Unfortunately we have replaced the code that was based on Redis, so I can't
profile it. I'll try to find some time to reproduce it, but I can't say
that it will be soon.
I feel bad speaking contrary to the OP's statements, but it really smells
like a BGSAVE operation is overloading the (otherwise already fairly
heavily loaded) box and causing other commands to run slowly.
How many processors in the machine? How much memory does it have? What
operations are being performed? Anything else running on that box?
Barthazi, @Josiah: BGSAVE can't overload the server, as it runs in a
different process. The only concern is that you run out of memory because of
BGSAVE and the instance starts swapping; this of course will lead to a much
slower server, but it's a different problem.
Anyway, the only reliable way to debug these problems is to use a profiler to
check where the time is spent, together with the other system tools to
check whether the system is out of memory, whether there is OS buffer/page
I/O, and so forth. So far, *every single time* it was possible to understand
what was really going on, it was either an out-of-memory condition or the
user doing something strange like KEYS, big set operations like
intersections, huge MULTI/EXEC calls with hundreds of commands inside, and
so forth.
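For the memory side of that check, a couple of standard Linux commands (not
Redis-specific, and not quoted from the thread) are usually enough:

$ free -m       # is the box out of RAM or dipping into swap?
$ vmstat 1 5    # the si/so columns show swap-in/swap-out activity during the stall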
I added the poor man's profiler script to the utils directory in the unstable
branch, file: utils/whatisdoing.sh
This is so far the best we can do to debug this kind of issue.
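For reference, a minimal sketch of that kind of gdb-based sampler (not
necessarily the exact contents of utils/whatisdoing.sh) looks like this:

# Grab a few stack traces from the running redis-server while it is stuck at 100% CPU.
pid=$(pidof redis-server)
for i in 1 2 3 4 5; do
    echo "=== sample $i ==="
    gdb -batch -n -ex 'thread apply all bt' -p "$pid" 2>/dev/null
    sleep 1
done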
Cheers,
Salvatore
I didn't see that it was an 8 core server, or that Redis was using 10% of
the system memory.
If it's using EBS at Amazon (or some other network storage wherever it is),
operations may be slow during a BGSAVE due to network contention...
Works fine without the KEYS command.
I would also like to ask for help in finding out why our Redis is using
100% CPU. Apologies if I'm resurrecting an old thread.
The output for the poor man's profiler with 100 samples is
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
(the two warnings above were repeated six times, once per sample)
101
 22 __accept_nocancel,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
 20 open64,_IO_file_fopen,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
 16 __write_nocancel,sendReplyToClient,aeProcessEvents,aeMain,main
 12 epoll_wait,aeApiPoll,flags=<value,aeMain,main
  8 ??,??
  7 gettimeofday,gettimeofday,aeGetTime,aeProcessEvents,aeMain,main
  3 __read_nocancel,read,optimized,aeProcessEvents,aeMain,main
  3 gettimeofday,gettimeofday,aeGetTime,processTimeEvents,flags=<value,aeMain,main
  2 vfprintf,__vsnprintf_chk,vsnprintf,"",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
  1 __vsnprintf_chk,vsnprintf,"",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
  1 __vsnprintf_chk,vsnprintf,"\001\200\255\373\nsg20\ae",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
  1 time,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
  1 ??,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
  1 epoll_ctl,aeApiAddEvent,fd=750,,_installWriteEvent,addReply,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
  1 epoll_ctl,aeApiAddEvent,fd=117,,_installWriteEvent,addReply,lookupKeyReadOrReply,getGenericCommand,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
  1 ??,??,??
We are using redis 2.2.11.
I'm seeing similar behavior on redis 2.4.7. Everything's going fine, and
then suddenly redis is using 95% (jumps up from 10%) of the CPU and
performance slows to a crawl. The only commands running against the DB are
GET, SET, EXPIRE, and DELETE:
https://github.com/martinrusev/django-redis-sessions/blob/master/redis_sessions/session.py
At the point of failure there are lots of open connections:
$ netstat -tn|grep 6379|wc -l
1184
even though I stopped sending traffic to the redis server. In addition,
the following command just hangs and doesn't return:
$ redis-cli info
Is there a way to log all commands to see when it dies?
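For what it's worth, two built-in options (sketched here, not quoted from the
thread): MONITOR streams every command the server executes, and SLOWLOG,
available since 2.2.12 and therefore present on 2.4.7, records the slowest
recent commands with much less overhead:

$ redis-cli monitor > /tmp/redis-commands.log           # Ctrl-C once the stall has been captured
$ redis-cli config set slowlog-log-slower-than 10000    # log commands taking longer than 10 ms (value is in microseconds)
$ redis-cli slowlog get 10                              # show the ten slowest recorded commands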
This appears to be a duplicate of #648:
http://code.google.com/p/redis/issues/detail?id=648