New issue 525 by jwillp: Slow clients using pubsub cause memory to grow
unbounded
http://code.google.com/p/redis/issues/detail?id=525
What version of Redis are you using, and on what kind of operating system?
2.0.4 on Linux (Fedora 14)
What is the problem you are experiencing?
The server process memory usage grows without bounds. (May cause
unintentional denial of service.)
What steps will reproduce the problem?
A client that is SUBSCRIBEd to a high-volume channel will cause the server to
buffer all pending PUBLISHed data without limit when the client is too slow
to keep up, or when the client has hung (blocked). This can let a single
faulty client kill a Redis server. (It is also possible when a blackholing
iptables-type filter is raised between server and client, or when an
intermediate stateful firewall decides to be buggy.)
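For example, a minimal "stuck subscriber" along these lines reproduces it.
This is just a sketch: it assumes a local Redis on the default port and an
arbitrary channel name, and relies on a separate client PUBLISHing to that
channel at a high rate.

    /* slow_subscriber.c -- illustrative reproduction helper (not part of
     * Redis).  It sends SUBSCRIBE over a raw socket and then never reads,
     * so the server must buffer every message PUBLISHed to the channel. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        struct sockaddr_in sa;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(6379);                  /* default Redis port */
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
            perror("connect");
            return 1;
        }
        /* Inline command protocol: subscribe to an arbitrary channel. */
        write(fd, "SUBSCRIBE firehose\r\n", 20);
        /* Never read the socket; watch used_memory climb in INFO while
         * another client runs PUBLISH firehose <payload> in a loop. */
        for (;;) pause();
    }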
Do you have an INFO output? Please paste it here.
redis> info
redis_version:2.0.4
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:31762
uptime_in_seconds:24418
uptime_in_days:0
connected_clients:4
connected_slaves:0
blocked_clients:0
used_memory:162028112
used_memory_human:154.52M
changes_since_last_save:74156
bgsave_in_progress:0
last_save_time:1303061753
bgrewriteaof_in_progress:0
total_connections_received:57
total_commands_processed:132388962
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:1
vm_enabled:0
role:master
db0:keys=292,expires=0
If it is a crash, can you please paste the stack trace that you can find in
the log file or on standard output? This is really useful for us!
Please provide any additional information below.
Could you add a configuration feature to redis.conf for setting the maximum
buffer space to use for client subscriptions? This would be analogous to
the Spread toolkit MaxSessionMessages (defaults to 1000, but I generally
raise it to 5,000 or 10,000), or the activemq.maximumPendingMessageLimit
parameter of ActiveMQ for dealing with too-slow consumers. The best
strategy for the server is to drop the connection to a slow client after it
has backlogged this configured maximum buffer size. Then the memory impact
on the server is limited to a fixed size per client connection.
This should be a relatively small change to the pubsubPublishMessage
function in src/pubsub.c. Before calling addReply() and addReplyBulk(), the
client connection should be checked to see whether too much data is already
buffered up; if so, drop the connection.
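As a rough illustration of that idea (not the actual patch:
"server.maxclientqueue" and the "close later" flag are assumed names), the
per-subscriber check inside pubsubPublishMessage() could look something like
this:

    /* Hypothetical guard run for each receiving client, just before the
     * existing addReply()/addReplyBulk() calls that enqueue the message. */
    if (server.maxclientqueue &&
        listLength(c->reply) > server.maxclientqueue)
    {
        redisLog(REDIS_WARNING,
            "Subscriber reply queue over %d objects, closing connection",
            server.maxclientqueue);
        /* Don't enqueue anything else for this client.  Freeing it right
         * here would remove it from the subscriber list being iterated,
         * so a real implementation should defer the close (e.g. flag the
         * client and drop it from a periodic task). */
        c->flags |= REDIS_CLOSE_ASAP;   /* assumed "close later" flag */
        continue;
    }
    /* ...existing addReply()/addReplyBulk() calls follow unchanged... */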
More generically, the fix could even go into the _addReplyObjectToList()
function in src/networking.c, applied only in the least-likely branch where
there is already a list of objects and we can't append to the last item in
the list. So perhaps in the unstable branch, line 127 would be a good spot
to check the list size (in terms of bytes) before adding another object to
the tail of the list. That would have zero CPU cost for clients that read
fast enough that the static buffer never spills over into the linked list,
and it would be a generic place to implement the bounds checking.
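A byte-oriented version of that check might look roughly like the following,
placed in the branch where replies are already spilling into the linked list.
Again a sketch: "c->reply_bytes" (a running byte total kept alongside the
list) and "server.maxclientqueuebytes" are assumed, not existing fields.

    /* Hypothetical bound inside _addReplyObjectToList(), in the branch
     * where the static output buffer has already spilled into c->reply. */
    c->reply_bytes += sdslen(o->ptr);          /* maintain a byte total */
    if (server.maxclientqueuebytes &&
        c->reply_bytes > server.maxclientqueuebytes)
    {
        redisLog(REDIS_WARNING,
            "Client output list over %lld bytes, scheduling close",
            (long long) server.maxclientqueuebytes);
        c->flags |= REDIS_CLOSE_ASAP;          /* assumed deferred close */
        return;
    }
    incrRefCount(o);
    listAddNodeTail(c->reply, o);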
I'm not very familiar with the code for redis, though. So, you might have
a better way to solve this!
Thanks!
Limiting the size of the outgoing queue for these kinds of things
isn't a bad thing, but I wouldn't be surprised if this doesn't get
picked up very soon.
- Josiah
I think this would be a very useful feature. I'd be willing to work on it,
if it's considered an acceptable addition.
I'd definitely appreciate it!
I've added a new feature that lets you specify a limit on the number of
pending objects in a client's "reply" linked list (pending outbound
messages). It adds a new server-level config parameter, "maxclientqueue",
with a default of 0 (disabled).

When this setting is enabled and a client is too slow to read replies fast
enough to keep up with the server, the server drops the client's connection
once that many messages have built up on the server side. It's not perfect,
because ideally you would limit the client by the number of bytes queued up,
but the setting is intended as a safety measure (like a shear pin): actually
counting the enqueued bytes and maintaining that value seems like overkill
when the linked list structure already gives us an object count.

When a client exceeds this limit, the server also logs a warning message
indicating that it has dropped a client due to this overflow. That provides
a positive signal if the setting is in use but set too low, or if clients
are frequently too slow to keep up. (There is also a low-frequency cleanup
pass, every ~30 seconds or so, that drops any clients that have overflowed
but haven't yet been disconnected by another write.)
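For readers who don't want to open the patch, that cleanup pass is
conceptually something like the sketch below. It is illustrative only: the
function name is made up, while "maxclientqueue" is the setting described
above.

    /* Illustrative sketch of the periodic cleanup pass: called from a
     * serverCron-style task every ~30 seconds, it walks the connected
     * clients and drops any whose reply list exceeds the limit but which
     * were never written to again (so the per-write check never fired). */
    void closeOverflowedClients(void) {
        listIter li;
        listNode *ln;

        if (server.maxclientqueue == 0) return;     /* feature disabled */
        listRewind(server.clients, &li);
        while ((ln = listNext(&li)) != NULL) {
            redisClient *c = listNodeValue(ln);

            if (listLength(c->reply) > server.maxclientqueue) {
                redisLog(REDIS_WARNING,
                    "Closing client with %lu queued replies (limit %d)",
                    (unsigned long) listLength(c->reply),
                    server.maxclientqueue);
                freeClient(c);   /* safe: the iterator already holds the
                                    next node, as in closeTimedoutClients */
            }
        }
    }

With something like "maxclientqueue 10000" in redis.conf, a stuck subscriber
would then be dropped after roughly 10,000 queued reply objects instead of
growing the server's memory without bound.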
My GitHub branch for this patch is here:
https://github.com/willp/redis/tree/willp_issue525_memory
And the diff is attached as a patch as well.
Attachments:
issue525_memory_v1.patch 6.9 KB
Thanks for doing this!
I think your solution is very reasonable, and probably the simplest
out of the alternatives.