-bret
On Fri, Jun 10, 2011 at 06:55:10AM -0700, Zoltán Nagy wrote:
> Hello,
>
> sorry about the subject, but I really don't know anything more specific
> about the cause of the problem. I'd like to ask for your help on debugging a
> strange issue.
>
> We're using Predis as the client library, so of course the first step was to
> eliminate Predis as the culprit. Here is the issue in the Predis issue
> tracker: https://github.com/nrk/predis/issues/32
>
> There's a pretty detailed description of the problem and investigation
> there, I'll try to sum up the results:
>
> - At seemingly random intervals we get exceptions for all connections to
> a Redis instance for a second or so. Some debugging points to the connection
> timing out.
> - The client works with a socket timeout of 30, and a read-write timeout
> of 60 seconds.
> - There are very few connections to the instance at any one time (20-30
> at most)
> - The issue exists both with calls to a local or a remote instance
> - The errors don't seem to happen at the time of some specific resource
Hello, these are the only things that I can think of:
1) It is very very unlikely that this is a Redis problem. Nothing is
impossible but such a huge bug in a production release not noticed
elsewhere is strange.
2) Do you have some kind of middleware between Redis and your client?
Proxies, SSL, something like that? I guess not but just to check.
3) What happens in the Redis logs when the connection gets closed?
4) Is it possible to simulate that in the same environment with just a
single client doing operations in a loop? That would possibly make
debugging simpler.
5) Upgrading to latest 2.2 is a good idea, even if there are no
strictly related bugs, but there are bugs in your version, so who
knows?
6) Another interesting try could be to upgrade to latest 2.4 branch
commit. Since there you have a CLIENT command that can be used for
further debugging what happens.
7) You should try compiling 'redis-tools' (see
https://github.com/antirez/redis-tools) and use the following command
to check if the server somehow stops working when you see the issue:
./redis-stat latency delay 100
8) If you can reproduce that in the loop, try switching in turn:
version of PHP, version of Predis, try with an entirely different
client library, to see if this resolves the issue.
9) Should be completely unrelated but also try to use 'timeout 0' in
your redis.conf.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
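The "single client doing operations in a loop" test from point 4 could look roughly like the sketch below. It is not from the thread: it uses Python with a raw socket instead of Predis (which also covers the "entirely different client library" idea from point 8), and the names `encode_command`, `ping_once` and `probe`, the default host and port, and the 100 ms threshold are all illustrative assumptions.

```python
import socket
import time


def encode_command(*args):
    """Encode a command as a Redis multi-bulk request."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def ping_once(sock):
    """Send PING, wait for the status reply, return the round trip in ms."""
    start = time.monotonic()
    sock.sendall(encode_command("PING"))
    reply = b""
    while not reply.endswith(b"\r\n"):
        chunk = sock.recv(64)
        if not chunk:
            raise ConnectionError("server closed the connection")
        reply += chunk
    if reply != b"+PONG\r\n":
        raise RuntimeError("unexpected reply: %r" % reply)
    return (time.monotonic() - start) * 1000.0


def probe(host="127.0.0.1", port=6379, count=600, delay=0.1,
          slow_ms=100.0, timeout=5.0):
    """Ping in a loop; return (unix_time, detail) for slow or failed pings."""
    events = []
    sock = None
    for _ in range(count):
        try:
            if sock is None:
                sock = socket.create_connection((host, port), timeout=timeout)
            ms = ping_once(sock)
            if ms >= slow_ms:
                events.append((time.time(), "slow: %.1f ms" % ms))
        except OSError as exc:
            # Timeouts, refusals and resets land here; reconnect on the next pass.
            events.append((time.time(), "error: %r" % exc))
            if sock is not None:
                sock.close()
            sock = None
        time.sleep(delay)
    if sock is not None:
        sock.close()
    return events
```

Running `probe()` on the same host as the PHP application and comparing the recorded timestamps with the Predis exceptions would show whether a client outside PHP sees the same stalls.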
On Fri, Jun 10, 2011 at 5:25 PM, Bret A. Barker <br...@iwin.com> wrote:
> You aren't by chance doing any "keys" commands are you? We ran into a similar issue when we had some code inadvertently calling keys periodically and with millions of keys in our data set that was blocking long enough to cause timeouts for other clients.
>
--
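One way to check for stray KEYS calls is to capture the server's MONITOR output for a while and filter it. The sketch below is an illustration, not something posted in the thread: it assumes each MONITOR line carries the command name as its first double-quoted token, and the helper names and sample lines are made up.

```python
import re

# First double-quoted token on the line, assumed to be the command name.
_FIRST_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def command_name(line):
    """Return the upper-cased command name from a MONITOR line, or None."""
    match = _FIRST_QUOTED.search(line)
    return match.group(1).upper() if match else None


def find_keys_calls(lines):
    """Return the MONITOR lines whose command is KEYS."""
    return [line.rstrip("\n") for line in lines if command_name(line) == "KEYS"]


# Hypothetical captured lines, for illustration only.
sample = [
    '1307720710.123456 "get" "user:1"\n',
    '1307720711.000001 "KEYS" "session:*"\n',
    '1307720712.500000 "set" "keys" "1"\n',
]
for hit in find_keys_calls(sample):
    print(hit)
```

Only the second sample line is reported; the third uses "keys" as a key name, not as the command. MONITOR adds load of its own, so a short capture window is probably wiser than leaving it running.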
On Friday, June 10, 2011 5:45:48 PM UTC+2, Salvatore Sanfilippo wrote:
> On Fri, Jun 10, 2011 at 3:55 PM, Zoltán Nagy <abe...@gmail.com> wrote:
> > I don't even know where to start looking. I'd really appreciate any ideas
> > and requests for more (specific) information.
>
> Hello, these are the only things that I can think of:
> [...]
> 6) Another interesting try could be to upgrade to latest 2.4 branch
> commit. Since there you have a CLIENT command that can be used for
> further debugging what happens.
-bret
No sorry... that's into unstable land, not easy to backport without
moving too much code around.
Cheers,
Salvatore