Queries executing loner than one second

93 views
Skip to first unread message

chilumb...@gmail.com

unread,
Nov 12, 2015, 10:15:13 PM11/12/15
to Redis DB
Redis is understandably super fast with reads that an RDMB. If a Redis cluster is processing queries taking longer than 1 second, what impact can that have on the redis cluster in regards to new requests sent to the redis cluster by clients? Since Redis is single threaded, would that mean that other queries would remain in the command queue while the slow query executes? And if more writes from other clients keep coming through and piling inside the command queue, would the redis cluster eventually start blocking incoming requests?

Last question, is it ok to change the redis daemon on each node in the cluster to "No", and allow redis to process multiple connections at the same time not sequentially? How would doing so impact the performance clients connections to the redis cluster?

Josiah Carlson

unread,
Nov 17, 2015, 12:59:09 AM11/17/15
to redi...@googlegroups.com
Replies inline.

On Thu, Nov 12, 2015 at 7:15 PM, <chilumb...@gmail.com> wrote:
Redis is understandably super fast with reads that an RDMB. If a Redis cluster is processing queries taking longer than 1 second, what impact can that have on the redis cluster in regards to new requests sent to the redis cluster by clients?

If that one command takes 1 second, it will block 1 master or 1 slave from responding to requests for 1 second.
 
Since Redis is single threaded, would that mean that other queries would remain in the command queue while the slow query executes?

Sort of, but only for that 1 master or 1 slave.
 
And if more writes from other clients keep coming through and piling inside the command queue, would the redis cluster eventually start blocking incoming requests?

While Redis is working on executing an actual command, it doesn't read or process data from other connections generally, except in the case of Lua scripts. In the case of long-running Lua scripts, Redis may periodically read client requests to respond to them (typically rejecting commands for them not being executable because the Lua script is executing).

What generally happens is that Redis just stops responding to clients (accepting connections, responding to clients, etc.) until the command completes executing. That said, unless you are performing operations on large numbers of structures or large structures, this shouldn't be an issue. Do you know what commands you are going to run, and/or how long you should expect them to last?

Last question, is it ok to change the redis daemon on each node in the cluster to "No", and allow redis to process multiple connections at the same time not sequentially?

Redis cluster can (and will) execute separate commands on each master and slave independently. Which means you can read from all of your slaves and read/write to all of your masters simultaneously without blocking. There is no setting for "yes" or "no" that determines whether this is the case - this is *always* the case.
 
How would doing so impact the performance clients connections to the redis cluster?

See my answers to your other questions.

 - Josiah 

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

chilumb...@gmail.com

unread,
Nov 17, 2015, 10:22:59 AM11/17/15
to Redis DB
Thanks for explaining things Josiah. It really helps.

On your question: 
Do you know what commands you are going to run, and/or how long you should expect them to last?

I set slowlog to 1000000 (i second), and so far i have several commands executing more than 1 second, several more over 2 seconds, and a couple over 1 minute. Below is a some of the slow commands logged:

1) 1) (integer) 75
   
2) (integer) 1447057594
   
3) (integer) 2020760
   
4) 1) "SET"
       
2) "noti:in:ur:67445315:follow"
       
3) "1"
 
2) 1) (integer) 74
   
2) (integer) 1446771851
   
3) (integer) 2400234
   
4) 1) "SET"
       
2) "noti:msg:unread:130507369:854169"
       
3) "1"
 
3) 1) (integer) 73
   
2) (integer) 1446768149
   
3) (integer) 1228490
   
4) 1) "SET"
       
2) "noti:msg:unread:114024283:849089"
       
3) "1"

These SET commands do not look too bad, so i am surprised why they are logged, and even more so that a couple of them took over 2 seconds.

Josiah Carlson

unread,
Nov 17, 2015, 2:36:21 PM11/17/15
to redi...@googlegroups.com
Those SET commands have no reason to take longer than a handful of microseconds unless you've got something else going on; snapshot, bgrewriteaof, a new slave connecting, or some other large amount of disk/cpu being taken by some other process.

Reset your slowlog, and check your slowlog after a number of hours of running without rewriting the aof, creating a snapshot, or having a slave connect/reconnect.

 - Josiah

George Chilumbu

unread,
Nov 19, 2015, 10:20:04 PM11/19/15
to Redis DB
Hi Josiah,

I set up a test where i disabled Redis rdb and aof persistence. Then i enabled slowlog, setting the time to one second. And indeed, no queries were logged this time. I let the test run for two days.

What would be the explanation to rdb and aof persistence causing slow queries? Would  it be because during persistence, while a folked() child process is running, Redis blocks all writes (and maybe reads) on that particular redis instance?

Thank you,
George 

Greg Andrews

unread,
Nov 19, 2015, 10:50:55 PM11/19/15
to redi...@googlegroups.com
No, Redis does not block writes during the snapshot.  When this kind of thing occurs (a lack of responsiveness while Redis is writing to disk), it's usually caused by the server not having sufficient free memory to handle the write activity while the snapshot is being performed, so the server starts swapping to disk and everything slows down; or the disk i/o is too slow to keep up with Redis's write activity and the other disk i/o activity on the server, so everything slows down.  Sometimes it's both - the extra disk writes from Redis make everything else on the server backlog on writes to logfiles or reads from data files, so everything else's memory usage grows, and this pushes the server into swapping, which only makes it worse.

This is where you get great value in keeping performace graphs for your servers.  You can see what's happening on the server when Redis starts writing to disk.  Do the graphs show the disk becomes 100% busy, or only 30-40% busy?  Does the total memory consumed on the server grow beyond the amount of physical ram?  Does the growth of the Redis parent and child processes grow, or is it other processes?

  -Greg

George Chilumbu

unread,
Nov 20, 2015, 1:20:05 AM11/20/15
to redi...@googlegroups.com
Thank you for your quick response Andrew.

I would like to share what the Redis documentation says though, which led me to even ask this question:


"RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great."

This statement to me sounds like they are saying that during persistence to disk, redis may stop serving clients if the data is too large. Or am i miss-understanding whats being said here?

Reference:

--
You received this message because you are subscribed to a topic in the Google Groups "Redis DB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/redis-db/U6szotLMF9A/unsubscribe.
To unsubscribe from this group and all its topics, send an email to redis-db+u...@googlegroups.com.

Josiah Carlson

unread,
Nov 20, 2015, 2:03:08 AM11/20/15
to redi...@googlegroups.com
Delays during fork() are now substantially over-estimated, and are on the order of 1-5ms/gig (compared to 10-100ms/gig previously). Historically the parent process (the one serving responses) would have substantial delays after the fork, which were/are removed by disabling transparent huge pages. The author of Redid debugged and blogged about it: http://antirez.com/news/84 and documented disabling transparent huge pages as part of: http://redis.io/topics/admin .

 - Josiah

George Chilumbu

unread,
Nov 20, 2015, 4:29:33 AM11/20/15
to redi...@googlegroups.com
It seems that we initially had disk i/o issues due to snapshots and bdrewriteoaf which have since been disabled. And we also found out that some of the Redis master instances having been swapping, slowing down things badly. It can be due to lack of free sufficient memory to handle extra write activities of large data bursts. I then released swap the previous day. 

All these things done have helped us get rid of a notorious error that has been annoying to us:

   Error while reading line from the server. [tcp://10.1.1.200:6379]
  Error while reading line from the server. [tcp://10.1.1.201:6379]

But we still have several of the errors below and we cannot figure our what is causing this error. Our API error log (that uses some script to log) keep showing this error related to Redis:

     AbstractConnection.php: Operation timed out [tcp://10.1.1.200:6379]
     AbstractConnection.php: Operation timed out [tcp://10.1.1.201:6379]


We are using predis for our Applications.
   
 




Greg Andrews

unread,
Nov 20, 2015, 12:58:21 PM11/20/15
to redi...@googlegroups.com
George, you've solved one cause of slowness, but now you find that there were two causes, maybe more.  Now you're left with the remaining cause(s).  So you're back to asking the question you asked on the 10th of November.  My own answer is going to be the essentially the same as it was then:

You need to start with the error report that you have, develop it, and follow where it points.

The message you have is not sufficient.  It says "Operation timed out" and identifies the server, but it does not give the nature of the error.  What was the operation?  Was predis waiting for a reply to a command?  Was it trying to connect?  Was it waiting for a reply but got an exception indicating the connection was broken?  What is the timeout period?  1 second?  10?  60? 

You need a sufficient message.  That may mean increasing the verbosity of the predis logfile or your client's logfile.  Find out exactly what error it's getting and exactly what it was doing when it got the error.

Checking for a broken connection is a reasonable thing to do.  You can increase the verbosity of the Redis server log and see if it complains about unexpected disconnections.  This can give you evidence agreeing with the broken connection theory or disagreeing with it.  More evidence is better because you want to find the cause.

Get the complete report of the error from predis and see what errors the Redis server is seeing.


George Chilumbu

unread,
Nov 22, 2015, 11:04:29 PM11/22/15
to Redis DB
Hi Greg,

Thank you for your feedback. I was under the weather this weekend and hence could not provide a response.

Though we are using Redis in production, i will try to change the verbose server level from notice to debug to get more info on errors.

I  will notify you on my finings. 

George Chilumbu

unread,
Nov 23, 2015, 1:34:18 AM11/23/15
to Redis DB
After changing the server verbose level to debug, it logged several of these errors:

1274:M 23 Nov 14:00:03.576 - Client closed connection

1274:M 23 Nov 14:00:03.645 - Client closed connection

1274:M 23 Nov 14:00:03.673 - Client closed connection

1274:M 23 Nov 14:00:03.684 - Client closed connection

1274:M 23 Nov 14:00:03.695 - Client closed connection

1274:M 23 Nov 14:00:03.722 - Client closed connection

1274:M 23 Nov 14:00:03.755 - Client closed connection

1274:M 23 Nov 14:00:03.760 - Client closed connection

1274:M 23 Nov 14:00:03.789 - Client closed connection

1274:M 23 Nov 14:00:03.798 - Client closed connection

1274:M 23 Nov 14:00:03.810 - Client closed connection

1274:M 23 Nov 14:00:03.817 - Client closed connection

1274:M 23 Nov 14:00:03.857 - Client closed connection

1274:M 23 Nov 14:00:03.903 - Client closed connection

1274:M 23 Nov 14:00:03.911 - Client closed connection

1274:M 23 Nov 14:00:03.923 - Client closed connection

1274:M 23 Nov 14:00:03.960 - Client closed connection

1274:M 23 Nov 14:00:04.122 - Client closed connection

1274:M 23 Nov 14:00:04.153 - Client closed connection

1274:M 23 Nov 14:00:04.173 - Client closed connection

1274:M 23 Nov 14:00:04.211 - Client closed connection

1274:M 23 Nov 14:00:04.236 - Client closed connection

1274:M 23 Nov 14:00:04.276 - Client closed connection

1274:M 23 Nov 14:00:04.414 - Client closed connection

1274:M 23 Nov 14:00:04.503 - Client closed connection


Does this mean that the client, predis or our api, is closing tens of connections per second?

George Chilumbu

unread,
Nov 23, 2015, 1:37:11 AM11/23/15
to Redis DB
I meant to say "Redis server verbosity level"
<span style=

George Chilumbu

unread,
Nov 23, 2015, 3:33:26 AM11/23/15
to Redis DB
A more real portray of the log is:

# here too, two connections are opened, followed by two closed
1274:M 23 Nov 14:59:45.315 - Accepted 192.168.1.174:60996

1274:M 23 Nov 14:59:45.317 - Accepted 192.168.1.174:60999

1274:M 23 Nov 14:59:45.324 - Client closed connection

1274:M 23 Nov 14:59:45.324 - Client closed connection


# here too, two connections are opened, followed by two closed
1274:M 23 Nov 14:59:45.400 - Accepted 192.168.1.76:28154

1274:M 23 Nov 14:59:45.402 - Accepted 192.168.1.78:23352

1274:M 23 Nov 14:59:45.409 - Client closed connection

1274:M 23 Nov 14:59:45.412 - Client closed connection


# here, one connection is opened, followed by one closed connection
1274:M 23 Nov 14:59:45.442 - Accepted  192.168.1.78:23386

1274:M 23 Nov 14:59:45.451 - Client closed connection


# So here, three connections are opened, followed by three closed connections almost instantly
1274:M 23 Nov 14:59:45.453 - Accepted  192.168.1.151:19482

1274:M 23 Nov 14:59:45.455 - Accepted  192.168.1.165:58343

1274:M 23 Nov 14:59:45.459 - Accepted  192.168.1.172:11545

1274:M 23 Nov 14:59:45.463 - Client closed connection

1274:M 23 Nov 14:59:45.464 - Client closed connection

1274:M 23 Nov 14:59:45.469 - Client closed connection


# here, two connections are opened, followed by two closed connections
1274:M 23 Nov 14:59:45.596 - Accepted  192.168.1.151:19631

1274:M 23 Nov 14:59:45.606 - Accepted  192.168.1.165:58487

1274:M 23 Nov 14:59:45.607 - Client closed connection

1274:M 23 Nov 14:59:45.614 - Client closed connection


# here, one connection is opened, followed by one closed connection
1274:M 23 Nov 14:59:45.621 - Accepted  192.168.1.164:21931

1274:M 23 Nov 14:59:45.624 - Client closed connection


# here, one connection is opened, followed by one closed connection
1274:M 23 Nov 14:59:45.661 - Accepted  192.168.1.122:47107

1274:M 23 Nov 14:59:45.663 - Client closed connection


It seems like the client (Predis) is closing connections almost as soon as they are opened. 

Our PHP developers suspect these client connections registered here have to do with the way Predis operates. He is how they are describing what is happening when Predis and Redis are talking:

Predis client will try to connect to all redis masters, but not all master are needed, depends on what keys are used.

It works like
- Connect to one of the server (say M1)
- Give me this key -> get -MOVED
- Tell me where all other Masters are -> Then connect to  M2 and M3
- Give me this key and that key, only M1 and M2 are queried
- Close connection to M1, M2, M3

The confusion for me is from step 3 - why would Predis ask for all the other masters when it receives a "key MOVED" messages? Why not just connect directly to the appropriate Redis instance based on the IP:port provided with the "MOVED" message by Redis? Is this really how Predis works?

They are also using a lot of cron jobs. 

Greg Andrews

unread,
Nov 23, 2015, 4:29:12 AM11/23/15
to redi...@googlegroups.com
Yes, Redis is logging events where the client is closing connections, and it's at a faster rate than you expected.  When clients try to open and close TCP connections rapidly, shifts in traffic can produce large bursts of connections to the server.  Not just in the total number of connections, but also in the rate of new connections being made.  This is similar to traffic patterns that exist for web servers, and there are some classic methods to tune the TCP stack in the Linux/Unix kernel to handle these bursts, both on the server side (in this case Redis) and the client side (in this case your client software).  Have you done any tuning for the bursts of connections that are being made to Redis?

Also, is your client software using connection pooling, or does the software connect and disconnect on every Redis command?

Another possibility is that your main software is fine and these log entries are being caused by a misbehaving program that's just spinning in a busy loop, making and breaking connections to Redis because it receives something it doesn't like or understand.

  -Greg

Itamar Haber

unread,
Nov 23, 2015, 4:33:49 AM11/23/15
to redi...@googlegroups.com

IIRC predis (PHP) can't do persistent connections or pooling of any sort.

Greg Andrews

unread,
Nov 23, 2015, 5:08:37 AM11/23/15
to redi...@googlegroups.com
Wow, Itamar, you're right.  I just found this issue and answer in the predis github repo:  https://github.com/nrk/predis/issues/276

So predis makes a new connection for each command.  This traffic pattern creates an upper limit on the rate at which Redis commands can be executed (the TCP connect/disconnect overhead becomes a bottleneck).  When traffic hits that limit the symptoms can include connection timeouts.  (though other things can cause timeouts too)

George, as for the question about why predis is requesting the list of masters on receipt of a MOVED response, your developers may have to open an issue with predis to get the answer.  Someone here might know, but the author will certainly know.

  -Greg

George Chilumbu

unread,
Nov 23, 2015, 5:10:06 AM11/23/15
to redi...@googlegroups.com
Yes, i have performed some performance tuning on Redis to deal with bursts. This includes setting:

timeout = 0
tcp-keepalive = 60
tcp-backlog = 65535 # which is equal the value of somaxonn on my linux system

Basically, we have these API servers handling client API calls from web, mobile and desktop applications. PHP scripts running on these API servers will read data from Redis cluster using Predis. 


Greg Andrews

unread,
Nov 23, 2015, 5:29:07 AM11/23/15
to redi...@googlegroups.com
Settings in Redis won't tune the TCP stack that I described in my message.  (besides the timeout and tcp-keepalive parameters do nothing whatsoever toward performance tuning)

  -Greg

chilumb...@gmail.com

unread,
Nov 25, 2015, 12:25:02 PM11/25/15
to redi...@googlegroups.com
I have one more question - since you explained that predis creates a connection for each command used, does that mean that in redis-cli info clients the "connected_clients" number should represent all these connections by each command, or just a single connection which is the predis client connection?

Sent from my iPhone

Greg Andrews

unread,
Nov 25, 2015, 3:30:17 PM11/25/15
to redi...@googlegroups.com
I seem to recall that your php api code runs multiple threads.  If your php client programs run multiple threads, and each thread decides to send a command to redis, then each thread will make its own call to predis, and each call will create its own connection to the Redis server.  I.e., multiple connections from the client program on one server.

As an example, if you have 10 api servers, and on each server you have php code that runs 5 threads, and all those threads decide to communicate with Redis at the same time, you'll see 50 connections to Redis.  Do they all go to the same Redis server in your Redis cluster?  I don't know.  I believe that would depend on how predis is written, and any configuration about the cluster that is passed to predis at initialization time.

Remember that what you see in the output of the Redis INFO command is a snapshot in time.  The 'connected_clients' value is for that moment.  It doesn't remember how many connections there were a second ago, or a minute ago, etc.  Since the connections from predis last only as long as it takes to issue the command and receive the reply, the number of connections a few seconds ago could have been very different.  And the number of connections a few seconds from now could be very different.

The INFO command also a snapshot on that particular Redis server.  There could be connections to other Redis servers (in a cluster perhaps), but the Redis server you asked for info doesn't know about those.  It only knows about the connections to itself.

  -Greg

chilumb...@gmail.com

unread,
Nov 25, 2015, 9:32:36 PM11/25/15
to redi...@googlegroups.com
Our predis runs on a single thread. 

And I know that connected_clients from INFO command only displays current clients connected for a specific redid instance. 

I guess I should make my questions more direct. Since predis reportedly creates a new connection for each command e.g 10000 commands will make predis create 10000 connections, will connected_clients INFO display 10000 or just one connection for the predis connection? This is assuming that 10000 commands are executing at the time the INFO command is issued.

Sent from my iPhone

George Chilumbu

unread,
Nov 25, 2015, 10:06:11 PM11/25/15
to redi...@googlegroups.com
Also, is there a max value to set maxclients in redis? Or as it is just an estimate of the highest number of client connections e.g, 600,000 if that is the peak of client connections?

Greg Andrews

unread,
Nov 26, 2015, 6:12:13 PM11/26/15
to redi...@googlegroups.com
Hi George,

You're asking how predis behaves.  Do 1000 simultaneous commands produce 1000 connections to Redis server(s) or just 1 connection?  Speaking for myself, I don't know predis.  But this is something you and your programmers can test, isn't it?  Throw together a php test app and see what happens if 1000 calls to predis are triggered all at once.

  -Greg

Greg Andrews

unread,
Nov 26, 2015, 6:28:35 PM11/26/15
to redi...@googlegroups.com
There has been a 'maxclients' parameter in the config file since v2.8 in July 2013.  According to the comments in the config file that's included with the source code, the default value is 10000 (ten thousand), though the operating system's limit on the number of open file descriptors can cause similar behavior (client can't connect to the Redis server) at a lower number of connections.

In my experience, setting limits on client connections is something that must be done with careful thought about the side effects.  When the maxclient limit is reached, Redis will reject new connections of any kind.  It will reject a slave that is trying to re-establish a valid connection.  It will reject valid connections from your monitoring system that want to check Redis's health (and count the current number of connections).  It will reject your valid attempts to connect with 'redis-cli' to investigate. I'm not saying "do not use maxclients".  I'm saying the side effects on valid connections can be severe, so be careful.  I'm convinced that it's better to have very high limits on the server end, and do your normal connection limiting on the client end.

  -Greg

Wayne Gemmell

unread,
Nov 27, 2015, 1:16:01 AM11/27/15
to redi...@googlegroups.com
Hi George,

Have you considered implementing php redis, I think it's better maintained etc.

George Chilumbu

unread,
Nov 27, 2015, 2:00:18 AM11/27/15
to redi...@googlegroups.com
@Greg, our maxclients setting is pretty high, same as the number of open files (about a million).

@Wayne, i did suggest to the PHP developers to move to phpredis but complained about having to rewrite the API Libraries.

At the moment, i have done a lot of performance tuning both on redis instances and the machines. Regarding Redis, i have disabled snapshots and AOF, set maxclients = 1024000, set tcp-backlog = 65535, set timeout = 0, set tcp-keepalive = 60, added more RAM (from 16G to 64G), set maxmemory to 80% of total RAM (64GB) etc

Regarding networking and TCP tuning, i have done the following?

echo 5 > /proc/sys/net/ipv4/tcp_fin_timeout
 echo "32768 65535" > /proc/sys/net/ipv4/ip_local_port_range
 echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
 echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

 sysctl -w net.core.rmem_max=8388608  #default   212992
 sysctl -w net.core.wmem_max=8388608 #default 212992
 sysctl -w net.core.rmem_default=65536 #default 212992
 sysctl -w net.core.wmem_default=65536 #default 212992
 sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608’  #4096    87380   6291456
 sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608’ #4096    16384   4194304
 sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608’  #1542459 2056612 3084918
 sysctl -w net.ipv4.route.flush=1

Also added vm.overcommit_memory=1 in the sysctl.conf file. number of open files is set to 1024000.

But we still keep getting the "Operation timed out [tcp://redis_ip:port]" error from our api error log.

I have really exhausted my options for a solution from the Redis and System part. 

Reply all
Reply to author
Forward
0 new messages