@Bill, I will try your suggestions about tracking the client connections.
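As a rough sketch of one way to track the connections: dump CLIENT LIST from each node (for example redis-cli -h <node> -p <port> CLIENT LIST) and parse it. Each line describes one connection as space-separated key=value fields. The helper names, the sample addresses and the 300-second idle threshold below are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample of CLIENT LIST output (addresses and values are made up).
SAMPLE = """\
id=3 addr=10.0.0.5:51234 fd=8 name= age=120 idle=0 flags=N db=0 cmd=get
id=4 addr=10.0.0.5:51236 fd=9 name= age=900 idle=850 flags=N db=0 cmd=set
id=5 addr=10.0.0.7:40100 fd=10 name= age=30 idle=2 flags=N db=0 cmd=client
"""


def parse_client_list(raw: str) -> list:
    """Turn CLIENT LIST output (one client per line, space-separated
    key=value fields) into a list of dicts."""
    clients = []
    for line in raw.strip().splitlines():
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")  # "name=" gives an empty value
            fields[key] = value
        clients.append(fields)
    return clients


def summarize(clients: list, idle_threshold_s: int = 300):
    """Count connections per client host and list the connections that have
    been idle for at least idle_threshold_s seconds."""
    per_host = Counter(c["addr"].rsplit(":", 1)[0] for c in clients)
    long_idle = [c["addr"] for c in clients
                 if int(c.get("idle", "0")) >= idle_threshold_s]
    return per_host, long_idle


if __name__ == "__main__":
    per_host, long_idle = summarize(parse_client_list(SAMPLE))
    print(per_host)   # connection count per client IP
    print(long_idle)  # connections idle for 5+ minutes
```

Running this periodically against every master and slave shows whether the connection count keeps growing, which hosts hold the most connections, and whether idle connections pile up.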
By the way, I am using a Redis Cluster with three master nodes, each with a single slave.
The reason I am posting this question is that we are seeing many occurrences of the following error in our Predis logs:
Since Redis is single-threaded, I suspected that as the number of clients grows, the share of server time each client gets shrinks, so each client spends more and more time waiting for its turn. If a client waits in the command queue for too long, its connection may somehow time out. That is why I am interested in understanding how Redis handles connections.
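A back-of-the-envelope way to put numbers on this theory, using a deliberately simplified model (an assumption, not how Redis schedules work in detail): every command arrives at the same instant, the server runs them strictly one after another, and network and parsing time are ignored. The service times are invented for illustration.

```python
from itertools import accumulate


def fifo_waits(service_times_us: list) -> list:
    """Microseconds each command waits before a single-threaded server starts it,
    assuming all commands arrive at the same instant and run strictly in order."""
    # Each command waits for the combined service time of everything queued ahead of it.
    return list(accumulate(service_times_us, initial=0))[:-1]


if __name__ == "__main__":
    # Hypothetical numbers: 1000 clients, each sending one 50 us command.
    print(max(fifo_waits([50] * 1000)))               # 49950 us, about 50 ms
    # Same queue, but the first command is a slow 2-second operation.
    print(max(fifo_waits([2_000_000] + [50] * 999)))  # 2049900 us, about 2.05 s
```

Under this model the wait is driven by the total work queued ahead of a command rather than by the number of connections alone, so a single slow command can delay every client behind it far more than many fast ones.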
It is also worth noting that timeout is set to 0 on all nodes in the cluster, tcp-keepalive is set to 60, and tcp-backlog is set to 65535, the same as the system's somaxconn value.
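Since some of these settings (timeout, for example) can also be changed at runtime with CONFIG SET, it may be worth confirming what each running node actually uses rather than relying on the config files. A minimal sketch, assuming you collect the CONFIG GET replies from every node yourself (for example redis-cli -h <node> -p <port> CONFIG GET tcp-keepalive); the node names and helper functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# The values stated in the question; adjust if your intended settings differ.
EXPECTED = {"timeout": "0", "tcp-keepalive": "60", "tcp-backlog": "65535"}


def reply_to_dict(reply: List[str]) -> Dict[str, str]:
    """CONFIG GET replies alternate name, value, name, value, ..."""
    return dict(zip(reply[::2], reply[1::2]))


def config_mismatches(
    node_configs: Dict[str, Dict[str, str]],
    expected: Dict[str, str] = EXPECTED,
) -> List[Tuple[str, str, Optional[str], str]]:
    """Return (node, setting, actual, expected) for every setting that differs."""
    problems = []
    for node, config in node_configs.items():
        for key, want in expected.items():
            actual = config.get(key)  # None if the setting was not collected
            if actual != want:
                problems.append((node, key, actual, want))
    return problems


if __name__ == "__main__":
    nodes = {  # hypothetical node names and replies
        "node-a:7000": reply_to_dict(["timeout", "0", "tcp-keepalive", "60",
                                      "tcp-backlog", "65535"]),
        "node-b:7001": reply_to_dict(["timeout", "0", "tcp-keepalive", "300"]),
    }
    for problem in config_mismatches(nodes):
        print(problem)
```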
In Predis, read_write_timeout is set to -1 and timeout is set to 0.
The open files limit is set to 1024000 on each node in the cluster, and max user processes is set to 10240 (both can be viewed with the "ulimit -a" command in a Linux terminal). txqueuelen is left at the default of 1000, as shown by ifconfig. I am using Ubuntu 14.04 with 8 GB of RAM and a 40 GB disk.
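Note that "ulimit -a" reports the limits of the current shell, which are not necessarily the limits the redis-server process itself was started with (for example when it is launched by an init script). A sketch for checking the running process on Linux through /proc/<pid>/limits; the PID can be found with pidof redis-server. The parser assumes the usual layout of that file, where columns are padded with runs of spaces.

```python
import re
import sys
from typing import Dict, Tuple


def parse_proc_limits(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each row of /proc/<pid>/limits to its (soft, hard) values."""
    limits = {}
    for line in text.splitlines()[1:]:  # first row is the column header
        # Columns are padded with runs of spaces; limit names contain single spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def process_limits(pid: int) -> Dict[str, Tuple[str, str]]:
    """Read the limits the kernel applies to one running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_proc_limits(f.read())


if __name__ == "__main__":
    # Usage: python limits.py <pid of one redis-server instance>
    limits = process_limits(int(sys.argv[1]))
    for name in ("Max open files", "Max processes"):
        print(name, limits.get(name))
```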
Periodically, a cron job performs bursts of heavy writes from our social media. I also suspect that these bursts might be blocking the Redis server while other clients keep sending queries. In other words, when clients send a large number of commands to the Redis server in a short period of time, this may block the server and result in the error message shown above.
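One way to test the burst theory (an experiment, not a confirmed fix) is to make the cron job temporarily spread its writes out and see whether the errors drop. A minimal sketch: send_batch is a hypothetical callback standing in for however the job writes one batch (for example one pipeline per batch in whatever client it uses), and the batch size and pause are arbitrary starting values.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def throttled_write(
    items: Sequence,
    send_batch: Callable[[Sequence], None],
    batch_size: int = 500,
    pause_s: float = 0.01,
) -> int:
    """Send items in small batches with a short pause between batches, so the
    writes are spread out instead of arriving as one large burst.
    Returns the number of batches sent."""
    batches = 0
    for batch in chunked(items, batch_size):
        if batches:
            time.sleep(pause_s)  # pause only between batches, not before the first
        send_batch(batch)        # hypothetical hook supplied by the cron job
        batches += 1
    return batches
```

If the error rate falls noticeably while the writes are spread out, that supports the idea that the bursts are involved; if it does not change, the cause is more likely elsewhere.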
These are our current theories, but we are not sure how to prove them, or whether either of them is actually the root cause of this error.
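One way to check the cron theory against data you already have, sketched under assumptions: extract the timestamps of the error from the Predis logs (parsing is left out because the log format is not shown), take the start times of the cron bursts, and compare the share of errors that fall inside burst windows with the share of time those windows cover. All names and dates below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def fraction_in_bursts(
    error_times: Sequence[datetime],
    burst_starts: Sequence[datetime],
    burst_length: timedelta,
) -> float:
    """Fraction of errors whose timestamp falls inside
    [start, start + burst_length) for any cron burst."""
    if not error_times:
        return 0.0
    inside = sum(
        1
        for t in error_times
        if any(start <= t < start + burst_length for start in burst_starts)
    )
    return inside / len(error_times)


def burst_coverage(burst_count: int, burst_length: timedelta, window: timedelta) -> float:
    """Share of the observation window covered by bursts (assumes no overlap)."""
    return burst_count * burst_length / window


if __name__ == "__main__":
    day = datetime(2016, 1, 1)  # arbitrary example day
    bursts = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]
    errors = [day + timedelta(minutes=3),
              day + timedelta(hours=6, minutes=9),
              day + timedelta(hours=3)]
    print(fraction_in_bursts(errors, bursts, timedelta(minutes=10)))
    print(burst_coverage(len(bursts), timedelta(minutes=10), timedelta(days=1)))
```

If the bursts cover only a small share of the day but most errors land inside them, the bursts are a strong suspect; if the errors are spread evenly over time, the burst theory becomes less likely.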