redis cluster performance

Péter Mátray

Jul 4, 2014, 5:27:47 AM
to redi...@googlegroups.com
Hi all,

I've been trying out Redis Cluster to see how it works and performs for high data rate clients (possibly processing tens of thousands of events/sec).

I made some tests where a number of concurrent clients performed SET and GET operations with random keys. The goal was to saturate the Redis Cluster, i.e. to stress at least one of the master nodes to fully utilize a CPU core. From these tests (see the details below) I found that the cluster can handle 30-45.000 ops/sec/master node (depending on the number of clients and the size of the cluster). Previously, with the standalone version we could reach 100.000+ ops/sec with a single Redis instance (with pipelining we could even go way above 100.000 ops/sec).

I wonder if this should be considered a correct figure. What could be the source of this more than 50% performance drop? Async replication to slaves and persistence to disk are not the source: switching these off does not change the numbers. An interesting observation, though, is that the CPU load of the master instances differs: although CRC16 spreads the operations equally among the masters, there are always one or two masters which saturate their CPU first, while other masters still have room at 0.5-0.8 CPU load. Are there any inherent asymmetries in the cluster? Also, to reach the 30-45.000 ops/sec/master saturation point, I need to start a large number of concurrent clients, so the client-side performance drops from 8.000 to 2-3.000 ops/client (i.e. response latencies go up from 0.13 ms to 0.5 ms).

Any ideas/comments about this?

And finally a related question: do you know of any Redis Cluster clients that plan to support async and/or pipeline operations?

Br,
Peter

--------------------------------------------
BENCHMARK SETUP
--------------------------------------------

- M redis masters
- S redis slaves
- 2N client processes (N doing random writes, N doing random reads)

Methodology:
- all processes evenly spread across 7 physical servers (~64 GB memory each, 1 GbE network)
- no pipelining or async operations, since the Redis Cluster clients do not support those
- to compensate for the client-side performance loss caused by round-trip times (0.13 ms between servers),
we keep starting more clients until at least one of the Redis instances fills up 1 CPU core (we treat this point as the saturation point of the cluster); a minimal client sketch follows below
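
For illustration, a minimal sketch of one such client (not the actual benchmark code) could look like the following, using redis-py's built-in cluster support (redis-py >= 4.1); the node address and run time are illustrative:

import random
import time

from redis.cluster import RedisCluster

# Connect through any cluster node; the client discovers the rest of the topology.
rc = RedisCluster(host="127.0.0.1", port=7000)

ops = 0
value_counter = 0
start = time.time()
while time.time() - start < 60:  # run for one minute
    key = "key:%d" % random.randint(0, 5_000_000)
    # The original tests used separate writer and reader processes;
    # here the two are mixed 50/50 in one loop for brevity.
    if random.random() < 0.5:
        rc.set(key, "value:%d" % value_counter)
        value_counter += 1
    else:
        rc.get(key)
    ops += 1

print("%.0f ops/sec" % (ops / (time.time() - start)))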

--------------------------------------------
TEST 1: STANDALONE
--------------------------------------------
- single instance Redis

- Figures at saturation point:
2N = 20 clients
106.000 op/s total
5.300 op/s/client

--------------------------------------------
TEST 2: SMALL CLUSTER
--------------------------------------------
- M=S=3
- Figures at saturation point:
2N = 42 clients
45.000 op/s/master
3.200 op/s/client

--------------------------------------------
TEST 3: MEDIUM CLUSTER
--------------------------------------------
- M=S=7

- Figures at saturation point:
2N = 84 clients
45.000 op/s/master
3.800 op/s/client

--------------------------------------------
TEST 4: MEDIUM CLUSTER NO REPLICATION
--------------------------------------------
- M=7
- S=0

- Figures at saturation point:
2N = 84 clients
45.000 op/s/master
3.800 op/s/client

--------------------------------------------
TEST 5: LARGE CLUSTER
--------------------------------------------
- M=S=14

- Figures at saturation point:
2N = 168 clients
32.000 op/s/master
2.700 op/s/client

Josiah Carlson

Jul 7, 2014, 4:48:15 PM
to redi...@googlegroups.com
Assuming you are doing all the right things when it comes to connection re-use on the client, then as long as you are generating random keys you should expect to see some nodes getting up to around 2-3x more traffic than other nodes, even if the shards are somewhat evenly distributed across the cluster. This has to do with hash collisions and the mathematics of those collisions, and nothing to do with Redis itself (aside from the choice of hash function). I can explain further if you are curious.
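
One way to sanity-check the expected spread for a given key pattern is to simulate it offline. A small sketch (using the CRC16/XMODEM function that Redis Cluster uses for key hashing, and assuming each master owns a contiguous, equal share of the 16384 hash slots; the key pattern and M are illustrative):

import random
from collections import Counter

def crc16(data: bytes) -> int:
    # CRC-16/XMODEM (poly 0x1021, init 0x0000), the variant used by Redis Cluster.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

M = 7  # number of masters
counts = Counter()
for _ in range(1_000_000):
    key = "key:%d" % random.randint(0, 5_000_000)
    slot = crc16(key.encode()) % 16384
    counts[slot * M // 16384] += 1  # index of the master whose slot range contains this key

print(sorted(counts.values()))  # simulated per-master request counts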

The reduction to 32k ops/master at the large cluster size suggests that there might be some client/network contention when sending from more clients or to more servers, but it would be hard to say without knowing more details about your client load test configuration and methodology, as well as your physical hardware configuration (machines, network, etc.).

 - Josiah




--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Péter Mátray

Jul 8, 2014, 4:23:58 AM
to redi...@googlegroups.com
Hi Josiah,

At first I suspected the same thing, but the fact is that all Redis masters receive the same number of requests. (That can be checked with the "info" command via the command-line interface.) So the CRC16 hash does a good job as a load balancer. Even so, the CPU load of the masters differs. I wonder why that could be?
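
(For example, something along these lines prints the per-master command counters; the node addresses are illustrative and it uses redis-py:)

import redis

# Ask each master directly for its INFO "stats" section and compare the counters.
masters = [("10.0.0.1", 7000), ("10.0.0.2", 7000), ("10.0.0.3", 7000)]
for host, port in masters:
    stats = redis.Redis(host=host, port=port).info("stats")
    print(host, port, stats["total_commands_processed"])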

I gave the specs of my test setup in the previous mail. The clients are really simple: they only do SET or GET with keys of the form "key:XXX", where XXX is a random integer between 0 and 5.000.000. The values are of the form "value:N", where N is an integer starting from 0 and incremented after each request. I repeated the tests with the ruby, python and java clients available for the cluster, and the results were similar.

Let me know if there are further details that could be relevant.

Br,
Peter

Salvatore Sanfilippo

Jul 9, 2014, 10:59:56 AM
to Redis DB
Hello Péter,

Actually, Redis Cluster instances do some more work compared to other
instances: they need to hash keys, and when setting / removing keys
from the database they need to update an internal sorted set to track
which hash slot has which keys. However, I never measured the exact cost
of these operations. I believe a very simple way to do so is to start a
Redis Cluster node, modify the configuration by hand so that a single
master is assigned all the hash slots, and then run a benchmark
against this instance.
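
(For instance, a rough sketch of that setup, assuming a fresh node started with "cluster-enabled yes" on port 7000 and using redis-py; both details are illustrative:)

import redis

# Assign all 16384 hash slots to this single cluster-enabled node,
# so it can then be benchmarked like a standalone instance.
r = redis.Redis(port=7000)
r.execute_command("CLUSTER ADDSLOTS", *range(16384))
print(r.execute_command("CLUSTER INFO"))  # should now report cluster_state:ok

After that the node can be driven exactly like a standalone instance, e.g. with redis-benchmark or the clients from the tests above.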

Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

"One would never undertake such a thing if one were not driven on by
some demon whom one can neither resist nor understand."
— George Orwell

Salvatore Sanfilippo

Jul 9, 2014, 11:04:46 AM
to Redis DB
I just tried: without pipelining there is almost no difference, since
the server is much more bound by the cost of syscalls.
With a pipeline of 32 operations the cluster node handles 250k
ops/sec, while the non-cluster version handles 500k ops/sec.
It is probably worth doing some profiling sooner or later to improve the
CPU-intensive part of the data sharding.
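
(For reference, redis-benchmark produces this kind of load with its -P 32 option; in client code the equivalent is batching commands into pipelines of 32, roughly as in this sketch against a single node on an illustrative port, not the exact benchmark used here:)

import random
import time

import redis

r = redis.Redis(port=7000)
ops = 0
start = time.time()
while time.time() - start < 30:
    pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC wrapper
    for _ in range(32):
        pipe.set("key:%d" % random.randint(0, 5_000_000), "value")
    pipe.execute()
    ops += 32

print("%.0f ops/sec" % (ops / (time.time() - start)))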

Salvatore

Josiah Carlson

Jul 9, 2014, 12:51:13 PM
to redi...@googlegroups.com
Replies inline.

On Tue, Jul 8, 2014 at 1:23 AM, Péter Mátray <alma....@gmail.com> wrote:
> At first I suspected the same thing, but the fact is that all Redis masters receive the same number of requests. (That can be checked with the "info" command via the command-line interface.) So the CRC16 hash does a good job as a load balancer. Even so, the CPU load of the masters differs. I wonder why that could be?

Salvatore commented on this.
 
> I gave the specs of my test setup in the previous mail. The clients are really simple, only do SET or GET with keys of the form "key:XXX" where XXX is a random integer between 0 and 5.000.000. The values are of the form "value:N" where N is an integer going from 0, and incremented after each request. I repeated the tests with the ruby, python and java clients available for the cluster, and the results were similar.

You did not specify machine configuration (CPU, memory, real hardware, VM, in a managed hosting provider, ...) or network configuration (interconnection speed, on AWS, some other managed network, your own network), both of which could substantially affect performance going from single server to multiple server configurations.
 
 - Josiah

Péter Mátray

Jul 10, 2014, 8:48:26 AM
to redi...@googlegroups.com
Hi Salvatore,

Thanks for the note. Apart from the absolute values, your test is in accordance with mine, i.e. the standalone version can handle twice as many ops/sec as the clustered one.

In the meantime I implemented a small async layer on top of the Jedis Cluster client. With that I could reach a rate of 90-100k ops/sec/client, and I could hit the 45.000 ops/sec/master saturation point even on the larger cluster configs.

Br,
Peter

ps.: if you (or someone else) happen to do some profiling at some point, it would also be interesting to check whether there are any asymmetries among the masters. As I wrote, I usually get one or two masters filling up their CPUs first. These instances might essentially limit the overall performance of the cluster, even though other masters still have free CPU capacity to process more events. Getting the CPU load balanced across the instances could be a way to optimize performance.

Péter Mátray

Jul 10, 2014, 9:05:18 AM
to redi...@googlegroups.com
Hi Josiah,
 
> I gave the specs of my test setup in the previous mail. The clients are really simple, only do SET or GET with keys of the form "key:XXX" where XXX is a random integer between 0 and 5.000.000. The values are of the form "value:N" where N is an integer going from 0, and incremented after each request. I repeated the tests with the ruby, python and java clients available for the cluster, and the results were similar.

> You did not specify machine configuration (CPU, memory, real hardware, VM, in a managed hosting provider, ...) or network configuration (interconnection speed, on AWS, some other managed network, your own network), both of which could substantially affect performance going from single server to multiple server configurations.

I gave the specs as "all processes evenly spread across 7 physical servers (~64 GB memory each, 1 GbE network)". The servers have 12 physical CPU cores plus hyperthreading (I didn't consider this information relevant, as Redis is single-threaded). The machines are connected via a non-blocking switch, so in effect there's a 1 GbE network connection between any pair of servers in both directions. During the tests I monitored the network to make sure that the benchmark traffic did not saturate the links. The Redis Cluster's performance was definitely limited by the CPU of one or two master instances.

Br,
Peter
