Poor etcd performance when running benchmark against memory-backed etcd cluster

Kalpesh Padia

Jun 6, 2023, 2:07:15 PM

to etcd-dev

Hello team,

Recently we ran the official etcd benchmark tool to benchmark our etcd cluster and obtained very poor results.

Setup details:

etcd Version: 3.5.4

Number of nodes/members in etcd cluster: 3

Deployment details: etcd pods are deployed on the same Kubernetes cluster and scheduled on different but co-located Kubernetes nodes. These nodes run no other workloads and showed no I/O, memory, or CPU stress at any point during the tests.

Resource requests/limits: CPU: 16/32; memory: 128GiB/128GiB

etcd data directory: "memory-backed" emptyDir mounted on the pods; the size of this emptyDir is set to 64000000000 bytes (64GB).

Environment variables: ETCD_QUOTA_BACKEND_BYTES = 64000000000; ETCD_SNAPSHOT_COUNT = 100000.

Configuration parameters/arguments passed: heartbeat-interval = 250 (ms); election-timeout = 2000 (ms); other etcd configuration parameters left at their default values

Expectations:

With a memory-backed etcd, we expect the benchmark results to be at least as fast as the official numbers, and possibly even faster.

Benchmark Results:

We first ran the write benchmark using the same commands shared on the official benchmark results page. To our surprise, the results were dismal:

|---------|----------|------------|-------------|-----------|-------------|-----------|-----------------|
| Number  | Key size | Value size | Number of   | Number of | Target etcd | Average   | Average latency |
| of keys | in bytes | in bytes   | connections | clients   | server      | write QPS | per request     |
|---------|----------|------------|-------------|-----------|-------------|-----------|-----------------|
| 10,000  | 8        | 256        | 1           | 1         | leader only | 1,805     | 0.5ms           |
| 100,000 | 8        | 256        | 100         | 1,000     | leader only | 12,355    | 90.8ms          |
| 100,000 | 8        | 256        | 100         | 1,000     | all members | 12,351    | 95.3ms          |
|---------|----------|------------|-------------|-----------|-------------|-----------|-----------------|

As can be seen above, the QPS for the first test surpasses the official result, as expected, but the other two tests report a QPS that is about a quarter of the official results. The average latency per request for these two tests is also about 4x the official numbers.
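One way to read the latency figures is through Little's law. Assuming the benchmark clients run in a closed loop (each client waits for a response before sending its next request, which is an assumption about the tool rather than something stated in this thread), the average latency should be roughly clients / QPS. A minimal Python sketch using only the numbers in the table above; the function name is illustrative:

```python
# Little's law for a closed-loop load test: latency ~= clients / throughput.
# All inputs are taken from the write-benchmark table above.
runs = [
    # (label, clients, measured QPS, reported average latency in ms)
    ("1 client, leader only", 1, 1805, 0.5),
    ("1000 clients, leader only", 1000, 12355, 90.8),
    ("1000 clients, all members", 1000, 12351, 95.3),
]


def implied_latency_ms(clients: int, qps: float) -> float:
    """Average latency implied by Little's law, in milliseconds."""
    return clients / qps * 1000.0


for label, clients, qps, reported_ms in runs:
    implied = implied_latency_ms(clients, qps)
    print(f"{label}: implied {implied:.2f} ms, reported {reported_ms} ms")
```

The implied values (about 0.55 ms for one client and about 81 ms for 1000 clients) are in the same range as the reported ones, which suggests that the higher latency in the 1000-client runs largely follows from the lower throughput rather than being a separate problem.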

We did not run the read tests because the write results surprised us and we wanted to dig deeper first. We ran etcdctl check perf to see whether the cluster passes the tests for the various load sizes, and then ran fio against the etcd data directory to check the performance of the memory-backed volume.

Results of etcdctl check perf:
|------|-------|-------------|----------|--------|
| load | QPS   | Slowest     | Stddev   | Result |
| size |       | request (s) | (s)      |        |
|------|-------|-------------|----------|--------|
| s    | 151   | 0.003675    | 0.000225 | Pass   |
| m    | 997   | 0.006241    | 0.000226 | Pass   |
| l    | 7885  | 0.040049    | 0.001170 | Pass   |
| xl   | 14126 | 0.151048    | 0.008136 | Pass   |
|------|-------|-------------|----------|--------|

The xl load size option uses 1000 clients to issue write requests with a key size of 256 bytes and a value size of 1024 bytes for 60s. The resulting QPS is similar to the QPS observed in the benchmark test with 1000 clients, and barely crosses the pass criterion (13500). So although the cluster passes every load test, its benchmark throughput still falls well short of the official numbers.
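To put "barely crosses" in numbers, the following Python sketch compares the xl result with the 13500 pass threshold quoted above. The 15000 target rate is an assumption (it treats the threshold as 90% of the tool's target rate) and is not stated in this thread:

```python
xl_qps = 14126          # measured by etcdctl check perf (xl load size)
pass_threshold = 13500  # pass criterion quoted above
assumed_target = 15000  # ASSUMPTION: threshold taken to be 90% of the target rate

# Relative headroom above the pass threshold.
margin = (xl_qps - pass_threshold) / pass_threshold
# Share of the assumed target rate that the cluster actually sustained.
fraction_of_target = xl_qps / assumed_target

print(f"margin over threshold: {margin:.1%}")
print(f"fraction of assumed target: {fraction_of_target:.1%}")
```

The margin works out to under 5%. If the assumed target rate is right, the cluster did not reach the requested rate at the xl size, whereas the s, m, and l results sit close to round numbers and look rate-limited rather than capacity-limited.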

Results of running fio:

Before running fio, we ran strace during the first benchmark test to measure the average block size (bs) of the write calls made to the WAL file. We found this to be 4767 bytes, and used that value in the fio test with the following command:

for i in 476700000 4767000000 47670000000; do
    fio --rw=write --ioengine=sync --fdatasync=1 --size=${i}b --bs=4767 --filename=/var/etcd/fio-test --name=write_test
done

We obtained the following results:
|-------------|---------|----------|---------------|---------|
| size        | IOPS    | p99 clat | p99 fdatasync | BW      |
| (bytes)     | (avg)   | (us)     | (us)          | (MiB/s) |
|-------------|---------|----------|---------------|---------|
| 476700000   | 2380000 | 5.472    | 0.596         | 1080    |
| 4767000000  | 2390000 | 5.408    | 0.612         | 1087    |
| 47670000000 | 2460000 | 5.344    | 0.652         | 1120    |
|-------------|---------|----------|---------------|---------|

The above results suggest that our memory-backed volume is fast enough: it sustains a write throughput far greater than what we are seeing in the etcd benchmark tests.
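For a rough sense of scale, the payload rates in the etcd tests can be compared with the fio bandwidth. This Python sketch counts only key and value bytes per request (it ignores WAL framing, Raft replication, and batching, so it is a lower bound on what etcd actually writes), and it takes the fio BW column at face value. The function and variable names are illustrative:

```python
MIB = 1024 * 1024


def payload_mib_per_s(qps: float, key_bytes: int, value_bytes: int) -> float:
    """Key plus value bytes written per second, in MiB/s (payload only)."""
    return qps * (key_bytes + value_bytes) / MIB


fio_bw_mib_s = 1080  # lowest BW figure in the fio table above

# (label, QPS, key size in bytes, value size in bytes), all from the results above
workloads = [
    ("benchmark, 1000 clients", 12355, 8, 256),
    ("check perf, xl", 14126, 256, 1024),
]

for label, qps, key_bytes, value_bytes in workloads:
    rate = payload_mib_per_s(qps, key_bytes, value_bytes)
    ratio = fio_bw_mib_s / rate
    print(f"{label}: {rate:.1f} MiB/s payload, fio bandwidth is {ratio:.0f}x higher")
```

Under these assumptions the payload rates are about 3 MiB/s and 17 MiB/s, so the gap to the fio bandwidth is on the order of 60x to 350x. That is consistent with the memory-backed volume not being the limiting factor.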

Question:

Based on the above results, we believe that memory is unlikely to be the bottleneck and that something else in our setup is sub-optimal and causing the poor performance. Can you please help us with the following:

What is the likely cause of the results shared above? What should we check or focus on?
What are some suggestions for better tuning our etcd cluster?
Please share the etcd parameters/configuration used for the official benchmark tests. We would like to use the same configuration to replicate the results on our end.

Thanks.

Josh Berkus

Jun 6, 2023, 2:23:44 PM

to Kalpesh Padia, etcd-dev

On 6/6/23 11:07, 'Kalpesh Padia' via etcd-dev wrote:
> Expectations:
> With a memory backed etcd we expect that the benchmark results should be
> at least as fast as the official numbers
> <https://etcd.io/docs/v3.5/op-guide/performance/#benchmarks> and
> possibly even faster.

What do you mean by "memory backed"? How is your storage set up, exactly?

--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO

Kalpesh Padia

Jun 6, 2023, 3:25:00 PM

to etcd-dev

Hi Josh,

The emptyDir that is mounted for etcd storage has its medium set to memory. Here's a redacted version of the etcd pod's spec:

apiVersion: v1
kind: Pod
...
spec:
  containers:
  - command:
    - /bin/sh
    - -ec
    - /usr/local/bin/etcd --data-dir=/var/etcd/data ....
    ...
    volumeMounts:
    - mountPath: /var/etcd
      name: etcd-data
  ...
  volumes:
  - emptyDir:
      medium: Memory
      sizeLimit: 62500000Ki
    name: etcd-data
...
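As a unit check, the sizeLimit above matches the 64000000000-byte figure given earlier for the emptyDir and for ETCD_QUOTA_BACKEND_BYTES. A small Python sketch (variable names are illustrative):

```python
size_bytes = 64_000_000_000  # emptyDir size and ETCD_QUOTA_BACKEND_BYTES
size_limit_ki = 62_500_000   # sizeLimit from the pod spec (1 Ki = 1024 bytes)

# The two figures describe exactly the same number of bytes.
assert size_limit_ki * 1024 == size_bytes

# The same size expressed in binary gigabytes.
size_gib = size_bytes / 1024**3
print(f"{size_gib:.1f} GiB")
```

This works out to about 59.6 GiB. Since the backend quota equals the volume size, and assuming the WAL is not redirected elsewhere with --wal-dir (the full command line is elided above), the WAL and snapshot files share the same volume, so it would fill before the quota is reached. This does not explain the throughput gap, but leaving some headroom may be worthwhile.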

Josh Berkus

Jun 6, 2023, 6:05:56 PM

to Kalpesh Padia, etcd-dev

On 6/6/23 12:24, 'Kalpesh Padia' via etcd-dev wrote:
> The emptyDir that is mounted for etcd storage has its medium set to
> memory. <https://kubernetes.io/docs/concepts/storage/volumes/#emptydir>
> Here's a redacted version of the etcd pod's spec:

Just to knock out the easy possibilities: what does memory utilization look
like on the servers where etcd is running? During the benchmark, that is.

Kalpesh Padia

Jun 6, 2023, 6:32:07 PM

to etcd-dev

Here are the CPU/Memory utilization graphs during the three write benchmarks.

[Graph not preserved: 10,000 keys, 1 conn, 1 client]

[Graph not preserved: 100,000 keys, 100 conn, 1000 clients, target leader true]

[Graph not preserved: 100,000 keys, 100 conn, 1000 clients, target leader false]

Kalpesh Padia

Jun 9, 2023, 2:42:40 PM

to etcd-dev

Hi Josh and etcd community,

Were you able to gain any insight into this issue? We would appreciate any help that you can provide.

Thanks.
