etcd as a general-purpose key-value store

georgia.k

Dec 16, 2016, 1:48:04 PM
to etcd-dev
Hello etcd team,

I'm interested in using etcd as a general-purpose key-value store, due
to its high availability and strong consistency guarantees.

Recently, I read that etcd has a storage size limit of 8GB in order to
avoid performance degradation.
 
[https://github.com/coreos/etcd/blob/master/Documentation/faq.md#deployment]
[https://github.com/coreos/etcd/blob/master/Documentation/op-guide/maintenance.md#space-quota]

Is there any strict constraint in its architecture that imposes this limit?

I do understand that etcd is designed to be used as a metadata store
and not for storing a large amount of data. However, as I will be
running it on a 64-bit system with enough memory, I would like to have
a larger space quota. I was thinking of writing a patch to make the
quota configurable beyond 8GB. Do you think that would make sense?
What kind of consequences would it have?
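
Concretely, I have in mind simply being able to pass a value above the
current 8GB cap to the existing flag, for example (16GB here is
arbitrary; as I understand it, the server will not honour a value this
large today):

  etcd --quota-backend-bytes=17179869184   # i.e. 16GB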

Also, I understand that the addition of new members will be seriously
impacted by the size of the dataset. Can we overcome this issue by
ensuring that new members will always have a recent snapshot of the
dataset?
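
For instance (hypothetical paths, and using the v3 etcdctl snapshot
command), we could periodically save a snapshot with:

  ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 \
      snapshot save /backups/etcd-latest.db

and, if possible, seed a new member from that file instead of streaming
the whole dataset from the leader.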

What is more, according to its developer, BoltDB can scale up to
1TB [https://github.com/boltdb/bolt#project-status].
Does the 8GB limit arise from Raft or etcd itself, or is it related to BoltDB
in any way? Could a different storage engine (e.g. RocksDB) solve this
problem?

Finally, when choosing a storage engine, did you perform any comparisons
between BoltDB and RocksDB? What were the reasons that led you to the
choice of BoltDB?

Thanks in advance,
Georgia

Xiang Li

Dec 16, 2016, 2:19:05 PM
to georgia.k, etcd-dev
Hi Georgia,

This is a great question.

The main reason for the size limit is MTTR (mean time to recovery).

etcd is designed to be a highly available store. It replicates all data across all nodes. If you lose an etcd member, simply adding a new member should bring
the cluster back to full health within tens of seconds, with little impact on overall performance.

Typically, we can recover 2GB of data within 20 seconds on good hardware. We cannot do that for 1TB of data given today's hardware limitations. If you do not care about MTTR, you can in theory store 1TB in etcd.
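
To put a rough number on it (a back-of-envelope estimate, assuming the
transfer rate scales linearly): 2GB in 20 seconds is about 100MB/s, so
1TB would take on the order of 10,000 seconds, close to three hours,
during which the cluster runs with reduced fault tolerance.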

There are projects that compose multiple consistent kv groups into one logically unified kv space: see TiKV or CockroachDB's kv layer. Basically, the physically consistent storage layer is partitioned, with a logical proxy layer on top that makes the kv space appear unified. But all of that comes with a cost, and a pretty expensive one for consistency across kv groups.

etcd's main use case is storing metadata, and we want to ensure there is no additional cost for that use case. In the future, we might make the etcd proxy able to talk to multiple actual etcd clusters, with some API limitations, so that etcd feels horizontally scalable.
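
To make the partitioning idea concrete, here is a toy Go sketch (my own
illustration, not how TiKV or CockroachDB actually implement it; the
endpoints and range boundaries are made up): a static range-to-cluster
map with the routing function such a proxy would need.

package main

import (
	"fmt"
	"sort"
)

// partition owns every key below UpperBound that is not owned by an
// earlier partition in the sorted slice below.
type partition struct {
	UpperBound string   // exclusive upper bound of this key range
	Endpoints  []string // the etcd cluster serving this range
}

// partitions is sorted by UpperBound; the final "\xff" entry catches
// the tail of the key space.
var partitions = []partition{
	{UpperBound: "h", Endpoints: []string{"http://etcd-a:2379"}},
	{UpperBound: "p", Endpoints: []string{"http://etcd-b:2379"}},
	{UpperBound: "\xff", Endpoints: []string{"http://etcd-c:2379"}},
}

// route picks the single cluster that owns key. A real proxy would keep
// a client per cluster and forward the request; a transaction touching
// two partitions is exactly the expensive case mentioned above, since
// no single Raft group covers both keys.
func route(key string) partition {
	i := sort.Search(len(partitions), func(i int) bool {
		return key < partitions[i].UpperBound
	})
	if i == len(partitions) { // key at or beyond the last bound
		i = len(partitions) - 1
	}
	return partitions[i]
}

func main() {
	for _, k := range []string{"apple", "kiwi", "zebra"} {
		fmt.Printf("%q -> %v\n", k, route(k).Endpoints)
	}
}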

Hope this helps.



georgia.k

Dec 16, 2016, 3:25:56 PM
to etcd-dev, georg...@yahoo.com
Hi Xiang,

Thank you for your quick response. It is indeed very helpful.

Regarding RocksDB, did you consider it when selecting a storage engine for etcd?
What made you prefer BoltDB?

Regards,
Georgia


