[ANN] ram, an in-memory distributed KV store


Roberto Ostinelli

Dec 21, 2021, 8:57:33 AM
to erlang-q...@erlang.org
Let’s write a database! Well not really, but I think it’s a little sad that there doesn’t seem to be a simple in-memory distributed KV database in Erlang. Many times all I need is a consistent distributed ETS table.

The two main ones I normally consider are:

  • Riak, which is great: it handles loads of data and is based on DHTs. This means that when the cluster changes, data needs to be redistributed, and that process has to be properly managed, with handoffs and so on. It is really great, but it's eventually consistent, and it is often overkill when all I'm looking for is a simple in-memory ACI(not D) KV solution that can have 100% of its data replicated on every node.
  • mnesia, which could be it, but it unfortunately requires special attention when initializing tables and making them distributed (which is tricky), handles net splits very badly, needs hacks to resolve conflicts, and does not really support dynamic clusters (additions can be kind of ok, but you can't, for instance, remove nodes unless you stop the app).
  • …other solutions? In general people end up using FoundationDB or Redis (which has master-slave replication), i.e. external to the BEAM. Pity, no?

So… :) Well, I don't plan to write a database (since ETS is awesome), rather to distribute it across a cluster. I simply want a distributed ETS solution, after all!
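
Concretely, the kind of API I have in mind looks roughly like this (a hypothetical shell session just to show the shape; the exact function names and return values in ram 0.1.0 may differ, see the docs below):

    1> ram:put("key", "value").   %% write, replicated on every node
    ok
    2> ram:get("key").            %% read, served from the local copy
    "value"
    3> ram:delete("key").
    ok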

I’ve already started the work and released a version 0.1.0 of ram:
https://github.com/ostinelli/ram

Docs are here:
https://hexdocs.pm/ram

Please note this is a very early stage. It started as an experiment and it might remain one. So feedback is welcome to decide its future!

Best,
r.

Marc Worrell

Dec 21, 2021, 9:45:22 AM
to Roberto Ostinelli, erlang-q...@erlang.org
We have https://github.com/zotonic/depcache

This is a NON-distributed caching system which:

 - tracks dependencies between keys (see the sketch below)
 - supports cache expiration
 - memoizes lookups locally, in-process
 - automatically cleans up against a max memory usage (soft limit)
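
Usage is roughly like this, going from memory of the README (treat it as a sketch; the exact arities and config options may differ):

    %% start a cache with a ~100 MB soft memory limit
    {ok, C} = depcache:start_link([{memory_max, 100}]),
    %% cache a value for up to an hour, depending on the key 'users'
    ok = depcache:set({user, 5}, #{name => "marc"}, 3600, [users], C),
    {ok, #{name := "marc"}} = depcache:get({user, 5}, C),
    %% flushing the dependency key invalidates {user, 5} as well
    depcache:flush(users, C).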

I have been pondering making this distributed for a long time…

Cheers,

Marc

Jesse Gumm

Dec 21, 2021, 10:40:52 AM
to Marc Worrell, Erlang Questions
Very cool.  Also, interesting timing.

Just about a week ago, I started work on a distributed KV store for session management (mostly for Nitrogen, but no reason it can't be general purpose).


The distributed component is not yet functional (that's today's focus), but the point of this is to be:

* fully replicated on all nodes
* self healing
* expiration based on last session access time (a minimal single-node sketch of this follows the list)
* data persistence (because I don't like losing session data on a VM restart for a single-node webapp)
* easy adding or dropping from the cluster
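
For the access-time expiration piece, the single-node core is basically just this (a minimal sketch on plain ETS, not the actual code):

    -module(session_sketch).
    -export([init/0, store/2, fetch/1, sweep/1]).

    init() ->
        ets:new(sessions, [named_table, public, set]).

    store(Key, Value) ->
        ets:insert(sessions, {Key, Value, erlang:monotonic_time(second)}).

    fetch(Key) ->
        case ets:lookup(sessions, Key) of
            [{Key, Value, _LastAccess}] ->
                %% touch: element 3 is the last-access timestamp
                ets:update_element(sessions, Key,
                                   {3, erlang:monotonic_time(second)}),
                {ok, Value};
            [] ->
                not_found
        end.

    %% run periodically: drop sessions idle longer than TtlSeconds
    sweep(TtlSeconds) ->
        Cutoff = erlang:monotonic_time(second) - TtlSeconds,
        ets:select_delete(sessions,
            [{{'_', '_', '$1'}, [{'<', '$1', Cutoff}], [true]}]).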

So it's definitely not a general-purpose KV by any stretch, but the overlap in timing is interesting (our initial commits on these projects were less than a day apart).

I will definitely be keeping an eye on ram.

-Jesse

--
Jesse Gumm
Owner, Sigma Star Systems
414.940.4866 || sigma-star.com || @jessegumm

Roger Lipscombe

Dec 21, 2021, 10:44:30 AM
to Marc Worrell, erlang-q...@erlang.org
While we're comparing our cache implementations...

We have https://github.com/electricimp/ei_cache.

- ETS-backed.
- It's non-distributed.
- It IS (very) stampede-resistant; we've been using it in production for ~6 years. (A generic sketch of the single-flight idea follows the list.)
- It does NOT have expiry or eviction. Until recently, we've not actually needed it; I'll probably be adding it in the new year.
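
The stampede resistance boils down to single-flight: on a miss, only one process computes the value, while every other caller for that key waits for the result. A generic sketch of the idea (not ei_cache's actual implementation; error handling omitted, so if Fun crashes the waiters would hang in this toy version):

    -module(single_flight).
    -behaviour(gen_server).
    -export([start_link/0, fetch/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% Fun computes the value on a miss; concurrent callers for the
    %% same key wait for the first computation instead of re-running it.
    fetch(Key, Fun) ->
        gen_server:call(?MODULE, {fetch, Key, Fun}, infinity).

    init([]) ->
        {ok, #{tab => ets:new(sf_cache, [set]), pending => #{}}}.

    handle_call({fetch, Key, Fun}, From, #{tab := Tab, pending := P} = S) ->
        case ets:lookup(Tab, Key) of
            [{_, Value}] ->
                {reply, {ok, Value}, S};
            [] ->
                case maps:find(Key, P) of
                    {ok, Waiters} ->
                        %% a fill is already in flight: just queue up
                        {noreply, S#{pending := P#{Key := [From | Waiters]}}};
                    error ->
                        Self = self(),
                        spawn_link(fun() -> Self ! {filled, Key, Fun()} end),
                        {noreply, S#{pending := P#{Key => [From]}}}
                end
        end.

    handle_info({filled, Key, Value}, #{tab := Tab, pending := P} = S) ->
        ets:insert(Tab, {Key, Value}),
        [gen_server:reply(W, {ok, Value}) || W <- maps:get(Key, P)],
        {noreply, S#{pending := maps:remove(Key, P)}};
    handle_info(_Msg, S) ->
        {noreply, S}.

    handle_cast(_Msg, S) ->
        {noreply, S}.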

Attila Rajmund Nohl

Dec 22, 2021, 7:51:19 AM
to Erlang
Roberto Ostinelli <osti...@gmail.com> wrote (on Dec 21, 2021, at 14:57):

Have you seen this: https://gitlab.com/leapsight/plum_db ? It's only
eventually consistent, but if you want distribution and availability
even in case of network partitioning, you won't get consistency...

Benoit Chesneau

Dec 22, 2021, 8:23:28 AM
to Attila Rajmund Nohl, Erlang
While we are here, let's add cached to the comparison.

Only the experiment is public for now. It has different strategies to store K/V pairs in memory and distribute them. Distribution is pluggable and by default relies on Erlang distribution.

Benoit Chesneau

Dec 22, 2021, 8:33:36 AM
to Attila Rajmund Nohl, Erlang
I forgot to describe the strategies:

* non-blocking LRU: an LRU cache with eviction over ETS. Eviction is done asynchronously, garbage-collection style: a pool of keys sampled from among the least used is evicted. This prevents evicting keys too often, which would increase latency. The algorithm is similar to the one used in Redis [1]. (A rough sketch of the sampling idea follows the list.)
* volatile: a pure cache maintained in a gen process. Keys to be evicted are maintained in a list.
* tier file: part of the cache is maintained in memory, part on disk in an append-only manner. This allows the cache to be persistent across upgrades/restarts.
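
To make the first strategy concrete, here is a rough sketch of sampled eviction over an ETS table of {Key, Value, LastAccess} tuples (not the actual cached code; the sampling here is deliberately naive):

    -module(sampled_evict).
    -export([evict_one/2]).

    %% Rather than keeping strict LRU order, sample a few entries and
    %% evict the least recently used of the sample (as Redis does).
    evict_one(Tab, SampleSize) ->
        case take_sample(Tab, ets:first(Tab), SampleSize, []) of
            [] ->
                empty;
            [First | Rest] ->
                {Key, _, _} =
                    lists:foldl(
                      fun({K, V, T}, {_, _, Tmin}) when T < Tmin -> {K, V, T};
                         (_, Oldest) -> Oldest
                      end, First, Rest),
                ets:delete(Tab, Key),
                {evicted, Key}
        end.

    %% Naive sampling for illustration: just walk the first N entries.
    %% A real implementation would sample randomly.
    take_sample(_Tab, '$end_of_table', _N, Acc) -> Acc;
    take_sample(_Tab, _Key, 0, Acc) -> Acc;
    take_sample(Tab, Key, N, Acc) ->
        [Obj] = ets:lookup(Tab, Key),
        take_sample(Tab, ets:next(Tab, Key), N - 1, [Obj | Acc]).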

Note: you can pass your own cache module if you need to. You can also pass a hook to be executed on eviction, on all cache backends.

Distribution is done by maintaining a group of peers. For now it uses pg2 and Erlang distribution; this is similar to Group Cache. The distribution backend itself is pluggable: for some customers, for example, a true P2P distribution backend is used.
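
The pg2 part of it is tiny; a sketch of the shape (not the actual code; note that pg2 was removed in OTP 24 in favour of pg, but the idea is the same):

    %% each node's cache process joins the group at startup
    join_peers(Group) ->
        ok = pg2:create(Group),          %% idempotent
        ok = pg2:join(Group, self()).

    %% fan a write out to every peer cache process in the group
    %% (assumes join_peers/1 has run, so the group exists)
    broadcast_put(Group, Key, Value) ->
        [Pid ! {put, Key, Value} || Pid <- pg2:get_members(Group)],
        ok.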

Anyway I may release the stable version if anyone sees an interest in it.

Benoît


Benoit Chesneau

Dec 22, 2021, 8:47:20 AM
to Roberto Ostinelli, Erlang Questions
This is kind of refreshing; I like the simplicity of the goal. I did a similar thing: not a database, but something that can be distributed by linking nodes. Atomicity is done inside a batch, and you have CAS features as well. Storage can be persisted or kept in memory. The default backend is rocksdb, or rocksdb with a temporary DB in RAM.
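
The CAS part can be expressed very compactly on top of ETS; a generic sketch of the idea (not the openkvs API):

    %% Atomic compare-and-swap: succeeds only if the stored value still
    %% equals Expected. ets:select_replace/2 is atomic per object.
    %% (Assumes Expected contains no match-spec atoms such as '_'.)
    cas(Tab, Key, Expected, New) ->
        MatchSpec = [{{Key, Expected}, [], [{const, {Key, New}}]}],
        case ets:select_replace(Tab, MatchSpec) of
            1 -> ok;
            0 -> {error, conflict}
        end.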


An HTTP frontend was also started: https://gitlab.com/benoitc/openkvs-http

I am revisiting all of these projects these days; time is limited sometimes :) Maybe we can find a common goal.

Benoît

Led

Dec 22, 2021, 9:43:36 AM
to Erlang-Questions Questions
We have successfully used https://github.com/jr0senblum/jc, with an additional add-on for gen_server and memcached protocols and lazy TTL reaping (the built-in TTL engine is useless in multi-master mode), as a fast distributed KV (session) store.
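
The lazy reaping itself is trivial: expiry is checked on read, so no background TTL engine has to agree across masters. A sketch of the idea (not jc's actual code):

    fetch(Tab, Key) ->
        Now = erlang:monotonic_time(second),
        case ets:lookup(Tab, Key) of
            [{Key, Value, ExpiresAt}] when ExpiresAt > Now ->
                {ok, Value};
            [{Key, _, _}] ->
                %% present but expired: reap lazily, on access
                ets:delete(Tab, Key),
                not_found;
            [] ->
                not_found
        end.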

--
Led.

Michael Truog

Dec 22, 2021, 5:56:16 PM
to Roberto Ostinelli, erlang-q...@erlang.org
I added cloudi_crdt for internal (Erlang/Elixir/etc.) CloudI services to have an in-memory distributed KV database.  It uses a POLog CRDT with the data stored locally in an Erlang map (reads access the local Erlang map).  There is additional functionality (bootstrap and clean_vclocks) to ensure fault-tolerance problems are handled (e.g., service processes crashing, netsplits, etc.).

The configuration does need to know the number of nodes that will be used, though a node can be replaced without problems.  The amount of messaging with a POLog means that your node count is best kept low unless you have fast hardware (so use 4 nodes and think twice before trying 64).  There is basic use in cloudi_service_request_rate and more complex use in cloudi_service_funnel.  The cloudi_service_request_rate service loadtests the CloudI service request throughput that can be sustained without any timeouts.  The cloudi_service_funnel service can group multiple service requests into a single service request as a proxy, so it could be used for a distributed fault-tolerant cron setup (with cloudi_service_cron).  The relevant links are below:

https://github.com/CloudI/CloudI/blob/develop/src/lib/cloudi_core/src/cloudi_crdt.erl
https://github.com/CloudI/CloudI/blob/develop/src/lib/cloudi_service_request_rate/src/cloudi_service_request_rate.erl
https://github.com/CloudI/CloudI/blob/develop/src/lib/cloudi_service_funnel/src/cloudi_service_funnel.erl
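
The core shape of op-based replication is simple, even though cloudi_crdt does much more (the POLog also tracks causality with vclocks, which this toy version omits). A sketch of "apply each operation at most once to a local map":

    -module(op_replica).
    -export([new/0, apply_op/2, read/2]).

    new() ->
        #{data => #{}, seen => sets:new()}.

    %% Op :: {OpId, put, Key, Value} | {OpId, delete, Key, undefined}
    apply_op({OpId, _, _, _} = Op, #{seen := Seen} = R) ->
        case sets:is_element(OpId, Seen) of
            true  -> R;   %% duplicate delivery: ignore
            false -> do_apply(Op, R#{seen := sets:add_element(OpId, Seen)})
        end.

    do_apply({_, put, Key, Value}, #{data := D} = R) ->
        R#{data := D#{Key => Value}};
    do_apply({_, delete, Key, undefined}, #{data := D} = R) ->
        R#{data := maps:remove(Key, D)}.

    %% reads never leave the local replica
    read(Key, #{data := D}) ->
        maps:find(Key, D).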

Best Regards,
Michael