About Riak core lite project

Tak

Feb 21, 2020, 9:42:02 AM

to gsoc-erlef

Hello!

I am interested in participating in the riak-core lite project #1 idea, which is to implement a dynamo-style KV-Store on top of riak-core lite.

I have one question.

The project description says to create a reference implementation of a KV store, but how is that different from the one in this tutorial?

I know that the tutorial uses the full riak-core and that what we want is built on riak-core-lite.

But what would be the main difference between implementing a KV store on top of riak-core and on top of riak-core-lite?

Mariano Guerra

Feb 24, 2020, 9:32:59 AM

to gsoc-erlef

hi, thanks for your interest in this project!

The main difference is that riak core lite is a modern, simplified version of riak_core that lets applications built on top of it stay up to date. Because it is pure Erlang, those applications are also easier to set up and run on more architectures and operating systems.

The idea is to try to exercise all the APIs exposed by riak core, and have a reference implementation that is documented and maintained to test riak core lite changes, detect regressions, run benchmarks and use as the default starting point to try new ideas and show how different parts of riak core lite are used in practice.

Also it will be used as the foundation for other projects that will test this reference implementation (and riak core by extension) by using different testing methodologies.

It would be nice to have 2 implementations, one in Erlang and one in Elixir, that are as close as possible to each other in terms of features and architecture.

The following is a list of features that can be implemented in order. Each point leaves a useful service behind, and the more that can be implemented the better.

GET, PUT, DELETE with bucket support with ets backend

Handoff

List buckets

List keys in a bucket

HTTP API

Pluggable persistence backends, add dets support

Add leveled support (https://github.com/martinsumner/leveled)

Redis API for commands that make sense (https://github.com/marianoguerra-atik/ameo)

Change feed subscription for keys (PUB/SUB)

Quotas and eviction

Bucket permissions (requires authentication/authorization)

Alternatively, you could implement something different, like a message queue, a pubsub (MQTT, Kafkaish) or a time series db service. We can discuss alternatives that show how riak core can be used to implement other kinds of services.
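
As a rough illustration of the first and sixth items in the list above (bucket-aware GET, PUT and DELETE behind a pluggable backend), here is a minimal sketch. It is written in Python purely for illustration; the real project would be Erlang or Elixir. The `Backend` interface and `MemoryBackend` class are hypothetical names, not part of riak_core_lite, and the dictionary only stands in for an ets table.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class Backend(Protocol):
    """Hypothetical storage interface that ets, dets or leveled could each implement."""

    def get(self, bucket: str, key: str) -> Optional[bytes]: ...
    def put(self, bucket: str, key: str, value: bytes) -> None: ...
    def delete(self, bucket: str, key: str) -> bool: ...
    def list_buckets(self) -> List[str]: ...
    def list_keys(self, bucket: str) -> List[str]: ...


class MemoryBackend:
    """In-memory stand-in for an ets table keyed by (bucket, key)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], bytes] = {}

    def get(self, bucket: str, key: str) -> Optional[bytes]:
        return self._table.get((bucket, key))

    def put(self, bucket: str, key: str, value: bytes) -> None:
        self._table[(bucket, key)] = value

    def delete(self, bucket: str, key: str) -> bool:
        # Report whether something was actually removed.
        return self._table.pop((bucket, key), None) is not None

    def list_buckets(self) -> List[str]:
        # Buckets exist implicitly: a bucket is listed while it holds a key.
        return sorted({b for (b, _k) in self._table})

    def list_keys(self, bucket: str) -> List[str]:
        return sorted(k for (b, k) in self._table if b == bucket)
```

Because callers only depend on the five methods, a dets- or leveled-backed class with the same methods could be swapped in without touching the request-handling code.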

Tak

Feb 25, 2020, 9:32:35 AM

to gsoc-erlef

Hi Mariano. Thanks for the reply!

I forgot to introduce myself.

I am Riki and I am a sophomore at the University of Tokyo.

I have experience playing around with Basho products including riak core (I did your tutorial as well!) and I know a bit about them.

Here are my thoughts about the ideas that you brought up.

GET, PUT, DELETE with bucket support with ets backend
Handoff

HTTP API

I think these are pretty much explained in your riak core tutorial, so they should not be difficult.
I am not sure how closely this is related, but I have one thought about the consistency model. What behaviour do you think would be good if nodeA and nodeB update the same data at the same time? E.g. the following operations happen in order:
- 1. nodeA reads X,
- 2. nodeB reads X,
- 3. nodeA updates X to Y,
- 4. nodeB updates X to Z.
Should the resulting value then be X, Y, Z, or undefined?

List buckets
Bucket permissions (requires authentication/authorization)

What kind of data structure are you thinking of for keeping the bucket names? A dictionary? A table? Either way, if they are kept as one global piece of data, I think bucket creation has to be atomic, meaning that no node should be able to create a bucket while another is modifying the global data. E.g. Riak CS (another Basho product) uses something called "Stanchion" to serialize bucket-creation requests (which lowers availability because Stanchion runs on one server, but makes it very easy to list the buckets).

List keys in a bucket

A simple implementation would be to use coverage to make all vnodes reply with their entries in ets. However, I think this would involve a lot of network access, which probably makes it slow. I reckon that is the reason why Riak says this way of listing keys should not be used in production, only in settings such as tests (Ref). This is just a random thought, but if riak-core-lite were used as the backend of a FUSE (user-space file system) mount, where something like "ls Bucket" is called very often, listing keys this way might not be a good idea. One idea might be to keep a key-value entry that holds all the keys inside a bucket. (This might need some atomic operation when adding a KV pair to a bucket.) But again, for simplicity, using coverage for listing keys would be fine for a reference implementation.
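
The two approaches to listing keys described above can be contrasted in a small sketch. This is Python for illustration only, not riak_core_lite code: `NUM_VNODES`, `vnode_for`, `list_keys_coverage` and `list_keys_indexed` are hypothetical names, the per-vnode dictionaries stand in for ets tables, and everything runs in one process, so the network cost of a real coverage query is only indicated in comments.

```python
import hashlib
from typing import Dict, List, Set, Tuple

NUM_VNODES = 8  # illustrative partition count, not a riak_core_lite default

# Each vnode owns a private table, standing in for a per-vnode ets table.
vnodes: List[Dict[Tuple[str, str], bytes]] = [{} for _ in range(NUM_VNODES)]

# Alternative approach: an explicit index of keys per bucket, updated on every write.
bucket_index: Dict[str, Set[str]] = {}


def vnode_for(bucket: str, key: str) -> int:
    """Pick the owning vnode by hashing the bucket/key pair."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big") % NUM_VNODES


def put(bucket: str, key: str, value: bytes) -> None:
    vnodes[vnode_for(bucket, key)][(bucket, key)] = value
    # In a real cluster this second write would have to be atomic with the first.
    bucket_index.setdefault(bucket, set()).add(key)


def list_keys_coverage(bucket: str) -> List[str]:
    """Ask every vnode for its matching keys and merge the replies."""
    found: Set[str] = set()
    for table in vnodes:  # one request per vnode in a real deployment
        found.update(k for (b, k) in table if b == bucket)
    return sorted(found)


def list_keys_indexed(bucket: str) -> List[str]:
    """Read the maintained index instead of contacting every vnode."""
    return sorted(bucket_index.get(bucket, set()))
```

The coverage version needs no extra bookkeeping on writes but touches every vnode on each listing. The indexed version makes listing a single lookup, at the price of a second write that must stay consistent with the first.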

Pluggable persistence backends, add dets support
Add leveled support (https://github.com/martinsumner/leveled)

I think the implementation of Riak KV would be a good reference for this. (It uses leveled/Bitcask.)

Redis API for commands that make sense (https://github.com/marianoguerra-atik/ameo)

Quotas and eviction

I am not sure. What are quotas and eviction?

Change feed subscription for keys (PUB/SUB)

MQTT in riak core sounds interesting.

I have one problem with my environment setup.

I am currently using macOS and I tried to compile your tanodb project, but it failed. (I think this is because Catalina changed some things around the /usr/include/ libraries.)

So I decided to compile it in an Ubuntu 18.04 virtual environment.

I installed Erlang OTP 20 with apt-get install, along with rebar 3.1, and when I run "make" it prints

[info] Application x started on node y

...

[warning] No ring file available.

and crashes.

I am not sure what is wrong with my environment.

Do you have any idea how to fix it, or do you know of any easy tools for setting up the environment, such as a Vagrantfile for riak core?

Albert Schimpf

Feb 26, 2020, 5:52:12 AM

to gsoc-erlef

Hi Riki!

I think I can answer some of your questions.

> I am not sure how closely this is related, but I have one thought about the consistency model. What behaviour do you think would be good if nodeA and nodeB update the same data at the same time?

This is up to the application built on top and that's the hard part :)

There are many approaches, just a quick selection:

- Propagate the conflict to the user and let them decide how to resolve it
- Decide on a consistency model to resolve these conflicts:
  - Strong consistency, which requires extensive synchronization
  - Eventual consistency, e.g. a last-writer-wins global timestamp mechanism, where the 'latest' state is used after a network split
  - Causal consistency, e.g. with vector clocks and CRDTs, where the new state is determined by how the merge of X and Y is defined. The merge results in the same (materialized) state in both replicas and is always non-conflicting

Or any other mechanism you want to try out and implement. There is quite a lot of research on consistency models for databases, and there are many different approaches to this specific problem.
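
To make two of the approaches above concrete, here is a minimal sketch of last-writer-wins merging and of vector-clock comparison, applied to the read/read/write/write scenario from the question. It is Python for illustration only and not how riak_core_lite or any particular database implements these ideas. All function names are hypothetical, and the starting clock for X assumes that X was originally written once by nodeA.

```python
from typing import Dict, Tuple

VClock = Dict[str, int]  # node name -> number of updates seen from that node


def lww_merge(a: Tuple[float, str], b: Tuple[float, str]) -> Tuple[float, str]:
    """Last-writer-wins: keep the (timestamp, value) pair with the larger timestamp.

    Ties go to the first argument here; a real system needs a deterministic tie-break.
    """
    return a if a[0] >= b[0] else b


def increment(clock: VClock, node: str) -> VClock:
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"  # neither saw the other: a conflict to resolve or surface


# The scenario from the thread: both nodes read X, then each writes on its own.
x_clock: VClock = {"nodeA": 1}         # assumption: X was written once by nodeA
y_clock = increment(x_clock, "nodeA")  # nodeA updates X to Y
z_clock = increment(x_clock, "nodeB")  # nodeB updates X to Z
```

Here `compare(y_clock, z_clock)` reports the two writes as concurrent, which is the point where an application either hands both values to the user or applies a merge rule. With `lww_merge`, the write carrying the later timestamp simply replaces the other, so one of the two updates is silently dropped.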

> I am currently using macOS and I tried to compile your tanodb project

> I installed erlang OTP 20 with apt-get install with rebar 3.1 and when I "make" it, it says

The TanoDB project is not yet ported to riak_core_lite. There may be small subtleties in the build configuration that need to be changed to make the TanoDB project work.

If you want to try it out, you could start with the 'get started' tutorial: https://riak-core-lite.github.io/

Furthermore, I think the latest Mac update broke something for riak_core users (concerning eleveldb), but that does not affect riak_core_lite, as it is pure Erlang.
Also, riak_core_lite does not target OTP 20; try OTP > 21 and, if possible, use the rebar version recommended in the repository (3.13.0).

> easy tools to set up the environment such as Vagrantfile for riak core

The goal of riak_core_lite is that a recent OTP version and rebar are all you need to set up your environment. It should work on macOS without a virtual environment.

Tak

Feb 26, 2020, 8:35:52 AM

to gsoc-erlef

Hey Albert!

Thanks for your reply.

I really like thinking about consistency models and how they are implemented. (This paper that I read in computer architecture class was fascinating.)

I saw the video on the AntidoteDB page, which implements causal consistency. Very interesting!

As for this project, I would like to make a reference implementation of these main consistency models so that users can choose among them according to their needs.

- strong consistency model

- eventual consistency model

- causal consistency model

What do you think about it?

And also how should I start? Should I start coding?

By the way, I was able to compile riak-core-lite on my mac thanks to your help!

Albert Schimpf

Feb 27, 2020, 4:46:38 AM

to gsoc-erlef

I would recommend collecting your requirements before you start to code. Some pointers:

Consistency model

Transactions?

Dynamic scaling
Replication, especially in conjunction with the consistency model
Client interaction?
What should happen in an overload situation (one node or whole cluster overloaded)

Message losses
Node crashes
...

If you want to provide multiple consistency models, try to select and focus on one first.

You can implement a simple in-memory key-value store with riak_core_lite with dynamic scaling and play around with that to get an idea of what you need to do.
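
To give a feel for what "in-memory key-value store with dynamic scaling" involves, here is a toy sketch of a fixed-size partition ring whose ownership is recomputed when a node joins. It is Python for illustration only and does not use riak_core_lite. `RING_SIZE`, `Cluster` and the round-robin `assign` function are assumptions made for this example; real ring claim algorithms and handoff are considerably more involved.

```python
import hashlib
from typing import Dict, List, Optional

RING_SIZE = 16  # number of partitions; stays fixed while nodes come and go


def partition_for(key: str) -> int:
    """Map a key to a partition by hashing it (simplified to a modulo here)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def assign(nodes: List[str]) -> Dict[int, str]:
    """Round-robin partition ownership; a simplification of a real claim algorithm."""
    return {p: nodes[p % len(nodes)] for p in range(RING_SIZE)}


class Cluster:
    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.owners = assign(self.nodes)
        # One dict per partition stands in for each vnode's in-memory table.
        self.data: Dict[int, Dict[str, str]] = {p: {} for p in range(RING_SIZE)}

    def put(self, key: str, value: str) -> None:
        self.data[partition_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data[partition_for(key)].get(key)

    def owner_of(self, key: str) -> str:
        return self.owners[partition_for(key)]

    def join(self, node: str) -> List[int]:
        """Add a node and return the partitions whose owner changed (to hand off)."""
        self.nodes.append(node)
        new_owners = assign(self.nodes)
        moved = [p for p in range(RING_SIZE) if new_owners[p] != self.owners[p]]
        self.owners = new_owners
        return moved
```

Because keys map to partitions and only partitions map to nodes, adding a node changes which node serves a key without changing which partition holds it. The list returned by `join` is the set of partitions whose data would have to be handed off to a new owner.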

Then start thinking about your scope and what you want to do in your proposal.

Tak

Feb 27, 2020, 12:36:49 PM

to gsoc-erlef

Hey Albert!

I started implementing a simple key value store in riak-core-lite. Here is my code.

In the meantime, I will start writing a draft of the proposal covering the consistency models and the concepts that you raised.

I would appreciate it if you and the other mentors could review my draft via mail or Google Docs, because I feel this place is too open to share the proposal.