Starting a Riak cluster using Docker that uses the Leveldb backend

w...@resilia.nl

Feb 27, 2019, 3:31:52 PM
to riak-users
Hello everyone,

I am trying to set up a Riak cluster using a `docker-compose.yml` file. It works with the bitcask backend, but when switching to leveldb, the nodes never start; they seem to time out.

I have no idea how to properly debug this. Can someone tell me what I am doing wrong?

The `docker-compose.yml`-file and the `riak.conf` that it uses can be found here:


Thank you!

~Wiebe-Marten Wijnja / Qqwy

bryanhu...@gmail.com

Feb 28, 2019, 5:06:24 AM
to riak-users
Wiebe-Marten,

The file must have been saved in a Windows editor - it contains a lot of Windows newline characters - https://stackoverflow.com/a/5843561/9958902
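
A quick way to check a config file for Windows line endings before mounting it into a container is sketched here in Python. The function names and the sample bytes are illustrative assumptions, not taken from the repository.

```python
def count_crlf(data: bytes) -> int:
    """Count Windows-style line endings (CR followed by LF)."""
    return data.count(b"\r\n")


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to plain LF."""
    return data.replace(b"\r\n", b"\n")


# Illustrative sample; in practice you would read riak.conf in binary mode,
# e.g. open("riak.conf", "rb").read().
sample = b"storage_backend = leveldb\r\nplatform_data_dir = /var/lib/riak\r\n"
fixed = normalize_newlines(sample)

print(count_crlf(sample), count_crlf(fixed))
```

Reading the file in binary mode matters here, because text mode may translate line endings silently.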

I've created a PR against your repo with the fixed file - https://github.com/Qqwy/riak-cluster-test/pull/1

I haven't tested anything else, so there may still be issues. Those Basho Riak Docker images are very old at this stage; we may create our own if there's sufficient interest. Coordinating cluster setup with docker-compose is quite difficult; Kubernetes seems more suitable for Riak clusters.

Regards,

Bryan Hunt

w...@resilia.nl

Feb 28, 2019, 6:19:02 AM
to riak-users
Thank you for your response!

Very odd, since that file was taken from _within_ the Docker images (running `docker-compose run coordinator cat /etc/riak/riak.conf > riak.conf`).
In any case, even with the newlines replaced, the combined image still fails to start in the same way.

Do note that when using `bitcask` instead of `leveldb` it starts without a problem. Maybe the problem is related to some folders where `leveldb` writes its data not being available?
How can something like this be debugged?

Thank you!

~W-M

Martin Sumner

Feb 28, 2019, 6:28:11 AM
to w...@resilia.nl, riak-users
In the riak.conf in your repo the storage_backend is set to multi, not to leveldb (https://github.com/Qqwy/riak-cluster-test/blob/master/riak.conf#L335), so you should set this to leveldb.

By default leveldb will be persisted to a leveldb folder within your platform_data_dir (which you have set to `platform_data_dir = /var/lib/riak`).

If you look in your log folder (in your riak.conf you have this set to `platform_log_dir = /var/log/riak`), you should see evidence of why it failed. Normally a failure to start will result in a crash log being generated.
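
To double-check which values a given riak.conf actually sets, a minimal Python sketch along these lines may help. It assumes the simple `key = value` layout with `#` comment lines; the parser, the sample text, and the `leveldb` subfolder name (taken from the description above) are illustrative rather than an official tool.

```python
import posixpath


def parse_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Illustrative sample mirroring the settings mentioned in this thread.
sample = """\
## Specifies the storage engine used for Riak's key-value data
storage_backend = multi
platform_data_dir = /var/lib/riak
platform_log_dir = /var/log/riak
"""

conf = parse_conf(sample)
leveldb_dir = posixpath.join(conf["platform_data_dir"], "leveldb")

print(conf["storage_backend"])
print(leveldb_dir)
print(conf["platform_log_dir"])
```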

--
You received this message because you are subscribed to the Google Groups "riak-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to riak-users+...@googlegroups.com.
To post to this group, send email to riak-...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/riak-users/ba474b4d-06c1-4551-a32f-96712337081d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

w...@resilia.nl

Feb 28, 2019, 6:46:49 AM
to riak-users
I have tried it with both `multi` and `leveldb`, with the same results.
Checking `/var/log/riak/erlang.log.1` of the coordinator node, we find:

$ sudo docker-compose exec coordinator tail -f /var/log/riak/erlang.log.1

=====
===== LOGGING STARTED Thu Feb 28 11:38:41 UTC 2019
=====
config
is OK
-config /var/lib/riak/generated.configs/app.2019.02.28.11.38.45.config -args_file /var/lib/riak/generated.configs/vm.2019.02.28.11.38.45.args -vm_args /var/lib/riak/generated.configs/vm.2019.02.28.11.38.45.args
Exec:  /usr/lib/riak/erts-5.10.3/bin/erlexec -boot /usr/lib/riak/releases/2.2.3/riak               -config /var/lib/riak/generated.configs/app.2019.02.28.11.38.45.config -args_file /var/lib/riak/generated.configs/vm.2019.02.28.11.38.45.args -vm_args /var/lib/riak/generated.configs/vm.2019.02.28.11.38.45.args              -pa /usr/lib/riak/lib/basho-patches -- console
Root: /usr/lib/riak
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_api,{{shutdown,{failed_to_start_child,\"pb://\\"172.17.0.6\\":8087\",{bad_return_value,{error,eaddrnotavail}}}},{riak_api_app,start,[normal,[]]}}}"}

Crash dump was written to: /var/log/riak/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_api,{{shutdown,{failed_to_start_child,"pb://\"172.17.0.6\":8087",{bad_return_value,{error,eaddrnotavail}}}},{riak_api_

So it seems it tries to connect to one of the child nodes(?) from the coordinator, and because it fails to do so, it fails to start?

Hmm...

Martin Sumner

Feb 28, 2019, 7:09:41 AM
to w...@resilia.nl, riak-users
That may indicate that there's already something listening on the IP/port 172.17.0.6:8087, or that it is not a local IP. I don't think this is trying to connect; it is trying to start a process locally (that is the child referred to) that wants to own that IP/port and reports that it can't due to eaddrnotavail.
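
The distinction between "already in use" and "not a local address" can be probed with a small bind test. The following Python sketch is only an illustration of that idea (the helper name `try_bind` is hypothetical); it attempts to bind a TCP socket and reports the symbolic error name if the bind fails.

```python
import errno
import socket


def try_bind(ip: str, port: int = 0) -> str:
    """Try to bind a TCP socket to ip:port; return 'ok' or the errno name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return "ok"
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EADDRNOTAVAIL.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


# Port 0 asks the OS for any free port, so loopback should normally bind.
print(try_bind("127.0.0.1"))
```

Assuming a Python interpreter is available inside the coordinator container, calling `try_bind("172.17.0.6", 8087)` there would typically return `EADDRNOTAVAIL` if that address is not assigned to any interface in the container, and `EADDRINUSE` if another process already holds the port.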

w.m.w...@student.rug.nl

Mar 7, 2019, 7:42:10 AM
to riak-users
I am fairly certain that 172.17.0.6 is one of the (local, virtual, docker-assigned) IPs of one of the other nodes. I have no idea how to investigate or fix this further.

What would be another way to run a group of Riak-nodes, such that its distributed nature w.r.t. how it would interact with our application and what happens in the case of partitions can be tested?

Thank you very much for your hard work and all your help, Martin!

~Marten/Qqwy

Martin Sumner

Mar 7, 2019, 8:10:57 AM
to w.m.w...@student.rug.nl, riak-users
You can run `make devrel` to create a multi-node cluster on a single machine.

The `riak_test` library is where we test things like partitions on a single node.  This is how riak_test creates a partition - https://github.com/basho/riak_test/blob/develop-2.9/src/rt.erl#L544-L550

There are a number of tests (in the tests folder of riak_test) that call `rt:partition/2` - you can search through the repo for examples.  

Martin

w...@resilia.nl

Apr 2, 2019, 6:35:38 AM
to riak-users
It turns out that the reason it did not work is that the Riak docker file has a very peculiar way of including custom configuration: it uses a `sed`-based script to grab all lines from a `user.conf` (which you are supposed to have mounted in the `/etc/riak` directory in your docker configuration) and adds those lines, splitting them by the `=`. This means that replacing the `riak.conf` file itself will probably not work right.

So with that change, I was able to configure Riak running as a docker cluster successfully. The nodes now also start up properly.
The only thing I'd like to resolve at some point is that this runs Riak KV version 2.1.7-0-gbd8e312, which is multiple versions out of date.
Obviously running the 2.9 pre-release would be much nicer :-).
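
For readers unfamiliar with the `user.conf` approach described above, here is a rough Python sketch of the general idea: each `key = value` line from `user.conf` either overrides the matching key in `riak.conf` or is appended. This is an assumption-laden illustration, not the image's actual `sed` script; the function name and the sample settings are hypothetical.

```python
def apply_user_conf(riak_conf: str, user_conf: str) -> str:
    """Merge 'key = value' lines from user_conf into riak_conf text."""
    lines = riak_conf.splitlines()
    for raw in user_conf.splitlines():
        key, sep, value = raw.partition("=")
        key, value = key.strip(), value.strip()
        # Skip blank lines, comments, and lines without an '='.
        if not sep or not key or key.startswith("#"):
            continue
        new_line = f"{key} = {value}"
        for i, existing in enumerate(lines):
            if existing.partition("=")[0].strip() == key:
                lines[i] = new_line  # override the existing setting
                break
        else:
            lines.append(new_line)  # key not present yet, so append it
    return "\n".join(lines) + "\n"


# Illustrative inputs only.
base = "storage_backend = bitcask\nplatform_data_dir = /var/lib/riak\n"
user = "storage_backend = leveldb\nleveldb.maximum_memory.percent = 50\n"

merged = apply_user_conf(base, user)
print(merged, end="")
```

The practical consequence is the one noted above: overrides belong in `user.conf`, since a replaced `riak.conf` may be rewritten by the startup script.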

Thanks for your help!