Inherited Riak DB, want to dump it out with goriakpbc


Ken GoNoob

Dec 26, 2013, 3:14:17 PM
to golan...@googlegroups.com
Using goriakpbc, I have gotten an archive copy of a DB that has a bunch of unknown data in it. I can get all the keys out of a particular bucket, but I would like to get all of the bucket names first, to see how many there are and what their names are.

Jesper Louis Andersen

Dec 27, 2013, 7:55:06 AM
to Ken GoNoob, golang-nuts
You will probably have to add the ListBuckets call to the goriakpbc client; the protobuf API gained this call in a fairly late version. Also note that the documentation somewhat recommends against making a list buckets call: satisfying it requires the involvement of every node in the cluster, and the same can be said for the list keys call. In many ways, Riak works like a distributed venti(8) store of Plan 9 fame, so you need to be wary of operations that work on the data as a whole.

If, on the other hand, your database is rather small in size, then your option is viable. You probably want to add the missing code to the client and then list the keys in question. From a quick skim, the driver looks fairly complete, so this should not take a whole lot of time.
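
For illustration, a rough sketch of what a small dump tool could look like once such a call exists. This is hypothetical: ListBuckets is exactly the method you would be adding, and the ListKeys/Name signatures are assumptions about the driver's style, not confirmed goriakpbc API.

// Hypothetical dump tool. ListBuckets is the call to be added (wrapping
// Riak's RpbListBucketsReq protobuf message); ListKeys and Name are assumed
// to exist in roughly this shape -- check the driver's actual API.
package main

import (
	"fmt"
	"log"

	riak "github.com/tpjg/goriakpbc"
)

func main() {
	if err := riak.ConnectClient("127.0.0.1:8087"); err != nil {
		log.Fatalf("connect: %v", err)
	}

	buckets, err := riak.ListBuckets() // the call to be added
	if err != nil {
		log.Fatalf("list buckets: %v", err)
	}
	fmt.Printf("%d buckets\n", len(buckets))

	for _, b := range buckets {
		keys, err := b.ListKeys()
		if err != nil {
			log.Fatalf("list keys in %s: %v", b.Name(), err)
		}
		fmt.Printf("%s: %d keys\n", b.Name(), len(keys))
	}
}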



Ken MacDonald

Dec 27, 2013, 7:36:56 PM
to Jesper Louis Andersen, golang-nuts
Hi Jesper,
Thanks, I'm just trying to have a tool to explore the Riak DB I inherited; it's not likely to be a production thing, except in (maybe) really exceptional circumstances. The DB is relatively small for now, and we aim to have other methods in place eventually to keep track of related data elsewhere. I'm looking to access some of the extant data to see if the new data we are creating is compatible. Liking Riak a lot except for the lack of tools :-).

Frank Schröder

Dec 28, 2013, 4:19:50 AM
to golan...@googlegroups.com, Jesper Louis Andersen
Riak tool support is not very good when it comes to inspecting and extracting the data in the store. Basically, you cannot ask Riak what data it has stored, since in practice you can list neither buckets nor keys; you cannot extract the data if the number of keys becomes too big; and you cannot search the data if you get creative about your data encoding (i.e. something other than JSON, e.g. gzipped JSON, since data size has an impact on performance), because a) map-reduce jobs are slow and b) map-reduce jobs in JS can only decode JSON.

So, yes, Riak is interesting, but I suggest you familiarise yourself with the constraints and decide carefully whether you can live with them. Tool support is not going to improve in the near future, since some of the issues are inherent properties of the data store.

Charl Matthee

Dec 30, 2013, 1:25:22 AM
to golan...@googlegroups.com
One can list keys (http://docs.basho.com/riak/latest/dev/references/http/list-keys) and buckets (http://docs.basho.com/riak/1.3.0/references/apis/http/HTTP-List-Buckets/) via the HTTP API, but in both cases Basho recommends against doing this in production, as it requires traversing all keys stored in the cluster.

In practice, the impact of running these depends on the size of your cluster nodes and the workload they are performing.

My suggestion would be to use the HTTP API instead and ask Riak for JSON documents (by specifying the 'Content-Type: application/json' header). This way you can leverage Go's http and json packages.
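
For example, a minimal sketch using only the standard library, assuming a node listening on the default HTTP port 8098:

// List all bucket names via Riak's HTTP API and decode the JSON response.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:8098/buckets?buckets=true")
	if err != nil {
		log.Fatalf("list buckets: %v", err)
	}
	defer resp.Body.Close()

	// Riak answers with {"buckets":["bucket1","bucket2",...]}.
	var out struct {
		Buckets []string `json:"buckets"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatalf("decode: %v", err)
	}

	fmt.Printf("%d buckets:\n", len(out.Buckets))
	for _, b := range out.Buckets {
		fmt.Println(b)
	}
}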

I also use the streaming query parameter (add '?keys=stream' to the URL when listing keys) to turn on HTTP 1.1 chunked transfer encoding. Just be aware that if you're collecting JSON documents, the chunks may not be split on serialized-object boundaries.
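
A sketch of consuming that stream, again assuming a local node and using "mybucket" as a placeholder bucket name. Riak sends a sequence of {"keys":[...]} objects, and since json.Decoder reads the byte stream directly, the chunk-boundary issue never comes up:

// Stream keys from a bucket using ?keys=stream and decode each
// {"keys":[...]} object as it arrives, regardless of chunk boundaries.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:8098/buckets/mybucket/keys?keys=stream")
	if err != nil {
		log.Fatalf("stream keys: %v", err)
	}
	defer resp.Body.Close()

	dec := json.NewDecoder(resp.Body)
	n := 0
	for {
		var chunk struct {
			Keys []string `json:"keys"`
		}
		if err := dec.Decode(&chunk); err == io.EOF {
			break
		} else if err != nil {
			log.Fatalf("decode: %v", err)
		}
		for _, k := range chunk.Keys {
			fmt.Println(k)
			n++
		}
	}
	fmt.Printf("%d keys total\n", n)
}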
