How to use range-based queries and links on Riak Data Types?


w...@resilia.nl

Mar 25, 2019, 4:12:00 AM
to riak-users
 
Hello there,

I recently started working on a Riak Ecto adapter to allow people in Elixir to easily store map-based data structures in Riak. (https://github.com/Qqwy/elixir_riak_ecto3)
The core functionality works; the main thing to add now is support for has-one, has-many, belongs-to (and potentially has-and-belongs-to-many) relationships. I can think of three ways to model these relationships in Riak:

1. Only use a `belongs_to` field on the inner data structure (so if a Message belongs to a User, it contains a `user_id`). We might then search for it using Solr.
2. Use the 'links' feature to create a bidirectional link (still have a `user_id` on the Message, but also `message_ids` in the links metadata)
3. Use an actual field (of type Set) in the outer Map CRDT containing references to all nested structures ('User.message_ids').

(1) might be the simplest and most natural to use, but the fact that we have to wait approximately a second before Solr is able to see an inserted/altered data structure is a deal breaker.
(2) on the other hand might work fine for normal KV-values, but the documentation is unclear on how to use Links or 2i (secondary indexes) on CRDTs (Riak Data Types) instead of normal values.
So unless there is a better way to use (1) or (2), we're probably stuck with (3).


However, there is another problem: in our current application, which will be the first consumer of the RiakEcto3 library, we store many 'Message' objects whose primary key (== the Riak key) is a Snowflake (so they are roughly causally ordered). To allow people to paginate through these messages, we'd like to perform a range-based query on this primary ID. Again, the fact that Solr takes a second before seeing the structures might pose a problem here (and because we are using the primary key anyway, using plain Riak, if possible, is probably a lot faster). Is this possible from the get-go, or do we also need to add a 'secondary index' for the 'primary index' to perform range-based queries? Can these kinds of queries and indexes be used together with Riak Data Types?



Thank you very much for your help!

~Wiebe-Marten / Qqwy

Martin Sumner

Mar 25, 2019, 7:48:04 AM
to w...@resilia.nl, riak-users
My understanding is that we cannot currently use 2i (or links, which are deprecated anyway) with CRDTs in Riak. I believe this is due to the lack of a defined way of merging metadata in an eventually consistent way.

If you wanted to use CRDTs and 2i, I think you would need to manage the merging of the object yourself (i.e. use a normal Riak object and put the CRDT logic in your application), handling the merge of the metadata in your application as well as the merge of the CRDT-based values. This change would mean you would lose the flexibility of the CRDT API.
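To make the "manage the merge yourself" idea concrete, here is a minimal shell sketch of the simplest possible case: a grow-only set of message IDs kept in a normal Riak object, one ID per line. The representation, the sample IDs, and the function name `merge_gset` are assumptions for illustration only; nothing here is provided by Riak, and supporting removals would need tombstones or a richer structure.

```sh
#!/bin/sh
# Hypothetical helper: merge two sibling values of a grow-only set.
# Each sibling is a newline-separated list of IDs. The merge is a set union,
# which is commutative, associative and idempotent, so siblings can be
# merged in any order and any number of times with the same result.
merge_gset() {
  printf '%s\n%s\n' "$1" "$2" | sed '/^$/d' | LC_ALL=C sort -u
}

# Two conflicting versions (siblings) of the same object (made-up data).
sibling_a='msg-001
msg-002'
sibling_b='msg-002
msg-003'

merge_gset "$sibling_a" "$sibling_b"
```

The application would have to apply an equivalent, deterministic merge to the 2i entries it writes alongside the value, which is exactly the part Riak does not define for CRDTs.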

If you want a range query, then this can only be done with secondary indexes or Solr at present.  There was a mapfold feature that was squeezed out of Riak 2.9.0 and postponed to 2.9.1 - this feature allows you to fold over a range of object keys, passing in a function and an accumulator, where the function is applied to the object metadata only.  It isn't a huge piece of work to complete mapfold, but it might be several months before it is in a release candidate.  However, if the object is a CRDT, there is no useful metadata for mapfold to take advantage of.

There are various other features we have discussed within the Riak community to extend and improve the flexibility and usability of CRDTs, and lots of work done by Russell Brown that has been left on the shelf.  The issue has not been the potential usefulness of the features, but the difficulty of getting funding from any of the current Riak customers to push this as a priority.  I do think it would be good for Riak to have better features available to help with building an Ecto adapter, but actually making that happen is a different matter.

--
You received this message because you are subscribed to the Google Groups "riak-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to riak-users+...@googlegroups.com.
To post to this group, send email to riak-...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/riak-users/47f3db62-7e5e-4ac2-a253-7bd684516f7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

w...@resilia.nl

Mar 28, 2019, 11:34:40 AM
to riak-users
Thank you for your reply!

It turns out that while setting an index on a CRDT is impossible, it definitely is possible to use the `$key` 'secondary' index that LevelDB will auto-create when inserting values (even if they are CRDTs).
This means that we can fetch range-based (primary) keys using the 2i-syntax, using e.g.
:riakc_pb_socket.get_index(pid, {bucket_type, bucket}, "$key", "LEXICOGRAPHICAL_LOWER_BOUND", "LEXICOGRAPHICAL_UPPER_BOUND")
in Elixir or Erlang, or
curl 'localhost:8098/types/BUCKET_TYPE/buckets/BUCKET/index/$key/LEXICOGRAPHICAL_LOWER_BOUND/LEXICOGRAPHICAL_UPPER_BOUND'
using cURL,

which will find results right away rather than only after 'about a second' which would be the case for Solr.
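One caveat with the `$key` query above: the bounds are compared lexicographically, so numeric IDs written as decimal strings only sort in numeric order when they all have the same width (unpadded, "9" sorts after "10"). The sketch below assumes the Riak keys are Snowflake IDs stored as zero-padded 20-digit decimal strings; the thread does not say how the keys are encoded, and the function names, bucket type, and bucket are hypothetical. It only builds the query URL. `max_results` is the 2i pagination parameter; the response then carries a `continuation` value that can be passed back as a `continuation` query parameter to fetch the next page.

```sh
#!/bin/sh
# Assumption: keys are Snowflake IDs as zero-padded 20-digit decimal strings,
# so that lexicographic order matches numeric order.
pad_key() {
  printf '%020d' "$1"
}

# Build the HTTP 2i range-query URL on the built-in $key index.
# Arguments: bucket type, bucket, lower ID, upper ID, page size.
key_range_url() {
  printf 'http://localhost:8098/types/%s/buckets/%s/index/$key/%s/%s?max_results=%s' \
    "$1" "$2" "$(pad_key "$3")" "$(pad_key "$4")" "$5"
}

# Print the URL for IDs 42..1000, 50 keys per page (placeholder names).
key_range_url messages_type messages 42 1000 50
echo

# To run the query against a local node:
#   curl "$(key_range_url messages_type messages 42 1000 50)"
```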

So I guess then both of my issues are solved:

1. For relations, store the relation data at both sides (as a plain key in a register at the `belongs_to` side, and as keys in a Set at the `has_many` side)
2. For searching on ranges of the primary key, use a 2i range query on the `$key` index.
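As a rough sketch of point 1, the two updates could look like this with Riak's HTTP data types API. The bucket type `maps`, the buckets `users` and `messages`, the field names, and the sample keys are all assumptions, not taken from RiakEcto3. The functions only build the JSON payloads and assume the keys contain no characters that need JSON escaping.

```sh
#!/bin/sh
# belongs_to side: the Message map stores the owning user's key in a register.
message_update_payload() {
  printf '{"update":{"user_id_register":"%s"}}' "$1"
}

# has_many side: the User map adds the message's key to a set.
user_update_payload() {
  printf '{"update":{"message_ids_set":{"add":"%s"}}}' "$1"
}

message_update_payload user-1
echo
user_update_payload 00000000000000000042
echo

# Sending both updates to a local node (hypothetical type and bucket names):
#   curl -XPOST http://localhost:8098/types/maps/buckets/messages/datatypes/00000000000000000042 \
#     -H 'Content-Type: application/json' -d "$(message_update_payload user-1)"
#   curl -XPOST http://localhost:8098/types/maps/buckets/users/datatypes/user-1 \
#     -H 'Content-Type: application/json' -d "$(user_update_payload 00000000000000000042)"
```

The two writes are independent rather than atomic, so a reader may briefly see one side of the relation without the other.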