Key space, value space and erlang interfaces


adr...@gmail.com

Feb 14, 2013, 11:00:11 AM
to scal...@googlegroups.com
Hello,

As I'm trying to build a db_hanoidb backend based on db_toke, I have some questions about keys, values and the interfaces to them.

In some stores (for example Redis and hanoidb) keys are arbitrary byte arrays, which correspond to Erlang's binary type (e.g. <<"binary">>). Value types vary.

Trying Scalaris through api_tx:req_list(Tlog, List), I've been able to use any Erlang term as a value, not only binaries (with binaries as keys).
In the Scalaris documentation (main.pdf, §4.1.1 "Supported types") I see:
Keys are always strings. In order to avoid problems with different encodings on different systems, we suggest to only use ASCII characters.
For values we distinguish between native, composite and custom types.

Q1: Are keys arbitrary arrays of ASCII characters? Or are they Erlang lists of 4 or 8 bytes with values ranging from 0 to 127?
Q2: Since arbitrary byte arrays are (easily?) universally ordered, can we use them as native keys to get an ordered map, without having to think about what those bytes encode, and leave that to application developers? Wouldn't that broaden the use of Scalaris?

Q3: I find it very cool that Scalaris has rich value types. I have successfully stored tuples natively (very useful for Erlang apps), even though this is not explicitly described in the manual and "The use of them is discouraged". I very much like the ability to store integers and floats, and the "add to number" API. Why not harden the non-Erlang APIs so that they do their conversions well, and let Erlang apps store useful Erlang terms? Or, except for specific cases like integer + add_to_number(), let every value be an arbitrary byte array and its API be app_to_binary() and binary_to_app()?

Q4: About implementing an alternative backend: as I understand it, api_tx:req_list(Tlog, List) is the Erlang client interface for apps, and the types there are governed by §4.4.1. My candidate backend stores only binaries (as far as I understand), hence the need for term_to_binary before writing to the store and binary_to_term after reading from it. But what are keys and values at the db_store.erl level, just before the backend API? Between api_tx:req_list(Tlog, List) and the backend there seems to be quite a walk through modules and types, and I feel lost in a maze ;-)

Q5: Are there any key space management hints/tips? Any feedback from real cases/apps?

I hope nobody needs aspirin after reading this :-)

Pierre M.

Nico Kruber

Feb 15, 2013, 5:40:20 AM
to scal...@googlegroups.com
On Thursday 14 Feb 2013 08:00:11 adr...@gmail.com wrote:
> Hello,
>
> as I'm trying to make a db_hanoidb backend from db_toke I have some
> questions about keys, values and the interfaces to them.
>
> In some stores (like for example Redis and hanoidb) keys are arbitrary byte
> arrays. This type is Erlang's <<"binary">>. Value types vary.
>
> Trying scalaris through api_tx:req_list(Tlog, List) I've been able to use
> any erlang term as value, not only binaries (with binaries as keys).
> In scalaris doc main.pdf §4.1.1 "Supported types" I see:
> Keys are always strings. In order to avoid problems with different
> encodings on different systems, we suggest to only use ASCII characters.
> For values we distinguish between native, composite and custom types.

Please note that this doc describes the client data types, i.e. those used in
the client APIs like api_tx.
-> Internally, at the scope of a DB implementation, things are a bit different
(ref. db_beh.hrl):
* a DB-key is defined as ?RT:key(), created by ?RT:hash_key/1 - you cannot
make assumptions about the exact type as this depends on the RT implementation!
* a DB-value is atom() | boolean() | number() | binary() - basically an
rdht_tx:encoded_value() as created by rdht_tx:encode_value/1
-> you should not make any assumptions about the value type though (if
possible) since the encoding might change in the future

> Q1: Are keys arbitrary arrays of ASCII characters ? or are they erlang
> lists of 4 or 8 bytes with values ranging from 0 to 127?

Client keys (client_key() type) are defined as string(), without enforcing
any restrictions on the character range (see scalaris.hrl).

> Q2: As far as arbitrary byte arrays are (easily?) universally ordered, can
> we use them as native keys to have an ordered map without having to think
> about what those bytes encode and leave this to applications makers?
> Wouldn't that broaden the use of scalaris?

Client keys are hashed by the hash function provided by the routing table.
Depending on the RT implementation, this function either spreads the items
over the key space (and thus among different nodes) or enforces a certain
order.
I don't quite understand how changing the client keys would broaden the use...
a client key is just an identifier for a value.
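To make the "spreading" case concrete, here is a minimal sketch of why the order of client keys is not, in general, the order seen by the storage layer. It uses erlang:md5/1 purely as a stand-in for ?RT:hash_key/1; the real hash depends on the routing table in use, and the module and function names are hypothetical.

```erlang
-module(hash_order_sketch).
-export([hash_key/1, by_client_key/1, by_hashed_key/1]).

%% Stand-in for ?RT:hash_key/1 (assumption: a uniform hash such as MD5).
%% erlang:md5/1 expects iodata, so the key must consist of bytes (0..255).
hash_key(ClientKey) ->
    erlang:md5(ClientKey).

%% Order as the application sees it.
by_client_key(Keys) ->
    lists:sort(Keys).

%% Order as a hash-partitioned key space would see it:
%% sort by the hash, then drop the hash again.
by_hashed_key(Keys) ->
    Hashed = [{hash_key(K), K} || K <- Keys],
    [Key || {_Hash, Key} <- lists:sort(Hashed)].
```

Comparing by_client_key/1 and by_hashed_key/1 on the same list of keys shows whether neighbours in the application's key space remain neighbours after hashing; with a uniform hash they usually do not.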

> Q3: I find it very cool that scalaris has rich value types. I have
> successfully natively stored tuples (very useful for erlang apps) even if
> it is not explicitly written in the manual and "The use of them is
> discouraged".

"discouraged" in the sense of "not supported by all APIs"

> I like very much the ability to store integers and floats,
> and the "add to number" API. Why not harden non erlang APIs to do their
> conversions well and let erlang apps store useful erlang terms? or except
> precise cases like integer+add_to_number() let every value be an arbitrary
> byte array and its API be app_to_binary() and binary_to_app()?

Any API can always read any value, but it needs to know the type it expects
the value to be. If the value has a different type, an appropriate error will
be thrown.
If you only use e.g. the Erlang API, you can store whatever Erlang allows you
to - but don't expect types other than those described in the user/dev guide
to work in the other APIs.

> Q4: About implementing an alternative backend. As I understand
> api_tx:req_list(Tlog, List) is the erlang client interface for apps. Types
> are there ruled by §4.4.1. My candidate backend storage stores only
> binaries (as far as I understand) hence a need of term_to_binary to put in
> store and binary_to_term after retrieving from store. But what are keys and
> values at the db_store.erl level just before the backend API? I mean since
> api_tx:req_list(Tlog, List) there seems to be some walk in modules and
> types and I feel lost in a maze ;-)

See above - but yes, you need term_to_binary and binary_to_term, just like
db_toke does, if you depend on binaries at the DB level.
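As a rough illustration of that wrapping, here is a minimal sketch. It does not use the real db_toke or hanoidb APIs: a plain Erlang map stands in for the binary-only store, and the module and function names are hypothetical. Both the opaque DB key and the DB value are encoded, so no assumption is made about their exact types.

```erlang
-module(bin_store_sketch).
-export([new/0, put/3, get/2]).

%% Hypothetical stand-in for a binary-only store such as hanoidb:
%% a map whose keys and values are both binaries.
new() ->
    #{}.

%% Encode the opaque DB key and the DB value before writing.
put(Store, Key, Value) ->
    maps:put(term_to_binary(Key), term_to_binary(Value), Store).

%% Look up by the encoded key and decode the stored binary on the way out.
get(Store, Key) ->
    case maps:find(term_to_binary(Key), Store) of
        {ok, Bin} -> {ok, binary_to_term(Bin)};
        error     -> not_found
    end.
```

One caveat with this approach: the byte order of term_to_binary/1 output does not follow Erlang term order (for instance, the encoding of -1 sorts after the encoding of 0), so an ordered backend would iterate keys in a different order than the terms themselves.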

> Q5: Are there some key space management hints/tips? feedback from real
> cases/apps?

what do you mean by "key space management hints/tips"?


Nico

adr...@gmail.com

Feb 15, 2013, 6:27:37 AM
to scal...@googlegroups.com
Thank you Nico for the explanations. Much appreciated.

I now see I was nearsighted: I forgot there is a hashing ring between the app key space and the storage key space. Of course things are not as straightforward as I had in mind.
Suggestion: I would appreciate a diagram or a few words in the documentation about the "travel" of keys and values from apps to storage and back. The types and transformations (hashing...) would then be easier for me to follow.

By "key space management hints/tips" I mean some code snippets or example applications showing key patterns at the application level for common application entities: user, session, catalog, cart, order (e-commerce example). This question is not Scalaris-specific unless there are some recommended and/or discouraged uses.

Pierre M.    (recovering from distributed hashing ;-)

Nico Kruber

Feb 15, 2013, 6:35:16 AM
to scal...@googlegroups.com
On Friday 15 Feb 2013 03:27:37 adr...@gmail.com wrote:
> Thank you Nico for the explanations. Much appreciated.
>
> I now see I was nearsighted: I forgot there is a hashing ring between the
> app key space and the storage key space. Of course things are not as
> straight as I had in mind.
> Suggestion: I would appreciate a schema or some saying in the documentation
> about the "travel" of keys and values from apps to storage forward and
> backward. The types and transformations (hashing...) would be easier to
> follow for me.

indeed, the documentation for the Scalaris developer (in contrast to a
developer simply using Scalaris) is quite thin...

> About "key space management hints/tips" I mean some code snippets or
> example applications showing key patterns at the application level for
> common application entities: user, session, catalog, cart, order
> (e-commerce example). This question is not scalaris specific unless there
> are some recommended uses and/or discouraged ones.

A common technique is to use prefixes or suffixes to separate certain
entities, e.g. "user:<username>" for storing info for a user, "session:<id>"
etc.
Make sure, if you mix schemes, that the key domains stay disjoint, i.e. no
key_for_session(id) can ever be equal to a key_for_user(name).
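A minimal sketch of such a prefix scheme follows. The module and function names are hypothetical, and the ":" separator is simply the one used in the examples above. Because every key starts with its entity name followed by the first ":", a user key and a session key can never collide, whatever the user name or session id is.

```erlang
-module(key_scheme_sketch).
-export([user_key/1, session_key/1, domain/1]).

%% Client keys are plain strings, so the builders just prepend a prefix.
user_key(Name) when is_list(Name) ->
    "user:" ++ Name.

session_key(Id) when is_integer(Id) ->
    "session:" ++ integer_to_list(Id).

%% Recover the entity prefix: everything before the first ":".
domain(Key) ->
    case lists:splitwith(fun(C) -> C =/= $: end, Key) of
        {Prefix, [$: | _Rest]} -> {ok, Prefix};
        _NoSeparator           -> error
    end.
```

For example, key_scheme_sketch:user_key("42") and key_scheme_sketch:session_key(42) yield "user:42" and "session:42", which stay distinct even though the identifiers look alike.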


Nico

adr...@gmail.com

Feb 15, 2013, 7:21:49 AM
to scal...@googlegroups.com
Ok, thank you.
"user:<username>" reminds me of something:
If I want to build indexes or reverse indexes, for example to find "user:pol" from his name "Paul", I will need a "revindex:user:name:Paul" key. And this "...:Paul" key has to be valid in the key space, just as "Paul" was a valid value in the value space, doesn't it? So for "indexable" data I need the value space to fit into the key space, right?
If user:pol uses non-ASCII (say non-Latin) characters for his name I could be in trouble, couldn't I?
Unless keys are arbitrary binary arrays?-)

Pierre M.

Nico Kruber

Feb 15, 2013, 8:42:31 AM
to scal...@googlegroups.com
Actually, I have been using UTF-8 keys with the "Wiki on Scalaris" demo for
quite a while now and did not see any problems with special characters - maybe
this warning was too cautious and there are indeed no problems any more.

The only problems I can think of are that some API may not be able to
incorporate certain characters into its strings, or that you put non-Unicode
code points into the Erlang string (recall that in Erlang a string is only a
list of integers in a certain range - not all are valid though, even if they
seem to work).
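One way to catch such invalid code points before a string is used as a key is to run it through the standard unicode module. This is only a sketch: the module name and the {error, invalid_unicode} return value are hypothetical, and nothing here is part of the Scalaris API.

```erlang
-module(key_encoding_sketch).
-export([to_key_binary/1, from_key_binary/1]).

%% Convert an Erlang string (list of code points) to a UTF-8 binary,
%% rejecting lists that contain integers that are not valid code points.
to_key_binary(Str) when is_list(Str) ->
    try unicode:characters_to_binary(Str) of
        Bin when is_binary(Bin) -> {ok, Bin};
        _ErrorOrIncomplete      -> {error, invalid_unicode}
    catch
        error:badarg -> {error, invalid_unicode}
    end.

%% Convert a UTF-8 binary back to a list of code points.
from_key_binary(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(Bin).
```

A name written in Cyrillic, for example, passes through to_key_binary/1 and comes back unchanged from from_key_binary/1, while a list containing a lone surrogate such as 16#D800 is rejected.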

We could use binaries for keys and could still provide a "high-level" API with
strings which are normally easier to handle, especially in other programming
languages like Java, and more natural to the user.

But even in this case you would have to convert from the value space to the
key space if you want to use a value there (unless you always store binaries).
-> therefore, this would not broaden the use of Scalaris either
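The reverse index from the question can be sketched as follows, with string keys throughout. A plain map stands in for the store, the module and function names are hypothetical, and the index holds a single user per name, which is a simplification since real names are not unique. The point is the conversion step: the value "Paul" has to be turned into part of a key.

```erlang
-module(revindex_sketch).
-export([user_key/1, name_index_key/1, add_user/3, find_by_name/2]).

user_key(Id) ->
    "user:" ++ Id.

%% The value (a name) becomes part of a key, so it must be a valid key string.
name_index_key(Name) ->
    "revindex:user:name:" ++ Name.

%% Store the user record and the reverse index entry pointing back to it.
add_user(Store, Id, Name) ->
    S1 = maps:put(user_key(Id), Name, Store),
    maps:put(name_index_key(Name), user_key(Id), S1).

%% Returns {ok, UserKey} or error.
find_by_name(Store, Name) ->
    maps:find(name_index_key(Name), Store).
```

In a real application the two writes in add_user/3 would presumably go into one transaction, so that the record and its index entry cannot diverge.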


Nico
