hanoidb as a scalaris disk backend ?

adr...@gmail.com

Jan 8, 2013, 9:37:21 AM
to scal...@googlegroups.com
Hello,

On http://code.google.com/p/scalaris/wiki/Tokyocabinet I read "Tokyocabinet can be used to store more data in a Scalaris node than would fit into its main memory".
Tokyocabinet seems deprecated at best in favor of Kyotocabinet. They are C (or C++?) projects.

Has anyone envisioned or tried or deployed scalaris with hanoidb as an alternative disk storage backend ? I'm wondering if hanoidb could fit the role of "storing (much) more than what would fit in RAM" while sticking to a pure Erlang (and simple?) solution.
According to https://github.com/krestenkrab/hanoidb it has ~2000 lines of pure erlang code in src.

I haven't found, in either the Tokyocabinet wiki page or the scalaris "main.pdf" guide, enough beginner's hints to discover where the set/get/del hooks are, and hence how to switch them from RAM to Tokyocabinet or to something else. I know my question is a bit naive because it ignores the start/stop/transaction hooks, which must be much trickier.

Thanks in advance for feedback on alternative backends, on the craziness or realism of the idea, on experience with hanoidb (2TB/shard anyone?) and for any info on how to plug it in.

Pierre M.

Florian Schintke

Jan 9, 2013, 9:34:41 AM
to scal...@googlegroups.com
Hi,

it is relatively simple to write a new database backend for Scalaris.

You have to implement the behaviour db_beh.

See
src/db_common.hrl
src/db_ets.erl
src/db_toke.erl

for the database backends using Erlang ets and tokyo cabinet. If you
implement that for hanoidb, we also could add that to the
repository. The actually used database can be configured at compile
time using the ?DB macro.

Florian



adr...@gmail.com

Jan 18, 2013, 4:42:13 AM
to scal...@googlegroups.com
Hello Florian,

thank you for your answer. Yes, it seems "relatively simple" to bind an alternative database backend like hanoidb.
I had a quick look at the db_beh behaviour, db_common, db_ets and db_toke (thank you for narrowing down the backend hooks).

I thought I would only see put & del hooks, but there are more. I see what look like wrapper functions ending with underscores, and I'm not sure I correctly understand the semantics of all operations (lack of scalaris knowledge and Erlang skills).
Even if it seems "relatively simple", it is not that simple for me. But I admit it must be easy for any Erlanger with some scalaris experience.
Since you kindly offer to merge such hanoidb backend work: I can't promise I can write db_hanoidb myself, but I can hopefully test (and hence give feedback on) somebody else's db_hanoidb.

Pierre M.

Nico Kruber

Jan 18, 2013, 5:25:50 AM
to scal...@googlegroups.com
Hi Pierre,

basically, what you need to do is to copy db_toke or db_ets and re-implement
the functions implemented there.
Most of the functions in the db_beh behaviour are implemented in db_beh.hrl by
calling their respective <name>_/<samearity> functions to make dialyzer happy.
So you don't need to re-implement the behaviour's functions but their
underscore counterparts.
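
To make the delegation pattern described above concrete, here is a minimal sketch. It is not the real db_beh.hrl: the module name db_sketch, the map-based store and the three functions shown are assumptions chosen only to illustrate how a typed behaviour function can forward to its underscore counterpart, which is the only part a new backend would rewrite.

```erlang
-module(db_sketch).
-export([new/0, set_entry/3, get_entry/2]).

%% Hypothetical store type; a real backend would hold its own handle here.
-type db() :: #{term() => term()}.

%% Behaviour-style functions: thin, typed wrappers that only delegate.
-spec new() -> db().
new() -> new_().

-spec set_entry(db(), term(), term()) -> db().
set_entry(DB, Key, Value) -> set_entry_(DB, Key, Value).

-spec get_entry(db(), term()) -> {ok, term()} | not_found.
get_entry(DB, Key) -> get_entry_(DB, Key).

%% Underscore counterparts: the backend-specific part to re-implement.
new_() -> #{}.

set_entry_(DB, Key, Value) -> DB#{Key => Value}.

get_entry_(DB, Key) ->
    case maps:find(Key, DB) of
        {ok, Value} -> {ok, Value};
        error -> not_found
    end.
```

With this layout, porting to another store would mean replacing only the three functions whose names end in an underscore.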


Nico

adr...@gmail.com

Jan 22, 2013, 11:41:12 AM
to scal...@googlegroups.com
Hello Nico,

thank you for this hint, which narrows things down even more.
I'll look again at db_toke and db_ets and try to derive db_hanoidb from a copy
(or contribute as a feedback tester of someone else's code; is anybody else interested in breaking the RAM barrier with a fresh Erlang backend?).

Pierre M.

adr...@gmail.com

Feb 15, 2013, 5:01:08 AM
to scal...@googlegroups.com
Hello again,

here is a first try of db_hanoidb.erl. It doesn't pass all ct test suites, but it somewhat begins to work.
Quick howto:
copy db_hanoidb.erl to src/ (where db_ets and db_toke are)
alter include/scalaris.hrl line ~50+ to
-define(DB, db_hanoidb).
make clean
./configure
make
make doc
./bin/scalarisctl -e '-pa /path/to/hanoidb/ebin /path/to/hanoidb/deps/*/ebin' -m -n pre...@127.0.0.1 -p 14195 -y 8000 -s -f start
do some writes in the shell:
[ api_tx:write(<<"k",($0+X)>>, <<"v",($0+X)>>) || X <- lists:seq(5, 55) ].
do some reads to see them work:
api_tx:read(<<"k5">>). 
{ok,<<"v5">>}
api_tx:read(<<"k7">>).
{ok,<<"v7">>}

Still UNITTESTARGS='-pa /path/to/hanoidb/ebin /path/to/hanoidb/deps/*/ebin' make test
has issues.

I think I'm a bit lost in the keyspace ;-)
(key types, key types conversions, key intervals/ranges)

Thanks to "make test" I see lots of things on disk, some larger than 1 MB and with the expected hanoidb file patterns (like A9.data, A10.data, A13.data, B13.data, X13.data and nursery.log). But I don't know how to debug further. How could I get the DB handle from the scalaris shell prompt?

Hopefully somebody is interested in investigating this with me :-)
Have fun

Pierre M.
db_hanoidb.zip

adr...@gmail.com

Mar 12, 2013, 11:48:54 AM
to scal...@googlegroups.com
Hello,

I'm stuck because I don't know where to look to diagnose anything.
api_tx seems to work from the shell to the disk. (Yeah!)
But CT make test has issues (and I'm not able to understand them).
What could I try?

And a side question: after successful writes & reads with api_tx in the shell
-how do I cleanly stop the cluster (closing hanoidb without erasing)?
-how do I restart the cluster with the existing recorded data files?

Thanks for any help

Pierre M.

MM

Mar 20, 2013, 9:37:07 AM
to scal...@googlegroups.com
> But CT make test has issues (and I'm not able to understand them).
> What could I try ?

Depends on the errors you get. In which test suites do they occur? How many errors are there? Can you paste some?

> And side question : after successfull writes & reads with api_tx in the shell
> -how do I clean stop the cluster (closing hanoidb without erasing) ?
> -how do I restart the cluster with the existing recorded data files ?

I am not sure whether it is currently possible to restart the cluster with an existing database. The behaviour expects a db implementation to export open/1 to load such a file, but I can't find any source code that uses that function. I think the join procedure (dht_node_join:finish_join_and_slide/6) should possibly receive the value of ?DB:open/1 instead of ?DB:new/0.

Florian Schintke

Mar 20, 2013, 10:37:56 AM
to scal...@googlegroups.com
Hi,

> > And side question : after successfull writes & reads with api_tx in the
> > shell
> > -how do I clean stop the cluster (closing hanoidb without erasing) ?
> > -how do I restart the cluster with the existing recorded data files ?

currently there is no support for persisting the data and using it on a
restart; a majority of servers has to be kept online.

See our FAQ entry on that topic:
https://code.google.com/p/scalaris/wiki/FAQ#Is_the_store_persisted_on_disk?

The failure model of Scalaris is crash-stop for the time being. We
currently work on ideas to support a crash-recovery failure
model. Then, it will be possible to persist the data to disk *and* to
use them on a restart.

If you have the persisted databases and you know that they reflect the
latest state (*) of Scalaris, you could load the dump into a newly
started Scalaris.

(*) this is the hard property, which we cannot guarantee at the moment.


Florian

adr...@gmail.com

May 21, 2013, 8:02:21 AM
to scal...@googlegroups.com
Hello,

thank you both for your answers. I appreciate it.
I'm attaching a ZIP with my second attempt at using hanoidb, instead of the outdated toke, as a larger-than-RAM disk backend.
The CT result with rev 4776 (if I'm not mistaken) is:
TEST COMPLETE, 472 ok, 27 failed, 62 skipped of 561 test cases
62 tests were skipped (11 user / 51 auto).
(CT with the standard ETS backend had zero failed tests.)
I see roughly 50 MB of files in the hanoidb store, which is encouraging :-)

I don't understand much about the failing CT tests. The first one where I can see something related to hanoidb is this:
snapshot_suite.test_rdht_tx_read_validate_should_abort (#495)

*** CT Error Notification 2013-05-17 18:16:08.718 ***
db_hanoidb:get_entry2_ failed on line 125
Reason: function_clause

*** User 2013-05-17 18:16:08.720 ***
####################################################
End snapshot_SUITE:test_rdht_tx_read_validate_should_abort -> {error, {function_clause, [{db_hanoidb,get_entry2_, [{db_3985071319,762318896,{762323088,0,0}}, 80325066489831061459460196859901989661], [{file,"src/db_hanoidb.erl"},{line,125}]}, {db_hanoidb,get_entry_,2,[{file,"src/db_common.hrl"},{line,41}]}, {rdht_tx_read,validate,3, [{file,"src/transactions/rdht_tx_read.erl"},{line,259}]}, {snapshot_SUITE,test_rdht_tx_read_validate_should_abort,1, [{file,"test/snapshot_SUITE.erl"},{line,146}]}, (snip)

There does indeed seem to be a function signature mismatch:
get_entry2_({{DB, _FileName}, _Subscr, _SnapState}, Key)
seems to be called with
{db_3985071319,762318896,{762323088,0,0}}, 80325066489831061459460196859901989661
and the atom db_3985071319 can't match the tuple {DB, _FileName}.
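
The mismatch described above can be reproduced in isolation. The following is only a sketch: the module function_clause_demo, the function get_entry2/2 and the map used as a stand-in for the DB handle are hypothetical and merely mimic the shape of the function head quoted from db_hanoidb.erl.

```erlang
-module(function_clause_demo).
-export([get_entry2/2, try_lookup/2]).

%% Hypothetical head modelled on the one quoted above: the first element
%% of the state tuple must itself be a {DB, FileName} pair.
get_entry2({{DB, _FileName}, _Subscr, _SnapState}, Key) ->
    maps:get(Key, DB, not_found).

%% Calls get_entry2/2 and reports a head mismatch instead of crashing.
try_lookup(State, Key) ->
    try
        get_entry2(State, Key)
    catch
        error:function_clause -> {error, function_clause}
    end.
```

Calling try_lookup({db_3985071319, 762318896, {762323088, 0, 0}}, 42) should return {error, function_clause}, because the atom in first position cannot match {DB, _FileName}; a state such as {{#{k => v}, "file"}, subscr, snap} matches and the lookup proceeds.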

I derived the db_hanoidb module from the toke module, taking care to keep the same function signatures/API.
This is one of the 27 errors reported by CT.
What can we do about this? I feel both optimistic, as I see scalaris filling the hanoidb backend (even on the first try), and lost, as I don't understand enough of the surrounding API to avoid easy mistakes.

On the hard property of durability on disk: yes, I had read "Is_the_store_persisted_on_disk". My point was just a modest first step:
-have a use case with time windows in which the application can refuse write requests (a planned maintenance mode, maybe 15 minutes).
-use it to let all write transactions terminate.
-hence have each node and the entire cluster in a consistent read-only state.
-hence make a backup dump.
-allow starting from such a known consistent state rather than from an empty database.

You are talking about crash-recovery on restart. I see a "snapshot" feature in SVN. I feel scalaris is getting closer and closer to having larger-than-RAM, backupable storage (at least for a simple use case at first).

Have fun with all these interesting topics

Pierre M.
db_hanoidb_2nd_try.zip

Jan Fajerski

May 21, 2013, 11:18:32 AM
to scal...@googlegroups.com
Hi Pierre,
thanks a lot for your contribution!
As for the CT fails in the snapshot_SUITE: it did not obey the defined DB macro
but instead defined its own TEST_DB macro (which was set to db_ets). As some
tests use the db directly as well as the transaction layer, it tried to use
db_ets and db_hanoi at the same time. No wonder that didn't work. SVN has a fix
for that.

The rest of the code looks quite nice. I have some comments on a few TODO and
FIXME comments but I would like to look a little closer at the code. Since I
will be on holiday starting next week, it'll probably take me till mid June, so
please have a little patience.

Again, thanks for the effort...we will keep you updated.

Best, Jan


adr...@gmail.com

May 21, 2013, 11:54:26 AM
to scal...@googlegroups.com
Hi Jan,

thank you for your explanations (I feel less dumb now that I know there was a db_ets mix-up) and thank you for your quick review.
I wish you very nice holidays and will wait for your return in mid June.

Pierre M.

adr...@gmail.com

May 23, 2013, 12:38:30 PM
to scal...@googlegroups.com
Hello all,

minor update attached: after hardcoding some data directory configuration, the count of failed CT tests against svn 4785 is down to only 14 (but with 51 auto-skipped).

Have fun
P.M.
db_hanoidb_3rd_try.zip

adr...@gmail.com

Jun 20, 2013, 10:16:50 AM
to scal...@googlegroups.com
Hello people,

has anybody tried the latest db_hanoidb disk backend ? How is it doing for you ?
What do you think about it ?
Can somebody explain the failed tests so I can try again a tentative db_hanoidb module ?

Have fun

Pierre M.

Florian Schintke

Jun 20, 2013, 11:40:18 AM
to scal...@googlegroups.com
Hi,

sorry that this takes so long...

The reason is that we internally discuss and develop a major
restructuring of the db-layer, which besides other things will include
a much smaller backend-dependent part. We will use your example of
hanoidb as our test case and will - in the end - provide a working
version of it.

Cheers,
Florian


adr...@gmail.com

Jun 20, 2013, 4:42:47 PM
to scal...@googlegroups.com
Hello again,

something smaller, simpler, with the hanoidb example as a canonical test case ?
Wow this is very good news !
I'm happy to have contributed something somewhat useful ;-)
Count me in for testing, hopefully soon.

Good night

Pierre M.

adr...@gmail.com

Jun 21, 2013, 9:19:02 AM
to scal...@googlegroups.com
Hi again,

BTW, as you are restructuring the db-backend layer, something comes to my mind:
What about having the backend type (ets, kyoto, hanoidb, whatever) configured at runtime rather than hard-coded at compile time? I think it makes things simpler: same build for all, same software to deploy for all, build and use cases separated, easier to swap backends (for research, tests, discovery, fun).

I had forgotten until now to report that I feel uncomfortable with the patch-recompile-redeploy cycle needed to try an alternative backend. I would prefer to alter config files and share the build with all other users.

Have fun researching and restructuring

Pierre M.

Nico Kruber

Jun 25, 2013, 8:28:00 AM
to scal...@googlegroups.com
Performance-wise it is probably better to have it hard-coded, especially on
the hot-path.

However, I can think of a way of packaging the DB-dependent modules separately.
So if you install it via the packages, you can choose quite easily.


Nico
--
Nico Kruber
phone: +49 30 84185-253
-----
Zuse Institute Berlin
Takustr. 7
D-14195 Berlin

adr...@gmail.com

Jul 9, 2013, 1:02:33 PM
to scal...@googlegroups.com
Thank you for your answer. Yes, as I was disk-minded, I didn't think of a performance issue on the hot path caused by a little "case Backend of": the potential disk seek latency (8ms) is much more than the backend selection switch (some ns?).
Hopefully you have a "work with" if not a "work around".

BTW Basho doesn't seem that scared about such a performance issue on the hot path: in Riak the backends are configured at runtime. One is memory-only (like native scalaris), one is high-throughput (bitcask), one is high-volume (eleveldb, like hanoidb). And one Riak cluster can mix 'buckets' across several backends. So hopefully the RAM-only one is not that impeded by the switch on the hot path.
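
For readers comparing the two approaches discussed here, a minimal sketch follows. It is not Scalaris code: the module backend_select_demo, the application environment key db_backend and the use of the standard orddict and dict modules as stand-in "backends" are assumptions made only so the example is self-contained.

```erlang
-module(backend_select_demo).
-export([store_static/3, find_static/2,
         configured_backend/0, store_dynamic/4, find_dynamic/3]).

%% Compile-time choice, in the spirit of the ?DB macro mentioned in this
%% thread: switching backends means editing this line and recompiling.
-define(DB, orddict).

store_static(Key, Value, Store) -> ?DB:store(Key, Value, Store).
find_static(Key, Store) -> ?DB:find(Key, Store).

%% Runtime choice: the backend module is an ordinary atom, read here from
%% the application environment (hypothetical key), defaulting to orddict.
configured_backend() ->
    application:get_env(backend_select_demo, db_backend, orddict).

%% The call goes through a variable module name instead of a macro.
store_dynamic(Backend, Key, Value, Store) -> Backend:store(Key, Value, Store).
find_dynamic(Backend, Key, Store) -> Backend:find(Key, Store).
```

The dynamic variant costs one extra module lookup per call; whether that matters on the hot path is exactly the open question in this exchange and would need measuring.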

Have fun

Nico Kruber

Jul 9, 2013, 2:18:24 PM
to scal...@googlegroups.com
We have been thinking about this since I wrote my last email, and we will have
a look once the DB redesign is out and evaluate the drawbacks, if any.

Nico

adr...@gmail.com

Jul 19, 2013, 12:26:20 PM
to scal...@googlegroups.com
Hello again,

hopefully there will be no drawback with the backend switch.

I've just had a look at the changelog and was happy to discover r4947 "new backend db implementation". This is exciting :-)
I've quickly read backend_beh.erl, db_ets.erl and db_ets_SUITE.erl after seeing the commit message. It looks clean and understandable to me (your refactoring simplification is bearing fruit, well done).

My little feedback:
-in the behaviour I see new/1 and close/1. I'd like to also have open/1 to start from a non-empty dataset (sorry for being so disk-backend-minded, he he). And after all, even an existing ets table can be "opened", can't it?
-I hope I (or somebody else) will manage to map foldl & foldr to hanoidb:fold_range in a db_hanoidb alternative (larger-than-RAM) backend. As db_ets uses ordered_set and hanoidb stores ordered sets, it doesn't look crazy.
-in db_ets_SUITE the tests seem to be against ets, not against db_ets. I assume this is a placeholder until actual db_ets tests are written? I would need such a template for db_hanoidb_SUITE.
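
As an illustration of the second point, here is a small sketch of a range-restricted fold over an ordered store, using only ets so that it runs without hanoidb. The module ordered_fold_demo and its functions are hypothetical; a db_hanoidb version would presumably push the From/To bounds down into hanoidb:fold_range instead of filtering inside the fold function.

```erlang
-module(ordered_fold_demo).
-export([foldl_range/3, foldr_range/3]).

%% Fold first-to-last over the keys K with From =< K < To, prepending each
%% match to the accumulator (so the result comes out in descending order).
foldl_range(Keys, From, To) ->
    with_table(Keys, fun(T) -> ets:foldl(collector(From, To), [], T) end).

%% Same, but traversing last-to-first (result in ascending order).
foldr_range(Keys, From, To) ->
    with_table(Keys, fun(T) -> ets:foldr(collector(From, To), [], T) end).

%% Keeps only keys inside the half-open interval [From, To).
collector(From, To) ->
    fun({Key}, Acc) when Key >= From, Key < To -> [Key | Acc];
       (_Other, Acc) -> Acc
    end.

%% Loads the keys into a temporary ordered_set table and always deletes it.
with_table(Keys, Fun) ->
    Table = ets:new(ordered_fold_demo, [ordered_set]),
    true = ets:insert(Table, [{K} || K <- Keys]),
    try Fun(Table) after ets:delete(Table) end.
```

For example, foldl_range([3,1,4,2,5], 2, 5) returns [4,3,2] and foldr_range([3,1,4,2,5], 2, 5) returns [2,3,4], since an ordered_set table is traversed in key order regardless of insertion order.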

Have fun celebrating this r4947, nice week end :-)

Pierre

Jan Fajerski

Jul 19, 2013, 1:29:34 PM
to scal...@googlegroups.com
Hi Pierre,

glad you like the refactoring at first glance.
We will continue to work on the backend (among other stuff of course) and we are
looking forward to incorporate other backends than ets.
As for your feedback:

> -in the behaviour I see new/1 and close/1. I'd like to also have open/1
> to start from non empty dataset (sorry for being so disk backend
> minded, he he). And after all even an existing ets table can be
> "opened", uh?
Good point. I'll add it.

> -I hope I (or somebody else) will manage to map foldl & foldr to
> hanoidb:fold_range in a db_hanoidb alternative
> (disklargerthanRAM-)backend. As db_ets uses ordered_set and hanoidb is
> ordered sets it doesn't look crazy.
It does sound quite compatible. If you have any trouble let me know. We were
going to look at hanoidb in the future anyway, but we welcome your expertise of
course.

> -in db_ets_SUITE the tests seems to be against ets, not against db_ets.
> I assume this is a placeholder before actual db_ets tests are written ?
> I would need such a template for db_hanoidb_SUITE.
I have just committed a little change in the structure of the backend unit tests.
There is now a db_ets_SUITE again (it was db_backend_SUITE before) and it should
provide a good template for other backend implementations.
I think you can just copy the whole file and swap ets for hanoidb.
The value_creator and type_checker functions registered in init_per_suite/1 in
db_ets_SUITE.erl should be compatible with hanoidb.

Thanks for the feedback.
Have a good weekend,
Jan

adr...@gmail.com

Oct 31, 2013, 2:07:50 PM
to scal...@googlegroups.com
Hello all,

here is an update for v0.6.1+svn 5666. It is a NOT YET tested db_hanoidb.erl modelled on the latest db_toke.
It seems you have done a very good refactoring job, as the API is much simpler and hence also less error-prone.
I have a feeling that the alternative backend feature benefits scalaris even for users who don't use it. It works as a quality indicator for me.

I don't know how to test my new db_hanoidb.erl: I don't know how to configure enable-hanoidb=/path and I haven't yet looked at db_ets_SUITE to write tests (learning slowly, one thing at a time).
BTW, as indicated in my comments in the code, maybe some things could be bypassed/optimized by relying more (in the driver) on hanoidb "ranges" than on scalaris "intervals". Does anybody have a feeling about that? Could one of these two closely related concepts be blended/converted into the other to leverage the use of native functions?
Sorry for my less-than-academic research level. I hope you enjoy it anyway.

Thanks in advance for all your feedback.
Have fun

Pierre
db_hanoidb_r5666.zip

Jan Fajerski

Nov 6, 2013, 7:26:51 AM
to scal...@googlegroups.com
Hi Pierre,
sorry for the long response time. I have looked at your code and it does look
pretty good, but I haven't been able to test it yet. I am having some trouble
adding the required paths (everything that is in hanoidb/deps) to the configure
script. I have been able to start it from the scalaris console...so it does run.
It just needs proper integration. ;)
Testing should be fairly simple once hanoidb is correctly integrated. You can
just include db_backend_SUITE.hrl into a db_hanoidb_SUITE.erl to get a basic
test suite. Some tests should be added (for example for opening existing files)
directly to db_hanoidb_SUITE.erl.

I am sure we can integrate the hanoidb ranges with scalaris intervals. It is mostly
a question of what order hanoidb maintains on the keys. I'll have a
look into it once I have hanoidb running correctly.

thanks for your work...I'll keep you posted.

Best,
Jan


adr...@gmail.com

Nov 6, 2013, 10:31:09 AM
to scal...@googlegroups.com
Hello again,

thank you very much. I appreciate your integration effort.
I've tentatively written a first db_hanoidb_SUITE.erl paraphrasing db_toke_SUITE.erl
I hope this first suite can run from the console. Enjoy the tests.

I think I can learn more about scalaris with the coming research about scalaris intervals and hanoidb ranges.
Have fun integrating, testing and researching

Pierre
db_hanoidb_SUITE.zip

adr...@gmail.com

Dec 18, 2013, 9:04:03 AM
to scal...@googlegroups.com
Hi all !

I try a little bump here. Is it time for some feedback ?
How is the db_hanoidb backend doing ? What about its tentative hanoidb_SUITE ?
How is integration going ? any chance of landing in svn as experimental ?
Does the module need some work/test from me ?
What about the status of the study of scalaris:intervals and hanoidb:ranges ?
I hope this dataset-larger-than-RAM backend is in good shape.

Have fun

Pierre M.

Jan Fajerski

Jan 6, 2014, 9:21:51 AM
to scal...@googlegroups.com
Bonjour Pierre,

I just realized I have never answered you. I am so sorry...I simply forgot.
Anyway, I hope the new year finds you well. I have looked at the hanoidb
backend.
I did find a rather substantial problem. Currently we expect a backend to return
data in erlang term order. This is of course very important when doing a
fold[lr] over a certain interval.
hanoidb unfortunately does not order its terms by erlang term order but rather
uses a string ordering (I guess because it stores data as binary strings).
I.e. we expect a backend to treat foo > empty as true but hanoi would consider
empty to be smaller.

Maybe you know a way to configure this behaviour in hanoidb?

Best,
Jan

adr...@gmail.com

Jan 9, 2014, 12:15:25 PM
to scal...@googlegroups.com
Hello and happy new year !

It's cool you have looked at the hanoidb backend. And it is good that you have found this "substantial" issue.
I've just tried this in an erlang shell :
foo > empty -> true
"foo" > "empty" -> true
<<"foo">> > <<"empty">> -> true
empty is always smaller than foo, either atom, string or binary.
I don't understand why Erlang term ordering matters in scalaris or hanoidb: scalaris wants keys and values only as erlang strings, hanoidb wants them only as arbitrary binaries, and "empty" is smaller than "foo" just as <<"empty">> is smaller than <<"foo">> and empty is smaller than foo. The order seems to be the same (and Erlang's) when comparing string to string or binary to binary.
I'm not aware of a "configuration" of hanoidb to alter ordering.
What am I missing ?
How is the scalaris-to-backend boundary crossed ? Does it involve multi type comparisons ?
What is the backend module missing to bind scalaris intervals and hanoidb ranges ?

Happy new year to the scalaris Team !

Pierre M.

Jan Fajerski

Jan 20, 2014, 11:19:05 AM
to scal...@googlegroups.com
Hi Pierre,
yes you are correct concerning the erlang term order. bad argumentation on my
part.
let me illustrate the problem with an example:
you can find the test code in test/db_backend_SUITE.hrl:95-123. It tests the
foldl implementation. the inputs are randomly generated; also please note that on
the backend level a much broader range of possible values is allowed compared to
the actual db_dht (please see src/db_backend.beh).
in this particular instance the test is called like so:
db_hanoidb_SUITE:prop_foldl([{{}},{42},{42},{-1.3336411304950038},{0.0}],
all,
4)
So the 4 tuples are stored (the second {42} is ignored). The expected results
after foldl are the same one would get with lists:foldl if you ignore the
interval and the result limiter.
with hanoidb the result for foldl/3 is:
[{{}},{42},{-1.3336411304950038},{0.0}] whereas the expected result is
[{{}},{42},{0.0},{-1.3336411304950038}] which respects erlang term order
(although reversed because of foldl)

so when I said hanoidb does not respect erlang term order I was suspecting that
hanoidb encodes (some of) the data it stores and sorts that encoded data. the binary
suspicion came from an earlier test where the data happened to be returned in
string order rather than natural order, i.e. 1, 10, 2, 21 instead of 1, 2, 10, 21

so I am not really sure where the problem stems from but there is one.
I hope that made it clearer.
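
One way such a discrepancy can arise is sketched below. It is an assumption, not something confirmed in this thread, that the keys reach hanoidb in a serialized form such as term_to_binary/2; the module term_order_demo is hypothetical and only compares native term order with the byte order of the encoded terms.

```erlang
-module(term_order_demo).
-export([term_order/1, encoded_order/1]).

%% Sort keys by native Erlang term order.
term_order(Keys) ->
    lists:sort(Keys).

%% Sort keys by the byte order of their external-term-format encoding,
%% i.e. the order a store would see if it compared serialized keys.
%% minor_version 1 pins floats to the 8-byte IEEE encoding.
encoded_order(Keys) ->
    Encoded = [term_to_binary(K, [{minor_version, 1}]) || K <- Keys],
    [binary_to_term(B) || B <- lists:sort(Encoded)].
```

With the keys from the test call above, term_order/1 puts {-1.3336411304950038} before {0.0}, whereas encoded_order/1 puts {0.0} first, because the encoded negative float starts with a set sign bit. That swap is the same one seen between the expected and the actual foldl results. Similarly, keys stored as the strings "1", "2", "10", "21" sort as "1", "10", "2", "21".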
Best,
Jan

adr...@gmail.com

Jan 30, 2014, 6:49:58 PM
to scal...@googlegroups.com
Hi Jan (& all)

thank you for illustrating the issue. I'm less confused ;-)

So I understand that my hanoidb backend driver compiles and passes all tests except the fold ones. I appreciate this step. I call it a first achievement :-)

Despite your nice explanation of the test suite there is (again) something (new) I don't understand: what are those tuples? In the manual I had read that scalaris is all about string() keys and values (the reason for my questions in other threads about key space and value space, simple binaries, and issues with secondary indexes).

WHAT IF the test suite used only binary() as keys and as values ?
Wouldn't it work right now if used 'à la Redis' with keys and values being simple arbitrary binaries ?
If yes, wouldn't it make a nice use case for lots of users ?

Does the current hanoidb driver work right now(TM) as a memory-cached disk-backed binary/binary K/V store ?
If yes I suggest its merge as experimental with a 'binary/binary' only warning in the manual/release notes and a restricted db_hanoidb_SUITE with big warnings too.

Does that make sense ? Did I miss something else ?
Have fun

Pierre M.

adr...@gmail.com

Mar 10, 2014, 12:42:46 PM
to scal...@googlegroups.com
Hello,

gentle bump ;-)
WHAT IF keys and values are both only binary() ?
Does the back end work for this use case ?
Have fun

P.

Jan Fajerski

Mar 11, 2014, 9:22:36 AM
to scal...@googlegroups.com
Hi Pierre,
as usual my apologies for my glacially slow reaction time ;)

as to your questions: the value being a binary is not a problem. scalaris allows
any term for values. a key must always be a string. this constraint comes from
the routing table implementation and its hash function. So using {binary(),
binary()} in scalaris would require quite extensive changes.

the main reason (I think) for not supporting hanoidb is actually the "new"
backend abstraction that was introduced with the last release. a backend has to
implement the db_backend_beh.erl behaviour. the very permissive spec for keys
and values comes from there (that keys as well as values can be any term).
this was done so backends (such as db_ets.erl) can be used elsewhere in the
system (see db_prbr.erl for example) and as a preparation for future developments of the
storage layer. so because hanoidb does not pass our unit tests we are not
planning to support it.

BUT

as far as I can see hanoidb should be alright for use in db_dht.erl
so what you could do is use your db_hanoidb.erl in db_dht and run the
db_dht_SUITE.erl in the test folder. this _should_ work if db_hanoidb is
implemented correctly (the problems exposed by db_ets_SUITE should not occur
since db_dht assumes keys are strings).

This could be a workaround for your problem. Unfortunately I don't see hanoidb
making its way into regular scalaris releases anytime soon.

I hope that is some help to you.

Best,
Jan

PS: I just did a quick run of the above mentioned workaround. It seems to be
working ok. I got quite a few failed tests...but they were due to timeouts,
which is to be expected. just give db_dht_SUITE.erl more time for the
individual test cases and see how it goes. I have attached the db_hanoidb.erl I
used.
db_hanoidb.erl

adr...@gmail.com

Mar 20, 2014, 11:14:28 AM
to scal...@googlegroups.com
Bonjour Jan & Co.

Thanks for answering.
So if I understand correctly, you have found a workaround to make your (slightly modified) db_hanoidb.erl work on a restricted binary/binary k/v feature set? Sounds cool. Hopefully the driver and some demo howto will make it into a release as experimental, with appropriate warnings like "Do NOT use this, it is still very alpha, it will burn your home and eat kittens".

Even if things seem to be moving forward, I don't have the big picture in sight. Issue details are confusing me. I'd like to see the pieces put back together to understand better.
I'd like to see the dataflow from the frontend API to the backend implementation, and especially how the front/back boundary is crossed: which module is responsible for what, and what the type implications are.
Either I've missed or misunderstood something in the current documentation, or I'm confused by db_dht's arrival in the thread. I thought everything was about db_backend_beh.
So...
which module receives k/v API calls? api_tx?
which module hashes k and forwards k/v through the DHT?
why is the hashing function that sensitive to k & v types? (and where is this function btw?)
what about the remaining steps (db_dht? db_backend_beh? other?) to the backend?

I hope things can get simpler in my mind.
Have fun designing coding documenting and testing

Pierre M.