Definition of the schema of Skydb for a Clojure clone?

Jeroen van Dijk

Feb 12, 2014, 3:57:10 AM
to sk...@googlegroups.com
Hi,

First of all, thanks for Skydb. Last year I watched the presentation about it [1], and for a few weeks now I have had a problem that I think can be analyzed/solved with something like Skydb.

Unfortunately I couldn't get Skydb up and running (quickly) this time. I'm not a Go expert and I really wanted to use these concepts, so I started a Clojure clone of the main ideas behind Skydb. I think it went pretty well: I created something with decent performance (10 MM events/sec/core after optimizing and with a simple data model), and there is a lot of potential for extensions, e.g. running it on Hadoop.

The implementation I created uses a Clojure DSL of functions for the query engine [2]. It currently has a simple data format (maybe that explains the performance), so I wonder whether there is a (formal) schema of the database tables that I can implement?

Thanks,
Jeroen

Ben Johnson

Feb 12, 2014, 9:37:55 AM
to Jeroen van Dijk, sk...@googlegroups.com
hey Jeroen-

> I started a Clojure clone of the main ideas behind Skydb
The Clojure port of Sky sounds great. I'd love to check it out sometime (although I'm not a Clojure dev myself).

> Unfortunately I couldn't get skydb up and running (quickly) this time.
The current master branch of Sky points at v0.3.0 which is very old. The "unstable" branch is what's currently being used and desperately needs a release. We've been making a lot of changes and swapping out backends (LevelDB to LMDB and possibly Bolt in the near future) and query JIT compilation (LuaJIT to LLVM).

> It currently has a simple data format (maybe that explains the performance), so I wonder if there is a (formal) schema of the database tables which I can implement?
There's not a formal schema for the database that I can point you to. It's fairly simple though. We use LMDB's multi-value keys feature (aka DUPSORT), so a key represents an object and each value under that key stores one event. The event itself stores an 8-byte big-endian timestamp followed by a MessagePack-encoded map. The map uses integer keys (to represent the property id) and a value (int, float, bool, string). The MessagePack encoding requires quite a bit of processing time since it's branch-heavy, but it lets us compact our ints from 8 bytes to as little as 1 byte, which allows us to put a lot more data into memory.
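
To make that layout concrete, here is a minimal sketch in Python. It is not Sky's actual code: the function names, the unsigned timestamp, and the in-memory dict standing in for LMDB are all assumptions for illustration. It hand-rolls just enough of the MessagePack format to show how small ints shrink to one byte, and how a big-endian timestamp prefix makes byte-wise sorted values come out in time order.

```python
import bisect
import struct


def pack_int(n):
    """MessagePack int: uses the smallest encoding that fits."""
    if 0 <= n <= 0x7F:            # positive fixint, 1 byte total
        return struct.pack("B", n)
    if -32 <= n < 0:              # negative fixint, 1 byte total
        return struct.pack("b", n)
    if n > 0:
        if n <= 0xFF:
            return b"\xcc" + struct.pack(">B", n)
        if n <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", n)
        if n <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", n)
        return b"\xcf" + struct.pack(">Q", n)
    if n >= -0x80:
        return b"\xd0" + struct.pack(">b", n)
    if n >= -0x8000:
        return b"\xd1" + struct.pack(">h", n)
    if n >= -0x80000000:
        return b"\xd2" + struct.pack(">i", n)
    return b"\xd3" + struct.pack(">q", n)


def pack_value(v):
    """Encode one property value (int, float, bool, or string)."""
    if isinstance(v, bool):       # check bool first: bool is a subclass of int
        return b"\xc3" if v else b"\xc2"
    if isinstance(v, int):
        return pack_int(v)
    if isinstance(v, float):
        return b"\xcb" + struct.pack(">d", v)
    if isinstance(v, str):
        raw = v.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw
        if len(raw) <= 0xFFFF:
            return b"\xda" + struct.pack(">H", len(raw)) + raw
        return b"\xdb" + struct.pack(">I", len(raw)) + raw
    raise TypeError("unsupported property type: %r" % type(v))


def pack_map(props):
    """Encode {property_id: value} as a MessagePack map."""
    n = len(props)
    if n <= 15:
        head = bytes([0x80 | n])
    elif n <= 0xFFFF:
        head = b"\xde" + struct.pack(">H", n)
    else:
        head = b"\xdf" + struct.pack(">I", n)
    return head + b"".join(pack_int(k) + pack_value(v) for k, v in props.items())


def encode_event(timestamp, props):
    """8-byte big-endian timestamp (assumed unsigned), then the property map."""
    return struct.pack(">Q", timestamp) + pack_map(props)


def insert_event(table, object_id, timestamp, props):
    """Stand-in for a DUPSORT put: one key per object, values kept byte-sorted."""
    bisect.insort(table.setdefault(object_id, []), encode_event(timestamp, props))


table = {}
insert_event(table, "user-1", 2000, {1: 7, 2: "checkout"})
insert_event(table, "user-1", 1000, {1: 3, 3: True})
for event in table["user-1"]:
    print(struct.unpack(">Q", event[:8])[0], event[8:].hex())
```

Because the timestamp is big-endian, comparing the raw bytes of two events orders them chronologically, so the event with timestamp 1000 is listed first even though it was inserted second. The property ids and values here (1, 2, 3, "checkout") are made up.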

Let me know if you have any specific questions and I can try to answer them. I wouldn't suggest trying to make your Clojure port binary compatible with Sky since it can be a moving target until Sky hits 1.0.

-- 
Ben