new storage model for v2.2

Dmitry Simonenko
Apr 11, 2016, 6:58:05 AM
to Sophia database

Good day everyone.

I would like to make a small announcement about storage model changes in
the upcoming version 2.2.

I've decided to replace the separate key-value and document storage types with a single
multi-field model.

Each field has a unique name and a type, and can be part of the index key. The index key
can be compound, i.e. include several fields.
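A compound key is compared part by part, in key order: the first key field that differs decides the result. The C sketch below illustrates that idea only; the `field_value` struct, the `FIELD_U32`/`FIELD_STRING` tags and the `compare_key()` helper are hypothetical names, not Sophia's API, and Sophia's real comparator may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field types, modeled on the "u32" and "string" types
 * mentioned in this thread (not Sophia's real API). */
typedef enum { FIELD_U32, FIELD_STRING } field_type;

/* One field value of a document (hypothetical representation). */
typedef struct {
    field_type type;
    uint32_t u32;      /* used when type == FIELD_U32 */
    const char *str;   /* used when type == FIELD_STRING; must be non-NULL */
    size_t size;       /* length of str in bytes */
} field_value;

/* Compare two values of the same type; returns -1, 0 or 1. */
static int compare_field(const field_value *a, const field_value *b)
{
    assert(a->type == b->type);
    if (a->type == FIELD_U32)
        return (a->u32 > b->u32) - (a->u32 < b->u32);
    size_t n = a->size < b->size ? a->size : b->size;
    int rc = memcmp(a->str, b->str, n);
    if (rc != 0)
        return rc < 0 ? -1 : 1;
    /* equal prefix: the shorter string sorts first */
    return (a->size > b->size) - (a->size < b->size);
}

/* Compare two compound keys field by field, in key order
 * (lexicographic order over the key parts). */
static int compare_key(const field_value *a, const field_value *b, int nparts)
{
    for (int i = 0; i < nparts; i++) {
        int rc = compare_field(&a[i], &b[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

With a key made of (name, id), documents sharing the same name end up ordered by id.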

The main motivation is to provide a simple-to-use model that can also be used
for secondary indexes. Additionally, the new format allows storing duplicate field values only
once per page (field compression).

The database scheme definition has moved from db.name.index to the new db.name.scheme.

For example:
db.name.scheme.name = string,key
db.name.scheme.address = string

/* db is the handle of database "name", e.g. sp_getobject(env, "db.name") */
void *o = sp_document(db);
sp_setstring(o, "name", ...);
sp_setstring(o, "address", ...);

To maintain compatibility, if no scheme is defined, the following one will be created:

db.name.scheme.key = string,key
db.name.scheme.value = string
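A scheme entry such as "string,key" names a type and optionally marks the field as part of the key. Here is a minimal parsing sketch in C that also applies the compatibility fallback above. It assumes only the "string" and "u32" types seen in this thread; `field_def`, `parse_field_def()` and `default_scheme()` are hypothetical names, not Sophia's internal parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory form of one "db.name.scheme.<field> = ..." entry. */
typedef enum { TYPE_UNKNOWN, TYPE_STRING, TYPE_U32 } field_type;

typedef struct {
    const char *name;
    field_type type;
    bool is_key;
} field_def;

/* Parse a definition value such as "string,key" or "u32": the first
 * token is the type, an optional "key" token marks the field as part of
 * the index key. Returns false on an unrecognized or missing token. */
static bool parse_field_def(const char *name, const char *spec, field_def *out)
{
    out->name = name;
    out->type = TYPE_UNKNOWN;
    out->is_key = false;
    const char *p = spec;
    bool first = true;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (first) {
            if (len == 6 && strncmp(p, "string", 6) == 0)
                out->type = TYPE_STRING;
            else if (len == 3 && strncmp(p, "u32", 3) == 0)
                out->type = TYPE_U32;
            else
                return false;
            first = false;
        } else if (len == 3 && strncmp(p, "key", 3) == 0) {
            out->is_key = true;
        } else {
            return false;
        }
        p = end ? end + 1 : p + len;
    }
    return !first; /* an empty spec has no type */
}

/* Compatibility rule from the announcement: with no scheme defined,
 * fall back to key = string,key and value = string. */
static int default_scheme(field_def out[2])
{
    bool ok = parse_field_def("key", "string,key", &out[0]) &&
              parse_field_def("value", "string", &out[1]);
    assert(ok);
    (void)ok;
    return 2;
}
```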

Hopefully, no changes to the API will be required.


Thanks,
Dmitry.

Mark Callaghan
Apr 11, 2016, 11:44:58 AM
to Sophia database
Does this require a lot more detail?
Is it similar to what WiredTiger supports?
http://source.wiredtiger.com/1.6.5/schema.html

Dmitry Simonenko
Apr 11, 2016, 1:05:05 PM
to Sophia database

As far as I can see, it looks very similar, but there may be differences.

Sophia is a document store, where key-value is treated as a special case of
document storage. It is not a columnar store, although the scheme definition is quite
similar in some sense. It is closer to a row store.

Basically, the Sophia storage format lets you define field types in much the same way
WiredTiger does. Fields are part of a document. A closer comparison would probably be
a JSON document: it has named fields, but we only store their values
(because we know the scheme and the field types).

However, fields are not stored in the order in which they are defined. Fixed-size fields
(integers) are grouped together to save space, etc.
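To illustrate the two points above (only values are stored, and fixed-size fields are grouped), here is a row encoder sketch in C. The layout is an assumption made for illustration: u32 fields first as a packed group, then each string as a 4-byte length followed by its bytes, all in host byte order. It is not Sophia's actual on-disk format, and `fvalue` and `encode_row()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field representation (not Sophia's API). */
typedef enum { F_U32, F_STRING } ftype;

typedef struct {
    ftype type;
    uint32_t u32;      /* used when type == F_U32 */
    const char *str;   /* used when type == F_STRING; NUL-terminated here */
} fvalue;

/* Encode one document without field names: the scheme tells the reader
 * which value is where. Returns bytes written, or 0 if buf is too small. */
static size_t encode_row(const fvalue *fields, int n, uint8_t *buf, size_t cap)
{
    assert(n >= 0);
    size_t off = 0;
    /* pass 1: fixed-size fields, packed together */
    for (int i = 0; i < n; i++) {
        if (fields[i].type != F_U32)
            continue;
        if (cap - off < 4)
            return 0;
        memcpy(buf + off, &fields[i].u32, 4);
        off += 4;
    }
    /* pass 2: variable-length fields, each as length + bytes */
    for (int i = 0; i < n; i++) {
        if (fields[i].type != F_STRING)
            continue;
        uint32_t len = (uint32_t)strlen(fields[i].str);
        if (cap - off < 4 + (size_t)len)
            return 0;
        memcpy(buf + off, &len, 4);
        memcpy(buf + off + 4, fields[i].str, len);
        off += 4 + len;
    }
    return off;
}
```

Grouping the integers first means their offsets are known from the scheme alone, without scanning any string lengths.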

One of the nice features of the new scheme is the ability to store duplicate fields only once.

For example, with a key-value scheme like the default one, but with a u32 key:

db.name.scheme.key = u32,key
db.name.scheme.value = string

We can save space by storing copies of the value field only once per range.
This is in addition to the usual compression (lz4, etc.).
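Per-range deduplication of the value field can be sketched as a small dictionary: each distinct value is kept once, and every row refers to it by index. This C sketch assumes a fixed maximum number of rows per range and uses a linear search for simplicity; `kv_row`, `dedup_range` and `build_range()` are hypothetical names, and Sophia's real page format is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ROWS 64 /* hypothetical per-range limit */

/* One document of the scheme key = u32,key / value = string. */
typedef struct {
    uint32_t key;
    const char *value;
} kv_row;

/* Hypothetical range layout: distinct values stored once,
 * rows point at them by index. */
typedef struct {
    const char *values[MAX_ROWS]; /* each distinct value, once */
    int nvalues;
    uint32_t keys[MAX_ROWS];
    int value_ref[MAX_ROWS];      /* row i's value is values[value_ref[i]] */
    int nrows;
} dedup_range;

/* Return the index of v in the dictionary, adding it if not present. */
static int find_or_add(dedup_range *r, const char *v)
{
    for (int i = 0; i < r->nvalues; i++)
        if (strcmp(r->values[i], v) == 0)
            return i;
    r->values[r->nvalues] = v;
    return r->nvalues++;
}

/* Build a deduplicated range from n rows (n <= MAX_ROWS). */
static void build_range(const kv_row *rows, int n, dedup_range *r)
{
    assert(n >= 0 && n <= MAX_ROWS);
    r->nvalues = 0;
    r->nrows = n;
    for (int i = 0; i < n; i++) {
        r->keys[i] = rows[i].key;
        r->value_ref[i] = find_or_add(r, rows[i].value);
    }
}
```

The lookup here is O(n^2) per range; a real implementation would more likely use a hash table.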