new storage model for v2.2

Dmitry Simonenko

Apr 11, 2016, 6:58:05 AM

to Sophia database

Good day everyone.

I would like to make a small announcement about storage model changes in the upcoming version 2.2.

I've decided to switch from the key-value/document storage types to a single multi-field model.

Each field has a unique name and a type, and can be part of an index key. An index key can be compound and include several fields.
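To make the compound-key idea concrete, here is a minimal C sketch. It is not Sophia's internal code. A compound key is modeled as an ordered array of field values and compared part by part, with the most significant part first. The types `field_type` and `field_value` and the function `compound_key_cmp` are hypothetical names chosen for illustration. Only two field types (u32 and string) are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical field types, loosely mirroring the scheme types in this thread. */
enum field_type { FIELD_U32, FIELD_STRING };

/* Hypothetical in-memory value of one field. */
struct field_value {
    enum field_type type;
    uint32_t u32;        /* used when type == FIELD_U32 */
    const char *str;     /* used when type == FIELD_STRING */
    size_t len;          /* string length in bytes */
};

/* Compare one key part; both sides are assumed to have the same type. */
static int field_cmp(const struct field_value *a, const struct field_value *b)
{
    if (a->type == FIELD_U32)
        return (a->u32 > b->u32) - (a->u32 < b->u32);
    size_t n = a->len < b->len ? a->len : b->len;
    int rc = memcmp(a->str, b->str, n);
    if (rc != 0)
        return rc < 0 ? -1 : 1;
    /* equal prefix: the shorter string sorts first */
    return (a->len > b->len) - (a->len < b->len);
}

/* Compare two compound keys of `count` parts, most significant part first. */
int compound_key_cmp(const struct field_value *a, const struct field_value *b,
                     size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int rc = field_cmp(&a[i], &b[i]);
        if (rc != 0)
            return rc;   /* first differing part decides the order */
    }
    return 0;
}
```

The sketch uses a lexicographic comparison, so the second key part only matters when the first parts are equal.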

The main motivation is to provide a simple-to-use model that can also be used for secondary indexes. Additionally, the new format allows duplicate field values to be stored only once per page (field compression).

The database scheme definition has moved from db.name.index to the new db.name.scheme.

For example:

db.name.scheme.name = string,key
db.name.scheme.address = string

void *o = sp_document(name); /* name: the database object */
sp_setstring(o, "name", ...);
sp_setstring(o, "address", ...);
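The scheme values above, such as `string,key`, could be interpreted roughly as shown below. This is a hypothetical parser. `parse_field_spec` and `struct field_spec` are illustrative names and are not part of the Sophia API. The parser only understands the two forms quoted in this thread, `<type>` and `<type>,key`. Sophia's real configuration parser may accept other options.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing a scheme value such as "string,key". */
struct field_spec {
    char type[16];   /* e.g. "string" or "u32" */
    bool is_key;     /* true when the ",key" option is present */
};

/* Parse "<type>" or "<type>,key". Returns 0 on success, -1 on malformed
 * input. Only the forms quoted in this thread are handled. */
int parse_field_spec(const char *spec, struct field_spec *out)
{
    const char *comma = strchr(spec, ',');
    size_t type_len = comma ? (size_t)(comma - spec) : strlen(spec);
    if (type_len == 0 || type_len >= sizeof(out->type))
        return -1;                       /* empty or too-long type name */
    memcpy(out->type, spec, type_len);
    out->type[type_len] = '\0';
    out->is_key = false;
    if (comma) {
        if (strcmp(comma + 1, "key") != 0)
            return -1;                   /* unknown option after the comma */
        out->is_key = true;
    }
    return 0;
}
```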

To maintain compatibility, if no scheme is defined, the following default is created:

db.name.scheme.key = string,key
db.name.scheme.value = string
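The compatibility rule can be sketched as a simple fallback. If the user defined no fields, a two-field key/value scheme is used instead. `struct scheme_entry`, `default_scheme` and `effective_scheme` are hypothetical names. The sketch only illustrates the rule described above, not how Sophia implements it.

```c
#include <stddef.h>

/* Hypothetical scheme entry: field name plus its type specification. */
struct scheme_entry {
    const char *name;
    const char *spec;
};

/* The compatibility default: a string key and a string value. */
static const struct scheme_entry default_scheme[] = {
    { "key",   "string,key" },
    { "value", "string" },
};

/* Return the user-defined scheme, or the default one when none was defined.
 * `*count` receives the number of fields in the returned scheme. */
const struct scheme_entry *effective_scheme(const struct scheme_entry *user,
                                            size_t user_count, size_t *count)
{
    if (user == NULL || user_count == 0) {
        *count = sizeof(default_scheme) / sizeof(default_scheme[0]);
        return default_scheme;
    }
    *count = user_count;
    return user;
}
```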

Hopefully, no changes to the API will be required.

master branch commit: https://github.com/pmwkaa/sophia/commit/0bb6719910781393e855b11e1f5747e28362d16c

Thanks,

Dmitry.

Mark Callaghan

Apr 11, 2016, 11:44:58 AM

to Sophia database

Does this require a lot more detail?
Is it similar to what WiredTiger supports?
http://source.wiredtiger.com/1.6.5/schema.html

Dmitry Simonenko

Apr 11, 2016, 1:05:05 PM

to Sophia database

As far as I can see, it looks very similar, but there may be differences.

Sophia is a document store, where key-value is treated as a special case of document storage.

It is not a columnar store, although the scheme definition is very similar in some sense. It is closer to a row store.

Basically, the Sophia storage format lets you define field types in much the same way WiredTiger does. Fields are part of a document. A closer comparison would probably be a JSON document: it has named fields, but we only store their values (because we know the scheme and the field types).
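Here is a minimal sketch of the "values only" idea. It assumes a made-up encoding: 4-byte little-endian integers, and strings written as a 4-byte length prefix followed by their bytes. Field names never appear in the output because the reader is assumed to know the scheme. This is not Sophia's actual on-disk format. `encode_values` and the related types are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: field names are NOT written, because the reader
 * knows the scheme. This is not Sophia's actual format. */
enum ftype { T_U32, T_STRING };

struct value {
    uint32_t u32;      /* used for T_U32 */
    const char *str;   /* used for T_STRING */
    uint32_t len;      /* string length in bytes */
};

/* Write v as 4 little-endian bytes, independent of host byte order. */
static size_t put_u32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
    return 4;
}

/* Encode `n` values in scheme order into `buf` and return the bytes written.
 * The caller must provide a large enough buffer. */
size_t encode_values(const enum ftype *scheme, const struct value *vals,
                     size_t n, uint8_t *buf)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (scheme[i] == T_U32) {
            off += put_u32(buf + off, vals[i].u32);
        } else {
            off += put_u32(buf + off, vals[i].len);   /* length prefix */
            memcpy(buf + off, vals[i].str, vals[i].len);
            off += vals[i].len;
        }
    }
    return off;
}
```

For a record of `(u32 42, string "hello")`, this encoding takes 4 + 4 + 5 = 13 bytes, with no bytes spent on the field names.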

However, fields are not stored in the order in which they are defined. Fixed-size fields (integers) are grouped together to save space, etc.
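One way to picture the grouping of fixed-size fields is a storage order that places all fixed-size fields first, at constant offsets, followed by the variable-length fields. The sketch below is hypothetical (`struct fdesc`, `storage_order`). It keeps declaration order within each group and makes no claim about Sophia's exact layout.

```c
#include <stddef.h>

/* Hypothetical field descriptor: size > 0 for fixed-size types (e.g. u32 = 4),
 * size == 0 for variable-length types such as string. */
struct fdesc {
    const char *name;
    size_t size;
};

/* Write into `order` the field indexes in storage order: fixed-size fields
 * first, then variable-length ones, keeping declaration order within each
 * group. Returns the total size of the fixed-size block, whose fields can
 * then be addressed at constant offsets. */
size_t storage_order(const struct fdesc *f, size_t n, size_t *order)
{
    size_t k = 0, fixed_bytes = 0;
    for (size_t i = 0; i < n; i++)
        if (f[i].size > 0) {
            order[k++] = i;
            fixed_bytes += f[i].size;
        }
    for (size_t i = 0; i < n; i++)
        if (f[i].size == 0)
            order[k++] = i;
    return fixed_bytes;
}
```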

One of the nice features of the new scheme is the ability to store duplicate fields only once.

For example, with a key-value scheme like the default one:

db.name.scheme.key = u32,key
db.name.scheme.value = string

We can save space by storing copies of the value field only once per range.

This is in addition to general-purpose compression (lz4, etc.).
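Storing value copies only once per range can be illustrated with a per-page dictionary. Each distinct value is kept once, and every record stores only an index into the dictionary. This is a hypothetical sketch (`struct page`, `page_add_value` and `PAGE_MAX` are illustrative names). It stores pointers rather than copying bytes, and Sophia's actual page format may differ.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_MAX 64   /* hypothetical maximum number of records per page */

/* Hypothetical page layout: each distinct value is stored once in `dict`,
 * and each record keeps only an index into it. */
struct page {
    const char *dict[PAGE_MAX];  /* distinct values, in first-seen order */
    size_t dict_count;
    size_t ref[PAGE_MAX];        /* ref[i] = dict index of record i's value */
    size_t record_count;
};

/* Add a record's value to the page, reusing an existing copy if the same
 * value is already stored. Returns 0 on success, -1 if the page is full. */
int page_add_value(struct page *p, const char *value)
{
    if (p->record_count == PAGE_MAX)
        return -1;
    size_t i;
    for (i = 0; i < p->dict_count; i++)
        if (strcmp(p->dict[i], value) == 0)
            break;                           /* duplicate: reuse stored copy */
    if (i == p->dict_count)
        p->dict[p->dict_count++] = value;    /* first occurrence: store it */
    p->ref[p->record_count++] = i;
    return 0;
}
```

A real implementation would more likely use key order or hashing instead of a linear scan. The scan is used here only to keep the sketch short. This deduplication is independent of, and can be combined with, a general-purpose compressor such as lz4.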
