About read-intensivity and front-end



kura...@mail.ru


May 13, 2016, 3:42:20 AM

to Sophia database

Good day!

1) Is Sophia suited to read-intensive workloads? I need to handle a big-data dataset that I write only once, but I have to read it and build reports from it many times. Would Sophia be a good fit for this?

1-a) Could you name databases suited to the purpose in 1), please? :-) It's really difficult to find a DBMS that mentions "read-intensive" in its readme...

2) Does Sophia have bindings (for Python, etc.) and/or a network front-end?

Thanks for your work!

Dmitry Simonenko


May 13, 2016, 5:54:35 AM

to Sophia database

Hi,

1) That depends on your query scenario, e.g. your data locality and how many actual disk reads it would take to build a report.

Sophia has an append-only design to handle write-intensive loads (set/upsert never does a disk read).
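
To illustrate why an append-only design avoids disk reads on writes, here is a toy in-memory sketch in Python. It is not Sophia's implementation or API: the class name `AppendOnlyStore` is hypothetical, and the upsert merge rule (integer addition) is an assumption chosen for the example. Writes only append records; combining an upsert with the previous value is deferred until a read. Reads here replay the whole log, which a real engine avoids with sorted runs and indexes.

```python
from typing import List, Optional, Tuple

# (op, key, value): op is "set" (replace) or "upsert" (delta to merge later)
Record = Tuple[str, str, int]


class AppendOnlyStore:
    """Toy append-only store (hypothetical): writes never read existing data."""

    def __init__(self) -> None:
        self.log: List[Record] = []

    def set(self, key: str, value: int) -> None:
        # Blind write: the previous value is never looked up.
        self.log.append(("set", key, value))

    def upsert(self, key: str, delta: int) -> None:
        # Record only the delta; combining it with the old value is
        # deferred to read time (assumed merge rule: integer addition).
        self.log.append(("upsert", key, delta))

    def get(self, key: str) -> Optional[int]:
        # Replay the log oldest-to-newest to resolve the current value.
        value: Optional[int] = None
        for op, k, v in self.log:
            if k != key:
                continue
            if op == "set":
                value = v
            else:
                value = (0 if value is None else value) + v
        return value
```

For example, `set("hits", 10)` followed by `upsert("hits", 5)` and `upsert("hits", 2)` makes `get("hits")` return 17, and neither upsert had to look at the old value.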

At the same time, it is optimized for range scans, so it should be good for ordered queries such as time series, events, log storage and so on. If you intend to use the database in read-only mode (e.g. import the data once and then only read), it should guarantee O(1) for random access. Performance should not degrade with a big dataset.
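
A rough sketch of why an immutable, sorted dataset suits both range scans and point lookups: if the data is stored sorted in pages and a small in-memory index keeps the first key of each page, a point lookup needs exactly one page read (the index search itself is an in-memory binary search), while a range scan reads neighbouring pages sequentially. This is a toy Python model under those assumptions; `SortedRun`, `page_size`, and the page-read counter are hypothetical and do not describe Sophia's on-disk format.

```python
import bisect
from typing import Iterator, List, Optional, Tuple

Pair = Tuple[int, str]


class SortedRun:
    """Toy read-only run (hypothetical): sorted pairs split into "pages".

    An in-memory index keeps the first key of every page, so a point
    lookup touches exactly one page and a range scan walks pages in order.
    """

    def __init__(self, pairs: List[Pair], page_size: int = 4) -> None:
        data = sorted(pairs)
        self.pages: List[List[Pair]] = [
            data[i : i + page_size] for i in range(0, len(data), page_size)
        ]
        self.first_keys: List[int] = [page[0][0] for page in self.pages]
        self.page_reads = 0  # counts simulated disk page reads

    def _read_page(self, idx: int) -> List[Pair]:
        self.page_reads += 1
        return self.pages[idx]

    def get(self, key: int) -> Optional[str]:
        # The last page whose first key <= key is the only one that can hold it.
        idx = bisect.bisect_right(self.first_keys, key) - 1
        if idx < 0:
            return None  # smaller than every key: no page read needed
        for k, v in self._read_page(idx):
            if k == key:
                return v
        return None

    def scan(self, lo: int, hi: int) -> Iterator[Pair]:
        # Yield pairs with lo <= key < hi, reading pages sequentially.
        idx = max(bisect.bisect_right(self.first_keys, lo) - 1, 0)
        while idx < len(self.pages):
            for k, v in self._read_page(idx):
                if k >= hi:
                    return
                if k >= lo:
                    yield k, v
            idx += 1
```

Built over keys 0..99 with `page_size=10`, `get(57)` reads one page, and `scan(15, 22)` reads only pages 1 and 2.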

Just for fun: a year ago I tested sequential iteration performance with Sophia on a large dataset and made a small video: https://www.youtube.com/watch?v=m0YdY0eNrDA :)

I believe RocksDB can also be tuned for this: configure a single LSM level, wait until compaction completes, and then run queries.

As far as I can tell, a similar technique was used for these benchmarks: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
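
The idea of waiting for compaction before querying can be shown with a toy merge of sorted runs: once several runs are merged into one, a lookup consults a single sorted run instead of every level. The sketch below is a generic k-way merge using Python's `heapq.merge`, not RocksDB's API or its compaction algorithm; the function name `compact` and the newest-first ordering of `runs` are assumptions for illustration.

```python
import heapq
from typing import List, Optional, Tuple

Pair = Tuple[int, str]


def compact(runs: List[List[Pair]]) -> List[Pair]:
    """Merge key-sorted runs (runs[0] is newest) into one sorted run.

    Hypothetical helper: when a key appears in several runs, the newest
    value wins and older versions are dropped.
    """
    # Tag entries as (key, age, value) so equal keys come out newest-first.
    tagged = [[(k, age, v) for k, v in run] for age, run in enumerate(runs)]
    merged: List[Pair] = []
    last_key: Optional[int] = None
    for k, _age, v in heapq.merge(*tagged):
        if k != last_key:  # first occurrence of a key is its newest version
            merged.append((k, v))
            last_key = k
    return merged
```

For example, compacting a newer run `[(1, "b"), (3, "c")]` over an older run `[(1, "a"), (2, "x"), (4, "y")]` yields `[(1, "b"), (2, "x"), (3, "c"), (4, "y")]`: the stale value for key 1 is dropped.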

Also, if you can import the data in sorted order (with no random updates later), then a classical B-Tree might be the right choice.
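
Why ordered import helps a B-Tree: when keys arrive already sorted, the tree can be bulk-loaded bottom-up by packing leaves full and building each parent level from the first key of its children, with no node splits. The sketch below is a simplified, B+-tree-style model in Python; `bulk_load`, `contains`, and the `fanout` parameter are hypothetical names, and it does not model any particular engine's page format.

```python
import bisect
from typing import List

Node = List[int]
Level = List[Node]


def bulk_load(sorted_keys: List[int], fanout: int = 4) -> List[Level]:
    """Build tree levels bottom-up from already-sorted keys (hypothetical).

    Returns levels from leaves (index 0) to root (last). Each interior
    entry is the first key of the corresponding child node.
    """
    if not sorted_keys:
        return [[[]]]
    # Leaves: pack keys left to right into full nodes; no splits needed.
    level: Level = [sorted_keys[i : i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]
    while len(level) > 1:
        firsts = [node[0] for node in level]
        level = [firsts[i : i + fanout] for i in range(0, len(firsts), fanout)]
        levels.append(level)
    return levels


def contains(levels: List[Level], key: int, fanout: int = 4) -> bool:
    """Descend from the root, picking the last child whose first key <= key."""
    node_idx = 0
    for level in reversed(levels[1:]):  # interior levels, root first
        node = level[node_idx]
        pos = bisect.bisect_right(node, key) - 1
        if pos < 0:
            return False  # key is smaller than every key in the tree
        # Children are packed `fanout` per parent, so indices line up.
        node_idx = node_idx * fanout + pos
    return key in levels[0][node_idx]
```

With 20 keys and `fanout=4`, this gives five leaves, two interior nodes, and a root, and `contains` descends one node per level.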

WiredTiger is worth checking out; it supports both LSM and B-Tree.

2) Yes. There are many community-supported drivers: http://sophia.systems/drivers.html