Hi,
1) That depends on your query scenario, e.g. your data locality and how many actual disk reads it
would take to build a report.
Sophia has an append-only design to handle intensive write load (set/upsert never does a disk read).
At the same time it is range-scan optimized, so it should be good for ordered queries such as time series, events, log storage and
so on. If you intend to use the database in read-only mode (e.g. import the data and then only do reads), it should
guarantee O(1) for random access. Performance should not degrade with a big dataset.
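
To illustrate why ordered keys suit range scans, here is a minimal sketch in Python. It does not use Sophia's API: OrderedStore, make_key and events_between are hypothetical names, and the store is an in-memory stand-in for any ordered key-value store. The idea is that big-endian, fixed-width timestamps sort byte-wise in chronological order, so a time window becomes one contiguous key range.

```python
import bisect
import struct


def make_key(ts: int, seq: int) -> bytes:
    # Big-endian fixed-width encoding: byte-wise order equals numeric order.
    return struct.pack(">QI", ts, seq)


class OrderedStore:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []   # kept sorted at all times
        self._vals = {}

    def set(self, key: bytes, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, start: bytes, end: bytes):
        # Yield (key, value) pairs with start <= key < end, in key order.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def events_between(store: OrderedStore, t0: int, t1: int):
    # All events with t0 <= timestamp < t1, read as one contiguous range.
    return [v for _, v in store.scan(make_key(t0, 0), make_key(t1, 0))]
```

With keys laid out like this, a report over a time window reads neighbouring records only, which is the access pattern a range-scan-optimized store is built for.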
Just for fun: I've been testing performance for sequential iteration using Sophia on a large dataset and made a
a)
I believe RocksDB can also be tuned to use a single LSM level; you would wait until compaction completes and then run your queries.
Also, if you can do an ordered data import (without random updates later), then a classical B-Tree might be your choice.
WiredTiger is worth checking out. It supports both LSM and B-Tree.
Thanks!