Scaling read performance


Alex Mol

Aug 19, 2016, 7:53:56 PM
to Sophia database
We ran some tests on 2.2. Write performance is really nice, but read performance is not that good.
We have three fields: sid (u64, key(0)), time (u32, key(1)) and value (48 bytes).
The read operation is a cursor that selects the 500 latest values for a sid in a specified time range, in a single thread.
We got around 1500 read operations per second regardless of the db size. It looks like this number was limited by the CPU (htop showed 100% on a single core).
Then we ran two readers in separate threads, but performance suddenly dropped to 400 RPS. It looks like Sophia is designed to work in a single thread (I'm not talking about compaction). The documentation doesn't say much about this.
So, how can we scale read operations? One option I see is having several db files. Are there other alternatives?
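
Editor's sketch of the read pattern described above, for readers who want to see what "the 500 latest values for a sid in a time range" means on a composite (sid, time) key. This is not Sophia code and says nothing about Sophia's internals: it models the data as a plain sorted array, and the names `struct row`, `key_cmp`, `upper_bound` and `latest_in_range` are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row, mirroring the three fields from the post. */
struct row {
    uint64_t sid;
    uint32_t time;
    char     value[48];
};

/* Compare two composite keys: sid first, then time. */
static int key_cmp(uint64_t sid_a, uint32_t time_a,
                   uint64_t sid_b, uint32_t time_b)
{
    if (sid_a != sid_b)
        return sid_a < sid_b ? -1 : 1;
    if (time_a != time_b)
        return time_a < time_b ? -1 : 1;
    return 0;
}

/* Index of the first row whose key is greater than (sid, time).
 * rows must be sorted by (sid, time). */
static size_t upper_bound(const struct row *rows, size_t n,
                          uint64_t sid, uint32_t time)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key_cmp(rows[mid].sid, rows[mid].time, sid, time) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Collect up to limit rows of sid with t_min <= time <= t_max,
 * newest first. Returns the number of pointers written to out. */
static size_t latest_in_range(const struct row *rows, size_t n,
                              uint64_t sid, uint32_t t_min, uint32_t t_max,
                              const struct row **out, size_t limit)
{
    size_t pos = upper_bound(rows, n, sid, t_max);
    size_t count = 0;

    /* Walk backwards from the newest matching row. */
    while (pos > 0 && count < limit) {
        const struct row *r = &rows[pos - 1];
        if (r->sid != sid || r->time < t_min)
            break;
        assert(r->time <= t_max);
        out[count++] = r;
        pos--;
    }
    return count;
}
```

Each read is one seek plus a backward walk of up to `limit` rows, so with a limit of 500 the per-row work dominates the cost of a read.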

Dmitry Simonenko

Aug 20, 2016, 4:07:52 AM
to Sophia database

If I understood you correctly, you are trying to read 500 x 1500 = 750k rows per second? It looks like a highly CPU-bound operation.
What is the dataset size? CPU usage should drop when the dataset is bigger than RAM.

Yes, sorry. Sophia does not yet scale parallel operations. This is something that I'm working on.
Operations are thread-safe, but they don't scale. In fact, they lock each other out, which is why you see the performance drop.
It is possible to run several environments, but this is probably not exactly what you expect.

Alex Mol

Aug 20, 2016, 4:56:07 AM
to Sophia database
The data set varies from 5 GB up to 50 GB. Looks like splitting the data across a number of databases is the only option for now. Thanks.
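
Editor's sketch of how the split could be routed, assuming each shard is a separate Sophia environment opened elsewhere by the application. `shard_for_sid` is a hypothetical helper, not part of Sophia. It keeps all rows of one sid in the same shard, so a time-range read for a sid only ever touches one environment, and each reader thread can own its shards without contending with the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the shard (environment) that owns sid.
 * Plain modulo is assumed to be good enough, which holds when sids are
 * roughly sequential; a hash of sid could be used if they are skewed. */
static size_t shard_for_sid(uint64_t sid, size_t nshards)
{
    assert(nshards > 0);
    return (size_t)(sid % nshards);
}
```

With an array of opened environments, a write or read for a given sid would then go to `envs[shard_for_sid(sid, nshards)]`, where `envs` is whatever array the application keeps. Changing `nshards` later moves sids between shards, so the count has to be fixed up front or the data reloaded.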


Alex Mol

Aug 20, 2016, 5:01:03 AM
to Sophia database
> CPU usage should drop when the dataset is bigger than RAM.

Why is that? Are you saying that the less memory Sophia has, the less CPU it uses?


Dmitry Simonenko

Aug 20, 2016, 5:16:33 AM
to Sophia database
When a database is smaller than RAM, it probably sits in the file system cache and all operations do very little actual I/O. In a sense, as the database grows, the load may change from CPU bound to I/O bound. It is possible to enable O_DIRECT to see what the actual read performance is when the file system cache is not used:

db.name.mmap = 0
db.name.direct_io = 1

Pavel Nevezhin

Aug 22, 2016, 4:33:50 AM
to Sophia database
I've tried this. The program's behavior becomes unstable. I started getting "Segmentation fault" and "sophia/index/si_node.h:57: si_nodelock: Assertion `! (node->flags & 1)' failed." errors while filling and warming up the db. Is this a known issue, or did I make a mistake in the configuration?
I use Sophia version_2.2:
sp_setstring(env, "sophia.path", "_test", 0);
sp_setstring(env, "db", "test", 0);
sp_setint(env, "db.test.direct_io", 1);
sp_setint(env, "db.test.mmap", 0);


Dmitry Simonenko

Aug 22, 2016, 4:53:33 AM
to Sophia database
No, this should never happen. The configuration looks fine. Can you please provide a simple test that reproduces the issue? I will see what I can do.
Thanks

Pavel Nevezhin

Aug 22, 2016, 6:40:26 AM
to Sophia database
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "sophia/sophia.h"

int main(int argc, char *argv[])
{
    (void)argc;
    (void)argv;

    /* One database "test" with a composite key (uid, tm) and a string value. */
    void *env = sp_env();
    sp_setstring(env, "sophia.path", "_test", 0);
    sp_setstring(env, "db", "test", 0);
    sp_setstring(env, "db.test.scheme", "uid", 0);
    sp_setstring(env, "db.test.scheme", "tm", 0);
    sp_setstring(env, "db.test.scheme", "value", 0);
    sp_setstring(env, "db.test.scheme.uid", "u64,key(0)", 0);
    sp_setstring(env, "db.test.scheme.tm", "u64,key(1)", 0);
    sp_setstring(env, "db.test.scheme.value", "string", 0);

    /* The settings under test: O_DIRECT on, mmap off. */
    sp_setint(env, "db.test.direct_io", 1);
    sp_setint(env, "db.test.mmap", 0);

    void *db = sp_getobject(env, "db.test");
    int rc = sp_open(env);
    if (rc == -1)
        goto error;

    /* set */
    uint64_t i = 0;
    uint64_t k = 0;
    uint32_t param1 = 100000;
    uint32_t values = 1500;

    /* One transaction per uid, each writing a batch of (uid, tm) rows. */
    for (i = 0; i <= param1; i++) {
        void *tx = sp_begin(env);
        uint64_t uid = i;
        for (k = 0; k <= values; k++) {
            void *o = sp_document(db);
            char value[48] = {0}; /* zero-filled so all 48 bytes are defined */
            snprintf(value, sizeof(value), "%d", (int)k);
            sp_setstring(o, "uid", &uid, sizeof(uid));
            sp_setstring(o, "tm", &k, sizeof(k));
            sp_setstring(o, "value", value, sizeof(value));
            rc = sp_set(tx, o);
            if (rc == -1)
                goto error;
        }
        rc = sp_commit(tx);
        if (rc == -1)
            goto error;
    }
    /* finish work */
    sp_destroy(env);
    return 0;

error:;
    int size;
    char *error = sp_getstring(env, "sophia.error", &size);
    printf("error: %s\n", error);
    free(error);
    sp_destroy(env);
    return 1;
}

I hope this helps you find the problem. Thanks.


Dmitry Simonenko

Aug 22, 2016, 6:50:27 AM
to Sophia database

Dmitry Simonenko

Aug 22, 2016, 7:33:42 AM
to Sophia database
OK, I've pushed the fix. Please try again.

Pavel Nevezhin

Aug 22, 2016, 7:54:57 AM
to Sophia database
Great! Now there are no more errors. Thanks.
