Hi,
I'm still relatively new to KairosDB so please bear with me. I've read similar posts about read performance but I wanted to see if there was anything that stood out about what I'm seeing. I feel like there must be something basic we're missing here.
In our functional test environment, resource usage on the systems running KairosDB and Cassandra looks fine. The Cassandra cluster is 3 VMs, each with 8GB RAM and 2 cores. Cassandra does no disk reads at all, so for our small tests the data should all be in memory. Total CPU usage across the Cassandra cluster stays under 1 core while testing queries. KairosDB runs on a separate system with 16GB RAM and 8 cores. Again, I see no resource issues in the reported "used_memory", disk IO, CPU, etc.
The test environment has been running for over 3 weeks (which I read is the partition size, so that seems relevant). Data ingestion is very minimal compared to what we hope to scale to.
1) When I started out the kairosdb.datastore.cassandra.key_query_time was <100ms, but is now 3~5 seconds. There is hardly any difference in datastore.query_time and http.query_time. So this would mean that most of the time is in this "phase_1" where KairosDB is getting all of the keys from Cassandra. I've posted some details on the size of the Cassandra data below. I'm wondering if this sounds reasonable for what we have? My understanding is that it is having to go through all of the keys for the 3 week period and filter out the ones it needs. I hear there are plans to improve this?
2) If I issue a query that has 5 tags, the total response time is the 3~5 seconds multiplied by the 5 tags, or 15~25 seconds. Am I correct that there is not a way to run these in parallel? If I issue multiple single tag queries to KairosDB at the same time only one runs. I have kairosdb.datastore.concurrentQueryThreads=5, but I'm not doing more than 5 (as far as I can tell). So without being able to run more than one query at a time our best option is to figure out how to speed up the one query that is running?
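To make the concurrency test in 2) reproducible, here is a rough sketch of how I'd fire several single-tag queries at once and compare the wall-clock time against one query run alone. It assumes KairosDB's REST query endpoint at /api/v1/datapoints/query on localhost:8080; the tag name "host" and its values are made up for illustration, and the helper names are my own, so adjust everything to your setup.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed host and port; change to wherever KairosDB is listening.
KAIROS_URL = "http://localhost:8080/api/v1/datapoints/query"


def build_query(metric, tag_name, tag_value, weeks=3):
    """Build a query body for one metric filtered by a single tag value."""
    return {
        "start_relative": {"value": weeks, "unit": "weeks"},
        "metrics": [{"name": metric, "tags": {tag_name: [tag_value]}}],
    }


def post_query(body, url=KAIROS_URL):
    """POST a query body to KairosDB and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def time_queries(bodies, send=post_query, workers=5):
    """Send all bodies concurrently; return (wall_seconds, per_query_seconds)."""
    def timed(body):
        start = time.perf_counter()
        send(body)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_query = list(pool.map(timed, bodies))
    wall = time.perf_counter() - wall_start
    return wall, per_query


if __name__ == "__main__":
    # "host" and "h0".."h4" are hypothetical tag name/values.
    bodies = [build_query("A", "host", f"h{i}") for i in range(5)]
    baseline, _ = time_queries(bodies[:1])
    wall, per_query = time_queries(bodies)
    print(f"one query alone: {baseline:.2f}s")
    print(f"five at once:    {wall:.2f}s total, each: {per_query}")
```

If the five-at-once wall time comes out close to five times the single-query time, the queries are effectively being serialized somewhere; if it stays close to the single-query time, they are running in parallel.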
My concern is that we're not going to be able to get data out quick enough, especially as we start scaling. I'm starting to look into things like rollups. Any help would be appreciated, thanks!
cqlsh:kairosdb> select count(*) from data_points limit 10000000;
count 1,545,620
cqlsh:kairosdb> select count(*) from row_key_index limit 10000000;
count 610,695
cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('A');
count 202,518
cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('B');
count 202,519
cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('C');
count 202,520
nodetool cfstats for kairosdb is attached