KairosDB read performance, cassandra.key_query_time

Riley Zimmerman

Feb 23, 2016, 11:05:40 AM

to KairosDB

Hi,

I'm still relatively new to KairosDB so please bear with me. I've read similar posts about read performance but I wanted to see if there was anything that stood out about what I'm seeing. I feel like there must be something basic we're missing here.

In our functional test environment, resource usage on the systems running KairosDB and Cassandra all looks very good. We have a Cassandra cluster of 3 VMs, each with 8GB and 2 cores. Cassandra does no disk reads at all, so for our small tests that should mean everything is in memory. Total Cassandra cluster CPU usage is less than 1 core while testing queries. KairosDB runs on another system with 16GB RAM and 8 cores. Again, I see no resource issues in the reported "used_memory", disk I/O, CPU, etc.
The test environment has been running for over 3 weeks (which I read is the partition size, so that matters). Its data ingestion is minimal compared to what we hope to scale to.
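The 3-week figure matters because each series is split into rows (partitions) of a fixed time width, so a new set of row keys appears whenever a series crosses into a new row. The Python sketch below assumes the 3-week width mentioned here and a simple floor-to-multiple rule for assigning a timestamp to its row; the function names are hypothetical and this is not KairosDB's own code.

```python
# Sketch only: assumes rows (partitions) are 3 weeks wide, as this thread
# describes, and that a timestamp maps to the start of its row by flooring.
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000  # 3 weeks = 1,814,400,000 ms


def row_time(timestamp_ms: int) -> int:
    """Return the start time of the 3-week row that holds this timestamp."""
    return timestamp_ms - (timestamp_ms % ROW_WIDTH_MS)


def row_times_for_range(start_ms: int, end_ms: int) -> list[int]:
    """Return every row start time a query over [start_ms, end_ms] touches."""
    return list(range(row_time(start_ms), end_ms + 1, ROW_WIDTH_MS))


# A 4-week query starting at the epoch spans two 3-week rows.
FOUR_WEEKS_MS = 4 * 7 * 24 * 60 * 60 * 1000
print(row_times_for_range(0, FOUR_WEEKS_MS))  # [0, 1814400000]
```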

1) When I started out, kairosdb.datastore.cassandra.key_query_time was under 100 ms, but it is now 3~5 seconds. datastore.query_time and http.query_time are hardly any higher, so most of the time must be spent in this "phase_1", where KairosDB gets all of the keys from Cassandra. I've posted some details on the size of the Cassandra data below. Does this sound reasonable for what we have? My understanding is that it has to go through all of the keys for the 3-week period and filter out the ones it needs. I've heard there are plans to improve this; is that right?
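A rough way to see why key_query_time tracks the total number of row keys rather than the number a query actually needs: the Python sketch below models the key lookup as "read every index entry for the metric in the time range, then filter by tags". The index layout (a dict from metric name to (row_time, tags) pairs) and the phase_1 function are hypothetical simplifications of the behaviour described in this thread, not KairosDB's implementation.

```python
# Hypothetical model of the "phase_1" key lookup: read every index entry
# for the metric in the time range, then filter on tags locally.
Index = dict[str, list[tuple[int, dict[str, str]]]]


def phase_1(index: Index, metric: str, start: int, end: int,
            tag_filter: dict[str, set[str]]):
    """Return (matching row keys, number of row keys scanned)."""
    scanned = 0
    matches = []
    for row_time, tags in index.get(metric, []):
        if not start <= row_time <= end:
            continue
        scanned += 1  # every key in range is read, matching or not
        if all(tags.get(name) in allowed for name, allowed in tag_filter.items()):
            matches.append((row_time, tags))
    return matches, scanned


# 1,000 hosts in one row: asking for a single host still scans all 1,000 keys.
index: Index = {"A": [(0, {"host": f"h{i}"}) for i in range(1000)]}
matches, scanned = phase_1(index, "A", 0, 0, {"host": {"h1"}})
print(len(matches), scanned)  # 1 1000
```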

2) If I issue a query that has 5 tags, the total response time is the 3~5 seconds multiplied by the 5 tags, or 15~25 seconds. Am I correct that there is no way to run these in parallel? If I issue multiple single-tag queries to KairosDB at the same time, only one runs at a time. I have kairosdb.datastore.concurrentQueryThreads=5, but I'm not issuing more than 5 queries (as far as I can tell). So if we can't run more than one query at a time, is our best option to figure out how to speed up the one query that is running?

My concern is that we won't be able to get data out quickly enough, especially as we start scaling. I'm starting to look into things like rollups. Any help would be appreciated, thanks!

cqlsh:kairosdb> select count(*) from data_points limit 10000000;

count 1,545,620

cqlsh:kairosdb> select count(*) from row_key_index limit 10000000;

count 610,695

cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('A');

count 202,518

cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('B');

count 202,519

cqlsh:kairosdb> select count(*) from row_key_index where key=textAsBlob('C');

count 202,520

The nodetool cfstats output for the kairosdb keyspace is attached.

Riley Zimmerman

Feb 23, 2016, 2:09:08 PM

to KairosDB

Another data point is that there are 116,813 rows in string_index in Cassandra.

Noorul Islam K M

Feb 23, 2016, 4:34:53 PM

to Riley Zimmerman, KairosDB

We are also having this problem. In your scenario, a simple query needs to read 202,520 partitions from data_points. Cassandra reads are always slow when they have to touch many partitions. Does performance improve if you add some filtering? Also, the number of partitions depends on the number of tags and tag values.

Thanks and Regards
Noorul

Brian Hawkins

Feb 25, 2016, 9:33:57 AM

to KairosDB, rzimm...@gmail.com

Just to be clear: the number of tags/values in one metric does not have an effect on queries for any other metric. Kairos only retrieves the keys for the metric you queried. Also, queries that are sent in a single request are run serially. If you send them as separate requests, they will be processed in parallel.
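Following that advice, a multi-tag lookup can be split into separate HTTP requests issued concurrently. A minimal Python sketch, assuming a KairosDB server at localhost:8080 and its /api/v1/datapoints/query REST endpoint; the helper names, the one-hour relative window, and the worker count (5, only to echo the concurrentQueryThreads value mentioned earlier) are illustrative choices, and whether this actually reduces wall-clock time depends on the server.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed server location; adjust to your deployment.
KAIROS_URL = "http://localhost:8080/api/v1/datapoints/query"


def build_query(metric: str, tags: dict[str, list[str]]) -> dict:
    """One metric per request body, over the last hour (an arbitrary choice)."""
    return {
        "start_relative": {"value": 1, "unit": "hours"},
        "metrics": [{"name": metric, "tags": tags}],
    }


def post_query(body: dict, url: str = KAIROS_URL) -> dict:
    """POST a query body to the KairosDB REST API and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def query_in_parallel(queries, send=post_query, max_workers: int = 5) -> list:
    """Send each (metric, tags) pair as its own request, concurrently.

    Results are returned in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: send(build_query(*q)), queries))
```

For example, `query_in_parallel([("A", {"host": ["server1"]}), ("A", {"host": ["server2"]})])` sends two requests instead of one combined one. The `send` parameter exists only so the transport can be swapped out, e.g. for testing without a server.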

Brian

Riley Zimmerman

Feb 25, 2016, 11:27:00 AM

to KairosDB, rzimm...@gmail.com

Hi,

We've made a change to our tags and saw a huge improvement in read response times, which are down from 15~25 seconds to under 50 ms.

We had a lot of tags for a specific metric, which increased the number of keys generated in Cassandra. Consequently, phase_1 took longer to execute. The problem with our old schema was that it included both the timestamp and the uuid in the tags, which meant the number of tag values would keep growing indefinitely. Our solution was to remove the timestamp as a tag and to use the uuid as part of the metric name.
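To make the effect of a timestamp tag concrete: every distinct timestamp value becomes a distinct tag set, so every data point gets its own row key. The Python sketch below counts distinct (metric, row start, tag set) keys for hypothetical data (one device reporting once a minute for a day), assuming the 3-week row width discussed in this thread.

```python
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000  # assumed 3-week row width


def distinct_row_keys(points, tag_names: set[str]) -> int:
    """Count distinct (metric, row start, tag set) keys, keeping only the
    tags listed in tag_names."""
    keys = set()
    for metric, ts, tags in points:
        kept = tuple(sorted((k, v) for k, v in tags.items() if k in tag_names))
        keys.add((metric, ts - ts % ROW_WIDTH_MS, kept))
    return len(keys)


# One device reporting once a minute for a day, with its timestamp copied
# into a tag (hypothetical data).
points = [
    ("cpu", t * 60_000, {"uuid": "u1", "ts": str(t * 60_000)})
    for t in range(24 * 60)
]
print(distinct_row_keys(points, {"uuid", "ts"}))  # 1440: one key per point
print(distinct_row_keys(points, {"uuid"}))        # 1: timestamp tag dropped
```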

Praveen Agrawal

Mar 5, 2016, 2:10:41 AM

to KairosDB, rzimm...@gmail.com

Hi Riley,

Thanks for sharing your solution; it helps a lot. I have a quick question about it:

I can understand that removing the timestamp as a tag reduces the number of keys, but doesn't the uuid, whether it is a tag or part of the metric name, still mean the same number of rows? Am I missing something?

Cheers.

Brian Hawkins

Mar 6, 2016, 10:13:42 AM

to KairosDB, rzimm...@gmail.com

I'm sorry if it isn't clear what should and should not be used as tag values. Basically, you need to stick with a finite set of values with low cardinality. Others on this list have also gained performance by taking out a tag and making it part of the metric name, especially when that tag has a lot of values.
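Moving a high-cardinality tag into the metric name helps because Kairos only retrieves the keys for the queried metric. The Python sketch below models the index as a mapping from metric name to the series stored under it, which is a hypothetical simplification; the naming scheme cpu.&lt;uuid&gt; and the 1,000 uuids are examples only. The trade-off is that series spread across metric names can no longer be aggregated in a single query.

```python
from collections import defaultdict


def build_index(series):
    """Group each series' tags under its metric name (one entry per series)."""
    index = defaultdict(list)
    for metric, tags in series:
        index[metric].append(tags)
    return index


def keys_read(index, metric: str) -> int:
    """Keys a query must read: every entry stored under the queried metric."""
    return len(index.get(metric, []))


uuids = [f"u{i}" for i in range(1000)]
as_tag = build_index(("cpu", {"uuid": u}) for u in uuids)
in_name = build_index((f"cpu.{u}", {}) for u in uuids)

print(keys_read(as_tag, "cpu"))       # 1000: all uuids share one metric
print(keys_read(in_name, "cpu.u17"))  # 1: only that uuid's metric is read
```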

We plan to support querying across metrics, so you will regain the ability to aggregate across those different UUIDs, at least in a limited fashion.

Brian

Praveen Agrawal

Mar 6, 2016, 11:54:31 PM

to KairosDB, rzimm...@gmail.com

Hi Brian,

Thanks for the response.

My understanding is that a tag=value row is NOT created until that specific tag=value is actually inserted.

For example, let's say I have metric1 with tag1 (cardinality c1), tag2 (c2), tag3 (c3) and tag4 (c4). Potentially I can have c1*c2*c3*c4 rows, but each of them is created only when data comes in for that tag=value combination.

Now, coming back to the original question: whether the value is part of the metric name or a tag, how would it matter, given that both end up creating a new row only at insertion time? Am I missing something?
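Praveen's reasoning can be checked with a small count: the product of the tag cardinalities is only a ceiling, and the rows that exist are the distinct combinations actually written. A Python sketch with made-up cardinalities and tag values:

```python
from math import prod


def max_series(cardinalities: list[int]) -> int:
    """Upper bound on distinct tag combinations, i.e. possible rows per row period."""
    return prod(cardinalities)


def actual_series(tag_sets: list[dict[str, str]]) -> int:
    """Distinct tag combinations that were actually written."""
    return len({tuple(sorted(tags.items())) for tags in tag_sets})


print(max_series([10, 10, 10, 10]))  # 10000 possible combinations
written = [
    {"tag1": "a", "tag2": "x"},
    {"tag1": "a", "tag2": "x"},  # same combination again: no new row
    {"tag1": "b", "tag2": "x"},
]
print(actual_series(written))  # 2
```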

Cheers.

Riley Zimmerman

Mar 7, 2016, 9:14:58 AM

to KairosDB, rzimm...@gmail.com

Hi Praveen,

You are right, removing the timestamp was the majority of the solution.

We also changed the way we store the uuid tag to experiment with the trade-off between phase_1 and phase_2 response times. We have 4 main metrics we look up right now, so we are using the tags uuid.A, uuid.B, uuid.C and uuid.D. This results in 4x more tags than a single uuid tag would, but our understanding is that it will reduce the phase_2 time because the results will already be narrowed down by 4x. I'm afraid I don't have conclusive results yet (it will probably be better for us to take the hit in phase_2), and I'm sure our case is different from yours anyway.

Thanks as well Brian. Looking back I think the docs and forum are fairly clear on that point. We're just getting up to speed with a lot of new open source technologies at once and clearly didn't catch that.

Brian Hawkins

Mar 9, 2016, 12:23:16 PM

to KairosDB, rzimm...@gmail.com

Praveen,

Your assumptions are correct. The data is inserted into the same number of rows either way, so for writing the data it really doesn't matter. This can mislead people because they see no performance problems putting the data in. The issue is that Kairos has to maintain an index for tags, and because tags are multidimensional the index is really crappy (yes, I said that). The thing is, the index works fine as long as your tag cardinality doesn't get crazy big. When you have a lot of tag values the index gets really big, which slows down queries because we have to read that index first.

Also in this case they were putting a timestamp in the tag value (major no no).

So the downside of pulling tag values out and putting them in the metric name is that you cannot aggregate across them when querying the data. We do have a feature request to query across multiple metrics which will alleviate this issue.

Brian

Riley Zimmerman

Mar 21, 2016, 4:45:18 PM

to KairosDB, rzimm...@gmail.com

Hi,

I think some of my confusion came from the wiki example using 2 customers and 2 servers. 2+2=4, but so does 2*2=4, which leaves me unsure which it is when scaled up to larger numbers.

https://github.com/kairosdb/kairosdb/wiki/Query-Performance

In the example there are two tags, customer and host. Let's expand it and say I have 4 customers (custA, custB, custC, custD) and 5 servers (server1, server2, server3, server4, server5). The data will be written to 4*5=20 partitions (every 3 weeks), right? Or is it actually 4+5=9 partitions? The number of partitions here is also the number of index keys read back during a query, so it is what determines the phase_1 speed.

Also, let's say I merge my tags into custA_server1, custA_server2, custA_server3, custA_server4, custA_server5, custB_server1, custB_server2 ... custD_server5. Should I expect to gain (or lose) any performance compared with the two separate tags?

Thanks again!

Brian Hawkins

Mar 29, 2016, 9:41:16 PM

to KairosDB, rzimm...@gmail.com

Good catch on the docs, I'll change it. The answer is that it multiplies. So in your case of 4 and 5 it would be 20 partitions and 20 key entries every 3 weeks.

Combining tags doesn't really help unless you put the tag in the metric name. When you combine the tags you still end up with 20 unique tag values.
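The multiplication and the combined-tag case can be checked directly with the numbers from Riley's example. In this Python sketch the merged tag name customer_host and the metric prefix load are hypothetical; the point is only that both tag layouts yield 20 keys, while moving the customer into the metric name leaves each metric with just its 5 hosts.

```python
from itertools import product

customers = ["custA", "custB", "custC", "custD"]
servers = [f"server{i}" for i in range(1, 6)]


def keys_separate_tags():
    """Two tags, customer and host: one key per (customer, host) pair."""
    return {(("customer", c), ("host", s)) for c, s in product(customers, servers)}


def keys_combined_tag():
    """One merged tag such as custA_server1: still one key per pair."""
    return {(("customer_host", f"{c}_{s}"),) for c, s in product(customers, servers)}


def keys_per_metric_with_customer_in_name():
    """Customer moved into the metric name: each metric indexes only its hosts."""
    return {f"load.{c}": len(servers) for c in customers}


print(len(keys_separate_tags()), len(keys_combined_tag()))  # 20 20
print(keys_per_metric_with_customer_in_name()["load.custA"])  # 5
```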

Brian
