Cassandra Datastore inner workings old and new


Yiğitcan UÇUM

Aug 3, 2017, 5:16:12 PM
to KairosDB
Hello guys, I have been researching Cassandra and KairosDB as part of my summer internship, and I have written an article about what I have learned so far: how KairosDB uses Cassandra, and what to keep in mind while modeling your data and doing optimizations. Can you please provide some feedback on the article? If you are interested in the subject, I will extract parts of my article and rewrite them to contribute to the KairosDB documentation.


Kind Regards,
Yiğitcan UÇUM

Brian Hawkins

Aug 3, 2017, 11:50:29 PM
to KairosDB
Good article, but there are a few problems. Some of those are with my code: I didn't realize I had changed the string column from text to a blob in the new schema, and I added a note to include the deletes for the new index tables.

Here are some corrections. The string tables are only used for populating the UI, and in fact only metric_names is used; tag_names and tag_values aren't used at all. In the new beta I no longer populate those last two.

When doing queries, it reads the row_key_index only once. The beta code will still read the old row_key_index, but as soon as the query goes beyond the 3-week window it returns lightning fast, as Cassandra does a really good job of quickly telling you when data is not there.
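A minimal sketch of the 3-week window idea. It assumes, as a simplified model, that each data row covers a fixed 3-week span aligned to multiples of that width, so a query's time range maps onto a list of row start times, each of which is a separate lookup. `ROW_WIDTH_MS`, `row_time` and `row_times_for_range` are hypothetical names, not KairosDB's API.

```python
# Hypothetical model of fixed-width rows; the 3-week width comes from the
# thread, the names are illustrative and not KairosDB's API.
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000  # 3 weeks in milliseconds


def row_time(timestamp_ms: int) -> int:
    """Start time of the 3-week row that a timestamp falls into."""
    return timestamp_ms - (timestamp_ms % ROW_WIDTH_MS)


def row_times_for_range(start_ms: int, end_ms: int) -> list[int]:
    """Every row start time a query over [start_ms, end_ms] would touch."""
    rows = []
    t = row_time(start_ms)
    while t <= end_ms:
        rows.append(t)
        t += ROW_WIDTH_MS
    return rows
```

In this model, a time window that has no rows in the old index is simply an empty lookup, which is the case Brian says Cassandra answers quickly.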

Did you get a chance to run some performance tests?

Brian

Yiğitcan UÇUM

Aug 4, 2017, 5:03:08 AM
to KairosDB
Hello Brian,

Thanks for the feedback! I have edited the parts about the usage of string_index. But I couldn't understand the part of your reply about row_key_index. Can you please clarify what you meant by row_key_index being queried only once?

I haven't run any performance tests yet. I will probably have some stats on the performance of old vs. new KairosDB within this month, and I will share them here!

Kind Regards,
Yiğitcan UÇUM

Brian Hawkins

Aug 4, 2017, 10:34:58 AM
to KairosDB
In your gotchas section you mention that it would scan the row_key_index twice in order to do that delete. That is not true: it would only read the row once and then do the deletes.
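To illustrate the read-once-then-delete pattern, here is a hypothetical in-memory model: `FakeStore` stands in for the Cassandra tables and counts index reads, and the row-key strings in it are made up. It sketches the access pattern only and is not KairosDB's delete code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for the index and data tables."""
    row_key_index: dict[str, list[str]] = field(default_factory=dict)
    data_points: dict[str, list[int]] = field(default_factory=dict)
    index_reads: int = 0

    def read_row_keys(self, metric: str) -> list[str]:
        # Each call models one read of the metric's row_key_index row.
        self.index_reads += 1
        return list(self.row_key_index.get(metric, []))


def delete_metric(store: FakeStore, metric: str) -> int:
    """Read the index row once, then delete every data row it lists."""
    row_keys = store.read_row_keys(metric)  # the only index read
    for key in row_keys:
        store.data_points.pop(key, None)  # delete one data row
    store.row_key_index.pop(metric, None)  # finally drop the index row
    return len(row_keys)
```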

Brian

Yiğitcan UÇUM

Aug 4, 2017, 7:24:13 PM
to KairosDB
I remember checking it with Wireshark as well.

Wouldn't a query like this, given that I only have 2 different metric names and 300,000 different cities as tags, query the row_key_index table 4 times and filter all of the records in memory twice? Is there an optimization for this? This is partly an optimization problem on the KairosDB side and partly a data-modeling issue on the developer side. But it would be nice to have a mechanism that queries the row_key_index table once for Temperature and once for Humidity. Then we would have 2 queries and 2 filters.
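The question can be made concrete with a rough cost model. It assumes, as the question does, that every read of a metric's row_key_index row returns one row key per city (300,000 here) and that each sub-query reads and filters that row on its own. Both functions are hypothetical and only count work; they do not talk to Cassandra.

```python
from collections import defaultdict


def separate_sub_queries(pairs, index_size):
    """One sub-query per (metric, city): each re-reads the metric's index row."""
    reads = len(pairs)
    scanned = sum(index_size[metric] for metric, _ in pairs)
    return reads, scanned


def merged_sub_queries(pairs, index_size):
    """One sub-query per metric, with all requested cities in one tag filter."""
    cities_by_metric = defaultdict(set)
    for metric, city in pairs:
        cities_by_metric[metric].add(city)  # these would form the tag filter
    reads = len(cities_by_metric)
    scanned = sum(index_size[metric] for metric in cities_by_metric)
    return reads, scanned
```

With 2 metrics and 2 cities each, the separate form gives `(4, 1200000)` and the merged form `(2, 600000)`, matching the 4 versus 2 index reads described in the question.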

Brian Hawkins

Aug 10, 2017, 11:38:07 AM
to KairosDB
The solution is to group by city.  The query fragment would look like this:

{
  "name": "Temperature",
  "tags": { "city": ["Antalya", "Istanbul"] },
  "group_by": [
    {
      "name": "tag",
      "tags": ["city"]
    }
  ]...

The above will hit the row key index only once.
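A minimal Python sketch that builds a full request body for both metrics in the shape of the fragment above. The `start_relative` time range and the helper names are assumptions added to make the body complete; check field names against the KairosDB REST API documentation before sending it. `group_rows_by_city` is a hypothetical illustration of splitting the rows from one index read per city in memory, not KairosDB's implementation.

```python
import json


def grouped_query(metrics, cities, start_days=1):
    """Build a query body: one sub-query per metric, each filtered to the
    given cities and grouped by the city tag (shape follows the thread)."""
    return {
        # Assumed time range: the last `start_days` days.
        "start_relative": {"value": start_days, "unit": "days"},
        "metrics": [
            {
                "name": metric,
                "tags": {"city": list(cities)},
                "group_by": [{"name": "tag", "tags": ["city"]}],
            }
            for metric in metrics
        ],
    }


def group_rows_by_city(row_keys):
    """Hypothetical: split row keys from a single index read into per-city groups."""
    groups = {}
    for key in row_keys:
        groups.setdefault(key["tags"]["city"], []).append(key)
    return groups


if __name__ == "__main__":
    body = grouped_query(["Temperature", "Humidity"], ["Antalya", "Istanbul"])
    print(json.dumps(body, indent=2))
```

Under the thread's description, each sub-query reads the index once for its own metric, so this body would cost one read for Temperature and one for Humidity.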

Brian