Understanding KairosDB for production (distributed database)?

Fernando Paladini

Oct 9, 2015, 4:05:52 PM

to KairosDB

Hello again, beautiful community!

I'm really fascinated by KairosDB and this new world of big data and data mining that I'm discovering. I have some very basic questions about bringing KairosDB to production, and I hope the answers can help more people in the future. I'll write a tutorial about this once I get it working, so I hope you can help me help you (in the future :p). As I said before, an article about Spark-KairosDB integration is on the way.

I want to build a distributed database using Cassandra + KairosDB, and I have no idea how to do that. Currently I have the following setup:

Two home computers running Ubuntu 14.04.
Each computer has KairosDB and Cassandra up and running. I have already run some tests and everything is okay.
To simplify the question, I've cleaned the KairosDB keyspace on both computers (actually, I deleted it and created a new keyspace). It's worth noting that I have data to populate a database with more than 1 million data points.

As far as I know, in order to create a distributed database, I should share the same database across all my nodes; is that right? Should I create one KairosDB instance only for reads and another for writes? If so, how can I achieve that? Do I need to configure both Cassandra and KairosDB, or is configuring Cassandra enough?

Please be patient and try to help this newbie understand the concepts behind a distributed KairosDB database. Of course, this will become a newbie 101 guide in the future :)

Thank you!
Fernando Paladini.

Brian Hawkins

Oct 10, 2015, 1:14:04 AM

to KairosDB

I think your question is more geared toward how to set up Cassandra. As far as Kairos is concerned, one Kairos node knows nothing of any other Kairos node. The distributed nature comes purely from Cassandra. For example, at work we have 9 Cassandra nodes in a cluster. There are 4 Kairos nodes in front of the Cassandra cluster.
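Because the Kairos nodes do not know about each other, a client (or a load balancer) can send writes to any of them. A minimal Python sketch of that idea follows. It only builds the requests and does not send them. The hostnames and port are hypothetical placeholders, and the `/api/v1/datapoints` path and JSON shape are assumed from the KairosDB REST API; check them against your version.

```python
import itertools
import json
import time

# Hypothetical Kairos node addresses; every node is assumed to point
# at the same Cassandra cluster.
KAIROS_NODES = ["http://kairos1:8080", "http://kairos2:8080"]


def make_datapoint(name, value, tags, timestamp_ms=None):
    """Build one metric entry in the JSON shape assumed for the ingest API."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {"name": name, "timestamp": timestamp_ms, "value": value, "tags": tags}


def ingest_requests(nodes, batches):
    """Yield (url, body) pairs, rotating through the nodes round-robin.

    Any node can take any batch, since the nodes hold no shared state
    of their own; the data ends up in Cassandra either way.
    """
    node_cycle = itertools.cycle(nodes)
    for batch in batches:
        yield next(node_cycle) + "/api/v1/datapoints", json.dumps(batch)
```

Each yielded pair could then be POSTed with whatever HTTP client you prefer.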

As far as making some Kairos nodes read and others write, that is entirely up to you and your application. We have debated configuring our load balancer to point the ingest URL to some nodes and the query URL to others to accomplish such a separation.
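As a rough illustration of that kind of split, here is a small Python sketch of the routing decision a load balancer would make. It is not our actual configuration. The node names are hypothetical, the `/api/v1/datapoints` ingest path is assumed from the KairosDB REST API, and sending everything that is not an ingest request to the query pool is just one possible policy.

```python
import itertools

# Assumed ingest path; the query URL (/api/v1/datapoints/query) shares
# this prefix, so the comparison below is an exact match, not a prefix test.
INGEST_PATH = "/api/v1/datapoints"


class PathRouter:
    """Pick a Kairos node for a request based on its URL path."""

    def __init__(self, ingest_nodes, query_nodes):
        # Round-robin independently within each pool.
        self._ingest = itertools.cycle(ingest_nodes)
        self._query = itertools.cycle(query_nodes)

    def pick(self, path):
        # Exact ingest path goes to the write pool; everything else
        # (queries, metadata lookups) goes to the read pool.
        if path.rstrip("/") == INGEST_PATH:
            return next(self._ingest)
        return next(self._query)


# Hypothetical pools: two nodes take writes, two nodes serve queries.
router = PathRouter(["kairos-w1", "kairos-w2"], ["kairos-r1", "kairos-r2"])
```

With this, `router.pick("/api/v1/datapoints")` returns a node from the write pool, and `router.pick("/api/v1/datapoints/query")` returns one from the read pool.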

As far as using Kairos goes, it doesn't care if you have one Cassandra node or 10,000 Cassandra nodes; Kairos works the same, probably faster in the latter case. If you need to ingest more data or add capacity for storing more data, then you grow the Cassandra cluster.

Does that help?

Brian

Fernando Paladini

Oct 12, 2015, 11:58:30 AM

to KairosDB

Yes, it does help!

Actually, your reply really helped me; I was very confused about how Cassandra works with KairosDB in a distributed environment. As far as I can see (from your comment), Cassandra's clustering has nothing to do with Kairos: Cassandra will handle its nodes by itself, while KairosDB isn't distributed software per se, but can easily use a Cassandra cluster.

Thank you!

Brian Hawkins

Oct 13, 2015, 5:58:57 PM

to KairosDB

Cassandra does most of the heavy lifting for us. Kairos for now is just a data translation layer that sits on top of Cassandra. OK, it is a little cooler than that, but you get the idea.

Brian
