Understanding table compression


rosita.ho...@gmail.com

Mar 29, 2021, 3:02:34 PM
to KairosDB
Hello everyone, I have been evaluating some time series databases to see which one fits the use case we need to implement. At the moment we are focusing on table compression. There are a lot of things I would like to understand better about configuring and using compression on Cassandra tables, specifically for the tables that KairosDB creates and uses on startup.


[first question here:]
What I have found so far is that Cassandra supports several compression mechanisms for tables, and the default one is LZ4. When KairosDB creates its tables, they are created with LZ4 compression, as we can verify with cqlsh on our Cassandra cluster by executing `DESCRIBE SCHEMA`. What I can't figure out is: if we wanted to use a different compression mechanism, how would we do it without modifying the KairosDB source code? Is there a direct way to tell KairosDB to create the tables on Cassandra with another compression mechanism? I ask because using `ALTER TABLE` with cqlsh would not be an ideal solution, given that we deploy our Cassandra and KairosDB clusters with k8s.
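
For reference, this is roughly what we would otherwise have to run by hand in cqlsh. The keyspace/table name assumes the default schema KairosDB creates (e.g. `kairosdb.data_points`), and `DeflateCompressor` is just an example alternative:

```
-- Inspect the compression settings of one of the KairosDB tables
-- (keyspace/table names assume KairosDB's default schema):
DESCRIBE TABLE kairosdb.data_points;

-- Change the compressor by hand (the step we would like to avoid).
-- This only affects SSTables written after the change; existing
-- SSTables keep their old compression until they are rewritten:
ALTER TABLE kairosdb.data_points
  WITH compression = {'class': 'DeflateCompressor'};
```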


[second question here:]
Another question comes from our experience working with SQL databases, specifically with TimescaleDB (which is built on PostgreSQL): there, tables are compressed according to a given condition, something like "if the timestamp is older than a week from now, compress it." How does this work for the tables that KairosDB uses on Cassandra? What is the default condition for compressing the tables? Is it possible to modify that condition?


Thanks in advance for any help! Rosita.-

Greg Matza

Mar 29, 2021, 3:10:58 PM
to rosita.ho...@gmail.com, KairosDB
Rosita,

One of ScyllaDB's engineers has a deep-dive write-up on compression. I'm not 100% sure that C* has all of Scylla's options, but I think it does.


Full disclosure: I'm biased because I work at Scylla, but I would encourage you to give Scylla a look in place of C*. If your primary cost/concern is disk usage, you might want to read up on Scylla's Incremental Compaction Strategy: https://docs.scylladb.com/kb/compaction/#incremental-compaction-strategy-ics

Greg



Brian Hawkins

Apr 23, 2021, 11:01:24 AM
to KairosDB
So you need to understand two things. The first is how Kairos lays out data in Cassandra, and the second is how Cassandra does compactions. Here are a couple of presentations I did with some friends at Scylla that describe how Kairos puts data in Cassandra: https://www.youtube.com/watch?v=d_vwhaISqq0 and https://www.youtube.com/watch?v=F2ukOa1gGlo

From my understanding, Cassandra compresses data as it compacts files together. So far we have found the best-suited compaction strategy to be time window compaction, which lumps files together based on age, which is roughly what you are asking about in your second question.
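
As a rough sketch, configuring that by hand looks something like this. The window unit/size are illustrative, not KairosDB defaults, and the table name assumes the default kairosdb schema:

```
-- Illustrative values only; pick a window that matches your
-- retention and write patterns:
ALTER TABLE kairosdb.data_points
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
  };
```

With TWCS, SSTables from the same time window get compacted (and recompressed) together, and older windows are eventually left alone, so compaction work stays concentrated on recent data.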

Kairos data is quite compact to begin with, but how well it compresses depends on your data. Search back through this forum; there was a long discussion on compression and how much space Kairos takes per data point. Some people ran experiments and came up with a few numbers that may be interesting for what you are doing.

Brian
