Cassandra doesn't seem to free up disk space after TTL expires


Erik van den Ackerveken

Mar 30, 2017, 3:56:57 AM
to KairosDB
Hi,

Our cluster (3 nodes) has been in production since December 2016. It contains roughly 18K metrics, and for each metric 10 data points per minute are added with a TTL of 1 month (so give or take 3,000 data points per second).

After 1 month of gathering data I was excited to see that the oldest data wasn't queryable anymore. However, I also expected disk usage to stop increasing and level out, and that didn't happen.

But, taking into account gc_grace (default of 10 days) and the 3-week row key window, I figured it could take at least 2 months. We are now three months in, I have already had to double the disk capacity of the servers to prevent a crash, and disk usage still keeps increasing.

Btw, we use the default SizeTieredCompactionStrategy.

Could you guys please give some insight into whether this is expected behavior, and how I can resolve it?

Much appreciated,

Erik




CREATE KEYSPACE kairosdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;

CREATE TABLE kairosdb.data_points (
    key blob,
    column1 blob,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = 'NONE';

Brian Hawkins

Mar 30, 2017, 11:22:40 AM
to KairosDB
First you need to understand how TTL works.  It basically shows up as a timestamp on the data, and after that timestamp has passed Cassandra ignores it.  Now, when does it actually go away?  Your assumption about the gc_grace period is correct: in your case, after 1 month + 10 days the data is a candidate for cleanup.  That cleanup happens when the SSTable is compacted.  With size-tiered compaction the oldest data will be in the largest SSTable, which will only be compacted when a bunch of other SSTables of similar size are around.  Basically it's a crap shoot when it will actually go away.  You will have much better results with leveled compaction.  We have a cluster where the data expires after 14 days and it works pretty well.
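For reference, switching the table from Erik's schema above to leveled compaction is a single ALTER TABLE. This is only a sketch: the sstable_size_in_mb value shown is the common default, not something Brian specified, and existing SSTables are re-leveled gradually in the background after the change.

```sql
-- Switch kairosdb.data_points from size-tiered to leveled compaction.
-- Expired, past-gc_grace data then gets purged as levels are rewritten,
-- instead of waiting for similarly sized SSTables to accumulate.
ALTER TABLE kairosdb.data_points
    WITH compaction = {
        'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'
    };
```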

A shortcut cleanup option is to look at your SSTables and see if any are older than 1 month + 10 days.  If so, all the data in them has expired and you can delete those files (turn C* off first).

Brian

Erik van den Ackerveken

Mar 30, 2017, 2:10:56 PM
to KairosDB
Thanks a lot Brian,

Does your advice to use leveled compaction still hold when we also insert data points (aggregations) without a TTL in between? We continuously aggregate data on an hourly basis and store those data points on the designated metric, with a tag indicating that it's aggregated data instead of raw data points. Does compaction then remove our "TTLed" raw data points from the SSTables, or does it leave them alone because there are also data points which aren't subject to cleanup?

Do you advise leveled compaction over date-tiered compaction?

Much appreciated,

Erik




On Thursday, March 30, 2017 at 5:22:40 PM UTC+2, Brian Hawkins wrote:

Brian Hawkins

Mar 30, 2017, 5:41:04 PM
to KairosDB
I prefer leveled over size-tiered.  It handles SSTables with partially TTLed data nicely: it will compact the file and leave the rollup data.

Conceptually I would use time-windowed compaction over date-tiered.  I don't have enough real-world experience with it to know how well it deals with TTLs; maybe someone else on the list can answer that.  The system we are using with leveled is a 90-node cluster and data expires out pretty well (not a Kairos cluster, it's an internal app).
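For anyone who wants to try time-windowed compaction, the configuration might look like the following. This is a hypothetical sketch, not a recommendation from this thread: the 1-day window is an assumption picked to fit a 1-month TTL, and TWCS requires a Cassandra version that ships it (3.0.8+/3.8+).

```sql
-- Hypothetical TWCS setup for a 1-month TTL workload.
-- Each SSTable then covers roughly one day of writes and can be dropped
-- wholesale once everything in it has expired past gc_grace_seconds.
ALTER TABLE kairosdb.data_points
    WITH compaction = {
        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
    };
```

Note that mixing non-TTLed rollup data into the same time-bucketed SSTables would keep those files from ever being dropped whole, which is exactly the concern raised earlier in the thread.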

Brian

Corey Cole

Apr 6, 2017, 2:20:46 PM
to KairosDB
I'm stuck on 2.2.x without the ability to use Jeff Jirsa's TWCS and have been using DTCS with TTLs for some time.  It works out pretty well for us.

DTCS does require that you set read_repair_chance to zero.  Otherwise you run the risk of writing a timestamp that falls outside the time window of the SSTable it lands in.
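Against the schema posted above (which has read_repair_chance = 1.0), that would mean something like the following. Zeroing dclocal_read_repair_chance as well is my own assumption, since it triggers the same kind of repair write that can land data in the wrong time bucket:

```sql
-- Disable background read repair so repaired cells can't be rewritten
-- into an SSTable whose time window they don't belong to (DTCS/TWCS).
ALTER TABLE kairosdb.data_points
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0;
```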