Disk usage


AlexandreD

Aug 5, 2021, 9:42:30 AM
to KairosDB
Hello,

I am trying to understand how much disk space my Cassandra/KairosDB nodes will need.
The cluster I am working on has 4 nodes. The total space used (according to nodetool status) is 596 GB, and the total number of points is around 1.7 billion across 9k metrics.
Most of the data points have only one tag.

That works out to about 350 bytes per data point.
Does this sound right to you? Of course there is redundancy (the replication factor is 2), but this still seems like a lot of overhead.

Regards,
Alex
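The 350-byte figure is the cluster-wide load divided by the point count. A minimal Python sketch of that arithmetic, assuming the "GB" from nodetool status means 10^9 bytes (the reading that reproduces the 350-byte figure) and a replication factor of 2 for the whole keyspace. Because the summed load already includes every replica, dividing by the replication factor gives the cost of one copy, which is still about 175 bytes per point. The helper name is hypothetical.

```python
GB = 10**9  # assumption: nodetool's "GB" read as decimal gigabytes


def bytes_per_point(total_bytes: float, points: float, replication_factor: int = 1) -> float:
    """Average on-disk bytes per logical data point.

    total_bytes is the load summed over all nodes, which already includes
    every replica; dividing by the replication factor gives the cost of
    storing a single copy of each point.
    """
    if points <= 0 or replication_factor <= 0:
        raise ValueError("points and replication_factor must be positive")
    return total_bytes / points / replication_factor


if __name__ == "__main__":
    total = 596 * GB
    points = 1.7e9
    print(f"all replicas: {bytes_per_point(total, points):.1f} B/point")
    print(f"one replica:  {bytes_per_point(total, points, 2):.1f} B/point")
```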


Brian Hawkins

Aug 5, 2021, 11:34:19 PM
to KairosDB
That seems way too high to me. In previous tests we have seen closer to 15-20 bytes per data point. I'd double-check that Cassandra isn't holding snapshots of the data that are bloating your disk usage.

Brian
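Snapshots can be listed with nodetool listsnapshots and removed with nodetool clearsnapshot. To see how much disk they occupy on a node, a small Python sketch can walk the Cassandra data directory and add up the files stored under snapshots/ subdirectories. It assumes the usual layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<name>/ and the package-install default /var/lib/cassandra/data (check data_file_directories in cassandra.yaml); it must be run on each node, and the function name is hypothetical.

```python
import os
import sys


def snapshot_bytes(data_dir: str) -> int:
    """Total size of all files stored under any 'snapshots' directory.

    Snapshot files are hard links to SSTables, so they cost nothing extra
    while the original SSTable still exists; the space is only pinned by
    the snapshot once compaction removes the original. The result is
    therefore an upper bound on what clearing snapshots would free.
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        # Path components below data_dir, e.g. ['kairosdb', 'data_points-<id>', 'snapshots', ...]
        parts = os.path.relpath(root, data_dir).split(os.sep)
        if "snapshots" in parts:
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    # Assumed default data directory for package installs; pass another as an argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    print(f"{snapshot_bytes(data_dir) / 10**9:.2f} GB in snapshots")
```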

Loic Coulet

Aug 6, 2021, 3:47:43 AM
to KairosDB
Hi, like Brian, I would check whether there are leftover snapshots lying around.

Another possibility is a lot of rewrite or delete operations, with Cassandra needing a major compaction to reclaim the disk space used by deleted data and tombstones.
Deleting in Cassandra actually adds data at first, since each delete writes a tombstone.

It is also worth checking how you measured the disk size and the number of data points. You can use nodetool tablestats to see where Cassandra is using disk space. How did you measure the number of data points?

The disk usage we measured is much lower: around 10-12 bytes per data point using the LZ4 compressor (on data that is half long values with a lot of repetition, half doubles).

Loic
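nodetool tablestats prints, for each table on the node where it runs, lines such as "Space used (live)", "Space used (total)" and "Space used by snapshots (total)". A minimal Python sketch that extracts those figures from the command's text output, for example from `nodetool tablestats kairosdb` captured on each node. It assumes the output was produced without -H (plain byte counts) and follows the layout of recent Cassandra releases, which may differ between versions; the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Space used (live): 123456" or "Space used by snapshots (total): 0"
SPACE_RE = re.compile(r"^(Space used[^:]*):\s*(\d+)$")


def space_by_table(text: str) -> Dict[str, Dict[str, int]]:
    """Map 'keyspace.table' to its 'Space used ...' byte counts."""
    tables: Dict[str, Dict[str, int]] = {}
    keyspace: Optional[str] = None
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Keyspace"):
            # Handles both "Keyspace : name" and "Keyspace: name"
            keyspace = line.split(":", 1)[1].strip()
            current = None
        elif line.startswith("Table:"):
            current = f"{keyspace}.{line.split(':', 1)[1].strip()}"
            tables[current] = {}
        elif current is not None:
            match = SPACE_RE.match(line)
            if match:
                tables[current][match.group(1)] = int(match.group(2))
    return tables
```

Sorting the result by "Space used (total)" shows which KairosDB tables dominate, and a large "Space used by snapshots (total)" points straight at leftover snapshots.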

AlexandreD

Aug 6, 2021, 5:00:16 AM
to KairosDB
Hi Loic, Brian,

Thanks for your answers; you confirm that our disk usage is way too high.
@Loic I counted the points using the KairosDB count API on every metric. We don't delete often.
I will try starting a new cluster from scratch, reimporting all the data, and checking the disk usage.

Alex
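For reference, a whole-history count per metric can be requested from the KairosDB query endpoint (POST /api/v1/datapoints/query) with the count aggregator; when no end time is given, KairosDB queries up to the current time. A minimal Python sketch that only builds the request body and sums the counts in a response. The 100-year sampling window, the start_absolute of 1 ms and the helper names are illustrative assumptions, and the HTTP call itself (one query per name returned by GET /api/v1/metricnames) is left to whatever client you use.

```python
from typing import Any, Dict


def count_query(metric: str) -> Dict[str, Any]:
    """Request body counting every stored point of one metric.

    Assumption: a 100-year sampling window keeps all history in very few
    buckets; the counts of all buckets are summed anyway.
    """
    return {
        "start_absolute": 1,  # epoch milliseconds; end defaults to now
        "metrics": [{
            "name": metric,
            "aggregators": [{
                "name": "count",
                "sampling": {"value": 100, "unit": "years"},
            }],
        }],
    }


def total_points(response: Dict[str, Any]) -> int:
    """Sum the count values over every query, result and bucket."""
    return int(sum(
        value
        for query in response["queries"]
        for result in query["results"]
        for _timestamp, value in result["values"]
    ))
```

Serialize the body with json.dumps before posting it, then add up total_points over all metrics.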

AlexandreD

Aug 12, 2021, 10:24:53 AM
to KairosDB
Hi,
Well, I created a new four-node cluster (KairosDB 1.3, yeah!) with Cassandra 4 and reimported all the data points.
The used space is down to ~35 GB.
The first cluster had been broken before (a hardware failure followed by a bad recovery), so something may be wrong with it.
For now I'll just move the data to the new cluster.

Alex
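Putting the before and after numbers side by side: a short Python sketch, assuming decimal gigabytes, the same ~1.7 billion points, and a replication factor of 2 on the new cluster as well (the thread does not say). Under those assumptions the new cluster uses about 20.6 bytes per point across both replicas, roughly 10.3 bytes per copy, in line with the 10-20 bytes per point reported earlier in the thread, and about 17 times less space than the old cluster. The helper name is hypothetical.

```python
GB = 10**9              # assumption: decimal gigabytes
POINTS = 1.7e9          # approximate point count from the thread
REPLICATION_FACTOR = 2  # assumption: same on both clusters


def per_copy_bytes(used_gb: float) -> float:
    """Bytes per data point for a single replica."""
    return used_gb * GB / POINTS / REPLICATION_FACTOR


old, new = per_copy_bytes(596), per_copy_bytes(35)
print(f"old: {old:.1f} B/point per copy, new: {new:.1f} B/point per copy")
print(f"space reduction: {596 / 35:.1f}x")
```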