Populating KairosDB with historical data (and some questions regarding version 1.3.0)


Rosita Hormann Lobos

2022/05/03 15:50:15
To: KairosDB
Hello all.

My team and I have been working on a prototype of a system in which we want to store high volumes of time series data. KairosDB is a technology we have been working with for a long time, and we are confident it will meet our requirements. We have some questions that arose while I was migrating our KairosDB Dockerfile from KairosDB 1.2.2 to 1.3.0.

The first big change I noticed between these two versions is that kairosdb.conf must now be used for the configuration instead of the kairosdb.properties file. So I started "migrating" the optimal configuration we found for our KairosDB service into this new format, including a TTL for the data in the data_points table based on its timestamp. But I have found that the property we used before, "kairosdb.datastore.cassandra.align_datapoint_ttl_with_timestamp", which we set to true, is not in kairosdb.conf (at least not with the same name). With the new file, how can we configure the TTL to be aligned with the timestamps of the data points rather than with the time the data was inserted?

Another concern I have is regarding compaction. We have been using TimeWindowCompactionStrategy (TWCS), which I have seen recommended for most time series use cases in Cassandra. We want to re-deploy our system with the updated version of KairosDB, which means Cassandra will have no data to begin with (without going into details, our previous prototype is currently unrecoverable, so re-deploying it with our previously inserted data is not an option).

We want to use KairosDB to insert old data into Cassandra so that it is initially populated with at least 1 year of time series data, which may later be extended to 2, 3 or more years of historical data. My concern is with how TimeWindowCompactionStrategy works: my understanding is that it compacts data based on how old the data is, meaning when it was inserted, not the timestamp stored in the data_points table. In this case, which compaction strategy is recommended? Is it recommended at all to populate Cassandra+KairosDB with historical data?

Any help is highly appreciated.

Regards,
Rosita Hormann.

Francesco

2022/05/25 13:07:18
To: KairosDB
Hello,

On Tuesday, May 3, 2022 at 9:50:15 PM UTC+2 rosita.ho...@gmail.com wrote:
Hello all.

My team and I have been working on a prototype of a system in which we want to store high volumes of time series data. KairosDB is a technology we have been working with for a long time, and we are confident it will meet our requirements. We have some questions that arose while I was migrating our KairosDB Dockerfile from KairosDB 1.2.2 to 1.3.0.

The first big change I noticed between these two versions is that kairosdb.conf must now be used for the configuration instead of the kairosdb.properties file. So I started "migrating" the optimal configuration we found for our KairosDB service into this new format, including a TTL for the data in the data_points table based on its timestamp. But I have found that the property we used before, "kairosdb.datastore.cassandra.align_datapoint_ttl_with_timestamp", which we set to true, is not in kairosdb.conf (at least not with the same name). With the new file, how can we configure the TTL to be aligned with the timestamps of the data points rather than with the time the data was inserted?
 
I haven't tried this specific property, but in general it should work if you add "align_datapoint_ttl_with_timestamp: ..." to the "datastore.cassandra: {...}" section of the "kairosdb.conf" file.
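
As a rough illustration of what aligning the TTL with the timestamp means for a backfill, here is a minimal Python sketch of the arithmetic. The function name aligned_ttl and the treatment of 0 as "no TTL" are assumptions made for illustration, not KairosDB code. How KairosDB handles a point that is already older than the configured TTL is not covered in this thread, so the sketch simply returns the non-positive value.

```python
def aligned_ttl(configured_ttl_s: int, point_ts_ms: int, now_ms: int) -> int:
    """Remaining TTL in seconds, counted from the data point's own timestamp.

    Hypothetical helper: it only models the idea of aligning the TTL with
    the data timestamp instead of the insertion time.
    """
    if configured_ttl_s == 0:
        # Assumption: 0 means no TTL is configured, so nothing to align.
        return 0
    age_s = (now_ms - point_ts_ms) // 1000  # how old the point already is
    return configured_ttl_s - age_s         # <= 0 means "already past its TTL"


DAY_S = 86_400
now_ms = 1_651_600_000_000  # arbitrary "now" in epoch milliseconds

# A point that is 25 days old, inserted with a 30-day TTL, keeps 5 days.
remaining = aligned_ttl(30 * DAY_S, now_ms - 25 * DAY_S * 1000, now_ms)
print(remaining // DAY_S, "days remaining")
```

For historical loads this suggests the configured TTL needs to be longer than the age of the oldest data being inserted; otherwise the computed value for old points is zero or negative.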


Another concern I have is regarding compaction. We have been using TimeWindowCompactionStrategy (TWCS), which I have seen recommended for most time series use cases in Cassandra. We want to re-deploy our system with the updated version of KairosDB, which means Cassandra will have no data to begin with (without going into details, our previous prototype is currently unrecoverable, so re-deploying it with our previously inserted data is not an option).

We want to use KairosDB to insert old data into Cassandra so that it is initially populated with at least 1 year of time series data, which may later be extended to 2, 3 or more years of historical data. My concern is with how TimeWindowCompactionStrategy works: my understanding is that it compacts data based on how old the data is, meaning when it was inserted, not the timestamp stored in the data_points table. In this case, which compaction strategy is recommended? Is it recommended at all to populate Cassandra+KairosDB with historical data?

TWCS (TimeWindowCompactionStrategy) is most useful when data is written in time order and a TTL is enabled, because in that scenario it can eliminate expired data efficiently. On the other hand, if you do not intend to use a TTL, STCS (SizeTieredCompactionStrategy) is still fine.
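
To make the "data entered in order" point concrete, here is a simplified Python model of time-window bucketing. It assumes windows are fixed-size buckets counted from the Unix epoch and a one-day window size; the function name twcs_window_start is hypothetical, and this is not Cassandra's implementation. It compares bucketing by write time with bucketing by the data points' own timestamps for one year of hourly points loaded in a single run.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(days=1)  # assumed window size, for illustration only


def twcs_window_start(t: datetime, window: timedelta = WINDOW) -> datetime:
    """Start of the fixed-size window that time t falls into (simplified model)."""
    index = (t - EPOCH) // window  # whole windows elapsed since the epoch
    return EPOCH + index * window


# The backfill runs "now"; every point is written at (roughly) this moment.
write_time = datetime(2022, 5, 3, 12, 0, tzinfo=timezone.utc)

# One year of hourly historical data points, identified by their own timestamps.
data_timestamps = [write_time - timedelta(hours=h) for h in range(365 * 24)]

# Bucketing by write time: what matters if windows follow insertion time.
windows_by_write_time = {twcs_window_start(write_time) for _ in data_timestamps}

# Bucketing by data timestamp: what an in-order, live ingest would approximate.
windows_by_data_time = {twcs_window_start(ts) for ts in data_timestamps}

print("windows by write time:", len(windows_by_write_time))
print("windows by data time: ", len(windows_by_data_time))
```

In this model the whole backfill falls into the single window of the run when bucketed by write time, while the same points span several hundred windows when bucketed by their own timestamps. That is the mismatch the question describes: data of very different ages, and with an aligned TTL very different expiry times, ends up grouped together.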

Any help is highly appreciated.

Regards,
Rosita Hormann.

Best,
Francesco 

Brian Hawkins

2022/06/10 23:47:06
To: KairosDB
Sorry for the late response, but here it goes. The configuration property kairosdb.datastore.cassandra.align_datapoint_ttl_with_timestamp is still there and can be set with that exact same name. It is worth reading some of the HOCON documentation, as the format is very flexible. You can actually still use the .properties file, but you will not be able to set some of the properties if you do. It looks like I forgot to migrate the documentation for that property into the new config file.
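
For reference, a sketch of how this could look in kairosdb.conf. HOCON treats a dotted key as a path into nested objects, so the two forms below should be equivalent and only one of them is needed. The datapoint_ttl line and its value are illustrative assumptions, not taken from this thread; check the property names against the kairosdb.conf shipped with your version.

```
# Form 1: full dotted path, the same name as in kairosdb.properties
kairosdb.datastore.cassandra.align_datapoint_ttl_with_timestamp: true

# Form 2: the same setting nested inside the existing sections
kairosdb: {
  datastore.cassandra: {
    # Illustrative value only: 365 days expressed in seconds
    datapoint_ttl: 31536000
    align_datapoint_ttl_with_timestamp: true
  }
}
```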

I've not had experience loading historical data into TWCS so I'm not much help there.

Brian
