To comply with regulations, I'm working on a feature to delete all data with a certain tag value.
Since KairosDB does not provide a REST endpoint to list all metrics, I've implemented it directly on top of Cassandra: I stream all row_key_index rows, pick out the ones I need, and then delete both the data_points and row_key_index rows.
However, I noticed that after I deleted the data (from both data_points and row_key_index; I'm still on Kairos 1.1.3), reposting the deleted data through Kairos no longer stored it.
I traced it down to the fact that the timestamp passed to the Hector client by the CassandraDatastore is in (epoch) milliseconds (see org/kairosdb/datastore/cassandra/CassandraDatastore.java:278 and org.kairosdb.datastore.cassandra.CassandraDatastore:354), whereas I saw (by accident) in the Javadoc of the CQL BoundStatement that CQL expects a timestamp in microseconds since the epoch.
Also, when I inspected the contents of an sstable (e.g. data_points) with sstabledump, I saw that the (column) timestamps generated by Kairos are in the 1970s, whereas the tombstone that my tool generated carried the current timestamp (2018).
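To illustrate the mismatch, here is a minimal Python sketch (not Kairos or Cassandra code) of what happens when a value written as epoch milliseconds is read as epoch microseconds. The sample timestamp and the helper name `as_utc` are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def as_utc(epoch_value, unit):
    """Interpret an integer epoch value given in 'ms' or 'us' as a UTC datetime."""
    micros_per_unit = {"ms": 1_000, "us": 1}[unit]
    return EPOCH + timedelta(microseconds=epoch_value * micros_per_unit)

# Hypothetical write time in March 2018, expressed in epoch milliseconds.
written_ms = 1_520_000_000_000

intended = as_utc(written_ms, "ms")  # what the writer meant
as_read = as_utc(written_ms, "us")   # how a microsecond-based reader sees the same number

print("intended:", intended.isoformat())
print("as read: ", as_read.isoformat())
```

The same number lands in 2018 when read as milliseconds and in January 1970 when read as microseconds, which matches what sstabledump showed me.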
I fixed the problem I had (reinserting deleted data did not store it) by also specifying the timestamp on my delete row statement in milliseconds since epoch.
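As I understand it, Cassandra resolves conflicts by comparing write timestamps as plain numbers, so a tombstone hides every cell whose timestamp is less than or equal to its own. The sketch below is a toy model of that rule under this assumption, not actual Cassandra code; the function name `is_visible` and all sample values are hypothetical.

```python
def is_visible(cell_ts, tombstone_ts):
    """Toy model: a cell survives only if it is strictly newer than the tombstone."""
    return cell_ts > tombstone_ts

now_ms = 1_520_000_000_000        # hypothetical "now" in epoch milliseconds
now_us = now_ms * 1_000           # the same instant in epoch microseconds

original_write = now_ms - 3_600_000   # data Kairos wrote an hour ago (ms-based timestamp)
reinsert_write = now_ms + 60_000      # data reposted a minute after the delete (ms-based)

# Delete issued with a microsecond timestamp: it dwarfs every ms-based write.
print(is_visible(original_write, now_us))   # False, deleted as intended
print(is_visible(reinsert_write, now_us))   # False, the repost is hidden as well

# Delete issued with an explicit millisecond timestamp, as in my fix.
print(is_visible(original_write, now_ms))   # False, still deleted
print(is_visible(reinsert_write, now_ms))   # True, the repost is stored again
```

In CQL the delete timestamp can be set with `USING TIMESTAMP`, e.g. `DELETE FROM data_points USING TIMESTAMP <epoch-ms> WHERE key = <row key>`; the `key` column name here is an assumption about how the table shows up in CQL.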
Now my question: is/was this a bug or a deliberate choice?
For me it works fine, because with timestamps in the 1970s, a compaction will nicely clean up all the tombstones that my delete generated (since they are way before the gc_grace_seconds and as such eligible for garbage collection).
One other small question: do you happen to know how the expiry_date of a column is calculated if you specify a TTL? Is it calculated based on the "arrival" time of the column on the C* side, or is the specified timestamp used? Since TTL does work with KairosDB, I believe the C* server calculates it as the "arrival" time + TTL.