Cassandra column timestamp question


Bruno Ballekens

Sep 5, 2018, 4:30:21 AM
to KairosDB
To comply with regulations, I'm working on a feature to delete all data with a certain tag value.
Since KairosDB does not provide a REST endpoint to list all metrics, I've implemented it directly on top of Cassandra: I stream all row_key_index rows, pick out the ones I need, and then delete both the data_points and row_key_index rows.

However, I noticed that after I deleted the data (from both data_points and row_key_index; I'm still on Kairos 1.1.3), reposting the deleted data through Kairos no longer stored it.
I traced this to the timestamp that CassandraDatastore passes to the Hector client, which is in (epoch) milliseconds (see org/kairosdb/datastore/cassandra/CassandraDatastore.java:278 and org.kairosdb.datastore.cassandra.CassandraDatastore:354), whereas I noticed (by accident) in the Javadoc of the CQL BoundStatement that CQL expects a timestamp in microseconds since the epoch.
Also, when I inspected the contents of an SSTable (e.g. data_points) with sstabledump, I saw that the (column) timestamps generated by Kairos are in the 1970s, whereas the tombstone that my tool generated carried the current timestamp (2018).
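To illustrate the unit mismatch described here, a minimal Python sketch follows. It assumes the tooling interprets every write timestamp as microseconds since the epoch; the sample value is only an illustrative wall-clock time from September 2018, not one taken from a real SSTable.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def as_microseconds(write_timestamp: int) -> datetime:
    """Interpret a write timestamp as microseconds since the epoch."""
    return EPOCH + timedelta(microseconds=write_timestamp)

# Illustrative wall-clock time in epoch milliseconds (assumed sample value).
millis = 1_536_136_221_000
micros = millis * 1000

# A millisecond value read as microseconds lands in January 1970.
print(as_microseconds(millis))
# The same instant expressed in microseconds lands in September 2018.
print(as_microseconds(micros))
```

The factor of 1000 between the two units is what moves a 2018 write back into the first weeks of 1970.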
I fixed my problem (reinserting deleted data did not store it) by also specifying the timestamp on my delete row statement in milliseconds since the epoch.
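A rough sketch of such a delete statement, written in Python purely for illustration. The table name comes from this thread, but the partition key column name `key` and both helper functions are assumptions; the statement is only built as a string here, and running it would require a driver session that is not shown.

```python
import time

def now_millis() -> int:
    """Current wall-clock time in epoch milliseconds, the unit Kairos writes with."""
    return time.time_ns() // 1_000_000

def delete_row_cql(table: str, timestamp_ms: int) -> str:
    """Build a row delete that carries an explicit millisecond write timestamp.

    The partition key column name `key` is an assumption about the schema.
    """
    return f"DELETE FROM {table} USING TIMESTAMP {int(timestamp_ms)} WHERE key = ?"

statement = delete_row_cql("data_points", now_millis())
print(statement)
```

With USING TIMESTAMP given in milliseconds, the tombstone gets a write timestamp on the same scale as the data Kairos inserts, so data reposted later is no longer shadowed by it.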

Now my question: is/was this a bug or a deliberate choice?
For me it works fine, because with timestamps in the 1970s, a compaction will nicely clean up all the tombstones that my delete generated (since they are way before gc_grace_seconds and as such eligible for garbage collection).
One other small question: do you happen to know how the expiry_date of a column is calculated if you specify a TTL? Is it calculated based on the "arrival" time of the column on the C* side, or is the specified timestamp used? Since TTL does work with KairosDB, I believe the C* server calculates it as "arrival" time + TTL.


jsabin

Sep 7, 2018, 12:10:46 PM
to KairosDB
Not sure what you mean by "list all metrics". The REST endpoint /api/v1/metricnames lists all metric names. What is it you are looking for?
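For reference, a minimal Python sketch of calling that endpoint. The base URL is a placeholder, and the response shape (a JSON object with a "results" array) is an assumption that should be checked against the docs for your Kairos version; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

def metric_names_url(base_url: str) -> str:
    """Build the URL of the metric names endpoint for a given server."""
    return base_url.rstrip("/") + "/api/v1/metricnames"

def parse_metric_names(body: str) -> list:
    """Extract the names, assuming a body like {"results": ["name", ...]}."""
    return json.loads(body)["results"]

def fetch_metric_names(base_url: str) -> list:
    """Fetch and parse the metric names from a running server."""
    with urlopen(metric_names_url(base_url)) as response:
        return parse_metric_names(response.read().decode("utf-8"))

# Example call against a placeholder host (not executed here):
# names = fetch_metric_names("http://localhost:8080")
```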

Also, the timestamps for metrics are specified in milliseconds since the Epoch. This is a deliberate choice to allow data points to be specified on a millisecond interval. From the docs:

"The timestamp is the date and time when the data was measured. It’s a numeric value that is the number of milliseconds since January 1st, 1970 UTC."

And yes the TTL is based on the "arrival" time of the column on the C* side.

Bruno Ballekens

Sep 10, 2018, 11:53:43 AM
to KairosDB
> Not sure what you mean by "list all metrics". The REST endpoint /api/v1/metricnames lists all metric names. What is it you are looking for?

Hmm, seems like I overlooked that one. Sorry about that.
 
> Also, the timestamps for metrics are specified in milliseconds since the Epoch. This is a deliberate choice to allow data points to be specified on a millisecond interval. From the docs:
> "The timestamp is the date and time when the data was measured. It’s a numeric value that is the number of milliseconds since January 1st, 1970 UTC."

Yes, I know that the timestamps for the metrics are in milliseconds.
The timestamp I'm referring to is the write timestamp that you can set on (any?) Cassandra column and that Cassandra apparently uses to decide which write wins when values conflict.
It's the clock parameter of the Hector client's HColumnImpl class, or the defaultTimestamp of com.datastax.driver.core.Statement (or the timestamp generated by the com.datastax.driver.core.TimestampGenerator that is set on the session).


> And yes the TTL is based on the "arrival" time of the column on the C* side.

OK, thanks for the confirmation!

jsabin

Sep 11, 2018, 11:06:48 AM
to KairosDB
I see. Good find. Give us a pull request for that fix.

Thanks

Brian Hawkins

Sep 11, 2018, 12:08:57 PM
to KairosDB
So the timestamp is a legacy issue. Yes, what happened is that you put in a delete with a timestamp far larger than any timestamp that new data is inserted with, so your delete supersedes the new data and it doesn't show up. You can modify your CQL commands to use a millisecond timestamp; that will fix the issue.
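The effect can be sketched with a toy model in Python. This is a deliberate simplification and not Cassandra's actual reconciliation code: each write is a (timestamp, value) pair, None stands for a tombstone, the highest timestamp wins, and ties are ignored. All sample values are made up.

```python
def reconcile(cells):
    """Return the value of the cell with the highest write timestamp (None = deleted)."""
    _, value = max(cells, key=lambda cell: cell[0])
    return value

now_ms = 1_536_136_221_000                 # illustrative wall-clock time in milliseconds

tombstone_us = (now_ms * 1000, None)       # delete stamped in microseconds
tombstone_ms = (now_ms, None)              # delete stamped in milliseconds
reinsert_ms = (now_ms + 60_000, "42.0")    # data reposted a minute later, in milliseconds

# The microsecond tombstone outranks the later millisecond insert: data stays hidden.
print(reconcile([tombstone_us, reinsert_ms]))
# With a millisecond tombstone, the later insert wins and the data is visible again.
print(reconcile([tombstone_ms, reinsert_ms]))
```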

Changing the way Kairos writes the timestamp is tricky, as we have a lot of users with data already in C* using the millisecond timestamp. Changing it on the fly could have bad repercussions.

When we insert data with a TTL, we specify the TTL in seconds, and C* converts that to an internal timestamp of when the data will expire. As far as I know, it is not connected to the modification timestamp that we are inserting as milliseconds.
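As a rough illustration of that description, here is a small Python sketch. It assumes, as stated above, that expiry is derived from the server-side arrival time plus the TTL; the function names are hypothetical, and the client-supplied write timestamp is intentionally not an input.

```python
def expires_at(arrival_epoch_seconds: int, ttl_seconds: int) -> int:
    """Expiry time in epoch seconds, derived from server arrival time and TTL only."""
    return arrival_epoch_seconds + ttl_seconds

def is_expired(arrival_epoch_seconds: int, ttl_seconds: int, now_epoch_seconds: int) -> bool:
    """True once the server clock has reached the computed expiry time."""
    return now_epoch_seconds >= expires_at(arrival_epoch_seconds, ttl_seconds)

arrival = 1_536_136_221   # illustrative arrival time on the server, in seconds
one_day = 86_400          # TTL of one day, in seconds

print(expires_at(arrival, one_day))
print(is_expired(arrival, one_day, arrival + 3_600))
print(is_expired(arrival, one_day, arrival + one_day))
```

Under this assumption, a write timestamp that looks like 1970 has no effect on when the data expires, which matches the observation that TTL works with Kairos.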

Brian