Cassandra column writetime timestamps from distributed clients cause (expected) inconsistency

michael....@eturi.com

Dec 13, 2017, 5:52:03 PM

to DataStax Python Driver for Apache Cassandra User Mailing List

Hi,

I have a client system timestamp consistency problem that's puzzling me and I'm not really sure what a good solution strategy would be:

I am testing a Python based backend with Cassandra DB through its Web API, e.g. "create a user A", "get data for user A". The test runner goes through the following simplified sequence:

- call Web API to create the user and extract the new user_id from the response. The backend creates a new user row with an empty permissions column of type MAP

- connect directly to Cassandra with an INSERT to add a new permission to the permissions column

- call the Web API to "get data" for user A

- occasionally receive a 401 permission denied error

(Note: Cassandra 2.1.17, cassandra_driver==3.9.0, Python 2.7.x)

After some research and trial and error I found that this happens when the timestamp of the permission column INSERT is lower than the timestamp of the user row INSERT that wrote all columns (both timestamps being the client system time at the moment the CQL is executed).

- The backend running the Web API has a system timestamp T1 when it makes the user row INSERT. I can read the timestamp later using a SELECT user_id, WRITETIME(user_name) FROM users WHERE user_id='abc';

- The test runner adding an element to the user row permissions MAP has a system timestamp T2 when it makes the INSERT.

- If T2 < T1, a subsequent SELECT does not show the added permission
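WRITETIME values are microseconds since the Unix epoch, so T1 and T2 can be compared directly once both have been read back. A minimal sketch of that comparison; the helper names and the two sample values are made up for illustration and do not come from the actual system:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # WRITETIME counts microseconds from here (UTC)


def writetime_to_datetime(writetime_us):
    """Convert a WRITETIME value (microseconds since epoch) to a naive UTC datetime."""
    return EPOCH + timedelta(microseconds=writetime_us)


def skew_seconds(t1_us, t2_us):
    """Return T2 - T1 in seconds; a negative value means the second write looks older."""
    return (t2_us - t1_us) / 1e6


# Hypothetical values: backend row INSERT (T1) and test runner permission INSERT (T2).
t1 = 1513201923500000
t2 = 1513201921250000

print(writetime_to_datetime(t1))
print(skew_seconds(t1, t2))  # negative: the test runner's clock is behind
```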

If the test runner's clock lags behind the Web API server's clock by more than the request round-trip time, the permission INSERT is silently ignored (more precisely: completed but never read) because on read, Cassandra treats the row INSERT (which defined ALL columns including the empty permissions MAP) as more recent. If the clocks are roughly in sync, the HTTP request round-trip latency usually leads to monotonically increasing client timestamps and no error. The problem is pronounced here because the test runner uses a somewhat different system setup than the backend under test, and because it does a very fast create-update cycle on the same row. Ideally all clients would be NTP time synced but they are not.
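The effect can be reproduced with a toy last-write-wins register. This is a deliberately simplified model, not Cassandra's storage engine: it treats the permissions column as a single value, ignores tombstones and the way collection overwrites are actually timestamped, and assumes the earlier write wins a tie. The names `LwwCell` and `replay` are hypothetical.

```python
class LwwCell(object):
    """One column value that keeps only the write with the highest timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        # The write is accepted either way; it just may never become visible.
        if self.timestamp is None or timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp

    def read(self):
        return self.value


def replay(writes):
    """Apply (value, timestamp) writes in arrival order and return what a read sees."""
    cell = LwwCell()
    for value, timestamp in writes:
        cell.write(value, timestamp)
    return cell.read()


T1 = 1513201923500000  # hypothetical backend clock (microseconds) at row INSERT

# Test runner clock lags by 2 s: the permission write arrives later
# but carries an older timestamp, so the empty map stays visible.
print(replay([({}, T1), ({"admin": "rw"}, T1 - 2000000)]))  # {}

# Clocks in sync: the round-trip latency makes T2 > T1 and the write is visible.
print(replay([({}, T1), ({"admin": "rw"}, T1 + 50000)]))  # {'admin': 'rw'}
```

Arrival order plays no role in the outcome; only the timestamps attached by the two clients do.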

A few notes:

- The above flow is just an example for reproducing the error. Other non-test, non-permission, non-MAP-type related use cases show the same behavior.

- I do not want to add a test endpoint to the backend that just inserts a specific permission to the user row because this is not part of the production server.

- This does not occur in production because there is no use case where a row is inserted and immediately modified by a client on a different host. I am very aware that Cassandra is an eventually consistent, strongly distributed, non-centralized system. It works quite well for us to date but I'd like to find a way around the testing problem. Also, production hosts are more strictly NTP synced, so this limits the risk further.

- Testing uses a single-node Cassandra cluster, but two different hosts for the test runner and the backend.

My question is: is there a way to instruct the cassandra_driver to generate the internal (column writetime) timestamp server side (by the Cassandra node)? I assume that all Cassandra nodes are much better synced than the clients. Would that solve anything or make it worse?

I've read the discussions in https://issues.apache.org/jira/browse/CASSANDRA-9131 and https://issues.apache.org/jira/browse/CASSANDRA-6178 and it seems that server side timestamps are deprecated. Are they?

Are there any other strategies?

Thanks,

Michael

Jaume Marhuenda

Dec 14, 2017, 12:38:04 PM

to python-dr...@lists.datastax.com

Hello Michael,

> is there a way to instruct the cassandra_driver to generate the internal (column writetime) timestamp server side (by the Cassandra node)?

Yes there is, you have to set Session.use_client_timestamp to False
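A minimal sketch of flipping that flag. In the driver versions I am aware of, `use_client_timestamp` is an attribute of the Session object returned by `Cluster.connect()`; the helper name and the contact point below are placeholders, and the driver-specific part is left in comments because it needs a reachable cluster:

```python
def use_server_side_timestamps(session):
    """Ask the driver to stop attaching client timestamps to requests on this session."""
    # With the flag off, the coordinator node assigns the write timestamp.
    session.use_client_timestamp = False
    return session


# Typical use with the DataStax driver (not executed here):
#
#   from cassandra.cluster import Cluster
#   cluster = Cluster(["127.0.0.1"])   # placeholder contact point
#   session = use_server_side_timestamps(cluster.connect())
```

A timestamp given explicitly in the CQL text with USING TIMESTAMP still takes precedence over this setting.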

> I assume that all Cassandra nodes are much better synced than the clients. Would that solve anything or make it worse?

That really depends on how well your servers are synchronized; if it's bad, you could have the same problems you are having now.

> and it seems that server side timestamps are deprecated. Are they?

I don't think they are. The default behavior is for the drivers to use a client-generated timestamp, but you can still let the server generate them.

I'd recommend setting up ntpd on your clients and your servers and allowing the client to generate the timestamps.

Jaume


Michael Meisinger

Dec 14, 2017, 4:47:44 PM

to python-dr...@lists.datastax.com

Hi Jaume,

thanks for the response.

On Dec 14, 2017, at 9:38 AM, Jaume Marhuenda <jaume.m...@datastax.com> wrote:

> Hello Michael,
>
> > is there a way to instruct the cassandra_driver to generate the internal (column writetime) timestamp server side (by the Cassandra node)?
>
> Yes there is, you have to set Session.use_client_timestamp to False

That makes sense. I had a hunch it was there and it’s actually where one would expect it.

> > I assume that all Cassandra nodes are much better synced than the clients. Would that solve anything or make it worse?
>
> That really depends on how well your servers are synchronized; if it's bad, you could have the same problems you are having now.
>
> > and it seems that server side timestamps are deprecated. Are they?
>
> I don't think they are. The default behavior is for the drivers to use a client-generated timestamp, but you can still let the server generate them.
>
> I'd recommend setting up ntpd on your clients and your servers and allowing the client to generate the timestamps.

Agreed. Any issues related to leap seconds and NTP corrections cannot be expected to be handled by the server(s) alone. The driver’s MonotonicTimestampGenerator and custom timestamp generators seem like good mechanisms to apply more specific client side behavior.
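As an illustration of what such a client-side generator might look like, here is a simplified sketch of the monotonic idea: never hand out a timestamp lower than or equal to the previous one, even if the system clock steps backwards. The class name is hypothetical and this is not the driver's own implementation; it assumes the driver accepts any callable returning integer microseconds via `Cluster(timestamp_generator=...)`.

```python
import threading
import time


class SimpleMonotonicTimestamps(object):
    """Callable that returns strictly increasing microsecond timestamps."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the behavior can be tested
        self._last = 0
        self._lock = threading.Lock()

    def __call__(self):
        with self._lock:
            now = int(self._clock() * 1e6)
            # If the clock went backwards (or did not advance), tick by 1 us instead.
            self._last = max(now, self._last + 1)
            return self._last


# A fake clock that jumps back by one second, e.g. after an NTP correction.
readings = iter([10.0, 9.0, 9.0, 11.0])
gen = SimpleMonotonicTimestamps(clock=lambda: next(readings))
print([gen() for _ in range(4)])  # [10000000, 10000001, 10000002, 11000000]
```

This only protects a single client against its own clock stepping backwards; it does nothing about skew between two different hosts, which is the situation in the test setup described earlier in the thread.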

For what it’s worth, I found it strangely hard to realize synced time on all clients. I’ve seen diverging timestamps in several cases. E.g. when running Docker containers on a Mac, Docker internally spins up a Linux host VM on which the containers run. That host VM determines the timestamps for the Docker containers running the services. Timestamps diverge when the Mac sleeps, but also without sleeping. They can diverge by multiple seconds within a few minutes of Docker uptime (mostly they jump ahead of NTP time for me, sometimes by 15 seconds or more). This seems to be a long-standing Docker for Mac “issue”. It seems worse with recent versions of Docker for Mac.

https://blog.shameerc.com/2017/03/quick-tip-fixing-time-drift-issue-on-docker-for-mac

Nonetheless, it also happens on cloud based Linux hosts, but not to that extreme. Might of course be host configuration related.

Thanks again,

Michael

Jaume Marhuenda

Dec 14, 2017, 5:43:44 PM

to python-dr...@lists.datastax.com

Hello Michael,

I see, maybe for your case it's convenient to use server timestamps so no time is wasted on syncing. Here's an article that may help you handle possible leap seconds in Cassandra without relying on the client monotonic timestamp generator.

Jaume
