Bug: Timestamps set on CassStatement that are then added to CassBatch are not applied


Robin Mahony

Feb 23, 2016, 4:07:44 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi,

I am trying to reproduce the following behaviour using CassStatement and CassBatch objects instead of a CQL string (driver version 2.2.2).

BEGIN BATCH
INSERT into bla.table ... USING TIMESTAMP 1
INSERT into bla.table ... USING TIMESTAMP 2
APPLY BATCH

Here each statement has its own timestamp set explicitly.

However, it appears this does not work.

Calling cass_statement_set_timestamp() and then cass_batch_add_statement() does not seem to apply the timestamp set on the statement.

Is this a bug or the intended behaviour? My application requires this ability on its critical path.

Cheers,

Robin

Sorin Manolache

Feb 23, 2016, 4:15:27 PM
to cpp-dri...@lists.datastax.com
You cannot set individual timestamps on statements included in a batch.
All statements belonging to a batch have, by definition, the same
timestamp. You can, however, specify a timestamp for the whole batch:

begin batch using timestamp ts1
insert ...
insert ...
apply batch

Robin Mahony

Feb 23, 2016, 4:23:13 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
That is not actually true. If you specify the CQL as a string to cass_statement_new() in the format I described above, you can indeed set timestamps for individual statements within a batch.

Also, the Cassandra docs indicate this should be possible.


"Using a timestamp

BATCH supports setting a client-supplied timestamp, an integer, in the USING clause with one exception: if a DML statement in the batch contains a compare-and-set (CAS) statement, such as the following statement, do not attempt to use a timestamp:

INSERT INTO users (id, lastname) VALUES (999, 'Sparrow') IF NOT EXISTS

The timestamp applies to all statements in the batch. If not specified, the current time of the insertion (in microseconds) is used. The individual DML statements inside a BATCH can specify a timestamp if one is not specified in the USING clause.

For example, specify a timestamp in an INSERT statement.

BEGIN BATCH
INSERT INTO purchases (user, balance) VALUES ('user1', -8) USING TIMESTAMP 19998889022757000;
INSERT INTO purchases (user, expense_id, amount, description, paid)
VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;

Verify that balance column has the client-provided timestamp."

http://docs.datastax.com/en//cql/3.1/cql/cql_reference/batch_r.html

Phil Brayshaw

May 26, 2016, 9:19:27 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List

Hi,

I'm seeing the same behaviour here. If I put the timestamp directly into the CQL string with "USING TIMESTAMP", it works as expected: each statement gets executed with a separate timestamp. The downside of this method is that the timestamp is then part of the statement text, so I don't think it's possible to rebind a new value to it and reuse the statement. If that's true, the statement would have to be prepared from scratch each time it is used.

However, if I set the timestamp on the statements with cass_statement_set_timestamp before adding them to the batch, it behaves as though the entire batch is processed with a single timestamp. Similarly, if I use a monotonic timestamp generator to generate the timestamps, the batch is processed as though it has a single timestamp.

Looking at the C++ driver source code, I can see that when the batch is encoded in BatchRequest::encode, it iterates over the statements and encodes each one using ExecuteRequest::encode_batch, but that doesn't appear to encode a timestamp for the statement. Once it has finished iterating over the statements, it does encode a timestamp for the batch.

However, when encoding a single statement (i.e. not in a batch) in ExecuteRequest::internal_encode, it does encode a timestamp.

I'm not an expert on the C++ driver, so may be misunderstanding things, but it looks like it simply doesn't encode and send the timestamp to the database for the statements in a batch.

Is anyone able to confirm that this is a limitation of the C++ driver (and if there are any plans to address it)?

I also created a StackOverflow question recently (http://stackoverflow.com/questions/37460390/can-cassandra-statements-inside-a-batch-have-separate-timestamps-using-cpp-drive). If I get an answer there, I'll let you know.

Phil

Michael Penick

May 26, 2016, 10:35:48 AM
to cpp-dri...@lists.datastax.com
The underlying native protocol (CQL) doesn't support per-statement timestamps in a batch. Each query in a batch has the following encoding:

form:
<kind><string_or_id><n>[<name_1>]<value_1>...[<name_n>]<value_n>
where:
  • <kind> is a [byte] indicating whether the following query is a prepared one or not. <kind> value must be either 0 or 1.
  • <string_or_id> depends on the value of <kind>. If <kind> == 0, it should be a [long string] query string (as in QUERY, the query string might contain bind markers). Otherwise (that is, if <kind> == 1), it should be a [short bytes] representing a prepared query ID.
  • <n> is a [short] indicating the number (possibly 0) of following values.
  • <name_i> is the optional name of the following <value_i>. It must be present if and only if the 0x40 flag is provided for the batch.
  • <value_i> is the [bytes] to use for bound variable i (of bound variable <name_i> if the 0x40 flag is used).
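
To make the layout above concrete, here is a minimal C sketch that encodes one batch entry for the `<kind> == 0` case with unnamed values. The names `BatchValue` and `encode_batch_query` are hypothetical and are not taken from the driver source. The sketch assumes big-endian integers, a `[long string]` encoded as an `[int]` length followed by the bytes, and `[bytes]` encoded the same way, which is my reading of the protocol's notation. It also assumes non-null values and a caller-supplied buffer that is large enough. The point it illustrates is that the entry has no field in which a per-statement timestamp could travel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One bound value: already-serialized bytes plus their length (hypothetical type). */
typedef struct {
    const uint8_t *data;
    int32_t len; /* assumed >= 0; null values are not handled in this sketch */
} BatchValue;

/* Write a big-endian [short] at out[pos]; return the new position. */
static size_t put_short(uint8_t *out, size_t pos, uint16_t v) {
    out[pos] = (uint8_t)(v >> 8);
    out[pos + 1] = (uint8_t)(v & 0xff);
    return pos + 2;
}

/* Write a big-endian [int] at out[pos]; return the new position. */
static size_t put_int(uint8_t *out, size_t pos, int32_t v) {
    uint32_t u = (uint32_t)v;
    out[pos] = (uint8_t)(u >> 24);
    out[pos + 1] = (uint8_t)(u >> 16);
    out[pos + 2] = (uint8_t)(u >> 8);
    out[pos + 3] = (uint8_t)u;
    return pos + 4;
}

/* Encode <kind><string_or_id><n><value_1>...<value_n> for a plain query
 * string (<kind> == 0) without value names (batch flag 0x40 not set).
 * Returns the number of bytes written to `out`. */
size_t encode_batch_query(uint8_t *out, const char *cql,
                          const BatchValue *values, uint16_t n) {
    size_t pos = 0;
    size_t cql_len;
    uint16_t i;

    assert(out != NULL && cql != NULL);
    cql_len = strlen(cql);

    out[pos++] = 0;                             /* <kind>: 0 = query string */
    pos = put_int(out, pos, (int32_t)cql_len);  /* [long string] length */
    memcpy(out + pos, cql, cql_len);            /* [long string] bytes */
    pos += cql_len;
    pos = put_short(out, pos, n);               /* <n>: number of values */
    for (i = 0; i < n; i++) {
        pos = put_int(out, pos, values[i].len); /* [bytes] length */
        memcpy(out + pos, values[i].data, (size_t)values[i].len);
        pos += (size_t)values[i].len;
    }
    /* The entry ends here: the layout has no per-entry timestamp field. */
    return pos;
}
```

Any timestamp therefore has to be carried either in the CQL text itself (`USING TIMESTAMP ...`) or once for the batch as a whole.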

Cassandra would need to add support for this to the native protocol before the driver could support it. I did a quick search of the Cassandra JIRA (https://issues.apache.org/jira/browse/CASSANDRA) and was unable to find an outstanding issue.

Mike



Phil Brayshaw

May 26, 2016, 11:19:54 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi Michael,

Thanks for taking the time to look into this. I hadn't thought of looking at the spec of the protocol, but that confirms what I was able to understand from examining the source code.

At least now that I know it's not (currently) possible, I don't need to spend any more time going down that route.

I'll continue to look into putting it directly into the CQL with "USING TIMESTAMP" and re-binding new values to the statement.

Would it be possible to get this added to JIRA as a future enhancement to the protocol? I think it would be a very useful feature to have.

Perhaps it would also be useful in the meantime to put a note in the driver documentation or the cassandra.h header file explaining that any timestamp set on a statement will not be sent to the database once the statement has been put into a batch.

Phil

Michael Penick

May 26, 2016, 1:16:29 PM
to cpp-dri...@lists.datastax.com
+1 on the header documentation.

I've created a JIRA issue: https://issues.apache.org/jira/browse/CASSANDRA-11901. Feel free to add your input.

Mike

Robin Mahony

May 26, 2016, 2:23:52 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi Phil,

I had this issue too, but there is a way to work around it. If you write your prepared statements like this:

"INSERT INTO ... USING TIMESTAMP ?"

and then bind an int64_t value to "[timestamp]" as if it were a column, it will do what you want.
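
A minimal sketch of that workaround with the C API might look like the following. The helper name `add_insert_with_timestamp`, the table layout (an `int` id and a `text` value) and the prepared CQL in the comment are assumptions made for illustration. The driver calls themselves (`cass_prepared_bind`, `cass_statement_bind_int32`, `cass_statement_bind_string`, `cass_statement_bind_int64_by_name`, `cass_batch_add_statement`, `cass_statement_free`) are from `cassandra.h`. Building it requires the driver library, and running it requires a connected session that produced `prepared`.

```c
#include <cassandra.h>

/* Bind one row plus its own write timestamp and add it to `batch`.
 *
 * `prepared` is assumed to come from preparing (hypothetical table):
 *   INSERT INTO ks.tbl (id, val) VALUES (?, ?) USING TIMESTAMP ?
 *
 * The timestamp travels as an ordinary bound value, so each statement
 * in the batch can carry a different one. */
CassError add_insert_with_timestamp(CassBatch *batch,
                                    const CassPrepared *prepared,
                                    cass_int32_t id,
                                    const char *val,
                                    cass_int64_t timestamp_us) {
    CassStatement *statement = cass_prepared_bind(prepared);
    CassError rc = cass_statement_bind_int32(statement, 0, id);

    if (rc == CASS_OK) {
        rc = cass_statement_bind_string(statement, 1, val);
    }
    if (rc == CASS_OK) {
        /* Bind the USING TIMESTAMP marker by the name given in the thread. */
        rc = cass_statement_bind_int64_by_name(statement, "[timestamp]",
                                               timestamp_us);
    }
    if (rc == CASS_OK) {
        rc = cass_batch_add_statement(batch, statement);
    }

    /* The batch keeps its own reference, so the local one can be released. */
    cass_statement_free(statement);
    return rc;
}
```

Because the timestamp is a bind marker rather than literal text, the same prepared statement can be reused for every row. Do not also call cass_statement_set_timestamp() on these statements, since that value is not sent for statements inside a batch.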

Cheers,

Robin M

Phil Brayshaw

May 27, 2016, 4:30:19 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Thanks Mike. If I can think of anything to add, I will.

Phil

Phil Brayshaw

May 27, 2016, 4:31:24 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List

Hi Robin,

Thanks for the suggestion. I'll try that and see how I get on.

Phil
