Two Phase Commit at Application Level

Deepak Panda

Sep 13, 2017, 9:49:29 AM
to java-dri...@lists.datastax.com
Hi,

In our case, multiple services access a Cassandra cluster. For 5%
of the operations we want strong consistency among rows
spanning MULTIPLE tables. I understand that the LOGGED batch option can be
used in this case, but I would also like to perform some additional
validations as part of my transactions.

Hence we are considering implementing two-phase commit at the
application level so that other applications do not corrupt the data.
I understand that this affects availability, but since these are critical
and rare use cases we would like to protect them.

Has anyone implemented 2 Phase Commit with Cassandra in their
projects? If yes, would it be possible to share the experiences?

Any pointer to publicly available code sample would be of great help.

Regards,
AL

Greg Bestland

Sep 13, 2017, 11:22:03 AM
to java-dri...@lists.datastax.com
AL,

I believe what you are looking for is lightweight transactions (LWT).
Here are a couple of posts which might be useful to you.

https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_ltwt_transaction_c.html
https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
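For reference, a conditional (LWT) write in CQL looks roughly like the following; the table and column names here are made up for illustration:

```sql
-- Insert only if no row with this primary key exists yet; the result row
-- contains an [applied] column telling the client whether it succeeded.
INSERT INTO person (id, name) VALUES (42, 'Alice') IF NOT EXISTS;

-- Conditional update: only applies if the stored value currently matches.
UPDATE person SET name = 'Alicia' WHERE id = 42 IF name = 'Alice';
```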

~Thanks.
Greg Bestland.


--
You received this message because you are subscribed to the Google Groups "DataStax Java Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-user+unsubscribe@lists.datastax.com.
Deepak Panda

Sep 13, 2017, 11:33:33 AM
to java-dri...@lists.datastax.com
Thanks Greg for the pointers. But my requirement is a bit different. Here is an example.

Consider two tables called "Person" and "Employee".

When I create a Person, I want an Employee also to be created as part of the SAME transaction. If any other session/client tries to create the same Person and Employee, it should be rejected.

Similarly, when I update the name in both the Person and Employee tables, it should execute as part of the same transaction, and no other session should be able to update while this update is going on.

As I understand it, LWT provides isolation at the level of a single row, not across rows in multiple tables at once.

Please correct me if my understanding is wrong.

Kevin Gallardo

Sep 13, 2017, 12:24:40 PM
to java-dri...@lists.datastax.com
Batching conditional updates across multiple partitions/tables is not supported in Cassandra, for performance reasons, as Cassandra is not made for this kind of workload. By trying to handle it at the application level you may face unforeseen synchronization issues if you have many clients, in multiple regions, and so on. I see two things that could help:
  • Separate your workloads and use a relational database only for the data requiring these specific guarantees (apparently 5% of your workload, as you mentioned),
  • or, as a not-so-nice workaround, rework your data model so that the information you need to update atomically is located in the same table, as mentioned here: https://stackoverflow.com/a/41329573
Hope that helps.
Kévin Gallardo.
Software Developer in Drivers and Tools Team,
DataStax.

Deepak Panda

Sep 13, 2017, 12:59:35 PM
to java-dri...@lists.datastax.com
Hi Kevin, 

Thanks for your reply. A few comments...

I meant that 5% of the write operations would be transactional. The other 95% of read/write operations on these tables require no transaction handling. If we go for an RDBMS, we may have to move all our data to the RDBMS instead of using Cassandra. Should this be a driving factor to move away from Cassandra?

From the Stack Overflow link I am failing to understand how the mapping table would help maintain the atomicity of the operation. An example would really help.

Just reluctant to dump Cassandra only because of this use case.

Kevin Gallardo

Sep 13, 2017, 2:32:21 PM
to java-dri...@lists.datastax.com
I meant 5% of the write operations would be transactional. There would be other 95% read/write operations happening on these tables for which no transaction handling is required.

First, I must assume that the 5% of transactional operations concern the same subset of data for all operations, correct? Meaning that you would not update the same row once with strong consistency requirements and once without, because this would break your consistency requirements (see here: "mixing LWTs and normal operations can result in errors"). If we assume that only the same subset of data requires strong transactional guarantees, then this subset of data can be deferred to an RDBMS. Two example use cases come to mind:
  1. You have a "Customer" table. 5% of the customers are high value and require strong consistency guarantees; the rest don't. These 5% high-value customers may be stored in an RDBMS Customer table where you will be able to execute with strong consistency; for the others you would use Cassandra.
  2. You have User, Address, and Transfers tables. Transfers requires high consistency because it contains money information, so you use a database with strong consistency to store the Transfers table. Now, if you need to do verifications on the User and Address tables when adding a row to Transfers, then it is the whole data set that requires strong consistency, and maybe Cassandra is not the best fit, since, as I explained earlier, Cassandra does not support this for performance/scaling purposes (or, if only a subset of the data in these tables is concerned, refer to example 1 above).
I am failing to understand how the mapping table would help maintain the atomicity of the operation.

To reuse the example you mentioned, with Person and Employee: you would rework your data model to have Person and Employee in the same table, plus a column with a flag indicating whether a row in this table is an Employee or a Person. Then there is only one table to update conditionally, so it would not cause a problem with LWTs.
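To sketch that idea (table and column names here are hypothetical): both kinds of row share the same partition key, which is exactly what makes a conditional batch over them legal in Cassandra.

```sql
CREATE TABLE party (
    person_id   int,
    record_type text,   -- 'person' or 'employee'
    name        text,
    PRIMARY KEY (person_id, record_type)
);

-- Both rows live in the same partition (person_id = 42),
-- so Cassandra accepts a batch containing an LWT condition.
BEGIN BATCH
    INSERT INTO party (person_id, record_type, name)
        VALUES (42, 'person', 'Alice') IF NOT EXISTS;
    INSERT INTO party (person_id, record_type, name)
        VALUES (42, 'employee', 'Alice');
APPLY BATCH;
```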

Deepak Panda

Sep 14, 2017, 9:58:25 AM
to java-dri...@lists.datastax.com
Hi Kevin,

Many thanks for the response.

>> First I must assume that the 5% of transactional operations you require concern the same subset of data for all the operations, correct?

Yes, there are other operations which access the same 5% of the data. And yes, it is possible for us to execute these writes ONLY using LWT, so that we can avoid the issue mentioned in the link.

Just curious to know why you are suggesting an RDBMS for this data, as the same can be achieved using LOGGED BATCH operations with a timestamp, where the entire operation is atomic. Is there any case where atomicity would not be respected when multiple tables are involved? Or is it just a performance consideration? I tried using a batch operation USING TIMESTAMP across multiple tables and it works for me as expected.
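For concreteness, the kind of batch I tried looks like this (table and column names follow my earlier Person/Employee example; the timestamp value is arbitrary):

```sql
-- A logged batch with an explicit client-supplied timestamp, spanning two tables.
-- All statements share the same write timestamp, so the mutations are atomic
-- (all-or-nothing), but note there is no isolation or conflict detection.
BEGIN BATCH USING TIMESTAMP 1505400000000000
    UPDATE person   SET name = 'Alicia' WHERE id = 42;
    UPDATE employee SET name = 'Alicia' WHERE id = 42;
APPLY BATCH;
```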

Regards,
AL

Kevin Gallardo

Sep 14, 2017, 11:34:36 AM
to java-dri...@lists.datastax.com
Just curious to know why you are suggesting to go for RDBMS for these data as the same can be achieved using LOGGED BATCH operations using timestamp where the entire operation is atomic.
You mentioned having to batch operations with LWTs across multiple partitions and tables within the same batch, which is not supported by Cassandra as far as I know, hence my suggested alternative. If you only batch operations with LWTs that do not span multiple partitions, then a batch with LWTs in Cassandra is fine.

Kevin Gallardo

Sep 14, 2017, 11:41:32 AM
to java-dri...@lists.datastax.com
USING TIMESTAMP does not provide the same guarantees as an LWT. Also, please note that batches spanning multiple partitions (with or without LWTs) are in general not recommended.

Deepak Panda

Sep 14, 2017, 11:44:54 AM
to java-dri...@lists.datastax.com
Thanks again Kevin,

>> USING TIMESTAMP does not provide the same guarantees as an LWT.

Would be very useful for me if you could just highlight the differences.

Kevin Gallardo

Sep 29, 2017, 4:17:38 PM
to java-dri...@lists.datastax.com
My apologies for getting back to this quite late.

In short: USING TIMESTAMP does not provide conflict resolution for concurrent updates, and/or would require clients to be perfectly synchronized (and even then, I think there would be concurrent scenarios that would not be covered). Lightweight Transactions bring linearizable consistency for distributed transactions, i.e. distributed conflict resolution with reliable feedback to the clients (e.g. "this conditional insert did not succeed; here's what's currently stored instead"). So while timestamps might be good enough in many situations, IMO they do not bring guarantees as strong, or the same functionality, as LWTs. Here are a few links I found explaining this in more detail:
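As a small illustration of the difference (using a hypothetical person table):

```sql
-- Plain writes: the write with the higher timestamp silently wins;
-- neither client is told that a conflict happened.
UPDATE person USING TIMESTAMP 1506700000000001 SET name = 'Alice' WHERE id = 42;
UPDATE person USING TIMESTAMP 1506700000000002 SET name = 'Bob'   WHERE id = 42;

-- LWT: the condition is checked under a Paxos round, and a losing client
-- gets explicit feedback: [applied] = false plus the currently stored value.
UPDATE person SET name = 'Carol' WHERE id = 42 IF name = 'Alice';
```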

Kevin Gallardo

Sep 29, 2017, 4:23:37 PM
to java-dri...@lists.datastax.com
Also, you would probably find better answers on how timestamps differ from LWTs from Cassandra experts on the Cassandra mailing list. (This is the Java driver mailing list; granted, your original question was a client-side question, but I think the last question is more of a Cassandra design topic.)