How will concurrent updates be performed?


Xiang Zhou <feixiang11010@gmail.com>
Oct 12, 2020, 9:32:18 PM
to ScyllaDB users
Hey~!

Regarding concurrent updates across DC clusters, I have a question:
For the same row of a table, I have two updates:
update1: set a = a + 1
update2: set a = a + 1

The Scylla cluster is deployed across 3 DCs, with a 30 ms network delay between DC02 and DC01 and between DC02 and DC03, and RF=3 in each DC. We use DC02 to receive requests, with write isolation set to only_rmw_uses_lwt.

The UpdateItem request will be sent to Alternator. As far as I know, CL = LOCAL_QUORUM, which means that once two replica nodes in DC02 have completed the update, the request can return.

Then, here comes the problem. Suppose update1 reaches DC02 first, and update2 arrives at DC02 after update1 has returned there. At this point update1 has been executed only in DC02 (intra-DC latency is at the ms level); given the 30 ms cross-DC delay, update1 has not yet been applied in DC01 and DC03.
So, will update2 be executed immediately in DC02? Or does update2 need to wait for update1 to finish executing in DC01 and DC03 before it starts executing?
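For concreteness, an "a = a + 1" update can be expressed as a DynamoDB-style UpdateItem with an ADD action; here is a minimal sketch of the request parameters in the shape boto3's low-level DynamoDB client expects (the table name and key are made-up placeholders):

```python
# Sketch of UpdateItem parameters for "a = a + 1". Table and key names
# are hypothetical; "ADD" atomically increments a numeric attribute,
# which is the read-modify-write that only_rmw_uses_lwt turns into an LWT.
update_item_params = {
    "TableName": "example_table",           # hypothetical table name
    "Key": {"pk": {"S": "row-1"}},          # hypothetical partition key
    "UpdateExpression": "ADD a :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
}

print(update_item_params["UpdateExpression"])  # ADD a :one
```

Both update1 and update2 in the question have this shape; only their arrival times differ.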

I have read the following article, but it did not let me draw a conclusion about my problem.

Nadav Har'El <nyh@scylladb.com>
Oct 13, 2020, 2:18:48 AM
to ScyllaDB users
On Tue, Oct 13, 2020 at 4:32 AM Xiang Zhou <feixia...@gmail.com> wrote:
Hey~!

Regarding concurrent updates across DC clusters, I have a question:
For the same row of a table, I have two updates:
update1: set a = a + 1
update2: set a = a + 1

The Scylla cluster is deployed across 3 DCs, with a 30 ms network delay between DC02 and DC01 and between DC02 and DC03, and RF=3 in each DC. We use DC02 to receive requests, with write isolation set to only_rmw_uses_lwt.

The UpdateItem request will be sent to Alternator. As far as I know, CL = LOCAL_QUORUM, which means that once two replica nodes in DC02 have completed the update, the request can return.

Right. Alternator uses LOCAL_QUORUM for writes, meaning that the write request returns after having successfully written two copies in the local DC (the same DC as the coordinator node that received the request). The third replica in the local DC, and all replicas in other DCs, will be updated "eventually", but the client will not wait for them.

Note that this means that consistent reads (reads with the consistent flag) are only really "consistent" (guaranteed to see the last write) when done on the same DC where the write was done. We did this deliberately, because this is exactly what DynamoDB does in this case. The DynamoDB documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html says:

   "If your application requires strongly consistent reads, it must perform all of its strongly consistent reads and writes in the same Region. DynamoDB does not support strongly consistent reads across Regions. Therefore, if you write to one Region and read from another Region, the read response might include stale data that doesn't reflect the results of recently completed writes in the other Region"

So in Alternator we chose to do the same thing, using LOCAL_QUORUM.

We could also have used EACH_QUORUM instead of LOCAL_QUORUM to require a quorum in each DC, or QUORUM to require a global quorum, and thereby allow consistent reads across DCs - but write latency would be higher and, worse, high availability would suffer (we couldn't proceed if one DC is down), and this is not required by the DynamoDB documentation.
Currently, this choice is not configurable - it can be changed in code, or, if you think it should be configurable, you can open an enhancement request.
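The arithmetic behind these choices can be sketched quickly. Assuming the setup from the question (3 DCs, RF=3 in each, 9 replicas total), each consistency level waits for a different number of acknowledgements:

```python
# Number of replica acknowledgements each consistency level waits for,
# assuming 3 DCs with RF=3 in each (9 replicas total) -- a back-of-the-
# envelope sketch, not Scylla's actual accounting code.
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1

rf_per_dc = 3
dcs = 3

local_quorum = quorum(rf_per_dc)          # acks in the coordinator's DC only
each_quorum = dcs * quorum(rf_per_dc)     # a quorum in every DC
global_quorum = quorum(rf_per_dc * dcs)   # majority of all replicas anywhere

print(local_quorum, each_quorum, global_quorum)  # 2 6 5
```

The key difference is not just the count: LOCAL_QUORUM's 2 acks are all intra-DC (ms-level latency), while EACH_QUORUM and QUORUM must wait for acks across the 30 ms inter-DC links, and EACH_QUORUM cannot complete at all while any one DC is down.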



Then, here comes the problem. Suppose update1 reaches DC02 first, and update2 arrives at DC02 after update1 has returned there. At this point update1 has been executed only in DC02 (intra-DC latency is at the ms level); given the 30 ms cross-DC delay, update1 has not yet been applied in DC01 and DC03.
So, will update2 be executed immediately in DC02? Or does update2 need to wait for update1 to finish executing in DC01 and DC03 before it starts executing?

Here it's not the "LOCAL_QUORUM" parameter that matters, but "LOCAL_SERIAL", which affects the "serial consistency" of LWT.
The isolation of the different writes is only guaranteed if the writes go to the same data center.

Again, DynamoDB documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html explains that exactly the same thing happens in DynamoDB global tables:

"If applications update the same item in different Regions at about the same time, conflicts can arise. To help ensure eventual consistency, DynamoDB global tables use a last writer wins reconciliation between concurrent updates, in which DynamoDB makes a best effort to determine the last writer. With this conflict resolution mechanism, all the replicas will agree on the latest update and converge toward a state in which they all have identical data."

In other words, the "a = a + 1" will happen concurrently, and separately, in each region, and the last set value will eventually win - but it's the last "value", not the last "operation". You may end up seeing "a" incremented only once.

In Alternator, we can change "LOCAL_SERIAL" to "SERIAL" if you want isolation across the entire cluster in all DCs, but again this has a big latency cost, and a high-availability cost (if a DC is down, you can't write), and again is not yet configurable without manually changing the code.


I have read the following article, but it did not let me draw a conclusion about my problem.

The relevant section there is "Single-region and multi-region tables".

Nadav.

Xiang Zhou <feixiang11010@gmail.com>
Oct 13, 2020, 3:52:06 AM
to ScyllaDB users
Thank you, Nadav.
Your explanation is very thorough.

On Tuesday, October 13, 2020 at 2:18:48 PM UTC+8, Nadav Har'El wrote:

In other words, the "a = a + 1" will happen concurrently, and separately, in each region, and the last set value will eventually win - but it's the last "value", not the last "operation". You may end up seeing "a" incremented only once.

In Alternator, we can change "LOCAL_SERIAL" to "SERIAL" if you want isolation across the entire cluster in all DCs, but again this has a big latency cost, and a high-availability cost (if a DC is down, you can't write), and again is not yet configurable without manually changing the code.

In the scenario described in my question, update1 and update2 go to the same data center (DC02).
The update1 request will return after having successfully written two copies in the local DC (DC02). The third replica in the local DC, and all replicas in other DCs, will be updated "eventually", but the client will not wait for them.
The update2 request will begin writing its two copies in the local DC (DC02), with no need to wait for update1 to update the remaining copies (the third replica in the local DC, and all replicas in other DCs).
In other words, update2 will be executed immediately after update1 returns?

Because the two update operations go to the same data center, there will be no "'a' incremented only once" situation, right?

Is my understanding correct?

Nadav Har'El <nyh@scylladb.com>
Oct 13, 2020, 4:44:04 AM
to ScyllaDB users
On Tue, Oct 13, 2020 at 10:52 AM Xiang Zhou <feixia...@gmail.com> wrote:
Thank you, Nadav.
Your explanation is very thorough.

On Tuesday, October 13, 2020 at 2:18:48 PM UTC+8, Nadav Har'El wrote:

In other words, the "a = a + 1" will happen concurrently, and separately, in each region, and the last set value will eventually win - but it's the last "value", not the last "operation". You may end up seeing "a" incremented only once.

In Alternator, we can change "LOCAL_SERIAL" to "SERIAL" if you want isolation across the entire cluster in all DCs, but again this has a big latency cost, and a high-availability cost (if a DC is down, you can't write), and again is not yet configurable without manually changing the code.

In the scenario described in my question, update1 and update2 go to the same data center (DC02).

In this case, you get the full isolation you expect. If "a" started out equal to 7 and two "a = a + 1" requests are sent to the same DC, at the end "a" will be 9, even if the two requests are concurrent.
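This per-DC isolation can be modeled with a toy sketch: a lock stands in for the LWT/LOCAL_SERIAL serialization within one DC, forcing each read-modify-write to see the previous one's committed result (again a simulation, not Scylla's Paxos implementation):

```python
# Toy sketch: when both read-modify-writes are serialized through the
# same DC (as LWT with LOCAL_SERIAL does), each update reads the latest
# committed value, so no increment is lost.
import threading

a = 7
lock = threading.Lock()  # stands in for per-DC LWT serialization

def increment():
    global a
    with lock:           # each RMW reads the value the previous one wrote
        a = a + 1

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a)  # 9: both increments applied
```

Contrast this with the cross-region case, where each region runs its own read-modify-write against a possibly stale local copy and last-writer-wins reconciliation can drop one increment.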

The update1 request will return after having successfully written two copies on the local DC (DC02). The third replica in the local DC, and all replicas on other DCs, will be updated "eventually", but the client will not wait for them.

Right.

The update2 request will begin writing its two copies in the local DC (DC02), with no need to wait for update1 to update the remaining copies (the third replica in the local DC, and all replicas in other DCs).
In other words, update2 will be executed immediately after update1 returns?

In some sense, yes. Our Paxos-based LWT implementation has multiple phases. As you said, the implementation can start working on update2 (reading the value and incrementing it) before all the background updates of update1 have finished.
 

Because the two update operations go to the same data center, there will be no "'a' incremented only once" situation, right?

Right. Exactly.

In essence, we aim (at least by default) to provide exactly the same guarantees that DynamoDB does. In DynamoDB, a normal (single-region) table offers isolation between concurrent writes and offers (the option of) consistent reads, so we must support this too, and we do. DynamoDB doesn't offer these guarantees across different regions, so neither do we. The nice thing about Scylla is that it *could* offer additional guarantees through different choices of consistency levels, if we ever want to do that, but we haven't made that configurable in Alternator yet.


Is my understanding correct?


Xiang Zhou <feixiang11010@gmail.com>
Oct 13, 2020, 5:12:21 AM
to ScyllaDB users
Thanks again.
By the way, I can't wait to see Alternator's optional configuration grow. :)
For example, the consistency level: I think it should be configurable in Alternator, so that we can carry out richer tests and help beginners understand ScyllaDB.