Simple transaction test case wedges forever against Cockroach


Nikhil Benesch

Apr 15, 2018, 11:53:41 PM
to cockro...@googlegroups.com
There's a rather simple test of transactions that breaks badly against Cockroach. It goes something like this:

    // (Error handling elided; assert is pseudocode shorthand.)
    db, _ := sql.Open("postgres", "postgresql://root@localhost...")
    db.Exec("UPDATE foo SET bar = 1 WHERE id = 42")
    txn, _ := db.Begin()
    txn.Exec("UPDATE foo SET bar = 2 WHERE id = 42")
    var bar int
    db.QueryRow("SELECT bar FROM foo WHERE id = 42").Scan(&bar)
    assert(bar != 2) // the open transaction's write isn't visible yet
    txn.Commit()
    db.QueryRow("SELECT bar FROM foo WHERE id = 42").Scan(&bar)
    assert(bar == 2) // now it is

Basically, you open a transaction that writes a piece of data. While holding that transaction open, and in the same client thread, you attempt to read that same piece of data outside of the transaction (i.e., in a fresh transaction) and assert that the write hasn't yet been applied. You'll end up blocking forever, because your read transaction, with a higher timestamp, is waiting for the write transaction to commit, but the write transaction can't commit because you're waiting for the read to return.

One solution, and the one I think we've employed so far, is to educate users about why this occurs. Client-induced transactional deadlock like this is usually limited to test code. Production code that looks like this is usually doing something wrong.

But I'm starting to worry that we need to handle this case without inducing deadlock. Sequelize (a Node.js ORM) has nearly a dozen tests of this type that work fine against MSSQL/MySQL/Postgres/SQLite but deadlock against CockroachDB. It's also very confusing to figure out what's wedged if you accidentally write a test like this. I once wasted the better part of an hour because I thought it was the Go database driver that was deadlocking, not Cockroach.

So my question is: is there anything we can do?

One idea that comes to mind is to teach a stalled read to eventually abort the write that it gets stalled upon. In this case the write transaction, I think, would be able to refresh its spans and commit. I'm not sure if that's feasible, though. My knowledge of our transaction infrastructure is spotty.

I also wonder if you could move the reader's commit timestamp back in time to before the writer's timestamp. That is, in a read-only transaction, couldn't you commit at any time between the timestamp of the latest piece of data you read and the current time (minus the clock offset)? I'll admit that this sounds like a big can of worms.

Nikhil

Ben Darnell

Apr 16, 2018, 9:17:41 AM
to Nikhil Benesch, cockroach-db
On Sun, Apr 15, 2018 at 11:52 PM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:

But I'm starting to worry that we need to handle this case without inducing deadlock. Sequelize (a Node JS ORM) has nearly a dozen tests of this type that work fine against MSSQL/MySQL/Postgres/SQLite but deadlock against CockroachDB. It's also very confusing to figure out what's wedged if you accidentally write a test like this. I wasted the better part of an hour once because I thought it was the Go database driver that was deadlocking, not Cockroach.

This is really about transaction isolation levels, not CockroachDB per se. I believe you'd get the same behavior in other DBs if you increased the isolation level to SERIALIZABLE. The test is (unknowingly) making assumptions about the isolation level. 
 
 

One idea that comes to mind is to teach a stalled read to eventually abort the write that it gets stalled upon. In this case the write transaction, I think, would be able to refresh its spans and commit. I'm not sure if that's feasible, though. My knowledge of our transaction infrastructure is spotty.

Yes, we could consider making the read push the write, now that the cost of pushing a write's timestamp is much less than it used to be. However, it's not free, and if we allowed every read to push any write it encountered then writes would be prone to starvation. We'd need some sort of limitation or heuristic here to not push all the time (and if we did that, could we still make the read push the write quickly or reliably enough to make this test pass?)
 

I also wonder if you could move the reader's commit timestamp back in time to before the writer's timestamp. That is, in a read-only transaction, couldn't you commit at any time between the timestamp of the latest piece of data you read and the current time (minus the clock offset)? I'll admit that this sounds like a big can of worms.

Moving the timestamp backwards is risky because it could violate read-your-writes. In this case the test wants to read the value 1, so it needs to pick a timestamp between the first update's and the pending transaction's. How could the server know that?

-Ben
 


Nikhil Benesch

Apr 16, 2018, 11:28:21 AM
to Ben Darnell, cockroach-db
On Mon, Apr 16, 2018 at 9:17 AM, Ben Darnell <b...@cockroachlabs.com> wrote:
On Sun, Apr 15, 2018 at 11:52 PM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:

But I'm starting to worry that we need to handle this case without inducing deadlock. Sequelize (a Node JS ORM) has nearly a dozen tests of this type that work fine against MSSQL/MySQL/Postgres/SQLite but deadlock against CockroachDB. It's also very confusing to figure out what's wedged if you accidentally write a test like this. I wasted the better part of an hour once because I thought it was the Go database driver that was deadlocking, not Cockroach.

This is really about transaction isolation levels, not CockroachDB per se. I believe you'd get the same behavior in other DBs if you increased the isolation level to SERIALIZABLE. The test is (unknowingly) making assumptions about the isolation level. 

I thought so too, but even in SERIALIZABLE mode Postgres is capable of handling this test without blocking or aborting either transaction. I can only assume that Postgres is logically reordering the transactions so that the reader commits before the writer, in which case there's no serializability violation.
 
 

One idea that comes to mind is to teach a stalled read to eventually abort the write that it gets stalled upon. In this case the write transaction, I think, would be able to refresh its spans and commit. I'm not sure if that's feasible, though. My knowledge of our transaction infrastructure is spotty.

Yes, we could consider making the read push the write, now that the cost of pushing a write's timestamp is much less than it used to be. However, it's not free, and if we allowed every read to push any write it encountered then writes would be prone to starvation. We'd need some sort of limitation or heuristic here to not push all the time (and if we did that, could we still make the read push the write quickly or reliably enough to make this test pass?)

Yeah, what those heuristics would look like is fuzzy to me too. Is this a road even worth exploring?
 
 
I also wonder if you could move the reader's commit timestamp back in time to before the writer's timestamp. That is, in a read-only transaction, couldn't you commit at any time between the timestamp of the latest piece of data you read and the current time (minus the clock offset)? I'll admit that this sounds like a big can of worms.

Moving the timestamp backwards is risky because it could violate read-your-writes. In this case the test wants to read the value 1, so it needs to pick a timestamp between the first update's and the pending transaction's. How could the server know that?

I'm envisioning that we'd assign the read a timestamp that's earlier than the pending write's timestamp but later than the latest committed write on any key in the read set. (It might not be possible to find such a timestamp if there are other outstanding transactions.) Is there a "clocks don't permit this in distributed systems" problem that I'm not seeing here?

Ben Darnell

Apr 16, 2018, 11:38:12 AM
to Nikhil Benesch, cockroach-db
On Mon, Apr 16, 2018 at 11:27 AM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:
On Mon, Apr 16, 2018 at 9:17 AM, Ben Darnell <b...@cockroachlabs.com> wrote:
On Sun, Apr 15, 2018 at 11:52 PM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:

But I'm starting to worry that we need to handle this case without inducing deadlock. Sequelize (a Node JS ORM) has nearly a dozen tests of this type that work fine against MSSQL/MySQL/Postgres/SQLite but deadlock against CockroachDB. It's also very confusing to figure out what's wedged if you accidentally write a test like this. I wasted the better part of an hour once because I thought it was the Go database driver that was deadlocking, not Cockroach.

This is really about transaction isolation levels, not CockroachDB per se. I believe you'd get the same behavior in other DBs if you increased the isolation level to SERIALIZABLE. The test is (unknowingly) making assumptions about the isolation level. 

I thought so too, but even in SERIALIZABLE mode Postgres is capable of handling this test without blocking or aborting either transaction. I can only assume that Postgres is logically reordering the transactions so that the reader commits before the writer, in which case there's no serializability violation.
 
 

One idea that comes to mind is to teach a stalled read to eventually abort the write that it gets stalled upon. In this case the write transaction, I think, would be able to refresh its spans and commit. I'm not sure if that's feasible, though. My knowledge of our transaction infrastructure is spotty.

Yes, we could consider making the read push the write, now that the cost of pushing a write's timestamp is much less than it used to be. However, it's not free, and if we allowed every read to push any write it encountered then writes would be prone to starvation. We'd need some sort of limitation or heuristic here to not push all the time (and if we did that, could we still make the read push the write quickly or reliably enough to make this test pass?)

Yeah, what those heuristics would look like is fuzzy to me too. Is this a road even worth exploring?

It might be. For example, we could allow a single push (if Timestamp == OrigTimestamp), or offer some way for the user to specify whether they prioritize non-blocking reads over starvation-free writes. (Putting the read in a high priority transaction accomplishes this today if you're willing to make changes to the client code). 
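For reference, a minimal sketch of that high-priority workaround, reusing the hypothetical foo table and db handle from the first message (error handling elided); this is a client-side change, not a server one:

    readTxn, _ := db.Begin()
    // A HIGH priority read pushes a conflicting lower-priority writer's
    // timestamp instead of queuing behind its intent.
    readTxn.Exec("SET TRANSACTION PRIORITY HIGH")
    var bar int
    readTxn.QueryRow("SELECT bar FROM foo WHERE id = 42").Scan(&bar)
    readTxn.Commit()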
 
 
 
I also wonder if you could move the reader's commit timestamp back in time to before the writer's timestamp. That is, in a read-only transaction, couldn't you commit at any time between the timestamp of the latest piece of data you read and the current time (minus the clock offset)? I'll admit that this sounds like a big can of worms.

Moving the timestamp backwards is risky because it could violate read-your-writes. In this case the test wants to read the value 1, so it needs to pick a timestamp between the first update's and the pending transaction's. How could the server know that?

I'm envisioning that we'd assign the read a timestamp that's earlier than the pending write's timestamp but later than the latest committed write on any key in the read set. (It might not be possible to find such a timestamp if there are other outstanding transactions.) Is there a "clocks don't permit this in distributed systems" problem that I'm not seeing here?


The problem is that unless the read is an implicit transaction, we don't know the read set all at once (and even if we do know the read set, it could be spread across multiple ranges). Except in the simplest cases, we'd need some sort of backwards version of the timestamp-push and verification process to discover whether this is safe. There's no fundamental problem with this, just a lot of work.

-Ben 

Alex Robinson

Apr 16, 2018, 11:41:32 AM
to Nikhil Benesch, Ben Darnell, cockroach-db
On Mon, Apr 16, 2018 at 11:27 AM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:
On Mon, Apr 16, 2018 at 9:17 AM, Ben Darnell <b...@cockroachlabs.com> wrote:
On Sun, Apr 15, 2018 at 11:52 PM, Nikhil Benesch <ben...@cockroachlabs.com> wrote:

But I'm starting to worry that we need to handle this case without inducing deadlock. Sequelize (a Node JS ORM) has nearly a dozen tests of this type that work fine against MSSQL/MySQL/Postgres/SQLite but deadlock against CockroachDB. It's also very confusing to figure out what's wedged if you accidentally write a test like this. I wasted the better part of an hour once because I thought it was the Go database driver that was deadlocking, not Cockroach.

This is really about transaction isolation levels, not CockroachDB per se. I believe you'd get the same behavior in other DBs if you increased the isolation level to SERIALIZABLE. The test is (unknowingly) making assumptions about the isolation level. 

I thought so too, but even in SERIALIZABLE mode Postgres is capable of handling this test without blocking or aborting either transaction. I can only assume that Postgres is logically reordering the transactions so that the reader commits before the writer, in which case there's no serializability violation.

As explained in https://drkp.net/papers/ssi-vldb12.pdf, Postgres's SERIALIZABLE mode relies on a snapshot of what's committed when a transaction starts, then tracks RW dependencies as the transactions execute. The read transaction here would be using a snapshot that doesn't include the in-progress write. It can read the old value and commit without waiting because a single RW dependency isn't dangerous by itself. There'd have to be a third transaction or more operations at play before an abort would be needed.
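To make that concrete, here's a minimal sketch of the test's interleaving against Postgres in SERIALIZABLE mode, reusing the hypothetical foo table and db handle from the first message (error handling elided); the standalone read returns and commits without waiting because a single RW dependency isn't dangerous on its own:

    ctx := context.Background()
    opts := &sql.TxOptions{Isolation: sql.LevelSerializable}

    writer, _ := db.BeginTx(ctx, opts)
    writer.Exec("UPDATE foo SET bar = 2 WHERE id = 42")

    reader, _ := db.BeginTx(ctx, opts) // a separate connection from the pool
    var bar int
    // The reader's snapshot doesn't include the writer's uncommitted change,
    // so it sees the old value (1) without blocking.
    reader.QueryRow("SELECT bar FROM foo WHERE id = 42").Scan(&bar)
    reader.Commit() // lone RW dependency (reader -> writer): no abort needed

    writer.Commit() // both commit; the equivalent serial order puts the reader first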
 
 
 

One idea that comes to mind is to teach a stalled read to eventually abort the write that it gets stalled upon. In this case the write transaction, I think, would be able to refresh its spans and commit. I'm not sure if that's feasible, though. My knowledge of our transaction infrastructure is spotty.

Yes, we could consider making the read push the write, now that the cost of pushing a write's timestamp is much less than it used to be. However, it's not free, and if we allowed every read to push any write it encountered then writes would be prone to starvation. We'd need some sort of limitation or heuristic here to not push all the time (and if we did that, could we still make the read push the write quickly or reliably enough to make this test pass?)

Yeah, what those heuristics would look like is fuzzy to me too. Is this a road even worth exploring?
 
 
I also wonder if you could move the reader's commit timestamp back in time to before the writer's timestamp. That is, in a read-only transaction, couldn't you commit at any time between the timestamp of the latest piece of data you read and the current time (minus the clock offset)? I'll admit that this sounds like a big can of worms.

Moving the timestamp backwards is risky because it could violate read-your-writes. In this case the test wants to read the value 1, so it needs to pick a timestamp between the first update's and the pending transaction's. How could the server know that?

I'm envisioning that we'd assign the read a timestamp that's earlier than the pending write's timestamp but later than the latest committed write on any key in the read set. (It might not be possible to find such a timestamp if there are other outstanding transactions.) Is there a "clocks don't permit this in distributed systems" problem that I'm not seeing here?


Nikhil Benesch

Apr 16, 2018, 6:27:44 PM
to Ben Darnell, cockroach-db
Ah, so we keep track of the write set (I assume via the "intents" field on the transaction proto) but not the read set?

Spencer Kimball

Apr 16, 2018, 6:28:26 PM
to Nikhil Benesch, Ben Darnell, cockroach-db
We do keep track of the read set now too actually. 


Tobias Schottdorf

Apr 16, 2018, 8:01:03 PM
to Spencer Kimball, Nikhil Benesch, Ben Darnell, cockroach-db
Even with the read set, I think a dependency analysis similar to Postgres' is hard for us to achieve. For one, we do track the read set, but not where you typically need it (so you add extra latencies). Second, Postgres controls the txn commit order and knows when transactions are open concurrently (at which point you can allow more reorderings than if they're sequential, where you have to read your own write). We don't have that luxury. When you find an intent, the transaction could have been committed or not, and it may even have been your own SQL session that wrote the conflicting intent. Causality tokens would help for the latter point and then you could perhaps cook up an algorithm that simply ignores intents while reading (other than noting the transaction ID, which must somehow be used to look up that transaction's RW dependencies to run your checks at commit time).

-- Tobias

Spencer Kimball

Apr 16, 2018, 9:26:09 PM
to Tobias Schottdorf, Ben Darnell, Nikhil Benesch, Spencer Kimball, cockroach-db
Yeah, sorry for the terse message. I didn't mean to suggest we can do the same thing; just adding detail. There's more we can do with the refresh spans, though, I suspect. I thought about adding a PR before 2.0 to push serializable txns on reads, just as a matter of course. I was worried about the full implications, but I think it might be a solid change. With the refresh spans it might just work out nicely in most circumstances, though it's hard to say with confidence.
--
Spencer Kimball | Co-founder & CEO
Cockroach Labs
