Hi,
I'm wondering if it's possible to do the following optimization in RocksDB. I'll start with a simple example and then proceed to the general question.
The simple example: Suppose I have a single key-value pair in my database, and it represents a counter. The key is the string "a", and the value is 8 bytes representing a number. My transaction just wants to do
bool increment_counter(uint64_t *value) {
  begin transaction
  get_for_update("a", value);  // Eliding all the details about converting a string to a number.
  ++*value;
  put("a", *value);
  return commit();
}
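In actual RocksDB terms, I mean something like the following, using the pessimistic TransactionDB API (just a sketch: EncodeUint64/DecodeUint64 are hypothetical helpers for the 8-byte encoding, and error handling is pared down):

#include <cstdint>
#include <string>

#include "rocksdb/utilities/transaction_db.h"

// Hypothetical helpers that convert between a uint64_t and an 8-byte string.
std::string EncodeUint64(uint64_t v);
uint64_t DecodeUint64(const std::string& s);

bool increment_counter(rocksdb::TransactionDB* db, uint64_t* value) {
  rocksdb::WriteOptions write_options;
  write_options.sync = true;  // fsync the WAL as part of each commit
  rocksdb::Transaction* txn = db->BeginTransaction(write_options);

  std::string existing;
  // GetForUpdate locks "a", so concurrent increments conflict here.
  rocksdb::Status s = txn->GetForUpdate(rocksdb::ReadOptions(), "a", &existing);
  if (!s.ok()) {
    delete txn;
    return false;
  }

  *value = DecodeUint64(existing) + 1;
  txn->Put("a", EncodeUint64(*value));

  // As far as I can tell, Commit writes the WAL, fsyncs, and only then
  // releases the lock on "a".
  s = txn->Commit();
  delete txn;
  return s.ok();
}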
The database is set up to fsync on each transaction.
In general, I wouldn't be surprised if, when multiple threads call increment_counter in parallel, it essentially serializes.
So here's the optimization I'd like: I'd like commit to
a) Write the relevant log entries into the write-ahead log.
b) Release the lock acquired by get_for_update()
c) Arrange to fsync the write-ahead log (using group commit).
This would allow me to update the counter many times per fsync().
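To make steps (a)-(c) concrete, here is the shape I'm imagining, written against the same TransactionDB API (again just a sketch: I'm committing with sync=false and then calling SyncWAL() myself, and I don't know whether that actually gives the release-locks-before-fsync ordering and durability I'm after; EncodeUint64/DecodeUint64 are the same hypothetical helpers as above):

#include <cstdint>
#include <mutex>
#include <string>

#include "rocksdb/utilities/transaction_db.h"

// Same hypothetical helpers as in the sketch above.
std::string EncodeUint64(uint64_t v);
uint64_t DecodeUint64(const std::string& s);

std::mutex sync_mutex;  // serializes the WAL fsync for this naive sketch

bool increment_counter_grouped(rocksdb::TransactionDB* db, uint64_t* value) {
  rocksdb::WriteOptions write_options;
  write_options.sync = false;  // (a) commit appends to the WAL but does not fsync
  rocksdb::Transaction* txn = db->BeginTransaction(write_options);

  std::string existing;
  if (!txn->GetForUpdate(rocksdb::ReadOptions(), "a", &existing).ok()) {
    delete txn;
    return false;
  }
  *value = DecodeUint64(existing) + 1;
  txn->Put("a", EncodeUint64(*value));

  // (b) Commit writes the log entry and releases the lock on "a", so another
  // thread can begin its increment before any fsync has happened.
  rocksdb::Status s = txn->Commit();
  delete txn;
  if (!s.ok()) return false;

  // (c) fsync the WAL before telling the caller the increment is durable.
  // Done one caller at a time here; what I really want is for many commits
  // to share a single SyncWAL() call (group commit).
  std::lock_guard<std::mutex> guard(sync_mutex);
  return db->SyncWAL().ok();
}

The mutex here is just a stand-in for whatever group-commit machinery would let many callers share one fsync.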
In contrast, the implementation seems to do
a) Write the relevant log entries.
b) fsync().
c) Release the locks.
In general, if we could release the locks held by a transaction as soon as the relevant log entries have been written, then we would know that the transaction cannot abort due to conflicts with other transactions. (It could still abort if the fsync fails, but in that case all subsequent transactions should abort too, so that's OK.) With that guarantee, I suspect that many places where people currently settle for relaxed consistency could essentially use serializable transactions.
Is there a way to get this effect in RocksDB? (Maybe it's already implemented?....)
-Bradley