Optimistic parallel replication for RocksDB

Kristian Nielsen

Oct 10, 2016, 4:44:50 AM
to Sergey Petrunia, Yoshinori Matsunobu, MariaDB Developers, roc...@googlegroups.com
Sergey, Yoshinori, it was great talking to you about MyRocks in Amsterdam.

I took a first look at how to extend MyRocks to work with optimistic
parallel replication. It looks conceptually quite simple.

Sergey, I understand you have more pressing priorities right now (like
getting a tree to build :), so let us revisit this in more detail when you
get to it.

Roughly, the change amounts to the following patch, which calls
thd_rpl_deadlock_check() whenever a transaction is blocked on a row lock:

-----------------------------------------------------------------------
diff --git a/utilities/transactions/transaction_lock_mgr.cc b/utilities/transactions/transaction_lock_mgr.cc
index 28e8598..5ff291f 100644
--- a/utilities/transactions/transaction_lock_mgr.cc
+++ b/utilities/transactions/transaction_lock_mgr.cc
@@ -317,6 +317,8 @@ Status TransactionLockMgr::AcquireWithTimeout(LockMap* lock_map,
return result;
}

+extern "C" int thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd);
+
// Try to lock this key after we have acquired the mutex.
// Sets *expire_time to the expiration time in microseconds
// or 0 if no expiration.
@@ -340,6 +342,9 @@ Status TransactionLockMgr::AcquireLocked(LockMap* lock_map,
lock_info.expiration_time = txn_lock_info.expiration_time;
// lock_cnt does not change
} else {
+ THD *blocked_thd = getTHD(txn_lock_info.txn_id);
+ THD *blocking_thd = getTHD(lock_info.txn_id);
+ thd_rpl_deadlock_check(blocked_thd, blocking_thd);
result = Status::TimedOut(Status::SubCode::kLockTimeout);
}
}
-----------------------------------------------------------------------

A real patch will need some plumbing to put the code in the right place and
have the right information available. I.e., the thd_rpl_deadlock_check()
call will probably go into an overridden virtual method in ha_rocksdb.cc.
I also did not check if/how one can get from txn_id to THD (what is called
getTHD() above). I assume it can be implemented reasonably easily if it is
not already there? Hints will be appreciated here, as I am new to the
MyRocks and RocksDB codebases.
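To make the getTHD() idea a bit more concrete, here is a minimal sketch of one way such a lookup could work: a small registry that the storage engine fills in when a transaction starts and clears when it ends. Everything in it is an assumption for illustration only. TxnThdRegistry and its methods are hypothetical names that do not exist in RocksDB or MyRocks, and the THD struct is just a stand-in for the server's connection object.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Stand-in for the server's THD (hypothetical; the real type is opaque
// to the storage engine).
struct THD {
  int connection_id;
};

// Hypothetical registry mapping a transaction id to the THD running it.
// Not an existing RocksDB or MyRocks class.
class TxnThdRegistry {
 public:
  // Called when a transaction begins on behalf of a connection.
  void Register(uint64_t txn_id, THD* thd) {
    assert(thd != nullptr);
    std::lock_guard<std::mutex> guard(mu_);
    map_[txn_id] = thd;
  }

  // Called at commit or rollback, before the transaction id can be reused.
  void Unregister(uint64_t txn_id) {
    std::lock_guard<std::mutex> guard(mu_);
    map_.erase(txn_id);
  }

  // Plays the role of getTHD() in the patch above.
  // Returns nullptr if the transaction is unknown or already finished.
  THD* Get(uint64_t txn_id) const {
    std::lock_guard<std::mutex> guard(mu_);
    auto it = map_.find(txn_id);
    return it == map_.end() ? nullptr : it->second;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<uint64_t, THD*> map_;
};
```

A caller would need to handle the nullptr case, since the lock holder may have finished between the lock check and the lookup.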

When are row locks released? I am interested in whether row locks can be
released earlier than at transaction commit time. If so, the simple patch
above will give false positives, and it might be worth it to investigate
ways to avoid reporting locks that are released before commit. E.g., in
InnoDB, auto-increment locks are released earlier than commit, and thus are
not reported.

Once something like this is in place, I think optimistic parallel
replication should work. In case of a conflict between transactions T1 and
T2, thd_rpl_deadlock_check(T1, T2) will be called and will cause T2 to be
killed so that T1 can proceed and T2 be re-tried afterwards. So things look
good now; let us revisit this when there is a tree to work on.
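As a rough illustration of the conflict rule described above, here is a toy model of what reporting a lock wait achieves. It is only a sketch under assumptions: ReplTxn and ReportLockWait are hypothetical names, and the logic is a simplified stand-in for what thd_rpl_deadlock_check() does on the server side, not its actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a replicated transaction (hypothetical type).
struct ReplTxn {
  uint64_t commit_order;  // position in the required commit order; lower commits first
  bool killed;            // set when the transaction must roll back and be retried

  explicit ReplTxn(uint64_t order) : commit_order(order), killed(false) {}
};

// Simplified stand-in for thd_rpl_deadlock_check(waiter, holder).
// If the waiter must commit before the holder, the holder can never commit
// while the waiter is blocked, so the holder is killed and retried later.
// Returns true if the holder was marked for kill.
bool ReportLockWait(const ReplTxn& waiter, ReplTxn& holder) {
  assert(&waiter != &holder);  // a transaction does not wait on itself
  if (waiter.commit_order < holder.commit_order) {
    holder.killed = true;
    return true;
  }
  // The holder commits first anyway; the waiter simply keeps waiting.
  return false;
}
```

With T1 at commit position 1 and T2 at position 2, ReportLockWait(T1, T2) marks T2 as killed, while ReportLockWait(T2, T1) leaves T1 alone, since T2 waiting for T1 is the normal in-order case.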

- Kristian.

Kristian Nielsen

Oct 11, 2016, 2:32:16 AM
to Yoshinori Matsunobu, Sergey Petrunia, MariaDB Developers, roc...@googlegroups.com, myroc...@googlegroups.com
Yoshinori Matsunobu <yosh...@fb.com> writes:

> About transaction ids, it's not visible from MyRocks yet. We're currently working on
> RocksDB to add an API to get transaction id, and making it available via MyRocks.

Ok, I see.

Optimistic parallel replication needs the ability to somehow find the THD
that is holding the row lock that is blocking another THD. If we have T1
followed by T2, T2 will only commit after T1 has. So if there is a
conflicting row lock, we need a way to identify T2 so that the conflict can
be resolved. Lock wait timeout is not sufficient here because of the
requirement of in-order commit.

> Row locks are normally released at transaction commit or rollback, but there are some exceptions.
> - Auto-increment id allocation is implemented as std::atomic<longlong> and
> the lock is released earlier than statement/transaction, like
> InnoDB. I hope this doesn't matter for
> parallel replication, since auto-inc ids are always given on slaves
> (either RBR image, or insert_id with SBR).

Agreed, it does not matter. MyRocks just should not report these lock
conflicts with thd_rpl_deadlock_check() (and since auto-increment uses a
different mechanism, there is no reason it would).

> - MyRocks has data dictionary
> (https://github.com/facebook/mysql-5.6/wiki/MyRocks-data-dictionary-format),
> and data dictionary operations' transaction scope is different from
> applications'. For example, internal index
> id allocation is done (and committed) immediately. There are no SQL
> statements to directly manipulate data dictionary,
> so I assume this won't matter for replication either.

Agree, it shouldn't.

Optimistic parallel replication handles DDL pessimistically anyway - DDL is
not run in parallel with any other statements.

Thanks,

- Kristian.