Behavior on write-path failures from RocksDB

Mihir Jadhav

Apr 10, 2020, 1:25:42 PM

to MyRocks - RocksDB storage engine for MySQL

Hello there,

I have a basic question about how errors from the RocksDB write API are handled and whether they should be retried at all.

I looked at the code and see that MyRocks raises SIGABRT on write failures from RocksDB.

Example: https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/storage/rocksdb/ha_rocksdb.cc#L3801

Write failures can come from either background errors (compaction, flushing) or foreground errors (a failure during memtable insert); I can't think of any other foreground failures in the write path. So presumably they are all catastrophic.

Why is it correct to raise SIGABRT rather than retry? I understand that retrying is futile for disk-full and other background errors, but what about memtable insert failures? Is it incorrect to assume that RocksDB can return transient write failures that are retryable?

RocksDB commits write batches to the WAL before inserting them into the memtable, so no data is lost on SIGABRT, but I'm trying to understand the motivation for this approach and whether retrying is a viable alternative.
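To make the retry alternative concrete, here is a minimal sketch of what such a policy might look like. It is not MyRocks or RocksDB code: `WriteStatus`, `WriteOutcome`, and `WriteWithRetry` are hypothetical names, and the split into "transient" and "fatal" failures is an assumption made only for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes for illustration only; these are NOT
// RocksDB's rocksdb::Status codes.
enum class WriteStatus { kOk, kTransient, kFatal };

// Outcome of applying the retry policy to one write.
enum class WriteOutcome { kCommitted, kGaveUp, kMustAbort };

// Calls try_write up to max_attempts times while it reports a transient
// failure. A fatal failure stops immediately; the caller is then expected
// to fail fast (the point at which MyRocks raises SIGABRT today).
WriteOutcome WriteWithRetry(const std::function<WriteStatus()>& try_write,
                            int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    switch (try_write()) {
      case WriteStatus::kOk:
        return WriteOutcome::kCommitted;
      case WriteStatus::kFatal:
        return WriteOutcome::kMustAbort;
      case WriteStatus::kTransient:
        break;  // leave the switch and try again
    }
  }
  return WriteOutcome::kGaveUp;  // still transient after all attempts
}
```

The caller would pass a callable that performs the actual write and maps its result onto the three hypothetical status values. Whether any real RocksDB write error belongs in the transient bucket is exactly the open question.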

Thanks! Any guidance would be useful.

Yi Zhang

Apr 10, 2020, 2:58:36 PM

to MyRocks - RocksDB storage engine for MySQL

According to https://github.com/facebook/rocksdb/wiki/Background-Error-Handling, write failures put RocksDB into read-only mode, so subsequent write operations will simply fail. This is most likely because RocksDB assumes a local disk, where write errors are essentially non-transient. Memory allocation errors may be transient, and a retry might work if you release some memory or reduce the cache size, but very few people do that anyway. If RocksDB/MyRocks at some point starts supporting remotely attached disks/storage, the entire error handling strategy in RocksDB and MyRocks for those cases will need to be revised.
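The read-only behavior described above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyStore` and its methods are hypothetical and do not mirror the RocksDB API. It shows why a naive retry does not help once a background error has been recorded, since every later write fails until the error is explicitly cleared.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model, not the RocksDB API: a store that becomes read-only after a
// background error and rejects writes until the error is cleared.
class ToyStore {
 public:
  // Simulates a failed flush or compaction reporting an error.
  void ReportBackgroundError() { read_only_ = true; }

  // Simulates the cause being fixed and writes being resumed.
  void ClearBackgroundError() { read_only_ = false; }

  // Returns true if the write was accepted, false if the store is
  // read-only. The failure is sticky: retrying alone never succeeds.
  bool Put(const std::string& key, const std::string& value) {
    if (read_only_) return false;
    data_[key] = value;
    return true;
  }

  bool read_only() const { return read_only_; }

 private:
  bool read_only_ = false;
  std::map<std::string, std::string> data_;
};
```

In this model a retry loop around `Put` would spin without progress after `ReportBackgroundError()`, which is one way to see why failing fast and letting failover take over can be the simpler choice.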

Yoshinori Matsunobu

Apr 10, 2020, 3:00:16 PM

to Mihir Jadhav, MyRocks - RocksDB storage engine for MySQL

Yes, as you pointed out, MyRocks currently aborts with SIGABRT on all I/O errors. MySQL is typically operated under replication with automated failover, so failing early has a lower availability impact than suspending and stalling writes for a long time on the failed machine.

We plan to add more options for handling I/O errors, including disk full.

Best Regards,

- Yoshinori

To view this discussion on the web visit https://groups.google.com/d/msgid/myrocks-dev/860ae410-ea7d-4175-9d9c-9d067b94f065%40googlegroups.com.
