Hi everyone,
we sometimes see some of our threads getting stuck in db->CompactRange() calls for a very long time.
The RocksDB version is 7.2, and the exact commit in use is 2b5df21e95096fbfc25e8aac33b2153302e710e9.
Here is an example backtrace of such a thread:
Thread 10 (LWP 143252):
#0 __syscall_cp_asm () at src/thread/aarch64/syscall_cp.s:28
#1 0x0000000003ceeedc in __syscall_cp_c (nr=98, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=<optimized out>) at src/thread/pthread_cancel.c:33
#2 0x0000000003cf908c in __futex4_cp (to=0x0, val=2, op=128, addr=0xffff783f9154) at src/thread/__timedwait.c:52
#3 __timedwait_cp (addr=addr@entry=0xffff783f9154, val=val@entry=2, clk=clk@entry=0, at=at@entry=0x0, priv=128, priv@entry=1) at src/thread/__timedwait.c:52
#4 0x0000000003cef3b0 in __pthread_cond_timedwait (c=0xffff81639250, m=0xffff81638f00, ts=0x0) at src/thread/pthread_cond_timedwait.c:100
#5 0x0000000002229350 in rocksdb::port::CondVar::Wait () at /work/ArangoDB/3rdParty/rocksdb/port/port_posix.cc:122
#6 0x0000000002100ee8 in rocksdb::InstrumentedCondVar::WaitInternal () at /work/ArangoDB/3rdParty/rocksdb/monitoring/instrumented_mutex.cc:52
#7 rocksdb::InstrumentedCondVar::Wait () at /work/ArangoDB/3rdParty/rocksdb/monitoring/instrumented_mutex.cc:45
#8 0x0000000001f9b1c0 in rocksdb::DBImpl::WaitForFlushMemTables () at /work/ArangoDB/3rdParty/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2352
#9 0x0000000001f9fd68 in rocksdb::DBImpl::FlushMemTable () at /work/ArangoDB/3rdParty/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2101
#10 0x0000000001fa976c in rocksdb::DBImpl::CompactRangeInternal () at /work/ArangoDB/3rdParty/rocksdb/db/db_impl/db_impl_compaction_flush.cc:1023
#11 0x0000000001fa9c2c in rocksdb::DBImpl::CompactRange () at /work/ArangoDB/3rdParty/rocksdb/db/db_impl/db_impl_compaction_flush.cc:904
#12 0x0000000001913f7c in rocksdb::StackableDB::CompactRange () at /work/ArangoDB/3rdParty/rocksdb/include/rocksdb/utilities/stackable_db.h:271
No other threads are doing relevant work when it gets stuck here.
The thread is waiting in DBImpl::WaitForFlushMemTables, and doesn't make any progress. There is no background error, and no shutdown happening (i.e. db->Close() wasn't called yet).
The compaction options are:
rocksdb::CompactRangeOptions opts;
opts.exclusive_manual_compaction = false;
opts.allow_write_stall = true;
opts.canceled = &::cancelCompactions;
We use cancelable compactions, and the compaction in question should have been canceled already. However, the cancelation check happens only at the beginning of a compaction run, not after the run has started, so setting the flag has no effect while the thread is blocked waiting for the flush.
Would it be an option to pass an optional pointer to the cancelation variable into WaitForFlushMemTables, and check in its while loop whether the wait should be canceled? That cancelation variable could be passed in by compactions that trigger flushes, and could be omitted by other callers.
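To make the idea concrete, here is a minimal standalone sketch of such a wait loop. It is not RocksDB code: FlushState, WaitResult, WaitForFlush and the poll parameter are hypothetical names used only for illustration, and it uses plain std::mutex / std::condition_variable instead of RocksDB's InstrumentedMutex / InstrumentedCondVar. The sketch assumes that nothing signals the condition variable when the cancelation flag is set, so it falls back to a timed wait whenever a flag is supplied. The actual change might handle the wake-up differently.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state the flush wait depends on.
struct FlushState {
  std::mutex mu;
  std::condition_variable cv;
  bool flush_done = false;  // guarded by mu
};

enum class WaitResult { kFlushed, kCanceled };

// Waits until the flush has completed. If `canceled` is non-null, the loop
// also returns as soon as it observes the flag being set. Callers that do
// not support cancelation pass nullptr and get an untimed wait.
WaitResult WaitForFlush(
    FlushState& state, const std::atomic<bool>* canceled = nullptr,
    std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
  assert(poll.count() > 0);
  std::unique_lock<std::mutex> lock(state.mu);
  while (!state.flush_done) {
    if (canceled != nullptr && canceled->load(std::memory_order_acquire)) {
      return WaitResult::kCanceled;
    }
    if (canceled != nullptr) {
      // Assumption: setting the flag does not notify the condition variable,
      // so wake up periodically to re-check it.
      state.cv.wait_for(lock, poll);
    } else {
      state.cv.wait(lock);
    }
  }
  return WaitResult::kFlushed;
}
```

A compaction that triggers a flush would pass the same pointer it received in opts.canceled, while other callers would keep the default nullptr and see no change in behavior.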
If you think this is a good way forward, I am happy to work on a PR with the change.
Thanks
J