Hi Matan, as always thanks for your interest!
You've already flagged the race we're protecting against. WiredTiger supports large page sizes, so we expect overflow pages to be infrequent and to see little contention. Lockless algorithms also carry a higher maintenance cost, which we haven't seen a need to take on yet. Without knowing the details of the contention, I'd first recommend more granular locks at the page level instead of the current lock at the btree level.
If you'd prefer to remove the lock, you can look at WiredTiger's generations. Reader threads enter a generation (__wt_session_gen_enter) to indicate they need continued access to a resource until they call __wt_session_gen_leave. If the __wt_ovfl_discard thread calls __wt_gen_next_drain after atomically setting WT_CELL_VALUE_OVFL_RM but before freeing the block, it is forced to wait until all readers have left their current generation, at which point the block is safe to discard.
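To make that ordering concrete, here is a minimal standalone sketch of the pattern in C11. It does not call the real WiredTiger functions or reproduce their signatures: every name in it (reader_enter, reader_leave, gen_next_drain, ovfl_item, MAX_READERS) is a hypothetical stand-in, and the fixed reader-slot array is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_READERS 8 /* hypothetical fixed number of reader slots */

/* Global generation counter; starts at 1 so that 0 can mean "not reading". */
static atomic_uint_fast64_t global_gen = 1;
/* Per-reader slot: the generation the reader entered, or 0 when idle. */
static atomic_uint_fast64_t reader_gen[MAX_READERS];

/* Hypothetical overflow item: a "removed" flag plus the block it owns. */
typedef struct {
    atomic_bool removed;
    void *block;
} ovfl_item;

/* Reader side: publish the generation we are in, retrying if it moved. */
void reader_enter(int slot) {
    uint_fast64_t g;
    assert(slot >= 0 && slot < MAX_READERS);
    do {
        g = atomic_load(&global_gen);
        atomic_store(&reader_gen[slot], g);
    } while (g != atomic_load(&global_gen));
}

/* Reader side: signal that we no longer need anything from our generation. */
void reader_leave(int slot) {
    assert(slot >= 0 && slot < MAX_READERS);
    atomic_store(&reader_gen[slot], 0);
}

/* Advance the global generation and return the new value. */
uint64_t gen_next(void) {
    return (uint64_t)atomic_fetch_add(&global_gen, 1) + 1;
}

/* True once no reader is still inside a generation older than gen. */
bool gen_drained(uint64_t gen) {
    for (int i = 0; i < MAX_READERS; i++) {
        uint_fast64_t g = atomic_load(&reader_gen[i]);
        if (g != 0 && g < gen)
            return false;
    }
    return true;
}

/* Advance the generation, then wait for all older readers to leave. */
void gen_next_drain(void) {
    uint64_t gen = gen_next();
    while (!gen_drained(gen))
        sched_yield();
}

/* Discard side: flag first, drain second, free last. */
void ovfl_discard(ovfl_item *item) {
    atomic_store(&item->removed, true); /* new readers will skip the block */
    gen_next_drain();                   /* wait out readers already inside */
    free(item->block);
    item->block = NULL;
}
```

A reader in this sketch would call reader_enter, check the removed flag, read the block only if the flag is clear, and then call reader_leave. The discard thread never takes a lock; it only spins in gen_next_drain while a reader that entered before the generation advanced is still active.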
However, generations are a single global counter, which means you can't free an overflow page until in-progress reads of every other overflow page in the database have finished. If you don't want to block during __wt_ovfl_discard, you could look into an implementation similar to the session stash, where objects are saved to the session structure via __wt_stash_add and you call __wt_stash_discard at a later time to free them.
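As a rough illustration of the deferred-free idea, here is a small standalone sketch in C. It is not the WiredTiger stash and does not mirror the signatures of __wt_stash_add or __wt_stash_discard: session_stash, stash_entry, stash_add and stash_discard are hypothetical names, and the caller is assumed to supply the oldest generation any reader is still in (or UINT64_MAX when no reader is active).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One deferred free: the object and the generation in which it was unlinked. */
typedef struct {
    void *ptr;
    uint64_t gen;
} stash_entry;

/* Hypothetical per-session stash: a growable array of deferred frees. */
typedef struct {
    stash_entry *entries;
    size_t count;
    size_t capacity;
} session_stash;

/* Remember ptr instead of freeing it now. Returns 0 on success, -1 on OOM. */
int stash_add(session_stash *s, void *ptr, uint64_t gen) {
    assert(s != NULL);
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 4;
        stash_entry *grown = realloc(s->entries, cap * sizeof(*grown));
        if (grown == NULL)
            return -1;
        s->entries = grown;
        s->capacity = cap;
    }
    s->entries[s->count].ptr = ptr;
    s->entries[s->count].gen = gen;
    s->count++;
    return 0;
}

/*
 * Free every stashed object whose generation is older than the oldest
 * generation still in use by a reader; keep the rest for a later call.
 * Returns the number of objects freed.
 */
size_t stash_discard(session_stash *s, uint64_t oldest_active_gen) {
    size_t kept = 0, freed = 0;
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (s->entries[i].gen < oldest_active_gen) {
            free(s->entries[i].ptr);
            freed++;
        } else {
            s->entries[kept++] = s->entries[i];
        }
    }
    s->count = kept;
    return freed;
}
```

With something along these lines, the discard path would only record the block and return, and the actual free happens on whichever later call finds that no reader can still be holding it. The trade-off is that memory stays allocated for longer, so the stash needs to be drained periodically.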
Let me know if you have any further questions!
Andrew