ovfl_lock vs atomic write

Matan Tennenhaus

Aug 19, 2024, 4:26:02 AM
to wiredtiger-users
Hi,
I noticed that the code uses ovfl_lock when changing the type of a cell from ovfl to ovfl_rm (to prevent races with readers; I have also read the explanation in __wt_ovfl_remove).
1) Why isn't this change done with an atomic write?
2) My motivation is that I write many overflow items (in fact, most of my items are overflows), so I would like to remove the use of the lock. I would appreciate any recommendation on how to do so.

Many thanks,
Matan.

Andrew Morton

Aug 20, 2024, 5:29:39 AM
to wiredtig...@googlegroups.com
Hi Matan, as always thanks for your interest!

You've already flagged the race we're protecting against. WiredTiger supports large page sizes, so we expect overflow pages to be infrequent and to see low contention. Lockless algorithms also carry an increased maintenance cost that we haven't seen a need for yet. Without knowing the details of the contention, I'd first recommend more granular locks at the page level instead of the current lock at the btree level.

If you'd prefer to remove the lock, you can look at WiredTiger's generations. Reader threads enter a generation (__wt_session_gen_enter) to indicate they need continued access to a resource until they call __wt_session_gen_leave. If the __wt_ovfl_discard thread calls __wt_gen_next_drain after atomically setting WT_CELL_VALUE_OVFL_RM, but before freeing the block, the call forces that thread to wait until all other readers have left their current generation, at which point the block is safe to discard.
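To make the generation idea concrete, here is a minimal toy model in C11 atomics. It is not WiredTiger's implementation: every `toy_` name, the fixed reader-slot array, and the busy-wait are assumptions made purely for illustration. Each reader publishes the generation it entered, and the discarding thread bumps the generation and waits until no reader is still in an older one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_READERS 8 /* hypothetical fixed number of reader slots */

static _Atomic uint64_t toy_gen = 1;                     /* global generation */
static _Atomic uint64_t toy_reader_gen[TOY_MAX_READERS]; /* 0 = not reading */

/* Reader: publish the generation we entered, then check the cell type. */
static void toy_gen_enter(int slot) {
    assert(slot >= 0 && slot < TOY_MAX_READERS);
    atomic_store(&toy_reader_gen[slot], atomic_load(&toy_gen));
}

/* Reader: signal that the resource is no longer needed. */
static void toy_gen_leave(int slot) {
    assert(slot >= 0 && slot < TOY_MAX_READERS);
    atomic_store(&toy_reader_gen[slot], 0);
}

/* True if some reader entered before generation `gen` and has not left. */
static bool toy_gen_active_before(uint64_t gen) {
    for (int i = 0; i < TOY_MAX_READERS; i++) {
        uint64_t g = atomic_load(&toy_reader_gen[i]);
        if (g != 0 && g < gen)
            return true;
    }
    return false;
}

/* Discarder: advance the generation, then spin until older readers leave. */
static uint64_t toy_gen_next_drain(void) {
    uint64_t next = atomic_fetch_add(&toy_gen, 1) + 1;
    while (toy_gen_active_before(next))
        ; /* a real implementation would yield or sleep here */
    return next;
}
```

In this model the discarding thread would atomically mark the cell as removed, call `toy_gen_next_drain()`, and only then free the block. A reader must check the cell type after `toy_gen_enter()` for the ordering to hold.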

However, generations are a single global counter, which means you can't free an overflow page until all other overflow pages in the database have finished being read. If you don't want to block during __wt_ovfl_discard, you could look into an implementation similar to the session stash, where objects are saved to the session structure via __wt_stash_add and you call __wt_stash_discard at a later time to free them.
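A deferred-free list of that kind could look roughly like the sketch below. It only illustrates the pattern and does not reproduce the signatures of __wt_stash_add or __wt_stash_discard: the `toy_stash` types, the generation argument, and the use of heap pointers are all assumptions. For overflow items, an entry would presumably hold a block address, and the discard step would hand it to the block manager rather than to `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stash entry: an object plus the generation it was retired in. */
typedef struct {
    void *p;
    uint64_t gen;
} toy_stash_entry;

/* Hypothetical per-session stash: a growable array of retired objects. */
typedef struct {
    toy_stash_entry *entries;
    size_t count, cap;
} toy_stash;

/* Remember an object to free later; returns 0 on success, -1 on failure. */
static int toy_stash_add(toy_stash *s, void *p, uint64_t gen) {
    assert(s != NULL);
    if (s->count == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 4;
        toy_stash_entry *n = realloc(s->entries, ncap * sizeof(*n));
        if (n == NULL)
            return -1;
        s->entries = n;
        s->cap = ncap;
    }
    s->entries[s->count].p = p;
    s->entries[s->count].gen = gen;
    s->count++;
    return 0;
}

/*
 * Free every entry retired before `oldest_active`, the oldest generation
 * any reader is still in; keep the rest. Returns the number freed.
 */
static size_t toy_stash_discard(toy_stash *s, uint64_t oldest_active) {
    size_t kept = 0, freed = 0;
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (s->entries[i].gen < oldest_active) {
            free(s->entries[i].p);
            freed++;
        } else {
            s->entries[kept++] = s->entries[i];
        }
    }
    s->count = kept;
    return freed;
}
```

The discarding thread never waits here: it stashes the object and moves on, and a later call reclaims whatever has become safe.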

Let me know if you have any further questions!
Andrew

Matan Tennenhaus

Aug 20, 2024, 5:50:14 AM
to wiredtiger-users
Thanks for the answer!
I will look into the implementation of the stash functions.
I still do not understand why a lock is used inside __wt_ovfl_discard, instead of just changing the cell type via an atomic write.
Thanks.

On Tuesday, August 20, 2024 at 12:29:39 UTC+3, Andrew Morton wrote:

Andrew Morton

Aug 20, 2024, 8:14:28 PM
to wiredtig...@googlegroups.com
No worries. To expand on the race: __wt_ovfl_discard's critical section only sets the cell type to WT_CELL_KEY_OVFL_RM, but after that point it's allowed to free the backing disk blocks in the subsequent bm->free call.
In __wt_ovfl_read, both the flag check and the read of the backing block take place inside the critical section, so a thread that sees a non-WT_CELL_KEY_OVFL_RM value is guaranteed that the backing block can't be deleted under it.

The race we prevent with the lock is:
1. __wt_ovfl_read reads the cell type as not WT_CELL_KEY_OVFL_RM
2. __wt_ovfl_discard sets cell type to WT_CELL_KEY_OVFL_RM
3. __wt_ovfl_discard deletes the backing block with bm->free
4. __wt_ovfl_read tries to read the now-deleted backing block 