Why does WT use an append-only Btree rather than an update-in-place Btree?


baoshang young

Feb 4, 2021, 5:01:38 AM
to wiredtiger-users
Hi, all
An append-only Btree can cause write amplification: if I flush a dirty leaf node to disk, I think I also need to flush the leaf node's parent nodes. Does WT use an append-only Btree only because the node size is not fixed, so nodes cannot be updated in place?

Keith Bostic

Feb 4, 2021, 11:01:24 AM
to wiredtiger-users
On Thursday, February 4, 2021 at 2:01:38 AM UTC-8 zaor...@gmail.com wrote:
 
An append-only Btree can cause write amplification: if I flush a dirty leaf node to disk, I think I also need to flush the leaf node's parent nodes. Does WT use an append-only Btree only because the node size is not fixed, so nodes cannot be updated in place?

The most important reason for no-overwrite Btrees is that if you are overwriting a block and the write fails, you might have garbage on disk, and then you can't recover using the log. That is, the disk block might hold random contents, so your log records can't move you to a known state. If that happens, you have to fall back to catastrophic recovery, which is really, really painful.
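
To make the failure mode above concrete, here is a minimal Python sketch of a torn write. It is a toy model, not WiredTiger's on-disk format: the 4-byte sector size, the CRC32-prefixed block layout, and the helper names `make_block`, `is_valid`, and `torn_write` are all assumptions made for illustration.

```python
import zlib

SECTOR = 4  # hypothetical sector size in bytes (tiny, for illustration only)

def make_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 so a reader can detect a torn block."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def is_valid(block: bytes) -> bool:
    """Return True if the stored CRC32 matches the payload that follows it."""
    return zlib.crc32(block[4:]).to_bytes(4, "big") == block[:4]

def torn_write(disk: bytearray, offset: int, block: bytes, sectors_done: int) -> None:
    """Write only the first `sectors_done` sectors of `block` (a crash mid-write)."""
    n = min(len(block), sectors_done * SECTOR)
    disk[offset:offset + n] = block[:n]

old = make_block(b"old-page-v1.")  # 16 bytes: 4-byte checksum + 12-byte payload
new = make_block(b"new-page-v2!")  # same size as the old block

# Update in place: the crash damages the only copy of the page.
disk_a = bytearray(old)
torn_write(disk_a, 0, new, sectors_done=3)
in_place_ok = is_valid(bytes(disk_a))

# No-overwrite: the new version goes to fresh space, the old block is untouched.
disk_b = bytearray(old) + bytearray(len(new))
torn_write(disk_b, len(old), new, sectors_done=3)
old_ok = is_valid(bytes(disk_b[:len(old)]))
new_ok = is_valid(bytes(disk_b[len(old):]))

print("in-place page valid after crash:", in_place_ok)
print("no-overwrite old block valid:", old_ok, "| new block valid:", new_ok)
```

In this toy model the in-place page ends up as a mix of old and new bytes that matches neither version, so there is no known state to replay log records against. In the no-overwrite case the half-written block fails its checksum and can be ignored, while the previous version is still intact.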

baoshang young

Feb 5, 2021, 2:39:30 AM
to wiredtiger-users
Thanks for your reply. Generally, a Btree's page size is greater than the file system's block size, so flushing a page to disk is not atomic. Overwriting Btrees can use double writing to solve this problem, as InnoDB does. However, double writing adds I/O overhead. Thank you for answering my question :)
Best regards.
Zaorang Yang
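
The double-writing idea mentioned in this message can be sketched as follows. This is a hedged toy model in Python, not InnoDB's actual doublewrite buffer implementation: the `Disk` class, its `home` and `dwb` slots, and the functions `update_in_place` and `recover` are hypothetical names introduced only for this example.

```python
import zlib
from typing import Optional

def make_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 so a torn block can be detected."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def is_valid(block: bytes) -> bool:
    return zlib.crc32(block[4:]).to_bytes(4, "big") == block[:4]

class Disk:
    """Toy disk: one home slot, one doublewrite slot, and a byte counter."""
    def __init__(self, initial: bytes) -> None:
        self.home = bytearray(initial)
        self.dwb = bytearray(len(initial))
        self.bytes_written = 0

    def write(self, target: bytearray, block: bytes,
              torn_at: Optional[int] = None) -> None:
        # If torn_at is set, only that many bytes reach the disk (simulated crash).
        n = len(block) if torn_at is None else torn_at
        target[:n] = block[:n]
        self.bytes_written += n

def update_in_place(disk: Disk, block: bytes, crash_during_home: bool = False) -> None:
    disk.write(disk.dwb, block)  # step 1: full copy to the doublewrite area
    # step 2: overwrite the home location; this is the write that may be torn
    disk.write(disk.home, block, torn_at=12 if crash_during_home else None)

def recover(disk: Disk) -> None:
    """If the home page is torn but the doublewrite copy is good, restore it."""
    if not is_valid(bytes(disk.home)) and is_valid(bytes(disk.dwb)):
        disk.home[:] = disk.dwb

old = make_block(b"old-page-v1.")
new = make_block(b"new-page-v2!")

crashed = Disk(old)
update_in_place(crashed, new, crash_during_home=True)
torn_before_recovery = not is_valid(bytes(crashed.home))
recover(crashed)

clean = Disk(old)
update_in_place(clean, new)
print("bytes written for one 16-byte page:", clean.bytes_written)
```

Under these assumptions, the torn home page is repaired from the doublewrite copy, and the clean run shows the cost noted above: every page update is written twice.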
