BufferPool related question found in LeanStore


王欢明

Oct 23, 2019, 1:22:36 AM
to wiredtiger-users
When I was reading "LeanStore: In-Memory Data Management Beyond Main Memory", I found these words in the paper:
 
In the canonical buffer pool implementation [1], each page access requires a hash table lookup in order to translate a logical page identifier into an in-memory pointer
As Fig. 1 shows, traditional buffer manager implementations like BerkeleyDB or WiredTiger therefore only achieve a fraction of the TPC-C performance of an in-memory B-tree 


However, as far as I know, WiredTiger doesn't use a canonical hash table to translate page IDs into memory addresses. Instead, a WT_REF can point directly to its child page.

So, did this paper make a mistake, or is this my own misunderstanding? I look forward to your reply.

Keith Bostic

Oct 23, 2019, 6:04:21 AM
to wiredtiger-users
Based on a quick review, the paper is wrong. 

ScoobyDoo

Oct 23, 2019, 10:33:40 PM
to wiredtiger-users
Thank you for clarifying. I will try to raise this discrepancy with the paper's authors. WiredTiger deserves the truth.

On Wednesday, October 23, 2019 at 6:04:21 PM UTC+8, Keith Bostic wrote:

bai...@gmail.com

Oct 9, 2022, 3:53:19 AM
to wiredtiger-users
I'm curious what the comparison would look like against the newest version (the paper used WiredTiger 2.9). I'm also curious which techniques from the paper (including the follow-up paper "Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines", SIGMOD 2020) could be used to make WiredTiger better.

Keith Smith

Oct 10, 2022, 1:22:00 PM
to wiredtig...@googlegroups.com
Hi there,

Thanks for your interest in WiredTiger and related research literature.

On Sun, Oct 9, 2022 at 3:53 AM bai...@gmail.com <bai...@gmail.com> wrote:
I'm curious what the comparison would look like against the newest version (the paper used WiredTiger 2.9).

To rephrase the question: the LeanStore paper emphasizes TPC-C performance, so you're essentially asking whether WiredTiger would show better TPC-C performance now than 2.9 did.

It's hard to provide a quick guess. WiredTiger's core concurrent B-tree implementation is largely the same, but there have been a lot of other changes since 2.9 (which was released in December 2016): substantial improvements to logging, spilling old versions of records to disk, eviction, checkpointing, etc.

We do run TPC-C internally, but it is part of MongoDB's performance testing. So we don't have convenient data on how WiredTiger by itself performs in this workload.

I'm also curious which techniques from the paper (including the follow-up paper "Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines", SIGMOD 2020) could be used to make WiredTiger better.

Thanks for the pointer. I hadn't seen this follow-up to Viktor Leis's earlier paper. I've only skimmed a bit of it, but I particularly noticed their discussion of "fuzzy checkpoints". WiredTiger has been going in this direction as well. In earlier versions of WiredTiger (such as 2.9), a checkpoint was a consistent snapshot of the database. Today, we allow data newer than the checkpoint snapshot to be included in the checkpoint. This means that eviction can (often) run concurrently with checkpointing, making our checkpoints less of a bottleneck and helping to smooth out disk write traffic.

At the moment, I have a lot of other papers on my reading list (I'm doing peer-review for a conference...). But I'll try to give this paper a read before long and follow up. In the meantime, maybe others on this list have (or will) read it and have thoughts.

Keith Smith
 
