WT scan

胡皓胜

Mar 4, 2022, 4:19:25 AM

to wiredtiger-users

Hi everyone!

I recently tested WiredTiger with YCSB and found that its scan performance is quite good.

I haven't looked at the code, so I'd like to know whether scan operations are specifically optimized in WT.

Thanks.

alexande...@mongodb.com

Mar 6, 2022, 5:24:46 PM

to wiredtiger-users

Hi,

Thanks for your interest in WiredTiger! Scan operations form a core part of WiredTiger, and we have worked hard to make them efficient. We have a number of optimizations in place, including lock-free reads, which are key to making scans efficient.

There might be more information in our documentation: http://source.wiredtiger.com/develop/overview.html

Let us know if you have more specific questions - we'll be happy to answer.

- Alex

胡皓胜

Mar 6, 2022, 8:45:56 PM

to wiredtiger-users

Hi Alex,
Thanks for your answer.

Here is a specific question.

I used YCSB Workload E to test WiredTiger with a limited cache size.
I found that increasing the maximum scan length of the range queries does not degrade WT's performance.
I would like to understand why.

alexande...@mongodb.com

Mar 8, 2022, 11:07:41 PM

to wiredtiger-users

Hi,

I am not familiar with your particular workload or setup, but my experience with YCSB is that factors other than WiredTiger's query speed often outweigh the cost of the read operations. That is, there is another bottleneck (maybe in the driver application, maybe in the benchmark, or maybe somewhere else in WiredTiger) that is limiting throughput.

Keith Smith

Mar 9, 2022, 10:53:07 AM

to wiredtig...@googlegroups.com

Hi there,

I thought I would add a few thoughts in addition to Alex's.

The standard YCSB E workload is 5% insert operations and 95% range scans. Typically the range scans are for up to 100 records, but it sounds like you are experimenting with larger ranges. The record size is 1KB.

While it's great to hear that WiredTiger is showing good performance in your tests, it's possible that the performance you are seeing reflects other bottlenecks or optimizations in the system. Two items that come to mind are both cache related.

First, YCSB uses a Zipfian request distribution. This means the requests are heavily skewed towards a small number of records. So even with a constrained cache, your workload may have a good cache hit rate.
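
As a rough illustration of that skew, the following Python sketch estimates what share of requests would land on the most popular records when the popularity of the record at rank r is proportional to 1/r^theta. The exponent 0.99, the record counts, and the function name are assumptions chosen for illustration; none of them come from this thread or from measurements.

```python
def zipf_hit_fraction(n_records, n_hot, theta=0.99):
    """Fraction of requests that target the n_hot most popular of n_records,
    assuming the record at rank r is requested with weight 1 / r**theta."""
    if n_records < 1 or not 0 <= n_hot <= n_records:
        raise ValueError("need n_records >= 1 and 0 <= n_hot <= n_records")
    weights = [1.0 / (rank ** theta) for rank in range(1, n_records + 1)]
    return sum(weights[:n_hot]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sizes: 100,000 records, of which the hottest 1% stay cached.
    share = zipf_hit_fraction(100_000, 1_000)
    print(f"hottest 1% of records receive {share:.0%} of requests")
```

Under these assumed parameters, well over half of the requests go to the hottest 1% of records, which is why a small cache can still see a high hit rate. The sketch models single-record requests only; a range scan also reads the records that follow its start key.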

Second, even though you are constraining WiredTiger's cache, the test files probably fit in the operating system cache. So the performance penalty when you miss in the WiredTiger cache will be small. In the extreme, if the WiredTiger cache is too small to help, you may essentially be measuring the performance of reading everything from the OS cache.
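
The effect of the second point can be sketched with a simple two-level cache model in Python. This is only a back-of-the-envelope model, not a description of WiredTiger internals: the function name is hypothetical, and the hit rates and costs in the usage example are made-up relative units, not measurements.

```python
def expected_read_cost(wt_hit, os_hit, wt_cost, os_cost, disk_cost):
    """Average cost of one read in a two-level cache model.

    wt_hit: fraction of reads served from the WiredTiger cache.
    os_hit: fraction of WiredTiger-cache misses served from the OS cache.
    The three costs are in arbitrary but consistent units.
    """
    for rate in (wt_hit, os_hit):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must be between 0 and 1")
    miss_cost = os_hit * os_cost + (1.0 - os_hit) * disk_cost
    return wt_hit * wt_cost + (1.0 - wt_hit) * miss_cost


if __name__ == "__main__":
    # Hypothetical relative costs: WT cache = 1, OS cache = 5, disk = 100.
    for wt_hit in (0.9, 0.5, 0.0):
        cost = expected_read_cost(wt_hit, os_hit=1.0,
                                  wt_cost=1, os_cost=5, disk_cost=100)
        print(f"WT hit rate {wt_hit:.0%}: average cost {cost}")
```

When every WiredTiger-cache miss is absorbed by the OS cache (`os_hit=1.0`), the average cost never exceeds the OS-cache cost, no matter how small the WiredTiger cache is. The disk cost only starts to matter once the data set no longer fits in the OS cache.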

It's hard to provide a definitive answer as to why a specific workload performs in a certain way, because it depends on the workload, the system, the configuration parameters, etc.

Keith S

胡皓胜

Mar 9, 2022, 9:45:34 PM

to wiredtiger-users

Hi everyone,

Thank you for your answers.
I was originally hoping for a possible explanation from within WiredTiger itself. For example, my guess is that each leaf node in WiredTiger has its own buffer (my term for it), which would make it cheap to fetch a wider range of key-value pairs once a leaf node has been accessed.
But I understand that performance depends not only on the system but also on many other factors.
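
The intuition that one leaf access yields many adjacent key-value pairs can be made concrete with a toy Python model that counts how many leaf pages a range scan touches. This is a generic B-tree style sketch and not WiredTiger's actual implementation: the function name is hypothetical, and the figure of 32 records per leaf page is an assumption for illustration that ignores page overhead and compression.

```python
def leaf_pages_touched(start_index, scan_length, records_per_page):
    """Number of leaf pages a scan reads in a toy model where records are
    stored in key order and every leaf page holds records_per_page records."""
    if start_index < 0 or scan_length < 1 or records_per_page < 1:
        raise ValueError("need start_index >= 0 and positive lengths")
    first_page = start_index // records_per_page
    last_page = (start_index + scan_length - 1) // records_per_page
    return last_page - first_page + 1


if __name__ == "__main__":
    # Hypothetical layout: 32 records per leaf page.
    for length in (1, 100, 1000):
        pages = leaf_pages_touched(0, length, 32)
        print(f"scan of {length} records touches {pages} leaf page(s)")
```

In this model the number of pages read grows with scan length divided by records per page, so a 100-record scan starting at a page boundary touches 4 pages rather than performing 100 separate lookups. Whether this accounts for the results in this thread would have to be checked against the actual configuration.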