Thread scaling WiredTiger engine

Uttam Pawar

Nov 7, 2022, 9:52:21 PM
to wiredtiger-users
Hi,
I'm new to the WiredTiger storage engine project and very much interested in understanding how its performance scales with threads and cores. While browsing the WiredTiger GitHub pages I came across the following "Read" scalability document,
This document is simple to follow, but I couldn't reproduce the results because I can't find the db_bench_wiredtiger binary after building the latest WiredTiger engine.
Can someone point me to the correct process or steps for running these experiments? I would like to contribute to this, and then to the overall MongoDB server project, with respect to scalability on the many-core systems on the horizon.

I appreciate any help.

Thanks
Uttam

Uttam Pawar

Nov 7, 2022, 10:04:17 PM
to wiredtiger-users
I did run the wtperf tool with the ycsb-c.wtperf config, which has the following content:
$ cat ycsb-c.wtperf
conn_config="cache_size=40G,log=(enabled=true)"
pareto=20
table_config="type=file"
key_sz=100
value_sz=1024
icount=120000000
run_time=3600
threads=((count=20,reads=1))
warmup=120
sample_interval=5
populate_threads=8
report_interval=5

$ time ../../../build/bench/wtperf/wtperf -O ycsb-c.wtperf

The total time I believe includes loading of the database (or files in WT_TEST directory) and the execution time. It also doesn't display any "ops/sec" metric (it can be calculated after the run as icount/total time). This test is not showing increased core usage as I increase the run threads from 20 ("threads=((count=20,reads=1))") to 40, 80, etc. The "htop" command shows only one core busy.
Am I missing something? How can I separate the load and read execution times?
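
For the thread-count sweep described above (20, 40, 80 run threads), one option is to generate the config variants from a base file instead of editing it by hand. The following is only a minimal sketch: the helper name with_thread_count is hypothetical, and it assumes the config contains exactly one line of the form threads=((count=N,...)) as in the ycsb-c.wtperf listing.

```python
import re


def with_thread_count(config_text, count):
    """Return config_text with N replaced in its threads=((count=N,...)) line.

    Hypothetical helper: assumes exactly one such line exists.
    """
    new_text, replaced = re.subn(
        r"^(threads=\(\(count=)\d+",
        rf"\g<1>{count}",
        config_text,
        flags=re.MULTILINE,
    )
    if replaced != 1:
        raise ValueError("expected exactly one threads=((count=N,...)) line")
    return new_text


# Shortened stand-in for the ycsb-c.wtperf content shown above.
base = "key_sz=100\nthreads=((count=20,reads=1))\nwarmup=120\n"

for count in (20, 40, 80):
    # Print only the rewritten threads line of each variant.
    print(with_thread_count(base, count).splitlines()[1])
```

Each variant can then be written to its own file and passed to wtperf with -O, as in the command above.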

Donald Anderson

Nov 8, 2022, 9:08:29 AM
to wiredtiger-users
Hello Uttam,

Thank you for your interest in WiredTiger.  After you run wtperf, you'll notice a WT_TEST directory that contains the WiredTiger data files. wtperf puts a test.stat file in that directory that contains a report every five seconds (configurable via report_interval) during the execution.  It separates out the load phase from the rest of the run.

- Don
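
As a rough illustration of using the periodic reports to get a read rate for the run phase only, here is a minimal sketch. The report line format it matches is an assumption, not taken from this thread, and the sample lines are made up; check an actual WT_TEST/test.stat and adjust the pattern to what your wtperf version writes.

```python
import re

# ASSUMED shape of a periodic report line, for illustration only, e.g.
#   "250000 reads, 0 inserts, 0 updates in 5 secs (5 total secs)"
REPORT_RE = re.compile(r"(\d+) reads,.*? in (\d+) secs \((\d+) total secs\)")


def read_rates(lines):
    """Return (total_secs, reads_per_sec) for each matching report line."""
    rates = []
    for line in lines:
        match = REPORT_RE.search(line)
        if match is None:
            continue  # lines that do not look like a read report are skipped
        reads, interval, total = (int(group) for group in match.groups())
        if interval > 0:
            rates.append((total, reads / interval))
    return rates


# Made-up sample lines, not real wtperf output.
sample = [
    "120000000 populate inserts (120000000 of 120000000) in 5 secs (600 total secs)",
    "250000 reads, 0 inserts, 0 updates in 5 secs (5 total secs)",
    "300000 reads, 0 inserts, 0 updates in 5 secs (10 total secs)",
]

for total_secs, rate in read_rates(sample):
    print(f"t={total_secs}s: {rate:.0f} reads/sec")
```

Because populate lines do not match the pattern, the load phase is excluded from the computed rates.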

Uttam Pawar

Nov 8, 2022, 1:29:32 PM
to wiredtiger-users
Don,
I appreciate your response.

Thanks.

alexande...@mongodb.com

Nov 13, 2022, 10:02:29 PM
to wiredtiger-users
Regarding your other questions:

> The total time I believe includes loading of the database (or files in WT_TEST directory) and the execution time.

I believe Don already answered that, but yes, the simple output shows how long the entire benchmark ran from start to finish. When I run wtperf, I pass -o verbose=2 on the command line, and it outputs more information about what is happening as the run proceeds.

> This test is not showing increased core usage as I increase the run threads from 20 ("threads=((count=20,reads=1))") to 40, 80, etc. The "htop" command shows only one core busy.

I suspect you are monitoring the test during the populate phase, which runs single-threaded. Note that the populate phase inserts 120 million records, so it could take a while depending on your hardware. Generally, single-threaded populate is more efficient than multi-threaded populate, since WiredTiger has some optimizations for single-threaded bulk loading.

I ran wtperf against the most recent version of WiredTiger using the configuration file you referenced and see it utilizing all available CPU (I have a machine with 8 CPUs, and top reports 799% CPU utilized by wtperf). That is with the default 20 thread reader configuration - I didn't increase that, since CPU is already saturated.

[Attachment: Screen Shot 2022-11-14 at 1.49.33 pm.png]
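
To make the saturation reading above concrete: top typically reports a process's CPU usage summed across cores, so dividing by the core count gives the average per-core load. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the numbers are the ones quoted in this message.

```python
def per_core_utilization(process_cpu_percent, cpu_count):
    """Average per-core utilization implied by a summed CPU percentage."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return process_cpu_percent / cpu_count


# 799% reported by top on a machine with 8 CPUs.
average = per_core_utilization(799, 8)
print(f"{average:.1f}% average per core")
```

A value close to 100% per core suggests that adding more reader threads would not increase throughput on that machine.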

Uttam Pawar

Nov 14, 2022, 12:23:59 PM
to wiredtiger-users
Alex,
I appreciate the response. I now have a better understanding of wtperf and its output. Thanks.

Uttam