Range query by WiredTiger c++

32 views
Skip to first unread message

Sean

unread,
Nov 2, 2022, 7:42:52 AM11/2/22
to wiredtiger-users
Hi everyone,

May I know if WiredTiger supports point and range queries with O(log(n)) time complexity in C++ programs? If so, could anyone give me some suggested APIs and functions?

Thank you very much!

Best,
Sean.

Keith Smith

unread,
Nov 2, 2022, 11:07:34 AM11/2/22
to wiredtig...@googlegroups.com
Hi Sean,

Thanks for your question. 

Point queries in WiredTiger are O(log(n)), where n is the number of records in the table. You can perform a point query using the WT_CURSOR::search() or WT_CURSOR::search_near() methods.

For a range query, if you want to retrieve all of the values in the range, then it is an O(k+log(n)) operation, where n is the number of records in the tree and k is the number of records that fall in the range. You would do this by searching to the beginning (or end) of the range as above, and then iteratively calling WT_CURSOR::next() (or WT_CURSOR::prev()) to scan through the range. WiredTiger doesn't provide a special API to do this as it would have to do the same work described here.

The WiredTiger sources include a sample program (in C) that demonstrates the use of these cursor operations.

Note that in a complex system like WiredTiger, algorithmic complexity may not predict real performance, as different operations can have substantially different costs. For example, the cost of reading a page from storage is very large compared to accessing it in WiredTiger's cache. So an operation that has to read more pages from disk will typically take longer than one that reads fewer pages from disk, regardless of the algorithmic complexity of how WiredTiger uses the data on those pages.

Hope this helps!

Keith S

--
You received this message because you are subscribed to the Google Groups "wiredtiger-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wiredtiger-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wiredtiger-users/68c9c50a-af22-4c42-b59d-f7d986a63afen%40googlegroups.com.

Sean

unread,
Nov 8, 2022, 1:03:40 AM11/8/22
to wiredtiger-users
Hi Keith,

I implemented it as you said, and it works well when the database is small-scale. However, I increase the data scale and have the following error:

[1667887151:740323][27639:0x7f711050c740], WT_CURSOR.get_value: [WT_VERB_DEFAULT][ERROR]: __wt_cursor_kv_not_set, 244: requires value be set: Invalid argument

It would be great if I could have your suggestions. Thank you very much.

Best regards,
Sean.

Keith Smith

unread,
Nov 8, 2022, 10:54:58 AM11/8/22
to wiredtig...@googlegroups.com
Hi again Sean,

Part of the internal state of a cursor is a key and value. The key can be set either by your application calling WT_CURSOR::set_key(), or by performing an WiredTiger operation, such as WT_CURSOR::next() that sets or updates the key. 

The error you are seeing happens if you call WT_CURSOR::get_value() when the cursor's internal key is not set. For example, you can generate this error if you open a cursor and then immediately call WT_CURSOR::get_value(). Similarly, the WT_CURSOR::reset() and WT_CURSOR::reconfigure() methods clear the cursor's state, so if you call WT_CURSOR::get_value() after one of these calls, you will also get this error.

There are also some errors that will clear a cursor's key. A simple example is that if you call WT_CURSOR::set_key() with an invalid key, you will clear any previous key that was set on the cursor, leaving the key unset. It is hard to enumerate the full set of errors that can cause this, as there are a bunch of different cursor methods and some of them behave differently for different cursor types or on different types of tables.

So two likely reasons you might be seeing this error are:
  1. Calling WT_CURSOR::get_key() on a new or reset cursor before setting the cursor's key.
  2. Calling WT_CURSOR::get_key() after an error on a previous cursor operation.
Hope this helps!

Keith


Sean

unread,
Nov 11, 2022, 11:44:21 AM11/11/22
to wiredtiger-users
Hi Keith,

Thank you very much. I have realized the point and range query based on WiredTiger!

However, queries have high latency (e.g., 0.5 ms), and I wonder if you mind sharing some suggestions for performance improvement:

1. Is there a way to increase the cache size (page size) to reduce the latency?
2. Can multi-thread be enabled to increase the speed?

Other ideas are also welcome! My current program is a simple KV store implementation based on C++ and with the provided WiredTiger functions.

Thank you again!

Best,
-Sean.

Keith Smith

unread,
Nov 11, 2022, 4:44:39 PM11/11/22
to wiredtig...@googlegroups.com
Hi Sean,

I'm glad to hear your program is working better. To answer your questions:
     
 Is there a way to increase the cache size (page size) to reduce the latency?

WiredTiger has a large number of configuration options. You can set the cache size when you open a connection to the database using the config option to wiredtiger_open(). For example:

    wiredtiger_open(home, NULL, "cache_size=1GB", &my_connection);

The documentation for wiredtiger_open() includes information about this option and many others.

> Can multi-thread be enabled to increase the speed?

Yes, WiredTiger is designed to support multi-threaded operations. Multiple threads make calls into WiredTiger at the same time. Each thread needs a separate WT_SESSION and WT_CURSORs. There is some more information, and some sample code, online.

If the threads are performing one operation at a time, there isn't much more you need to do. If a thread needs to perform multiple operations without interference from other threads, you will probably want to use transactions. There is a lot of information about transactions in the documentation. You can start with that linked page and then read on through the following sections of the application guide as well as the other linked pages.

Good luck!

Keith

Reply all
Reply to author
Forward
0 new messages