Yes, every non-leaf (and leaf) node has a count of keys in the subtree
where that node is the root.
> I don't see how erase(a,b) can be O(log N), is that a typo?
No, it's not.
In order to explain this, I have to explain split() first:
split(Tree&& tree, key a, key b) splits a tree to two trees, say tree_l and
tree_r, so that
tree_l contains all keys less than a, tree_r contains all keys greater than
b.
By B-Tree's nature, this is O(log N).
erase(a, b) works like this:
1. compute lb = lower_bound(a), ub = upper_bound(b) pair. These are
iterators. O(log N)
2. compute order(lb), order(ub). Therefore, order(ub) - order(lb) is the
number of elements that will be erased. O(log N)
3-1. When order(ub) - order(lb) < 30, the tree erases element one by one
3-2. When it exceeds 30, the algorithm first splits the tree to two trees
by calling spit(std::move(*this), lb, ub)
and joins the two split trees. split() and join() is O(log N), and there is
one split() call and join() call,
so the whole complexity is O(log N)
I plan to benchmark mine with abseil's B-Tree soon.
Thank you for pointing out Beman's implementation and sqlite, I'll list
them too.
I'll add Boost license too, thanks for that
Best,
Jooseong
2022년 7월 25일 (월) 오후 11:29, Phil Endecott via Boost <bo...@lists.boost.org>님이
작성:
>I would be interested to see a comparison with Beman Dawes'
>B-Tree library from 2010:
https://github.com/Beman/btree seems to depend on an older version of
Boost.Endian,
It doesn't compile with the recent versions of boost. I can't find a proper
version of Boost.Endian which is compatible with.
To compare with Google's 2011 B-Tree implementation, (
https://code.google.com/archive/p/cpp-btree/)
I saw the following results:
https://github.com/frozenca/BTree/tree/295a9ec4b5495a83c546fd3e300d54d4c78d8ad8#Performance
Regards,
Jooseong
2022년 7월 26일 (화) 오전 12:03, Jooseong Lee <froze...@gmail.com>님이 작성:
Actually, the keys are stored contiguously:
For trivially copyable types, the type of key array is std::array;
for other types, the type of key array is std::vector. (in this case, your
argument would be right)
What is not stored contiguously is child, this is stored in
std::vector<std::unique_ptr>.
But when N keys are erased, the number of children erased is not O(N),
because a node can have #(fanout) keys.
Best,
Jooseong
2022년 7월 26일 (화) 오후 11:40, Phil Endecott via Boost <bo...@lists.boost.org>님이
작성:
> Jooseong Lee wrote:
I have to correct this sentence, the number of children erased is O(N).
But the coefficient is small, because practically B-Trees use large fanouts.
Most technically, a call for erase_range(a, b) would be O(log N) + O(K)
where N is the number of whole keys and K is the number of keys being
erased.
I've updated the repo description, thanks for the catch.
Best,
Jooseong
2022년 7월 27일 (수) 오전 12:06, Jooseong Lee <froze...@gmail.com>님이 작성:
Compared with boost::unordered_map and boost::container::flat_set,
I repeated inserting/retrieving/erasing 1,000,000 random std::int64_t in
random order,
and see the following results in my machine: (gcc 11.2 -O3)
(repeated 200 times for each target for B-Tree and unordered_set, repeated
10 times for flat_set)
My B-Tree: insert 196.22 ms, lookup 205.64 ms, erase 233.24 ms
boost::container::flat_set : insert 58402.9 ms, lookup 217.73 ms, erase
59075.2 ms
boost::unordered_set : insert 189.45 ms, lookup 232.19 ms, erase 268.01 ms
Regards,
Jooseong
2022년 7월 29일 (금) 오후 10:35, Rainer Deyke via Boost <bo...@lists.boost.org>님이
작성:
Repo link: https://github.com/frozenca/BTree
Best, Jooseong
2022년 7월 30일 (토) 오후 2:00, Rainer Deyke via Boost <bo...@lists.boost.org>님이
작성:
> On 30.07.22 06:38, Jooseong Lee via Boost wrote:
> Best Regards,
> Jooseong Lee
Hello, can I ask if Boost is interested in my library and if I can request
a review?
For performance, I saw the following results:
Inserting/retrieving/erasing 1,000,000 random std::int64_t in random order
in gcc 11.2 -O3, averaged 20~200 times:
(https://github.com/frozenca/BTree#performance)
frozenca::BTreeSet : insert 172ms, lookup 199ms, erase 208ms
Google btree::btree_set : insert 248ms, lookup 229ms, erase 243ms
std::set : insert 912ms, lookup 943ms, erase 1002ms
boost::container::flat_set : insert 58527ms, lookup 208ms, erase 59134ms
boost::unordered_set : insert 194ms, lookup 226ms, erase 272ms
If you're interested, I'd appreciate it if you could take a look.
Jooseong
2022년 7월 30일 (토) 오후 2:00, Rainer Deyke via Boost <bo...@lists.boost.org>님이
작성:
> On 30.07.22 06:38, Jooseong Lee via Boost wrote:
> And the review process itself is described at:
> https://www.boost.org/community/reviews.html
Thank you! I wasn't aware of these requirements exactly.
I'll post again after I have worked on this to be fully compatible
with the Boost library conditions and fully backward compatible with at
least C++17.
Jooseong
2022년 8월 11일 (목) 오후 11:19, Glen Fernandes <glen.fe...@gmail.com>님이 작성: