"Alf P. Steinbach" <
alf.p.stein...@gmail.com> writes:
> This tutorial, if it works (it's an experiment), is intended to work
> this way:
>
> * I post some working code.
> * Learner(s) study it and ask about things.
> * Others answer questions and post critique or get bogged down in long
> sub-threads about sausages and swearing.
>
> The following code augments the basic sorted binary tree code in part
> 1 with a difference of subtree heights in each node, which will be
> needed for balancing, and debug/exploration output of the tree
> structure, ditto -- since we're getting closer to some awesome
> complexity.
Great!
I have a generic suggestion before we get to the balancing part.
Usually trees like this are used to implement sets, not
multi-sets. The allowing of multiple keys with the same value is
really just a distraction here. I suggest simply dropping
repeated key values on the floor, or if you want to include them
put a count in each node and increment the count when you find a
duplicate.
> When tree balancing is added (in part 3?) the subtree height
> differences will naturally reduce to the range -1 ... +1, since as
> soon as a difference of magnitude 2 occurs it will be balanced
> away. And then there can possibly be some good optimizations due to
> that. I have yet to read up on balancing in e.g. Wikipedia, so I only
> have vague recollections of ?red black? trees etc., but hopefully
> there's nothing more to it than optimizing the representation,
> updating and use of the differences for a guaranteed tiny three-value
> range.
I have some comments below just on the balancing part, and will
elide (without markings) some of the rest of the code.
(Hopefully I haven't cut out anything essential.)
> [code]
> #include <stack> // std::stack
> #include <stdlib.h> // abs(int)
> #include <string> // std::(string, to_string)
> #include <utility> // std::move
>
> namespace my {
> using cppx::popped_top_of;
> using std::stack;
>
> using Value = double;
>
> class Tree
> : public cppx::No_copy_or_move
> {
> private:
> struct Node
> {
> Node* left;
> Node* right;
> Value value;
> int height_diff; // right subtree height -
> // left subtree height
> };
>
> Node* root_ = nullptr;
>
> // Called after a new node, the `p_new_child`, has been inserted:
> void update_height_differences(
> Node* const p_new_child,
> stack<Node*> parent_nodes
> )
> {
> Node* p_child = p_new_child;
> while( not parent_nodes.empty() )
> {
> Node* const p_parent = popped_top_of( parent_nodes );
> int& height_diff = p_parent->height_diff;
> int const old_height_diff = height_diff;
>
> height_diff += (p_parent->right == p_child? +1 : -1);
>
> bool const this_height_increased =
> (abs( height_diff ) > abs( old_height_diff ));
> if( not this_height_increased )
> {
> break;
> }
> p_child = p_parent;
> }
> }
>
> public:
> void add( Value const& value )
> {
> stack<Node*> parent_nodes;
> Node** p_ptr = &root_;
> while( *p_ptr != nullptr )
> {
> Node*& ref_ptr = *p_ptr;
> parent_nodes.push( ref_ptr );
> p_ptr = &(value < ref_ptr->value? ref_ptr->left : ref_ptr->right);
> }
> *p_ptr = new Node{ nullptr, nullptr, value, 0 };
> update_height_differences( *p_ptr, move( parent_nodes ) );
> }
>
> };
> } // my
This way of maintaining the balance factors uses more machinery
than it needs to.
A key idea in AVL tree update is that no stack is needed to
update the height differences. The same property holds when
no rebalancing is being done. Here is a sketch for how to
do that (ie, in the case here where there is no rebalance).
While we are walking the tree to find the insertion point, keep
track of the last node on the path where an insertion lower down
will /not/ affect the height balances of any nodes above it. A
node on the path has this property if the branch being taken is
shorter than the other branch (ie, if height_diff is greater than
zero but the insertion goes down the left side, or if height_diff
is less than zero and the insertion goes down the right side).
Remembering the last such node means all nodes further down will
have their heights increase (which means height_diff will get
further away from zero).
After we have done the insertion (which we need to be sure
actually occurs, ie, that there wasn't a duplicate key), we start
at the remembered node and walk the insertion path again,
adjusting height_diff as appropriate at each node. That is, if
the path goes down the left side, height_diff goes down by one,
or if the path goes down the right side, height_diff goes up by
one. Do that for all nodes starting at the remembered node and
stopping at the freshly added node (which of course needs
height_diff set to zero). Doing that correctly updates the
height_diff's for all nodes that need it.
When it comes time to do rebalancing, the above algorithm needs
to be adjusted slightly to take that into account, but for now
this simpler version can do the job.
Is this description clear enough, or should I try to write
some pseudo-code?