Tutorial on threaded binary tree part 2: BST with subtree height differences

Alf P. Steinbach

unread,

Dec 11, 2016, 2:17:02 AM12/11/16

to

This tutorial, if it works (it's an experiment), is intended to work
this way:

* I post some working code.
* Learner(s) study it and ask about things.
* Others answer questions and post critique or get bogged down in long
sub-threads about sausages and swearing.

The following code augments the basic sorted binary tree code in part 1
with a difference of subtree heights in each node, which will be needed
for balancing, and debug/exploration output of the tree structure, ditto
-- since we're getting closer to some awesome complexity.

When tree balancing is added (in part 3?) the subtree height differences
will naturally reduce to the range -1 ... +1, since as soon as a
difference of magnitude 2 occurs it will be balanced away. And then
there can possibly be some good optimizations due to that. I have yet to
read up on balancing in e.g. Wikipedia, so I only have vague
recollections of “red black” trees etc., but hopefully there's nothing
more to it than optimizing the representation, updating and use of the
differences for a guaranteed tiny three-value range.

[code]
#include <stack> // std::stack
#include <stdlib.h> // abs(int)
#include <string> // std::(string, to_string)
#include <utility> // std::move

namespace cppx {
using std::stack;
using std::string;
using std::to_string;

struct No_copy_or_move
{
auto operator=( No_copy_or_move const& ) -> No_copy_or_move& =
delete;
auto operator=( No_copy_or_move&& ) -> No_copy_or_move& = delete;

No_copy_or_move() = default;
No_copy_or_move( No_copy_or_move const& ) = delete;
No_copy_or_move( No_copy_or_move&& ) = delete;
};

template< class Item >
inline auto popped_top_of( stack<Item>& st )
-> Item
{
Item result = move( st.top() );
st.pop();
return result;
}

inline auto repeated( int const n, string const& s )
-> string
{
string result;
for( int i = 1; i <= n; ++i ) { result += s; }
return result;
}

inline auto string_from( string const& s )
-> string const&
{ return s; }

inline auto string_from( char const* const s )
-> string
{ return s; }

template< class Type >
inline auto string_from( Type const& o )
-> string
{ return to_string( o ); }

template< class Type >
inline auto operator<<( string& s, Type const& o )
-> string&
{ return s.append( string_from( o ) ); }
} // namespace cppx

namespace my {
using cppx::popped_top_of;
using cppx::repeated;
using cppx::operator<<;
using std::stack;
using std::string;
using std::move;

using Value = double;

class Tree
: public cppx::No_copy_or_move
{
private:
struct Node
{
Node* left;
Node* right;
Value value;
int height_diff; // right subtree height - left
subtree height
};

Node* root_ = nullptr;

template< class Func >
static void apply_in_infix_order( Node* root, Func const& f )
{
if( root != nullptr )
{
apply_in_infix_order( root->left, f );
f( root );
apply_in_infix_order( root->right, f );
}
}

// Called after a new node, the `p_new_child`, has been inserted:
void update_height_differences(
Node* const p_new_child,
stack<Node*> parent_nodes
)
{
Node* p_child = p_new_child;
while( not parent_nodes.empty() )
{
Node* const p_parent = popped_top_of(
parent_nodes );
int& height_diff = p_parent->height_diff;
int const old_height_diff = height_diff;

height_diff += (p_parent->right == p_child? +1 : -1);

bool const this_height_increased =
(abs( height_diff ) > abs( old_height_diff ));
if( not this_height_increased )
{
break;
}
p_child = p_parent;
}
}

// Structure inspection for debugging & exploration:
static void add_internal_structure_to(
string& s,
Node* const root,
int const level
)
{
string const indentation = repeated( level, "| " );
if( root == nullptr )
{
s << indentation << "()\n";
}
else
{
add_internal_structure_to( s, root->left, level + 1 );
s << indentation
<< root->value << " [height diff " <<
root->height_diff << "]"
<< "\n";
add_internal_structure_to( s, root->right, level + 1 );
}
}

public:
void add( Value const& value )
{
stack<Node*> parent_nodes;
Node** p_ptr = &root_;
while( *p_ptr != nullptr )
{
Node*& ref_ptr = *p_ptr;
parent_nodes.push( ref_ptr );
p_ptr = &(value < ref_ptr->value? ref_ptr->left :
ref_ptr->right);
}
*p_ptr = new Node{ nullptr, nullptr, value, 0 };
update_height_differences( *p_ptr, move( parent_nodes ) );
}

template< class Func >
void for_each( Func const& f )
{
apply_in_infix_order( root_, [&f]( Node* p ){ f( p->value
); } );
}

auto internal_structure() const
-> string
{
string result;
add_internal_structure_to( result, root_, 0 );
return result;
}

Tree() = default;
};
} // my

#include <iostream>
using namespace std;
auto main()
-> int
{
my::Tree t;
for( int const v : {3, 1, 4, 1, 5, 9, 2, 6, 5, 4} )
{
t.add( v );
}

t.for_each(
[]( double x ) { cout << x << ' '; }
);
cout << endl;

cout << endl;
cout << t.internal_structure() << endl;
}
[/code]

---------------------------------------------------------------------------
[output]
1 1 2 3 4 4 5 5 6 9

| | ()
| 1.000000 [height diff 2]
| | | ()
| | 1.000000 [height diff 1]
| | | | ()
| | | 2.000000 [height diff 0]
| | | | ()
3.000000 [height diff 2]
| | ()
| 4.000000 [height diff 4]
| | | | ()
| | | 4.000000 [height diff 0]
| | | | ()
| | 5.000000 [height diff 2]
| | | | | | ()
| | | | | 5.000000 [height diff 0]
| | | | | | ()
| | | | 6.000000 [height diff -1]
| | | | | ()
| | | 9.000000 [height diff -2]
| | | | ()
[/output]

Cheers, & enjoy,

- Alf

Mr Flibble

unread,

Dec 11, 2016, 12:44:45 PM12/11/16

to

Why are you using a std::stack rather than the much quicker machine
stack (i.e. why aren't you using recursion to update heights)?

/Flibble

Gareth Owen

unread,

Dec 11, 2016, 1:10:14 PM12/11/16

to

Mr Flibble <flibbleREM...@i42.co.uk> writes:

>
> Why are you using a std::stack rather than the much quicker machine
> stack (i.e. why aren't you using recursion to update heights)?

Definitely worth quoting 250 lines of code for that.

Mr Flibble

unread,

Dec 11, 2016, 1:23:29 PM12/11/16

to

Bandwidth isn't an issue these days and if your Usenet reader is any
good it will allow you to expand/collapse quoted text so in a word: meh.

/Flibble

Alf P. Steinbach

unread,

Dec 11, 2016, 4:08:54 PM12/11/16

to

On 11.12.2016 18:44, Mr Flibble wrote:
> On 11/12/2016 07:13, Alf P. Steinbach wrote:

>> [snip]

>> // Called after a new node, the `p_new_child`, has been inserted:
>> void update_height_differences(
>> Node* const p_new_child,
>> stack<Node*> parent_nodes
>> )
>> {
>> Node* p_child = p_new_child;
>> while( not parent_nodes.empty() )
>> {
>> Node* const p_parent = popped_top_of(
>> parent_nodes );
>> int& height_diff = p_parent->height_diff;
>> int const old_height_diff = height_diff;
>>
>> height_diff += (p_parent->right == p_child? +1 : -1);
>>
>> bool const this_height_increased =
>> (abs( height_diff ) > abs( old_height_diff ));
>> if( not this_height_increased )
>> {
>> break;
>> }
>> p_child = p_parent;
>> }
>> }
>>

>> [snip]

>> public:
>> void add( Value const& value )
>> {
>> stack<Node*> parent_nodes;
>> Node** p_ptr = &root_;
>> while( *p_ptr != nullptr )
>> {
>> Node*& ref_ptr = *p_ptr;
>> parent_nodes.push( ref_ptr );
>> p_ptr = &(value < ref_ptr->value? ref_ptr->left :
>> ref_ptr->right);
>> }
>> *p_ptr = new Node{ nullptr, nullptr, value, 0 };
>> update_height_differences( *p_ptr, move( parent_nodes ) );
>> }

>> [snip]

>
> Why are you using a std::stack rather than the much quicker machine
> stack (i.e. why aren't you using recursion to update heights)?

Three main reasons why I chose iterative:

* The version 1 add() was iterative. I've just added a few statements to
the original code there, to remember the parent node chain.

* I felt stupid/slow, at the end of the day for me in the morning.
Didn't feel up to rewriting the above as recursive. One would have to do
the updating on the unwind part, and since there's no picture of that
popping up in my mind even now, I guess iterative is simpler, yes?

* This way allowed a nice separation of add() and
update_height_differences(), I'm not sure how to do that separation, or
some other suitable abstraction, in a recursive version?

Also, will the machine stack (pushed arguments) necessarily be faster
than a std::stack allocated with suitable small sufficient capacity
(height of tree, = log2 N)? After all there is the call overhead.

And another possibility is to use a linked list of nodes allocated from
an small array with just free-list, which should be super-fast?

Well, maybe I as tutorial producer shouldn't be asking questions, but
hey, I'm here to learn too. :)

Cheers!,

- Alf

Melzzzzz

unread,

Dec 11, 2016, 7:48:25 PM12/11/16

to

On Sun, 11 Dec 2016 08:13:39 +0100
"Alf P. Steinbach" <alf.p.stein...@gmail.com> wrote:

> struct Node
> {
> Node* left;
> Node* right;
> Value value;
> int height_diff; // right subtree height - left
> subtree height
> };
>

How would you implement iterator without parent pointer? Why height
diff instead of just height as in AVL tree?

--
press any key to continue or any other to quit...

Jerry Stuckle

unread,

Dec 11, 2016, 9:21:08 PM12/11/16

to

On 12/11/2016 7:48 PM, Melzzzzz wrote:
> On Sun, 11 Dec 2016 08:13:39 +0100
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> wrote:
>
>> struct Node
>> {
>> Node* left;
>> Node* right;
>> Value value;
>> int height_diff; // right subtree height - left
>> subtree height
>> };
>>
> How would you implement iterator without parent pointer? Why height
> diff instead of just height as in AVL tree?
>

IMHO the best way would be to make the iterator recursive (which is
actually quite easy and doesn't take a lot of stack space).
Alternatively, you could follow down from the beginning of the tree
until you get to the current one, remembering the previous tree's
location. But that can be very time consuming if the tree has a large
number of nodes.

The easiest way is to have a pointer to the parent. It's a bit more
work to maintain the extra pointer, but not all that bad.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

Tim Rentsch

unread,

Dec 13, 2016, 6:33:35 PM12/13/16

to

"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> This tutorial, if it works (it's an experiment), is intended to work
> this way:
>
> * I post some working code.
> * Learner(s) study it and ask about things.
> * Others answer questions and post critique or get bogged down in long
> sub-threads about sausages and swearing.
>
> The following code augments the basic sorted binary tree code in part
> 1 with a difference of subtree heights in each node, which will be
> needed for balancing, and debug/exploration output of the tree
> structure, ditto -- since we're getting closer to some awesome
> complexity.

Great!

I have a generic suggestion before we get to the balancing part.
Usually trees like this are used to implement sets, not
multi-sets. The allowing of multiple keys with the same value is
really just a distraction here. I suggest simply dropping
repeated key values on the floor, or if you want to include them
put a count in each node and increment the count when you find a
duplicate.

> When tree balancing is added (in part 3?) the subtree height
> differences will naturally reduce to the range -1 ... +1, since as
> soon as a difference of magnitude 2 occurs it will be balanced
> away. And then there can possibly be some good optimizations due to
> that. I have yet to read up on balancing in e.g. Wikipedia, so I only

> have vague recollections of ?red black? trees etc., but hopefully

> there's nothing more to it than optimizing the representation,
> updating and use of the differences for a guaranteed tiny three-value
> range.

I have some comments below just on the balancing part, and will
elide (without markings) some of the rest of the code.
(Hopefully I haven't cut out anything essential.)

> [code]
> #include <stack> // std::stack
> #include <stdlib.h> // abs(int)
> #include <string> // std::(string, to_string)
> #include <utility> // std::move
>

> namespace my {
> using cppx::popped_top_of;

> using std::stack;

>
> using Value = double;
>
> class Tree
> : public cppx::No_copy_or_move
> {
> private:
> struct Node
> {
> Node* left;
> Node* right;
> Value value;
> int height_diff; // right subtree height -

> // left subtree height
> };
>
> Node* root_ = nullptr;
>

> // Called after a new node, the `p_new_child`, has been inserted:
> void update_height_differences(
> Node* const p_new_child,
> stack<Node*> parent_nodes
> )
> {
> Node* p_child = p_new_child;
> while( not parent_nodes.empty() )
> {
> Node* const p_parent = popped_top_of( parent_nodes );
> int& height_diff = p_parent->height_diff;
> int const old_height_diff = height_diff;
>
> height_diff += (p_parent->right == p_child? +1 : -1);
>
> bool const this_height_increased =
> (abs( height_diff ) > abs( old_height_diff ));
> if( not this_height_increased )
> {
> break;
> }
> p_child = p_parent;
> }
> }
>

> public:
> void add( Value const& value )
> {
> stack<Node*> parent_nodes;
> Node** p_ptr = &root_;
> while( *p_ptr != nullptr )
> {
> Node*& ref_ptr = *p_ptr;
> parent_nodes.push( ref_ptr );
> p_ptr = &(value < ref_ptr->value? ref_ptr->left : ref_ptr->right);
> }
> *p_ptr = new Node{ nullptr, nullptr, value, 0 };
> update_height_differences( *p_ptr, move( parent_nodes ) );
> }
>

> };
> } // my

This way of maintaining the balance factors uses more machinery
than it needs to.

A key idea in AVL tree update is that no stack is needed to
update the height differences. The same property holds when
no rebalancing is being done. Here is a sketch for how to
do that (ie, in the case here where there is no rebalance).

While we are walking the tree to find the insertion point, keep
track of the last node on the path where an insertion lower down
will /not/ affect the height balances of any nodes above it. A
node on the path has this property if the branch being taken is
shorter than the other branch (ie, if height_diff is greater than
zero but the insertion goes down the left side, or if height_diff
is less than zero and the insertion goes down the right side).
Remembering the last such node means all nodes further down will
have their heights increase (which means height_diff will get
further away from zero).

After we have done the insertion (which we need to be sure
actually occurs, ie, that there wasn't a duplicate key), we start
at the remembered node and walk the insertion path again,
adjusting height_diff as appropriate at each node. That is, if
the path goes down the left side, height_diff goes down by one,
or if the path goes down the right side, height_diff goes up by
one. Do that for all nodes starting at the remembered node and
stopping at the freshly added node (which of course needs
height_diff set to zero). Doing that correctly updates the
height_diff's for all nodes that need it.

When it comes time to do rebalancing, the above algorithm needs
to be adjusted slightly to take that into account, but for now
this simpler version can do the job.

Is this description clear enough, or should I try to write
some pseudo-code?

Tim Rentsch

unread,

Dec 13, 2016, 6:36:29 PM12/13/16

to

Melzzzzz <m...@zzzzz.com> writes:

> On Sun, 11 Dec 2016 08:13:39 +0100
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> wrote:
>
>> struct Node
>> {
>> Node* left;
>> Node* right;
>> Value value;
>> int height_diff; // right subtree height - left
>> subtree height
>> };
>
> How would you implement iterator without parent pointer?

That happens when we get to the "threaded" part.

A. Bolmarcich

unread,

Dec 14, 2016, 5:11:02 PM12/14/16

to

On 2016-12-13, Tim Rentsch <t...@alumni.caltech.edu> wrote:
<snip, including code, from another, that uses an explicit std::stack>

> A key idea in AVL tree update is that no stack is needed to
> update the height differences. The same property holds when
> no rebalancing is being done. Here is a sketch for how to
> do that (ie, in the case here where there is no rebalance).

<snip>

There is something needed when rebalancing an AVL tree to have changes
in subtree heights possibily cascade up to the root node of the tree.
The snipped code used an explicit std::stack. With implementations
that use recursion, there is an implicit stack due to the recursive
function calls. In implementation where each tree node has a pointer
to its parent node, recursion is not necessary, and the the pointers
to parent nodes can be explicitly used.

Tim Rentsch

unread,

Dec 14, 2016, 7:59:48 PM12/14/16

to

Even in cases where height changes propagate all the way to the
root, it is a remarkable result for AVL trees than no stack is
needed, nor are parent pointers or recursion. Inserting into an
AVL tree (with two bits per node of balance information) can be
done with only O(1) space. Along the path from the root to the
insertion point, the node farthest from the insertion point such
that all subsequent nodes along the path are balanced is the only
node where rebalancing might be needed. Because that node, which
might be called "the tipping point", can be identified during the
initial walk (and remembered for later) from the root to the
insertion point, height adjustments can be done after the new
node is inserted, by starting at the tipping point and re-walking
the child links to the newly inserted node. It is after the
downstream heights are adjusted that a rebalance may need to take
place, but only at one point in the tree, ie, at "the tipping
point" node.

Alf P. Steinbach

unread,

Dec 15, 2016, 12:29:01 AM12/15/16

to

Nice! The presented code stops propagating at the point one should
remember with this new scheme. But I just didn't think of that
optimizaton: it was more natural to me to just add a couple of lines of
tracking code.

> When it comes time to do rebalancing, the above algorithm needs
> to be adjusted slightly to take that into account, but for now
> this simpler version can do the job.

Uh huh. :)

> Is this description clear enough, or should I try to write
> some pseudo-code?

Sounds clear enough.

I guess it's time to read up on AVL and so forth.

Oh, “Adelson-Velsky Landis”, the inventors. I thought it would surely be
a silly acronym. The Wikipedia articles mentions `split` and `join`
operations, and I guess it's important to cover those.

Cheers, & thanks!,

- Alf

A. Bolmarcich

unread,

Dec 15, 2016, 2:44:20 PM12/15/16

to

On 2016-12-15, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Even in cases where height changes propagate all the way to the
> root, it is a remarkable result for AVL trees than no stack is
> needed, nor are parent pointers or recursion. Inserting into an
> AVL tree (with two bits per node of balance information) can be
> done with only O(1) space. Along the path from the root to the
> insertion point, the node farthest from the insertion point such
> that all subsequent nodes along the path are balanced is the only
> node where rebalancing might be needed. Because that node, which
> might be called "the tipping point", can be identified during the
> initial walk (and remembered for later) from the root to the
> insertion point, height adjustments can be done after the new
> node is inserted, by starting at the tipping point and re-walking
> the child links to the newly inserted node. It is after the
> downstream heights are adjusted that a rebalance may need to take
> place, but only at one point in the tree, ie, at "the tipping
> point" node.

That approach is not what I have seen as the usual described of an
AVL tree. The resulting tree may have nodes in different places
that they would be in what is usually described as an AVL tree.

Tim Rentsch

unread,

Dec 15, 2016, 4:13:43 PM12/15/16

to

I don't know what you think of as the usual description of
AVL trees, so I can't evaluate whether what you're saying
is right or not. My comments are based on the description
of AVL trees in Knuth TAOCP volume 3, Sorting and Searching.
Can you provide a reference for AVL trees that describes the
insertion process in enough detail so I can compare them?

Tim Rentsch

unread,

Dec 15, 2016, 4:40:19 PM12/15/16

to

"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

Yes, well, it isn't like I came up with the idea to start with,
it occurred to me only after working through the section on AVL
trees in Knuth.

>> When it comes time to do rebalancing, the above algorithm needs
>> to be adjusted slightly to take that into account, but for now
>> this simpler version can do the job.
>
> Uh huh. :)

As long as we're here let me describe the algorithm for finding
"the tipping point", as I call it, for AVL tree insertion, which
is pretty close to the above.

For AVL tree insertion, we do like the above and walk the path
starting at the root and down to the insertion point. In this
case though the "tipping point" node is the node along that path
that is farthest from the insertion point where all subsequent
nodes on the path are balanced. This means, fairly obviously,
that the tipping point node either is itself not balanced, or is
the node at the very top of the tree. (It may of course be
both.) In addition, besides remembering the (address of) the
tipping point node, we also need to remember the address of the
pointer that got us to the tipping point node. This second
address will be either: one of the child pointers in the node
just above the tipping point node (if the tipping point node is
not the top of the tree); or, in the case that the tipping point
node is at the top of the tree, whatever root pointer got us to
the top of the tree in the first place.

The reason we need to remember the address of that pointer is
that, if the tree needs rebalancing, that pointer needs to be
updated to point to a different node.

For now I won't say any more about updating heights or how to do
rebalancing for the AVL case, but I thought it would be helpful
to explain about finding the tipping point node in that
situation.

>> Is this description clear enough, or should I try to write
>> some pseudo-code?
>
> Sounds clear enough.
>
> I guess it's time to read up on AVL and so forth.

Before you do that you might try implementing the scheme I
sketched above, for no-rebalancing trees. That will get your
feet wet, and I think is a good introduction to the full-on
AVL insertion.

> The Wikipedia articles mentions `split` and `join`
> operations, and I guess it's important to cover those.

Before you get to split and join, you will want to do delete,
which is challenging enough. I think all the explanations of
balanced trees (of various kinds) that I've read gloss over how
to do deletion, and having implemented it now a few times I know
why: compared to deletion, insertion is a cake walk. :p

A. Bolmarcich

unread,

Dec 16, 2016, 1:12:56 PM12/16/16

to

By usual description I mean one like the first one that I learned.
That implementation was in the book "Algorithms + Data Structures =
Programs" by Niklaus Wirth. Page 218 contains:

"The process of node insertion consists essentially of the following
three consecutive parts:

1. Follow the search path until it is verified that the key is not
already in the tree.
2. Insert the new node and determine the resulting balance factor.
3. Retreat along the search path and check the balance factor at each
node."

Wirth gives an implementation in Pascal using a recursive procedure.
One of the parameters of that procedure is a var parameter of type
boolean that indicates whether the height of the subtree along the
search path has been changed. That parameter is set to true at the
bottom of the recursion if a new node has been added to the tree.

As the recursion unwinds, if the value of that parameter is true,
then the balance field (height of left subtree minus height of right
subtree) in the node at that level of the recursion is modified.
Depending on the previous value of the balance field, the value of
that parameter may be set to false.

I am not familar with the implementaion by Knuth that does not
adjust the balance while retreating along the search path.

Tim Rentsch

unread,

Dec 16, 2016, 3:00:46 PM12/16/16

to

I don't have a copy of the Wirth book handy just now, but based
on your description this algorithm should give the same tree
structure as the Knuth algorithm. How the Knuth algorithm
works is iterative rather than recursive, but they end up making
the same changes to the same nodes of the tree.