Tutorial on threaded binary tree part 1: simple unthreaded tree

Alf P. Steinbach

unread,

Dec 1, 2016, 4:35:48 AM12/1/16

to

This tutorial, if it works (it's an experiment), is intended to work
this way:

* I post some working code.
* Learner(s) study it and ask about things.
* Others answer questions and post critique or get bogged down in long
sub-threads about sausages and swearing.

The following code implements a simple sorted binary tree with traversal.

There's no attempt at balancing, so this code does not deal nicely with
sorted input in the big O sense. Random input is the thing here. I used
the digits of pi.

[code]
namespace cppx {
struct No_copy_or_move
{
auto operator=( No_copy_or_move const& ) -> No_copy_or_move& =
delete;
auto operator=( No_copy_or_move&& ) -> No_copy_or_move& = delete;

No_copy_or_move() = default;
No_copy_or_move( No_copy_or_move const& ) = delete;
No_copy_or_move( No_copy_or_move&& ) = delete;
};
} // namespace cppx

namespace my {

using Value = double;

class Tree
: public cppx::No_copy_or_move
{
private:
struct Node
{
Node* left;
Node* right;
Value value;
};

Node* root_ = nullptr;

template< class Func >
static void apply_in_infix_order( Node* root, Func const& f )
{
if( root != nullptr )
{
apply_in_infix_order( root->left, f );
f( root->value );
apply_in_infix_order( root->right, f );
}
}

public:
void add( Value const& value )
{
Node** p_ptr = &root_;
while( *p_ptr != nullptr )
{
Node*& ref_ptr = *p_ptr;
p_ptr = &(value < ref_ptr->value? ref_ptr->left :
ref_ptr->right);
}
*p_ptr = new Node{ nullptr, nullptr, value };
}

template< class Func >
void for_each( Func const& f )
{
apply_in_infix_order( root_, f );
}

Tree() = default;
};
} // my

#include <iostream>
using namespace std;
auto main()
-> int
{
my::Tree t;
for( int const v : {3, 1, 4, 1, 5, 9, 2, 6, 5, 4} )
{
t.add( v );
}
t.for_each(
[]( double x ) { cout << x << ' '; }
);
cout << endl;
}
}
[/code]

Enjoy,

- Alf

Mr Flibble

unread,

Dec 1, 2016, 3:23:29 PM12/1/16

to

Without balancing your tree is as good as useless; your post was totally
pointless.

/Flibble

Richard

unread,

Dec 1, 2016, 4:34:41 PM12/1/16

to

[Please do not mail me a copy of your followup]

I believe you have said in previous threads that you like it for
"consistency" (which you don't seem to apply consistently throughout
even this small code sample), but the use of auto deduced return types
for methods and functions here feels gratuitous. It doesn't add any
clarity but comes at the expense of more tokens I have to scan through
in order to see what is happening.

"Everything should be made as simple as possible, but no simpler."
(Attributed to Einstein, <https://en.wikiquote.org/wiki/Albert_Einstein#1920s>)

To my mind, that means writing things in the simplest form with as
few tokens as possible.

It's why we write ++i instead of i = i + 1 and if (predicate) instead
of if (predicate == true). In both cases, the former is simpler and
expresses the exact same semantics.

Slavishly using auto and trailing return types on functions/methods
(and not even consistently throughout) just takes something simple
and makes it more complicated without any benefit.

Yes, it's a matter of style and not correctness, so your opinion may
differ -- I assume it does as you wrote it that way. Consider that when
we write code, we should think of the next person that is reading it
and not use code as an attempt to inculcate the rest of the world into
using our personal preferences.

Given that matters of style are personal taste, barring other
operational or security considerations, my tendency is to borrow the
style of Kernighan and Ritchie when writing code in C/C++. There are
many stylistic fads and opinions which differ from their style and I
uniformly have found them all to be of no benefit, or at best they
solve a problem in the wrong way. Your code exhibits one or two of
these tendencies but I don't consider them to be worth elevating to a
point of discussion as much as the gratuitous use of auto deduced
return types.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Jerry Stuckle

unread,

Dec 1, 2016, 4:54:50 PM12/1/16

to

On 12/1/2016 3:23 PM, Mr Flibble wrote:
> On 01/12/2016 09:32, Alf P. Steinbach wrote:
>> This tutorial, if it works (it's an experiment), is intended to work
>> this way:
>>
>> * I post some working code.
>> * Learner(s) study it and ask about things.
>> * Others answer questions and post critique or get bogged down in long
>> sub-threads about sausages and swearing.
>>
>> The following code implements a simple sorted binary tree with traversal.
>>
>> There's no attempt at balancing, so this code does not deal nicely with
>> sorted input in the big O sense. Random input is the thing here. I used
>> the digits of pi.
>

> Without balancing your tree is as good as useless; your post was totally
> pointless.
>
> /Flibble
>

No, it's not. This is a start and builds the tree correctly. Balancing
can come later. It's just a matter of adding about 4 functions and call
them from the appropriate places. The existing code will still work.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

Mr Flibble

unread,

Dec 1, 2016, 5:24:12 PM12/1/16

to

+1

(Alf will now go off in a strop and write a Hello, World! program so he
can write "auto main() -> int" again like some crazy OCD cat person.)

/Flibble

Alf P. Steinbach

unread,

Dec 1, 2016, 6:35:48 PM12/1/16

to

On 01.12.2016 22:34, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> I believe you have said in previous threads that you like it for
> "consistency" (which you don't seem to apply consistently throughout
> even this small code sample), but the use of auto deduced return types
> for methods and functions here feels gratuitous. It doesn't add any
> clarity but comes at the expense of more tokens I have to scan through
> in order to see what is happening.

There are no deduced return types in this code.

Mainly because I mostly agree with your sentiment here. :)

[snip]

> Slavishly using auto and trailing return types on functions/methods
> (and not even consistently throughout) just takes something simple
> and makes it more complicated without any benefit.

Re consistency, this code is totally consistent:

* `void` for Pascal /procedures/ (no expression result functions).
* `auto` for Pascal functions (functions that return value).

:)

But the 100% consistency here is just a coincidence, due to the
shortness of the code. I do not believe 100% consistency is good! In the
end, I believe, this boils down to the reason why programming can't be
automated: we need humans to supply intelligence.

Dang, there was a good quote about consistency, I've forgotten it...

Cheers!,

- Alf

Richard

unread,

Dec 1, 2016, 6:57:21 PM12/1/16

to

[Please do not mail me a copy of your followup]

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
<o1qc1t$3nc$1...@dont-email.me> thusly:

>On 01.12.2016 22:34, Richard wrote:

>> [...] but the use of auto deduced return types

>> for methods and functions here feels gratuitous.

>There are no deduced return types in this code.

OK, consider that nit picked. s/deduced/trailing/.

>Re consistency, this code is totally consistent:
>
>* `void` for Pascal /procedures/ (no expression result functions).
>* `auto` for Pascal functions (functions that return value).

Again, back to the point of the audience that is reading this code.
The fact that you had to explain your "consistency" is buttressing my
assertion that this style is leading to less clarity and not more.

When I read the code, I see some things using trailing return types
and some things not using trailing return types.

If using auto is good enough for trailing type int, why isn't it
good enough for trailing type void?

If using classic return type without auto is good enough for void,
why isn't it good enough for int?

Etc.

Again, this style just makes me think "someone is all excited about
the new syntax of trailing return types and is using it gratuitously"
instead of using it where it adds clarity as in type deduction of a
return type from an expression using types of input arguments.

Alf P. Steinbach

unread,

Dec 1, 2016, 7:17:28 PM12/1/16

to

On 02.12.2016 00:57, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
> <o1qc1t$3nc$1...@dont-email.me> thusly:
>
>> On 01.12.2016 22:34, Richard wrote:
>>> [...] but the use of auto deduced return types
>>> for methods and functions here feels gratuitous.
>
>> There are no deduced return types in this code.
>
> OK, consider that nit picked. s/deduced/trailing/.

There is world of difference.

>> Re consistency, this code is totally consistent:
>>
>> * `void` for Pascal /procedures/ (no expression result functions).
>> * `auto` for Pascal functions (functions that return value).
>
> Again, back to the point of the audience that is reading this code.
> The fact that you had to explain your "consistency" is buttressing my
> assertion that this style is leading to less clarity and not more.
>
> When I read the code, I see some things using trailing return types
> and some things not using trailing return types.
>
> If using auto is good enough for trailing type int, why isn't it
> good enough for trailing type void?

For the very common single case of `void` functions, `auto` just adds
verbosity (for member function implementations it can drastically reduce
verbosity, but for simple `void` functions it adds wordage).

Also, as I see it it's very nice to have this class of functions singled
out visually, as with the Pascal keyword `procedure` (which had that
exact purpose: to single them out). Because in the basic meaning the
`void` functions have a very different purpose, namely to /do/ something
rather than /compute/ something. Of course it's possible to use either
class of function for the opposite purpose, with just more awkward
notation, but that's the basis – and via the C++11 and later support for
move semantics, C++ now increasingly supports the style where
computations yield function results, used in /expressions/.

Most computer scientist, I believe, view that usage as the ideal, that a
routine that computes something should return that as its function
result, and therefore agree that the C conflation of procedure and
function was a bad choice. In original C one would have to let those
procedures return `int`, either explicitly or via C implicit int. It was
ugly, the code saying something different than the intention.

> If using classic return type without auto is good enough for void,
> why isn't it good enough for int?

Singling out `void`, procedures, is doable and has a reason (which IMO
is strong and good).

Not so for `int`.

But people have argued that `int main()` is so idiomatic that it just
feels wrong and perplexing to see it expressed with `auto`. I do that
for consistency. And also, to introduce the syntax to more people. :)

Cheers!,

- Alf

Alf P. Steinbach

unread,

Dec 1, 2016, 9:18:22 PM12/1/16

to

On 01.12.2016 21:23, Mr Flibble wrote:
> On 01/12/2016 09:32, Alf P. Steinbach wrote:

[snip]

>> void add( Value const& value )
>> {
>> Node** p_ptr = &root_;
>> while( *p_ptr != nullptr )
>> {
>> Node*& ref_ptr = *p_ptr;
>> p_ptr = &(value < ref_ptr->value? ref_ptr->left :
>> ref_ptr->right);
>> }
>> *p_ptr = new Node{ nullptr, nullptr, value };
>> }
>>

[snip]

>
> Without balancing your tree is as good as useless; your post was totally
> pointless.

The intention of a “part 1”, implying a later “part 2”, and so on, was
to establish a baseline and see if the idea of generating discussion,
rather than providing it, panned out.

I agree that balancing is crucial for dealing with sorted or mostly
sorted input, to avoid quadratic accumulated insertion time.

And there's one even more special case to consider: the case of a sorted
input that is a sequence of equal values. Here simple balancing doesn't
help, because a sequence of equal values always becomes a degenerate
tree, a single branch of right-pointers (or left-pointers, depending on
one's choice), that cannot be balanced up. So ideally, to avoid square
time also for this special case, the `add` routine should be modified to
not descend down such a chain of equal value nodes.

Possibilities include:

• Inserting a new node with value V at the very top of an existing chain
of V, reducing the insertion complexity to logarithmic.

• Adding a value count in each node, and just incrementing it.
This precludes using the tree to associate different info with each
key V.

• Treating the tree as a simple set, and failing or doing nothing if V
already exists.

I think there may be some complexity hidden in the first possibility.

But anyway, as you can see, avoiding square time /in general/ so as to
make the structure generally useful, involves a decision about what the
tree is used for, and modifying the `add` routine accordingly:

a set (last bullet), a multiset (middle bullet), or a multimap (first
bullet)?

Cheers!,

- Alf

Daniel

unread,

Dec 1, 2016, 11:13:07 PM12/1/16

to

On Thursday, December 1, 2016 at 6:35:48 PM UTC-5, Alf P. Steinbach wrote:
>
> Dang, there was a good quote about consistency, I've forgotten it...
>

One of these?

“A foolish consistency is the hobgoblin of little minds"

- Ralph Waldo Emerson

"Do I contradict myself? Very well, then I contradict myself, I am large, I contain multitudes."

- Walt Whitman

Alf P. Steinbach

unread,

Dec 1, 2016, 11:56:16 PM12/1/16

to

Yep. Thanks!

Cheers!,

- Alf

leigh.v....@googlemail.com

unread,

Dec 2, 2016, 8:46:18 AM12/2/16

to

Adding multiple identical keys should be no problem for any self balancing search tree otherwise we wouldn't have std::multiset and std::multimap.

/leigh

Mr Flibble

unread,

Dec 2, 2016, 10:32:53 AM12/2/16

to

Perhaps you are talking about guaranteeing the stability of a sequence
of duplicate keys? In which case an incrementing counter approach will
mean your insert operation cannot guarantee logarithmic complexity any
longer. The correct approach to implementing a binary search tree that
guarantees stability of duplicate key order is to make it a hybrid data
structure that also includes a linked list: this approach offers other
advantages: iterator increment/decrement changes from logarithmic
complexity to constant time.

/Flibble

Mr Flibble

unread,

Dec 2, 2016, 10:56:07 AM12/2/16

to

On 02/12/2016 02:15, Alf P. Steinbach wrote:

You are wrong about how multiset and multimap differ: certainly they do
not correspond to your bullet points. multiset and multimap have
identical data structures: the only difference is multimap value_type is
a pair in which the key is the first part.

/Flibble

Mr Flibble

unread,

Dec 2, 2016, 11:15:28 AM12/2/16

to

Obviously if you have a balancing scheme that does not alter sort order
of nodes (e.g. red-black tree rotation that most std:: node based
container implementations use) then stability of equivalent keys and
logarithmic complexity for insert can be guaranteed.

/Flibble

Öö Tiib

unread,

Dec 2, 2016, 11:30:49 AM12/2/16

to

Not a problem but it takes some time to write and to test
otherwise we wouldn't have container templates.
There are different requirements so the underlying
implementation of such templates is often rather different.
Compare things like std::(unordered_)(multi)map(set,
boost::flat_(multi)map/set and boost::intrusive::set.
Lot of those don't even have tree underneath.

Implementing threaded binary tree sounds like not bad
idea as it allows to optimize out parent_ pointer in tree
node; fastens iterating over container up and reduces
potential need for recursive algorithms or fat iterators.
However that is theory ... in practice it looks like a
complex beast and so I haven't profiled one.

Richard

unread,

Dec 2, 2016, 5:06:14 PM12/2/16

to

[Please do not mail me a copy of your followup]

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code

<o1qeg2$a8c$1...@dont-email.me> thusly:

>[...] And also, to introduce the syntax to more people. :)

Yes. Gratuitous use of new syntax.

ruben safir

unread,

Dec 2, 2016, 5:39:54 PM12/2/16

to

On 12/01/2016 03:23 PM, Mr Flibble wrote:
>> [/code]
>
> Without balancing your tree is as good as useless; your post was totally
> pointless.

not true

ruben safir

unread,

Dec 2, 2016, 5:45:35 PM12/2/16

to

On 12/01/2016 04:34 PM, Richard wrote:
> It's why we write ++i instead of i = i + 1 and if (predicate) instead
> of if (predicate == true). In both cases, the former is simpler and
> expresses the exact same semantics.

no, actually

Ian Collins

unread,

Dec 2, 2016, 5:48:24 PM12/2/16

to

How so?

--
Ian

ruben safir

unread,

Dec 2, 2016, 5:49:44 PM12/2/16

to

On 12/02/2016 05:48 PM, Ian Collins wrote:
>> no, actually
>
> How so?

it is undefined on the order of evaluation

Ian Collins

unread,

Dec 2, 2016, 5:52:34 PM12/2/16

to

What is?

--
Ian

Alf P. Steinbach

unread,

Dec 2, 2016, 7:01:28 PM12/2/16

to

On 02.12.2016 16:55, Mr Flibble wrote:
>>
[snip]

>> Possibilities include:
>>
>> • Inserting a new node with value V at the very top of an existing chain
>> of V, reducing the insertion complexity to logarithmic.
>>
>> • Adding a value count in each node, and just incrementing it.
>> This precludes using the tree to associate different info with each
>> key V.
>>
>> • Treating the tree as a simple set, and failing or doing nothing if V
>> already exists.
>>
>> I think there may be some complexity hidden in the first possibility.
>>
>> But anyway, as you can see, avoiding square time /in general/ so as to
>> make the structure generally useful, involves a decision about what the
>> tree is used for, and modifying the `add` routine accordingly:
>>
>> a set (last bullet), a multiset (middle bullet), or a multimap (first
>> bullet)?
>
> You are wrong about how multiset and multimap differ: certainly they do
> not correspond to your bullet points. multiset and multimap have
> identical data structures: the only difference is multimap value_type is
> a pair in which the key is the first part.

Hm, the above two sentences contradict each other. :)

The last sentence even contradicts itself.

As you note in the last part of that sentence, with a multimap multiple
occurrences of the same key need to be distinguished, because they can
be associated with different values.

Therefore, contrary to the assertion in the first part of that sentence,
they do not naturally have “identical structures”. There are
interpretations where this assertion holds, but they are meaningless.
For example, one structure can be implemented in terms of the other,
they can both be implemented in terms of arrays, and so on etc.; there
is no meaning in that, and it does not relate to the bullet points.

With a multiset you only need a count of each value.

With a multimap you need to distinguish each value, even identical ones,
because they can have different associated information.

Cheers & hth.,

- Alf

Jerry Stuckle

unread,

Dec 2, 2016, 7:44:29 PM12/2/16

to

It is perfectly defined. The addition is performed before the
assignment. The variable is only changed once - in the assignment.
There is no undefined operation.

Now there is undefined behavior in the expression i = i++ + 1. But that
is a different situation.

Melzzzzz

unread,

Dec 2, 2016, 7:51:56 PM12/2/16

to

You are wrong. Value is completely non essential for that data
structure. With that said, I can't figure out purpose of multiset at
all... but generally you are right. multiset does not needs to store
actual nodes. count and key are enough.

--
press any key to continue or any other to quit

Mr Flibble

unread,

Dec 2, 2016, 8:54:51 PM12/2/16

to

Wrong; again multiset and multimap are identical data structures both
being a binary search tree with a node for each element irregardless of
whether there are any duplicate keys. The only difference between
multiset and multimap is the key is part of a pair for multimap.

/Flibble

Mr Flibble

unread,

Dec 2, 2016, 9:07:36 PM12/2/16

to

Basically for a std::multiset the key_type is ALSO the value_type; if
the multiset has 10 equivalent keys it must have 10 equivalent values
(elements) i.e. NOT a single value (element) with a count of 10. Keys
that are equivalent do not have to be identical (comparable with ==
rather than <) and indeed can be complex objects with mutable state that
does not contribute to order.

I am surprised Alf that someone with your experience doesn't know how
std::multiset works or how it needs to be implemented.

/Flibble

Alf P. Steinbach

unread,

Dec 2, 2016, 9:07:56 PM12/2/16

to

A map data structure is a collection of (key, value) pairs. For example,
the keys can be student id's, and the values can be the grades they have
in some course. Or the keys can be main country codes, and the values
codes for the languages spoken there, like ("NO", "NB") and ("NO", "NN").

The “multi” in /multimap/ means that the same key can occur twice or
more, then generally with different associated values for each
occurrence, as in the country/language code example.

For more details see <url:
http://en.cppreference.com/w/cpp/container/multimap>.

> With that said, I can't figure out purpose of multiset at
> all...

:) See above.

> but generally you are right. multiset does not needs to store
> actual nodes. count and key are enough.

Well, that depends.

I was referring to concepts and common usage.

However, `std::multiset`, as opposed to the mathematical idea of
multiset, guarantees that the order of the keys that compare equivalent
is the order of insertion and does not change, i.e. it provides a stable
sort. To make sense of that -- the /order/ of equivalent key values,
what? -- you need to know that `std::multiset` can work with a custom
comparison function that can use only part of the value stored, as key.
And for this case Mr. Flibble would be right, if that was what he meant.
But then I think he would have mentioned it.

A count is not sufficient for the general functionality of `std::multiset`.

But a tree with just a count for each value, can implement a multiset
(although not the full general `std::multiset` in the C++ standard
library), and can not implement a multimap of any kind.

Cheers!,

- Alf

Alf P. Steinbach

unread,

Dec 2, 2016, 9:14:08 PM12/2/16

to

On 03.12.2016 03:07, Mr Flibble wrote:
>
> I am surprised Alf that someone with your experience doesn't know how
> std::multiset works or how it needs to be implemented.

Oh, thanks for compliment. :)

But I wasn't referring to `std::multiset` specifically: I wasn't
considering what data structure you need to implement `std::multiset`.

After listing three possibilities I noted that with the middle one, with
(only) a count for each value, the most you can use that tree for, or at
least the natural application, is /a multiset/.

That's general, programming language-independent terminology.

Mr Flibble

unread,

Dec 2, 2016, 9:17:57 PM12/2/16

to

This is a C++ newsgroup not a mathematics newsgroup so if someone uses
the terms "multiset" and "multimap" it is a fair assumption that they
are actually referring to the C++ Standard Library containers rather
than the mathematical concepts: you should have been more clear about
what you meant by providing appropriate qualifications.

/Flibble

Öö Tiib

unread,

Dec 3, 2016, 4:31:56 AM12/3/16

to

Thanks guys for interesting sub-thread about different ways how to
organize the bucket of values that compare equal in a tree. Slight
miscommunication is unfortunately inevitable since it is impossible
to be precise in human language without becoming unreadable.

Tim Rentsch

unread,

Dec 10, 2016, 1:05:31 AM12/10/16

to

"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> This tutorial, if it works (it's an experiment), is intended to work
> this way:
>
> * I post some working code.
> * Learner(s) study it and ask about things.
> * Others answer questions and post critique or get bogged down in long
> sub-threads about sausages and swearing.
>
> The following code implements a simple sorted binary tree with traversal.
>
> There's no attempt at balancing, so this code does not deal nicely
> with sorted input in the big O sense. Random input is the thing
> here. I used the digits of pi.

I'm wondering if/when you might show a version with threaded
trees and/or balancing.

The iterative algorithm for AVL trees is somewhat tricky.
Combining that with threaded leaves is trickier still. I was
able to do a recursive AVL-like insertion fairly easily (with
nodes having an explicit height field) but a more traditional
iterative algorithm proved to be more difficult.