RB tree map vs. AVL map

19 views
Skip to first unread message

Daniel K. O.

unread,
Dec 13, 2004, 5:34:44 AM12/13/04
to
Hello.

I was just curious about the differences (if any) between a map
implemented using a RB-tree vs one implemented with an AVL tree (speed
and memory usage). I use GCC, so I already have a map implemented with a
RB-tree, but I was wondering if anyone has already implemented a map
with an AVL-tree. I searched on Google, but I could only find
stand-alone AVL implementations, not "std::map-compliant" ones. Does it
mean that nobody wants to implement a map with an AVL-tree, or that I
don't know how to search for it?

If anydoby could point me such map implementation, I would be very happy
to create some tests and make them available. (if not, I'll implement
one myself, but I'm not that familiar with implementing a STL-compliant
container, so it'll take some time...)

Thanks for your atention.

Daniel K. O.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Antoun Kanawati

unread,
Dec 14, 2004, 3:02:21 PM12/14/04
to
Daniel K. O. wrote:
> Hello.
>
> I was just curious about the differences (if any) between a map
> implemented using a RB-tree vs one implemented with an AVL tree (speed
> and memory usage). I use GCC, so I already have a map implemented with a
> RB-tree, but I was wondering if anyone has already implemented a map
> with an AVL-tree. I searched on Google, but I could only find
> stand-alone AVL implementations, not "std::map-compliant" ones. Does it
> mean that nobody wants to implement a map with an AVL-tree, or that I
> don't know how to search for it?

Both RB trees and AVL trees use a binary tree representation, with an
extra field in the node, denoting either color or a small integer.

Both depend on tree rotations to rebalance after insertions and
deletions.

The RB tree is a newer data structure, and, to my recollection, it keeps
a somewhat better balance than the AVL tree.

For more details, see: http://www.stanford.edu/~blp/avl/, which is a
comprehensive reference on this subject. AVL and Red-Black trees are
both in there, with significant details and analysis.
--
A. Kanawati
NO.anto...@comcast.net

David B. Held

unread,
Dec 14, 2004, 3:10:26 PM12/14/04
to
Daniel K. O. wrote:
> I was just curious about the differences (if any) between a map
> implemented using a RB-tree vs one implemented with an AVL tree (speed
> and memory usage).

I prever AVL trees because of their conceptual elegance, but it turns
out that in the real world, RB-trees have some advantages. Of course,
the primary benefit of an AVL tree is that it is always optimally
balanced. That is, you cannot create a binary tree that is more
balanced than an equivalent AVL tree, which means that you always get
the fastest search time, in terms of number of nodes visited.

Unfortunately, AVL trees typically do not contain a parent pointer,
which means that traversal is slower than with an RB-tree that has
parent pointers. That's because traversing to a sibling node requires
starting over at the root, rather than only moving up the tree as far
as you need to.

The other factor is that AVL trees are actually a bit *too* balanced.
Because they maintain the optimal balance for any given tree, they
perform rotations more often than is necessary to get good balanced
tree operation. So even though one side of an RB tree may be twice
as high as the other side, this imbalance is paid for in the reduced
number of rotations on insert and removal operations. Of course, in
specific applications, one design might be always better than the
other, but in general, these are the major reasons why RB-trees are
favored over AVL trees. If I recall, the rotations in an RB-tree are
slightly simpler because there are fewer cases to consider, but I
think it is a very minor advantage.

> [...]


> If anydoby could point me such map implementation, I would be very happy
> to create some tests and make them available. (if not, I'll implement
> one myself, but I'm not that familiar with implementing a STL-compliant
> container, so it'll take some time...)

Make sure that you have a very good reason to use AVL over RB before
you embark on this adventure. ;)

Dave

Swampmonster

unread,
Dec 14, 2004, 3:30:37 PM12/14/04
to
Daniel K. O. wrote:
> Hello.
>
> I was just curious about the differences (if any) between a map
> implemented using a RB-tree vs one implemented with an AVL tree (speed
> and memory usage). I use GCC, so I already have a map implemented with a
> RB-tree, but I was wondering if anyone has already implemented a map
> with an AVL-tree. I searched on Google, but I could only find
> stand-alone AVL implementations, not "std::map-compliant" ones. Does it
> mean that nobody wants to implement a map with an AVL-tree, or that I
> don't know how to search for it?
>
> If anydoby could point me such map implementation, I would be very happy
> to create some tests and make them available. (if not, I'll implement
> one myself, but I'm not that familiar with implementing a STL-compliant
> container, so it'll take some time...)
>
>
>
> Thanks for your atention.
>
> Daniel K. O.

google: AVL tree map std::map
gives good results.

have a look at this, maybe it's what you want:
http://www.codeproject.com/vcpp/stl/indexlist.asp?df=100&forumid=1997&exp=0&select=252038

bye, 'monster

Stephen Howe

unread,
Dec 14, 2004, 3:34:08 PM12/14/04
to

> I was just curious about the differences (if any) between a map
> implemented using a RB-tree vs one implemented with an AVL tree (speed
> and memory usage). I use GCC, so I already have a map implemented with a
> RB-tree, but I was wondering if anyone has already implemented a map
> with an AVL-tree. I searched on Google, but I could only find
> stand-alone AVL implementations, not "std::map-compliant" ones. Does it
> mean that nobody wants to implement a map with an AVL-tree, or that I
> don't know how to search for it?

The basic difference is that for RB-trees, 1-bit is required to store the
colour of a node.
For AVL trees, 2-bits are required to store whether a node is 0
(balanced), -1 (left-tree heavy) or +1 (right-tree heavy).
That is an extra bit to store compared to RB trees.

I would also expect AVL trees to faster at doing a map lower_bound() or
find() compared with RB trees at the trade-off expense of being slower to do
an insert() or erase(). For RB trees, it is the opposite.
That is because for RB trees, the maximum number of nodes from the root lies
between log N and 2 * log N (the former occuring when the path from
root-to-leaf is all black nodes, the latter occuring when there is a red
node between every pair of black nodes and a red node as the leaf). For AVL
trees, the bounds are much tighter and so it does more single or double
rotations on an insert()/erase() compared to a RB tree.

Stephen Howe

assaarpa

unread,
Dec 14, 2004, 8:35:47 PM12/14/04
to
> implemented using a RB-tree vs one implemented with an AVL tree (speed
> and memory usage). I use GCC, so I already have a map implemented with a
> RB-tree, but I was wondering if anyone has already implemented a map
> with an AVL-tree. I searched on Google, but I could only find
> stand-alone AVL implementations, not "std::map-compliant" ones. Does it
> mean that nobody wants to implement a map with an AVL-tree, or that I
> don't know how to search for it?

I haven't looked at the standard library std::map implementations of Visual
C++ .NET 2003, 2005 or g++ 3.4.2 but the performance indicates that they are
atleast something along the lines of AVL or RB tree on those
implementations. Generally I have observed that the RB tree is slightly
better performance compared to AVL tree in the elusive 'average case' (let's
say that I have done some work on testing different implementations of
binary trees fairly recently).

The sourcecode is closed source and not my property to publish publicly but
hope the conclusions confirm yours even though I don't work as software
engineer anymore (but the non disclosure clauses still apply).

ka...@gabi-soft.fr

unread,
Dec 15, 2004, 11:36:36 AM12/15/04
to
David B. Held wrote:
> Daniel K. O. wrote:

[...]


> Unfortunately, AVL trees typically do not contain a parent pointer,
> which means that traversal is slower than with an RB-tree that has
> parent pointers. That's because traversing to a sibling node
requires
> starting over at the root, rather than only moving up the tree as far
> as you need to.

I think that the parent pointer is optional in both cases, but the
complexity requirements on iterators require it in the case of std::map
and std::set.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
Dec 15, 2004, 11:50:06 AM12/15/04
to
Stephen Howe wrote:
> > I was just curious about the differences (if any) between a map
> > implemented using a RB-tree vs one implemented with an AVL tree
(speed
> > and memory usage). I use GCC, so I already have a map implemented
with a
> > RB-tree, but I was wondering if anyone has already implemented a
map
> > with an AVL-tree. I searched on Google, but I could only find
> > stand-alone AVL implementations, not "std::map-compliant" ones.
Does it
> > mean that nobody wants to implement a map with an AVL-tree, or that
I
> > don't know how to search for it?

> The basic difference is that for RB-trees, 1-bit is required to store
the
> colour of a node.
> For AVL trees, 2-bits are required to store whether a node is 0
> (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).
> That is an extra bit to store compared to RB trees.

I don't think that this is a consideration in pratical implementations.
There are two solutions concerning where to put this information:

- Portabily, you add a field to the structure: bool for a RB tree,
and
signed char for an AVL tree. This means at least 8 bits -- on my
machine, alignment constraints mean 8 bytes in both cases. On a
typical 32 bit machine (but with 64 bit doubles which must be
aligned), the situation is even more interesting. The complexity
requirements for the iterators mean that a node must contain a
parent pointer, as well as the left and right pointers. This means
that each node has three pointers: 12 bytes. Alignment
requirements
force the size of the node up to the next multiple of 8 bytes, so
you can add up to 4 bytes of additional information at no cost in
size.

- If memory is really a problem, you can often collapse information.
Thus, for example, on many 16 bit machines, all node pointers will
be even -- and with a bit of very implementation dependant casting
and other hacks, you can maintain the boolean of an RB tree on the
low order bit of a pointer. Except that since each node has three
pointers, you can maintain up to 3 bits of information using this
technique. (Of course, if the alignment constraints are larger
than
2, you have even more free bits.)

> I would also expect AVL trees to faster at doing a map lower_bound()
> or find() compared with RB trees at the trade-off expense of being
> slower to do an insert() or erase(). For RB trees, it is the
opposite.

Interesting. I'd guess that in most applications, find() outnumbers
all
of the other operations combined, sometimes by a large margin.

> That is because for RB trees, the maximum number of nodes from the
> root lies between log N and 2 * log N (the former occuring when the
> path from root-to-leaf is all black nodes, the latter occuring when
> there is a red node between every pair of black nodes and a red node
> as the leaf). For AVL trees, the bounds are much tighter and so it
> does more single or double rotations on an insert()/erase() compared
> to a RB tree.

I think the real difference is between average and worst case
performance. On an average, an RB tree will be almost as balanced as
an
AVL tree, so searches aren't any longer, and insertion and delet are
faster. The RB tree has worst case bounds, however.

It's much like the trade off between quick sort and heap sort.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Stephen Howe

unread,
Dec 17, 2004, 4:31:31 AM12/17/04
to
> > The basic difference is that for RB-trees, 1-bit is required to store
> the
> > colour of a node.
> > For AVL trees, 2-bits are required to store whether a node is 0
> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).
> > That is an extra bit to store compared to RB trees.
>
> I don't think that this is a consideration in pratical implementations.
> There are two solutions concerning where to put this information:

There is a another possibility forAVL trees. You can restrict yourself to
one-sided AVL trees. insert() and erase() become more complicated but can
still be done in O(log N) (I guess the constant is bigger).

> - Portabily, you add a field to the structure: bool for a RB tree,

> [snip]


> 2, you have even more free bits.)

Thanks. I have seen both these solutions.

> > I would also expect AVL trees to faster at doing a map lower_bound()
> > or find() compared with RB trees at the trade-off expense of being
> > slower to do an insert() or erase(). For RB trees, it is the
> opposite.

Putting figures:

Worst and Best case height bounds of an AVL tree is:
ceil(log2(N) + 1) <= h(N) <= 1.44log2(N+2) - 0.33
Maximum number of rotations to restore balancing on insert/delete for a tree
of height h:
ceil(h(N)/2)

Worst and Best case height of an Red-Black tree is:
floor(log2(N+1)) <= h(N) <= 2 * log2 (N+1)
Inserts can be done with a maximum of 2 rotations and O(log(N)) recolourings
Deletes can be done with a maximum of 3 rotations and O(log(N)) recolourings

The low number of rotations of Red-Black trees compared with AVL trees
needed to maintain the balance constraints gives it the edge for inserts and
deletes.

Stephen Howe

Hans Bos

unread,
Dec 19, 2004, 7:41:04 PM12/19/04
to
"Stephen Howe" <stephenPOINThoweAT...@eu.uu.net> wrote in message
news:41c20336$0$19158$ed9e...@reading.news.pipex.net...

>> > The basic difference is that for RB-trees, 1-bit is required to store
>> the
>> > colour of a node.
>> > For AVL trees, 2-bits are required to store whether a node is 0
>> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).
>> > That is an extra bit to store compared to RB trees.
>>

You only need 1 bit to store the balance of an AVL tree node,
if you store the bit in the subtree.
If a subtree is heavy, you mark it with heavy = 1,
otherwise mark it with heavy = 0.
The balanceof a node is heavy(right) - heavy(left)
...


> Putting figures:
>
> Worst and Best case height bounds of an AVL tree is:
> ceil(log2(N) + 1) <= h(N) <= 1.44log2(N+2) - 0.33
> Maximum number of rotations to restore balancing on insert/delete for a tree
> of height h:
> ceil(h(N)/2)

Inserting in an AVL tree takes at most 1 (double) rotation.

>
> Worst and Best case height of an Red-Black tree is:
> floor(log2(N+1)) <= h(N) <= 2 * log2 (N+1)
> Inserts can be done with a maximum of 2 rotations and O(log(N)) recolourings
> Deletes can be done with a maximum of 3 rotations and O(log(N)) recolourings
>
> The low number of rotations of Red-Black trees compared with AVL trees
> needed to maintain the balance constraints gives it the edge for inserts and
> deletes.
>

If you look per operation only delete operations needs more
rotations in an avl tree.
According to Knuth's The Art of Computer Programing,
on avarage, you need +/- 0.21 rotations per delete. operation..

The worst case for both avl and red-black trees is still O(log n) operations
(adjusting balance factor or recoloring).

Greetings,
Hans.

Daniel K. O.

unread,
Dec 29, 2004, 3:47:19 AM12/29/04
to
Daniel K. O. escreveu:

>(if not, I'll implement
>one myself, but I'm not that familiar with implementing a STL-compliant
>container, so it'll take some time...)
>
>

I finally got time to work on this.
I found a few avl-map implementations, but all of them had "problems", like

* Used recursive algorithms for insertion/deletion;
* Stored the node's height instead of the balancing factor, so each
insertion/deletion required a height computation;
* Simply didn't had all the required std::map members, so it couldn't be
used like a std::map substitute.

So I changed the libstdc++'s RB-tree implementation a bit, and now I got
this:
http://www.inf.ufrgs.br/~dkosmari/avlmap/

The headers "map.h", "stl_map.h" and "stl_tree.h" are mostly the
original GCC headers.
The tree.cpp has the insertion/deletion implementation, that I wrote
from scratch, inspired a bit by the original RB-tree insertion/deletion
code.
"main.cpp" is the test program.

Everything is working fine but the "_avl_tree_rebalance_for_erase()"
function: it's corrupting the balancing factors. :(

The first half of the code is still the original (from the RB-tree), so
I'm (almost) sure the problem is in the second half (the "rebalancing"
code, that I wrote myself). But I already spent some hours trying to
debug it, without any success. This is the first time I'm implementing
an AVL tree, so maybe I'm doing something stupid without noticing. I
would be very happy if anyone could help me finding the problem. The
other AVL implementations I read until now couldn't help me on this.

PS: sorry if this post should go a generic algorithms group, but as I
started the thread here, and this is the only thing stopping me from
closing this thread (RB vs AVL). Also, when this implementation is
finished, it may be useful for other C++ programmers.

PS2: many thanks to all who replied my original message, for your
insightful comments.

Daniel K. O.

unread,
Jan 11, 2005, 5:39:57 AM1/11/05
to
Hello everybody.

Sorry for taking so long to end this thread. I finally had time and
motivation to fix the deletion function, so it's working now. :)

All the files: http://www.inf.ufrgs.br/~dkosmari/avlmap/
Everything in a tar.gz archive:
http://www.inf.ufrgs.br/~dkosmari/avlmap.tar.gz


Implementation details:

This is a map/multimap implementation using an AVL tree; almost all the
code (not the insertion/deletion routines) was copied from the libstdc++
(I'm using MinGW/GCC 3.4.3), so it should work without any problem with GCC.
The code still needs a bit of cleanup; the map/multimap/tree containers
are on the "ext" namespace.
AFAIK this is the only std::map implementation using an AVL tree; ok,
it's for GCC only, but at least it works like a std::map (actually it
*is* a std::map). The best implementations I found were incomplete (e.g.
no rbegin/rend, no lower_bound, etc).


Tests:

In the archive there's a main.cpp file to test some operations (sorry, I
don't know how to do benchmarks...). Unfortunately, the
insertion/deletion operations involve memory allocation/deallocation; so
to minimize this overhead I only used integers (and not big objects,
like strings). My system is Win98 SE on an Athlon XP 2400+ with 256 MB.
For compilation flags, check the Makefile.win file. I didn't use
optimization flags because I didn't know if the libstdc++ was compiled
with optimizations; considering the numbers, I'm guessing it was
compiled without optimizations (the only files "compiled" is the
libstdc++ "tree.cpp" and my "tree.cpp").
Here are the numbers:
-----------------------
Test: sorted_insertion (+)
AVL: 1261779 (1057.49 ms)
RB: 1770125 (1483.54 ms)

Test: sorted_insertion (-)
AVL: 1322594 (1108.46 ms)
RB: 1839192 (1541.42 ms)

Test: random_insertion
AVL: 1396518 (1170.42 ms)
RB: 1423864 (1193.34 ms)

Test: traverse (+)
AVL: 8230 (6.89753 ms)
RB: 7742 (6.48854 ms)

Test: traverse (-)
AVL: 10588 (8.87377 ms)
RB: 9859 (8.26279 ms)

Test: random_deletion
AVL: 471595 (395.242 ms)
RB: 474115 (397.354 ms)
---------------------


Results:

Looks like a sorted insertion in an RB-tree is slower. This is the first
test I tried, so I tested it with different optimization levels (from
-O1 to -O3, with and without -fexpensive-optimizations), and the
AVL-tree always was at least 200 ms faster (~400 ms vs ~600 ms). OTOH
the random insertion test was identical with both trees. (on this
specific report, the AVL test was "faster", but most of times the RB
tree was "faster")

Traversing was a bit strange: the code is exactly the same (just a small
difference for AVL) for both implementations (*_tree_increment and
*_tree_decrement, in tree.cpp), but the RB was always "faster", no
matter what optimization flags I used.

What the tests show is: nothing. :P Seems that the conditions aren't the
same and I can't reproduce the same compilation. I think I would have to
recompile my libstdc++ on my system to perform a fair test. This would
be easier to do on a GNU/ Linux box, and I don't have one installed here
right now. I can say that there's a difference between the two trees
(some tests have the same result, some have different), but I don't know
which one is faster (because I don't know which one was compiled with
more optimizations).


So:
* Could anyone try the test on a GNU/Linux box with a "fair"
environment (compile both libstdc++ and my code with the same flags) and
see if there's any difference?
* Comments about my code are welcome. :)
* I would be a bit sad if so many hours spent coding this were to be
wasted; is there any GCC hacker that would like to put this code into
libstdc++ / ext? (well, maybe some more tests may show it's faster than
the RB tree...)


Thank you all for your attention.

Reply all
Reply to author
Forward
0 new messages