19 views

Skip to first unread message

Dec 13, 2004, 5:34:44 AM12/13/04

to

Hello.

I was just curious about the differences (if any) between a map

implemented using a RB-tree vs one implemented with an AVL tree (speed

and memory usage). I use GCC, so I already have a map implemented with a

RB-tree, but I was wondering if anyone has already implemented a map

with an AVL-tree. I searched on Google, but I could only find

stand-alone AVL implementations, not "std::map-compliant" ones. Does it

mean that nobody wants to implement a map with an AVL-tree, or that I

don't know how to search for it?

If anydoby could point me such map implementation, I would be very happy

to create some tests and make them available. (if not, I'll implement

one myself, but I'm not that familiar with implementing a STL-compliant

container, so it'll take some time...)

Thanks for your atention.

Daniel K. O.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

[ comp.lang.c++.moderated. First time posters: Do this! ]

Dec 14, 2004, 3:02:21 PM12/14/04

to

Daniel K. O. wrote:

> Hello.

>

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

> Hello.

>

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

Both RB trees and AVL trees use a binary tree representation, with an

extra field in the node, denoting either color or a small integer.

Both depend on tree rotations to rebalance after insertions and

deletions.

The RB tree is a newer data structure, and, to my recollection, it keeps

a somewhat better balance than the AVL tree.

For more details, see: http://www.stanford.edu/~blp/avl/, which is a

comprehensive reference on this subject. AVL and Red-Black trees are

both in there, with significant details and analysis.

--

A. Kanawati

NO.anto...@comcast.net

Dec 14, 2004, 3:10:26 PM12/14/04

to

Daniel K. O. wrote:

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage).

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage).

I prever AVL trees because of their conceptual elegance, but it turns

out that in the real world, RB-trees have some advantages. Of course,

the primary benefit of an AVL tree is that it is always optimally

balanced. That is, you cannot create a binary tree that is more

balanced than an equivalent AVL tree, which means that you always get

the fastest search time, in terms of number of nodes visited.

Unfortunately, AVL trees typically do not contain a parent pointer,

which means that traversal is slower than with an RB-tree that has

parent pointers. That's because traversing to a sibling node requires

starting over at the root, rather than only moving up the tree as far

as you need to.

The other factor is that AVL trees are actually a bit *too* balanced.

Because they maintain the optimal balance for any given tree, they

perform rotations more often than is necessary to get good balanced

tree operation. So even though one side of an RB tree may be twice

as high as the other side, this imbalance is paid for in the reduced

number of rotations on insert and removal operations. Of course, in

specific applications, one design might be always better than the

other, but in general, these are the major reasons why RB-trees are

favored over AVL trees. If I recall, the rotations in an RB-tree are

slightly simpler because there are fewer cases to consider, but I

think it is a very minor advantage.

> [...]

> If anydoby could point me such map implementation, I would be very happy

> to create some tests and make them available. (if not, I'll implement

> one myself, but I'm not that familiar with implementing a STL-compliant

> container, so it'll take some time...)

Make sure that you have a very good reason to use AVL over RB before

you embark on this adventure. ;)

Dave

Dec 14, 2004, 3:30:37 PM12/14/04

to

Daniel K. O. wrote:

> Hello.

>

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

>

> If anydoby could point me such map implementation, I would be very happy

> to create some tests and make them available. (if not, I'll implement

> one myself, but I'm not that familiar with implementing a STL-compliant

> container, so it'll take some time...)

>

>

>

> Thanks for your atention.

>

> Daniel K. O.

> Hello.

>

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

>

> If anydoby could point me such map implementation, I would be very happy

> to create some tests and make them available. (if not, I'll implement

> one myself, but I'm not that familiar with implementing a STL-compliant

> container, so it'll take some time...)

>

>

>

> Thanks for your atention.

>

> Daniel K. O.

google: AVL tree map std::map

gives good results.

have a look at this, maybe it's what you want:

http://www.codeproject.com/vcpp/stl/indexlist.asp?df=100&forumid=1997&exp=0&select=252038

bye, 'monster

Dec 14, 2004, 3:34:08 PM12/14/04

to

> I was just curious about the differences (if any) between a map

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

The basic difference is that for RB-trees, 1-bit is required to store the

colour of a node.

For AVL trees, 2-bits are required to store whether a node is 0

(balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

That is an extra bit to store compared to RB trees.

I would also expect AVL trees to faster at doing a map lower_bound() or

find() compared with RB trees at the trade-off expense of being slower to do

an insert() or erase(). For RB trees, it is the opposite.

That is because for RB trees, the maximum number of nodes from the root lies

between log N and 2 * log N (the former occuring when the path from

root-to-leaf is all black nodes, the latter occuring when there is a red

node between every pair of black nodes and a red node as the leaf). For AVL

trees, the bounds are much tighter and so it does more single or double

rotations on an insert()/erase() compared to a RB tree.

Stephen Howe

Dec 14, 2004, 8:35:47 PM12/14/04

to

> implemented using a RB-tree vs one implemented with an AVL tree (speed

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

> and memory usage). I use GCC, so I already have a map implemented with a

> RB-tree, but I was wondering if anyone has already implemented a map

> with an AVL-tree. I searched on Google, but I could only find

> stand-alone AVL implementations, not "std::map-compliant" ones. Does it

> mean that nobody wants to implement a map with an AVL-tree, or that I

> don't know how to search for it?

I haven't looked at the standard library std::map implementations of Visual

C++ .NET 2003, 2005 or g++ 3.4.2 but the performance indicates that they are

atleast something along the lines of AVL or RB tree on those

implementations. Generally I have observed that the RB tree is slightly

better performance compared to AVL tree in the elusive 'average case' (let's

say that I have done some work on testing different implementations of

binary trees fairly recently).

The sourcecode is closed source and not my property to publish publicly but

hope the conclusions confirm yours even though I don't work as software

engineer anymore (but the non disclosure clauses still apply).

Dec 15, 2004, 11:36:36 AM12/15/04

to

David B. Held wrote:

> Daniel K. O. wrote:

> Daniel K. O. wrote:

[...]

> Unfortunately, AVL trees typically do not contain a parent pointer,

> which means that traversal is slower than with an RB-tree that has

> parent pointers. That's because traversing to a sibling node

requires

> starting over at the root, rather than only moving up the tree as far

> as you need to.

I think that the parent pointer is optional in both cases, but the

complexity requirements on iterators require it in the case of std::map

and std::set.

--

James Kanze GABI Software http://www.gabi-soft.fr

Conseils en informatique orientée objet/

Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Dec 15, 2004, 11:50:06 AM12/15/04

to

Stephen Howe wrote:

> > I was just curious about the differences (if any) between a map

> > implemented using a RB-tree vs one implemented with an AVL tree

(speed

> > and memory usage). I use GCC, so I already have a map implemented

with a

> > RB-tree, but I was wondering if anyone has already implemented a

map

> > with an AVL-tree. I searched on Google, but I could only find

> > stand-alone AVL implementations, not "std::map-compliant" ones.

Does it

> > mean that nobody wants to implement a map with an AVL-tree, or that

I

> > don't know how to search for it?

> > I was just curious about the differences (if any) between a map

> > implemented using a RB-tree vs one implemented with an AVL tree

(speed

> > and memory usage). I use GCC, so I already have a map implemented

with a

> > RB-tree, but I was wondering if anyone has already implemented a

map

> > with an AVL-tree. I searched on Google, but I could only find

> > stand-alone AVL implementations, not "std::map-compliant" ones.

Does it

> > mean that nobody wants to implement a map with an AVL-tree, or that

I

> > don't know how to search for it?

> The basic difference is that for RB-trees, 1-bit is required to store

the

> colour of a node.

> For AVL trees, 2-bits are required to store whether a node is 0

> (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

> That is an extra bit to store compared to RB trees.

I don't think that this is a consideration in pratical implementations.

There are two solutions concerning where to put this information:

- Portabily, you add a field to the structure: bool for a RB tree,

and

signed char for an AVL tree. This means at least 8 bits -- on my

machine, alignment constraints mean 8 bytes in both cases. On a

typical 32 bit machine (but with 64 bit doubles which must be

aligned), the situation is even more interesting. The complexity

requirements for the iterators mean that a node must contain a

parent pointer, as well as the left and right pointers. This means

that each node has three pointers: 12 bytes. Alignment

requirements

force the size of the node up to the next multiple of 8 bytes, so

you can add up to 4 bytes of additional information at no cost in

size.

- If memory is really a problem, you can often collapse information.

Thus, for example, on many 16 bit machines, all node pointers will

be even -- and with a bit of very implementation dependant casting

and other hacks, you can maintain the boolean of an RB tree on the

low order bit of a pointer. Except that since each node has three

pointers, you can maintain up to 3 bits of information using this

technique. (Of course, if the alignment constraints are larger

than

2, you have even more free bits.)

> I would also expect AVL trees to faster at doing a map lower_bound()

> or find() compared with RB trees at the trade-off expense of being

> slower to do an insert() or erase(). For RB trees, it is the

opposite.

Interesting. I'd guess that in most applications, find() outnumbers

all

of the other operations combined, sometimes by a large margin.

> That is because for RB trees, the maximum number of nodes from the

> root lies between log N and 2 * log N (the former occuring when the

> path from root-to-leaf is all black nodes, the latter occuring when

> there is a red node between every pair of black nodes and a red node

> as the leaf). For AVL trees, the bounds are much tighter and so it

> does more single or double rotations on an insert()/erase() compared

> to a RB tree.

I think the real difference is between average and worst case

performance. On an average, an RB tree will be almost as balanced as

an

AVL tree, so searches aren't any longer, and insertion and delet are

faster. The RB tree has worst case bounds, however.

It's much like the trade off between quick sort and heap sort.

--

James Kanze GABI Software http://www.gabi-soft.fr

Conseils en informatique orientée objet/

Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Dec 17, 2004, 4:31:31 AM12/17/04

to

> > The basic difference is that for RB-trees, 1-bit is required to store

> the

> > colour of a node.

> > For AVL trees, 2-bits are required to store whether a node is 0

> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

> > That is an extra bit to store compared to RB trees.

>

> I don't think that this is a consideration in pratical implementations.

> There are two solutions concerning where to put this information:

> the

> > colour of a node.

> > For AVL trees, 2-bits are required to store whether a node is 0

> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

> > That is an extra bit to store compared to RB trees.

>

> I don't think that this is a consideration in pratical implementations.

> There are two solutions concerning where to put this information:

There is a another possibility forAVL trees. You can restrict yourself to

one-sided AVL trees. insert() and erase() become more complicated but can

still be done in O(log N) (I guess the constant is bigger).

> - Portabily, you add a field to the structure: bool for a RB tree,

> [snip]

> 2, you have even more free bits.)

Thanks. I have seen both these solutions.

> > I would also expect AVL trees to faster at doing a map lower_bound()

> > or find() compared with RB trees at the trade-off expense of being

> > slower to do an insert() or erase(). For RB trees, it is the

> opposite.

Putting figures:

Worst and Best case height bounds of an AVL tree is:

ceil(log2(N) + 1) <= h(N) <= 1.44log2(N+2) - 0.33

Maximum number of rotations to restore balancing on insert/delete for a tree

of height h:

ceil(h(N)/2)

Worst and Best case height of an Red-Black tree is:

floor(log2(N+1)) <= h(N) <= 2 * log2 (N+1)

Inserts can be done with a maximum of 2 rotations and O(log(N)) recolourings

Deletes can be done with a maximum of 3 rotations and O(log(N)) recolourings

The low number of rotations of Red-Black trees compared with AVL trees

needed to maintain the balance constraints gives it the edge for inserts and

deletes.

Stephen Howe

Dec 19, 2004, 7:41:04 PM12/19/04

to

"Stephen Howe" <stephenPOINThoweAT...@eu.uu.net> wrote in message

news:41c20336$0$19158$ed9e...@reading.news.pipex.net...

>> > The basic difference is that for RB-trees, 1-bit is required to store

>> the

>> > colour of a node.

>> > For AVL trees, 2-bits are required to store whether a node is 0

>> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

>> > That is an extra bit to store compared to RB trees.

>>

news:41c20336$0$19158$ed9e...@reading.news.pipex.net...

>> > The basic difference is that for RB-trees, 1-bit is required to store

>> the

>> > colour of a node.

>> > For AVL trees, 2-bits are required to store whether a node is 0

>> > (balanced), -1 (left-tree heavy) or +1 (right-tree heavy).

>> > That is an extra bit to store compared to RB trees.

>>

You only need 1 bit to store the balance of an AVL tree node,

if you store the bit in the subtree.

If a subtree is heavy, you mark it with heavy = 1,

otherwise mark it with heavy = 0.

The balanceof a node is heavy(right) - heavy(left)

...

> Putting figures:

>

> Worst and Best case height bounds of an AVL tree is:

> ceil(log2(N) + 1) <= h(N) <= 1.44log2(N+2) - 0.33

> Maximum number of rotations to restore balancing on insert/delete for a tree

> of height h:

> ceil(h(N)/2)

Inserting in an AVL tree takes at most 1 (double) rotation.

>

> Worst and Best case height of an Red-Black tree is:

> floor(log2(N+1)) <= h(N) <= 2 * log2 (N+1)

> Inserts can be done with a maximum of 2 rotations and O(log(N)) recolourings

> Deletes can be done with a maximum of 3 rotations and O(log(N)) recolourings

>

> The low number of rotations of Red-Black trees compared with AVL trees

> needed to maintain the balance constraints gives it the edge for inserts and

> deletes.

>

If you look per operation only delete operations needs more

rotations in an avl tree.

According to Knuth's The Art of Computer Programing,

on avarage, you need +/- 0.21 rotations per delete. operation..

The worst case for both avl and red-black trees is still O(log n) operations

(adjusting balance factor or recoloring).

Greetings,

Hans.

Dec 29, 2004, 3:47:19 AM12/29/04

to

Daniel K. O. escreveu:

>(if not, I'll implement

>one myself, but I'm not that familiar with implementing a STL-compliant

>container, so it'll take some time...)

>

>

I finally got time to work on this.

I found a few avl-map implementations, but all of them had "problems", like

* Used recursive algorithms for insertion/deletion;

* Stored the node's height instead of the balancing factor, so each

insertion/deletion required a height computation;

* Simply didn't had all the required std::map members, so it couldn't be

used like a std::map substitute.

So I changed the libstdc++'s RB-tree implementation a bit, and now I got

this:

http://www.inf.ufrgs.br/~dkosmari/avlmap/

The headers "map.h", "stl_map.h" and "stl_tree.h" are mostly the

original GCC headers.

The tree.cpp has the insertion/deletion implementation, that I wrote

from scratch, inspired a bit by the original RB-tree insertion/deletion

code.

"main.cpp" is the test program.

Everything is working fine but the "_avl_tree_rebalance_for_erase()"

function: it's corrupting the balancing factors. :(

The first half of the code is still the original (from the RB-tree), so

I'm (almost) sure the problem is in the second half (the "rebalancing"

code, that I wrote myself). But I already spent some hours trying to

debug it, without any success. This is the first time I'm implementing

an AVL tree, so maybe I'm doing something stupid without noticing. I

would be very happy if anyone could help me finding the problem. The

other AVL implementations I read until now couldn't help me on this.

PS: sorry if this post should go a generic algorithms group, but as I

started the thread here, and this is the only thing stopping me from

closing this thread (RB vs AVL). Also, when this implementation is

finished, it may be useful for other C++ programmers.

PS2: many thanks to all who replied my original message, for your

insightful comments.

Jan 11, 2005, 5:39:57 AM1/11/05

to

Hello everybody.

Sorry for taking so long to end this thread. I finally had time and

motivation to fix the deletion function, so it's working now. :)

All the files: http://www.inf.ufrgs.br/~dkosmari/avlmap/

Everything in a tar.gz archive:

http://www.inf.ufrgs.br/~dkosmari/avlmap.tar.gz

Implementation details:

This is a map/multimap implementation using an AVL tree; almost all the

code (not the insertion/deletion routines) was copied from the libstdc++

(I'm using MinGW/GCC 3.4.3), so it should work without any problem with GCC.

The code still needs a bit of cleanup; the map/multimap/tree containers

are on the "ext" namespace.

AFAIK this is the only std::map implementation using an AVL tree; ok,

it's for GCC only, but at least it works like a std::map (actually it

*is* a std::map). The best implementations I found were incomplete (e.g.

no rbegin/rend, no lower_bound, etc).

Tests:

In the archive there's a main.cpp file to test some operations (sorry, I

don't know how to do benchmarks...). Unfortunately, the

insertion/deletion operations involve memory allocation/deallocation; so

to minimize this overhead I only used integers (and not big objects,

like strings). My system is Win98 SE on an Athlon XP 2400+ with 256 MB.

For compilation flags, check the Makefile.win file. I didn't use

optimization flags because I didn't know if the libstdc++ was compiled

with optimizations; considering the numbers, I'm guessing it was

compiled without optimizations (the only files "compiled" is the

libstdc++ "tree.cpp" and my "tree.cpp").

Here are the numbers:

-----------------------

Test: sorted_insertion (+)

AVL: 1261779 (1057.49 ms)

RB: 1770125 (1483.54 ms)

Test: sorted_insertion (-)

AVL: 1322594 (1108.46 ms)

RB: 1839192 (1541.42 ms)

Test: random_insertion

AVL: 1396518 (1170.42 ms)

RB: 1423864 (1193.34 ms)

Test: traverse (+)

AVL: 8230 (6.89753 ms)

RB: 7742 (6.48854 ms)

Test: traverse (-)

AVL: 10588 (8.87377 ms)

RB: 9859 (8.26279 ms)

Test: random_deletion

AVL: 471595 (395.242 ms)

RB: 474115 (397.354 ms)

---------------------

Results:

Looks like a sorted insertion in an RB-tree is slower. This is the first

test I tried, so I tested it with different optimization levels (from

-O1 to -O3, with and without -fexpensive-optimizations), and the

AVL-tree always was at least 200 ms faster (~400 ms vs ~600 ms). OTOH

the random insertion test was identical with both trees. (on this

specific report, the AVL test was "faster", but most of times the RB

tree was "faster")

Traversing was a bit strange: the code is exactly the same (just a small

difference for AVL) for both implementations (*_tree_increment and

*_tree_decrement, in tree.cpp), but the RB was always "faster", no

matter what optimization flags I used.

What the tests show is: nothing. :P Seems that the conditions aren't the

same and I can't reproduce the same compilation. I think I would have to

recompile my libstdc++ on my system to perform a fair test. This would

be easier to do on a GNU/ Linux box, and I don't have one installed here

right now. I can say that there's a difference between the two trees

(some tests have the same result, some have different), but I don't know

which one is faster (because I don't know which one was compiled with

more optimizations).

So:

* Could anyone try the test on a GNU/Linux box with a "fair"

environment (compile both libstdc++ and my code with the same flags) and

see if there's any difference?

* Comments about my code are welcome. :)

* I would be a bit sad if so many hours spent coding this were to be

wasted; is there any GCC hacker that would like to put this code into

libstdc++ / ext? (well, maybe some more tests may show it's faster than

the RB tree...)

Thank you all for your attention.

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu