Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

what type of tree algorithm is this?

36 views
Skip to first unread message

James

unread,
Nov 11, 2009, 12:22:48 AM11/11/09
to
I am exploring tree data-structures and was wondering what the name of the
following tree algorithm is? I created this off the top of my head and know
that it cannot be anything new. Here is full example code in C:


#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>


#define HASH_DEPTH 1024
#define HASH(k) ((k) & (HASH_DEPTH - 1))


#define TREE_SEARCH_ALGO_BINARY 0
#define TREE_SEARCH_ALGO_BIT 1


#define TREE_SEARCH_ALGO TREE_SEARCH_ALGO_BIT


typedef unsigned int tree_key;


static unsigned long int g_tree_search_count = 0;
static unsigned long int g_tree_search_count_max = 0;
static unsigned long int g_tree_search_count_min = ULONG_MAX;


struct tree
{
struct tree* left;
struct tree* right;
tree_key key;
};


struct hash
{
struct tree* buckets[HASH_DEPTH];
};


#define HASH_INIT { { NULL } }


struct tree*
tree_sys_find(struct tree*** pproot,
tree_key key)
{
unsigned long int count = 0;
tree_key bit = 0;
struct tree** proot = *pproot;
struct tree* t = *proot;

while (t)
{
++count;

if (t->key == key)
{
break;
}

#if (TREE_SEARCH_ALGO == TREE_SEARCH_ALGO_BIT)

if ((1 << bit) & key)
{
proot = &t->left;
t = t->left;
}

else
{
proot = &t->right;
t = t->right;
}

#else

if (key < t->key)
{
proot = &t->left;
t = t->left;
}

else
{
proot = &t->right;
t = t->right;
}

#endif

++bit;
}

*pproot = proot;

g_tree_search_count += count;

if (count > g_tree_search_count_max)
{
g_tree_search_count_max = count;
}

else
{
g_tree_search_count_min = count;
}

return t;
}


int
tree_insert(struct tree** proot,
struct tree* tnew)
{
if (! tree_sys_find(&proot, tnew->key))
{
tnew->left = tnew->right = 0;
*proot = tnew;
}

return 0;
}


struct tree*
tree_find(struct tree** proot,
tree_key key)
{
return tree_sys_find(&proot, key);
}


int
hash_insert(struct hash* h,
struct tree* tnew)
{
return tree_insert(&h->buckets[HASH(tnew->key)], tnew);
}


struct tree*
hash_find(struct hash* h,
tree_key key)
{
return tree_find(&h->buckets[HASH(key)], key);
}

#define DEPTH (HASH_DEPTH * 10000)


int main(void)
{
size_t i;
struct tree* tn;
static struct hash hash_init = HASH_INIT;
struct hash h = hash_init;
static struct tree nodes[DEPTH];

puts("Randomized Insertion\n--------------------------");

for (i = 0; i < DEPTH; ++i)
{
nodes[i].key = rand();
hash_insert(&h, nodes + i);
}

for (i = 0; i < DEPTH; ++i)
{
tn = hash_find(&h, nodes[DEPTH - i - 1].key);
assert(tn);
assert(tn->key == nodes[DEPTH - i - 1].key);
}

printf("inserts/finds: %lu\n"
"hash depth: %lu\n"
"average: %lu\n"
"maximum: %lu\n"
"minimum: %lu\n",
DEPTH,
HASH_DEPTH,
(g_tree_search_count / 2) / DEPTH,
g_tree_search_count_max,
g_tree_search_count_min);


puts("\n\nSequential Insertion
(Worst-Case)\n--------------------------");

h = hash_init;
g_tree_search_count = 0;
g_tree_search_count_max = 0;
g_tree_search_count_min = ULONG_MAX;

for (i = 0; i < DEPTH; ++i)
{
nodes[i].key = i;
hash_insert(&h, nodes + i);
}

for (i = 0; i < DEPTH; ++i)
{
tn = hash_find(&h, nodes[DEPTH - i - 1].key);

if (! tn)
{
assert(tn);
abort();
}

if (tn->key != nodes[DEPTH - i - 1].key)
{
assert(tn->key == nodes[DEPTH - i - 1].key);
abort();
}
}

printf("inserts/finds: %lu\n"
"hash depth: %lu\n"
"average: %lu\n"
"maximum: %lu\n"
"minimum: %lu\n",
DEPTH,
HASH_DEPTH,
(g_tree_search_count / 2) / DEPTH,
g_tree_search_count_max,
g_tree_search_count_min);

return 0;
}


I was not happy with a binary tree, so instead of comparing keys, I simply
check of bits are set or not. Each bit represents a level in the tree, so if
`unsigned long int' is 32-bit's then the tree will only have 32 levels and
will never have to search over 32 nodes to find a key (e.g., searching
100000000 items will never take more than 32 iterations of the search loop
in `tree_sys_find()'. Also, it seems to balance itself in the worst case
input (e.g., passing it sorted data). The plain binary tree simply tanks
when I hit it with worst case.


- Levels limited to number of bits in key
- O(number of bits in key) worst cast
- O(log N) best-case


Anyway, what do you think of it? Are my complexity numbers off?

AFAICT, it seems like this would be good, better than binary tree, to map
32/64-bit integers to specific state.

Richard Heathfield

unread,
Nov 11, 2009, 1:36:39 PM11/11/09
to
In <hdert2$nbl$1...@aioe.org>, James wrote:

> I am exploring tree data-structures and was wondering what the name
> of the following tree algorithm is? I created this off the top of my
> head and know that it cannot be anything new.

Yeah, I know. I invented AVL trees all by myselves (it was a joint
effort with a colleague) in, I think, 1998. Alas, A-V and L spotted
what I'd done, cunningly hopped into a time machine, and whizzed back
to 1962 so that they could pretend they'd invented it. (To add insult
to injury, they found a slight improvement, too!)

<code snipped unread>

> I was not happy with a binary tree, so instead of comparing keys, I
> simply check of bits are set or not. Each bit represents a level in
> the tree, so if `unsigned long int' is 32-bit's then the tree will
> only have 32 levels

Sure sounds like a trie to me. (Not a typo, by the way: trIe.) They
have quite a few uses, but they are perhaps most famous for their
starring role in compression programs.

Take a look at this:

http://www.itl.nist.gov/div897/sqg/dads/HTML/trie.html

and see if it rings a bell.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Gene

unread,
Nov 11, 2009, 8:04:54 PM11/11/09
to
On Nov 11, 1:36 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

> In <hdert2$nb...@aioe.org>, James wrote:
> > I am exploring tree data-structures and was wondering what the name
> > of the following tree algorithm is? I created this off the top of my
> > head and know that it cannot be anything new.
>
> Yeah, I know. I invented AVL trees all by myselves (it was a joint
> effort with a colleague) in, I think, 1998. Alas, A-V and L spotted
> what I'd done, cunningly hopped into a time machine, and whizzed back
> to 1962 so that they could pretend they'd invented it. (To add insult
> to injury, they found a slight improvement, too!)
>
> <code snipped unread>
>
> > I was not happy with a binary tree, so instead of comparing keys, I
> > simply check of bits are set or not. Each bit represents a level in
> > the tree, so if `unsigned long int' is 32-bit's then the tree will
> > only have 32 levels
>
> Sure sounds like a trie to me. (Not a typo, by the way: trIe.) They
> have quite a few uses, but they are perhaps most famous for their
> starring role in compression programs.
>
> Take a look at this:
>
> http://www.itl.nist.gov/div897/sqg/dads/HTML/trie.html
>
> and see if it rings a bell.

It ends up looking like a trie, but it's slightly different from the
usual. This algorithm inserts based on the prefix (starting at the
lsb) that's sufficient to reach a missing node in the current tree.
The rest of the key is ignored. So insertion order matters. A
classical trie would, I suppose, create N levels where the MSB of the
key is at bit N-1, and insertion order would be immaterial. It's a
small change, but it's a pretty clever idea for the OP's application,
which seems to be a hash table for unsigned ints.

James

unread,
Nov 11, 2009, 8:38:13 AM11/11/09
to
"James" <n...@spam.invalid> wrote in message news:hdert2$nbl$1...@aioe.org...

>I am exploring tree data-structures and was wondering what the name of the
>following tree algorithm is? I created this off the top of my head and know
>that it cannot be anything new. Here is full example code in C:
>
[...]

>
> I was not happy with a binary tree, so instead of comparing keys, I simply
> check of bits are set or not. Each bit represents a level in the tree, so
> if `unsigned long int' is 32-bit's then the tree will only have 32 levels
> and will never have to search over 32 nodes to find a key (e.g., searching
> 100000000 items will never take more than 32 iterations of the search loop
> in `tree_sys_find()'. Also, it seems to balance itself in the worst case
> input (e.g., passing it sorted data). The plain binary tree simply tanks
> when I hit it with worst case.
>
>
> - Levels limited to number of bits in key
> - O(number of bits in key) worst cast
> - O(log N) best-case
>
>
> Anyway, what do you think of it? Are my complexity numbers off?

Actually, now that I think about it, I think the complexity numbers should
be something like:


let N be the total bits in the key (e.g., 32)

O(N) = worst case
O(log N) = best case

James

unread,
Nov 11, 2009, 8:48:58 AM11/11/09
to
"Gene" <gene.r...@gmail.com> wrote in message
news:38b9873b-24bf-443b...@j24g2000yqa.googlegroups.com...

On Nov 11, 1:36 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> In <hdert2$nb...@aioe.org>, James wrote:
[...]

> > I was not happy with a binary tree, so instead of comparing keys, I
> > simply check of bits are set or not. Each bit represents a level in
> > the tree, so if `unsigned long int' is 32-bit's then the tree will
> > only have 32 levels
>
> Sure sounds like a trie to me. (Not a typo, by the way: trIe.) They
> have quite a few uses, but they are perhaps most famous for their
> starring role in compression programs.
>
> Take a look at this:
>
> http://www.itl.nist.gov/div897/sqg/dads/HTML/trie.html
>
> and see if it rings a bell.

> It ends up looking like a trie, but it's slightly different from the
> usual. This algorithm inserts based on the prefix (starting at the
> lsb) that's sufficient to reach a missing node in the current tree.
> The rest of the key is ignored. So insertion order matters.

Right. This modified "trie" is not sorted when you traverse it like:


void traverse(struce tree* t)
{
if (t)
{
traverse(t->left);
printf("%lu\n", t->key);
traverse(t->right);
}
}


It just that I did not need to process anything in any particular order, so
this "hacked" trie seems to be working pretty darn good for my application
so far.


> A
> classical trie would, I suppose, create N levels where the MSB of the
> key is at bit N-1, and insertion order would be immaterial. It's a
> small change, but it's a pretty clever idea for the OP's application,
> which seems to be a hash table for unsigned ints.

Yeah. I was experimenting with the hash to try and take "pressure" off of a
single trie. With the hash in place I guess the complexity would be
something like:


let N = total bits in key

no collision
--------------------------------------------------------
O(1) - best-case

collision
--------------------------------------------------------
O(log N) - worst-case

collision on a trie with N levels and item is at level N
--------------------------------------------------------
O(N) - horrible-case


Who knows, this hack might be fairly useful to somebody else besides me!

;^)

Richard Harter

unread,
Nov 11, 2009, 11:39:32 PM11/11/09
to
On Tue, 10 Nov 2009 21:22:48 -0800, "James" <n...@spam.invalid>
wrote:

>I am exploring tree data-structures and was wondering what the name of the
>following tree algorithm is? I created this off the top of my head and know
>that it cannot be anything new. Here is full example code in C:
>
>

[snip code]

To be honest your code makes my teeth itch. Don't feel bad,
people probably say the same about mine. If I put some work into
it I could figure it out - code is code - but there is no
documentation, there are unexplained macros scattered in the
code, the names are not very meaningful, and the code structure
is not obvious.

I understand how that can happen. I do it myself, working out
some clever idea (or so it seems to me) that ends up being a few
lines of cryptic code. So, can you present in a couple of
paragraphs what you are trying to do?


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

Gene

unread,
Nov 12, 2009, 12:54:32 AM11/12/09
to
On Nov 11, 11:39 pm, c...@tiac.net (Richard Harter) wrote:
> On Tue, 10 Nov 2009 21:22:48 -0800, "James" <n...@spam.invalid>
> wrote:
>
> >I am exploring tree data-structures and was wondering what the name of the
> >following tree algorithm is? I created this off the top of my head and know
> >that it cannot be anything new. Here is full example code in C:
>
> [snip code]
>
> To be honest your code makes my teeth itch.  Don't feel bad,
> people probably say the same about mine.  If I put some work into
> it I could figure it out - code is code - but there is no
> documentation, there are unexplained macros scattered in the
> code, the names are not very meaningful, and the code structure
> is not obvious.
>
> I understand how that can happen.  I do it myself, working out
> some clever idea (or so it seems to me) that ends up being a few
> lines of cryptic code.  So, can you present in a couple of
> paragraphs what you are trying to do?
>
> Richard Harter, c...@tiac.nethttp://home.tiac.net/~cri,http://www.varinoma.com

> Infinity is one of those things that keep philosophers busy when they
> could be more profitably spending their time weeding their garden.

It's exactly the same as a BST data structure, except that the bits of
the key starting at 0 are used to branch left or right during
insertion and searching rather than the result of comparing the insert/
search key with existing node keys.

If you want the trie-ish description, consider that tries are usually
described with string keys. Here the "string" is the key bits in
reverse order, except that only a prefix of the reversed bits is used
if a vacant space is found before they've all been employed to make
left-right decisions.


Richard Harter

unread,
Nov 12, 2009, 1:41:46 AM11/12/09
to
On Wed, 11 Nov 2009 21:54:32 -0800 (PST), Gene
<gene.r...@gmail.com> wrote:

>On Nov 11, 11:39=A0pm, c...@tiac.net (Richard Harter) wrote:

>> So, can you present in a couple of
>> paragraphs what you are trying to do?

>


>It's exactly the same as a BST data structure, except that the bits of
>the key starting at 0 are used to branch left or right during
>insertion and searching rather than the result of comparing the insert/
>search key with existing node keys.
>
>If you want the trie-ish description, consider that tries are usually
>described with string keys. Here the "string" is the key bits in
>reverse order, except that only a prefix of the reversed bits is used
>if a vacant space is found before they've all been employed to make
>left-right decisions.

Thank you. Nice and succinct.


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com

Ben Bacarisse

unread,
Nov 12, 2009, 8:44:16 AM11/12/09
to
"James" <n...@spam.invalid> writes:

> I am exploring tree data-structures and was wondering what the name of
> the following tree algorithm is? I created this off the top of my head
> and know that it cannot be anything new.

Trees of this sort come under the broad heading of "radix trees".
Because you have only two pointers, you use one bit to branch, but
you could have four pointers and use pairs of bits, or eight and use
three bits. The more pointers you have, the shallower the tree but the
more waste there is if the data is sparse.

I doubt that using N==1 bit is optimal on modern machines simply
because memory is usually abundant and cache fills are slow (but I
admit that is a guess). If you switch to using an array for your
pointers you could test various N:

struct tree {
struct tree *child[N];
tree_key key;
};

You set mask = ~((tree_key)-1 << N); (in C it helps if tree_key is an
unsigned type) and shift key >>= N; i the loop. The branch
you take is then proot = &t->child[key & mask];

Using powers of 2 for the pointer array size is simply an
optimisation. You could have 3 or 17 but the shift and mask become /
and %.

I think it would be interesting.

--
Ben.

James

unread,
Nov 12, 2009, 7:45:46 PM11/12/09
to
"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
news:0.cf389a2dfa1a5c2933e4.2009...@bsb.me.uk...

Humm... Yes, I agree. Here is what I came up with:


<pseudo-code>


typedef unsigned int tree_key;


#define BITS 4
#define MASK ~(UINT_MAX << BITS)
#define NODES (MASK + 1)


struct tree
{
struct tree* nodes[NODES];
tree_key key;
};


struct tree*
tree_find(struct tree* root,
tree_key origin_key)
{
tree_key key = origin_key;

while (root)
{
if (root->key == origin_key) break;

root = root->nodes[key & MASK];

key >>= BITS;
}

return root;
}


Is that something like what you are thinking of Ben?


Here is source code for a quick test program that implements the new
algorithm:


#include <stdio.h>
#include <assert.h>
#include <stdlib.h>

#include <string.h>
#include <limits.h>
#include <time.h>


#define QUOTEX(t)#t
#define QUOTE(t)QUOTEX(t)


#define HASH_DEPTH 1


#define HASH(k) ((k) & (HASH_DEPTH - 1))


#define TREE_BITS 4
#define TREE_MASK ~(UINT_MAX << TREE_BITS)
#define TREE_NODES (TREE_MASK + 1)


#define TREE_SEARCH_ALGO_BINARY 0
#define TREE_SEARCH_ALGO_BIT 1


#define TREE_SEARCH_ALGO TREE_SEARCH_ALGO_BIT


#if (TREE_SEARCH_ALGO == TREE_SEARCH_ALGO_BINARY)
# undef TREE_NODES
# define TREE_NODES 2
# define TREE_NAME "Normal Binary Tree\n\n\n"
#else
# define TREE_NAME "Chris' %lu-Ary %lu-Bit Trie?\n\n\n", \
TREE_NODES, TREE_BITS
#endif


typedef unsigned int tree_key;


static unsigned long int g_tree_search_count = 0;
static unsigned long int g_tree_search_count_max = 0;

static unsigned long int g_tree_search_count_min = 0;


struct tree
{
struct tree* nodes[TREE_NODES + 2];
tree_key key;
};


struct hash
{
struct tree* buckets[HASH_DEPTH];
};


#define HASH_INIT { { NULL } }


struct tree*
tree_sys_find(struct tree*** pproot,

tree_key origin_key)
{
tree_key key = origin_key;


unsigned long int count = 0;

struct tree** proot = *pproot;
struct tree* t = *proot;

while (t)
{
++count;

if (t->key == origin_key)
{
break;
}


#if (TREE_SEARCH_ALGO == TREE_SEARCH_ALGO_BIT)

proot = &t->nodes[key & TREE_MASK];
t = *proot;
key >>= TREE_BITS;

#else

if (origin_key < t->key)
{
proot = &t->nodes[0];
t = *proot;
}

else
{
proot = &t->nodes[1];
t = *proot;
}

#endif

}

*pproot = proot;

g_tree_search_count += count;

if (count > g_tree_search_count_max)
{
g_tree_search_count_max = count;
}

else
{
g_tree_search_count_min = count;
}

return t;
}


int
tree_insert(struct tree** proot,
struct tree* tnew)
{
if (! tree_sys_find(&proot, tnew->key))
{

memset(tnew->nodes, '\0', sizeof(tnew->nodes));
*proot = tnew;
}

return 0;
}


struct tree*
tree_find(struct tree** proot,
tree_key key)
{
return tree_sys_find(&proot, key);
}


void
tree_iterate(struct tree* root)
{
if (root)
{
size_t i;

for (i = 0; i < TREE_NODES; ++i)
{
tree_iterate(root->nodes[i]);
}

printf("tree_iterate: %lu\n", root->key);
}
}


int
hash_insert(struct hash* h,
struct tree* tnew)
{
return tree_insert(&h->buckets[HASH(tnew->key)], tnew);
}


struct tree*
hash_find(struct hash* h,
tree_key key)
{
return tree_find(&h->buckets[HASH(key)], key);
}


#define DEPTH 1000000


int main(void)
{
size_t i;
struct tree* tn;

static struct tree nodes[DEPTH];


static struct hash hash_init = HASH_INIT;
struct hash h = hash_init;


srand((unsigned int)time(NULL));


printf("Testing Algorithm: " TREE_NAME);


puts("\nRandomized Insertion\n"
"----------------------------------------");


g_tree_search_count = 0;
g_tree_search_count_max = 0;
g_tree_search_count_min = ULONG_MAX;

for (i = 0; i < DEPTH; ++i)
{

nodes[i].key = rand();
hash_insert(&h, nodes + i);
}

for (i = 0; i < DEPTH; ++i)
{
tn = hash_find(&h, nodes[DEPTH - i - 1].key);

if (! tn)
{
assert(tn);
abort();
}

if (tn->key != nodes[DEPTH - i - 1].key)
{
assert(tn->key == nodes[DEPTH - i - 1].key);
abort();
}
}

printf("inserts/finds: %lu\n"
"hash depth: %lu\n"
"average: %lu\n"
"maximum: %lu\n"
"minimum: %lu\n",
DEPTH,
HASH_DEPTH,
(g_tree_search_count / 2) / DEPTH,
g_tree_search_count_max,
g_tree_search_count_min);


puts("\n\n\nSequential Worst-Case Insertion\n"
"----------------------------------------");

puts("\n\n\nTest Completed!");

return 0;
}


Any thoughts?

;^)

James

unread,
Nov 12, 2009, 7:58:43 PM11/12/09
to
"James" <n...@spam.invalid> wrote in message news:hdjkcu$tqv$1...@aioe.org...

[...]


It seems as if I could also use a similar algorithm to store strings into
the tree. If I broke the string apart into 32-bit integers and used all of
them as the key.

What do you think?

Ben Bacarisse

unread,
Nov 13, 2009, 1:18:28 PM11/13/09
to
"James" <n...@spam.invalid> writes:
<snip>

> It seems as if I could also use a similar algorithm to store strings
> into the tree. If I broke the string apart into 32-bit integers and
> used all of them as the key.
>
> What do you think?

There are lots of well-documented ways to do this for strings.
Breaking the string into an artificial unit (the 32-bit integer) does
not seem like the obvious way to go, but my gut reaction is no
substitute for proper analysis and experimentation. See how it goes.

--
Ben.

Gene

unread,
Nov 13, 2009, 10:54:32 PM11/13/09
to
On Nov 11, 11:39 pm, c...@tiac.net (Richard Harter) wrote:
> To be honest your code makes my teeth itch.

I love this simile. Pardon if I steal it. Some friends of mine have
absolutely earned the same complaint.

0 new messages