[erlang-questions] Trie Data Structure

75 views
Skip to first unread message

Michael Truog

unread,
Dec 29, 2010, 9:21:05 PM12/29/10
to erlang-q...@erlang.org
Hi,

I implemented a trie data structure that has performed 20% faster lookups
than the dict implementation in R14B01. I have tested the code but it can
always benefit from more testing. The trie module provides the same
interface as the dict module, but requires a list of integers for the key.
Please tell me if you find issues in the code.

Code:
https://github.com/okeuday/CloudI/blob/master/src/lib/cloud_stdlib/src/trie.erl

Blog posts:
http://okeuday.livejournal.com/16454.html
http://okeuday.livejournal.com/16182.html (dict as the fastest general tree
data structure)

I know you can use ETS for similar problems, but avoiding shared data helps
utilize Erlang's scalability advantages.

Thanks,
Michael Truog

Ryan Zezeski

unread,
Dec 30, 2010, 12:04:57 AM12/30/10
to Michael Truog, erlang-q...@erlang.org

Do you have the benchmark and dataset available as well? I wrote a trie in
Erlang a few weeks ago and I'm just curious how it stacks up.

--
*.........01010...00..00...0100...00..00.
.........10..11...0100...11..00..101.01.
.111010..01100.....10....101110..01.101.
.........10..10....11....01..10..10..01.
.........00..00....00....00..00..00..00.
........................................*

Michael Truog

unread,
Dec 30, 2010, 9:54:47 AM12/30/10
to Ryan Zezeski, erlang-q...@erlang.org
The trie:test_speed/0 function is what I used for determining how fast the
trie was for lookups, when compared to a dict. The wordlist used comes from
the Ubuntu 10.04 installation and is located at /usr/share/dict/words (98569
words with an average of 9.5 characters per word). You can also specify
your own wordlist with test_speed/1. The comparison stores all the words
and does lookups on all the words (using is_key, fetch, and find) while
tracking the time taken to do all the lookups.

Michael Truog

unread,
Jan 2, 2011, 4:31:56 PM1/2/11
to Ryan Zezeski, erlang-q...@erlang.org
Actually, use the benchmark code here:
https://github.com/okeuday/erlbench

Blog Post:
http://okeuday.livejournal.com/16707.html

The trie fetch performance is roughly 15% faster than a dict for 10000
entries, but might get better as the number of entries increases (though the
performance depends on the distribution of the key size, i.e., string
length).

Robert Virding

unread,
Jan 2, 2011, 8:58:16 PM1/2/11
to Michael Truog, erlang-q...@erlang.org, Ryan Zezeski
It is a little unfair comparison you have done between a trie and dict/gb_trees/... as the trie is designed for a specific type of key, a list of integers/characters, while the others can have keys which can be any valid term. So the trie *should* be more efficient for this case. That being said this definitely shows the need of having more good implementations of different data structures like the trie. Improving our toolbox is never wrong.

Robert

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questio...@erlang.org

Ulf Wiger

unread,
Jan 3, 2011, 3:38:53 AM1/3/11
to Robert Virding, Michael Truog, erlang-q...@erlang.org, Ryan Zezeski

Still, this is an important point. It is easy to always go for the ultimately
type-agnostic solution with Erlang, but there is of course a cost involved.

If performance is sufficiently important, it is good to have alternatives that
limit your options in exchange for speed.

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com

Reply all
Reply to author
Forward
0 new messages