Performance and some extra features with tries


Chris

Aug 3, 2010, 1:27:17 AM
to Redis DB
I wonder whether Redis would be even faster if, instead of the
standard hash table, it used a Judy array (speed comparison:
http://judy.sourceforge.net/examples/Judy_hashing.pdf, project:
http://judy.sourceforge.net/) or a trie
(http://en.wikipedia.org/wiki/Trie). I have read some stats saying
that tries are on average 20% faster than hashes. Here you can find a
short description of the pros of tries:
http://stackoverflow.com/questions/245878/how-do-i-choose-between-a-hash-table-and-a-trie-prefix-tree

Furthermore, tries would add, nearly for free, the ability to return
all the keys that start with a given prefix. This is very useful for
deleting a set of keys at once (e.g. "updates:*"), and I can think of
two or three other cases where it would be useful. It would also make
Redis the perfect solution for autocomplete (I saw someone already
requested this here in the past:
http://code.google.com/p/redis/issues/detail?id=110).
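
For illustration, here is a deliberately naive standalone sketch (not
Redis code, and the 256-pointer node is wasteful on purpose) showing
why prefix enumeration falls out of the structure: deleting everything
under "updates:" or feeding an autocomplete is the same walk.

    /* Toy trie over byte strings -- illustration only, not Redis code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct trie_node {
        struct trie_node *child[256];   /* one slot per possible byte   */
        int is_key;                     /* node terminates a stored key */
    } trie_node;

    static trie_node *node_new(void) { return calloc(1, sizeof(trie_node)); }

    static void trie_insert(trie_node *root, const char *key) {
        for (; *key; key++) {
            unsigned char c = (unsigned char)*key;
            if (!root->child[c]) root->child[c] = node_new();
            root = root->child[c];
        }
        root->is_key = 1;
    }

    /* Print every key stored under `node`; deleting or collecting the
     * keys matching a prefix is exactly this traversal. */
    static void trie_walk(trie_node *node, char *buf, int depth) {
        if (node->is_key) { buf[depth] = '\0'; printf("%s\n", buf); }
        for (int c = 0; c < 256; c++) {
            if (node->child[c]) {
                buf[depth] = (char)c;
                trie_walk(node->child[c], buf, depth + 1);
            }
        }
    }

    int main(void) {
        trie_node *root = node_new();
        trie_insert(root, "updates:1");
        trie_insert(root, "updates:2");
        trie_insert(root, "user:42");

        /* Descend to the node for the prefix, then enumerate below it. */
        const char *prefix = "updates:";
        trie_node *n = root;
        for (const char *p = prefix; *p && n; p++)
            n = n->child[(unsigned char)*p];

        char buf[256];
        strcpy(buf, prefix);
        if (n) trie_walk(n, buf, (int)strlen(prefix)); /* updates:1, updates:2 */
        return 0;
    }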

There are C libraries available, so it should be fairly easy to write
a wrapper around one of them that implements the functions in dict.c
and see how they perform. There is also a trie implementation in the
Linux kernel, used for routing.
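
Just to make the shape concrete, here is a rough standalone sketch of
what such a wrapper could look like. POSIX tsearch()/tfind() stand in
for whatever trie library gets picked, and the kvAdd/kvFind names are
illustrative only, not the actual dict.c entry points:

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct kv_entry { char *key; void *val; } kv_entry;

    static int kv_cmp(const void *a, const void *b) {
        return strcmp(((const kv_entry *)a)->key, ((const kv_entry *)b)->key);
    }

    typedef struct kv_store { void *root; } kv_store;  /* tsearch root */

    static int kvAdd(kv_store *s, const char *key, void *val) {
        kv_entry *e = malloc(sizeof(*e));
        e->key = malloc(strlen(key) + 1);
        strcpy(e->key, key);
        e->val = val;
        kv_entry **slot = tsearch(e, &s->root, kv_cmp);  /* insert-or-find */
        if (*slot != e) { free(e->key); free(e); return 0; } /* already there */
        return 1;
    }

    static void *kvFind(kv_store *s, const char *key) {
        kv_entry probe = { (char *)key, NULL };
        kv_entry **slot = tfind(&probe, &s->root, kv_cmp);
        return slot ? (*slot)->val : NULL;
    }

    int main(void) {
        kv_store s = { NULL };
        kvAdd(&s, "user:42", "hello");
        printf("%s\n", (char *)kvFind(&s, "user:42"));   /* prints "hello" */
        return 0;
    }

The real dict.c replacement would delegate its add/find/delete
functions to the chosen library in exactly this way.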


What do you think?

Best,
Chris

Chris

Aug 3, 2010, 1:36:09 AM
to Redis DB
This is probably the best place to look regarding tries:
http://www.nedprod.com/programs/portable/nedtries/


Jak Sprats

Aug 3, 2010, 11:37:16 PM
to Redis DB
Hi Chris,

Tries are usually used for alphabetic keys, and depending on the
spread of the keys they can be faster and have a smaller memory
footprint than hash-tables. But tries are bad if key lengths are long,
since they usually store the unique part of each key (plus, in the
worst case, a pointer for each letter).
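
A toy illustration of that worst case (assuming a naive 26-way node;
real libraries like Judy or nedtries compress this heavily):

    #include <stdio.h>

    /* Deliberately naive trie node: one child pointer per letter a-z. */
    struct naive_node {
        struct naive_node *child[26];
        char is_key;
    };

    int main(void) {
        /* ~216 bytes per node on a typical 64-bit box, and a long key
         * can need one node per character of its unique suffix. */
        printf("%zu bytes per trie node\n", sizeof(struct naive_node));
        return 0;
    }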

Nedtries are a good idea to try out, and overriding a couple of
functions in dict.c should suffice (as well as commenting out anything
having to do w/ rehashing), but possibly the question to ask
beforehand is what needs to be tested.

It's easy to come up w/ a test that shows nedtries destroying
hash-tables, and another test that shows the opposite.

There are no tools around that test the wide spectrum of data that
redis can hold, and w/o such tools a single test proves too little.
Creating such tools is also HARD, but I guess for this specific test,
performing speed and memory tests while varying key-length would be
pretty meaningful.
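
For what it's worth, a minimal harness along those lines might look
like the sketch below (assumptions: POSIX hcreate()/hsearch() stand in
for the hash table, getrusage() gives peak memory, and the marked line
is where the trie library's insert would be swapped in to compare):

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <sys/resource.h>

    #define NKEYS 1000000

    int main(int argc, char **argv) {
        size_t keylen = (argc > 1) ? (size_t)atoi(argv[1]) : 32; /* ./bench 64 */
        if (keylen < 16) keylen = 16;      /* room for the unique suffix */

        hcreate(NKEYS * 2);                /* POSIX hash table under test */
        clock_t t0 = clock();
        for (int i = 0; i < NKEYS; i++) {
            char *key = malloc(keylen + 1);
            memset(key, 'a', keylen);
            snprintf(key + keylen - 12, 13, "%012d", i); /* keep keys unique */
            ENTRY e = { key, NULL };
            hsearch(e, ENTER);             /* <-- swap in the trie insert here */
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);       /* ru_maxrss is in KB on Linux */
        printf("keylen=%zu insert=%.2fs maxrss=%ld\n", keylen, secs, ru.ru_maxrss);
        return 0;
    }

Running it once per key length keeps the peak-RSS number meaningful
for that length.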

Not all NoSQL engines use hash-tables; some use B+Trees, which are
slower but much more memory efficient, and are sorted.
I overrode the hash code in dict.c w/ some B+Tree code, and it
performed at about 60-70% of the speed of the hash table, but the
memory usage was drastically improved (w/ some optimisations). The
exact benchmarks to illustrate this are difficult, as the additional
variable of where the value in the key-value tuple was stored greatly
decreased memory usage (so benchmarking has to vary value size).
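
A tiny standalone illustration of the "sorted" part (the <search.h>
binary tree is only a stand-in for a B+Tree here, not the dict.c
code): an in-order walk hands the keys back in sorted order, which is
what makes range queries cheap and is something a hash table can't do.

    #include <search.h>
    #include <stdio.h>
    #include <string.h>

    static int cmp(const void *a, const void *b) {
        return strcmp((const char *)a, (const char *)b);
    }

    /* glibc calls this visit "postorder", but for an internal node it
     * is the in-order visit, so printing here yields sorted output. */
    static void print_sorted(const void *nodep, VISIT which, int depth) {
        (void)depth;
        if (which == postorder || which == leaf)
            printf("%s\n", *(char *const *)nodep);
    }

    int main(void) {
        void *root = NULL;
        char *keys[] = { "user:3", "user:1", "user:2" };
        for (int i = 0; i < 3; i++) tsearch(keys[i], &root, cmp);
        twalk(root, print_sorted);   /* prints user:1, user:2, user:3 */
        return 0;
    }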

What I am trying to say is: if you do override the hash-table w/ a
nedtrie (or whatever), make sure you come up w/ some meaningful
tests/benchmarks ... a topic I am definitely interested in.

- Jak

Jeremy Zawodny

Aug 4, 2010, 5:12:34 PM
to redi...@googlegroups.com
Jak...

On Tue, Aug 3, 2010 at 8:37 PM, Jak Sprats <jaks...@gmail.com> wrote:
> I overrode the hash code in dict.c w/ some B+Tree code, and it
> performed at about 60-70% of the speed of the hash table, but the
> memory usage was drastically improved (w/ some optimisations). The
> exact benchmarks to illustrate this are difficult, as the additional
> variable of where the value in the key-value tuple was stored greatly
> decreased memory usage (so benchmarking has to vary value size).

That's very interesting...

I was recently wishing for a Redis option to use B+Tree structures for some data (where you're willing to trade that performance for memory footprint and range queries). Did you publish it on github? Any idea if antirez is interested in supporting it somehow?

Jeremy

Jak Sprats

Aug 4, 2010, 6:52:12 PM
to Redis DB
Hi Jeremy,

Yeah, range queries are nice, right? And some use cases require
minimal memory use (like if you have a box w/ not much RAM, or you
have a lot of data).

I can put the code I have on github in about 7-10 days (it needs
polishing/testing/etc.). The code is very stable, but its APIs were
tailored to my use cases, so I have to make them generic. I will need
another 7-10 days to get the iterators polished/tested/etc.

I don't have any idea if redis/antirez/pieter plan on supporting
B+Trees, but I haven't heard anything, and the code involved is
non-trivial: B+Trees are one of the more complex data structures.

So more to come in a fortnight :)

- Jak
