I need to store a tree structure like the example bellow. I searched
the web and found nothing really clear to solve this.
Building the tree can be slow, but searching must be really fast.
The initial idea is to store URLs as trees and give a set of values
for each level
http://example.com/level1/level2/level3/
http://example.com/level1/level2/level31/
example.com: x=10, y=20
|- level1: x=22, y=41
|- level2: x=44, y=65
|- level3: x=44, y=11
|- level31: x=4, y=14
And if I do a query for:
http://example.com/level1/level2/level3/anythingelse/somefile.html
It must return me x=44 and y=11, because it's the ideal path for this URL.
Do you have any ideas on how to use Redis to do this?
I saw something about using sorted lists, but I'm not understanding
how to do it.
thanks in advance!
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
I already read this thread, but I'm worried about searching the tree.
For example, If I store a tree by using simple KV and some kind of
relationship by IDs I will probably need to do lot's of commands in
Redis to match the URL, right?
I'm new to Redis and don't know if some of it's features can help
doing a fast "tree" search and what's the best way to store a tree to
enable a fast search.
1) Store each path with a score, doesn't really matter how. Lets go
with a hash with the url as the key, and the fields x and y with the
score, like this:
path:example.com {x: 10, y: 20}
path:example.com/level1 {x: 22, y: 41}
... etc
2) Store an index of each path returning a score in a sorted set
(score 1 so it is sorted alphanumericly), lets call it "paths":
example.com
example.com/level1
example.com/level1/level2
example.com/level1/level2/level3
example.com/level1/level2/level31
3) Each request will need a unique id, a uuid would work. Split the
requested url into paths of different depths. for example:
split http://example.com/level1/level2/level3/anythingelse/somefile.html into:
example.com
example.com/level1
example.com/level1/level2
example.com/level1/level2/level3
example.com/level1/level2/level3/anythingelse
example.com/level1/level2/level3/anythingelse/somefile.html
4) Now send the request to the redis server:
MULTI
// add each path to a unique sorted set
ZADD req:uuid 1 example.com
ZADD req:uuid 1 example.com/level1
ZADD req:uuid 1 example.com/level1/level2
ZADD req:uuid 1 example.com/level1/level2/level3
ZADD req:uuid 1 example.com/level1/level2/level3/anythingelse
ZADD req:uuid 1 example.com/level1/level2/level3/anythingelse/somefile.html
// store all the intersecting paths in another sorted set
ZINTERSTORE res:uuid 2 paths req:uuid
// get the last matching path from the intersection (this is the only
value you need)
ZREVRANK res:uuid 0
//clean up
DEL req:uuid res:uuid
EXEC
The return value for the ZREVRANK command should be the deepest path
that matches the request. From there you can query for the scores. It
might be possible to do this all in one request by doing this instead
of the ZREVRANK:
SORT res:uuid LIMIT 0 1 GET # GET path:*->x GET path:*->y ALPHA DESC
Using SORT might also allow you to just use plain sets instead of
sorted sets since it will be sorting them anyways, which would save on
memory.
--
Derek
I learned a lot of Redis by studying your ideas!
I'll try all them (including the ones using SORT) and after that post
the final conclusions here.
Thanks again!
This is the final and tested idea. It worked very well! Thanks Derek!
** Storing the URL tree in Redis:
SET path:domain.com { 'x': 100, 'y': 200 }
SET path:domain.com/a { 'x': 100, 'y': 201 }
SET path:domain.com/b { 'x': 100, 'y': 202 }
SET path:domain.com/b/c { 'x': 100, 'y': 203 }
ZADD paths:domain.com 1 "domain.com"
ZADD paths:domain.com 1 "domain.com/a"
ZADD paths:domain.com 1 "domain.com/b"
ZADD paths:domain.com 1 "domain.com/b/c"
** Querying the URL tree (Ex.: domain.com/b/c/page.html):
MULTI
ZADD req:HASH 1 "domain.com"
ZADD req:HASH 1 "domain.com/b"
ZADD req:HASH 1 "domain.com/b/c"
ZADD req:HASH 1 "domain.com/b/c/page.html"
ZINTERSTORE req:HASH 2 paths:domain.com req:HASH
SORT req:HASH LIMIT 0 1 GET path:* ALPHA DESC
DEL req:HASH
EXEC
I've implemented this in Python and this "transaction" took about 54
secs to query 100k URLs (~1852 tps) in a dataset of about 1k different
"paths", in a desktop (Core 2 Duo E75000 2.93Ghz).
Thanks!
Oh good, it worked! I'll have to actually give it a try at some point too.
ZSET: 48.7541201115 secs
MGET: 13.725041151 secs