Prefix search

545 views
Skip to first unread message

Anto

unread,
May 25, 2012, 1:37:46 PM5/25/12
to redi...@googlegroups.com
Hi

I'm trying to implement a prefix search and choose the most accurate,
I plan to use searches for expressions (PHP) but maybe there is some
function in Redis without having to reinvent the wheel (I'm newbie :)
).

If I have a list like this:

..........
35554
35554249
35554250
35554251
35554252
35557
35568
35569
..........
..........

By checking the number 3555434548 would choose the 35554 and the value
that contains this key. By checking the number 3555424956 and elect
the value 35554249. Has anyone done something similar or can advise?
Thanks !

Regards

Dvir Volk

unread,
May 25, 2012, 1:55:56 PM5/25/12
to redi...@googlegroups.com
I'm not 100% sure I got you right, but it sounds a bit like the opposite of a prefix search.
You have a long expression and want to check the longest prefix that is part of it, right?
a prefix search usually means having a short prefix and choosing all expressions that this expression is a prefix of.

what you want can be pretty easily done with a set - just create a set of all your "prefixes", and then search it for your string, starting at one character and adding one at a time, until you come empty handed. the previous member of the set that matches the substring, is the longest prefix available.
The time complexity of this is rather small - it's relative only to the length of the checked string.

But maybe I got you wrong, so in that case please correct me.


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.


Anto

unread,
Jul 3, 2012, 6:23:24 PM7/3/12
to redi...@googlegroups.com
I'll try what I said. I will also see the different functions of
redistribution is that I see no examples of such capacity. Maybe there
is one with which to make this easy. thank you very much.

regards

2012/5/25 Dvir Volk <dv...@everything.me>:

Abdelkader Allam

unread,
Jul 3, 2012, 6:31:22 PM7/3/12
to redi...@googlegroups.com
Hello Anto,

Dvir is perfectly right, you are trying to do best/longuest match, is it for billing purpose, or you are setting up a LCR engine?

If you got a lot of dialcodes I would suggest you to implement a tree based algorithm and use redis to store the tree as a marshalled/pickle object rather than using sets.

I have done such kind of stuff in python and developed a specific module in C in order to handle very very fast longuest match search (i had millions of entries  in it, I don't know if you've reach this level of complexity).

Anto

unread,
Jul 10, 2012, 7:23:32 AM7/10/12
to redi...@googlegroups.com
Hello,

Exactly similar to a "LCR" used in telephony, but could also be used
to search for articles, products, etc.. For example: have some items
that begin with certain numbering and according to this assigned to a
product category.

I am looking to find something similar, but have not found an example
(LCR or similar). I will try to find out if I can find in C, python,
php or similar. Thank you very much.

Best regards

2012/7/4 Abdelkader Allam <abdelkad...@gmail.com>:

Anto

unread,
Sep 4, 2012, 5:44:55 PM9/4/12
to redi...@googlegroups.com
Hello

I'm still looking for sample code to develop something similar, but I
can not find. Does anyone know where I can find it? Thank you very
much.

regards
anto

2012/7/10 Anto <pot...@gmail.com>:

Josiah Carlson

unread,
Sep 4, 2012, 7:15:01 PM9/4/12
to redi...@googlegroups.com
I believe Dvir's suggestion earlier was right on.

If you want to match X against a group of prefixes Y, you would add
all of your prefixes Y to a Redis SET. Then to determine the longest
prefix of X that was in Y, you would perform something like...

def find_longest_prefix(conn, x):
for le in xrange(len(x), 0, -1):
if conn.sismember('candidates', x[:le]):
return x[:le]
return None

That is, you start with the entire string to see if it is in the group
of prefixes. If it isn't, you trim off a trailing character and check
again.

If you use a pipeline, you could check all of the prefixes in one
Redis round-trip.

Regards,
- Josiah

Dvir Volk

unread,
Sep 5, 2012, 2:50:22 AM9/5/12
to redi...@googlegroups.com
Just as a side note, if the common prefix is short or the text is long, it might be more efficient to pre extract all the prefixes, match them against the set in one transaction (or if you're using singular keys it can be done with MGET or something like that), and finding the longest match.

less efficient in theory, but in practice it might work faster.

the best of both worlds would be implementing this in Lua.


Dvir Volk
Chief Architect, Everything.me

Anto

unread,
Sep 6, 2012, 7:10:32 PM9/6/12
to redi...@googlegroups.com
Ok, look what I have said. I searched on github and software source
code, but have not found anything in any language (including lua) that
I can use as an idea. Thank you very much.

Best regards

2012/9/5 Dvir Volk <dvi...@gmail.com>:

Anto

unread,
Sep 11, 2012, 7:35:44 PM9/11/12
to redi...@googlegroups.com
Hello

I have no problem in the development of the application, my problem is
the structure of the data in redis. The best and optimization, because
I lack knowledge even redis.

Using mySQL, would have a structure like this:

id (integer)
prefix (string)
supplier (varchar)
price (int)
active (boolean)
description (varchar)

SELECT * FROM supplier WHERE prefix LIKE $ prefixw. '%' ORDER BY ASC
prefix; # 3 characters

With the methods the data returned in a loop and I checking which is
closer, using it.

Best regards

2012/9/7 Anto <pot...@gmail.com>:

Josiah Carlson

unread,
Sep 12, 2012, 12:04:52 AM9/12/12
to redi...@googlegroups.com
My previous post included source code to solve the problem in Python.

Was the solution not clear?

Regards,
- Josiah

Didier Spezia

unread,
Sep 12, 2012, 4:58:32 AM9/12/12
to redi...@googlegroups.com

>> I have no problem in the development of the application, my problem is 
>> the structure of the data in redis. The best and optimization, because 
>> I lack knowledge even redis. 

If you have additional data to associate to the prefix, you can easily adapt
the solution given by Dvir and Josiah by using a hash instead of a set.

You need one hash object per supplier, plus one hash to index all prefixes.

Example:

# Here are my suppliers
> ./redis-cli hmset supplier:1 prefix 5690 supplier Apple price expensive active Y description iJunk
OK
> ./redis-cli hmset supplier:2 prefix 569080 supplier Acer price cheap active Y description junk
OK

# Here is the index of prefixes, which I need to maintain each time I add/delete suppliers
> ./redis-cli hmset prefix 5690 1 569080 2
OK

# To check a given number, generate all sub-prefixes starting by the end (as explained by Dvir and Josiah)
# and check their existence in one rountrip
# Here I check 5690801 
> ./redis-cli hmget prefix 5690801 569080 56908 5690 569 56 5
1) (nil)
2) "2"
3) (nil)
4) "1"
5) (nil)
6) (nil)
7) (nil)

Form the result, you can easily find the record you are interested in
(take the first non null result).

# Then, you just have to retrieve the properties of the corresponding supplier
> ./redis-cli hgetall supplier:2
 1) "prefix"
 2) "569080"
 3) "supplier"
 4) "Acer"
 5) "price"
 6) "cheap"
 7) "active"
 8) "Y"
 9) "description"
10) "junk"

Regards,
Didier. 

Anto

unread,
Sep 12, 2012, 11:57:27 AM9/12/12
to redi...@googlegroups.com
HI

Sorry, I had not seen this post. Thanks !!

Best regards

2012/9/12 Josiah Carlson <josiah....@gmail.com>:

Anto

unread,
Sep 12, 2012, 11:58:56 AM9/12/12
to redi...@googlegroups.com
Hi !

Thank you very much to everyone for their messages of support ;-), I
will try to implement it and informed him. Thank you very much :-D.

Best regards

2012/9/12 Anto <pot...@gmail.com>:

Woody

unread,
Sep 28, 2012, 3:02:58 PM9/28/12
to redi...@googlegroups.com
Hi guys,

Love this solution.

How could the o/p have the same prefix for different suppliers? As far as I can tell the same prefix can only exist once.

Second question, in the example we're pulling back e.g. "hgetall supplier:2". If there were multiple products/prefixes for that supplier then how would we get the right price.

This case resembles something I'm trying to do so I'm really interested in this.

thanks in advance
Woody

Prerit Jain

unread,
Apr 18, 2015, 12:45:36 AM4/18/15
to redi...@googlegroups.com
Hi Kader,

I am looking the solution in redis for the same problem Anto is looking for. Can you please enlighten something and provide some use links for tree based algorithm and use redis to store the tree as a marshalled/pickle object. I tried to google it but din't able to find. It will be really helpful if you can share something useful regarding this.

Thanks in advance!!
Reply all
Reply to author
Forward
0 new messages