Locality Sensitive Hashing (LSH)

784 views
Skip to first unread message

Khashayar

unread,
Jun 22, 2012, 3:16:13 AM6/22/12
to redi...@googlegroups.com
Hi everyone,

I am in process of implementing search type of application with Redis. what I want to do is getting a query and apply some kind of hashing function to bucket similar queries and then I use this hash value as a key so I can look it up in O(1). I like to know if somebody tried this approach or can point me to some start point. 

I found that some people using Locality Sensitive Hashing for this purpose but I don't have any prior experience of implementing this algorithm. That would be great if anyone know about LSH share his thoughts on this. 

Thank you in advance!

Here is some links:



Best regards,
Khash

Pierre Chapuis

unread,
Jun 22, 2012, 4:56:22 AM6/22/12
to redi...@googlegroups.com
Hi Khash,

LSH search is a good algorithm for fuzzy search by example, for instance when you have a query that is very similar to what you are looking for but with added noise.

The principle is that you use Locality Sensitive hash functions, meaning that small changes in the input have few chances to change the output. This is basically the opposite of cryptographic / redundancy check hashes, where a slight change in the input should have as much impact on the output as possible.

Once you have a family of N functions like that, basic LSH search means hashing your query with each of these functions and union-ing the sets of documents with the same hash.

So the simplest way to use Redis to implement LSH would be give a numeric ID to each document and use one set for each output of each hash function. Inserting a document looks like this (except you want to make this a single request...):

for i from 1 to N
    x = hashfunction[$i](doc)
    SADD $i:$x id(doc)

Querying looks like this:

t = []
for i from 1 to N
    x = hashfunction[$i](query)
    t << $i:$x
results = SUNION *t

Pierre Chapuis

unread,
Jun 22, 2012, 5:11:59 AM6/22/12
to redi...@googlegroups.com
By the way, I'm very interested to find out that other people are considering Redis for search engines. I use it for an inverted index but its data structures approach should make it appropriate for a lot more use cases (and LSH is probably one).

If you want performance you should take advantage of Scripting, since too many round trips or too large inputs/outputs will kill performance. You may also want to play with the settings (eg. size of ziplists) since you will have usage patterns different from the usual ones.

Have fun :)

Josiah Carlson

unread,
Jun 22, 2012, 10:14:54 AM6/22/12
to redi...@googlegroups.com
I've got an entire chapter of my book dedicated to using Redis as a
search engine in a variety of ways. Basically, all of the tips and
tricks that I've learned over the last 2+ years. I'm about half-way
through writing the chapter, which may or may not make it for the
electronic update coming out mid next month.

In the mean-time, I've got an article that implements TF-IDF based
searching on my blog from almost 2 years ago:
http://dr-josiah.blogspot.com/2010/07/building-search-engine-using-redis-and.html
.

As for the OP, you can use Metaphone or double-Metaphone to create a
sort of pre-hash for words to handle misspellings, then come up with
an pre-ordering for terms that makes searches like 'search engine
redis' and 'redis search engine' come up the same (just sorting the
input words would be sufficient in this case). Alternatively, you
could hash each word individually, and add/xor the result. There are a
few options that are easy to do and may give you what you want.

Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/WCagYm-Ze7wJ.
>
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.

Pierre Chapuis

unread,
Jun 22, 2012, 11:05:53 AM6/22/12
to redi...@googlegroups.com
On Friday, June 22, 2012 4:14:54 PM UTC+2, Josiah Carlson wrote:
I've got an entire chapter of my book dedicated to using Redis as a
search engine in a variety of ways. Basically, all of the tips and
tricks that I've learned over the last 2+ years. I'm about half-way
through writing the chapter, which may or may not make it for the
electronic update coming out mid next month.

In the mean-time, I've got an article that implements TF-IDF based
searching on my blog from almost 2 years ago:
http://dr-josiah.blogspot.com/2010/07/building-search-engine-using-redis-and.html

Nice! Looking forward to read that chapter. Looks like you have been writing
Redis-powered search engines for longer than me (I started in January 2011).

What I have implemented is similar to TF-IDF, but the differences force
me to use hashes instead of zsets and score out of Redis. This is
due to the fact that we (http://www.moodstocks.com/) are an image
(not text) search engine, so our "words" are not quite the same thing.

Josiah Carlson

unread,
Jun 22, 2012, 8:31:51 PM6/22/12
to redi...@googlegroups.com
On Fri, Jun 22, 2012 at 8:05 AM, Pierre Chapuis
<catwell...@catwell.info> wrote:
> On Friday, June 22, 2012 4:14:54 PM UTC+2, Josiah Carlson wrote:
>>
>> I've got an entire chapter of my book dedicated to using Redis as a
>> search engine in a variety of ways. Basically, all of the tips and
>> tricks that I've learned over the last 2+ years. I'm about half-way
>> through writing the chapter, which may or may not make it for the
>> electronic update coming out mid next month.
>>
>> In the mean-time, I've got an article that implements TF-IDF based
>> searching on my blog from almost 2 years ago:
>>
>> http://dr-josiah.blogspot.com/2010/07/building-search-engine-using-redis-and.html
>
> Nice! Looking forward to read that chapter. Looks like you have been writing
> Redis-powered search engines for longer than me (I started in January 2011).

There's always someone who has been doing it longer. I know I'm not the first.

I also know that there are a few people who have implemented ad
targeting engines in Redis, which I'll also be writing about.

> What I have implemented is similar to TF-IDF, but the differences force
> me to use hashes instead of zsets and score out of Redis. This is
> due to the fact that we (http://www.moodstocks.com/) are an image
> (not text) search engine, so our "words" are not quite the same thing.

Image search is tough, you have my sympathy and respect. I've stayed
out of that world on purpose.

Regards,
- Josiah
Reply all
Reply to author
Forward
0 new messages