You may find the autocomplete example could work for you, or if you
write a different tokenizer, you could use
http://dr-josiah.blogspot.com/2010/07/building-search-engine-using-redis-and.html
.
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>
You could do that.
I have a question for you: *why* would you want someone to be able to
search by a single character for a name? Why should "i" be able to
return "mike" or "sick"? Are you following an (unfortunate) spec that
says it should be possible?
Generally speaking, two common cases dominate others for easily 95%+
of the potential uses in search: prefix-based autocomplete (think
Google search, finding contacts when typing an email address in gmail,
etc.) and tokenization with some sort of spelling correction (again
think Google search). This is from practical experience doing search
and ad targeting for two different companies, along with helping to
build a sales backend where productivity is the priority (adding the
right contacts, filtering by the right dimensions, etc.).
Personally, I've implemented an autocomplete for a web-based internal
application that my company uses. Proper autocomplete for entries has
saved countless hours, countless searches in other contexts, areas,
etc. We've used it to quickly find a half-dozen different types of
contacts (like gmail), and quickly find a dozen different types of
objects that are related to the task at hand (our autocomplete
includes sub-word autocomplete, multi-word autocomplete, multi-entry
options, and the Double Metaphone algorithm for handling
misspellings). If you are completing on names, generally autocomplete
+ double-metaphone works amazingly well.
I've also implemented word/token-based search for contexts where the
idea of a "name" isn't applicable, or when filtering on other kinds of
data (tags, numeric ranges, etc.) Starting with a variant of what I
linked earlier [1], we also index all of our metadata to make it
searchable and filterable (user profiles, geography, history, ages,
notes, etc.).
So... do you really need to have "i" return "mike" and "sick"? Or can
it be solved in some other way?
- Josiah
[1] http://dr-josiah.blogspot.com/2010/07/building-search-engine-using-redis-and.html
Yes, I really need to look up names by entering them in letters.
But the problem is that my version uses a lot of memory. Maybe someone
can suggest something else?
What I need.
For example, we have: Kenny, Kyle, Stan, Eric.
And when I'm looking for 'k', the result of Kenny, Kyle.
When the 'e', then Kenny, Kyle, Eric
etc.
en => Kenny
yle => Kyle
stan => Stan
.....
This should work with 20k + names
03.04.2011 12:14, Josiah Carlson пишет:
Of course, 20k names is pretty small, and suffix trees are a bit
expensive in terms of memory. (All tries hog memory, because the
pointer overhead is significant compared to each individual character
unless a lot of entries have a lot of common substrings.) If you're
willing to trade memory usage a bit of CPU usage, you could store all
names in an array, and then map [a, b, ... z] and [aa, ab, ... zy, zz]
to arrays of indexes into the names array, or to bit arrays if a lot
of names match a letter or letter pair. If the user types in more than
two letters like 'abc' then you'd have to filter out all names mapped
to by 'ab' that don't actually contain 'abc'.
- Mike
Alternatively, skip the fancy data structures. Add "mike" to the sets
"name:m", "name:i", "name:k", "name:e". Do this for all names. If you
want all the names with the letter "i", just pull the set "name:i". If
you want to know the set of names with the substring "ik", intersect
"name:i" and "name:k", then post-filter with a standard string search.
You will cut your set down significantly for every character, which
will more than make up for having to do the string search.
Alternatively, if you've got the spare memory, directly check the
strings in your web app. On my slow VM, testing 20k 30-byte strings
for the existence of a 1-10 character substring takes less than 3 ms.
What language are you using? Could you just put the names in an
in-process data structure, etc.?
- Josiah
Thanks Josiah
I will try do this:
name:m
name:i
name:k
name:e
......
Maybe really should only be used autocompletion.....
09.04.2011 10:14, Josiah Carlson пишет: