How Autocomplete works

127 views
Skip to first unread message

prateek chandan

unread,
Nov 1, 2013, 6:01:47 AM11/1/13
to wncc...@googlegroups.com
Hi all,

I have to design some search thing which give the user autocomplete suggestions and some level of spelling corrections too
I have some 10^6 search strings from which I have to search

Can anyone suggest me how to do it efficiently .. I am new to this .. can anyone suggest me something which is easily understood for a second year student .
Some popular search engines uses ML and give ranking to data according to frequency and recency . but my data set has equal weight too all strings and I have to get 5-10 best poosible matches from the given string at each keystroke. 

And also I am using python framework for this . Is it okay ?

Regards ,
Prateek
2nd year , CSE

Anvit Tawar

unread,
Nov 1, 2013, 6:10:06 AM11/1/13
to wncc...@googlegroups.com
For a trivial solution, you can try storing your strings, with their past search frequencies, in a prefix tree, and then look for the 5-10 most frequent strings with what the user has typed as prefix, and return them. Doesn't include auto-correct yet, but I think this should be a good start from scratch.

If you are looking for a full-fledged search system, have a look at Elastic Search (Beware though, it needs some knowledge of server-side concepts)


--
--
The website for the club is http://stab-iitb.org/wncc
To post to this group, send email to wncc...@googlegroups.com
 
---
You received this message because you are subscribed to the Google Groups "Web and Coding Club IIT Bombay" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wncc_iitb+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Dilawar Singh

unread,
Nov 1, 2013, 6:49:01 AM11/1/13
to wncc...@googlegroups.com
If you want to do quick searching, before you look at algorithms, you should
look at data-structures. Read about Skip-Lists. They are as good as Red-Black
trees but easier to implement (and understand). This video is pretty good.

http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-12-skip-lists/

As soon as one types a character, you can quickly search the location of
starting point of that character-group in main-list. That would take O(26) in
worst case. When you type second character, you have two choices, create another
skip lists if character group is large or search "efficiently" in the character
group. This is without database approach. You loose everything once the
application terminates. This should be good enough, for searching over a million
record is not a big deal these days.

Do you have any data-base at your disposal, like sqlite? It would be a good time
to have a look at SQL database. In database, you can update the no of times a
word was used (and perhaps the patter of use also). People usually use extensive
computation to populate some kind of data-bases which they can query later to
give a suggestion. Updating database is very critical and writing fast queries
also. That is one architecture you can exploits.

If you are interested in ideas, see Ravi Kannan work on Searching. And pay
attention in your linear algebra class, if you wish to design good search
engines later in your life.

--
Dilawar
NCBS Bangalore
EE, IIT Bombay

Vinayak Gagrani

unread,
Nov 1, 2013, 6:35:15 AM11/1/13
to wncc_iitb
elasticsearch does not do spell check on its own. it does have a completion suggestor but in that you need to supply tokens to the strings as input and string as output.
the edit distance it supplies is 1 or 2.

If you just want to show autocomplete then you can also try typeahead.js It has very good UI and is pretty fast.

Its better to make prefix tree and dictionary with frequency of english alphabets to be used as first character than use some other system.
--
Vinayak Gagrani 
B.Tech (Hons.) Computer Science and Engineering Indian Institute of Technology, Bombay
+91-99-202-42-601http://www.cse.iitb.ac.in/alumni/~gagrani09 | Skype id : vinayak.gagrani

Manish Goregaokar

unread,
Nov 1, 2013, 6:27:53 AM11/1/13
to wncc...@googlegroups.com
I was going to suggest elasticsearch but it looks like that's already been done. Lucene is pretty good too, last I checked Wikipedia's autocomplete runs Lucene. And Stack Overflow's search engine was Lucene until recently when they switched to elastic.

For typos, there is http://en.wikipedia.org/wiki/Wikipedia:RegExTypoFix, which is regex based. Otherwise you could try statistical machine learning like Google, but I bet you don't want to go that deep :P

As for best possible matches, give the search terms weights according to their size/importance, and dynamically update them as a function of visits if you'd like. Then just do the equivalent of an ORDER BY xyz DESC LIMIT 5 query.

-Manish Goregaokar


On Fri, Nov 1, 2013 at 3:40 PM, Anvit Tawar <anvit...@gmail.com> wrote:

Dilawar Singh

unread,
Nov 1, 2013, 7:23:46 AM11/1/13
to wncc...@googlegroups.com
I can't locate the talk Kannan gave at IITB in 2011. The idea was to encode data
as vector and calculate the angle between two vector as a measure of similarity.
The point to note is that this will be inefficient if you are comparing words
with words (why replace string searching by vectors) but it becomes useful when
things you try to match are not mere words, but fragments of text with some
"meaning" in them. But nonetheless, the idea was nice and he was able to do
something in Bing search engine which I don't remember what was it.

Nonetheless, have a look at this. It has some good ideas but this must not
discourage you to write a basic framework for searching.

http://research.microsoft.com/en-us/um/people/kannan/Papers/samplefly.pdf

>And also I am using python framework for this . Is it okay ?
More than okay. Where else you will get such a great standard library at your
disposal!

--
Dilawar
NCBS Bangalore
EE, IIT Bombay

On Fri, Nov 01, 2013 at 03:01:47AM -0700, prateek chandan wrote:

Anil Shanbhag

unread,
Nov 1, 2013, 9:40:04 AM11/1/13
to wncc...@googlegroups.com
Some links : If you are using python -
http://haystacksearch.org/
https://django-haystack.readthedocs.org/en/latest/autocomplete.html

Works with both Solr / elasticsearch.
--
Anil Shanbhag

aditya kumar akash

unread,
Nov 1, 2013, 10:40:53 AM11/1/13
to wncc...@googlegroups.com
i too have similar problem at hand, but using handling of databases using normal PHP.
Please provide similar suggestions for PHP framework.

Dilawar Singh

unread,
Nov 1, 2013, 11:10:53 PM11/1/13
to wncc...@googlegroups.com


On Fri, Nov 1, 2013 at 8:10 PM, aditya kumar akash <adityaku...@gmail.com> wrote:
i too have similar problem at hand, but using handling of databases using normal PHP.
Please provide similar suggestions for PHP framework.

--
--
The website for the club is http://stab-iitb.org/wncc
To post to this group, send email to wncc...@googlegroups.com
 
---
You received this message because you are subscribed to the Google Groups "Web and Coding Club IIT Bombay" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wncc_iitb+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Dilawar
NCBS Bangalore
Reply all
Reply to author
Forward
0 new messages