Thoughts on Cleo

Paul Houle

Dec 11, 2013, 4:43:01 PM
to cleo-ty...@googlegroups.com
Typeahead has been an important navigation mechanism for my site at


for some time, but I wasn't happy with the performance of my half-baked implementation in MySQL, particularly when running on an economical RDS instance in AWS. I switched over to Cleo, and the speed boost is obvious enough that I didn't feel the need to do any benchmarking. It runs happily on an m1.small in AWS together with another simple web service. I can build an index in 30 seconds on my dev workstation with an SSD and a local instance of MySQL; it takes 150 seconds on the m1.small talking to an RDS instance and using an EBS volume for the Cleo database. That isn't bad at all for an index that has 500,000 topics.

I wrote up some notes on the experience here:


The questions that I am thinking about at this point are:

* How do I normalize names to make the index easier to use? For instance, somebody should be able to find me with "Houle Paul", "Paul Houle", "Houle, Paul" and other obvious variations. I'm also interested in locations, where it could be "Manchester, NH", "Manchester, New Hampshire", etc. I have a lot of semantic data that can help with this if I have a clear picture of what needs to be done.
* How does one evaluate the quality of typeahead results? I tried two different term generation strategies and found that one of them seemed "worse" than the other, in that seemingly irrelevant results came up. I don't want to go too far down the road of thinking about ranking, however, unless I have some repeatable way to say that these results are better than those results.
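
To make both questions a bit more concrete, here is a rough sketch in Python of what I have in mind. It is only an illustration under assumptions: `name_variants`, `mean_reciprocal_rank`, and the `search` callable are hypothetical names of my own and not part of Cleo's API, and mean reciprocal rank is just one possible repeatable metric, not necessarily the right one. The first function generates the obvious orderings of a personal name as index terms; the second scores a term generation strategy against a hand-labeled set of queries.

```python
from typing import Callable, Dict, List, Set


def name_variants(name: str) -> Set[str]:
    """Return lowercase index terms for a personal name in common orderings.

    Hypothetical helper: accepts "Given Family" or "Family, Given".
    """
    if "," in name:
        # "Houle, Paul" style: family name comes before the comma.
        family, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        if not parts:
            return set()
        # Assumption: the last token is the family name.
        given, family = " ".join(parts[:-1]), parts[-1]
    given, family = given.lower(), family.lower()
    if not given:
        return {family}
    return {
        f"{given} {family}",
        f"{family} {given}",
        f"{family}, {given}",
    }


def mean_reciprocal_rank(
    search: Callable[[str], List[str]],
    expected: Dict[str, str],
) -> float:
    """Average 1/rank of the expected topic over labeled queries.

    `search` stands in for whatever calls the typeahead service and
    returns topic ids in ranked order; a miss contributes 0.
    """
    if not expected:
        return 0.0
    total = 0.0
    for query, topic_id in expected.items():
        results = search(query)
        if topic_id in results:
            total += 1.0 / (results.index(topic_id) + 1)
    return total / len(expected)
```

With something like this, two term generation strategies could be compared by building an index with each, running the same labeled queries through both, and comparing the two scores. The labeled queries would have to be collected by hand or from logs. Locations would need a similar variant generator driven by a lookup table of abbreviations, which I have not sketched here.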



