I'd like to ask everyone's opinion on implementing search
functionality in an app. The app is a forum that aims to be simple and
pluggable. I'm now trying to pick the right solution for searching
and have gotten stuck.
My current thoughts and conclusions:
- Searching using "like" db queries is too simplistic and tends to get
slower as the amount of data grows.
- Database-specific solutions (MySQL search, Postgres TSearch2) kill
portability.
- PyLucene is too large to work in-process (20 MB in memory). Also it
doesn't work with Python's threading (segfaulting the whole process on
import). A solution would be a dedicated PyLucene process.
- Xapian looks good but I haven't actually tried it yet. I've heard though
that it doesn't implement locking of the index database and that this has
to be done manually. Not rocket science, but it complicates the solution
a bit. I've also seen a recommendation to run it in a dedicated process.
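To make the first point concrete, here is a minimal sketch of a "like"
search using Python's sqlite3 module. The `post` table, its columns and
the `like_search` helper are hypothetical names made up for this
example, not anything from the forum app itself.

```python
import sqlite3

def like_search(conn, term):
    """Return (id, body) rows whose body contains `term` as a substring."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    # The leading wildcard means an ordinary index cannot help,
    # so the database has to look at every row.
    cur = conn.execute(
        "SELECT id, body FROM post WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return cur.fetchall()

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO post (body) VALUES (?)",
    [("Searching with Xapian",), ("Forum search ideas",), ("Unrelated post",)],
)
conn.commit()
results = like_search(conn, "search")
```

Note that this is plain substring matching: a query for "search" would
also hit "research", and there is no ranking or stemming.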
So my questions are:
- Am I doomed to have a separate server? This complicates things a lot
and I'm very much inclined to use something in-process.
- Are there any solutions on a scale between simplistic "likes" and
sophisticated indexers like Lucene?
Probably. :)
> - Are there any solutions on a scale between simplistic "likes" and
> sophisticated indexers like Lucene?
http://www.osreviews.net/reviews/misc/hyperestraier
http://cheeseshop.python.org/pypi/estraiernative/0.2
http://swish-e.org/
http://cheeseshop.python.org/pypi/Swish-E/0.5
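
For a rough idea of what sits between "like" queries and a full
indexer, here is a toy in-process inverted index in pure Python. This
is only a sketch of the general technique, not how Hyper Estraier or
Swish-E work internally; `TinyIndex` and its methods are hypothetical
names, and it has no persistence, ranking or locking.

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: maps each word to the set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for tok in self._tokens(text):
            self._postings[tok].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word of the query."""
        toks = self._tokens(query)
        if not toks:
            return set()
        result = None
        for tok in toks:
            # .get() avoids creating empty entries for unknown words.
            ids = self._postings.get(tok, set())
            result = set(ids) if result is None else result & ids
        return result

index = TinyIndex()
index.add(1, "Xapian search engine")
index.add(2, "Forum search ideas")
```

Lookups here touch only the postings for the query words instead of
scanning every post, which is the main thing the real indexers buy you.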
If you're willing to go "search server", you might even consider Solr
(a Lucene-based search server with a web API). It becomes especially
appealing if you scale out your front ends (the Django app servers)
horizontally in a large environment. How many front ends you have
actually becomes something to seriously consider, because the likes of
PyLucene, Xapian, and others all have search-related indices that then
need to be kept up to date and available to the searcher processes.
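
As a sketch of what "web API" means in practice, each front end would
just issue an HTTP request to the search server. The example below
only builds the query URL. The base URL, the `body` field name and the
`solr_select_url` helper are assumptions for illustration; the actual
path depends on how your Solr instance is set up.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, rows=10):
    """Build a URL for Solr's select handler (base_url is an assumption)."""
    # q = query string, rows = max results, wt = response format.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url.rstrip("/") + "/select?" + params

# Hypothetical host, port and field name.
url = solr_select_url("http://localhost:8983/solr", "body:xapian")
```

Every app server can call the same URL, so only the search server has
to hold the index and keep it up to date.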
You've heard incorrect information then, since Xapian most definitely
does implement database locking.
Cheers,
Olly