Django Sphinx or Haystack Xapian


zweb

Jun 19, 2010, 12:21:31 PM
to Django users
I need to implement a search solution over models in MySQL.

Does anyone have experience with both, or has anyone gone through the
same decision?

What should I consider before choosing one or the other?

(MySQL full-text search will not work because I use InnoDB. Solr is
powerful, but I heard it is very memory hungry, and it is in Java.
Whoosh is not yet as mature as the others. So that leaves a choice
between Django Sphinx and Haystack with Xapian.)

thanks


Nick Arnett

Jun 19, 2010, 1:50:25 PM
to django...@googlegroups.com
On Sat, Jun 19, 2010 at 9:21 AM, zweb <trader...@gmail.com> wrote:
> I need to implement a search solution over models in MySQL.
>
> Does anyone have experience with both, or has anyone gone through the
> same decision?
>
> What should I consider before choosing one or the other?

I'm making a similar decision. As far as I can see, neither one does search result clustering, which I'd like. But the memory required to host Solr means a big increase in hosting costs. I'm using WebFaction, so going from their basic service to the plan that offers 300 MB of memory is an increase of about $25/month, which I'd rather not pay yet.

Nick

Nick Arnett

Jun 19, 2010, 1:54:35 PM
to django...@googlegroups.com
On Sat, Jun 19, 2010 at 9:21 AM, zweb <trader...@gmail.com> wrote:

> (MySQL full-text search will not work because I use InnoDB. Solr is
> powerful, but I heard it is very memory hungry, and it is in Java.
> Whoosh is not yet as mature as the others. So that leaves a choice
> between Django Sphinx and Haystack with Xapian.)

I meant to mention that although I'm using InnoDB, I'm storing a copy of the searchable text in a MyISAM table and using the MySQL search on it.  That's okay for now, but I'm looking for clustering, faceted search and other fancy stuff.
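
A rough sketch of that arrangement follows. The table and column names (`post_search`, `post_id`, `body`) are hypothetical, and the helpers assume a DB-API cursor that uses the `%s` parameter style, as MySQLdb does. It is an illustration of the pattern, not a tested production setup.

```python
# Hypothetical mirror table: the real data stays in InnoDB, and only the
# searchable text is copied into a MyISAM table that carries a FULLTEXT index.
CREATE_MIRROR = """
CREATE TABLE post_search (
    post_id INT NOT NULL PRIMARY KEY,
    body TEXT NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM
"""

# REPLACE inserts the row, or overwrites the existing row with the same key.
UPSERT_MIRROR = "REPLACE INTO post_search (post_id, body) VALUES (%s, %s)"

SEARCH_MIRROR = (
    "SELECT post_id FROM post_search "
    "WHERE MATCH(body) AGAINST (%s) LIMIT %s"
)


def sync_post(cursor, post_id, body):
    """Copy one row's searchable text into the mirror table."""
    cursor.execute(UPSERT_MIRROR, (post_id, body))


def search_posts(cursor, terms, limit=20):
    """Return the primary keys of mirror rows matching the search terms."""
    cursor.execute(SEARCH_MIRROR, (terms, limit))
    return [row[0] for row in cursor.fetchall()]
```

Calling `sync_post` whenever the source row is saved (for example from a model's save hook) keeps the mirror current, at the cost of one extra write per update.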

I've worked in search-related technology for a long time and I should know that search performance always demands lots of memory...  So I may bite the bullet and use Solr, after all.  I just wish there were a way to trade off the memory for speed until I'm ready to deploy a real working version.

Nick

zweb

Jun 19, 2010, 4:30:52 PM
to Django users
I am leaning towards the Haystack with Xapian solution over Django
Sphinx, mainly for two reasons:
1) Haystack supports Solr, Xapian and Whoosh, so in future I can
easily migrate from Xapian to Solr as my needs grow.
2) Sphinx has slow index updates. Updating an index takes as much time
as building a new one.
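
To illustrate the first point, here is a sketch of a settings.py fragment using the Haystack 1.x style setting names as I understand them. The module path and index path are hypothetical, and the setting names should be checked against the docs for the Haystack version in use.

```python
# settings.py fragment. Setting names follow the Haystack 1.x convention;
# verify them against the documentation for your installed version.

# Hypothetical module that registers the project's search indexes.
HAYSTACK_SITECONF = "myproject.search_sites"

# The backend is selected by name. Moving from Xapian to Solr later would
# mean changing this value to "solr" and rebuilding the index.
HAYSTACK_SEARCH_ENGINE = "xapian"

# Backend-specific settings. Only the one for the active backend is read.
HAYSTACK_XAPIAN_PATH = "/var/lib/myproject/xapian_index"  # hypothetical path
HAYSTACK_SOLR_URL = "http://127.0.0.1:8983/solr"  # used only with "solr"
```

The index definitions themselves would stay the same across backends, which is the portability argument. Features such as faceting or highlighting may still differ from one backend to another.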


My views are based on a quick reading of material and opinions on the
web, not on my own experience yet. I will try Haystack with Xapian this
weekend and let you know.




zweb

Jun 19, 2010, 9:26:06 PM
to Django users
> I'm storing a copy of the searchable text in a MyISAM table and
> using the MySQL search on it.


I need search results to be available as soon as the data is updated
in the DB.
That requires updating the index in real time, which is not very
efficient.

Haystack does provide real-time index updates. Has anyone used them?

http://docs.haystacksearch.org/dev/searchindex_api.html#keeping-the-index-fresh

Other than that, I have two other options:
1) Search the DB for data updated since the last search index update
and show results from both the DB and the search engine. The risk is
that the index could return stale data, or I might get duplicates.
2) Use an approach like yours: keep a copy of the searchable text in a
MyISAM table and use the MySQL search on it.
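
Option 1 could be sketched roughly as below. The function and argument names are hypothetical. It assumes the application can list the primary keys of every row changed since the last index update, and can run the query directly against just those rows in the DB.

```python
def merge_results(index_hits, changed_pks, fresh_matches):
    """Combine search-engine hits with matches from recently changed rows.

    index_hits    -- primary keys returned by the search engine, in rank order
    changed_pks   -- primary keys of ALL rows changed since the last index update
    fresh_matches -- the subset of changed rows that match the query right now,
                     found by querying the DB directly
    """
    changed = set(changed_pks)
    # Fresh DB matches go first because they reflect the current data.
    merged = list(fresh_matches)
    # An index hit for a changed row is either a duplicate of a fresh match
    # or stale (the row no longer matches), so it is dropped in both cases.
    merged.extend(pk for pk in index_hits if pk not in changed)
    return merged


# Row 2 changed and still matches, row 4 is new, row 5 changed and no longer
# matches the query.
print(merge_results([1, 2, 3], [2, 4, 5], [4, 2]))  # prints [4, 2, 1, 3]
```

Dropping every index hit for a changed row handles both risks at once. The cost is that freshly changed rows lose the search engine's relevance ranking and are simply listed first.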




Dmitry Dulepov

Jun 24, 2010, 4:15:50 AM
to django...@googlegroups.com
Hi!

zweb wrote:
> 2) Sphinx has slow index update. Updating index takes as much time
> as building a new one.

I have a Sphinx-indexed forum with 2 million posts. Indexing takes less than
2 minutes. Is that considered slow? I do a full rebuild of the index every 10
minutes. Incremental reindexing is much faster but crashes from time to
time. Still, I do not consider 2 minutes to be slow. There are much slower
indexing search engines around.

I chose Sphinx exactly because of its indexing and search speed. Docs say
that indexing is slow but I came from search engines that used hours to
index my forums. 2 minutes of Sphinx indexing is just nothing compared to
others.

--
Dmitry Dulepov
Twitter: http://twitter.com/dmitryd/
Web: http://dmitry-dulepov.com/

euan.g...@gmail.com

Jun 24, 2010, 7:03:59 AM
to Django users
We are in the process of switching from a custom Xapian installation
to Solr, as we found Xapian quite limited in its ability to do
faceting, spelling suggestions, or highlighting. If you don't need any
of those things, I would recommend Xapian (although I've not used it
through Haystack). I have used Haystack with Whoosh and agree that
Whoosh needs a fair bit of work yet. That said, I do like the Haystack
idea.

Solr is really good for what we want and is really powerful, but we
have had to allocate it 1.5GB of memory for reindexing!

I have no experience of Sphinx, but from what people have said above
it seems like a good solution.

Euan