I've got a smallish site with not a ton of data at the moment.. but all that could change at some point so I'd like to plan with that in mind. Currently I'm deployed on an nginx/mongrel stack that works quite well. My site uses Ferret for search and it's ok.. the big problem is that some terms don't show up as expected.. especially if there are apostrophes, plurals, etc involved.
I've got two choices that I see... pony up for the O'Reilly mini-PDF and tweak ferret settings, or scrap ferret and go with Sphinx (and hope it handles cases like this better). I'm not sure how much time the latter would take me but, assuming that I'm going to spend somewhere around 40 hours anyway, which route would you all recommend?
We've used ferret on past projects... and now use sphinx. We're not likely going back to ferret. ;-)
Robby
-- Robby Russell Founder and Executive Director
PLANET ARGON, LLC Design, Development, and Hosting with Ruby on Rails
> We've used ferret on past projects... and now use sphinx. We're not likely going back to ferret. ;-)
Can you elaborate on why? I'm mostly just curious :)
To the parent...
the ferret PDF booklet is pretty full of good information if you stick with ferret. I don't however remember if it discusses how to handle words with apostrophes in them. It does talk about how to handle plurals via the StemFilter though.
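To make the apostrophe and plural issue concrete, here is a minimal sketch in plain Ruby. It is not Ferret's StemFilter or any Ferret API; `normalize_term` and `analyze` are hypothetical helpers, and the plural rule is deliberately naive. The point it illustrates is only that the same normalization has to run on both the indexed text and the query, otherwise "O'Reilly's" and "oreilly" never meet.

```ruby
# Toy analyzer: NOT Ferret's StemFilter, just an illustration of the idea.
# The same normalization must be applied at index time and at query time.
def normalize_term(word)
  term = word.downcase
  term = term.sub(/'s\z/, "")   # possessive: "o'reilly's" -> "o'reilly"
  term = term.delete("'")       # remaining apostrophes: "o'reilly" -> "oreilly"
  # Very naive plural stripping; a real stemmer handles far more cases.
  if term.length > 3
    term = term.sub(/ies\z/, "y")        # "ponies" -> "pony"
    term = term.sub(/([^s])s\z/, '\1')   # "books" -> "book", keeps "glass"
  end
  term
end

# Split text into word-like chunks and normalize each one.
def analyze(text)
  text.scan(/[[:alnum:]']+/).map { |w| normalize_term(w) }.reject(&:empty?)
end
```

With this, `analyze("O'Reilly's Books")` and `analyze("oreilly book")` produce the same terms, which is what you want from whatever analyzer your search engine ends up using.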
Ferret is unstable in production. Segfaults, corrupted indexes galore. We've switched around 40 clients from ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause people's production apps.
Plus sphinx can reindex many, many times faster than ferret and uses less CPU and memory as well.
A decent search option is Lucene via the acts_as_solr plugin. I never used Sphinx though. Can anyone with firsthand experience of both Lucene and Sphinx give their opinion?
> Ferret is unstable in production. Segfaults, corrupted indexes galore. We've switched around 40 clients from ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause people's production apps.
Huh. I must be lucky. Or not have that much to index (true) or users don't complain about not finding anything (probably very true)
:-)
I'll have to give sphinx a go next time around... thanks Ezra
Ferret has been very unstable for us. It is unfortunate because it seems like it would be more customizable than Sphinx. But I must admit that I like that Sphinx can take the data by itself from MySQL and index it really fast. AEM
On Jan 4, 2008 1:37 PM, Ezra Zygmuntowicz <ezmob...@gmail.com> wrote:
> On Jan 4, 2008, at 11:41 AM, Philip Hallstrom wrote:
> > Can you elaborate on why? I'm mostly just curious :)
> Ferret is unstable in production. Segfaults, corrupted indexes galore. We've switched around 40 clients from ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause people's production apps.
> Plus sphinx can reindex many, many times faster than ferret and uses less CPU and memory as well.
On Jan 4, 2008, at 1:09 PM, Alexey Verkhovsky wrote:
>> Ferret is unstable in production
> Very true.
> A decent search option is Lucene via the acts_as_solr plugin. I never used Sphinx though. Can anyone with firsthand experience of both Lucene and Sphinx give their opinion?
> -- Alexey Verkhovsky
We have a bunch of clients using Solr as well. In general it is more powerful than sphinx but a lot slower to reindex and query. Also it uses 50 times the memory of sphinx. If you have a box or VM to put Solr on by itself then it is a good option as well, but if sphinx can do everything you need from a search indexer then it is a way better option cost-wise.
On Fri, 2008-01-04 at 12:37 -0800, Ezra Zygmuntowicz wrote:
> Ferret is unstable in production. Segfaults, corrupted indexes galore. We've switched around 40 clients from ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause people's production apps.
Just out of interest, were corrupted indexes seen even with only one process writing to the index (via DRb as is recommended)? Multiple writers are unsupported and cause these kinds of problems.
Segfaults were quite common in older versions too, but it's settled down now and I've had it rather stable in a few small production sites (though I'm not talking Twitter-like load :).
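For anyone unsure what "only one process writing to the index" looks like in practice, here is a rough in-process stand-in in plain Ruby. It does not use Ferret or DRb: the `SingleWriterIndex` class and its Hash "index" are hypothetical, and a thread plus a queue plays the role that the separate DRb server process would play. The pattern is the same, though: many callers may submit updates, but exactly one writer ever touches the index.

```ruby
# In-process stand-in for the single-writer setup. With Ferret the writer
# would be a separate DRb server process; the Hash is a hypothetical index.
class SingleWriterIndex
  def initialize
    @docs = {}          # stand-in for the on-disk index
    @queue = Queue.new  # thread-safe queue from Ruby's core library
    @writer = Thread.new do
      # Only this thread ever mutates @docs.
      while (job = @queue.pop) != :stop
        id, text = job
        @docs[id] = text
      end
    end
  end

  # Safe to call from any number of threads: it only enqueues the update.
  def add(id, text)
    @queue << [id, text]
  end

  # Stop the writer once it has drained everything queued so far.
  def close
    @queue << :stop
    @writer.join
  end

  def size
    @docs.size
  end
end
```

Callers just call `add` from wherever they are and call `close` on shutdown; none of them ever opens the index for writing themselves.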
> Just out of interest, were corrupted indexes seen even with only one process writing to the index (via DRb as is recommended)? Multiple writers are unsupported and cause these kinds of problems.
> Segfaults were quite common in older version too, but it's settled down now and I've had it rather stable in a few small production sites (though I'm not talking Twitter-like load :).
Yes, we have tried every way possible of running ferret: by itself, with the DRb server, etc. I really like Ferret's interface and integration with Rails, but unfortunately it causes nothing but problems for so many people that I cannot recommend it with a straight face. Not meaning to bash on the ferret devs here at all, just stating what I've seen across hundreds of deployments.
On Fri, 2008-01-04 at 11:26 -0500, Vince Wadhwani wrote:
> I've got a smallish site with not a ton of data at the moment.. but all that could change at some point so I'd like to plan with that in mind. Currently I'm deployed on an nginx/mongrel stack that works quite well. My site uses Ferret for search and it's ok.. the big problem is that some terms don't show up as expected.. especially if there are apostrophes, plurals, etc involved.
> I've got two choices that I see... pony up the O'reilly mini-pdf and tweak ferret settings or scrap ferret and go with Sphinx (and hope it handles cases like this better). I'm not sure how much time the latter would take me but, assuming that I'm going to spend somewhere around 40 hours anyway, which route would you all recommend?
Hi Vince,
They're different tools really. I've found the flexibility of Ferret to be really quite awesome. I can (in Ruby):
* set boost values independently per field and per record
* write custom text tokenizers, stemmers and stop lists (and use different ones per field even)
* highlight matches in results using the same engine that does the searching
* manage my own indexes, merging them at will, or just merging results from them
* index content generated on the fly, without having to store it in my SQL database (pull in all the associated tags for a post as you index it, for example)
* store original data in the index (though most people use it to index an SQL database anyway)
* other awesome stuff I can't remember right now
Looking at the documentation for Sphinx (and its usual usage, with MySQL), many (if not all) of those features are missing. But Sphinx is reportedly quicker, supports distributed searching, and appears to be undergoing more development than Ferret is at the moment, so I think it depends on your needs.
I'd recommend you ask on the Ferret mailing list about your search result issues though - I'm surprised you're having problems with that. I'm sure it can be solved.
> > A decent search option is Lucene via the acts_as_solr plugin. I never used Sphinx though. Can anyone with firsthand experience of both Lucene and Sphinx give their opinion?
...
> We have a bunch of clients using Solr as well. In general it is more powerful than sphinx but a lot slower to reindex and query. Also it uses 50 times the memory of sphinx. If you have a box or VM to put Solr on by itself then it is a good option as well, but if sphinx can do everything you need from a search indexer then it is a way better option cost-wise.
I don't have firsthand experience with Sphinx, but I can confirm that, given a decent hardware setup, Solr (with acts_as_solr) is really good (not only in terms of performance but also of flexibility and functionality). We used it for miojob.it and it powers almost every aspect of that site, which is built around faceted browsing of job postings and has only a few spots where caching was appropriate. It does this without sweating under traffic in the range of several hundred thousand hits per day (I don't have the real numbers).
Anyhow, given the lower system requirements, I'd like to give Sphinx a try to see what it can do!
I've been using Ferret since its beginning, I'm also the French translator of the Ferret Shortcut for O'Reilly, and I can tell you one thing: don't use Ferret.
It's really unstable and development stopped a while ago... That's really sad because it was really an AWESOME product, but it never reached a stable state.
I've also experienced huge problems with acts_as_solr, so finally I'd just say "use Sphinx". For me that's the safest decision.
I've been humming and hawing all weekend about whether or not to put
in the time to use Sphinx, and I guess the mountain of evidence is
clear: I'll be moving my project over to Sphinx today.
James
On Jan 7, 3:19 am, "ahF...@gmail.com" <Ahf...@gmail.com> wrote:
> I've been using Ferret since its beginning, I'm also the French translator of the Ferret Shortcut for O'Reilly, and I can tell you one thing: don't use Ferret.
> It's really unstable and development stopped a while ago... That's really sad because it was really an AWESOME product, but it never reached a stable state.
> I've also experienced huge problems with acts_as_solr, so finally I'd just say "use Sphinx". For me that's the safest decision.
Ya we use ferret right now on our site. It's ok, but it does segfault about once a week. It's not a huge deal I suppose, but doesn't make me feel good. Right now I'm evaluating switching to solr or sphinx. It would be nice to have the 'more like this' ability that AAF/Ferret has. I didn't really see this feature with sphinx. We would also like to be able to write a custom sort method, which I haven't been able to do with ferret. I see there's an ability to do that with sphinx which looks nice.
Anyways, can anyone recommend a sphinx plugin for Rails? There's 3 so far that I found. acts_as_sphinx, ultrasphinx, and sphinctor. Are they all actively updated?
> Anyways, can anyone recommend a sphinx plugin for Rails? There's 3 so far that I found. acts_as_sphinx, ultrasphinx, and sphinctor. Are they all actively updated?
I'm not sure about acts_as_sphinx and sphinctor being actively updated, but I can confirm that both Ultrasphinx and Thinking Sphinx (my own plugin - http://ts.freelancing-gods.com) are regularly updated - and under the hood they both use the same Ruby Sphinx client - Riddle (http://riddle.freelancing-gods.com - again, mine - sorry for blowing my own trumpet), which I've been keeping up to date to match the recent releases of Sphinx.
Evan's and my plugins do a lot of the same things, just different approaches, so, with as little bias as possible, I think either can do the job for you. I can't speak for the other two plugins though, as it's been so long since I've looked into them.
On Jan 18, 2008 4:17 PM, Jeff <jeff.caban...@gmail.com> wrote:
> ...
> How difficult would it be to change over to Sphinx?
That would really depend on how you are hooking up with Ferret and whether you were using any advanced features. My guess is that it shouldn't be too hard to switch.
> How difficult would it be to change over to Sphinx?
The overall process? Not hard, with the caveat Adrian mentioned (ie: advanced Ferret features).
But keep in mind Sphinx does not allow updating fields of index records (Ferret does) - you have to re-index to get the latest changes into Sphinx. There are ways around this, to some extent - delta indexes, containing just the recent changes - but it doesn't seem to be critical to everyone.
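To picture the delta index idea, here is a toy model in plain Ruby. It is only a sketch of the concept: `MainPlusDelta` is a hypothetical class, the Hashes stand in for real indexes, and none of this reflects how Sphinx or any plugin stores or merges data. It assumes the simplest possible rule, namely that a record found in the delta replaces its stale copy in the main index.

```ruby
# Toy model of a main index plus a small delta index of recent changes.
# Hypothetical stand-in only; not Sphinx's internals or any plugin's API.
class MainPlusDelta
  def initialize
    @main = {}    # id => text, rebuilt rarely (the expensive full reindex)
    @delta = {}   # id => text, only records changed since the last rebuild
  end

  # A full reindex rebuilds the main index and empties the delta.
  def full_reindex(records)
    @main = records.dup
    @delta = {}
  end

  # A change only touches the small delta index.
  def record_change(id, text)
    @delta[id] = text
  end

  # Search both; when an id is in both, the delta copy wins because the
  # main copy is stale.
  def search(term)
    current = @main.merge(@delta)
    needle = term.downcase
    current.select { |_id, text| text.downcase.include?(needle) }.keys.sort
  end
end
```

The trade-off is that the cheap delta rebuild can run often, while the full reindex runs on a slower schedule and folds the accumulated changes back into the main index.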
Essentially, though:
- Choose a sphinx plugin, and install it.
- Set up the configuration and indexes, either manually, or within your models (depending on the plugin)
- Install sphinx
- Index your data
- Switch your ferret-specific search calls to use the sphinx plugin's search calls.
- Start the sphinx daemon (searchd)
- Confirm everything works
Or something along those lines. I'm sure the EngineYard crew have a better idea though.
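One way to make the "switch your search calls" step less painful is to route every search in the app through a single small wrapper, so the engine-specific call lives in one place. The sketch below is plain Ruby and entirely hypothetical: `Search`, `InMemoryBackend`, and the `search(query, options)` signature are made up for illustration and do not match the API of any Ferret or Sphinx plugin.

```ruby
# Hypothetical adapter layer: none of these classes come from a real plugin.
# The app calls Search.find everywhere; only the backend assignment changes
# when moving from one engine to another.
module Search
  class << self
    attr_accessor :backend

    def find(query, options = {})
      raise "no search backend configured" unless backend
      backend.search(query, options)
    end
  end
end

# In-memory placeholder backend so the sketch runs without a search engine.
# A real backend would wrap the chosen plugin's own search call instead.
class InMemoryBackend
  def initialize(docs)
    @docs = docs   # id => text
  end

  # Returns the ids of documents containing every word of the query.
  def search(query, options = {})
    limit = options.fetch(:limit, 10)
    words = query.downcase.split
    hits = @docs.select { |_id, text| words.all? { |w| text.downcase.include?(w) } }
    hits.keys.first(limit)
  end
end
```

Controllers and models would then call `Search.find("some query", limit: 5)` and never mention the engine by name, so trying out a second engine means writing one more backend class.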
We did that back in early '06 and since talking with tsearch2 is basically normal SQL, all you have to do is to write a custom finder method.
I have no idea how the performance compares to other engines but I find it pretty cool that everything happens transparently inside the database so you have one less process to monitor and keep fresh. So if you're using PostgreSQL, it should definitely be worth a shot. It's been around forever, so it should be free of most teething problems.
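As a rough idea of what such a custom finder boils down to, here is a sketch that builds the SQL condition. It assumes the full-text search that is built into PostgreSQL 8.3 and later (which grew out of tsearch2), with its `to_tsvector` and `plainto_tsquery` functions and `@@` match operator; older tsearch2 installs used different configuration names. The helper name `fulltext_conditions`, the `body` column, and the `english` configuration are all assumptions for illustration.

```ruby
# Builds an ActiveRecord-style conditions array for a PostgreSQL full-text
# match. Column name and text search configuration are assumptions.
def fulltext_conditions(query, column: "body", config: "english")
  # These two values are interpolated into SQL, so allow plain identifiers only.
  [column, config].each do |name|
    unless name.match?(/\A[a-z_][a-z0-9_]*\z/)
      raise ArgumentError, "bad identifier: #{name}"
    end
  end
  sql = "to_tsvector('#{config}', #{column}) @@ plainto_tsquery('#{config}', ?)"
  [sql, query]   # the user's query stays a bound parameter, never interpolated
end
```

The returned array can be handed to ActiveRecord as a condition, for example `Post.where(fulltext_conditions("ferret sphinx"))` with a hypothetical `Post` model. For anything beyond small tables you would also want an index on the tsvector expression or a precomputed tsvector column.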
> Ferret is unstable in production. Segfaults, corrupted indexes galore. We've switched around 40 clients from ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause people's production apps.
I'd really like anybody experiencing problems like this to contact me or, even better, the ferret-talk mailing list about such problems. I have several sites using Ferret where the DRb server runs rock solid. I must admit that they're relatively low traffic, but high load is nothing that will make Ferret crash or corrupt indexes if you use it in the right way (say, one process accessing the index). Without doubt there are cases when Ferret will segfault, e.g. because of platform-specific problems, poor argument checking and error handling in the C code and so on, but they can be circumvented most of the time. Not nice, but acts_as_ferret already does most of this for you.
I also did some load tests with acts_as_ferret's DRb server a while ago, where it handled more than 30 mixed indexing and search requests per second from multiple client processes for hours, and no crash or index corruption happened (index size was 7GB at the end of the run).
So to summarize: it's definitely possible to have a stable Ferret setup. Before you take on the work to switch to something else, why not drop me a line? I'll be happy to have a look at your problem.
However, from what I've read here I'll be sure to check out Sphinx soon so I know what you're talking about here ;-)
> I'd really like anybody experiencing problems like this to contact me
I had trouble with it using version 0.11.6. I was having intermittent problems every time I tagged a store. When I removed my rescue I found it was ferret (don't have the exact error on me.. sorry). Stepping back to 0.11.3 seems to have resolved this (this is the last version I can remember that worked for me somewhat reliably). With 0.11.6, removing my index solves it temporarily (6 or 7 tag actions) but then it comes back.
Feel free to move this to the ferret talk list, I'll go check on it there.
I took a good look at Sphinx and Ultrasphinx, and even tried implementing them in my app. Unfortunately these were show stoppers for me:
- No real integration with ActiveRecord (the plugin just generates SQL statements outside of the context of AR, so you can't really use your own custom model methods as fields... as far as I could tell)
- No wildcards at all (Sphinx doesn't support them)
- No automatic updates - you must rebuild the entire index using cron jobs. Again using straight SQL, not the current state of your models
That said, I could see Sphinx being very appropriate for certain types of apps... but these were important features for my particular use (especially wildcards)