Full-Text Search suggestions

chiangf

Jan 22, 2007, 9:17:36 PM
to TurboGears
Has anyone actually employed a full-text search that they like? I was
reading another thread and there were many tools listed (hype,
pylucene, merquery, etc.), but it seemed that either no one could get
it working (hype), didn't like it (pylucene), or it hasn't actually
started yet (merquery).

My database right now is MySQL so I can't implement the Postgres
tsearch2 either.

Does anyone have any suggestions / implementations that they have used
and liked?


Thanks!

Felix Schwarz

Jan 23, 2007, 4:49:05 AM
to turbo...@googlegroups.com

chiangf wrote:

> Has anyone actually employed a full-text search that they like? I was
> reading another thread and there were many tools listed (hype,
> pylucene, merquery, etc.), but it seemed that either no one could get
> it working (hype), didn't like it (pylucene), or it hasn't actually
> started yet (merquery).
>
(...)

> Does anyone have any suggestions / implementations that they have used
> and liked?

Why didn't you like PyLucene? I've only worked with (Java) Lucene, but it was a
very pleasant experience. Since that project I much prefer using an external
search engine rather than taking on a much stronger dependency on a specific database.

fs

Sylvain Hellegouarch

Jan 23, 2007, 5:25:15 AM
to turbo...@googlegroups.com

IIRC, PyLucene has its own threading implementation that does not work
well with the built-in threads used by CherryPy. Google will tell you more
about this, and they may have solved the issue since then...

- Sylvain

chiangf

Jan 23, 2007, 4:14:49 PM
to TurboGears
I have used Java Lucene as well, which I enjoyed very much, but as
Sylvain said, there seem to be a lot of issues with CherryPy.

Has no one used full-text search with TurboGears yet?

I would have used MySQL's full-text searching, but I would really like
the stemming feature, which I don't think it has. I prefer external
search engines as well, because I agree that depending on a specific
database is a bit too rigid for my tastes.
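To illustrate what stemming buys you: with it, a query for "searching" also matches a document that only says "searched", while a plain word match misses it. The sketch below uses a deliberately naive suffix-stripping stemmer; the suffix list is an assumption made up for this example, not the Porter/Snowball algorithms real engines typically use.

```python
# Toy suffix list, checked in order; an assumption for this example only.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stem_match(query, text):
    """True if the stem of every query word appears among the text's stems."""
    text_stems = {naive_stem(w) for w in text.split()}
    return all(naive_stem(q) in text_stems for q in query.split())


def exact_match(query, text):
    """The same test without stemming, for comparison."""
    words = {w.lower() for w in text.split()}
    return all(q.lower() in words for q in query.split())


if __name__ == "__main__":
    doc = "He searched the archive"
    print(stem_match("searching", doc))   # True
    print(exact_match("searching", doc))  # False
```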

Lee McFadden

Jan 23, 2007, 4:32:21 PM
to turbo...@googlegroups.com
The only full-text search I have implemented has been using SQLAlchemy
and MySQL. Yes, it ties you to a particular RDBMS, but for this
project in particular it wasn't an issue.

So far it's worked out nicely, without all the bother of using
PyLucene (which was my first choice). I tried to get PyLucene working with
TG a long while ago and failed due to the aforementioned threading
issues.

Lee

--
Lee McFadden

blog: http://www.splee.co.uk
work: http://fireflisystems.com
skype: fireflisystems

Alberto Valverde

Jan 23, 2007, 5:12:22 PM
to turbo...@googlegroups.com

I've had good luck with (hype)estraier. I'd recommend doing any
writes to the index (when updating it) in a separate process because
of blocking issues (or implementing something yourself using something
like [1]).
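The second option, a readers-writer lock, lets many searches read the index at once while an update gets exclusive access. Below is a minimal writer-preferring sketch built only on `threading.Condition`; it is not the code from the linked rwlock.py, and `index.search`/`index.add` in the usage comment are placeholders for whatever engine you use.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (writer-preferring)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # readers currently holding the lock
        self._writer = False       # True while a writer holds the lock
        self._waiting_writers = 0  # writers blocked waiting for the lock

    @contextmanager
    def read(self):
        with self._cond:
            # New readers yield to waiting writers so updates are not starved.
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


# Usage (``index`` is a placeholder for your search engine object):
#   lock = ReadWriteLock()
#   with lock.read():  results = index.search(query)   # request threads
#   with lock.write(): index.add(document)             # the updater
```

Because waiting writers block new readers, a thread must not try to re-acquire a read lock it already holds while a writer may be waiting, or it can deadlock.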

Alberto

[1] http://www.majid.info/mylos/weblog/2004/11/rwlock.py

lasizoillo

Jan 23, 2007, 5:15:29 PM
to turbo...@googlegroups.com
Merquery is focused on Django. Maybe pyndexter (http://swapoff.org/wiki/pyndexter) would be a better solution.

Excuse my poor English.

Jeff Hinrichs - DM&T

Jan 23, 2007, 11:06:33 PM
to turbo...@googlegroups.com
Kind of surprised that no one has mentioned htdig. I've accessed it
(by shelling out to the command line and parsing the results) from a number
of languages (the major P's). If I recall correctly, htdig 4 uses
CLucene. I don't normally index the site proper; I create a shadow
indexing structure and index that, since almost all of the sites I've
done have been data-driven, and that way I can completely control what
content is being indexed.
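A "shadow indexing structure" of this kind could look roughly like the following: render only the fields you want searchable into minimal static pages, point the external indexer (htdig here) at that directory, and map hits back to records by file name. The record fields, the `record-<id>.html` naming scheme and the helper names are assumptions for illustration; the htdig configuration itself is not shown.

```python
import html
import os
import re
import tempfile

PAGE = "<html><head><title>{title}</title></head><body>{body}</body></html>"
RECORD_URL_RE = re.compile(r"record-(\d+)\.html$")


def write_shadow_pages(records, out_dir):
    """Render each record as a minimal static page for an external indexer.

    ``records`` is an iterable of dicts with hypothetical keys 'id', 'title'
    and 'body'. Only these fields reach the page, so navigation, ads and other
    site chrome never end up in the index. Returns the number of pages written.
    """
    os.makedirs(out_dir, exist_ok=True)
    links = []
    for rec in records:
        name = "record-%d.html" % rec["id"]
        page = PAGE.format(title=html.escape(rec["title"]),
                           body=html.escape(rec["body"]))
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as fh:
            fh.write(page)
        links.append('<a href="%s">%s</a>' % (name, html.escape(rec["title"])))
    # A plain index page lets a link-following crawler discover every record.
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write("<html><body>\n%s\n</body></html>" % "\n".join(links))
    return len(links)


def record_id_from_url(url):
    """Map a search hit's URL back to the database record id (or None)."""
    match = RECORD_URL_RE.search(url)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    write_shadow_pages([{"id": 42, "title": "Hello", "body": "Full-text search"}], out)
    print(sorted(os.listdir(out)))  # ['index.html', 'record-42.html']
    print(record_id_from_url("http://localhost/shadow/record-42.html"))  # 42
```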

-Jeff

anders pearson

Jan 24, 2007, 12:21:39 AM
to turbo...@googlegroups.com
On 2007-01-23 02:17:36 -0000, chiangf wrote:
> My database right now is MySQL so I can't implement the Postgres
> tsearch2 either.

If you can run Postgres on another machine, you could still make use of
Fozzy:

http://microapps.sourceforge.net/fozzy/

It is basically a simple REST wrapper around tsearch2.

(One caveat is that I haven't gotten around to porting it to TG
1.0. It shouldn't be hard to do, but it does need to be done if you
don't want to go through the trouble of getting TG 0.8.9 running).

--
anders pearson : http://www.columbia.edu/~anders/
C C N M T L : http://www.ccnmtl.columbia.edu/
weblog : http://thraxil.org/

Nadav Samet

Jan 24, 2007, 9:04:01 AM
to TurboGears
Hi,

I had a good experience with Xapian. Since it is not thread-safe, I
wrote an XML-RPC server with Twisted Python that handles the search and
returns the results. The TurboGears application can call the XML-RPC
server using the standard xmlrpclib.

I can post the code if interested.
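The same shape (a standalone search service the TurboGears app calls over XML-RPC) can be sketched with the Python 3 standard library, using `xmlrpc.server`/`xmlrpc.client` (the successors of `xmlrpclib`) instead of Twisted. A toy in-memory `ToyIndex` stands in for Xapian, and the method names `add` and `search` are assumptions, not the code mentioned above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyIndex:
    """Stand-in for a non-thread-safe engine such as a Xapian database."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text.lower()
        return True  # XML-RPC cannot return None unless allow_none is set

    def search(self, term):
        term = term.lower()
        return sorted(d for d, text in self.docs.items() if term in text.split())


def start_search_server(host="127.0.0.1", port=0):
    """Serve a ToyIndex over XML-RPC in a background thread.

    Returns (server, bound_port); port=0 lets the OS pick a free port.
    SimpleXMLRPCServer handles one request at a time, so the index is only
    ever touched by the single serving thread.
    """
    index = ToyIndex()
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(index.add, "add")
    server.register_function(index.search, "search")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


if __name__ == "__main__":
    server, port = start_search_server()
    # In the web application this would live in the controller code.
    proxy = ServerProxy("http://127.0.0.1:%d/" % port)
    proxy.add(1, "Full text search with Xapian")
    proxy.add(2, "TurboGears and CherryPy threads")
    print(proxy.search("xapian"))  # [1]
    server.shutdown()
    server.server_close()
```

In practice the server would run as its own process, so a crash or a slow index update never takes the web application down with it.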


chiangf

Jan 25, 2007, 10:58:14 AM
to TurboGears
Thanks everyone for the suggestions! Very informative. I'll poke
around some of them and see which one I like the most.

Frank

Kevin Horn

Jan 25, 2007, 11:48:29 PM
to turbo...@googlegroups.com
There was a basic text search in Docudo...

(looks at code)

It looks like a homegrown solution, as it doesn't seem to import
anything but sqlobject, the model.py file for Docudo, and the time
module.

Looks fairly simple, but I had tested it out a few times and it seemed
to work well. About 7 functions and a list of stop words in under 200
lines of code. Very nice. Of course this was for a specific
application where we knew everything worth indexing would be in the
database, and how it would be stored, but it seems it's not a huge
task to "roll your own" (depending on your application).
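A roll-your-own search in that spirit can be sketched as an inverted index (term to set of document ids) plus a stop-word list. This is not the Docudo code, which stored its data through SQLObject; the class, the tokenizer and the abbreviated stop-word list are assumptions for illustration. A database-backed version would keep the postings in a table instead of a dict.

```python
import re
from collections import defaultdict

# Hypothetical, abbreviated stop-word list; a real one is much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


class InvertedIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def remove(self, doc_id):
        for ids in self._postings.values():
            ids.discard(doc_id)

    def search(self, query):
        """Return ids of documents containing every non-stop-word query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersecting never mutates the index.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Usage: `idx = InvertedIndex(); idx.add(1, "The quick brown fox"); idx.search("the fox")` returns `{1}`, since "the" is dropped as a stop word before lookup.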

Since the Docudo SVN server is MIA, I can't run "svn blame", but it'd
be a good guess that Ronald Jaramillo wrote it (he wrote almost all
the more involved bits of Docudo).

Kevin Horn

Krys

Jan 28, 2007, 4:30:16 AM
to TurboGears
Hi gang,

I just want to say that I have a working PyLucene integration that
works with TG and CherryPy. I am using it on my own blog, but other
than that it is not widely tested yet. It also currently only
supports English stemming, etc.

I am planning on releasing it soon as TurboLucene, just as soon as I
extract it from my project and generalize it.

Basically, the way it sorts out the whole threading issue (without
monkey-patching CherryPy) is to create a separate indexer thread and
search-thread-factory thread at initialization time and then just send
them messages telling them what you want. It actually works quite
well and is wrapped up in a simple and clean interface.
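The message-passing idea can be sketched generically: one dedicated thread owns the engine, and request threads send it messages through a `queue.Queue`, then wait on a private reply queue. This is a simplification (a single worker rather than separate indexer and search-factory threads), and `EngineThread`/`ListEngine` are hypothetical names, not TurboLucene's API.

```python
import queue
import threading


class EngineThread:
    """Own a non-thread-safe engine on one dedicated thread.

    Other threads (e.g. CherryPy request threads) never touch the engine;
    they post (method_name, args, reply_queue) messages and wait for a reply.
    """

    def __init__(self, engine):
        self._engine = engine
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            message = self._requests.get()
            if message is None:  # shutdown sentinel
                break
            method, args, reply = message
            try:
                reply.put((True, getattr(self._engine, method)(*args)))
            except Exception as exc:  # hand errors back to the caller
                reply.put((False, exc))

    def call(self, method, *args):
        """Run engine.<method>(*args) on the worker thread and return the result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((method, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        self._requests.put(None)
        self._thread.join()


class ListEngine:
    """Toy engine that records which threads touched it."""

    def __init__(self):
        self.docs = []
        self.threads = set()

    def add(self, text):
        self.threads.add(threading.get_ident())
        self.docs.append(text)

    def search(self, word):
        self.threads.add(threading.get_ident())
        return [d for d in self.docs if word in d.split()]


if __name__ == "__main__":
    worker = EngineThread(ListEngine())
    worker.call("add", "turbogears full text search")
    print(worker.call("search", "search"))  # ['turbogears full text search']
    worker.stop()
```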

Anyway, I expect to release it in the next couple weeks, but if anyone
wants to see the code sooner, feel free to e-mail me.

Hope this helps,
Krys

Felix Schwarz

Jan 28, 2007, 6:53:48 AM
to turbo...@googlegroups.com
Krys wrote:

> I am planning on releasing it soon as TurboLucene, just as soon as I
> extract it from my project and generalize it.
>
> Basically, the way it sorts out the whole threading issue (without
> monkey-patching CherryPy) is to create a separate indexer thread and
> search-thread-factory thread at initialization time and then just send
> them messages telling them what you want. It actually works quite
> well and is wrapped up in a simple and clean interface.

Really awesome! Looking forward to using it :-)

fs

ans...@gmail.com

Jan 29, 2007, 12:46:38 PM
to TurboGears
I'll second that! This could be huge for my project.


Robin Haswell

Jan 31, 2007, 6:32:01 AM
to turbo...@googlegroups.com
chiangf wrote:
> Has anyone actually employed a full-text search that they like? I was
> reading another thread and there were many tools listed (hype,

Couldn't make it compile (I think)

> pylucene

Barfs on CP threads

> merquery

Again, wouldn't compile, I think

> , etc.), but it seemed that either no one could get
> it working (hype), didn't like it (pylucene), or it hasn't actually
> started yet (merquery).

I suggest you bite the bullet and use MySQL FULLTEXT search with
handwritten SQL. I stupidly set my app up with InnoDB tables, which don't
support FTS, so in the end I used PyLucene with a simple XML-RPC
server to access it. Ridiculous.
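A minimal sketch of the handwritten-SQL route: helpers that emit the FULLTEXT index DDL and a relevance-ranked `MATCH ... AGAINST` query for a DB-API cursor such as MySQLdb's (which uses `%s` placeholders). The table and column names, the index name `ft_search` and the `id` column are assumptions. The `MATCH` column list has to match a FULLTEXT index's column list, and in MySQL versions of this era only MyISAM tables support FULLTEXT indexes (InnoDB gained them in 5.6), which is the InnoDB problem described above.

```python
import re

_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name):
    """Quote a plain identifier; identifiers cannot be bound as parameters."""
    if not _IDENT_RE.match(name):
        raise ValueError("bad identifier: %r" % name)
    return "`%s`" % name


def fulltext_ddl(table, columns):
    """DDL adding a FULLTEXT index (hypothetical name ft_search) over columns."""
    cols = ", ".join(_ident(c) for c in columns)
    return "ALTER TABLE %s ADD FULLTEXT INDEX ft_search (%s)" % (_ident(table), cols)


def fulltext_query(table, columns, terms, boolean_mode=False, limit=20):
    """Return (sql, params) for a relevance-ranked MATCH ... AGAINST search.

    The search terms are passed as bound parameters, never interpolated.
    """
    cols = ", ".join(_ident(c) for c in columns)
    mode = " IN BOOLEAN MODE" if boolean_mode else ""
    match = "MATCH(%s) AGAINST (%%s%s)" % (cols, mode)
    sql = ("SELECT id, %s AS score FROM %s WHERE %s ORDER BY score DESC LIMIT %d"
           % (match, _ident(table), match, int(limit)))
    return sql, (terms, terms)
```

Usage with a MySQLdb cursor would be along the lines of `sql, params = fulltext_query("posts", ["title", "body"], user_input)` followed by `cursor.execute(sql, params)`.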

-Rob

Krys

Jan 31, 2007, 5:46:47 PM
to TurboGears
Well guys, it's out. I have finally released TurboLucene.
Announcements to follow.

You can just easy_install TurboLucene to get it.

The website is http://dev.krys.ca/turbolucene/.

It still needs a lot of work, but it is functional. Docs are still to
come, but the source is well commented and there really is not a lot
to it.

This is my first Open Source project, so I would really appreciate
feedback and suggestions. (Go easy on me!) :-D

Anyway, I hope someone finds it useful.

Enjoy!
Krys

Nadav Samet

Feb 4, 2007, 2:10:29 PM
to TurboGears
I've written a tutorial on how to use Xapian search engine with
TurboGears:

http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/

