Re: slow whoosh searches


Abhishek Pratap

Aug 9, 2012, 1:02:00 PM
to who...@googlegroups.com
Hi Guys

Just in case my message was missed earlier. Any help appreciated.

Thanks!
-Abhi

On Wednesday, August 8, 2012 5:45:50 PM UTC-7, Abhishek Pratap wrote:
The following is cross-posted on Stack Overflow, as I am new and not sure which is the better place for Whoosh Q&A.

http://stackoverflow.com/questions/11875171/slow-whoosh-searches

I am getting close to 1000 search results in 15 seconds after having created a whoosh index with a simple schema and indexed 1.5 million records.

schema = Schema(tax_id=STORED, name=TEXT(stored=True))
The MAIN*.seg file is about 190 MB.

The way I am searching is as follows:

    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    ix = open_dir("index")
    with ix.searcher() as searcher:
        query = QueryParser("name", ix.schema).parse(u'putrefaciens')
        results = searcher.search(query)
I am wondering whether this performance is along expected lines, or whether full-text searching with Whoosh can be faster given the index size.

Thanks!
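
One way to narrow down where the 15 seconds go is to time each stage separately: opening the index, running the search, and reading the stored fields. The following is only a minimal sketch, not a diagnosis. The helper names `timed` and `profile_whoosh_search` are hypothetical, and it assumes the index directory and the stored `name` field from the question above. `limit=10` is passed explicitly here; raise it to match the number of hits actually being fetched.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_whoosh_search(index_dir, field, text, limit=10):
    """Time the open, search, and stored-field fetch stages separately.

    Hypothetical helper. Assumes `field` is a stored field, as in the
    schema from the question.
    """
    # Imported here so that `timed` can be used without Whoosh installed.
    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    ix, t_open = timed(open_dir, index_dir)
    with ix.searcher() as searcher:
        query = QueryParser(field, ix.schema).parse(text)
        results, t_search = timed(searcher.search, query, limit=limit)
        # Reading stored fields is timed on its own, apart from the search.
        values, t_fetch = timed(lambda: [hit[field] for hit in results])
    return {"open": t_open, "search": t_search,
            "fetch": t_fetch, "hits": len(values)}
```

Calling `profile_whoosh_search("index", "name", u"putrefaciens")` would return the per-stage timings in seconds. If most of the time falls in one stage, that is the place to look first.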

Matt Chaput

Aug 9, 2012, 1:35:40 PM
to who...@googlegroups.com
> I am wondering whether this performance is along expected lines, or
> whether full-text searching with Whoosh can be faster given the index size.

Sorry, I can't say if it's something in Whoosh that can be improved or
just Pythonic slowness -- I've never used Whoosh for an index that big.
Even though it's a simple schema, that's still a huge amount of data.

If I could get your index or something similar I could look for
performance issues in the code, but I'm guessing it's just the limits of
interpreted code.

Matt

Abhishek Pratap

Aug 9, 2012, 1:38:17 PM
to who...@googlegroups.com
Thanks Matt. I just wanted to get an idea of whether I am reaching the threshold. Fair enough.

Could you please recommend any other full-text search engine that I could try?

Thanks!
-Abhi

Matt Chaput

Aug 9, 2012, 1:45:42 PM
to who...@googlegroups.com
On 09/08/2012 1:38 PM, Abhishek Pratap wrote:
> Could you please recommend any other full-text search engine that I could try?

Lucene is a Java search library. Solr is a search server (based on
Lucene) you interact with through HTTP/REST, with wrapper libraries
available in Python. ElasticSearch is similar to Solr. Xapian is a
search library in C++. Sphinx is a search program optimized for indexing
data in SQL databases.

Google is your friend ;)

Matt


Abhishek Pratap

Aug 9, 2012, 1:53:05 PM
to who...@googlegroups.com
Thanks Matt. A few lines of concise info help.

-A

Mehdy DREF

Aug 9, 2012, 1:56:39 PM
to who...@googlegroups.com
Hi, in my case PostgreSQL with tsvector is enough (http://www.slideshare.net/billkarwin/full-text-search-in-postgresql).

m.


Abhishek Pratap

Aug 9, 2012, 2:09:35 PM
to who...@googlegroups.com
Pretty good. I wish I could also see an Elasticsearch comparison, but it helped.

-A