Whoosh limits


Paul Starrett

Oct 13, 2016, 3:50:36 PM
to Whoosh
I am trying to select a search engine. I have tested Solr and ELK, which tend to be beyond what I need. I use search for my own individual use only (I do business intelligence and due diligence), not as a back end for a website or in a production environment in the typical sense.

My question is whether anyone has any thoughts or data on the outer limits of Whoosh. I have run the demo and some sample code, and it is perfect for my needs as a tool (thank you so much to the developers and the creator!). However, I would like to know whether there are any known limits on index size, number of indexed files, etc. (I know this depends on RAM size, machine speed, etc.). I have done some online research (e.g. Stack Overflow) and found some anecdotal information but nothing I would call credible. Note that I have already looked into Sphinx and Xapian / Xappy. I think many others are looking into this question, so any answers should be useful to many. Thank you for any ideas!

Roger Binns

Oct 13, 2016, 4:27:21 PM
to who...@googlegroups.com
On 13/10/16 12:50, Paul Starrett wrote:
> I am trying to select a search engine and I have tested Solr and ELK
> which tend to be beyond what I need.

I originally used Solr before switching to Whoosh. It came to a head
because the XML config files alone were larger than the code needed to
just use Whoosh. And then doing any kind of customisation for Solr
involved yet more copious XML and verbose Java.

Fundamentally Whoosh works well, and is easy to customise.

I don't believe there are any arbitrary limits. My data measures in the
gigabytes. What you'll generally see is that the time to process
queries gets larger and larger as the index grows. If sub-second
response times are a requirement for you, then you'll need to pay close
attention. If you don't mind how long queries take then you'll be fine.
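
One way to check whether query times stay within a budget as the index
grows is to time a handful of representative queries after each large
batch of documents is added. The following is only a rough sketch using
the Python standard library: `time_queries`, `slow_queries` and
`fake_search` are hypothetical names made up for this illustration, and
`fake_search` is a stand-in for a real search call, not part of Whoosh.

```python
import statistics
import time


def time_queries(run_query, queries, repeats=3):
    """Return {query: median wall-clock seconds} over `repeats` runs."""
    timings = {}
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
        timings[q] = statistics.median(samples)
    return timings


def slow_queries(timings, budget_seconds=1.0):
    """Return the queries whose median time exceeds the budget, sorted."""
    return sorted(q for q, t in timings.items() if t > budget_seconds)


def fake_search(text):
    """Hypothetical stand-in for a real search call (assumption)."""
    time.sleep(0.01)
    return []


timings = time_queries(fake_search, ["fraud", "due diligence"])
for query, seconds in timings.items():
    print(f"{query!r}: {seconds:.3f}s")
print("over budget:", slow_queries(timings, budget_seconds=1.0))
```

With a real Whoosh index, `run_query` could presumably be a small
function that parses the text with a `QueryParser` and passes the result
to `searcher.search()`. Re-running the same queries as the corpus grows
shows how quickly the times drift toward whatever limit matters to you.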

Roger


Paul Starrett

Oct 13, 2016, 5:08:35 PM
to who...@googlegroups.com
Thank you very much for the input. I could say a lot here but will keep it brief (for the benefit of future reviewers of this post). 

Yes, I have found Solr and ELK to be rather intensive in ways that do not result in a net gain. Whoosh is a godsend for people like me. I doubt I will ever get much beyond a million documents, and a search time of a few seconds (even tens of seconds) is of no particular concern. My average file size is about 5 KB (emails, web pages, articles, etc.), which works out to about 200k files per GB. If Whoosh is comfortable with a corpus under 5 GB (other factors being equal, such as stop words, stemming, etc.), I should be in good shape. Thank you again!
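
For anyone who wants to redo this estimate with their own numbers, the arithmetic can be sketched as below. This assumes decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes), and the function names are made up for the illustration. It measures the raw corpus only; the size of the resulting index on disk will differ depending on the schema and which fields are stored.

```python
BYTES_PER_KB = 1_000            # decimal units (assumption)
BYTES_PER_GB = 1_000_000_000


def docs_per_gb(avg_doc_bytes):
    """How many documents of the given average size fit in one GB."""
    return BYTES_PER_GB // avg_doc_bytes


def corpus_size_gb(num_docs, avg_doc_bytes):
    """Total raw corpus size in GB for num_docs documents."""
    return num_docs * avg_doc_bytes / BYTES_PER_GB


avg_doc = 5 * BYTES_PER_KB      # about 5 KB per file, as in the post
print(docs_per_gb(avg_doc))                 # 200000
print(corpus_size_gb(1_000_000, avg_doc))   # 5.0
```

So a million documents at about 5 KB each comes to roughly 5 GB of raw text, which is where the "under 5 GB" figure comes from.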

Paul Starrett, Esq., LL.M.
Licensed Private Investigator and Attorney
Certified Fraud Examiner
EnCase Certified Computer Forensics Examiner (EnCE)
Master of Science in Predictive Analytics (Northwestern U.)

