Creating an elasticsearch equivalent on Python using Whoosh

Ryan Stuart

Jul 7, 2013, 11:16:26 PM
to who...@googlegroups.com
Hi All,

I work at a new text analytics company. We have a Python codebase to automatically discover insights from unstructured text using pure statistical methods. Doing this requires a lot of the tasks that Whoosh already does (indexing, stemming, tokenization etc.). The direction we would like to head is to create a data storage and insights engine in pure Python.

I was very happy to discover Whoosh via a PyCon video at http://pyvid.org/. I thought we might be stuck with Lucene and elasticsearch (something we have a lot of experience with), which wasn't where we wanted to be. Having had a chance to look at Whoosh and read the documentation, it seems that Whoosh provides for the Python community what Lucene does for the Java community. What seems to be missing is a Solr/elasticsearch-type project that ties it all together as a standalone tool.

We need that standalone tool for what we are doing, and we are more than happy to contribute it back to the community. I'm interested to hear whether people think that tool should be external to the Whoosh project or part of it. I'm also interested to hear any thoughts people have about its design.

Cheers

--
Ryan Stuart, B.Eng (Software)

webmedic

Jul 9, 2013, 7:57:09 PM
to who...@googlegroups.com
I believe that whether it is internal or external is really up to Matt in the end.

I have to admit that tearing into all this is a bit beyond me, but I have been working on integrating Whoosh into a desktop app to index epubs and make them searchable. As such, I have been looking for something like this, but really there is nothing out there. There are little bits of code here and there, but they are all part of much larger projects and all tied to one web framework or another. So far, in my case, none of this is much help.

I am using PyQt and WebKit and intercepting all the calls, so there is no web server involved at all. In this use case it is a simple matter of parsing my data, sending it through the templating engine, and then writing the resulting HTML directly to the WebKit main window.

I have been searching for something simple that has Ajax support so I can make my HTML forms a little more dynamic, but so far I have not found anything like this.

For what you are building, it would be great if there were an API that is not tied to a specific display type. That way it makes no difference whether you use a traditional GUI or a web GUI: in either case you just plug in the way you want to display the data and you have a good solution.
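
A minimal sketch of that separation, just to illustrate the idea. Everything here is hypothetical: `SearchService`, `render_text` and `render_html` are made-up names, and the tiny in-memory word match is only a stand-in for a real Whoosh index, not Whoosh's API. The point is that the search side returns plain data and the display side is a function you swap in.

```python
import html
from typing import Dict, List


class SearchService:
    """Hypothetical display-agnostic search API (stand-in for a real index)."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, term: str) -> List[dict]:
        # Return plain dicts so any front end (GUI or web) can consume them.
        term = term.lower()
        return [
            {"id": doc_id, "text": text}
            for doc_id, text in sorted(self._docs.items())
            if term in text.lower().split()
        ]


def render_text(hits: List[dict]) -> str:
    # Renderer for a traditional GUI or a console: one line per hit.
    return "\n".join(f"{h['id']}: {h['text']}" for h in hits)


def render_html(hits: List[dict]) -> str:
    # Renderer for a web view: an escaped HTML list.
    items = "".join(
        f"<li>{html.escape(h['id'])}: {html.escape(h['text'])}</li>" for h in hits
    )
    return f"<ul>{items}</ul>"


service = SearchService()
service.add("ch1", "Whoosh indexes epub chapters")
service.add("ch2", "Rendering happens elsewhere")
hits = service.search("whoosh")
```

The same `hits` list can then be passed to `render_text(hits)` for a desktop widget or to `render_html(hits)` for a WebKit view, without the search code knowing which one is in use.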

So far Whoosh is wonderful and is a really helpful bit of code that saved me tons of work. 

I hope to see the results of your hard work and I hope that it is a wonderful contribution also.

Thanks
Brook