Ann: SQLite based alternative to consider

105 views
Skip to first unread message

Roger Binns

unread,
Nov 19, 2024, 11:40:09 AM11/19/24
to who...@googlegroups.com
I used Whoosh in several projects over the years, being very pleased with its functionality. I also maintain a binding between SQLite and Python. The latest release supports SQLite's full text search (FTS5) extension.

Unlike Whoosh, this means the writing of the index is done by SQLite which is not pure Python code. If this is a deal breaker, then you can stop reading.

There were Whoosh features I heavily used:

* Custom control over tokenization
* Custom scoring of results
* More like
* Query correction

All of those are implemented.

About the Python/SQLite extension itself:

https://rogerbinns.github.io/apsw/about.html

Highlights and documentation for full text search:

https://rogerbinns.github.io/apsw/textsearch.html

Example code doing full text search:

https://rogerbinns.github.io/apsw/example-fts.html

You can get the majority of Whoosh's functionality without having to write SQL, but will need to write SQL for more complex cases.

Roger

Philippe Ombredanne

unread,
Nov 19, 2024, 5:47:02 PM11/19/24
to who...@googlegroups.com
Hi Roger:
Your tools is awesome!
Your licensing is too:
https://github.com/aboutcode-org/scancode-toolkit/issues/3995
--
Cheers
Philippe
> --
> You received this message because you are subscribed to the Google Groups "Whoosh" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to whoosh+un...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/whoosh/d4b5d483-0377-455f-b0af-32217cda90fc%40app.fastmail.com.

Michael Avrukin

unread,
Nov 19, 2024, 5:48:56 PM11/19/24
to who...@googlegroups.com
Will have to check this out, sounds quite promising 

Roger Binns

unread,
Nov 22, 2024, 6:45:31 PM11/22/24
to who...@googlegroups.com
> Your tools is awesome!

Thanks. Any suggestions for improvements from Whoosh users are welcome.

> Your licensing is too:

I didn't realise there is a SPDX identifier for it, and am adding that now.

Roger

clach04

unread,
Nov 24, 2024, 1:22:12 AM11/24/24
to Whoosh
On Tuesday, November 19, 2024 at 8:40:09 AM UTC-8 Roger Binns wrote:
I used Whoosh in several projects over the years, being very pleased with its functionality. I also maintain a binding between SQLite and Python. The latest release supports SQLite's full text search (FTS5) extension.

Cool, thanks for sharing.

The other option is the built-in sqlite3 module. most CPython distributions have this enabled out of box:

    Python 3.12.5 (tags/v3.12.5:ff3bc82, Aug  6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sqlite3
    >>> con = sqlite3.connect(':memory:')
    >>> cur = con.cursor()
    >>> cur.execute('pragma compile_options;')
    <sqlite3.Cursor object at 0x000001EC4D5F7A40>
    >>> available_pragmas = cur.fetchall()
    >>> ('ENABLE_FTS5',) in available_pragmas
    True

It's not a lot of SQL to get some decent results https://www.sqlite.org/fts5.html#prefix_indexes

That's what I ended up using when I modernized by FTS search app to Python 3, so far so good it's been working well.

https://www.sqlitetutorial.net/sqlite-full-text-search/ is an easier read than the manual to get started.


Roger Binns

unread,
Nov 24, 2024, 11:41:31 AM11/24/24
to who...@googlegroups.com
> The other option is the built-in sqlite3 module. most CPython
> distributions have this enabled out of box:

APSW is also based on SQLite :)

Using the built-in sqlite3 and FTS5 has the following limitations:

* You can only use the standard tokenizers and auxiliary functions - ie you cannot provide your own

* Those tokenizers are based on old (2012) Unicode definitions, and are not aware of grapheme clusters, or able to work on languages that do not use spaces as word separators like Chinese and Japanese.

if you can work within those limitations then great. Note that you can use both APSW and the built-in sqlite3 module at the same time in the same process, as long as you don't access the same database files on Unix platforms.

Because APSW binds to the SQLite and FTS5 C APIs you do not have the limitations above, and you then get control over tokenization, ranking etc. Then you get the extra tokenizers, ranking functions, more like, query correction/suggestion etc - all great features in Whoosh too.

Roger
Reply all
Reply to author
Forward
0 new messages