Strange FTS search behaviour

Brendan Duddridge

Nov 2, 2015, 10:57:55 PM
to Couchbase Mobile
Hi,

Today I searched my CBL database (using SQLCipher) for the term "decorative". Along with results that contain that keyword, it also returned documents containing "decor". Is some stemming going on here that causes this? Since "decor" is the root of "decorative", that seems likely. The odd thing is that the old version of my app, which uses its own SQLCipher build with FTS4, doesn't find "decor" when I search for "decorative", and I'd rather it didn't. I suspect the Porter tokenizer: one of its jobs is to reduce English words to their common roots, which sounds like exactly what's happening here.
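To illustrate the effect described above, here is a minimal sketch using Python's built-in sqlite3 module and plain SQLite FTS4, not Couchbase Lite. It compares SQLite's stock porter tokenizer (which stems English words) with the non-stemming simple tokenizer. It assumes the Python build's SQLite has FTS3/FTS4 compiled in, which is common but not guaranteed, and the function name `fts4_finds` is hypothetical. According to Jens's reply, CBL installs its own stemming tokenizer rather than this one, so this only shows the general mechanism.

```python
import sqlite3


def fts4_finds(tokenizer: str, stored: str, query: str) -> bool:
    """Return True if an FTS4 table using `tokenizer` matches `stored` for `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        # The tokenizer name is interpolated into the DDL, so only pass trusted constants.
        conn.execute(f"CREATE VIRTUAL TABLE docs USING fts4(body, tokenize={tokenizer})")
        conn.execute("INSERT INTO docs(body) VALUES (?)", (stored,))
        # The MATCH query is run through the same tokenizer as the stored text.
        (count,) = conn.execute(
            "SELECT count(*) FROM docs WHERE docs MATCH ?", (query,)
        ).fetchone()
        return count > 0
    finally:
        conn.close()


print(fts4_finds("porter", "decor", "decorative"))  # stemming tokenizer
print(fts4_finds("simple", "decor", "decorative"))  # non-stemming tokenizer
```

With porter, both "decorative" and "decor" reduce to the stem "decor", so the search matches; a non-stemming tokenizer such as simple (or unicode61) only matches the whole word, which is consistent with the old app's behaviour.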

These are the options I use when compiling SQLCipher, but I don't think they would have any effect (except maybe the unicode61 option, which I also use in my old app):

./configure --enable-tempstore=yes --with-crypto-lib=commoncrypto CFLAGS="-O2 -DSQLITE_HAS_CODEC -DNDEBUG -DSQLITE_TEMP_STORE=2 -DSQLITE_THREADSAFE -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4_UNICODE61"
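To confirm which FTS-related options a given SQLite build was actually compiled with, `PRAGMA compile_options` lists them, with the `SQLITE_` prefix omitted. The sketch below is a minimal Python example; the helper name `fts_compile_options` is hypothetical. Python's sqlite3 module links against its own SQLite library, so to inspect the SQLCipher build an app uses, the same pragma has to be run through a connection opened with that build.

```python
import sqlite3
from typing import List


def fts_compile_options(conn: sqlite3.Connection) -> List[str]:
    """Return the FTS-related compile-time options reported by PRAGMA compile_options."""
    rows = conn.execute("PRAGMA compile_options").fetchall()
    # Each row holds a single option name, e.g. "ENABLE_FTS3" (no "SQLITE_" prefix).
    return [name for (name,) in rows if "FTS" in name]


conn = sqlite3.connect(":memory:")
print(fts_compile_options(conn))
conn.close()
```

None of these options selects a stemmer by itself; stemming comes from the tokenizer named when the FTS table is created.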


Thanks,

Brendan

Jens Alfke

Nov 6, 2015, 12:27:50 AM
to mobile-c...@googlegroups.com
CBL installs a custom tokenizer (called Snowball IIRC) that does stemming for a bunch of European languages. The Porter tokenizer that comes with SQLite is useless for anything but English.

Currently CBL doesn’t have the option of turning off stemming, but that’s a reasonable thing to request as an enhancement. As a workaround, you can post-filter the query results and reject any rows that don’t contain the literal string you searched for (ignoring case).
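The post-filtering workaround can be sketched as follows. This is not a Couchbase Lite API: the rows are modeled as hypothetical `(doc_id, text)` tuples standing in for whatever the full-text query returns, and `post_filter` is an illustrative name. It keeps only rows whose text contains the search term literally, comparing case-folded strings.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (doc_id, text) pairs standing in for full-text query results.
Row = Tuple[str, str]


def post_filter(rows: Iterable[Row], term: str) -> List[Row]:
    """Keep only rows whose text contains `term` literally, ignoring case."""
    needle = term.casefold()
    return [row for row in rows if needle in row[1].casefold()]


results = [
    ("doc1", "Decorative wall panels"),
    ("doc2", "Home decor ideas"),
]
print(post_filter(results, "decorative"))  # [('doc1', 'Decorative wall panels')]
```

A plain substring test still accepts longer words such as "decoratively", and a multi-word query would need each term checked separately.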

—Jens

Brendan Duddridge

Nov 6, 2015, 4:37:13 AM
to Couchbase Mobile
Hi Jens,

Ok. Thanks for the info. It's helpful to know that I wasn't doing anything wrong.

Thanks,

Brendan