Matching subwords

Florian Lindner

Feb 19, 2016, 4:08:53 AM
to mu-di...@googlegroups.com
Hello,

I recall reading a discussion about this, but I can neither remember the outcome nor find it.

A feature I miss very much in mu is matching subwords, i.e. a query for "funktionen" would also return "funktionentheorie", "sinusfunktionen", and so on. This is especially important for languages such as German that use a lot of inflection, i.e. that modify words to express grammatical categories.

It is probably not a restriction of mu, but rather of the indexer (Xapian? Lucene?) it is based on. Is there any chance that this would be implemented in mu?

Best Regards,
Florian

Tamas Papp

Feb 19, 2016, 4:21:20 AM
to mu-di...@googlegroups.com
Hi,

funktionen* should match funktionentheorie, but I don't know if there is
a way to match words that merely end with the term (e.g. sinusfunktionen).

Best,

Tamas

Marcin Borkowski

Feb 19, 2016, 5:40:06 AM
to mu-di...@googlegroups.com

On 2016-02-19, at 10:21, Tamas Papp <tkp...@gmail.com> wrote:

> Hi,
>
> funktionen* should match funktionentheorie, but I don't know if there is
> a way to match words that merely end with the term (e.g. sinusfunktionen).

AFAIK, no, but in a pinch you can simply M-x rgrep your Maildir. Of
course, this will be considerably slower.
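
As a rough illustration of that brute-force fallback, here is a minimal Python sketch that scans every file under a Maildir for a regexp. The function name `grep_maildir` is made up for this example and is not part of mu or mu4e; the sketch also assumes messages are readable as plain text and does not decode MIME parts.

```python
import os
import re


def grep_maildir(root, pattern):
    """Return sorted paths of files under root whose text matches pattern.

    Every file is read on each call, so this is far slower than an
    indexed search. MIME encodings (base64, quoted-printable) are not
    decoded; a real tool would need to handle them.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            if regex.search(text):
                hits.append(path)
    return sorted(hits)
```

Because the pattern is applied to the raw text rather than to indexed terms, a search for "funktionen" also finds "sinusfunktionen", at the price of reading the whole mail store.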

BTW, I think that a grep-based search (as an alternative for when
prefixes are not enough) would be a nice addition to mu4e.

> Best,
>
> Tamas

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University

Dirk-Jan C. Binnema

Feb 19, 2016, 12:34:17 PM
to mu-di...@googlegroups.com
Xapian doesn't support sub-words.

It does have some language-specific 'stemming' though [1], which would
allow variations of a word to match (e.g., plural/singular or case
inflections). But for that to work, you'd need some automatic language
detection, e.g. using n-grams.
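
To make the difference between stemming and subword matching concrete, here is a toy Python sketch. The suffix list and the names `toy_stem` and `stem_match` are invented for illustration; this is not Xapian's stemmer, which is considerably more careful than one-step suffix stripping.

```python
# Hypothetical, deliberately crude suffix list (longer endings first).
# It only illustrates the idea of reducing inflected forms to a stem.
TOY_SUFFIXES = ("en", "er", "es", "e", "s")


def toy_stem(word):
    """Strip one common inflectional ending, keeping at least 3 letters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_match(query, term):
    """True when query and term reduce to the same stem."""
    return toy_stem(query) == toy_stem(term)
```

With this sketch, `stem_match("funktionen", "funktion")` holds, but neither "funktionentheorie" nor "sinusfunktionen" reduces to the same stem, so stemming alone would not cover the compound-word case from the original question.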

Kind regards,
Dirk.

[1] https://xapian.org/docs/stemming.html

--
Dirk-Jan C. Binnema Helsinki, Finland
e:dj...@djcbsoftware.nl w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

Tamas Papp

Feb 19, 2016, 1:43:10 PM
to mu-di...@googlegroups.com
I wonder if it would be possible to do the following:

1. for a given regexp, list all keys in the database that match,
2. search for those keys.

E.g., .*funktionen.* would match funktionen, funktionentheorie, and
sinusfunktionen; one could then run a single query on all of them.

I don't know if 1. is feasible in Xapian though.
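
The two steps can be sketched in Python against a toy in-memory index. This is only an illustration under stated assumptions: the `index` dictionary (term to set of message ids), its contents, and the function names are made up, and nothing here uses the Xapian API.

```python
import re


def expand_terms(index, pattern):
    """Step 1: list every indexed term that fully matches the regexp."""
    regex = re.compile(pattern)
    return sorted(term for term in index if regex.fullmatch(term))


def search_regex(index, pattern):
    """Step 2: OR together the postings of all expanded terms."""
    result = set()
    for term in expand_terms(index, pattern):
        result |= index[term]
    return result


# Toy index: term -> set of message ids (made-up data).
index = {
    "funktionen": {1},
    "funktionentheorie": {2},
    "sinusfunktionen": {3},
    "integral": {4},
}
```

Here `search_regex(index, ".*funktionen.*")` collects the messages for all three matching terms. Step 1 is a linear scan over the whole vocabulary, so its cost grows with the number of distinct terms rather than with the number of messages. Xapian's API does document iteration over all terms in a database (`Database::allterms_begin`), so step 1 looks feasible in principle, though whether it is fast enough for a large mail store is an open question.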

best,

Tamas