A new query parser


Dirk-Jan C. Binnema

Oct 26, 2017, 4:09:24 PM
to mu-di...@googlegroups.com
Dear friends,

I've added one more feature to the upcoming release, and it's a fairly
big one: a new, custom query parser - i.e. the component that turns
queries like: "subject:hello and not flag:unread" into lists of
messages.

Since the early days (2008!), mu used Xapian's built-in query-parser;
that worked okayish, but there were a number of things that didn't work
so well (at least in 2008), related to unicode-folding,
special-character handling etc. So over the years, mu accumulated quite
a bit of ugly pre-processing to make that somewhat work.

Somewhat. It helped in quite a few cases, but broke others, not least
my inner peace. There are quite a few bug reports from frustrated users
who found their queries failing unexpectedly.

So I wrote a custom query parser from scratch as part of my little
"future mu" research; and I have now back-ported it to the current mu.

There's a new `mu-query` man page which goes into a lot more detail, but
let me share a few highlights:

- overall, the language the new query-parser accepts is quite similar to
  the Xapian language; in almost all cases, your queries should continue
  to work. However, there are some new features:
- phrase searches: you can now search for multi-word phrases, e.g.:
    $ mu find subject:\"hi there\"
  Apart from the parser, this also required some changes in indexing,
  since Xapian now has to remember the positions of words.
- more precise date/time searches ("3m" is now really 3 months,
  not 90 days), plus a few more such small improvements.
- regular expression searches; this might be the biggest new feature.
  You can search for regular expressions enclosed in //, e.g.
    $ mu find subject:"/f.*bar?/"
  which should get you messages about foobar, freebase, feedback, ...
And for me the biggest new feature is that we got rid of quite a bit of
old code :-)

Anyhow, the code's available in git. The new version requires a full
re-index (should be done automatically when you run 'mu index' from the
command line). It also requires a C++14-compatible compiler.

What if something doesn't work as expected?
- check the output of `mu find <your-query> --format=xquery`. That
  should give you an idea of how your query is interpreted. The shell
  does all kinds of processing before mu gets your query, so it's
  good to check that you got all the quoting/escaping correct.
- if it's still unexpected, and mu fails to find some message, please
  file a GitHub ticket, but include the raw message (removing any
  identifying information), so we can reproduce the issue and perhaps
  add it to the unit tests.
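
To get a feel for what the shell does to the quoting before mu receives the query, one option is to split the command line the way a POSIX shell would. The following is a minimal sketch using Python's standard `shlex` module; only the word splitting is shown, mu itself is not invoked, and `shell_words` is a hypothetical helper name.

```python
import shlex

def shell_words(command_line):
    """Split a command line into arguments using POSIX shell quoting rules."""
    return shlex.split(command_line)

# Escaped quotes survive as literal characters, but the unquoted space
# still splits the phrase into two arguments.
print(shell_words(r'mu find subject:\"hi there\"'))
# prints ['mu', 'find', 'subject:"hi', 'there"']

# Plain double quotes are consumed by the shell; only the pattern is left.
print(shell_words('mu find subject:"/f.*bar?/"'))
# prints ['mu', 'find', 'subject:/f.*bar?/']

# Single quotes keep the whole expression together as one argument.
print(shell_words("mu find 'from:Dirk AND subject:query'"))
# prints ['mu', 'find', 'from:Dirk AND subject:query']
```

Comparing such a split with the `--format=xquery` output can help tell a quoting problem apart from a parser problem.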

Kind regards,
Dirk.

--
Dirk-Jan C. Binnema Helsinki, Finland
e:dj...@djcbsoftware.nl w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

Joost Kremers

Oct 26, 2017, 4:45:45 PM
to mu-di...@googlegroups.com

On Thu, Oct 26 2017, Dirk-Jan C. Binnema wrote:
> - regular expression searches; this might be the biggest new
> feature.
> You can search for regular expressions enclosed in //, e.g.
> $ mu find subject:"/f.*bar?/"
> which should get you messages about foobar, freebase,
> feedback, ...

That's awesome. :-)

But. I'm unfortunately getting an error trying to compile the
latest git version:

==============================

make[3]: Entering directory '/home/joost/build/mu/lib/parser'
CXX parser.lo
In file included from parser.hh:29:0,
from parser.cc:19:
./../parser/tree.hh: In static member function ‘static constexpr const char* Mux::Node::type_name(Mux::Node::Type)’:
./../parser/tree.hh:64:2: error: expression ‘<throw-expression>’ is not a constant-expression

==============================

> Anyhow, the code's available in git. The new version requires a
> full
> re-index (should be done automatically when you run 'mu index'
> from the
> command line). It also requires a C++14-compatible compiler.

Given that autogen.sh finishes successfully, am I correct in
assuming that this cannot be the cause of my problem?

Thanks,

Joost



--
Joost Kremers
Life has its moments

Yuri D'Elia

Oct 26, 2017, 5:28:15 PM
to mu-di...@googlegroups.com
On Thu, Oct 26 2017, Dirk-Jan C. Binnema wrote:
> - phrase searches: you can now search multi-word phrases, e.g.:
> $ mu find subject:\"hi there\"

Works perfectly. If I can comment on the issue #999 (supplying an entire
expression as a single argument to mu find), it seems that now it works
as intended. For example, the following:

mu find 'from:Dirk AND subject:query'

works, while it didn't in the past.

But I found an edge case:

mu find 'Dirk'

works (finds anything with Dirk anywhere), but:

mu find 'Dirk '

doesn't, likely because it includes the whitespace at the end.

> - regular expression searches; this might be the biggest new feature.
> You can search for regular expressions enclosed in //, e.g.
> $ mu find subject:"/f.*bar?/"

Heavy, but priceless. Thanks a lot for this.

There is only one thing I miss: prefix wildcards. Storing a reversed
index is a price I would gladly pay for some fields where it makes a lot
of sense, like maildir: or list:. Something to be considered, at least,
in the next mu iteration.
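
For readers unfamiliar with the reversed-index idea: if every term is also stored reversed, a search for terms ending in some text becomes an ordinary prefix lookup on the reversed terms. The sketch below only illustrates that technique in Python; it is not how mu or Xapian store terms, and the function names and maildir values are invented for the example.

```python
from bisect import bisect_left

def build_reversed_index(terms):
    """Return the terms reversed and sorted, so suffixes become prefixes."""
    return sorted(t[::-1] for t in set(terms))

def ending_with(reversed_index, suffix):
    """Find all terms ending in `suffix` via a prefix scan on reversed terms."""
    key = suffix[::-1]
    # All entries starting with `key` are contiguous in the sorted list.
    start = bisect_left(reversed_index, key)
    found = []
    for rev in reversed_index[start:]:
        if not rev.startswith(key):
            break
        found.append(rev[::-1])
    return sorted(found)

maildirs = ["/inbox", "/work/inbox", "/work/sent", "/lists/mu-discuss"]
index = build_reversed_index(maildirs)
print(ending_with(index, "inbox"))
# prints ['/inbox', '/work/inbox']
```

The cost is the one mentioned above: every indexed term is stored a second time, which is why it would only make sense for a few selected fields.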

Now that regexp search is available, there's at least a fallback for the
message body, where I can accept the slower query time.

Dirk-Jan C. Binnema

Oct 26, 2017, 5:45:11 PM
to mu-di...@googlegroups.com
The compiler error is really about an unsupported C++14 feature; there's
a check in configure that should give an error at configure time, though.

Did you do a
./autogen.sh && ./configure && make
?

Somewhere in the configure output should be a line:
checking whether g++ supports C++14 features by default...
can you see it? And how does it end?

Joost Kremers

Oct 27, 2017, 3:05:44 AM
to mu-di...@googlegroups.com

On Thu, Oct 26 2017, Dirk-Jan C. Binnema wrote:
> On Thursday Oct 26 2017, Joost Kremers wrote:
>
>> On Thu, Oct 26 2017, Dirk-Jan C. Binnema wrote:
>
>> Given that autogen.sh finishes successfully, am I correct in
>> assuming
>> that this cannot be the cause of my problem?
>
> The compiler error is really about an unsupported c++14 feature;
> there's
> a check in configure that should give an error at configure time
> though.
>
> Did you do a
> ./autogen.sh && ./configure && make
> ?

Well, autogen.sh seems to run configure itself, but just to be
sure I just also ran it manually.

> Somewhere in the configure output should be a line:
> checking whether g++ supports C++14 features by default...
> can you see it? And how does it end?

I have this:

checking whether g++ supports C++14 features by default... no
checking whether g++ supports C++14 features with -std=gnu++14... yes

and then lower down a bunch of checks that mention `g++
-std=gnu++14` and they all say `yes`.

So I'm guessing that during the actual compilation, the flag
`-std=gnu++14` isn't passed to g++? Could it be made so that it
is? I don't really fancy the idea of installing a newer gcc just
to compile mu... Or is it that this version of g++ claims to
support C++14 but really doesn't?

BTW:

====================

joost@IdeaPad:~/build/mu$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

====================

Dirk-Jan C. Binnema

Oct 27, 2017, 1:44:40 PM
to mu-di...@googlegroups.com
Hi Joost,
The check was broken it seems, and there were actually some post-C++14
thingies. Can you try again? And there's #1139 on github to track this.

Joost Kremers

Oct 27, 2017, 5:06:22 PM
to mu-di...@googlegroups.com
Hi Dirk,

On Fri, Oct 27 2017, Dirk-Jan C. Binnema wrote:
> On Friday Oct 27 2017, Joost Kremers wrote:
>> So I'm guessing that during the actual compilation, the flag
>> `-std=gnu++14` isn't passed to g++? Could it be made so that it
>> is? I
>> don't really fancy the idea of installing a newer gcc just to
>> compile
>> mu... Or is it that this version of g++ claims to support C++14
>> but
>> really doesn't?
>
> The check was broken it seems, and there were actually some
> post-C++14
> thingies. Can you try again? And there's #1139 on github to
> track this.

I pulled the latest commits, did a make distclean, ./autogen.sh
and make, and everything went smoothly. Running the latest mu/mu4e
as we speak. Well, write.

Just wondering about the "secure method=pgpmime mode=sign" thingy
up there at the top of this message. I have nothing set up to sign
my messages...