complex query parser


balazs

May 24, 2012, 10:31:23 AM
to rav...@googlegroups.com
I'm still rather new to Lucene and Raven and am trying to determine whether its query capabilities will support our needs. We have a number of concerns regarding more complex queries, such as proximity searches involving wildcards. I see frequent mention of the ComplexPhraseQueryParser being able to handle these sorts of things in Lucene (http://lucene.apache.org/core/old_versioned_docs/versions/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/ComplexPhraseQueryParser.html), but it doesn't seem that I can do them with Raven, and I may have to choose an alternative such as Solr. I'd appreciate any suggestions or comments from the community. An example complex query (from the Lucene site) that matches some of our needs is something like: "(john jon jonathan~) peters*"

Itamar Syn-Hershko

May 24, 2012, 11:40:13 AM
to rav...@googlegroups.com
Currently Raven doesn't use this QP, but it's really a no-brainer to do that. You can even do that yourself through a bundle.

balazs

May 24, 2012, 11:51:36 AM
to rav...@googlegroups.com
Does Raven currently have any mechanism for me to indicate which queryparser to use for any given query I'm making?  I don't see anything like that in the source and can probably implement that fairly easily, but I don't want to reinvent the wheel if there is anything already there.

Itamar Syn-Hershko

May 24, 2012, 11:55:45 AM
to rav...@googlegroups.com
No, because there are no other query parsers available there.

Take a look at the sources of the MoreLikeThis bundle, we do pretty much the same thing there. You will need to create a responder, run that QP on the query string and pass the Query object you get to the searcher, which is available to you as well.

We might incorporate this into the core product ourselves; we just need to make sure the API stays simple and all outcomes are considered.

Itamar Syn-Hershko

May 24, 2012, 11:57:02 AM
to rav...@googlegroups.com
It is actually going to be much easier to do this through AbstractIndexQueryTrigger; that way you won't have to handle paging, etc.

balazs

May 24, 2012, 1:29:52 PM
to rav...@googlegroups.com
Not sure I quite follow you on that.  Beyond just implementing/porting the parser what else do I need to do?

Matt Warren

May 24, 2012, 5:15:47 PM
to rav...@googlegroups.com
@balazs, what he means is that you can implement a bundle that will be called as part of every query. In that bundle you can rewrite the query to make it do whatever you want.

See the code at https://github.com/ravendb/ravendb/blob/master/Raven.Database/Indexing/Index.cs#L640; this allows any class that inherits from AbstractIndexQueryTrigger to control or modify all queries.

Matt Warren

May 24, 2012, 5:30:02 PM
to rav...@googlegroups.com
See here http://ravendb.net/docs/server/bundles for more info on how RavenDB bundles work. They don't mention this specific one (AbstractIndexQueryTrigger) but all triggers work in the same way.

Just create a class that inherits from AbstractIndexQueryTrigger and put it in the "\Plugins" directory and the server will pick it up and call it every time a query is performed.

balazs

May 25, 2012, 12:22:05 AM
to rav...@googlegroups.com
This inheriting class only needs to override ProcessQuery(...), right? I'm not sure what exactly the "query" and "originalQuery" parameters to that method are. Another problem is that once I manage to override this method correctly, I'm not sure how to debug this "bundle" while running RavenDB embedded. I tried to add a Plugins folder to /bin/debug and add an existing bundle to it, but Raven seems to have completely ignored it. So, say I've got the bundle in one project and my unit tests in another project, all in one solution. How do I "deploy" the bundle while running RavenDB embedded, then run a test with a breakpoint somewhere in the ProcessQuery(...) method? My apologies if this is naive; I've only been at Raven a few days and the documentation on bundles and the like seems pretty sparse.

Itamar Syn-Hershko

May 25, 2012, 4:15:00 AM
to rav...@googlegroups.com
inline

On Fri, May 25, 2012 at 7:22 AM, balazs <bal...@czifra.net> wrote:
> This inheriting class only needs to override ProcessQuery(...), right? I'm not sure what exactly the "query" and "originalQuery" parameters to that method are.

Yes.

Take the string indexQuery.Query (the Query property of the originalQuery parameter), pass it to the ComplexPhraseQueryParser, and return the Lucene Query object it gives you.

> Another problem is that once I manage to override this method correctly, I'm not sure how to debug this "bundle" while running RavenDB embedded. I tried to add a Plugins folder to /bin/debug and add an existing bundle to it, but Raven seems to have completely ignored it. So, say I've got the bundle in one project and my unit tests in another project, all in one solution. How do I "deploy" the bundle while running RavenDB embedded, then run a test with a breakpoint somewhere in the ProcessQuery(...) method? My apologies if this is naive; I've only been at Raven a few days and the documentation on bundles and the like seems pretty sparse.

Take a look at the Raven.Bundles solution; there is a test suite for server-side bundles there.

Basically, in your tests you run an EmbeddableDocumentStore and add the bundle assembly to the MEF catalog via the document store's configuration.
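
For anyone following along, the wiring described above might look roughly like the sketch below. It is an illustration only, not code taken from the Raven.Bundles solution. PassThroughQueryTrigger and BundleTestSetup are hypothetical names, and the sketch assumes the embedded client of this era, where the store class is EmbeddableDocumentStore and its Configuration.Catalog is a MEF AggregateCatalog.

```csharp
using System.ComponentModel.Composition.Hosting;
using Lucene.Net.Search;
using Raven.Abstractions.Data;
using Raven.Client.Embedded;
using Raven.Database.Plugins;

// Hypothetical trigger used only to show the wiring; it returns the query unchanged.
public class PassThroughQueryTrigger : AbstractIndexQueryTrigger
{
    public override Query ProcessQuery(string indexName, Query query, IndexQuery originalQuery)
    {
        // A breakpoint placed here can be hit from a unit test,
        // because the embedded server runs in the test process.
        return query;
    }
}

// Hypothetical helper for test setup.
public static class BundleTestSetup
{
    public static EmbeddableDocumentStore CreateStore()
    {
        var store = new EmbeddableDocumentStore { RunInMemory = true };

        // Register the assembly that contains the trigger with the MEF catalog
        // before Initialize(), so the server can discover it without a Plugins folder.
        store.Configuration.Catalog.Catalogs.Add(
            new AssemblyCatalog(typeof(PassThroughQueryTrigger).Assembly));

        store.Initialize();
        return store;
    }
}
```

A test would then create the store with BundleTestSetup.CreateStore(), dispose it when done, and run its queries against it as usual.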

Matt Warren

May 25, 2012, 4:39:39 AM
to rav...@googlegroups.com
Take a look at the bundle I just wrote; this line is where the plugin is wired up in the test: https://github.com/ayende/ravendb/blob/master/Bundles/Raven.Bundles.Tests/IndexedProperties/IndexedProperties.cs#L25

balazs

Jun 26, 2012, 3:50:29 PM
to rav...@googlegroups.com
I've implemented the following, relying on the suggestions in this thread:

public class ComplexPhraseIndexUpdateTrigger : AbstractIndexQueryTrigger
{
    public override Lucene.Net.Search.Query ProcessQuery(string indexName, Lucene.Net.Search.Query query, Raven.Abstractions.Data.IndexQuery originalQuery)
    {
        var cpqp = new ComplexPhraseQueryParser(Lucene.Net.Util.Version.LUCENE_29, "Description", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
        var q = cpqp.Parse(originalQuery.Query);
        return q;
    }
}

Unfortunately wildcard phrase queries aren't processed correctly.  What seems to happen is that in Lucene.Net.Index.DirectoryReader.MultiTermEnum the IndexReader[] array passed to the constructor is empty.  I think this is the method where the wildcard term prefix is supposed to get queried against the index to identify matching expansions of the term.  However, because readers.Length == 0 the code that does so never gets run.  As a result, a query for "labrador ret*" ends up coming back as a boolean query for "labrador" and "Dummy clause because no terms found - must match nothing" (ComplexPhraseQueryParser, ~ line 267).  Does anyone have any ideas why this might be happening?

balazs

Jun 26, 2012, 4:03:26 PM
to rav...@googlegroups.com
I've attached the ComplexPhraseQueryParser for reference.
ComplexPhraseQueryParser.cs

Oren Eini (Ayende Rahien)

Jun 27, 2012, 3:52:39 AM
to rav...@googlegroups.com
A few things. First, you need to dispose of the StandardAnalyzer, otherwise you have a memory leak.
And I am not sure about the actual problem. Can you provide a failing test?
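
As a rough illustration of the disposal point, the trigger could be restructured along these lines. This is a sketch under assumptions, not a confirmed fix. It assumes a Lucene.Net version in which Analyzer implements IDisposable; on older builds that expose only Close(), I believe a try/finally that calls Close() would serve the same purpose. ComplexPhraseQueryParser here is the port attached earlier in this thread, assumed to be available in the same project, and ComplexPhraseQueryTrigger is a name chosen for the example.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Search;
using Raven.Abstractions.Data;
using Raven.Database.Plugins;
using Version = Lucene.Net.Util.Version;

public class ComplexPhraseQueryTrigger : AbstractIndexQueryTrigger
{
    public override Query ProcessQuery(string indexName, Query query, IndexQuery originalQuery)
    {
        // Assumption: Analyzer is IDisposable in this Lucene.Net version,
        // so the using block releases it once parsing has finished.
        using (var analyzer = new StandardAnalyzer(Version.LUCENE_29))
        {
            // ComplexPhraseQueryParser is the poster's own port (not part of RavenDB).
            // "Description" is the default field, as in the original post.
            var parser = new ComplexPhraseQueryParser(Version.LUCENE_29, "Description", analyzer);
            return parser.Parse(originalQuery.Query);
        }
    }
}
```

The analyzer is only needed while the query string is being parsed, which is why it is scoped to the Parse call here rather than held in a field.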

balazs

Jun 27, 2012, 12:24:06 PM
to rav...@googlegroups.com
Hi Oren. Thanks for your attention. I've attached a VS 2012 solution that contains one test, which fails. I've included comments above the test that will hopefully help you better understand the situation. Thanks!
CPQP.rar

balazs

Jun 27, 2012, 4:12:01 PM
to rav...@googlegroups.com
I'd been working off of build 888.  Upon using build 960 this problem seems to have been resolved.  I'm not sure what changed, but it works so we can leave it at that :)

Oren Eini (Ayende Rahien)

Jun 27, 2012, 4:13:06 PM
to rav...@googlegroups.com
okay, great