Effects of FieldIndexing and FieldStorage on Querying?

Stephen McKamey

Jun 9, 2010, 3:10:32 PM
to ravendb
I realize that the querying interface has undergone a lot of change
recently, so I'm trying to make sure I understand this correctly.

From what I've read in the RavenDB docs and the Lucene docs:

1. If a field isn't present in the map projection, then it cannot be
used in a where clause in either IRavenQueryable<T> or
IDocumentQuery<T>

2. By default fields are given FieldIndexing.Analyzed and
FieldStorage.No

3. If IRavenQueryable<T> cannot convert a LINQ expression to Lucene
syntax then it will throw NotSupportedException

4. FieldStorage.No means values cannot be retrieved
via IDocumentQuery<T>.SelectFields<TProjection>(...)

5. FieldIndexing.No means values cannot be used in where clauses
(similar to not being present in the original projection)

6. FieldIndexing.Analyzed causes values to be converted to strings and
split into words, as a search engine would (whitespace and
punctuation are ignored)

7. FieldIndexing.NotAnalyzed causes strings to be treated as a single
token and matches must be exact

Am I understanding this correctly? Things aren't quite working as I
expected so I want to make sure my understanding isn't the problem.
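Points 6 and 7 can be pictured with a small Python simulation (not RavenDB or Lucene code). The analyze and not_analyzed functions are hypothetical stand-ins: analyze only lowercases and splits on non-alphanumeric characters, whereas real Lucene analyzers may also drop stop words or stem terms.

```python
import re

def analyze(text):
    # Rough stand-in for an analyzing tokenizer: lowercase, then split on
    # anything that is not an ASCII letter or digit. No stop-word removal
    # or stemming is done here, unlike many real Lucene analyzers.
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def not_analyzed(text):
    # NotAnalyzed: the whole value is indexed as a single, unchanged token.
    return [text]

value = "Lucene must match this exactly!"
print(analyze(value))       # ['lucene', 'must', 'match', 'this', 'exactly']
print(not_analyzed(value))  # ['Lucene must match this exactly!']

# A single-term lookup hits the analyzed tokens but not the NotAnalyzed one.
print("exactly" in analyze(value))       # True
print("exactly" in not_analyzed(value))  # False
```

The only difference is what ends up in the index: many small normalized terms versus one verbatim term, which is why an exact comparison only makes sense against the latter.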

Stephen McKamey

Jun 9, 2010, 4:35:50 PM
to ravendb
The more I play with querying, the more I'm realizing I can *only* get
back results if FieldIndexing is set to Analyzed. In Analyzed, the
string equals operator == basically acts like String.Contains:

docStore.DatabaseCommands.PutIndex(
    "FindFoo",
    new IndexDefinition<Foo, Foo>
    {
        Map = items =>
            from item in items
            select new
            {
                item.ExactMatchValue,
                item.FuzzyMatchValue
            },
        Indexes =
        {
            { item => item.ExactMatchValue, FieldIndexing.NotAnalyzed },
            { item => item.FuzzyMatchValue, FieldIndexing.Analyzed }
        }
    },
    true);
...

docSession.Store(new Foo
{
    ExactMatchValue = "Lucene must match this exactly",
    FuzzyMatchValue = "Lucene must match this fuzzily",
});

docSession.Store(new Foo
{
    ExactMatchValue = "Lucene must match this exactly 2",
    FuzzyMatchValue = "Lucene must match this fuzzily 2",
});

...

var exactResults = docSession.Query<Foo>("FindFoo")
    .Where(foo => foo.ExactMatchValue == "Lucene must match this exactly")
    .ToList(); // Count == 0
var fuzzyResults = docSession.Query<Foo>("FindFoo")
    .Where(foo => foo.FuzzyMatchValue == "Lucene must match this fuzzily")
    .ToList(); // Count == 2

I'm using Build 95 in case that matters. What am I doing wrong?

Chad

Jun 10, 2010, 6:03:45 AM
to ravendb
Second attempt at this, not sure where my first answer went to.

I'm new at Raven/Lucene as well so take this with a grain of salt.

Your first query should return 2 results; my guess is that the index
hasn't updated yet?

Lucene is full-text search. The "Lucene must match this exactly" phrase is
present in both documents, so it will return both, just as searching
only on "Lucene" will return both. To do exact matches you need to
have the field set to NotAnalyzed and wrap your search term in
[[Value]] (hat tip @ayende).
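Chad's point that the phrase occurs in both documents can be illustrated with a small Python simulation (not RavenDB or Lucene code). It assumes the same naive lowercase/split tokenizer for the analyzed field and treats a quoted query as a consecutive-token phrase match, which is roughly, not exactly, how a Lucene phrase query behaves. The helper names are hypothetical.

```python
import re

def analyze(text):
    # Lowercase and split on non-alphanumerics (no stop words, no stemming)
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def phrase_match(field_value, phrase):
    # Analyzed field: the phrase's tokens must appear consecutively
    # somewhere in the field's tokens; extra trailing tokens don't matter.
    doc, q = analyze(field_value), analyze(phrase)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

def exact_match(field_value, value):
    # NotAnalyzed field: the single indexed token must equal the query value.
    return field_value == value

docs = ["Lucene must match this exactly", "Lucene must match this exactly 2"]
query = "Lucene must match this exactly"
print(sum(phrase_match(d, query) for d in docs))  # 2
print(sum(exact_match(d, query) for d in docs))   # 1
```

The trailing "2" in the second document is just one more token after the phrase, so the analyzed comparison still matches it.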

DaRKoN_

Jun 10, 2010, 5:53:56 AM
to ravendb
My first post!

I'm still a Raven/Lucene noob, so take this with a grain of salt.

Your first query should be returning 2; I've just tested it here (on
build 93) and it works. I'm going to guess that maybe the index
hasn't updated yet? Add WaitForNonStaleResults() to your query and try
again.

To do an exact match in Raven, you need to set the field to
'NotAnalyzed' and the query needs to be wrapped in [[Val]] (hat tip
@ayende for this). Lucene is full text search, so searching for
"Lucene must match this exactly" (or even just "Lucene") will return
both of your records because that phrase exists in the index, even
though they don't match exactly.

-Chad.

Ayende Rahien

Jun 10, 2010, 6:13:44 AM
to rav...@googlegroups.com
Stephen,
Yes, your summary is accurate

Ayende Rahien

Jun 10, 2010, 6:14:34 AM
to rav...@googlegroups.com
The actual problem is that BOTH the field and the query need to go through the NotAnalyzed phase.

In the query, you mark a NotAnalyzed value as [[Val]]

Stephen McKamey

Jun 10, 2010, 1:51:16 PM
to ravendb
Thanks, that clears up a lot.

So I'm finding that Analyzed requires quoted values
(FuzzyMatchValue:"Lucene must match this fuzzily") and NotAnalyzed
requires square-bracketed values (ExactMatchValue:[[Lucene must match
this exactly]]). It's too bad that the query syntax requires
knowing the FieldIndexing setting on the field. I'm guessing that
since this is all delegated to Lucene, there is no real way around this
coupling. That could make the LINQ provider harder to build.
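The coupling Stephen describes can be sketched as a hypothetical Python helper, equals_clause, that picks the query syntax from the field's indexing mode. The FieldIndexing enum below only mirrors the names used in this thread (it is not the RavenDB client type), and the [[...]] branch passes the value through unescaped because, as raised later in the thread, escaping inside [[...]] was not supported at the time.

```python
from enum import Enum

class FieldIndexing(Enum):
    # Hypothetical mirror of the setting names discussed in the thread
    ANALYZED = "Analyzed"
    NOT_ANALYZED = "NotAnalyzed"

def equals_clause(field, value, indexing):
    # Build an equality clause whose syntax depends on how the field was indexed.
    if indexing is FieldIndexing.ANALYZED:
        # Quoted phrase; backslashes and quotes inside it are backslash-escaped
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    # RavenDB-specific [[...]] marker: the value bypasses the analyzer
    return f"{field}:[[{value}]]"

print(equals_clause("FuzzyMatchValue", "Lucene must match this fuzzily",
                    FieldIndexing.ANALYZED))
print(equals_clause("ExactMatchValue", "Lucene must match this exactly",
                    FieldIndexing.NOT_ANALYZED))
```

A client-side query builder that knows each field's indexing mode could hide this choice from the caller, which is essentially what a LINQ provider would have to do.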

Stephen McKamey

Jun 24, 2010, 12:18:18 PM
to ravendb
I've hit an issue with [[not analyzed field values]] that contain escaped
chars: apparently in Lucene you cannot escape within [[...]].

For instance, I've got a NotAnalyzed field that contains drive path
values (e.g. "C:\Foo\blah"). I cannot use Analyzed fields because they
match on substrings, but I cannot escape characters within a query.
I've been fighting this one, trying to find a syntax in the Lucene docs
that will work:

http://lucene.apache.org/java/3_0_2/queryparsersyntax.html

For this particular instance, it appears to be fine to just not escape
the chars, but what if the field actually contained "]]"? Also does
anyone know where the "[[...]]" syntax is defined in Lucene? I cannot
find any references to it.
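For ordinary (non-[[...]]) query text, the Lucene query parser syntax page linked above lists the characters that must be backslash-escaped. A minimal Python sketch of such an escaper, modeled loosely on Lucene's QueryParser.escape, follows; the name escape_lucene is hypothetical. Note that it turns "]]" into "\]\]", which only helps if whatever parses the [[...]] wrapper honors backslash escapes, and that is exactly the gap raised here.

```python
# Characters the classic Lucene query parser treats specially, roughly the
# set that Lucene's QueryParser.escape backslash-escapes.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')

def escape_lucene(term):
    # Prefix every special character with a backslash; leave others as-is.
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

print(escape_lucene(r"C:\Foo\blah"))  # C\:\\Foo\\blah
print(escape_lucene("a]]b"))          # a\]\]b
```

Escaping each character individually keeps the function simple and means two-character operators such as && and || are neutralized too.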

Stephen McKamey

Jun 24, 2010, 1:22:53 PM
to ravendb
Looks like [[...]] is a RavenDB-specific syntax:

http://github.com/ravendb/ravendb/blob/master/Raven.Database/Indexing/QueryBuilder.cs

I'm looking into fixing the escaping when using this syntax. Once that
is fixed I think I'll be able to do everything I need without worry of
user-entered data breaking queries.

Stephen McKamey

Jun 26, 2010, 7:55:10 PM
to ravendb
I've got a fix for the issue of escaping in NotAnalyzed fields as well
as some improvements to both the client and server handling of
queries:

http://github.com/mckamey/ravendb

More specifically:

- escaping has been fixed for NotAnalyzed fields
- the IDocumentQuery<T> interface has been extended to handle a richer
set of Lucene queries (e.g. range queries)
- the LINQ provider can differentiate between NotAnalyzed (Equals) and
Analyzed (Contains) as it leverages IDocumentQuery<T> rather than
duplicating functionality
- the LINQ provider supports more standard LINQ concepts (e.g.
Count(), Any(), Contains(), All())
- the QueryBuilder on the server-side has been fixed to support nested
boolean expressions, e.g. (a AND b) OR (c AND d)
- a bunch of additional unit tests have been added for both client and
server query components
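As a rough illustration of the query shapes mentioned in the list (range queries and nested boolean expressions), here is a Python sketch that composes Lucene query strings. The helper names term, range_query, all_of and any_of are hypothetical and are not the IDocumentQuery<T> API; the output uses standard Lucene syntax, where [a TO b] is an inclusive range and {a TO b} an exclusive one.

```python
def term(field, value):
    # Plain field:value clause (value assumed to be escaped already if needed)
    return f"{field}:{value}"

def range_query(field, lower, upper, inclusive=True):
    # Lucene range syntax: [a TO b] includes the bounds, {a TO b} excludes them
    start, end = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{start}{lower} TO {upper}{end}"

def all_of(*clauses):
    # Parenthesize so the group can be nested inside another expression
    return "(" + " AND ".join(clauses) + ")"

def any_of(*clauses):
    return "(" + " OR ".join(clauses) + ")"

query = any_of(
    all_of(term("a", "1"), term("b", "2")),
    all_of(term("c", "3"), range_query("d", "10", "20")),
)
print(query)  # ((a:1 AND b:2) OR (c:3 AND d:[10 TO 20]))
```

Always parenthesizing each group is what lets a server-side parser recover the nesting, e.g. (a AND b) OR (c AND d), without relying on operator precedence.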

Asbjørn Ulsberg

Jul 6, 2010, 9:49:15 AM
to ravendb, Stephen McKamey
On Sun, 27 Jun 2010 01:55:10 +0200, Stephen McKamey <ste...@jsonfx.net>
wrote:

> I've got a fix for the issue of escaping in NotAnalyzed fields as well
> as some improvements to both the client and server handling of
> queries:
>
> http://github.com/mckamey/ravendb

I think you should do a pull request to the main branch of RavenDB so
Ayende can pull it in; otherwise he would have to find out about your work
and how to merge it in a more manual process.

--
Asbjørn Ulsberg -=|=- asb...@ulsberg.no
«He's a loathsome offensive brute, yet I can't look away»

Stephen McKamey

Jul 6, 2010, 9:56:25 AM
to ravendb
I actually did a while back. Last week in fact Ayende pulled it into
the main fork. He hasn't made a binary from it because I think he is
working on a bigger release, but you can pull from the master at
http://github.com/ravendb/ravendb and all the new LINQ / LuceneQuery
stuff is in there.