string.Contains in Where maps to equals

Thomas Ardal

Jul 14, 2011, 4:55:04 AM
to ravendb
I'm experiencing an error when doing the following query:

var ravenQueryable = _session.Query<Car>().Where(x => x.Name.Contains("azd"));

The query doesn't return the document whose Name equals "Mazda". The
queryable shows that the Contains method is mapped to Name:azd,
meaning equals.

Shouldn't this be implemented as a contains rather than an equals, or
should I use Lucene natively for this kind of query?

Regards,
Thomas

Itamar Syn-Hershko

Jul 14, 2011, 6:43:36 AM
to rav...@googlegroups.com
Raven's "Contains" is not like SQL's LIKE operator. It's a full-text search operator, hence you see Name:azd.

You should be looking at wildcards - the StartsWith operator for example.

You should note the general advice is NOT to use leading wildcards (*azda) for performance reasons. If you need to support that, consider using a custom analyzer with a ReverseFilter.
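
To make the ReverseFilter suggestion concrete, here is a minimal sketch of the underlying idea. It is not RavenDB or Lucene code. If each term is also indexed in reversed form, a leading-wildcard query such as *azda can be answered as a prefix query (adza*) over the reversed terms. The names ReversedTermIndex, Build and EndsWith are made up for illustration, and the linear scan stands in for the direct seek that a sorted term dictionary would allow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only: this is NOT RavenDB or Lucene code.
public static class ReversedTermIndex
{
    // Reverse a term, e.g. "mazda" -> "adzam".
    public static string Reverse(string term)
    {
        char[] chars = term.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Store every term lower-cased and reversed, sorted the way a term dictionary would be.
    public static List<string> Build(IEnumerable<string> terms)
    {
        return terms.Select(t => Reverse(t.ToLowerInvariant()))
                    .OrderBy(t => t, StringComparer.Ordinal)
                    .ToList();
    }

    // Answer the leading-wildcard query "*suffix" as a prefix match on the reversed terms.
    // A real sorted term dictionary could seek straight to the prefix; this sketch scans.
    public static List<string> EndsWith(List<string> reversedTerms, string suffix)
    {
        string prefix = Reverse(suffix.ToLowerInvariant());
        return reversedTerms
            .Where(t => t.StartsWith(prefix, StringComparison.Ordinal))
            .Select(t => Reverse(t))
            .ToList();
    }
}
```

With the terms Mazda, Honda and Skoda, EndsWith(index, "azda") yields only mazda, while EndsWith(index, "da") yields all three. The cost is a second copy of every term in the index, which is one reason this is an opt-in analyzer choice rather than a default.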

jalchr

Jul 14, 2011, 6:53:28 AM
to rav...@googlegroups.com
I have been watching so many questions about the "Contains" operator and how it works in Raven.

The "Contains" operator does sound like SQL's LIKE operator, and it behaves that way in programming languages as well. You can't change the semantics for "Performance Reasons".

If the developer wants to use "Contains", then he does not mean "Equal".
If I say
sentence.Contains(word); then I mean
Sentence:*word* 

which is a wild-card search anyway

If I want to get
word1.Equals(word2) then I definitely want:
word1:word2


Maybe Raven could have a "Convention" option or a per-indexed-field configuration to improve "Contains" performance by using that "custom analyzer with a ReverseFilter" internally ... if it helps performance.


Itamar Syn-Hershko

Jul 14, 2011, 12:05:29 PM
to rav...@googlegroups.com
Contains is not Equal. Its behavior depends on the analyzer you are using. We will have this described in the docs: Raven's Contains operator is a full-text search one.

Simply changing Contains to work with a leading wildcard doesn't make sense performance-wise, and always using ReverseFilter by default isn't really helpful. It would make the StartsWith operator inefficient for all use cases, and that's bad.

Perhaps we can provide another analyzer that has the current behavior but also uses the ReverseFilter as mentioned, and have an article in the docs explaining pros and cons for each.

Jonathan Curtis

Jul 14, 2011, 2:06:35 PM
to ravendb
I'm a little confused by this also. Surely this is a common scenario?

I have documents with a text field that has multiple words in it. I
want to return all documents with a text field that contains a search
phrase. How do I do that?

Matt Warren

Jul 14, 2011, 6:24:17 PM
to ravendb
You can absolutely do this, you just need to use a different analyser.
The default is a lower-case keyword one, but you can use a WhiteSpace or
StopAnalyser one instead. For more info see http://ravendb.net/documentation/how-indexes-work.

If you use a Whitespace analyser (for instance), you can search for an
individual word and Raven will return the docs that have that word
(full-text), rather than an exact match.

Chris Sainty

Jul 14, 2011, 6:24:44 PM
to ravendb
It works on full word matching.

"This is my text"
Gets saved in the index (in simplistic terms) as "This" "is" "my"
"text". That index then Contains() each of those words, and can be
queried Field:my or Field:This.
It can not be queried Field:ext.

The backing indexer is Lucene, so read up on its query support,
everything there is supported as far as I know
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

*xxx* wildcards, or the CLR Contains(), do not work because there is
no way to index them for a dynamic value. You have to run through
every value in the index and try to match it, which is woefully slow.
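
The whole-term behaviour described here can be sketched with a toy inverted index. This is not RavenDB or Lucene code, and the names ToyIndex, KeywordTerms, WhitespaceTerms, Build and Matches are invented for the example. It also lower-cases in both analysers for simplicity, which a real whitespace analyser does not necessarily do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only: this is NOT RavenDB or Lucene code.
public static class ToyIndex
{
    // Keyword-style analysis: the whole lower-cased value becomes a single term.
    public static IEnumerable<string> KeywordTerms(string value)
    {
        yield return value.ToLowerInvariant();
    }

    // Whitespace-style analysis: one term per whitespace-separated word.
    // Lower-casing here is a simplification made for this sketch.
    public static IEnumerable<string> WhitespaceTerms(string value)
    {
        return value.ToLowerInvariant()
                    .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Map each term to the ids of the documents that produced it.
    public static Dictionary<string, HashSet<int>> Build(
        IList<string> documents, Func<string, IEnumerable<string>> analyze)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < documents.Count; id++)
        {
            foreach (string term in analyze(documents[id]))
            {
                HashSet<int> ids;
                if (!index.TryGetValue(term, out ids))
                {
                    ids = new HashSet<int>();
                    index[term] = ids;
                }
                ids.Add(id);
            }
        }
        return index;
    }

    // A term query such as Field:my is an exact lookup of one whole term.
    public static bool Matches(Dictionary<string, HashSet<int>> index, string term)
    {
        return index.ContainsKey(term.ToLowerInvariant());
    }
}
```

Indexing "This is my text" with WhitespaceTerms, Matches(index, "my") is true and Matches(index, "ext") is false. With KeywordTerms, only the full value "this is my text" matches, which is the equals-like behaviour the original question ran into.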

Matt Warren

Jul 14, 2011, 6:39:53 PM
to ravendb
I definitely think this should be an explicit step. If Raven is too
clever and tries to do things behind the scenes, it may well be
brittle. I guess the main problem is that the query doesn't (on the
client) have access to the index, so it doesn't know which analyser
was used. Also if the index wasn't created with full-text searching
"enabled" then there's not much that can be done.

Also "enabling" full-text search isn't exactly uncommon (like in SQL
Server), so maybe it can just be wrapped up more nicely.

For instance rather than having to specify the analyser and know what
they even do, just set a flag that says I want to use the field/index
for full-text searching. Then Raven can (behind-the-scenes) use the
correct analyser.

Itamar Syn-Hershko

Jul 14, 2011, 7:07:52 PM
to rav...@googlegroups.com
On Fri, Jul 15, 2011 at 1:39 AM, Matt Warren <matt...@gmail.com> wrote:
> For instance rather than having to specify the analyser and know what
> they even do, just set a flag that says I want to use the field/index
> for full-text searching. Then Raven can (behind-the-scenes) use the
> correct analyser.

I'm pretty sure this is how we do things in static indexes with the Analyzer property in IndexDefinitionBuilder. Yes, it is not too self-descriptive, but this can be covered by proper docs, code comments and perhaps some static pre-set values with more descriptive names.

Otherwise I don't see how the API can be clearer about this. Please bear in mind full-text search is ALWAYS enabled; we don't want to bypass Lucene's defaults too much or we will lose performance.

Re. Contains - there may be another way of doing this - using ShingleFilter, or an n-grams analyzer. It will bloat the index, but should work nicely with mid-word searches if handled correctly (say, 2/3-grams).

Note that this is completely a full-text search feature, not so DB-ish.
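
As a rough sketch of the n-gram idea mentioned above (again a toy, not RavenDB or Lucene code, with the invented names NGramSketch and NGrams): every substring of the chosen sizes becomes its own term, so a mid-word fragment turns into an ordinary exact term lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only: this is NOT RavenDB or Lucene code.
public static class NGramSketch
{
    // All character n-grams of a lower-cased term, for sizes minSize..maxSize inclusive.
    public static IEnumerable<string> NGrams(string term, int minSize, int maxSize)
    {
        string lower = term.ToLowerInvariant();
        for (int size = minSize; size <= maxSize; size++)
        {
            // Slide a window of the current size across the term.
            for (int start = 0; start + size <= lower.Length; start++)
            {
                yield return lower.Substring(start, size);
            }
        }
    }
}
```

NGrams("Mazda", 2, 3) produces ma, az, zd, da, maz, azd and zda, so the fragment azd is now an indexed term. The same call shows the bloat: one five-letter word becomes seven terms. This sketch does not cover queries longer than maxSize, which would have to be split into grams in the same way before lookup.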

Ayende Rahien

Jul 15, 2011, 4:41:43 AM
to rav...@googlegroups.com
Yes, that is what we do when you set the field to Analyzed.

Anders

Aug 4, 2011, 3:02:12 AM
to rav...@googlegroups.com
Is it possible to set a field as analyzed in a dynamic index?

Ayende Rahien

Aug 4, 2011, 3:19:40 AM
to rav...@googlegroups.com
No, you have to make a static index for that.

Nicolas Garfinkiel

Nov 8, 2011, 12:53:05 PM
to rav...@googlegroups.com
I know this post is old and that I'm a couple of months overdue to add my two cents, but...

Wouldn't it be cool if you could specify the analyzers you want to use for your dynamic query up-front?

Something like:

var query = documentSession.Query<Foo>()
    .Where(x => x.Bar.Contains("term"))
    .AnalyzeField(field, analyzer => analyzer.UseStandard(field.Bar));

or for custom analyzers:

var query = documentSession.Query<Foo>()
    .Where(x => x.Bar.Contains("term"))
    .AnalyzeField(field, analyzer => analyzer.UseCustom<AnalyzerDefinition>(field.Bar));

With this, to give an example, you won't have to drop your dynamic indexes for simple things like implementing an auto-complete combo-box.

If this is even remotely possible, I think it would make many people happy.

What do you think, is this just crazy talking?

Nicolas Garfinkiel

Nov 8, 2011, 12:54:10 PM
to rav...@googlegroups.com

Itamar Syn-Hershko

Nov 8, 2011, 1:56:03 PM
to rav...@googlegroups.com
This just bloats the API and encourages people to use ad-hoc querying. If you want a specific analyzer, create a static index...

Nicolas Garfinkiel

Nov 8, 2011, 4:04:11 PM
to rav...@googlegroups.com
OK, maybe you are right about the bloating (and anyway, you guys make those calls! ;-))

But what is wrong with encouraging people to do ad-hoc queries?

I thought that the preferred way of querying Raven was doing dynamic queries, while making sure the indexes it creates get promoted to permanent indexes.

That is of course if you don't need to do anything special with the indexes (and IMHO analyzers are not a special indexing feature, they just determine the way Lucene indexes fields).

Just to name an example of the posts I've read about this: http://ayende.com/blog/4667/ravens-dynamic-queries

Cheers!

Itamar Syn-Hershko

Nov 8, 2011, 4:17:50 PM
to rav...@googlegroups.com
That is an old post :)

RavenDB's default analyzer makes it behave like a "standard" database, i.e. queries are made on strict values.

Changing the analyzer to almost anything else creates a Full-Text Search index, which might also cause queries to behave somewhat differently.

Ad-hoc queries are good for development, but are not encouraged for production. Promoting an index is nice, but isn't always the best practice. Forcing you to create a strictly typed index for anything non-trivial (Map/Reduce, FTS, Spatial, etc.) is a great practice to make sure you think about what your indexes look like and how you can reuse them.

It is documented here, but perhaps we need to extend it a bit: http://beta.ravendb.net/docs/consumer/querying/static-indexes