Normalizing strings during queries


Stacey

Apr 27, 2016, 2:29:28 PM
to RavenDB - 2nd generation document database
I'm not sure if this is by design or if I'm simply looking in the wrong place - but is there a way to normalize strings during a query? Or perhaps the term "normalize" isn't what I'm looking for...

but essentially, given this code...

await RavenSession<Tags>().Where(n => Normalize(n.Name) == Normalize(name)).ToListAsync()

public string Normalize(string source) {
    return source.ToLowerInvariant(); // or other normalizing options
}

It's easy to normalize the value you're going to pass in, since that can be manipulated before you enter the query, but once inside the query you're limited to whatever Raven understands on the server, if I'm not mistaken. I tried ToLower() and ToLowerInvariant(), and it seems neither of them is accepted.

So is there a standard practice for this kind of problem? I know you guys are infinitely smarter than I am and have thought of this a million times - I'm just not clear what it's called or even where to begin such a search.

Daniel Häfele

Apr 27, 2016, 2:56:58 PM
to RavenDB - 2nd generation document database
You write an index and analyze the field you want to query on using an analyzer.
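A minimal sketch of what "analyze the field" buys you, written in C# to match the thread's code. It does not use RavenDB: `Analyze`, `BuildIndex` and `Search` are hypothetical helpers, and the regex-based analyzer is only an assumed stand-in for a Lucene analyzer. What it illustrates is that the same analysis runs at indexing time and at query time, so both sides are compared as already-normalized terms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical analyzer: lower-cases the text and keeps runs of letters/digits as terms.
// It only stands in for a Lucene analyzer; it is not a RavenDB or Lucene API.
static IEnumerable<string> Analyze(string text) =>
    Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
         .Cast<Match>()
         .Select(m => m.Value);

// Index time: every term the analyzer produces points back to the documents containing it.
static Dictionary<string, HashSet<int>> BuildIndex(IReadOnlyList<string> names)
{
    var index = new Dictionary<string, HashSet<int>>();
    for (int id = 0; id < names.Count; id++)
        foreach (var term in Analyze(names[id]))
        {
            if (!index.TryGetValue(term, out var ids))
                index[term] = ids = new HashSet<int>();
            ids.Add(id);
        }
    return index;
}

// Query time: the query text goes through the SAME analyzer, then every term must match.
static List<int> Search(Dictionary<string, HashSet<int>> index, string query)
{
    HashSet<int>? result = null;
    foreach (var term in Analyze(query))
    {
        var ids = index.TryGetValue(term, out var hit) ? hit : new HashSet<int>();
        if (result == null) result = new HashSet<int>(ids);
        else result.IntersectWith(ids);
    }
    return (result ?? new HashSet<int>()).OrderBy(id => id).ToList();
}

var tagNames = new List<string> { "C# Tips", "c# tips", "RavenDB Indexing" };
var tagIndex = BuildIndex(tagNames);
Console.WriteLine(string.Join(",", Search(tagIndex, "C# TIPS"))); // prints 0,1
```

Because normalization lives inside the analyzer, the query never has to call a normalizing function on the stored field.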

Michael Yarichuk

Apr 27, 2016, 3:21:41 PM
to RavenDB - 2nd generation document database
Note that RavenDB actually applies ToLower() to string fields during indexing by default: the default analyzer used in indexes is LowerCaseKeywordAnalyzer.
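The sketch below approximates what a lower-case keyword analyzer does, under the assumption that the whole field value is kept as a single term and lower-cased (keyword tokenization followed by lower-casing). `LowerCaseKeyword` and `Matches` are hypothetical names, not RavenDB APIs, and `ToLowerInvariant` is only an approximation of Lucene's lower-casing. If a field is indexed this way, an equality query on the raw parameter already matches regardless of case, which suggests that calling `ToLower()` inside the query is unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Approximation of lower-case keyword analysis (not RavenDB's source): the whole
// field value becomes ONE term, lower-cased, and is not split into words.
static string LowerCaseKeyword(string value) => value.ToLowerInvariant();

// An equality query runs the query value through the same analysis before matching.
static bool Matches(IEnumerable<string> indexTerms, string queryValue) =>
    indexTerms.Contains(LowerCaseKeyword(queryValue));

// Hypothetical stored Name values and the single term each one produces.
var names = new[] { "RavenDB Indexing", "Lucene" };
var terms = names.Select(LowerCaseKeyword).ToList();

Console.WriteLine(Matches(terms, "ravendb INDEXING")); // True: same term once lower-cased
Console.WriteLine(Matches(terms, "indexing"));         // False: the value was kept as one term
```

The flip side, visible in the second call, is that keyword analysis does not split words, so matching a single word inside a multi-word value needs a different (for example, full-text) analyzer.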

--
Best regards,

Michael Yarichuk | RavenDB Core Team
Hibernating Rhinos Ltd

Office: +972-4-622-7811 | Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simple"   http://ravendb.net/

Chris Marisic

Apr 28, 2016, 9:42:32 AM
to RavenDB - 2nd generation document database
You should research text tokenization, stop words, and how they all fit into full-text search.
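To make "tokenization" and "stop words" concrete, here is a small C# sketch of a tokenizer that lower-cases text, splits it into runs of letters and digits, and drops stop words. The stop-word list and the `Tokenize` helper are assumptions for illustration; real Lucene analyzers have their own tokenization rules and stop-word lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Tiny illustrative stop-word list (an assumption; real analyzers ship their own lists).
var stopWords = new HashSet<string> { "a", "an", "and", "the", "of", "to" };

// Tokenize: lower-case, split into runs of letters/digits, then drop stop words.
static List<string> Tokenize(string text, ISet<string> stop) =>
    Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .Where(t => !stop.Contains(t))
         .ToList();

var tokens = Tokenize("The Art of Full-Text Search", stopWords);
Console.WriteLine(string.Join(" | ", tokens)); // art | full | text | search
```

Note that the hyphen in "Full-Text" acts as a separator here; whether a real tokenizer splits on it is one of the choices that changes what a query can match.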

After you have a core understanding of this, you might want to take a commonplace algorithm like the Levenshtein distance, write a couple of small tests, and see what happens depending on how you tokenize your terms.
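One of the small experiments suggested above could look like this: a textbook dynamic-programming Levenshtein distance in C#, applied to raw text, to lower-cased text, and token by token. The sample strings and the plain whitespace split are assumptions chosen for illustration.

```csharp
using System;
using System.Linq;

// Classic dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions and substitutions turning a into b.
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

string stored = "Data Made Simple", typed = "data made simple";
// Raw comparison: case differences count as edits.
Console.WriteLine(Levenshtein(stored, typed));                    // 3
// Normalized comparison: lower-case first, as an analyzer would.
Console.WriteLine(Levenshtein(stored.ToLowerInvariant(), typed)); // 0
// Token-level comparison: a misspelled word against each whitespace-separated token.
int best = stored.ToLowerInvariant().Split(' ').Min(t => Levenshtein("simpel", t));
Console.WriteLine(best);                                          // 2
```

Lower-casing alone removes the three case-only edits, and comparing against individual tokens rather than the whole string keeps a one-word typo at a small distance.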

After that, reading about the Lucene analyzers will make more sense, and you'll see that the default RavenDB analyzer covers 80-90% of needs. I've only ever used a custom analyzer once, for an expensive n-gram search with highlighting that served a very specific purpose and was used on only 1 project, for 1 or 2 indexes.
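For readers unfamiliar with n-gram search, a rough C# sketch of the idea: break each term into overlapping character n-grams and treat a partial query as a match when all of its n-grams occur in the indexed term. The gram size of 3, the containment rule and the helper names are assumptions; Lucene's n-gram tokenizers are configured differently, and this is not how RavenDB implements them.

```csharp
using System;
using System.Collections.Generic;

// Character n-grams of a fixed length n (simplified; Lucene's n-gram tokenizers
// also support a min/max gram range, which this sketch does not model).
static HashSet<string> NGrams(string term, int n)
{
    var grams = new HashSet<string>();
    var lower = term.ToLowerInvariant();
    for (int i = 0; i + n <= lower.Length; i++)
        grams.Add(lower.Substring(i, n));
    return grams;
}

// A partial query "matches" if all of its n-grams occur among the indexed term's n-grams.
static bool NGramMatch(string indexed, string partial, int n) =>
    NGrams(partial, n).IsSubsetOf(NGrams(indexed, n));

Console.WriteLine(NGramMatch("RavenDB", "aven", 3)); // True
Console.WriteLine(NGrams("RavenDB", 3).Count);       // 5
```

Each term of length L yields L - n + 1 grams, which hints at why n-gram indexes grow large and are costly to query. The subset test can also report false positives, since shared grams need not be contiguous, and a query shorter than n produces no grams and matches trivially.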