LUCENE Query Parser escape Underscore


mador...@gmail.com

Mar 10, 2014, 10:48:55 AM
to rav...@googlegroups.com

Hey,
When I parse a query with the Lucene query parser, I get the following:
QueryParser parser = new QueryParser....
parser.parse("Type:Ni_Sa_Test") => Type:"ni sa test"
How can I escape the underscore?
I want my result to be => Type:ni_sa_test

Moreover, I saw something strange when parsing "Type:Ni_Sa4_Test" => Type:Ni_Sa_Test
Why do I get the expected result when my query contains numbers, while the underscores are removed when it contains only letters?

Thanks

Itamar Syn-Hershko

Mar 10, 2014, 10:50:46 AM
to rav...@googlegroups.com
Use the KeywordAnalyzer on the Type field (or simply set the field to not be Analyzed)
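
A rough sketch of the per-field idea suggested above, written as a toy model in plain Java and not as actual Lucene code. Every class and method name here is hypothetical; the splitting rule is a deliberate simplification and does not reproduce any real analyzer. In Lucene itself, routing one field to a different analyzer is typically what PerFieldAnalyzerWrapper is for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analysis. NOT Lucene code; all names are hypothetical.
final class PerFieldAnalysisSketch {

    // Stand-in for a tokenizing analyzer: lowercases the text and splits on
    // anything that is not a letter or digit, so '_' acts as a token boundary.
    static List<String> splittingAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a keyword-style analyzer: the whole value is one token, unchanged.
    static List<String> keywordAnalyzer(String text) {
        return Collections.singletonList(text);
    }

    // Field name -> analyzer. Fields not listed fall back to the splitting analyzer.
    static final Map<String, Function<String, List<String>>> PER_FIELD = new HashMap<>();

    static {
        PER_FIELD.put("Type", PerFieldAnalysisSketch::keywordAnalyzer);
    }

    static List<String> analyze(String field, String text) {
        return PER_FIELD
                .getOrDefault(field, PerFieldAnalysisSketch::splittingAnalyzer)
                .apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Content", "Cars_BMW_Test")); // [cars, bmw, test]
        System.out.println(analyze("Type", "Cars_BMW_Test"));    // [Cars_BMW_Test]
    }
}
```

The same per-field choice would have to apply at indexing time as well, otherwise the single query token would not match what was stored in the index.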

--

Itamar Syn-Hershko
http://code972.com | @synhershko
Freelance Developer & Consultant



mador...@gmail.com

Mar 10, 2014, 11:24:25 AM
to rav...@googlegroups.com
Hey Itamar,
I can't use the KeywordAnalyzer because I have my own analyzer.
What do you mean by setting the field to not be analyzed?
I have the following query:
string query = "Yellow && Type:Cars_BMW_Test"
After queryParser.parse I get the following:
+Content:Yellow +Type:"Cars BMW Test"
How can I parse one field but not the other, and still query on both of them?

Thanks :-)

Itamar Syn-Hershko

Mar 10, 2014, 11:26:41 AM
to rav...@googlegroups.com
Your analyzer is tokenizing on underscore, hence what you are seeing. I suggest you read a bit on analyzed fields before using custom analyzers.
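
To illustrate the point about tokenizing on underscore, here is a minimal toy model in plain Java (not Lucene code, and every name in it is hypothetical). It assumes a simplified analyzer that lowercases and splits on any character that is not a letter or digit, and it assumes that a term yielding several tokens is rendered as a quoted phrase, which is consistent with the Type:"ni sa test" output reported earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of why "Ni_Sa_Test" turns into a phrase. NOT Lucene code.
final class UnderscoreSplitSketch {

    // Simplified stand-in for an analyzer: lowercase, then split on every
    // character that is neither a letter nor a digit (so '_' is a boundary).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Renders field:term after analysis: one token stays a bare term,
    // several tokens become a quoted phrase.
    static String render(String field, String term) {
        List<String> tokens = tokenize(term);
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(render("Type", "Ni_Sa_Test")); // Type:"ni sa test"
        System.out.println(render("Content", "Yellow"));  // Content:yellow
    }
}
```

The underscore is not query syntax in this model, so escaping it would change nothing: the split happens in the analysis step, after the query string has been parsed.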



Itamar Syn-Hershko

Mar 10, 2014, 11:27:37 AM
to rav...@googlegroups.com
I'm also not sure why you are using the query parser directly?



Oren Eini (Ayende Rahien)

Mar 10, 2014, 11:28:52 AM
to ravendb

Your own analyzer, which does what?

mador...@gmail.com

Mar 10, 2014, 11:41:00 AM
to rav...@googlegroups.com
My analyzer extends the standard analyzer and replaces Hebrew final letters with their regular forms (sorry if I made you laugh, I don't know how to describe it better in English :-)

I'm using the query parser because I'm building a BooleanQuery that combines the different queries I have with Occur.SHOULD.
Each of those queries is the output of the query parser.

Is that explained well enough?

Itamar Syn-Hershko

Mar 10, 2014, 11:59:39 AM
to rav...@googlegroups.com
If you are trying to allow for proper Hebrew search, take a look at this: http://code972.com/hebmorph . Extending the StandardAnalyzer doesn't take into account Hebrew acronyms, for example.

Unless you are using the QueryParser on the server side, it won't have much effect and will probably cause a lot of confusion later on. I suggest you look at LuceneQuery.BeginClause and similar to do that, and use a proper analyzer for your use case (StandardAnalyzer tokenizes on underscore, for example).



mador...@gmail.com

Mar 10, 2014, 12:09:22 PM
to rav...@googlegroups.com
Thank you Itamar,
I'll check your suggestions. 