I am using solrpy in a project, so thanks for the code. FWIW I have
written a function that can be used to escape search terms when
constructing a Solr query. Please feel free to use or modify.
--David
import re
# Lucene/Solr special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \
# There are also operators && and ||, but we're just going to escape
# the individual ampersand and pipe chars.
# http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')
def solr_escape(value):
    """Escape un-escaped special characters and return escaped value.

    >>> solr_escape('foo+')
    'foo\\\\+'
    >>> solr_escape('foo\+')
    'foo\\\\+'
    >>> solr_escape('foo\\+')
    'foo\\\\+'
    """
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)
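
For illustration, here is a minimal sketch of how the function might be used when building a query string. The helper `build_query` and the field name `title` are hypothetical and not part of solrpy; the pattern and `solr_escape` are the ones given above.

```python
import re

# Same pattern as above: a special character that is not already
# preceded by a backslash.
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')


def solr_escape(value):
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)


def build_query(field, term):
    # Hypothetical helper: escape only the user-supplied term, not the
    # field name or the colon that separates field from term.
    return '%s:%s' % (field, solr_escape(term))


print(build_query('title', 'C++ (2nd ed.)'))
# prints: title:C\+\+ \(2nd ed.\)
```

Note that whitespace is left alone, so a multi-word term is still split into separate terms by the query parser.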
Hi,

I think the doctests were a little confusing, so I've replaced them with these:

>>> solr_escape('foo+') == 'foo\\+'
True
>>> solr_escape('foo\+') == 'foo\\+'
True
>>> solr_escape('foo\\+') == 'foo\\+'
True
Using "raw" strings also makes doctests that contain backslash
literals more readable.
Note that for doctests embedded in docstrings (as opposed to external
.txt files), the docstring itself needs to be a raw string as well:
def escape(value):
    r'''\
    Escape a string for inclusion in a Solr query as a literal.

    >>> escape(r'foo\+') == r'foo\\+'
    True
    >>> escape(r'foo\+') == r'foo\\+'
    True
    >>> escape(r'foo\\+') == r'foo\\\\+'
    True
    '''
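
To make the raw-docstring point concrete, here is a small sketch that prints the source doctest actually sees in each case. The function names `raw_doc` and `plain_doc` are made up for illustration; the two docstrings contain the same doctest text and differ only in the `r` prefix.

```python
import doctest


def raw_doc():
    r"""
    >>> len('a\\b')
    3
    """


def plain_doc():
    """
    >>> len('a\\b')
    2
    """


parser = doctest.DocTestParser()
for func in (raw_doc, plain_doc):
    # get_examples() returns the parsed doctest examples; .source is the
    # code that doctest will execute for that example.
    example = parser.get_examples(func.__doc__)[0]
    print(func.__name__, '->', example.source.strip())
# prints:
# raw_doc -> len('a\\b')
# plain_doc -> len('a\b')
```

In the non-raw docstring the doubled backslash has already collapsed to a single one before doctest parses it, so the example ends up measuring 'a' followed by a backspace character (length 2) rather than a, backslash, b (length 3).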
-Fred
--
Fred L. Drake, Jr. <fdrake at gmail.com>
"Chaos is the score upon which reality is written." --Henry Miller
As for the raw strings and doctests, thanks for the tip. I now have
this, which is much better:
# Solr/Lucene special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \
# There are also operators && and ||, but we're just going to escape
# the individual ampersand and pipe chars.
# Also, we're not going to escape backslashes!
# http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')
def solr_escape(value):
    r"""
    Escape un-escaped special characters and return escaped value.

    >>> solr_escape(r'foo+') == r'foo\+'
    True
    >>> solr_escape(r'foo\+') == r'foo\+'
    True
    >>> solr_escape(r'foo\\+') == r'foo\\+'
    True
    """