Function to escape values for Solr

David Chandek-Stark

Mar 10, 2010, 12:50:15 PM

to solrpy

Hi,

I am using solrpy in a project, so thanks for the code. FWIW I have
written a function that can be used to escape search terms when
constructing a Solr query. Please feel free to use or modify.

--David

import re

# Lucene/Solr special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \
# There are also operators && and ||, but we're just going to escape
# the individual ampersand and pipe chars.
# http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')

def solr_escape(value):
    """Escape un-escaped special characters and return escaped value.

    >>> solr_escape('foo+')
    'foo\\\\+'
    >>> solr_escape('foo\+')
    'foo\\\\+'
    >>> solr_escape('foo\\+')
    'foo\\\\+'
    """
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)
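
For readers who want to see the function in use, here is a minimal sketch. The pattern and `solr_escape` are restated from the post so the snippet runs on its own; the `build_query` helper and the `title` field name are made up for illustration and are not part of solrpy.

```python
import re

# Same pattern as in the post, restated so this snippet is self-contained.
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')

def solr_escape(value):
    """Backslash-escape special characters that are not already escaped."""
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)

def build_query(field, term):
    """Hypothetical helper: build a 'field:term' query string."""
    return '%s:%s' % (field, solr_escape(term))

print(build_query('title', 'C++'))      # title:C\+\+
print(build_query('title', '(1+1):2'))  # title:\(1\+1\)\:2
```

Whitespace is not escaped here, so a multi-word term would presumably still need quoting or splitting by the caller.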

David Chandek-Stark

Mar 10, 2010, 3:10:13 PM

to solrpy

Hi,

I think the doctests were a little confusing, so I've replaced them with these:

    >>> solr_escape('foo+') == 'foo\\+'
    True
    >>> solr_escape('foo\+') == 'foo\\+'
    True
    >>> solr_escape('foo\\+') == 'foo\\+'
    True

--David

--
David Chandek-Stark
dchand...@gmail.com

Leonardo Santagada

Mar 10, 2010, 6:09:08 PM

to sol...@googlegroups.com

On Mar 10, 2010, at 5:10 PM, David Chandek-Stark wrote:

> Hi,
>
> I think the doctests were a little confusing, so I've replaced them with these:
>
>    >>> solr_escape('foo+') == 'foo\\+'
>    True
>    >>> solr_escape('foo\+') == 'foo\\+'
>    True
>    >>> solr_escape('foo\\+') == 'foo\\+'
>    True

Shouldn't it be 'foo\+', 'foo\\\+' and 'foo\\\\\+', like in the old tests? But yes, this doctest style looks better, to me at least.
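
Part of the confusion in this exchange may be that one and the same string looks different depending on how it is written in source and how it is displayed. A small illustration (not from the original posts; the variable names are arbitrary):

```python
# One five-character string (f, o, o, backslash, plus) written two ways.
plain = 'foo\\+'   # regular literal: the backslash has to be doubled
raw = r'foo\+'     # raw literal: the backslash is taken as written

print(plain == raw)   # True
print(len(plain))     # 5
print(plain)          # foo\+
print(repr(plain))    # 'foo\\+'
```

So a doctest that shows the bare return value prints the `repr` form with doubled backslashes, while an `==` comparison hides that layer.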

--

Leonardo Santagada

santagada at gmail.com

Fred Drake

Mar 10, 2010, 11:31:00 PM

to sol...@googlegroups.com

On Mar 10, 2010, at 5:10 PM, David Chandek-Stark wrote:
> I think the doctests were a little confusing, so I've replaced them with
> these:

Using "raw" strings makes tests that contain backslash literals more
readable as well.

Note that for doctests embedded in docstrings (as opposed to external
.txt files), the docstring itself needs to be a raw string as well:

def escape(value):
    r'''\
    Escape a string for inclusion in a Solr query as a literal.

    >>> escape(r'foo\+') == r'foo\\+'
    True
    >>> escape(r'foo\+') == r'foo\\+'
    True
    >>> escape(r'foo\\+') == r'foo\\\\+'
    True

    '''
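
To make the point about raw docstrings concrete, here is a rough sketch. `plain`, `raw`, `escape_plus` and `run_doctests` are illustrative names only; `escape_plus` is a deliberately trivial stand-in, not the function discussed in this thread. The first part shows what Python does to backslashes in each kind of docstring, and the second part runs the examples from a raw docstring through the standard `doctest` machinery.

```python
import doctest

def plain():
    """x == r'foo\\+'"""

def raw():
    r"""x == r'foo\\+'"""

# In the regular docstring Python collapses each doubled backslash first.
print(plain.__doc__)  # x == r'foo\+'
print(raw.__doc__)    # x == r'foo\\+'

def escape_plus(value):
    r"""Illustrative stand-in that escapes only '+'.

    >>> escape_plus(r'foo+') == r'foo\+'
    True
    >>> escape_plus(r'a+b+') == r'a\+b\+'
    True
    """
    return value.replace('+', '\\+')

def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted

print(run_doctests(escape_plus))  # (0, 2)
```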

-Fred

--
Fred L. Drake, Jr. <fdrake at gmail.com>
"Chaos is the score upon which reality is written." --Henry Miller

David Chandek-Stark

Mar 12, 2010, 2:11:44 PM

to solrpy

A note I should have added: I'm not escaping backslashes. Maybe
someone can figure out how to do that in a sensible way. :)
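
One possible way to handle backslashes, offered only as a sketch: if the input can be assumed to be literal user text that has never been escaped, the lookbehind can be dropped and the backslash added to the character class, so every special character is escaped unconditionally. The names `LITERAL_ESCAPE_RE` and `solr_escape_literal` are hypothetical and not part of solrpy.

```python
import re

# Assumption: the input is literal text, never pre-escaped.
# The class is the same as before, plus the backslash itself.
LITERAL_ESCAPE_RE = re.compile(r'[\\&|+\-!(){}\[\]^"~*?:]')

def solr_escape_literal(value):
    """Prefix every special character, including backslash, with a backslash."""
    return LITERAL_ESCAPE_RE.sub(r'\\\g<0>', value)

print(solr_escape_literal(r'foo+'))    # foo\+
print(solr_escape_literal(r'foo\+'))   # foo\\\+
print(solr_escape_literal(r'foo\\+'))  # foo\\\\\+
```

The trade-off is that this version is not idempotent: calling it on an already escaped value escapes it again, so it only makes sense when the caller knows the input is raw text.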

As for the raw strings and doctests, thanks for the tip. I now have
this, which is much better:

import re

# Solr/Lucene special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \
# There are also operators && and ||, but we're just going to escape
# the individual ampersand and pipe chars.
# Also, we're not going to escape backslashes!
# http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')

def solr_escape(value):
    r"""
    Escape un-escaped special characters and return escaped value.

    >>> solr_escape(r'foo+') == r'foo\+'
    True
    >>> solr_escape(r'foo\+') == r'foo\+'
    True
    >>> solr_escape(r'foo\\+') == r'foo\\+'
    True
    """
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)
