How to escape search queries? Search Service API (python) raising QueryException

Owen Nelson

May 23, 2012, 4:56:23 PM
to google-a...@googlegroups.com
Hi, 
I've been playing with the new search service as of 1.6.5 and I have a bit of a head scratcher here...
The documentation mentions that the numeric operators in search queries should be escaped (https://developers.google.com/appengine/docs/python/search/overview#Numeric_Operators). These operators being, as I read it, "<>=:". 
Encoding these as HTML entities, which is what I assume the docs recommend, does prevent an exception from being raised. However, I found that "foo: bar" was then no longer parsed correctly (it only worked when I did not encode the colon).
I also found a range of other characters that I need to strip out or otherwise encode, including but not limited to: commas, curly braces, parentheses, asterisks, and the list goes on...

Failing to encode or strip out these characters results in the following exception being raised:
QueryException: line 1:0 no viable alternative at character u'*'
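
A minimal sketch of the kind of stripping described above. The character set and the name ``strip_suspect_chars`` are assumptions for illustration only; this is not a documented list of what the parser rejects, and the colon is deliberately left alone because "foo: bar" needs it.

```python
import re

# Assumption: characters observed (or suspected) to upset the query parser.
# This is NOT an official list from the Search API documentation.
_SUSPECT_CHARS = ',{}()*<>='

# Build a character class that matches any one of the suspect characters.
_SUSPECT_RE = re.compile('[%s]' % re.escape(_SUSPECT_CHARS))


def strip_suspect_chars(text):
    """Replace suspect characters with spaces, then collapse whitespace."""
    cleaned = _SUSPECT_RE.sub(' ', text)
    return ' '.join(cleaned.split())
```

The cleaned string would then be passed to ``my_index.search()`` in place of the raw user input. Stripping loses information, so it is a workaround rather than real escaping.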

In addition, in my quest to learn exactly which characters the parser finds offensive, I set up a unit test that fed ``string.printable`` into ``my_index.search()``. I learned that the colon itself (in this specific string) does in fact raise QueryException. That makes me wonder whether it raises whenever a colon follows a string that cannot be recognized as a field on the index... or maybe I'm just losing it?
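
A variation on that experiment, sketched under assumptions: instead of feeding the whole of ``string.printable`` in one query, it tries each character on its own and records which ones raise. ``find_rejected_chars`` is a hypothetical helper, and the search callable and exception type are passed in so the sketch does not depend on the App Engine SDK.

```python
import string


def find_rejected_chars(search_fn, exc_type, candidates=string.printable):
    """Return the characters in `candidates` for which search_fn raises exc_type.

    search_fn is any callable that takes a query string, for example
    my_index.search; exc_type is the exception class to treat as
    "rejected by the parser" (QueryException in the case described here).
    """
    rejected = []
    for ch in candidates:
        try:
            search_fn(ch)
        except exc_type:
            rejected.append(ch)
    return rejected
```

Testing characters in isolation will not reveal context-dependent behaviour, such as a colon being accepted only after a known field name, so a result from this helper is a starting point rather than a definitive list.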

tl;dr

Is there a known list of characters that absolutely need to be escaped or encoded?
What is the recommended way to encode them (so that they will be parsed correctly)?
Are there special rules we need to observe in order to correctly encode (or leave alone) particular characters?

Owen Nelson

May 24, 2012, 11:55:44 AM
to google-a...@googlegroups.com