Hi,
I've been playing with the new search service as of 1.6.5, and I've run into a bit of a head-scratcher...
Encoding these characters as HTML entities, which I'm assuming is what the docs recommend, does prevent an exception from being raised. However, I found that "foo: bar" was no longer parsed correctly (it only worked when I did not encode the colon).
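To make the colon problem concrete, here is a minimal sketch of blanket entity-encoding. It assumes numeric entities (`&#NN;`) are the encoding the docs mean; `encode_entities` and `SPECIAL` are hypothetical names, not part of the Search API.

```python
def encode_entities(text, chars):
    """Replace every character in `chars` with its numeric HTML entity."""
    return "".join("&#%d;" % ord(c) if c in chars else c for c in text)

# Hypothetical set: the characters I was encoding, colon included.
SPECIAL = set(":,{}()*")

print(encode_entities("foo: bar", SPECIAL))
# foo&#58; bar
```

Once the colon becomes `&#58;`, the parser presumably no longer sees a `field:` separator, which would explain why "foo: bar" stopped behaving as a field restriction.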
I also found that there is a range of other characters that I need to strip out or otherwise encode, including but not limited to: comma, curly braces, parentheses, asterisk; the list goes on...
Failing to encode or strip out these characters resulted in the following exception being raised:
QueryException: line 1:0 no viable alternative at character u'*'
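For now the only workaround I have is stripping. Below is a minimal sketch of that approach. It cleans only the value part of each term so the `field: value` colon survives. `SPECIAL_CHARS`, `strip_special`, and `build_query` are hypothetical, and the character set is just what I've tripped over so far, not an official list.

```python
# Hypothetical set: only the characters I have hit so far,
# not an official list from the docs.
SPECIAL_CHARS = set(",{}()*:")

def strip_special(value):
    """Drop every character in SPECIAL_CHARS from a user-supplied value."""
    return "".join(c for c in value if c not in SPECIAL_CHARS)

def build_query(terms):
    """Build a query string from (field, value) pairs.

    Only the value is cleaned, so the `field: value` separator survives.
    A field of None produces a bare term with no field restriction.
    """
    parts = []
    for field, value in terms:
        # Strip special characters, then collapse leftover whitespace.
        cleaned = " ".join(strip_special(value).split())
        if not cleaned:
            continue  # nothing searchable left after stripping
        parts.append("%s: %s" % (field, cleaned) if field else cleaned)
    return " ".join(parts)

print(build_query([("foo", "bar*"), (None, "baz, (qux)")]))
# foo: bar baz qux
```

Stripping throws information away (a user can never search for a literal `*`), which is why I'd much rather use a proper escape mechanism if one exists.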
In addition, in my quest to learn exactly which characters the parser finds offensive, I set up a unit test that fed ``string.printable`` into ``my_index.search()``, and learned that the colon itself (in this specific string) does in fact raise QueryException. That makes me wonder whether a colon raises whenever it follows a string that can't be recognized as a field on the index... or maybe I'm just losing it?
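Feeding the whole of ``string.printable`` in one query only tells you that something failed. A per-character probe narrows it down. This is a minimal sketch: the search callable and the exception class are passed in so it runs without App Engine, and `find_offending_chars`, `fake_search`, and `FakeQueryError` are hypothetical. Against a real index you would pass ``my_index.search`` and whichever exception class your SDK actually raises (QueryException in the trace above).

```python
import string

def find_offending_chars(search, error_type, candidates=string.printable,
                         template="{}"):
    """Return each candidate character for which search() raises error_type.

    `template` places the character in context (e.g. "foo{}bar"), since a
    character may only be rejected in certain positions.
    """
    offending = []
    for ch in candidates:
        try:
            search(template.format(ch))
        except error_type:
            offending.append(ch)
    return offending

# Hypothetical stand-in for my_index.search(), so the probe runs offline.
class FakeQueryError(Exception):
    pass

def fake_search(query):
    if any(c in "*(),{}" for c in query):
        raise FakeQueryError(query)

print(find_offending_chars(fake_search, FakeQueryError, "ab*(c"))
# ['*', '(']
```

Running it against the real index with something like ``template="notafield{} bar"`` should test the colon-after-an-unknown-field theory directly.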
tl;dr
Is there a known list of characters that absolutely need to be escaped/encoded?
What is the recommended way to encode them (so that they will be parsed correctly)?
Are there special rules we need to observe in order to correctly encode (or leave alone) particular characters?