Weef Bellington wrote:
> 1) Sometimes when I enter a Unicode string containing a forward slash
> as input to a TEXT field, only a part of the string is stored in the
> index. Why is this? Does the slash need to be escaped somehow?
Not sure what you mean by "part of the string", but in the default
analyzer a slash is treated as a word separator, so indexing
u"alfa/bravo" is the same as indexing u"alfa bravo": both are
indexed as two separate "words".
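To see the effect, here's a stdlib-only sketch that applies a term regex the way a regex tokenizer does: every non-overlapping match becomes a token, and everything in between is dropped. The pattern `\w+(\.?\w+)*` is an assumption about the default tokenizer's expression (it's what recent Whoosh releases use). It contains no `/`, so a slash can never end up inside a token. `terms` is a hypothetical helper, and the output shown is what Python 3 prints.

```python
import re

# Assumed default term pattern: runs of word characters, optionally joined by dots.
DEFAULT_TERM = re.compile(r"\w+(\.?\w+)*", re.UNICODE)

def terms(text, pattern=DEFAULT_TERM):
    """Hypothetical helper: each non-overlapping match is one token.

    Characters between matches (here, the slash) are simply skipped.
    """
    return [m.group(0) for m in pattern.finditer(text)]

print(terms(u"alfa/bravo"))  # ['alfa', 'bravo']
print(terms(u"alfa bravo"))  # ['alfa', 'bravo']
```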
The first thing to try would be making your own analyzer around a
RegexTokenizer with a custom term regex (or, alternatively, a regex
that matches the "whitespace" between terms, with gaps=True).
from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter, StemFilter
from whoosh.fields import Schema, TEXT

# Create a tokenizer using a custom regex: runs of word characters,
# optionally joined by slashes, so "this/that" stays a single token
mytokenizer = RegexTokenizer(r"\w+(/?\w+)*")

# Chain the filters you want onto the tokenizer with "|"
myanalyzer = (mytokenizer | LowercaseFilter()
              | StopFilter() | StemFilter())

# You can test the analyzer like this...
print([token.text
       for token in myanalyzer(u"How to index this/that and the other")])
# [u'how', u'index', u'this/that', u'other']

# Use your analyzer in a field specification
schema = Schema(content=TEXT(analyzer=myanalyzer))
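The gaps=True alternative inverts the regex: instead of describing what a term looks like, you describe what separates terms. Here's a stdlib-only sketch (no Whoosh needed) that compares both modes on the same input. `tokenize` is a hypothetical helper, and the separator set `[\s,.;:!?]+` is just an assumed example. In Whoosh the gap version should correspond roughly to `RegexTokenizer(r"[\s,.;:!?]+", gaps=True)`.

```python
import re

def tokenize(text, expression, gaps=False):
    """Split text into tokens with a regex (hypothetical helper).

    gaps=False: every match of `expression` is a token.
    gaps=True:  `expression` matches the separators; the pieces between
                matches are the tokens (empty pieces are dropped).
    """
    pattern = re.compile(expression, re.UNICODE)
    if gaps:
        return [piece for piece in pattern.split(text) if piece]
    return [m.group(0) for m in pattern.finditer(text)]

text = u"How to index this/that and the other"

# Term regex: word characters, optionally joined by slashes.
print(tokenize(text, r"\w+(/?\w+)*"))
# Gap regex: whitespace and common punctuation separate terms, "/" does not.
print(tokenize(text, r"[\s,.;:!?]+", gaps=True))
```

Both calls keep "this/that" together. Lowercasing, stop words and stemming are still left to the filters chained after the tokenizer.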
If you need more complex tokenization than a regular expression can
handle, you'll have to write your own tokenizer class. Check out the
whoosh.analysis module to see what's available for text analysis.
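For a rough idea of what such a class does, here's a plain-Python sketch of a hand-written scanner. It keeps a slash only when the slash sits between two alphanumeric characters, and records each token's position and character offsets. It is a hypothetical illustration, not Whoosh's Tokenizer API. Plugging it into an analyzer would mean yielding Whoosh's own token objects instead of the `SimpleToken` tuple assumed here.

```python
from collections import namedtuple

# Hypothetical token record: text plus the position and character offsets
# an indexer typically needs for phrase search and highlighting.
SimpleToken = namedtuple("SimpleToken", "text pos startchar endchar")

class SlashWordTokenizer(object):
    """Hand-written scanner (hypothetical, not Whoosh's Tokenizer class).

    A term is a run of alphanumeric characters; a "/" is kept inside a term
    only when it sits between two alphanumeric characters.
    """

    def __call__(self, text):
        pos = 0
        i = 0
        n = len(text)
        while i < n:
            # Skip anything that cannot start a term
            if not text[i].isalnum():
                i += 1
                continue
            start = i
            # Extend the term over alphanumerics and "inner" slashes
            while i < n and (text[i].isalnum() or (
                    text[i] == "/" and i + 1 < n and text[i + 1].isalnum())):
                i += 1
            yield SimpleToken(text[start:i], pos, start, i)
            pos += 1

tokens = list(SlashWordTokenizer()(u"index this/that, not this/"))
print([t.text for t in tokens])  # ['index', 'this/that', 'not', 'this']
```

The trailing slash in "this/" is dropped because nothing alphanumeric follows it. That kind of context rule is the sort of thing that gets awkward in a single regex.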
> 2) I've tried using the DATETIME field as described on the website
> here: http://files.whoosh.ca/whoosh/docs/latest/changes.html but I'm
> getting an error message. I've tried looking at the source but I'm not
> yet familiar enough with it to understand what's going on. Here's the
> error:
>
Sorry, the DATETIME field type was a bit of experimentation that I
started and left in an unfinished state :( For now, you should avoid it
and index/store date/time values yourself using another field type such
as ID, manually converting your dates to lexically sortable strings
(YYYYMMDD), e.g. 20091210.
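Here's a minimal stdlib sketch of that conversion. `date_key` is a hypothetical helper name. Because the string is fixed-width and zero-padded (assuming four-digit years), plain string comparison gives the same order as chronological comparison. The resulting string is what you'd put into an ID field (e.g. `ID(stored=True)`) in your schema.

```python
from datetime import date

def date_key(d):
    # Zero-padded YYYYMMDD: string order == chronological order
    return d.strftime("%Y%m%d")

dates = [date(2010, 1, 5), date(2009, 12, 10), date(2009, 2, 28)]
keys = [date_key(d) for d in dates]
print(keys)          # ['20100105', '20091210', '20090228']
print(sorted(keys))  # ['20090228', '20091210', '20100105']

# A date range check becomes a plain string comparison on the keys.
start, end = date_key(date(2009, 6, 1)), date_key(date(2009, 12, 31))
print([k for k in keys if start <= k <= end])  # ['20091210']
```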
Hope that helps. Let me know if anything isn't clear, or if I missed the
point somewhere. Example code showing the problems helps too :)
Matt