TypeError in filedb/filetables.py, line 601, in keycoder


Nikolaus Rath

Oct 18, 2011, 7:41:26 PM
to who...@googlegroups.com
Hello,

The code

query = And([Term('hardcopy', True),
             DateRange('hardcopy_expiration_date',
                       datetime.fromtimestamp(0),
                       datetime.today())])
for document in searcher.search(query):

fails with

Traceback (most recent call last):
  File "projekte/archiver/archive_expire.py", line 115, in <module>
    main(sys.argv[1:])
  File "projekte/archiver/archive_expire.py", line 97, in main
    for document in searcher.search(query):
  File "/usr/lib/python2.7/dist-packages/whoosh/searching.py", line 688, in search
    return collector.search(self, q, allow=filter, restrict=mask)
  File "/usr/lib/python2.7/dist-packages/whoosh/searching.py", line 988, in search
    self.add_matches(q, offset, scorefn)
  File "/usr/lib/python2.7/dist-packages/whoosh/searching.py", line 1002, in add_matches
    for score, offsetid in self.pull_matches(q, offset, scorefn):
  File "/usr/lib/python2.7/dist-packages/whoosh/searching.py", line 1035, in pull_matches
    matcher = q.matcher(self.subsearcher)
  File "/usr/lib/python2.7/dist-packages/whoosh/query.py", line 964, in matcher
    lambda q: 0 - q.estimate_size(r), searcher)
  File "/usr/lib/python2.7/dist-packages/whoosh/query.py", line 775, in _matcher
    subms = [(q_weight_fn(q), q.matcher(searcher)) for q in subs]
  File "/usr/lib/python2.7/dist-packages/whoosh/query.py", line 964, in <lambda>
    lambda q: 0 - q.estimate_size(r), searcher)
  File "/usr/lib/python2.7/dist-packages/whoosh/query.py", line 926, in estimate_size
    return ixreader.doc_frequency(self.fieldname, self.text)
  File "/usr/lib/python2.7/dist-packages/whoosh/filedb/filereading.py", line 267, in doc_frequency
    return self.termsindex.doc_frequency((fieldname, text))
  File "/usr/lib/python2.7/dist-packages/whoosh/filedb/filetables.py", line 618, in doc_frequency
    datapos = self.range_for_key(key)[0]
  File "/usr/lib/python2.7/dist-packages/whoosh/filedb/filetables.py", line 543, in range_for_key
    return OrderedHashReader.range_for_key(self, self.keycoder(key))
  File "/usr/lib/python2.7/dist-packages/whoosh/filedb/filetables.py", line 601, in keycoder
    return pack_ushort(fnum) + utf8encode(text)[0]
TypeError: coercing to Unicode: need string or buffer, bool found


Am I doing something wrong, or is this a bug in whoosh?

$ python -c 'import whoosh; print whoosh.__version__'
(2, 2, 2)

Thanks,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

shuki

Oct 18, 2011, 8:13:04 PM
to Whoosh
Without getting into the specific place in which your error occurred,
in my experience with whoosh:

99% of the errors I had with whoosh were because of non-unicode
strings.
It is your responsibility to feed whoosh unicode strings.
Whoosh will not and cannot guess what it receives.

If you use hard-coded strings, write them as u'בדרך הזאת' (with the u''
unicode prefix), and put a proper -*- coding -*- declaration at the top
of the source file.

I think whoosh should verify, at least as an option, that its
inputs are unicode-type, and catch these errors as near as
possible to the input, so it's easier for the programmer to know where
they came from.

Until whoosh implements unicode input verification (if it ever does),
it is your responsibility.
If you don't know a text's encoding, there are several ways to guess
it.
You should verify that the text is really of unicode type. If it is not,
convert it or pass an empty (unicode) string instead.
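
As a rough illustration of the "verify, and convert if needed" advice, here is a minimal sketch of a check you could run on values before handing them to an index or a query. The helper name ensure_text and the utf-8 default are assumptions made for this example; nothing here is part of whoosh.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as a unicode (text) string.

    Hypothetical helper, not a whoosh API. Byte strings are decoded
    with the given encoding (assumed to be utf-8 unless told otherwise);
    anything else, such as a bool or an int, is rejected right away.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes do not match the encoding.
        return value.decode(encoding)
    raise TypeError("expected a text or byte string, got %s"
                    % type(value).__name__)
```

Failing here, next to the input, gives a much shorter traceback than letting the wrong type travel down into the index reader.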

Nikolaus Rath

Oct 18, 2011, 10:16:52 PM
to who...@googlegroups.com
shuki <asafgreenberg-Re5J...@public.gmane.org> writes:
> Without getting into the specific place in which your error occurred,
> in my experience with whoosh:
>
> 99% of the errors I had with whoosh were because of non-unicode
> strings.
> It is your responsibility to feed whoosh unicode strings.
> Whoosh will not and cannot guess what it receives.

Well, from the traceback:

>>   File "/usr/lib/python2.7/dist-packages/whoosh/filedb/filetables.py", line 601, in keycoder
>>     return pack_ushort(fnum) + utf8encode(text)[0]
>> TypeError: coercing to Unicode: need string or buffer, bool found

and I'm indeed storing booleans in the index. I'd really rather not
store them as utf-8 strings :-).
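
The failing step can be reproduced without an index. This is only a sketch: utf8encode and pack_ushort below are stand-ins defined for the example, on the assumption that the helpers named in the traceback behave like a plain UTF-8 encoder and an unsigned 16-bit packer. They are not copied from the whoosh source.

```python
import codecs
import struct

# Assumed stand-ins for the helpers named in the traceback.
utf8encode = codecs.getencoder("utf-8")   # returns (encoded_bytes, length)
pack_ushort = struct.Struct("!H").pack    # unsigned 16-bit, big-endian


def keycoder(fnum, text):
    """Build a key from a field number and a term, as in the traceback."""
    return pack_ushort(fnum) + utf8encode(text)[0]


def try_key(fnum, text):
    """Return the encoded key, or the TypeError raised while building it."""
    try:
        return keycoder(fnum, text)
    except TypeError as exc:
        return exc
```

With these stand-ins, try_key(1, u"t") returns an encoded key, while try_key(1, True) returns a TypeError, because the encoder only accepts text.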


Best,

Nikolaus Rath

Dec 13, 2011, 9:45:19 PM
to who...@googlegroups.com
shuki <asafgreenberg-Re5J...@public.gmane.org> writes:
> Without getting into the specific place in which your error occurred,
> in my experience with whoosh:
>
> 99% of the errors I had with whoosh were because of non-unicode
> strings.
> It is your responsibility to feed whoosh unicode strings.
> Whoosh will not and cannot guess what it receives.

Hmm. I'm using a BOOLEAN field, and I think it's reasonable to put
BOOLEAN values in there rather than strings, just as you put dates
rather than date strings into DATETIME fields.

I've reported this as https://bitbucket.org/mchaput/whoosh/issue/213


Best,
