I ran into an issue where my search queries were returning duplicate results. I eventually narrowed it down to incremental indexing: each update to the index creates a duplicate entry for every updated file. My incremental indexing routine closely follows the example in the Whoosh docs.
Minimal code to reproduce the problem:

    from whoosh import index

    ix = index.open_dir("path/to/index")
    p = "path/to/foo"

    # Print p once for every stored document whose "path" field equals p
    with ix.searcher() as searcher:
        for entry in searcher.all_stored_fields():
            if entry["path"] == p:
                print(p)

    # Try to delete those same documents by term
    writer = ix.writer()
    ndeleted = writer.delete_by_term("path", p)
    print(ndeleted)

which, for my index, prints

    path/to/foo
    path/to/foo
    path/to/foo
    0

Regardless of the state of my index or my schema, shouldn't this output be impossible? That is, since three stored documents match `p`, shouldn't `delete_by_term` delete them and print 3 instead of 0?
Can anyone tell whether this is a Whoosh bug or whether I'm doing something wrong?