writer.delete_by_term doesn't delete matching entries

Ian Fisher

Apr 14, 2021, 12:27:46 AM
to Whoosh
Hello,

I ran into an issue where my search queries were returning duplicate results, and eventually narrowed it down to a problem with incremental indexing where each update to the index would create a duplicate entry for updated files. My incremental indexing routine closely follows the example in the Whoosh docs.

The minimal code to reproduce the problem is:

from whoosh import index

ix = index.open_dir("path/to/index")
p = "path/to/foo"
with ix.searcher() as searcher:
    # print one line for every stored document whose path matches
    for entry in searcher.all_stored_fields():
        if entry["path"] == p:
            print(p)

    # then try to delete those same documents by term
    writer = ix.writer()
    ndeleted = writer.delete_by_term("path", p)
    print(ndeleted)

which, for my index, prints

path/to/foo
path/to/foo
path/to/foo
0

Regardless of the state of my index or my schema, shouldn't this output be impossible, i.e. shouldn't it print (and delete) 3 instead of 0?


Can anyone tell if this is a Whoosh bug, or if I'm doing something wrong?

Ian Fisher

Apr 15, 2021, 6:59:01 PM
to Whoosh
I fixed it by changing the path field's type from TEXT to ID, per this discussion: https://groups.google.com/g/whoosh/c/6s3stgs2ST8/m/z99lxjPEbGAJ
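For what it's worth, my understanding of why TEXT failed: a TEXT field runs its values through an analyzer that splits them into word tokens, so the index never contains the exact term "path/to/foo", and delete_by_term (which looks up one exact term) matches nothing. A minimal sketch, assuming a simplified word-splitting analyzer as a stand-in for Whoosh's real one:

```python
import re

# Rough sketch of how an analyzed TEXT field turns a value into index terms.
# (Assumption: simplified stand-in for Whoosh's StandardAnalyzer, which
# tokenizes on word characters and lowercases.)
def text_terms(value):
    return [token.lower() for token in re.findall(r"\w+", value)]

# An ID field is not analyzed: the whole value is stored as one exact term.
def id_terms(value):
    return [value]

p = "path/to/foo"
print(text_terms(p))  # ['path', 'to', 'foo'] -- no term equals the full path
print(id_terms(p))    # ['path/to/foo']      -- delete_by_term can match this
```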

I do think the incremental indexing example in the docs should use a more robust method, or at least call out that analyzed field types like TEXT won't match the exact terms that delete_by_term looks up.
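To sketch what I mean by a more robust method: key every update on a single exact-match field and do delete-then-add under that key. My understanding is that Whoosh's writer.update_document follows this pattern for fields declared unique=True, which only works when the key field is stored as one exact term (ID), not analyzed (TEXT). A toy dict-based stand-in (assumption: illustrative only, not the Whoosh API):

```python
# Toy stand-in for delete-then-add incremental indexing, keyed on path.
# (Assumption: a plain dict in place of a real Whoosh index, just to show
# the pattern an update_document-style call relies on.)
idx = {}

def update_document(path, content):
    # remove any existing version under this exact key, then add the new one
    idx.pop(path, None)
    idx[path] = content

update_document("path/to/foo", "version 1")
update_document("path/to/foo", "version 2")
print(len(idx))  # 1 -- no duplicate entries after re-indexing
```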
