writer.delete_by_term doesn't delete matching entries

Ian Fisher

Apr 14, 2021, 12:27:46 AM
to Whoosh
Hello,

I ran into an issue where my search queries were returning duplicate results, and eventually narrowed it down to a problem with incremental indexing where each update to the index would create a duplicate entry for updated files. My incremental indexing routine closely follows the example in the Whoosh docs.

The minimal code to reproduce the problem is:

from whoosh import index

ix = index.open_dir("path/to/index")
p = "path/to/foo"
with ix.searcher() as searcher:
    # print one line for every stored document whose path matches
    for entry in searcher.all_stored_fields():
        if entry["path"] == p:
            print(p)

    # then try to delete those same documents by term
    writer = ix.writer()
    ndeleted = writer.delete_by_term("path", p)
    print(ndeleted)

which, for my index, prints

path/to/foo
path/to/foo
path/to/foo
0

Regardless of the state of my index or my schema, shouldn't this output be impossible, i.e. shouldn't it print (and delete) 3 instead of 0?


Can anyone tell if this is a Whoosh bug, or if I'm doing something wrong?

Ian Fisher

Apr 15, 2021, 6:59:01 PM
to Whoosh
I fixed it by changing the path field's type from TEXT to ID, per this discussion: https://groups.google.com/g/whoosh/c/6s3stgs2ST8/m/z99lxjPEbGAJ
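For what it's worth, my understanding of why TEXT failed: a TEXT field runs its values through an analyzer that splits them into word tokens, so the index never contains the exact term "path/to/foo", and delete_by_term (which looks up one exact term) matches nothing. A minimal sketch, assuming a simplified word-splitting analyzer as a stand-in for Whoosh's real one:

```python
import re

# Rough sketch of how an analyzed TEXT field turns a value into index terms.
# (Assumption: simplified stand-in for Whoosh's StandardAnalyzer, which
# tokenizes on word characters and lowercases.)
def text_terms(value):
    return [token.lower() for token in re.findall(r"\w+", value)]

# An ID field is not analyzed: the whole value is stored as one exact term.
def id_terms(value):
    return [value]

p = "path/to/foo"
print(text_terms(p))  # ['path', 'to', 'foo'] -- no term equals the full path
print(id_terms(p))    # ['path/to/foo']      -- delete_by_term can match this
```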

I do think the incremental indexing example in the docs should use a more robust method, or at least call out that analyzed field types like TEXT won't match the exact terms that delete_by_term looks up.
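To sketch what I mean by a more robust method: key every update on a single exact-match field and do delete-then-add under that key. My understanding is that Whoosh's writer.update_document follows this pattern for fields declared unique=True, which only works when the key field is stored as one exact term (ID), not analyzed (TEXT). A toy dict-based stand-in (assumption: illustrative only, not the Whoosh API):

```python
# Toy stand-in for delete-then-add incremental indexing, keyed on path.
# (Assumption: a plain dict in place of a real Whoosh index, just to show
# the pattern an update_document-style call relies on.)
idx = {}

def update_document(path, content):
    # remove any existing version under this exact key, then add the new one
    idx.pop(path, None)
    idx[path] = content

update_document("path/to/foo", "version 1")
update_document("path/to/foo", "version 2")
print(len(idx))  # 1 -- no duplicate entries after re-indexing
```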
