> That was my goal. I initially thought maybe if a search matched
> several fields a document could be returned multiple times, which is
> why I was trying to uniquify the results. It turns out I do have
> duplicates in the index. My main question is why do I have
> duplicates in the first place.
Hi, sorry for the late reply. Work :(
I'm afraid I don't know why you have duplicates. Using a numeric field
as the unique key does work (or at least, I have a
unit test for it that's passing :/ ).
You might need to try reducing your indexing pipeline to a reproducible
test case.
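Something along these lines is the kind of reduction I mean (a minimal
sketch; the field names num and body are just placeholders):

    from whoosh.fields import Schema, NUMERIC, TEXT
    from whoosh.filedb.filestore import RamStorage

    schema = Schema(num=NUMERIC(int, stored=True, unique=True),
                    body=TEXT(stored=True))
    ix = RamStorage().create_index(schema)

    # Update the same numeric key twice; the second call should
    # replace the first document rather than add a duplicate
    with ix.writer() as w:
        w.update_document(num=1, body=u"first version")
    with ix.writer() as w:
        w.update_document(num=1, body=u"second version")

    with ix.searcher() as s:
        assert s.doc_count() == 1

If a stripped-down script like that still produces duplicates for you,
that would be very useful to see.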
I should point out that you're doing extra work by first deleting any
existing documents and then using update_document(): update_document()
itself just deletes any documents matching the unique fields and then
calls add_document().
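In other words (a sketch, assuming a unique numeric field called num),
this:

    # Redundant: update_document() deletes the old copy anyway
    writer.delete_by_term("num", 1)
    writer.update_document(num=1, body=u"new text")

can be reduced to just:

    writer.update_document(num=1, body=u"new text")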
> As a side question though, should my collapsing have worked - and if
> not what should I have done instead?
I think there's a bug in the code that computes the number of results
when the results are collapsed and the number of results to return is
limited (with the limit= keyword). I tried to reproduce your problem and
found a bug, but it might be a different bug :/ Can you try changing
your code to use limit=None and see if it fixes the number of hits reported?
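Something like this (a sketch; num as the collapse field and body as
the query field are placeholders carried over from the example above):

    from whoosh.qparser import QueryParser

    with ix.searcher() as s:
        q = QueryParser("body", ix.schema).parse(u"version")
        # limit=None scores every match, which should sidestep the
        # suspected interaction between collapse= and limit=
        results = s.search(q, limit=None, collapse="num")
        print(len(results))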