> But could somebody explain to me, why the score would change after a commit,
> even if the related fields have not changed?
This is a Lucene issue (not a bug... a design trade-off).
1) When you delete a document, it is only marked as deleted so that it
is not returned with search results. It is really deleted when the
segment it is in undergoes a merge.
2) The inverted index is completely unaffected by deleted documents.
It would be extraordinarily expensive to reflect deletions in these
structures.
3) Part of full-text scoring includes "idf" (inverse document
frequency)... this depends on both the number of documents in the
index and the number of documents containing the term. The latter is
part of the inverted index structure and does not reflect deletions.
Example: I have an index with 1M documents. The term "text:foobar"
occurs in 500 documents. The IDF function is passed (500,1000000).
Now I update (re-index) one of the documents, changing one of the
other fields. The overwritten document hasn't been merged away yet,
and its statistics still appear in the inverted index. Now the
number of documents containing the term "text:foobar" is 501, and the
IDF calculation has been changed.
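To make the size of the effect concrete, here is a minimal sketch of the
arithmetic. It assumes the classic Lucene TF-IDF formula,
idf = 1 + ln(numDocs / (docFreq + 1)); other similarity implementations
use different formulas. It also assumes the total document count still
includes the not-yet-merged old copy. The function name classic_idf is
made up for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as in Lucene's classic TF-IDF similarity (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Before the update: 500 of 1,000,000 documents contain text:foobar.
before = classic_idf(500, 1_000_000)

# After re-indexing one matching document: the old copy is only marked
# as deleted, so it is still counted (assumption: in both numbers).
after = classic_idf(501, 1_000_001)

print(f"idf before: {before:.6f}")
print(f"idf after:  {after:.6f}")
print(f"difference: {before - after:.6f}")
```

The difference is tiny, but it is enough to change the scores of every
document matching that term, not just the one that was re-indexed.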
Fixes: physically remove the deleted documents, so that their
statistics drop out of the inverted index:
- call optimize... this merges all segments and hence removes all
deleted docs. This is extremely expensive as it re-writes the entire
index.
- use "expungeDeletes=true" on your commit command. This is also very
expensive, since it rewrites every segment that contains even a single
deletion.
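As a rough sketch, both fixes can be sent as XML update messages to
Solr's update handler. The URL below is an assumption (a default local
install); adjust host, port, and core name to your setup. The helper
name build_update_request is made up for illustration, and the requests
are only built here, not sent.

```python
import urllib.request

# Assumed location of the update handler; yours may include a core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body):
    """Build (but do not send) a POST carrying an XML update message."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


# Fix 1: merge all segments, which drops every deleted document.
optimize_request = build_update_request("<optimize/>")

# Fix 2: commit and merge away segments that contain deletions.
expunge_request = build_update_request('<commit expungeDeletes="true"/>')

# To actually run one of them against a live server:
# urllib.request.urlopen(expunge_request)
```

Either request is expensive on a large index, so it is better run
occasionally than after every update.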
-Yonik