A brother suggested ranking the most-visited results first.
Things would work in the following order:
1. Some documents are added to the index.
2. Users view the documents.
3. Statistics are collected for each hit.
4. A user queries the index.
5. We want to give them the popular results first.
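Viewing documents and collecting per-hit statistics needs somewhere to keep view counts. The code in this thread calls `mymodel.get_hits()` and `mymodel.max_hits()`, so here is a minimal in-memory stand-in for such a model. The class name `HitStats` and its `record_hit()` method are hypothetical; a real application would persist the counts in its own database.

```python
from collections import Counter


class HitStats:
    """Hypothetical in-memory stand-in for `mymodel`: counts views per document ID."""

    def __init__(self):
        self._hits = Counter()

    def record_hit(self, docid):
        # Call this whenever a user views a document
        self._hits[docid] += 1

    def get_hits(self, docid):
        # Counter returns 0 for documents that were never viewed
        return self._hits[docid]

    def max_hits(self):
        # Highest hit count of any document, or 0 if nothing was viewed yet
        return max(self._hits.values(), default=0)


stats = HitStats()
for docid in ["a", "b", "a", "c", "a", "b"]:
    stats.record_hit(docid)
print(stats.get_hits("a"), stats.max_hits())  # 3 3
```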
The problem is that popularity is not available at indexing time, and even if it were, it would certainly change afterwards.
We need to be able to pass a custom, real-time score function whose result is
multiplied with the default Whoosh index-time score:
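def popularity_score(result):
    m = mymodel.max_hits()
    if m == 0:
        # No statistics yet: treat every document as equally popular
        return 1.0
    # float() avoids Python 2 integer division, which would round down to 0
    return float(mymodel.get_hits(result['id'])) / m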
This function returns a float between 0.0 and 1.0,
where 1.0 means the most popular document.
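A quick way to check those properties is to run the same logic against a stub model. `StubModel` and its sample counts are made up for illustration, and the model is passed in as a parameter here instead of being a global; only the `get_hits()`/`max_hits()` interface comes from the snippet above.

```python
class StubModel:
    """Hypothetical stand-in for `mymodel`, backed by a plain dict of hit counts."""

    def __init__(self, hits):
        self.hits = hits

    def get_hits(self, docid):
        return self.hits.get(docid, 0)

    def max_hits(self):
        return max(self.hits.values(), default=0)


def popularity_score(result, mymodel):
    # Same logic as above, with the model passed in explicitly
    m = mymodel.max_hits()
    if m == 0:
        return 1.0  # no statistics yet: every document counts as equally popular
    return mymodel.get_hits(result["id"]) / m  # true division in Python 3


model = StubModel({"a": 40, "b": 10})
print(popularity_score({"id": "a"}, model))  # 1.0
print(popularity_score({"id": "b"}, model))  # 0.25
print(popularity_score({"id": "c"}, model))  # 0.0
print(popularity_score({"id": "a"}, StubModel({})))  # 1.0
```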
You could write your own entirely new term scoring algorithm of course. But since you just want to influence the final score of each document after all the "normal" scoring is done, check out the Weighting.final() method.
http://packages.python.org/Whoosh/api/scoring.html#whoosh.scoring.Weighting.final
Subclass the Weighting algorithm you want to use, and override the final() method, e.g.:
class MyWeighting(scoring.BM25F):
    def final(self, searcher, docnum, score):
        # Let's say your model associates a document ID with the hit count
        # for each document, and the document ID is in the "id" stored field.
        # First, get the contents of the "id" field for this document
        docid = searcher.stored_fields(docnum)["id"]
        # Look up the document's hit count in my model
        maxhits = mymodel.max_hits()
        hitcount = mymodel.get_hits(docid)
        if maxhits == 0:
            # No statistics yet: leave the score unchanged
            return score
        # Multiply the computed score for this document by the popularity
        # (float() avoids Python 2 integer division, which would give 0)
        return score * (float(hitcount) / maxhits)
Then, when you open a searcher, use the "weighting" keyword argument to use your custom Weighting class:
searcher = myindex.searcher(weighting=MyWeighting)
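To see what the final() override does to the ranking without building an index, the sketch below applies the same multiplication to a list of (document ID, relevance score) pairs and re-sorts them. The scores and hit counts are invented sample data and `rerank` is a hypothetical helper; in Whoosh the multiplication happens inside the searcher, once per matching document.

```python
def rerank(scored, hits):
    """Multiply each relevance score by normalized popularity, then sort descending.

    scored: list of (docid, score) pairs from normal relevance scoring
    hits:   dict mapping docid -> view count (hypothetical statistics)
    """
    maxhits = max(hits.values(), default=0)
    results = []
    for docid, score in scored:
        # Guard against division by zero when no views were recorded yet
        popularity = hits.get(docid, 0) / maxhits if maxhits else 1.0
        results.append((docid, score * popularity))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


scored = [("intro", 8.0), ("faq", 6.0), ("changelog", 5.0)]
hits = {"intro": 10, "faq": 50, "changelog": 0}
print(rerank(scored, hits))
# [('faq', 6.0), ('intro', 1.6), ('changelog', 0.0)]
```

The most relevant document ("intro") drops below the most viewed one ("faq"), which is exactly the trade-off the multiplication makes.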
This will be in a tutorial someday ;)
Cheers,
Matt
from whoosh import scoring

class MyWeighting(scoring.BM25F):
    # Whoosh only calls final() when use_final is True, so set it on the class,
    # not inside the method
    use_final = True

    def final(self, searcher, docnum, score):
        # Let's say your model associates a document ID with the hit count
        # for each document, and the document ID is in the "id" stored field.
        # First, get the contents of the "id" field for this document
        docid = searcher.stored_fields(docnum)["id"]
        # Look up the document's hit count in my model
        maxhits = mymodel.max_hits()
        hitcount = mymodel.get_hits(docid)
        if maxhits == 0:
            return score
        # Multiply the computed score for this document by the popularity
        return score * (float(hitcount) / maxhits)

Cheers,
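One consequence of multiplying by hitcount / maxhits is that a document nobody has viewed yet gets a final score of 0.0 however relevant it is, so new documents can never rise to the top. If that matters, one option (my own assumption, not something Whoosh provides) is to blend the popularity factor with a floor, as in this sketch. The `boosted` function and its `floor` parameter are hypothetical; the same expression could replace the return line inside final().

```python
def boosted(score, hitcount, maxhits, floor=0.5):
    """Scale a relevance score by popularity without ever zeroing it out.

    The multiplier ranges from `floor` (never viewed) to 1.0 (most viewed).
    `floor` is a hypothetical tuning knob; 0.5 is an arbitrary example value.
    """
    popularity = hitcount / maxhits if maxhits else 1.0
    return score * (floor + (1.0 - floor) * popularity)


print(boosted(4.0, 0, 100))    # 2.0
print(boosted(4.0, 50, 100))   # 3.0
print(boosted(4.0, 100, 100))  # 4.0
```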