passing realtime scoring function


Muayyad AlSadi

Feb 15, 2010, 10:24:06 AM
to whoosh
Hello,

A brother suggested ranking the most-visited results first.

Things would work in the following order:
some documents are added to the index
documents are viewed by users
statistics are collected for each hit
a user queries the index
we want to give them the popular results first

The problem is that popularity is not available at indexing time, and it
will certainly change afterwards.

We need to be able to pass a custom real-time score function whose result is
multiplied with the default Whoosh index-time score:

def popularity_score(result):
    m = mymodel.max_hits()
    if m == 0:
        return 1.0
    # float() keeps integer hit counts from being truncated to 0 on Python 2
    return float(mymodel.get_hits(result['id'])) / m

this function will return a float between 0.0 and 1.0
where 1.0 is for the most popular document
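
For readers who want something concrete to experiment with, here is a minimal sketch of what mymodel could look like. The HitCounter class and its record_view method are assumptions made up for illustration; only max_hits and get_hits are named in the post, and a real model would presumably be backed by a database rather than an in-memory counter.

```python
from collections import Counter


class HitCounter:
    """Hypothetical in-memory stand-in for mymodel (not part of Whoosh)."""

    def __init__(self):
        self._hits = Counter()

    def record_view(self, doc_id):
        # Called whenever a user views a document.
        self._hits[doc_id] += 1

    def get_hits(self, doc_id):
        # Documents that were never viewed have zero hits.
        return self._hits[doc_id]

    def max_hits(self):
        # Highest hit count of any document, or 0 if nothing was viewed yet.
        return max(self._hits.values(), default=0)


mymodel = HitCounter()


def popularity_score(result):
    # result is assumed to be a mapping with an "id" key, as in the post.
    m = mymodel.max_hits()
    if m == 0:
        return 1.0
    return float(mymodel.get_hits(result["id"])) / m
```

Note that with this scheme a document nobody has viewed gets a popularity of 0.0 as soon as any other document has been viewed, which would zero out its final score when multiplied in.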

Matt Chaput

Feb 15, 2010, 3:13:38 PM
to who...@googlegroups.com
> we need to be able to pass a custom real-time score function to be
> multiplied with the default whoosh index-time score function
>
> def popularity_score(result):
>     m=mymodel.max_hits()
>     if m==0: return 1.0
>     return mymodel.get_hits(result['id'])/m
>
> this function will return a float between 0.0 and 1.0
> where 1.0 is for the most popular document

You could write your own entirely new term scoring algorithm of course. But since you just want to influence the final score of each document after all the "normal" scoring is done, check out the Weighting.final() method.

http://packages.python.org/Whoosh/api/scoring.html#whoosh.scoring.Weighting.final

Subclass the Weighting algorithm you want to use, and override the final() method, e.g.:

from whoosh import scoring

class MyWeighting(scoring.BM25F):
    def final(self, searcher, docnum, score):
        # Let's say your model associates a document ID with the hit count
        # for each document, and the document ID is in the "id" stored field.

        # First, get the contents of the "id" field for this document
        docid = searcher.stored_fields(docnum)["id"]

        # Look up the document's hit count in my model
        maxhits = mymodel.max_hits()
        hitcount = mymodel.get_hits(docid)
        if maxhits == 0:
            # No views recorded yet: leave the score unchanged
            return score

        # Multiply the computed score for this document by the popularity
        # (float() avoids integer division on Python 2)
        return score * (float(hitcount) / maxhits)

Then, when you open a searcher, use the "weighting" keyword argument to use your custom Weighting class:

searcher = myindex.searcher(weighting=MyWeighting)


This will be in a tutorial someday ;)

Cheers,

Matt

Muayyad AlSadi

Feb 15, 2010, 3:31:14 PM
to who...@googlegroups.com
thank you, that was very useful

Cédric Beuzit

May 14, 2013, 11:40:19 AM
to who...@googlegroups.com
For future readers of this thread:

Things seem to have changed slightly since then. I think we now have to set use_final to True for the overridden method to be taken into account, like this:

from whoosh import scoring

class MyWeighting(scoring.BM25F):

    def final(self, searcher, docnum, score):
        use_final = True

        # Let's say your model associates a document ID with the hit count
        # for each document, and the document ID is in the "id" stored field.

        # First, get the contents of the "id" field for this document
        docid = searcher.stored_fields(docnum)["id"]

        # Look up the document's hit count in my model
        maxhits = mymodel.max_hits()
        hitcount = mymodel.get_hits(docid)

        # Multiply the computed score for this document by the popularity
        return score * (hitcount / maxhits)

Cheers,

Martin Čech

Jan 27, 2015, 4:55:36 PM
to who...@googlegroups.com
The use_final = True line has to be outside of def final, at class level, in order for this to work.
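
To illustrate why the placement matters, here is a minimal sketch in plain Python. BaseWeighting is a stand-in for scoring.BM25F so the example runs without Whoosh, and apply_final is a hypothetical caller that, as an assumption about how the library behaves, only consults final() when the use_final attribute is true. The 0.5 factor is just a placeholder for the popularity ratio.

```python
class BaseWeighting:
    """Stand-in for whoosh.scoring.BM25F (assumption, not the real class)."""

    use_final = False  # assumed default: final() is skipped unless overridden

    def final(self, searcher, docnum, score):
        return score


def apply_final(weighting, searcher, docnum, score):
    # Hypothetical caller: checks the attribute before ever calling final().
    if weighting.use_final:
        return weighting.final(searcher, docnum, score)
    return score


class WrongWeighting(BaseWeighting):
    def final(self, searcher, docnum, score):
        use_final = True  # just a local variable; the class attribute stays False
        return score * 0.5


class RightWeighting(BaseWeighting):
    use_final = True  # class attribute, visible before final() is called

    def final(self, searcher, docnum, score):
        return score * 0.5
```

With Whoosh itself, the same layout should apply with scoring.BM25F as the base class: use_final = True goes directly under the class line, and final(self, searcher, docnum, score) follows it.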