Field boosting by function


Roger Binns

Apr 11, 2011, 7:51:16 PM
to who...@googlegroups.com

I need to boost scores, initially based on date and later using a Python
function of my choosing to calculate the value. With Solr I could do this
using FunctionQuery, which augments the existing scoring system:


http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

As far as I can tell, the closest thing in Whoosh is writing a scoring class, as in
http://packages.python.org/Whoosh/api/scoring.html

Ideally what I'd like is a way of providing an additional boost function
that runs after the existing scorers with the entire document available and
can then adjust the score calculated so far.

Another doc "bug" - there is no Cosine class as shown in the example:

http://packages.python.org/Whoosh/searching.html#scoring

Roger

Roger Binns

Apr 12, 2011, 1:59:57 AM
to who...@googlegroups.com

On 04/11/2011 04:51 PM, Roger Binns wrote:
> Ideally what I'd like is a way of providing an additional boost function
> that runs after the existing scorers with the entire document available and
> can then adjust the score calculated so far.

For anyone who comes across this post later, in case it isn't added to the
docs, the answer is:

- Derive a class from one in scoring (e.g. BM25F)
- Set the attribute use_final to True
- Define a final() method
- Supply the class or an instance as the weighting parameter when making a
  Searcher
- The final method will be called after the document has been scored and
  should return an adjusted score

The final method signature is:

    def final(self, searcher, docnum, score):
        # This will get any stored fields for the document
        fields = searcher.stored_fields(docnum)
        # Return the score you want
        return score * 1

Doing this I was able to implement a date bias with the same calculations as
the Solr articles recommend. I also cheated and used monkey-patching to get
my final method used.
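
The steps above can be sketched as follows, using a subclass rather than monkey-patching. This is an illustration under stated assumptions, not the code I actually ran: the names recency_multiplier, make_date_biased_weighting and DateBiasedBM25F are made up for this example, the stored field "date" is assumed to hold a Unix timestamp in seconds, and the reciprocal decay a / (m * age + b) with a one-year scale is modeled on the recipe in the Solr FAQ linked earlier, so check the constants against that page before relying on them.

```python
import time

# Roughly 365.25 days in milliseconds; an assumed scale so that a
# one-year-old document gets about half the boost of a brand new one.
MS_PER_YEAR = 3.156e10


def recency_multiplier(doc_time, now, m=1.0 / MS_PER_YEAR, a=1.0, b=1.0):
    """Reciprocal decay a / (m * age_ms + b); both times are in seconds."""
    age_ms = max(0.0, (now - doc_time) * 1000.0)  # future dates count as age 0
    return a / (m * age_ms + b)


def make_date_biased_weighting(now=None, date_field="date"):
    """Build one weighting instance for a single request (hypothetical helper)."""
    from whoosh import scoring  # imported lazily so the helper above works alone

    class DateBiasedBM25F(scoring.BM25F):
        use_final = True  # tells Whoosh to call final() for matched documents

        def __init__(self, request_time, **kwargs):
            super().__init__(**kwargs)
            self.request_time = request_time

        def final(self, searcher, docnum, score):
            fields = searcher.stored_fields(docnum)
            stamp = fields.get(date_field)  # assumed: stored Unix timestamp
            if stamp is None:
                return score  # no date stored, leave the score unchanged
            return score * recency_multiplier(stamp, self.request_time)

    return DateBiasedBM25F(time.time() if now is None else now)
```

Assuming ix is an open index, it would be used as ix.searcher(weighting=make_date_biased_weighting()), building a fresh instance for each request.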

Roger

Jason D. Williams

Apr 12, 2011, 8:36:37 AM
to who...@googlegroups.com, Roger Binns
One follow-up question: does the final(...) method have access to the
query? I looked into using final() at one point and I didn't see a
way to access the query.

- J


Matt Chaput

Apr 12, 2011, 11:13:15 AM
to who...@googlegroups.com
On 12/04/2011 8:36 AM, Jason D. Williams wrote:
> One follow-up question: Does the final(...) method have access to the
> query? I looked into using final() at one point and I didn't see a
> way to access the query

No, but you could instantiate your custom weighting class per-request
and just give it the query.
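
A minimal sketch of that idea, under assumptions: QueryAwareFinal, adjust and weighting_for are hypothetical names invented for this illustration and are not part of Whoosh. The mixin stores the query when the weighting object is created for a request, so final() can reach it through self.query.

```python
class QueryAwareFinal:
    """Hypothetical mixin: keeps the query for one request, visible to final()."""

    use_final = True  # ask Whoosh to call final() after normal scoring

    def __init__(self, query, **kwargs):
        super().__init__(**kwargs)  # passes B, K1, etc. on to the scoring class
        self.query = query

    def final(self, searcher, docnum, score):
        fields = searcher.stored_fields(docnum)
        return self.adjust(score, fields)

    def adjust(self, score, fields):
        # Override this; self.query is available here. Default: no change.
        return score


def weighting_for(query):
    """Create a fresh weighting object for a single request."""
    from whoosh import scoring  # imported lazily; only needed with a real index

    class QueryAwareBM25F(QueryAwareFinal, scoring.BM25F):
        pass

    return QueryAwareBM25F(query)
```

Each request would then create its own searcher with weighting=weighting_for(query), so nothing is shared between requests.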

Matt

Roger Binns

Apr 12, 2011, 12:08:54 PM
to who...@googlegroups.com

As an example, I put the start time of the request into the weighting class
so that when I apply the date bias I can subtract the document date from it
to get the document's age. Without that I'd have to call time.time() for
every document.

It is probably somewhat obvious but should also be mentioned that the final
method is only called on documents that match in some way and not every
document in the collection. Consequently you cannot use the final method to
change the score of documents that do not match at all.

Roger
