Relevancy Ranking


mikeybe

Jan 10, 2008, 10:03:30 AM
to FacBackOPAC
Anyone have any thoughts on changing some of the fields in the Solr
index to make relevancy ranking an actual possibility? Currently the
keyword search is done by creating a field called "text", copying all
of the fields that we want searched into it, then searching that one
field.

Unfortunately, this makes relevancy ranking hard because you can't
give boosts to specific fields to help results float to the top.

Is there an easy way to "unimplement" this text field and search it in
a legitimate way? I would be happy to do it, but don't want to get in
too deep before talking to someone who actually might know what this
would involve.

Thanks,
Mike

Gabriel Sean Farrell

Jan 10, 2008, 11:43:56 AM
to FacBackOPAC
The only "unimplementation" that needs to happen is in the way the
search is done. A catch-all text field is a good idea, and should be
kept in the search, just ranked lower. I believe the slickest way to
do field weighting is with the DisMaxRequestHandler. See
http://wiki.apache.org/solr/SolrRequestHandler and
http://wiki.apache.org/solr/DisMaxRequestHandler for help.
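
As a rough sketch of what the field weighting could look like from the
client side: the snippet below builds the query parameters for a DisMax
search, using the handler's qf ("query fields") parameter to attach a boost
to each field. The field names (title, author), the weights, and the helper
name build_dismax_params are made up for illustration; only the catch-all
"text" field comes from the current setup. Selecting the parser with
defType=dismax assumes a Solr version that supports that parameter; a
handler registered in solrconfig.xml and selected by name may be needed
instead.

```python
from urllib.parse import urlencode


def build_dismax_params(user_query, field_boosts):
    """Return a URL-encoded query string for a DisMax search.

    field_boosts maps a field name to its weight; both are assumptions
    here and would need to match the fields actually in schema.xml.
    """
    # qf lists every searched field as name^boost, separated by spaces.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {
        "q": user_query,
        "defType": "dismax",  # assumption: the Solr in use accepts defType
        "qf": qf,
    }
    return urlencode(params)


# Hypothetical weights: the catch-all "text" field stays in the search,
# just ranked lower than the specific fields.
boosts = {"title": 10, "author": 5, "text": 1}
print(build_dismax_params("stephen king", boosts))
```

The point of keeping the weights in one dictionary is that they can be
adjusted without touching the index.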

While you're at it, the schema.xml could use cleaning up. Compare ours to
VuFind's
(https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/solr/conf/schema.xml).
I'll get around to this eventually if it doesn't excite you.


Dan Scott

Jan 10, 2008, 12:12:04 PM
to facba...@googlegroups.com
+1 for schema.xml clean-up
+1 for DisMax
+1 for comparing & collaborating with VuFind

I believe that Casey was pretty proud of the improvements he made to relevancy ranking in his cut of the Helios code - there might be some ideas we can mine from there, too.

--
Dan Scott
Laurentian University

Casey

Jan 11, 2008, 12:21:38 PM
to FacBackOPAC
Yes, I'd definitely recommend looking at the relevancy for Helios.
It's a very subjective (and tedious) process to determine whether, for
instance, the general keyword search should give more of a boost to
authors or titles.

When you search on Stephen King, should the first result be a book by
him, or about him? When you search on "cat", should the first result
be a book about cats, "The Cat Who Walks Through Walls", a cookbook by
Cat Cora or the latest album featuring Cat Power? You pretty much
have to do a bunch of searches and decide whether they gave you what
you were looking for. So the relevancy in Helios is very much a
reflection of my prejudices about how a general keyword search should
work, and is bound to provoke some disagreement, but it would be a good
place to start.
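
To make the author-versus-title question concrete, here is a toy sketch of
the "do a bunch of searches and look at the order" process. It is not how
Helios or Solr actually scores anything: it just counts term matches per
field, multiplies by a boost, and takes the best field per record (a crude
stand-in for the disjunction-max idea). The two records, the field names,
and the weights are invented for the example.

```python
def dismax_score(doc, term, boosts):
    """Toy score: the highest boosted match count across the given fields."""
    term = term.lower()
    return max(
        boost * doc.get(field, "").lower().split().count(term)
        for field, boost in boosts.items()
    )


def rank(docs, term, boosts):
    """Return docs ordered from best to worst toy score."""
    return sorted(docs, key=lambda d: dismax_score(d, term, boosts), reverse=True)


# Invented sample records: one book about the author, one by him.
docs = [
    {"id": "about", "title": "The Art of Stephen King", "author": "Jane Example"},
    {"id": "by", "title": "The Shining", "author": "Stephen King"},
]

# Same search, two sets of hypothetical boosts.
title_heavy = {"title": 10, "author": 5}
author_heavy = {"title": 5, "author": 10}

print([d["id"] for d in rank(docs, "king", title_heavy)])
print([d["id"] for d in rank(docs, "king", author_heavy)])
```

Swapping the two weights flips which record comes first, which is the
judgment call described above.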

