Search results ranking

Chaiyasit (Sit) Manovit

Nov 22, 2012, 9:53:58 PM
to ep...@googlegroups.com
We currently do not use Lucene's relevance ranking and only sort the results by date and time. Would we also want an option to sort by ranking? If so, how should that be presented to users (a traditional list of sorted messages, or with some visualization aid)? Note that we haven't yet evaluated the quality of the ranking.

Thanks,

--Sit

Peter Chan

Nov 27, 2012, 1:54:50 PM
to ep...@googlegroups.com, s...@ixoratech.com
Hi Sit,

Would you mind explaining how the ranking works? Or could you point me to the relevant documents?

Thanks

Chaiyasit (Sit) Manovit

Nov 27, 2012, 3:22:35 PM
to ep...@googlegroups.com
Hi Peter,

"In general, the idea [behind the VSM] is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query." (from http://lucene.apache.org/core/3_6_0/scoring.html)

A query can be viewed as a short document itself, and Lucene finds the documents that are most similar (relevant) to it. Each document is represented by an N-dimensional vector, where N is the number of unique terms found across all documents. The value in each dimension of a document vector is a function of the corresponding term's frequency in that document relative to its frequency across all documents. Similarity between two documents is measured by the angle between their corresponding vectors (in practice, the cosine of that angle): the smaller the angle, the more similar the documents. More details in http://en.wikipedia.org/wiki/Vector_Space_Model.

Full details of the similarity computation, with formulas, are in http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html.
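
As a rough illustration of the vector-space idea (not Lucene's actual scoring code, which adds further factors described in the Similarity page above), here is a minimal Python sketch. The function names, the whitespace tokenizer, the smoothed idf weighting, and the sample messages are all assumptions made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs_tokens):
    # Inverse document frequency per term. The "1 + log(n / (1 + df))"
    # smoothing is an assumption chosen to keep all weights positive.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: 1.0 + math.log(n / (1 + count)) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector: term -> (count in this text) * idf.
    # Terms never seen in the collection are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Treat the query as a short document and score every document against it.
    # Returns (doc_index, score) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(i, cosine(q_vec, tfidf_vector(tokens, idf)))
              for i, tokens in enumerate(docs_tokens)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Made-up sample messages, for illustration only.
messages = [
    "budget meeting moved to friday",
    "lunch on friday",
    "budget budget report attached",
]

for index, score in rank("budget report", messages):
    print(index, round(score, 3), messages[index])
```

With these sample messages, the third message ranks first because it contains both query terms, the first message comes second, and the second message scores zero because it shares no term with the query.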

Thanks,

--Sit

Peter Chan

Nov 28, 2012, 11:02:21 PM
to ep...@googlegroups.com, s...@ixoratech.com
Hi Sit,

Since the number of hits in our searches is relatively small, I believe researchers would want to explore all of them. I would recommend that we not implement ranking at this moment.

Thanks for letting me know that such an option exists. I will certainly consider it once we have to deal with very large archives.

Peter