Paging issue when docs have the same score and one is updated


Barry

Oct 26, 2012, 6:01:25 PM
to rav...@googlegroups.com
I'm seeing an issue when paging search results in an infinite-scroll UI. If a search returns documents with the same score, the ordering of those documents changes if one of them is updated between paging calls. The updated document appears to move to the end of the group of documents with the same score.

I've attached a test that shows this behavior. Does anyone have ideas for working around this? We still want the Lucene score to be the primary ordering, but we need a secondary sort so that document updates do not move things around between page calls.

 
PagingSameScoreTests.cs

Oren Eini (Ayende Rahien)

Oct 26, 2012, 6:03:45 PM
to rav...@googlegroups.com
You pretty much have to do it this way, yes, because otherwise, the sort order within the same score is undefined.

Matt Warren

Oct 26, 2012, 6:04:43 PM
to rav...@googlegroups.com
You could use the RavenDB document Id as a second sort; just include it in the index.

Barry

Oct 26, 2012, 6:14:58 PM
to rav...@googlegroups.com
Feels dumb to ask, but could you show a code snippet of how I can do Advanced.LuceneQuery and have score be the primary order and Id the secondary?  As soon as I use AddOrder or OrderBy then that seems to be used instead of score - unless I'm missing something.

Matt Warren

Oct 27, 2012, 5:17:17 AM
to rav...@googlegroups.com
No, it's a good point, because I don't think that RavenDB currently exposes this ability.

If you specify a sort order yourself, that is applied first, then the Lucene score is used as a tie-breaker, followed by the Lucene ID if the scores are equal. The Lucene ID can change as items are deleted/updated/inserted, so that's why the order changes.

I'm not sure how/if RavenDB can do this though, because I'm not sure Lucene lets you apply your own sort order after its own scores. I think it expects to apply the custom sorting first.

Oren Eini (Ayende Rahien)

Oct 27, 2012, 5:24:00 AM
to rav...@googlegroups.com
No, it is possible, we just need to enable that.

ZNS

Oct 27, 2012, 6:56:18 AM
to rav...@googlegroups.com


On a related note, I'd really like to see what Solr calls "field boosting": changing the score of a document based on the *value* of a field. This is extremely useful, for example, to boost certain documents in a query based on popularity or what have you.

Matt Warren

Oct 27, 2012, 7:16:42 AM
to rav...@googlegroups.com
Michael posted a nice trick the other day, see https://groups.google.com/d/msg/ravendb/ksJeF_oaAt0/DwdGBRPqAQkJ

docs.Products
    .Select(x => new () {
        Content = new System.Object []{ x.Name },
        Category = x.Category,
    }.Boost((float)Math.Log10(x.ProductViewCount + 1) + (float)Math.Log10(x.ProductReviewCount*10 + 1))
    )

Is that what you mean?
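
To get a feel for the boost values that formula produces, here is a minimal sketch that evaluates the same expression outside of any index. It only restates the arithmetic from the snippet above; the helper name `PopularityBoost` and the sample counts are made up for illustration and are not part of RavenDB.

```csharp
using System;

// Same arithmetic as the .Boost(...) argument in the index above, pulled out
// into a plain function so the numbers are easy to inspect.
// PopularityBoost is a hypothetical helper name, not a RavenDB API.
float PopularityBoost(int productViewCount, int productReviewCount) =>
    (float)Math.Log10(productViewCount + 1)
    + (float)Math.Log10(productReviewCount * 10 + 1);

// Invented sample counts: a brand-new product, a viewed product,
// and a product that is both viewed and reviewed.
var samples = new[] { (Views: 0, Reviews: 0), (Views: 999, Reviews: 0), (Views: 999, Reviews: 10) };
foreach (var (views, reviews) in samples)
{
    Console.WriteLine($"views={views}, reviews={reviews} -> boost {PopularityBoost(views, reviews):F2}");
}
```

Because both terms are logarithmic, the boost grows slowly: going from 0 to 999 views adds 3 to the factor. Note also that the formula yields 0 for a product with no views and no reviews, so it is worth checking how a zero boost is treated before relying on it.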

ZNS

Oct 27, 2012, 7:32:42 AM
to rav...@googlegroups.com

That certainly looks like what I'm looking for, very cool :)

What does it mean when you attach the boost method to the result rather than to a field? I mean, how are they different in terms of what happens at query time?

Matt Warren

Oct 27, 2012, 8:11:22 AM
to rav...@googlegroups.com
I think it's just a way to give all the fields inside the select the same boost, without having to boost each one individually.

Oren Eini (Ayende Rahien)

Oct 27, 2012, 3:02:53 PM
to rav...@googlegroups.com
Correction, I can't figure out how to do that using the Lucene API, so it is pretty hard to do, it appears.

Matt Warren

Oct 27, 2012, 7:03:49 PM
to rav...@googlegroups.com
Glad I wasn't going mad. I looked into how Lucene did sorting a while back and didn't remember seeing a way to change the sort like that.

BTW, couldn't this be handled purely in RavenDB? It could pull the "page" of results from Lucene and, if any items have matching Lucene scores and matching values for any "OrderBy" fields, sort those items on RavenDB ID before returning them to the user.

Or are there some edge cases that make this unsuitable?

Oren Eini (Ayende Rahien)

Oct 27, 2012, 7:23:59 PM
to rav...@googlegroups.com
Assume that you have 10,000 items: we would have to sort them twice and load a lot of values in RavenDB to do that.

Barry

Oct 27, 2012, 7:38:25 PM
to rav...@googlegroups.com
My edge case is when a set of results with the same score spans a page break.

For example, the result set is 100 items, the page size is 20, and results 15 through 25 have the same score. Load page one, and then update result #15. This causes #15 to become #25, and all of the other items with the same score shift up one position. Now when you load page two, you get #15 showing up again as result #25. The more critical issue for us is that #21 is completely skipped: it became the last item on the first page of results due to the update of #15.
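
The page-break scenario described here can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: it does not call RavenDB or Lucene, the `Internal` field is a hypothetical stand-in for the Lucene ID that changes when a document is updated, and the ids, scores, and helper names (`PageByScoreOnly`, `PageByScoreThenId`) are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not RavenDB or Lucene code: each entry has a stable document Id,
// a score, and an internal position that is reassigned whenever the document
// is updated (standing in for the Lucene ID mentioned earlier in the thread).
// Results 15 through 25 share the same score, as in the example.
var index = Enumerable.Range(1, 100)
    .Select(n => (Id: $"docs/{n:D3}",
                  Score: (n < 15 ? 3f : n <= 25 ? 2f : 1f),
                  Internal: n))
    .ToList();

// Ties on score fall back to the internal position only.
List<string> PageByScoreOnly(int page, int size) => index
    .OrderByDescending(d => d.Score).ThenBy(d => d.Internal)
    .Skip(page * size).Take(size).Select(d => d.Id).ToList();

// Ties on score fall back to the stable document Id instead.
List<string> PageByScoreThenId(int page, int size) => index
    .OrderByDescending(d => d.Score).ThenBy(d => d.Id, StringComparer.Ordinal)
    .Skip(page * size).Take(size).Select(d => d.Id).ToList();

var firstPage = PageByScoreOnly(0, 20);
var firstPageStable = PageByScoreThenId(0, 20);

// Update result #15 between the two page requests:
// same score, but it receives a new internal position.
var i = index.FindIndex(d => d.Id == "docs/015");
index[i] = (index[i].Id, index[i].Score, index.Max(d => d.Internal) + 1);

var secondPage = PageByScoreOnly(1, 20);
var secondPageStable = PageByScoreThenId(1, 20);

var seen = firstPage.Concat(secondPage).ToList();
var seenStable = firstPageStable.Concat(secondPageStable).ToList();

Console.WriteLine($"score only:    docs/021 seen = {seen.Contains("docs/021")}, " +
                  $"docs/015 seen {seen.Count(id => id == "docs/015")} time(s)");
Console.WriteLine($"score then Id: docs/021 seen = {seenStable.Contains("docs/021")}, " +
                  $"docs/015 seen {seenStable.Count(id => id == "docs/015")} time(s)");
```

In this model, score-only paging never returns docs/021 and returns docs/015 twice, while paging by score and then by the stable Id returns each of the first 40 documents exactly once.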

We have two use cases where this comes up. One is a user searching for documents and then tagging them; the tag update triggers the reordering. The other is a more generic concurrency problem: you are browsing result pages while someone else is updating documents.

Chris Marisic

Oct 30, 2012, 10:39:57 AM
to rav...@googlegroups.com
What about sorting by created time ascending? That should guarantee sort order, I guess, except perhaps with replication.

Matt Warren

Oct 30, 2012, 10:44:29 AM
to rav...@googlegroups.com
The main problem is that there doesn't seem to be a way to ask Lucene to order the results by score and then, if there's a tie, use another field (Id, created time, etc.).

Lucene will sort by your field first and then use score as the tie-breaker, not the other way round, which is what Barry needs to maintain consistent paging.
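
The difference between the two orderings can be seen on a tiny in-memory list. This is a sketch with plain LINQ, not a Lucene or RavenDB query, and the ids and scores are invented for illustration.

```csharp
using System;
using System.Linq;

// Illustrative in-memory data; the ids and scores are invented.
var hits = new[]
{
    (Id: "docs/3", Score: 1.5f),
    (Id: "docs/1", Score: 0.7f),
    (Id: "docs/2", Score: 1.5f),
};

// Field first, score as the tie-breaker: relevance no longer drives the order.
var fieldThenScore = hits.OrderBy(h => h.Id, StringComparer.Ordinal)
                         .ThenByDescending(h => h.Score)
                         .Select(h => h.Id)
                         .ToList();

// Score first, field as the tie-breaker: the order that consistent paging needs.
var scoreThenField = hits.OrderByDescending(h => h.Score)
                         .ThenBy(h => h.Id, StringComparer.Ordinal)
                         .Select(h => h.Id)
                         .ToList();

Console.WriteLine(string.Join(", ", fieldThenScore)); // docs/1, docs/2, docs/3
Console.WriteLine(string.Join(", ", scoreThenField)); // docs/2, docs/3, docs/1
```

In the first ordering the lowest-scoring document comes first because its Id sorts first; in the second, the two tied documents lead and their relative order is fixed by the Id.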

Oren Eini (Ayende Rahien)

Oct 31, 2012, 9:00:16 AM
to rav...@googlegroups.com
Matt,
We actually got a pull request for that, which will allow us to do just that. It will be in the next build.

Chris Marisic

Oct 31, 2012, 11:22:35 AM
to rav...@googlegroups.com
Can you give an example of how this will work?

Oren Eini (Ayende Rahien)

Oct 31, 2012, 11:23:30 AM
to rav...@googlegroups.com
var students = session.Advanced.LuceneQuery<Student>()
    .WaitForNonStaleResults()
    .WhereStartsWith("FirstName", "David").Boost(3)
    .WhereStartsWith("LastName", "David")
    .OrderBy(Constants.TemporaryScoreValue, "LastName")
    .ToList();

var users = s.Query<User>("test")
    .Customize(x => x.WaitForNonStaleResults())
    .Where(x => x.Email == "doors")
    .OrderByScore().ThenBy(x => x.Name)
    .ToList();

Chris Marisic

Oct 31, 2012, 11:36:09 AM
to rav...@googlegroups.com
Fantastic improvement.

Barry

Nov 1, 2012, 11:09:28 AM
to rav...@googlegroups.com
Picked up build 2134 and this solution works great for my original issue. I'm sorting by score, then by doc id.

Thanks Mamanze and RavenDB team!

Eniep Yrekcaz

Dec 20, 2012, 10:44:52 AM
to rav...@googlegroups.com
Is this functionality going to be in the 2.0 release? Will it also work for a .Search()?

Oren Eini (Ayende Rahien)

Dec 20, 2012, 11:47:37 AM
to rav...@googlegroups.com
What issue? This thread is months old.

Eniep Yrekcaz

Dec 20, 2012, 2:49:28 PM
to rav...@googlegroups.com
We perform a query and get a list of results. If multiple objects have the same search score, they are returned in ascending order of last modified date. If the user then modifies the object that is returned first, we want to perform the same search again (to display the results with the new data). Assuming they didn't update any of the searchable fields, the doc will have the same search score, but will now show up at the bottom of the list. Will there be a way to perform a query and have the results ordered by search score and then secondarily by last modified date in descending order?

Oren Eini (Ayende Rahien)

Dec 20, 2012, 2:52:56 PM
to rav...@googlegroups.com
You need to sort by score and something else.



Eniep Yrekcaz

Dec 20, 2012, 2:56:01 PM
to rav...@googlegroups.com
Can you give an example of how to do this?

Eniep Yrekcaz

Dec 20, 2012, 2:58:50 PM
to rav...@googlegroups.com
That's done on the index. So in the studio, I would have to create fields for document meta-data and then set their sort? I don't see how you can specify desc or asc. I'm using #960.

Oren Eini (Ayende Rahien)

Dec 20, 2012, 4:12:45 PM
to rav...@googlegroups.com
You specify desc / asc during the query, not during the indexing.
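
As a rough illustration of choosing the direction at query time, the sketch below sorts an in-memory list by score and then by last-modified date in either direction. It uses plain LINQ rather than the RavenDB client, and the data and the helper names (`OldestFirst`, `NewestFirst`, `Touch`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented sample data: three documents with the same score.
var docs = new List<(string Id, float Score, DateTime LastModified)>
{
    ("docs/1", 2.0f, new DateTime(2012, 12, 1)),
    ("docs/2", 2.0f, new DateTime(2012, 12, 10)),
    ("docs/3", 2.0f, new DateTime(2012, 12, 15)),
};

// The direction of each sort key is chosen in the query expression itself.
List<string> OldestFirst() => docs.OrderByDescending(d => d.Score)
    .ThenBy(d => d.LastModified).Select(d => d.Id).ToList();
List<string> NewestFirst() => docs.OrderByDescending(d => d.Score)
    .ThenByDescending(d => d.LastModified).Select(d => d.Id).ToList();

// Simulates a user edit: only the last-modified timestamp changes.
void Touch(string id, DateTime when)
{
    var i = docs.FindIndex(d => d.Id == id);
    docs[i] = (docs[i].Id, docs[i].Score, when);
}

// Ascending tie-break: editing the first result sends it to the bottom.
var topAscending = OldestFirst()[0];
Touch(topAscending, new DateTime(2012, 12, 20));
Console.WriteLine(string.Join(", ", OldestFirst())); // docs/2, docs/3, docs/1

// Descending tie-break: editing the first result keeps it at the top.
var topDescending = NewestFirst()[0];
Touch(topDescending, new DateTime(2012, 12, 21));
Console.WriteLine(string.Join(", ", NewestFirst())); // docs/1, docs/3, docs/2
```

With the ascending tie-break, the edited document drops to the end of its score group, which matches the behavior described above; with the descending tie-break, the document that was just edited stays first.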

Eniep Yrekcaz

Dec 20, 2012, 4:16:25 PM
to rav...@googlegroups.com
But don't you specify the sortoption on the index? How would I specify a sort option on the index for the document metadata of last updated through the 960 version of the management studio?

Oren Eini (Ayende Rahien)

Dec 20, 2012, 4:17:40 PM
to rav...@googlegroups.com
You don't specify the sort option on the index. You just specify the fields you want in the index.
Sorting happens at query time.