Counting resulting documents only.


Matt Johnson

Jan 24, 2013, 3:00:22 PM
to rav...@googlegroups.com
I'm attempting to answer this SO question: http://stackoverflow.com/q/14490557/634824

When breaking out a document into multiple index entries using this common pattern:
from foo in foos
from bar in foo.bars
select new { foo.A, bar.B, etc... }

How can one get the total number of resulting documents when querying? The TotalResults in the statistics reflects the total number of index entries, not documents. The SkippedResults reflects only the results that were skipped up to that page, so that doesn't help either. Something like TotalSkippedResults might be in order, but that doesn't exist.
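
To make the mismatch concrete, here is a minimal in-memory sketch in plain LINQ to Objects (no RavenDB involved). The `Foo` class, the `FanOutCount` helper and its method names are hypothetical stand-ins for illustration only. It mirrors the shape of the index above and shows how the number of matching index entries can exceed the number of distinct documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a document; not a RavenDB type.
public sealed class Foo
{
    public string Id { get; set; }
    public string A { get; set; }
    public List<string> Bars { get; set; } = new List<string>();
}

public static class FanOutCount
{
    // Same shape as the index: one entry per (foo, bar) pair.
    public static List<(string DocId, string A, string B)> Entries(IEnumerable<Foo> foos) =>
        (from foo in foos
         from bar in foo.Bars
         select (DocId: foo.Id, A: foo.A, B: bar)).ToList();

    // What an entry-based total would report for a query on A.
    public static int MatchingEntries(IEnumerable<Foo> foos, string a) =>
        Entries(foos).Count(e => e.A == a);

    // What the caller actually wants: distinct documents behind those entries.
    public static int MatchingDocuments(IEnumerable<Foo> foos, string a) =>
        Entries(foos).Where(e => e.A == a).Select(e => e.DocId).Distinct().Count();
}
```

For example, if two documents have `A == "x"` and hold three bars and one bar respectively, `MatchingEntries` gives 4 while `MatchingDocuments` gives 2.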

I thought I had a good answer of how to index differently to handle this situation, but it doesn't work for his use case.

Is there a solution?  Perhaps facets would help?  I admit - I am grossly nonfluent when it comes to facets.

Matt Warren

Jan 24, 2013, 6:09:54 PM
to ravendb
Unfortunately facets aren't going to help here either. They count the items in the Lucene index, so if you write an index with 2 from clauses, they'll suffer from the same issue.

However "Query intersection" might help the question on SO. See http://ravendb.net/docs/client-api/querying/intersection and http://mattwarren.org/2012/06/27/ravendb-query-intersection/. They might be able to write an index that gives them the correct counts and lets them do the queries they want. BUT it's late here, so I could be wrong!!

It's really interesting that this has never come up before.



Matt Warren

Jan 24, 2013, 6:11:53 PM
to ravendb
The only other option I can think of is to have 2 indexes.

1 with only one from clause
1 with 2 from clauses

And then do different queries against each. However that could be problematic if 1 index is "more stale" than the other.

Matt Johnson

Jan 24, 2013, 7:51:17 PM
to rav...@googlegroups.com
I don't think query intersection will work.  He just wants to query on one set of terms.

Two indexes won't work either, because he's not interested in a general count, but in a count of the docs that matched the terms.

In his domain terms, every restaurant has a list of promotions.  Each promotion is defined by multiple attributes, such as city and food type.  Give me all restaurants that have a promotion for a particular city and food type.  And I want the count of restaurants that matched.

In simple linq, I would say:

var results = session.Query<Restaurant>().Where(x => x.Promotions.Any(y => y.City == "New York" && y.Food == "Pizza"));

Results are exactly what I want, but statistics are off because they matched the terms of the inner list, not the outer document.  This makes pagination problematic.  You know the total results, but you don't know the skipped results until you skip through all of them.  It seems that statistics needs a TotalSkippedResults property.

Oren Eini (Ayende Rahien)

Jan 25, 2013, 1:42:16 AM
to ravendb
Matt,
TotalSkippedResults would require us to go over the entire set of results, which may be very large.
We don't want to do that.

Troy

Feb 5, 2013, 11:06:53 AM
to rav...@googlegroups.com
@Matt Johnson, did you ever come up with a solution to this? I am facing the same issue. Or does anyone have a solution for something similar to a TotalSkippedResults?

Matt Johnson

Feb 5, 2013, 1:21:19 PM
to rav...@googlegroups.com
No, I did not come up with a solution.  But if you look at the comments in that StackOverflow post, the person who originally had the question figured out another way around the problem.  I don't think he solved this particular problem, he just did something different to not need this solved.

Barry

Feb 5, 2013, 2:01:39 PM
to rav...@googlegroups.com
For this issue in general, I think you have to make the additional distinct query upfront:

var actualResultCount = session.Query<Restaurant>().Where(x => x.Promotions.Any(y => y.City == "New York" && y.Food == "Pizza")).Select(x => x.RestID).Distinct().Count();

While that number gives you a realistic total, you will still have issues with non-sequential paging because you don't know where the skipped results are located until you get each page.
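
The paging problem can be sketched with a simplified in-memory model (not RavenDB's actual implementation). The `SkippedPaging` class and its methods are hypothetical, and the model assumes that all index entries for one document sit next to each other in the result order. Each page reports how many duplicate entries it skipped, and the starting offset of page N is only known after adding up the skipped counts of every earlier page.

```csharp
using System;
using System.Collections.Generic;

public static class SkippedPaging
{
    // Reads one page of distinct documents from a list of index entries,
    // where each entry is represented only by the id of the document it came from.
    // Assumption: entries belonging to the same document are adjacent.
    public static (List<string> Docs, int Skipped) ReadPage(
        IReadOnlyList<string> entryDocIds, int start, int pageSize)
    {
        var docs = new List<string>();
        var seen = new HashSet<string>();
        int skipped = 0;
        for (int i = start; i < entryDocIds.Count; i++)
        {
            string id = entryDocIds[i];
            if (seen.Contains(id)) { skipped++; continue; } // duplicate entry of a document already on this page
            if (docs.Count == pageSize) break;               // first entry of a document for the next page
            seen.Add(id);
            docs.Add(id);
        }
        return (docs, skipped);
    }

    // The entry offset of a page depends on the skipped counts of all earlier pages,
    // so it can only be found by walking those pages in order.
    public static int StartOfPage(IReadOnlyList<string> entryDocIds, int pageIndex, int pageSize)
    {
        int start = 0;
        for (int p = 0; p < pageIndex; p++)
        {
            var page = ReadPage(entryDocIds, start, pageSize);
            start += page.Docs.Count + page.Skipped;
        }
        return start;
    }
}
```

In this model, jumping straight to `pageIndex * pageSize` lands on the wrong entry whenever an earlier page skipped duplicates, which is why non-sequential paging remains awkward even with an accurate total.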


For the specific SO question, assuming the fields to query are always the same you could do some domain specific encoding for the index:

from rest in docs 
select new
{
    rest.RestName,
    CityTypeFood= rest.RestPromotions.Select(x=> string.Join("|", x.City, x.Type, x.Food))
}

Then querying on the joined index field will still provide an accurate TotalResults value:

var lookup = string.Join("|","New York","Italian","Pizza");
var results = session.Advanced.LuceneQuery<Restaurant, RestaurantIndex>().WhereEquals("CityTypeFood",lookup);


The string join encoding only works if you are always querying all of the same fields, but you could probably concoct other encodings to handle more complex queries (e.g. flags enum emulation)
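
As a rough illustration of why the joined field keeps the count accurate, here is an in-memory sketch in plain LINQ to Objects. The `Promotion` and `Restaurant` classes and the `JoinedKeyLookup` helper are hypothetical and only mimic the shape of the index above. Each restaurant produces exactly one entry holding a set of joined keys, so the number of matching entries equals the number of matching restaurants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the documents; not RavenDB types.
public sealed class Promotion
{
    public string City { get; set; }
    public string Type { get; set; }
    public string Food { get; set; }
}

public sealed class Restaurant
{
    public string RestName { get; set; }
    public List<Promotion> RestPromotions { get; set; } = new List<Promotion>();
}

public static class JoinedKeyLookup
{
    // Same encoding as in the index: the three fields joined with "|".
    public static string Key(string city, string type, string food) =>
        string.Join("|", city, type, food);

    // One entry per restaurant, carrying all of its joined keys.
    public static List<(string RestName, HashSet<string> CityTypeFood)> BuildEntries(
        IEnumerable<Restaurant> docs) =>
        docs.Select(r => (
                RestName: r.RestName,
                CityTypeFood: new HashSet<string>(
                    r.RestPromotions.Select(p => Key(p.City, p.Type, p.Food)))))
            .ToList();

    // Because there is one entry per restaurant, this is also the document count.
    public static int CountMatches(
        IEnumerable<(string RestName, HashSet<string> CityTypeFood)> entries, string lookup) =>
        entries.Count(e => e.CityTypeFood.Contains(lookup));
}
```

One thing to watch with this kind of encoding: if any field value can itself contain the separator character, two different combinations could produce the same key, so the separator should be one that never appears in the data.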

Troy

Feb 5, 2013, 2:24:29 PM
to rav...@googlegroups.com
Yeah, I basically did the same thing...
KeyValues = new object[] {
        logEntry.Dictionary.Select(x => x.Key + "|" + x.Value )
    }

And just | them together... not ideal, but it works rather than 2 from clauses.

Troy

Feb 5, 2013, 2:24:45 PM
to rav...@googlegroups.com
Thanks Matt for replying!