Ranking query in Raven DB


nabils

Oct 12, 2011, 12:36:00 PM
to ravendb
Any ideas on how we can perform a rank query using map/reduce, such as the
one below, where we always want to know Jack's position whether or not he
is in the top 5?

Top 5 Sales People in 2010

1) Fred 10m
2) Joe 8m
3) Michael 6m
4) Bob 5m
5) Peter 2m
-------------
27) Jack 110k

Thanks,
Nabil

Ayende Rahien

Oct 13, 2011, 4:31:14 AM
to rav...@googlegroups.com
You would need to load the first five based on the rank and check.

Ryan Heath

Oct 13, 2011, 5:02:25 AM
to rav...@googlegroups.com
Querying for 'Jack' would not return his rank of 27 ...

What I would like to see is
that Raven could generate those rankings for me based on which field(s)
I sort by.

Maybe store the rank (row number) with the doc when indexing?

// Ryan

Ayende Rahien

Oct 13, 2011, 5:06:02 AM
to rav...@googlegroups.com
The problem is that what you want is actually two different things.
There is the ranking, and there is whether a user is in the top X.
It is easy to get either.
You get the ranking by just saying something like:

session.Query<...>().OrderByDescending(x=>x.Ranking)

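And you can get whether a user is in the top X using:

var item = session.Query<...>().Where(x => x.User == user).Single();
var xPos = session.Query<...>().OrderByDescending(x => x.Ranking).Skip(X - 1).Take(1).Single();

// With Ranking sorted descending, a result <= 0 means the user is within the top X.
return xPos.Ranking - item.Ranking;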
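
The same check can be sketched in memory with LINQ to Objects. This is only an illustration of the comparison, not a RavenDB query: the `Entry` type and the `IsInTopX` name are hypothetical, and it assumes `Ranking` is a score where higher is better.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape; "Ranking" is the score being sorted on.
public record Entry(string User, decimal Ranking);

public static class TopX
{
    // True when the user's score is at least the score found at position X.
    // Users tied with the entry at position X are counted as being in the top X.
    public static bool IsInTopX(IReadOnlyCollection<Entry> entries, string user, int x)
    {
        var item = entries.Single(e => e.User == user);

        // Fewer than X entries: everyone is in the top X.
        if (entries.Count < x) return true;

        var xPos = entries.OrderByDescending(e => e.Ranking).Skip(x - 1).First();
        return item.Ranking >= xPos.Ranking;
    }
}
```

Note that this only answers "is the user in the top X?"; it does not produce the user's actual position.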

Ryan Heath

Oct 13, 2011, 5:27:09 AM
to rav...@googlegroups.com
That implies that you know the ranks beforehand.
What if I only had <user,sales>. How would we get the rank based on sales?

// Ryan
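
To make the question concrete: given only <user, sales> pairs, a rank can be defined as one plus the number of people with a strictly larger total. The following is a minimal in-memory sketch with LINQ to Objects, not a RavenDB index or query; the `SalesTotal` type and the `RankOf` name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for the <user, sales> pair discussed above.
public record SalesTotal(string Name, decimal Amount);

public static class SalesRanking
{
    // Rank = 1 + number of people with a strictly larger total.
    // People with equal totals share the same rank.
    public static int RankOf(IEnumerable<SalesTotal> totals, string name)
    {
        var list = totals.ToList();
        var target = list.Single(t => t.Name == name);
        return 1 + list.Count(t => t.Amount > target.Amount);
    }
}
```

The count has to look at every total, which is why the rank cannot be read off a single document.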

Ayende Rahien

Oct 13, 2011, 5:29:31 AM
to rav...@googlegroups.com
You don't need to know the rank; all you need to know is the user name and whether or not he is in the top X.

Ryan Heath

Oct 13, 2011, 6:54:23 AM
to rav...@googlegroups.com
Hmm, let me see if we're on the same level ...

Querying top X is simple.
But Jack is at 27, how would you get that number 27?

// Ryan

nabils

Oct 13, 2011, 6:59:54 AM
to ravendb
I'm with Ryan here. I don't quite understand what you mean, Ayende.
How do we get Raven to precompute the ranks and store them in the
index so we can query on them?



Ayende Rahien

Oct 13, 2011, 7:39:42 AM
to rav...@googlegroups.com
This is what we want:

1) Fred 10m
2) Joe 8m
3) Michael 6m
4) Bob 5m
5) Peter 2m
-------------
27) Jack 110k

In order to do that, we assume the following documents:

{ "Name": "Fred", "Amount": 10000000 }

etc

We build two indexes for this. The first is the ranking index, which gives us access to all of the distinct amounts:

// Map
from doc in docs
select new { Amount = new[] { doc.Amount } }

// Reduce
from result in results
group result by "constant" into g
select new { Amount = g.SelectMany(x => x.Amount).Distinct().OrderByDescending(x => x) }

This gives us a single result for the index, with the distinct amounts sorted in descending order.
We have another index for the Name and Amount.
Using a person's Amount, we can load the result from the map/reduce index and find the position of that amount in the sorted list, which gives us the rank.
Note that this assumes that the number of distinct amounts is relatively small (a few thousand). If you have more than that, we would need a different approach.
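
The lookup step can be sketched as follows. This is an in-memory stand-in written with LINQ to Objects, not the RavenDB client API: `BuildRankingList` plays the role of the single reduce result, and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctAmountRanking
{
    // Stand-in for the single reduce result: every distinct amount, largest first.
    public static decimal[] BuildRankingList(IEnumerable<decimal> amounts) =>
        amounts.Distinct().OrderByDescending(a => a).ToArray();

    // The rank is the 1-based position of the amount in that list.
    // Returns 0 when the amount is not present.
    public static int RankOf(decimal[] rankingList, decimal amount) =>
        Array.IndexOf(rankingList, amount) + 1;
}
```

Because the list holds distinct amounts, people with the same amount share a rank and the ranks have no gaps. The linear search is acceptable only under the stated assumption of a few thousand distinct amounts.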

nabils

Oct 13, 2011, 8:18:37 AM
to ravendb
As you said this probably wouldn't work with the amount of data I am
dealing with. I need the queries on this to be as fast as possible as
we will have some very dynamic filters on it.

There are 100,000 salespeople and 3m sales (fake data). You would first need
to group and sum those sale values in a map/reduce to get the total sales
per person.

Any other ideas? It would be very easy if an index had ranking built
in that you could query on.

Ryan Heath

Oct 13, 2011, 8:22:34 AM
to rav...@googlegroups.com
> Any other ideas? It would be very easy if an index had ranking built
> in that you could query on.
>

When an index is built with an orderby, isn't there an implicit rownumber?

// Ryan

Ayende Rahien

Oct 13, 2011, 8:37:53 AM
to rav...@googlegroups.com
No, that isn't how it works, actually.
We build the index incrementally.

Ayende Rahien

Oct 13, 2011, 8:38:30 AM
to rav...@googlegroups.com
Would it help to have ranges?
For the first 10, you have the actual results.
For the next 100, you round, and so on.
You would have only very few results that way.

jalchr

Oct 13, 2011, 8:42:29 AM
to rav...@googlegroups.com


This is simple with chained methods
http://weblogs.asp.net/fmarguerie/archive/2008/11/10/using-the-select-linq-query-operator-with-indexes.aspx


I don't know how this works here:

from doc in docs
select new { Amount = new[] { doc.Amount } }

var row = 0;
from result in results
group result by "constant" into g
select new {
    Amount = g.SelectMany(x => x.Amount).Distinct().OrderByDescending(x => x),
    Rank = index (??)
}

Ayende Rahien

Oct 13, 2011, 9:24:37 AM
to rav...@googlegroups.com
That doesn't work with the way we do things; we don't operate over the whole set.

nabils

Oct 13, 2011, 7:07:27 PM
to ravendb
The other problem is that I need to filter this data on the fly at a
very granular level as this is for a dashboard.
So for example I might just want to see the same ranking for a date
range with accuracy of a day and/or select certain regions and/or
countries as well etc.
Therefore map reduce will not work here as I need all the data
available to me.

So, resorting to querying the data without aggregation, I run into
another problem. Now I need to do the aggregation on the client (the web
server in my case), which means high memory usage and also running into
Raven's 128 page limit.
Am I using the right tool for the job?

So far my best option is SQL.
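
For reference, the client-side computation described above (filter the raw sales, sum per person, then number the sorted rows) looks roughly like this in LINQ to Objects. It is a sketch only: the `Sale` and `RankedTotal` types, their field names, and the filter parameters are assumptions for illustration, and it requires all matching sales to be in memory, which is exactly the cost mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sale document; the field names are assumptions for illustration.
public record Sale(string Person, string Region, DateTime Date, decimal Amount);

public record RankedTotal(int Rank, string Person, decimal Total);

public static class DashboardRanking
{
    public static List<RankedTotal> Rank(
        IEnumerable<Sale> sales, DateTime from, DateTime to, ISet<string> regions)
    {
        return sales
            // 1. Apply the dashboard filters to the raw sales.
            .Where(s => s.Date >= from && s.Date <= to && regions.Contains(s.Region))
            // 2. Sum the remaining sales per person.
            .GroupBy(s => s.Person)
            .Select(g => new { Person = g.Key, Total = g.Sum(s => s.Amount) })
            // 3. Sort by total and number the rows, starting at 1.
            .OrderByDescending(t => t.Total)
            .Select((t, i) => new RankedTotal(i + 1, t.Person, t.Total))
            .ToList();
    }
}
```

Jack's row can then be picked out of the returned list by name, whether or not he is in the top 5. People with equal totals get consecutive ranks here rather than a shared one.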


Ayende Rahien

Oct 13, 2011, 7:27:55 PM
to nabils, ravendb
Can you provide source data and expected outputs?


Ryan Heath

Oct 14, 2011, 3:36:05 AM
to rav...@googlegroups.com
The culprit here is that, for any change, the rank needs to operate on the whole set of docs, while map/reduce operates on a limited set of docs.

Your requirements sound like 'reporting', how 'hot' should the data be? Could you rebuild your data into another database with all the rankings 'built-in'?

// Ryan
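
A small in-memory sketch of that point: changing one person's total can shift the rank of every other person. This uses LINQ to Objects only; the `Ranks` and `CountChangedRanks` names are hypothetical, and it assumes both dictionaries contain the same people.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RankShift
{
    // Assigns 1-based ranks by sorting all totals, largest first.
    public static Dictionary<string, int> Ranks(IDictionary<string, decimal> totals) =>
        totals.OrderByDescending(kv => kv.Value)
              .Select((kv, i) => (kv.Key, Rank: i + 1))
              .ToDictionary(t => t.Key, t => t.Rank);

    // Counts how many people have a different rank after the change.
    public static int CountChangedRanks(
        IDictionary<string, decimal> before, IDictionary<string, decimal> after)
    {
        var oldRanks = Ranks(before);
        var newRanks = Ranks(after);
        return oldRanks.Count(kv => newRanks[kv.Key] != kv.Value);
    }
}
```

If the lowest-ranked person of four moves to first place, all four ranks change even though only one total was updated.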

nabils

Oct 14, 2011, 4:09:52 AM
to ravendb
The rankings need to be dynamic based on the filters, so precomputing
them will not work.

This is more ad hoc analysis rather than reporting. Data is only
refreshed once a day.

Ayende,

I will send you a sample data source and an idea of the filters and
output by email.
