A good way to handle user ratings and favorites?

EisenbergEffect

Sep 1, 2011, 3:20:39 PM
to ravendb
I have an entity Foo. Users want to rate instances of Foo on a scale
of 1 to 5. Users also want to mark an instance of Foo as their
favorite. Users can rate lots of Foos and can have lots of Favorites.
The UI needs to show basic Foo info along with its number of
favorites, number of raters and average rating. Sometimes an
individual Foo is displayed this way, but frequently we need to
display a list of Foo, such as the top 10 highest rated Foo or the
most recently created Foo or the Foo that are most favorited.

My initial design had stored basic statistics along with Foo. So,
every time a user rated a Foo, I incremented the rating count and
recalculated the average. With favorites, I just incremented the
count. I also stored instances of FooFavorite and FooRating, so that
we could tell the User if the Foo they were looking at was their
favorite and what they had rated it as, alongside the average rating.
This was fairly simple and works, but I'm not sure how well it would
scale. It doesn't quite feel right to me. It also seemed like
Map/Reduce was a good fit for this. So, I removed the statistics from
Foo and created Map/Reduce indexes to calculate the info based on the
FooFavorite and FooRating instances. That was pretty cool, but then I
realized that querying for a page of Foo was going to be pretty ugly.
So, I'm thinking about switching back to my first design. However, I'm
pretty sure that there are other options that are better. I just haven't
tuned my document database mind to think of them. Is there some
combination of Map/Reduce/Transform that would make this simple?

How would you design something like this?

Frank Schwieterman

Sep 1, 2011, 11:57:14 PM
to rav...@googlegroups.com
What more precisely is your concern with building the list of Foos?
The queries you mention based on such an index seem doable. If the
concern is pulling in the actual Foo objects, be sure you have looked
at the Include() command.

Daniel Lang

Sep 2, 2011, 1:37:01 AM
to rav...@googlegroups.com
Take a look at Rob Ashton's StackOverflow-style voting with Live Projections.
This should help you.

EisenbergEffect

Sep 2, 2011, 5:36:05 PM
to ravendb
That was a good read Daniel. Unfortunately, it won't work because I
need to sort my data based on information that isn't in the Map needed
to do the aggregation. I'm starting to be convinced that what I'm
trying to do just isn't possible. Let me put some simple code here and
requirements and see if anyone can make a recommendation:

    public class Foo {
        string Id;
        DateTimeOffset UpdatedAt;
    }

    public class OpinionOfFoo {
        string UserId;
        string FooId;
        bool IsFavorite;
        int Rating;
    }

Ultimately, I need some output that looks like this:

    public class FooProfile {
        float RatingAverage;
        int NumberOfRatings;
        int NumberOfFavorites;
        Foo Foo;
    }

I can create a map/reduce on OpinionOfFoo and use a projection to load
in Foo so that I get the output shape that I want. The problem is,
sometimes I need to sort these results on UpdatedAt, which is not part
of the map. So, I'm really unsure how to structure things at this
point. Ultimately, I need to load a list of Foo and for each instance
of Foo in that list, I need to know its RatingAverage, NumberOfRatings
and NumberOfFavorites. I'd like to do that in one query, because this
is only part of the data that needs to be displayed in this particular
screen. Also, I need to load these up individually in a number of
different places.




On Sep 2, 1:37 am, Daniel Lang <d.l...@flexbit.at> wrote:
> Take a look at Rob Ashton's StackOverflow-style voting with Live Projections <http://codeofrob.com/archive/2010/12/16/ravendb-stackoverflow-style-v...>
> This should help you.

Matt Warren

Sep 3, 2011, 4:04:30 AM
to ravendb
In the Reduce part of your query you can group by the FooId, but you
can also carry through the UpdatedAt value of all the OpinionOfFoo you
reduced on, *something* like this:

    results => from result in results
               group result by result.FooId
               into g
               select new
               {
                   FooId = g.Key,
                   UpdatedAt = g.Max(x => x.UpdatedAt),
                   ......
               }

You'll then need to make your Map *something like this*:

    Map = opinions => from opinion in opinions
                      let actualFoo = dBase.Load<Foo>(opinion.FooId)
                      select new
                      {
                          FooId = opinion.FooId,
                          UpdatedAt = actualFoo.UpdatedAt,
                          ......
                      },

Then when you query the Map/Reduce you can SortBy "UpdatedAt" and get
the results in order.

Frank Schwieterman

Sep 3, 2011, 4:28:00 AM
to rav...@googlegroups.com
I think the issue is that he wants to sort the index result based on
Foo.LastUpdated, when the index was built for type OpinionOfFoo.
Linq-defined indexes in particular are built from a single type, so
the field from Foo is not available to sort on.

It is possible to index multiple types in the same index (so you can
count OpinionsOfFoo while tracking fields in Foo) but I do not think
it can be done with a linq-based index definition.

There is a test that shows how to index multiple types in one index at
Raven.Tests.Bugs.MultiEntityIndex. Another example at
Raven.Tests.Bugs.Indexing.ComplexUsage shows how to actually merge the
fields from different types. I don't know if there is a simpler way.
FWIW I don't think I've tried this.

Ayende Rahien

Sep 3, 2011, 6:25:11 AM
to rav...@googlegroups.com
What is more important? Tracking rating on Foo or tracking user's rating?

EisenbergEffect

Sep 3, 2011, 8:32:08 AM
to ravendb
@Matt/Frank - Frank is correct. The problem is that LastUpdated lives
on Foo, not on OpinionOfFoo.

@Ayende - I'm not sure if I understand the question entirely. But, I
do need to know what each user's rating of a particular Foo was,
independent of the aggregation. The aggregation data is basically for
a summary screen. When a user drills into the details, I want to show
the overall rating and their personal rating of Foo side-by-side. They
also need to be able to remove a favorite that they previously added.

On Sep 3, 6:25 am, Ayende Rahien <aye...@ayende.com> wrote:
> What is more important? Tracking rating on Foo or tracking user's rating?
>

Ayende Rahien

Sep 3, 2011, 8:37:00 AM
to rav...@googlegroups.com
How about something like this, then?

foos/123
foos/123/rating

users/345
users/345/favorites
users/345/rating

You basically store all of the ratings for an item in a single document,
AND store all of the ratings for a user in a single document.
That gives you easy querying and very cheap stats.
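
A minimal in-memory sketch of the per-item document suggested above, assuming the ratings are kept in a dictionary keyed by user id. The class `FooRatings` and its members are made up for illustration and are not part of RavenDB; the point is only that changing or cancelling a rating is a small edit to the loaded document, and the stats fall out of it directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for a "foos/123/rating" document: every rating for one
// Foo lives in a single document, keyed by user id.
public class FooRatings
{
    public string Id { get; set; }

    public Dictionary<string, int> RatingsByUser { get; } =
        new Dictionary<string, int>();

    // Adding and changing a rating are the same operation:
    // overwrite the entry for that user.
    public void Rate(string userId, int rating)
    {
        if (rating < 1 || rating > 5)
            throw new ArgumentOutOfRangeException(nameof(rating), "Rating must be 1 to 5.");
        RatingsByUser[userId] = rating;
    }

    // Returns true if the user had a rating to remove.
    public bool Cancel(string userId) => RatingsByUser.Remove(userId);

    public int NumberOfRatings => RatingsByUser.Count;

    // Stats come straight from the loaded document; no index is involved.
    public double AverageRating =>
        RatingsByUser.Count == 0 ? 0.0 : RatingsByUser.Values.Average();
}
```

The trade-off is that every rating for a popular item touches the same document, so this shape suits items where heavy concurrent rating is unlikely.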

Carlos Mendes

Sep 3, 2011, 2:31:11 PM
to rav...@googlegroups.com
Ayende,

If we store all the ratings for an item in a single document and all the ratings for a user in a single document, how easy will it be to change or cancel an existing rating without loading the entire document?

Matt Warren

Sep 3, 2011, 5:03:42 PM
to ravendb

Matt Warren

Sep 3, 2011, 5:26:10 PM
to ravendb
I've realised that the code I wrote below wasn't valid, but what about
if it was?!

    MapAdvanced = (database, OpinionOfFoos) => from opinion in OpinionOfFoos
                                               let actualFoo = database.Load<Foo>(opinion.FooId)
                                               select new
                                               {
                                                   FooId = opinion.FooId,
                                                   UpdatedAt = actualFoo.UpdatedAt,
                                                   ......
                                               },

Would you ever want to support this scenario? It's kind of like Live
Projections, but it happens in the Map/Select phase of an index.

I guess it's only useful if you have a normalised reference and want
to pull data in from it, so that it can be indexed as part of another
doc. So the index query would still return OpinionOfFoo docs, but you
could search using info contained in the linked Foo doc?

Frank Schwieterman

Sep 3, 2011, 6:27:48 PM
to rav...@googlegroups.com
Matt: I don't think patching works for updating/deleting opinions
stored within Foo, since it modifies arrays by index only; you couldn't
target a member by field (so far as I know).

Ayende: I would have two concerns storing OpinionOfFoo with Foo: one
is the likelihood of concurrency problems, the other is that the
documents will get too large if you have a lot of traffic.

*: One advantage of storing each OpinionOfFoo as a separate document
is that you can use the document identity to ensure only one opinion
per foo/user. So no need to read anything when someone saves a
rating.
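
A small sketch of the document-identity point, assuming an id convention of `opinions/{fooId}/{userId}`. The convention, `OpinionIds` and `InMemoryStore` are all hypothetical; the dictionary only stands in for a store where writing to an existing id replaces the document.

```csharp
using System;
using System.Collections.Generic;

public class OpinionOfFoo
{
    public string Id { get; set; }
    public string UserId { get; set; }
    public string FooId { get; set; }
    public bool IsFavorite { get; set; }
    public int Rating { get; set; }
}

public static class OpinionIds
{
    // Hypothetical convention: the same user and Foo always yield the same
    // document id, so a second save overwrites the first.
    public static string For(string fooId, string userId) =>
        $"opinions/{fooId}/{userId}";
}

// Stand-in for a document store: a put to an existing id replaces the document.
public class InMemoryStore
{
    private readonly Dictionary<string, OpinionOfFoo> documents =
        new Dictionary<string, OpinionOfFoo>();

    public int Count => documents.Count;

    public void Put(OpinionOfFoo opinion)
    {
        opinion.Id = OpinionIds.For(opinion.FooId, opinion.UserId);
        documents[opinion.Id] = opinion; // no prior read needed
    }

    public OpinionOfFoo Get(string fooId, string userId) =>
        documents.TryGetValue(OpinionIds.For(fooId, userId), out var found)
            ? found
            : null;
}
```

Saving a second opinion for the same user and Foo leaves one document in the store, holding the latest values.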

Is there any drawback to storing OpinionOfFoo separately besides the
difficulty of defining an index that takes multiple types as input?
It seems to me the linq index definitions could be extended to support
this, though not trivial.

Matt, if I were to try to support linq-based indexes that can map
multiple types, it might look like:

    public class OverallOpinion : AbstractIndexCreationTask<?>
    {
        public OverallOpinion()
        {
            // One Map per document type, both keyed by the Foo's id.
            Map<Foo>(docs => from doc in docs
                             select new { Id = doc.Id, LastUpdated = doc.LastUpdated });
            Map<OpinionOfFoo>(docs => from doc in docs
                                      select new { Id = doc.FooId, Rating = doc.Rating, Count = 1 });

            Reduce = docs => from doc in docs
                             group doc by doc.Id into g
                             select new {
                                 Id = g.Key,
                                 LastUpdated = g.Where(f => f.LastUpdated != null)
                                                .Select(f => f.LastUpdated)
                                                .FirstOrDefault(),
                                 Rating = g.Sum(x => x.Rating),
                                 Count = g.Sum(x => x.Count)
                             };
            // average rating is Rating / Count
        }
    }

It seems like some clever code could combine the different map
expressions into one.
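
The idea above, several map functions feeding one reduce, can be tried out in memory with plain LINQ. This is only a sketch under stated assumptions: `Entry`, `OverallOpinionSketch` and `Build` are names invented for illustration, both maps are made to emit the same shape, and nothing here is RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Id { get; set; }
    public DateTimeOffset LastUpdated { get; set; }
}

public class OpinionOfFoo
{
    public string FooId { get; set; }
    public int Rating { get; set; }
}

// The common shape every map emits, so one reduce can consume both.
public class Entry
{
    public string Id { get; set; }
    public DateTimeOffset? LastUpdated { get; set; }
    public int Rating { get; set; }
    public int Count { get; set; }
}

public static class OverallOpinionSketch
{
    public static List<Entry> Build(
        IEnumerable<Foo> foos, IEnumerable<OpinionOfFoo> opinions)
    {
        // Map over Foo: contributes the date, no rating data.
        var fromFoos = foos.Select(f =>
            new Entry { Id = f.Id, LastUpdated = f.LastUpdated });

        // Map over OpinionOfFoo: contributes rating data, no date.
        var fromOpinions = opinions.Select(o =>
            new Entry { Id = o.FooId, Rating = o.Rating, Count = 1 });

        // Reduce: merge everything that shares a Foo id.
        return fromFoos.Concat(fromOpinions)
            .GroupBy(e => e.Id)
            .Select(g => new Entry
            {
                Id = g.Key,
                LastUpdated = g.Select(e => e.LastUpdated)
                               .FirstOrDefault(d => d != null),
                Rating = g.Sum(e => e.Rating),
                Count = g.Sum(e => e.Count)
            })
            .ToList();
    }
}
```

Each resulting entry carries both the Foo's date and the summed ratings, so the list can be sorted on either.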

EisenbergEffect

Sep 3, 2011, 8:29:44 PM
to ravendb
Thanks everyone for the feedback! Here's what I ended up doing. It
seems to work well for my scenario. I have the basic Foo document:

    public class Foo {
        string Id;
        string DisplayName;
        ...more properties here...
    }

Then I have an OpinionOfFoo document for each User who rates or
favorites a Foo. It looks like this:

    public class OpinionOfFoo {
        string Foo;
        string User;
        int Rating;
        bool IsFavorite;
    }

Next, I defined an aggregate of opinions called FooConcensus, which
looks like this:

    public class FooConcensus {
        string Foo;
        int NumberOfFavorites;
        int NumberOfRatings;
        float AverageRating;
    }

with a Map/Reduce over OpinionOfFoo that outputs the FooConcensus. It
looks like this:

    Map = opinions => from opinion in opinions
                      select new {
                          Foo = opinion.Foo,
                          NumberOfFavorites = opinion.IsFavorite ? 1 : 0,
                          NumberOfRatings = opinion.Rating != 0 ? 1 : 0,
                          AverageRating = (float)opinion.Rating
                      };

    Reduce = results => from result in results
                        group result by result.Foo
                        into g
                        select new {
                            Foo = g.Key,
                            NumberOfFavorites = g.Sum(x => x.NumberOfFavorites),
                            NumberOfRatings = g.Sum(x => x.NumberOfRatings),
                            AverageRating = g.Where(x => x.AverageRating != 0).Average(x => x.AverageRating)
                        };
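
One thing that may be worth checking with a reduce like this: if the database reduces in batches and then feeds those partial results through the same reduce again (assumed here, not confirmed in this thread), averaging the partial averages weights every batch equally regardless of how many ratings it held. A hedged alternative, sketched in memory below, carries a running total and derives the average from total and count. `PartialConcensus`, `TotalRating` and `ConcensusSketch` are hypothetical names, not part of the design described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PartialConcensus
{
    public string Foo { get; set; }
    public int NumberOfRatings { get; set; }
    public int TotalRating { get; set; } // hypothetical extra field

    // Derived on read, never stored or re-averaged.
    public double AverageRating =>
        NumberOfRatings == 0 ? 0.0 : (double)TotalRating / NumberOfRatings;
}

public static class ConcensusSketch
{
    // Map: one partial per opinion; a Rating of 0 means "not rated".
    public static PartialConcensus Map(string foo, int rating) =>
        new PartialConcensus
        {
            Foo = foo,
            NumberOfRatings = rating != 0 ? 1 : 0,
            TotalRating = rating
        };

    // Reduce only sums, so it gives the same answer whether it runs over
    // raw map output or over its own earlier output.
    public static IEnumerable<PartialConcensus> Reduce(
        IEnumerable<PartialConcensus> results) =>
        results.GroupBy(r => r.Foo).Select(g => new PartialConcensus
        {
            Foo = g.Key,
            NumberOfRatings = g.Sum(x => x.NumberOfRatings),
            TotalRating = g.Sum(x => x.TotalRating)
        });
}
```

For ratings 5, 5 and 1 split into a batch of two and a batch of one, summing gives 11 over 3 ratings, whereas averaging the two batch averages (5 and 1) would give 3. A Foo with favorites but no ratings also yields 0 here rather than an average over an empty sequence.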

Now I basically have two types of queries. The first type queries
based on the indexed statistics and includes the Foo document. An
example of that would be "GetTopFoo" and would look like this:

    var concensuses = session.Query<FooConcensus, OpinionOfFoo_Concensus>()
        .Include<FooConcensus>(x => x.Foo)
        .OrderByDescending(x => x.NumberOfFavorites)
        .ThenByDescending(x => x.AverageRating)
        .Take(5);

The second type queries based on Foo itself, such as "GetNewestFoo",
but needs to include the statistics. This can't be done with a call to
Include (as far as I know), so I use two queries like so:

    var query = session.Query<Foo>()
        .OrderByDescending(x => x.CreatedAt)
        .Take(5)
        .ToList();

    var concensuses = session.Advanced.LuceneQuery<FooConcensus, OpinionOfFoo_Concensus>()
        .WhereContains("Foo", query.Select(x => x.Id))
        .ToList();

I then join these together in memory on Id.
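
The in-memory join can be sketched roughly as follows, assuming both result lists are already loaded. `FooWithStats` and `OverviewJoin` are illustrative names only; a Foo with no index entry gets an all-zero consensus, and the order of the Foo query is preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Id { get; set; }
}

public class FooConcensus
{
    public string Foo { get; set; }
    public int NumberOfFavorites { get; set; }
    public int NumberOfRatings { get; set; }
    public float AverageRating { get; set; }
}

// Hypothetical pairing of a Foo with its statistics.
public class FooWithStats
{
    public Foo Foo { get; set; }
    public FooConcensus Concensus { get; set; }
}

public static class OverviewJoin
{
    public static List<FooWithStats> Join(
        IEnumerable<Foo> foos, IEnumerable<FooConcensus> concensuses)
    {
        // Index the statistics by the id of the Foo they describe.
        var byFooId = concensuses.ToDictionary(c => c.Foo);

        // Walk the Foos in their query order; unrated Foos get zeroed stats.
        return foos.Select(foo => new FooWithStats
        {
            Foo = foo,
            Concensus = byFooId.TryGetValue(foo.Id, out var found)
                ? found
                : new FooConcensus { Foo = foo.Id }
        }).ToList();
    }
}
```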

I ended up encapsulating both types of queries in a repository so that
I could hide the details of how the different scenarios were handled
and then just expose the results in a common format. To do that, I
defined a class called FooOverview, which looks like this:

    public class FooOverview {
        FooConcensus concensus;

        public Foo Foo { get; set; }

        public FooConcensus Concensus {
            get {
                return concensus ?? (concensus = new FooConcensus {
                    NumberOfFavorites = 0,
                    NumberOfRatings = 0,
                    AverageRating = 0
                });
            }
            set { concensus = value; }
        }

        public FooOpinion Opinion { get; set; }
    }

I'm using some fanciness in the getter just in case the index returns
nothing because the particular Foo hasn't been rated or favorited.
Also, FooOpinion is used only in certain cases where I need to load
the current user's opinion of the Foo. This doesn't ever happen for
lists, only for single details screens.

Finally, I finagled things a bit more so that I could use Lazy queries
whenever possible. Basically, most screens will have one round trip to
the database if they are querying on the index, or two round trips if
they have to query the Foo first. That seemed pretty reasonable.

Let me know if any of this is a really bad idea. So far, it seems to
meet my needs and it's definitely better than what I started with. I
can update favorites and ratings without any contention, the
aggregates are handled by the database and with a small number of
queries, I can obtain the precise information I need for several
varying scenarios.

Ayende Rahien

Sep 4, 2011, 12:38:38 AM
to rav...@googlegroups.com
Matt,
We can't support this; we have no way of notifying the index that the associated reference has changed.

Ayende Rahien

Sep 4, 2011, 12:41:10 AM
to rav...@googlegroups.com
Frank,
That is a... very interesting approach for the multiple type thing, and probably the one that makes the most sense to me.
Providing multiple Map clauses is actually fairly easy, and assuming that all map clauses generate the same type (which we now enforce), this wouldn't be hard to implement at ALL.
_Very_ well done, I would put it on the same level of having the idea for Includes.

Ayende Rahien

Sep 4, 2011, 12:43:19 AM
to rav...@googlegroups.com
Rob,
This looks more than fine, yes.
One comment on the use of LuceneQuery: you can just use Query with the In() support instead.

Ayende Rahien

Sep 4, 2011, 12:38:00 AM
to rav...@googlegroups.com
a) Why the aversion to loading the entire document? It would be cheap and easy to do so.
b) Patching is a possibility here, but I would go with simple document modification instead.

Carlos Mendes

Sep 4, 2011, 9:30:43 AM
to rav...@googlegroups.com
I was just wondering if it wouldn't be expensive to get the whole document to perform the changes in a scenario with a high number of ratings per item.

Ayende Rahien

Sep 4, 2011, 9:44:18 AM
to rav...@googlegroups.com
Carlos,
You have to define expensive.
There is a small likelihood of hotspots, which encourages simpler designs.
If you do expect hotspots, patching is what will deal with them.

Frank Schwieterman

Sep 4, 2011, 12:45:17 PM
to rav...@googlegroups.com
That's good to hear. That's awesome you already have it in the latest build.

Matt Warren

Sep 4, 2011, 4:25:02 PM
to ravendb
Frank,
Wow, yeah that's a really nice solution to the problem. You've got a
knack for coming up with these elegant solutions!!

Matt Warren

Sep 4, 2011, 4:26:53 PM
to ravendb
Oh I see, you get the latest eTag of all the docs passed into the
Index Select statement, so if it could pull in "newer" docs it would be
a problem.

Anyway, Frank's solution is much better and would cover more scenarios.

On Sep 4, 5:38 am, Ayende Rahien <aye...@ayende.com> wrote:
> Matt,
> We can't support this, we have no way of notifying the index that the
> associated reference have changed.
>