Index for a user's replies


Mircea Chirea

Aug 19, 2012, 6:29:14 PM
to rav...@googlegroups.com
I need a way to load a user's replies, including information about the post they're on. In the SQL world I'd have several joins which would bring in all the information, but here I'm stumped. I am not sure how to do the query without issuing a bazillion queries to load the posts, which could add up to a significant number if the user posts a stupid comment that draws lots of replies. I suspect a Map/Reduce (probably multi-map) index would need to be created to process the replies, so that I could simply query them by user ID. This is the basic idea: http://codepaste.net/mp8x1p

I also need to get a count of how many unread replies there are for a particular user. That is a much simpler query: http://codepaste.net/m8fxp9 (and just do a Count() of the results).
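The unread count itself is just a filter followed by Count(). The in-memory sketch below shows that shape, assuming a hypothetical reply record that stores the id of the user being replied to and a read flag; neither field comes from the linked model classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reply shape: ReplyToUserId and IsRead are assumptions for
// illustration; the thread's actual model classes were only linked.
public record Reply(string Id, string ReplyToUserId, bool IsRead);

public static class UnreadReplies
{
    // Count the replies addressed to one user that have not been read yet.
    public static int CountUnread(IEnumerable<Reply> replies, string userId) =>
        replies.Count(r => r.ReplyToUserId == userId && !r.IsRead);
}
```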

A comment doesn't store its replies directly; for each reply it stores a Ref class (which at the moment only has an Id property), because I need every comment to be addressable separately by an ID. The same applies to posts (cmp.First is a reference class).



Note: I have no idea whether those queries would even run on RavenDB; I suspect they will, albeit not terribly efficiently. I made them up to get the idea across.
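To make the load concern concrete, here is a minimal in-memory sketch of the Ref approach, assuming hypothetical Comment and Ref records where Ref carries only an Id. Each dictionary lookup stands in for a separate document load, so gathering a user's replies costs one load per reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes: a Ref carries only the referenced comment's Id.
public record Ref(string Id);
public record Comment(string Id, string PosterId, List<Ref> Replies);

public static class RefLoading
{
    // Resolve each reply reference of each comment. Every lookup stands in for
    // a separate document load, so loads grow with the number of replies.
    public static (List<Comment> Replies, int Loads) LoadReplies(
        IEnumerable<Comment> userComments, IReadOnlyDictionary<string, Comment> store)
    {
        var replies = new List<Comment>();
        var loads = 0;
        foreach (var comment in userComments)
        {
            foreach (var reference in comment.Replies)
            {
                loads++; // one round trip per referenced reply
                if (store.TryGetValue(reference.Id, out var reply))
                    replies.Add(reply);
            }
        }
        return (replies, loads);
    }
}
```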

Mircea Chirea

Aug 19, 2012, 6:44:47 PM
to rav...@googlegroups.com
For reference, if I were to do this without an index, I would do this: http://codepaste.net/xfcesg

Oren Eini (Ayende Rahien)

Aug 20, 2012, 8:58:56 AM
to rav...@googlegroups.com
There is no relation between having a unique id for a reply and having a separate document.
How many replies do you expect to have?
Why are you splitting them apart like this?

And is there a reason you can't just query for the reply based on the user id?

Mircea Chirea

Aug 20, 2012, 9:03:19 AM
to rav...@googlegroups.com
Well, how do I get a unique ID without having a separate document? I kinda need a way to load all comments, whether they are replies or not, and sort them on various properties, as well as search within them.
I expect to have at most 50 replies to any single comment.
I am splitting them apart because I need to address comments independently.

I cannot query replies directly based on user ID, because I need the replies to all of a user's comments. So if users/5 has five comments, each with a few replies, I need all of those replies.

Oren Eini (Ayende Rahien)

Aug 20, 2012, 9:08:11 AM
to rav...@googlegroups.com
Well, look at how we do it in RaccoonBlog: we have a numeric id that is unique within the scope of a single document.
So each comment has an id that is valid within the doc, and the full "comment id" includes the doc id as well.

What are your aggregates, and what are the ops you need to support?
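A minimal sketch of the per-document numbering described above, assuming a hypothetical document class that keeps its own counter. The `/comments/` separator and all member names are illustrative, not RaccoonBlog's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical document that owns its comments and a counter that is only
// unique within this one document.
public class CommentsDocument
{
    public string Id { get; }
    public int LastCommentNumber { get; private set; }
    public List<(int Number, string Text)> Comments { get; } = new();

    public CommentsDocument(string id) => Id = id;

    // Store a comment under the next per-document number and return its full
    // id: the document id plus that number (separator chosen for illustration).
    public string AddComment(string text)
    {
        var number = ++LastCommentNumber;
        Comments.Add((number, text));
        return $"{Id}/comments/{number}";
    }
}
```

Two different documents can both hand out number 1; the document-id prefix is what keeps the full ids distinct.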

Mircea Chirea

Aug 20, 2012, 10:37:43 AM
to rav...@googlegroups.com
I kinda need all comments to have unique ids, whether they're a reply to another or not. All of them. The JS code and default page sorting depend on this, and it would be a lot of work to rewrite them.

My aggregates are User, Comment, Comparison. Each comparison has two topics; each topic within a comparison has comments; a user has several comments and needs to be notified of new replies; I want to show the user where the reply was posted, so I need a join with Comparison (the topic's name is denormalized).

These are the relevant model classes: http://codepaste.net/d6bafj

So I need to get all comments which are a reply to user/1234's comments (ReplyTo is not null), including the comparison they're posted on (ids and names of both topics), as well as the information of the user who wrote the reply (this is denormalized in the User.Ref class, so not an issue).
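In SQL terms this is a self-join on ReplyTo plus a join to Comparison. The in-memory LINQ sketch below spells that out, assuming hypothetical flat record shapes (the real classes were only linked); it shows the result being asked for, not how a RavenDB index would compute it.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat shapes; the real model classes were only linked in the thread.
public record Comparison(string Id, string FirstTopic, string SecondTopic);
public record FlatComment(string Id, string PosterId, string? ReplyTo, string ComparisonId);
public record ReplyView(string ReplyId, string ReplierId, string ComparisonId,
                        string FirstTopic, string SecondTopic);

public static class ReplyQueries
{
    // Replies (ReplyTo != null) whose parent comment was written by userId,
    // joined with the comparison they were posted on.
    public static List<ReplyView> RepliesToUser(
        IEnumerable<FlatComment> comments, IEnumerable<Comparison> comparisons, string userId)
    {
        var all = comments.ToList();
        var userCommentIds = all.Where(c => c.PosterId == userId)
                                .Select(c => c.Id)
                                .ToHashSet();
        return (from reply in all
                where reply.ReplyTo != null && userCommentIds.Contains(reply.ReplyTo)
                join cmp in comparisons on reply.ComparisonId equals cmp.Id
                select new ReplyView(reply.Id, reply.PosterId, cmp.Id,
                                     cmp.FirstTopic, cmp.SecondTopic))
               .ToList();
    }
}
```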

Oren Eini (Ayende Rahien)

Aug 20, 2012, 3:49:00 PM
to rav...@googlegroups.com
inline

On Mon, Aug 20, 2012 at 5:37 PM, Mircea Chirea <chirea...@gmail.com> wrote:
> I kinda need all comments to have unique ids, whether they're a reply to another or not. All of them. The JS code and default page sorting depend on this, and it would be a lot of work to rewrite them.

As I said, you don't need to have a separate doc for each reply to give them unique ids.

> My aggregates are User, Comment, Comparison. Each comparison has two topics; each topic within a comparison has comments; a user has several comments and needs to be notified of new replies; I want to show the user where the reply was posted, so I need a join with Comparison (the topic's name is denormalized).
>
> These are the relevant model classes: http://codepaste.net/d6bafj
>
> So I need to get all comments which are a reply to user/1234's comments (ReplyTo is not null), including the comparison they're posted on (ids and names of both topics), as well as the information of the user who wrote the reply (this is denormalized in the User.Ref class, so not an issue).


Model comments like this:

public class Comment
{
    // ... other comment fields
    public Comment[] Replies;
}

Otherwise, you are going to have to do two queries to do this, and you probably don't want to go there.
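A sketch of what the nested model buys, assuming a hypothetical NestedComment class with an embedded Replies array: one loaded tree can be flattened (the root first, then every descendant) and searched for replies to a given user's comments without a second query. This is plain C#, not RavenDB's index code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical nested shape: replies are embedded in their parent comment,
// so loading the containing document brings the whole tree along.
public class NestedComment
{
    public string Id { get; init; } = "";
    public string PosterId { get; init; } = "";
    public NestedComment[] Replies { get; init; } = Array.Empty<NestedComment>();
}

public static class CommentTree
{
    // Depth-first walk that yields the comment itself, then all its descendants.
    public static IEnumerable<NestedComment> Flatten(NestedComment root)
    {
        yield return root;
        foreach (var reply in root.Replies)
            foreach (var descendant in Flatten(reply))
                yield return descendant;
    }

    // Direct replies to any comment written by userId, found in memory.
    public static IEnumerable<NestedComment> RepliesTo(
        IEnumerable<NestedComment> topLevel, string userId) =>
        topLevel.SelectMany(c => Flatten(c))
                .Where(c => c.PosterId == userId)
                .SelectMany(c => c.Replies);
}
```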
 

Kijana Woodard

Aug 20, 2012, 4:18:19 PM
to rav...@googlegroups.com
I took a peek at RaccoonBlog:

How is Raccoon avoiding race conditions on comments.GenerateNewCommentId()?
Are BackgroundTasks processed single-threaded?

Mircea Chirea

Aug 20, 2012, 5:11:01 PM
to rav...@googlegroups.com

> On Mon, Aug 20, 2012 at 5:37 PM, Mircea Chirea <chirea...@gmail.com> wrote:
>> I kinda need all comments to have unique ids, whether they're a reply to another or not. All of them. The JS code and default page sorting depend on this, and it would be a lot of work to rewrite them.
>
> As I said, you don't need to have a separate doc for each reply to give them unique ids.

I can't figure out how to give each reply a globally unique id; globally unique meaning no other document has the same ID, not just unique within one comment. That way I can easily load all of them, reply or not, and make sure the rest of the code is oblivious to their status. Unless, of course, you have a better idea; since I'm rewriting almost all the database access code, I can rewrite it any way I want.
 

> Model comments like this:
>
> public class Comment
> {
>     // ... other comment fields
>     public Comment[] Replies;
> }
>
> Otherwise, you are going to have to do two queries to do this, and you probably don't want to go there.

Right. That wouldn't be a problem assuming the ID issue has an easy fix.
 
The thing is, how do I query all comments? I was thinking of this index, but `Hierarchy` is not found:

Map = comments => from c in comments
                  let all = Hierarchy(c, "Replies").Union(new[] { c })
                  from x in all
                  select x;

Index(c => c.Text, FieldIndexing.Analyzed);
Index(c => c.Poster.Email, FieldIndexing.NotAnalyzed);
Index(c => c.Poster.DisplayName, FieldIndexing.NotAnalyzed);

Mircea Chirea

Aug 20, 2012, 5:19:12 PM
to rav...@googlegroups.com
Also, to query replies, I was thinking of something like this, if it makes sense: http://codepaste.net/zqotnm

Oren Eini (Ayende Rahien)

Aug 20, 2012, 6:18:07 PM
to rav...@googlegroups.com
This is guaranteed to always be serialized, because the change is in the scope of a single document.

Oren Eini (Ayende Rahien)

Aug 20, 2012, 6:19:11 PM
to rav...@googlegroups.com
inline

On Tue, Aug 21, 2012 at 12:11 AM, Mircea Chirea <chirea...@gmail.com> wrote:

>> On Mon, Aug 20, 2012 at 5:37 PM, Mircea Chirea <chirea...@gmail.com> wrote:
>>> I kinda need all comments to have unique ids, whether they're a reply to another or not. All of them. The JS code and default page sorting depend on this, and it would be a lot of work to rewrite them.
>>
>> As I said, you don't need to have a separate doc for each reply to give them unique ids.
>
> I can't figure out how to give each reply a globally unique id; globally unique meaning no other document has the same ID, not just unique within one comment. That way I can easily load all of them, reply or not, and make sure the rest of the code is oblivious to their status. Unless, of course, you have a better idea; since I'm rewriting almost all the database access code, I can rewrite it any way I want.

a) Use a GUID.
b) Use a number that is unique within the scope of a document and append it to the doc id. Done.
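A sketch of the two options, assuming a hypothetical `/comments/` separator for b): a GUID is unique on its own but carries no pointer to its document, whereas a composite id can be split back into the document id that needs to be loaded.

```csharp
using System;

public static class CommentIds
{
    // Hypothetical separator between the document id and the per-document number.
    private const string Separator = "/comments/";

    // Option a): globally unique by itself, but it does not say which document
    // holds the comment, so locating it needs some other lookup.
    public static string NewGuidId() => Guid.NewGuid().ToString();

    // Option b): split "<document id>/comments/<number>" back into its parts so
    // the containing document can be loaded directly by its id.
    public static (string DocumentId, int Number) Parse(string commentId)
    {
        var at = commentId.LastIndexOf(Separator, StringComparison.Ordinal);
        if (at < 0)
            throw new FormatException($"Not a composite comment id: {commentId}");
        var number = int.Parse(commentId.Substring(at + Separator.Length));
        return (commentId.Substring(0, at), number);
    }
}
```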

 
 
> The thing is, how do I query all comments? I was thinking of this index, but `Hierarchy` is not found:
>
> Map = comments => from c in comments
>                   let all = Hierarchy(c, "Replies").Union(new[] { c })
>                   from x in all
>                   select x;

Recurse(comment, x=>x.Replies)
 

Oren Eini (Ayende Rahien)

Aug 20, 2012, 6:19:41 PM
to rav...@googlegroups.com
Way too complex; look at Recurse instead.

Mircea Chirea

Aug 20, 2012, 7:16:04 PM
to rav...@googlegroups.com
Alright, so that becomes:

from c in comments
from x in Recurse(c, r => r.Replies)
select x;

Would Recurse also include the source, i.e. the `c` in `from c in comments`?
Or do I need to add it to the Recurse result myself? I couldn't find documentation for the function.

Also, what about the Comments_Replies index (http://codepaste.net/zqotnm)? Is that alright?

Oren Eini (Ayende Rahien)

Aug 21, 2012, 3:15:53 AM
to rav...@googlegroups.com
Yes, that is good.
But I am not sure about the index; it seems to imply that a Comparison contains all comments and replies in a single doc.
Just to verify, we expect there to be a small number (< 1,000) of those, right?

Mircea Chirea

Aug 21, 2012, 4:10:30 AM
to rav...@googlegroups.com
Yes, a comparison contains all comments inside a single doc. Most won't have more than 100 comments; some will have quite a few, but still fewer than 1,000. I expect only a handful to have lots of comments, and I don't think a few big docs would be a problem.

Kijana Woodard

Sep 5, 2012, 11:17:45 PM
to rav...@googlegroups.com