Bug on Map/Reduce over "inner" entities of a document? (build 910 and 888)


Bruno Lopes

May 6, 2012, 11:02:14 PM
to rav...@googlegroups.com
Hello all,

I think I've hit some non-intuitive behaviour in RavenDB and I'd like some help figuring it out.
When I store a document and create an index that counts over a given inner property of that document, the counts accumulate every time the document is changed, instead of being recalculated properly.

The model is below at [1], and the index is at [2]. Project instances are entirely saved as an aggregate root, and I index over tasks per user to get a count.

I understand that it might make sense to store each entity separately, but I was building this as a sample for a presentation, to guide some discussion around different ways of modelling data, and this shape might make sense in some domains. Also, my impression is that the same problem would cause issues with models like tag counts...

I've created a console application using an in-memory store that can serve as a test with build 910 (I've also seen this on 888). It's up at https://gist.github.com/2625592.

I've also tried de-normalizing the owner into ownerid and ownername in case that was the issue, but the error remains.

My feeling is that it's a bug, but I'm not positive.

Ideas?


[1]-
    public class Project
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public IList<Activity> Activities { get; set; }

        public Project()
        {
            Activities = new List<Activity>();
        }
    }

    public class Activity
    {
        public string Name { get; set; }
        public Person Owner { get; set; }
        public IList<Task> Tasks { get; set; }

        public Activity()
        {
            Tasks = new List<Task>();
        }
    }

    public class Task
    {
        public string Name { get; set; }
        public Person Owner { get; set; }
        public bool Done { get; set; }
    }

    public class Person
    {
        public string Id { get; set; }
        public string Name { get; set; }
    }

[2]-
    public class TasksCount_ForPerson : AbstractIndexCreationTask<Project, TasksCount_ForPerson.Result>
    {
        public class Result
        {
            public Person Owner { get; set; }
            public int Count { get; set; }
        }

        public TasksCount_ForPerson()
        {
            Map = projects =>
                projects
                .SelectMany(p => p.Activities)
                .SelectMany(a => a.Tasks)
                    .Select(task =>
                            new
                            {
                                task.Owner,
                                Count = 1
                            }
                    );

            Reduce =
                results => results
                    .GroupBy(g => g.Owner.Id)
                    .Select(p => new
                    {
                        Owner = p.Select(r => r.Owner).First(),
                        Count = p.Sum(result => result.Count)
                    });
        }
    }


Oren Eini (Ayende Rahien)

May 7, 2012, 1:37:29 PM
to rav...@googlegroups.com
Okay, I found the issue.
It is a bug in the way we are handling getting the document id in complex scenarios.

The problem is here:

Map = projects =>
    projects
        .SelectMany(p => p.Activities)
        .SelectMany(a => a.Tasks)
        .Select(task =>
            new
            {
                OwnerId = task.Owner.Id,
                Count = 1
            }
        );


Which we re-write to be:

Map = projects =>
    projects
        .SelectMany(p => p.Activities)
        .SelectMany(a => a.Tasks)
        .Select(task =>
            new
            {
                __document_id = task.__document_id,
                OwnerId = task.Owner.Id,
                Count = 1
            }
        );

Obviously, task doesn't have this property. 

The bug is that we didn't properly detect and report an error in this case.

You can fix the index by using:

Map = projects => from project in projects
                  from task in project.Activities.SelectMany(a => a.Tasks)
                  select new
                  {
                      OwnerId = task.Owner.Id,
                      Count = 1
                  };
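
For readers who want to check the expected counts without a RavenDB instance, here is a minimal sketch that runs the same map shape with plain LINQ to Objects. The model classes are trimmed copies of the ones in [1]; the TaskCountSketch class and its CountTasksPerOwner method are hypothetical names used only for illustration, and the reduce step grouping on OwnerId is an assumption about how the Reduce would be adapted once the map emits OwnerId instead of Owner.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copies of the model from [1] in the original post.
public class Person { public string Id { get; set; } public string Name { get; set; } }
public class Task { public string Name { get; set; } public Person Owner { get; set; } }
public class Activity { public List<Task> Tasks { get; set; } = new List<Task>(); }
public class Project { public List<Activity> Activities { get; set; } = new List<Activity>(); }

// Hypothetical helper, not part of RavenDB: it only mimics the index in memory.
public static class TaskCountSketch
{
    public static Dictionary<string, int> CountTasksPerOwner(IEnumerable<Project> projects)
    {
        // Map: same shape as the suggested fix. The outer "from project in projects"
        // keeps the root document as the first range variable.
        var mapped = from project in projects
                     from task in project.Activities.SelectMany(a => a.Tasks)
                     select new { OwnerId = task.Owner.Id, Count = 1 };

        // Reduce (assumed adaptation): group on the flat OwnerId key and sum the counts.
        return mapped
            .GroupBy(r => r.OwnerId)
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Count));
    }
}
```

Because this recomputes from the current state of the projects on every call, it gives the per-owner totals the index is expected to show after a document is changed and re-saved, which makes it a handy reference value when testing the real index.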

Bruno Lopes

May 7, 2012, 4:47:32 PM
to rav...@googlegroups.com
Okay, thanks, that fixed it.

Out of curiosity, was I supposed to be able to add Sort(t => t.Owner.Name, SortOptions.String) to the index and then to do .OrderBy(r => r.Owner.Name) on a linq query?

Attempting it fails with "Additional information: The field 'Owner_Name' is not indexed, cannot sort on fields that are not indexed", which makes sense, but I was hoping the linq statement would end up being translated successfully (Owner.Name becoming Owner_Name).


Oren Eini (Ayende Rahien)

May 7, 2012, 4:54:08 PM
to rav...@googlegroups.com
No, that won't work with the output you had.
Owner would get stored as a single item, not as an object whose properties can be sorted on.
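
One possible workaround, not confirmed anywhere in this thread, is to emit the owner as flat fields so that the name exists as its own indexed field. The sketch below assumes the model classes from [1], the RavenDB client library, and that SortOptions lives in Raven.Abstractions.Indexing in these builds; the index name TasksCount_ForPersonFlat and the OwnerId/OwnerName fields are hypothetical.

```csharp
using System.Linq;
using Raven.Abstractions.Indexing; // SortOptions (namespace is an assumption for builds 888/910)
using Raven.Client.Indexes;        // AbstractIndexCreationTask

// Project, Activity, Task and Person are the classes from [1] in the original post.
public class TasksCount_ForPersonFlat
    : AbstractIndexCreationTask<Project, TasksCount_ForPersonFlat.Result>
{
    public class Result
    {
        public string OwnerId { get; set; }
        public string OwnerName { get; set; }
        public int Count { get; set; }
    }

    public TasksCount_ForPersonFlat()
    {
        // Map: start from the root document, as in the fix above,
        // and project the owner as two flat fields.
        Map = projects => from project in projects
                          from task in project.Activities.SelectMany(a => a.Tasks)
                          select new
                          {
                              OwnerId = task.Owner.Id,
                              OwnerName = task.Owner.Name,
                              Count = 1
                          };

        // Reduce: must output the same shape as the map.
        Reduce = results => from result in results
                            group result by result.OwnerId into g
                            select new
                            {
                                OwnerId = g.Key,
                                OwnerName = g.Select(r => r.OwnerName).First(),
                                Count = g.Sum(r => r.Count)
                            };

        // OwnerName is now a top-level field, so a sort can be registered on it.
        Sort(r => r.OwnerName, SortOptions.String);
    }
}
```

With this shape a query would order by r.OwnerName rather than r.Owner.Name. Whether it behaves as hoped on builds 888 and 910 has not been verified here.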