Better support for embedded Arrays and better documentation on it.


Thomas Burkhart

Jul 4, 2016, 10:48:31 AM
to mongodb-csharp
Hi,

So far I'm using two collections in my project, with one referencing objects in the other through a foreign key. I would really like to combine them into one collection, but it looks like this is really cumbersome with the current driver.
I joined the MongoDB University class hoping that it would shed some light on this topic, but was greatly disappointed.

I also asked about this on Stack Overflow but have not gotten satisfying answers yet, although one user tried hard to help me; in the end his solution did not completely work either. So it would be really helpful if you could help me sort out this issue. I really wonder how other users deal with this.


I also have some doubt whether it's really a good idea to embed arrays when I look at how much is needed to get the desired result.

Looking at the explanation of the WiredTiger storage engine, I have even more doubts about whether embedding an array makes sense if you plan to add elements repeatedly, as WiredTiger does not provision extra space for documents but instead creates a new copy when a document's size increases.

I find it really confusing, and in a way contradictory, to recommend embedding data on the one hand but to offer so little support for accessing embedded arrays on the other.

In my case I have a daily growing array meaning after 10 years I will have 3650 elements in the array. Does the server always load the whole document into memory when adding an element?

MongoDB promises to be the solution for today's applications, but I feel that one-to-many relations are not really well supported, especially as there are no transactions that involve more than one document.

Any help with these topics would be really great, because I don't know whether it makes sense to keep trying to get my data into one collection or just stick to the design with two collections. Beyond my app, documenting this topic might be helpful to other developers.

Thanks
Thomas   


 

Craig Wilson

Jul 11, 2016, 11:07:27 PM
to mongodb-csharp
Hi Thomas,

It seems like your Stack Overflow post had a lot of good answers in there regarding how to do things. I'm sorry you are having some trouble getting exactly what you want. I'll try to answer some of your meta questions, but won't get into specifics as none were asked here.

Ultimately, you are asking about schema design, and schema design isn't straightforward in MongoDB. The answer "it depends" comes up a lot because there are many ways to model things and each one has inherent strengths and weaknesses. If you are fighting with a model one way, then perhaps that's not the best way to handle the problem. The MongoDB courses discuss schema design a lot and they do it well. There are also many resources on the internet related to it.

It seems like you are getting bogged down in the minutiae of implementing something in the .NET driver without having come up with a good schema design yet, and so you are fighting on two fronts (how to model and how to implement). I'd suggest you step back and evaluate your options for how to design your schema, taking into account things you mentioned like "my array will be growing daily." That's important and will likely inform your decision. Anything that can be done in MongoDB can be done in the .NET driver.

Regarding your Stack Overflow post, it seems like the answers provided by professor79 are accurate. What more do you need?

I know this probably isn't the answer you are wanting, but without getting specific, it'll be hard to help. At some point, the advice we give is the same as has been given many other places that can be read. 
Craig

Thomas Burkhart

Jul 12, 2016, 5:53:51 AM
to mongodb-csharp
Hi Craig,
 
You are completely right that this isn't the answer I was looking for:

1. The answers that professor79 gave me are great, except that they do not solve my last question: how to query a single element in an embedded array, a most basic operation in my opinion. It's quite easy to add, delete, or update elements in an embedded array, but not to query them again.

2. You did not answer my question related to the WiredTiger engine.

3. The solutions that professor79 came up with on SO are in no way how I would expect embedded arrays to be supported by the C# driver. I actually have to use native MongoDB query syntax to achieve the most basic CRUD operations.

4. It seems like overkill that I have to use unwind and aggregate just to access single elements of an embedded array. Is this performant at all?

5. "It depends" is exactly what is not helpful. In http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb MongoDB is even recommended for storing time series data, which would mean that documents are extended continuously. The university course covers a lot about design considerations and strongly recommends nested designs, but says little about implementing access to embedded arrays in C#.

6. I would like to have all data of one user in one document, so I have a clear idea of how my design should look, but does it make sense if I add a record per day? And even if so, does it make sense if it's so complicated to access the nested data from C#?

I assure you I spent hours searching the web for information on these topics but did not find satisfying answers, because most posts use the old drivers and not V2.

Best
Thomas  
 
 

Craig Wilson

Jul 12, 2016, 9:48:40 AM
to mongodb-csharp
> 1. The answers that professor79 gave me are great, except that they do not solve my last question: how to query a single element in an embedded array, a most basic operation in my opinion. It's quite easy to add, delete, or update elements in an embedded array, but not to query them again.

You use $elemMatch in your projection. https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/. However, this doesn't work in the aggregation framework, which is why you use the $unwind stage.
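
A rough way to picture what an $elemMatch projection hands back, modeled here with plain LINQ to Objects rather than the driver (nothing in this sketch talks to a server): the projected array holds at most the first element that satisfies the condition. The family data and the `GivenName` / `IsAlive` property names are made up for illustration. In the 2.x driver the matching builder is, as far as I know, `Builders<Family>.Projection.ElemMatch(...)`; check the driver reference for the exact overloads.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for one family document.
var family = new
{
    Name = "Burkhart",
    Children = new[]
    {
        new { GivenName = "Finn", IsAlive = true },
        new { GivenName = "Florentina", IsAlive = true },
    },
};

// $elemMatch in a projection keeps only the FIRST array element that matches,
// even when several elements satisfy the condition.
var projected = new
{
    family.Name,
    Children = family.Children.Where(c => c.IsAlive).Take(1).ToArray(),
};

Console.WriteLine(projected.Name + ": " + string.Join(", ", projected.Children.Select(c => c.GivenName)));
// prints "Burkhart: Finn"
```

Because only the first match survives, this projection suits "give me one specific child" lookups; returning every matching child needs a different tool, such as the aggregation pipeline.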

> 2. You did not answer my question related to the WiredTiger engine.

I believe this is a relevant link: https://groups.google.com/forum/#!topic/mongodb-user/nV-DWbgB2oM. The long and short of it is that WiredTiger does better with embedded arrays than MMAPv1.

> 3. The solutions that professor79 came up with on SO are in no way how I would expect embedded arrays to be supported by the C# driver. I actually have to use native MongoDB query syntax to achieve the most basic CRUD operations.

I don't understand your question. We've done our best to create a type-safe API to allow you to do almost everything that is possible with native syntax. Perhaps an example of something you can't do which is requiring you to fall back would be helpful.

> 4. It seems like overkill that I have to use unwind and aggregate just to access single elements of an embedded array. Is this performant at all?

It depends on what you are doing. The aggregation framework is always making performance improvements. How performant it is can only be determined by you and your requirements. A great many users use unwind.

> 5. "It depends" is exactly what is not helpful. In http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb MongoDB is even recommended for storing time series data, which would mean that documents are extended continuously. The university course covers a lot about design considerations and strongly recommends nested designs, but says little about implementing access to embedded arrays in C#.

I don't see a question here. There are many ways of storing time-series data. In the linked article, documents are not extended indefinitely. Each document could represent 1 minute, 1 hour, 1 day, etc. It then holds subdivided data at a finer increment. Once the next top-level increment happens (new minute, new hour, new day), a new document is created. So while there is a continuous update for a period of time, that will stop at some point.
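
The bucketing idea can be sketched as follows: instead of one ever-growing array per user, each document covers one period and a new document is started when the period rolls over. The calendar-month period, the `BucketId` helper, and the `userId:yyyy-MM` key format are all assumptions chosen for illustration, not something taken from the linked article.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: the _id of the bucket document a daily record belongs to.
static string BucketId(string userId, DateTime day) =>
    userId + ":" + day.ToString("yyyy-MM", CultureInfo.InvariantCulture);

// Ten years of daily records for one user (3650 entries, as in the question).
var start = new DateTime(2016, 7, 4);
var days = Enumerable.Range(0, 3650).Select(i => start.AddDays(i));

// Each group corresponds to one document; its array stops growing at month end.
var buckets = days.GroupBy(d => BucketId("thomas", d)).ToList();

Console.WriteLine(BucketId("thomas", start));   // prints "thomas:2016-07"
Console.WriteLine(buckets.Max(b => b.Count())); // prints "31"
```

With this layout no embedded array ever holds more than 31 elements, at the cost of reading several documents when a query spans more than one month.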

> 6. I would like to have all data of one user in one document, so I have a clear idea of how my design should look, but does it make sense if I add a record per day? And even if so, does it make sense if it's so complicated to access the nested data from C#?

There are many ways to think about this. I understand that your user is your aggregate root and it would be nice to have all the data of that aggregate root in a single document. If the design feels right, then go for it. If it doesn't, then perhaps your aggregate root is wrong. However, it's not any more complicated to access nested data in C# than it is in other languages or the shell. It seems your complaint lies not with the driver, but with the general syntax. If I'm hearing this incorrectly, please point out places where we can improve. We are always happy to receive constructive feedback.

> 7. I assure you I spent hours searching the web for information on these topics but did not find satisfying answers, because most posts use the old drivers and not V2.

This really shouldn't be a problem. I'm sorry there are fewer resources available. It's a new API and people haven't written about it as much. Here is the link to the documentation (which can always be improved): http://mongodb.github.io/mongo-csharp-driver/. However, the basic principles remain the same in whichever API you are using. Understand what you are trying to do in native MongoDB syntax, then translate that into C#.

Craig

Thomas Burkhart

Jul 12, 2016, 11:07:13 AM
to mongodb-csharp
Thanks for the explanation. Let me elaborate.

>> 1. The answers that professor79 gave me are great, except that they do not solve my last question: how to query a single element in an embedded array, a most basic operation in my opinion. It's quite easy to add, delete, or update elements in an embedded array, but not to query them again.

> You use $elemMatch in your projection. https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/. However, this doesn't work in the aggregation framework, which is why you use the $unwind stage.

        var projectionIsAlive = BsonDocument.Parse(
            "{_id:1, name:1, children:{$filter:{ input:'$children', as:'kids', cond:{$eq:['$$kids.IsAlive', true]}}}}");

        var kids = collection.Aggregate().Match(x => x.Id == f.Id).Project<Family>(projectionIsAlive).ToList();

This still does not return an array of kids. What do I need to change here?

 
>> 2. You did not answer my question related to the WiredTiger engine.

> I believe this is a relevant link: https://groups.google.com/forum/#!topic/mongodb-user/nV-DWbgB2oM. The long and short of it is that WiredTiger does better with embedded arrays than MMAPv1.

In the online course it was said that WiredTiger does not reserve space for growing documents, but that when a document is extended it copies the whole document to a new location. Therefore I was wondering whether it is a good choice if I will keep adding array elements.

>> 3. The solutions that professor79 came up with on SO are in no way how I would expect embedded arrays to be supported by the C# driver. I actually have to use native MongoDB query syntax to achieve the most basic CRUD operations.

> I don't understand your question. We've done our best to create a type-safe API to allow you to do almost everything that is possible with native syntax. Perhaps an example of something you can't do which is requiring you to fall back would be helpful.

Just compare:

    // Add child
    families.UpdateOne(
        Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
        Builders<Family>.Update.AddToSet("children",
            new Child() { dateOfBirth = new DateTime(2005, 4, 26), givenName = "Finn" }));

    // Add another
    families.UpdateOne(
        Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
        Builders<Family>.Update.AddToSet("children",
            new Child() { dateOfBirth = new DateTime(2007, 4, 26), givenName = "Florentina" }));

    // Remove one
    families.UpdateOne(
        Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
        Builders<Family>.Update.PullFilter(c => c.children, m => m.givenName == "Florentina"));

    // Update one; the [-1] index is how the driver expresses the positional operator ($)
    families.UpdateOne(
        Builders<Family>.Filter.Where(x => x.name == "Burkhart" && x.children.Any(c => c.givenName == "Finn")),
        Builders<Family>.Update.Set(x => x.children[-1].givenName, "Finn Linus"));




which is pretty straightforward using the API because it is supported by the builders, to

        // Youngest first; the sort key must name the unwound field ("children", not "kids")
        var sort = BsonDocument.Parse("{\"children.dateOfBirth\": -1}");
        var project = BsonDocument.Parse(
            "{_id:'$children._id', dateOfBirth:'$children.dateOfBirth', givenName:'$children.givenName', IsAlive:'$children.IsAlive'}");
        var aggregate = collection.Aggregate().Match(x => x.Id == f.Id)
            .Unwind("children").Sort(sort).Limit(1).Project<Child>(project);

        var result = aggregate.FirstOrDefault();

to access the youngest child. I would expect there to be an easier way to access and filter elements of embedded arrays.

 
>> 6. I would like to have all data of one user in one document, so I have a clear idea of how my design should look, but does it make sense if I add a record per day? And even if so, does it make sense if it's so complicated to access the nested data from C#?

> There are many ways to think about this. I understand that your user is your aggregate root and it would be nice to have all the data of that aggregate root in a single document. If the design feels right, then go for it. If it doesn't, then perhaps your aggregate root is wrong. However, it's not any more complicated to access nested data in C# than it is in other languages or the shell. It seems your complaint lies not with the driver, but with the general syntax. If I'm hearing this incorrectly, please point out places where we can improve. We are always happy to receive constructive feedback.

Yes, you understood my intention. OK, it might then be a problem with the general syntax, but it would be great if the C# driver made it easier to work with arrays.

>> 7. I assure you I spent hours searching the web for information on these topics but did not find satisfying answers, because most posts use the old drivers and not V2.

> This really shouldn't be a problem. I'm sorry there are fewer resources available. It's a new API and people haven't written about it as much. Here is the link to the documentation (which can always be improved): http://mongodb.github.io/mongo-csharp-driver/. However, the basic principles remain the same in whichever API you are using. Understand what you are trying to do in native MongoDB syntax, then translate that into C#.

Yes, I've read your documentation at the link you provided. Unfortunately I did not find anything about the usage of arrays. It would be great if this could be added.

I really don't want to be a pain in your neck; I'm just trying to understand. As said before, I have my app running now with separate collections, but it would be nice if I could change that.

Best
Thomas

Craig Wilson

Jul 15, 2016, 9:11:47 AM
to mongodb-csharp
Answers/Questions inline. 


On Tuesday, July 12, 2016 at 10:07:13 AM UTC-5, Thomas Burkhart wrote:
> Thanks for the explanation. Let me elaborate.

>>> 1. The answers that professor79 gave me are great, except that they do not solve my last question: how to query a single element in an embedded array, a most basic operation in my opinion. It's quite easy to add, delete, or update elements in an embedded array, but not to query them again.

>> You use $elemMatch in your projection. https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/. However, this doesn't work in the aggregation framework, which is why you use the $unwind stage.

>         var projectionIsAlive = BsonDocument.Parse(
>             "{_id:1, name:1, children:{$filter:{ input:'$children', as:'kids', cond:{$eq:['$$kids.IsAlive', true]}}}}");
>
>         var kids = collection.Aggregate().Match(x => x.Id == f.Id).Project<Family>(projectionIsAlive).ToList();

> This still does not return an array of kids. What do I need to change here?

I'm not sure what you are wanting out of this. The first file in this gist uses your code, as well as two other methods of generating the same thing without falling back to parsing JSON: https://gist.github.com/craiggwilson/60cbabbb6755617fe9cda7dfdd2611ff



 
>>> 2. You did not answer my question related to the WiredTiger engine.

>> I believe this is a relevant link: https://groups.google.com/forum/#!topic/mongodb-user/nV-DWbgB2oM. The long and short of it is that WiredTiger does better with embedded arrays than MMAPv1.

> In the online course it was said that WiredTiger does not reserve space for growing documents, but that when a document is extended it copies the whole document to a new location. Therefore I was wondering whether it is a good choice if I will keep adding array elements.


I'm sorry I didn't answer this satisfactorily. This isn't my area of expertise. Asking on the mongodb-user list would get better responses: https://groups.google.com/forum/?pli=1#!forum/mongodb-user. Perhaps focusing your question there on just WiredTiger will get some good responses.
 
 
>>> 3. The solutions that professor79 came up with on SO are in no way how I would expect embedded arrays to be supported by the C# driver. I actually have to use native MongoDB query syntax to achieve the most basic CRUD operations.

>> I don't understand your question. We've done our best to create a type-safe API to allow you to do almost everything that is possible with native syntax. Perhaps an example of something you can't do which is requiring you to fall back would be helpful.

> Just compare:

>     // Add child
>     families.UpdateOne(
>         Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
>         Builders<Family>.Update.AddToSet("children",
>             new Child() { dateOfBirth = new DateTime(2005, 4, 26), givenName = "Finn" }));
>
>     // Add another
>     families.UpdateOne(
>         Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
>         Builders<Family>.Update.AddToSet("children",
>             new Child() { dateOfBirth = new DateTime(2007, 4, 26), givenName = "Florentina" }));
>
>     // Remove one
>     families.UpdateOne(
>         Builders<Family>.Filter.Where(x => x.name == "Burkhart"),
>         Builders<Family>.Update.PullFilter(c => c.children, m => m.givenName == "Florentina"));
>
>     // Update one; the [-1] index is how the driver expresses the positional operator ($)
>     families.UpdateOne(
>         Builders<Family>.Filter.Where(x => x.name == "Burkhart" && x.children.Any(c => c.givenName == "Finn")),
>         Builders<Family>.Update.Set(x => x.children[-1].givenName, "Finn Linus"));




> which is pretty straightforward using the API because it is supported by the builders, to

>         // Youngest first; the sort key must name the unwound field ("children", not "kids")
>         var sort = BsonDocument.Parse("{\"children.dateOfBirth\": -1}");
>         var project = BsonDocument.Parse(
>             "{_id:'$children._id', dateOfBirth:'$children.dateOfBirth', givenName:'$children.givenName', IsAlive:'$children.IsAlive'}");
>         var aggregate = collection.Aggregate().Match(x => x.Id == f.Id)
>             .Unwind("children").Sort(sort).Limit(1).Project<Child>(project);
>
>         var result = aggregate.FirstOrDefault();

> to access the youngest child. I would expect there to be an easier way to access and filter elements of embedded arrays.


You are correct. In a type-safe API, it is very difficult to come up with a good way to work with an IEnumerable<T> as simply a T. We have a ticket open for this, but I'm struggling to find it. However, what you have here can already be done using LINQ or the aggregation API. The second file in this gist shows alternative ways of handling this as well as showing that yours also works.
 
 
>>> 6. I would like to have all data of one user in one document, so I have a clear idea of how my design should look, but does it make sense if I add a record per day? And even if so, does it make sense if it's so complicated to access the nested data from C#?

>> There are many ways to think about this. I understand that your user is your aggregate root and it would be nice to have all the data of that aggregate root in a single document. If the design feels right, then go for it. If it doesn't, then perhaps your aggregate root is wrong. However, it's not any more complicated to access nested data in C# than it is in other languages or the shell. It seems your complaint lies not with the driver, but with the general syntax. If I'm hearing this incorrectly, please point out places where we can improve. We are always happy to receive constructive feedback.

> Yes, you understood my intention. OK, it might then be a problem with the general syntax, but it would be great if the C# driver made it easier to work with arrays.

We'll have to put some thought into it. Given what you have found out, coupled with the gist I put up, I personally don't feel it's that difficult. Of course, I'm biased and, having written some of the driver, have some inside knowledge.
 
 
>>> 7. I assure you I spent hours searching the web for information on these topics but did not find satisfying answers, because most posts use the old drivers and not V2.

>> This really shouldn't be a problem. I'm sorry there are fewer resources available. It's a new API and people haven't written about it as much. Here is the link to the documentation (which can always be improved): http://mongodb.github.io/mongo-csharp-driver/. However, the basic principles remain the same in whichever API you are using. Understand what you are trying to do in native MongoDB syntax, then translate that into C#.

> Yes, I've read your documentation at the link you provided. Unfortunately I did not find anything about the usage of arrays. It would be great if this could be added.

We will work to update our documentation with more examples of handling embedded arrays. I've filed this ticket: https://jira.mongodb.org/browse/CSHARP-1711
 

> I really don't want to be a pain in your neck; I'm just trying to understand. As said before, I have my app running now with separate collections, but it would be nice if I could change that.
>
> Best
> Thomas


Totally understand. We are happy to help. 
Craig 

Thomas Burkhart

Jul 15, 2016, 9:42:59 AM
to mongodb-csharp
Hi Craig,

Thanks for your reply. Yes, your second gist file looks much nicer. The other one was the solution professor79 came up with.

But when looking at

    var aggQuery = families.Aggregate()
        .Match(x => x.Id == 2)
        .Unwind<Family, UnwoundFamily>(x => x.Children)
        .Limit(1)
        .Project(x => new { x.Child });
    //.Project(x => x.Child) wanted to do this, but seems like we have a bug

    Console.WriteLine(aggQuery);
    Console.WriteLine();

    foreach (var result in aggQuery.ToEnumerable())
    {
        Console.WriteLine(result);
    }

    var linqQuery = families.AsQueryable()
        .Where(x => x.Id == 2)
        .SelectMany(x => x.Children)
        .Take(1);

    Console.WriteLine(linqQuery);
    Console.WriteLine();

    foreach (var result in linqQuery)
    {
        Console.WriteLine(result);
    }



How do you get the youngest child in the first version? Limit only limits the result, doesn't it?

And in the second you just take one child, but not based on any criterion such as its name.

Craig Wilson

Jul 15, 2016, 10:12:04 AM
to mongodb-csharp
Oh, your sample had that as a sort, but didn't actually use it. I've updated the gist. Once the children are unwound, you just sort by the DateOfBirth in descending order.

.SelectMany(x => x.Children)
.OrderByDescending(x => x.DateOfBirth)
.Take(1);
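
To see what this operator chain selects, here is the same shape run with LINQ to Objects over a made-up in-memory family list. No driver is involved, and the names and dates are purely illustrative; against a real collection the chain would start from the driver's `AsQueryable()` instead, as in the gist.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data standing in for the families collection.
var families = new[]
{
    new
    {
        Id = 2,
        Children = new[]
        {
            new { Name = "Jane", DateOfBirth = new DateTime(2005, 4, 26) },
            new { Name = "John", DateOfBirth = new DateTime(2007, 4, 26) },
        },
    },
};

var youngest = families
    .Where(x => x.Id == 2)
    .SelectMany(x => x.Children)            // flatten: one item per child
    .OrderByDescending(x => x.DateOfBirth)  // latest birth date first
    .Take(1)
    .Single();

Console.WriteLine(youngest.Name); // prints "John"
```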


Another option is to always keep the array sorted when you push a new child onto the array. You can supply a sort function when you use $push (https://docs.mongodb.com/manual/reference/operator/update/sort/#up._S_sort) such that the youngest child is always the first element of the array.
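
What keeping the array sorted on every push amounts to can be modeled in memory like this. It is only a sketch of the semantics, with a hypothetical `PushSorted` helper and made-up data; on the server the `$sort` modifier has to be combined with `$each`, and in the 2.x driver this is, to my knowledge, expressed through `Builders<Family>.Update.PushEach(...)` and its optional `sort` argument (check the driver reference for the exact signature).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the embedded array (name, date of birth).
var children = new List<(string Name, DateTime DateOfBirth)>
{
    ("Finn", new DateTime(2005, 4, 26)),
};

// Model of $push with $each + $sort: append, then re-sort the whole array
// so the latest birth date (the youngest child) comes first.
void PushSorted((string Name, DateTime DateOfBirth) child)
{
    children.Add(child);
    children.Sort((a, b) => b.DateOfBirth.CompareTo(a.DateOfBirth));
}

PushSorted(("Florentina", new DateTime(2007, 4, 26)));

Console.WriteLine(children[0].Name); // prints "Florentina"
```

The trade-off is that every write pays for the sort, while reading the youngest child becomes a plain "first element" lookup.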

Let me know if you have other questions,
Craig

Thomas Burkhart

Jul 15, 2016, 10:46:25 AM
to mongodb-csharp
Yes, I was almost guessing that. You added the Sort in both versions, the one with Aggregate and the one with SelectMany.

What exactly is the difference now? Does SelectMany transfer the whole list of children to the client and do the ordering and Take there?


And how would I access just one child based on, e.g., its name?

Craig Wilson

Jul 15, 2016, 11:49:11 AM
to mongodb-csharp
They do exactly the same thing. SelectMany is a LINQ operator, which actually translates into both an $unwind and a $project, so technically, the LINQ version is doing a little more work. You can compare the console output to see the actual pipelines getting generated and see the difference.

You would access just one child based on their name by doing the same thing you did to access all the children who were alive. It can be done a couple of different ways, either similar to the first file in the gist or the second, depending on what other data you are wanting. In LINQ, following the second file, this would look like the below:

var linqQuery = families.AsQueryable()
.Where(x => x.Id == 2)
.SelectMany(x => x.Children)
.Where(x => x.Name == "Jane");

Craig Wilson

Jul 15, 2016, 12:02:53 PM
to mongodb-csharp
Oh, and no. Neither one does any of the work client-side. It's all done in the server. If we can't do all the work in the server, we'll throw an exception.

Thomas Burkhart

Jul 15, 2016, 12:04:06 PM
to mongodb-csharp
Thanks a lot!

And that would be executed on the server side?

Craig Wilson

Jul 15, 2016, 12:06:51 PM
to mongodb-csharp
Yep, everything is executed server side.

Thomas Burkhart

Jul 15, 2016, 12:24:07 PM
to mongodb-csharp
Sounds great! So LINQ is the way to go here.

Thanks a lot!