MongoDB, Merging objects in array

Primož Ratej

May 21, 2019, 5:22:15 AM
to mongodb-user

I'm kinda new to the whole NoSQL euphoria and am starting a project with MongoDB as the database.

Goal: Print a list of lodgings that have at least 3 grades, where most of the grades (half or more) are higher than 6. The grades are stored inside lodging.

Problem: I have an array of reviews. I would like to merge the review objects into a single array of grades, excluding the reviewer and date from each object.

Finishing up: Then I will push the grades that are $gt 5 (so 6 and more) to isGtArray and, vice versa, the rest to isLtArray. I can then check whether at least half of the grades really are 6 or higher, project that as isMostOfTheGradesMoreThan_6, and print the matching lodgings with $match. So far: the first part, finding lodgings that have at least 3 grades, is solved. Thanks in advance for any help.
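
The steps described above can be sketched in plain JavaScript, outside MongoDB, to pin down the intended logic before translating it into pipeline stages. This is only a sketch under assumptions: the names NON_GRADE_KEYS, MIN_GRADE, collectGrades and hasMostlyGoodGrades are hypothetical, "at least 3 grades" is read as "at least 3 reviews" (as the code in this post does), and a "good" grade is taken to be 6 or more.

```javascript
// Plain-JavaScript sketch of the intended logic (not MongoDB code).
// All names below are hypothetical and chosen for illustration only.
const NON_GRADE_KEYS = new Set(["reviewer", "date"]);
const MIN_GRADE = 6; // assumption: "greater than 5", i.e. 6 or more

// Flatten every review into one list of { name, value } grades,
// dropping the reviewer and date fields.
function collectGrades(reviews) {
  return reviews.flatMap((review) =>
    Object.entries(review)
      .filter(([key]) => !NON_GRADE_KEYS.has(key))
      .map(([name, value]) => ({ name, value }))
  );
}

// True when there are at least 3 reviews and half or more of all
// grades reach MIN_GRADE.
function hasMostlyGoodGrades(reviews) {
  const grades = collectGrades(reviews);
  const good = grades.filter((grade) => grade.value >= MIN_GRADE).length;
  return reviews.length >= 3 && good * 2 >= grades.length;
}
```

Calling hasMostlyGoodGrades(doc.lodging.reviews) on the sample document below would return false, because that document has only 2 reviews.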

My code so far:

db.lodging.aggregate([
{$unwind: "$lodging.reviews"},
{$group:{
    _id:"$guid",
    lodging: { $first: "$lodging"},
    reviews: { $push: "$lodging.reviews"}}
},
{$project: { 
    _id: 1,
    lodging: 1,
    reviews: 1,
    reviewsCount: { $size:"$reviews" }}},
{ $match : { reviewsCount: { $gt: 2 } } },
])

Json data

{ 
    "_id" : "5bf31f8c20b3acacc2194faa", 
    "registered" : "2016-04-22T08:54:50 -02:00", 
    "host" : {
        "name" : "Schultz", 
        "surname" : "French",  
        "contact" : {"email" : "schult...@orbean.com"}, }, 
    "lodging" : { 
        "reviews" : [
             {
                "reviewer" : "Mariana Ayers", 
                "date" : "2014-04-13T08:30:02 -02:00", 
                "cleanliness" : NumberInt(9), 
                "location" : NumberInt(6), 
                "food" : NumberInt(4)}, 
             {
                "reviewer" : "Cole Simmons", 
                "date" : "2015-05-07T06:00:09 -02:00", 
                "cleanliness" : NumberInt(1), 
                "location" : NumberInt(3), 
                "food" : NumberInt(2)}]}
}

Solution

{ 
    "_id" : "5bf31f8c20b3acacc2194faa", 
    "registered" : "2016-04-22T08:54:50 -02:00", 
    "host" : {
        "name" : "Schultz", 
        "surname" : "French",  
        "contact" : {"email" : "schult...@orbean.com"},}, 
    "lodging" : { 
        "reviews" : [
            {
                "reviewer" : "Mariana Ayers", 
                "date" : "2014-04-13T08:30:02 -02:00", 
                "cleanliness" : NumberInt(9), 
                "location" : NumberInt(6), 
                "food" : NumberInt(4)}, 
            {
                "reviewer" : "Cole Simmons", 
                "date" : "2015-05-07T06:00:09 -02:00", 
                "cleanliness" : NumberInt(1), 
                "location" : NumberInt(3), 
                "food" : NumberInt(2)}],
        "reviews_grades" : [
                {"cleanliness" : NumberInt(9)}, 
                {"location" : NumberInt(6)}, 
                {"food" : NumberInt(4)},
                {"cleanliness" : NumberInt(1)}, 
                {"location" : NumberInt(3)}, 
                {"food" : NumberInt(2)}]}
}
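
One possible way to build a field like reviews_grades is a single $addFields stage that combines $reduce, $objectToArray and $filter. This is a hedged sketch, not a tested answer from the thread: the names nonGradeKeys and gradesPipeline are hypothetical, it assumes a server version that supports $objectToArray (3.4.4 or later), and it produces { k: ..., v: ... } pairs such as { "k" : "cleanliness", "v" : 9 } rather than the exact { "cleanliness" : 9 } shape shown above.

```javascript
// Hypothetical sketch; "reviewer" and "date" are the non-grade fields
// in the sample document, everything else here is an assumption.
const nonGradeKeys = ["reviewer", "date"];

const gradesPipeline = [
  {
    $addFields: {
      "lodging.reviews_grades": {
        // Walk over the reviews and concatenate their grade pairs.
        $reduce: {
          input: "$lodging.reviews",
          initialValue: [],
          in: {
            $concatArrays: [
              "$$value",
              {
                // Turn one review into [{k, v}, ...] and drop the
                // reviewer and date entries.
                $filter: {
                  input: { $objectToArray: "$$this" },
                  as: "pair",
                  cond: { $not: [{ $in: ["$$pair.k", nonGradeKeys] }] },
                },
              },
            ],
          },
        },
      },
    },
  },
];

// In the mongo shell this would be run as:
//   db.lodging.aggregate(gradesPipeline)
```

With the pairs in one array, counting the grades above a threshold becomes a $filter on "$$pair.v" followed by $size.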

Robert Cochran

May 21, 2019, 9:21:51 PM
to mongodb-user
Hi:

I am playing with your code. Let me see if I understand clearly what you are providing here. 

1. You are showing your complete aggregation query.
2. Under "json data", you are providing a single sample document.
3. Under "solution", you are showing the result of your aggregation query on that one sample document.

Is the above correct?

If so, can you provide at least 2 more sample documents?

In your aggregation code, you are grouping on $guid, but that field is not present in the document you provide. As a result, in MongoDB Enterprise version 4.0.9 the $group "_id" field is null.

Please note that the dates you give in the sample documents are not true date values. They are text strings. If you want them to be true date values, you should enter them as ISODate() values. 
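
As a rough illustration of that conversion, the date strings in the sample document can be parsed into JavaScript Date objects, which the shell stores as true dates. This is a sketch under one assumption: every string has the shape "2014-04-13T08:30:02 -02:00", that is, ISO 8601 except for the space before the UTC offset. The function name parseReviewDate is hypothetical.

```javascript
// Hypothetical helper: turns a string such as
// "2014-04-13T08:30:02 -02:00" into a JavaScript Date.
// Assumption: the only deviation from ISO 8601 is the single space
// before the UTC offset.
function parseReviewDate(text) {
  const date = new Date(text.replace(" ", ""));
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unrecognized date string: ${text}`);
  }
  return date;
}

const parsed = parseReviewDate("2014-04-13T08:30:02 -02:00");
console.log(parsed.toISOString()); // 2014-04-13T10:30:02.000Z
```

The printed value is in UTC, which is how MongoDB stores dates; the original -02:00 offset is applied during parsing and not kept.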

I will play with your data later on this week, but I suspect you can remove the reviewer names easily through projection.

You did a better job than most people do by stating your exact issue, providing sample code, and providing a sample document. Thank you for going to the effort of providing a very good inquiry backed by pretty good data. 2 more sample documents will be very helpful. 

Lastly, I am not an employee of MongoDB, Inc. I am just another list member trying to be helpful.

Thanks so much

Bob

Robert Cochran

May 21, 2019, 10:10:39 PM
to mongodb-user
Hi,

I also meant to say that the "json data" you provided only shows 2 reviews, so of course by your aggregation logic, there will be no actual output: The $match is looking for a review count greater than 2. That is one of the reasons why I am asking for more sample documents. It is very helpful to provide others with a sufficient number of sample documents to produce the desired results.

It would also be very helpful if you list your MongoDB version and operating system name and version.

Thanks so much

Bob

Robert Cochran

May 24, 2019, 9:22:37 PM
to mongodb-user
Hi,

First, I would like to suggest that you take a free MongoDB course from MongoDB University to learn how MongoDB works, and to learn the basics of aggregation queries. There are free courses available which show how to use MongoDB in a variety of programming languages.

I have found it necessary to change the schema of your collection documents, since they are missing an important field: the name of the establishment being rated. This is in addition to the name of the host. I accordingly added such a field, calling it "establishment_name". I also changed the date fields, which are merely text strings, to true dates as produced by ISODate(). It is likely that the precise UTC time at which a reviewer posted a review is not at all critical; the only important date components are the year, month and day of the review. So a UTC time of 00:00 (midnight) is fine for this purpose. I also gave some of the date fields more meaningful names.

So now we have a document such as this one to work with. Notice that I've purposely omitted the "_id" field. That is because MongoDB will automatically generate an _id for you when the document is inserted. Notice also that in the lodging.reviews array, there are 3 reviews.

{
    "registered" : ISODate("2019-05-22T23:46:44.767Z"),
    "establishment_name" : "Bel Air",
    "host" : {
        "name" : "Bob",
        "surname" : "French-Smith",
        "contact" : {
            "email" : "b...@bfs.info"
        }
    },
    "lodging" : {
        "reviews" : [
            {
                "reviewer" : "Mariana Ayers",
                "reviewed_date" : ISODate("2018-05-13T00:00:00Z"),
                "cleanliness" : 9,
                "location" : 7,
                "food" : 6
            },
            {
                "reviewer" : "Porter Tickseed",
                "reviewed_date" : ISODate("2018-06-21T00:00:00Z"),
                "cleanliness" : 9,
                "location" : 2,
                "food" : 1
            },
            {
                "reviewer" : "Sophie Gilrecht",
                "reviewed_date" : ISODate("2018-07-01T00:00:00Z"),
                "cleanliness" : 9,
                "location" : 8,
                "food" : 0
            }
        ]
    }
}

 
I edited your original document (with only 2 reviews) in the same manner, and added it to the collection of test documents to ensure that my aggregation code doesn't select this document. In general, it is good practice to test with only a small number of documents at first. The test collection has 2 documents. 

Here is the aggregation code I used:

db.better1.aggregate( [
{ "$addFields" : { "reviewsCount" : { "$size" : "$lodging.reviews" } } },
{ "$match" : { "reviewsCount" : { "$gt" : 2 } } },
{ "$project" : { "_id" : 0, "establishment_name" : 1, "lodging" : 1 } },
{ "$unwind" : "$lodging.reviews" },
{ "$group" : {
    "_id" : "$establishment_name",
    "reviews" : { "$sum" : 1 },
    "avgClean" : { "$avg" : "$lodging.reviews.cleanliness" },
    "avgLocation" : { "$avg" : "$lodging.reviews.location" },
    "avgFood" : { "$avg" : "$lodging.reviews.food" } } },
{ "$sort" : { "avgClean" : -1 } }
] )


My first goal was to select only the documents that had more than 2 reviews. Accordingly, I used $addFields and $match as my first 2 aggregation stages. The way to test an aggregation pipeline is to start with the first stage of the pipeline, and test that. So I begin by testing my $addFields stage. This should add a new key/field to each document, named "reviewsCount", and the value of the reviewsCount key should be the number of reviews in the lodging.reviews array. After seeing that the output of the new key/field is correct, I then added the next stage, $match, and tested that. The new stage should only allow the document with 3 reviews to go on to the next pipeline stage. The document with 2 reviews is dropped from further processing. It is important to use a $match stage as close to the beginning of the pipeline as possible: that reduces the processing load on the following stages of the aggregation.
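
The stage-by-stage testing described above could be sketched as follows. The helper pipelinePrefixes is a hypothetical name, not a MongoDB function; it only builds the successively longer pipelines so that each one can be run and inspected in turn.

```javascript
// The first two stages of the aggregation shown above.
const pipeline = [
  { $addFields: { reviewsCount: { $size: "$lodging.reviews" } } },
  { $match: { reviewsCount: { $gt: 2 } } },
];

// Hypothetical helper: returns [stage 1], [stage 1, stage 2], ...
function pipelinePrefixes(stages) {
  return stages.map((_, index) => stages.slice(0, index + 1));
}

// In the mongo shell, each prefix could then be checked in turn:
//   pipelinePrefixes(pipeline).forEach(function (prefix) {
//     printjson(db.better1.aggregate(prefix).toArray());
//   });
```

Further stages can be appended to the pipeline array once the output of the earlier ones looks right.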

Next, I wanted to project out the fields that I need. I need just the establishment_name and the lodging details, which include the lodging.reviews array, and I get rid of everything else.

Next, I unwind the lodging.reviews array.

Next, I group on establishment_name. All I'm interested in here is to see the mathematical averages of all the cleanliness, location, and food scores awarded by each reviewer. Doing this gives me meaningful summary information about what all the reviewers think of this establishment. The reviewer names do not appear in the grouping.
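
To make the arithmetic of the $group stage concrete, here is a plain-JavaScript sketch that computes the same averages for the three reviews in the sample document above. It runs outside MongoDB; the names average and summary are hypothetical.

```javascript
// Grades from the three reviews in the sample document above.
const reviews = [
  { cleanliness: 9, location: 7, food: 6 },
  { cleanliness: 9, location: 2, food: 1 },
  { cleanliness: 9, location: 8, food: 0 },
];

// Arithmetic mean of a non-empty list of numbers.
function average(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Mirrors the fields produced by the $group stage.
const summary = {
  _id: "Bel Air",
  reviews: reviews.length,
  avgClean: average(reviews.map((review) => review.cleanliness)),
  avgLocation: average(reviews.map((review) => review.location)),
  avgFood: average(reviews.map((review) => review.food)),
};

console.log(summary);
```

For this document the cleanliness average is exactly 9, the location average is 17/3 (about 5.67), and the food average is 7/3 (about 2.33).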

Finally, I would like to sort the output so I can see the establishments rated highest for cleanliness first. But that could as easily be changed to sort on food or location. 

Thanks so much

Bob


Robert Cochran

May 25, 2019, 8:23:14 PM
to mongodb-user
Hi,

After posting the above aggregation query solution, I realized I was silly to suggest a $sort stage as the last stage in the query. You simply won't need it. You can remove the $sort stage from the above query. 

In fact, if all you want to do is show reviews for a lodging establishment without the reviewer names appearing, you can easily do that with a db.collection.find query like this one:

db.better1.find( { "lodging.reviews.2" : { "$exists" : true } }, { "establishment_name" : 1, "lodging.reviews.cleanliness" : 1, "lodging.reviews.location" : 1, "lodging.reviews.food" : 1, "_id" : 0 } ).pretty()


The above query checks for a lodging.reviews array with at least 3 existing elements (an element at index 2 exists only if the array holds 3 or more), and it pretty prints only the desired fields: the establishment name and the scores for cleanliness, location, and food by each reviewer. The "_id" field is returned by default, so it must be explicitly excluded from the output; its position within the projection does not matter.

Thanks so much

Bob  