Aggregation on multiple array(parent fields) elements

250 views
Skip to first unread message

SrcKode

unread,
Sep 9, 2018, 2:31:13 PM9/9/18
to mongodb-user
Hi ,

This is a sample of what I am trying to do in one of my projects ; the data is dummy and is indicative only

I am trying to store multiple parents in a single document based on the Array of Ancestor pattern .
The document below stores the information of a student and his marks in  a subject called 'R'.The collection will hold data of all students for all subjects . 
studGroup  is the HIERARCHY of the GROUP the STUDENT belongs to .
subjectGroup is the HIERARCHY of the SUBJECT
teacherGroup is the HIERARCHY of the GROUP the teacher belongs to.

    "_id" : "1001", 
    "student_name" : "Mac", 
    "studGroup" : [
        "Varsity", 
        "Regular", 
        "Fulltime"
    ], 
    "subject" : "R", 
    "score" : 9.0, 
    "marks" : 90.0, 
    "subjectGroup" : [
        "Artificial_Intelligence", 
        "Microsoft", 
        "R"
    ], 
    "teacher" : "Phil", 
    "teacherGroup" : [
        "IMS", 
        "CognitiveServices", 
        "MachineLearning"
    ]
}

What I want to do now is aggregate the marks at different levels of each hierarchical fields .
Ex. I want to know the total marks of all student who are 'FULLTIME', subject that belongs to 'MICROSOFT' and taught by a teacher from a group called 'IMS'
The data should be grouped by all these three elements that belong to different arrays, I also want to project the required number of fields/elements from each of these arrays too.
Ex. Output 
{"total_marks": 1000 ,   "subjectGroup" : [
        "Artificial_Intelligence", 
        "Microsoft", 
            ],
 "studGroup" : [
        "Regular", 
        "Fulltime"
    ], 
  "teacherGroup" : [
        "IMS", 
      ]
}

I have tried with $projection , $filter , $slice  but having difficulty .
single ancestor works okay . In a single document I may have more than 5 such parent arrays . and some values from the array may also be used for computation(not provided in the example).
The collection may hold millions of documents in the future on which the aggregation must be run.

Thanks in Advance
ScK.




Robert Cochran

unread,
Sep 9, 2018, 3:00:17 PM9/9/18
to mongodb-user
Hi,

I suggest that you take one of the "M101" type courses offered by MongoDB University and also take a course devoted to schema design, if there is one.

Thanks

Bob

SrcKode

unread,
Sep 9, 2018, 3:07:46 PM9/9/18
to mongodb-user
Thanks for the suggestion Bob.I will certainly take a look at the courses.
Is the schema so bad ? We had nested documents earlier but the performance was terrible when the data was just around 3 mil .
This is what we were thinking to do , any quick corrections/suggestions ? 
Can you point me to any examples that have similar objectives ?

Thanks .
ScK.

Robert Cochran

unread,
Sep 9, 2018, 7:57:53 PM9/9/18
to mongodb-user
Hi!

What I would do is: take just a few documents -- two (2) of them will suffice -- and put them in a collection. The example you gave shows all fields of string data. That's not correct for use cases where computations, or sometimes indexing, need to be made. "_id" in this case needs to be of type NumberInt. The score and marks fields both need to be NumberDecimal.

After carefully considering the data formats for each field in a document, and correcting them, start playing with the aggregation. The best advice you can get comes from the book "MongoDB the Definitive Guide Second Edition", by Kristina Chodorow. The second edition is outdated now, but the aggregation chapter is still a worthy read because Chodorow offers excellent technical advice: execute one pipeline stage at a time in the shell, to inspect the output. Don't code the entire aggregation in one blob: instead, code one pipeline stage at a time, and check the output to determine if it meets the requirements.

Also, the MongoDB website itself offers examples and test data so that you can try out aggregations and see the different schemas. It is worth copying their test data and doing a quick aggregate with their example code. It helps clarify questions you might have. 

Then go to MongoDB University and take some of their classes. They offer a lot of benefit and are absolutely free of charge. Some of the homework exercises on the M101 type courses are difficult, and take significant time to solve. But they sure give you a lot of hands on experience. 

Go back to your 2 test documents. Ask yourself, is this schema really going to give the desired result in the most efficient way? 

Did you set up any indexes to help you out? 

Are you making use of a method to validate the content of your collection(s)? You should, to make certain every field is of the correct data type. For example, NumberDecimal fields conform to particular requirements and can be used in high-precision computations. You want to make sure that the score and marks fields are uniformly of type NumberDecimal.

I did play with your sample document. I fixed the fields that need to be numeric, added it to a MongoDB (server version: 4.0.2) test collection, and noted the 3 arrays in that one document. I built an aggregation query and unwound each array in successive pipeline stages. That resulted in a  big explosion of unwound documents. If you have x count of documents in the collection and each has 1 to y arrays, and you unwind even 2 of the arrays, you are quickly going to bloat your aggregation pipeline beyond the limits of the process. And it is still unclear what you want to group on even then. 

Thanks

Bob

Robert Cochran

unread,
Sep 9, 2018, 8:50:51 PM9/9/18
to mongodb-user
This aggregation can make some sense:

db.test1.aggregate( [ 

   { "$unwind" : "$studGroup" }, 

   { "$sort" : { "studGroup" : 1 } },

   { "$group" : { "_id" : "$studGroup", "totalMarks" : { "$sum" : "$marks" } } } 

] )


...it results in this output:

{ "_id" : "Varsity", "totalMarks" : NumberDecimal("90.0") }

{ "_id" : "Regular", "totalMarks" : NumberDecimal("90.0") }

{ "_id" : "Fulltime", "totalMarks" : NumberDecimal("90.0") }


...it lets you get a certain view of the academic performance of students in each $studGroup. 


But then trying to group on "teacherGroup", for example, seems a little strange. If you have a lot of teacherGroup classifications and a lot of teachers, you'll have to do a lot of array unwinding, which brings up challenges in pipeline size. Does it make sense to do that? I do not know.


Bob


Wan Bachtiar

unread,
Sep 10, 2018, 3:05:09 AM9/10/18
to mongodb-user

I want to know the total marks of all student who are ‘FULLTIME’, subject that belongs to ‘MICROSOFT’ and taught by a teacher from a group called ‘IMS’

Hi Sck,

It’s not entirely clear from your example document and expected output what you’re trying to achieve. Below are example documents, which are extrapolated from your example to simulate approximately what you’re after:

{ "_id" : "1001", "student_name" : "Mac", "studGroup" : [ "Varsity", "Regular", "Fulltime" ], "subject" : "R", "score" : 9, "marks" : 90, "subjectGroup" : [ "Artificial_Intelligence", "Microsoft", "R" ], "teacher" : "Phil", "teacherGroup" : [ "IMS", "CognitiveServices", "MachineLearning" ] }
{ "_id" : "1002", "student_name" : "John", "studGroup" : [ "Regular", "Fulltime" ], "subject" : "R", "score" : 9, "marks" : 80, "subjectGroup" : [ "Artificial_Intelligence", "Microsoft" ], "teacher" : "Phil", "teacherGroup" : [ "IMS", "CognitiveServices" ] }
{ "_id" : "1003", "student_name" : "Mac", "studGroup" : [ "Regular", "Fulltime" ], "subject" : "Math", "score" : 9, "marks" : 80, "subjectGroup" : [ "Artificial_Intelligence", "Google" ], "teacher" : "Phil", "teacherGroup" : [ "IMS", "CognitiveServices" ] }
{ "_id" : "1004", "student_name" : "Mac", "studGroup" : [ "Regular", "Fulltime" ], "subject" : "Math", "score" : 9, "marks" : 60, "subjectGroup" : [ "Artificial_Intelligence", "Microsoft" ], "teacher" : "Robert", "teacherGroup" : [ "IMS" ] }

You can try below aggregation pipeline:

db.student.aggregate([
    {"$match":{
        "studGroup":"Fulltime", 
        "subjectGroup":"Microsoft", 
        "teacherGroup":"IMS"}
    }, 
    {"$group":{"_id":"$student_name", 
               "total_marks":{"$sum":"$marks"}, 
               "subjectGroup":{"$addToSet":"$subjectGroup"}, 
               "studGroup":{"$addToSet":"$studGroup"}, 
               "teacherGroup":{"$addToSet":"$teacherGroup"}
               }
    }
]);

As you can see from the above example, three of the filters are grouped into a single $match stage.
The aggregation pipeline grouped the documents by student_name and just combined the groups of the student (duplicates). 


Is the schema so bad ? We had nested documents earlier but the performance was terrible when the data was just around 3 mil .

Data Model suitability depends on your application interaction with the data. Without knowing your application requirements it’s hard to determine whether the schema is bad or good.

In addition to MongoDB University course that Bob’s mentioned, I would also recommend to review the following resources:

Regards,
Wan.

SrcKode

unread,
Sep 10, 2018, 6:29:30 AM9/10/18
to mongodb-user
Thanks a lot Bob and Wan for your time and the details . As I mentioned in my question , this is dummy data and the _id printed as a string was a honest typo.

The basic need was to add parent hierarchies using a simple method such as representing them as string arrays in the document as Array of Ancestors  than nested/embedded documents. However , we have more than one such parent hierarchies because all dimensions (student , subject , teachers ) involved here have an associated hierarchy .

The expected outcome is to be able to group the data at each of these elements /levels and also project the required set of elements.

In the example studGroup would represent a hierarchy of the fields 

school
: { "University" , "Institue" ... }
schedule
:{ "Regular"  ,  "PartTime" ... }
course_type
: {"Fulltime" , "Distance" ....}

|-school
 
|-schedule
     
|-course_type
       

The data of these fields is represented as one single array in the document as  "studGroup" : [ "University", "Regular", "Fulltime" ] => student group hierarchy


"please ignore the names of the fields"

The need is to group the data at any level -> school or schedule or course_type
and should be able to pass the values for the level chosen [course_type : 'FullTime']
and should be able to choose the number of levels to project ; E.g.[ $slice : 2] , to retrieve the parent/child field values. The outcome is formatted to show the actual fields



like wise the other two arrays
subjectGroup
teachGroup



Wan , thank you for the query ; it works but how do I use $slice with $addToSet

If there is any better way of representing multiple parent hierarchies in a single document please suggest . 

I will take a look at the courses and try to quickly enrol too.

Thanks again for your time,

ScK
Message has been deleted

Wan Bachtiar

unread,
Sep 16, 2018, 10:13:21 PM9/16/18
to mongodb-user

it works but how do I use $slice with $addToSet .

Hi,

You can append $project stage after the $group stage, to specify $slice operation. Note that you can specify negative number to specify the starting position from the end of the array. See the operator manual for more information on usage.

If there is any better way of representing multiple parent hierarchies in a single document please suggest .

Depending on your use requirements, you can see more example schemas on Model Tree Structures.

Regards
Wan.

Reply all
Reply to author
Forward
0 new messages