How to strike a balance between storing data with an OO approach and keeping BSON documents small


Juzer Ali

Apr 9, 2012, 3:35:36 PM
to mongod...@googlegroups.com
I have a use case and I am wondering how I should use this ODM DB, and whether it is a good fit for my purpose at all. Mongo users, I would greatly appreciate your help.

I need to set up a web portal for an organization. The organization has different departments, say (A, B, C). Each year, in a specific month (say April), every department will create a questionnaire of 100 objective questions for each day of the month (3 departments * 100 questions each * 30 days = 9000 questions). Each questionnaire will be divided into two sections, say SI and SII.

500,000 to 1,000,000 users will register on this website to take a questionnaire and book a slot on one of the days to take it. One registration entitles a user to take only one questionnaire with one of the departments.

Now I need to give the organization the capability to:
* Create departments
* Upload questions for each day of the month for each department
* Record the user info/profile with a photograph (less than 100KB) and another file (less than 100KB)
* Record the user replies
* Perform analytics/reporting based on user replies.

Now a complete OO approach would suggest:

  A: {
       departmentName: 'depName',
       OtherDetailsAboutDepartment: {},
       days: {
               day1: {
                       users: [
                                {
                                  username: 'user_name',
                                  photo: 'BLOB',
                                  file: 'BLOB',
                                  otherinfo: {},
                                  userAnswers: {
                                         q1: '1',
                                         q2: '3',
                                         q3: '2',
                                         .
                                         .
                                         .
                                         q100: '3'
                                  }
                                },
                                {
                                  username: 'user2',
                                  .
                                  .
                                  .
                                },
                                .
                                .
                                .
                                {
                                  username: 'user100000',
                                  .
                                  .
                                  .
                                }
                       ]
               },
               day2: {... similarly ...},
               day3: {... similarly ...}
       }
  }



We all know that this isn't feasible, since these BSON documents would grow far too large.
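A rough back-of-the-envelope check of that claim, as a sketch only. It uses the figures from this post (the lower bound of 500,000 registrations, 3 departments, two files of up to 100 KB each, 100 answers per user) and MongoDB's 16 MB per-document limit. The even split of users across departments and the figure of about 10 bytes per stored answer are assumptions made for illustration.

```python
# Rough upper-bound estimate; the even split across departments and the
# per-answer byte count are assumptions, not measurements.
MAX_BSON_DOC_BYTES = 16 * 1024 * 1024    # MongoDB's per-document size limit

USERS_TOTAL = 500_000                    # lower bound on registrations
DEPARTMENTS = 3
FILES_PER_USER_BYTES = 2 * 100 * 1024    # photo + file, each up to 100 KB
ANSWERS_PER_USER = 100
BYTES_PER_ANSWER = 10                    # assumed: short key plus one-character string


def embedded_department_bytes(users_total: int) -> int:
    """Approximate size of one department document with every user embedded."""
    users_per_department = users_total // DEPARTMENTS
    per_user = FILES_PER_USER_BYTES + ANSWERS_PER_USER * BYTES_PER_ANSWER
    return users_per_department * per_user


size = embedded_department_bytes(USERS_TOTAL)
print(f"embedded department document: about {size / 1024**3:.1f} GiB")
print(f"per-document limit:           {MAX_BSON_DOC_BYTES // 1024**2} MiB")
```

Even at the lower bound on registrations, the embedded files alone put a single department document several orders of magnitude over the limit.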
Another way would be to store departments in one collection, users in another, questions in another, and user answers in yet another, with references to map between them. But that would be rather too relational.

My question is how to strike a balance between the two methods 'safely'.
Another question: I have heard that mongo is not fit for reporting. Can anybody comment on that?
Thanks in advance; I am looking forward to completing this app in mongo.

Steve Francia

Apr 10, 2012, 4:35:57 PM
to mongod...@googlegroups.com
In MongoDB you will typically want to model each business entity in its own collection, especially when the entities contain data not directly related to each other, for example users and departments.

I would probably model this data in the more "relational" way you have described here. There doesn't seem to be much benefit to embedding here, and as you say, you will hit size limits pretty quickly.

I would have user and department as two separate collections. For the questions, I would have each questionnaire as its own document, containing all of its questions in an embedded array.

Lastly, I would likely embed the answers into the questionnaire document. Based on the numbers provided, unless the answers are very long, they should fit fine within the 16 MB limit. If that is a concern, I would separate the answers out into their own collection.
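A minimal sketch of the layout described above, written as plain Python dictionaries rather than real MongoDB calls. Every field name and `_id` value here is hypothetical and chosen only for illustration: one document per department, one per user, and one per questionnaire with its questions and answers embedded.

```python
# Hypothetical documents, one per collection; all field names are illustrative.
department = {"_id": "A", "name": "depName"}

user = {
    "_id": "user1",
    "name": "user_name",
    "questionnaire_id": "A-day1",   # reference, instead of nesting the user under a department
}

questionnaire = {
    "_id": "A-day1",
    "department_id": "A",           # reference to the department document
    "day": 1,
    "questions": [                  # all questions embedded as an array
        {"no": 1, "section": "SI", "text": "..."},
        {"no": 2, "section": "SII", "text": "..."},
    ],
    "answers": [                    # one embedded entry per user who took it
        {"user_id": "user1", "replies": {"q1": "1", "q2": "3"}},
    ],
}

# The references line up: user -> questionnaire -> department.
print(user["questionnaire_id"] == questionnaire["_id"])
print(questionnaire["department_id"] == department["_id"])
```

If the embedded `answers` array became a size concern, the same entries could live in their own collection, each carrying a `questionnaire_id` reference.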

One thing you didn't explain is how you will be accessing the data after the users complete the questionnaires. This may dramatically change the schema design.

To answer the question about MongoDB being fit for reporting: it depends on what you mean by reporting. Until recently MongoDB didn't have a good answer for aggregate functions, but the current development release has an aggregation framework that provides the simplicity of SQL's aggregate functions and manages to provide more functionality. MongoDB isn't a data processing engine though, so depending on your needs you may need an additional processing tool like Hadoop alongside MongoDB.

Lastly, for performance reasons, it often makes sense to precalculate values, or calculate on write. This makes generating / reading reports very fast, while adding a very small additional load on write. Regardless of the database being used, this is often the right approach to take. 
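As an illustration of calculating on write, here is a small in-memory sketch. The `answer_counts` tally and the function names are hypothetical; in MongoDB the increment would typically be an update using the `$inc` operator against a counters document, which is not shown here.

```python
from collections import Counter

# In-memory stand-in for a hypothetical collection of pre-calculated tallies.
answer_counts = Counter()


def record_reply(questionnaire_id, question, choice):
    """On write: bump the tally for this choice alongside storing the reply."""
    answer_counts[(questionnaire_id, question, choice)] += 1


def report(questionnaire_id, question):
    """On read: return the tallies for one question; raw replies are never scanned."""
    return {
        choice: count
        for (qid, q, choice), count in answer_counts.items()
        if qid == questionnaire_id and q == question
    }


record_reply("A-day1", "q1", "1")
record_reply("A-day1", "q1", "3")
record_reply("A-day1", "q1", "1")
print(report("A-day1", "q1"))   # {'1': 2, '3': 1}
```

Each write does one extra increment, and the report reads only the tallies.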


Juzer Ali

Apr 11, 2012, 12:52:53 PM
to mongod...@googlegroups.com
The answers would be one-liners, so they can very well fit within the questionnaire document.
I didn't understand what you meant by "How will you access this data". I would simply retrieve the documents and do the joins as appropriate.
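A sketch of what such a join could look like, assuming it is done in application code with one lookup per collection. The dictionaries stand in for collections keyed by `_id`, and all names and fields are hypothetical.

```python
# Dictionaries keyed by _id stand in for three separate collections.
users = {"user1": {"name": "user_name", "questionnaire_id": "A-day1"}}
questionnaires = {
    "A-day1": {
        "department_id": "A",
        "day": 1,
        "answers": {"user1": {"q1": "1", "q2": "3"}},
    }
}
departments = {"A": {"name": "depName"}}


def load_user_view(user_id):
    """Follow the references by hand: user -> questionnaire -> department."""
    user = users[user_id]
    questionnaire = questionnaires[user["questionnaire_id"]]
    department = departments[questionnaire["department_id"]]
    return {
        "user": user["name"],
        "department": department["name"],
        "day": questionnaire["day"],
        "replies": questionnaire["answers"].get(user_id, {}),
    }


print(load_user_view("user1"))
```

Each view costs three lookups, so the access pattern (per user, per questionnaire, or per department) determines which references are worth keeping and which data is worth embedding.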