Best schema design for a survey database

6,088 views
Skip to first unread message

willz

unread,
Jan 5, 2012, 9:33:58 PM1/5/12
to mongodb-user
Hello everyone,
Hoping someone can point me in the right direction.

I need to build a mongo database to store historical survey data
but I am unsure of what would be a good schema design
Since the data is historical there won't really be any new updates/
inserts, but many of these surveys have large amounts of responses
[50000 per question in a 40 question survey],so I still was hoping to
still maintain decent querying performance.

my current structure is:

Surveys{
Survey{
Questions{}
Responses{}
}

Survey{
Questions{}
Responses{}
}
}


But I wasnt sure if this would be ok since most of the querying is
going to happen on Responses anyway.

The other possible way I was thinking was to separate out each
collection

Surveys{ ---> This would contain mostly metadata on the survey

}



Questions{ ---> the questions would contain a dbref or something to
link it to the object in the survey collection

}

Responses{ ---> same thing as questions

}

Since I am pretty new I am unsure as to the pros/cons of either
approach.
Thank you in advance for any thoughts and tips :)


-Will

Julio Barros

unread,
Jan 6, 2012, 11:35:00 AM1/6/12
to mongod...@googlegroups.com
Thats a really good question and I would love to hear thoughts from others more experienced.

My first thought would be to make questions part of the survey document since it is reasonable to consider them one document/thing.

And then have a responses collection where each document is a set of responses to a particular survey. You would need to add the id of the survey.

A question I have, Since each set of response documents is likely to be very different (survey's themselves may have nothing to do with each other) is there any benefit to breaking out each set of responses into its own collection or is that not worth worrying about?

Thanks in advance.

Julio

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Ricardo Fukui

unread,
Jan 6, 2012, 11:59:38 AM1/6/12
to mongod...@googlegroups.com
I Dont know what really you want do "save", but maybe this work better.
{
Surveys:
[ #starting a array or survey
{questions:
[ #starting array of question by survey
{question:"value", response:"value"},
{question:"value", response:"value"}
]
},
{questions:
[
{question:"value", response:"value"},
{question:"value", response:"value"}
]
}
]
}


2012/1/6 Julio Barros <julioe...@gmail.com>:

Marc

unread,
Jan 6, 2012, 5:44:47 PM1/6/12
to mongodb-user
It is important to avoid creating documents that have the potential to
grow without limit. This is inefficient because Mongo will constantly
have to readjust the Document size, and you may eventually run into
the maximum 16mb document size. That being said, these issues affect
documents that are constantly being added to, and it sounds like the
surveys that you are storing have already been taken and there is no
additional responses being added.

For maximum query efficiency, it is important to organize your data in
such a way that you can put an index on the field that you are going
to be querying on. The Mongo Documentation on indexing may be found
here:
http://www.mongodb.org/display/DOCS/Indexes

If you decide to store all of the responses for each question in an
array, you can create an index on that array, and each element
(response) in the array will be added to the index. There is more
information about this in the Mongo Document titled "Multikeys":
http://www.mongodb.org/display/DOCS/Multikeys

Again, if your application were actively collecting new responses, I
would not recommend this.

Another approach would be to store each response as a separate
document, possibly with a different collection for each survey.

> db.favorites.find()
{
_id:0,
question_number:1,
question:"What is your favorite movie?",
survey_taker:"Steve",
response:"Empire Records"
},
{
_id:1,
question_number:2,
question:"What is your favorite book?",
survey_taker:"Steve",
response:"Catcher in the Rye"
},
{
_id:2,
question_number:3,
question:"What is your favorite food?",
survey_taker:"Steve",
response:"sushi"
},
{
_id:3,
question_number:1,
question:"What is your favorite movie?",
survey_taker:"Tyler",
response:"Spinal Tap"
},
{
_id:4,
question_number:2,
question:"What is your favorite book?",
survey_taker:"Steve",
response:"Earth"
},
{
_id:5,
question_number:3,
question:"What is your favorite food?",
survey_taker:"Steve",
response:"pizza"
}

With this structure, it would be very easy to create an index on the
"response" field and quickly find how many documents have the same
response for a given question, how many documents match
{question:"What is your favorite food?", response:"sushi"}, for
example.

As you can see, by storing all of the responses as separate documents,
there will be a lot of repeated values. You might consider using
database references to store information about each survey.

> db.surveys.find()
{
"_id" : 0,
"name" : "favorites",
"questions" : [
{
"number" : 1,
"question" : "Favorite Movie"
},
{
"number" : 2,
"question" : "Favorite Book"
},
{
"number" : 3,
"question" : "Favorite Food"
}
]
}
{
"_id" : 1,
"name" : "vital_statistics",
"questions" : [
{
"number" : 1,
"question" : "What is your height?"
},
{
"number" : 2,
"question" : "What is your weight?"
},
{
"number" : 3,
"question" : "what is your age?"
}
]
}

> db.responses.find()
{
"_id" : 4,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 1,
"survey_taker" : Tyler,
"Response" : "5ft, 11in"
}
{
"_id" : 5,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 2,
"survey_taker" : Tyler,
"Response" : "165 lbs"
}
{
"_id" : 6,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 3,
"survey_taker" : Tyler,
"Response" : "27"
}

For further information on Database References, please see the Mongo
Documentation:
http://www.mongodb.org/display/DOCS/Database+References

Hopefully the above has given you a few new ideas to consider for how
to organize your data. Good luck!

willz

unread,
Jan 9, 2012, 12:58:23 AM1/9/12
to mongod...@googlegroups.com
Thanks you guys for the responses!  
I ended up leaning towards making them separate but I did like this:

Survey
Survey.Questions
Survey.Responses

I was thinking it might be more feasible to have small documents instead of a collection of gigantic survey documents.  I originally tried to do the other way with everything being contained in the survey object but i hit a particular survey that had crazy amounts of data and i ended up hitting the mongo document size limit.  So thats what made me go with the separate collections.

Like Marc suggested now, each question and response are their own documents that contain dbrefs to the survey.
So I am still in the process of moving all that historic data into mongo but its going good so far.
I am really loving Mongo! 

   
Reply all
Reply to author
Forward
0 new messages