It is important to avoid creating documents that can grow without
limit. Unbounded growth is inefficient because Mongo will constantly
have to readjust the document size, and you may eventually run into
the maximum document size of 16 MB. That said, these issues affect
documents that are constantly being added to, and it sounds like the
surveys you are storing have already been taken, so no additional
responses are being added.
For maximum query efficiency, it is important to organize your data in
such a way that you can put an index on the field that you are going
to be querying on. The Mongo Documentation on indexing may be found
here:
http://www.mongodb.org/display/DOCS/Indexes
If you decide to store all of the responses for each question in an
array, you can create an index on that array, and each element
(response) in the array will be added to the index. There is more
information about this on the Mongo documentation page titled "Multikeys":
http://www.mongodb.org/display/DOCS/Multikeys
Again, if your application were actively collecting new responses, I
would not recommend this.
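To make the multikey idea concrete, here is a minimal in-memory sketch in plain JavaScript (it does not connect to MongoDB). The `questionDocs` array and the `buildMultikeyIndex` helper are hypothetical names made up for illustration, and the sketch only mimics the idea that each array element gets its own index entry pointing back at the document. In the shell itself, the index would simply be created on the array field.

```javascript
// Hypothetical documents: one per question, with every response in an array.
const questionDocs = [
  { _id: 0, question: "What is your favorite movie?", responses: ["Empire Records", "Spinal Tap"] },
  { _id: 1, question: "What is your favorite food?", responses: ["sushi", "pizza", "sushi"] }
];

// Mimic a multikey index: each distinct array element becomes a key
// that points back to the ids of the documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // new Set(...) drops duplicate values within a single document.
    for (const value of new Set(doc[field])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const responseIndex = buildMultikeyIndex(questionDocs, "responses");

// Look up which documents contain the response "sushi".
console.log(responseIndex.get("sushi"));
```

A lookup by response value then touches only the matching entries instead of scanning every array, which is the benefit the real index is meant to provide.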
Another approach would be to store each response as a separate
document, possibly with a different collection for each survey.
> db.favorites.find()
{
_id:0,
question_number:1,
question:"What is your favorite movie?",
survey_taker:"Steve",
response:"Empire Records"
},
{
_id:1,
question_number:2,
question:"What is your favorite book?",
survey_taker:"Steve",
response:"Catcher in the Rye"
},
{
_id:2,
question_number:3,
question:"What is your favorite food?",
survey_taker:"Steve",
response:"sushi"
},
{
_id:3,
question_number:1,
question:"What is your favorite movie?",
survey_taker:"Tyler",
response:"Spinal Tap"
},
{
_id:4,
question_number:2,
question:"What is your favorite book?",
survey_taker:"Tyler",
response:"Earth"
},
{
_id:5,
question_number:3,
question:"What is your favorite food?",
survey_taker:"Tyler",
response:"pizza"
}
With this structure, it would be very easy to create an index on the
"response" field and quickly find how many documents have the same
response for a given question: for example, how many documents match
{question:"What is your favorite food?", response:"sushi"}.
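The kind of count described above can be sketched in plain JavaScript over an in-memory copy of a few of the documents (no database connection, and with the fields trimmed to the ones needed). `countMatching` is a hypothetical helper written for this illustration; in the shell, the same question would be a count on db.favorites with that filter, ideally backed by an index on the queried fields.

```javascript
// In-memory stand-in for a few db.favorites documents (fields trimmed).
const favorites = [
  { _id: 0, question: "What is your favorite movie?", response: "Empire Records" },
  { _id: 2, question: "What is your favorite food?", response: "sushi" },
  { _id: 3, question: "What is your favorite movie?", response: "Spinal Tap" },
  { _id: 5, question: "What is your favorite food?", response: "pizza" }
];

// Count documents whose fields all equal the values in `filter`.
// Only simple equality is supported, unlike a real query engine.
function countMatching(docs, filter) {
  return docs.filter(doc =>
    Object.keys(filter).every(key => doc[key] === filter[key])
  ).length;
}

const sushiCount = countMatching(favorites, {
  question: "What is your favorite food?",
  response: "sushi"
});

console.log(sushiCount);
```

Because every response is its own small document, the filter only ever compares top-level fields, which is what keeps this layout easy to index and query.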
As you can see, by storing all of the responses as separate documents,
there will be a lot of repeated values. You might consider using
database references to store information about each survey.
> db.surveys.find()
{
"_id" : 0,
"name" : "favorites",
"questions" : [
{
"number" : 1,
"question" : "Favorite Movie"
},
{
"number" : 2,
"question" : "Favorite Book"
},
{
"number" : 3,
"question" : "Favorite Food"
}
]
}
{
"_id" : 1,
"name" : "vital_statistics",
"questions" : [
{
"number" : 1,
"question" : "What is your height?"
},
{
"number" : 2,
"question" : "What is your weight?"
},
{
"number" : 3,
"question" : "What is your age?"
}
]
}
> db.responses.find()
{
"_id" : 4,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 1,
"survey_taker" : "Tyler",
"Response" : "5ft, 11in"
}
{
"_id" : 5,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 2,
"survey_taker" : "Tyler",
"Response" : "165 lbs"
}
{
"_id" : 6,
"survey" : {
"$ref" : "surveys",
"$id" : 1
},
"question_number" : 3,
"survey_taker" : "Tyler",
"Response" : "27"
}
For further information on Database References, please see the Mongo
Documentation:
http://www.mongodb.org/display/DOCS/Database+References
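As a rough illustration of what following such a reference involves, here is a minimal in-memory sketch in plain JavaScript (again with no database connection). The `collections` object and the `resolveRef` helper are hypothetical stand-ins; with a real database, the application or driver would perform an equivalent lookup by `_id` in the collection named by `$ref`.

```javascript
// In-memory stand-ins for the two collections (trimmed for brevity).
const collections = {
  surveys: [
    { _id: 0, name: "favorites", questions: [] },
    {
      _id: 1,
      name: "vital_statistics",
      questions: [
        { number: 1, question: "What is your height?" },
        { number: 2, question: "What is your weight?" }
      ]
    }
  ],
  responses: [
    { _id: 4, survey: { $ref: "surveys", $id: 1 }, question_number: 1, Response: "5ft, 11in" }
  ]
};

// Follow a { $ref, $id } reference: search the named collection
// for the document with the matching _id, or return null.
function resolveRef(ref) {
  const target = collections[ref.$ref] || [];
  return target.find(doc => doc._id === ref.$id) || null;
}

const response = collections.responses[0];
const survey = resolveRef(response.survey);

// Recover the question text that the response document no longer repeats.
const asked = survey.questions.find(q => q.number === response.question_number);

console.log(survey.name + ": " + asked.question + " -> " + response.Response);
```

The trade-off is one extra lookup per response in exchange for storing the question text only once per survey.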
Hopefully the above has given you a few new ideas to consider for how
to organize your data. Good luck!