Hello,
We are porting an application over to MongoDB using Morphia and need to store a map of variable data points that is easily searchable (I think this is called an "embedded index").
My Java object is a simple wrapper for the map with an ObjectId, and some other attributes ("tag", "group", and some audit fields):
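@Entity
public class Record {
    @Id
    ObjectId objectId;        // stored as "_id"
    String tag;
    @Reference
    Group group;              // stored as a DBRef ("$ref"/"$id")
    Map<String, Object> values;
    /** audit fields and getters/setters omitted **/
}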
Morphia/Mongo is storing the records like so:
> db.Record.find();
{
"_id" : ObjectId("4e5bcb9161e2247bc2857283"),
"values" : { "address" : "100 Main Street", "postalCode" : "10019", "gender" : "m"},
"tag" : "Set 1",
"group" : { "$ref" : "Group", "$id" : ObjectId("4e56cf5261e291103a66c510") },
"createDate" : ISODate("2011-08-29T17:25:36.965Z"),
"lastModifiedDate" : ISODate("2011-08-29T17:25:37.043Z")
}
{
"_id" : ObjectId("4e5bcbbe61e2247bc2857284"),
"values" : { "address" : "30 Spadina", "favouriteColour" : "red", "gender" : "m", "city" : "Toronto" },
"tag" : "Set 1",
"group" : { "$ref" : "Group", "$id" : ObjectId("4e56cf5261e291103a66c510") },
"createDate" : ISODate("2011-08-29T17:25:51.497Z"),
"lastModifiedDate" : ISODate("2011-08-29T17:26:22.564Z")
}
{
"_id" : ObjectId("4e5be0c661e2247bc2857285"),
"values" : { "source" : "upstream", "favouriteFood" : "Hamburger", "gender" : "m", "city" : "New York" },
"tag" : "Set 1",
"group" : { "$ref" : "Group", "$id" : ObjectId("4e56cf5261e291103a66c510") },
"createDate" : ISODate("2011-08-29T18:56:06.210Z"),
"lastModifiedDate" : ISODate("2011-08-29T18:56:06.211Z")
}
My questions are: considering we need very quick searches on the "values" set over potentially 100 million records (sharding etc aside),
e.g.
db.Record.find({"values.source" : "upstream"})
OR
db.Record.find({"values.favouriteFood" : "Hamburger", "values.city":"New York"})
... etc...
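To make the access pattern concrete: the search criteria arrive as arbitrary key/value pairs, and each key is turned into a dotted "values.<key>" field name before querying. Below is a minimal sketch of that step in plain Java, with no Morphia or driver calls. The class name ValuesQuery and the helper toValuesQuery are hypothetical names used only for illustration, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ValuesQuery {

    // Hypothetical helper: prefixes every criteria key with "values."
    // so it addresses a field inside the embedded "values" map.
    public static Map<String, Object> toValuesQuery(Map<String, Object> criteria) {
        Map<String, Object> query = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : criteria.entrySet()) {
            query.put("values." + entry.getKey(), entry.getValue());
        }
        return query;
    }

    public static void main(String[] args) {
        Map<String, Object> criteria = new LinkedHashMap<>();
        criteria.put("favouriteFood", "Hamburger");
        criteria.put("city", "New York");

        // Prints: {values.favouriteFood=Hamburger, values.city=New York}
        System.out.println(toValuesQuery(criteria));
    }
}
```

The resulting map has the same shape as the second find() example above. Since the keys are not known in advance, any index on them would also have to be on "values.<key>" for whichever keys turn up, which is what question 2 below is about.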
1) Is the above method of storing the data recommended (i.e. in an embedded map)?
2) If so, using Morphia, how can I ensure that each of the dynamically added keys of "values" is indexed as it is added to the collection?
3) Is there another recommended way of storing dynamic data sets as above?
I'm happy to contribute code to the project if no such thing exists and it makes sense to store/annotate the data as above.
I appreciate your help!
Corey