I am in the process of implementing an autocomplete search box for message recipients, similar to Gmail's, using MongoDB. The returned set will be based on previous interaction and on prefix matches against any of a user's (possibly many) email addresses and/or their full name.
Currently I'm using this schema:
{
_id: 1,
Fullname: "Aaa Bbb",
Keys: ["xxx", "yyy", "mmm", "nnn", "aaa", "bbb"],
Connections: [ 19, 21, 32 /*Links to other users*/]
}
A search could look like this:
db.User.find({"Connections" : 19, "Keys" : /^m/})
where I am user 19 and I've typed "m" into the search box. The above document would match.
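A minimal in-memory sketch of what a query of the form `{"Connections": 19, "Keys": /^m/}` selects, assuming Connections holds plain ids and Keys holds one lowercase element per name/email subcomponent. It relies on MongoDB's array matching rules: a scalar condition on an array field matches if any element equals it, and a regex matches if any element matches it. Users 2 and 3 are hypothetical extra data, and escaping/lowercasing the typed text is my own suggestion rather than part of the original design.

```javascript
// Sample users; only user 1 comes from the schema above, 2 and 3 are hypothetical.
const users = [
  { _id: 1, Fullname: "Aaa Bbb", Keys: ["xxx", "yyy", "mmm", "nnn", "aaa", "bbb"], Connections: [19, 21, 32] },
  { _id: 2, Fullname: "Ccc Ddd", Keys: ["ccc", "ddd"], Connections: [19] },
  { _id: 3, Fullname: "Mmm Eee", Keys: ["mmm", "eee"], Connections: [21] },
];

// Mimics find({ Connections: me, Keys: /^prefix/ }) on plain objects.
function findCandidates(docs, me, prefix) {
  // Keys are assumed lowercase, so lowercase the typed text too, then escape
  // regex metacharacters so input like "a.b" is matched literally.
  const escaped = prefix.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped);
  return docs.filter(
    (d) => d.Connections.includes(me) && d.Keys.some((k) => re.test(k))
  );
}

console.log(findCandidates(users, 19, "m").map((d) => d._id)); // [ 1 ]
```

Keeping the pattern anchored with `^` and case-sensitive also matters on the server: as I understand the MongoDB docs, that is the regex form that can be answered with a bounded index scan, which is one reason to store Keys lowercased.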
Keys is indexed, and its array is built at insertion time from all the subcomponents of all of the user's email addresses and names.
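One possible way to build the Keys array, as a sketch. The helper name `buildKeys` and the splitting rule (break on any run of characters that are not Unicode letters or digits, lowercase everything, drop duplicates) are assumptions, and the email address is a made-up example that reproduces the Keys of the sample document.

```javascript
// Hypothetical helper: derive Keys from a user's emails and full name.
// Splits on anything that is not a Unicode letter or digit (".", "@", " ", "-", ...).
function buildKeys(fullname, emails) {
  const parts = [...emails, fullname]
    .flatMap((s) => s.toLowerCase().split(/[^\p{L}\p{N}]+/u))
    .filter((p) => p.length > 0); // drop empty strings from leading/trailing separators
  return [...new Set(parts)]; // Set keeps first-seen order and removes duplicates
}

console.log(buildKeys("Aaa Bbb", ["xxx.yyy@mmm.nnn"]));
// [ 'xxx', 'yyy', 'mmm', 'nnn', 'aaa', 'bbb' ]
```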
Connections is also indexed and contains the *reciprocal* connections based on previous interaction, such that if A has ever messaged B, then A will be in B's Connections array. The idea is to exploit Mongo's indexing/sharding to restrict the search set.
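In the shell, recording "A messaged B" under this schema would presumably be a single `$addToSet` update such as `db.User.updateOne({_id: B}, {$addToSet: {Connections: A}})`. The sketch below mimics that behaviour in memory so it runs without a server; the function name `recordMessage` and the `Map` of users are hypothetical.

```javascript
// In-memory stand-in for:
//   db.User.updateOne({ _id: to }, { $addToSet: { Connections: from } })
// After `from` messages `to`, querying { Connections: from } returns `to`.
function recordMessage(usersById, from, to) {
  const recipient = usersById.get(to);
  if (!recipient) return; // unknown recipient: nothing to update
  // $addToSet semantics: append only if the id is not already present.
  if (!recipient.Connections.includes(from)) {
    recipient.Connections.push(from);
  }
}

const usersById = new Map([
  [19, { _id: 19, Connections: [] }],
  [1, { _id: 1, Connections: [21, 32] }],
]);
recordMessage(usersById, 19, 1);
recordMessage(usersById, 19, 1); // a repeated message adds no duplicate
console.log(usersById.get(1).Connections); // [ 21, 32, 19 ]
```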
The question is, is this a good idea?
An alternative would be to store Connections the other way around, i.e. if A has ever messaged B, then B (including B's Keys) would be in A's Connections array. A search for the same match would then be:
db.User.find({"_id" : 19, "Connections.Keys" : /^m/})
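Under the alternative, each entry in Connections would need to embed at least the contact's `_id` and `Keys`; the exact shape below is an assumption. Worth knowing: a `find` with a `"Connections.Keys"` condition returns the whole user-19 document whenever any embedded entry matches, and, as far as I know, the positional `$` and `$elemMatch` projections return only the first matching element. Listing every matching contact therefore needs client-side filtering or an aggregation such as `$unwind` followed by `$match`. This sketch shows the client-side step.

```javascript
// Assumed shape: user 19 embeds a copy of each contact's id, name and Keys.
const me = {
  _id: 19,
  Connections: [
    { _id: 1, Fullname: "Aaa Bbb", Keys: ["xxx", "yyy", "mmm", "nnn", "aaa", "bbb"] },
    { _id: 2, Fullname: "Ccc Ddd", Keys: ["ccc", "ddd"] },
    { _id: 3, Fullname: "Mmm Eee", Keys: ["mmm", "eee"] },
  ],
};

// Second step after fetching user 19: keep only the embedded contacts
// that have at least one key starting with the typed prefix.
function matchingConnections(user, prefix) {
  const escaped = prefix.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped);
  return user.Connections.filter((c) => c.Keys.some((k) => re.test(k)));
}

console.log(matchingConnections(me, "m").map((c) => c._id)); // [ 1, 3 ]
```

One consequence of this layout is that a contact's Keys are copied into every user who has messaged them, so a name or email change has to be propagated to all of those copies.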
I guess it comes down to how Mongo uses indexes. If it's a matter of two consecutive lookups, the second approach seems appropriate. However, if Mongo does some more sophisticated merging of indexes, I can imagine the first one performing better.
The first also has the added benefit that Connections doesn't even have to contain linked objects, just ids, though I'm not sure how much that actually gains.