Indexes on key names

Glenn Maynard

unread,

Mar 29, 2012, 5:26:52 PM3/29/12

to mongod...@googlegroups.com

(This may already be filed, but there are too many index-related tickets to find any one in particular, and the ticket search is not very good.)

Given a collection like this:

db.player = {

name: "Glenn",

scores: {

ping_pong: { points: 2000 },

golf: { points: 100 }

}
}

I want an index on each of the keys of scores, so I can efficiently eg. find and sort on 'scores.ping_pong.points'. However, there are too many keys to create indexes on 'scores.ping_pong', 'scores.golf', and so on--there many be hundreds of possible keys.

It would help if it was possible to create an index on 'scores.$key.points', which would effectively be a compound index on (key name, points), so this type of structure can be indexed without creating an index for every possible key. That way, a single index would work for both find({scores.ping_pong.points: 100}) and find({scores.golf.points: 100}). In the above data, two index entries would be created: one on ('ping_pong', 2000) and the other on ('golf', 100).

This could probably also be used to accelerate $exists. A ({'name': 1, 'scores.$key': 1}) index should allow find({name: 'Glenn', 'scores.ping_pong': {$exists: true}}) to be done efficiently.

A compound key of ('x', 'scores.$key.points', 'y') would expand to ('x', 'key name', 'points', 'y'). Similarly, ('a.$key.b', 'b.$key.c') expands to ('first key name', 'b', 'second key name', 'c').

The workarounds for this are awkward, like representing scores as an array and moving the keys to fields, eg. scores: [{game: 'ping_pong', points: 2000}], so I can index on 'scores.game'. That's painfully unnatural (extra hoops have to be jumped to ensure duplicate entries aren't created; looking up a score for a game is harder, and so on), and prevents covered index searches entirely (since it's now a multikey index). It'd be nice to not need hacks like this.

(Moving scores entries to their own collection isn't ideal, either, since it would make another use case--retrieving all scores for a single user--more expensive. If the user has 100 score entries to look up, and the records are scattered across the collection, that's a lot of disk seeking. That's one reason it's all stored in one document. There's more data in the real entries, so covered-index searches aren't a comfortable workaround--it won't fit in memory.)

--
Glenn Maynard

Kyle Banker

unread,

Mar 29, 2012, 5:39:00 PM3/29/12

to mongod...@googlegroups.com

Please post this text as a new feature request at jira.mongodb.org.

There we can talk through variations on this and prioritize appropriately.

Glenn Maynard

unread,

Mar 29, 2012, 5:51:45 PM3/29/12

to mongod...@googlegroups.com

On Thu, Mar 29, 2012 at 4:39 PM, Kyle Banker <kyleb...@gmail.com> wrote:

Please post this text as a new feature request at jira.mongodb.org.

There we can talk through variations on this and prioritize appropriately.

Few people actually communicate on trackers; it's a lot more likely to get useful discussion on lists than in a ticket.

--
Glenn Maynard

Reply all

Reply to author

Forward