Indexes on key names

17 views
Skip to first unread message

Glenn Maynard

unread,
Mar 29, 2012, 5:26:52 PM3/29/12
to mongod...@googlegroups.com
(This may already be filed, but there are too many index-related tickets to find any one in particular, and the ticket search is not very good.)

Given a collection like this:

db.player = {
    name: "Glenn",
    scores: {
        ping_pong: { points: 2000 },
        golf: { points: 100 }
    }
}

I want an index on each of the keys of scores, so I can efficiently eg. find and sort on 'scores.ping_pong.points'.  However, there are too many keys to create indexes on 'scores.ping_pong', 'scores.golf', and so on--there many be hundreds of possible keys.

It would help if it was possible to create an index on 'scores.$key.points', which would effectively be a compound index on (key name, points), so this type of structure can be indexed without creating an index for every possible key.  That way, a single index would work for both find({scores.ping_pong.points: 100}) and find({scores.golf.points: 100}).  In the above data, two index entries would be created: one on ('ping_pong', 2000) and the other on ('golf', 100).

This could probably also be used to accelerate $exists.  A ({'name': 1, 'scores.$key': 1}) index should allow find({name: 'Glenn', 'scores.ping_pong': {$exists: true}}) to be done efficiently.

A compound key of ('x', 'scores.$key.points', 'y') would expand to ('x', 'key name', 'points', 'y').  Similarly, ('a.$key.b', 'b.$key.c') expands to ('first key name', 'b', 'second key name', 'c').

The workarounds for this are awkward, like representing scores as an array and moving the keys to fields, eg. scores: [{game: 'ping_pong', points: 2000}], so I can index on 'scores.game'.  That's painfully unnatural (extra hoops have to be jumped to ensure duplicate entries aren't created; looking up a score for a game is harder, and so on), and prevents covered index searches entirely (since it's now a multikey index).  It'd be nice to not need hacks like this.

(Moving scores entries to their own collection isn't ideal, either, since it would make another use case--retrieving all scores for a single user--more expensive.  If the user has 100 score entries to look up, and the records are scattered across the collection, that's a lot of disk seeking.  That's one reason it's all stored in one document.  There's more data in the real entries, so covered-index searches aren't a comfortable workaround--it won't fit in memory.)

--
Glenn Maynard

Kyle Banker

unread,
Mar 29, 2012, 5:39:00 PM3/29/12
to mongod...@googlegroups.com
Please post this text as a new feature request at jira.mongodb.org.

There we can talk through variations on this and prioritize appropriately.

Glenn Maynard

unread,
Mar 29, 2012, 5:51:45 PM3/29/12
to mongod...@googlegroups.com
On Thu, Mar 29, 2012 at 4:39 PM, Kyle Banker <kyleb...@gmail.com> wrote:
Please post this text as a new feature request at jira.mongodb.org.

There we can talk through variations on this and prioritize appropriately.

Few people actually communicate on trackers; it's a lot more likely to get useful discussion on lists than in a ticket.

--
Glenn Maynard

Reply all
Reply to author
Forward
0 new messages