Hi everyone, I'm building a geo search query on a rather large database (6 GB and growing). Once the database started growing, we noticed very slow (or completely locked-up) searches. We've narrowed it down to the use of `.sort()`, but I suspect it's related to how our indexes are defined. Here's what the records look like:
{
    "_id" : ObjectId("539cf2a19cbae40803bc4e27"),
    "source_name" : "..",
    "name" : "...",
    "end_date" : ISODate("2014-06-16T00:00:00Z"),
    "url" : "...",
    "image_small_url" : "",
    "start_date" : ISODate("2014-06-16T00:00:00Z"),
    "image_large_url" : "",
    "description" : "...",
    "account_id" : "53958eebde2602eb7feec2ce",
    "dates" : {
        "updated" : ISODate("2014-06-15T01:10:57.659Z"),
        "created" : ISODate("2014-06-15T01:10:57.658Z")
    },
    "venue" : {
        "geo" : [ -75.18491, 39.952171 ],
        "name" : "...",
        "address" : {
            "country" : "...",
            "zip" : "",
            "street" : "...",
            "city" : "..."
        }
    },
    "keywords" : [ ],
    "__v" : 0
}
When I do a geo `$nearSphere` search without a sort, the query returns fairly quickly. I suspect it could be faster, but it completes within a second:
db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") }, 'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 } }).limit(3).explain()
{
    "cursor" : "S2NearCursor",
    "isMultiKey" : false,
    "n" : 3,
    "nscannedObjects" : 27032,
    "nscanned" : 27032,
    "nscannedObjectsAllPlans" : 27032,
    "nscannedAllPlans" : 27032,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 211,
    "nChunkSkips" : 0,
    "millis" : 231,
    "indexBounds" : {
    },
    "server" : "ip-10-75-0-187:27017",
    "filterSet" : false
}
However, once I add a sort on `end_date`, the same query takes over 30 seconds to complete:
db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") }, 'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 } }).sort({end_date:1}).limit(3).explain()
{
    "cursor" : "S2NearCursor",
    "isMultiKey" : false,
    "n" : 3,
    "nscannedObjects" : 10495027,
    "nscanned" : 10495027,
    "nscannedObjectsAllPlans" : 10495027,
    "nscannedAllPlans" : 10495027,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 81993,
    "nChunkSkips" : 0,
    "millis" : 32743,
    "indexBounds" : {
    },
    "server" : "ip-10-75-0-187:27017",
    "filterSet" : false
}
Or it can hang entirely and never return if I use `start_date` as the sort property:
db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") }, 'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 } }).sort({start_date:1}).limit(3).explain()
... never returns, CPU maxes out.
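(For anyone trying to reproduce this: when the query wedges like that, I end up finding and killing the runaway operation by hand, along these lines. The 10-second threshold is arbitrary, and the field names follow the 2.6-era `currentOp()` output, so adjust for your server version.)

```javascript
// Sketch: find long-running operations against the events collection
// and kill each one by its opid.
db.currentOp().inprog.forEach(function (op) {
  if (op.ns === "getevents_api.events" && op.secs_running > 10) {
    print("killing opid " + op.opid + " (" + op.secs_running + "s)");
    db.killOp(op.opid);
  }
});
```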
I've been reading up on the various index types and created a compound index covering all three properties (start_date, end_date, venue.geo), but that didn't seem to help performance:
> db.events.getIndexes()
[
    {
        "v" : 1,
        "name" : "_id_",
        "key" : {
            "_id" : 1
        },
        "ns" : "getevents_api.events"
    },
    {
        "v" : 1,
        "key" : {
            "venue.id" : 1
        },
        "name" : "venue.id_1",
        "ns" : "getevents_api.events"
    },
    {
        "v" : 1,
        "key" : {
            "account_id" : 1
        },
        "name" : "account_id_1",
        "ns" : "getevents_api.events"
    },
    {
        "v" : 1,
        "key" : {
            "keywords" : 1
        },
        "name" : "keywords_1",
        "ns" : "getevents_api.events"
    },
    {
        "v" : 1,
        "key" : {
            "start_date" : 1,
            "end_date" : 1,
            "venue.geo" : "2dsphere"
        },
        "name" : "start_date_1_end_date_1_venue.geo_2dsphere",
        "ns" : "getevents_api.events",
        "2dsphereIndexVersion" : 2
    },
    {
        "v" : 1,
        "key" : {
            "venue.geo" : "2dsphere"
        },
        "name" : "venue.geo_2dsphere",
        "ns" : "getevents_api.events",
        "2dsphereIndexVersion" : 2
    },
    {
        "v" : 1,
        "key" : {
            "start_date" : -1,
            "end_date" : 1,
            "venue.geo" : "2dsphere"
        },
        "name" : "start_date_-1_end_date_1_venue.geo_2dsphere",
        "ns" : "getevents_api.events",
        "2dsphereIndexVersion" : 2
    }
]
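For completeness, the geo-related indexes above were created with commands along these lines (reconstructed from the `getIndexes()` output, so treat the exact invocations as approximate; `createIndex` should be equivalent to `ensureIndex` here):

```javascript
// Recreate the plain 2dsphere index and the two compound
// date + geo indexes shown in the getIndexes() output.
db.events.ensureIndex({ "venue.geo": "2dsphere" });
db.events.ensureIndex({ start_date: 1, end_date: 1, "venue.geo": "2dsphere" });
db.events.ensureIndex({ start_date: -1, end_date: 1, "venue.geo": "2dsphere" });
```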
Any ideas what's going on here? I'm fairly new to Mongo, though not to indexes in general (albeit there are probably significant differences between my understanding and how Mongo actually works), so any help, insight, or suggestions for improvement, even beyond the scope of this issue, would be much appreciated. This database will keep growing, and we'd like our queries to stay snappy. I don't know how realistic that expectation is, or whether we should put a caching layer in front, but either way the basic queries should execute in a timely manner without overloading the server.
Thanks in advance for any help!
-Nick