Using indexes to improve sort performance during geo searches

91 views
Skip to first unread message

ni...@silverbucket.net

unread,
Jun 18, 2014, 9:08:23 AM6/18/14
to mongod...@googlegroups.com
Hi everyone, I'm currently in the process of building a geo search query on a rather large database (6gb and growing). Once we started growing the database we noticed very poor (or completely locked up) searches. We've narrowed it down to the use of the `.sort()` function, but I think it might be related to how we are describing our indexes. Here's what the records look like:


  "_id" : ObjectId("539cf2a19cbae40803bc4e27"), 
  "source_name" : "..", 
  "name" : "...", 
  "end_date" : ISODate("2014-06-16T00:00:00Z"), 
  "url" : "...", 
  "image_small_url" : "", 
  "start_date" : ISODate("2014-06-16T00:00:00Z"), 
  "image_large_url" : "", 
  "description" : "...", 
  "account_id" : "53958eebde2602eb7feec2ce", 
  "dates" : { 
    "updated" : ISODate("2014-06-15T01:10:57.659Z"), 
    "created" : ISODate("2014-06-15T01:10:57.658Z") 
  }, 
  "venue" : { 
    "geo" : [ -75.18491, 39.952171 ], 
   "name" : "...", 
   "address" : {
     "country" : "...", 
     "zip" : "", 
     "street" : "...", 
     "city" : "..." } 
  }, 
  "keywords" : [ ], 
  "__v" : 0 
}

When I do a geo nearSphere search without sort, the search returns fairly quickly. Though I suspect that it should be faster, it is within a second:

db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") },  'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 }  }).limit(3).explain()
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 27032,
"nscanned" : 27032,
"nscannedObjectsAllPlans" : 27032,
"nscannedAllPlans" : 27032,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 211,
"nChunkSkips" : 0,
"millis" : 231,
"indexBounds" : {
},
"server" : "ip-10-75-0-187:27017",
"filterSet" : false
}


However, once I add sort, the query can take up to 20 seconds to complete:

db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") },  'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 }  }).sort({end_date:1}).limit(3).explain()
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 10495027,
"nscanned" : 10495027,
"nscannedObjectsAllPlans" : 10495027,
"nscannedAllPlans" : 10495027,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 81993,
"nChunkSkips" : 0,
"millis" : 32743,
"indexBounds" : {
},
"server" : "ip-10-75-0-187:27017",
"filterSet" : false
}


Or it can crash completely and never return if I use `start_date` as the sort property:

db.events.find({ _id: { '$nin': [] }, end_date: { '$gte': new Date("Mon, 16 Jun 2014 00:00:00 GMT") },  'venue.geo': { '$nearSphere': [ -74.00597, 40.71427 ], '$maxDistance': 0.03 }  }).sort({start_date:1}).limit(3).explain()

... never returns, CPU maxes out.

I've been reading up on the various types of indexes and created an index with all three properties (start_date, end_date, venue.geo), but that didn't seem to help performance:

> db.events.getIndexes()
[
{
"v" : 1,
"name" : "_id_",
"key" : {
"_id" : 1
},
"ns" : "getevents_api.events"
},
{
"v" : 1,
"key" : {
"venue.id" : 1
},
"name" : "venue.id_1",
"ns" : "getevents_api.events"
},
{
"v" : 1,
"key" : {
"account_id" : 1
},
"name" : "account_id_1",
"ns" : "getevents_api.events"
},
{
"v" : 1,
"key" : {
"keywords" : 1
},
"name" : "keywords_1",
"ns" : "getevents_api.events"
},
{
"v" : 1,
"key" : {
"start_date" : 1,
"end_date" : 1,
"venue.geo" : "2dsphere"
},
"name" : "start_date_1_end_date_1_venue.geo_2dsphere",
"ns" : "getevents_api.events",
"2dsphereIndexVersion" : 2
},
{
"v" : 1,
"key" : {
"venue.geo" : "2dsphere"
},
"name" : "venue.geo_2dsphere",
"ns" : "getevents_api.events",
"2dsphereIndexVersion" : 2
},
{
"v" : 1,
"key" : {
"start_date" : -1,
"end_date" : 1,
"venue.geo" : "2dsphere"
},
"name" : "start_date_-1_end_date_1_venue.geo_2dsphere",
"ns" : "getevents_api.events",
"2dsphereIndexVersion" : 2
}
]


Any ideas what's going on here? I'm fairly new to mongo, but not with the concept of indexes (albeit there are probably significant differences between my understanding and the way mongo functions), so any help, insight, or suggestions on improvements even outside the scope of this issue would be very much appreciated. This database will continue to grow, and we'd like our queries to be fairly snappy. I don't know how realistic an expectation this is, or if we should be implementing a caching layer in front, but either way the basic functionality of the queries should be executing timely and without complete server overload.

Thanks in advance for any help!
-Nick

Asya Kamsky

unread,
Jun 18, 2014, 11:58:25 PM6/18/14
to mongodb-user
Near automatically sorts by closest to furthest.
You are then *negating* that work by re-sorting the results.

If you want them sorted by non-geo attribute then use $geoWithin instead of $geoNear.

Asya



--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/b7d58b70-50f6-4e38-a3dc-6ce5718aa90a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ni...@silverbucket.net

unread,
Jun 20, 2014, 12:06:21 PM6/20/14
to mongod...@googlegroups.com
Hi Asya,

 Thanks for your reply, this helped a lot to stop the crashing/hanging of mongo. I'm now getting results back from my queries, however with certain coordinates (I can't find any clues as to why some return faster than others), or even most coordinates,  I'm still getting up to 20+ seconds before the query returns:

db.events.find({ _id: { '$nin': [] }, 'venue.geo': { '$geoWithin': { '$center': [[ 14.4667, 50.0833 ], 0.04 ]}}, end_date: { '$gte': new Date("Mon, 18 Jun 2014 00:00:00 GMT") } }).sort({start_date:1}).limit(3).explain()
{
"cursor" : "BtreeCursor start_date_1",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 77902,
"nscanned" : 77903,
"nscannedObjectsAllPlans" : 311605,
"nscannedAllPlans" : 467415,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 3652,
"nChunkSkips" : 0,
"millis" : 4429,
"indexBounds" : {
"start_date" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "ip-10-75-0-187:27017",
"filterSet" : false
}



Whereas, when I remove the sort, the response is sent within a second or so (though I have noticed the geoNear is faster than geoWithin when not using sort):

db.events.find({ _id: { '$nin': [] }, 'venue.geo': { '$geoWithin': { '$center': [[ 14.4667, 50.0833 ], 0.04 ]}}, end_date: { '$gte': new Date("Mon, 18 Jun 2014 00:00:00 GMT") } }).limit(3).explain()
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 5266,
"nscanned" : 5267,
"nscannedObjectsAllPlans" : 15798,
"nscannedAllPlans" : 26337,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 206,
"nChunkSkips" : 0,
"millis" : 323,
"indexBounds" : {
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "ip-10-75-0-187:27017",
"filterSet" : false
}


Is there a specific index I should be using to speed up these queries? I've re-read the index documentation but haven't found a good explain of using geo searches and ordering by date. To make things more confusing, sometimes the keyword field is used as well (an array of strings) to further narrow down the search.

Any ideas as to what I may be doing wrong? Or something I can do to improve the response?

Cheers
Nick

Asya Kamsky

unread,
Jun 21, 2014, 9:17:43 AM6/21/14
to mongod...@googlegroups.com
It might help to have a compound index when you filter and sort by various fields, you may find this write-up helpful:

Btw, is there a reason you have an empty _id clause?

Ron Yosipovich

unread,
Jan 20, 2015, 1:21:46 PM1/20/15
to mongod...@googlegroups.com
What is the S2NearCursor cursor mentioned above? Does it imply that geospatial indexes are being used?
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.

Asya Kamsky

unread,
Jan 20, 2015, 6:19:11 PM1/20/15
to mongodb-user
Yes, that's the indication geo index is being used.
> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user"
> group.
>
> For other MongoDB technical support options, see:
> http://www.mongodb.org/about/support/.
> ---
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mongodb-user...@googlegroups.com.
> To post to this group, send email to mongod...@googlegroups.com.
> Visit this group at http://groups.google.com/group/mongodb-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mongodb-user/e861667e-21b3-40c5-9847-a38cf1101c40%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages