I'm also seeing very slow performance using $within and even $near.
The collection has 6.6 million objects. I created a compound index
on geoPt and timestamp, but I'm only using the geoPt field for this
test.
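For reference, the compound index (the "geoPt__timestamp_-1" entry in the stats below) was created with something like this — the exact invocation below is a reconstruction, not a paste:

```
// Assumed invocation: compound geospatial index, 2d on geoPt with
// timestamp as a secondary descending key (mongo shell).
db.checkins.ensureIndex({ geoPt : "2d", timestamp : -1 })
```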
> db.checkins.stats()
{
    "ns" : "locis.checkins",
    "count" : 6679005,
    "size" : 3006781324,
    "storageSize" : 4762098432,
    "numExtents" : 31,
    "nindexes" : 6,
    "lastExtentSize" : 800206080,
    "paddingFactor" : 1.0099999999479072,
    "flags" : 0,
    "totalIndexSize" : 2338543232,
    "indexSizes" : {
        "_id_" : 289514432,
        "tweetId_1" : 305505216,
        "locisHash_1" : 434447296,
        "timestamp_-1" : 351626176,
        "id_1" : 492274624,
        "geoPt__timestamp_-1" : 465175488
    },
    "ok" : 1
}
> db.checkins.find({geoPt:{'$within':{'$box':[[37.7490234375, -122.431640625], [37.79296875, -122.34375]]}}}).explain()
{
    "cursor" : "GeoBrowse-box",
    "indexBounds" : [ ],
    "nscanned" : 39422,
    "nscannedObjects" : 39422,
    "n" : 39422,
    "millis" : 50270,
    "oldPlan" : {
        "cursor" : "GeoBrowse-box",
        "indexBounds" : [ ]
    },
    "allPlans" : [
        {
            "cursor" : "GeoBrowse-box",
            "indexBounds" : [ ]
        }
    ]
}
> db.checkins.find({geoPt:{'$near': [37.79296875, -122.34375]}}).limit(100).explain()
{
    "cursor" : "GeoSearchCursor",
    "indexBounds" : [ ],
    "nscanned" : 100,
    "nscannedObjects" : 100,
    "n" : 100,
    "millis" : 772,
    "oldPlan" : {
        "cursor" : "GeoSearchCursor",
        "indexBounds" : [ ]
    },
    "allPlans" : [
        {
            "cursor" : "GeoSearchCursor",
            "indexBounds" : [ ]
        }
    ]
}
The whole time that query is running, my CPU is maxed out. A $within
box query should be a simple range scan of the index, which should be
very fast. I can see how $near or $within with $center (radius) would
take a lot of CPU, since it needs to apply some kind of distance
function. The docs say that Mongo uses a geohash index internally,
which makes sense. I'm not sure how it does the actual query, but I'm
betting it could be optimized: it could do a series of range
comparisons using the index (it shouldn't need more than 3 disjoint
ranges) and then scan the results to cull extraneous records. As it
is, I maintain my own geohash index, and querying on a single range
usually returns in < 100 ms for the same number of results.
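To make the idea concrete, here's a minimal sketch (my own illustration, not MongoDB's actual internals): encode each point as a base32 geohash string, keep the strings in an ordered index, cover the box with a few prefix ranges, and cull the false positives with an exact box test.

```javascript
// Minimal geohash sketch (illustrative only): standard base32 geohash
// encoding, plus the exact-box cull step.
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode (lat, lon) into a geohash string of the given length by
// alternately bisecting the longitude and latitude ranges.
function geohash(lat, lon, precision) {
  let latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
  let hash = "", bits = 0, ch = 0, evenBit = true;
  while (hash.length < precision) {
    if (evenBit) {
      const mid = (lonLo + lonHi) / 2;
      if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
      else            { ch = ch << 1;       lonHi = mid; }
    } else {
      const mid = (latLo + latHi) / 2;
      if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
      else            { ch = ch << 1;       latHi = mid; }
    }
    evenBit = !evenBit;
    if (++bits === 5) { hash += BASE32[ch]; bits = 0; ch = 0; }
  }
  return hash;
}

// Points whose hashes share a prefix occupy one contiguous slice of a
// sorted index, so a box can be answered by scanning a few prefix
// ranges and culling points outside the exact box afterwards.
// box is [[latMin, lonMin], [latMax, lonMax]], as in the query above.
function inBox(p, box) {
  return p.lat >= box[0][0] && p.lat <= box[1][0] &&
         p.lon >= box[0][1] && p.lon <= box[1][1];
}
```

Hashing the two corners of the box from the query above shows the shared prefix; scanning that one prefix range yields a superset of the box, and inBox() culls the extras. Because a box edge can land exactly on a geohash cell boundary (the upper corner of this box does), a handful of disjoint ranges is needed in general rather than exactly one.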
> You're comparing a raw table scan to a much more complex operation.
> A table scan requires ~0 CPU. Maybe it's slightly inefficient, but I
> don't think that's the right way to look at performance.
>
> Also, $near is the fastest geo query in general if you want speed.
>
> > For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en