[mongodb-user] $within $box queries on geospatial index is very slow

Douglas Li

May 23, 2010, 12:41:50 AM
to mongodb-user
I've got a collection of roughly 160k points, each having only the
following fields:
{_id: XYZ, geo: {lat: 0.12, long: 13.1}}

Querying for all points within a box without using the index takes
282ms.
Querying using the geospatial index takes 34806ms?!

> db.points.find({'geo.lat': {'$gte': 0, '$lte': 170}, 'geo.long': {'$gte': 0, '$lte': 170}}).explain()
{
    "cursor" : "BasicCursor",
    "indexBounds" : [ ],
    "nscanned" : 165452,
    "nscannedObjects" : 165452,
    "n" : 81856,
    "millis" : 282,
    "allPlans" : [
        {
            "cursor" : "BasicCursor",
            "indexBounds" : [ ]
        }
    ]
}
> db.points.ensureIndex({geo: '2d'}, {min: -190, max: 190})
> var box = [[0, 0], [170, 170]]
> db.points.find({geo: {'$within': {'$box': box}}}).explain()
{
    "cursor" : "GeoBrowse-box",
    "indexBounds" : [ ],
    "nscanned" : 81856,
    "nscannedObjects" : 81856,
    "n" : 81856,
    "millis" : 34806,
    "oldPlan" : {
        "cursor" : "GeoBrowse-box",
        "indexBounds" : [ ]
    },
    "allPlans" : [
        {
            "cursor" : "GeoBrowse-box",
            "indexBounds" : [ ]
        }
    ]
}

I'm on the following version:
db version v1.4.2, pdfile version 4.5
Sun May 23 04:40:16 git version: 53749fc2d547a3139fcf169d84d58442778ea4b0

Any ideas as to why it may be performing 100x worse?

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Eliot Horowitz

May 23, 2010, 7:33:46 AM
to mongod...@googlegroups.com
You're comparing a raw table scan to a much more complex operation.
A table scan requires 0 CPU. Maybe it's slightly inefficient, but I don't
think that's the right way to look at performance.

Also $near is the fastest geo query in general if you want speed.

Douglas Li

May 25, 2010, 9:35:30 AM
to mongodb-user
Ah. I thought it would be a good idea to use a geospatial index,
seeing that it can support $box queries. I'm trying to render map
tiles of points (like those used in Google Maps), and from my
experimentation, a compound index on longitude then latitude works
much better. Or maybe even a simple index on longitude.

Maybe when I have more points to look through, the geospatial index
might be useful.
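
[Editor's sketch] The tile-rendering approach described above could look roughly like this. It assumes Google-Maps-style Web Mercator tile addressing (x, y, zoom), which the post does not spell out, and the function names tileBounds and tileQuery are hypothetical. The field names geo.long and geo.lat are the ones from the original post. It is plain JavaScript for a modern engine, not a statement of how the poster actually built the queries.

```javascript
// Sketch only: assumes Web Mercator tiles addressed as (x, y, zoom).
// tileBounds and tileQuery are hypothetical helper names.

// Convert a tile address to its longitude/latitude bounding box in degrees.
function tileBounds(x, y, zoom) {
  var n = Math.pow(2, zoom); // number of tiles per axis at this zoom level
  function lon(tx) {
    return (tx / n) * 360 - 180;
  }
  function lat(ty) {
    return (Math.atan(Math.sinh(Math.PI * (1 - (2 * ty) / n))) * 180) / Math.PI;
  }
  // Tile rows count downward from the north, so row y is the northern edge.
  return { west: lon(x), east: lon(x + 1), north: lat(y), south: lat(y + 1) };
}

// Build a plain range query on the two coordinate fields.
// A compound index on longitude then latitude can serve the longitude range.
function tileQuery(x, y, zoom) {
  var b = tileBounds(x, y, zoom);
  return {
    "geo.long": { $gte: b.west, $lt: b.east },
    "geo.lat": { $gte: b.south, $lt: b.north }
  };
}
```

In the shell this would pair with an index such as db.points.ensureIndex({'geo.long': 1, 'geo.lat': 1}) and a call like db.points.find(tileQuery(x, y, zoom)). Half-open ranges are used so that a point on a shared edge lands in exactly one tile.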

- Doug




Zac Witte

Jun 12, 2010, 10:32:34 PM
to mongodb-user
I'm also seeing very slow performance using $within and even $near.
The collection has 6.6 million objects. I created a compound index
on geoPt and timestamp, but I'm only using the geoPt field for this
test.

> db.checkins.stats()
{
    "ns" : "locis.checkins",
    "count" : 6679005,
    "size" : 3006781324,
    "storageSize" : 4762098432,
    "numExtents" : 31,
    "nindexes" : 6,
    "lastExtentSize" : 800206080,
    "paddingFactor" : 1.0099999999479072,
    "flags" : 0,
    "totalIndexSize" : 2338543232,
    "indexSizes" : {
        "_id_" : 289514432,
        "tweetId_1" : 305505216,
        "locisHash_1" : 434447296,
        "timestamp_-1" : 351626176,
        "id_1" : 492274624,
        "geoPt__timestamp_-1" : 465175488
    },
    "ok" : 1
}

> db.checkins.find({geoPt:{'$within':{'$box':[[37.7490234375, -122.431640625], [37.79296875, -122.34375]]}}}).explain()
{
    "cursor" : "GeoBrowse-box",
    "indexBounds" : [ ],
    "nscanned" : 39422,
    "nscannedObjects" : 39422,
    "n" : 39422,
    "millis" : 50270,
    "oldPlan" : {
        "cursor" : "GeoBrowse-box",
        "indexBounds" : [ ]
    },
    "allPlans" : [
        {
            "cursor" : "GeoBrowse-box",
            "indexBounds" : [ ]
        }
    ]
}

> db.checkins.find({geoPt:{'$near': [37.79296875, -122.34375]}}).limit(100).explain()
{
    "cursor" : "GeoSearchCursor",
    "indexBounds" : [ ],
    "nscanned" : 100,
    "nscannedObjects" : 100,
    "n" : 100,
    "millis" : 772,
    "oldPlan" : {
        "cursor" : "GeoSearchCursor",
        "indexBounds" : [ ]
    },
    "allPlans" : [
        {
            "cursor" : "GeoSearchCursor",
            "indexBounds" : [ ]
        }
    ]
}


The whole time that query is running, my CPU is maxed out. A within
box query should be a simple range of the index, which should be very
fast. I can see how $near or within center,radius would take a lot of
CPU since it needs to do some kind of distance function. The docs say
that mongo uses a geohash index internally, which makes sense. I'm not
sure how it does the actual query, but I'm betting it could be
optimized. It could do a series of range comparisons using the index
(shouldn't need more than 3 disjoint ranges) and then scan the results
to cull extraneous records. As it is, I maintain my own geohash index
and querying on a single range usually returns in < 100ms for the same
amount of results.
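
[Editor's sketch] The "series of range comparisons, then cull" idea above can be illustrated in plain JavaScript. This is not MongoDB's internal implementation and not the poster's own geohash index. It assumes a bit-interleaved (Z-order) integer hash over a coordinate range of -180 to 180 with 12 bits per axis, and it covers the box by descending a quadtree. All names here (hash, coverBox, queryBox, BITS) are hypothetical. The number of ranges it produces depends on maxDepth and on how the box lines up with the cell boundaries, so it may be more than three.

```javascript
// Sketch only: a Z-order hash plus range cover, under assumed parameters.
var MIN = -180, MAX = 180; // assumed coordinate range covered by the hash
var BITS = 12;             // assumed bits per axis, so hashes fit in 24 bits

// Quantize a coordinate to an integer cell index in [0, 2^BITS).
function cell(v) {
  var n = 1 << BITS;
  var c = Math.floor(((v - MIN) / (MAX - MIN)) * n);
  return Math.min(n - 1, Math.max(0, c));
}

// Interleave the bits of both cell indexes, x bit before y bit.
function hash(x, y) {
  var cx = cell(x), cy = cell(y), h = 0;
  for (var i = BITS - 1; i >= 0; i--) {
    h = h * 4 + ((cx >> i) & 1) * 2 + ((cy >> i) & 1);
  }
  return h;
}

// Cover box [[x0, y0], [x1, y1]] with sorted, merged [lo, hi] hash ranges.
// Every quadtree node corresponds to one contiguous block of hash values.
function coverBox(box, maxDepth) {
  var x0 = box[0][0], y0 = box[0][1], x1 = box[1][0], y1 = box[1][1];
  var ranges = [];
  function visit(prefix, depth, qx0, qy0, size) {
    var qx1 = qx0 + size, qy1 = qy0 + size;
    // Conservative test: nodes that only touch the box edge are kept.
    if (qx1 < x0 || qx0 > x1 || qy1 < y0 || qy0 > y1) return;
    var inside = qx0 >= x0 && qx1 <= x1 && qy0 >= y0 && qy1 <= y1;
    if (inside || depth === maxDepth) {
      var span = Math.pow(4, BITS - depth);
      var lo = prefix * span, hi = lo + span - 1;
      var last = ranges[ranges.length - 1];
      if (last && last[1] + 1 === lo) last[1] = hi; // merge adjacent blocks
      else ranges.push([lo, hi]);
      return;
    }
    var half = size / 2;
    // Child order matches the interleave order: (x bit, y bit) = 00, 01, 10, 11.
    visit(prefix * 4, depth + 1, qx0, qy0, half);
    visit(prefix * 4 + 1, depth + 1, qx0, qy0 + half, half);
    visit(prefix * 4 + 2, depth + 1, qx0 + half, qy0, half);
    visit(prefix * 4 + 3, depth + 1, qx0 + half, qy0 + half, half);
  }
  visit(0, 0, MIN, MIN, MAX - MIN);
  return ranges;
}

// Range-filter by hash, then cull candidates that fall outside the box.
function queryBox(points, box, maxDepth) {
  var ranges = coverBox(box, maxDepth);
  var x0 = box[0][0], y0 = box[0][1], x1 = box[1][0], y1 = box[1][1];
  // Stand-in for the index scan; a real index would seek once per range.
  var candidates = points.filter(function (p) {
    var h = hash(p.x, p.y);
    return ranges.some(function (r) { return h >= r[0] && h <= r[1]; });
  });
  return candidates.filter(function (p) {
    return p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1;
  });
}
```

A smaller maxDepth gives fewer, wider ranges and more candidates to cull; a larger one gives more ranges and fewer false positives. The linear filter here only stands in for an index scan, so this shows the shape of the technique and says nothing about its speed.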




Eliot Horowitz

Jun 12, 2010, 10:37:11 PM
to mongod...@googlegroups.com
Can you run the $near query with the geoNear command instead and send the
stats segment of the output?
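
[Editor's sketch] For readers following along, the request above could be run roughly as follows. The collection name, point, and limit are taken from the $near query earlier in the thread. The command shape (geoNear, near, num) and the stats field in the reply are assumed from the geoNear documentation of that era, so check them against your server version. The helper name geoNearCommand is hypothetical.

```javascript
// Sketch only: builds the command document for the geoNear database command.
// The command name must be the first key in the document.
function geoNearCommand(collection, point, num) {
  return { geoNear: collection, near: point, num: num };
}

var cmd = geoNearCommand("checkins", [37.79296875, -122.34375], 100);

// `db` and `printjson` exist only inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  var res = db.runCommand(cmd);
  printjson(res.stats); // assumed: timing and scan counters for the search
}
```

Outside the shell the guard skips the call, so the snippet only builds the command document.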