I would like to run geo queries against a collection of 1M+ documents.
I'm seeing terrible performance for these queries, even though the
output of find().explain() and getIndexes() indicates that a 2d geo
index is being used.
Each entry looks like this:
> db.commits.findOne()
{
    "_id" : ObjectId("4ce4ba7df360623d32000000"),
    "loc" : [
        55.59664,
        13.00156
    ],
    "sha1" : "88d2a028ebfb7ddc9f8a8b11efac03503c4ddd7f",
    "parents" : [
        "5f2c2ed26d5a467b83d5df91c8366f3c4a7caa23"
    ],
    "location" : "Malmö / Sweden",
    "committed_date" : "2010-04-24T05:49:34-07:00",
    "committed_date_native" : "Fri Apr 23 2010 22:49:34 GMT-0700 (PDT)",
    "author" : "FredrikL",
    "authored_date" : "2010-04-24T05:49:34-07:00"
}
Indexes:
> db.commits.getIndexes()
[
    {
        "name" : "_id_",
        "ns" : "processed.commits",
        "key" : {
            "_id" : 1
        }
    },
    {
        "ns" : "processed.commits",
        "name" : "sha1_1",
        "key" : {
            "sha1" : 1
        }
    },
    {
        "ns" : "processed.commits",
        "name" : "loc_2d",
        "key" : {
            "loc" : "2d"
        }
    },
    {
        "ns" : "processed.commits",
        "name" : "committed_date_native_1",
        "key" : {
            "committed_date_native" : 1
        }
    }
]
Of the 955K entries, about 73K match a geo search for the SF Bay
Area. Oddly, the naive JavaScript filter runs faster than the geo
index: 20 seconds vs. 43 seconds. The results are identical.
var geo_javascript = "this.loc ? (this.loc[0] > 37.200000000000003 && this.loc[0] < 38.0 && this.loc[1] > -123.0 && this.loc[1] < -121.0) : false";
var geo_filter = {"loc": {"$within": {"$box": [[37.200000000000003, -123.0], [38.0, -121.0]]}}};
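For clarity, here is the same bounding-box predicate pulled out into a standalone function and applied to a few sample points (a plain-JavaScript sketch; inBayAreaBox and the sample docs are names I made up for illustration):

```javascript
// Standalone version of the geo_javascript predicate above: a doc matches
// when it has a loc field whose [lat, long] pair falls inside the box.
function inBayAreaBox(doc) {
  return doc.loc
    ? (doc.loc[0] > 37.200000000000003 && doc.loc[0] < 38.0 &&
       doc.loc[1] > -123.0 && doc.loc[1] < -121.0)
    : false;
}

console.log(inBayAreaBox({ loc: [37.6, -122.0] }));       // → true (inside the box)
console.log(inBayAreaBox({ loc: [55.59664, 13.00156] })); // → false (the Malmö example doc)
console.log(inBayAreaBox({ sha1: "abc" }));               // → false (no loc field at all)
```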
> db.commits.find(geo_javascript).explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 955831,
    "nscannedObjects" : 955831,
    "n" : 73774,
    "millis" : 19962,
    "indexBounds" : {
    }
}
> db.commits.find(geo_filter).explain()
{
    "cursor" : "GeoBrowse-box",
    "nscanned" : 73774,
    "nscannedObjects" : 73774,
    "n" : 73774,
    "millis" : 42900,
    "indexBounds" : {
    }
}
The order flips, and the indexed version is faster, when I use a 100K-
entry subset of the same data (with roughly the same proportion of
entries matching the query):
> use processed100k
switched to db processed100k
> db.commits.find(geo_javascript).explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 100000,
    "nscannedObjects" : 100000,
    "n" : 8588,
    "millis" : 2911,
    "indexBounds" : {
    }
}
> db.commits.find(geo_filter).explain()
{
    "cursor" : "GeoBrowse-box",
    "nscanned" : 8588,
    "nscannedObjects" : 8588,
    "n" : 8588,
    "millis" : 214,
    "indexBounds" : {
    }
}
I converted the query to use the geoNear command, but since it returns
a single result document rather than a cursor, the result set becomes
too large somewhere between 20K and 40K results:
> db.runCommand({geoNear:"commits", near:[37.600000000000001, -122.0], num:100000, maxDistance: 1.0770329614269003}).results.length
Sun Nov 21 02:28:13 uncaught exception: error {
    "$err" : "Invalid BSONObj spec size: 28385338 (3A20B101) first element:ns: \"processed.commits\" ",
    "code" : 10334
}
... so I can't use geoNear.
As for using a regular find() with $near and a max distance (a
circular bound rather than a box), I can't seem to raise the number of
results returned above 100:
> db.commits.find({"loc": {"$near": [37.600000000000001, -122.0], "$maxDistance": 1.0770329614269003}}).count()
100
Including '$num' and 'num' as parameters returns nothing.
I'm not the first one with this problem:
http://stackoverflow.com/questions/3889601/mongodbs-geospatial-index-how-fast-is-it
Any ideas as to how to get reasonable geo performance with larger
collections?
I wonder if some aspect of my data gets worse at larger sizes: there
are lots of duplicate lat/long pairs, and about half of the entries
have no loc field at all. For an indexed query to work, must all
documents in the collection have the queried field defined?
I guess an ugly workaround would be a separate locations collection
holding only the unique locations, each with pointers back to the
larger commits collection, to shrink the search space.
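If it comes to that, the dedup step itself would be simple; a rough sketch in plain JavaScript (buildLocations and the sample docs are invented for illustration):

```javascript
// Collapse commit docs that share a lat/long pair into one "location"
// entry that carries the _ids of the original commits. Docs with no
// loc field are skipped, since they can never match a geo query anyway.
function buildLocations(commits) {
  var byLoc = {};
  commits.forEach(function (c) {
    if (!c.loc) return;                // skip docs without a loc field
    var key = c.loc.join(",");         // e.g. "55.59664,13.00156"
    if (!byLoc[key]) byLoc[key] = { loc: c.loc, commit_ids: [] };
    byLoc[key].commit_ids.push(c._id);
  });
  return Object.keys(byLoc).map(function (k) { return byLoc[k]; });
}

var sample = [
  { _id: 1, loc: [55.59664, 13.00156] },
  { _id: 2, loc: [55.59664, 13.00156] },  // duplicate location
  { _id: 3, loc: [37.6, -122.0] },
  { _id: 4 }                              // no loc field
];
console.log(buildLocations(sample).length); // → 2 unique locations
```

The resulting entries could then be inserted into their own collection with a single 2d index, and geo hits joined back to commits via the stored _ids.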
All the numbers are from a months-old MacBook with 4 GB RAM, 4x2.4 GHz
cores, and an SSD, running MongoDB 1.6.3 installed via Homebrew.
Thanks,
Brandon