Need help speeding up spatial query

53 views
Skip to first unread message

Thamesmead

unread,
Nov 14, 2012, 9:58:45 AM11/14/12
to mongod...@googlegroups.com
Hi,

I'm really struggling to get performance out of a spatial query. Perhaps I am on the limit of what my hardware can do, but I have a feeling that there is more that can be done.

I'd really appreciate some feedback here. I've been trying to figure out how to speed this up for days to no avail, but then I am noob...

My documents contain:

{
  "l" : [ -0.0013, 51.4779 ],
  "d" : ISODate("2012-11-14T00:00:00Z"),
  "w" : ObjectId("50a292d2abddc1bd89000001"),
  "k" : 13
}

I need to query for specific w and k, ranges of d, and boxes (later polygons) containing l.

I'm querying like so:

$ db.locs.find({ l: { $within: { $box: [[lon1, lat1], [lon2, lat2]] } }, w: ObjectId("50a292d2abddc1bd89000001"), k: 13 }).explain()
{
  "cursor" : "GeoBrowse-box",
  "isMultiKey" : false,
  "n" : 11156,
  "nscannedObjects" : 11156,
  "nscanned" : 11156,
  "nscannedObjectsAllPlans" : 11156,
  "nscannedAllPlans" : 11156,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 90,
  "nChunkSkips" : 0,
  "millis" : 1457,
  "indexBounds" : {
    "l" : [ ]
  },
  "lookedAt" : NumberLong(621631),
  "matchesPerfd" : NumberLong(459159),
  "objectsLoaded" : NumberLong(11156),
  "pointsLoaded" : NumberLong(41),
  "pointsSavedForYield" : NumberLong(0),
  "pointsChangedOnYield" : NumberLong(0),
  "pointsRemovedOnYield" : NumberLong(0)
}

Above it reports 1457 milliseconds. The logs show the queries sometimes take up to 7 seconds. In any event, 1.4 seconds seems a little slow to me. Is it?

There are 623k documents in the collection (I get only a tiny improvement if I trim that down to 10% of the size):

$ db.locs.stats()
{
  "ns" : "development.locs",
  "count" : 623129,
  "size" : 85332600,
  "avgObjSize" : 136.94210990019724,
  "storageSize" : 123936768,
  "numExtents" : 11,
  "nindexes" : 2,
  "lastExtentSize" : 37625856,
  "paddingFactor" : 1,
  "systemFlags" : 1,
  "userFlags" : 0,
  "totalIndexSize" : 113409296,
  "indexSizes" : {
    "_id_" : 61516224,
    "l_2d_d_1_w_1_k_1" : 51893072
  },
  "ok" : 1
}

These are my indexes:

[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "ns" : "development.locs",
    "name" : "_id_"
  },
  {
    "v" : 1,
    "key" : {
      "l" : "2d",
      "d" : 1,
      "w" : 1,
      "k" : 1
    },
    "unique" : true,
    "ns" : "development.locs",
    "name" : "l_2d_d_1_w_1_k_1"
  }
]

$ version()
version: 2.2.1

Thanks for any help.

Dwight Merriman

unread,
Nov 15, 2012, 3:52:56 PM11/15/12
to mongod...@googlegroups.com
a couple thoughts

one, the query had a fairly high # of matches.  did you want all 11,000 returned?  if not, consider using limit() which *sometimes* might speed things up. 

two, your query codesn't use 'd' -- which is ok but if that is second in the index compound key, a lot may have to be checked.  you might want to try a different order in that case to the index key pattern.  the geo part has to come first though, i believe.

   "key" : {
      "l" : "2d",
      "d" : 1,
      "w" : 1,
      "k" : 1
    },

so perhaps put d last if it is used the least in queries (i.e. maybe l,w,k,d)).  things that are more selective are better to come earlier

Thamesmead

unread,
Nov 16, 2012, 6:37:43 AM11/16/12
to mongod...@googlegroups.com
Thanks for the suggestions Dwight.

I am looking to retrieve all matches I'm afraid (well, perhaps a map/reduce of them). The query paramaters should keep the result sets reasonably small, but I'm trying to develop for a bad (if not worst) case scenario.

I also tried reordering the index, but there was no significant improvement. I even tried removing the index and adding a unique on just l (L) (dropping the dups so the total number of docs was significantly reduced too). But again a query on just l wasn't what I would call acceptable.

I'm starting to think that I'm on the limit of what can be done, and will just have to reduce the matching documents and rely on the production servers having enough umph to perform.

That said, I'm confused by a result I've found that a regular index on l and a pair of $lt and $gt conditions on the lat and lng is about twice as fast as a spatial query. Surely the point of 2d indexing is that it is optimized for this?
Reply all
Reply to author
Forward
0 new messages