[OSM-dev] Fwd: Re: OSM and MongoDB


Nolan Darilek

Apr 12, 2011, 4:39:39 PM
to OpenStreetMap Development
Oops, meant for this to go to the whole list.



-------- Original Message --------
Subject: Re: [OSM-dev] OSM and MongoDB
Date: Tue, 12 Apr 2011 15:26:41 -0500
From: Nolan Darilek <no...@thewordnerd.info>
To: Ian Dees <ian....@gmail.com>


I had/am having a somewhat bad experience storing OSM data in MongoDB.

Initially I stored all map data in MongoDB, but queries took ages. The same queries that now complete in 100-200 ms often took nearly a second. Some took upwards of 5 seconds, and I even found spots on my map that were sparsely populated with points yet reliably took 30+ seconds to answer the queries I need.

I filed a thorough bug in their tracker, including a dataset and queries that reliably duplicated the issue. It was marked wontfix, I abandoned MongoDB, and it was apparently re-opened and fixed several months later. So perhaps it's a non-issue now.

I'm still using MongoDB for part of my current project, user POI storage. It does indeed use geohashes, and I'm experiencing strange accuracy issues. My platform is pedestrian navigation with many small-distance queries. Points in the non-MongoDB dataset are reliably detected in a radius of roughly 100 meters around the traveler. Points in MongoDB queried with the same bounding boxes don't appear until they're within 30-40 meters. I recently updated from an older version to a new build of 1.8. With the older version, the detection range varied widely: some points were detected 100 or so meters out, while others weren't picked up until 30 or so. It was always the same points, too. The point for my apartment remained reliably visible at ~100 meters, while the corner store and restaurant didn't appear until I was very close. 1.8 at least appears to be consistent, always detecting at 30 meters or so. I can only assume that this is a geohash oddity that only appears for very small differences, something that works out to rounding error for larger values.

I like MongoDB for many things, but not for geospatial data more complicated than a series of points. I'm working on migrating user/POI storage to a geospatial store.


On 04/12/2011 01:20 PM, Ian Dees wrote:
Yep, and I think Mongo uses geohashes as their index behind the scenes. One of the problems with that, though, is they have some arbitrary length that they compute the geohash to and when you have lots of points (as OSM data does) the buckets they're searching are very full.
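The bucket effect described above can be illustrated with a small sketch. It does not reproduce MongoDB's index; it only assumes a fixed-precision grid (the `bits` parameter and the helper names are hypothetical) and counts how many points land in each cell. With a coarse grid, dense data piles up in a few buckets that then have to be scanned point by point.

```python
from collections import Counter

def cell(lon, lat, bits):
    """Map a coordinate to a grid cell with 2**bits divisions per axis."""
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y

def bucket_sizes(points, bits):
    """Count how many points fall into each grid cell at the given precision."""
    return Counter(cell(lon, lat, bits) for lon, lat in points)

# 100 closely spaced points, roughly 100 m apart along a diagonal.
points = [(0.001 * i, 0.001 * i) for i in range(100)]
coarse = bucket_sizes(points, bits=4)   # very few, very full buckets
fine = bucket_sizes(points, bits=20)    # many nearly empty buckets
```

Comparing `max(coarse.values())` with `max(fine.values())` shows how the chosen hash length determines how full the searched buckets are.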

On Tue, Apr 12, 2011 at 1:00 PM, Steve Coast <st...@asklater.com> wrote:
Bbox queries using the built-in spatial indexing, presumably? OSM has its own magical bitmask for that, which may also be as fast in Mongo, who knows.


On 4/11/2011 5:58 PM, Ian Dees wrote:
On Mon, Apr 11, 2011 at 6:36 PM, Sergey Galuzo <ser...@microsoft.com> wrote:

Hi,

 

I am evaluating MongoDB for several storage solutions at hand. Some of them resemble the current OSM editing database. I have heard that OSM dev is/was also evaluating MongoDB. I was wondering whether it is possible to share the findings?

 


In my experimentation with MongoDB (seen here: https://github.com/iandees/mongosm/) I found it to be very slow. Inserts were speedy, but bounding-box queries took a long time.

The most recent dev version of MongoDB includes "multi-location documents" support:

This would allow a single way document to be indexed at multiple locations and vastly speed up the map query.
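As a rough illustration of the two ideas above, the following sketch builds a bounding-box filter in the legacy `$within`/`$box` form used with 2d indexes, and shows one possible shape for a way document that carries several locations. The field name `loc`, the document layout, and the coordinates are assumptions for illustration; they are not taken from mongosm.

```python
def bbox_query(min_lon, min_lat, max_lon, max_lat, field="loc"):
    """Build a legacy 2d-index bounding-box filter for the given field."""
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bounding box corners are out of order")
    box = [[min_lon, min_lat], [max_lon, max_lat]]
    return {field: {"$within": {"$box": box}}}

# Hypothetical way document: one entry in "loc" per member node, so the
# way could be indexed at every location it passes through.
way_doc = {
    "_id": 1234,
    "tags": {"highway": "residential"},
    "nodes": [101, 102, 103],
    "loc": [[-93.2650, 44.9778], [-93.2641, 44.9781], [-93.2630, 44.9785]],
}

query = bbox_query(-93.27, 44.97, -93.26, 44.98)
```

With a driver such as pymongo, a filter like this would typically be passed to `collection.find(query)` after creating a 2d index on `loc`. Later server versions spell the operator `$geoWithin`.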
_______________________________________________
dev mailing list
d...@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Steve Coast

Apr 12, 2011, 4:47:31 PM
to d...@openstreetmap.org
how was the data put in the db though? 1 document per node?

Ian Dees

Apr 12, 2011, 4:50:36 PM
to Steve Coast, d...@openstreetmap.org
Yes, one document per node/way/relation.

Steve Coast

Apr 12, 2011, 4:51:10 PM
to Ian Dees, d...@openstreetmap.org
and using the builtin spatial index?

Ian Dees

Apr 12, 2011, 4:52:14 PM
to Steve Coast, d...@openstreetmap.org
Yep.

Nolan Darilek

Apr 12, 2011, 4:52:45 PM
to d...@openstreetmap.org
On 04/12/2011 03:47 PM, Steve Coast wrote:
how was the data put in the db though? 1 document per node?



Yes, with deeper structures for ways and relations.

Steve Coast

Apr 12, 2011, 4:56:58 PM
to Ian Dees, d...@openstreetmap.org
Interesting.

How efficient is the (big)int indexing and/or masking?

Was this all on a single machine?

Ian Dees

Apr 13, 2011, 9:52:51 AM
to Steve Coast, d...@openstreetmap.org
On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast <st...@asklater.com> wrote:
Interesting.

How efficient is the (big)int indexing and/or masking?

I haven't had a chance to look at the integer indexing/masking. If I remember correctly from discussions on dev a long while ago, it's very close to geohashes.
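For readers unfamiliar with this family of techniques, here is a minimal sketch of a bit-interleaved (Z-order) integer key, the general idea shared by geohashes and integer tile indexes. It is not taken from the OSM codebase: the 16-bit scaling per axis and the function names are assumptions chosen for illustration.

```python
def interleave16(x, y):
    """Interleave the low 16 bits of x and y into one 32-bit Z-order key.

    The bit from x takes the higher position in each pair.
    """
    key = 0
    for i in range(15, -1, -1):
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key

def tile_key(lon, lat):
    """Scale lon/lat to 16-bit integers (assumed resolution) and interleave."""
    x = round((lon + 180.0) * 65535.0 / 360.0)
    y = round((lat + 90.0) * 65535.0 / 180.0)
    return interleave16(x, y)
```

Because high-order bits describe coarse position, nearby points tend to share a key prefix, so a bounding box can be covered by a handful of integer ranges or masks over an ordinary integer index.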
 

Was this all on a single machine?

Yes.

Greg Studer

Apr 13, 2011, 10:35:35 AM
to d...@openstreetmap.org
MongoDB does use a geohash as the indexing method for geo-searches, but
I'm pretty sure that's not the cause of the huge query times. The
geohashing tends to be very fast, but with certain point distributions
the way points were buffered for return in pre-1.9 releases could cause
these slowdowns - I'm guessing the neighboring boxes had many more
points.

Exact point checks and distances are also being introduced in 1.9, so
when/if the hash isn't precise enough to complete your search, you
shouldn't get these types of inaccurate results (the hash is currently
tunable to 32 bits of precision). Of course, these are all new
developments (along with polygon searches and multi-location documents);
geo-indexing has gotten a lot of attention as of late.

disclaimer: as per my email address, I work at 10gen on MongoDB

Andreas Scheucher

Apr 13, 2011, 3:35:51 PM
to Ian Dees, d...@openstreetmap.org
hi,

Some weeks ago, I got interested in NoSQL database products. I had no experience with them up to now, but as it was a requirement for a job, I started to read about Apache Cassandra and thought this would be interesting for OpenStreetMap.

Up to now my findings are only theoretical, but I would like to dig deeper when I find time.

But one thing I wonder about is that you tested it on one machine. Isn't it the case that you need several nodes and loads of data to really benefit from NoSQL databases? At least this was my understanding of the whole thing...

greets,
Andreas

2011/4/13 Ian Dees <ian....@gmail.com>

Ian Dees

Apr 13, 2011, 3:44:13 PM
to Andreas Scheucher, d...@openstreetmap.org
On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher <andreas....@gmail.com> wrote:
hi,

Some weeks ago, I got interested in NoSQL database products. I had no experience with them up to now, but as it was a requirement for a job, I started to read about Apache Cassandra and thought this would be interesting for OpenStreetMap.


Yep, Cassandra would be an interesting option to try. In fact many moons ago I spoke with the folks at SimpleGeo about attempting to host some OSM data there in their infrastructure. At the time they didn't support anything but point features (and had no other way of dealing with metadata) so I haven't pursued it.

Additionally, this talk they gave was quite informative about how they store their location data in Cassandra: http://www.youtube.com/watch?v=7J61pPG9j90
 
Up to now my findings are only theoretical, but I would like to dig deeper when I find time.

But one thing I wonder about is that you tested it on one machine. Isn't it the case that you need several nodes and loads of data to really benefit from NoSQL databases? At least this was my understanding of the whole thing...

The purpose of multiple machines in this case is to have relatively reliable storage and multiple copies of the data on different machines, not necessarily an increase in read speed (Greg, maybe you could correct me?). Last time I looked at MongoDB seriously for OSM I imported an entire planet, so it was "loads of data" :). I have not tried a whole planet with the more recent versions, though.

Greg Studer

Apr 13, 2011, 4:41:38 PM
to d...@openstreetmap.org
Agree, think the issue in this case definitely wasn't related to
multiple machines. In general, though, you often can do much better
performance-wise on large data sets by running queries on data subsets
across multiple systems, whatever software you use. Most NoSQL dbs try
to make this particularly easy.
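
The general pattern of running a query on data subsets and merging the results can be sketched as follows. This is only an in-process illustration under stated assumptions: each "shard" is a plain list of (lon, lat) points standing in for a separate machine, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def in_bbox(point, bbox):
    """Return True if (lon, lat) lies inside (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def query_shard(shard, bbox):
    """Run the bounding-box filter against one subset of the data."""
    return [p for p in shard if in_bbox(p, bbox)]

def scatter_gather(shards, bbox):
    """Send the same query to every shard in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        partials = list(pool.map(query_shard, shards, [bbox] * len(shards)))
    return [p for part in partials for p in part]

shards = [[(0, 0), (5, 5)], [(1, 1), (10, 10)]]
hits = scatter_gather(shards, (0, 0, 2, 2))
```

Each shard scans only its own subset, so the work per system shrinks as shards are added, at the cost of a merge step and a network round trip in a real deployment.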