Re: Why does MongoDB's geo-spacial querying uses N-vector formula instead of Haversine?

359 views
Skip to first unread message

Benoit Clennett-Sirois

unread,
Oct 6, 2012, 3:20:17 PM10/6/12
to mongod...@googlegroups.com
Also I just benchmarked both formulas and Haversine is actually significantly faster... so what am I missing?

On Saturday, October 6, 2012 2:14:39 PM UTC-4, Benoit Clennett-Sirois wrote:
Hi,

I'm not sure which is the "right" way, however MongoDB's method to calculate the arc distance between two geo-spacial coordinates returns results that are quite different from the "Haversine" method, which I *though* was more accurate.

Here's a sample of coordinates with the calculated distance using MongoDB's current implementation, compared with haversine formula:

As you will see, they are quite different, and it makes me wonder: Which implementation is the most accurate? If it's haversine, why isn't MongoDB using that instead?

[
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.511608,-73.575414],
    "haversine":1.3114036503289386,
    "n-vector":1.160834264227145
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.483326,-73.5976875],
    "haversine":4.300355589028358,
    "n-vector":1.7994981532084626
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.582284,-73.582404],
    "haversine":6.813640270700366,
    "n-vector":1.955079712212976
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.485694,-73.559039],
    "haversine":4.439292229824245,
    "n-vector":3.1460695586348666
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.505948,-73.553786],
    "haversine":2.987903093989977,
    "n-vector":3.5590344926244413
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.574233,-73.553982],
    "haversine":6.4033904891560995,
    "n-vector":3.884098493891246
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.544309,-73.620963],
    "haversine":3.783968871027327,
    "n-vector":4.009577729156416
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.601754,-73.621931],
    "haversine":9.411674219140682,
    "n-vector":4.777158708086106
    },
    {
    "p1":[45.521046,-73.585507],
    "p2":[45.627458,-73.616601],
    "haversine":12.077449145883097,
    "n-vector":4.807680946753732
    }
]


FYI, these are the functions I used to come-up with these numbers:

function haversine(point1, point2) {
    var lat1 = point1[0];
    var lat2 = point2[0];
    var lon1 = point1[1];
    var lon2 = point2[1];

    // Haversine formula (I think)
    var dLat = (lat2-lat1).toRad();
    var dLon = (lon2-lon1).toRad();
    var lat1 = lat1.toRad();
    var lat2 = lat2.toRad();

    var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
        Math.sin(dLon/2) * Math.sin(dLon/2) * Math.cos(lat1) * Math.cos(lat2);
    var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
    return 6371 * c;
}

function nvector(point1, point2) {
    var p1_x = point1[0].toRad();
    var p1_y = point1[1].toRad();
    var p2_x = point2[0].toRad();
    var p2_y = point2[1].toRad();

    var sin_x1 = Math.sin(p1_x);
    var cos_x1 = Math.cos(p1_x);
    var sin_y1 = Math.sin(p1_y);
    var cos_y1 = Math.cos(p1_y);
    var sin_x2 = Math.sin(p2_x);
    var cos_x2 = Math.cos(p2_x);
    var sin_y2 = Math.sin(p2_y);
    var cos_y2 = Math.cos(p2_y);

    var cross_prod = (cos_y1 * cos_x1 * cos_y2 * cos_x2) + (cos_y1 * sin_x1 * cos_y2 * sin_x2) + (sin_y1 * sin_y2);

    if (cross_prod >= 1.0 || cross_prod <= -1.0) {
    if (cross_prod > 0.0) {
        return 0.0;
    } else {
        return Math.PI;
    }
    }

    return Math.acos(cross_prod) * 6371;
}

William Zola

unread,
Oct 9, 2012, 12:37:55 AM10/9/12
to mongod...@googlegroups.com
Hi Benoit!

MongoDB uses GeoHashing to implement the geospatial distances.  You can read a detailed description of the algorithm here:
 - http://www.kchodorow.com/blog/2011/06/08/mongo-in-flatland/

MongoDB stores all geospatial coordinates as if there were 2-d, and applies a correction to return radians: see below for details

 - https://github.com/mongodb/mongo/blob/master/src/mongo/db/geo/2d.cpp#L2909
 - https://github.com/mongodb/mongo/blob/master/src/mongo/db/geo/2d.cpp#L2128

As to why?  I wasn't there for the decision, but I assume that the decision was made that geohashing would work better for large data sets.

I hope this answers your question.  Have a great day!

 -William

Wes Melton

unread,
Dec 16, 2014, 12:09:38 PM12/16/14
to mongod...@googlegroups.com
So I know this is a older thread, but for anyone who lands on this page from a web search, it's worth noting that GeoHashing was implemented intentionally for 2-d point calculations. Using the OP's example, it should be a much fairer comparison to Haversine to use MongoDB's nearSphere function which assumes you're using a real world (round) map. Documentation here - http://docs.mongodb.org/manual/reference/operator/query/nearSphere/
Reply all
Reply to author
Forward
0 new messages