Problems with Geohash / Inaccuracy / ow to deal with this ?

360 views
Skip to first unread message

Pierre

unread,
Sep 3, 2008, 4:23:35 PM9/3/08
to Google App Engine
Hi,

I need to find points (lat/lng) within a certain bounding box. To do
so, I tried Geohash python implementation (http://mappinghacks.com/
code/geohash.py.txt).

Problem's data is :
- I have 2 points (South west SW corner / North east NE corner)
describing a geographic Bounding box
- I have a model (let's call it 'Poi') having a field "geohash" of
type Db.StringProperty + Lat and Lng field (Float)

Question is : How on earth (lol :-) ) can I find all Poi within my
bounding box ?

I've already tried something like :

Create Geohash for SW, create Geohash for NE and do something like :

"SELECT * FROM Poi WHERE geohash >:1 and geohash <= :2, geohash(SW),
geohash(NE)"

Also tried with geoindex, at different depth but all this fails and
still give points not located within Bounding box.
Also tried like describe in "http://labs.metacarta.com/blog/27.entry/
geographic-queries-on-google-app-engine/" to create a small
surrounding box for each Poi and then create a geohash for this
specific bounding box to be in between my SW->NE bounding box....

Nothing is working even with "not worst case" (I mean equatorial
problem...) case/

Any help will be greatly appreciated.

Thanks all,

Pierre

Ubaldo Huerta

unread,
Sep 4, 2008, 6:33:23 AM9/4/08
to Google App Engine
Do you have a example of a point whose geohash value isn't greater
than SW, NE geohashes?

I've tried, in the python interpreter, many examples, and haven't
observed what you're describing. It's pretty scary because I'm writing
a lot of code for a geo app that's based on that property of
geohashes, described at http://labs.metacarta.com/blog/27.entry/,
working. I knew about equator, etc cases needed special handling, but
I'm not too concerned.

-U

Pierre

unread,
Sep 4, 2008, 6:59:35 AM9/4/08
to Google App Engine
Hi,

Here's an example :
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import geohash
>>> hashSW = geohash.Geohash((1.5765380859375,42.894076403348976))
>>> hashNE = geohash.Geohash((6.1907958984375,44.3002644115815))
>>> hashPoi = geohash.Geohash((3.867915,43.733638))
>>> hashPoi > hashSW
False
>>> hashPoi < hashNE
True

With geoIndex :
>>> hashSW = geohash.Geoindex((1.5765380859375,42.894076403348976))
>>> hashNE = geohash.Geoindex((6.1907958984375,44.3002644115815))
>>> hashPoi = geohash.Geoindex((3.867915,43.733638))
>>> hashPoi > hashSW
False
>>> hashPoi < hashNE
True

I might be missing something but I thought : hash(SW) < hashPoi <
hash(NE)

How do you use it and do you find something different ?

Regards,

Pierre

On 4 sep, 12:33, Ubaldo Huerta <uba...@gmail.com> wrote:
> Do you have a example of a point whose geohash value isn't greater
> than SW, NE geohashes?
>
> I've tried, in the python interpreter, many examples, and haven't
> observed what you're describing. It's pretty scary because I'm writing
> a lot of code for a geo app that's based on that property of
> geohashes, described athttp://labs.metacarta.com/blog/27.entry/,

Adam

unread,
Sep 4, 2008, 1:31:47 PM9/4/08
to Google App Engine
Should you be using:

str(hashPoi) > str(hashSW)

or Geostring instead?

A

Pierre

unread,
Sep 4, 2008, 3:06:44 PM9/4/08
to Google App Engine

Yep Adam, you're right.
Turning into string seems to turn false to true. Good point but it
doesn't solve my troubles completely.

I try to copy paste some code and output to show you some weird
things.

Regards,

Pierre

Pierre VANNIER

unread,
Sep 4, 2008, 3:10:24 PM9/4/08
to google-a...@googlegroups.com
Pierre a écrit :
Here is some code + corresponding output, you will see that it retrieves a point located outside bbox :

geoHash1 = geohash.Geohash((float(point[1]), float(point[0])))
geoHash2 = geohash.Geohash((float(point[3]), float(point[2])))
logging.info(" SW : "+str(point[0])+" "+str(point[1])+"\n\t\t"+str(geoHash1))
logging.info(" NE : "+str(point[2])+" "+ str(point[3])+"\n\t\t"+str(geoHash2))
pois = db.GqlQuery("Select * from Poi WHERE geoHash >:1 and geoHash <:2",str(geoHash1),str(geoHash2))
pois1 = Poi.all()
logging.info("Points already in the datastore : ")
for poi in pois1:
   logging.info(" Geohash poi "+poi.geoHash+" poiKey : "+str(poi.key())+" lat : "+str(poi.lat)+" Lng : "+str(poi.lng)+" ")
   logging.info("-------------------------------------------------------")

logging.info("Points for the given Bbox : ("+str(point[0])+","+str(point[1])+")->("+str(point[2])+" "+ str(point[3])+")")
if pois.count() != 0:
  for poi in pois:
     logging.info(poi.geoHash+" poiKey : "+str(poi.key())+" lat : "+str(poi.lat)+" lng : "+str(poi.lng))


************************************************************************************
And now, let's have a look at some output when draging map around and trying to see if point disapear both from screen and from query result when out of the (SW, NE) bbox :

INFO:root: SW : 43.26720631662829 4.61151123046875
                speq0uw18pb50
INFO:root: NE : 44.18220395771566 8.12713623046875
                spvg2gn5bpb08
INFO:root:Points already in the datastore :
INFO:root: Geohash poi spfb018s0yn6w poiKey : ag9hbmFseXRpY3NtYXNodXByCQsSA1BvaRgBDA lat : 43.602709 Lng : 3.867915
INFO:root:-------------------------------------------------------
INFO:root:Points for the given Bbox : (43.26720631662829,4.61151123046875)->(44.18220395771566 8.12713623046875)
INFO:root:spfb018s0yn6w poiKey : ag9hbmFseXRpY3NtYXNodXByCQsSA1BvaRgBDA lat : 43.602709 lng : 3.867915

*********************
Clearly : the unique point should be OUTSIDE the bbox thus not return in output "Points for the given Bbox". So, Am I doing something wrong either with datastore / GQL or geohash ?

Thanks.

Pierre

Adam

unread,
Sep 4, 2008, 5:02:32 PM9/4/08
to Google App Engine
No idea really.

Though assuming

sw < poi < ne

for all poi in the bbox, doesn't imply there are no poi outside of the
bbox which don't also satisfy this.

The implications of this:

sw = geohash.Geohash('speq0uw18pb50')
ne = geohash.Geohash('spvg2gn5bpb08')

bb = sw + ne
bb.bbox()
(0.0, 39.375, 11.25, 45.0)

are interesting.

A


Pierre

unread,
Sep 4, 2008, 5:35:38 PM9/4/08
to Google App Engine
Thanks for your reply Adam.

That's true "doesn't imply there are no poi outside of the
> bbox which don't also satisfy this.".

But, then, it means this heuristic : "SW Geohash / NE geohash then GQL
for points > to SW and < to NE" will retrieve points also located
outside bounding box ...
What is it then ? are we looking for a Minimum bounding rectangle ?
But this program returns also outside points... where is the magic ?
Why does everybody spring around geohash then ?
This heuristic returns an exact set of points located in a bounding
box as you could say a car cost between 100 and 10.000 dollars...No
precision at all.

So, I thought I missunderstood how to use geohash but it seems it's
not the case.

Strange.

Is there any geohash master over there to explain if yes or no it is
possible to be precise for points located in a bounding box with
geohash and gae ?

Regards,

Pierre

Mark

unread,
Sep 5, 2008, 3:55:52 PM9/5/08
to Google App Engine
Hi Pierre, take a look at the following link:

http://github.com/davetroy/geohash-js/tree/master/README

Creating an arbitrary bounding box may overlap Geohash 'fault lines'
which can be very non-contiguous in terms of sorting.

Pierre VANNIER

unread,
Sep 5, 2008, 5:53:51 PM9/5/08
to google-a...@googlegroups.com
Mark a écrit :
Thanks mark for this link.

To conclude, correct me if I'm wrong :
- Geohash is good to find a given Poi's neighborhood
- Geohash cannot be precise enough to retrieve points in a given Box even if this box is geohash coded

My primary goal was to load some Poi on google map. As per performance issues, I'd have liked to load solely Pois located in the visible area of the screen and fetch new Poi as the user is draging the map.

This seems to be a bad idea and I'm going to look for another method.

Thanks Folks for your help.

Pierre

ready...@gmail.com

unread,
Sep 9, 2008, 2:58:50 AM9/9/08
to Google App Engine
I've got a better solution than geohashing.

I break down the grid into sub-degree squares by truncating after the
first decimal point, and then when I save something that I need to
find on the map later, I save metadata with that point indicating the
surrounding grid squares.

So if I have a point with long -122.123123123 and lat 35.56565, that's
in a grid square called -122.1x35.5, and it's surrounded as follows:

[-122.2x35.6][-122.1x35.6][-122.0x35.6]
[-122.2x35.5][-122.1x35.5][-122.0x35.5]
[-122.2x35.4][-122.1x35.4][-122.0x35.4]

Those are all represented in my object as a list
(db.StringListProperty, so you have to do the right permutations to
make them into strings), and because of the way lists work, if a point
that you're searching on is in any of the grid squares associated with
a saved point, that saved point will come up.

To wit, if you have saved that above point, and someone comes in
searching on an address that corresponds to LONG -122.0857 LAT
35.69999, that corresponds to grid square '-122.0x35.6', which is in
your upper right hand corner. Thus, if you search for something like:

square = '-122.0x35.6'
points = (SELECT * FROM Locations WHERE gridList=:1", square)

...you'll find that the original point we saved above will return.

It's all about metadata. Don't think in terms of inequalities and
boundary conditions, think in terms of inclusive ranges.

Best,


On Sep 3, 1:23 pm, Pierre <pierre.vann...@gmail.com> wrote:
> Hi,
>
> I need to find points (lat/lng) within a certain bounding box. To do
> so, I triedGeohashpython implementation (http://mappinghacks.com/
> code/geohash.py.txt).
>
> Problem's data is :
> - I have 2 points (South west SW corner / North east NE corner)
> describing a geographic Bounding box
> - I have a model (let's call it 'Poi') having a field "geohash" of
> type Db.StringProperty + Lat and Lng field (Float)
>
> Question is : How on earth (lol :-) ) can I find all Poi within my
> bounding box ?
>
> I've already tried something like :
>
> CreateGeohashfor SW, createGeohashfor NE and do something like :
>
> "SELECT * FROM Poi WHEREgeohash>:1 andgeohash<= :2,geohash(SW),geohash(NE)"
>
> Also tried with geoindex, at different depth but all this fails and
> still give points not located within Bounding box.
> Also tried like describe in "http://labs.metacarta.com/blog/27.entry/
> geographic-queries-on-google-app-engine/" to create a small
> surrounding box for each Poi and then create ageohashfor this

Pierre

unread,
Sep 9, 2008, 12:23:24 PM9/9/08
to Google App Engine
Hi readyassist,

Thank you for your post.
Trying to understant your algorithm, It seems to me that given a
bounding box defined by to points SW and NE, your heuristic can also
retrieve points outside the box, ?
If I understand what you say, you defined a sort of "grid" surrounding
each recorded points. Then, with inclusion process of lists, you can
tell whether a grid of a point is located in the grid of another
point ?

With this process you cannot deal with overlapping grids of points
which could be "relatively" far from each other, can you ?

Or I might be copletely wrong also ;-)

Regards,

Pierre

On Sep 9, 8:58 am, "readyass...@gmail.com" <readyass...@gmail.com>
wrote:

ready...@gmail.com

unread,
Sep 9, 2008, 4:13:07 PM9/9/08
to Google App Engine
Sure; I don't see why you couldn't conceivably have multiple lists to
get varying distances. I just talked about having one 3x3 grid of
squares, each being 0.1 deg x 0.1 deg, but you could also have
something larger, either with more squares (5x5, 10x10), or with
larger squares (0.15 degrees each direction)

The idea is that when you figure out what box you're in, you also want
to now what your neighboring boxes are, because if you only search in
your box, and if you're near one of the boundaries of that box (i.e.
your search point is on the southwest corner), then results from the
far end (say, the northeast corner) may be less relevant than other
results which are just over the boundary in the next box. So when you
store a point, you store its own box (B1), and you also say, "all
these boxes are neighbors (N1-Nx)", so you end up with something like
this overlaid on a map:

[N1][N2][N3]
[N4][B1][N5]
[N6][N7][N8]

When someone does a search, the point they are searching from is
normalized, and if it turns out that the new point is in any of those
boxes--either the originally-stored point's box or any of it's
neighbors--then that originally-stored point comes up in the query.

There's no reason you couldn't get more creative..
[N9]
[N1][N2][N3]
[N12][N4][B1][N5][N10]
[N6][N7][N8]
[N11]

And so-on...

-B

Pierre

unread,
Sep 9, 2008, 4:39:59 PM9/9/08
to Google App Engine
Hi Ben ("readyassist"... I hate the way they truncate my email
address!) :-)

You're definitively right. The probelm is :
What is the problem ?
Problem is to find neighboor points ?
Problem is to find points in a MBR ? aka Minimum Bounding Rectangle

Those 2 problems are different and implies 2 different answers.
First one can be approach by your smart heuristic
second one implies "exact" data that could be achieved other spatial
algorith such as R-tree.(IMHO)

Question is : Do we have enough tools to implement some kind of ("more
precise than geohash algorithm") algorithm to solve MBR with google
app engine ?

Kind regards,

Pierre

On Sep 9, 10:13 pm, "readyass...@gmail.com" <readyass...@gmail.com>
wrote:
Reply all
Reply to author
Forward
0 new messages