geocoding

205 views
Skip to first unread message

Jonathon Paarlberg

unread,
Jul 20, 2015, 10:27:50 AM7/20/15
to openr...@googlegroups.com
I'm getting an error when I use the method outlined on an openrefine tutorial on geocoding at https://code.google.com/p/google-refine/source/browse/wiki/Geocoding.wiki?=1342

Here's the code they recommend.

with(value.parseJson().results[0].geometry.location, pair, pair.lat + ", " + pair.lng)

Unfortunately, when I run it on my url's, obtained from google maps in the way they recommended, I get an error: "Error: Object does not have any field, including results"

Any help is greatly appreciated.

Thank you.

Owen Stephens

unread,
Jul 20, 2015, 10:52:24 AM7/20/15
to openr...@googlegroups.com
Hi Jonathon,

That wiki is now outdated - all the relevant materials have been moved to the new wiki at https://github.com/OpenRefine/OpenRefine/wiki

If I use the example URL on this page:


Then use the "Add Column by Fetching URLs..." function to get back the geocoding response, then run the GREL expression you quote on that response I get the lat/long as expected - hopefully you can see in this screenshot


Can you confirm that you get the geocoding response as expected, and give us any more information on what happens when you run the GREL expression?

Owen

Jonathon Paarlberg

unread,
Jul 20, 2015, 11:37:06 AM7/20/15
to openr...@googlegroups.com
I get a positive result with the example address, as I did with some of my searches. The result looks like this:
{ "results" : [ { "address_components" : [ { "long_name" : "77", "short_name" : "77", "types" : [ "street_number" ] }, { "long_name" : "Massachusetts Institute of Technology", "short_name" : "Massachusetts Institute of Technology", "types" : [] }, { "long_name" : "Massachusetts Avenue", "short_name" : "Massachusetts Ave", "types" : [ "route" ] }, { "long_name" : "MIT", "short_name" : "MIT", "types" : [ "neighborhood", "political" ] }, { "long_name" : "Cambridge", "short_name" : "Cambridge", "types" : [ "locality", "political" ] }, { "long_name" : "Middlesex County", "short_name" : "Middlesex County", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "Massachusetts", "short_name" : "MA", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] }, { "long_name" : "02139", "short_name" : "02139", "types" : [ "postal_code" ] }, { "long_name" : "4301", "short_name" : "4301", "types" : [ "postal_code_suffix" ] } ], "formatted_address" : "Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA", "geometry" : { "location" : { "lat" : 42.35925390000001, "lng" : -71.09313420000001 }, "location_type" : "ROOFTOP", "viewport" : { "northeast" : { "lat" : 42.36060288029151, "lng" : -71.09178521970851 }, "southwest" : { "lat" : 42.35790491970851, "lng" : -71.09448318029152 } } }, "place_id" : "ChIJ35-fmqlw44kRgOw_2dpRnJQ", "types" : [ "street_address" ] }, { "address_components" : [ { "long_name" : "77", "short_name" : "77", "types" : [ "street_number" ] }, { "long_name" : "Massachusetts Institute of Technology", "short_name" : "Massachusetts Institute of Technology", "types" : [] }, { "long_name" : "Massachusetts Avenue", "short_name" : "Massachusetts Ave", "types" : [ "route" ] }, { "long_name" : "MIT", "short_name" : "MIT", "types" : [ "neighborhood", "political" ] }, { "long_name" : "Cambridge", "short_name" : "Cambridge", "types" : [ "locality", "political" ] }, { "long_name" : "Middlesex County", "short_name" : "Middlesex County", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "Massachusetts", "short_name" : "MA", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] }, { "long_name" : "02139", "short_name" : "02139", "types" : [ "postal_code" ] } ], "formatted_address" : "Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA", "geometry" : { "location" : { "lat" : 42.359155, "lng" : -71.09305759999999 }, "location_type" : "ROOFTOP", "viewport" : { "northeast" : { "lat" : 42.3605039802915, "lng" : -71.09170861970848 }, "southwest" : { "lat" : 42.35780601970851, "lng" : -71.0944065802915 } } }, "place_id" : "ChIJv9X2bqpw44kR_ax_lASXUxo", "types" : [ "street_address" ] }, { "address_components" : [ { "long_name" : "77 Mass Ave - MIT", "short_name" : "77 Mass Ave - MIT", "types" : [ "bus_station", "transit_station", "point_of_interest", "establishment" ] }, { "long_name" : "Massachusetts Institute of Technology", "short_name" : "Massachusetts Institute of Technology", "types" : [] }, { "long_name" : "MIT", "short_name" : "MIT", "types" : [ "neighborhood", "political" ] }, { "long_name" : "Cambridge", "short_name" : "Cambridge", "types" : [ "locality", "political" ] }, { "long_name" : "Middlesex County", "short_name" : "Middlesex County", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "Massachusetts", "short_name" : "MA", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] }, { "long_name" : "02142", "short_name" : "02142", "types" : [ "postal_code" ] } ], "formatted_address" : "77 Mass Ave - MIT, Massachusetts Institute of Technology, Cambridge, MA 02142, USA", "geometry" : { "location" : { "lat" : 42.359254, "lng" : -71.093667 }, "location_type" : "APPROXIMATE", "viewport" : { "northeast" : { "lat" : 42.3606029802915, "lng" : -71.0923180197085 }, "southwest" : { "lat" : 42.3579050197085, "lng" : -71.09501598029151 } } }, "place_id" : "ChIJF39Eb6pw44kRKtttPE9nyMQ", "types" : [ "bus_station", "transit_station", "point_of_interest", "establishment" ] } ], "status" : "OK" }

As you can see, the latitude and longitude are included in the result. So the problem seems to be the parsing script.


with(value.parseJson().results[0].geometry.location, pair, pair.lat + ", " + pair.lng)

It ghrows the error "Object does not have any field, including results".

Owen Stephens

unread,
Jul 20, 2015, 12:23:36 PM7/20/15
to openr...@googlegroups.com
That error "Object does not have any field, including results" sounds like the value you are trying to parse as JSON doesn't have any contents. Are you sure that the URLs in question successfully retrieve some JSON results?

Can you share and example of a URL and/or JSON that doesn't work?

Owen

Jonathon Paarlberg

unread,
Jul 20, 2015, 12:43:40 PM7/20/15
to openr...@googlegroups.com
The error occurs on the test of the URL which was provided as an example. The JSON I quoted is an example of one that triggers the error. I can't post actual results from the searches I'm doing as I'm working with sensitive clinical data.

Owen Stephens

unread,
Jul 20, 2015, 2:51:10 PM7/20/15
to openr...@googlegroups.com

Hi,

I'm struggling to recreate this error. If I paste the JSON you quoted into a cell, then use 'Edit cells'->'Transform' and paste in the formula then it successfully gets me the Latitude and Longitude.

Can you supply a screenshot of the URL/JSON you quoted failing to give the lat/long?

Owen

Owen Stephens

unread,
Jul 20, 2015, 2:54:35 PM7/20/15
to openr...@googlegroups.com
Just to check, are you using the 'Add column'->'Add column by fetching URLs' to get the JSON before applying the 'with(value.parseJson().results[0].geometry.location, pair, pair.lat + ", " + pair.lng)' expression?

Owen

Jonathon Paarlberg

unread,
Jul 20, 2015, 6:22:14 PM7/20/15
to openr...@googlegroups.com
Thanks, Owen.

Yes, I added the column by fetching the URL's as the tutorial said to do.

Owen Stephens

unread,
Jul 21, 2015, 3:48:29 AM7/21/15
to openr...@googlegroups.com
Hi Jonathon,

I can't recreate that error - if there is anything more you can give us on what is happening here that might help someone work out the problem. Is there any chance of a screenshot of what happens when you attempt to use the transformation in the GREL expression box?

Owen

Jonathon Paarlberg

unread,
Jul 21, 2015, 11:00:22 PM7/21/15
to OpenRefine
I'll give it a try tomorrow. Geocoding isn't really high on our priority list at the moment, as it turns out.

Jonathon Paarlberg

unread,
Jul 22, 2015, 10:16:48 AM7/22/15
to OpenRefine
O.k., I apologize. It must have been a typo or something -- although no syntax errors were caught. When I proceeded by cutting and pasting the code rather than typing it in manually, everything worked as expected.

Thanks for your patience.

Owen Stephens

unread,
Jul 22, 2015, 10:24:22 AM7/22/15
to OpenRefine
Great - glad it's working

Owen

Tom Morris

unread,
Jul 22, 2015, 10:29:37 AM7/22/15
to openr...@googlegroups.com
Glad you got it resolved -- and thanks to Owen for stepping up to help.

In debugging things like this, the best approach is to proceed incrementally to get a better view of where the error is occurring, e.g.
- fetch JSON to a separate column and make sure there are no errors
- build GREL expression in a piecewise fashion, either saving the intermediary results in their own separate columns for verification or, at a minimum, confirming in the preview that they are error free and produce the data that you expect.

Tom

p.s. Not to be a killjoy, but note that Google's Terms of Service (unless they've changed them recently) limit use of their geocoding API to preparing data to display on one of their maps

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jonathon Paarlberg

unread,
Jul 22, 2015, 10:50:29 AM7/22/15
to OpenRefine
This is just a question that's related to geocoding. I was wondering if you can explain exactly what this code is doing to the Json that is collected when I fill a column based on output from Google's geocoding service.



with(value.parseJson().results[0].geometry.location, pair, pair.lat +", " + pair.lng)

I was thinking that, ideally, since sometimes there are several different results expressed in the Json, it would be nice to aggregate them somehow or maybe triangulate a geographical average that would be a better approximation most of the time.

Do you think I should start a separate topic for this question?

Thanks.


On Tuesday, July 21, 2015 at 3:48:29 AM UTC-4, Owen Stephens wrote:

Jonathon Paarlberg

unread,
Jul 22, 2015, 10:58:52 AM7/22/15
to OpenRefine
Thanks for the advice.

Regarding the Google API, I'm really just trying it out with a few locations. (Less than 25) If the script for geocoding works well, and if we eventually need to geocode a large number of sites, then we'll have to purchase a geocoding service. Alternatively, I may try out MapQuest's open geocoder; I only used Google because that was what was given to me in the example.

As it stands, the locations I have are few enough that I can use Google Maps to find exact addresses or highly likely addresses and then feed those into the geocoding script OR just copy and paste from the URL that's output from Google Maps at the location that I feel is the best candidate for the site. That way will be more time consuming but more accurate.

Matthew Blissett

unread,
Jul 23, 2015, 6:24:13 AM7/23/15
to openr...@googlegroups.com
Explained in English prose,

with(value.parseJson().results[0].geometry.location, pair, pair.lat +", " + pair.lng)

"Parse the value as JSON, and look for 'results', which is a list.  Take the 'zeroth' (first) element of the list.  Take the 'geometry' property.  Take the 'location' property.  Call its value 'pair'.

Return the value created by joining pair.lat and pair.long, with ', ' in the middle."

So it is returning coordinates in the way people normally read them.


The "with" bit is providing an alias for the long expression, to avoid writing:

value.parseJson().results[0].geometry.location.lat +", " + value.parseJson().results[0].geometry.location.lng

which is equivalent.

Matt
--

Jonathon Paarlberg

unread,
Jul 23, 2015, 2:09:23 PM7/23/15
to OpenRefine, M.Bli...@kew.org

Thank you, Matt.

Right, so the first element in the array, [0], will only capture the first result if Google's geocoding service actually lists several possibilities.

The output I get from the geocoder API can look like this, for example:

{ "results" : [ { "address_components" : [ { "long_name" : "Thousand Hills Drive", "short_name" : "Thousand Hills Dr", "types" : [ "route" ] }, { "long_name" : "Drummond", "short_name" : "Drummond", "types" : [ "sublocality_level_1", "sublocality", "political" ] }, { "long_name" : "Outer West Durban", "short_name" : "Outer West Durban", "types" : [ "locality", "political" ] }, { "long_name" : "Durban Metro", "short_name" : "Durban Metro", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "KwaZulu-Natal", "short_name" : "KZN", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "South Africa", "short_name" : "ZA", "types" : [ "country", "political" ] }, { "long_name" : "3626", "short_name" : "3626", "types" : [ "postal_code" ] } ], "formatted_address" : "Thousand Hills Drive, Outer West Durban, 3626, South Africa", "geometry" : { "bounds" : { "northeast" : { "lat" : -29.748753, "lng" : 30.70189599999999 }, "southwest" : { "lat" : -29.7572126, "lng" : 30.6884316 } }, "location" : { "lat" : -29.7515822, "lng" : 30.6928772 }, "location_type" : "GEOMETRIC_CENTER", "viewport" : { "northeast" : { "lat" : -29.748753, "lng" : 30.70189599999999 }, "southwest" : { "lat" : -29.7572126, "lng" : 30.6884316 } } }, "partial_match" : true, "place_id" : "ChIJGdhVfOHv9h4RTdwpjTLyapc", "types" : [ "route" ] } ], "status" : "OK" }

As you can see, there are three different latitude:longitude pairs given: (-29.748753, 30.70189599999999), (-29.7572126, 30.6884316) and (-29.7515822,30.6928772)

Jonathon Paarlberg

unread,
Jul 23, 2015, 2:16:05 PM7/23/15
to OpenRefine, lion...@gmail.com
Correction, it returns not the first pair but the first pair after the "location" property of "geometry". So it's focused only on this one aspect of the results.


On Monday, July 20, 2015 at 10:27:50 AM UTC-4, Jonathon Paarlberg wrote:

Owen Stephens

unread,
Jul 24, 2015, 3:53:21 AM7/24/15
to OpenRefine, lion...@gmail.com
I think there are two issues here. 

This is about the construction of the JSON returned by the geocoding service. In this case you've only got back a single result, but it contains multiple lat/lon positions - but they all relate to the same location. If I format the JSON:

      "geometry": {
        "bounds": {
          "northeast": {
            "lat": -29.748753,
            "lng": 30.70189599999999
          },
          "southwest": {
            "lat": -29.7572126,
            "lng": 30.6884316
          }
        },
        "location": {
          "lat": -29.7515822,
          "lng": 30.6928772
        },
        "location_type": "GEOMETRIC_CENTER",
        "viewport": {
          "northeast": {
            "lat": -29.748753,
            "lng": 30.70189599999999
          },
          "southwest": {
            "lat": -29.7572126,
            "lng": 30.6884316
          }
        }
      },

You can see that the first two lat/lon pairs define a bounding box for the result
The third lat/lon (the one the GREL expression extracts) is the 'location' and you can see the 'location type' is give as the Geometric Centre
The last two lat/lon pairs define a viewport (that is the proposed area of map to show when showing your result on a map)


Obviously you need to choose the right data for your task, but in terms of have a single position/location for the address then I think the 'location' lat/lon is probably the best bet.

However it is also possible for the geocoding API to return multiple results - and as you note the [0] in the current expression means that the GREL will only use the first result in the list. If you want to use multiple results you can use 'forEach' to go through each 'result' in turn - something like:

forEach(value.parseJson().results,v,v.geometry.location.lat + ", " + v.geometry.location.lat)

Owen

Tom Morris

unread,
Jul 24, 2015, 3:47:09 PM7/24/15
to openr...@googlegroups.com, lion...@gmail.com
What Owen said (and thanks for pretty printing the JSON, Owen).

Note also that the geometric center or centroid is not necessarily the center of the bounding box (it isn't in this example).  I think one would generally want the centroid, but if you prefer the center of the bounding box, it could be computed from the bounds.  Other than that, I don't see a reason to deal with the other sets of coordinates that are returned by Google.

Tom

--

Jonathon Paarlberg

unread,
Aug 3, 2015, 11:01:12 AM8/3/15
to OpenRefine, lion...@gmail.com
Thanks, guys. You're right about the centroid, and I'll try to pretty-print the Json next time I need to share it.
Reply all
Reply to author
Forward
0 new messages