Geonames - parsing data from json

398 views
Skip to first unread message

Ger Dijkstra

unread,
Jul 12, 2017, 12:17:42 PM7/12/17
to OpenRefine
Hello all,

I am trying to parse data from cells that contain data from GeoNames, here is an example (in red the items that I would like to extract):

{"totalResultsCount":521,"geonames":[{"adminCode1":"05","lng":"8.91418","geonameId":2911007,"toponymName":"Hanau am Main","countryId":"2921044","fcl":"P","population":88648,"countryCode":"DE","name":"Hanau","fclName":"city, village,...","countryName":"Germany","fcodeName":"seat of a fourth-order administrative division","adminName1":"Hesse","lat":"50.13423","fcode":"PPLA4"}]}


The aim is to get the values for geonameId, lat, and lng each in a separate column. However, I am totally unfamiliar with the grel function "parseJson". Can someone please help me with building a working parseJson expression? Many thanks in advance!

Kind regards from (very) rainy Berlin,
Ger Dijkstra


John Little

unread,
Jul 12, 2017, 1:11:27 PM7/12/17
to openr...@googlegroups.com
Hi Ger.

The GREL you want is as follows:

value.parseJson().geonames[0].lng
value.parseJson().geonames[0].geonameId
value.parseJson().geonames[0].lat

You'll want to use one line per operation where you use the "add column based on this column" function.  (I think you know that.  If that's not clear, let me know.)

Cheers.

--John

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
Jul 12, 2017, 1:12:50 PM7/12/17
to OpenRefine
Yeah we could probably do better on documenting using parseJson().  I'll get Ettore to work on that over the next week :)

We have the following that will help you learn and give the answer to yourself.

In addition, we could probably have a bit more added to our parseJson() GREL reference to highlight the main differences in the parseJson() expression syntax...and that would be of Arrays and Entities(JSON Objects).

Arrays can be parsed out with the GREL syntax using [ ] such as:
value.parseJson()["geonames"]

Entities (in JSON are called Objects) are easy:
value.parseJson().totalResultsCount

Combining them is also easy:
value.parseJson()["geonames"].geonameId

-Thad


John Little

unread,
Jul 12, 2017, 1:21:23 PM7/12/17
to openr...@googlegroups.com
Deference to Thad's response.   I suspect his syntax is more standard.  

But I also want to add to my earlier response.  It is sometimes easiest to get a sense of how to parse the JSON if you first format the JSON in a JSON viewer.  Making the JSON more human readable makes it easier to determine the keys you will need/use.

For example:  
  1. http://jsonviewer.stack.hu/
  2. paste the example JSON
  3. click "Format" (in the sub-menu bar)

Then you can more easily see the structure of the JSON to better understand what keys you'll want to use with the parseJSON function.  That is, the "geonames" key indicates the array with its square brackets, followed by -- in my example -- the array position (in this case, element 1 of the array:  [0]), followed by the desired key (sub-key of geonames).

Cheers.

Ger Dijkstra

unread,
Jul 13, 2017, 5:57:47 AM7/13/17
to OpenRefine
Hello John,

Thanks a lot, the grel functions work perfectly! 

Kind regards,
Ger

Ger Dijkstra

unread,
Jul 13, 2017, 6:18:21 AM7/13/17
to OpenRefine
Hello Thad,


On Wednesday, July 12, 2017 at 7:12:50 PM UTC+2, Thad Guidry wrote:
Yeah we could probably do better on documenting using parseJson().  I'll get Ettore to work on that over the next week :)


 I looked at the documentation that is available on github and on other websites. I am probably not the only one who would benefit from a "documentation for dummies". At high level I understand what expressions are and how they work, but not at a detailed level. I will try to describe with the following example what I would need in order to more easily work with grel expression more easily.

This expression is a good example of what I didn't understand in an expression:

value.parseJson().geonames[0].geonameId  (thanks again, John :-) )

When I break this expression down in its individual elements, I understand something like this:

element 1:  value.parseJson().  --> analyze a piece of Json
element 2:  geonames[0].         --> no clue what this element is referring to, and what the 0 in brackets does
element 3:  geonameId             --> this is the "tag" / "element" within the results that we're looking for, the value after "geonameId" is the desired and returned value

What I need is, on other words, an explanation of how an expression is build up / composed, what the individual elements "do", and what variables can be used within the elements (like the [0] in element 2). Because I don't understand the syntax of an expression I am not able to adjust / change expressions that one can find on github etc. But I am eager to learn :)

Kind regards,
Ger



Thad Guidry

unread,
Jul 13, 2017, 11:47:32 AM7/13/17
to OpenRefine
Since you have Essential questions about understanding expressions more.... 


where that page actually gives you a TIP to say

IMPORTANT TIP: [1] is equivalent to saying "The 2nd part of an array or list" in GREL since indexing of arrays or lists in Refine actually begins with [0] or "The 1st part of an array or list". 
 
You probably did not read all the way through the User Documentation and just started to play around :)  Please spend time reading everything under the FEATURE AREAS specifically all the Essential pages of our User Documentation on this Wiki page 


Get busy reading and THEN playing as you go along :)

-Thad

Ger Dijkstra

unread,
Jul 13, 2017, 11:52:42 AM7/13/17
to OpenRefine
Hi Thad,

You probably did not read all the way through the User Documentation and just started to play around :)  

Yes, absolutely true ... 


Get busy reading and THEN playing as you go along :)

I got the hint, thanks for the info, Thad, I'll start reading :) !

Kind regards,
Ger

Ettore Rizza

unread,
Jul 14, 2017, 9:14:43 AM7/14/17
to OpenRefine
Hi Ger,

The hardest part is finding the path of the element you want to extract from the Json. For this, I recommend using a browser extensions (such as PickyJson for Chrome), which allows you to click on an element and to copy and paste its path. 



In this case, the extension tells me that the element I want is "geonames [" 0 "] lat. Then i can use it directly in Open Refine (taking care to replace the 0 between quotation marks by a true 0.) :

value.parseJson.geonames[0].lat

Owen Stephens

unread,
Jul 14, 2017, 9:49:18 AM7/14/17
to OpenRefine
Hi Ger,

I started looking at adding an alternative approach for parsing JSON in OpenRefine - I made a little progress but then stalled due to my time being eaten up by some other projects for a bit - I'm hoping over the summer I'll be able to get started on this again. I'd be interested in hearing from you and others what would work for you more intuitively.

Examples work best in demonstrating how it might work - so if you have any ideas please feel free to put some examples into this thread!

Best wishes

Owen
Message has been deleted

Ettore Rizza

unread,
Jul 14, 2017, 10:12:46 AM7/14/17
to OpenRefine
I made a little error in my Grel formula. The correct one is of course value.parseJson().geonames[0].lat

Here is a working example :

Ger Dijkstra

unread,
Jul 14, 2017, 2:39:30 PM7/14/17
to OpenRefine
Hello Owen,

Thanks for your support! Talking for myself, I should first indeed take the time to read the information that Thad has pointed out. I think that once I understand how the GREL syntax works, I should be able to adjust GREL expressions according to my needs. I will probably run into more examples later this summer, likely in the field of georeferencing. We are currently comparing the usage of GeoNames versus Erfgeo (http://erfgeo.nl/, a Dutch website, that is more specific for old Dutch place names). Adjusting the api call takes some time (especially when adding additional criteria, like country, and wether one is looking for a place or a municipality), after results are retrieved I discovered I need different parse expressions. I could not use the parseJson expressions that are used on the http://erfgeo.nl/ website. I will get back to this after my bike holiday in the Ukraine :)

Still, the help from you and all the others on this group is FANTASTIC and I am really thankful for that!! Thanks to all you folks out there :) !

Best wishes,
Ger

Ettore Rizza

unread,
Jul 14, 2017, 2:58:12 PM7/14/17
to OpenRefine
I'm working on something quite similar with Geonames, Wikidata and DBpedia. Don't hesitate to ask questions when you need clarification. At one point, it's quite possible that GREL will not work anymore, but be aware that it's possible to parse Json with great precision in Open Refine using Jython. 
Reply all
Reply to author
Forward
0 new messages