Can't find how to parse a big and intricated Json

135 views
Skip to first unread message

Ettore Rizza

unread,
Nov 20, 2016, 2:42:48 PM11/20/16
to OpenRefine
Hi, 

I know how to parse json cells in Open refine, but this one is too tricky for me. 





I've used an API to extract the calendar of 4730 AirBNB's rooms, identified by their IDs. 


For each ID and each day of the year from now until november 2017, i would extract the availability of this rooms (true or false) and its price at this day. I can't figure out how to parse out these informations. I guess that it implies a series of nested forEach, but i can't find the right way to do this. Any help would be appreciate. 

Ettore Rizza

unread,
Nov 20, 2016, 3:04:21 PM11/20/16
to OpenRefine

Precision : i've tried, of course, forEach(value.parseJson().calendar_months, e, e.days). The result is an array of arrays that disrupts me.

Owen Stephens

unread,
Nov 21, 2016, 4:26:35 AM11/21/16
to OpenRefine
Cross posting from Stackoverflow:

I think you are on the right track. The output of:

forEach(value.parseJson().calendar_months, e, e.days)


is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.

To keep the steps manageable I'd suggest tackling it like this:

First use

forEach(value.parseJson().calendar_months,m,m.days).join("|")

You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.

Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows each containing a JSON expression. Now for each ID you have 12 rows in OR

Then use:

forEach(value.parseJson(),d,d).join("|")

This splits the JSON down into the individual days

Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.

Using the JSON from example URL above - this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day.

You've now got some pretty easy JSON in each cell - so you can extract availability using

value.parseJson().available

etc.

Owen

Owen Stephens

unread,
Nov 21, 2016, 4:36:36 AM11/21/16
to OpenRefine
A couple of additions to my reply:

1) After you've split out the JSON into months I should have said '12 rows per ID' - not just 12 rows altogether
2) Once you've split the JSON out into one row per day, then you'll probably want to use the 'fill down' function on the ID column to fill in the ID for each of the rows

Ettore RIZZA

unread,
Nov 21, 2016, 4:53:38 AM11/21/16
to openr...@googlegroups.com
it works ! Thank you very much, Owen. :) 

I was too focused on solving the problem with one complex formula ("forEach(forEach..."). In addition, the result of "forEach (value.parseJson (), calendar_months, m, m.days) .join ( "|")" gave a bracketed answer, which made me think it was another Open Refine array.

Thanks again.

--
You received this message because you are subscribed to a topic in the Google Groups "OpenRefine" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openrefine/MJ2VSS0o3Cw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openrefine+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ettore RIZZA

unread,
Nov 21, 2016, 4:56:55 AM11/21/16
to openr...@googlegroups.com
By the way, it looks like the version of Jython implemented in Open refine does not contain the module "json", is it ?

Owen Stephens

unread,
Nov 21, 2016, 11:41:50 AM11/21/16
to OpenRefine
OpenRefine 2.6rc2 uses Jython 2.5.3 (you can see this by looking in /extensions/jython/module/MOD-INF/lib/) - which doesn't include the json module. The latest version of Jython (2.7.0) does include the json module. 

I'm not really sure how the Jython integration works, but I was able to simply drop in the jar for Jython 2.7.0 (the 'standalone JAR' from http://www.jython.org/downloads.html) into the Jython extension in place of the 2.5.3 jar - the relevant directory is extensions/jython/module/MOD-INF/lib/ (I just removed the 2.5.3 jar and dropped in the 2.7.0 jar), restarted OR, and it seemed to work - I could then use 'import json' in the Jython expression.

As I say - I don't really know what I'm doing here so it maybe doing this could cause problems somewhere - so don't do it on my say so :)

I also believe it is possible to add modules that Jython can then use, but I've never been able to make this work successfully. There are some pointers from Tom Morris in this thread https://groups.google.com/forum/#!searchin/openrefine/jython%7Csort:relevance/openrefine/OvphcgDLNTs/HbOBCoDycq4J if you want to follow up that idea

Owen

Ettore RIZZA

unread,
Nov 21, 2016, 1:44:54 PM11/21/16
to openr...@googlegroups.com
Thanks again, Owen ! I just replaced the version of Jython (renaming it 2.5.3, in case the version is hardcoded somewhere): everything seems fine. A little too easy (it's weird), but it works. 

Reply all
Reply to author
Forward
0 new messages