Parsing Wikipedia API JSON output


Andrea Borruso

Jan 13, 2018, 1:46:13 PM
to OpenRefine
Hi,
I have this JSON output

{
  "batchcomplete": "",
  "query": {
    "pages": {
      "222899": {
        "pageid": 222899,
        "ns": 0,
        "title": "Juniperus oxycedrus",
        "extract": "Il ginepro rosso (Juniperus oxycedrus L.) è una pianta a portamento arbustivo, presente in Italia, appartenente alla famiglia delle Cupressaceae.\nÈ una specie caratteristica della macchia mediterranea.\n\n"
      }
    }
  }
}

I would like to extract the "extract" value ("Il ginepro rosso (Juniperus oxycedrus L.) è una pianta a ...."), but in a generic way: at the moment I can only do it by specifying the page ID (value.parseJson().query.pages["222899"].extract).

I need something like value.parseJson().query.pages[].extract that works for all the records of my dataset, but that formula is not valid.

Thank you

Ettore Rizza

Jan 14, 2018, 2:29:17 AM
to OpenRefine
Hi Andrea, 

you will find information about the problem here or here

Short answer: with Wikimedia's APIs it is best to use Python/Jython rather than GREL, with a script like:

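import json

# Parse the API response stored in the cell
data = json.loads(value)
# Collect the extract of every page, whatever its page ID
lista = []
for pageid, page in data["query"]["pages"].items():
    lista.append(page['extract'])

# Join the extracts into a single cell value
return ":::".join(lista)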
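
Outside OpenRefine, the same idea can be tried in plain Python. This is only a minimal sketch, assuming the response has the shape shown in the question. The function name `join_extracts`, the `sep` parameter, and the shortened sample string are made up for illustration. Pages without an "extract" key are skipped instead of raising an error.

```python
import json


def join_extracts(raw, sep=":::"):
    """Return the 'extract' of every page in the JSON response, joined by sep."""
    data = json.loads(raw)
    # "pages" is keyed by page ID, so iterate over the values instead of naming an ID.
    pages = data.get("query", {}).get("pages", {})
    extracts = [page["extract"] for page in pages.values() if "extract" in page]
    return sep.join(extracts)


# Shortened, hypothetical sample modeled on the JSON in the question.
sample = json.dumps({
    "batchcomplete": "",
    "query": {
        "pages": {
            "222899": {
                "pageid": 222899,
                "ns": 0,
                "title": "Juniperus oxycedrus",
                "extract": "Il ginepro rosso (Juniperus oxycedrus L.) ...",
            }
        }
    },
})

print(join_extracts(sample))
```

Because the loop never names a page ID, the same function can be applied to every record of a dataset.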


andy

Jan 14, 2018, 5:26:36 AM
to openr...@googlegroups.com
Hi Ettore,
you are very kind and skilled, thank you.



Yesterday I solved it using the API's XML output and `parseHtml()`. For me, the XML output is easier to process because it has a simpler structure.
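
For comparison, here is a rough Python sketch of the XML route. The sample XML is an assumption about what the API returns in XML format for this query (a `page` element containing an `extract` element); it is not copied from a real response, and the helper name `extracts_from_xml` is made up. Inside OpenRefine the approach described here used `parseHtml()`, not this code.

```python
import xml.etree.ElementTree as ET


def extracts_from_xml(raw):
    """Return the text of every <extract> element found in the XML string."""
    root = ET.fromstring(raw)
    # iter() walks the whole tree, so no page ID is needed.
    return [node.text or "" for node in root.iter("extract")]


# Hypothetical sample, assumed to mirror the API's XML output.
sample_xml = (
    '<api batchcomplete="">'
    '<query><pages>'
    '<page pageid="222899" ns="0" title="Juniperus oxycedrus">'
    '<extract>Il ginepro rosso (Juniperus oxycedrus L.) ...</extract>'
    '</page>'
    '</pages></query>'
    '</api>'
)

print(extracts_from_xml(sample_xml))
```

Selecting by element name is what makes this simpler than the JSON case: the page ID is an attribute, not a key in the path.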

Thank you very much



--
___________________

Andrea Borruso
website: https://medium.com/tantotanto
38° 7' 48" N, 13° 21' 9" E, EPSG:4326
___________________

"cercare e saper riconoscere chi e cosa,
 in mezzo all’inferno, non è inferno, 
e farlo durare, e dargli spazio"

Italo Calvino