Openlibrary.org parse JSON


wl

Apr 24, 2018, 8:00:58 AM
to OpenRefine
Hi OR community!

I have a question about parsing results from the openlibrary.org API.

When I add a column by fetching from the API URL, I get JSON like this:
{"ISBN:1860390323": {"publishers": [{"name": "Orchard Books"}], "identifiers": {"isbn_13": ["9781860390326"], "openlibrary": ["OL12052499M"], "isbn_10": ["1860390323"], "oclc": ["35208816"], "goodreads": ["1731978"]}, "weight": "2.7 ounces", "title": "The Amazing Mr. Pleebus (Orchard Readalones)", "url": "https://openlibrary.org/books/OL12052499M/The_Amazing_Mr._Pleebus_(Orchard_Readalones)", "number_of_pages": 64, "cover": {"small": "https://covers.openlibrary.org/b/id/3013949-S.jpg", "large": "https://covers.openlibrary.org/b/id/3013949-L.jpg", "medium": "https://covers.openlibrary.org/b/id/3013949-M.jpg"}, "publish_date": "June 20, 1996", "key": "/books/OL12052499M", "authors": [{"url": "https://openlibrary.org/authors/OL3306530A/Nick_Abadzis", "name": "Nick Abadzis"}]}}

It should now be possible to extract a value by its key, e.g. the number of pages:
value.parseJson().get("number_of_pages")

It only works when I remove the outer element from the JSON (the "ISBN:<number>" key that wraps the record):
{"publishers": [{"name": "Orchard Books"}], "identifiers": {"isbn_13": ["9781860390326"], "openlibrary": ["OL12052499M"], "isbn_10": ["1860390323"], "oclc": ["35208816"], "goodreads": ["1731978"]}, "weight": "2.7 ounces", "title": "The Amazing Mr. Pleebus (Orchard Readalones)", "url": "https://openlibrary.org/books/OL12052499M/The_Amazing_Mr._Pleebus_(Orchard_Readalones)", "number_of_pages": 64, "cover": {"small": "https://covers.openlibrary.org/b/id/3013949-S.jpg", "large": "https://covers.openlibrary.org/b/id/3013949-L.jpg", "medium": "https://covers.openlibrary.org/b/id/3013949-M.jpg"}, "publish_date": "June 20, 1996", "key": "/books/OL12052499M", "authors": [{"url": "https://openlibrary.org/authors/OL3306530A/Nick_Abadzis", "name": "Nick Abadzis"}]}

This is for demonstration purposes, to show this OR feature. In a real project I could remove the outer element or extract the number of pages with a regex.
But how can I get the value of an element by its key from the complete API result?

If that's not possible with openlibrary.org, could someone recommend a free ISBN API that is easy to parse?

Thanks,
Wolf

Ettore Rizza

Apr 24, 2018, 8:50:46 AM
to OpenRefine
Hi,

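This is because the API wraps each record in a top-level key that is unique to the book ("ISBN:1860390323"), so number_of_pages sits one level below the root and get("number_of_pages") finds nothing there. A quick hack in GREL is to delete the number just after ISBN, which leaves a fixed key you can address: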

value.replace(/(?<=ISBN):\d+/, "").parseJson().ISBN.number_of_pages
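
To see what that substitution does to the cell, here is a minimal sketch of the same idea in plain Python. The shortened `raw` string is made up for illustration; only the regex comes from the GREL expression above.

```python
import json
import re

# Shortened, illustrative stand-in for the API response in the cell
raw = '{"ISBN:1860390323": {"number_of_pages": 64}}'

# Drop ":<digits>" wherever it directly follows "ISBN",
# so the unique key "ISBN:1860390323" becomes the fixed key "ISBN"
renamed = re.sub(r'(?<=ISBN):\d+', '', raw)

# The record can now be reached through a known key
pages = json.loads(renamed)["ISBN"]["number_of_pages"]
print(renamed)
print(pages)
```

One caveat: `\d+` matches digits only, so an ISBN-10 ending in "X" would leave a stray "X" in the key and the lookup by "ISBN" would fail.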


A cleaner solution in Jython:

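import json

# Parse the cell; its single top-level key is "ISBN:<number>"
data = json.loads(value)

# Collect the page count from every record, whatever its key is
my_list = []
for key, val in data.items():
    my_list.append(str(val['number_of_pages']))
return "::".join(my_list)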
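
The same loop can be generalised to any field, not just the page count. The following is only a sketch: the function name `extract_field`, its parameters, and the shortened `sample` string are hypothetical and not part of OpenRefine or the Open Library API. It skips records that lack the field instead of raising an error.

```python
import json

def extract_field(raw, field, sep="::"):
    """Return `field` from every record in an 'ISBN:<number>'-keyed response.

    The top-level keys are ignored, so the ISBN need not be known in advance.
    """
    data = json.loads(raw)
    found = []
    for record in data.values():
        if field in record:  # skip records that do not have this field
            found.append(u"{0}".format(record[field]))
    return sep.join(found)

# Shortened, illustrative sample in the shape of the API result
sample = ('{"ISBN:1860390323": {"title": "The Amazing Mr. Pleebus '
          '(Orchard Readalones)", "number_of_pages": 64}}')

print(extract_field(sample, "number_of_pages"))
print(extract_field(sample, "title"))
```

Inside an OpenRefine Jython expression, the function definition would go first and the last line would be something like `return extract_field(value, "number_of_pages")`.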

wl

Apr 25, 2018, 12:41:50 AM
to OpenRefine
Thanks a lot, Ettore!

Both your solutions work like a charm!
Wolf