Google Refine 2.0 as a data cleansing and exploration tool for JSON-LD


Bradley Allen

Nov 11, 2010, 2:26:32 PM
to jso...@googlegroups.com, dfh...@csail.mit.edu
After playing around for a few moments with a JSON-LD dump of the
linked data in http://t4gm.info, I was able to round-trip the dump
through Google Refine 2.0 and json_ld_processor with only very minor
tweaks to the project (I cleaned up the auto-generated column names, made
some columns multi-valued, and turned the comma-separated strings in
multi-valued columns into JSON arrays in the export template). I did
depend on json_ld_processor's approach to handling null values (which
is to ignore them) and had to plug a context back in via the export
template's prefix field, but the ease with which I was able to do this,
and the ability to do faceted browsing over the loaded data set, were
tremendous.
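For anyone who wants to see the export step outside Refine, here is a minimal Python sketch of the same transformation, assuming each exported row is a dict of column name to cell string (or None). It splits the designated multi-valued columns on commas into JSON arrays, drops empty cells the way json_ld_processor ignores nulls, and wraps the result with a context. The `CONTEXT` value and the `@context`/`@graph` keys are assumptions (they follow later JSON-LD 1.0 keywords; the 2010 draft may have used different ones), and `row_to_node`/`export_rows` are hypothetical helpers, not part of Refine or json_ld_processor.

```python
import json

# Hypothetical context; the actual t4gm.info context is not shown in this post.
CONTEXT = {"dc": "http://purl.org/dc/terms/"}


def row_to_node(row, multi_valued):
    """Turn one table row (column -> cell string or None) into a node object.

    Empty or None cells are skipped rather than emitted as null, and columns
    listed in multi_valued are split on commas into JSON arrays.
    """
    node = {}
    for column, cell in row.items():
        if cell is None or cell == "":
            continue  # ignore nulls instead of writing them out
        if column in multi_valued:
            node[column] = [part.strip() for part in cell.split(",") if part.strip()]
        else:
            node[column] = cell
    return node


def export_rows(rows, multi_valued, context=CONTEXT):
    """Serialize rows as one JSON-LD document with the context plugged back in."""
    document = {
        "@context": context,  # assumed keyword, see lead-in
        "@graph": [row_to_node(row, multi_valued) for row in rows],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    rows = [{"@id": "http://example.org/item/1",
             "dc:subject": "Maps, Atlases",
             "dc:description": None}]
    print(export_rows(rows, {"dc:subject"}))
```

Splitting on a bare comma is the simplest choice but will break any single value that itself contains a comma; a different separator in the multi-valued columns avoids that.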

So Google Refine is essentially an out-of-the-box data cleansing and
faceted browsing tool for JSON-LD. While I haven't confirmed it, my
guess is that only JSON RDF serializations work as seamlessly as this,
and that of these, JSON-LD generates the most natural table in
Google Refine. Assuming this is the case, this is a powerful
validation of Manu and Mark's design approach to the serialization, or
at least of its compatibility with the design of Google Refine.
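One rough way to picture why the table comes out so naturally: if the dump is a flat list of node objects (an assumption about the file's shape), each node becomes one row and each property becomes one column. The Python sketch below does exactly that, joining array values with ", " so a multi-valued property lands in a single comma-separated cell. It is only an illustration of the mapping, not how Google Refine's JSON importer is actually implemented, and `nodes_to_table` is a hypothetical name.

```python
def nodes_to_table(nodes):
    """Flatten a list of JSON-LD node objects into (columns, rows).

    One row per node; the columns are the union of all keys in first-seen
    order; list values are joined with ", " so each cell holds one string.
    Missing properties become None (an empty cell).
    """
    columns = []
    for node in nodes:
        for key in node:
            if key not in columns:
                columns.append(key)

    rows = []
    for node in nodes:
        row = []
        for column in columns:
            value = node.get(column)
            if isinstance(value, list):
                value = ", ".join(str(item) for item in value)
            row.append(value)
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    columns, rows = nodes_to_table([
        {"@id": "http://example.org/item/1", "dc:subject": ["Maps", "Atlases"]},
        {"@id": "http://example.org/item/2", "dc:title": "Example"},
    ])
    print(columns)
    for row in rows:
        print(row)
```

Nested objects would need more work (Refine's own importer handles them its own way), but a subject-keyed list of flat objects maps onto rows and columns almost directly.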

If you want to give it a crack, the t4gm.info JSON-LD file is at
http://bradleypallen.org.s3.amazonaws.com/t4gm_info_ld.json. Simply
create a new project with that URL as the data source.

David: kudos, and thanks very much.

Bradley P. Allen
http://bradleypallen.org
