Templating nested json

313 views
Skip to first unread message

Lara Marziali

unread,
Jun 15, 2020, 8:20:56 AM6/15/20
to OpenRefine

Hi,


I've got this kind of data:

aut.png



and I want to export them into a nested JSON, like:

"person": [
    {
      "name": "Alighieri, Dante",
      "viafId": "893",
      "wikidataId": "Q3939",
      "role": "creator"
    },
    {
      "name": "Other",
      "viafId": "566",
      "role": "contributor"
    }
  ]


if(isNonBlank(value), '{ "name": "' + value + '", "viafId": "' + cells['viafId'].value + '", "wikidataId": "' + cells['wikidataId'].value + '", "role": "' + cells['role'].value + '" }', null)

but with my data it takes just the first line and not the second, I think because in the second case the "wikidataId" is null. I think I need to insert another logic for the viafId and the WikidataId. Maybe a script in python using the NonBlank variable would do the trick? Is it possible?

Thanks,
Lara

Thad Guidry

unread,
Jun 15, 2020, 10:25:48 AM6/15/20
to openr...@googlegroups.com
The Custom Templating Exporter can be used...

The only caveat is that it will generate a slightly invalid JSON file, because there's no way to control the Row Separator with a conditional (it's a pure String)...
so there will be an extra "," comma before the blanked out "wikidataId" field.

Anyways, here's what you would paste into the "Row Template" box that works (except for the extra comma, which can be cleaned up with an external script after exporting from OpenRefine):

    {
      "person name" : {{jsonize(cells["person name"].value)}},
      "role" : {{jsonize(cells["role"].value)}},
      "viafId" : {{jsonize(cells["viafId"].value)}},
      {{if(isBlank(cells["wikidataId"].value),"",'"wikidata : "' + cells["wikidataId"].value + '"')}}
    }

image.png

Hope this helps,


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/c5cd8a8e-e8a9-4ac8-9b0f-4bde05e9a13fo%40googlegroups.com.

Lara Marziali

unread,
Jun 15, 2020, 10:52:11 AM6/15/20
to OpenRefine
Thanks,

the only problem with that is that I don't create an object "person" with the values as an array. Unfortunately it has to be like this. There are other values that I need (simpler) to export, so I can't use the "prefix" to create the "person" object. I wasn't very clear in this, sorry.

For what I know the templating repeats what you write in the "Row templates" for each row, and it's not that the case for me.
To unsubscribe from this group and stop receiving emails from it, send an email to openr...@googlegroups.com.

Antonin Delpeuch (lists)

unread,
Jun 15, 2020, 11:01:20 AM6/15/20
to openr...@googlegroups.com
Hi Lara,

You are hitting a pretty fundamental problem with OpenRefine's record
mode: the inability to represent an arbitrary hierarchical structure
faithfully. We are well aware of this issue and we are looking into it,
but it is typically the sort of issue that requires a lot of thought so
it is probably not going to be solved very soon.

In the meantime, if there is only one level of nested objects, perhaps
you could do as follows:
- use the "join multi-valued cells" operation to concatenate your array
values using "," as separator (make sure you add quotes to your values
beforehand if they are strings)
- then, use the templating exporter as Thad suggests?

It's a hack, but that is all I can think of at the moment.

Best,
Antonin

On 15/06/2020 16:52, Lara Marziali wrote:
> Thanks,
>
> the only problem with that is that I don't create an object "person"
> with the values as an array. Unfortunately it has to be like this. There
> are other values that I need (simpler) to export, so I can't use the
> "prefix" to create the "person" object. I wasn't very clear in this, sorry.
>
> For what I know the templating repeats what you write in the "Row
> templates" for each row, and it's not that the case for me.
>
> Il giorno lunedì 15 giugno 2020 16:25:48 UTC+2, Thad Guidry ha scritto:
>
> The Custom Templating Exporter can be used...
>
> The only caveat is that it will generate a slightly invalid JSON
> file, because there's no way to control the Row Separator with a
> conditional (it's a pure String)...
> so there will be an extra "," comma before the blanked out
> "wikidataId" field.
>
> Anyways, here's what you would paste into the "Row Template" box
> that works (except for the extra comma, which can be cleaned up with
> an external script after exporting from OpenRefine):
>
>     {
>       "person name" : {{jsonize(cells["person name"].value)}},
>       "role" : {{jsonize(cells["role"].value)}},
>       "viafId" : {{jsonize(cells["viafId"].value)}},
>       {{if(isBlank(cells["wikidataId"].value),"",'"wikidata : "' +
> cells["wikidataId"].value + '"')}}
>     }
>
> image.png
>
> Hope this helps,
> Thad
> https://www.linkedin.com/in/thadguidry/
> <https://www.linkedin.com/in/thadguidry/>
>
>
> On Mon, Jun 15, 2020 at 7:21 AM Lara Marziali
> <lara.m...@medialibrary.it <javascript:>> wrote:
>
> Hi,
>
>
> I've got this kind of data:
>
> aut.png
>
>
>
> and I want to export them into a nested JSON, like:
>
> "person": [
>     {
>       "name": "Alighieri, Dante",
>       "viafId": "893",
>       "wikidataId": "Q3939",
>       "role": "creator"
>     },
>     {
>       "name": "Other",
>       "viafId": "566",
>       "role": "contributor"
>     }
>   ]
>
> I've seen this solution
> (https://groups.google.com/forum/#!searchin/openrefine/json$20templating%7Csort:date/openrefine/56MIeWHZiKY/fXexf3t5rTQJ
> <https://groups.google.com/forum/#!searchin/openrefine/json$20templating%7Csort:date/openrefine/56MIeWHZiKY/fXexf3t5rTQJ>)
> so I created a new column base on "person name" with
>
> if(isNonBlank(value), '{ "name": "' + value + '", "viafId": "' +
> cells['viafId'].value + '", "wikidataId": "' +
> cells['wikidataId'].value + '", "role": "' + cells['role'].value
> + '" }', null)
>
> but with my data it takes just the first line and not the
> second, I think because in the second case the "wikidataId" is
> null. I think I need to insert another logic for the viafId and
> the WikidataId. Maybe a script in python using the NonBlank
> variable would do the trick? Is it possible?
>
> Thanks,
> Lara
>
> --
> You received this message because you are subscribed to the
> Google Groups "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from
> it, send an email to openr...@googlegroups.com <javascript:>.
> <https://groups.google.com/d/msgid/openrefine/c5cd8a8e-e8a9-4ac8-9b0f-4bde05e9a13fo%40googlegroups.com?utm_medium=email&utm_source=footer>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to openrefine+...@googlegroups.com
> <mailto:openrefine+...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/openrefine/8d2dc566-61b1-432d-af29-8ab4c6fec15co%40googlegroups.com
> <https://groups.google.com/d/msgid/openrefine/8d2dc566-61b1-432d-af29-8ab4c6fec15co%40googlegroups.com?utm_medium=email&utm_source=footer>.

Lara Marziali

unread,
Jun 15, 2020, 11:16:55 AM6/15/20
to OpenRefine
Hi and thanks,

yes I understand the problem. I think that with that hack will do, I didn't thought about it.

I look forward for the solution, in the meantime I will use this.

Thank you both,
lara

Thad Guidry

unread,
Jun 15, 2020, 11:21:13 AM6/15/20
to openr...@googlegroups.com
Typically when I need to do CSV to JSON conversion and other data transformations, I turn to other tools.
You might be interested in Bonobo which is alpha, but works well for this actually since you can control many things about the Graph nodes including Orphans, etc.


or

Pandas - Python (many tutorials on YouTube showing CSV to JSON)



To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/5dc84943-9b33-d05a-e120-b68640cb5338%40antonin.delpeuch.eu.

Lara Marziali

unread,
Jun 16, 2020, 6:31:49 AM6/16/20
to OpenRefine
Thank you a lot!

Owen Stephens

unread,
Jun 25, 2020, 5:57:17 AM6/25/20
to OpenRefine

>         I've seen this solution
>         (https://groups.google.com/forum/#!searchin/openrefine/json$20templating%7Csort:date/openrefine/56MIeWHZiKY/fXexf3t5rTQJ
>         <https://groups.google.com/forum/#!searchin/openrefine/json$20templating%7Csort:date/openrefine/56MIeWHZiKY/fXexf3t5rTQJ>)
>         so I created a new column base on "person name" with
>
>         if(isNonBlank(value), '{ "name": "' + value + '", "viafId": "' +
>         cells['viafId'].value + '", "wikidataId": "' +
>         cells['wikidataId'].value + '", "role": "' + cells['role'].value
>         + '" }', null)
>
>         but with my data it takes just the first line and not the
>         second, I think because in the second case the "wikidataId" is
>         null. I think I need to insert another logic for the viafId and
>         the WikidataId.

When joining together values like this it's worth explicitly converting them to Strings. You can do this by adding ".toString()" to each value you are using. In particular this ensures that any "null" values are converted to an empty string instead. If you use:

if(isNonBlank(value), '{ "name": "' + value.toString() + '", "viafId": "' + cells['viafId'].value.toString() + '", "wikidataId": "' + cells['wikidataId'].value.toString() + '", "role": "' + cells['role'].value.toString() + '" }', null)

You should find that you get an output on both lines. It may not entirely solve your problem because the result is you get some empty properties:

"person": [
   
{
     
"name": "Alighieri, Dante",
     
"viafId": "893",
     
"wikidataId": "Q3939",
     
"role": "creator"
   
},
   
{
     
"name": "Other",
     
"viafId": "566",

     
"wikidataId": "",
     
"role": "contributor"
   
}
 
]

That may or may not be appropriate.

To do better than this you would need to test each cell to see if it was blank before you included it in the output string

Best wishes

Owen
Reply all
Reply to author
Forward
0 new messages