Splitting all multi-valued columns into several columns at once in OpenRefine

51 views
Skip to first unread message

Michael Markert

unread,
Apr 21, 2022, 10:15:26 AM4/21/22
to OpenRefine

Hello there, first question here: 

I have a complex XML imported into OpenRefine and merged all rows of all records to one using a GREL formula in "All => Transform". Now I have over 40 columns with each containing multiple values in most cells seperated by "|" and I want to have them in different columns.

To give an impression of the data: Bildschirmfoto 2022-04-21 um 14.35.03.png

I could apply "Edit column => Split into several columns" for each column but this would mean to do it over and over again and this is just a test data set so later ones will likely contain even more rows. I am pretty sure this can be done via "All => Transform" using GREL but I haven't found a solution yet. 

Please help me!

Owen Stephens

unread,
Apr 21, 2022, 12:35:37 PM4/21/22
to OpenRefine
Hi Michael,

Unfortunately there's no way of doing a column split (or other column addition) using the All -> Transform. I can think of a way of avoiding having to use "split into several columns" on each column - but neither are great. Perhaps you could describe where you are starting from, and what final outcome you are looking for so we can think about potential alternative approaches overall?
....

However - the option I can think of just based on what you've posted is to generate the necessary "Operation History" JSON that can be used on the project. So for example the split operation for a single column might be:
[{
    "op": "core/column-split",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "col-1",
    "guessCellType": false,
    "removeOriginalColumn": false,
    "mode": "separator",
    "separator": "|",
    "regex": false,
    "maxColumns": 0,
    "description": "Split column col-1 by separator"
  }]


If you generate the same set of JSON substituting the right column names you'll get a set of JSON that can be posted using the "Apply" option in the "Undo/Redo" panel. So if the above was to do 'col-1' you could then do:

[{
    "op": "core/column-split",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "col-1",
    "guessCellType": false,
    "removeOriginalColumn": false,
    "mode": "separator",
    "separator": "|",
    "regex": false,
    "maxColumns": 0,
    "description": "Split column col-1 by separator"
  },
{
    "op": "core/column-split",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "col-2",
    "guessCellType": false,
    "removeOriginalColumn": false,
    "mode": "separator",
    "separator": "|",
    "regex": false,
    "maxColumns": 0,
    "description": "Split column col-2 by separator"
  }]

To do col-1 and col-2 in one go (and so on). How you create the JSON is up to you - but for example you could paste all the relevant column names into Excel and use a formula to fill in the all the boilerplate/template JSON and just add in the column name in the formula

Best wishes

Owen

Michael Markert

unread,
Apr 25, 2022, 2:46:52 AM4/25/22
to OpenRefine
Dear Owen, 
thank you very much for your answer - I haven't thought of using "Apply..."! This is fast (and straightforward) enough for my workflow.

All the best
Michael

Reply all
Reply to author
Forward
0 new messages