Simple way to access all cell values in a row

1,631 views
Skip to first unread message

Owen Stephens

unread,
May 13, 2016, 5:24:44 AM5/13/16
to OpenRefine
I don't think I've seen this posted/documented anywhere before so I thought I'd post it in case anyone was interested. I discovered yesterday that you can access column names in a project using:

row.columnNames

This gives an array of column names. This can be combined with a forEach loop and the cells["column name"] syntax get all values from a single row into an array:

forEach(row.columnNames,cn,cells[cn].value)

A couple of uses for this are merging all values in a row and looking for blanks across a row. I've added the row.columnNames variable to the wikipage on Variables, and added a few recipes to https://github.com/OpenRefine/OpenRefine/wiki/Recipes. For convenience I'm repeating the recipes here:

Merging all columns in a project

To merge all values across a row, you can use 'row.columnNames' (which retrieves all the column names in the project) with the 'cells' variable:

  forEach(row.columnNames,cn,cells[cn].value).join("|")

If there are 'null' cells in the row, the row will still merge, but where the cells were null you will get the phrase "Cannot retrieve field from null|" If there are empty strings ("") in any cell in the row, these will be merged as empty strings.


To make null cells and cells with empty strings behave in the same way you can enhance the GREL given above with a check for null cells:

  forEach(row.columnNames,cn,if(isNull(cells[cn]),"",cells[cn].value))


Facet for rows with a certain number of blank cells

If you want to find rows that have a certain number of blank cells, including 'all blank' or 'no blanks' you can create a custom facet:

Facet->Custom text facet (from any column in the project)

use the GREL:

  filter(row.columnNames,cn,isBlank(cells[cn].value)).length()


This will give a facet with the number of blank (null or empty strings) cells in each row. If you want to limit by a particular number you can enhance the GREL with a check for a specific length: 

Rows with more than one blank cell (true/false facet)

  filter(row.columnNames,cn,isBlank(cells[cn].value)).length()>1

Rows with exactly one blank cell (true/false facet)

  filter(row.columnNames,cn,isBlank(cells[cn].value)).length()==1

Rows with less than two blank cells (true/false facet)

  filter(row.columnNames,cn,isBlank(cells[cn].value)).length()<2

Any other suggestions for how this function could be used gratefully recieved either here or added directly to the wiki

Owen 

Thad Guidry

unread,
May 13, 2016, 8:14:35 PM5/13/16
to openrefine
Its from our Variables rework that David Huynh did after Gridworks 1.1

We documented them all here:

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
May 13, 2016, 8:16:12 PM5/13/16
to openrefine
Oh, you updated that page already....so we didn't have the columnNames documented before ?  That's a shame, and that's probably my fault. :(

Tom Morris

unread,
May 16, 2016, 10:24:18 PM5/16/16
to openr...@googlegroups.com
Thanks for the update, Owen. I wonder if we should directly support integer array indexing as well? It might be a bit of work since cells can be sparse and columns can be reordered, but the current mechanism seems a little convoluted if all you really want to do is iterate through all the cells in order. What do folks think?

As a recap, the variables associated with a row are: index (int), cells (array of cells), record (reference to record containing this row), columnNames (array of string), flagged (boolean), starred (boolean).

Tom

--

Owen Stephens

unread,
May 17, 2016, 5:30:09 AM5/17/16
to OpenRefine
Integer array indexing would be nice, but not essential IMO
I think there are other enhancements in terms of supported variables that would deliver more value - specifically ability to access rows by row.index (at the moment you can work around this by moving all rows into records, but that's very hacky)

Martin Magdinier

unread,
May 29, 2016, 12:23:57 PM5/29/16
to openr...@googlegroups.com
Thanks Owen for the tip.

This might by somewhat helpful for user who want to edit multiple columns quickly with the same operation. For example when one wants to trim all columns there is two option:
1. using the point and click interface to select the trim function
2. edit the JSON from from the Undo/Redo.

Now the user can repeat the following GREL from any columns by updating the columnName index from 0 to x: cells[row.columnNames[0]].value.trim()

This still required one operation per column, but
1. it reduces the number of click (just select any transform command) and
2. it doesn't required to enter each column name

Martin

Owen Stephens

unread,
May 31, 2016, 6:33:21 AM5/31/16
to OpenRefine
Thanks Martin,

The other way you could use it in the 'need to trim all columns' context is to create a list of the columns containing values that need trimming. So on any column select Facet->Custom Text Facet and enter the GREL:

filter(row.columnNames,cn,or(cells[cn].value.startsWith(" "),cells[cn].value.endsWith(" ")))

Will give a facet which contains a list of all the columns which contain values which either start or end with a space

Owen

raja kumar dash

unread,
Nov 21, 2016, 3:22:18 AM11/21/16
to OpenRefine
What a great tip. It's also useful for creating a lookup table using JSON. E.g., I have to merge data from multiple sources using a common field - such as a unique ID. So if get new data in a Refine project, I can use Owen's tip to generate a JSON string for an entire row of values, and repeat the process in a loop. Here's an example:

1. Create your "lookup table" project in Refine
2. Left-click the dropdown triangle on the rightmost column, mouse over the "Edit column" option, then click "Add column based on this column"
3. In the dialogue, paste in the following to create a new column containing a JSONified version of each row's values:

'{ ' +forEach(row.columnNames, a, '"' +a +'": "' +cells[a].value +'" ').join(", ") +' }'

4. You can verify one string at http://jsonlint.com/, but that should work.
5. Now, use this project as a lookup table from from another refine project and only have to do one cross-join

wl

unread,
Jun 19, 2017, 9:53:04 AM6/19/17
to OpenRefine
Thank you for this great tip for walking through all columns in a table! I've used it to create SQL statements for inserting records in a MySQL database. This is very useful and saves my a lot of time!
Text data is put in quotation marks, missing values are replaced by NULL.

Example data:

id,var1,var2,var3
1,7,a,2017-06-19
2,4.3,xy,
3,,,2017-07-21


Looks like this in OR:

var3 -> Edit column -> Add column based on this column -> column name: sql_insert
"INSERT INTO table ("+forEach(row.columnNames,cn,cn).join(", ")+") VALUES ("+forEach(row.columnNames,cn,if(isNumeric(cells[cn].value),cells[cn].value,"'"+cells[cn].value+"'")).join(", ").replace('Cannot retrieve field from null','NULL')+");"

There is probably a more elegant way to test on empty cells, but just replacing the error message by NULL works, too...


With Export -> Custom tabluar exporter -> Select only column sql_insert -> Download -> Preview -> Copy the SQL statements and run them in the MySQL console
INSERT INTO table (id, var1, var2, var3) VALUES (1, 7, 'a', '2017-06-19');
INSERT INTO table
(id, var1, var2, var3) VALUES (2, '4.3', 'xy', NULL);
INSERT INTO table
(id, var1, var2, var3) VALUES (3, NULL, NULL, '2017-07-21');

Wolf

Ettore Rizza

unread,
Jun 30, 2017, 4:14:57 AM6/30/17
to OpenRefine
Thankx for the trick, WL. Another way to recreate SQL statements is using Templating Export.


wl

unread,
Jul 1, 2017, 2:07:36 AM7/1/17
to OpenRefine, ettor...@gmail.com
Many thanks, Ettore! I've never used Templating Export before. Now there is a good example of this functionality.
Wolf
Reply all
Reply to author
Forward
0 new messages