Newbie to openrefine: How to concatenate values of 2 or more columns into a new one

7,022 views
Skip to first unread message

hazel...@experiencebell.org

unread,
Mar 7, 2014, 10:38:11 AM3/7/14
to openr...@googlegroups.com
I have 2 columns names First Name and Last Name. I want to combine the values on these columns into a new column called Full Name.

Martin Magdinier

unread,
Mar 7, 2014, 10:41:41 AM3/7/14
to openrefine
You can use the following expression: cells["col1"].value + " " + cells["col2"].value

However you need to be carefull that you don't have blank cells in one or the other column, if not the GREL expression will fail. More details here


Martin



On Fri, Mar 7, 2014 at 10:38 AM, <hazel...@experiencebell.org> wrote:
I have 2 columns names First Name and Last Name. I want to combine the values on these columns into a new column called Full Name.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tom Morris

unread,
Mar 10, 2014, 12:47:21 PM3/10/14
to openr...@googlegroups.com
On Fri, Mar 7, 2014 at 10:41 AM, Martin Magdinier <martin.m...@gmail.com> wrote:
You can use the following expression: cells["col1"].value + " " + cells["col2"].value

However you need to be carefull that you don't have blank cells in one or the other column, if not the GREL expression will fail.

If there are potentially null valued cells, one could use if(isNull(),,) to keep your expression from erroring out.

Tom 

Sean Thackurdeen

unread,
Jun 29, 2017, 8:36:25 AM6/29/17
to OpenRefine
I perpetually return to this post and remain confused. Could you post an example ?

Ettore Rizza

unread,
Jun 29, 2017, 4:07:38 PM6/29/17
to OpenRefine
I think Tom Morris was talking about something like that.

if(isBlank(cells["First Name"].value), " ", cells["First Name"].value)  + if(isBlank(cells["Last Name"].value), " ", cells["Last Name"].value)




OK, a bit complicated for a simple concatenation of two strings... There are probably simpler solutions using facet by blank : http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html

Thad Guidry

unread,
Jun 29, 2017, 4:58:47 PM6/29/17
to openr...@googlegroups.com
Perform whatever facets you want or need to narrow the rows or records that will be affected by your concatenate or merge...and then follow the Wiki guide below. 
 
I've added some examples on how to do Merge Columns on our Wiki... Merging Concatenating Columns was missing from this page

Ettore Rizza

unread,
Jun 30, 2017, 3:35:36 AM6/30/17
to OpenRefine
We can add something in the documentation about the concatenation of two columns that contain null values. But I don't know if there are simpler methods or either magic trick that Owen has the secret.

Owen Stephens

unread,
Jun 30, 2017, 7:12:43 AM6/30/17
to OpenRefine
So as Ettore has pointed out the issue arises when one or more of the cells you want to concatenate is null - the concatenate fails because of the null value. 
I'm afraid I don't have any tricks for this - but I can suggest three approaches and you can pick the one that works for you.

To avoid this problem you either have to:
a) Do the concatenate then check for rows in which it hasn't worked
OR
b) Make sure all the cells you are going to concatenate contain a non-null value to start with
OR
c) Check for null values within the concatenation statement

I suspect (a) is the simplest

e.g.
cells["First name"].value) + " " + cells["Last name"].value

If there are nulls in either of these columns, this statement will result in a blank for those rows. You can then use a filter by blank on the column to find and fix the problem rows.

For (b) there are multiple approaches but a couple of options are:

b1) Use 'facet by blank' on the columns you are going to include in the concatenation and exclude any blank values across all the columns before doing the concatenation
b2) Before the concatenation replace any null values in the cells using a GREL like 
if(isNull(value)," ",value)
(you could use the new All->Transform menu option in OpenRefine 2.7 to do this across the project, but be careful as you are changing data with this!)

For (c) you can
c1) Check for null cells in the concatenate statement:
if(isNull(cells["First name"])," ",cells["First name"].value) + " " + if(isNull(cells["Last name"])," ",cells["Last name"].value)

c2) You can iterate through the columns to get the values, checking for nulls within the iterator, and then join together the resulting array - this is more concise code but is slightly more complicated to get your head around
forEach(row.columnNames[0,2],cn,if(isNull(cells[cn]),"",cells[cn].value)).join(" ")

Thad Guidry

unread,
Jun 30, 2017, 7:31:08 AM6/30/17
to openr...@googlegroups.com
One of the original repliers probably didn't realize or forgot that we deal with errors already in our Expression dialog.
  
Its not a concern.  Our Expression dialog has options to set to blank when null, store error, or copy value from original column.
You just keep the default of 'set to blank'.  If your using one of our API Clients, I think all of them have a parameter for that option.

Folks forget about that without the dialog open in front of them :)
-Thad

concat.png

Sean Thackurdeen

unread,
Jun 30, 2017, 12:10:37 PM6/30/17
to OpenRefine
Thanks so much y'all !

Eoin O'Mahony

unread,
Dec 12, 2017, 2:08:41 PM12/12/17
to OpenRefine
Yes, thank you all. I tried this today for the first time. Relative newbie. 

ted.s...@gmail.com

unread,
Jan 14, 2019, 11:14:55 PM1/14/19
to OpenRefine
Hi Owen,

I'm new to Open Refine and trying to understand how to apply the expression in option C2) that you described: forEach(row.columnNames[0,2],cn,if(isNull(cells[cn]),"",cells[cn].value)).join(" ")

I have a similar use case to the original poster's situation, where I had 2 columns originally: 'First name' and 'Family name'. I wanted to concatenate the first and family names, removing all white spaces. However, in both cases, there can be more than a single word in each cell of the 'Family name' and 'First name' columns. 

What I did was to split the original cells using space as a separator which broke both the first and last name columns into 4 columns for each (all white spaces are now gone).

I now would like to concatenate the results, concatenating my newly created family name and first name columns e.g Family name 1 + FN 2 + FN 3 + FN 4. How would I plug those 4 columns into your expression in C2, please?

Cheers,
Teddy

Ettore Rizza

unread,
Jan 15, 2019, 3:57:33 AM1/15/19
to OpenRefine
Hello Teddy,

If your longest name has 5 elements ("Jean Marc de la Michardière"), put 5 in the brackets ( "row.columnNames[0,5]"). But looks like something has changed in the null management since Owen's answer, since we can now (in OR 3) make concatenations even with null between the elements. 


forEach(row.columnNames[0,5], colname, cells[colname].value).join(" ")



screenshot-127.0.0.1-3333-2019.01.15-09-47-25.png




It's weird, I don't remember any discussions about it.


Ettore

Ettore Rizza

unread,
Jan 15, 2019, 4:07:01 AM1/15/19
to OpenRefine
Erratum: the concatenation of a string with a null always produces a null, nothing has changed. 

screenshot-localhost-3333-2019.01.15-10-01-46.png




On the other hand, we can join an array containing nulls. That's why the simplified formula above (forEach(row.columnNames[0,5], colname, cells[colname].value).join(" ")) works without needing to use if(isNull()).

Owen Stephens

unread,
Jan 15, 2019, 6:36:03 AM1/15/19
to OpenRefine
Hi Ettore,

There have been some changes to how nulls are handled - but the current situation is if you want the null to be treated as a string you have to explicitly convert it - so if you use

value.toString() + cells["Column 1 3"].value.toString() + cells["Column 1 4"].value.toString()

That would avoid the problem with nulls, as null now (OR 3.1) converts to an empty string on 'toString()'

Lots of discussion about null handling on Github - I'm still not sure we've got this working in the optimal way but I think we have more consistency now than previously. Happy to look at further changes if anyone feels strongly about null handling and wants to suggest changes.

Owen

Ettore RIZZA

unread,
Jan 15, 2019, 6:41:20 AM1/15/19
to openrefine
Hi Owen,

Thanks for the reminder. There was so much discussion that I lost the thread. Given the number of questions in this group related to the concatenation of columns, it will one day be necessary to create a graphical interface that handles nulls, much like the "replace" function.

Ettore Rizza


--
You received this message because you are subscribed to a topic in the Google Groups "OpenRefine" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openrefine/O6QF_KKaMw0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openrefine+...@googlegroups.com.

Owen Stephens

unread,
Jan 15, 2019, 6:49:34 AM1/15/19
to OpenRefine
Agreed that column concatenation would be a good function to have an interface for with simple options. I also saw a request via Twitter for a "swap column values" option which feels slightly related and also a good idea.

There's an existing Github issue which suggests a "Concatenate with" function https://github.com/OpenRefine/OpenRefine/issues/1473 - although this suggests a slightly different approach. I wonder if this needs combining with your suggestion here with a single approach to concatenating values from multiple columns?

Owen

Ettore Rizza

unread,
Jan 15, 2019, 7:02:15 AM1/15/19
to OpenRefine
There's an existing Github issue which suggests a "Concatenate with" function

Yes, I remember that proposal. It was based on the list of interesting features of Talend Data Preparation. But in this software, "Concatenate with ..." is not a function, it's a menu feature with a friendly interface.
Reply all
Reply to author
Forward
0 new messages