Newbie: Difference between two dates stored in two columns

190 views
Skip to first unread message

dhkl...@gmail.com

unread,
Nov 5, 2016, 4:27:27 AM11/5/16
to OpenRefine
Hi Community, 

I have a little project I am working on using OpenRefine. I do a lot of data cleaning, which works nicely. However there is one step I cannot figure out: It is about determining the age of a participant of a project on a certain cutoff day. There are two relevant columns, one containing the cutoff-date for each record (called "Col.New") and one the Date of Birth (called "Date of Birth"). Both dates are like that: 2013-06-30T00:00:00Z.

This is where I get stuck: To get the age of the participant on the cutoff day, I should get the difference between the new column and the "Date of Birth" column in full years (something I could achieve rather easily with excel, however the way I am working, it would be nice to get a finished table out of OpenRefine). 

What I tried was to get the diff expression to work on the "Col.New"-column, something like this: 

diff(cells["Col.New"], cells["Date of Birth"], year)



However this gives me: "Error: Unexpected arguments - expecting either 2 strings or 2 dates and a unit string". Both columns have the same format (shown above). I admit that I have no clue how to apply the diff-expression, so that might be one reason. 

It's probably an easy error, but being new I feel that I tried everything without getting something different that this error.

Help would be highly appreciated. Thank you very much!

Thad Guidry

unread,
Nov 5, 2016, 9:22:49 AM11/5/16
to OpenRefine
The conditions for the 3rd argument , as found in our source, show that it expects perhaps "years".



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
Nov 5, 2016, 9:24:16 AM11/5/16
to OpenRefine
Damn, got cut off during reply...

return delta;
}
delta /= 60;
if ("minutes".equals(unit)) {
return delta;
}
delta /= 60;
if ("hours".equals(unit)) {
return delta;
}
long days = delta / 24;
if ("days".equals(unit)) {
return days;
}
if ("weeks".equals(unit)) {
return days / 7;
}
if ("months".equals(unit)) {
return days / 30;
}
if ("years".equals(unit))

dhkl...@gmail.com

unread,
Nov 11, 2016, 1:17:15 PM11/11/16
to OpenRefine
Thank you, Thad, that helped a lot. 

The actuall expression was like this in the end:

diff(cells["Cutoff-Date"].value,cells["Date of Birth"].value, "years")

Then, however, it did not recognize the cutoff-date but rounded up the returning age. So I assume by comparing the two dates, OpenRefine gets something like 23,1 years and return 24 as the age, which is not what I wanted. So I replaced "years" with "days" and used 

floor(value/365)

on the Age-Column afterwards to get the correct age in years at the cuttoff-date. I could imagine that there is a cleaner way to achieve this but as far as I see it, the results are the way I want them to be. 

Thanks again. Keep it up ;). 
Reply all
Reply to author
Forward
0 new messages