Compare dates?

43 views
Skip to first unread message

olivia solis

unread,
Mar 26, 2019, 10:07:53 AM3/26/19
to OpenRefine
Hello all,

I need to normalize a bunch of date ranges that have begin and end dates. The system I eventually will be importing them into will only accept the date range if the begin date is before the end date (quite reasonable). The dates, which are currently strings, may have any of the following forms in the begin and end columns:

yyyy
yyyy-MM
yyyy-MM-dd

So here in this test project, I added columns for begin_date and end_date with the orginal values converted to dates:

Screen Shot 2019-03-26 at 8.58.42 AM.png



Is there an easy way to check to verify that the begin dates fall before the end dates? Some kind of boolean true/false that begin is before end. In one set of dates, I had converted the original dates to a number by adding "0000" to anything that was yyyy and 00 to anything that way yyyy-MM, then removing the dashes and converting to number. After that I could compare numbers. But I was hoping there was a a more direct/less time-consuming way to verify that all my thousands of dates have kosher date ranges, with begin before end.

Thanks!
Olivia
date_range_test.openrefine.tar.gz

Ettore Rizza

unread,
Mar 26, 2019, 11:55:03 AM3/26/19
to OpenRefine
Hello,

The diff function seems made for you. Normally, the result of this formula must always be positive:

diff(cells['end_date'].value, cells['begin_date'].value, "days")

Thus negative values should be errors. But I don't know why the thing does not work with 2900. Strange. I need to explore the Java documentation.

In the meantime, you can also use this, which seems to work more predictably:

cells['end_date'].value.datePart("time") - cells['begin_date'].value.datePart("time")


datePart ("time") expresses each date in milliseconds since January 1, 1970 (Unix Epoch).

Cheers,

Ettore Rizza

olivia solis

unread,
Mar 27, 2019, 9:26:56 AM3/27/19
to OpenRefine
Thank you, Ettore! Both of these are so much more efficient than my previous method. Oddly, the 2900 row did produce a negative value for me using the diff function, so I am not sure why we are experiencing different behavior. I am using LODRefine on a Linux VM if that makes any difference. 

Again, I very much appreciate this help!

Best,
Olivia

Thad Guidry

unread,
Mar 27, 2019, 10:30:27 AM3/27/19
to openr...@googlegroups.com
I think it might be because we are not comparing properly in our code somewhere, but I could be wrong.

> java.time.OffsetDateTime
> This is a value-based class; use of identity-sensitive operations (including reference equality (==), identity hash code, or synchronization) on instances of OffsetDateTime may have unpredictable results and should be avoided. The equals method should be used for comparisons.


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
Mar 27, 2019, 10:40:45 AM3/27/19
to openr...@googlegroups.com

Owen Stephens

unread,
Mar 28, 2019, 5:11:15 AM3/28/19
to OpenRefine
Ettore - can you clarify what behaviour you are seeing for the 2900 line?

If I run:

diff("2900-01-01T00:00:00Z".toDate(),"1923-08-03T00:00:00Z".toDate(),"days")
 
I get -70378 - which I think is what I'd expect

Are you seeing something different?

Owen
To unsubscribe from this group and stop receiving emails from it, send an email to openr...@googlegroups.com.

Ettore RIZZA

unread,
Mar 28, 2019, 5:22:14 AM3/28/19
to openrefine
@Owen : Same number as you, but positive :-|

screenshot-127.0.0.1-3333-2019.03.28-10-17-49.png
I use OR 3.2 on MacOSX.

Ettore Rizza


To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

Ettore RIZZA

unread,
Mar 28, 2019, 5:30:14 AM3/28/19
to openrefine
But I got the right result using your formula.

screenshot-localhost-3333-2019.03.28-10-27-03.png




Ettore Rizza

Owen Stephens

unread,
Mar 28, 2019, 6:28:25 AM3/28/19
to OpenRefine
Thanks Ettore - it looks like there is something odd with comparing dates that are more than 106750 days apart

diff(toDate("1700-01-10"),toDate("1992-04-21"),"days") - > +106751
diff(toDate("1700-01-10"),toDate("1992-04-19"),"days") -> -106750

That gives us something to look at to do the analysis and fix

Ettore RIZZA

unread,
Mar 28, 2019, 6:37:08 AM3/28/19
to openrefine
Thanks Owen. As I said on Github, positive or negative, the result is wrong in any case since there are much more than 70 000 days of difference between 2900 and 1923. OpenRefine 2.8 give the right result, but OR 3.1 produces the same error.


Ettore Rizza


--
Reply all
Reply to author
Forward
0 new messages