Calculate number of days between two dates minus weekends in Open Refine

1,832 views
Skip to first unread message

amga...@gmail.com

unread,
Feb 3, 2014, 12:34:49 PM2/3/14
to openr...@googlegroups.com
Hello, 

Could anyone help me figure out an expression to calculate the number of days between 2 dates minus the weekends?  Here is the expression I used to calculate the number of days between 2 dates: diff(cells["DateOUT_date"].value, cells["DateIN_date"].value, "days") Then I realized I needed to account for the weekends.

I've been on the hunt for this solution for a week or so - I've tried to be thorough in my search. I found an answer on StackOverflow that explained how to do this in Python (http://stackoverflow.com/questions/3615375/python-count-days-ignoring-weekends) and I found an extension to Python that extends Python's datetime function (https://pypi.python.org/pypi/workdays/).  But I am stuck.  I don't know what Open Refine directory to move .py files to and I don't know exactly how I would write the expression to get my desired information.  Is there an easier way to do this? At one point when I was formatting my date fields the result was something like "Fri Jan 17", so it looks like the actual day is stored in there somewhere, so how can I exclude the weekends?

Any help would be much appreciated!

-Anne

Thad Guidry

unread,
Feb 3, 2014, 2:30:27 PM2/3/14
to openr...@googlegroups.com
Somehow you should be able to filter out only those that conform to the ISO weekday standard of 1,2,3,4,5 (for Monday through Friday only) and exclude those that are not day of week 6 or 7.

ISO definitions, where 1 is Monday and 7 is Sunday.  But Python itself uses 0 - 6, as some other languages do (just in case you want to experiment with Jython as your expression language of choice rather than GREL)

I just do not see an easy way to do that currently in GREL.  Hacking with Jython or Clojure as the expression of choice might be easier for you.


where you could offer a bounty ($ money) for us to hack on that a bit to get an additional parameter for diff() function to add a boolean flag (Y or N) for "include iso weekdays or not".

So your expression would potentially look like this:

diff(cells["DateOUT_date"].value, cells["DateIN_date"].value, Y, "days")

Of course, there also might be more of a benefit with including JodaTime into this conversion of hacking on diff() to support multiple kinds of date diff usage.



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--

Thad Guidry

unread,
Feb 3, 2014, 4:17:57 PM2/3/14
to openr...@googlegroups.com
Anne,

I did put together a quick Jython example workday_diff that shows 1 approach you might take.
You need to copy the workdays.py file (extracted from that extension you found) into the path of \OpenRefine\extensions\jython\module\MOD-INF\lib


import datetime
import workdays

def workday_diff(date1, date2)
date_format = "%m/%d/%Y"

d1 = datetime.datetime.strptime('12/27/2013', date_format)
d2 = datetime.datetime.strptime('01/02/2014', date_format)

workday_diff = workdays.networkdays(d1, d2)
return workday_diff

Hope that helps you along,

amga...@gmail.com

unread,
Feb 10, 2014, 3:55:51 PM2/10/14
to openr...@googlegroups.com
Hi Thad, 

Thanks so much! Unfortunately I was constrained by my non-existent coding skills, so I wasn't able to figure out how to do this in OpenRefine. There's a =NETWORKDAYS function in Excel that I'm just going to have to use. No biggie.

Best, 
Anne
Reply all
Reply to author
Forward
0 new messages