convering date format in openrefine

4,799 views
Skip to first unread message

peter.m...@hmri.org.au

unread,
Apr 21, 2017, 8:00:08 PM4/21/17
to OpenRefine
I uploaded from excel with a date in the format   29/05/1935

Open refine loads this as 1935-05-29T00:00:00ZI


I have looked around the posts but cant see how to convert this to a date in the original format -  I can bet it looking right in text but then when do toDate  it reverts back to the OpenRefine upload format

thanks

Peter





Ettore Rizza

unread,
Apr 22, 2017, 3:57:17 AM4/22/17
to OpenRefine
Click on "Edit Cells / Transform" and then use this GREL formula :

value.toString("dd/MM/YYYY")



Hope this help.

peter.m...@hmri.org.au

unread,
Apr 23, 2017, 7:30:26 PM4/23/17
to OpenRefine
Ettore, thanks again,  have now worked this out but took some trial and error. :)

Dates and time are one of the most difficult issues in Excel and quite hard in OpenRefine

for raw data fromated as  21/01/2017 3:04:02 PM

this worked   value.toDate('dd/MM/yyyy hh:mm')

also:    toDate(value, false, "dd/MM/yyyy hh:mm")

however the time output displayed is always in yyyy-MM-dd  format eg  2017-01-05T04:22:00Z  no matter what you put in the format argument in the transform string.

you can convert back to text in the  dd/MM/yyyy format however lose the ability to sort on date/time

However to other date date imported from different source in format  2/18/2017 10:52
i needed to use the following transform:

toDate(value),"dd/MM/yyyy hh:mm"

all very puzzling and to work this out has take several hours of playing around  and experimenting

Peter

Ettore Rizza

unread,
Apr 24, 2017, 6:03:16 AM4/24/17
to OpenRefine
Hi Peter, 

I think there is a misunderstanding about the concept of date. In Open Refine, dates (2016-10-31T00:00:00Z) are a primary data structure in the same way as strings ("Peter"), numbers (2), and Booleans (true or false). 

When you say "25/02/2014", this is a date in the human sense, but for Open Refine, it's a simple string of characters. It is the same difference as between 2 and "two": both are numbers in the human sense of the term, but for the computer, only the first is a number, the second is a string.

If you want to be able to sort your rows by date/time, you can keep the original date column and create a new one in dd/mm/YYY format. If you want a dd/mm/YYY h:m format, you can use this formula :

value.toString("dd/MM/yyyy h:mm")

But it's not a date, it's a string.

Tinashe Manzou

unread,
Nov 15, 2018, 5:52:16 PM11/15/18
to OpenRefine
Hi Ettore,

Would you know how to convert a date like the one I've pasted below.

Fri Jul 04 00:00:00 CAT 2014 Thank you kindly in advance, T

Ettore Rizza

unread,
Nov 16, 2018, 2:16:54 AM11/16/18
to OpenRefine
Hi Tinashe,

in theory, it should be something like:

value.toDate('EEE MMM dd HH:mm:ss z yyyy')

But this does not work (I got an "Error: Unable to parse as date")... 

Owen Stephens?

Ettore Rizza

unread,
Nov 16, 2018, 2:22:59 AM11/16/18
to OpenRefine
What's the problem with the result of a simple value.toDate() ?

screenshot-localhost-3333-2018.11.16-08-22-43.png

Thad Guidry

unread,
Nov 16, 2018, 5:26:02 AM11/16/18
to openr...@googlegroups.com
Why do I recognize the CAT 2014 as being significant here ?  I've seen this before but where ?



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Owen Stephens

unread,
Nov 16, 2018, 6:40:25 PM11/16/18
to OpenRefine
A couple of issues here:

1) there is a bug with toDate in 3.0 - see https://github.com/OpenRefine/OpenRefine/issues/1759 - this is why Ettore's suggestion of value.toDate('EEE MMM dd HH:mm:ss z yyyy') didn't work. You can get around it by using value.toDate(false, 'EEE MMM dd HH:mm:ss z yyyy') but this gives the same result as just value.toDate() - which has a problem - see (2)

2) CAT is the timezone (Central African Time). So "Fri Jul 04 00:00:00 CAT 2014" -> 2014-07-03T22:00:00Z (because CAT is UTC+2) - whereas currently the value.toDate() is giving 2014-07-04T00:00:00Z - two hours ahead of the correct time.

I'm happy to say I've been working on fixing these problems, and now both problems have been fixed in the code and the fix will be part of the upcoming 3.1 release

Owen



I've just submitted a fix for this (yesterday) and once this is released in v3.1 (in the next few weeks I hope) the behaviour of toDate will be correct I think:

"2014-07-03T22:00:00Z".toDate() -> [date 2014-07-03T22:00:00Z]

Thad Guidry

unread,
Nov 16, 2018, 7:18:21 PM11/16/18
to openr...@googlegroups.com
Great, Thanks Owen !

Reply all
Reply to author
Forward
0 new messages