How to use toDate to convert Timestamp with decimal seconds

3,581 views
Skip to first unread message

Jonathon Paarlberg

unread,
Oct 24, 2017, 9:17:52 AM10/24/17
to OpenRefine
Hi, all. I am trying to convert a timestamp such as the one below to a date that can be used for time series and calculations, especially time differences spanning fractions of a second.

For example, I'd like to convert to a time format that would allow me to subtract the first of these two times from the second. (The numbers after the seconds are in decimal seconds; i.e. tenths to ten-thousandths of a second.)
05/18/2016 11:54:26.5550
05/18/2016 11:54:26.7380

If I just do...
value.toDate('MM/DD/YYYY hh:mm:ss')

I get...
[date 2015-12-27T02:01:28Z]

...for the first date. This is obviously wrong, and also I don't know whether the decimalized seconds were figured in yet somehow hidden from the display of the result. What does the "Z" mean? I'm sorry to say I'm a bit lost here. Please point me in the right direction.

Thank you.

Ettore Rizza

unread,
Oct 24, 2017, 10:25:24 AM10/24/17
to OpenRefine
Hi Jonathon, 

I've never used the function toDate with such precision. There is probably a solution in digging into the Java documentation, but I will probably never use Open Refine for such accurate calculations. It is not made for that.

Jonathon Paarlberg

unread,
Oct 24, 2017, 10:37:01 AM10/24/17
to OpenRefine
Thanks, but I don't really need to use Openrefine for the calculations, only for generating a date formatted time (including the decimal seconds) that can be used for calculations by other software; for example, it could be imported into a spreadsheet application or a database application.

Ettore Rizza

unread,
Oct 24, 2017, 12:03:30 PM10/24/17
to OpenRefine
I'm not sure of the best way to do this in GREL, so I leave the answer to someone better educated than me in Java. In Jython, with the standard library, one solution might be to parse your string with the datetime module:

import datetime
return datetime.datetime.strptime(value, "%m/%d/%Y %H:%M:%S.%f")


The result is a datetime object, a sort of tuple. 




You could then extract part of the date/time or perform transformations on this tuple, for example transform it into epoch time (recognized by many software):

import datetime
import time

date
= datetime.datetime.strptime(value, "%m/%d/%Y %H:%M:%S.%f")

return time.mktime(date.timetuple()) + date.microsecond * 1e-6

Thad Guidry

unread,
Oct 24, 2017, 3:23:25 PM10/24/17
to openr...@googlegroups.com
Jonathan,

Look at our Date functions and see if you can figure it out.

OpenRefine just uses Java's SimpleDateFormat and we provide a link to that documentation on that Wiki page we have.


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ettore Rizza

unread,
Oct 24, 2017, 4:18:53 PM10/24/17
to OpenRefine
@Thad : I must admit that even reading the documentation, it is not easy to play with thousandths of seconds in GREL.

I can transform "05/18/2016 11:54:26.5550 " into "2016-05-18T11:54:26Z" using:

value.toDate("M/d/y H:m:s")

Or into "
2016-05-18T11:54:31Z" using:

value.toDate("M/d/y H:m:s.S")

But it's not easy to interpret "
.5550" as "half a second". 

Thad Guidry

unread,
Oct 24, 2017, 6:16:17 PM10/24/17
to openr...@googlegroups.com
Sure Ettore.  Happy to explain further...

I provided that precison enhancement last year... looks like docs need to be updated however in various places.  Please help Ettore with that ! :)

But basically the GREL would then look something like this...

value.toDate("MM/dd/YYYY HH:mm:ss.SSSS").datePart("milliseconds")

value.toDate("MM/dd/YYYY HH:mm:ss.SSSS").datePart("ms")  

value.toDate("MM/dd/YYYY HH:mm:ss.SSSS").datePart("S") 

Was that not completely explanatory on that Wiki page ?  I thought it was ... but perhaps that can be improved by you Ettore so that its more quickly explained to users with some further GREL examples.  Why not contribute to that Wiki page and improve it for all ?  You can do that ! :)

Thad Guidry

unread,
Oct 24, 2017, 6:21:11 PM10/24/17
to openr...@googlegroups.com
Ettore,

Probably this scenario example as well would be useful....

"yyyy-MM-dd'T'HH:mm:ss.SSSZ"     2001-07-04T12:08:56.235-0700

value.toDate("yyyy-MM-ddTHH:mm:ss.SSSZ").datePart("milliseconds")


I'd like you to ensure that we didn't miss any of those options in our Reference Docs.

Thanks !

Thad Guidry

unread,
Oct 24, 2017, 7:06:38 PM10/24/17
to openr...@googlegroups.com
Oh dear... we might have a bug within toDate() ...


Ettore, see if you can reproduce the bug I am seeing... and leave your comments in that new issue above I just created.

Ettore Rizza

unread,
Oct 25, 2017, 3:55:24 AM10/25/17
to OpenRefine
@Jonathon Paarlberg : Okayyyyy, I finally understood. To make it short, the date package used by GREL has a maximum accuracy of one millisecond. If you give it more precision, as in your example, the decimal part (.5550) will be interpreted as 5 seconds and 550 milliseconds. Your result will be wrong.

Two solutions at first sight:

- Use the datetime Python module mentioned above, which has microsecond precision.

- Use GREL, but by chomping a decimal on your seconds date part, so that 05/18/2016 11:54:26.5550 becomes 05/18/2016 11:54:26.555 (and you loose up to 0.9 millisecond of precision).

The formula would be in this case something like:

value.replace(/\d$/,'').toDate("MM/dd/y H:m:s.S")


Le mardi 24 octobre 2017 15:17:52 UTC+2, Jonathon Paarlberg a écrit :

Jonathon Paarlberg

unread,
Oct 26, 2017, 4:44:18 PM10/26/17
to OpenRefine
Thank you both very much! The detail on the wiki is there, but it wasn't immediately apparent to me how to use the Unit and "datePart" to create the GREL. I will try out your truncated version, Ettore, as I don't really need that last decimal place.

Jonathon Paarlberg

unread,
Oct 26, 2017, 5:42:21 PM10/26/17
to OpenRefine
Sorry to say that the replace function didn't work the way you had it. I will try again later. Without the replace, it still gives the Z, and I don't know how that is to be interpreted. Ettore, I also tried the Jython code, and for some reason it left my new column blank. I will also experiment some more with that when I have time. End of day here now. But thank you for the suggestions.


On Wednesday, October 25, 2017 at 3:55:24 AM UTC-4, Ettore Rizza wrote:

Thad Guidry

unread,
Oct 26, 2017, 6:43:15 PM10/26/17
to openr...@googlegroups.com
Z is zulu.  Its a throw back to the old Darpa defense program contributions and Java absorbed it way back in 1999 as well.

From Google:
The T is just a literal to separate the date from the time, and the Z means "zero hour offset" also known as "Zulu time" (UTC). If your strings always have a "Z" you can use: SimpleDateFormat format = new SimpleDateFormat( "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.US); format.setTimeZone(TimeZone.getTimeZone("UTC"))  

-Thad

--

Ettore Rizza

unread,
Oct 27, 2017, 6:45:38 AM10/27/17
to OpenRefine
Hmm, everything seems ok at home. Maybe your strings are slightly different from the ones I used. The regular expression \d$ means "any number at the end of the string".


The "milliseconds" part is invisible in the date format you get (2016-05-18T11:54:26Z), but rest assured, it's there. You can verify that by retransforming your string to date using :


value.toDate().toString("dd/MM/yyy h:m:s.S")


Result :

18/05/2016 11:54:26.555
18/05/2016 11:54:26.788

Owen Stephens

unread,
Oct 27, 2017, 7:08:21 AM10/27/17
to OpenRefine
As well as the other issues noted by Ettore and Thad, note that case is important in the toDate/toString conversion formats. Once you have done the replace as suggested by Ettore, you need to make sure you use:

value.toDate('MM/dd/yyyy hh:mm:ss.S')


not

value.toDate('MM/DD/YYYY hh:mm:ss')

as you had in your original question

Owen

Jonathon Paarlberg

unread,
Nov 1, 2017, 5:35:36 PM11/1/17
to OpenRefine
O.k. Thanks, both of you. I apologize for my initial confusion. As you said, the milliseconds are actually there, just obscured by the display of Zulu time. :-) Cute.

'Not really a problem that I can't get the last of the four decimal places.

Thad Guidry

unread,
Nov 1, 2017, 6:13:24 PM11/1/17
to openr...@googlegroups.com
Glad to be of assistance Jonathon !

If you have further problems, just send a new email to this list !  We watch it every second, 24/7. (just kidding, but Owen and Ettore nearly do :-)

Jonathon Paarlberg

unread,
Nov 1, 2017, 6:38:51 PM11/1/17
to OpenRefine
Separate issue, but exporting to Excel 2007 causes loss of the millisecond decimals. 'Need to find a way to subtract diff of next line Time from last line Time but keep milliseconds. Should I use Openrefine for that?


On Tuesday, October 24, 2017 at 9:17:52 AM UTC-4, Jonathon Paarlberg wrote:

Ettore Rizza

unread,
Nov 2, 2017, 6:28:45 AM11/2/17
to OpenRefine
Hi Jonathon,

Sorry for my incredibly late response, I was unusually sleeping. ;)

I'm not sure that Excel can not handle milliseconds. They probably do not appear on the screen, but a subtraction of two dates should give you their difference in milliseconds.

Anyway, you can always transform your date-time into Epoch Unix in Open Refine, with the formula value.datePart('time'), then perform in Excel subtractions between these two numbers. I would advise against doing these calculations in Open Refine.

Example with your two original dates :

Reply all
Reply to author
Forward
0 new messages