batch ingestion with two-digit year?


Dan Prince

Sep 3, 2021, 3:27:41 PM
to Druid User

I want to ingest dates of the format MM/dd/yy (e.g. 02/01/03 for Feb 1, 2003). I can't figure out how to specify this using a Joda-Time format string, which seems to be what Druid requires. When I try, the data has dates in the year 0003, not 2003.

In Java code I can do this by specifying a pivot year for the Joda-Time DateTimeFormatter, but there seems to be no way to do this with just a format string (which is all that is available in the Druid ingestion spec).

Rachel Pedreschi

Sep 16, 2021, 10:29:34 AM
to druid...@googlegroups.com
Can you give us an example of what you have been trying? I also find this guide helpful when dealing with Joda-Time:



--
Rachel Pedreschi
VP Developer Relations and Community
Imply.io

Ben Krug

Sep 16, 2021, 5:28:22 PM
to druid...@googlegroups.com
Based on Rachel's reference, you might try YY instead of yy, but I'm not sure how Druid will interpret the era, so you might still get 0003. Worth a try, though.

Dan Prince

Oct 4, 2021, 7:46:47 AM
to Druid User
Thanks for your replies. I figured this out through trial and error. For posterity:
  • A pattern with one 'y', like 'M/d/y', does not infer a century.  The year is interpreted literally.  So, 3/4/7 is in the year 7 CE, not 2007.  This is true even if you provide a two-digit year:  03/04/07 is also March 4th 0007.
  • A pattern with two 'y's, like 'MM/dd/yy', does infer a century, but only if you provide at least 2 digits for the year.
    • 3/4/7 is March 4 0007
    • 3/4/07 is March 4 2007
  • The 'pivot' for inferring a century seems to be 40, so '40' is 2040, but '41' is 1941. I couldn't find this documented anywhere in the Joda-Time docs. Joda-Time's behavior seems to differ slightly from java.text.SimpleDateFormat, which uses the '80/20 years' rule (two-digit years are placed within 80 years before and 20 years after the time the formatter is created).
    • 1/22/07 is Jan 22 2007
    • 1/22/40 is Jan 22 2040
    • 1/22/41 is Jan 22 1941 (a SimpleDateFormat created in 2021 interprets 41 as 2041)
    • 1/22/99 is Jan 22 1999
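
The results above are consistent with a 100-year window around a pivot year. Here is a minimal plain-Java sketch of that arithmetic, not Joda-Time's actual code. The class and method names are hypothetical. Two things are assumptions on my part rather than anything confirmed in this thread: that the default pivot is the parse-time year minus 30 (1991 for data parsed in 2021), and that only years written with exactly two digits are windowed (the thread only tested one- and two-digit years). If the first assumption holds, the 40/41 cutoff would move forward by one each calendar year, so it is worth checking against the Joda-Time docs before relying on it.

```java
// Sketch of 100-year windowing for two-digit years. Illustrative only:
// the names here are hypothetical and this is not Joda-Time's implementation.
final class PivotYearSketch {

    /** Maps a two-digit year into the window [pivotYear - 50, pivotYear + 49]. */
    static int resolveTwoDigitYear(int twoDigits, int pivotYear) {
        int low = pivotYear - 50;                    // earliest year in the window
        int lowTwoDigits = Math.floorMod(low, 100);  // last two digits of that year
        int year = low - lowTwoDigits + twoDigits;   // candidate in the same century as low
        if (twoDigits < lowTwoDigits) {
            year += 100;                             // candidate fell below the window, so use the next century
        }
        return year;
    }

    /** Assumption: a year not written with exactly two digits is taken literally. */
    static int resolveYearText(String yearText, int pivotYear) {
        int value = Integer.parseInt(yearText);
        return yearText.length() == 2 ? resolveTwoDigitYear(value, pivotYear) : value;
    }

    public static void main(String[] args) {
        int pivot = 2021 - 30; // assumption: pivot = parse-time year minus 30
        for (String y : new String[] {"7", "07", "40", "41", "99"}) {
            System.out.println(y + " -> " + resolveYearText(y, pivot));
        }
    }
}
```

With a pivot of 1991 the window is 1941 through 2040, so the sketch maps "7" to 7, "07" to 2007, "40" to 2040, "41" to 1941, and "99" to 1999, matching the observations listed above.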
