Why is result null for DATE_DIFF() ?

32 views
Skip to first unread message

じょいすじょん

unread,
Mar 30, 2020, 11:17:37 PM3/30/20
to ArangoDB
I am doing DATE_DIFF(doc.date, prob.createdAt, "d", true)
Where the dates are ISO8601 date strings for example:
DATE_DIFF("2020-03-25T07:10:21Z", "2020-03-24T04:59:41+0000", "days", true)
However, I always get null as the result.
I tried ensuring they both show GMT and it is the same.
What am I doing wrong here?

When I create a query that just tests the DATE_DIFF function inputs

//RETURN DATE_DIFF("2020-03-25T07:10:21Z", "2020-03-24T04:59:41+0000", "days", true)
//RETURN DATE_DIFF("2020-03-25T07:10:21Z", "2020-03-27T07:10:21Z", "days", true)
RETURN DATE_DIFF("2020-03-22T04:59:41+0000", "2020-03-24T04:59:41+0000", "days", true)

I can see that any time the time zone is +0000 then the output is
Warnings:

[1572], 'in function 'DATE_DIFF()': invalid date value'

Result:

[
  null
]

According to the spec and for easier reading Wikipedia https://en.m.wikipedia.org/wiki/List_of_UTC_time_offsets

Z is +0000

I inserted : in +0000 as +00:00 and now it works.

Is this expected behavior or am I coming across a bug?
I'm unable to confirm whether or not the offset needs the colon : 


じょいすじょん

unread,
Mar 30, 2020, 11:27:16 PM3/30/20
to ArangoDB
I work around it by doing SUBSTITUTE( someDate, "+0000", "+00:00")
That works fine I guess.
But I would like to find out if I should file a bug with my data source ?

Jan Uhde

unread,
Mar 31, 2020, 12:07:17 AM3/31/20
to ArangoDB
While the ArangoDB code is not 100% ISO conform it should be correct there. I think the ISO states that you should b use the extended format or not. Therefore is should not be possible set the colon at dinner points but not at others. Like it is done in your example.

To get an idea please look at this unfinished code:

https://github.com/ObiWahn/PEGTL/commit/95a825326d734ff035c0979b2019695befda20b0#diff-88660bc76de9dabacba18430927dfcd1

The relevant ArangoDB code can be found here:
https://github.com/arangodb/arangodb/blob/devel/lib/Basics/datetime.cpp

Reading the code is your best shot to understand what will work and what does not. Because this does not follow closely any standard.

At some point I wanted to make it ISO conform but it was decided that it is not worth the trouble and we do not want to change what is currently supported.

If you are interested you can help in my freetime effort to finish the PEG grammar in above repository. It might be added to ArangoDB because we use PEGTL in other places by now. This was not the case when I was looking at the problem last time. Therefore an additional library needed to be added in the past resulting in more resistance.


dangerwillro...@gmail.com

unread,
Mar 31, 2020, 12:18:13 AM3/31/20
to aran...@googlegroups.com
Hi Jan

Thanks for confirming and thanks for the code reference. Very helpful to know.
Might be good to note in the docs How conformant it is or not.
If I have time I will peek at the code and see if I can contribute.

I wonder if some cases might be well covered already by ICU4C ?

>
> On Mar 31, 2020, at 13:07, Jan Uhde <jan....@arangodb.com> wrote:
>
> While the ArangoDB code is not 100% ISO conform it should be correct there. I think the ISO states that you should b use the extended format or not. Therefore is should not be possible set the colon at dinner points but not at others. Like it is done in your example.
> --
> You received this message because you are subscribed to a topic in the Google Groups "ArangoDB" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/arangodb/TWrmSbJq0vY/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to arangodb+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/8e857aa3-fd7a-4a30-a4ff-b1ef208015f0%40googlegroups.com.

Jan Uhde

unread,
Mar 31, 2020, 1:04:30 AM3/31/20
to ArangoDB
Hey,

Is this the functionality you are referring to:

https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/udat_8h.html#a86fd11a05b4586a3076eef4553cea1f0

Because you still need to provide a formapt.This could be done as a list or as an grammar. I think for ICU one could provide a vector of formats to test / evaluate. A grammar on the other hand will allow for faster match or termination.

How do you think ICU could be used for iso date parsing?

Thanks

Jan

じょいすじょん

unread,
Mar 31, 2020, 1:50:09 AM3/31/20
to aran...@googlegroups.com
Hi Jan,

If I understood the taocpp PEGTL correctly, it might be something like this:

struct extended_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};
struct basic_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};

With some additional tests to include something like this:
"2020-03-24T04:59:41+0000"
"2020-03-24T04:59:41+00:00"
"2020-03-24T04:59:41+00"
"2020-03-24T04:59:41"
"2020-03-24T04:59:41Z+0000"
"2020-03-24T04:59:41Z+00:00"
"2020-03-24T04:59:41Z+00"
"2020-03-24T04:59:41Z"
"2020-03-24T04:59:41-0000"
"2020-03-24T04:59:41-00:00"
"2020-03-24T04:59:41-00"
"2020-03-24T04:59:41"
"2020-03-24T04:59:41Z-0000"
"2020-03-24T04:59:41Z-00:00"
"2020-03-24T04:59:41Z-00"
"2020-03-24T04:59:41Z"

Sorry it's my first time looking at taocpp PEGTL, but if it helps, I could express it pretty concisely in a Regular Expression.
(only for the offset portion, and it may still yet be incomplete)

I'd recommend looking at ICU
icu/icu4c/source/i18n/datefmt.cpp

Theirs is the canonical and reference implementation of everything date time and unicode.
It's probably a parser only expecting a short string for a date, so you would either want to call their function if you can determine the terminals for a possible date string in your parsing, or just try to examine their logic and their tests. Their tests should be informative.

Unfortunately, it's an absolutely massive code base that takes time to dig through, but Github's search helps…

じょいすじょん

unread,
Mar 31, 2020, 1:53:24 AM3/31/20
to aran...@googlegroups.com
I will add inlined.
I'm not sure how low-level your parsing is happening, if you are already expecting dates in specific fields or what.
If you can identify a string that might be a date that needs parsing, you can send it through their DateFormat::parse function for validation.
No idea how that might impact your performance.

At least, you could definitely at least use their reference for passing tests to aim for. If you can find them :D

>
> Thanks
>
> Jan
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "ArangoDB" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/arangodb/TWrmSbJq0vY/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to arangodb+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/6532709a-556a-4cd4-88d8-5a2043bcd4ac%40googlegroups.com.

じょいすじょん

unread,
Mar 31, 2020, 1:55:26 AM3/31/20
to aran...@googlegroups.com
Anyway, theirs being the canonical source for almost every library, since it ships with and is implemented by Unicode.org contributors and backed by IBM historically, it is the foundation for many platforms' implementations as well.
Gnarly big code base, but gets you consistency that somebody else maintains :D

Jan Christoph Uhde

unread,
Mar 31, 2020, 3:02:55 AM3/31/20
to ArangoDB
The tests in the tao-file are taken from ISO8601 (2004). `((\+|\-)\d\d)((:)?\d\d)?$` is not correct because the -cake- `:?` is a lie. It has either to be used everywhere or not at all.


Those lines are as they must be! Only the extended form is allowed to have the colon. This is the whole reason for having 2 different rules. Otherwise a single rule with the RE you provided would suffice.

It can be done with REs - no question - but it is a lot more to write than when using a something like a CFG.

If ICU would offer the functionality we would be happy to use it, but you have to provide all possible format strings which is exactly what my PEG grammar describes. And as I said trying out a list of format string is just much slower than using some sort of grammar or regular expression. I could try to parse "xyz" and would not get more than a parse error. Which means that I still have to go through the list of all format strings. With the grammar on the other hand there is nothing to get away from the start symbol and you are immediately done. ICU is a nice and widely used lib, but it is about unicode encodings and not ISO8601 parsing. It has some date support but not what we need. The code does not even mention the ISO once.
To unsubscribe from this group and all its topics, send an email to aran...@googlegroups.com.

じょいすじょん

unread,
Mar 31, 2020, 5:03:55 AM3/31/20
to aran...@googlegroups.com
Hi Jan,

Thanks for the discussion.

On Mar 31, 2020, at 16:02, Jan Christoph Uhde <j...@uhdejc.com> wrote:

The tests in the tao-file are taken from ISO8601 (2004).
I'm reading it and I do not see any prohibition on using the basic format time shift (UTC/GMT offset) with the extended format date and time of day.

Perhaps I am not finding it successfully.
I am reading ISO 8601 part 1 and part 2 2019 version (I don't have an older version)
It does not provide a mixed extended and basic example but it does not prohibit it
Same.

`((\+|\-)\d\d)((:)?\d\d)?$` is not correct because the -cake- `:?` is a lie.
I don't promise it to be exhaustive, I have not run it on enough tests. 

It has either to be used everywhere or not at all.
I'm not sure that's accurate in reality.
At least in practice, it's rather common to see : separator for hours minutes and seconds but nothing in the GMT offset.
An example online

I see it in sections 4.3.13, 5.3.4.1
It only describes basic and extended time shift formats for the GMT offset.
There are no rules described that I can find that are saying it must be : or not. Only the extended format includes it.


From ISO 8601-1

A time shift, often used in the representation of local standard time against UTC, is represented as follows:

4.3.13 Time shift

a) Basic, hours and minutes: [±][hour][min] or [“Z”] EXAMPLE 1 ‘+0500’ or ‘Z’

b) Basic, hours only: [±][hour] or [“Z”] EXAMPLE 2 ‘+05’ or ‘Z’

c) Extended, hours and minutes: [±][hour][“:”][min] or [“Z”] EXAMPLE 3 ‘+05:00’ or ‘Z’

The UTC designator ["Z"] indicates that there is no time shift from UTC of day and is functionally equivalent to the expressions ‘+0000’ and ‘+00:00’. The time shift shall be expressed as positive (i.e. with the leading plus sign [“+”]) if it is ahead of or equal to UTC, and as negative (i.e. with the leading minus sign [“-”]) if it is behind UTC. 


To unsubscribe from this group and all its topics, send an email to arangodb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/f43df4a6-6412-4dbc-bee4-b32ed25611bd%40googlegroups.com.

Reply all
Reply to author
Forward
0 new messages