CEIL(__time to HOUR) returns 2019-05-25T03:00:00.000Z for __time 2019-05-25T02:00:00.000Z

48 views
Skip to first unread message

vindhya g

unread,
May 26, 2019, 2:20:16 AM5/26/19
to Druid User
Hi Experts,
  I have a use case where i want to group all events within an hour and have an average out of it.
So for ex events from 01 to 02 hour (including event at exact  02) to be included in the average but what i see is the event at 02 is considered at 03 instead of 02 if ceil function is used. 
I see ceil function behaviour in sql is different.
Even as per the definition here http://druid.io/docs/latest/misc/math-expr.html
it should return integer greater than or equal but it doesnt seem to return equal .
Is this as expected ?
Is there a function that adhere to my usecase that i can use?

Thanks in advance

Vadim Ogievetsky

unread,
May 28, 2019, 2:55:17 AM5/28/19
to Druid User
I can confirm that I am seeing what you are seeing, this is indeed very strange way for CEIL to act (not what I would expect). I think an issue should be raised.
As for your use case of "i want to group all events within an hour and have an average out of it", how come FLOOR (or CEIL as it works now) does not work for you?

Gian Merlino

unread,
May 28, 2019, 4:21:23 AM5/28/19
to druid...@googlegroups.com
This happens because what CEIL(<timestamp> TO Y) does in Druid is return the (exclusive) upper bound of the time bucket that includes <timestamp>. In particular, because it's exclusive, it will never be equal to the timestamp. If you pass in something that is at the bottom end of a bucket (like you did) then you'll get the start of the next bucket.

The function doesn't seem to be standardized (at least I can't find docs on it as a standard SQL function anywhere) so I guess it can behave however we want. IMO the current behavior is a bit weird, since it's non-idempotent. So it would make more sense to special case it so it returns the provided timestamp if that timestamp is already the upper bound of a time bucket.


Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/b492d6cd-f8fb-4faf-9f18-a658e4f67ad5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vindhya g

unread,
Jun 3, 2019, 1:25:27 AM6/3/19
to Druid User
Thanks Gian for helping and creating the issue . This is helpful 


On Tuesday, 28 May 2019 08:21:23 UTC, Gian Merlino wrote:
This happens because what CEIL(<timestamp> TO Y) does in Druid is return the (exclusive) upper bound of the time bucket that includes <timestamp>. In particular, because it's exclusive, it will never be equal to the timestamp. If you pass in something that is at the bottom end of a bucket (like you did) then you'll get the start of the next bucket.

The function doesn't seem to be standardized (at least I can't find docs on it as a standard SQL function anywhere) so I guess it can behave however we want. IMO the current behavior is a bit weird, since it's non-idempotent. So it would make more sense to special case it so it returns the provided timestamp if that timestamp is already the upper bound of a time bucket.


Gian

On Mon, May 27, 2019 at 11:55 PM Vadim Ogievetsky <va...@imply.io> wrote:
I can confirm that I am seeing what you are seeing, this is indeed very strange way for CEIL to act (not what I would expect). I think an issue should be raised.
As for your use case of "i want to group all events within an hour and have an average out of it", how come FLOOR (or CEIL as it works now) does not work for you?

On Saturday, May 25, 2019 at 11:20:16 PM UTC-7, vindhya g wrote:
Hi Experts,
  I have a use case where i want to group all events within an hour and have an average out of it.
So for ex events from 01 to 02 hour (including event at exact  02) to be included in the average but what i see is the event at 02 is considered at 03 instead of 02 if ceil function is used. 
I see ceil function behaviour in sql is different.
Even as per the definition here http://druid.io/docs/latest/misc/math-expr.html
it should return integer greater than or equal but it doesnt seem to return equal .
Is this as expected ?
Is there a function that adhere to my usecase that i can use?

Thanks in advance

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.

vindhya g

unread,
Jun 3, 2019, 1:28:45 AM6/3/19
to Druid User
Hi Vadim.
  Thanks for checking this. So the way we show the data in ui is all the events that come from lets say 9.00 to 10.00 is shown on the chart(bar graph)  at 10.00 . so we have to do groupby ceil(__time) but problem is lot of events come at dot 10.00 which gets ceiled to 11 and that leads lot of discrepency on the data we show for time 10.00.
Hope my explanation about the use case is clear 
Reply all
Reply to author
Forward
0 new messages