how do I round up in query granularity?


Marco Villalobos

Feb 22, 2022, 11:52:25 PM
to druid...@googlegroups.com
I am ingesting data from Kafka.

In my granularity spec, I wrote:

"granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "fifteen_minute",
        "rollup": true
}

I noticed that the __time column gets rounded down instead of up; however, our business logic requires us to round it up.

Is there a way to configure this sort of behavior or achieve this sort of behavior?

Peter Marshall

Feb 23, 2022, 4:51:26 AM
to Druid User
The `queryGranularity` setting allows for a number of types of supported truncation:
https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html#granularityspec

I don't believe there's a "round up" option per se, as this is not rounding but truncation, if that makes sense.

I wonder if you might be able to do a calculation on your timestamp using a transform, either to create a new dimension that is rounded up, or even to replace the time value completely with what you need. Maybe timestampCeil?
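A rough sketch of that first idea (untested, and the dimension name `rounded_time` is made up here), using Druid's `timestamp_ceil` expression function in the ingestion spec:

```json
"transformSpec": {
    "transforms": [
        { "type": "expression", "name": "rounded_time", "expression": "timestamp_ceil(__time, 'PT15M')" }
    ]
}
```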

Marco Villalobos

Feb 23, 2022, 8:12:17 AM
to druid...@googlegroups.com
Hi Peter, 

Again, thank you. That seems like a good idea; however, I am also using a stringLast aggregator, and I think a timestamp transformation would destroy its correctness.

Perhaps there is a way to get the stringLast aggregator to use the untransformed timestamp? I hope that a custom JavaScript aggregator is not the answer.



Mark Herrera

Feb 23, 2022, 4:33:47 PM
to Druid User
Hi Marco,

In regard to your question about using the stringLast aggregator on untransformed timestamps, do you mean that there would be no rollup? I ask because the First / Last aggregators "cannot be used in ingestion spec, and should only be specified as part of queries."

Also, for the sake of my own clarity as I read your questions, does your business logic require two timestamps? An untransformed __time and something else?

Best,

Mark

Marco Villalobos

Feb 23, 2022, 7:48:50 PM
to druid...@googlegroups.com
The documentation says:

(Double/Float/Long) First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.

I want to use stringLast. That will work, right?


Marco Villalobos

Feb 23, 2022, 7:52:59 PM
to druid...@googlegroups.com
My business logic does not require two timestamps.

I just need the timestamp to round up instead of truncate, and I do need to also calculate the value with the largest timestamp in that interval (stringLast).


Ben Krug

Feb 24, 2022, 12:06:23 PM
to druid...@googlegroups.com
stringLast should take the string with the latest timestamp before rollup, i.e., before truncation of the timestamp. Is that not what you're seeing?
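For reference, a stringLast metric at ingestion time would look roughly like this (the metric and field names here are made up, and maxStringBytes is optional):

```json
"metricsSpec": [
    { "type": "stringLast", "name": "last_value", "fieldName": "value", "maxStringBytes": 1024 }
]
```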

Marco Villalobos

Feb 24, 2022, 1:49:11 PM
to druid...@googlegroups.com
Hi Ben,

I'll recap this thread first before I answer.

Pre-aggregation in Druid truncates the timestamp; however, my business logic requires rounding up.
Peter suggested that I transform the incoming timestamp and round it up during ingestion with a "transform spec".
My concern now is that my aggregation also requires the stringLast function, and a transform spec on the timestamp may break the correctness of stringLast.


I still need to experiment and see what happens.

But ultimately, my question now is: if I use a transform spec on the __time column, is it still possible for stringLast to use the original timestamp in its calculation?


Ben Krug

Feb 24, 2022, 2:07:45 PM
to druid...@googlegroups.com
Thank you, that clarifies it for me.  If you transform __time to __time+yourInterval, then it might truncate down
to the "rounded up" value, and stringLast should hopefully come out the same as before.

Marco Villalobos

Mar 1, 2022, 6:05:59 PM
to druid...@googlegroups.com
I am going to add some closure to this discussion.

This is my solution for rounding up in query granularity:

"transformSpec": {
    "transforms": [
        { "type": "expression", "name": "__time", "expression": "timestamp_ceil(__time, 'PT15M') + __time - timestamp_floor(__time,'PT15M')" }
    ]
}
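As a sanity check on the arithmetic, here is a sketch in Python (not Druid itself) that emulates the expression on epoch milliseconds and then applies the floor-style truncation that the 15-minute queryGranularity performs at rollup: the truncated result lands on the ceiling of the original timestamp, and ordering within a bucket is preserved, which is what stringLast depends on.

```python
# Emulate the transform expression
#   timestamp_ceil(__time, 'PT15M') + __time - timestamp_floor(__time, 'PT15M')
# on epoch milliseconds, then apply the 15-minute floor that Druid's
# queryGranularity performs at rollup time.

PERIOD_MS = 15 * 60 * 1000  # PT15M

def ts_floor(t):
    return t - (t % PERIOD_MS)

def ts_ceil(t):
    return t if t % PERIOD_MS == 0 else ts_floor(t) + PERIOD_MS

def transform(t):
    # The ingestion-time transform applied to __time.
    return ts_ceil(t) + t - ts_floor(t)

# A timestamp 7 minutes past a boundary truncates to the *ceiling*:
t = 7 * 60 * 1000
assert ts_floor(transform(t)) == ts_ceil(t)  # the 15-minute mark

# A timestamp already on a boundary is left where it is:
assert ts_floor(transform(0)) == 0

# Ordering within a bucket is preserved, so stringLast still picks
# the row with the latest original timestamp:
t1, t2 = 3 * 60 * 1000, 9 * 60 * 1000
assert transform(t1) < transform(t2)
```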

I learned this technique from this mailing list, and also from the following links:


It seems to be working.

Everybody, thank you for helping.




Peter Marshall

Mar 7, 2022, 11:21:24 AM
to Druid User
ha!  I'm glad you got an answer!  @Ben_Krug yet again being helpful as opposed to me hahaha!

That's an interesting example - thanks for pasting your solution :)