Array create rollup aggregator


Avishek Neupane

Jan 11, 2021, 9:13:18 PM
to Druid User
During ingestion, can we aggregate a string column into an array of strings?

For example, suppose each raw record has an "id" field, with values "1", "2", and "3" respectively. The resulting rolled-up record would then contain ["1","2","3"].

I could not find something that does this documented in

It's mostly for an operational use case: this way we can track which raw records were used to generate each aggregated row.

Peter Marshall

Jan 14, 2021, 10:02:53 AM
to Druid User
ohhh interesting... hmmm...  You mean like:


10:01, beans, chocolate, 9, 1
10:30, beans, chocolate, 4, 2
10:45, eggs, waffles, 4, 3
10:48, eggs, waffles, 6, 4

Would produce:

10:00, beans, chocolate, 13, [1,2]
10:00, eggs, waffles, 10, [3,4]
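
To make the example concrete, here is a minimal sketch in plain Python of the rollup described above. It is not Druid code and does not use any Druid API; the column layout (time, two dimensions, a metric, an id) and the function names are assumptions read off the sample rows.

```python
# Raw rows from the example above: (time, dim1, dim2, metric, id).
# The column meanings are assumed from the sample data.
RAW_ROWS = [
    ("10:01", "beans", "chocolate", 9, 1),
    ("10:30", "beans", "chocolate", 4, 2),
    ("10:45", "eggs", "waffles", 4, 3),
    ("10:48", "eggs", "waffles", 6, 4),
]


def truncate_to_hour(hhmm):
    """Truncate an 'HH:MM' string to the start of its hour."""
    return hhmm.split(":")[0] + ":00"


def rollup(rows):
    """Group by (hour, dim1, dim2); sum the metric and collect the ids."""
    groups = {}
    for time, dim1, dim2, metric, row_id in rows:
        key = (truncate_to_hour(time), dim1, dim2)
        total, ids = groups.get(key, (0, []))
        groups[key] = (total + metric, ids + [row_id])
    return [key + (total, ids) for key, (total, ids) in groups.items()]


for row in rollup(RAW_ROWS):
    print(row)
```

Each output row carries the summed metric plus the list of ids that contributed to it, which is the lineage information the question asks for.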


I suspect you might need to write your own custom aggregator:

You could email the dev-list if you would like to ask other Druid devs for assistance on that front...

- pete

Peter Marshall

Jan 14, 2021, 10:04:07 AM
to Druid User
(That's an interesting data lineage aggregator btw - it might be useful for other people as well... may be worth asking in the Druid Slack channel as well just in case someone has already solved it, or would help you write it :D )

Avishek Neupane

Jan 14, 2021, 2:13:58 PM
to Druid User
The example you provided is exactly the kind of use case we were looking into. A layer on top would be to deduplicate the array as well, so that we keep only distinct elements.

I wonder if the tricky part is when we need to roll up already rolled-up data. For numeric aggregations such as a sum, the base rollup is the same operation as the rollup-of-rollup. For this kind of array rollup, the base rollup (create an array from raw values) is different from the rollup-of-rollup (merge arrays).
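
The difference between the two steps can be sketched as follows. This is plain Python for illustration only, not a Druid aggregator; the names `aggregate` and `combine` and the `distinct` flag are hypothetical.

```python
def aggregate(raw_values):
    """Base rollup: build an array from raw scalar values."""
    return list(raw_values)


def combine(arrays, distinct=False):
    """Rollup-of-rollup: merge arrays that were already rolled up.

    With distinct=True, duplicates are dropped while keeping the order
    in which values were first seen.
    """
    merged = [value for array in arrays for value in array]
    if distinct:
        merged = list(dict.fromkeys(merged))
    return merged


# Two partial rollups built from raw "id" values.
first = aggregate(["1", "2"])
second = aggregate(["2", "3"])

merged_all = combine([first, second])
merged_distinct = combine([first, second], distinct=True)

# For a sum, by contrast, both steps are the same operation.
assert sum([sum([9, 4]), sum([4, 6])]) == sum([9, 4, 4, 6])
```

Note that `aggregate` takes scalars and `combine` takes arrays, so a single function cannot serve both steps the way `sum` does for numeric data.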

I will ask in the Druid Slack channel. Thank you for the pointers, Pete. :)