MIN and MAX functions on Druid string type dimension columns

35 views
Skip to first unread message

Manu T M

unread,
Dec 3, 2019, 12:49:07 AM12/3/19
to Druid User
Hi,

I am using Druid as a backend to our Tableau visualization layer using Avatica JDBC driver. Live queries are generated in Tableau and sent to Druid via Avatica. Unfortunately, there are few queries that apply MAX and MIN aggregation functions on string type dimension columns while using GROUP BY. Is there a way to make this work? Can I use lookups where I can keep a numeric type as key and apply MAX or MIN on that?

Thanks!
Manu

Manu T M

unread,
Dec 5, 2019, 7:12:24 AM12/5/19
to Druid User
Hi,

Checking again. Does anyone have any idea on how to it? Help will be much appreciated

Thanks!
Manu

Jad Naous

unread,
Dec 5, 2019, 10:59:43 PM12/5/19
to Druid User
Can you ingest the same column as a string and as a numeric type?

Lakshminarayana Chari

unread,
Dec 7, 2019, 10:15:01 AM12/7/19
to druid...@googlegroups.com
Hi Man,

Tableau supports pass through functions. You can use them to do specific computations.
ex:
RAWSQLAGG_STR(“DB_SPECIFIC_CONCAT(%1,%2)”, [arg1], [arg2])

For reference: 

If this does not work, please elaborate your use case.

Regards, Chari.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/c4a88651-88c3-4919-b08e-332a96515a9e%40googlegroups.com.

Manu T M

unread,
Dec 7, 2019, 11:35:38 AM12/7/19
to druid...@googlegroups.com
Thanks, Chari. I will try to explain my use case in details

Assume my table has below columns
col1 int (dimension)
col2 string (dimension)
col3 string (dimension)
col4 long (metric)

If I run a GROUP BY expression like below, I am getting a failure from Druid
SELECT col1, MAX(col2), MAX(col3), SUM(col4) GROUP BY col1

As I understand, this is because Druid does not support aggregate functions on string type

Thanks!
Manu

Lakshminarayana Chari

unread,
Dec 8, 2019, 8:15:04 AM12/8/19
to druid...@googlegroups.com
Hi Manu,

You are right, max/min is not supported on non-numeric.

Regards, Chari.

Tijo Thomas

unread,
Dec 8, 2019, 11:13:35 PM12/8/19
to druid...@googlegroups.com
Hi Manu, 

Have u tried casting to integer and then take max? 

Thanks



--
Tijo Thomas
Solutions Architect  | => Imply , Bangalore , India  

Manu T M

unread,
Dec 9, 2019, 12:19:12 AM12/9/19
to druid...@googlegroups.com
Hi Tijo,

Thanks for your response. Yes, I have tried casting the string column to int and taking a max. The query is running without any errors but the max value returned is 0 for all columns.

Thanks!
Manu

Tijo Thomas

unread,
Dec 9, 2019, 12:43:39 AM12/9/19
to druid...@googlegroups.com
Is it possible to attach ur ingestion spec, some sample data and query you used.  I hope performance is not a big deal for you. I think you cant get the same performance as the value is a measure.

Manu T M

unread,
Dec 9, 2019, 12:58:15 AM12/9/19
to Druid User
Sure. Attaching the ingestion spec and sample data. This is synthetic data that we are using for testing now. The query I have tried are listed below, which basically tries to run aggregation on string column "analysis_period"

Query 1: Failure while executing
select company,MAX(analysis_period) AS a_period from druid_test group by company;

Query 2: Query successful but gives 0 as the output of MAX 
select company,MAX(CAST(analysis_period AS INTEGER)) AS a_period from druid_test group by company;

To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.


--
Tijo Thomas
Solutions Architect  | => Imply , Bangalore , India  

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid...@googlegroups.com.
druid_parquet_ingest_spec
query-731cd4a4-2c97-4dea-bacc-4f4fef99e063.csv

Zisis F

unread,
Sep 27, 2021, 11:14:22 AM9/27/21
to Druid User
A bit late here but If your issue is to keep a single value for a string dimension or even better the first or last value of that string dimension regarding __time column when making a group by aggregation, I think the proper way to do it is to use LATEST and EARLIEST functions for strings. You can find them here.
Reply all
Reply to author
Forward
0 new messages