Is there a way to implement listFiltered dimensionSpec equivalent in SQL

142 views
Skip to first unread message

Dhanashri Pitre

unread,
May 30, 2020, 11:29:46 AM5/30/20
to Druid User
Hi All,

I have a multi value dimension field in druid. 
And I use SQL queries to retrieve the data.

I want to filter the multi-value field after grouping on it.
As per the documentation : https://druid.apache.org/docs/latest/querying/multi-value-dimensions.html

"dimensions": [ { "type": "listFiltered", "delegate": { "type": "default", "dimension": "tags", "outputName": "tags" }, "values": ["t3"] } ]

listFiltered type can be used to filter the multi-value dimensions in native queries.

Can this be achieved in the sql queries?

guli...@quantcast.com

unread,
Aug 4, 2020, 1:54:58 PM8/4/20
to Druid User
I'm asking the same question:
the dimensionSpec are useful, how can we use it in the SQL world?
Thanks!

Rachel Pedreschi

unread,
Aug 4, 2020, 3:55:16 PM8/4/20
to druid...@googlegroups.com

Multi-value strings

Druid's native type system allows strings to potentially have multiple values. These multi-value string dimensions will be reported in SQL as VARCHAR typed, and can be syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special multi-value string functions, which can perform powerful array-aware operations.

Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to the UNNEST functionality available in some other SQL dialects. Refer to the documentation on multi-value string dimensions for additional details.


Based on this, I think you can use a WHERE or HAVING clause using the multi-value-string-functions, but I haven't tried it myself.  

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/6f49f84d-d206-43e5-bd41-1276d9fd6f1an%40googlegroups.com.


--
Rachel Pedreschi
VP Developer Relations and Community
Imply.io

guli...@quantcast.com

unread,
Aug 4, 2020, 4:40:30 PM8/4/20
to Druid User
Hi Rachel,
Thank you for your reply!
In my use case, I'm more focused on the usage of Lookup DimensionSpecs and Extraction DimensionSpec on regular string dimensions.
In my native query, I can put such a json into the Dimension, to treat `US` as `US`, and treat any other countries to be `other` before the aggregation.
So even though I have 200 countries in my dataset, when I group by country using this lookup dimensionSpec, I get only 2 groups in my output: the `US` and the `other`.
{
  "type": "lookup",
  "dimension": "country",
  "outputName": "country",
  "replaceMissingValueWith": "other",
  "retainMissingValue": false,
  "lookup": {
    "type": "map",
    "map": {
      "US": "US"
    },
    "isOneToOne": false
  }
}

and Extraction DimensionSpec is even more flexible than Lookup DimensionSpecs. Are these functions also support in SQL? Or for now, it's a native-only function?
Thanks!

Rachel Pedreschi

unread,
Aug 4, 2020, 5:01:25 PM8/4/20
to druid...@googlegroups.com
How about using a MV_CONTAINS(arr,expr) in a CASE statement.  You can also use lookups in SQL.

https://druid.apache.org/docs/latest/querying/sql.html#string-functions

 I don't have a multivalue data set to try it on, so this is just brainstorming.

Reply all
Reply to author
Forward
0 new messages