How to use "filtered dimension spec" in Druid SQL?

111 views
Skip to first unread message

Jason G

unread,
Apr 22, 2021, 11:51:08 AM4/22/21
to Druid User
I am facing this problem articulated here: https://druid.apache.org/docs//0.15.1-incubating/querying/multi-value-dimensions.html

"You might be surprised to see inclusion of "t1", "t2", "t4" and "t5" in the results. It happens because query filter is applied on the row before explosion. For multi-value dimensions, selector filter for "t3" would match row1 and row2, after which exploding is done. For multi-value dimensions, query filter matches a row if any individual value inside the multiple values matches the query filter."

The recommended solution is to use the "filtered dimension spec":

To solve the problem above and to get only rows for "t3" returned, you would have to use a "filtered dimension spec" as in the query below.

How can I do this using Druid SQL?  

Thanks

Ben Krug

unread,
Apr 22, 2021, 12:01:06 PM4/22/21
to druid...@googlegroups.com
Hi Jason -

Can you share a sample query that returns the other values (not selected for)?  I tried a few queries with GROUP BY and an mvd, and druid sql seemed to do the filtering for me.  (I notice the doc page you referenced mentioned using HAVING, maybe that's the key.)

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/20577646-269b-4d2a-90e5-0c18fcb5e557n%40googlegroups.com.

Jason G

unread,
Apr 22, 2021, 12:44:34 PM4/22/21
to Druid User
I was able to get the query working in druid-native.  I don't know how to do the same in druid-sql.  Any help to get this working in druid-sql would be appreciated.

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "ciJournals"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  },
  "filter": {
    "type": "in",
    "dimension": "rx",
    "values": [
      "Niraparib",
      "GSK-3985771",
      "JNJ-64091742",
      "MK-4827",
      "niraparib hydrochloride",
      "Zejula",
      "ZL 2306",
      "C19H20N4O",
      "2-{4-[(3S)-piperidin-3-yl]phenyl}-2H-indazole-7-carboxamide",
      "Rucaparib",
      "AG-014699",
      "AG-14699",
      "CO-338",
      "PF-01367338",
      "PF-1367338",
      "Rubraca",
      "Rucaparib phosphate",
      "AG-14361",
      "AG-14447",
      "HYDAMTIQ",
      "Rucaparib - Pfizer",
      "TBI-361",
      "C19H18FN3O",
      "6-fluoro-2-{4-[(methylamino)methyl]phenyl}-3,10-diazatricyclo[6.4.1.0^{4,13}]trideca-1,4,6,8(13)-tetraen-9-one",
      "Olaparib",
      "AZD 221",
      "AZD-2281",
      "KU-0059436",
      "KU-59436",
      "Lynparza",
      "MK 7339",
      "C24H23FN4O3",
      "4-(3-{[4-(cyclopropylcarbonyl)piperazin-1-yl]carbonyl}-4-fluorobenzyl)phthalazin-1(2H)-one"
    ]
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "listFiltered",
      "delegate": {
        "type": "default",
        "dimension": "rx",
        "outputName": "Drug"
      },
      "values": [
        "Niraparib",
        "GSK-3985771",
        "JNJ-64091742",
        "MK-4827",
        "niraparib hydrochloride",
        "Zejula",
        "ZL 2306",
        "C19H20N4O",
        "2-{4-[(3S)-piperidin-3-yl]phenyl}-2H-indazole-7-carboxamide",
        "Rucaparib",
        "AG-014699",
        "AG-14699",
        "CO-338",
        "PF-01367338",
        "PF-1367338",
        "Rubraca",
        "Rucaparib phosphate",
        "AG-14361",
        "AG-14447",
        "HYDAMTIQ",
        "Rucaparib - Pfizer",
        "TBI-361",
        "C19H18FN3O",
        "6-fluoro-2-{4-[(methylamino)methyl]phenyl}-3,10-diazatricyclo[6.4.1.0^{4,13}]trideca-1,4,6,8(13)-tetraen-9-one",
        "Olaparib",
        "AZD 221",
        "AZD-2281",
        "KU-0059436",
        "KU-59436",
        "Lynparza",
        "MK 7339",
        "C24H23FN4O3",
        "4-(3-{[4-(cyclopropylcarbonyl)piperazin-1-yl]carbonyl}-4-fluorobenzyl)phthalazin-1(2H)-one"
      ]
    }
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "Count"
    }
  ],
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "Count",
        "direction": "descending",
        "dimensionOrder": {
          "type": "numeric"
        }
      }
    ],
  },
}

Itai Yaffe

unread,
Apr 22, 2021, 1:04:41 PM4/22/21
to druid...@googlegroups.com
Hey Jason,
AFAIK, unfortunately, listFiltered is not supported in Druid SQL, only in native queries (though I'd love to be proven wrong here 😉).

Itai

Jason G

unread,
Apr 22, 2021, 1:54:53 PM4/22/21
to Druid User
OK, thanks!

Ben Krug

unread,
Apr 22, 2021, 2:55:36 PM4/22/21
to druid...@googlegroups.com
I see now, that's frustrating.  I tried WHERE, HAVING, using a subquery and selecting from it using WHERE, etc, no luck.
The only way I found is to kind of unfold the values first, and you have to hardcode for each value.  In your case, you'd end up with a 33-way UNION query, and long unions are slow.  But, if you want to try, this worked for my dataset, with an mvd called 'd6_mvd' and a couple of values from a couple of different mvds.  (I used ordinal instead of offset, because otherwise rows that didn't have the value came back as null, and I couldn't filter them out from rows that did have the value at offset 0 - unless you set useDefaultValueForNull, maybe.)

select array_ordinal(d6_mvd,array_ordinal_of(d6_mvd,'beca')) myVal, count(*)
from dataTest
where array_ordinal_of(d6_mvd, 'beca') > 0
group by 1
UNION ALL
select array_ordinal(d6_mvd,array_ordinal_of(d6_mvd,'delo')) myVal, count(*)
from dataTest
where array_ordinal_of(d6_mvd, 'delo') > 0
group by 1

(I'm selecting counts where the mvd contains 'beca' or 'delo', and excluding counts for the other values in those same mvds from the output.  Pretty ugly, though.)

Reply all
Reply to author
Forward
0 new messages