Search query slow with filter

John Schmidt

Dec 13, 2016, 5:55:53 AM
to Druid User
Hello!

The following unfiltered search query takes ~2 seconds:

{
  "queryType": "search",
  "dataSource": "DATASOURCE",
  "searchDimensions": [
    "DIMENSION"
  ],
  "query": {
    "type": "insensitive_contains",
    "value": "foo"
  },
  "granularity": "all",
  "intervals": ["2016-11-12T05:00:00+00:00/2016-12-12T13:00:00+00:00"]
}

Now I add this filter:

"filter": {
  "type": "selector",
  "dimension": "DIMENSION2",
  "value": "BAR"
}

Running the same query with this filter takes ~22 seconds.
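
For reference, the two fragments above combine into a single request body with "filter" as a top-level key. The following is a minimal Python sketch that assembles it; the helper name `with_filter` is hypothetical, and the data source and dimension names are the placeholders used above.

```python
import copy
import json

# The unfiltered search query from above (placeholder names kept as-is).
search_query = {
    "queryType": "search",
    "dataSource": "DATASOURCE",
    "searchDimensions": ["DIMENSION"],
    "query": {"type": "insensitive_contains", "value": "foo"},
    "granularity": "all",
    "intervals": ["2016-11-12T05:00:00+00:00/2016-12-12T13:00:00+00:00"],
}

# The selector filter that is added for the slow variant.
selector_filter = {"type": "selector", "dimension": "DIMENSION2", "value": "BAR"}


def with_filter(query, query_filter):
    """Return a copy of `query` with a top-level "filter" key (hypothetical helper)."""
    filtered = copy.deepcopy(query)
    filtered["filter"] = query_filter
    return filtered


filtered_query = with_filter(search_query, selector_filter)
print(json.dumps(filtered_query, indent=2))
```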

Some info:

The cardinality of "DIMENSION" is somewhere between 100,000 and 1,000,000, and that of "DIMENSION2" is 15-20.

We are running Druid 0.9.1. DATASOURCE contains around 41 GB of data covering one month, with hourly rollup and Concise bitmaps. Our timeseries and topN queries do not show the same drastic slowdown when filters are added.

Is this behavior expected? Is there anything we can do to speed up the filtered query? Let me know if there is any other information I can provide to help narrow down the issue.

Best,
John

Gian Merlino

Dec 14, 2016, 12:24:56 PM
to druid...@googlegroups.com
I think the issue here is that a filtered search query uses an index-only approach involving a bitmap intersection for each search dimension value that matches the "query". The idea is this should be faster than scanning through the rows that match the filter and picking up all dimension values for those rows. But if you have a lot of values that match the query, or if the filter is very selective, or both, then this assumption could be wrong. Ideally the search query should use some heuristics to choose between the index-only vs. cursor-based algorithms, or at least provide a context flag to let you choose which one gets used.
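
To make the trade-off concrete, here is a toy Python sketch of the two approaches described above. It is not Druid's implementation: Python sets stand in for the Concise bitmaps, the rows are made up, and the function names `build_index`, `index_only_search`, and `cursor_based_search` are hypothetical. Each function also returns a rough unit of work (intersections performed or rows scanned) so the two costs can be compared.

```python
from collections import defaultdict

# Made-up rows: one searched dimension and one filtered dimension.
rows = [
    {"DIMENSION": "foo-1", "DIMENSION2": "BAR"},
    {"DIMENSION": "Foo-2", "DIMENSION2": "BAZ"},
    {"DIMENSION": "other", "DIMENSION2": "BAR"},
    {"DIMENSION": "foo-1", "DIMENSION2": "BAR"},
]


def build_index(rows, dimension):
    """Map each value of `dimension` to the set of row ids holding it (a stand-in for a bitmap)."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[dimension]].add(row_id)
    return dict(index)


def index_only_search(search_index, filter_bitmap, needle):
    """One set intersection per search-dimension value that matches the needle."""
    hits, intersections = {}, 0
    for value, bitmap in search_index.items():
        if needle.lower() in value.lower():
            intersections += 1
            count = len(bitmap & filter_bitmap)
            if count:
                hits[value] = count
    return hits, intersections


def cursor_based_search(rows, filter_bitmap, needle):
    """Scan only the rows that pass the filter and check each row's dimension value."""
    hits, rows_scanned = {}, 0
    for row_id in sorted(filter_bitmap):
        rows_scanned += 1
        value = rows[row_id]["DIMENSION"]
        if needle.lower() in value.lower():
            hits[value] = hits.get(value, 0) + 1
    return hits, rows_scanned


search_index = build_index(rows, "DIMENSION")
filter_bitmap = build_index(rows, "DIMENSION2").get("BAR", set())

index_hits, intersections = index_only_search(search_index, filter_bitmap, "foo")
cursor_hits, rows_scanned = cursor_based_search(rows, filter_bitmap, "foo")
print(index_hits, intersections)
print(cursor_hits, rows_scanned)
```

Both functions return the same hits; only the work differs. In this model the index-only cost grows with the number of dimension values matching the search string, while the cursor-based cost grows with the number of rows passing the filter. With a search dimension of very high cardinality, the first number can become large.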


Gian


John Schmidt

Dec 15, 2016, 4:14:57 AM
to Druid User
Thank you for your reply, Gian, much appreciated! I will follow the GitHub issue for updates.

Best,
John

