How to apply a date range filter during ingestion


Jason G

May 20, 2021, 3:30:08 PM
to Druid User
I'm trying this filter, but it isn't working properly: it returns fewer rows than I expect.

Ingestion spec:

"transformSpec": {
  "filter": {
    "type": "interval",
    "dimension": "__time",
    "intervals": [
      "2015-01-01/2100-01-01"
    ]
  }
}

Basically I want to get all records where __time is greater than 2015-01-01.
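For reference, Druid interval strings such as `2015-01-01/2100-01-01` are ISO-8601 intervals that include the start instant and exclude the end instant. The sketch below mimics that membership test in Python as an illustration only, not Druid's own code. It assumes UTC (the timezone Druid servers are normally run in), handles only the plain `start/end` form (not period forms such as `P1D`), and the helper names `parse_instant` and `in_interval` are hypothetical.

```python
from datetime import datetime, timezone


def parse_instant(text: str) -> datetime:
    """Parse an ISO-8601 date or datetime; values without an offset are treated as UTC."""
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z", so rewrite it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def in_interval(ts: datetime, interval: str) -> bool:
    """Return True if ts lies in the half-open interval [start, end)."""
    start_text, end_text = interval.split("/")
    return parse_instant(start_text) <= ts < parse_instant(end_text)


short_form = "2015-01-01/2100-01-01"
long_form = "2015-01-01T00:00:00.000Z/2100-01-01T00:00:00.000Z"

for value in ["2014-12-31T23:59:59Z", "2015-01-01T00:00:00Z", "2021-05-20T12:00:00Z"]:
    ts = parse_instant(value)
    print(value, in_interval(ts, short_form), in_interval(ts, long_form))
```

Under these assumptions the short form and the fully specified `T00:00:00.000Z` form select the same rows, and a record stamped exactly 2015-01-01T00:00:00Z is kept, so the interval means "on or after 2015-01-01" rather than strictly "greater than".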

Thanks

Ben Krug

May 20, 2021, 3:41:38 PM
to druid...@googlegroups.com
I haven't seen a transformSpec used like this. Does it do anything? Are you trying to do something with particular rows while keeping all the rows?

Normally the interval goes in the ioConfig, e.g. (from a reindex job):

{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "dataTest",
        "interval": "2020-10-03T00:00/2020-11-03T11:00"
      }
    },
...
}



Jason G

May 20, 2021, 5:49:13 PM
to Druid User
I am loading data from a local JSON file. The file contains some records I want to exclude (filter out) based on a date field. I only want records later than 2015-01-01.

I tried what you suggested, but it does not work.
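If filtering outside Druid turns out to be acceptable, a pre-filter can be small. This is a minimal sketch, assuming the input is newline-delimited JSON (one object per line) and that the date lives in a field hypothetically named `date` holding ISO-8601 strings; timestamps without an offset are treated as UTC. `filter_lines` and `filter_file` are illustrative names, not part of any Druid tooling.

```python
import json
from datetime import datetime, timezone

CUTOFF = datetime(2015, 1, 1, tzinfo=timezone.utc)
DATE_FIELD = "date"  # hypothetical field name; replace with the file's real date field


def parse_date(text: str) -> datetime:
    """Parse an ISO-8601 string; values without an offset are assumed to be UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def filter_lines(lines, cutoff=CUTOFF, field=DATE_FIELD):
    """Yield each non-blank JSON line whose date field is on or after cutoff."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if parse_date(record[field]) >= cutoff:
            yield line  # pass the original line through unchanged


def filter_file(src_path: str, dst_path: str) -> None:
    """Copy only the kept lines from src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_lines(src):
            dst.write(line + "\n")


if __name__ == "__main__":
    sample = [
        '{"date": "2014-06-01T00:00:00Z", "value": 1}',
        '{"date": "2015-01-01T00:00:00Z", "value": 2}',
        '{"date": "2020-03-15", "value": 3}',
    ]
    for kept in filter_lines(sample):
        print(kept)
```

Calling `filter_file("input.json", "filtered.json")` (hypothetical paths) writes only the kept lines to a new file, which can then be ingested as usual. Use `>` instead of `>=` if records stamped exactly at the cutoff should also be dropped.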

Rachel Pedreschi

May 20, 2021, 6:19:31 PM
to druid...@googlegroups.com
I think you may need to specify the time as well.


"type" : "interval", "dimension" : "__time", "intervals" : [ "2015-01-01T00:00:00.000Z/2100-01-01T00:00:00.000Z" ]



--
Rachel Pedreschi
VP Developer Relations and Community
Imply.io

Jason G

May 20, 2021, 8:31:20 PM
to Druid User
Still does not work. Here's my latest attempt; it is still filtering out too many records. Apparently I'm going to need to filter the data before loading it into Druid.

"transformSpec": {
  "filter": {
    "type": "bound",
    "dimension": "__time",
    "lower": "1420088400000",
    "lowerStrict": false,
    "upper": "1621570538000",
    "upperStrict": true,
    "ordering": "numeric"
  }
}
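One thing worth checking in this spec is what the numeric bounds actually are, since `__time` is stored as milliseconds since the Unix epoch, in UTC. Converting them with Python's standard library shows that `1420088400000` is 2015-01-01T05:00:00Z, which is midnight in a UTC-5 zone such as US Eastern Standard Time, not midnight UTC (`1420070400000`). On its own that would only drop records from the first five hours of 2015, so it probably does not explain the whole shortfall, but it suggests the bounds were computed in local time. The helper names `ms_to_utc` and `utc_to_ms` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds (the unit of Druid's __time) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=epoch_ms)


def utc_to_ms(dt: datetime) -> int:
    """Convert an aware datetime to whole epoch milliseconds, avoiding float rounding."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


lower, upper = 1420088400000, 1621570538000
print(ms_to_utc(lower).isoformat())  # 2015-01-01T05:00:00+00:00
print(ms_to_utc(upper).isoformat())  # 2021-05-21T04:15:38+00:00
print(utc_to_ms(datetime(2015, 1, 1, tzinfo=timezone.utc)))  # 1420070400000
```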


Nishant Bangarwa

May 21, 2021, 3:01:38 AM
to druid...@googlegroups.com
Hi Jason,
You will need to specify the intervals in your granularitySpec, as below:
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "HOUR",
  "rollup": true,
  "intervals": [
    "2020-01-01/2021-01-01"
  ]
}


Batch ingestion tasks will throw away any records with timestamps outside of the specified intervals.
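Applied to the original goal, this means putting the desired range into `granularitySpec.intervals`. Here is a small sketch of doing that programmatically, assuming a native batch (`index_parallel`) task spec in which `granularitySpec` sits under `spec.dataSchema`; the function name `set_ingestion_interval` and the `example` datasource are hypothetical. The `2100-01-01` end simply mirrors the original filter; an end date closer to the data's actual range may be preferable.

```python
import json


def set_ingestion_interval(task: dict, interval: str) -> dict:
    """Set spec.dataSchema.granularitySpec.intervals on a task spec (modifies and returns it)."""
    granularity_spec = task["spec"]["dataSchema"].setdefault("granularitySpec", {})
    granularity_spec["intervals"] = [interval]
    return task


if __name__ == "__main__":
    task = {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": "example",  # hypothetical datasource name
                "granularitySpec": {"type": "uniform", "segmentGranularity": "DAY"},
            },
        },
    }
    set_ingestion_interval(task, "2015-01-01/2100-01-01")
    print(json.dumps(task["spec"]["dataSchema"]["granularitySpec"], indent=2))
```

With the interval in place, rows whose timestamps fall before 2015-01-01 are discarded by the task itself, so the transformSpec filter is no longer needed for this purpose.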
