dimensionsSpec regex


Sameer

Apr 23, 2018, 2:31:57 PM
to Druid User
Hi - I am trying to modify the value of one of my dimension columns during ingestion, using a regex in dimensionsSpec as shown below.

{
  "type": "regex",
  "name": "testDim",
  "expr": "(\\w+)",
  "replaceMissingValue": true,
  "replaceMissingValueWith": "1"
}

The testDim dimension has some string value which I want to replace with "1". Is this the right way to do it? Can someone please help me? Maybe the expr that I have specified is not correct.

Thanks

Jonathan Wei

Apr 27, 2018, 8:26:53 PM
to druid...@googlegroups.com
Are you on 0.12.0?

If so, this undocumented PR may be useful for you: https://github.com/druid-io/druid/pull/4890, along with the expression documentation: http://druid.io/docs/latest/misc/math-expr.html

The "transformSpec" goes in the "dataSchema" of your ingestion spec, on the same nesting level as "dataSource" and "parser", e.g.:

```
"transformSpec": {
    "transforms": [
       {
         "type": "expression",
         "name": "eventTime",
         "expression": "timestamp_format(eventTime, yyyy-MM-dd'T'HH:mm:ss.SSSZ, UTC)"
       }
    ]
}
```

Where "expression" is an expression suitable for your use case.

Prior to 0.12.0, there is no mechanism for transforming input values during ingestion.
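
For the original question (replacing every testDim value with "1"), a minimal sketch might look like the fragment below. It assumes that a transform whose "name" matches an input field shadows that field, and it uses a constant string expression (string literals in Druid expressions are single-quoted). The rest of the "dataSchema" is elided, and this has not been tested against 0.12.0.

```
"transformSpec": {
    "transforms": [
       {
         "type": "expression",
         "name": "testDim",
         "expression": "'1'"
       }
    ]
}
```

If only some values should be replaced, the constant would need to be swapped for a conditional expression from the expression documentation linked above.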

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/4f5c8530-e153-44b6-be25-15342c2154c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

patric...@ipt.ch

Jun 7, 2018, 2:51:27 AM
to Druid User
This looks very interesting to me as well, thanks for posting it.
The PR mentions both "transform" and "filter" functions. Could you give a similar example for a "filter"?

Background: we have large streams that our staging environment cannot fully ingest the way prod can, so we would like to sample the stream and keep only about 1% of all events.



Jonathan Wei

Jun 7, 2018, 1:14:52 PM
to druid...@googlegroups.com
The "filter" in the "transformSpec" takes the same format as the Druid query filters, e.g.:

```
"transformSpec": {
    "filter" : {
      "type": "selector",
      "dimension" : "<dimension>",
      "value" : "<dimension_value>"
    },
    "transforms": [
       {
         "type": "expression",
         "name": "eventTime",
         "expression": "timestamp_format(eventTime, yyyy-MM-dd'T'HH:mm:ss.SSSZ, UTC)"
       }
    ]
}
```
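
For the sampling use case raised earlier in the thread, one possible sketch is a "regex" filter on a field whose trailing characters are roughly uniformly distributed. Here "eventId" is a hypothetical numeric-string field, not something from this thread; under that assumption, keeping only rows whose value ends in "00" would retain roughly 1% of events. This is deterministic bucketing rather than true random sampling, and it is untested.

```
"transformSpec": {
    "filter" : {
      "type": "regex",
      "dimension" : "eventId",
      "pattern" : "00$"
    }
}
```

The sample is only as uniform as the chosen field: a sequential counter or a hash-like ID should work reasonably well, while a field with skewed endings would not.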

ShaharC

Aug 29, 2018, 5:45:07 PM
to Druid User
Hey Jonathan!

Can the transform output field be used as a timestamp within the schema? I'm trying something like:
       
{
   "type" : "expression",
   "name" : "eventTime",
   "expression" : "visitStartTime + div(time,1000)"
}

"timestampSpec": {
    "column": "eventTime",
    "format": "posix"
}