New Datasource from old one: Convert dimension to metric


Lax T

Oct 14, 2021, 10:23:38 PM
to Druid User
Hi,
I am creating a rolled-up datasource from an existing raw-data datasource. I want to convert a dimension column in the raw datasource into a metric column in the new one.

I am using native batch ingestion with the "druid" input source for this, and the metric column comes out as 0. Please let me know if this is possible.

The task json is pasted below. The column "user_count" is a dimension in the raw datasource.

Thanks
Lax


{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "Eg8PrDataRollup1Hr",
      "timestampSpec": {
        "column": "__time",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          "org_id",
          "city"
        ]
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "name": "sum_user_count",
          "type": "floatSum",
          "fieldName": "user_count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "HOUR",
        "rollup": true,
        "intervals": null
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "RawUserData",
        "interval": "2021-10-01T00:00:00.000Z/2021-10-02T00:00:00.000Z",
        "filter": null,
        "dimensions": null,
        "metrics": null
      },
      "inputFormat": null,
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": null,
      "maxRowsInMemory": 1000000,
      "maxBytesInMemory": 0,
      "maxTotalRows": null,
      "numShards": null,
      "splitHintSpec": null,
      "partitionsSpec": {
        "type": "dynamic"
      },
      "indexSpec": {
        "bitmap": {
          "type": "roaring",
          "compressRunOnSerialization": true
        },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs",
        "segmentLoader": null
      },
      "indexSpecForIntermediatePersists": {
        "bitmap": {
          "type": "roaring",
          "compressRunOnSerialization": true
        },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs",
        "segmentLoader": null
      },
      "maxPendingPersists": 0,
      "forceGuaranteedRollup": false,
      "reportParseExceptions": false,
      "pushTimeout": 0,
      "segmentWriteOutMediumFactory": null,
      "maxNumConcurrentSubTasks": 1,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5,
      "maxNumSegmentsToMerge": 100,
      "totalNumMergeTasks": 10,
      "logParseExceptions": false,
      "maxParseExceptions": 2147483647,
      "maxSavedParseExceptions": 0,
      "partitionDimensions": [],
      "buildV9Directly": true
    }
  }
}

Ben Krug

Oct 15, 2021, 2:36:49 AM
to druid...@googlegroups.com
You might need to specify the dimension in the inputSource of the ioConfig. If it isn't listed in "dimensions", it might not get read. Just set "dimensions" to ["user_count"], or list all three, but I think you only need the one. Does that make a difference?
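For illustration, a sketch of that suggestion applied to the inputSource block of the posted spec, assuming every other field stays exactly as posted (only "dimensions" changes here):

```json
"inputSource": {
  "type": "druid",
  "dataSource": "RawUserData",
  "interval": "2021-10-01T00:00:00.000Z/2021-10-02T00:00:00.000Z",
  "filter": null,
  "dimensions": ["user_count"],
  "metrics": null
}
```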

Lax T

Oct 15, 2021, 4:15:17 PM
to Druid User
Thanks Ben. That helped, but now I see "user_count" as a dimension as well as "sum_user_count" as a metric column. I want to keep "sum_user_count" but get rid of the "user_count" dimension column.

Thanks a lot.

Lax T

Oct 15, 2021, 6:15:22 PM
to Druid User
Yes, I found the fix. I had to specify "user_count" under "metrics" in the inputSource of the ioConfig.
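A sketch of what that inputSource block might look like, assuming the rest of the original spec is unchanged. The final spec was not posted, so "dimensions" is simply left at the original null here; the only change is listing "user_count" under "metrics" so that the floatSum aggregator "sum_user_count" (fieldName "user_count") has a value to read:

```json
"inputSource": {
  "type": "druid",
  "dataSource": "RawUserData",
  "interval": "2021-10-01T00:00:00.000Z/2021-10-02T00:00:00.000Z",
  "filter": null,
  "dimensions": null,
  "metrics": ["user_count"]
}
```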

Ben Krug

Oct 16, 2021, 3:38:21 PM
to druid...@googlegroups.com
Excellent! I'm glad you got it, and thanks for letting us know. I'd used the dimensions entry before, but not the metrics entry.