Result of all metrics is 0 after running index_hadoop task

santosh sahoo

Mar 3, 2017, 2:16:01 AM
to Druid User
Hi,

I am trying to run an index_hadoop task on one of my datasources for reindexing. The job succeeds and the segment is created, but when I check the result with a select query, all metric values are 0. Could you please help me with this? Below are the details I used to run the task.

index_hadoop task:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "index_hadoop_test20",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "format": "auto",
            "column": "timestamp"
          },
          "columns": [
            "timestamp",
            "dim1",
            "metric1_sum"
          ],
          "dimensionsSpec": {
            "dimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "name": "metric1_sum",
          "type": "doubleSum",
          "fieldName": "metric1"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-03-01T02:00:00/2017-03-02T02:00:00"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "arpan1",
          "intervals": ["2017-03-01T02:00:00/2017-03-02T02:00:00"]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}

Select query:

 {
   "queryType": "select",
   "dataSource": "index_hadoop_test20",
   "descending": "true",
   "dimensions":[],
   "metrics":[],
   "granularity": "all",
   "intervals": ["2017-01-01/2017-03-31"],
   "pagingSpec":{"pagingIdentifiers": {}, "threshold":300}
 }

Result:


[
  {
    "timestamp": "2017-03-01T07:00:00.000Z",
    "result": {
      "pagingIdentifiers": {
        "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z": -11
      },
      "dimensions": [
        "dim4",
        "dim3",
        "dim2",
        "dim1"
      ],
      "metrics": [
        "metric1_sum"
      ],
      "events": [
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -1,
          "event": {
            "timestamp": "2017-03-01T09:45:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -2,
          "event": {
            "timestamp": "2017-03-01T09:30:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -3,
          "event": {
            "timestamp": "2017-03-01T09:15:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -4,
          "event": {
            "timestamp": "2017-03-01T09:00:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -5,
          "event": {
            "timestamp": "2017-03-01T08:45:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -6,
          "event": {
            "timestamp": "2017-03-01T08:30:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -7,
          "event": {
            "timestamp": "2017-03-01T08:00:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -8,
          "event": {
            "timestamp": "2017-03-01T07:45:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -9,
          "event": {
            "timestamp": "2017-03-01T07:30:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -10,
          "event": {
            "timestamp": "2017-03-01T07:15:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        },
        {
          "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
          "offset": -11,
          "event": {
            "timestamp": "2017-03-01T07:00:00.000Z",
            "dim4": "dim4",
            "dim3": "dim3",
            "dim2": "dim2",
            "dim1": "dim1",
            "metric1_sum": 0
          }
        }
      ]
    }
  }
]

Could you please suggest a solution?

Gian Merlino

Mar 6, 2017, 4:35:59 AM
to druid...@googlegroups.com
Hey Santosh,

What version of Druid is this and what did the ingestion spec for the original load (not reindexing) look like?

Gian


santosh sahoo

Mar 6, 2017, 4:56:49 AM
to Druid User
Hi Gian,

I have tested it on druid-0.9.2, druid-0.10.0-SNAPSHOT, and druid-0.10.0-rc1. Below is the original ingestion spec I used to load the data.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "arpan1",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "ddMMyyyyHHmmss"
        },
        "dimensionsSpec": {
          "dimensions": [
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "metric1",
        "fieldName": "metric1",
        "type": "doubleSum"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup": false
    }
  },
  "ioConfig": {
    "topic": "testdemo",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    }
  }
}


Thanks,
Santosh


Gian Merlino

Mar 6, 2017, 5:18:58 AM
to druid...@googlegroups.com
Does it work if you add metric1 to the dataSource ingestionSpec metrics list when you do reindexing? Like this:

"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
  "type":"dataSource",
     "ingestionSpec": {
        "dataSource":"arpan1",
        "intervals" : ["2017-03-01T02:00:00/2017-03-02T02:00:00"],
        "metrics" : ["metric1"]
      }
   }
 }

Gian

santosh sahoo

Mar 6, 2017, 5:42:49 AM
to Druid User
Hi Gian,

I made the modification to the ingestionSpec as per your suggestion, but got the same result.

Please have a look again.

Thanks in advance.


Gian Merlino

Mar 6, 2017, 6:40:30 AM
to druid...@googlegroups.com
Ah, what's going on is that the Hadoop reindexing mechanism is being sneaky. It doesn't apply your metricsSpec to the segments as-is; it applies the aggregators in their "combining" form. This is nice, I guess, since it lets you use the same metricsSpec while reindexing as you would on your raw data. But it also means you can't use the metricsSpec to define new aggregators. That would be useful, but it would be a new feature. The docs could also use some clarification.
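
Concretely, if I remember right, the combining form of a doubleSum reads its input from the column named after the aggregator itself rather than from "fieldName". So your reindexing aggregator goes looking for a column called "metric1_sum", which doesn't exist in the arpan1 segments, and it sums nothing. As a workaround you could try keeping the output name the same as the existing metric, with a metricsSpec along these lines (untested sketch, names taken from your specs above):

"metricsSpec": [
  {
    "name": "metric1",
    "type": "doubleSum",
    "fieldName": "metric1"
  }
]

If you want the column to appear as metric1_sum, you could give the aggregator that output name at query time instead.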

Gian


Vadim

Jan 19, 2018, 1:55:32 AM
to Druid User
Hi Gian, I was wondering how you managed to solve this issue? I am facing the same problem right now: https://groups.google.com/d/msg/druid-user/4I9lUreb60k/JdWIyE1xAAAJ

On Monday, March 6, 2017 at 13:40:30 UTC+2, Gian Merlino wrote:

Gian
