Metrics Ingestion


Bryan Baugher

Aug 20, 2014, 11:56:51 PM
to druid-de...@googlegroups.com
Hi everyone,

I had a couple of questions about how collisions are handled and defined when ingesting data in real time. The docs seem to suggest that rows collide if all of their dimensions are the same within your granularity. If one of your dimensions is a numeric value that is unlikely ever to repeat (think of a metric timing), the data would never collide, correct? So let's say I am tracking metrics about my application and my JSON data looks something like this:

{
  "timestamp": "2013-09-04T21:44:00.000Z",
  "host": "myhost.com",
  "metric": "get_request",
  "value": "132"
}

Is it possible to tell ingestion to ignore value when deciding whether rows collide (so rows are very likely to collide) but still aggregate value (sum, and probably count too), so that I can effectively keep track of a rolling average of my metric? If this is possible, has anyone looked into creating a percentile aggregator (i.e. 95% of all values are < X)?

Gian Merlino

Aug 21, 2014, 12:23:46 AM
to druid-de...@googlegroups.com
Yes, that's possible. The idea behind indexing is that dimensions are things you want to be able to group by and drill down on, and metrics are things you want to aggregate. So in your data you can index "host" and "metric" as dimensions but "value" as a metric. The usual way to do an average is to create a "value" metric using the longSum or doubleSum aggregator and a "count" metric using the count aggregator. Then, at query time, use a division postAggregator to divide the longSum/doubleSum of "value" by the longSum of "count". With this approach you can query the average for any interval you like, including a rolling average over whatever period you like (as long as it is aligned on your granularity).
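To make the arithmetic concrete, here is a minimal Python sketch of the sum-divided-by-count idea. It does not use Druid at all: the `Row` type, the field names `value_sum` and `count`, and the sample numbers are made up for illustration, and the rows stand in for what rolled-up data might look like at minute granularity.

```python
from collections import namedtuple

# Hypothetical rolled-up rows: one per (time bucket, host, metric), each
# carrying a summed "value" metric and a "count" metric.
Row = namedtuple("Row", ["bucket", "host", "metric", "value_sum", "count"])

rows = [
    Row("2013-09-04T21:44", "myhost.com", "get_request", 400, 3),
    Row("2013-09-04T21:45", "myhost.com", "get_request", 250, 2),
    Row("2013-09-04T21:46", "myhost.com", "get_request", 90, 1),
]

def average(rows, start, end):
    """Average over buckets in [start, end): sum of sums / sum of counts."""
    # ISO-8601 strings in the same format compare correctly as plain strings.
    selected = [r for r in rows if start <= r.bucket < end]
    total = sum(r.value_sum for r in selected)
    n = sum(r.count for r in selected)
    return total / n if n else None

# Average over the first two minutes, then over all three.
avg_two_minutes = average(rows, "2013-09-04T21:44", "2013-09-04T21:46")
avg_all = average(rows, "2013-09-04T21:44", "2013-09-04T21:47")
```

Because sums and counts both add up across buckets, the same stored rows can answer the question for any interval that lines up with the bucket boundaries, which is why the interval has to be aligned on the granularity.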

There is also an approximate histogram aggregator that can be used to compute percentiles. It's a bit more involved to configure, so I would suggest starting out with averages and then working in percentiles if you see value there.
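As a rough illustration of why a histogram helps here, the Python sketch below estimates percentiles from fixed-width bucket counts. This is not the algorithm behind Druid's approximate histogram aggregator; the function names, the bucket width of 50, and the sample values are all assumptions chosen for the example. The point is only that histograms, like sums and counts, can be merged across rows and then queried.

```python
def to_histogram(values, bucket_width=50):
    """Count values per fixed-width bucket; bucket b covers [b*w, (b+1)*w)."""
    hist = {}
    for v in values:
        b = int(v // bucket_width)
        hist[b] = hist.get(b, 0) + 1
    return hist

def merge(h1, h2):
    """Combine two histograms by adding their bucket counts."""
    merged = dict(h1)
    for b, c in h2.items():
        merged[b] = merged.get(b, 0) + c
    return merged

def percentile_upper_bound(hist, p, bucket_width=50):
    """Upper edge of the bucket where the cumulative count reaches fraction p."""
    total = sum(hist.values())
    if total == 0:
        return None
    seen = 0
    for b in sorted(hist):
        seen += hist[b]
        if seen >= p * total:
            return (b + 1) * bucket_width
    return None

# Two hypothetical minutes of request timings, summarized separately and merged.
minute_a = to_histogram([132, 98, 75, 140])
minute_b = to_histogram([210, 60, 480])
merged = merge(minute_a, minute_b)

p50 = percentile_upper_bound(merged, 0.50)
p95 = percentile_upper_bound(merged, 0.95)
```

The answer is only as precise as the bucket width: the sketch can say that 95% of values fall below a bucket edge, not what the exact 95th percentile value is.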

Bryan Baugher

Aug 21, 2014, 1:33:50 AM
to druid-de...@googlegroups.com
What's the difference between a metric and a dimension? I also don't see how to configure metrics in the ingestion spec. Is there documentation or something else that shows how to do that?

Gian Merlino

Aug 21, 2014, 2:40:36 AM
to druid-de...@googlegroups.com
I think the metrics are called "aggregations" in the ingestion specs.

You can imagine indexing as doing something like: SELECT AGG(metric_column1), AGG(metric_column2), ... FROM your_data GROUP BY dimension_column1, dimension_column2, ... Here "AGG" is usually "longSum", "doubleSum", "min", "max", or something like that. Each row in Druid will contain the aggregated values for one distinct combination of dimension values. This blog post talks about it in some more detail: http://druid.io/blog/2011/04/30/introducing-druid.html
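The GROUP BY analogy can be sketched in a few lines of Python. This is only a model of the idea, not how Druid implements indexing: the `rollup` function, the output field names, the extra sample events, and the choice of minute granularity are assumptions, and "value" is treated as a number rather than the string shown in the original JSON.

```python
from collections import defaultdict

# Hypothetical raw events shaped like the JSON earlier in the thread.
events = [
    {"timestamp": "2013-09-04T21:44:00.000Z", "host": "myhost.com",
     "metric": "get_request", "value": 132},
    {"timestamp": "2013-09-04T21:44:12.000Z", "host": "myhost.com",
     "metric": "get_request", "value": 98},
    {"timestamp": "2013-09-04T21:44:47.000Z", "host": "otherhost.com",
     "metric": "get_request", "value": 210},
    {"timestamp": "2013-09-04T21:45:03.000Z", "host": "myhost.com",
     "metric": "get_request", "value": 75},
]

def rollup(events):
    """Group by (minute, host, metric); keep a sum and a count per group."""
    groups = defaultdict(lambda: {"value_sum": 0, "count": 0})
    for e in events:
        minute = e["timestamp"][:16]  # truncate the timestamp to the minute
        key = (minute, e["host"], e["metric"])
        groups[key]["value_sum"] += e["value"]  # plays the role of longSum
        groups[key]["count"] += 1               # plays the role of count
    return dict(groups)

rolled = rollup(events)
```

Here "value" never appears in the grouping key, so the two events from myhost.com in the same minute collapse into one row even though their values differ.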