Histogram aggregator


Steve

May 15, 2013, 6:37:32 AM
to druid-de...@googlegroups.com
Hi,

I've noticed a Histogram aggregator in the Druid source. This seems quite useful but I'm guessing the reason it isn't documented is because it doesn't work yet. I get the following exception when trying to use it:

java.lang.UnsupportedOperationException: HistogramAggregatorFactory does not support getTypeName()
at com.metamx.druid.aggregation.HistogramAggregatorFactory.getTypeName(HistogramAggregatorFactory.java:152)
at com.metamx.druid.index.v1.IncrementalIndex.<init>(IncrementalIndex.java:108)
at com.metamx.druid.realtime.plumber.Sink.makeNewCurrIndex(Sink.java:157)
at com.metamx.druid.realtime.plumber.Sink.<init>(Sink.java:67)
at com.metamx.druid.realtime.plumber.RealtimePlumberSchool$1.getSink(RealtimePlumberSchool.java:213)
at com.metamx.druid.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:168)

Is anybody working on making the Histogram aggregator functional? Alternatively, any tips on what to investigate or use as a guide would be appreciated so that I can look into fixing this, as there isn't much documentation in this area right now.

Thanks.

Eric Tschetter

May 15, 2013, 12:16:07 PM
to druid-de...@googlegroups.com
Steve,

The Histogram aggregator currently exists only as a query aggregator; it doesn't work if you are trying to index (ingest) data with it.  What it can do is provide a histogram of data that has already been indexed.  This is primarily useful only if your dimension set is such that you never actually aggregate events together.  We implemented it to do things like introspect the level of summarization we are getting (if you have a "count" field in the summarized data, then a histogram of that field will show you how many rows are the result of only 1 event, 2 events, etc.).
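
To illustrate the kind of introspection described above, here is a minimal, self-contained sketch. It is not Druid's Histogram class: the class name HistogramSketch, the method bucketCounts, and the sample "count" values are made up for illustration, and the boundary convention (a value equal to a break lands in the lower bin) is an assumption.

```java
import java.util.Arrays;

public class HistogramSketch {
    /**
     * Counts values into breaks.length + 1 bins. With sorted breaks b0 < b1 < ...,
     * bin 0 holds values <= b0, bin i holds values in (b[i-1], b[i]], and the
     * last bin holds everything above the final break (assumed convention).
     */
    static long[] bucketCounts(float[] breaks, float[] values) {
        long[] bins = new long[breaks.length + 1];
        for (float v : values) {
            int idx = Arrays.binarySearch(breaks, v);
            if (idx < 0) {
                // Not an exact break: binarySearch returns -(insertionPoint) - 1.
                idx = -(idx + 1);
            }
            bins[idx]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        // Hypothetical "count" values from six summarized rows.
        float[] rowCounts = {1f, 1f, 2f, 5f, 1f, 3f};
        float[] breaks = {1f, 2f, 3f};
        // Bins: rows built from 1 event, 2 events, 3 events, more than 3 events.
        System.out.println(Arrays.toString(bucketCounts(breaks, rowCounts))); // prints [3, 1, 1, 1]
    }
}
```

Here three rows come from a single event each, so little summarization is happening for this sample.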

Making the histogram work with the current data ingestion mechanisms requires:

(1) changing the getTypeName() call to return a proper name
(2) implementing a ComplexMetricSerde object that can serialize and deserialize the object type.
(3) registering the type by calling ComplexMetrics.registerSerde() with the typename and the ComplexMetricSerde
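
The three steps above can be sketched in a self-contained way. Everything below is a hypothetical stand-in: the Serde interface, the Hist class, the REGISTRY map, and the registerSerde helper only mimic the shape of the pattern and do not reproduce the real signatures of Druid's ComplexMetricSerde or ComplexMetrics, and the byte layout is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SerdeSketch {
    /** Hypothetical stand-in for a complex-metric serde (not Druid's interface). */
    interface Serde<T> {
        String getTypeName();
        byte[] toBytes(T value);
        T fromBytes(byte[] bytes);
    }

    /** Minimal histogram value: sorted breaks plus breaks.length + 1 bin counts. */
    static final class Hist {
        final float[] breaks;
        final long[] bins;

        Hist(float[] breaks, long[] bins) {
            if (bins.length != breaks.length + 1) {
                throw new IllegalArgumentException("need breaks.length + 1 bins");
            }
            this.breaks = breaks;
            this.bins = bins;
        }
    }

    static final class HistSerde implements Serde<Hist> {
        // Step (1): return a proper type name instead of throwing.
        public String getTypeName() { return "histogram"; }

        // Step (2): assumed layout is [int n][n floats: breaks][n + 1 longs: bins].
        public byte[] toBytes(Hist h) {
            ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + h.breaks.length * Float.BYTES + h.bins.length * Long.BYTES);
            buf.putInt(h.breaks.length);
            for (float b : h.breaks) buf.putFloat(b);
            for (long c : h.bins) buf.putLong(c);
            return buf.array();
        }

        public Hist fromBytes(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            int n = buf.getInt();
            float[] breaks = new float[n];
            for (int i = 0; i < n; i++) breaks[i] = buf.getFloat();
            long[] bins = new long[n + 1];
            for (int i = 0; i <= n; i++) bins[i] = buf.getLong();
            return new Hist(breaks, bins);
        }
    }

    // Step (3): hypothetical registry keyed by type name.
    static final Map<String, Serde<?>> REGISTRY = new ConcurrentHashMap<>();

    static void registerSerde(String typeName, Serde<?> serde) {
        if (REGISTRY.putIfAbsent(typeName, serde) != null) {
            throw new IllegalStateException("type already registered: " + typeName);
        }
    }

    public static void main(String[] args) {
        HistSerde serde = new HistSerde();
        registerSerde(serde.getTypeName(), serde);
        Hist original = new Hist(new float[]{0.0f, 20.0f}, new long[]{4, 7, 1});
        Hist copy = serde.fromBytes(serde.toBytes(original));
        System.out.println(Arrays.toString(copy.breaks) + " " + Arrays.toString(copy.bins));
    }
}
```

Keeping the type name in one place (the serde) and registering under that same name avoids the two drifting apart, which is the failure mode behind the getTypeName() exception in the first message.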

This is not well documented yet because this API is still rather unstable.  I'm hoping to solidify and simplify this API a bit more this summer, at which point we should document it some more.

That said, if you want to jump in and try to implement something against it, it is definitely possible to make it work.  If you do, I would recommend contributing it back so that we can evolve it forward as we change the API (also because I just want more functionality contributed back ;) ).

--Eric  


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/d11f2bff-fc53-440e-b005-424de62a425b%40googlegroups.com?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Nicolas F.

Dec 6, 2013, 12:33:07 PM
to druid-de...@googlegroups.com
Hi,

I may need to implement the indexing of histograms. Have you had time to document the process? And has the API solidified?

Also, I could not use "histogram" in a query; I got the same error as when I try to build histograms during ingestion.

java.lang.UnsupportedOperationException: HistogramAggregatorFactory does not support getTypeName()
  at io.druid.query.aggregation.HistogramAggregatorFactory.getTypeName(HistogramAggregatorFactory.java:158)
  at io.druid.segment.incremental.IncrementalIndex.<init>(IncrementalIndex.java:104)
  at io.druid.segment.incremental.IncrementalIndex.<init>(IncrementalIndex.java:130)
  at io.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:125)
  at io.druid.query.groupby.GroupByQueryQueryToolChest.access$100(GroupByQueryQueryToolChest.java:57)
  at io.druid.query.groupby.GroupByQueryQueryToolChest$2.run(GroupByQueryQueryToolChest.java:84)
  at io.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:102)
  at io.druid.query.BaseQuery.run(BaseQuery.java:78)
  at io.druid.query.BaseQuery.run(BaseQuery.java:73)

My indexing configuration:

{
  "type" : "index",
  "dataSource" : "test",
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "day",
    "intervals" : [ "2013-12-06/2013-12-07" ]
  },
  "aggregators" : [
  ],
  "firehose" : {
    "type" : "local",
    "baseDir" : "/Users/test/Downloads/druid-services-0.6.26/examples/cucina/",
    "filter" : "access_logs_2013120610_3_sample.json",
    "parser" : {
      "timestampSpec" : {
        "column" : "time",
        "format" : "ruby"
      },
      "data" : {
        "format" : "json",
        "dimensions" : ["remote","path","agent","my_metric"]
      }
    }
  }
}

My query:

{
  "queryType" : "groupBy",
  "dataSource": "test",
  "granularity": "all",
  "dimensions": [""],
  "aggregations" : [
    {
      "type" : "histogram",
      "name" : "my_metric",
      "fieldName" : "my_metric",
      "breaks" : [0.0, 20.0]
    }
  ],
  "intervals": ["2013-12-06T00:00/2013-12-31T00:00"]
}

Thanks.

-- Nicolas

Fangjin Yang

Dec 8, 2013, 5:41:07 PM
to druid-de...@googlegroups.com
I don't believe we (or anyone else in the community) use the histogram aggregator right now, so it is very likely that this aggregator doesn't work. I don't think there are any unit tests around this aggregator either. There is an approximate histogram aggregator for Druid that is described here: http://metamarkets.com/2013/histograms/

This approximate histogram aggregator, however, is not part of the open source offering yet. Perhaps someone from the community will contribute a similar version in the near future. If you are interested, there is more (recent) info about data structures for streaming histograms and quantiles here: https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf?raw=true

Thanks,
FJ

Fangjin Yang

Dec 8, 2013, 5:43:31 PM
to druid-de...@googlegroups.com
I do believe, however, that at some point this aggregator did work for queries, and it may still work for queries. As Eric pointed out, this aggregator doesn't really work for ingestion right now AFAIK.



Steven Harris

Dec 8, 2013, 5:51:52 PM
to druid-de...@googlegroups.com, druid-de...@googlegroups.com
If it doesn't have tests and/or doesn't work, my vote would be to remove it. Code bases with broken, untested stuff become unmanageable. Plus, Git has it for reference.

Cheers,
Steve

Fangjin Yang

Dec 8, 2013, 5:53:52 PM
to druid-de...@googlegroups.com
So I dug through the code and I was mistaken: it does have tests. The aggregator may still work in 0.6 as a query aggregator, but it'll take a bit more work to get the ingestion component working.

Reza

Mar 25, 2015, 7:26:13 PM
to druid-de...@googlegroups.com
I believe this thread is outdated. Here is another one for future users who end up here: https://groups.google.com/forum/#!searchin/druid-development/histogram/druid-development/Wj4P0W5Blug/aadvvioENf4J