Javascript Custom Aggregation and other Questions

1,814 views
Skip to first unread message

Austin Kyker

unread,
Jul 9, 2015, 7:16:16 PM7/9/15
to druid...@googlegroups.com
Hi,

In your documentation I see this:

{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames"  : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine"   : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset"     : "function() { return <initial value>; }"


However, it is very unclear what these properties represent. Can you please explain what fieldNames, fnAggregate, fnCombine, and fnReset are with a simple example.

My use case is that I have an input row with metrics A and B.

I would like to define a metric C where C = A*B so that after ingestion I can use C in aggregations. What's the best way to accomplish this...


Also, I have a few questions on querying in general:

Is it possible to filter based on metrics or can you only filter based on dimensions? For instance if I had a metric Amount and I wanted to filter out the rows where Amount < 500, what would be the best way to approach this? Would I have to make Amount a dimension and then write a JS filter to parse the string to an integer and check if that integer is less than 500?

Another question - how could I groupby days of the week. If I wanted metrics for Sunday, Monday, ...? Is this possible?

Finally, is there a way to query Druid for datasource metadata. In other words, can you get from Druid all of the ingested datasources as well as their specs (dimensions and metrics)?

Sorry that these questions are all over the place.

Martin Schneider

unread,
Jul 10, 2015, 3:40:11 AM7/10/15
to druid...@googlegroups.com
Hi Austin,

So far I didn't use the javascript aggregation, but the case you are describing should be solvable with a arithmetic postaggregatio.


postAggregation : {
 
"type"  : "arithmetic",
 
"name"  : "metricC",
 
"fn"    : "*",
 
"fields": [ {"type": "fieldAccess","name": "metricA","fieldName": "metricA"},
               
{"type": "fieldAccess","name": "metricB","fieldName": "metricB"}
            ]
}

If you want to get only the values above a treshhold than use "haveing" (see here) in one groupBy query.

{
   
"type": "
lessThan",
   
"aggregation": "Amount",
   
"value": 500
}

If you add to each entry a dimension which holds the weekday, than you use a filter to get just all the data for Monday i.e.


"filter": { "type": "selector", "dimension": "weekday", "value": "Monday"}

There are three query types which gives you metadata:
I hope that is helpful for you.

With best regards

Martin Schneider

Eric Tschetter

unread,
Jul 10, 2015, 8:37:28 PM7/10/15
to druid...@googlegroups.com
Awesome response Martin!

Just wanted to add one more thing, on the brokers there are endpoints
that can also give you information about the various data sources,
they are under the "GET" portion of the Broker docs:

http://druid.io/docs/0.7.3/Broker.html

There are limitations for what they can show you on top of real-time
segments, but once you have some historical segments, they can show
you information about what's in them.

--Eric
> --
> You received this message because you are subscribed to the Google Groups
> "Druid User" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to druid-user+...@googlegroups.com.
> To post to this group, send email to druid...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/druid-user/cb7b75bd-fa37-42d9-b9d0-00dd123418d5%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

Austin Kyker

unread,
Jul 14, 2015, 3:23:38 PM7/14/15
to druid...@googlegroups.com
I appreciate your response!

Unfortunately, I don't think your post-aggregation solves the problem. If I have the following data:

                       Price/unit             Units Sold           
Record 1:           2                          3
Record 2:           1                          3 
Record 3:           3                          1

And I want to calculate total revenue, it would be done as follows:

total revenue = 2*3 + 1*3 + 3*1 = 12

What you're post-aggregation would do I'm afraid is compute (2 + 1 + 3) * (3 + 3 + 1) = 42

Maybe this is a problem that Druid cannot solve and I need to do the computation of
record revenue pre-ingestion (add a column).

I would still like someone to explain to me the different fields in the custom javascript aggregator!

Martin Schneider

unread,
Jul 15, 2015, 7:03:28 AM7/15/15
to druid...@googlegroups.com
Keep in mind that Druid is focused on timeseries. So what you mentioned one solution on your own, add a new metric for each entry, which holds the revenue = units * 'price/unit' and still keep the other dimensions.

If you look here and goto the JavaScript post-aggregator you find the general way of adding a JavaScript post-aggregator

postAggregation : {

  "type": "javascript",
  "name": <output_name>,

  "fieldNames" : [<aggregator_name>, <aggregator_name>, ...],
  "function": <javascript function>
}
  • output_name is the way the aggegrated field will be displayed, holds the result of the javascript function
  • aggregator_name would be ,in your case, 'price/unit' and 'units' which act as input for the javascript function
  • javascript function is simply the function you want to have in javascript


a littel below you can find an example

{
  "type": "javascript",
  "name": "absPercent",
  "fieldNames": ["delta", "total"],
  "function": "function(delta, total) { return 100 * Math.abs(delta) / total; }"
}
With best regards

Martin Schneider

--
You received this message because you are subscribed to a topic in the Google Groups "Druid User" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-user/GYXS8LMTOuc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-user+...@googlegroups.com.

To post to this group, send email to druid...@googlegroups.com.

Martin Schneider

unread,
Jul 15, 2015, 11:56:14 AM7/15/15
to druid...@googlegroups.com
I just realized that you talked about the JavaScript Aggregations. I'm sorry for that confusion on my side.

So like it stated in the first post, I don't have used the JavaScript Aggregations/Post-Aggregations so far. If you had look on the Aggregations doc a little further down you see this example JavaScript Aggregator.
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b)      { return current + (Math.log(a) * b); }",
  "fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset"     : "function()                   { return 10; }"
}
  • name is output dimension name
  • fieldnames is the dimension list of function inputs
  • fnAggregate : is the function how you aggregate values in on row (if I understand it correct)
  • fnCombine: is the function how to aggregate the different rows
  • fnReset: initial value for the aggregation

Let me try to create a JavaScript Aggregator for your example:

{
  "type": "javascript",
  "name": "revenue_Sum('price/unit' * units)",
  "fieldNames": ["price/unit", "units"],
  "fnAggregate" : "function(current, a, b)      { return current + ('price/unit' * 'units'); }",
  "fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset"     : "function()                   { return 0; }"
}
I hope I finally could help you a little bit.


With best regards

Martin Schneider

On Friday, July 10, 2015 at 1:16:16 AM UTC+2, Austin Kyker wrote:
Reply all
Reply to author
Forward
0 new messages