Getting the first row in a timeseries query

685 views
Skip to first unread message

Richard Gill

unread,
Mar 11, 2015, 9:22:08 AM3/11/15
to druid...@googlegroups.com
Hey,

I would like to get the value of a particular dimension for the first (and last) event in a given granularity.

Example:

If I have dimensions = ['timestamp', 'dim1']

I would like to do a TimeSeries (or GroupBy) query to get the value of dim1 for the first and last rows (the rows with the min and max timestamp) in the grouping. 


{
 
"queryType": "timeseries",
 
"dataSource": "sample_datasource",
 
"granularity": "day",
 
...
 
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}



One way I can think of doing it is with a min and max aggregator on timestamp, and then a seperate experimental select query to match exactly that timestamp. Or maybe a custom JavaScript aggregator? I was secretly hoping for an aggregator called 'first'!

I'm pretty new to Druid, so any help would be greatly appreciated!

Thanks,

Richard

Nishant Bangarwa

unread,
Mar 11, 2015, 10:29:59 AM3/11/15
to Richard Gill, druid...@googlegroups.com
Hi richard,
 
there are max and min aggregators but no native first and last value aggregators at presnt. 

you can do this by having two different custom javascript aggregators in a timeseries query, 
first aggregator will store the first value and ignore all subsequent new values given to it. 
and the last aggregator will just store the value thats been given to it and overwrites it every time its given a new value, thus having the last value given to it at any time.


--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/b2fe4ee0-891b-4a8f-af5a-387d0041b1c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Richard Gill

unread,
Mar 11, 2015, 10:59:05 AM3/11/15
to druid...@googlegroups.com, richar...@gmail.com
Thanks very much.

For anyone who needs it. Here was my best effort on the aggregations, although there might be a better way.

{
 
"type": "javascript",
 
"name": "first",
 
"fieldNames"  : [ "dim1"],
 
"fnAggregate" : "function(current, dim1) { if(current == -Number.MAX_VALUE) {return dim1} else {return current}  } ",
 
"fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
 
"fnReset"     : "function()                   { return -Number.MAX_VALUE; }"
},
{
 
"type": "javascript",
 
"name": "last",
 
"fieldNames"  : [ "dim1"],
 
"fnAggregate" : "function(current, dim1) { return dim1 } ",
 
"fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
 
"fnReset"     : "function()                   { return 0; }"
 
}

Duane Barlow

unread,
Sep 4, 2015, 12:58:19 PM9/4/15
to Druid User, richar...@gmail.com
Is there a more straight-forward way to find the most recent row that matches a filter? I tried the below javascript aggregations and 'last' didn't work. This would be very helpful as I'd like to use this information to decide what granularity to set (If I know TS data will span one month vs 5 minutes).

Thanks!

va...@imply.io

unread,
May 31, 2016, 5:38:43 PM5/31/16
to Druid User, richar...@gmail.com
FWIW to anyone reading this: first and last aggs can not be implemented in JS because the merging order in fnCombine is not guaranteed. The solution posted by Richard, although very cool, will not work in the general case. Here is the issue: https://github.com/druid-io/druid/issues/2845 if you want to follow or +1




Reply all
Reply to author
Forward
0 new messages