Aggregation across time series


aashish...@gmail.com

Jun 16, 2017, 9:30:38 AM
to OpenTSDB
Hi Guys,

I am trying to push stock price data into OpenTSDB. I created the schema like below:

test.price 1497613464196 421.615967 {priceType=askPrice, contributer=contributer50, instrument=INSTR_ID1}
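To make the schema concrete, here is a minimal Python sketch that formats the data point above as an OpenTSDB telnet-style `put` line (`put <metric> <timestamp> <value> <tagk=tagv ...>`). The helper name `put_line` is hypothetical. Each unique combination of metric and tag values becomes its own time series, so ask, bid and mid prices would differ only in the `priceType` tag.

```python
def put_line(metric, ts_ms, value, tags):
    # Telnet-style put command: put <metric> <timestamp> <value> <tagk=tagv ...>
    # Tags are sorted only so the output is deterministic.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts_ms} {value} {tag_str}"


# The data point from the question (millisecond timestamp).
line = put_line(
    "test.price",
    1497613464196,
    421.615967,
    {"priceType": "askPrice", "contributer": "contributer50", "instrument": "INSTR_ID1"},
)
print(line)
# put test.price 1497613464196 421.615967 contributer=contributer50 instrument=INSTR_ID1 priceType=askPrice
```

Bid and mid prices would use tag values such as `priceType=bidPrice` and `priceType=midPrice`; those names are assumptions, since only `askPrice` appears above.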

From going through the OpenTSDB documentation, it seems that aggregate functions can only be applied per timestamp?

So the question is: how do I aggregate across time series?

For example, in the query below I get aggregation on the basis of timestamp:

 ./tsdb query  2017/06/16-17:00:00 2017/06/16-17:14:44  sum  test.price

test.price 1497613464196 12889.014371 {priceType=askPrice, instrument=INSTR_ID1}
test.price 1497613692461 1301.265295 {priceType=askPrice, instrument=INSTR_ID1}
test.price 1497614072703 2022.726700 {priceType=askPrice, instrument=INSTR_ID1}

My first question is: is there a better way to design the schema for ask, bid and mid prices? If the schema design is right, how can I apply aggregation across time series?

Regards,
Aashish





ManOLamancha

Jul 6, 2017, 5:51:45 PM
to OpenTSDB
There are two ways of aggregating time series in OpenTSDB (and all time series DBs). The first is a group-by or spatial aggregation where you take all values for T1 across multiple time series and aggregate them together. E.g. if you wanted to aggregate all "askPrice"s for symbol "VZ" you would query something like:

  ./tsdb query  2017/06/16-17:00:00 2017/06/16-17:14:44  sum  test.price priceType=askPrice symbol=VZ

(Note that the CLI query utility hasn't been updated in a long time; it's better to use the HTTP API.)
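Since the HTTP API is the preferred route, here is a minimal Python sketch that builds a JSON body for a POST to the `/api/query` endpoint, roughly equivalent to the CLI queries in this reply. The helper `build_query` is hypothetical; it uses the sub-query `tags` map and the top-level `msResolution` flag (so millisecond timestamps come back intact), and it only constructs the request body rather than sending it.

```python
import json


def build_query(start, end, aggregator, metric, tags, downsample=None):
    # One sub-query; "tags" filters/groups series by tag value.
    sub = {"aggregator": aggregator, "metric": metric, "tags": tags}
    if downsample:
        sub["downsample"] = downsample  # e.g. "1s-sum"
    # msResolution asks for millisecond timestamps in the response.
    return {"start": start, "end": end, "queries": [sub], "msResolution": True}


# Tag keys follow the reply (priceType, symbol); the question's schema uses "instrument".
tags = {"priceType": "askPrice", "symbol": "VZ"}
body = build_query("2017/06/16-17:00:00", "2017/06/16-17:14:44", "sum", "test.price", tags)
ds_body = build_query("2017/06/16-17:00:00", "2017/06/16-17:14:44", "sum", "test.price",
                      tags, downsample="1s-sum")
print(json.dumps(ds_body, indent=2))
```

The JSON would be POSTed to `http://<host>:4242/api/query` (4242 is OpenTSDB's default port). The `tags` map is the older style; OpenTSDB 2.2 and later also accept a `filters` list.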

However, as you probably noticed in the documentation, the result will be a single value per unique timestamp within the query range. Your data appears to have millisecond timestamps, and if you're recording transactions, you'll see some ugly data with interpolated values for the series that don't have an explicit value at each timestamp. In that case you'll want to downsample first.
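The interpolation effect can be modeled with a small Python sketch. It is a simplified model, not OpenTSDB's code: at every timestamp seen in any series, each series contributes its stored value if it has one, a linearly interpolated value if the timestamp falls between two of its points, and nothing if the timestamp lies outside its first and last points. `value_at` and `group_sum` are hypothetical names, and the inputs are toy values. (OpenTSDB's `zimsum` aggregator substitutes zero for missing values instead of interpolating.)

```python
from bisect import bisect_left


def value_at(series, t):
    """Value of a sorted [(ts, value), ...] series at time t.

    Exact hit: the stored value. Between two points: linear interpolation.
    Outside the series' first..last timestamps: None (no extrapolation)."""
    times = [ts for ts, _ in series]
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return series[i][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def group_sum(all_series):
    """Sum across series at every timestamp present in any series."""
    stamps = sorted({ts for s in all_series for ts, _ in s})
    result = []
    for t in stamps:
        values = [v for v in (value_at(s, t) for s in all_series) if v is not None]
        result.append((t, sum(values)))
    return result


# Toy series with millisecond timestamps; b only has a point at 2000.
a = [(1000, 10.0), (3000, 30.0)]
b = [(2000, 5.0)]
print(group_sum([a, b]))
# [(1000, 10.0), (2000, 25.0), (3000, 30.0)]  (a is interpolated to 20.0 at 2000)
```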

Downsampling is time-based aggregation: the values within a defined downsampling "bucket" are aggregated for each time series into a single value at an aligned timestamp. E.g. you could downsample your data on a 1-second basis, then group the results by ask price and symbol. That kind of query should look like:

  ./tsdb query  2017/06/16-17:00:00 2017/06/16-17:14:44  sum 1s-sum test.price priceType=askPrice symbol=VZ
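A minimal Python sketch of the downsample-then-group pipeline, assuming buckets start at multiples of the interval (so `1s-sum` maps each millisecond timestamp to `ts - ts % 1000`). `downsample_sum` and `group_sum_aligned` are hypothetical names and the inputs are toy values. Once every series has a value in every bucket, the group-by step lines up on identical timestamps and no interpolation is needed.

```python
from collections import defaultdict


def downsample_sum(series, interval_ms):
    """'sum' downsampling: add up each series' points within aligned buckets."""
    buckets = defaultdict(float)
    for ts, v in series:
        buckets[ts - ts % interval_ms] += v  # bucket start = aligned timestamp
    return sorted(buckets.items())


def group_sum_aligned(all_series):
    """Group-by 'sum' across already-downsampled series, bucket by bucket.

    Simplification: a series with no point in a bucket contributes nothing
    here, whereas OpenTSDB would interpolate it."""
    totals = defaultdict(float)
    for s in all_series:
        for ts, v in s:
            totals[ts] += v
    return sorted(totals.items())


# Toy raw series with unaligned millisecond timestamps.
raw_a = [(1196, 1.0), (1700, 2.0), (2100, 4.0)]
raw_b = [(1500, 10.0), (2900, 20.0)]
downsampled = [downsample_sum(s, 1000) for s in (raw_a, raw_b)]
print(group_sum_aligned(downsampled))
# [(1000, 13.0), (2000, 24.0)]
```

In this toy input both series have a point in each 1-second bucket, so the skip-versus-interpolate difference never comes into play.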
