Configurable granularity slots


ya...@bena.do

Jun 13, 2016, 6:33:40 PM
to Blueflood Discuss

Hello, I have a rather silly question regarding rollups and data retention that I couldn't find an answer to on the Wiki. How long does Blueflood keep the rolled-up data? Is it always 14 days for all granularities?


I'm okay with the granularities but need to change the number of slots for each:


5m for 1 day

20m for 1 week

60m for 2 weeks

240m for 1 month

1440m for 1 year


Is this possible? What will happen if the slots are simply changed in the Granularity class to reflect this requirement?


Thanks,

Yarin

Gary Dusbabek

Jun 14, 2016, 4:47:07 AM
to ya...@bena.do, Blueflood Discuss
Hi Yarin,

It's been a while since I worked in the code, but originally, you would just need to change the TTL values in Granularity.java to reflect this.

Gary



ya...@bena.do

Jun 14, 2016, 6:44:19 AM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
So this is something I still don't fully understand: if BF stores 14 days of data at a granularity of 5m, and also 14 days of data at 1440m, why is aggregation even required? Is it only so that fetches return the same number of data points regardless of the query period?

Gary Dusbabek

Jun 14, 2016, 8:14:39 AM
to ya...@bena.do, Blueflood Discuss
(Double reply to Yarin - sorry - first reply did not go to the list.)

I hope the current code stores more than 14 days of 1440m metrics. The main design point of rolling up is twofold: 1) conserve space (letting the finer granularities TTL away), 2) making it so that fewer database fetches are required for large time ranges.
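To put rough numbers on the second point, here is a small Python sketch (not Blueflood code) that counts how many stored points a single metric contributes to a query at each rollup granularity. The one-year query range is only an illustrative assumption.

```python
# Minutes covered by one stored point at each rollup granularity.
GRANULARITY_MINUTES = {"5m": 5, "20m": 20, "60m": 60, "240m": 240, "1440m": 1440}


def points_for_range(days, granularity):
    """Number of rolled-up points one metric has over `days` days."""
    return days * 24 * 60 // GRANULARITY_MINUTES[granularity]


if __name__ == "__main__":
    # Illustrative query range of one year (an assumption, not from the thread).
    for name in GRANULARITY_MINUTES:
        print(name, points_for_range(365, name))
```

The coarser the granularity, the fewer points have to be kept and read back for the same time range.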

Gary.

ya...@bena.do

Jun 14, 2016, 10:36:18 AM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
I dug deeper into the code and indeed found the TtlConfig class, where the 1440m rollup TTL is set to 1 year.
I guess I don't fully understand slots then.
The Wiki says the following:

Slots: The number of locations to save a metric within a certain time period. This is equal to the number of times a discrete 'granularity' occurs over a given time range.

Can you please elaborate on how a slot correlates to the schema/data model? Why were 4032 base slots (14 days) chosen to begin with? Why not 7 days, for instance? Here's the commit that boggles my mind: https://github.com/rackerlabs/blueflood/commit/7f97270ac721c4a015d7ba6907a3010ec998cc9a#diff-f6ab010e2a8c7c85dd54384bd3a64b37L43

Looks like there was indeed a TTL associated with each granularity, but now there is only a number of slots. For 1440m there are only 14 slots. Since you said BF doesn't keep only 14 values for the 1440m granularity, what exactly are these 14 slots used for?

Chandra Addala

Jun 14, 2016, 12:15:11 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
I am a developer on Blueflood. I will look into this and get back to you.

ya...@bena.do

Jun 14, 2016, 12:29:19 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
Looking forward to it.

Chandra Addala

Jun 14, 2016, 4:26:32 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
This is in response to your original question, "How long does blueflood keep the rolled-up data?". 

Here are the TTLs for each granularity:

full: TTL of 1 day (unless configured with properties ARE_TTLS_FORCED=true, TTL_CONFIG_CONST=3, in which case it's 3 days)
5m: TTL of 10 days
20m: TTL of 20 days
60m: TTL of 155 days
240m: TTL of approximately 10 months
So the TTL for 5m is 2 (set in CassandraModel) * 5 = 10 days.
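The multiplication above can be sketched in a few lines of Python. This is not the Blueflood implementation: only the base value of 2 for 5m is stated here, and the base values for 20m and 60m are assumptions obtained by dividing the TTLs listed above by 5.

```python
TTL_MULTIPLIER = 5  # the factor applied to the base TTL, as described above

# Base TTLs in days. Only the 5m value (2) is stated in the thread; the 20m
# and 60m values are assumptions, inferred by dividing the listed TTLs by 5.
BASE_TTL_DAYS = {"5m": 2, "20m": 4, "60m": 31}


def effective_ttl_days(granularity):
    """Effective TTL in days: base TTL times the fixed multiplier."""
    return BASE_TTL_DAYS[granularity] * TTL_MULTIPLIER


if __name__ == "__main__":
    for name in BASE_TTL_DAYS:
        print(name, effective_ttl_days(name), "days")
```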

I will respond in detail on how slots work.

Thanks,
Chandra

Chandra Addala

Jun 14, 2016, 5:12:53 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
Here is a brief description of how slots work.

A slot is a time slot. The schema does not directly reflect slots. One purpose of the slots is to properly map data between granularities.

Blueflood maintains data at these granularities: 5m, 20m, 60m, 240m, 1440m. Let's say a metric has 10 data points in a 5-minute period; we roll up those data points and store them as one data point in the metrics_5m table. Similarly, 4 data points in the metrics_5m table get rolled up to 1 data point in the 20m table, three 20m slots correspond to one 60m slot, and so forth.

To properly map between granularities, we assigned a fixed number of slots to each granularity. The numbers are shown below. Slots 1-4 of the 5m granularity map to slot 1 of 20m, slots 5-8 of the 5m granularity map to slot 2 of the 20m granularity, and so forth. By doing this we get a fixed mapping. At any given time, for a given (slot, granularity) combination, we can tell which slot number of the higher (coarser) granularity it corresponds to.

granularity slots
metrics_5m 4032
metrics_20m 1008
metrics_60m 336
metrics_240m 84
metrics_1440m 14
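
The fixed mapping can be illustrated with a short Python sketch built only from the slot counts in the table above. It uses 1-based slot numbers to match the wording of this post; the function name is hypothetical and the real code may number slots differently.

```python
# Slot counts per granularity, taken from the table above.
SLOTS = {"5m": 4032, "20m": 1008, "60m": 336, "240m": 84, "1440m": 14}
ORDER = ["5m", "20m", "60m", "240m", "1440m"]  # finest to coarsest


def parent_slot(granularity, slot):
    """Return (coarser granularity, 1-based slot) that `slot` rolls up into."""
    if not 1 <= slot <= SLOTS[granularity]:
        raise ValueError("slot out of range for " + granularity)
    idx = ORDER.index(granularity)
    if idx == len(ORDER) - 1:
        raise ValueError("1440m is the coarsest granularity")
    coarser = ORDER[idx + 1]
    # How many finer slots fit into one coarser slot.
    ratio = SLOTS[granularity] // SLOTS[coarser]
    return coarser, (slot - 1) // ratio + 1


if __name__ == "__main__":
    print(parent_slot("5m", 4))  # last 5m slot of the first 20m slot
    print(parent_slot("5m", 5))  # first 5m slot of the second 20m slot
```

Because each slot count divides evenly into the next one, the ratio is always a whole number and every slot has exactly one parent.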

Why does Blueflood have 4032 slots of 5m granularity? 4032 * 5 minutes is exactly 14 days. I don't know the significance of this number, 14 days, but I don't believe that's a very important number. I will take a stab at explaining why.

First of all, to decide on the number of slots to assign for the 5m granularity, they needed a number that is evenly divisible by 4 (20m), 12 (60m), 48 (240m), and 288 (1440m), so that each slot has a proper mapping to the higher granularities. As Blueflood assigns slot numbers to each 5m period, after 4032 * 5 minutes it will run out of 5m slots, so it will start from 1 again. But before reusing slot 1, we have to be sure rollups are finished for slots 1-4 from the previous cycle. In order to provide ample time for rollups, they might have just made it 14 days. So technically you have at least 14 days to roll up data before you run out of slots, provided you are storing data in full resolution for that long.
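
The divisibility requirement and the wrap-around can be checked with a small Python sketch. The function names and the way a timestamp is turned into a slot are assumptions made for illustration, not the actual Blueflood code.

```python
BASE_SLOTS = 4032                # number of 5m slots
FIVE_MIN_MS = 5 * 60 * 1000      # one 5m slot in milliseconds

# Number of 5m slots that make up one slot of each coarser granularity.
RATIOS = {"20m": 4, "60m": 12, "240m": 48, "1440m": 288}

# 4032 must divide evenly by every ratio so the mapping has no remainder.
assert all(BASE_SLOTS % ratio == 0 for ratio in RATIOS.values())


def cycle_days():
    """Length of one full pass through the 5m slots, in days."""
    return BASE_SLOTS * 5 / (60 * 24)


def slot_for(timestamp_ms):
    """1-based 5m slot for a Unix timestamp in ms; wraps after one cycle."""
    return (timestamp_ms // FIVE_MIN_MS) % BASE_SLOTS + 1


if __name__ == "__main__":
    print(cycle_days())
    print(slot_for(0), slot_for(BASE_SLOTS * FIVE_MIN_MS))
```

Two timestamps exactly one cycle apart land on the same slot number, which is why the rollup of a slot has to finish before the cycle comes back around to it.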

Thanks,
Chandra

Chandra Addala

Jun 14, 2016, 5:20:53 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
Correction: full resolution should have TTL of 5 days.



ya...@bena.do

Jun 14, 2016, 5:37:50 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
Chandra,
Great explanation for both TTL and slots. Much appreciated! This should be in the Wiki.

P.S. I'm really curious why the TTL is multiplied by 5 in a hardcoded fashion... :-)

Chandra Addala

Jun 14, 2016, 5:43:58 PM
to Blueflood Discuss, ya...@bena.do, gdus...@gmail.com
Thanks! I will update this information in the wiki. I have no idea why it is hardcoded to multiply by 5. 

Gary Dusbabek

Jun 15, 2016, 12:39:31 AM
to Chandra Addala, Blueflood Discuss, ya...@bena.do
On Tue, Jun 14, 2016 at 11:12 PM, Chandra Addala <chandra...@gmail.com> wrote:
First of all, to decide on the number of slots to assign for the 5m granularity, they needed a number that is evenly divisible by 4 (20m), 12 (60m), 48 (240m), and 288 (1440m), so that each slot has a proper mapping to the higher granularities. As Blueflood assigns slot numbers to each 5m period, after 4032 * 5 minutes it will run out of 5m slots, so it will start from 1 again. But before reusing slot 1, we have to be sure rollups are finished for slots 1-4 from the previous cycle. In order to provide ample time for rollups, they might have just made it 14 days. So technically you have at least 14 days to roll up data before you run out of slots, provided you are storing data in full resolution for that long.

Yes, that is exactly it. It's the maximum amount of time we could let rollups pause (hardware failure, etc.) before active slots started to be "lost" and we wouldn't know what to roll up. We could have gone with 21 or something else, but whatever, it reflects the number of other slots.

Gary.