When a segment comes in, I'd like to know some details about the previous 30 days of segments. So I'd like to have a 30-day sliding window that slides every day.
I have two questions related to this:
1. If I stop/deploy/start the job to do a monthly release, does the aggregation window reset, or is there still 30 days of data in the window when the upgraded job boots up? I know the window is backed by RocksDB; I'm just wondering whether that backing survives job upgrades (the window id would stay the same).
2. I really just want to see the data for the past 30 days. So I want a 30-day sliding window that slides every midnight, but I only care about the extent whose upper bound is tonight. Is there a way to tell Onyx to discard the data in the other 29 extents and keep only this one?
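To make the "other 29 extents" concrete: with a 30-day range and a 1-day slide, every timestamp falls into 30 overlapping extents. The sketch below uses generic sliding-window arithmetic (extents start on multiples of the slide since a UTC epoch and are half-open). It is an illustration under those assumptions, not Onyx's internal code, and `extents_for` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)
RANGE = 30 * DAY  # window length
SLIDE = DAY       # a new extent starts every midnight (UTC assumed)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def extents_for(ts, range_=RANGE, slide=SLIDE):
    """Return (lower, upper) bounds of every extent [lower, upper) that contains ts.

    Hypothetical helper: extents are assumed to start on multiples of `slide`
    since EPOCH, newest first in the returned list.
    """
    # Start of the most recent extent that contains ts (i.e. the midnight before ts).
    start = EPOCH + ((ts - EPOCH) // slide) * slide
    extents = []
    # Walk backwards one slide at a time while the extent still covers ts.
    while start > ts - range_:
        extents.append((start, start + range_))
        start -= slide
    return extents

ts = datetime(2017, 3, 15, 13, 45, tzinfo=timezone.utc)
exts = extents_for(ts)
print(len(exts))  # 30
# The extent whose upper bound is "tonight" (next midnight) is the oldest one:
print(exts[-1])
```

Under these assumptions, the extent ending tonight is the last element of the list, and the other 29 are the redundant copies the question asks about discarding.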
I'm just trying to weigh the tradeoffs between using a sliding window and querying a database each time I process a segment. We expect fairly high throughput, and every segment needs to both read from the aggregation and update it. I wrote a custom aggregation function, but it's not much different from collect-by-key.
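For comparison, the "single trailing extent" behaviour asked about in question 2 amounts to per-key state that keeps only the last 30 days and is read and then updated for each segment. The sketch below is a hypothetical, in-memory stand-in for a collect-by-key style aggregation with eviction; `TrailingWindow` and `read_and_update` are illustrative names, not Onyx APIs, and it assumes segments arrive roughly in time order per key.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

class TrailingWindow:
    """Per-key state holding only segments from the last `span` (hypothetical, not Onyx)."""

    def __init__(self, span=timedelta(days=30)):
        self.span = span
        # key -> deque of (timestamp, segment), oldest first
        self.by_key = defaultdict(deque)

    def read_and_update(self, key, ts, segment):
        """Return the key's segments from the trailing span, then record this one."""
        buf = self.by_key[key]
        cutoff = ts - self.span
        # Evict entries older than the span; assumes in-order arrival per key.
        while buf and buf[0][0] <= cutoff:
            buf.popleft()
        previous = [s for _, s in buf]  # what this segment "sees" from the past 30 days
        buf.append((ts, segment))
        return previous

w = TrailingWindow()
t0 = datetime(2017, 1, 1, tzinfo=timezone.utc)
w.read_and_update("a", t0, 1)                       # []
w.read_and_update("a", t0 + timedelta(days=10), 2)  # [1]
w.read_and_update("a", t0 + timedelta(days=31), 3)  # [2] (the day-0 segment was evicted)
```

The design tradeoff this illustrates: each segment is stored once rather than contributing to 30 overlapping extents, but eviction is lazy, so a key that stops receiving segments keeps its stale entries until something else cleans them up.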
Thanks in advance!