Hi, thanks very much for having a look at this :-)
For moving averages in particular, I'd be using window sizes in the range 1 to 200, and I'll likely need 90% or more of those calculated. The data stays the same throughout the program run (~90k rows, 1D).
I agree I could do it in O(n) with one GPU call per window size, in a for loop over the unique window sizes. I was really hoping to do it in constant time, though.
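One technique that might help here is a prefix sum (cumulative sum / scan). This is a swapped-in idea, not something from the thread so far. You compute one cumulative sum over the fixed data once, and after that the sum over any window [i, i+w) is just prefix[i+w] - prefix[i], so every window size costs one subtraction per output element. It isn't constant time, but on a GPU it would be one scan plus a cheap elementwise op per window (or one broadcast over all windows). The plain-Python sketch below assumes trailing windows, where window w yields len(data) - w + 1 values. The function name is hypothetical.

```python
from itertools import accumulate


def moving_averages_all_windows(data, windows):
    """Trailing moving averages for many window sizes from a single prefix sum.

    Assumption: trailing windows; window w produces len(data) - w + 1 values.
    """
    # prefix[i] == sum(data[:i]), with prefix[0] == 0
    prefix = [0.0] + list(accumulate(data))
    result = {}
    for w in windows:
        # sum(data[i:i + w]) == prefix[i + w] - prefix[i], so each mean is O(1)
        result[w] = [(prefix[i + w] - prefix[i]) / w for i in range(len(data) - w + 1)]
    return result
```

One caveat: on long series in single precision, differencing large prefix values loses accuracy, so accumulating in double precision is probably worth it at ~90k rows.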
The reason this design is so important to me is that I'll be using lots of different formulas, i.e. not just moving averages. They are all based on calculations over different moving window periods, though, so if I go down the for-loop route there will be many for loops in my code (one per formula), which I'm trying to avoid.
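If every formula is "some function of a window", the per-formula loops can collapse into one shared driver that takes the formula as a parameter. The sketch below is a naive CPU illustration of that structure, O(n * window) per pair. The names `rolling` and `rolling_all` are hypothetical, and a GPU version would swap the inner list comprehension for a vectorised call.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

WindowFormula = Callable[[Sequence[float]], float]


def rolling(formula: WindowFormula, data: Sequence[float], window: int) -> List[float]:
    """Apply `formula` to every trailing window of length `window` (naive)."""
    return [formula(data[i:i + window]) for i in range(len(data) - window + 1)]


def rolling_all(
    formulas: Dict[str, WindowFormula],
    data: Sequence[float],
    windows: Iterable[int],
) -> Dict[Tuple[str, int], List[float]]:
    """One driver for every (formula, window) pair, keyed by (name, window)."""
    windows = list(windows)  # allow any iterable, reused once per formula
    return {(name, w): rolling(f, data, w) for name, f in formulas.items() for w in windows}
```

Adding a new formula then means adding one entry to the `formulas` dict (e.g. `{"mean": lambda xs: sum(xs) / len(xs), "max": max}`) rather than writing another loop.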
Does ArrayFire support asynchronous evaluation? E.g. if I passed a semaphore to .eval(), it could increment it and return immediately (executing in the background); I could then keep piling more work onto it and wait for all the results at the end with something like .sync(semaphore).
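The "queue work now, wait for everything at the end" pattern being asked about looks roughly like this with Python's standard `concurrent.futures`. This is not ArrayFire's API. The futures play the role of the proposed semaphore, and `wait_all` plays the role of the final sync. On the ArrayFire side, there is an `af::sync()` call that blocks until the device has finished its queued work, but whether that fits this use case is worth checking in its docs. Also note that Python threads won't speed up CPU-bound pure-Python work because of the GIL; the sketch only shows the control flow. `submit_all` and `wait_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def moving_average(data, window):
    """Trailing moving average using a running sum (O(n) per window)."""
    out, running = [], 0.0
    for i, x in enumerate(data):
        running += x
        if i >= window:
            running -= data[i - window]  # drop the element leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def submit_all(data, windows, max_workers=4):
    """Queue one job per window and return immediately with the futures."""
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {w: pool.submit(moving_average, data, w) for w in windows}
    pool.shutdown(wait=False)  # accept no new jobs; already-queued jobs still run
    return futures


def wait_all(futures):
    """Block until every queued job has finished, like a final sync."""
    return {w: f.result() for w, f in futures.items()}
```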
I guess I could write a caching class that lazily evaluates whichever formula/window combination I request in the code and keeps the result in host memory. I could also precalculate the entire space of formula/window combinations and read those numbers from disk when I load the data. It will be a big cache, but I can do that on EC2.
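A minimal sketch of the lazy caching idea, assuming each formula is a function `(data, window) -> list of values` and the data never changes during the run. `WindowCache` and `sma` are hypothetical names, and `sma` is a naive stand-in for whatever GPU call actually computes a result.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Formula = Callable[[Sequence[float], int], List[float]]


def sma(data: Sequence[float], window: int) -> List[float]:
    """Naive trailing simple moving average, used here as an example formula."""
    return [sum(data[i:i + window]) / window for i in range(len(data) - window + 1)]


class WindowCache:
    """Lazily compute (formula, window) results over fixed data and keep them in host memory."""

    def __init__(self, data: Sequence[float], formulas: Dict[str, Formula]):
        self._data = data
        self._formulas = formulas
        self._cache: Dict[Tuple[str, int], List[float]] = {}
        self.misses = 0  # number of times a result actually had to be computed

    def get(self, name: str, window: int) -> List[float]:
        key = (name, window)
        if key not in self._cache:
            # First request for this pair: compute it once and remember it
            self.misses += 1
            self._cache[key] = self._formulas[name](self._data, window)
        return self._cache[key]
```

Because the keys are plain `(name, window)` tuples, the precompute-to-disk variant could reuse the same dict, e.g. by filling it for every pair up front and persisting it with `pickle`.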
What would be the best way to implement this multi-window computation?