Thanks to your help yesterday, I am now playing around with a big
DataMatrix holding daily returns for 22 foreign exchange series, about
2500 points each (since 3 Jan 2000).
For each currency pair I want to find the linear combination of the
other 21 pairs which "best" replicates it, at any point in history,
always using a 262-day historical sample (about 1 year).
Please can you explain how I get those coefficients out of the
pandas.stats.ols.MovingOLS function (and ideally the intercept too, in
case I decide to regress the actual series rather than their returns,
for RV purposes)? I don't see which method to use.
Can you also confirm that this is what the MovingOLS function does,
i.e. puts a rolling window onto the series and provides regression
stats for each window "snapshot"? (I am specifying
window_type='ROLLING' in the call.)
What does window_type 'EXPANDING' do?
Thanks.
Well, I apologize for the lack of documentation. That should change soon enough.
First off, I would use the ols function in pandas.stats.api for all
of these, so you'd do:
from pandas.stats.api import ols
model = ols(y=y, x=x, window_type='rolling', window=262, min_periods=100)
or something like that. It's going to compute statistics for a moving
window with each regression being labeled by the last period in the
window.
For coefficients: model.beta
If you include an intercept (the default), you can get its coefficient
with: model.beta['intercept']
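
To make the rolling-window idea concrete, here is a rough sketch of the same computation in plain NumPy. It is not the pandas implementation (later pandas releases dropped the pandas.stats module, so this may also be handy as a stand-in). The function name rolling_ols, the synthetic data, and the way min_periods is handled are my own assumptions for illustration.

```python
import numpy as np

def rolling_ols(y, X, window=262, min_periods=100):
    """Rolling OLS with an intercept (hypothetical helper, not a pandas API).

    Row t of the result holds the coefficients estimated on the window
    ending at observation t; rows with fewer than min_periods
    observations available are left as NaN. The last column is the
    intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    A = np.column_stack([X, np.ones(n)])    # regressors plus a constant
    betas = np.full((n, k + 1), np.nan)
    for end in range(n):
        start = max(0, end - window + 1)    # window covers start..end
        if end - start + 1 < min_periods:
            continue                        # not enough data yet
        coef = np.linalg.lstsq(A[start:end + 1], y[start:end + 1], rcond=None)[0]
        betas[end] = coef                   # labeled by last period in window
    return betas

# Synthetic stand-in for 22 daily return series (assumed data, not real FX).
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(600, 22))

target = 0                                  # column to replicate
y = returns[:, target]
X = np.delete(returns, target, axis=1)      # the other 21 series

betas = rolling_ols(y, X, window=262, min_periods=100)
slopes = betas[-1, :-1]                     # latest 21 slope coefficients
intercept = betas[-1, -1]                   # latest intercept
```

Looping target over all 22 columns would give the replication weights for every pair; the per-window lstsq call is simple rather than fast.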
There are lots of other model results, very similar to the
scikits.statsmodels results classes, except that each one has an extra
dimension because many regressions are being run (one per window).
window_type='expanding' grows the size of the estimation window as
more data is available.
For example, you might want to do:
model = ols(y=y, x=x, window_type='expanding', min_periods=262)
to start the window size at 262, and it will grow to the full sample
size at the end, so the last set of coefficients / results should
match as though you did a full sample estimation like:
model = ols(y=y, x=x)
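
As a sanity check on that claim, here is a small NumPy sketch of an expanding-window regression. Again this is only an illustration under my own assumptions (the helper name expanding_ols and the synthetic data are made up), not what pandas does internally.

```python
import numpy as np

def expanding_ols(y, X, min_periods=262):
    """Expanding-window OLS with an intercept (hypothetical helper).

    Row t holds the coefficients fitted on observations 0..t; rows before
    min_periods observations are available stay NaN. The last column is
    the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    A = np.column_stack([X, np.ones(n)])
    betas = np.full((n, k + 1), np.nan)
    for end in range(min_periods - 1, n):
        # The window always starts at the first observation and grows.
        betas[end] = np.linalg.lstsq(A[:end + 1], y[:end + 1], rcond=None)[0]
    return betas

# Synthetic data (assumed): three regressors, known slopes, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 + rng.normal(scale=0.05, size=400)

betas = expanding_ols(y, X, min_periods=262)

# Full-sample fit for comparison with the last expanding-window row.
full = np.linalg.lstsq(np.column_stack([X, np.ones(400)]), y, rcond=None)[0]
matches_full_sample = np.allclose(betas[-1], full)
```

The last row uses every observation, so it coincides with the full-sample estimate, which is the behaviour described above.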
The ols function is very useful: it decides which class (OLS,
MovingOLS, PanelOLS, MovingPanelOLS) to use based on the types (Series
/ TimeSeries / DataFrame, etc.) of your inputs.
Hope this helps,
Wes