The only problem right now is that it's slow. Prohibitively so. I did
most of the optimizations that I could think of, and it can do a 10,000-
observation ARMA(1,1) in 26 seconds with arbitrary starting values and
in about 17 seconds with good starting values with the BFGS optimizer.
There is one more simple optimization: it could stop doing the
recursions once there is no more "gain", but I'm not sure yet whether
that will be advantageous in the present usage.
The other thing I can do is provide the analytic gradient; however,
for this to make sense as a speed-up it needs to be computed within the
recursions themselves. This should reduce the calls to loglike, which
right now take almost 2 seconds each. But this means that the only
solver we can use (unless we do some minor rewrites and include the
solvers in statsmodels, which I'd rather avoid) is the limited-memory
BFGS, because it can take an objective function that also returns the
gradient.
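For reference, scipy's limited-memory BFGS already supports that calling convention: fmin_l_bfgs_b accepts a single function returning both the objective and its gradient. A minimal sketch with a toy quadratic standing in for the ARMA loglike (the actual loglike/gradient computation is not reproduced here):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy stand-in for the ARMA negative loglike: a quadratic whose gradient
# falls out of the same computation as the value, mimicking a gradient
# accumulated inside the Kalman recursions.
def negloglike_and_grad(params):
    resid = params - np.array([0.5, -0.3])
    value = 0.5 * np.dot(resid, resid)  # objective value
    grad = resid                        # gradient from the same pass
    return value, grad

# with approx_grad=False (the default), the returned gradient is used
xopt, fopt, info = fmin_l_bfgs_b(negloglike_and_grad, x0=np.zeros(2))
```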
I think the only real option is to move this into Fortran/C or Cython.
For just the ARMA case, it wouldn't be too hard. But to have a
general Kalman filter which the ARMA model (and others) rely on, we
would have to use linear algebra routines, so I guess that means using
something like Tokyo:
http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/
Anyone have any thoughts on this?
PS.
I just remembered: is there anything in scipy.signal to calculate a
first guess for the MA coefficients using a FIR filter? I have read that
the innovations algorithm of Brockwell and Davis is basically a FIR
filter, and if I can use Yule-Walker for the AR coefficients plus
something to get the MA coefficients to start the Kalman filter, it
will go much faster.
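For the AR side, the Yule-Walker first guess is just a Toeplitz solve on the sample autocovariances. A rough sketch (scipy.linalg.solve_toeplitz does the solve; the AR(1) simulation with phi = 0.6 is only there to exercise it):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_ar(y, p):
    """Rough AR(p) first guess from sample autocovariances (Yule-Walker)."""
    y = np.asarray(y) - np.mean(y)
    n = len(y)
    # sample autocovariances gamma_0, ..., gamma_p
    gamma = np.array([np.dot(y[:n - k], y[k:]) / n for k in range(p + 1)])
    # solve the Toeplitz system Gamma @ phi = gamma[1:]
    return solve_toeplitz(gamma[:-1], gamma[1:])

# exercise it on a simulated AR(1) with phi = 0.6 (illustrative values)
rng = np.random.RandomState(0)
e = rng.standard_normal(5000)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.6 * y[t - 1] + e[t]
phi_hat = yule_walker_ar(y, 1)
```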
I don't know Brockwell and Davis, but if you have a fast AR estimation,
then OLS on the residuals has been proposed for ARMA starting values.
That should be very fast, but I don't know how robust it is.
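Concretely, the procedure would look something like this: fit a long AR by OLS, take its residuals as proxies for the innovations, then regress y on its own lags and the lagged residuals (a Hannan-Rissanen-type sketch; the lag lengths and the simulated ARMA(1,1) parameters are illustrative, not tuned):

```python
import numpy as np
from scipy import signal

def arma_start(y, p, q, ar_big=10):
    """Rough ARMA(p, q) starting values: long-AR residuals, then OLS."""
    y = np.asarray(y) - np.mean(y)
    n = len(y)
    # step 1: long AR(ar_big) by OLS; its residuals proxy the innovations
    X = np.column_stack([y[ar_big - j:n - j] for j in range(1, ar_big + 1)])
    beta = np.linalg.lstsq(X, y[ar_big:], rcond=None)[0]
    e = np.zeros(n)
    e[ar_big:] = y[ar_big:] - X @ beta
    # step 2: OLS of y_t on its own lags and the lagged residuals
    m = ar_big + max(p, q)
    Z = np.column_stack([y[m - j:n - j] for j in range(1, p + 1)]
                        + [e[m - j:n - j] for j in range(1, q + 1)])
    coef = np.linalg.lstsq(Z, y[m:], rcond=None)[0]
    return coef[:p], coef[p:]

# exercise it on a simulated ARMA(1,1) with phi = 0.6, theta = 0.3
rng = np.random.RandomState(0)
eps = rng.standard_normal(5000)
ysim = signal.lfilter([1.0, 0.3], [1.0, -0.6], eps)
ar_hat, ma_hat = arma_start(ysim, 1, 1)
```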
I need to catch up with tsa again; I got stuck on pareto and genpareto
in the time I had available this week.
Josef
Thanks. Makes sense and gives a slightly better answer than an arbitrary zero.
Brockwell-Davis is also called the innovations algorithm, though as
far as I can tell, only their textbook covers it in much detail.
Skipper
It looks similar to the Levinson-Durbin recursion, but in terms of
innovations; Google Books didn't show the ARMA example, though.
I would put this on a lower priority wish list.
scipy.signal does not have any estimation methods, as far as I have
seen. The filter design part didn't look useful to me for tsa in
general.
On the speed of the Kalman filter:
My impression is that for MLE the Kalman filter does too many
calculations; it would be more useful for non-standard cases, e.g.
missing values. For the simple case, I don't see how the Kalman filter
could compete with a fast loop over the data to get the residuals,
since updating all the matrices in each period is wasted work when, for
MLE, mostly the entire sample is relevant.
I would think it's more useful for forecasting, filtering and
smoothing, and handling of missing values than for mle estimation.
Recursive OLS/GLS or other simpler online/recursive estimation would
also be very useful.
And for more complicated models, where it is useful to use kalman
filter for MLE (missing values, time-varying parameters,...), I don't
think a minute for getting the estimate is too bad.
Another possible advantage might be that it scales better for larger
datasets, e.g. more variables, or allows restarting in the middle of
the dataset (so that not all data is required at once).
I think the Kalman filter for univariate ARMA seems like overkill, but
recursive OLS/GLS for VAR would be very good to have (with
time-varying, depreciating coefficients).
Just some thoughts, but I never used Kalman filter much.
Josef
>
> Skipper
>
Exactly. Yeah, I've shelved it for now. OLS on the residuals seems
to do OK as far as speeding up convergence goes, though Stata, for
instance, seems to zero in on the parameters in the first iteration
somehow. That's why I thought to speed it up with better initial estimates.
Maybe talkbox has something in the lpc stuff that I've overlooked.
> scipy.signal does not have any estimation methods, as far as I have
> seen. The filter design part didn't look useful to me for tsa in
> general.
It's a shame. If I ever have more time, I'd like to try and connect
the dots between tsa and signal better (in my own mind at least).
>
> To the speed of Kalman Filter:
> My impression is that for MLE kalman filter does too many
> calculations, and would be more useful for non-standard cases, e.g.
> missing values. For the simple case, I don't see how kalman filter
> could compete with a fast loop over the data to get the residuals,
> updating all matrices in each period, when for MLE mostly the entire
> sample is relevant.
I'm not sure I see the distinction in what you're suggesting vs. the
KF. See below.
>
> I would think it's more useful for forecasting, filtering and
> smoothing, and handling of missing values than for mle estimation.
> Recursive OLS/GLS or other simpler online/recursive estimation would
> also be very useful.
>
How does recursive OLS differ from the linear Gaussian Kalman filter?
IIUC, that's basically what it is.
Is this what you mean:
http://en.wikipedia.org/wiki/Recursive_least_squares_filter ?
For the univariate case, we don't have to invert anything in the
recursions for the KF.
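To make that concrete, here is a minimal sketch of the univariate recursions for an ARMA(1,1) in a Harvey-style state space form; with a scalar observation, the innovation variance F is a scalar, so the update divides instead of inverting. (This is an illustrative sketch, not the actual implementation.)

```python
import numpy as np

def arma11_loglike(y, phi, theta, sigma2=1.0):
    """Exact Gaussian loglikelihood of an ARMA(1,1) via the Kalman filter
    in a Harvey-style state space form."""
    T = np.array([[phi, 1.0], [0.0, 0.0]])
    R = np.array([1.0, theta])
    Q = sigma2 * np.outer(R, R)          # state noise covariance R Var(e) R'
    a = np.zeros(2)                      # initial state mean
    # stationary initial covariance: vec(P) = (I - T kron T)^{-1} vec(Q)
    P = np.linalg.solve(np.eye(4) - np.kron(T, T), Q.ravel()).reshape(2, 2)
    ll = 0.0
    for yt in y:
        v = yt - a[0]                    # innovation (observation is state[0])
        F = P[0, 0]                      # scalar innovation variance
        K = T @ P[:, 0] / F              # gain: a division, not an inversion
        ll -= 0.5 * (np.log(2 * np.pi * F) + v * v / F)
        a = T @ a + K * v                # prediction-form updates
        P = T @ P @ T.T + Q - np.outer(K, K) * F
    return ll

# sanity check: with phi = theta = 0 this reduces to iid N(0, sigma2)
ll0 = arma11_loglike(np.array([0.5, -1.0, 0.2]), 0.0, 0.0, 1.0)
```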
> And for more complicated models, where it is useful to use kalman
> filter for MLE (missing values, time-varying parameters,...), I don't
> think a minute for getting the estimate is too bad.
> Another possible advantage might be if it scales better for larger
> datasets, e.g. more variables, or restarting in the middle of the
> dataset (so that not all data is required at once)
>
> I think, Kalman filter for univariate ARMA seems overkill, but
> recursive OLS/GLS for VAR would be very good to have. (with time
> varying, depreciating coefficients)
>
I started with the univariate example (the recommendation is to recast
all multivariate estimation as univariate for the Kalman filter) to
get the pieces in order. Having a fast general Kalman filter would
cover *a lot* of statistical models. Given a general Kalman filter,
it's almost trivial to add new models once you get the initial
conditions and system matrices right, and those are mostly in the
literature.
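As a sketch of what getting the system matrices right means, here is one common (Harvey-style) convention for ARMA(p, q); other equivalent state space forms exist:

```python
import numpy as np

def arma_system_matrices(ar, ma):
    """System matrices for ARMA(p, q) in a Harvey-style form (one common
    convention; others exist): y_t = Z @ alpha_t, and
    alpha_{t+1} = T @ alpha_t + R * e_{t+1}."""
    p, q = len(ar), len(ma)
    m = max(p, q + 1)                    # state dimension
    T = np.zeros((m, m))
    T[:p, 0] = ar                        # AR coefficients down the first column
    T[:-1, 1:] = np.eye(m - 1)           # shifted identity above the diagonal
    R = np.zeros(m)
    R[0] = 1.0
    R[1:q + 1] = ma                      # MA coefficients load the shocks
    Z = np.zeros(m)
    Z[0] = 1.0                           # observe the first state element
    return Z, T, R

# e.g. ARMA(1,1) with phi = 0.6, theta = 0.3 (illustrative values)
Z, T, R = arma_system_matrices([0.6], [0.3])
```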
It looks like most (all?) stats packages use the Kalman filter for
exact-likelihood ARIMA and AR, and it's most often the default. Not
sure yet about G/ARCH. R, gretl, and Stata are the ones I'm working
with right now. And they estimate the models in a matter of seconds,
maybe less than one.
I took out all the Python indexing and sped it up a little more. This
morning I also wrote a standalone example of the ARIMA Kalman filter
using Cython, but I didn't get much of a speed-up at all: 0.45 seconds
vs. 0.41 seconds for calculating the likelihood (which only becomes a
problem when the solver has to call this function 20-30 times to get
objective function and gradient evaluations). Either I've overlooked
something in the Cython code, or all of the time is spent in np.dot
Python overhead (which profiling suggests is the case). I haven't
gone down the Tokyo road yet, because I doubt we want to introduce
this dependency. Maybe I will post an example later to cython-user
and see if I am overlooking anything significant.
Right now, I am going to finish the tests for AR, ARMA, then work on
optimizing the Kalman filter, so we can at least push this stuff and
the TSA utility functions into the main code base and make notes in
the docs.
Skipper
Optimization in one iteration doesn't sound much like non-linear
optimization to me. Do they have a magic device for the initial
guess/estimate?
>
>> scipy.signal does not have any estimation methods, as far as I have
>> seen. The filter design part didn't look useful to me for tsa in
>> general.
>
> It's a shame. If I ever have more time, I'd like to try and connect
> the dots between tsa and signal better (in my own mind at least).
>
>>
>> To the speed of Kalman Filter:
>> My impression is that for MLE kalman filter does too many
>> calculations, and would be more useful for non-standard cases, e.g.
>> missing values. For the simple case, I don't see how kalman filter
>> could compete with a fast loop over the data to get the residuals,
>> updating all matrices in each period, when for MLE mostly the entire
>> sample is relevant.
>
> I'm not sure I see the distinction in what you're suggesting vs. the
> KF. See below.
One call to signal.lfilter calculates the residuals, but only for the
case conditional on the initial conditions.
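For example (with illustrative parameter values), one lfilter call inverts the ARMA filter, conditional on zero initial conditions:

```python
import numpy as np
from scipy import signal

# simulate an ARMA(1,1) with phi = 0.6, theta = 0.3 (illustrative values)
rng = np.random.RandomState(0)
e = rng.standard_normal(1000)
y = signal.lfilter([1.0, 0.3], [1.0, -0.6], e)   # theta(L)/phi(L) applied to e

# one call recovers the residuals, conditional on zero initial conditions:
# e_t = y_t - 0.6*y_{t-1} - 0.3*e_{t-1}, i.e. filter y by phi(L)/theta(L)
resid = signal.lfilter([1.0, -0.6], [1.0, 0.3], y)
```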
>
>>
>> I would think it's more useful for forecasting, filtering and
>> smoothing, and handling of missing values than for mle estimation.
>> Recursive OLS/GLS or other simpler online/recursive estimation would
>> also be very useful.
>>
>
> How does recursive OLS differ from the linear Gaussian Kalman filter?
> IIUC, that's basically what it is.
>
> Is this what you mean:
> http://en.wikipedia.org/wiki/Recursive_least_squares_filter ?
Yes, I think that's what I meant: a linear model, e.g. AR(X) or
VAR(X), can be estimated recursively (in one loop through the data),
while non-linear estimation (e.g. ARMA) requires optimization, with a
loop for each evaluation of the likelihood, as you have right now for
MLE.
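A sketch of what I mean by recursive estimation for the linear case: recursive least squares updates the coefficients and (X'X)^{-1} one observation at a time via Sherman-Morrison, in a single pass through the data (illustrative, not optimized):

```python
import numpy as np

def recursive_ols(X, y):
    """Recursive least squares: one pass through the data, updating the
    coefficients and (X'X)^{-1} per observation via Sherman-Morrison."""
    n, k = X.shape
    # initialize from the first k observations with a direct solve
    Pinv = np.linalg.inv(X[:k].T @ X[:k])
    beta = Pinv @ X[:k].T @ y[:k]
    for t in range(k, n):
        x = X[t]
        Px = Pinv @ x
        gain = Px / (1.0 + x @ Px)
        beta = beta + gain * (y[t] - x @ beta)   # coefficient update
        Pinv = Pinv - np.outer(gain, Px)         # Sherman-Morrison update
    return beta

# on simulated data the one-pass result matches full-sample OLS
rng = np.random.RandomState(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(200)
beta_rls = recursive_ols(X, y)
```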
>
> For the univariate case, we don't have to invert anything in the
> recursions for the KF.
>
>> And for more complicated models, where it is useful to use kalman
>> filter for MLE (missing values, time-varying parameters,...), I don't
>> think a minute for getting the estimate is too bad.
>> Another possible advantage might be if it scales better for larger
>> datasets, e.g. more variables, or restarting in the middle of the
>> dataset (so that not all data is required at once)
>>
>> I think, Kalman filter for univariate ARMA seems overkill, but
>> recursive OLS/GLS for VAR would be very good to have. (with time
>> varying, depreciating coefficients)
>>
>
> I started with the univariate example (the recommendation is to recast
> all multivariate estimation as univariate for the Kalman filter) to
> get the pieces in order. Having a fast general kalman filter would
> cover *a lot* of statistical models. Given a general Kalman filter,
> it's almost trivial to add new models once you get the initial
> conditions and system matrices right, which are mostly in the
> literature.
I think I have run into Kalman filters recently only for just-in-time
forecasting or nowcasting with missing observations or
non-synchronized observation times. I will have to look at this when
we get to it.
(I have seen several examples of particle filter estimation for
non-linear models, mostly in a Bayesian context.)
>
> It looks like most (all?) stats packages use the Kalman filter for
> exact likelihood ARIMA and AR and it's most often the default. Not
> sure yet, about G/ARCH. R, gretl, stata are the ones I'm working with
> right now. And they estimate the models in a matter of seconds, maybe
> less than 1.
I'm surprised they are so fast if they use the Kalman filter. However,
there are several Fortran or C packages that do the Kalman filter that
they might be using. Which ARMA implementation in R are you looking
at? I think there are several.
(For heavy-duty work, there is also X12 from the US government, but I
have only seen a command-line interface.)
>
> I took out all the Python indexing and sped it up a little more. I
> also wrote this morning a standalone example of the ARIMA kalman
> filter using Cython, but I didn't get much of a speed-up at all: 0.45
> seconds vs. 0.41 seconds for calculating the likelihood (which only becomes a
> problem when the solver has to call this function 20-30 times to get
> objective function and gradient evaluations). Either, I've overlooked
> something in the Cython code, or all of the time is spent in np.dot
> Python overhead (which profiling suggests is the case). I haven't
> gone down the Tokyo road yet, because I doubt we want to introduce
> this dependency. Maybe I will post an example later to cython-user
> and see if I am overlooking anything significant.
Many small calls to dot or other linear algebra functions kill the
performance in Cython as well; this was the main conclusion of the KF
discussion on the Cython mailing list.
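A quick illustration of that overhead (the numbers are machine-dependent, and the Kalman recursion is sequential, so batching is not a drop-in fix here; this only isolates the per-call cost):

```python
import numpy as np
import timeit

A = np.eye(2) * 0.5
vs = np.random.RandomState(0).standard_normal((10000, 2))

def many_small_dots():
    # one tiny matrix-vector product per "period", as in the KF recursion
    return [A @ v for v in vs]

def one_batched_dot():
    # the same arithmetic done as a single BLAS call
    return vs @ A.T

t_small = timeit.timeit(many_small_dots, number=10)
t_batch = timeit.timeit(one_batched_dot, number=10)
```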
>
> Right now, I am going to finish the tests for AR, ARMA, then work on
> optimizing the Kalman filter, so we can at least push this stuff and
> the TSA utility functions into the main code base and make notes in
> the docs.
good,
Cheers,
Josef
>
> Skipper
>
Magic is the only explanation I've come up with. Sample output from Stata
(setting optimization to BHHH):
Iteration 0: log likelihood = -1274.425
Iteration 1: log likelihood = -1274.3513
Iteration 2: log likelihood = -1274.3209
Iteration 3: log likelihood = -1274.3197
Iteration 4: log likelihood = -1274.3137
(switching optimization to BFGS)
Iteration 5: log likelihood = -1274.3135
Iteration 6: log likelihood = -1274.3116
Iteration 7: log likelihood = -1274.3113
Iteration 8: log likelihood = -1274.3113
Iteration 9: log likelihood = -1274.3113
And then it's converged. So it's definitely a pretty good first guess.
>>
>>> scipy.signal does not have any estimation methods, as far as I have
>>> seen. The filter design part didn't look useful to me for tsa in
>>> general.
>>
>> It's a shame. If I ever have more time, I'd like to try and connect
>> the dots between tsa and signal better (in my own mind at least).
>>
>>>
>>> To the speed of Kalman Filter:
>>> My impression is that for MLE kalman filter does too many
>>> calculations, and would be more useful for non-standard cases, e.g.
>>> missing values. For the simple case, I don't see how kalman filter
>>> could compete with a fast loop over the data to get the residuals,
>>> updating all matrices in each period, when for MLE mostly the entire
>>> sample is relevant.
>>
>> I'm not sure I see the distinction in what you're suggesting vs. the
>> KF. See below.
>
> One call to signal.lfilter calculates the residuals, but that's for
> the case conditional on the initial conditions.
>
Ah, ok.
A brief look at Wikipedia, and the particle filter definitely looks
interesting, especially since I've seen criticisms of the
extended/augmented Kalman filter for statistical time series models.
>
>>
>> It looks like most (all?) stats packages use the Kalman filter for
>> exact likelihood ARIMA and AR and it's most often the default. Not
>> sure yet, about G/ARCH. R, gretl, stata are the ones I'm working with
>> right now. And they estimate the models in a matter of seconds, maybe
>> less than 1.
>
> I'm surprised they are so fast if they use Kalman filter. However
> there are several fortran or C packages that do Kalman filter, that
> they might be using. Which arma implementation in R are you looking
> at, I think there are several?
>
arima or arima0 (same if no missing observations)
http://stat.ethz.ch/R-manual/R-devel/library/stats/html/arima.html
> (For heavy duty work, there is also X12 from the US government, but I
> have seen only a commandline interface.)
>
Yeah, I contacted the author of it a few weeks ago, and he said the
code is public domain if we want to wrap just the estimators with f2py
or fwrap, depending on how it can be done. It's all in Fortran (77?).
gretl provides an interface to it for their ARMA, if it's already installed.
>>
>> I took out all the Python indexing and sped it up a little more. I
>> also wrote this morning a standalone example of the ARIMA kalman
>> filter using Cython, but I didn't get much of a speed-up at all: 0.45
>> seconds vs. 0.41 seconds for calculating the likelihood (which only becomes a
>> problem when the solver has to call this function 20-30 times to get
>> objective function and gradient evaluations). Either, I've overlooked
>> something in the Cython code, or all of the time is spent in np.dot
>> Python overhead (which profiling suggests is the case). I haven't
>> gone down the Tokyo road yet, because I doubt we want to introduce
>> this dependency. Maybe I will post an example later to cython-user
>> and see if I am overlooking anything significant.
>
> Many small calls to dot or other linear algebra functions kill the
> performance also in Cython, this was the main conclusion for the KF
> discussion on the cython mailing list.
>
Ah, yeah, then that's definitely what's going on. Tokyo may be the
only option then for Cython. It comes with its own setup.py and is
pretty small, so it should be easy to include as a submodule if it's
worth it. I will get a benchmark after I finish the tests, then see
how many installation issues it might raise.
Is this for ARMA(1,1)? I would use an ARMA(2,2) as a benchmark, which
is more difficult to estimate. I don't know about ARMA(1,1), but for
GARCH(1,1) there exist analytical estimates that are pretty good and
can work as initial estimates for MLE.
this one
http://www.statsoft.com/textbook/time-series-analysis/?button=3#parameter
recommends AS 191:
http://www.jstor.org/stable/2347301?origin=JSTOR-artinfo
for fast rough estimation of (S)ARMA, and for an initial estimate.
I can't find the files for X12 that I downloaded a while ago, but from
what I remember it is a large number of Fortran files, and I think it
is only worth spending time on if there is a sponsor (wrapping it
looks like quite a lot of work, and not as much fun or as instructive
as writing your own implementation).
Josef
Yeah, it was ARMA(1,1). Though for ARMA(2,2):
(setting optimization to BHHH)
Iteration 0: log likelihood = -14879.27
Iteration 1: log likelihood = -14878.37
Iteration 2: log likelihood = -14878.368
Iteration 3: log likelihood = -14878.368
Converged. It is a bit slower, though.
> this one
>
> http://www.statsoft.com/textbook/time-series-analysis/?button=3#parameter
>
> recommends AS 191 :
>
> http://www.jstor.org/stable/2347301?origin=JSTOR-artinfo
>
> for fast rough estimation of (S)ARMA, and for an initial estimate.
>
Thanks. I had mostly been using the paper for their (3), but I will
have a look at the other one for a first approximation.