Hi,
This should be in the most recent (.3) release (candidate), but it's
not the default for the build. You must install from source, and do
python setup.py build --with-cython
Then
python setup.py install
Please let us know if you don't see a significant speed-up. If the
model is correctly specified, you should see much faster
convergence. In future releases, the --with-cython flag won't be
necessary.
Skipper
The code is (supposed to be) included in the distribution, but
statsmodels needs to be built with --with-cython as a command line
argument. (Cython needs to be installed and a C compiler needs to be
available.)
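A rough way to check those two prerequisites before building is sketched below. The function name check_build_prereqs and the list of compiler executable names are assumptions made for illustration, and looking for a compiler on PATH is only a heuristic, since the build may locate one by other means.

```python
import importlib.util
import shutil


def check_build_prereqs(compilers=("cc", "gcc", "clang", "cl")):
    """Report whether Cython is importable and which compilers are on PATH.

    The compiler names are illustrative guesses, not an exhaustive list.
    """
    # find_spec returns None when the package cannot be found.
    has_cython = importlib.util.find_spec("Cython") is not None
    # shutil.which returns None for executables that are not on PATH.
    found = [name for name in compilers if shutil.which(name)]
    return {"cython": has_cython, "compilers": found}


if __name__ == "__main__":
    print(check_build_prereqs())
```

If "cython" is False or "compilers" is empty, the --with-cython build is unlikely to succeed.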
I need to find the documentation for this since I haven't used it
myself. And I think we will need to add it more prominently.
Josef
>
> Thanks,
> Robert
I don't find it anywhere in the documentation. We should add these
instructions to the front page.
Is there a specific Cython version required, or should we include the
generated C code?
Josef
Ah, ok. Thanks for the report.
Skipper
I think we should include both, though strictly speaking I don't think
the .pyx file is needed.
Skipper
Miscommunication here.
Since I never used it and didn't look carefully enough, I didn't
realize that setup.py requires that we ship the .c file.
Either we need to generate the .c file for the source distribution, or
require Cython to be installed and use the .pyx file in setup.py.
Josef
It's mentioned in CHANGES, but should probably be in the INSTALL notes as well.
Skipper
Sorry, I'm talking too fast: the .c file is just missing from the manifest.
The file is in the source repository:
http://bazaar.launchpad.net/~scipystats/statsmodels/devel/files/head:/scikits/statsmodels/tsa/kalmanf/
I think you should be able to just download the .c file and put it into
the matching tsa/kalmanf directory of your statsmodels source.
Thanks for reporting. We will get a new release out, and I will check
more carefully that the source distribution has all the files.
Josef
Yes, I think we just need to add *.pyx and *.c to the global-include
line of MANIFEST.in.
Josef
Adding *.py, *.pyx and *.c to MANIFEST.in takes care of this and also the
missing examples directory that Wes reported:
global-include *.csv *.py *.txt *.pyx *.c
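To check that a built source distribution really contains the files the manifest is meant to pick up, something like the following could be run against the tarball. This is a minimal sketch: the function name missing_extensions and the default list of required extensions are assumptions for illustration, not part of the statsmodels release tooling.

```python
import tarfile


def missing_extensions(sdist_path, required=(".pyx", ".c")):
    """Return the required file extensions that no file in the tarball has."""
    with tarfile.open(sdist_path) as tar:
        # Only regular files count; directories are ignored.
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return [ext for ext in required
            if not any(name.endswith(ext) for name in names)]
```

An empty list means every required extension shows up at least once. It does not prove that each individual file was shipped.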
Josef
A question for Cython experts:
The same .c file should be good on all Python 2.5, 2.6 and 2.7 versions,
but for Python 3.2 we have to rebuild the C source file from the .pyx.
Is that correct?
Josef
Thanks Ralf, one worry less
Josef
>
> Cheers,
> Ralf
>
Briefly (in the middle of studying), the biggest saving comes from
taking advantage of the steady state in the (time-invariant) Kalman
filter. This means we can skip calling numpy.dot (a lot). I found
convergence to the steady state to happen in ~10-15 iterations on
average, rather than re-estimating P nobs times at each candidate
for params.
See Durbin and Koopman's book for the details.
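The idea can be sketched in plain NumPy as follows. This is an illustration only, not the statsmodels Cython code: the function name kalman_loglike, its arguments, and the tolerance are assumptions, the observation is taken to be univariate, and the state is assumed stationary so that the initial covariance can be obtained from the discrete Lyapunov equation.

```python
import numpy as np


def kalman_loglike(y, T, Z, RQR, H=0.0, tol=1e-10):
    """Gaussian log-likelihood of a time-invariant state-space model.

    y_t = Z a_t + eps_t,  eps_t ~ N(0, H)
    a_{t+1} = T a_t + eta_t,  eta_t ~ N(0, RQR)

    Returns (loglike, steady_at), where steady_at is the time index at
    which the covariance recursion was found to have converged (or None).
    """
    y = np.asarray(y, dtype=float)
    m = T.shape[0]
    a = np.zeros(m)
    # Unconditional state covariance: solve P = T P T' + RQR.
    P = np.linalg.solve(np.eye(m * m) - np.kron(T, T),
                        RQR.ravel()).reshape(m, m)
    steady_at = None
    loglike = 0.0
    for t, yt in enumerate(y):
        v = yt - Z @ a                      # one-step prediction error
        if steady_at is None:
            F = Z @ P @ Z + H               # prediction error variance
            K = T @ P @ Z / F               # Kalman gain
            P_next = T @ P @ T.T + RQR - F * np.outer(K, K)
            if np.max(np.abs(P_next - P)) < tol:
                steady_at = t               # P, F, K are now frozen
            P = P_next
        # After convergence only these two cheap updates remain.
        a = T @ a + K * v
        loglike += -0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
    return loglike, steady_at
```

Once steady_at is set, the loop no longer touches P, F or K, so each remaining observation costs one matrix-vector product instead of several matrix products.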
> Also would you happen to know why R's implementation of arima is so
> dang fast? Are they just not using the kalman filter approach?
> Unfortunately their code isn't even remotely documented.
>
Not in detail, no. I would have to look at the docs again or browse the
code (arima vs arima0, whether it's actually exact likelihood, how
they determine starting parameters, etc.), but it shouldn't be *that
much* faster than ours with the optimizations (and set disp < 0 and
pick a good value for m in the optimizer). My guess is that it's all
done in C: the system matrix is set up, and then all calls to the
optimizer and the resulting loops. There also might be savings in the
approximation to the gradient/Hessian, and it probably uses a
different optimization algorithm (maybe, but it's probably full BFGS).
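For reference, m is the number of stored corrections in SciPy's L-BFGS-B routine. The sketch below shows where it is passed, using a toy normal log-likelihood rather than the ARMA likelihood. The function names, the starting values and the choice m=12 are assumptions for illustration, and the output-control arguments are left at their defaults.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def neg_loglike(params, y):
    """Negative Gaussian log-likelihood in (mu, log sigma)."""
    mu, log_sigma = params
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)


def fit_normal(y, m=12):
    """Fit mean and log standard deviation with L-BFGS-B.

    approx_grad=True asks SciPy for a finite-difference gradient;
    m sets how many past corrections the limited-memory update keeps.
    """
    y = np.asarray(y, dtype=float)
    x0 = np.array([0.0, 0.0])
    xopt, fval, info = fmin_l_bfgs_b(neg_loglike, x0, args=(y,),
                                     approx_grad=True, m=m)
    return xopt, fval, info
```

info["funcalls"] reports the number of function evaluations, which is a simple way to compare different values of m on a given problem.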
There are definitely a lot more optimizations that could be done to
speed this up (Fernando has provided a C drop-in for dot that I
haven't tried out yet, making sure that all copies are avoided...),
but I just haven't found the time and I am reasonably happy with the
speed here vs. say Stata. If you see any improvements, please let us
know. The X-11-arima code would be interesting to go through. It's
public domain, written in Fortran I believe, and very fast (though
limited in the number of observations it can handle).
Skipper