The discussion on stationarity should link to the adf unit root test.
I still think adfuller is a weird name, can we switch back to
unitroot_adf? It's more explanatory, and "ad" are not the initials
of Fuller's first and last name, are they?
I'm thinking of reorganizing the main tsa page a bit, add VAR to the
list in estimation, and add the list of var classes also under the VAR
header, instead of just at the bottom of the VAR doc page.
I will also drop arma_mle from the docs for the release.
Josef
(a)ugmented (D)ickey-(F)uller, but yeah, change it to unitroot_adf (or
unitroot.adf?)
Yeah, I don't think it's right yet. Will have a look.
Skipper
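Since the name comes up a few times: the test behind adfuller boils down to an auxiliary regression on the lagged level. A minimal numpy-only sketch (not the statsmodels implementation; one fixed lag, and Fuller's critical values are omitted):

```python
import numpy as np

def adf_tstat(y):
    """t-statistic on the lagged level in the ADF regression
    diff(y)_t = c + rho*y_{t-1} + b*diff(y)_{t-1} + e_t  (one lag).
    The statistic is compared against Dickey-Fuller critical values,
    not the normal distribution (tables omitted here)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Y = dy[1:]                                   # dependent variable
    X = np.column_stack([np.ones(len(Y)),        # constant
                         y[1:-1],                # lagged level y_{t-1}
                         dy[:-1]])               # one lagged difference
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta
    sigma2 = resid @ resid / (len(Y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.RandomState(0)
walk = np.cumsum(rng.standard_normal(500))  # unit root: t-stat near zero
print(adf_tstat(walk))
```

For a pure random walk the statistic stays small in magnitude; for stationary data it is strongly negative.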
---------- Forwarded message ----------
From: <josef...@gmail.com>
Date: Fri, Mar 4, 2011 at 1:09 PM
Subject: Re: [pystatsmodels] devel notes
To: Skipper Seabold <jsse...@gmail.com>
On Fri, Mar 4, 2011 at 12:49 PM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Fri, Mar 4, 2011 at 12:33 PM, <josef...@gmail.com> wrote:
>> the VAR docs look very nice
>> One question: The initial definition has the full contemporaneous
>> covariance matrix. Are there some identifying restrictions in the
>> estimation, or is identification/orthogonalization only relevant for
>> the impulse response function?
>>
>> The discussion on stationarity should link to the adf unit root test.
>> I still think adfuller is a weird name, can we switch back to
>> unitroot_adf? It's more explanatory, and "ad" are not the initials
>> of Fuller's first and last name, are they?
>
> (a)ugmented (D)ickey-(F)uller, but yeah, change it to unitroot_adf (or
> unitroot.adf?)
I still need to clean this up, it doesn't look very professional:
http://statsmodels.sourceforge.net/devel/stats.html
In scipy.stats I found it confusing what all the "named" tests are
doing. That's why I prefer a name that indicates what kind of test it
is.
I'm leaving the classes that implement the tests in the docs, but we
should (eventually) roughly agree on a pattern for tests that produce
more results than the test statistic and p-value, as we briefly
mentioned before.
Josef
If it's not easy or quick to fix, then we could drop it for the
release. I have my notes for the fft implementation somewhere in my
piles of paper. (3 lines of code that take hours to figure out and
test.)
Josef
>
> Skipper
>
pergram would also sound better as periodogram (this will get a fft
Yeah, this changed a few weeks ago and it makes mailing lists
particularly difficult (especially since I'm lazy and just reply-all).
Skipper
Well, it gives sensible answers at least, but I need to look more at
the details to see what it's doing, the smoothing in particular, as
it doesn't agree with nitime, R, or Stata.
import scikits.statsmodels.api as sm
data = sm.datasets.sunspots.load()
pgram = sm.tsa.pergram(data.endog)
print "%2.4f year cycle" % (1./(pgram.argmax()/float(len(data.endog))))
#11.0357 year cycle
My impression is that your reply-all is creating the problem. I
checked my emails, and except for the last few with you, all my
messages were sent only to the mailing lists.
I think when gmail sees two identical messages, one personal and one
through the mailing list, then ``reply`` answers the personal message
by default and not the mailing list. At least that's how it looks
here.
Josef
>
> Skipper
>
nitime and matplotlib use different implementations, I don't think any
of them should agree. And my impression is that the matplotlib
implementation doesn't work well with macro data, because the series
are too short and irregular.
I think from the comparison with the theoretical spectrum of an ARMA
process, your pergram also looked ok. You could just change the notes
and explain a bit. Since it's mostly for graphical analysis, we don't
need to have an exactly matching version, and since there are so many
variations on calculating the periodogram (and smoothing) it might be
difficult to figure out what exactly R or Stata are doing.
Josef
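To make one of those variations concrete, here is a sketch of a raw periodogram plus a simple Daniell (flat moving-average) smoother; an illustration only, not what R, Stata, or nitime actually compute:

```python
import numpy as np

def raw_periodogram(x):
    # raw periodogram: squared fft magnitude, scaled by 1/n
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.fft(x))**2 / len(x)

def daniell_smooth(pgram, m=2):
    # Daniell window: flat average over 2*m+1 neighboring frequencies,
    # wrapping around since the periodogram is periodic in frequency
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    padded = np.concatenate([pgram[-m:], pgram, pgram[:m]])
    return np.convolve(padded, kernel, mode="valid")

x = np.sin(2 * np.pi * np.arange(256) / 16)  # one cycle every 16 samples
smoothed = daniell_smooth(raw_periodogram(x))
# the flat kernel spreads the peak over a few bins around index 256/16 = 16
print(smoothed.argmax())
```

Varying the kernel (triangular, repeated Daniell, tapering the data first) gives the different answers the packages disagree on.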
This reorganization makes sense-- then the class reference will be on
the main tsa page, sounds good to me.
For the record, Stata does this for the raw periodogram, though I
don't understand their log-standardization yet:
import numpy as np
import scikits.statsmodels.api as sm
x = sm.datasets.sunspots.load().endog
pgram = np.abs(len(x)*np.fft.ifft(x))**2
pgram[0] = 0
Skipper
I'm not sure why they do this either, is there a strong stationarity
assumption behind this? I think there were some (fractional) unit root
tests for the periodogram at zero frequency.
Josef
>>
>> Skipper
>>
>
Back to list, no more reply all for me.
Yeah, I'm looking at the results now; Stata and their docs use the
ifft (numpy's ifft convention), but I see the fft in the textbooks
(Brockwell and Davis right now). The implied frequency is the same
though. More digging...
> what I haven't tried yet is to multiply the fft with a window, and
> whether the numpy windows are the standard windows for tsa.
> I don't know about the log scaling.
>
> Josef
>
>
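One thing that can be checked directly: for real input, n*ifft(x) is just the complex conjugate of fft(x), so the two conventions give identical raw periodogram magnitudes:

```python
import numpy as np

x = np.random.RandomState(0).standard_normal(64)
n = len(x)

# For real x, n*ifft(x) equals the complex conjugate of fft(x),
# so the squared magnitudes (the raw periodogram) are identical.
p_fft = np.abs(np.fft.fft(x))**2
p_ifft = np.abs(n * np.fft.ifft(x))**2
print(np.allclose(p_fft, p_ifft))  # True
```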
It doesn't look like anyone includes the periodogram at 0. I.e.,
Brockwell and Davis give the following for a subset of the sunspots
data:
import numpy as np
import scikits.statsmodels.api as sm
ss = sm.datasets.sunspots.load().endog[70:170]
pgram = 1./len(ss)*np.abs(np.fft.fft(ss))**2
Then plot
pgram[1:]/(2*np.pi)
Then they plot a smoothed version.
Skipper
This is also what octave's (and presumably matlab's) periodogram
outputs, so I'm going to go with it. They also set pgram[0] ~ 0.
Fine, if they all do it.
sometimes in some implementations it's not very clear why they are
doing things. I just read recently the commit message in pysal about
not matching R and Stata in a test, I think it was Breusch-Pagan.
pysal has now also a nice diagnostics module, with some tests that we
don't have yet.
I just realized I still have two versions of Het GoldfeldQuandt, one
with y**2, one matching R which drops the squaring of y; that is not
what I was initially reading and implemented. I thought I had switched
to matching R for all versions.
Browsing the online docs, I just realized again that the docs link to
the source code, since I updated Sphinx. Do we want to drop this
again? I'm ok either way.
Josef
Plenty of enhancements for later.
> sometimes in some implementations it's not very clear why they are
> doing things. I just read recently the commit message in pysal about
> not matching R and Stata in a test, I think it was Breusch-Pagan.
> pysal has now also a nice diagnostics module, with some tests that we
> don't have yet.
>
> I just realized I still have two versions of Het GoldfeldQuandt, one
> with y**2, one matching R which drops the squaring of y; that is not
> what I was initially reading and implemented. I thought I had switched
> to matching R for all versions.
>
> Browsing the online docs, I just realized again that the docs link to
> the source code, since I updated Sphinx. Do we want to drop this
> again? I'm ok either way.
>
Not sure what you mean. The "show source" link for the .rst?
Skipper
(reply-all? I'm changing now To: by hand)
No, on the right side of the doc pages there is a link to the source code
I found 2 missing summary lines, I have a pdf file now, 390 pages,
pdflatex still stops, but finishes after typing "q"
Josef
>
> Skipper
>
Old habits.
> No, on the right side of the doc pages there is a link to the source code
>
> http://statsmodels.sourceforge.net/devel/_modules/scikits/statsmodels/sandbox/stats/multicomp.html#GroupsStats
>
I kind of like that. Never noticed it before. Maybe easier to browse
than launchpad.
> I found 2 missing summary lines, I have a pdf file now, 390 pages,
> pdflatex still stops, but finishes after typing "q"
>
Good. I will see if I can get these updating to sourceforge as well.
Let me know when you push.
Skipper
It increases the size of the docs, and shows some of my messy code. If
we want, and figure out how to do it, we could also turn it off for
the release and turn it back on for the online docs.
It sure makes quickly checking the implementation much faster, a bit
like python source docs before sphinx.
>
>> I found 2 missing summary lines, I have a pdf file now, 390 pages,
>> pdflatex still stops, but finishes after typing "q"
>>
>
> Good. I will see if I can get these updating to sourceforge as well.
> Let me know when you push.
htmlhelp, chm, builds without problems and looks quite ok, very fast
and easy for browsing. It also contains the source files, but the link
is only at the docs of the classes not for the individual methods,
from what I have seen so far.
html docs are now at 13MB so we don't include them in the distribution
anymore, chm is 1.7MB
I will push later tonight, or tomorrow morning, I'm on a new branch
and without the latest merge from devel, so I still have some work
to do.
Building the pdf only required changes in two lines, but I would like
to get a pdflatex build that doesn't stop.
Josef
>
> Skipper
>
We just need to remove sphinx.ext.viewcode from the extensions in
conf.py to turn it off.
>
> It sure makes quickly checking the implementation much faster, a bit
> like python source docs before sphinx.
>
>>
>>> I found 2 missing summary lines, I have a pdf file now, 390 pages,
>>> pdflatex still stops, but finishes after typing "q"
>>>
>>
>> Good. I will see if I can get these updating to sourceforge as well.
>> Let me know when you push.
>
> htmlhelp, chm, builds without problems and looks quite ok, very fast
> and easy for browsing. It also contains the source files, but the link
> is only at the docs of the classes not for the individual methods,
> from what I have seen so far.
>
> html docs are now at 13MB so we don't include them in the distribution
> anymore, chm is 1.7MB
>
> I will push later tonight, or tomorrow morning, I'm on a new branch
> and without the latest merge from devel, so I still have some work
> to do.
> Building the pdf only required changes in two lines, but I would like
> to get a pdflatex build that doesn't stop.
>
Might be able to just add -interaction=nonstopmode to LATEXOPTS in
*/build/latex but mine have too many errors still for this to work.
Skipper
just found this http://statsmodels.sourceforge.net/devel/_modules/index.html
the list of source files
>
>>
>>> I found 2 missing summary lines, I have a pdf file now, 390 pages,
>>> pdflatex still stops, but finishes after typing "q"
>>>
>>
>> Good. I will see if I can get these updating to sourceforge as well.
>> Let me know when you push.
>
> htmlhelp, chm, builds without problems and looks quite ok, very fast
> and easy for browsing. It also contains the source files, but the link
> is only at the docs of the classes not for the individual methods,
> from what I have seen so far.
>
> html docs are now at 13MB so we don't include them in the distribution
> anymore, chm is 1.7MB
>
> I will push later tonight, or tomorrow morning, I'm on a new branch
> and without the latest merge from devel, so I still have some work
> to do.
> Building the pdf only required changes in two lines, but I would like
> to get a pdflatex build that doesn't stop.
no clue yet, debugging sphinx, rst and 41000 lines of produced latex
is not "trivial"
in linear_model/RegressionResults there is a problem in the transition
between the explicit and the autogenerated docstring that generates
incorrect latex.
Josef
>
> Josef
>
>
>>
>> Skipper
>>
>
won't help. I didn't have any problems running "sphinx latex". It
builds the latex but the latex has errors. I only get the problems
when calling pdflatex, and that is not included in the make files.
I didn't see any "sphinx pdf" in the make file
Josef
>
> Skipper
>
I got the first group of errors. It should be
\begin{longtable}{ll}
instead of
\begin{longtable}{LL}
I edited the latex directly; sphinx generates an incorrect capital L
for the longtable package (at least the version of longtable that I
have). I haven't looked at sphinx yet.
I still have some math errors that make the computer beep, and one
unicode error that stops pdflatex, and tons of warnings.
maybe tomorrow
Josef
>
> Josef
>
>>
>> Skipper
>>
>
It looks like there is a strange unicode "-" in var.rst, in the last
math line of
{{{
Forecasting
~~~~~~~~~~~
The linear predictor is the optimal h-step ahead forecast in terms of
mean-squared error:
.. math::
y_t(h) = \nu + A_1 y_t(h - 1) + \cdots + A_p y_t(h - p)
}}}
but I cannot "see" what it is, and the python IDLE shell doesn't like it
>>> s="""Forecasting
~~~~~~~~~~~
The linear predictor is the optimal h-step ahead forecast in terms of
mean-squared error:
.. math::
y_t(h) = \nu + A_1 y_t(h - 1) + \cdots + A_p y_t(h - p)"""
Unsupported characters in input
>>>
the beeping math is left, but that's most likely because I never know
how to use the math directive correctly. There are still lots of math
problems in my docstrings.
Josef
sphinx latex seems to have the right behavior, but I have no idea why
{|l|l|} does not show up in the generated latex file
elif self.table.longtable:
    self.body.append('{|' + ('l|' * self.table.colcount) + '}\n')
else:
    self.body.append('{|' + ('L|' * self.table.colcount) + '}\n')
If we want pdflatex to finish without stopping, then we might have to
also do some string replacement directly in the generated latex. Or
hitting "q" stays part of the pdf build process.
I haven't fixed the other latex problems yet.
Josef
Are all of your doc changes in statsmodels-josef-experimental-030?
Skipper
All changes I have done so far are in devel. Between latex debugging,
playing with git and studying ancient history, I haven't done the
other part yet.
It's on my schedule for the rest of the day (once I manage to shut
down 100 tabs in firefox, I'm always low on memory these days).
Fixing the math is one of the main doc tasks still to do; LaTeX is
more sensitive to math errors and I still have many of those.
Do you have time to look at kalmanfilter and similar?
I don't know yet how clean we are able to get the sphinx log
(warnings, non-fatal errors) since there is still a lot of noise.
Josef
>
> Skipper
>
Do you have, or can you prepare, a list of the copyrights of the
datasets? Debian packaging seems to have had problems with our
copyright statements, although I don't know any details.
It will also be easier if we start to keep a copyright_oth file; at
least some of my sandbox code also has "foreign" copyright statements
in some modules that we should keep track of centrally.
(I don't know where we have the foreign datetime string parsing.)
Josef
Will have a look.
Skipper
Err, any more details? All of the datasets are public domain, or I
have received express permission to include and distribute them,
except the world fertility survey from 1978. Does Debian include R?
If so, how do they justify the inclusion of its datasets?
In [29]: for data in dir(sm.datasets):
....: if not data.startswith(('_','d','D')):
....: print data+":"
....: print getattr(getattr(sm.datasets,data),'COPYRIGHT')
anes96:
This is public domain.
ccard:
Used with expressed permission of the original author, who
retains all rights.
committee:
Used with expressed permission from the original author,
who retains all rights.
copper:
Used with expressed permission from the original author,
who retains all rights.
cpunish:
Used with expressed permission from the original author,
who retains all rights.
grunfeld:
This is public domain.
longley:
This is public domain.
macrodata:
This is public domain.
randhie:
This is in the public domain.
scotland:
Used with expressed permission from the original author,
who retains all rights.
spector:
Used with express permission of the original author, who
retains all rights.
stackloss:
This is public domain.
star98:
Used with expressed permission from the original author,
who retains all rights.
sunspots:
This data is public domain.
wfs:
Available for use in academic research. See SOURCE.
That's good. I didn't know it's so easy to get the list.
No more details, I just saw a (random) comment.
I have no idea how R is treating copyright on data, they don't seem to
worry too much. I looked at Hyndman's Time Series Data, and he
requests citation of the website, but most of the data is just taken
out of textbooks. Go figure.
Josef
Trying to write some docs for some of the newer tsa stuff (I've added
a robust add_lag to commit) and how I usually use it, along with some
examples. Also want to write some docs for the datasets, which would
mention the copyrights and maybe clear up some of this. I saw Mike
just added the Old Faithful dataset, and I wanted to use it as a case
showing how easy it is to add an official dataset.
Skipper
Ok, I'm working on the sphinx docs, mainly for organizing and looking
for more bugs, one more var rename. I have some notes in
diagnostic.rst that weren't linked to anything yet.
The sourceforge problem also occurs with "var" for variance:
Josef
>
> Skipper
>
To datasets: On my computer I added the raw Greene files in a dataset
with just a function to load by table name as specified in Greene. I
didn't commit it because it doesn't follow the standard pattern, but
it's convenient to load the data while working on examples in Greene's
book, since I don't have to figure out whether we have it as a named
dataset.
Should we add something like this?
Josef
Just got assigned, so maybe a fix will be forthcoming.
https://sourceforge.net/apps/trac/sourceforge/ticket/18145
Skipper
Sounds helpful, though I should note that he was probably the most
reticent about allowing the data to be distributed. A lot of
econometric software seems to come bundled with Wooldridge datasets as
well (maybe because it's a textbook with more of an applied focus).
Skipper
FYI, after running
make latex
adding
LATEXOPTS = -interaction=nonstopmode
to
*/build/latex/Makefile
and then
make -C build/latex all-pdf
it will run to completion ignoring errors, but it still has problems obviously.
Skipper
Once it builds we are already far; the rest is cosmetic.
make.bat doesn't build pdf files automatically, only latex, so only
Windows users have to hit "q", but on Windows the htmlhelp is much
nicer.
Did you ever try qthelp or devhelp, or whatever posix users use who
don't work exclusively on the command line (assuming such users even
exist)?
I think we could add a command after the latex build to replace LL by
|l|l| in the generated latex file, maybe with a simple python script,
but I'm not sure it's worth it.
Josef
>
> Skipper
>
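Such a post-build script could be as small as this (a sketch; the filename statsmodels.tex is an assumption, adjust to whatever sphinx actually generates):

```python
import re

def fix_longtable_colspec(tex):
    # Turn the bad column spec sphinx emits, \begin{longtable}{LL},
    # {LLL}, ..., into the lowercase form longtable accepts:
    # {|l|l|}, {|l|l|l|}, ...
    def repl(match):
        cols = len(match.group(1))
        return r"\begin{longtable}{|" + "l|" * cols + "}"
    return re.sub(r"\\begin\{longtable\}\{(L+)\}", repl, tex)

# usage on the generated file (path is an assumption):
# with open("build/latex/statsmodels.tex") as f:
#     tex = fix_longtable_colspec(f.read())
# with open("build/latex/statsmodels.tex", "w") as f:
#     f.write(tex)

print(fix_longtable_colspec(r"\begin{longtable}{LL}"))
# -> \begin{longtable}{|l|l|}
```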
I haven't yet, but will have a look.
>
> I think we could add a command after the latex build to replace LL by
> |l|l| in the generated latex file, maybe with a simple python script,
> but I`m not sure it`s worth it.
>
I'm looking at sphinx now about this. The relevant code is in
sphinx/writers/latex.py. I don't know the longtable package well
enough, but it looks like this might be a bug, as it doesn't ever seem
to want capital L vs. lowercase l.
Skipper
Is there a reason not to use import * and define an __all__ in the
modules if necessary?
import diagnostic
from diagnostic import *
import multicomp
from .multicomp import *
....
Until now you used only explicit imports.
I will add some arma_process function to tsa/api.py
Josef
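As a reminder of what __all__ buys us in the api modules: a star import only pulls in the names a module lists. A self-contained demonstration with a throwaway module (diagnostic_demo and acorr_demo are made-up names, not statsmodels code):

```python
import sys
import types

# build a throwaway module the way a statsmodels submodule might look
mod = types.ModuleType("diagnostic_demo")
exec(
    "__all__ = ['acorr_demo']\n"
    "def acorr_demo(x):\n"
    "    return x\n"
    "def _internal_helper():\n"
    "    pass\n",
    mod.__dict__,
)
sys.modules["diagnostic_demo"] = mod

# the star import only picks up names listed in __all__
ns = {}
exec("from diagnostic_demo import *", ns)
print(sorted(k for k in ns if not k.startswith("__")))  # ['acorr_demo']
```

So `from .diagnostic import *` in api.py stays clean as long as each module defines __all__.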
It seemed to me that that part of sphinx 1.0.7 latex.py is correct but
it never gets called. I didn't see anything that could add {LL}; my
suspect is spec (from what I remember) but I don't see where it would
be created.
Josef
>
> Skipper
>
Just trying to avoid doing this, but I don't suppose it matters much
in the api files.
> I will add some arma_process function to tsa/api.py
I thought these weren't to be exposed much? If so, do we need to
revisit the sign convention for how to represent an ARMA process?
Skipper
If I had to teach time series econometrics, I would want to have them.
I had added a from_coeff and a from_estimation to ArmaProcess, so
it's easier to work with the theoretical properties and impulse
response functions from an estimated ARMA model. We are not using the
lag polynomials themselves yet, because we only allow for full lags
(no seasonal, no zeros) so far.
I'll leave them out for now, since I don't think I tested these latest
changes, and I'm not sure I have the two parameterisations that we
discussed consistent throughout the module. (both from_* class methods
seem to have the wrong sign for the ma part. oops)
I still need to convert all my test examples to actual tests.
Josef
>
> Skipper
>
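For reference, the psi weights (impulse response) such class methods expose follow from a short recursion; a numpy-only sketch, assuming the convention y_t = phi_1*y_{t-1} + ... + e_t + theta_1*e_{t-1} + ..., which is exactly the sign question under discussion:

```python
import numpy as np

def arma_impulse_response(ar, ma, nobs=10):
    """psi weights of an ARMA process under the convention
    y_t = sum_i ar[i-1]*y_{t-i} + e_t + sum_j ma[j-1]*e_{t-j}.
    Recursion: psi_0 = 1, psi_j = theta_j + sum_i phi_i * psi_{j-i}."""
    psi = np.zeros(nobs)
    psi[0] = 1.0
    for j in range(1, nobs):
        psi[j] = ma[j - 1] if j <= len(ma) else 0.0
        for i, phi in enumerate(ar, start=1):
            if j - i >= 0:
                psi[j] += phi * psi[j - i]
    return psi

# ARMA(1,1) with phi=0.5, theta=0.3:
# psi_1 = 0.3 + 0.5 = 0.8, then each further weight halves.
print(arma_impulse_response([0.5], [0.3], nobs=5))
```

Flipping the sign convention on the ma (or ar) coefficients changes psi_1 onward, which is why the two parameterisations have to be kept consistent across the module.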
https://bitbucket.org/birkenfeld/sphinx/src/ce4bb37a1409/sphinx/writers/latex.py#cl-657
But you're right it doesn't seem to actually be called.
Just tried with Sphinx HEAD and nothing works, so there must be some
big changes coming up.
I just made a change to an error in the math in sandbox/stats/diagnostic.py
ran
make latex
changed LL to ll
ran make all-pdf
It finished without stopping and at least looks passable. I will see
about older versions and putting these up on the web.
Skipper
I haven't looked at this in a while; the impulse response function is
tested against matlab, and there are also several ways to get the
spectral density from the (estimated) arma coefficients, checks for
stationarity and invertibility and so on. However, official tests
cover only very few parts.
I think eventually ArmaProcess will have to be exposed as a working
class.
Josef
>
> Josef
>
>
>>
>> Skipper
>>
>
My guess was
if self.table.colspec:
    self.body.append(self.table.colspec)
but I didn't see anything in numext either that might define a colspec
>
> Just tried with Sphinx HEAD and nothing works, so there must be some
> big changes coming up.
That's not good news if we have to go through all the changes again.
If there are more changes coming up, we'd better not spend the time
figuring out how the current version works or doesn't work :)
>
> I just made a change to an error in the math in sandbox/stats/diagnostic.py
>
> ran
>
> make latex
> changed LL to ll
> ran make all-pdf
That's roughly what I also did, but it got boring when I ran latex
several times in a row to look at the other errors.
>
> It finished without stopping and at least looks passable. I will see
> about older versions and putting these up on the web.
Some of the math might still be horrible, but I didn't look very closely.
Josef
>
> Skipper
>
https://bitbucket.org/birkenfeld/sphinx/src/ce4bb37a1409/sphinx/ext/autosummary/__init__.py#cl-285
>>
>> Just tried with Sphinx HEAD and nothing works, so there must be some
>> big changes coming up.
>
> That's not good news if we have to go through all the changes again.
> If there are more changes coming up, we'd better not spend the time
> figuring out how the current version works or doesn't work :)
>
Probably not as dire as I made it sound. I made no effort to debug.
>>
>> I just made a change to an error in the math in sandbox/stats/diagnostic.py
>>
>> ran
>>
>> make latex
>> changed LL to ll
>> ran make all-pdf
>
> That's roughly what I also did, but it got boring when I ran latex
> several times in a row to look at the other errors.
>
What do you want to do? For the nightly builds, I can just replace the
'LL' with 'll' after building the latex; I can patch my sphinx, so it
only works locally; or I can patch the Makefile to do the replacement,
which would fix it for everyone but raises the question of why this is
happening and whether it's a bug in Sphinx.
Is it useful to have nightly pdf builds? I think updating the html is
enough. In a nightly build there will be other, new errors showing
up, which might make downloading a pdf file each time not very useful
to users.
For the release we should have a clean enough pdf version and a way
for users to build it; you could just add a shell command to the
makefile that does a global string replace in the latex file. That
would be the cheapest way, and I doubt there are (m)any Windows users
who will build the pdf file.
Josef
In devel. Will you see if the fix works on windows?
Skipper
I will look at it and try it later tonight.
Josef
>
> Skipper
>