I have successfully built the package for 64-bit Windows with Python
2.7 (see below), but I am not sure about support for Python 2.5 or
Python > 3.1 yet. You can use similar instructions to build 32-bit or
just use mingw32 as far as I know.
I'm going to go ahead with the pull request unless there is anything
I'm missing.
Skipper
Windows Build Instructions (to be added to the docs)
---------------------------------
I grabbed Cython 0.15.1 and pandas 0.7.1 from Christoph Gohlke.
Install the SDK as mentioned here.
http://wiki.cython.org/64BitCythonExtensionsOnWindows
I don't know what you need for Python > 3.1 or Python == 2.5, but this
works on 2.7. Install the SDK, then open the SDK Command Window:
    Start -> Programs -> Microsoft Windows SDK v7.0 -> CMD Shell
In that shell:
    set DISTUTILS_USE_SDK=1
    setenv /x64 /release
    (cd to the statsmodels directory)
    python setup.py build
    python setup.py install
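A quick way to decide which setenv target matches the interpreter is to ask Python for its pointer size. This is only a sketch: the helper names are made up for illustration, and the "/x86" flag for 32-bit builds is an assumption (only "/x64" appears in the instructions above).

```python
import struct


def python_bitness():
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    return struct.calcsize("P") * 8


def sdk_target_flag(bits):
    """Map interpreter bitness to a setenv target flag.

    Hypothetical helper: "/x64" comes from the instructions above,
    "/x86" for 32-bit builds is an assumption.
    """
    flags = {32: "/x86", 64: "/x64"}
    return flags[bits]


if __name__ == "__main__":
    bits = python_bitness()
    print("Python is %d-bit; use: setenv %s /release" % (bits, sdk_target_flag(bits)))
```

The extension has to be built for the same architecture as the Python that will import it, so checking this first avoids a build that compiles but cannot be loaded.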
Is using cython and compiling still optional?
If yes, just go ahead with the merge.
I haven't tried compiling anything for python 3.2 yet (windows, most
of what I use is Gohlke)
I also never set up the Microsoft Windows SDK.
Josef
Two clicks. Branches. Then compare.
The idea is not to make it optional, so we don't have code twice
everywhere and in the case of KDE, it's prohibitively slow to fall
back to no Cython. So, no, it's not in this branch right now.
Then, I don't like it.
For now, cython will be only used in two places, none of the rest
needs the additional compilation step.
I'd rather have a statsmodels that can be installed as python package,
so users can do OLS, RLM and GLM and tsa (where some parts might be
slower) without having to worry about how to get a binary version.
I think eventually we will end up with enough cython to require it,
but for now I don't think it's worth the additional trouble.
Josef
We will provide snapshot binaries. There won't be any additional
burden for anyone that doesn't want to build the package themselves.
We could even set it up not to install the models that require Cython.
I doubt anyone really wants to use an ARMA estimator that takes 15-30
seconds to return a single model fit anyway.
> I think eventually we will end up with enough cython to require it,
> but for now I don't think it's worth the additional trouble.
Chicken and egg problem. If we're set up for Cython then maybe we will
get Cython contributions. If I know I can just write the .pyx file and drop it
in and the machinery is already there for it to be built without having to
think about it, I will write more Cython when it will help. There are certainly
places where we could start to optimize. I also hope that bootstrapping and
cross-validation will be a part of the .5 release.
Skipper
Can you keep using cython optional like now, or is it not possible
anymore with the new setup?
I think 15-30 seconds sounds fast enough (compared to some MCMC
comments that I read on the Gelman blog where you need at least 24
hours of burn-in :)
The main problem with binaries is version compatibility.
If I don't use the latest or most common version of python and numpy,
it always takes me a while to chase down binaries. (but thanks to
Gohlke most of the time it's not impossible)
(And it will take me some time to get set up for building binaries for
python 3.2, and find a replacement for tox on Windows that can handle
the additional build step. It might actually work; numpy and scipy
could not be built in a virtualenv on Windows the last time I tried.)
>
>> I think eventually we will end up with enough cython to require it,
>> but for now I don't think it's worth the additional trouble.
>
> Chicken and egg problem. If we're setup for Cython then maybe we will
> get Cython contributions. If I know I can just write the .pyx file and drop it
> in and the machinery is already there for it to be built without having to
> think about it, I will write more Cython when it will help. There are certainly
> places where we could start to optimize. I also hope that bootstrapping and
> cross-validation will be a part of the .5 release.
If an optional cython process is there, this would still be the case.
We could keep the chicken for 0.4 and build the eggs for 0.5 ?
Josef
Not when it's instantaneous in every other package. The comparison
would be an efficient 24 hour MCMC vs an inefficient 180 hour MCMC
implementation.
> The main problem with binaries are the version compatibility problems.
> If I don't use the latest or most common version of python and numpy,
> it always takes me a while to chase down binaries. (but thanks to
> Gohlke most of the time it's not impossible)
For us, it would just be Python versions + 32-bit and 64-bit, which
we'd need to provide periodically, unless we get better about releases
or get a buildbot.
> (And it will take me some time to get setup for building binaries for
> python 3.2, and find a replacement for tox on Windows that can handle
> the additional build step. It might actually work, numpy, and scipy
> cannot be build in an virtualenv on Windows the last time I tried.)
>
>>
>>> I think eventually we will end up with enough cython to require it,
>>> but for now I don't think it's worth the additional trouble.
>>
>> Chicken and egg problem. If we're setup for Cython then maybe we will
>> get Cython contributions. If I know I can just write the .pyx file and drop it
>> in and the machinery is already there for it to be built without having to
>> think about it, I will write more Cython when it will help. There are certainly
>> places where we could start to optimize. I also hope that bootstrapping and
>> cross-validation will be a part of the .5 release.
>
> If an optional cython process is there, this would still be the case.
>
> We could keep the chicken for 0.4 and build the eggs for 0.5 ?
>
Ok. Kicking down the road again. I'll continue to test for Cython
available files and provide both implementations.
Skipper
Can you make sure this installs okay
https://github.com/statsmodels/statsmodels/pull/174
It should be fine. It worked for me on Linux and Windows.
Skipper
python 2.6 2.7 installs fine with tox in virtualenv, no test failures
or errors, but it didn't compile the cython (while IIRC pandas did
automatically)
problem on python 3: you put the build tools into tools which has
add_constant in __init__ which cannot be loaded yet at this stage.
  File "statsmodels\tsa\kalmanf\setup.py", line 3, in <module>
    from statsmodels.tools._build import cython, has_c_compiler
  File "statsmodels\tools\__init__.py", line 1, in <module>
    from tools import add_constant, categorical
  File "statsmodels\tools\__init__.py", line 1, in <module>
    from tools import add_constant, categorical
ImportError: cannot import name add_constant
I haven't checked the details yet.
My guess is that setup is now doing too much before 2to3 is called
(during the python 3.2 pip install).
Josef
How does something get compiled without a compiler? Do you mean that
pandas still generates the C code from Cython but doesn't compile it?
Here, if there's no compiler detected, we don't bother generating the
C code from the Cython source because we can't do anything with it
anyway and we have pure python to fall back on.
> problem on python 3: you put the build tools into tools which has
> add_constant in __init__ which cannot be loaded yet at this stage.
The other option was to put it in the top-level tools and then add
this directory to the path each time a subpackage needs to install
Cython, but that felt too hackish. Can you see if that fixes it?
I didn't pay attention to all the details and tox doesn't show it, but
I have MingW as compiler in the python 2.6 and 2.7 virtualenv. I think
it worked automatically for installing pandas from source. I just
added distutils.cfg to the virtualenv python.
I don't have cython in the virtualenv, and I don't have cython in the
system python 2.7, ???
(one guess, if pandas builds the c source when sdist is called, then
python 2.6 with cython could create the c source for python 2.7)
as long as it works I don't ask why it's not broken.
>
>> problem on python 3: you put the build tools into tools which has
>> add_constant in __init__ which cannot be loaded yet at this stage.
>
> The other option was to put it in the top-level tools and then add
> this directory to the path each time a subpackage needs to install
> Cython, but that felt too hackish. Can you see if that fixes it?
I wondered why you didn't put it in top level tools. It might not be
before tonight that I have enough time to dig into this.
Josef
I tried just with commenting out the imports from tools.__init__
(which I don't like in general anyway) and it seems to work.
So an alternative would be to put the build tools in another
directory with an empty __init__ just for the build stage.
(My tox for python 3.2 doesn't complete anymore, because a file,
macrodata.dta, cannot be deleted or overwritten (runaway process with
open file handle?). But that's just the last stage of copying the
build from the temporary directory; everything before looks fine.)
I can look at the build/install logs that pip produces for python 2.6 2.7 later.
Josef
I'm out of my depth. During the build step we can check sys.argv and
capture --compiler=mingw32. But during install, there's no way to use
the same logic to have has_c_compiler return true (without checking
for the compiled .pyd file) unless we write some kind of temporary
file for this check.
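To make the asymmetry concrete, here is a sketch of the argv check. It is not the has_c_compiler from the branch; `compiler_from_argv` is a made-up name, and it only handles the three spellings listed in its docstring.

```python
def compiler_from_argv(argv):
    """Return the compiler named on a setup.py command line, or None.

    Handles "--compiler=mingw32", "--compiler mingw32" and "-c mingw32".
    """
    for i, arg in enumerate(argv):
        if arg.startswith("--compiler="):
            return arg.split("=", 1)[1]
        if arg in ("--compiler", "-c") and i + 1 < len(argv):
            return argv[i + 1]
    return None


# The build step carries the flag ...
build_argv = ["setup.py", "build", "--compiler=mingw32"]
# ... but the install step normally does not, so the same check finds nothing.
install_argv = ["setup.py", "install"]
```

With these inputs the first call yields "mingw32" and the second yields None, which is why a check based only on sys.argv cannot tell the install step that the extensions were already built.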
It would *really* be easier to just have people install a compiler _if
and only if you want to build from source_ (which is probably no end
user), since it works fine with mingw32 (32-bit) and with the MS SDK
(32- and 64-bit) with very little effort. I'm not thrilled about
spending any more time trying to shoehorn this in.
> (one guess, if pandas builds the c source when sdist is called, then
> python 2.6 with cython could create the c source for python 2.7)
>
> as long as it works I don't ask why it's not broken.
>
>>
>>> problem on python 3: you put the build tools into tools which has
>>> add_constant in __init__ which cannot be loaded yet at this stage.
>>
>> The other option was to put it in the top-level tools and then add
>> this directory to the path each time a subpackage needs to install
>> Cython, but that felt too hackish. Can you see if that fixes it?
>
> I wondered why you didn't put it in top level tools. Might not before
> tonight that I have enough time to dig into this.
>
I'll be off the grid for several days starting this evening. That's
why I'm trying to get all this done. Almost checked off everything
though.
I tried to do this in the top-level setup.py using numpy.distutils.
AFAICT from the docs of add_data_dir, you should be able to use a
function that returns the path to the data file to include, but I
couldn't get it to work. Stefan just has _build.py in the package
directory, which we could also do; I just like to keep that clean.
I didn't have much time to look more into this.
My impression (which might not be entirely correct):
In the pandas way, the c sources are created when the sdist is built,
which means the install only needs a compiler but not cython.
The way you have it, the installing python needs to have cython and a
compiler. (In this case you could also just check for the presence of
cython not of a compiler)
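The difference between the two arrangements can be sketched as a choice of source file per extension. This is an illustration under assumptions, not what pandas or the branch actually does; `pick_extension_source` is a made-up helper, and it assumes the generated .c file sits next to the .pyx file.

```python
import os


def pick_extension_source(pyx_path, cython_available):
    """Choose what to hand to the compiler for one extension.

    Returns the pre-generated .c file if one sits next to the .pyx file
    (the sdist case), the .pyx path if Cython can generate the C code,
    or None if neither is possible (fall back to pure Python).
    """
    c_path = os.path.splitext(pyx_path)[0] + ".c"
    if os.path.exists(c_path):
        return c_path
    if cython_available and os.path.exists(pyx_path):
        return pyx_path
    return None
```

With a shipped .c file, the installing Python needs only a compiler; without it, it needs both Cython and a compiler, which matches the distinction drawn above.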
The way tox works seems to be: it uses the system python to create the
sdist (python 2.6 with up-to-date cython in my case) then the
virtualenv python just needs to compile (pip install). This is the
same as when we distribute an sdist.
I have the compiler MingW specified in
pyxxx\Lib\distutils\distutils.cfg. The check for the compiler that you
posted on the numpy mailing list doesn't pick up the information from
distutils.cfg and only looks for the Microsoft compiler, which I don't
have.
my two preliminary conclusions
I like the pandas way of distributing c files in the sdist better than
requiring cython.
In the current version, if the _build.py is moved to a neutral place,
(if it works, I think it would be easiest just next to setup.py or
somewhere central), then the pure python install (without cython and
compilation) seems to work without problems.
On the other hand, in my setup it also doesn't help, since it's not
picking up MingW. I guess this could be fixed with a better
has_compiler check.
Josef
There's nothing stopping us from doing this too and the plan is to
distribute C files with the source distribution. I just want to avoid
putting the C source under version control.
> In the current version, if the _build.py is moved to a neutral place,
> (if it works, I think it would be easiest just next to setup.py or
> somewhere central), then the pure python install (without cython and
> compilation) seems to work without problems.
> On the other hand, in my setup it also doesn't help, since it's not
> picking up MingW. I guess this could be fixed with a better
> has_compiler check.
>
Or by removing the has_compiler check.
These are separate issues. You can't install any of these packages
from source without a compiler whether you have the C files for the
extension from cython or they were given to you in a source
distribution. If we can remove the has_compiler check altogether then
it works with anything. I've installed under mingw32 and SDK when just
assuming there's a compiler. I just don't know how to parse the
distutils.cfg or get this information from distutils/numpy.distutils
to _optionally_ install the extensions. I was hoping this would be an
easy question for those with more knowledge about this on the numpy
list as numpy checks for the compilers but it's tied up with checking
for ATLAS, etc AFAICT, and I didn't want to spend too much time
getting my head around this code. I haven't seen any other packages
that provide both cython and python for the same thing.
If you can get it working, then by all means, but my vote is still for
using cython code always when it's available. I don't see the point in
trying to provide both and choosing during installation anymore.
Skipper
However that requires a different setup.py again, if you want it
automatically generated with sdist.
>
>> In the current version, if the _build.py is moved to a neutral place,
>> (if it works, I think it would be easiest just next to setup.py or
>> somewhere central), then the pure python install (without cython and
>> compilation) seems to work without problems.
>> On the other hand, in my setup it also doesn't help, since it's not
>> picking up MingW. I guess this could be fixed with a better
>> has_compiler check.
>>
>
> Or by removing the has_compiler check.
>
> These are separate issues. You can't install any of these packages
> from source without a compiler whether you have the C files for the
> extension from cython or they were given to you in a source
> distribution. If we can remove the has_compiler check altogether then
> it works with anything. I've installed under mingw32 and SDK when just
> assuming there's a compiler. I just don't know how to parse the
> distutils.cfg or get this information from distutils/numpy.distutils
> to _optionally_ install the extensions.
I went through the distutils.cfg issue once on the cython mailing list.
I think it shouldn't be tooo difficult to get a has_compiler check to
work. (famous last words)
Did you manage to install with mingw32 in the current setting?
>I was hoping this would be an
> easy question for those with more knowledge about this on the numpy
> list as numpy checks for the compilers but it's tied up with checking
> for ATLAS, etc AFAICT, and I didn't want to spend too much time
> getting my head around this code. I haven't seen any other packages
> that provide both cython and python for the same thing.
David C. was arguing for this and providing it in his audio? or the
other package? the one with levinson-durbin in it.
I need to look there.
>
> If you can get it working, then by all means, but my vote is still for
> using cython code always when it's available. I don't see the point in
> trying to provide both and choosing during installation anymore.
I'm not giving up the hope yet.
Josef
Yes, you'd have to provide the command or just put it in MANIFEST.in.
I had the pandas way before, but we have to figure out the conflict
between numpy.distutils and distutils or stop using the former.
>>
>>> In the current version, if the _build.py is moved to a neutral place,
>>> (if it works, I think it would be easiest just next to setup.py or
>>> somewhere central), then the pure python install (without cython and
>>> compilation) seems to work without problems.
>>> On the other hand, in my setup it also doesn't help, since it's not
>>> picking up MingW. I guess this could be fixed with a better
>>> has_compiler check.
>>>
>>
>> Or by removing the has_compiler check.
>>
>> These are separate issues. You can't install any of these packages
>> from source without a compiler whether you have the C files for the
>> extension from cython or they were given to you in a source
>> distribution. If we can remove the has_compiler check altogether then
>> it works with anything. I've installed under mingw32 and SDK when just
>> assuming there's a compiler. I just don't know how to parse the
>> distutils.cfg or get this information from distutils/numpy.distutils
>> to _optionally_ install the extensions.
>
> I went through the distutils.cfg issue once on the cython mailing list.
> I think it shouldn't be tooo difficult to get a has_compiler check to
> work. (famous last words)
>
> Did you manage to install with mingw32 in the current setting?
>
Yes. You have to comment out the has_c_compiler or have it return True
always. The problem right now is that it's easy to find out if someone
passed a --compiler flag in the build step, but when you do setup.py
install, I haven't figured out a straightforward way
for has_c_compiler to return True after the extensions are built. I
guess the alternative is either to write a custom build command that
plays nice with numpy.distutils or to get rid of numpy.distutils.
>>I was hoping this would be an
>> easy question for those with more knowledge about this on the numpy
>> list as numpy checks for the compilers but it's tied up with checking
>> for ATLAS, etc AFAICT, and I didn't want to spend too much time
>> getting my head around this code. I haven't seen any other packages
>> that provide both cython and python for the same thing.
>
> David C. was arguing for this and providing it in his audio? or the
> other package? the one with levinson-durbin in it.
> I need to look there.
>
I don't see anything like this in audiolab or talkbox. Both provide
and build C extensions.
>>
>> If you can get it working, then by all means, but my vote is still for
>> using cython code always when it's available. I don't see the point in
>> trying to provide both and choosing during installation anymore.
>
> I'm not giving up the hope yet.
>
I still don't see why we should spend time on this. Users grab binary
snapshots. People who want to build from source on windows a) know how
or b) get mingw32 or install the SDK with instructions we provide.
Skipper
I managed to get the compiler check in has_c_compiler() to parse the
cfg file and use the compiler that is specified there.
The function now uses mingw32 in my case.
I didn't try parsing the command line for a compiler, since I haven't
yet worked out how to get into a session and use introspection in
that case.
I also saw that cython.inline ignores distutils.cfg and doesn't work
in my setup.
Josef
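Reading the compiler out of a distutils-style config file can be sketched with the standard configparser module. This is not Josef's patch; `compiler_from_cfg` is a made-up name, the caller supplies the candidate paths, and checking a [build_ext] section in addition to [build] is an assumption about where the option may be set.

```python
import configparser


def compiler_from_cfg(cfg_paths):
    """Return the compiler set in the first config file that names one.

    Looks for a "compiler" option under [build] or [build_ext].
    Missing files are skipped silently by RawConfigParser.read().
    """
    for path in cfg_paths:
        parser = configparser.RawConfigParser()
        parser.read(path)
        for section in ("build", "build_ext"):
            if parser.has_option(section, "compiler"):
                return parser.get(section, "compiler")
    return None
```

A file containing "[build]" followed by "compiler = mingw32" gives "mingw32"; a machine with no config file gives None, which is the case raised later in the thread.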
Can I pull this in and merge this today in an effort to cut a release candidate?
Skipper
It worked for me.
The only problem left, that I see, is to move the tools so they don't
import statsmodels during build.
I'm also soon ready to work on statsmodels and get the other things
ready for merge
Josef
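One way to use a build helper without importing the package around it is to load the file directly by path. The sketch below uses importlib.util, which is available in current Python 3 but not in the 2.x and 3.2 interpreters discussed in this thread, so treat it as an illustration of the idea only; `load_module_from_path` is a made-up name.

```python
import importlib.util


def load_module_from_path(name, path):
    """Import a single .py file as a module without importing its parent package."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Loading _build.py this way would never execute tools/__init__.py, so the add_constant import that fails before 2to3 has run would not be triggered.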
I still think this is fragile and would prefer to just be explicit
about the build process. Is it true that all windows users have a
distutils.cfg? I don't. I would expect building with
--compiler=mingw32 to work and in this case, it still won't. I can
write a hack around that but it will just be that.
Otherwise, this needs another rethink and more coding / looking at
distutils for a robust solution... Why, again, can't we just have
users who want to build from source have a compiler?
Skipper
I always have a distutils.cfg since some things don't work without one
and I find always typing --compiler=mingw32 annoying.
(require some users to write a distutils.cfg (currently), or require
other users to install Windows SDK?)
cython.pyximport has a more robust version of .cfg parsing, but I
didn't want to try to figure out the details.
I assume there is an "official" way to get at the setup.py commandline
options, but I didn't want to look for it until we decide to go this
way.
>
> Otherwise, this needs another rethink and more coding / looking at
> distutils for a robust solution... Why, again, can't we just have
> users who want to build from source have a compiler?
As above, I don't think it's worth requiring cython and a compiler
for development. If I just work on the source, checkout or branch, I
don't have to worry about building anything or having a "full"
development environment.
(and I'm not going to use a Debian virtual machine for it :)
I don't have a compiler for win64 and python 3.2, and I don't see a
need to require it.
However, I think distributing fast binaries is useful.
Josef
So it's up to me to wade through this and make this thing that no one
else has ever thought to do robust rather than you just installing SDK
or just using mingw32 _that works out of the box if we don't have to
do this check_? This check is the only problem right now. Everything
else works without it provided you have mingw32 or are using MS SDK.
It's not a matter of getting at the command line. This is trivial (and
is what my has_c_compiler does). The problem is that you don't specify
the compiler at the command line for the install step, but we still
have this "do we have a C compiler available or given at the command
line step" that is checked also during install after the extension is
built. Again, we can have the build step write some kind of "build has
already been called" file and then have the install step check for
this, but it's a hack and fragile.
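For reference, the marker-file workaround would look roughly like this. It is a sketch of the hack being described, not a recommendation; the file name ".ext_build_done" and both function names are invented.

```python
import os

MARKER_NAME = ".ext_build_done"  # hypothetical file name


def write_build_marker(build_dir, compiler):
    """Called at the end of the build step: record which compiler was used."""
    path = os.path.join(build_dir, MARKER_NAME)
    with open(path, "w") as fh:
        fh.write(compiler or "")
    return path


def read_build_marker(build_dir):
    """Called during install.

    Returns the recorded compiler name, "" if the build recorded none,
    or None if no build marker exists.
    """
    path = os.path.join(build_dir, MARKER_NAME)
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return fh.read().strip()
```

The fragility is easy to see: a stale marker left over from an earlier build, or a build directory that was partly cleaned, makes the install step believe something that is no longer true.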
>
>>
>> Otherwise, this needs another rethink and more coding / looking at
>> distutils for a robust solution... Why, again, can't we just have
>> users who want to build from source have a compiler?
>
> As above, I don't think it's worth to require cython and a compiler
> for development. If I just work on the source, checkout or branch I
> don't have to worry about building anything, and having a "full"
> development environment.
If you don't edit any of the extensions, you don't have to rebuild.
> (and I'm not going to use a Debian virtual machine for it :)
>
> I don't have a compiler for win64 and python 3.2, and I don't see a
> need to require it.
>
This is for the future and so we can just write code. I haven't heard
any other complaints from anyone who wants to build from source.
Basically what I'm hearing is I don't want to follow some given
instructions to install one extra thing that is needed for my
environment (or use the thing I already have installed _that works_),
so it's up to you to spend (more) hours wading through documentation
and code and try to figure out a robust workaround. Also, if you want
to write something in Cython, you have to write and maintain two
versions of the code, even if one is prohibitively slow to use.
Python 3.2 is a real concern, but it's either a solved problem, or
every single package that has built extensions on Windows doesn't work
(which doesn't sound likely to me).
As far as I know, most economics departments (in contrast to, for
example, machine learning groups, I assume) are still dominated by
Windows and I know very few people who would be interested in
installing and running a full development environment.
But of course they don't use python either and don't contribute to
statsmodels, so ...
Users will not care either way as long as there are binaries available.
Josef
Sorry I need to amend this of course.
Fortunately, with the exception of American University. So you and
Alan might know better, what can be assumed from Windows users that
actually use Python.
Josef
Then how do they develop R extensions using compiled languages? I know
as a user I've also seen some R packages that download and compile
source on installation. Or work on Dynare? Or if they're writing Stata
extensions in Mata, I assume they can follow instructions to install a
compiler and use it. Gretl is written entirely in C++ and has a strong
developer core who are mostly using windows AFAICT. The Matlab MFE
toolbox has compiled extensions for the same things we're needing them
for.
I don't think most people in economics departments are candidates for
developers to work on an open source software package, whether pure
python or python + built extensions. Based on my experience, installing
a compiler (mingw32 or SDK) is less work than getting git set up on
windows, given instructions for both.
I just don't see this as the throw your hands up and quit hurdle, if
you really want to contribute. But as you said, we're not exactly
attracting a ton of developers to scare off anyway, but being slow
will certainly turn off users, which I don't know that we're really
attracting yet anyway. And speed is often the first point of
comparison in articles comparing stat software. And in any event,
we're going to distribute binary distributions, so you could just drop
in the pre-built extension and skip the build step.
I've used your commit and patched the check to work on windows, you
just won't be able to specify the compiler at the command line. It
works for me with 32-bit python + mingw32 and with windows sdk. I'm
going to push it shortly if you want to look at it.
Skipper
BTW we have a VM set up capable of building Python 2.5 through 3.2 on
win32 and win64, so I can set this up sometime in the next few weeks
for continuous integration with statsmodels and also posting nightly
binaries someplace on pydata.org. Anyone doing serious statsmodels
development would need to set up a development environment on their
system, but this is the same as anyone doing R development for
example.
- Wes
Also, I was thinking of the wrong audience. It's the RAs that have to
do the work.
>
> I don't think most people in economics departments are candidates for
> developers to work on an open source software package python or python
> + built extensions. Based on my experience installing a compiler
> (ming32 or SDK) is less work than getting git setup on windows given
> instructions for both.
>
> I just don't see this as the throw your hands up and quit hurdle, if
> you really want to contribute. But as you said, we're not exactly
> attracting a ton of developers to scare off anyway, but being slow
> will certainly turn off users, which I don't know that we're really
> attracting yet anyway. And speed is often the first point of
> comparison in articles comparing stat software. And in any event,
> we're going to distribute binary distributions, so you could just drop
> in the pre-built extension and skip the build step.
>
> I've used your commit and patched the check to work on windows, you
> just won't be able to specify the compiler at the command line. It
> works for me with 32-bit python + mingw32 and with windows sdk. I'm
> going to push it shortly if you want to look at it.
Ok.
Do you have time to look at the pickle issue? (separate thread or github issues)
I finished remove_data (outside of tsa)
https://github.com/statsmodels/statsmodels/pull/178
and it should be combined with the pickle branch
Josef
It builds the extensions fine on windows for 32-bit and 64-bit Python
2.7 and Python 3.2. I'm going to add notes in the docs on building
from source and strongly suggest that you do it correctly, because in
the next release (read: the next time I write a Cython extension) we
are going to require a compiler (and I'm sticking to this).
>
> Do you have time to look at the pickle issue? (separate thread or github issues)
> I finished remove_data (outside of tsa)
> https://github.com/statsmodels/statsmodels/pull/178
> and it should be combined with the pickle branch
>
I will have a look.
> Josef
>
>>
>> Skipper
Are you on current master 0bb3ae1 or later? What's broken? It seems to
work for me here. Python 2.7 on Linux.
Ok, good. Trying to stay on my toes here.
Thanks,
Skipper
I've been unpleasantly surprised by this before too in other packages.
Dependency and version checking is no longer rude in master.
https://github.com/statsmodels/statsmodels/commit/47bd9e0d119047a824d1a54871d5ab43bcca5844
Skipper
Good idea.
This was also the reason we don't have numpy in the install_requires.
I hate it when packages update my packages without asking.
The only problem with avoiding install_requires is that pypi and
package managers cannot automatically parse dependencies.
Josef
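A setup-time check that reports missing or outdated dependencies without installing anything could look like the sketch below. It is not the code from the commit linked above; the function names are made up, and the version parser deliberately handles only leading integer components.

```python
import importlib
import re


def parse_version(text):
    """Turn "1.6.1" or "0.7.1.dev" into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_dependency(module_name, min_version):
    """Return (ok, message) without installing or upgrading anything."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False, "%s is not installed (need >= %s)" % (module_name, min_version)
    found = getattr(module, "__version__", None)
    if found is None:
        return True, "%s found (version unknown)" % module_name
    if parse_version(found) < parse_version(min_version):
        return False, "%s %s found, need >= %s" % (module_name, found, min_version)
    return True, "%s %s found" % (module_name, found)
```

Comparing integer tuples rather than strings matters once a minor version reaches two digits: (1, 10) sorts after (1, 9), while the string "1.10" sorts before "1.9".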