Schedule for statsmodels 0.5?

Thomas Haslwanter

Mar 23, 2013, 9:34:55 AM
to pystat...@googlegroups.com
Statsmodels has been at version 0.4 for ages, and MANY very helpful additions have been introduced since then. Is there any plan to bring it all together, and get a version 0.5 out?

Skipper Seabold

Mar 23, 2013, 11:40:13 AM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 9:34 AM, Thomas Haslwanter <thomas.h...@gmail.com> wrote:
Statsmodels has been at version 0.4 for ages, and MANY very helpful additions have been introduced since then. Is there any plan to bring it all together, and get a version 0.5 out?

I was hoping to be able to find some time at the beginning of the month to close a few remaining warts with the 0.5 milestone tag (patches welcome), but that didn't happen. Hopefully, ASAP, though I'm pretty well buried at the moment. From my end I'll try to close up the things I want to do this week. No new features, but a couple of bothersome warts/bugs. Josef, thoughts?

We could use a dedicated release manager to pitch in with bug fixes and help keep us on schedule. Volunteers welcome.

Skipper

josef...@gmail.com

Mar 23, 2013, 12:10:21 PM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 11:40 AM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Sat, Mar 23, 2013 at 9:34 AM, Thomas Haslwanter
> <thomas.h...@gmail.com> wrote:
>>
>> Statsmodels has been at version 0.4 for ages, and MANY very helpful
>> additions have been introduced since then. Is there any plan to bring it all
>> together, and get a version 0.5 out?
>
>
> I was hoping to be able to find some time at the beginning of the month to
> close a few remaining warts with the 0.5 milestone tag (patches welcome),
> but that didn't happen. Hopefully, ASAP, though I'm pretty well buried at
> the moment. From my end I'll try to close up the things I want to do this
> week. No new features, but a couple of bothersome warts/bugs. Josef,
> thoughts?

I already had several release deadlines (definitely before PyCon) for myself,
but life, new features and other work get in the way.

I closed several issues in the last weeks (but opened new ones at
about the same speed),
and I haven't checked recently what remains that should be in 0.5.

Version compatibility is pretty OK (after dropping Python 2.5 and NumPy 1.4),
and we keep master green in Travis CI testing for Python 2.7 and 3.2;
daily pythonxy testing has one failure.

I thought we could pretty much release a master anytime if there are
no serious known bugs open.
From our pattern, we are still (!) on an annual release schedule.


Skipper,
to add to your todos (as if you don't have enough to do):

https://github.com/statsmodels/statsmodels/issues/674
has been failing for quite some time in the pythonxy testing but not
on Travis nor my computer
(I assume a problem in the test)

and the online docs are not updating correctly.

Josef
"release early, release often"

Vincent Arel

Mar 23, 2013, 2:24:18 PM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 11:40 AM, Skipper Seabold <jsse...@gmail.com> wrote:
Can you give me a better sense of what you have in mind here? I
clicked on the 0.5 Milestone tag on the issue tracker but got a list
of 84 issues back. Surely we can't fix all of those before release.

At a more general level, do you feel happy with the current release
cycle? I mean, independently of the amount of code written. I'm asking
because I saw a couple people mention (on blog posts and
stackoverflow) that the pace of development of SM is really slow.
That's partially a function of manpower, but almost certainly also an
impression that people get because of release structure. For example,
it makes sense to have a 0.5 with the formula framework, but a quick
0.5.1 release with NBin and QuantReg and MosaicPlot would both make
features available quickly and convey a sense that this is an active
project.

Of course, I really don't know what kind of work is involved in doing
releases...

>
> Skipper

Skipper Seabold

Mar 23, 2013, 3:03:07 PM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 2:24 PM, Vincent Arel <vincen...@gmail.com> wrote:
On Sat, Mar 23, 2013 at 11:40 AM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Sat, Mar 23, 2013 at 9:34 AM, Thomas Haslwanter
> <thomas.h...@gmail.com> wrote:
>>
>> Statsmodels has been at version 0.4 for ages, and MANY very helpful
>> additions have been introduced since then. Is there any plan to bring it all
>> together, and get a version 0.5 out?
>
>
> I was hoping to be able to find some time at the beginning of the month to
> close a few remaining warts with the 0.5 milestone tag (patches welcome),
> but that didn't happen. Hopefully, ASAP, though I'm pretty well buried at
> the moment. From my end I'll try to close up the things I want to do this
> week. No new features, but a couple of bothersome warts/bugs. Josef,
> thoughts?
>
> We could use a dedicated release manager to pitch in with bug fixes and help
> keep us on schedule. Volunteers welcome.

Can you give me a better sense of what you have in mind here? I
clicked on the 0.5 Milestone tag on the issue tracker but got a list
of 84 issues back. Surely we can't fix all of those before release.


Of course not. Many of these should not have a 0.5 milestone set. I went through and closed a couple and changed the milestones to Someday, but not for all of them.

With a quick look, the ones with type-enh should not be for 0.5 anymore.

Many of these are also very simple fixes. This is where having someone batting cleanup / having a release manager helps. For instance,


This is easy, but it's been sitting there for a month. Often this burden falls on Josef around release time, which isn't fair, though I'd also argue that it's just as easy to make a commit that fixes this as it is to make an issue report.

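The ones I know that I want to address, specifically for this release, but will take some time (more than 10 minutes):

https://github.com/statsmodels/statsmodels/issues/626
https://github.com/statsmodels/statsmodels/issues/625
https://github.com/statsmodels/statsmodels/issues/549
https://github.com/statsmodels/statsmodels/issues/515
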
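I also still want to get rid of the stupid conditional cython checks and merge this after fixing the types issue. I've been wanting to do this for 9+ months.

https://github.com/statsmodels/statsmodels/pull/266
https://github.com/statsmodels/statsmodels/issues/204
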
Can we finally do this and just require Cython to build from source and a compiler to build from source releases? I'm tired of punting on this issue and I don't see what keeps us from doing this.

Skipper

 
At a more general level, do you feel happy with the current release
cycle?

No. I would like to switch to a time-based release structure, though life is still often too busy for me to commit to this. It's clear that once a year does not cut it. We need to get over our "slow and perfect is better than fast and sloppy" attitude. Sometimes fast and sloppy gets the job done. Since we've merged in new features that have been on the TODO list, we've gotten several corner case bug reports, which we've been able to fix. This is a good thing. We're still nowhere near 1.0.
 
I mean, independently of the amount of code written. I'm asking
because I saw a couple people mention (on blog posts and
stackoverflow) that the pace of development of SM is really slow.
That's partially a function of manpower, but almost certainly also an
impression that people get because of release structure. For example,
it makes sense to have a 0.5 with the formula framework, but a quick
0.5.1 release with NBin and QuantReg and MosaicPlot would both make
features available quickly and convey a sense that this is an active
project.

Of course, I really don't know what kind of work is involved in doing
releases...

It's not really all that much work now that I have the Windows build scripts, if we keep up with CHANGES.txt (but we don't really). It's the tidying up of docs (fixing warnings, making sure all new stuff is added) and last-minute bugs / wishlist stuff that takes a while.

Skipper

Skipper Seabold

Mar 23, 2013, 3:32:47 PM
to pystat...@googlegroups.com
It's really hard to tell without being able to replicate. Regardless, I'd think it's innocuous since there's no error, just a dtype mismatch, but I'd need to look to be sure. Is there someone on 32-bit that can replicate this?
 

and the online docs are not updating correctly.

Low priority for me. I have again two presentations on my work on Monday (and not all results) and numpy/scipy currently broken. Haven't forgotten.

josef...@gmail.com

Mar 23, 2013, 3:59:05 PM
to pystat...@googlegroups.com
It's not easy if all my checkout directories have other branches, and
I don't want to get even more distracted.

One reason why some things pile up before a release is that I'm the
only PR reviewer most of the time (with some help from Ralph).

So some merges are left with todos or ex post review, and open issues.

>
> The ones I know that I want to address, specifically for this release, but
> will take some time (more than 10 minutes)
>
> https://github.com/statsmodels/statsmodels/issues/626
> https://github.com/statsmodels/statsmodels/issues/625
> https://github.com/statsmodels/statsmodels/issues/549
> https://github.com/statsmodels/statsmodels/issues/515
>
> I also still want to get rid of the stupid conditional cython checks and
> merge this after fixing the types issue. I've been wanting to do this for 9+
> months.
>
> https://github.com/statsmodels/statsmodels/pull/266
> https://github.com/statsmodels/statsmodels/issues/204
>
> Can we finally do this and just require Cython to build from source and a
> compiler to build from source releases? I'm tired of punting on this issue
> and I don't see what keeps us from doing this.

No, I'm still against it.
statsmodels is still 99% Python, and I'd rather keep it runnable as a pure
Python package.

I'm still on Windows, and I don't want to go through a build, compile
cycle to develop and test.
And I haven't seen many problems with it, that I couldn't figure out fast IIRC.

(aside: I looked recently into adding cython based lowess in the same
way as the other cython extensions.)

>
> Skipper
>
>
>>
>> At a more general level, do you feel happy with the current release
>> cycle.
>
>
> No. I would like to switch to a time-based release structure, though life is
> still often too busy for me to commit to this. It's clear that once a year
> does not cut it. We need to get over our slow and perfect is better than
> fast and sloppy. Sometimes fast and sloppy gets the job done. Since we've
> merged in new features that have been on the TODO list, we've gotten several
> corner case bug reports, which we've been able to fix. This is a good thing.
> We're still nowhere near 1.0.

I still don't like fast and sloppy.
As a user I don't like a library where you have to gamble whether the
numbers are correct (with more than a small probability).
As a developer, it often makes life more difficult down the road.

(I didn't change my opinion about stats libraries since my early scipy days.)

I'd rather have people that complain about the slow pace, than users
complain about buggy results, and a library that is not trustworthy.
I haven't read those comments in some time.

It would be easier if we had a fast response and maintenance team,
that consists of more than ... developers.

We got some helpful issue reports and pull requests by users, but
there still remains a lot of maintenance work to do.

Josef

Skipper Seabold

Mar 23, 2013, 4:10:10 PM
to pystat...@googlegroups.com
I've responded to these objections numerous times, and I've shown you how to build on Windows and I've written instructions on how to do this. You do NOT have to rebuild every time you change something. You only have to rebuild if you change things that need compiling. On the other hand, right now I DO have to rebuild every time because the way we have the compiler checks is broken. If you have a compiler, the Cython extensions get built twice on build and install. Let's get past this. Performance is going to become more critical as we become better and actually competitive with alternatives. None of my students now want to learn Python. They all ask me about Julia...
 

(aside: I looked recently into adding cython based lowess in the same
way as the other cython extensions.)

>
> Skipper
>
>
>>
>> At a more general level, do you feel happy with the current release
>> cycle.
>
>
> No. I would like to switch to a time-based release structure, though life is
> still often too busy for me to commit to this. It's clear that once a year
> does not cut it. We need to get over our slow and perfect is better than
> fast and sloppy. Sometimes fast and sloppy gets the job done. Since we've
> merged in new features that have been on the TODO list, we've gotten several
> corner case bug reports, which we've been able to fix. This is a good thing.
> We're still nowhere near 1.0.

I still don't like fast and sloppy.
As a user I don't like a library where you have to gamble whether the
numbers are correct (with more than a small probability).
As a developer, it often makes life more difficult down the road.

I'm not talking about numbers or not testing. I'm talking about adding every bell and whistle or making sure that we have written everything in the most general way possible to account for every possible extension down the road.
 

(I didn't change my opinion about stats libraries since my early scipy days.)

I'd rather have people that complain about the slow pace, than users
complain about buggy results, and a library that is not trustworthy.
I haven't read those comments in some time.

It would be easier if we had a fast response and maintenance team,
that consists of more than ... developers.

Chicken and egg. We need a better release cycle with better visibility to attract more users and developers.

josef...@gmail.com

Mar 23, 2013, 5:13:12 PM
to pystat...@googlegroups.com
And I think I replied as often why I don't like it.
If you don't install for development then you don't need to rebuild.
I don't see that the cython extension gets built twice (unless you
tell setuptools/distutils to build it twice.)

All the previous issues that we had, turned out to come from other
packages and were unrelated to our way of conditional building.

About performance: we still have some slack, and I don't think going
through pandas and formulas in loops is very "performant" but nobody
complains.

And I don't argue about the fashionable programming language of the day.
(my last and only comment about Julia was: no classes and no namespaces
and I've only just started to figure out the dispatch system of R)
We don't need nor get every bell and whistle, nor the most general,
but a bit of planning ahead pays off later.
It's a question of how far we take this. (I guess I am a bit too far.)

(example power, not so big in terms of actual code, but it takes
almost no code to add new statistical tests, and improving the
rootfinding was confined to one method, instead of writing 10 or 20
specialized functions, and changing all of them)
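
A minimal sketch of the design described above: one generic solver that root-finds for whichever parameter is missing, so a new test only has to supply its power function. This is not the statsmodels implementation. The names `ztest_power` and `solve_power` are hypothetical, the power formula is the textbook normal approximation for a one-sided, one-sample z-test, and plain bisection stands in for whatever rootfinder the library uses.

```python
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def ztest_power(effect_size, nobs, alpha):
    """Power of a one-sided, one-sample z-test (normal approximation)."""
    crit = _norm.inv_cdf(1 - alpha)
    return _norm.cdf(effect_size * nobs ** 0.5 - crit)


def solve_power(power_func, target, name, lo, hi, tol=1e-10, **known):
    """Find the value of parameter `name` in [lo, hi] at which
    power_func(...) equals `target`, using bisection.

    All root-finding lives here, so improving it does not require
    touching the individual power functions.
    """
    def gap(x):
        return power_func(**{name: x}, **known) - target

    f_lo, f_hi = gap(lo), gap(hi)
    if f_lo * f_hi > 0:
        raise ValueError("target power is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = gap(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root is in the lower half
        else:
            lo, f_lo = mid, f_mid    # root is in the upper half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Solve for the sample size, then for the effect size, with the same solver.
nobs = solve_power(ztest_power, 0.8, "nobs", 2, 1000,
                   effect_size=0.5, alpha=0.05)
effect = solve_power(ztest_power, 0.8, "effect_size", 0.01, 5,
                     nobs=nobs, alpha=0.05)
```

Adding another test under this scheme means writing one more function like `ztest_power`; the solver stays untouched.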

We could also merge some other packages, but they might not fit in
consistently and we cannot take much advantage of them.

another example is quantile_regression: Vincent and I didn't know,
when he started, that linear_model will turn out to be the wrong
"framework". It will still be merged with some work-arounds for the
mismatches. But planning ahead, we should get the right super classes,
and the next similar model will be able to benefit from the common
structure.

If possible, I'm still in for the long haul instead of some quick shots.

Josef

Vincent Arel

Mar 23, 2013, 6:01:17 PM
to pystat...@googlegroups.com
I've responded to these objections numerous times, and I've shown you how to build on Windows and I've written instructions on how to do this. You do NOT have to rebuild every time you change something. You only have to rebuild if you change things that need compiling. On the other hand, right now I DO have to rebuild every time because the way we have the compiler checks is broken. If you have a compiler, the Cython extensions get built twice on build and install. Let's get past this. Performance is going to become more critical as we become better and actually competitive with alternatives. None of my students now want to learn Python. They all ask me about Julia...

 

Can we brainstorm on what's been added since the last release? I'd like to update CHANGES.txt without having to read all commit messages for the past year.

Thinking ahead to 0.5.1, I will try to keep a local branch with changes as they appear in my GitHub notifications. I will build the docs tonight to see what that looks like and will host the results online for you to see.


--
Vincent Arel-Bundock

Skipper Seabold

Mar 23, 2013, 6:07:52 PM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 6:01 PM, Vincent Arel <vincen...@gmail.com> wrote:
> Can we brainstorm on whats been added since last release. Id like to update CHANGES.txt without having to read all commit messages for the past year.

If you'd like, have a look at IPython helper scripts. Pandas should
have some too. AFAIK, they will go through and find issues that have
been closed, etc., so you don't have to. This has long been on my TODO
list.

https://github.com/ipython/ipython/tree/master/tools
http://ipython.org/ipython-doc/rel-0.13/whatsnew/github-stats-0.13.html#issues-list-013
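
As a rough illustration of what such a helper does once the closed issues have been fetched, here is a minimal sketch. It is not the IPython script. The function name `summarize_closed` and the sample records are made up, and the record shape (`number`, `title`, and a `pull_request` key that is present only for pull requests) is assumed to follow GitHub's issue JSON.

```python
from collections import defaultdict


def summarize_closed(issues):
    """Group closed issue records into CHANGES.txt-style text."""
    groups = defaultdict(list)
    for issue in issues:
        # Assumption: pull requests carry a "pull_request" key.
        kind = "Pull requests" if "pull_request" in issue else "Issues"
        groups[kind].append(issue)

    lines = []
    for kind in ("Pull requests", "Issues"):
        items = sorted(groups.get(kind, []), key=lambda rec: rec["number"])
        if not items:
            continue
        lines.append(f"{kind} ({len(items)}):")
        for rec in items:
            lines.append(f"* #{rec['number']}: {rec['title']}")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip()


# Placeholder records, not real statsmodels issues.
closed = [
    {"number": 3, "title": "Example bug report"},
    {"number": 1, "title": "Example feature branch", "pull_request": {}},
    {"number": 2, "title": "Another example bug"},
]
print(summarize_closed(closed))
```

The fetching step is left out on purpose; only the grouping and formatting are shown.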

>
> Thinking ahead to .5.1, I will try to keep a local branch with changes as they appear in my github notifications. I will build the docs tonight to see what that looks like and will hist the results online for you to see.

The docs build nightly (it's just broken at the moment until I look at
it, hopefully tomorrow), so you don't need to push them online if
you don't want to. If you feel like fixing any warnings, then this
would be welcome. It can be time consuming though. Ping the list if
there's any weird sphinx stuff. We've likely seen it before.

Thanks,

Skipper

Skipper Seabold

Mar 23, 2013, 6:26:33 PM
to pystat...@googlegroups.com
Originally your objection was "I don't want this to become a Linux-only club," which was valid. So I spent a lot of time searching how to do this on Windows, figured it out, wrote instructions and build scripts, so it would become a non-issue and really easy for you and other Windows users. I also wrote the (admittedly fragile) architecture to support both Python and Cython code. I made this clear, and you agreed, that this was conditional on it being a _transitional_ fix that we would drop in the future. It's now the future (almost a year later). Now it seems your objection is "I don't want to." I don't think this is valid, and I don't think it's unreasonable of me to think so.

Maybe we're still 99% python, because our only Cython contribution has sat in a PR for 9 months or because we have the condition that any Cython code must be accompanied by Python code and you need to make sure both work.
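
For readers unfamiliar with the dual Python/Cython arrangement being debated, a minimal sketch of the general pattern follows. It is not the statsmodels code. The extension name `_hypothetical_fast_smoothers` and the functions `moving_average_python` and `load_smoother` are invented for illustration, and a moving average stands in for a real routine.

```python
import importlib


def moving_average_python(values, window=3):
    """Pure-Python fallback: centered moving average, truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def load_smoother(ext_name="_hypothetical_fast_smoothers"):
    """Return (function, backend), preferring a compiled extension.

    If the compiled module is not importable, fall back to pure Python.
    Both code paths must then be kept working and tested.
    """
    try:
        ext = importlib.import_module(ext_name)
    except ImportError:
        return moving_average_python, "python"
    return ext.moving_average, "compiled"


smooth, backend = load_smoother()
print(backend, smooth([1, 2, 3, 4], window=3))
```

The cost of this pattern is the one raised in the thread: every compiled routine needs a Python twin, and both have to stay in sync.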
 
If you don't install for development then you don't need to rebuild.  

I don't know what this has to do with anything. While I'm working, I install often. I don't use setup.py develop very much, and I don't know if it has the same problem as below - but I'm not interested in more workarounds.
 
I don't see that the cython extension gets built twice (unless you
tell setuptools/distutils to build it twice.)

Running

python setup.py install

does the compiler checks and rebuilds all of the extensions, even if they exist. It's annoying for me, but I've lived with it because we long ago agreed to get rid of this stuff, so I've been patient.
 

All the previous issues that we had, turned out to come from other
packages and were unrelated to our way of conditional building.

This is not true. Though this thread was hijacked, the original issue still stands.

 

About performance: we still have some slack, and I don't think going
through pandas and formulas in loops is very "performant" but nobody
complains.

And I don't argue about the fashionable programming language of the day.
(my last and only comment about Julia was: no classes and no namespaces
and I've only just started to figure out the dispatch system of R)

I'm not arguing about anything. I'm just saying that developer-types want to work on technical and performant code. E.g., pandas and scikit-learn. I would love to have someone come along, run vbench over all of our stuff and then start to work on hotspots. Right now, this is never going to happen.

Skipper Seabold

Mar 26, 2013, 10:48:15 AM
to pystat...@googlegroups.com
On Sat, Mar 23, 2013 at 3:32 PM, Skipper Seabold <jsse...@gmail.com> wrote:
On Sat, Mar 23, 2013 at 12:10 PM, <josef...@gmail.com> wrote:

<snip> 
and the online docs are not updating correctly.

Low priority for me. I have again two presentations on my work on Monday (and not all results) and numpy/scipy currently broken. Haven't forgotten.
 

Fixed. Wasn't thinking about how I was using virtualenv in a subprocess. Am thinking now.


The power stuff probably needs a new section header?

Skipper

josef...@gmail.com

Mar 26, 2013, 11:09:30 AM
to pystat...@googlegroups.com
On Tue, Mar 26, 2013 at 10:48 AM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Sat, Mar 23, 2013 at 3:32 PM, Skipper Seabold <jsse...@gmail.com>
> wrote:
>>
>> On Sat, Mar 23, 2013 at 12:10 PM, <josef...@gmail.com> wrote:
>>>
>>>
> <snip>
>>>
>>> and the online docs are not updating correctly.
>>
>>
>> Low priority for me. I have again two presentations on my work on Monday
>> (and not all results) and numpy/scipy currently broken. Haven't forgotten.
>>
>
>
> Fixed. Wasn't thinking in how I was using virtualenv in a subprocess. Am
> thinking now.

Thanks, it's nice to see the updated documentation for current master.
Initially I wanted to keep it together with the basic statistics and
(parametric) tests, since I started to write the power functions for
those tests. When the power part gets larger, it will need a new
section.

I wrote a note somewhere about rethinking the statsmodels.stats module
structure, but that's postponed for now.

Josef

>
> Skipper

Skipper Seabold

Mar 26, 2013, 11:20:03 AM
to pystat...@googlegroups.com
On Tue, Mar 26, 2013 at 11:09 AM, <josef...@gmail.com> wrote:
On Tue, Mar 26, 2013 at 10:48 AM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Sat, Mar 23, 2013 at 3:32 PM, Skipper Seabold <jsse...@gmail.com>
> wrote:
>>
>> On Sat, Mar 23, 2013 at 12:10 PM, <josef...@gmail.com> wrote:
>>>
>>>
> <snip>
>>>
>>> and the online docs are not updating correctly.
>>
>>
>> Low priority for me. I have again two presentations on my work on Monday
>> (and not all results) and numpy/scipy currently broken. Haven't forgotten.
>>
>
>
> Fixed. Wasn't thinking in how I was using virtualenv in a subprocess. Am
> thinking now.

Thanks, it's nice to see the updated documentation for current master.
Initially I wanted to keep it together with the basic statistics and
(parametric) tests, since I started to write the power functions for
those tests. When the power part gets larger, it will need a new
section.

OK, just wanted to make sure it wasn't an oversight. The way the section is written, it looks (to me) like those are intended to be both basic statistics with frequency weights and t-tests with frequency weights.

Skipper Seabold

Mar 26, 2013, 11:21:16 AM
to pystat...@googlegroups.com
...and maybe that is the intention, if so, carry on.

josef...@gmail.com

Mar 26, 2013, 11:32:51 AM
to pystat...@googlegroups.com
That section header doesn't match anymore, I forgot to check that.
I don't want the header to get too long. I will look at it again.

Josef