[pystatsmodels] launchpad.net blueprints, development plans, bugzilla

Vincent Davis

Apr 17, 2010, 8:41:10 PM
to pystat...@googlegroups.com
I am playing with the blueprints some. https://blueprints.launchpad.net/statsmodels
I realize that there have been few direct contributors to the code. But as I am new to StatsModels, I am not familiar with what has been started and what is currently in the planning stage. Basically I am discovering there is a lot I don't know I don't know. It would help if there was a better way (or if I knew of the current way) this was being documented. I hate to say it, but browsing through the files of each branch is not a great way to go about it :)
I would like to know:
What is the goal of, or what is being worked on in, each branch?
What are the current primary projects/goals?
What is the goal for the next release?
.....

Everything on the GSoC list should be documented and available to current and potential developers now.

You get the idea. I have no preference in what is used. I know bugs are being tracked with bugzilla but not sure about features and development plans.


The answer to this should be added to the developernotes.rst so that potential developers can quickly see what has been started and what needs to be done.

The hard part is keeping this up to date and current.


josef...@gmail.com

Apr 18, 2010, 12:39:36 AM
to pystat...@googlegroups.com
On Sat, Apr 17, 2010 at 8:41 PM, Vincent Davis <vin...@vincentdavis.net> wrote:
> You get the idea. I have no preference in what is used. I know bugs are being tracked with bugzilla but not sure about features and development plans.

It's the Launchpad bug tracker, which is not Bugzilla.
 


> The answer to this should be added to the developernotes.rst so that potential developers can quickly see what has been started and what needs to be done.
>
> The hard part is keeping this up to date and current.

There is a lot of room for improvement in this.
In our workflow so far, a more formal forum for development plans wasn't really necessary, or would have been more work than benefit. (Almost) all the development plans and discussions are in the mailing list, and branches were, most of the time, defined by what a developer is working on and not by topic. Moreover, it is easier to keep the information updated if it is in comments and notes in the source tree, to indicate plans, current status and todos. This also makes it more easily accessible when we are working on the source.

The best overview currently is Skipper's proposal for GSoC that he sent to the mailing list recently. I don't know if his final version is publicly accessible at http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/jseabold/t127082935700  (I guess not)
We had some private emails with additional information in preparation for GSoC.

Here are some random items; I have only a vague idea of what might work.

We could add a (better) description to topic branches, e.g. Skipper-maxent is his work on maximum entropy models, which also affects scipy.maxent.

Most branches are regularly merged into trunk; in the trunk branch, "bzr qlog" provides the combined history and changesets for browsing.

I think we should put the big list of work in progress and todos in an easily accessible location, e.g. an expanded version of Skipper's GSoC plans.

SMEP (statsmodels enhancement proposals) and blueprints:

* tickets are useful for specific items
* launchpad blueprints are not convenient to update; they could work mainly as a proposal and status summary
* Wiki: easier to edit, but more work to maintain
  - example: https://blueprints.launchpad.net/statsmodels/+spec/catdataclass  click: Read the full specification
  - mediawiki on sourceforge not used yet (the only other page: http://sourceforge.net/apps/mediawiki/statsmodels/index.php?title=Blueprints/DataSets )
  - similar: the developer plans on the scipy wiki resemble development reality only weakly
* rst docs and docstrings for notes are easier to convert to full documentation.
* publicly editable or part of the source?
* mailing list

But I think it will be useful to try this again, especially since a written summary will be easier to read than a full search of the discussions in the mailing list.


> What are the current primary projects/goals?
> What is the goal for the next release?

Both are a bit vague. Except for GSoC, development was mostly driven by what Skipper and I were interested in, needed to have, and found time for. There are not many upfront design plans or schedules for development.

primary projects/goals: improve coverage of statistical and econometric methods with python/numpy/scipy
This essentially means pick your favorite model or estimator and write it, although we do have a good idea of which set of models is needed for basic coverage of econometrics.

goal for the next release: https://launchpad.net/statsmodels/+milestone/0.3.0
essentially get the things out of the sandbox that are close to being ready

We have been working on scikits.statsmodels for roughly a year, and there is still a lot to figure out or to improve until we "grow up".

I will try to put together a report/list of work in progress and near-term plans (expanding on Skipper's GSoC proposal). For the most part, browsing the trunk sandbox will give you a good idea.

Josef
 

Vincent Davis

Apr 18, 2010, 1:16:04 AM
to pystat...@googlegroups.com
@ Josef
Thanks for that long reply. I realize my questions are not peculiar to this project and that there are established tools and methods for addressing documentation, planning, bugs, ...

I was wishing for a "complete list of everything" :)  But actually just wanted to ask how you would prefer to make an incremental improvement in documenting current work in progress. 
As I said, looking through the different branches trying to discover what is different is not the best way. I need to learn more about bzr so I can more easily compare branches.

I am ok with using text files in the branches to identify what is being worked on or the plans for that branch.

I feel kinda like saying whatever. It's easy enough to post a question on the mailing list.
So no big deal.


josef...@gmail.com

Apr 18, 2010, 1:46:56 AM
to pystat...@googlegroups.com
On Sun, Apr 18, 2010 at 1:16 AM, Vincent Davis <vin...@vincentdavis.net> wrote:
> I was wishing for a "complete list of everything" :)  But actually just wanted to ask how you would prefer to make an incremental improvement in documenting current work in progress.

Ok, that's a much easier question.
I usually browse the bzr history, so the fastest way to indicate the status is an informative commit message, and comments in the code to indicate problems, left-over todos or unfinished parts. And delete the comments when they don't apply anymore.
 
> As I said looking through the different branches trying to discover what is different is not the best way. I need to learn more about bzr so I can more easily compare branches.

bzr qdiff --new=lp:~jsseabold/statsmodels/statsmodels-wrapper

This goes through Launchpad; there is also a way to compare two branches on the local file system.
I usually just use WinMerge, which is easier for browsing.

There are only two branches (plus maxent) not in trunk, so you don't need to do lots of cross-comparisons.

Josef
 

Mike

Apr 18, 2010, 5:41:52 PM
to pystatsmodels


On 18 Apr, 06:46, josef.p...@gmail.com wrote:
> >> SMEP (statsmodels enhancement proposals) and blueprints:
>
> >> * tickets are useful for specific items
> >> * launchpad blueprints are not convenient to update, they could work
> >> mainly as proposal and status summary
Does this mean you would or wouldn't like me to make a blueprint thing
for the smoothing stuff I'm doing? If there is anything else I
should do regarding planning it, let me know.

My only concern with this sort of thing is to avoid treading on
each other's toes by not knowing what people are working on.



josef...@gmail.com

Apr 18, 2010, 6:38:50 PM
to pystat...@googlegroups.com
I think the blueprint is up to you; maybe a short description as a
reminder would be useful.

>
> My only concerns on this sort of thing are trying to avoid treading on
> each others toes by not knowing what people are working on.

The main way of communicating plans and designs is still the mailing
list. From the discussion, we know that you are working on kernel
regression and smoothers, and currently the only related code and
plans are in kde, which Skipper started but, most likely, won't get
back to until summer.

This should be enough to avoid stepping on toes.
And sometimes competing implementations, e.g. VAR and ARMA, will just
result in more flexible estimators, or in a choice of estimators for
different applications.

I really appreciate docstrings, at module level and for individual
classes/functions/methods.
(It's much easier to follow if they contain references and the
notation for the math of the models, especially if there are competing
versions in the literature.)
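
To illustrate the kind of docstring meant here, a minimal sketch follows. The function name nadaraya_watson, its parameters, and the choice of a Gaussian kernel are assumptions made up for this example; none of this is existing statsmodels code.

```python
import math


def nadaraya_watson(x0, x, y, bandwidth=1.0):
    """Kernel regression estimate of E[y | x = x0].

    Hypothetical example for illustration, not part of statsmodels.

    Notation
    --------
    m(x0) = sum_i K(u_i) * y_i / sum_i K(u_i),   u_i = (x0 - x_i) / h

    with Gaussian kernel K(u) = exp(-u**2 / 2) and bandwidth h. Some texts
    include the factor 1 / (h * sqrt(2 * pi)) in K; it cancels in the
    ratio, so it is omitted here.

    Parameters
    ----------
    x0 : float
        Point at which the regression function is evaluated.
    x, y : sequences of float
        Observed regressor and response values, of equal length.
    bandwidth : float
        Smoothing parameter h, must be positive.

    References
    ----------
    Nadaraya, E. A. (1964). On estimating regression.
    Watson, G. S. (1964). Smooth regression analysis.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Kernel weight of each observation relative to the evaluation point x0.
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    # Weighted average of the responses.
    return sum(w * yi for w, yi in zip(weights, y)) / total
```

Stating the notation and the kernel normalization in the docstring tells a reader which of the competing conventions the code follows.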

Josef