GSoC 2017 is coming, is it?


josef...@gmail.com

Jan 13, 2017, 10:13:40 PM
to pystatsmodels
The Python Software Foundation sent out an email about GSOC 2017.
Soon organizations will have to apply to Google for participation, and the PSF needs ideas pages from its sub-organizations.


Will we have mentors?
Are there specific topics that somebody wants to mentor?
Are there already possible candidates with matching topics?


We get a lot out of GSOC projects. However, they are also a lot of work, and we still have a backlog of almost-finished pull requests from past GSOC projects to merge.



Josef
https://martinfowler.com/bliki/TwoHardThings.html
https://www.quora.com/Why-is-naming-things-hard-in-computer-science-and-how-can-it-can-be-made-easier

Max Linke

Jan 16, 2017, 8:54:10 AM
to pystatsmodels
Hi

NumFOCUS will apply again as an umbrella organization, and I'm going to be the administrator this year. If you would like to participate with us, that would be nice, and I can give you more information about the ideas pages and where to post them. If not, could you please tell me where you decide to apply so that I can link to you on the NumFOCUS GSoC page?

best Max

josef...@gmail.com

Jan 16, 2017, 10:03:49 AM
to pystatsmodels
Thanks Max,

Most likely we will apply again with the PSF, given our inertia after having participated for 8 years in a row, and most of the years we got all the slots that we asked for and could handle.

We will add a 2017 ideas page when we have a rough idea how many mentors will be available and what our high priority projects are.

Josef

Chad Fulton

Jan 19, 2017, 9:50:03 PM
to Statsmodels Mailing List
I will most likely be available as a mentor again, if there is an interested student. I need to think about potential projects, though. I'm happy to do more state space or possibly more general TSA. I hope people post ideas they have.

A couple of notes:

1. I will hopefully at some point be able to merge in some state space regime switching model code.
2. I also have a lot of structural change code written (Bai-Perron tests, breakpoint least squares, etc.), so I'd be less interested in mentoring a student on that subject until at least I merge it (but as usual, who knows when I'll have time, probably not before GSOC is selected).

A couple of possible state space topics:

1. We now have the Cython smoothers, so the EM algorithm is possible in state space models. We could try to add that to various models.
2. Specific models that are missing: ARFIMA, multivariate unobserved components models, more complex cycles, now-casting type models and (non-state-space) MIDAS models
3. I want to get more postestimation results for state space models (e.g. IRF confidence intervals)
4. If someone really liked unobserved components models, a project could be to make a really comprehensive implementation in some way (e.g. Harvey's 1989 book has many more extensions, hypothesis / specification tests, etc. than we have). I think that a really well-developed model could be pretty nice.
5. If someone really liked VARIMA models, there's a bunch to be done there, as far as identification and estimation.
6. Nonlinear / non-Gaussian state space models.
7. We still don't have the framework for linear restrictions (this is pretty easy, and it's not in there because I've never used it myself or seen it used practically)
8. (This one is almost certainly too advanced and would require more time as a mentor than I have) exact initial Kalman filtering / smoothing

Non-state-space:

- We still don't have forecasting or IRFs, etc. for any of the Markov switching models. Also the EM algorithms are not entirely "correct" (so right now they're private and only used for initial values for the usual fit methods).
- Markov switching VARs are a good topic, and we have the Cython versions of the Hamilton Filter and Smoother, so I think we can do e.g. Krolzig's stuff (I think he develops the EM algorithm for the class he considers).

More-out-there-idea: Bayesian posterior simulation in state space models:

I know Statsmodels' principle is to leave Bayesian simulation to the other packages, but the fact is that there are some nice models that can be estimated by Gibbs Sampling that can't be easily estimated by MLE. For example, Markov switching dynamic factor models are a lot easier in the Bayesian context (in fact, the exact MLE can't be feasibly computed so the Kim smoother only allows approximate MLE). Stochastic volatility, outliers, etc. can be added pretty easily in this context.

- One GSOC project might be to see how to integrate PyMC3 and Statsmodels' state space models. Output would probably not be the MS-DFM + stochastic volatility etc. model, but implementations of existing models using this estimation procedure.

josef...@gmail.com

Jan 19, 2017, 10:05:49 PM
to pystatsmodels
One topic, besides the macroeconometrics, would be bringing in more forecasting support in the style of Hyndman, e.g. the open unfinished PRs. Those would be popular with our forecasting users.
Hyndman seems to prefer the single error model, though, and I don't know where the discussion on that stood.
I mention it in case you are interested in building something like those on top of statespace models.

Josef

Chad Fulton

Jan 20, 2017, 12:21:28 AM
to Statsmodels Mailing List
That's a good idea. Yes, if someone wanted to go this route, that would be a very nice addition.

I always forget exactly what the implications of this single error approach are vis a vis our state space models. IIRC, most things will work, but maybe there are some things that won't fit into our framework?
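For anyone following along, the distinction can be sketched for the simplest case, a local level. This is an illustration only, not Statsmodels code, and all function and parameter names are made up. In the single-error (innovations) form one shock drives both the observation and the level, so the level follows from a plain recursion on the one-step errors. In the multiple-error form the observation and the level each get their own shock, which is the setup a Kalman filter is built for.

```python
import random


def simulate_single_error(n, alpha, level0=0.0, sigma=1.0, seed=0):
    """Single source of error: the same shock e_t enters both equations."""
    rng = random.Random(seed)
    level = level0
    ys = []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        ys.append(level + e)        # y_t = l_{t-1} + e_t
        level = level + alpha * e   # l_t = l_{t-1} + alpha * e_t
    return ys


def simulate_multiple_error(n, sigma_obs=1.0, sigma_level=0.5, level0=0.0, seed=0):
    """Multiple sources of error: separate shocks for level and observation."""
    rng = random.Random(seed)
    level = level0
    ys = []
    for _ in range(n):
        level = level + rng.gauss(0.0, sigma_level)  # l_t = l_{t-1} + eta_t
        ys.append(level + rng.gauss(0.0, sigma_obs))  # y_t = l_t + eps_t
    return ys


def filter_single_error(ys, alpha, level0=0.0):
    """Recover the final level and the one-step-ahead errors by recursion."""
    level = level0
    errors = []
    for y in ys:
        e = y - level        # one-step-ahead forecast error
        errors.append(e)
        level += alpha * e   # same update as simple exponential smoothing
    return level, errors


if __name__ == "__main__":
    ys = simulate_single_error(10, alpha=0.3, seed=1)
    final_level, errors = filter_single_error(ys, alpha=0.3)
    print(round(final_level, 4), len(errors))
```

With the true `alpha` and initial level, the recursion recovers the simulated shocks (up to floating-point rounding), which is why no Kalman filter is needed in the single-error case.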

w.r.t. the existing TBATS etc. PR, it looked pretty far along to me but seems to have stalled.

josef...@gmail.com

Jan 20, 2017, 12:48:30 AM
to pystatsmodels
I have no idea, I was just listening in on your discussion. IIRC, Hyndman argued somewhere that the single error approach provides some additional stability to the estimator. However, I cannot think of a reason why it should make much difference, e.g. compared to the forecast errors, outliers, ... My guess is that if it doesn't fit into the current statespace setup, then it might be better to just provide a multi-error forecasting model.
 

> w.r.t. the existing TBATS etc. PR, it looked pretty far along to me but seems to have stalled.

Unfortunately it looks that way.


(More remote: One Hyndman paper that I thought would be interesting if I were working in this area, is forecasting for a partition of the population with aggregation constraint/consolidation. The example was tourism forecasts for regions and for the aggregate, IIRC. I wasn't interested enough at the time to read the small print.)

Josef

Roman Ring

Feb 6, 2017, 7:24:43 AM
to pystatsmodels
Hello!

Seeing as the deadline for mentoring organizations is near and there's no wiki page for GSoC 2017, figured I'd post here.

I would be interested in joining Chad Fulton with a state space based project, though to be honest I don't have much experience with them.
I am generally experienced in programming and dabbled in computational science based projects, including statsmodels https://github.com/statsmodels/statsmodels/pull/3193

I am motivated to close the gap with state space models with a few pointers from Chad.


Cheers,
Roman Ring

josef...@gmail.com

Feb 6, 2017, 8:30:06 AM
to pystatsmodels
On Mon, Feb 6, 2017 at 2:36 AM, Roman Ring <ino...@gmail.com> wrote:
>
> Hello!
>
> Seeing as the deadline for mentoring organizations is near and there's no wiki page for GSoC 2017, figured I'd post here.

Thanks for the reminder, I'll try to add it by tomorrow's deadline.

>
> I would be interested in joining Chad Fulton with a state space based project, though to be honest I don't have much experience with them.
> I am generally experienced in programming and dabbled in computational science based projects, including statsmodels https://github.com/statsmodels/statsmodels/pull/3193

I thought this was already merged. scipy has some more optimizers
coming in that look useful for our optimization problems.

>
> I am motivated to close the gap with state space models with a few pointers from Chad.


Chad recently posted a link on the mailing list ("State space models -
paper") to his introduction, which provides very good background on the
statespace implementation and on writing extensions to it. That should be
a good start to get an overview.

Josef


>
>
>
> Cheers,
> Roman Ring
>

Chad Fulton

Feb 6, 2017, 9:32:17 PM
to Statsmodels Mailing List
> I would be interested in joining Chad Fulton with a state space based project, though to be honest I don't have much experience with them.

snip
 
>
> I am motivated to close the gap with state space models with a few pointers from Chad.


Great, glad to hear you're interested in state space models - there's certainly lots of things that can be done.

One option if you're not too used to state space models might be adding the exponential smoothing class of models. The best place to take a look would be Hyndman and Khandakar's paper describing the implementation of their R forecast package, https://www.jstatsoft.org/article/view/v027i03, and a book-length treatment in "Forecasting with Exponential Smoothing: The State Space Approach" (Hyndman, Koehler, Ord, and Snyder).

The main issue would be determining what you would be doing, and that might be more difficult than it sounds.

For example, in section 2.2 they describe 30 models, but the typical state space framework (which we use) doesn't support multiplicative components, and so our framework only supports 6 of those models. The next problem is that although we don't have models with the names they use there, our UnobservedComponents class basically covers those models.
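The 30-versus-6 count can be reproduced with a short enumeration. This is a sketch that assumes the taxonomy has two error types, five trend types (none, additive, additive damped, multiplicative, multiplicative damped) and three seasonal types, and that "fits the usual linear state space form" means every component is additive. The labels and the `is_linear` helper are illustrative, not anything in Statsmodels.

```python
from itertools import product

# Assumed component labels for the exponential smoothing taxonomy.
ERRORS = ["A", "M"]                    # additive, multiplicative
TRENDS = ["N", "A", "Ad", "M", "Md"]   # none, additive, damped, mult., mult. damped
SEASONALS = ["N", "A", "M"]            # none, additive, multiplicative

all_models = list(product(ERRORS, TRENDS, SEASONALS))


def is_linear(error, trend, seasonal):
    """True when no component is multiplicative."""
    return error == "A" and trend in {"N", "A", "Ad"} and seasonal in {"N", "A"}


linear_models = [m for m in all_models if is_linear(*m)]

print(len(all_models))     # 30
print(len(linear_models))  # 6
for error, trend, seasonal in linear_models:
    print(f"ETS({error},{trend},{seasonal})")
```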

So I think going this route would imply a topic of "automatic forecasting" more than "state space". I think better forecasting tools are something we do want, but you'd have to decide if you wanted to lean more towards the exponential smoothing framework (i.e. implementing their alternative state space estimation framework to accommodate the multiplicative-type models) or if you wanted to lean towards the "automatic forecasting" framework (i.e. probably mostly using our existing models and then creating an automatic forecasting interface like Hyndman and Khandakar's over the top of it).
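To make the "automatic forecasting" idea concrete, here is a toy sketch of the selection loop: fit a few candidate models, score each with an information criterion, and keep the best. Everything here is hypothetical and in plain Python (the three candidates, the grid search and the Gaussian-style AIC are stand-ins). A real interface would wrap existing Statsmodels models and use their likelihoods instead.

```python
import math


def ses_sse(ys, alpha):
    """Sum of squared one-step errors for simple exponential smoothing."""
    level = ys[0]
    sse = 0.0
    for y in ys[1:]:
        err = y - level
        sse += err * err
        level += alpha * err
    return sse


def fit_ses(ys):
    """Pick the smoothing parameter by a coarse grid search."""
    grid = [i / 100 for i in range(1, 100)]
    alpha = min(grid, key=lambda a: ses_sse(ys, a))
    return {"name": "ses", "alpha": alpha, "sse": ses_sse(ys, alpha), "k": 1}


def fit_naive(ys):
    """Random-walk forecast (last value), i.e. smoothing with alpha = 1."""
    return {"name": "naive", "sse": ses_sse(ys, 1.0), "k": 0}


def fit_mean(ys):
    """Constant forecast at the sample mean."""
    mean = sum(ys) / len(ys)
    sse = sum((y - mean) ** 2 for y in ys[1:])
    return {"name": "mean", "sse": sse, "k": 1}


def aic(fit, n):
    """Gaussian-style AIC from the SSE; the floor guards against log(0)."""
    return n * math.log(max(fit["sse"], 1e-12) / n) + 2 * fit["k"]


def auto_select(ys):
    """Fit every candidate and return the one with the lowest AIC."""
    n = len(ys) - 1  # number of one-step errors scored
    fits = [fit_naive(ys), fit_mean(ys), fit_ses(ys)]
    return min(fits, key=lambda f: aic(f, n))


if __name__ == "__main__":
    print(auto_select(list(range(1, 21)))["name"])
    print(auto_select([0, 1] * 10)["name"])
```

The design point is that the candidate list and the scoring rule are separate from the models themselves, so the same loop could sit on top of whatever models already exist.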

And of course there are a number of other state space topics that I mentioned above that would be good too.

Kerby Shedden

Feb 7, 2017, 9:07:38 PM
to pystatsmodels
If not too late... I am available to mentor this year again.

Interesting topics to me would be:

* Survey methods

* Smoothing penalties for existing regression models (MGCV-like models)

* Support for sparse/compressed design matrices

* Setting up a performance benchmarking suite

* Basic structural equations modeling (SEM)

josef...@gmail.com

Feb 7, 2017, 10:07:15 PM
to pystatsmodels
On Tue, Feb 7, 2017 at 9:07 PM, Kerby Shedden <kshe...@umich.edu> wrote:
> If not too late... I am available to mentor this year again.

It's not too late. Glad to hear it.

I created a gsoc 2017 wiki page and did a first round of updating the projects.
We are also now on the PSF ideas/sub-org page
http://python-gsoc.org/#ideas

>
> Interesting topics to me would be:
>
> * Survey methods

I have enough of a basic overview now that this would be a good
feasible project. It would require working more with existing
models than writing new ones.

>
> * Smoothing penalties for existing regression models (MGCV-like models)

This has mostly been covered by GAM and penalized splines and is
waiting in a PR for final review and a bugfix.
But I haven't gone back to penalized methods in two years.

>
> * Support for sparse/compressed design matrices

Also one of my new targets

>
> * Setting up a performance benchmarking suite

I'm not sure this makes a full GSOC project

>
> * Basic structural equations modeling (SEM)

That's one topic I haven't gotten into yet. (except for the
simultaneous equation econometrics counterpart).

For large parts it sounds to me a bit like relying too much on
distributional or normality assumptions.

I left multivariate in the ideas list, but there has been good
progress recently.
I also left in MixedGLM which will need more work. (adaptive gaussian
quadrature would be on my wishlist)

Other topics that I would be interested in are core econometrics
models for applied economics, social sciences and similar fields with
observational data. Especially, we are still missing large parts of
micro-econometrics.
(I was recently reading forum threads where researchers mentioned that
they like to use python for data handling and the surrounding
programming but have to switch to Stata or R for the core estimation
methods.)

Josef

Roman Ring

Mar 15, 2017, 3:49:13 AM
to pystatsmodels
Hello again,

Sorry for the long delay in a follow-up.
If you're still available, I think I'd prefer to lean towards the "automatic forecasting" route since it seems more realistic to get a working MVP out within GSoC timeline.

Can you recommend any additional background material on this topic, aside from the links above?


Cheers,
Roman

Chad Fulton

Mar 18, 2017, 7:40:53 PM
to Statsmodels Mailing List
On Wed, Mar 15, 2017 at 3:49 AM, Roman Ring <ino...@gmail.com> wrote:
> Hello again,
>
> Sorry for the long delay in a follow-up.
> If you're still available, I think I'd prefer to lean towards the "automatic forecasting" route since it seems more realistic to get a working MVP out within GSoC timeline.
>
> Can you recommend any additional background material on this topic, aside from the links above?



I think we'd want ours to be broadly similar to that. Note that a lot of the pieces are already available, e.g. the Box-Cox transformations are in Statsmodels.
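For reference, the Box-Cox step itself is small. The following is a plain-Python sketch of the transform and its inverse for a single positive value. It is not the Statsmodels implementation, and the function names are made up for illustration.

```python
import math


def boxcox(y, lmbda):
    """Box-Cox transform of a positive value y with parameter lmbda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return math.log(y)               # limiting case as lmbda -> 0
    return (y ** lmbda - 1.0) / lmbda


def inv_boxcox(z, lmbda):
    """Inverse transform, mapping forecasts back to the original scale."""
    if lmbda == 0:
        return math.exp(z)
    return (lmbda * z + 1.0) ** (1.0 / lmbda)


if __name__ == "__main__":
    for lam in (0, 0.5, 1):
        z = boxcox(10.0, lam)
        print(lam, z, inv_boxcox(z, lam))
```

In an automatic forecasting pipeline the model would be fit on the transformed series and the forecasts mapped back with the inverse.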

Chad

Kevin Sheppard

Mar 19, 2017, 5:57:56 AM
to pystatsmodels
Statsmodels needs 3 mentors with PSF.  As of last week, there were only 2 signed up, so if you are thinking of mentoring, it is important to register an interest.

Chad Fulton

Mar 19, 2017, 8:57:11 AM
to Statsmodels Mailing List

On Sun, Mar 19, 2017 at 5:57 AM, Kevin Sheppard <kevin.k....@gmail.com> wrote:
> Statsmodels needs 3 mentors with PSF.  As of last week, there were only 2 signed up, so if you are thinking of mentoring, it is important to register an interest.

Thanks for the reminder. I thought I had, but I guess I didn't complete all the pages. I should be signed up now.