[GSoC] Extensions to State Space Models

Aleksandar Karakas

Mar 12, 2016, 7:34:10 AM
to pystatsmodels
Hello everyone,

I study financial and actuarial mathematics as well as software engineering and management at TU Graz and would be very interested in participating in GSoC.
In the statistics-related exercises (e.g. GLM, Time Series Analysis) we have used R exclusively (this may be valuable when it comes to writing tests where R output might serve as a reference).
Through my software engineering lectures I've also programmed in Python, where I've used NumPy, Matplotlib, and NetworkX amongst other modules.

The suggested topics of extending the state space models (e.g. adding models like VAR and VARMA or adding non-linear and non-Gaussian filtering) sound interesting. Which ones are of high priority? And which ones do you think could also serve as the basis for a master's thesis? 

I'm looking forward to your answers! :)
Best regards,
Aleks

josef...@gmail.com

Mar 12, 2016, 8:14:00 AM
to pystatsmodels


On Sat, Mar 12, 2016 at 7:14 AM, Aleksandar Karakas <aleks....@gmail.com> wrote:
Hello everyone,

Hi Aleks,
Sorry, I forgot to delete VAR, VARMA and similar from the list. Those were Chad's topics last year and are already in master. Current status is 
And Chad has one more very large PR waiting for review and merge.

Non-Gaussian filtering, for example in the linear exponential family, would be interesting, but I don't know anything about it in the context of statespace models.
For VAR-type models the main things that are missing are VECM and multivariate cointegration models. But those are usually not implemented with a statespace framework, AFAIK.

There are already two students that indicated interest in statespace models, so it might be good to also look for some alternative project ideas.


For a master's thesis topic it might be better to check with your (potential) professors first.
(When I did my Magister, a long time ago, I picked a topic advertised by the institute for my thesis.)


Josef

Aleksandar Karakas

Mar 12, 2016, 1:50:23 PM
to pystatsmodels
Thank you for your suggestions, Josef! I will talk to my professors on Tuesday and Wednesday and then post further ideas.
Best,
Aleks 

Aleksandar Karakas

Mar 15, 2016, 1:21:46 PM
to pystatsmodels
Hello Josef, hello all,


On Saturday, March 12, 2016 at 14:14:00 UTC+1, josefpktd wrote:


Non-gaussian filtering for example in the linear exponential family would be interesting but I don't know anything about it in the context of statespace models.
For VAR type models the main thing that is missing are VECM and multivariate cointegration models. But those are usually not implemented with a statespace framework, AFAIK.

My professor from time series analysis liked the idea of implementing VECM and multivariate cointegration models. For R there are packages addressing this topic (e.g. vars, urca or tsDyn). Is there interest in the implementation of these models?


There are already two students that indicated interest in statespace models, so it might be good to also look for some alternative project ideas.

I have also talked to the professor I had in GLM but he is too busy to supervise another thesis. That's why I hope to be able to contribute to statsmodels.tsa.

Best,
Aleks 

josef...@gmail.com

Mar 15, 2016, 2:21:47 PM
to pystatsmodels
On Tue, Mar 15, 2016 at 1:21 PM, Aleksandar Karakas <aleks....@gmail.com> wrote:
Hello Josef, hello all,

My professor from time series analysis liked the idea of implementing VECM and multivariate cointegration models. For R there are packages addressing this topic (e.g. vars, urca or tsDyn). Is there interest in the implementation of these models?

I haven't looked at what's available in R.
Stata also has good documentation for it, e.g. http://www.stata.com/manuals14/tsvecintro.pdf http://www.stata.com/manuals14/tsvec.pdf and related.
tsvec.pdf has a large methodological section using MLE.
RATS, maybe OX, and some smaller packages might also be good references for time series analysis.

Some others might also have code in R.

I would specifically focus on the VECM models. For example, Kevin Sheppard has a larger collection of unit root tests in his arch package, but we don't have much cointegration support yet.
We have a buggy/limited cointegration test in statsmodels that is not advertised, and a Johansen cointegration test, which also does parts of the VECM calculations internally, is in my PR: https://github.com/statsmodels/statsmodels/issues/448  https://github.com/statsmodels/statsmodels/pull/453 
The modules in the PR have been copied to several other packages or are floating around freely.

VECM is a good topic, but there might be a problem to have it as a GSOC project because of our limited amount of mentoring time.

In any case, I would be helping out as much as needed, so we can get this into statsmodels.

Josef

josef...@gmail.com

Mar 15, 2016, 9:38:04 PM
to pystatsmodels
I browsed some documentation to get a rough idea (again) what is done in this area.

R packages vars and tsDyn sound good. I didn't see many details about the implemented algorithms in the documentation and the vars vignette as far as I browsed those. Maybe Pfaff has more description in books or articles.

From the Stata manual it looks like the first part, without extra restrictions on the parameters, can be calculated explicitly. The linear algebra looks a bit extensive, but is relatively straightforward to translate into code, and parts will be similar to what's used for Johansen's cointegration test.
The second part with structural restrictions requires a nonlinear estimator, and seems to be what the vars package is focusing on.
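
To give a rough idea of what the explicit calculation in the first part could look like, here is a minimal NumPy sketch of a Johansen-type reduced-rank regression. It assumes a VECM without deterministic terms and without parameter restrictions; the function name `fit_vecm_unrestricted` and its arguments are hypothetical and not part of statsmodels, and the simulated data are only for illustration.

```python
import numpy as np


def fit_vecm_unrestricted(y, k_ar_diff=1, rank=1):
    """Reduced-rank estimate of
    dy_t = alpha beta' y_{t-1} + sum_i Gamma_i dy_{t-i} + u_t
    (no deterministic terms, no restrictions; illustrative only)."""
    y = np.asarray(y, dtype=float)
    k = y.shape[1]
    p = k_ar_diff
    dy = np.diff(y, axis=0)

    z0 = dy[p:]      # dy_t
    z1 = y[p:-1]     # y_{t-1}
    # Lagged differences dy_{t-1}, ..., dy_{t-p}, stacked column-wise.
    z2 = np.hstack([dy[p - i:-i] for i in range(1, p + 1)]) if p > 0 else None

    def partial_out(z):
        # Residuals of z after regressing on the lagged differences.
        if z2 is None:
            return z
        coef = np.linalg.lstsq(z2, z, rcond=None)[0]
        return z - z2 @ coef

    r0, r1 = partial_out(z0), partial_out(z1)
    nobs = r0.shape[0]
    s00 = r0.T @ r0 / nobs
    s01 = r0.T @ r1 / nobs
    s11 = r1.T @ r1 / nobs

    # Solve |lambda * S11 - S10 S00^{-1} S01| = 0 as a symmetric
    # eigenvalue problem using the Cholesky factor L of S11.
    chol = np.linalg.cholesky(s11)
    m = np.linalg.solve(chol, s01.T)        # L^{-1} S10
    sym = m @ np.linalg.solve(s00, m.T)     # L^{-1} S10 S00^{-1} S01 L^{-T}
    sym = (sym + sym.T) / 2                 # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Cointegrating vectors, normalized so that beta' S11 beta = I.
    beta = np.linalg.solve(chol.T, eigvecs[:, :rank])
    alpha = s01 @ beta                      # loading matrix

    # Short-run matrices Gamma_i from a regression with alpha beta' fixed.
    gammas = []
    if z2 is not None:
        pi = alpha @ beta.T
        g = np.linalg.lstsq(z2, z0 - z1 @ pi.T, rcond=None)[0]
        gammas = [g[i * k:(i + 1) * k].T for i in range(p)]
    return eigvals, alpha, beta, gammas


# Simulated bivariate system with one cointegrating relation (made-up values).
rng = np.random.default_rng(0)
pi_true = np.array([[-0.4], [0.2]]) @ np.array([[1.0, -1.0]])
y_sim = np.zeros((1000, 2))
for t in range(1, 1000):
    y_sim[t] = y_sim[t - 1] + pi_true @ y_sim[t - 1] + rng.standard_normal(2)

eigvals, alpha_hat, beta_hat, gammas_hat = fit_vecm_unrestricted(
    y_sim, k_ar_diff=1, rank=1)
pi_hat = alpha_hat @ beta_hat.T
```

The eigenvalues are what a trace or maximum-eigenvalue test would be built from, but the critical values are non-standard and are not part of this sketch. Only the product `alpha_hat @ beta_hat.T` is identified; the individual factors depend on the normalization chosen.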

I guess several parts of the postestimation results and interpretation, like impulse response functions and similar, can be reused from the current VAR implementation. Both Stata and MATLAB have functions to convert a model from the VECM to the VAR representation. But I haven't looked at any of those details.
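
The VECM-to-VAR conversion itself is only a few lines of algebra. As a sketch, assuming the VECM is written as dy_t = Pi y_{t-1} + Gamma_1 dy_{t-1} + ... + Gamma_{p-1} dy_{t-p+1} + u_t without deterministic terms, the level VAR(p) coefficients are A_1 = I + Pi + Gamma_1, A_i = Gamma_i - Gamma_{i-1} for 1 < i < p, and A_p = -Gamma_{p-1}. The helper name `vecm_to_var` is hypothetical and the numbers are made up:

```python
import numpy as np


def vecm_to_var(pi, gammas):
    """Convert VECM matrices (Pi, [Gamma_1, ..., Gamma_{p-1}]) into the
    coefficient matrices [A_1, ..., A_p] of the VAR in levels."""
    pi = np.asarray(pi, dtype=float)
    k = pi.shape[0]
    zero = np.zeros((k, k))
    # Pad with Gamma_0 = 0 and Gamma_p = 0 so one formula covers all lags.
    padded = [zero] + [np.asarray(g, dtype=float) for g in gammas] + [zero]
    p = len(gammas) + 1
    coefs = []
    for i in range(1, p + 1):
        a = padded[i] - padded[i - 1]
        if i == 1:
            a = a + np.eye(k) + pi
        coefs.append(a)
    return coefs


# Example with one lagged difference, i.e. a VAR(2) in levels.
pi_example = np.array([[-0.4, 0.4], [0.2, -0.2]])
gamma_1 = np.array([[0.1, 0.0], [0.0, 0.2]])
a_1, a_2 = vecm_to_var(pi_example, [gamma_1])
```

With the VAR coefficients in hand, existing VAR machinery such as impulse response functions could in principle be applied to the converted model.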

The main book for the current implementation of VAR in statsmodels was Lütkepohl. (I haven't read it except for a few sections.)


Josef

Aleksandar Karakas

Mar 16, 2016, 11:50:58 AM
to pystatsmodels
Thank you, Josef, for the linked resources and your comments on them! They look very promising. Also, the book you suggested below (Lütkepohl) seems to discuss VECM very thoroughly (according to its ToC). I will go to the library tomorrow to have a more detailed look at it.
 

VECM is a good topic, but there might be a problem to have it as a GSOC project because of our limited amount of mentoring time.

In any case, I would be helping out as much as needed, so we can get this into statsmodels.

I really appreciate your offer and the effort you put into this project. After my studies (when funding isn't crucial anymore) I can imagine contributing to the community, as I plan to use Python (which I favor over R) in my day-to-day work. But for now I will have to choose a master's thesis which is funded. So GSoC looks like the perfect starting point for me to become familiar with statsmodels.

Best,
Aleks

josef...@gmail.com

Mar 16, 2016, 1:02:11 PM
to pystatsmodels
I browsed part of Pfaff's book Analysis of Integrated and Cointegrated Time Series with R, chapters 4 and 8.
There are many examples of how to use R, but as far as I saw, the algebra and explanations focus more on hypothesis testing, and I haven't seen much on estimation itself.

Based on some comments, it looks like large parts of the Johansen procedure and estimation are implemented in urca, in ca.jo.

 
 

I really appreciate your offer and the effort you put into this project. After my studies (when funding isn't crucial anymore) I can imagine contributing to the community, as I plan to use Python (which I favor over R) in my day-to-day work. But for now I will have to choose a master's thesis which is funded. So GSoC looks like the perfect starting point for me to become familiar with statsmodels.

We will see over the next weeks how the GSOC "market" is going this year.

Try to write a rough outline for a proposal in the next few days.

Josef

Aleksandar Karakas

Mar 16, 2016, 5:47:44 PM
to pystatsmodels
I will try it! :)
Aleks