Global VAR Modelling


josef...@gmail.com

Feb 16, 2016, 7:48:03 PM
to pystatsmodels
https://sites.google.com/site/gvarmodelling/gvar
https://scholar.google.ca/scholar?cluster=558503087777751602&hl=en&as_sdt=0,5&sciodt=0,5

found via stats.stackexchange

Sounds like an interesting version of nested VAR.

I had never heard of it and only looked at it for a few minutes.

In case someone is interested, I think this would be a nice feature to have.
Our current VAR is limited in size because it uses the full VAR model, i.e. k_vars * k_vars * nlags parameters plus constants and trends, IIRC.
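As a back-of-envelope illustration of that size limit (a sketch; `var_param_count` is a hypothetical helper, not statsmodels API):

```python
def var_param_count(k_vars, nlags, n_trend=1):
    """Mean parameters of an unrestricted VAR(p): each of the k_vars
    equations regresses on nlags lags of all k_vars series plus a
    constant/trend term, so the count grows quadratically in k_vars."""
    return k_vars * k_vars * nlags + k_vars * n_trend

print(var_param_count(5, 2))   # 55
print(var_param_count(50, 2))  # 5050 -- why large systems need restrictions
```

This quadratic growth is what approaches like Global VAR avoid by estimating many small country-level models linked through aggregated foreign variables.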

Josef

piyush dhingra

Feb 19, 2016, 6:44:16 PM
to pystatsmodels

Hi josef

I am a third-year MSc Economics and B.E. Computer Science student at BITS Pilani Goa. For the past few months, I have been trying to solve some of the issues and to familiarize myself with the statsmodels code. For quite some time I have been looking for a statsmodels project for GSoC, and this model seems to be an interesting one. I have gone through the links you provided above, and the model excites me. I would like to contribute to the community.

Please let me know how I can contribute to this project.

Piyush

josef...@gmail.com

Feb 22, 2016, 1:00:43 PM
to pystatsmodels
Hi,
Global VAR is a project for a macro econometrician or someone in a field that uses similar tools.
I have only looked at it for an hour or so, and it would be too time-consuming for me to figure it out myself right now.


My main problem is that we have a bit more than one hundred open PRs, 33 of which are mine. So the priority is getting those "finished", and I'll try to help with very specific projects, enhancements and similar. (Case in point: adding frequency weights to GLM, which has been on my high-priority wishlist for some time.)

Many PRs are missing unit tests or have low test coverage. Adding tests is a major task.
Besides that, code review and making design decisions are both time-consuming.


Josef

 


Piyush Dhingra

Mar 11, 2016, 6:55:50 PM
to pystatsmodels


Hi Josef,

I have gone through this year's ideas page and I am interested in the project "Add Maximum Likelihood Models for other distributions". I wanted to discuss the project further. Could you please tell me more specifically what the project requires, and could you provide some references where I can find related material?
Also, is anyone else taking this project this year?

Thanks

Piyush

josef...@gmail.com

Mar 11, 2016, 7:35:32 PM
to pystatsmodels
On Fri, Mar 11, 2016 at 6:55 PM, Piyush Dhingra <piyus...@gmail.com> wrote:


Hi Josef,

I have gone through this year's ideas page and I am interested in the project "Add Maximum Likelihood Models for other distributions". I wanted to discuss the project further. Could you please tell me more specifically what the project requires, and could you provide some references where I can find related material?


I was thinking about this again as a GSoC project. The advantage is that it is conceptually relatively straightforward, even if there will or could be some tricky spots in the implementation.

 
Also is anyone else taking this project this year ??

I haven't heard of anybody being interested in this.

You will need to do a bit of searching to find links for the following; I don't have all of them ready.

The two best examples of how to get started are GenericLikelihoodModel with the NegativeBinomial example (there should be a notebook) and BetaRegression https://github.com/statsmodels/statsmodels/pull/2030 (I don't remember what its current status is).

The discrete models are our prototypical MLE models, but they now have a bit of a class hierarchy that might not make them obvious to read.
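The GenericLikelihoodModel pattern mentioned above can be sketched roughly as follows; the subclass name `PoissonMLE` is hypothetical, and Poisson is used only because statsmodels already ships it in `discrete`, so results can be cross-checked. Score and Hessian are obtained numerically from the per-observation log-likelihood here.

```python
import numpy as np
from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel

class PoissonMLE(GenericLikelihoodModel):
    """Poisson regression with a log link (illustration only)."""

    def nloglikeobs(self, params):
        # Per-observation negative log-likelihood; mu_i = exp(x_i' beta).
        mu = np.exp(self.exog @ params)
        return -stats.poisson.logpmf(self.endog, mu)

# Simulate data with known coefficients [0.5, 1.0] and fit.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(x @ np.array([0.5, 1.0])))

res = PoissonMLE(y, x).fit(start_params=np.zeros(2), method="bfgs", disp=False)
print(res.params)  # close to the true values [0.5, 1.0]
```

The same skeleton extends to new distributions by swapping the body of `nloglikeobs`, and analytic `score`/`hessian` methods can be added later for speed and accuracy.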


In terms of which MLE models to implement:

My favorite topics would be additional count data models: hurdle, truncated, ZIP, ZINB, PIG, ... I posted a list a while ago on the mailing list when I looked into this.
And second, parametric survival models, both proportional hazards and accelerated failure time models (Weibull, lognormal, ...) with censoring. The main reference is the Stata manual for streg http://www.stata.com/manuals14/ststreg.pdf (I recently read that and related material).
I don't know what the R equivalent is.

The documentation for R's GAMLSS has a good list of likelihood functions with different parameterizations, but without derivatives (score and Hessian). (E.g. I was browsing http://www.gamlss.org/wp-content/uploads/2013/01/gamlss-manual.pdf)


In terms of work:

The main implementation part is to get the log-likelihood and, if possible, its first and second derivatives (score and Hessian).
Half of the work will be writing unit tests against R, Stata, a similar package, or published results.

Several of the models are two- (or three-) part models (several parameters and possibly mixing). The main prototype for two-part models is BetaRegression, and that should establish the main pattern before GSoC.
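The zero-inflated Poisson is probably the simplest two-part model in this list: it mixes a point mass at zero with an ordinary Poisson. A minimal sketch with fixed scalar parameters (the function name `zip_loglike` is hypothetical):

```python
import numpy as np
from scipy.special import gammaln

def zip_loglike(y, mu, pi):
    """Log-likelihood of the zero-inflated Poisson (ZIP) model.

    P(Y = 0)     = pi + (1 - pi) * exp(-mu)
    P(Y = y > 0) = (1 - pi) * exp(-mu) * mu**y / y!
    """
    y = np.asarray(y)
    ll = np.where(
        y == 0,
        # zeros come from either the point mass or the Poisson part
        np.log(pi + (1 - pi) * np.exp(-mu)),
        # positive counts come only from the Poisson part
        np.log1p(-pi) - mu + y * np.log(mu) - gammaln(y + 1),
    )
    return ll.sum()

# With pi = 0 this reduces to the plain Poisson log-likelihood.
print(zip_loglike([0, 1, 2], mu=1.5, pi=0.1))  # approx -3.889
```

In a full model both `mu` and `pi` would be linked to covariates (log and logit links, respectively), which is where the two-part structure and the BetaRegression pattern come in.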

Large parts of the general implementation and results can just be inherited.


The main idea of a project like this is to get references and a package in R, Stata or wherever with a good list of likelihood functions, and to code a list of them (including unit tests). This would provide the base functionality for many models.
An alternative would be to restrict the number or groups of models and try to add more of the extras for them: specification tests, plots, descriptive statistics or presentation of results. (I'm not sure what would go in there because it depends a bit on the model category.)


Josef
 


Piyush Dhingra

Mar 14, 2016, 12:51:58 AM
to pystatsmodels
Thanks for the reply, Josef.

I searched for the distributions you mentioned and found a few links to code for some of the distributions mentioned above:
https://cran.r-project.org/web/views/Distributions.html
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html

So far I have only found the distributions in R, and to be honest I don't yet know what the maximum likelihood functions for these distributions will look like. I'll probably have to read up more about it.
So should I write a draft proposal for the project, or should I try next year?

Piyush Dhingra

Mar 23, 2016, 10:20:01 PM
to pystatsmodels
Hi Josef,

Sorry, I was busy with submissions, so it took a while to draft the proposal. Please have a look and suggest any necessary changes.


Abstract:

Statsmodels is a Python-based statistics and econometrics package [1,2]. The project aims to provide an alternative to current commercial (MATLAB, Stata, SAS, SPSS, EViews) and open source (R, gretl) statistical packages. Statsmodels currently has maximum likelihood models for only a couple of distributions; maximum likelihood models are missing for count data and parametric survival models. I propose to work in these areas in order to increase the viability of statsmodels as a standalone, open source package for statistical analysis.

For my GSoC project, I will add maximum likelihood models for parametric survival distributions (Weibull, gamma, lognormal, inverse Gaussian) and for additional count data models such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson, and Poisson-inverse Gaussian (PIG).


Timeline:


Community Bonding Period (April - May 22)


- Bond with the community.

- Read implementation articles and understand the estimation algorithms already implemented in statsmodels for similar model types.


Week 1 (May 23 – May 29):

- Get fully acquainted with the maximum likelihood models already included in statsmodels.

- Get started with GenericLikelihoodModel, using the NegativeBinomial and BetaRegression examples; investigate their current status.

- Start on the likelihood function for the Poisson distribution. [10]


Week 2 (May 30 – June 5):

- Continue and complete the MLE for the Poisson distribution. [1]

- Add unit tests and documentation for this model.

- Add notebooks explaining everything with examples.


Week 3 and Week 4 (June 6 – June 20):

- Add likelihood models for zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB). [7]

- Find examples and prepare a proper notebook explaining everything. [5]

- Tests and documentation.


Week 5 (June 21 – June 28):

- Complete the documentation for the models covered so far and ensure proper tests for them. Get acquainted with the work already done on the likelihood models.

- Finalize/clean code for the midterm evaluation.


Week 6 and Week 7 (June 29 – July 13):

- Add an MLE implementation for the hurdle model.

- Look into specification tests, plots, descriptive statistics, and presentation of results. [1]

- Prepare proper documentation.


Week 8 and Week 9 (July 14 – July 27):

- Add an MLE model for the lognormal distribution; search for examples.

- Maintain a proper notebook with explanations.

- Unit tests and documentation.


Week 10 and Week 11 (July 28 – August 11):

- Add an MLE model for the Weibull distribution.

- Look for extra plots, descriptive statistics, examples, etc.

- Unit tests and documentation.


Week 12 (August 12 – 16):

- Finalize/clean code, write tests, improve documentation, etc.


Week 13 (August 17 – 23):

- Code submission.



References:


1. Regression Models with Count Data. http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm

2. Stata manual for mlexp. http://www.stata.com/manuals14/rmlexp.pdf

3. Analyze parameters for zero-inflated Poisson data. http://www.biostat.umn.edu/~john-c/5421/zeroinflatedpoisson.notes

4. Instructions on how to use the gamlss package in R. http://www.gamlss.org/wp-content/uploads/2013/01/gamlss-manual.pdf

5. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. https://support.sas.com/resources/papers/sgf2008/countreg.pdf

6. VGAM documentation for zipoisson. http://www.inside-r.org/packages/cran/vgam/docs/zipoisson

7. Fit zero-inflated regression models for count data via maximum likelihood. http://artax.karlin.mff.cuni.cz/r-help/library/pscl/html/zeroinfl.html

8. Stata blog posts on maximum likelihood. http://blog.stata.com/?s=maximum+likelihood

9. Poisson regression fitted by glm() and maximum likelihood. http://www.r-bloggers.com/poisson-regression-fitted-by-glm-maximum-likelihood-and-mcmc/

10. Estimation of Claim Count Data using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. https://www.casact.org/pubs/forum/13spforum/Ismail%20Zamani.pdf

11. Maximum-likelihood Fitting of Univariate Distributions. https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html

12. Regression Models for Count Data in R. https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

13. Fitting a Model by Maximum Likelihood. http://www.r-bloggers.com/fitting-a-model-by-maximum-likelihood/

14. Stata manual for streg. http://www.stata.com/manuals14/ststreg.pdf

15. SAS GENMOD documentation. https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_genmod_sect048.htm


About Me:

I am currently a third-year MSc Economics + B.E. (Hons.) Computer Science student at BITS Pilani K.K. Birla Goa Campus.
I have been using Python and Git for more than two years and am comfortable with both.
Last summer I worked on two projects for the Indian Red Cross Society.
The projects involved building web-based platforms, in collaboration with the South Asia Region Delegation (SARD) office, for monitoring the progress of South Asia Youth Network (SAYN) members and for mapping the progress of National Societies (NSMM).
Both sites were developed using Django (a Python framework).
https://github.com/dhpiyush/ https://github.com/gnarula/

I started using statsmodels in late December 2015 and started contributing to the open source project in early January 2016. A few of my PRs:
https://github.com/statsmodels/statsmodels/pull/2750
https://github.com/statsmodels/statsmodels/pull/2790
https://github.com/statsmodels/statsmodels/pull/2746

Contact Info:
Name: Piyush Dhingra
Email: piyus...@gmail.com
Phone: 0917767832763
GitHub: https://github.com/dhpiyush
Postal Address: Ah7/322, BITS Pilani K.K. Birla Goa Campus, NH 17B Bypass Road, Zuarinagar, Sancoale, Goa 403726
 
I thank you in advance for your comments. They have been very useful and much appreciated thus far.

Piyush

josef...@gmail.com

Mar 24, 2016, 9:09:53 AM
to pystatsmodels
On Wed, Mar 23, 2016 at 10:20 PM, Piyush Dhingra <piyus...@gmail.com> wrote:
Hi Josef ,

Sorry, I was busy with submissions, so it took a while to draft the proposal. Please have a look and suggest any necessary changes.


Hi Piyush,

The proposal looks good overall, but it will need some adjustments in the list of distributions.
We have had Poisson for a long time, both as a family in GLM and as a specific class in discrete.

What would be interesting among the basic distributions are the generalized Poisson (which I don't remember right now) and the generalized negative binomial, which has one extra parameter between NB1 and NB2.

I'm not sure about including the log-linear model. It can be estimated with a transformed dependent variable, but the main problem is in calculating expected values or predicting means because of the transformation.

Weibull could be accompanied by another distribution (or add distributions as time permits) to see the common pattern better. My guess is that the main work will be in getting the setup of the class, and it will be relatively easy to add another likelihood function.
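For the survival case, the core of the likelihood is how censoring enters: observed events contribute the log-density, censored observations contribute the log-survival function. A minimal sketch for the Weibull with right censoring (the function name is hypothetical; parameterization follows f(t) = (k/s)(t/s)^(k-1) exp(-(t/s)^k)):

```python
import numpy as np

def weibull_loglike_censored(t, event, shape, scale):
    """Weibull log-likelihood with right censoring: observed events
    contribute log f(t); censored observations contribute log S(t),
    the probability of surviving past the censoring time."""
    t = np.asarray(t, dtype=float)
    event = np.asarray(event, dtype=bool)  # True = event, False = censored
    z = (t / scale) ** shape
    log_f = np.log(shape / scale) + (shape - 1.0) * np.log(t / scale) - z
    log_s = -z  # log of the Weibull survival function
    return np.where(event, log_f, log_s).sum()

# shape = 1 reduces to the exponential distribution.
print(weibull_loglike_censored([1.0, 2.0, 3.0], [True, True, False],
                               shape=1.0, scale=2.0))
```

Swapping in another log-density and log-survival pair (lognormal, gamma, ...) keeps the same censoring structure, which is the "common pattern" the class setup would capture.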

Josef

josef...@gmail.com

Mar 24, 2016, 10:27:09 AM
to pystatsmodels
This paper looks like a very good overview of the generalizations of Poisson and negative binomial.
It's a good find, thanks.

From what I read before about GP-P and NB-P, it would be good to concentrate on those plus the zero-inflated versions, skip the last part with lognormal and Weibull, and additionally keep mainly basic, mainly Poisson, hurdle models (and other hurdle models if they can be quickly implemented using the same structure).

This assumes we can get full versions of GP-P, NB-P, and similar. I don't remember if I have seen gradient and Hessian calculations for those before.
There will also be extra work involved if we want to have versions with a fixed extra parameter and versions with a jointly estimated extra parameter.
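The role of the extra parameter is easiest to see in the NB-P variance function (a sketch; `nbp_variance` is a hypothetical helper, not statsmodels API):

```python
def nbp_variance(mu, alpha, p):
    """Variance function of the NB-P family: Var(Y) = mu + alpha * mu**p.
    p = 1 gives NB1, p = 2 gives NB2; estimating p jointly with alpha
    nests both as special cases, which is exactly the 'fixed vs. jointly
    estimated extra parameter' question above."""
    return mu + alpha * mu ** p

print(nbp_variance(2.0, 0.5, 1))  # NB1: 3.0
print(nbp_variance(2.0, 0.5, 2))  # NB2: 4.0
```

A version with fixed `p` only needs the usual (beta, alpha) derivatives; the jointly estimated version also needs derivatives with respect to `p`.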

Josef

Piyush Dhingra

Mar 24, 2016, 7:26:33 PM
to pystatsmodels
Thanks Josef,
I have incorporated the proposed changes into my proposal, i.e. I have removed the Weibull and lognormal distributions from the list and added the negative binomial Poisson and generalized Poisson. I have submitted the final draft proposal incorporating these changes. I appreciate all the comments. Looking forward to contributing more to the organization.

Thanks 

Piyush

 

josef...@gmail.com

Jul 28, 2016, 1:00:50 PM
to pystatsmodels
It looks like they have a liberal license for "usage" (but they don't specify explicitly which usage):

"""
Copyright 2014 L. Vanessa Smith and Alessandro Galesi.

The GVAR Toolbox 2.0 can be used if and only if you agree with the
following Terms and Conditions of Use.

TERMS AND CONDITIONS OF USE

THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT WILL THE COPYRIGHT HOLDERS OR THEIR EMPLOYERS, OR ANY
OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THIS SOFTWARE, BE
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL
OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
THE PROGRAM (INCLUDING, BUT NOT LIMITED TO, LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
"""

It's MATLAB with Excel data files.


I ran into this again while trying to figure out some details for vector error correction models and reading and following up on Pesaran's articles.

If we can reuse and translate parts of the code, it would speed up the next stage in this area: going into structural VARX and VECX modelling.

Plus, for whoever is interested in international macroeconometrics, their data set might be interesting (but I haven't looked at any GVAR details or data yet).


BTW: it would be very helpful to have a VECM expert as a reviewer or contributor. I struggled all summer just to understand what the different constant and trend specifications in a VECM are and what they mean.

My current summary notes, before having code to play with:
https://github.com/statsmodels/statsmodels/issues/3129


Josef