Piyush
Hi Josef,
I have gone through this year's ideas page and I am interested in the project "Add Maximum Likelihood Models for other distributions". I would like to discuss the project in more detail. Could you briefly describe what the project requires, and point me to some references where I can find material related to it?
Also, is anyone else taking on this project this year?
Thanks,
Piyush
Abstract:
Statsmodels is a Python-based statistics and econometrics package [1,2]. The project aims to provide an alternative to current commercial (MATLAB, Stata, SAS, SPSS, EViews) and open source (R, gretl) statistical packages. Statsmodels currently has maximum likelihood models for only a couple of distributions; maximum likelihood models are missing for count data and for parametric survival models. I propose to work in these areas to increase the viability of statsmodels as a standalone, open source package for statistical analysis.
For my GSoC project, I will add maximum likelihood models for parametric survival models (Weibull, gamma, lognormal and inverse Gaussian distributions) and for additional count data models such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson and Poisson-inverse Gaussian (PIG).
Timeline:
Community Bonding Period (April – May 22)
- Bond with the community.
- Read implementation articles and understand the estimation algorithms already implemented in statsmodels for similar model types.
Week 1 (May 23 – May 29):
- Get fully acclimated with the maximum likelihood models already included in statsmodels.
- Get started with GenericLikelihoodModel, using NegativeBinomial and BetaRegression as examples, and investigate their current status.
- Start on the likelihood function for the Poisson distribution [10].
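A minimal sketch of the Week 1 starting point, assuming a log-link Poisson regression and simulated data (the names `poisson_nloglikeobs` and `fit_poisson` are hypothetical). In statsmodels, a per-observation function like this is what a `GenericLikelihoodModel` subclass returns from its `nloglikeobs` method; here it is minimized with `scipy.optimize.minimize` directly so the sketch does not depend on statsmodels internals.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def poisson_nloglikeobs(params, endog, exog):
    """Per-observation negative log-likelihood, lambda_i = exp(x_i' beta)."""
    linpred = exog @ params
    # log f(y | lambda) = y * log(lambda) - lambda - log(y!)
    return -(endog * linpred - np.exp(linpred) - gammaln(endog + 1))


def fit_poisson(endog, exog):
    """Minimize the summed negative log-likelihood with BFGS from zero."""
    start = np.zeros(exog.shape[1])

    def objective(p):
        return poisson_nloglikeobs(p, endog, exog).sum()

    return minimize(objective, start, method="BFGS").x


# Hypothetical simulated data: intercept plus one normal regressor
rng = np.random.default_rng(0)
n = 2000
exog = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, -0.3])
endog = rng.poisson(np.exp(exog @ true_beta))

beta_hat = fit_poisson(endog, exog)
print(beta_hat)  # should be close to true_beta
```

Because the Poisson log-likelihood is concave in beta under the log link, starting from zero is a reasonable default for this sketch.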
Week 2 (May 30 – June 5):
- Complete the MLE for the Poisson distribution [1].
- Add unit tests and documentation for this model.
- Add notebooks explaining the model with examples.
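One possible unit test for Week 2, assuming pytest-style test functions and NumPy's testing helpers (the helper `poisson_nll` is hypothetical). With only a constant in the model, the Poisson MLE has the closed form exp(b0) = mean(y), so the numerically fitted intercept should equal log(mean(y)).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def poisson_nll(params, endog, exog):
    """Summed negative log-likelihood of a log-link Poisson model."""
    linpred = exog @ params
    return -np.sum(endog * linpred - np.exp(linpred) - gammaln(endog + 1))


def test_intercept_only_matches_closed_form():
    # Intercept-only model: the score is sum(y) - n * exp(b0), which is
    # zero at b0 = log(mean(y)).
    rng = np.random.default_rng(42)
    endog = rng.poisson(3.0, size=500)
    exog = np.ones((endog.size, 1))
    res = minimize(poisson_nll, np.zeros(1), args=(endog, exog), method="BFGS")
    np.testing.assert_allclose(res.x[0], np.log(endog.mean()), rtol=1e-4)


test_intercept_only_matches_closed_form()
```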
Week 3 and Week 4 (June 6 – June 20):
- Add likelihood models for zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) [7].
- Find examples and write a notebook that explains the models properly [5].
- Tests and documentation.
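A minimal sketch of the ZIP log-likelihood, assuming a common specification: a logit model for the probability pi of a structural zero and a log link for the Poisson mean. The function names and simulated data are hypothetical; a ZINB version would follow the same pattern with the negative binomial pmf in place of the Poisson one.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def zip_loglikeobs(beta, gamma, endog, exog, exog_infl):
    """Per-observation log-likelihood of a zero-inflated Poisson model.

    Count part:     lambda_i = exp(x_i' beta)
    Inflation part: pi_i = 1 / (1 + exp(-z_i' gamma)), P(structural zero)
    """
    linpred = exog @ beta
    lam = np.exp(linpred)
    zlin = exog_infl @ gamma
    log_pi = -np.logaddexp(0.0, -zlin)      # log(pi), computed stably
    log_1m_pi = -np.logaddexp(0.0, zlin)    # log(1 - pi)
    # y == 0: either a structural zero or a zero from the Poisson part
    ll_zero = np.logaddexp(log_pi, log_1m_pi - lam)
    # y > 0: the count must come from the Poisson part
    ll_pos = log_1m_pi + endog * linpred - lam - gammaln(endog + 1)
    return np.where(endog == 0, ll_zero, ll_pos)


def fit_zip(endog, exog, exog_infl):
    """Fit all parameters jointly; returns [beta..., gamma...]."""
    k = exog.shape[1]

    def objective(params):
        return -zip_loglikeobs(params[:k], params[k:], endog, exog, exog_infl).sum()

    start = np.zeros(k + exog_infl.shape[1])
    return minimize(objective, start, method="BFGS").x


# Hypothetical simulated data: pi = 1 / (1 + exp(0.5)), i.e. gamma = -0.5
rng = np.random.default_rng(1)
n = 3000
x = rng.normal(size=n)
exog = np.column_stack([np.ones(n), x])
exog_infl = np.ones((n, 1))
structural_zero = rng.random(n) < 1 / (1 + np.exp(0.5))
endog = np.where(structural_zero, 0, rng.poisson(np.exp(0.7 + 0.4 * x)))

params = fit_zip(endog, exog, exog_infl)
print(params)  # roughly [0.7, 0.4, -0.5]
```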
Week 5 (June 21 – June 28):
- Complete the documentation for the models covered so far and ensure they have proper tests. Get acquainted with the work already done on the likelihood models.
- Finalize and clean up code for the midterm evaluation.
Week 6 and Week 7 (June 29 – July 13):
- Add an MLE model for hurdle count models.
- Look into specification tests, plots, descriptive statistics and presentation of results [1].
- Write proper documentation.
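A minimal sketch of a Poisson hurdle log-likelihood, assuming a logit model for P(y > 0) and a zero-truncated Poisson for the positive counts. The function name and parameter values are hypothetical; the check at the end verifies that the implied probabilities sum to one.

```python
import numpy as np
from scipy.special import gammaln


def hurdle_poisson_loglikeobs(beta, gamma, endog, exog, exog_zero):
    """Per-observation log-likelihood of a Poisson hurdle model.

    Hurdle part: p_i = 1 / (1 + exp(-z_i' gamma)) = P(y_i > 0)
    Count part:  zero-truncated Poisson with lambda_i = exp(x_i' beta)
    """
    linpred = exog @ beta
    lam = np.exp(linpred)
    zlin = exog_zero @ gamma
    log_p_pos = -np.logaddexp(0.0, -zlin)   # log P(y > 0)
    log_p_zero = -np.logaddexp(0.0, zlin)   # log P(y = 0)
    # Truncation: renormalize by P(Poisson > 0) = 1 - exp(-lambda)
    log_trunc = (endog * linpred - lam - gammaln(endog + 1)
                 - np.log(-np.expm1(-lam)))
    return np.where(endog == 0, log_p_zero, log_p_pos + log_trunc)


# Probabilities over y = 0..60 should sum to one (hypothetical parameters)
y = np.arange(61)
X1 = np.ones((61, 1))
ll = hurdle_poisson_loglikeobs(np.array([np.log(1.5)]), np.array([0.3]), y, X1, X1)
print(np.exp(ll).sum())
```

Because gamma enters only the zero/positive terms and beta only the truncated-count terms, the log-likelihood splits into two parts that can be maximized separately: a logit for the hurdle and a zero-truncated Poisson for the positive counts.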
Week 8 and Week 9 (July 14 – July 27):
- Add an MLE model for the lognormal distribution and search for examples.
- Maintain a proper notebook with explanations.
- Unit tests and documentation.
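A minimal sketch of a lognormal regression likelihood, assuming uncensored outcomes with log(y_i) ~ Normal(x_i' beta, sigma^2) and a log-sigma parameterization so the optimizer is unconstrained. Names and data are hypothetical. Without censoring the MLE has a closed form (OLS on log(y), with sigma^2 the mean squared residual), which is a convenient reference value for unit tests; for censored survival data, censored observations would contribute log S(t) instead of log f(t).

```python
import numpy as np
from scipy.optimize import minimize


def lognormal_loglikeobs(params, endog, exog):
    """log y_i ~ Normal(x_i' beta, sigma^2); params = [beta..., log_sigma]."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    logy = np.log(endog)
    resid = logy - exog @ beta
    # Normal log-density of log(y) plus the Jacobian term -log(y)
    return (-logy - log_sigma - 0.5 * np.log(2 * np.pi)
            - 0.5 * (resid / sigma) ** 2)


# Hypothetical simulated data: beta = [1.0, 0.5], sigma = 0.4
rng = np.random.default_rng(3)
n = 500
exog = np.column_stack([np.ones(n), rng.uniform(size=n)])
endog = np.exp(exog @ np.array([1.0, 0.5]) + 0.4 * rng.normal(size=n))

res = minimize(lambda p: -lognormal_loglikeobs(p, endog, exog).sum(),
               np.zeros(3), method="BFGS")

# Closed-form MLE for comparison: OLS on log(y), sigma^2 = mean squared residual
beta_ols = np.linalg.lstsq(exog, np.log(endog), rcond=None)[0]
sigma_mle = np.sqrt(np.mean((np.log(endog) - exog @ beta_ols) ** 2))
print(res.x[:2], beta_ols, np.exp(res.x[2]), sigma_mle)
```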
Week 10 and Week 11 (July 28 – August 11):
- Add an MLE model for the Weibull distribution.
- Look for additional plots, descriptive statistics, examples, etc.
- Unit tests and documentation.
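A minimal sketch of a Weibull survival likelihood, assuming right-censored times (parametric survival models such as those in [14] typically handle censoring) and an accelerated-failure-time parameterization scale_i = exp(x_i' beta) with a log-shape parameter. Names and data are hypothetical. Observed failures contribute log f(t) and censored cases contribute log S(t); the first check compares this against `scipy.stats.weibull_min`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min


def weibull_loglikeobs(params, time, event, exog):
    """Per-observation log-likelihood of a right-censored Weibull model.

    params = [beta..., log_shape]; scale_i = exp(x_i' beta).
    event_i = 1 if the failure was observed, 0 if right-censored.
    """
    beta, log_shape = params[:-1], params[-1]
    shape = np.exp(log_shape)
    log_scale = exog @ beta
    z = np.log(time) - log_scale               # log(t / scale)
    cum_hazard = np.exp(shape * z)             # (t / scale) ** shape
    log_pdf = log_shape - log_scale + (shape - 1.0) * z - cum_hazard
    log_surv = -cum_hazard                     # log S(t) for censored cases
    return event * log_pdf + (1 - event) * log_surv


# Cross-check against scipy.stats.weibull_min (shape 1.5, scale 2)
t = np.array([0.5, 1.2, 2.0, 3.5])
d = np.array([1, 0, 1, 0])
ll = weibull_loglikeobs(np.array([np.log(2.0), np.log(1.5)]), t, d, np.ones((4, 1)))
ref = np.where(d == 1, weibull_min.logpdf(t, 1.5, scale=2.0),
               weibull_min.logsf(t, 1.5, scale=2.0))

# Hypothetical censored sample: true beta = [0.5, 0.3], shape = 1.5
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
exog = np.column_stack([np.ones(n), x])
failure = np.exp(0.5 + 0.3 * x) * rng.weibull(1.5, size=n)
censor = rng.uniform(0.0, 6.0, size=n)
time = np.minimum(failure, censor)
event = (failure <= censor).astype(float)

res = minimize(lambda p: -weibull_loglikeobs(p, time, event, exog).sum(),
               np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[2]))  # roughly [0.5, 0.3] and 1.5
```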
Week 12 (August 12 – 16):
- Finalize and clean up code, write tests, improve documentation, etc.
Week 13 (August 17 – 23):
- Code submission.
References:
1. Regression Models with Count Data. http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm
2. http://www.stata.com/manuals14/rmlexp.pdf
3. Analyze parameters for zero-inflated Poisson data. http://www.biostat.umn.edu/~john-c/5421/zeroinflatedpoisson.notes
4. "Instructions on how to use the gamlss package in R". http://www.gamlss.org/wp-content/uploads/2013/01/gamlss-manual.pdf
5. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. https://support.sas.com/resources/papers/sgf2008/countreg.pdf
6. http://www.inside-r.org/packages/cran/vgam/docs/zipoisson
7. Fit zero-inflated regression models for count data via maximum likelihood. http://artax.karlin.mff.cuni.cz/r-help/library/pscl/html/zeroinfl.html
8. http://blog.stata.com/?s=maximum+likelihood
9. Poisson regression fitted by glm() and maximum likelihood. http://www.r-bloggers.com/poisson-regression-fitted-by-glm-maximum-likelihood-and-mcmc/
10. Estimation of Claim Count Data using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. https://www.casact.org/pubs/forum/13spforum/Ismail%20Zamani.pdf
11. Maximum-likelihood Fitting of Univariate Distributions. https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html
12. Regression Models for Count Data in R. https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
13. Fitting a Model by Maximum Likelihood. http://www.r-bloggers.com/fitting-a-model-by-maximum-likelihood/
14. Stata manual for streg. http://www.stata.com/manuals14/ststreg.pdf
15. https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_genmod_sect048.htm
About Me:
I am currently a third-year M.Sc. Economics + B.E. (Hons.) Computer Science undergraduate student at BITS Pilani K.K. Birla Goa Campus.
I have been using Python and Git for more than two years and I am comfortable with them. Last summer I worked on two projects for the Indian Red Cross Society.
The projects involved creating web-based platforms, in collaboration with the South Asia Region Delegation (SARD) office, for monitoring the progress of South Asia Youth Network (SAYN) members and for mapping the progress of National Societies (NSMM). Both sites were developed using Django (a Python web framework). https://github.com/dhpiyush/ https://github.com/gnarula/
I started using statsmodels in late December 2015 and started contributing to the project in early January 2016. A few of my PRs are:
https://github.com/statsmodels/statsmodels/pull/2750
https://github.com/statsmodels/statsmodels/pull/2790
https://github.com/statsmodels/statsmodels/pull/2746
Contact Info:
Name: PIYUSH DHINGRA
Email: piyus...@gmail.com
Phone: 0917767832763
GitHub: https://github.com/dhpiyush
Postal Address: BITS Pilani K.K. Birla Goa Campus, NH 17B Bypass Road, Zuarinagar, Sancoale, Goa 403726, Ah7/322
Thank you in advance for your comments. They have been very useful and much appreciated so far.
Piyush
Hi Josef,
Sorry, I was busy with submissions, so it took me a while to draft the proposal. Please have a look and suggest any necessary changes.
Thanks Josef,