On Tue, Nov 22, 2016 at 10:04 AM, Robin Kramer <kramer...@gmail.com> wrote:

After having learnt R, I've been using Python as my main tool for data analysis, and I've been loving it so far. However, since I've started using statistical methods that are just a little more complex, I've run into its limits: many of the more complex methods that exist in R have not been implemented in statsmodels yet :(
On Tue, Nov 22, 2016 at 4:10 PM, <josef...@gmail.com> wrote:

To add a few comments to this:

As in the discussion "Python versus R", which has mostly shifted to "Python and R", it should be clear that no (!?) other package covers as wide a range of statistical methods as R (usability and consistency across packages is a different issue). For example, for (outlier) robust estimation we have just the basic M-estimators, and scikit-learn has some things, while in R many of the "big" names and several dedicated R developers (package maintainers) have collaborated for 10 to 20 years.

So we try to provide the basic tools plus pretty good coverage in some areas in which some developers are particularly interested. State space models are currently one of those. GLM and GEE are in pretty good shape, so it's worth thinking about which parts are still missing. For GLM, we have gained some of those features since those issues were opened, largely because of the work of thequackdaddy.

Having good issues and wishlists for those areas is useful, so we can see what's missing and what the priorities should be. Implementing something may still take time if nobody is interested enough to work on it.

In my personal interest, I often get stuck on generic methods and reusable tools that are missing and that would extend several models at once. Sandwich covariance is one that has been pretty successful; adding weights to all models would be another great step that would open up many new applications for the existing models. We also need better and more flexible covariance matrix estimators in general, to be plugged in in several places.
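For readers unfamiliar with the "sandwich" covariance mentioned above: it is the heteroskedasticity-robust estimator (X'X)^-1 (sum_i e_i^2 x_i x_i') (X'X)^-1, i.e. "bread * meat * bread". Here is a minimal pure-Python sketch of the HC0 variant for OLS; the helper names and the toy data are illustrative, not the statsmodels API (in statsmodels this corresponds to fitting with `cov_type='HC0'`).

```python
# Hedged sketch of the HC0 "sandwich" covariance estimator for OLS,
# using only plain Python lists. Helper names are illustrative.

def matmul(A, B):
    # naive matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(A):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# toy design matrix (intercept + one regressor) and response
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

Xt = transpose(X)
XtX_inv = inv2(matmul(Xt, X))  # the "bread"

# OLS estimate: beta = (X'X)^{-1} X'y
Xty = [[sum(Xt[i][k] * y[k] for k in range(len(y)))] for i in range(2)]
beta = [row[0] for row in matmul(XtX_inv, Xty)]

# residuals e_i = y_i - x_i' beta
resid = [y[i] - sum(X[i][j] * beta[j] for j in range(2))
         for i in range(len(y))]

# the "meat": sum_i e_i^2 x_i x_i'
meat = [[sum(resid[i] ** 2 * X[i][a] * X[i][b] for i in range(len(y)))
         for b in range(2)] for a in range(2)]

# sandwich: bread * meat * bread
cov_robust = matmul(matmul(XtX_inv, meat), XtX_inv)
robust_se = [cov_robust[j][j] ** 0.5 for j in range(2)]
```

The point of the sandwich form is that only the "meat" depends on the (possibly misspecified) variance assumption, which is exactly why the same construction reappears as the robust covariance in GEE.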
Another item that has been on my wishlist for a long time is generic diagnostic measures and hypothesis tests that can be plugged into many models, instead of having them only for OLS as we do now. So I'm always happy to see contributions for specific methods as a counterpoint to me getting lost in generic or general solutions.

Josef

On Tue, Nov 22, 2016 at 10:04 AM, Robin Kramer <kramer...@gmail.com> wrote:

I am missing one method in particular. After performing some GEEs, I've got several models for which I can tell which independent variables make a significant contribution to the prediction of the dependent variable. However, what I do not know and cannot find out is which model is better than the other. For linear mixed effects models there are AIC, BIC and log-likelihood methods, but these cannot be used for GEE (https://onlinecourses.science.psu.edu/stat504/node/180). There is a method called the quasi-likelihood under the independence model criterion (QIC; http://www.jstatsoft.org/v57/c01/paper), which is implemented in R (http://stats.stackexchange.com/questions/21771/how-to-perform-model-selection-in-gee-in-r), but again, not in Python.

Is it possible to have the QIC method implemented in Python too? The linked paper describes how to calculate the QIC rather briefly, so I hope it won't be too difficult. Hope to hear from you!

Sounds very popular; there is also a user-contributed Stata version. So we need it too.

We don't have a quasi-likelihood attached to our families, AFAIR, only the full likelihood. So that might be the missing piece for a quick implementation from outside the models.
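To make the "quick implementation from outside the models" concrete: Pan's QIC is QIC = -2*QL(mu_hat; I) + 2*trace(Omega_I * V_r), where QL is the quasi-likelihood evaluated under the independence working model, Omega_I is the model-based (naive) information matrix, and V_r is the robust sandwich covariance of the GEE estimates. Below is a minimal pure-Python sketch for the Poisson family; the function names and the toy matrices in the usage example are illustrative assumptions, not the statsmodels API.

```python
import math

def poisson_quasi_loglike(y, mu):
    # Quasi-likelihood under the independence working model, Poisson family
    # (additive constants dropped, as usual for information criteria).
    return sum(yi * math.log(mi) - mi for yi, mi in zip(y, mu))

def trace_matmul(A, B):
    # trace(A @ B) without forming the full product
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

def qic(y, mu, naive_info, cov_robust):
    """Pan's (2001) QIC: -2*QL(mu; I) + 2*trace(Omega_I * V_r).

    naive_info  : model-based information matrix under independence
    cov_robust  : sandwich covariance of the GEE parameter estimates
    """
    penalty = trace_matmul(naive_info, cov_robust)
    return -2.0 * poisson_quasi_loglike(y, mu) + 2.0 * penalty

# Toy usage with made-up fitted values and covariance matrices:
y = [1, 2]
mu = [1.0, 2.0]
naive_info = [[2.0, 0.0], [0.0, 4.0]]   # hypothetical Omega_I
cov_robust = [[0.5, 0.0], [0.0, 0.25]]  # hypothetical V_r
print(qic(y, mu, naive_info, cov_robust))  # about 7.23
```

Note that when the working correlation is correctly specified and V_r is close to the inverse of Omega_I, the trace term reduces to approximately p (the number of parameters), so QIC collapses to an AIC-like criterion; that is the sanity check in the toy example above, where the trace is exactly 2.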