log-transformation linear regression vs. nonlinear regression

Jose Mn

Mar 4, 2014, 10:11:59 PM
to pystat...@googlegroups.com
Hi,

I have several power-law data sets imported with pandas. Some of them seem to be best modeled by linear regression on log-transformed data, others by nonlinear regression, and a third group could be described by an averaged model. In any case, for every data set I need to calculate the likelihood that the data were generated with normal, additive error and the likelihood that they were generated with lognormal, multiplicative error. Then I should calculate AICc for each model and compare them; if neither model is favored, I should get the AICc weights of the two models, calculate the model-weighted power-law parameters and, finally, get their CIs by bootstrapping.
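For illustration, the comparison described above could be sketched roughly as follows. This is only a minimal sketch with NumPy and SciPy, not the implementation from the paper: the function names (`power_law`, `aicc`, `compare_error_models`), the synthetic data and the choice of k = 3 estimated parameters per model (a, b and the error variance) are assumptions, and the bootstrap step is left out.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_law(x, a, b):
    """Power law y = a * x**b."""
    return a * x ** b


def aicc(loglik, k, n):
    """AIC with small-sample correction (k parameters, n observations)."""
    return 2 * k - 2 * loglik + 2 * k * (k + 1) / (n - k - 1)


def compare_error_models(x, y):
    """Fit both error models; return log-likelihoods, AICc and AICc weights."""
    n, k = len(x), 3  # assumption: a, b and the error variance are estimated
    log_x, log_y = np.log(x), np.log(y)

    # Lognormal, multiplicative error: straight line on log-log axes.
    b_lr, log_a_lr = np.polyfit(log_x, log_y, 1)
    s2_lr = np.mean((log_y - (log_a_lr + b_lr * log_x)) ** 2)  # ML variance
    # Likelihood of y itself (not log y), hence the Jacobian term -sum(log y).
    ll_logn = -0.5 * n * (np.log(2 * np.pi * s2_lr) + 1) - log_y.sum()

    # Normal, additive error: nonlinear least squares on the original scale.
    p0 = [np.exp(log_a_lr), b_lr]  # start from the log-log estimates
    (a_nlr, b_nlr), _ = curve_fit(power_law, x, y, p0=p0)
    s2_nlr = np.mean((y - power_law(x, a_nlr, b_nlr)) ** 2)  # ML variance
    ll_norm = -0.5 * n * (np.log(2 * np.pi * s2_nlr) + 1)

    aicc_vals = np.array([aicc(ll_logn, k, n), aicc(ll_norm, k, n)])
    rel = np.exp(-0.5 * (aicc_vals - aicc_vals.min()))
    weights = rel / rel.sum()
    params = np.array([[np.exp(log_a_lr), b_lr], [a_nlr, b_nlr]])
    return {
        "loglik": np.array([ll_logn, ll_norm]),
        "aicc": aicc_vals,
        "weights": weights,                  # order: lognormal, normal
        "averaged_params": weights @ params,  # AICc-weighted (a, b)
    }


# Synthetic data with multiplicative lognormal error (made up for the demo).
rng = np.random.default_rng(42)
x = np.logspace(0, 2, 100)
y = 2.0 * x ** 0.75 * np.exp(rng.normal(scale=0.3, size=x.size))
result = compare_error_models(x, y)
print(result["aicc"], result["weights"])
```

Both log-likelihoods are evaluated on the original scale of y, which is what makes the two AICc values comparable.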

As a second step, I need to apply weights to the "x-data" that reflect the uncertainties in the measurements.

I started to develop a Python class that wraps scipy.optimize.leastsq (and numpy.linalg.lstsq for the weighted linear regression fit). Then I read about statsmodels in the blog entry R vs Python: Practical Data Analysis (Nonlinear Regression), written last August (sorry, it was unknown to me!), but the author pointed out that "Python doesn’t have a mixed-effects models module (there’s some code in the statsmodels module but its not finished)." So my question is: has there been any progress since then, so that I could take advantage of the power of statsmodels? Of course, I would be glad to use statsmodels as much as possible to solve this problem (my code would be more robust and, in addition, I think it would be an excellent way to learn this excellent module).

Thank you very much in advance,
Jose

josef...@gmail.com

Mar 4, 2014, 10:47:06 PM
to pystatsmodels
Partial answer (I'm on spring break)

A linear model with weights is fully supported by WLS.

Non-linear models are still not really supported; mixed effects models
are a work in progress (PR by Kerby Shedden).
However, mixed models are for panel, repeated measures, or longitudinal data.

Maximum likelihood for a non-linear model can relatively easily be
done with GenericLikelihoodModel, which, as the name says, just
provides some generic structure to be subclassed for specific cases.
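A subclass for the additive-error power law might look roughly like the sketch below. The class name `PowerLawNormal`, the parameterization (a, b, log sigma), the starting values and the synthetic data are all assumptions made for illustration; only `GenericLikelihoodModel`, its `nloglikeobs` hook and `fit` come from statsmodels.

```python
import numpy as np
from statsmodels.base.model import GenericLikelihoodModel


class PowerLawNormal(GenericLikelihoodModel):
    """y = a * x**b + e with e ~ N(0, sigma^2); params = (a, b, log_sigma)."""

    def nloglikeobs(self, params):
        # Negative log-likelihood of each observation.
        a, b, log_sigma = params
        x = self.exog[:, 0]
        resid = self.endog - a * x ** b
        sigma2 = np.exp(2.0 * log_sigma)  # log scale keeps sigma positive
        return 0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)


# Synthetic data with additive normal error (made up for the demo).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 20.0, 80)
y = 2.0 * x ** 0.75 + rng.normal(scale=0.3, size=x.size)

model = PowerLawNormal(y, x[:, None])
# Nelder-Mead ("nm") needs no gradient; the starting values are rough guesses.
ml_res = model.fit(start_params=[1.0, 1.0, 0.0], method="nm",
                   maxiter=5000, maxfun=5000, disp=False)
print(ml_res.params)  # a, b, log_sigma
print(ml_res.llf)     # maximized log-likelihood, usable in AIC or AICc
```

Depending on the statsmodels version, the fit may emit warnings about parameter names or degrees of freedom, because the model has more parameters than columns in `exog`.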

Non-linear least squares is currently still misplaced in old branches.

AICc is available in a helper function in statsmodels.stats.

I don't understand the connection to power-law.

Josef

Jose Manuel

Mar 5, 2014, 5:51:41 AM
to pystat...@googlegroups.com
Hi Josef, and thank you very much for your detailed answer, all the more so since you're on spring break!

About the connection to power-law: this is my attempt to implement in Python the procedure proposed in the paper "On the use of log-transformation vs. nonlinear regression for analyzing biological power laws". This paper has become an important reference in ecology for modeling power-law data, but Xiao et al. implement the full procedure in R, whereas I am using a Python/C++ approach in my code for performance reasons.

Jose