R versus Python: pivoting QR and singular design

josef...@gmail.com

Apr 22, 2017, 3:44:06 PM
to pystatsmodels
http://stackoverflow.com/questions/43524756/difference-between-linear-regression-coefficients-between-python-and-r/43525373
http://stackoverflow.com/questions/40935624/differences-in-linear-regression-in-r-and-python/40937228

Parts of my comments in response to the claim that "R is much better
than Python":

Personal note: If you want some variable dropped randomly from your
regression, then use R. If you want an SVD/pinv regularized solution,
then use Python with scikit-learn or statsmodels. If you want neither,
then clean your data and choose your variables yourself.
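
To make the SVD/pinv option concrete, here is a minimal sketch using
plain NumPy rather than statsmodels or scikit-learn. The design matrix
is a made-up example (an intercept, x, and an exact copy of x), not the
data from the linked questions, and the 1e-10 cutoffs are illustrative
assumptions.

```python
import numpy as np

# Hypothetical singular design: intercept, x, and an exact copy of x.
x = np.arange(1.0, 6.0)
X = np.column_stack([np.ones_like(x), x, x])
y = 1.0 + 2.0 * x

# The duplicated column makes X rank deficient (rank 2, not 3).
# The tolerance is an illustrative assumption.
rank = np.linalg.matrix_rank(X, tol=1e-10)

# Minimum-norm least-squares solution via the Moore-Penrose
# pseudoinverse. No column is dropped; singular values below the
# cutoff are treated as zero.
beta_pinv = np.linalg.pinv(X, rcond=1e-10) @ y
fitted_pinv = X @ beta_pinv

print(rank, beta_pinv)
```

All three coefficients stay in the model. Because the solution has
minimum norm, the slope is shared between the two identical columns
instead of one of them being removed.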

It is very unlikely that statsmodels will ever get variable selection
by pivoting QR, even though it could now be implemented with
scipy.linalg.
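
For reference, a rough sketch of what dropping columns by pivoting QR
could look like with scipy.linalg.qr(..., pivoting=True). This is not
statsmodels code and not R's algorithm. The data, the relative
threshold on the diagonal of R, and the names kept and dropped are
assumptions for illustration.

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical singular design: intercept, x, and an exact copy of x.
x = np.arange(1.0, 6.0)
X = np.column_stack([np.ones_like(x), x, x])
y = 1.0 + 2.0 * x

# Column-pivoted QR: X[:, piv] equals Q @ R, and the magnitudes of
# diag(R) are non-increasing.
Q, R, piv = qr(X, mode="economic", pivoting=True)

# Illustrative rank rule: count diagonal entries that are not tiny
# relative to the largest one.
tol = 1e-7
diag = np.abs(np.diag(R))
rank = int(np.sum(diag > tol * diag[0]))

# Columns pivoted to the end are the ones treated as redundant.
kept = np.sort(piv[:rank])
dropped = np.sort(piv[rank:])

# Ordinary least squares on the retained columns only.
beta_kept, *_ = np.linalg.lstsq(X[:, kept], y, rcond=None)
fitted_qr = X[:, kept] @ beta_kept

print("kept:", kept, "dropped:", dropped, "beta:", beta_kept)
```

Which of the two identical columns ends up in dropped depends on how
the pivoting breaks the tie, which is the kind of arbitrariness
discussed here.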

I played a bit with the example: If we change d by adding 1e-6 to its
last element, i.e. d <- c(1, 1, 1, 1, 1, 1, 1 + 1e-6), then d is kept
and e is dropped. If I add 1e-7 instead of 1e-6, then d is still
dropped as in the original case. This looks pretty "random" (fragile)
to me for repeated use cases with slightly different datasets.
(Tested with R 3.2.0.)
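
A similar sensitivity can be reproduced in Python with a fixed
threshold on a pivoted QR. This is only an analogy: R's lm uses its own
QR routine with a different pivoting rule, and the two-column design
below (an intercept plus d) is a hypothetical reduction of the example,
not the original data. The helper names numerical_rank and design, and
the tolerance 1e-7, are assumptions for this sketch.

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(X, tol=1e-7):
    """Count the diagonal entries of R (pivoted QR) above tol * |R[0, 0]|."""
    _, R, _ = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    return int(np.sum(diag > tol * diag[0]))

def design(eps):
    """Intercept plus a column d that differs from it only in the last entry."""
    d = np.ones(7)
    d[-1] += eps
    return np.column_stack([np.ones(7), d])

# Same threshold, two perturbations one order of magnitude apart.
ranks = {eps: numerical_rank(design(eps)) for eps in (1e-6, 1e-7)}
print(ranks)
```

With this threshold the column d counts as informative for the larger
perturbation and as redundant for the smaller one, so a tenfold change
in a tiny offset decides whether a variable would stay in the model.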


(Some time ago I thought pivoting QR would be a good enhancement, now
that scipy exposes the linalg function.)

Josef