Factor Analysis merged, and playing with pandas formatting


josef...@gmail.com

Dec 14, 2017, 10:12:48 PM
to pystatsmodels
Thanks to Yichuan Liu, we now have Factor Analysis in statsmodels, which is currently focused on getting rotated factor loadings using the principal axis method.

This also includes code from another Python package, and thanks to the author of that package we now have a good set of rotations available.
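
As a rough illustration of the workflow described above, here is a minimal sketch that fits a principal axis factor model and applies a varimax rotation. The synthetic data, the variable names, and the choice of two factors are assumptions made only for this example, and since the implementation is experimental the API may still change.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.factor import Factor

rng = np.random.default_rng(0)
nobs = 500

# Synthetic data (an assumption for illustration): two latent factors,
# each one driving three of the six observed variables.
true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                          [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
latent = rng.standard_normal((nobs, 2))
noise = 0.5 * rng.standard_normal((nobs, 6))
data = pd.DataFrame(latent @ true_loadings.T + noise,
                    columns=["x%d" % i for i in range(6)])

# method='pa' selects principal axis factoring.
mod = Factor(data, n_factor=2, method="pa")
res = mod.fit()

# rotate() updates the loadings stored on the results instance.
res.rotate("varimax")
print(pd.DataFrame(res.loadings, index=data.columns).round(2))
```

Signs and the ordering of the factors are not identified, so the printed loadings can differ from `true_loadings` by sign flips or a column swap.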

Kerby has a PR adding maximum likelihood estimation for factor analysis.



Given that I had to read up on the related literature and examples for reviewing the PR, I also took the opportunity to try out pandas formatting or style options.
This should make interpreting rotated loadings easier.



Our implementation of Factor Analysis is currently labeled as experimental. This is mainly because it is currently targeted at obtaining rotated loadings.
I expect that we will have to change the structure of the classes and parts of the API when we add additional features. However, I don't know factor analysis, factor models and their uses well enough to guess now what the right structure will be.

Everyone is invited to join in the fun, use it, complain about it, make suggestions and add extensions. (The first and the last are the important ones.)


BTW: This is one of several new features in `statsmodels.multivariate` added by Yichuan Liu with partial review by me. Factor Analysis comes after MANOVA, repeated measures ANOVA, canonical correlation, and a limited-feature version of MultivariateOLS written as the backend of MANOVA.

But there is still a lot of work to do in this multivariate methods neighborhood to catch up with ....

Josef

josef...@gmail.com

Dec 16, 2017, 12:58:45 AM
to pystatsmodels
How well does it work?

Trying it out on wine quality

The factors look kind of good, but there is a Heywood case (*), something I hadn't run into before.

(*) A Heywood case: the share of the variance of a variable that is explained by the factors is larger than one, although shares should be between 0 and 1.
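
To make the footnote concrete, here is a small sketch of how such a case could be detected from a loadings matrix. It assumes orthogonal factors and standardized variables, so that the explained share (the communality) of each variable is the row sum of its squared loadings. The helper names and the example numbers are hypothetical and are not taken from the wine data.

```python
import numpy as np

def communalities(loadings):
    """Share of each variable's variance explained by the factors.

    Assumes orthogonal factors and loadings based on a correlation matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    return (loadings ** 2).sum(axis=1)

def heywood_variables(loadings, names, tol=1e-8):
    """Names of variables whose communality exceeds one."""
    comm = communalities(loadings)
    return [names[i] for i in np.flatnonzero(comm > 1 + tol)]

# Made-up loadings for three variables and two factors.
example = np.array([[0.9, 0.5],
                    [0.6, 0.3],
                    [0.2, 0.7]])
names = ["a", "b", "c"]

print(communalities(example))             # 1.06, 0.45, 0.53
print(heywood_variables(example, names))  # ['a']
```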

Based on what I read in the documentation of other stats packages, factor analysis can be a bit fragile: impossible shares, multiple local minima, and convergence failures don't seem to be uncommon. Our initial version of factor analysis doesn't yet come with much automatic support for nasty cases.

Josef

josef...@gmail.com

Dec 16, 2017, 4:02:13 PM
to pystatsmodels
The pandas styler object is fun and easy to use.

I updated the "wine" gist with a few more things, including some that we don't have yet in statsmodels (and that I don't have unit tests for). For example, at the end of the gist is a matrix version of computing all partial correlation coefficients and all OLS parameters from regressing one variable against all others.
Those use styler highlighting directly to make larger values more visible.
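
For readers who have not seen the matrix version, one standard way to get both sets of numbers is from the inverse covariance (precision) matrix. The sketch below uses that identity. It is an illustration only and not necessarily how the gist does it, and the function name is hypothetical.

```python
import numpy as np

def partial_corr_and_ols(data):
    """Partial correlations and OLS slopes for all variables at once.

    pcorr[i, j]  : correlation of variables i and j given all others.
    params[i, j] : slope of variable j when regressing variable i on
                   all other variables (with a constant).
    """
    x = np.asarray(data, dtype=float)
    precision = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.diag(precision)

    # Partial correlation: -P_ij / sqrt(P_ii * P_jj)
    pcorr = -precision / np.sqrt(np.outer(d, d))
    np.fill_diagonal(pcorr, 1.0)

    # OLS slope of variable j in the regression for variable i: -P_ij / P_ii
    params = -precision / d[:, None]
    np.fill_diagonal(params, 0.0)
    return pcorr, params

# Random correlated data, used only to exercise the function.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
pcorr, params = partial_corr_and_ols(x)

# Cross-check the first row against an explicit least squares fit.
design = np.column_stack([np.ones(len(x)), x[:, 1:]])
coef = np.linalg.lstsq(design, x[:, 0], rcond=None)[0]
print(np.allclose(params[0, 1:], coef[1:]))
```

Both outputs are plain arrays, so they can be wrapped in a DataFrame and passed to the styler for highlighting.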

(And since I'm not a color expert, it just uses yellow from the pandas documentation)
(A bonus, not yet used, would be different colors for large positive and for large negative values.)
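
A minimal sketch of that two-color idea, assuming a threshold of 0.4 and an arbitrary pair of colors. The function names, the threshold, and the example loadings are all made up for illustration. `Styler.apply` with `axis=None` passes the whole frame to the function and expects a frame of CSS strings back.

```python
import numpy as np
import pandas as pd

def color_large(frame, threshold=0.4,
                pos="background-color: yellow",
                neg="background-color: lightblue"):
    """Return a frame of CSS strings marking large positive/negative cells."""
    values = frame.to_numpy(dtype=float)
    css = np.full(values.shape, "", dtype=object)
    css[values >= threshold] = pos
    css[values <= -threshold] = neg
    return pd.DataFrame(css, index=frame.index, columns=frame.columns)

def style_loadings(frame, threshold=0.4):
    """Build a Styler that highlights large loadings (rendering needs jinja2)."""
    return (frame.style
            .apply(color_large, axis=None, threshold=threshold)
            .format("{:.2f}"))

# Made-up loadings, not results from the wine data.
loadings = pd.DataFrame([[0.82, -0.10],
                         [0.05, -0.71],
                         [0.30, 0.45]],
                        index=["var_a", "var_b", "var_c"],
                        columns=["factor 0", "factor 1"])
css = color_large(loadings)
print(css)
# In a notebook, display the result of: style_loadings(loadings)
```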

Josef