About $var

Mahmood Naderan

Feb 5, 2021, 7:27:33 AM
to factomin...@googlegroups.com
When I look at the results for the variables, I see the following:

> res.pca$var$cos2
         Dim.1        Dim.2        Dim.3        Dim.4
V1 0.979430886 8.081098e-05 2.034420e-02 1.440994e-04
V2 0.986428773 1.104647e-04 1.288176e-02 5.790035e-04
V3 0.996889631 1.024138e-03 8.045294e-04 1.281702e-03
V4 0.001123153 9.988644e-01 1.145636e-05 1.002566e-06
> res.pca$var$contrib
         Dim.1        Dim.2       Dim.3       Dim.4
V1 33.04564905  0.008080453 59.76216021  7.18411028
V2 33.28175527  0.011045588 37.84083902 28.86636012
V3 33.63470090  0.102405585  2.36334710 63.89954641
V4  0.03789478 99.878468373  0.03365367  0.04998318

From what I have read on the web about the definitions of contribution and
squared cosine in PCA, I don't understand what is reported here.
For example, cos2 for V1 is highest on Dim.1, but the contribution of V1
is highest on Dim.3.
I expected that, since the highest cos2 for V1 is on Dim.1, its highest
contribution would be on Dim.1 too.
So the question is: on which dimension is V1 better projected?



Feb 6, 2021, 12:52:55 PM
to FactoMineR users
The squared cosine is a measure of how well the variable "fits" the axis.  Thus, with your small data set, the squared cosines across the four axes will sum to 1 for each variable (i.e., all the variation has been explained in four axes).  The first three variables fit axis 1 very well and fit none of the other axes; the fourth variable fits axis 2.

The contribution is how much each variable contributes to each axis.  Each column of values will sum to 100.
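
As a quick check of those two statements, here is a minimal Python sketch. It is plain arithmetic on the numbers posted above, not FactoMineR code, and the variable names are only illustrative.

```python
# Values copied from the res.pca$var output posted above
# (rows V1..V4, columns Dim.1..Dim.4).
cos2 = {
    "V1": [0.979430886, 8.081098e-05, 2.034420e-02, 1.440994e-04],
    "V2": [0.986428773, 1.104647e-04, 1.288176e-02, 5.790035e-04],
    "V3": [0.996889631, 1.024138e-03, 8.045294e-04, 1.281702e-03],
    "V4": [0.001123153, 9.988644e-01, 1.145636e-05, 1.002566e-06],
}
contrib = {
    "V1": [33.04564905, 0.008080453, 59.76216021, 7.18411028],
    "V2": [33.28175527, 0.011045588, 37.84083902, 28.86636012],
    "V3": [33.63470090, 0.102405585, 2.36334710, 63.89954641],
    "V4": [0.03789478, 99.878468373, 0.03365367, 0.04998318],
}

# cos2: add up across the axes for each variable (one total per row).
cos2_row_sums = {name: sum(values) for name, values in cos2.items()}

# contrib: add up across the variables for each axis (one total per column).
contrib_col_sums = [sum(column) for column in zip(*contrib.values())]

for name, total in cos2_row_sums.items():
    print(f"{name}: cos2 summed over axes = {total:.6f}")
for dim, total in enumerate(contrib_col_sums, start=1):
    print(f"Dim.{dim}: contrib summed over variables = {total:.6f}")
```

Up to rounding in the printed output, the row totals of cos2 come out at 1 and the column totals of contrib at 100. In other words, cos2 is read along a row and contrib down a column.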

Thus, in your toy example, 97.9% of the variation in the first variable is "explained" by the first axis.  Of the variation that the first axis explains, however, about 1/3 comes from each of your first three variables, and only a tiny fraction from var 4.

Thus, in your toy example, the first axis "explains" almost all the variation in the first three variables, and the second axis is showing the variation in the fourth variable.

You can think about it this way: the cos2 shows how the variation in a single variable is spread across the axes, whereas the contribution shows where the variation represented by each axis is coming from.
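
To make the link between the two tables concrete, here is a minimal Python sketch. It assumes a standardised PCA with equally weighted variables, in which case the contribution of a variable to an axis is its cos2 on that axis divided by the total cos2 on that axis, times 100. That formula is an assumption about how this analysis was run, not something stated in the output, although the posted numbers appear consistent with it.

```python
# cos2 values copied from the output posted above
# (rows V1..V4, columns Dim.1..Dim.4).
names = ["V1", "V2", "V3", "V4"]
cos2 = [
    [0.979430886, 8.081098e-05, 2.034420e-02, 1.440994e-04],
    [0.986428773, 1.104647e-04, 1.288176e-02, 5.790035e-04],
    [0.996889631, 1.024138e-03, 8.045294e-04, 1.281702e-03],
    [0.001123153, 9.988644e-01, 1.145636e-05, 1.002566e-06],
]

# Total cos2 carried by each axis (one total per column).
axis_totals = [sum(column) for column in zip(*cos2)]

# Assumed relationship: contrib = 100 * cos2 / (column total of cos2).
derived_contrib = [
    [100.0 * value / total for value, total in zip(row, axis_totals)]
    for row in cos2
]

for name, row in zip(names, derived_contrib):
    print(name, " ".join(f"{value:12.6f}" for value in row))
```

This is why V1 can have its largest contribution on Dim.3 while being best represented on Dim.1: its cos2 on Dim.3 is only about 0.02, but the whole of Dim.3 carries only about 0.034, so V1 supplies roughly 60% of what little that axis holds.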

In CA, these two values are sometimes called the absolute and relative contributions.  I also prefer the way Greenacre does this where everything is standardised to per mills (i.e., out of 1000) so we don't get these very small numbers.

Best wishes, Kris.
