Feb 5, 2021, 7:27:33 AM

to factomin...@googlegroups.com

Hi,

When I look at the results for the variables, I see:

> res.pca$var$cos2
         Dim.1        Dim.2        Dim.3        Dim.4
V1 0.979430886 8.081098e-05 2.034420e-02 1.440994e-04
V2 0.986428773 1.104647e-04 1.288176e-02 5.790035e-04
V3 0.996889631 1.024138e-03 8.045294e-04 1.281702e-03
V4 0.001123153 9.988644e-01 1.145636e-05 1.002566e-06

> res.pca$var$contrib
         Dim.1        Dim.2       Dim.3       Dim.4
V1 33.04564905  0.008080453 59.76216021  7.18411028
V2 33.28175527  0.011045588 37.84083902 28.86636012
V3 33.63470090  0.102405585  2.36334710 63.89954641
V4  0.03789478 99.878468373  0.03365367  0.04998318

From what I have read on the web about the definitions of contribution and squared cosine in PCA, I don't understand what is reported here.

For example, cos2 for V1 is highest on Dim.1, but the contribution of V1 is highest on Dim.3. I expected that the highest cos2 for V1 being on Dim.1 would imply that its highest contribution is on Dim.1 too.

So the question is: on which dimension is V1 better projected?

Regards,

Mahmood

Feb 6, 2021, 12:52:55 PM

to FactoMineR users

The squared cosine is a measure of how well a variable "fits" an axis. With your small data set of four variables, the four axes explain all the variation, so the squared cosines for each variable sum to 1 across the four axes. The first three variables fit axis 1 very well and barely fit any of the other axes; the fourth variable fits axis 2.

The contribution is how much each variable contributes to each axis. Each column of contribution values sums to 100.
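As a quick check of those two normalisations, here is a minimal sketch in plain Python (not R, and not part of FactoMineR) that simply re-types the values printed in the question and sums them. The variable names are made up for illustration.

```python
# Values copied from the res.pca$var$cos2 and res.pca$var$contrib output
# in the question (rows V1..V4, columns Dim.1..Dim.4).
cos2 = {
    "V1": [0.979430886, 8.081098e-05, 2.034420e-02, 1.440994e-04],
    "V2": [0.986428773, 1.104647e-04, 1.288176e-02, 5.790035e-04],
    "V3": [0.996889631, 1.024138e-03, 8.045294e-04, 1.281702e-03],
    "V4": [0.001123153, 9.988644e-01, 1.145636e-05, 1.002566e-06],
}
contrib = {
    "V1": [33.04564905, 0.008080453, 59.76216021, 7.18411028],
    "V2": [33.28175527, 0.011045588, 37.84083902, 28.86636012],
    "V3": [33.63470090, 0.102405585, 2.36334710, 63.89954641],
    "V4": [0.03789478, 99.878468373, 0.03365367, 0.04998318],
}

# cos2 is normalised per variable: each row sums to about 1.
cos2_row_sums = {var: sum(row) for var, row in cos2.items()}

# contrib is normalised per axis: each column sums to about 100.
contrib_col_sums = [sum(col) for col in zip(*contrib.values())]

for var, total in cos2_row_sums.items():
    print(f"cos2 row sum for {var}: {total:.6f}")
for k, total in enumerate(contrib_col_sums, start=1):
    print(f"contrib column sum for Dim.{k}: {total:.6f}")
```

Reading cos2 along a row and contrib down a column is the reason the two tables can point to different axes for the same variable.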

Thus, in your toy example, 97.9% of the variation in the first variable is "explained" by the first axis. Of the variation that the first axis explains, however, about 1/3 comes from each of your first three variables, and only a tiny fraction from variable 4.

Thus, in your toy example, the first axis "explains" almost all the variation in the first three variables, and the second axis is showing the variation in the fourth variable.

You can think about it this way: the cos2 shows how the variation in a single variable is spread across the axes, whereas the contribution shows where the variation represented by each axis is coming from.
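To make the link between the two tables concrete: assuming a standardised PCA with default weights (an assumption, since the question does not show the PCA call), the contribution of a variable to an axis is its cos2 on that axis divided by the column total of cos2 for that axis (the eigenvalue), times 100. The sketch below, in plain Python with illustrative names, checks this against the posted numbers and picks the axis with the largest cos2 for each variable.

```python
import math

# Values copied from the question (rows V1..V4, columns Dim.1..Dim.4).
cos2 = {
    "V1": [0.979430886, 8.081098e-05, 2.034420e-02, 1.440994e-04],
    "V2": [0.986428773, 1.104647e-04, 1.288176e-02, 5.790035e-04],
    "V3": [0.996889631, 1.024138e-03, 8.045294e-04, 1.281702e-03],
    "V4": [0.001123153, 9.988644e-01, 1.145636e-05, 1.002566e-06],
}
contrib = {
    "V1": [33.04564905, 0.008080453, 59.76216021, 7.18411028],
    "V2": [33.28175527, 0.011045588, 37.84083902, 28.86636012],
    "V3": [33.63470090, 0.102405585, 2.36334710, 63.89954641],
    "V4": [0.03789478, 99.878468373, 0.03365367, 0.04998318],
}
n_axes = 4

# Column totals of cos2; under the stated assumption these are the eigenvalues.
eig = [sum(row[k] for row in cos2.values()) for k in range(n_axes)]

# Rebuild the contributions: each variable's share of an axis's total cos2.
rebuilt = {
    var: [100.0 * row[k] / eig[k] for k in range(n_axes)]
    for var, row in cos2.items()
}

# Compare with the posted contributions, allowing for print rounding.
matches = all(
    math.isclose(rebuilt[var][k], contrib[var][k], rel_tol=1e-4)
    for var in cos2
    for k in range(n_axes)
)

# A variable is best represented on the axis where its cos2 is largest.
best_axis = {
    var: "Dim.%d" % (row.index(max(row)) + 1) for var, row in cos2.items()
}

print("eigenvalues (cos2 column totals):", [round(e, 6) for e in eig])
print("rebuilt contributions match posted ones:", matches)
print("best-represented axis per variable:", best_axis)
```

So V1 is best projected on Dim.1. Its large contribution to Dim.3 only says that V1 accounts for most of the very small amount of variation that Dim.3 carries.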

In CA, the contribution and the cos2 are sometimes called the absolute and relative contributions, respectively. I also prefer the way Greenacre presents them, with everything standardised to per mille (i.e., out of 1000), so we don't get these very small numbers.

Best wishes, Kris.
