Interpreting PCA plots

Paul Evans

Jan 11, 2018, 4:48:54 PM
to computation...@googlegroups.com
All,
I often use the stylo package as a classroom demo to reproduce the classic Mosteller and Wallace attribution of the disputed numbers from The Federalist Papers. The question of how to interpret the scale on the axes often comes up (-8 to 4 on PC1, -5 to 5 on PC2 in this example). I took a look at the documentation for princomp, which I believe is generating this plot, but wasn't able to find an answer.
Thanks,
Paul Evans



Maciej Eder

Jan 16, 2018, 7:50:47 AM
to computationalstylistics
Hi Paul,

Good question. In short, the principal components are linear combinations of the original variables, which means that the numbers you see on the plot are summed contributions from the individual dimensions (i.e. words or other variables). The weights that determine these contributions, referred to as 'loadings', are computed using some linear algebra. Now, if the original variables have different units (say, human height, heartbeat, blood pressure, and age), then the final scores in a plot will not have any directly interpretable unit. However, if your variables share the same unit, then the principal component will have that unit as well. This is the case with word frequencies. In your plot, the word frequencies are scaled (z-scored), because you have chosen the correlation matrix option. This means that your units are standard deviations. If a given word (say, "federal") is used 1 st. dev. more frequently in a text than on average in the entire corpus, it contributes 1 times its loading to that text's position on the scale. This is a reasonably good explanation of it:
My favourite introduction to PCA (and many other topics), however, is this one:
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.
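
To make the "summed contributions" idea concrete, here is a minimal sketch in plain Python. It is not what stylo or princomp runs internally: the word frequencies are made up, the three-word vocabulary is hypothetical, the z-scores use the n - 1 divisor, and the first component is found by simple power iteration rather than a full eigendecomposition. It only illustrates that a text's position on PC1 is the loading-weighted sum of its z-scored frequencies.

```python
import math

# Hypothetical relative frequencies (per 100 words) of three words in five texts.
# These numbers are invented for illustration only.
words = ["the", "upon", "federal"]
freqs = [
    [6.1, 0.30, 0.10],
    [5.8, 0.35, 0.05],
    [6.5, 0.02, 0.40],
    [6.4, 0.05, 0.35],
    [6.0, 0.20, 0.20],
]

def zscore_columns(rows):
    """Centre each column and divide by its sample standard deviation (n - 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of the original data, computed from its z-scores."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(matrix, iterations=500):
    """Leading eigenvector (the PC1 loadings) and eigenvalue via power iteration."""
    p = len(matrix)
    v = [1.0] + [0.0] * (p - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

z = zscore_columns(freqs)
corr = correlation_matrix(z)
loadings, eigenvalue = first_component(corr)

# Score of each text on PC1: the loading-weighted sum of its z-scores.
scores = [sum(zj * lj for zj, lj in zip(row, loadings)) for row in z]

for i, (row, score) in enumerate(zip(z, scores), start=1):
    parts = ", ".join(f"{w}: {zj * lj:+.2f}"
                      for w, zj, lj in zip(words, row, loadings))
    print(f"text {i}: PC1 = {score:+.2f}  ({parts})")
```

Each printed line shows a text's PC1 score next to the per-word contributions that add up to it. The sign of a component is arbitrary, so a different routine may return the same axis flipped. In this sketch the variance of the scores equals the eigenvalue of the component, which is why, with many correlated words, the axis can span well beyond -1 to 1.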

I hope this helps!
All the best,
Maciej

