PCA biplot in ggplot


Brandon Hurr

Nov 2, 2010, 11:33:57 AM
to ggplot2
I know this is probably going to be a hack, because ggplot2 can't do multiple scales on the same axis, but I'm curious how others have got on. I'm having issues with the scaling. princomp() gives you loadings and scores, and I've managed to extract those and use geoms (point, segment, and text) to make a nice biplot. Because you can't have two scales, I used rescale() to map the min/max of the loadings onto the min/max of the scores. I know this is wrong because the loadings in the ggplot2 output don't look the same as in the biplot() output, but I'm afraid I don't understand enough to know the proper way to scale them.

Has anyone done biplots in ggplot before for principal component analysis? 

See attached code. Compare the biplot to the ggplot output to see what I'm on about. 

Any help is much appreciated. 

Brandon

PCAhelp.txt
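
A minimal sketch of the scaling issue described above, not the code from the attachment. It assumes the built-in USArrests data as a stand-in for UCDVol, and the helper names (minmax, k, angle) are made up for illustration. Mapping each loading column onto the score range independently moves the arrow tips relative to the origin and so changes their angles, whereas multiplying all loadings by one common factor keeps every arrow's direction.

```r
# Stand-in data: USArrests replaces the thread's UCDVol, which is not available here.
pca    <- princomp(USArrests, cor = TRUE)
scores <- as.data.frame(pca$scores[, 1:2])
loads  <- as.data.frame(unclass(pca$loadings)[, 1:2])
names(scores) <- names(loads) <- c("PC1", "PC2")

# Approach from the post: map each loading column onto the score range on its own.
# This shifts values away from the origin and stretches each axis differently.
minmax <- function(x, to) (x - min(x)) / (max(x) - min(x)) * diff(to) + to[1]
loads_minmax <- data.frame(
  PC1 = minmax(loads$PC1, range(scores$PC1)),
  PC2 = minmax(loads$PC2, range(scores$PC2))
)

# Alternative: one common multiplier, so arrows keep their direction
# and still start at (0, 0).
k <- min(max(abs(scores$PC1)) / max(abs(loads$PC1)),
         max(abs(scores$PC2)) / max(abs(loads$PC2)))
loads_scaled <- loads * k

# Direction of each arrow from the origin, in radians.
angle <- function(d) atan2(d$PC2, d$PC1)

# Optional plot, only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  arrows_df <- cbind(loads_scaled, variable = rownames(loads))
  p <- ggplot(scores, aes(PC1, PC2)) +
    geom_point() +
    geom_segment(data = arrows_df,
                 aes(x = 0, y = 0, xend = PC1, yend = PC2),
                 arrow = arrow(length = unit(0.2, "cm")), colour = "red") +
    geom_text(data = arrows_df, aes(label = variable),
              colour = "red", vjust = -0.5)
  print(p)
}
```

Comparing angle(loads) with angle(loads_scaled) and angle(loads_minmax) shows which rescaling preserves the arrow directions. The common factor k here is only a display choice; it is not the scaling that biplot() itself applies.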

Brandon Hurr

Nov 2, 2010, 11:56:21 AM
to ggplot2
It appears that in my haste I left out a line of code. Thanks to Luciano for pointing it out and shame on me for not clearing my workspace. 

ucdpcaplot<-data.frame(UCDVol[,1:4], UCDPCA$scores[,1:3], Ethylene=UCDVol[,colnum])


B
PCAhelp.txt

Sietse Brouwer

Nov 2, 2010, 12:00:03 PM
to Brandon Hurr, ggp...@googlegroups.com
Hi Brandon,

I tried source()ing your textfile, but got the following error:
Error in rescale(ucdloadings[[1]], to = c(min(ucdpcaplot[[5]]),
max(ucdpcaplot[[5]]))) :
object 'ucdpcaplot' not found

When I looked at the source I thought, at first, that ucdpcaplot
should be created by assigning
ucdpcaplot <- biplot(UCDPCA)
but that returns NULL. Could you upload a fixed version of the source?

Kind regards,

Sietse Brouwer

Sietse Brouwer

Nov 2, 2010, 12:00:49 PM
to Brandon Hurr, ggp...@googlegroups.com
Argh, crossed messages. Apologies, all.
--Sietse

Luciano Selzer

Nov 2, 2010, 1:32:00 PM
to Brandon Hurr, ggplot2
I'm sorry, but the two graphs look pretty similar to me. I don't get what you are trying to do. Do you want to use secondary axes?


Luciano


--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442
 
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Brandon Hurr

Nov 2, 2010, 1:40:25 PM
to Luciano Selzer, ggplot2
Maybe it's just me, but I think that by scaling the way I did I exaggerated the positive side of PC1. The arrows seem to point more towards the right in the ggplot than they do in the biplot. I could be wrong, though.

Does anyone else think it's alright? I'll go with it if so. 

Brandon

Luciano Selzer

Nov 2, 2010, 1:47:15 PM
to Brandon Hurr, ggplot2
Well, that much is true. However, the default biplot is really messy and I can't make out anything near the center. Also, in the default biplot the arrows are multiplied by 0.8, so they are shorter.
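
For reference, a rough sketch of the scaling biplot() appears to apply to a princomp object, based on my reading of the source of stats:::biplot.princomp and stats:::biplot.default with default arguments. Treat it as an assumption to check against your R version. USArrests again stands in for UCDVol, and the names arrows_df, scores_df and unsigned_range are illustrative only. The result is a pair of data frames on one shared scale that could be passed to geom_point() and geom_segment().

```r
pca     <- princomp(USArrests, cor = TRUE)  # stand-in data, not the thread's UCDVol
choices <- 1:2

# With scale = 1, biplot.princomp divides the scores by lam and multiplies
# the loadings by lam, where lam = sdev * sqrt(n.obs).
lam    <- pca$sdev[choices] * sqrt(pca$n.obs)
scores <- t(t(pca$scores[, choices]) / lam)
loads  <- t(t(unclass(pca$loadings)[, choices]) * lam)

# biplot.default then draws the loadings against a second set of axes whose
# limits are the score limits times a single ratio. Dividing the loadings by
# that ratio places them on the score axes instead.
unsigned_range <- function(x) c(-abs(min(x)), abs(max(x)))
rx    <- range(unsigned_range(scores[, 1]), unsigned_range(scores[, 2]))
ratio <- max(unsigned_range(loads[, 1]) / rx,
             unsigned_range(loads[, 2]) / rx)

# The arrows are drawn at 0.8 of the loading length, as noted above.
arrows_df <- data.frame(variable = rownames(loads),
                        xend = loads[, 1] * 0.8 / ratio,
                        yend = loads[, 2] * 0.8 / ratio)
scores_df <- data.frame(PC1 = scores[, 1], PC2 = scores[, 2])
```

Because a single ratio is applied to both components, the arrow directions match those of the scaled loadings; only their length relative to the scores is a display convention.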

Guillaume T.R.

Nov 3, 2010, 11:30:39 AM
to ggplot2
I'll agree with that: the biplot() is really messy. But what do you
really want to know from your plot? I guess it's the angles between
your objects (variety) and your ethyl compounds. So, in my view, the
two graphs give the same answer, while the ggplot() one looks less
dense.

I'll keep your plot in mind, since I tried a few weeks ago to plot an
RDA object and your plot could help me polish mine:
http://groups.google.com/group/ggplot2/browse_thread/thread/9db122bd461ce0e3

guillaume


Brandon Hurr

Nov 3, 2010, 12:51:59 PM
to Guillaume T.R., ggplot2
I've always thought the main purpose of PCA plots was to show you associations within very complex/numerous data. I was hoping to see an association between a set of volatiles and a particular sample, but that appears to not be the case in this instance. The biplot is a mess as you said and I figured I could do it in ggplot which would allow me to tweak the appearance of the graph in a familiar way. I was just hoping there was an easier way than having to pull all of the data from the separate sections of the PCA and graph them. 

I asked the multivariate statistician at work how he would scale them to the same axis/scale, and he came up with a lot of ways to do it, most of which didn't work that well. The best way was to "divide both the scores and the loadings by their row or column standard deviation." I don't fully understand why that is or is not a good idea, though.

B
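
One possible reading of that advice, sketched under assumptions: "column standard deviation" is taken to mean the standard deviation of each component's column, computed with sd(). USArrests stands in for UCDVol, and scores_std and loads_std are illustrative names. This is not necessarily what the statistician meant.

```r
pca    <- princomp(USArrests, cor = TRUE)  # stand-in data, not the thread's UCDVol
scores <- pca$scores[, 1:2]
loads  <- unclass(pca$loadings)[, 1:2]
n      <- pca$n.obs

# Divide every column by its own standard deviation, so each column ends up
# with a standard deviation of 1.
scores_std <- sweep(scores, 2, apply(scores, 2, sd), "/")
loads_std  <- sweep(loads,  2, apply(loads,  2, sd), "/")

# For the scores this is close to dividing by pca$sdev: princomp() uses
# divisor n while sd() uses n - 1, so the two differ by a constant factor.
sd_ratio <- apply(scores, 2, sd) / pca$sdev[1:2]
```

Under this reading the standardized scores differ from the ones biplot() draws only by a constant, so their layout is the same. The loadings are a different matter: each component's loadings are divided by a different number, which changes the arrow angles, so the result should be compared against biplot() before being trusted.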
