
PCA and Linear Discriminant Analysis for classification


caroline

Apr 8, 2004, 8:02:42 AM
Dear group,

I have a few questions about these two topics. I used PCA to find the PCs
and now need to apply those values to LDA. I have found the PCs and written
an LDA algorithm. The initial data set was 216x15154 (cancer data).

My first question is about the paper by Lilien et al., "Probabilistic
Disease Classification of Expression-Dependent Proteomic Data from Mass
Spectrometry of Human Serum". On page 7, under Methods, it says: "A
hyperplane H is then computed using LDA. The PCA dimensionality reduced
sample points are projected onto H...". How do you project the PC values
onto H?

Also, on page 8, Figure 2A, the LDA discriminant is shown perpendicular to
the PCA space. I do not understand this.

Training a data set: I split my dimension-reduced data set into a sample
(test) set and a training set and ran the LDA program. When I used 10 PCs
the error was 52%. Then I experimented with different sizes of sample and
training set with a fixed number of PCs, and then increased the number of
PCs and reran the program. The error rate dropped as I increased the number
of PCs; with a sample set of 50 and 150 PCs I got an error rate of 0.45%.
I am not sure what I am doing is correct. How do we train on a data set and
find the discriminant function? I do not know how to train on a data set.

I followed what is given in the book Applied Multivariate Statistical
Analysis by Johnson and Wichern (2002), page 602, second paragraph.

Any help is appreciated.

Thank you in advance.

Caroline

Rob Henson

Apr 8, 2004, 11:33:23 AM

"caroline" <caroline....@buseco.monash.edu.au> wrote in message
news:3kuu7sxtp648@legacy...

> Dear group,
>
> I have a few questions about these two topics. I used PCA to find the PCs
> and now need to apply those values to LDA. I have found the PCs and written
> an LDA algorithm. The initial data set was 216x15154 (cancer data).
>
> My first question is about the paper by Lilien et al., "Probabilistic
> Disease Classification of Expression-Dependent Proteomic Data from Mass
> Spectrometry of Human Serum". On page 7, under Methods, it says: "A
> hyperplane H is then computed using LDA. The PCA dimensionality reduced
> sample points are projected onto H...". How do you project the PC values
> onto H?

I think that you can probably find their code if you poke around on the
Dartmouth web site -- see the Supporting Material section of the paper for a
URL.

If you have access to the Statistics Toolbox then you can do something like
this for data matrix D:

% Assumes D holds one spectrum per column (healthy samples in the first nH
% columns, cancer samples in the next nC columns) and id is a vector of
% class labels, one per column of D.

% Select random training and test sets
per_train = 0.5; % fraction of samples used for training
nCt = floor(nC * per_train); % number of cancer samples in training
nHt = floor(nH * per_train); % number of healthy samples in training
nt = nCt+nHt; % total number of training samples
sel_H = randperm(nH); % random order of the healthy samples
sel_C = nH + randperm(nC); % random order of the cancer samples
sel_t = [sel_C(1:nCt) sel_H(1:nHt)]; % samples chosen for training
sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation

% PCA to reduce dimensionality (coefficients from the training samples only)
P = princomp(D(:,sel_t)',0);

% Project into PCA space
k = 3; % lose 3 degrees of freedom
% change k to remove more principal components, 2 <= k < nt
x = D' * P(:,1:nt-k);

% Use linear classifier on the held-out samples
c = classify(x(sel_e,:),x(sel_t,:),id(sel_t));

You might also consider removing some points -- the first 3000 or so points
in each spectrum probably measure artifacts of the SELDI matrix rather than
sample proteins.
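
For example (a rough sketch, assuming D is laid out as in the code above,
with one spectrum per column and the m/z points as rows):

% Drop the first 3000 m/z points of every spectrum before doing PCA
% (3000 is just a ballpark figure -- inspect your spectra first).
D = D(3001:end, :);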

HTH,

Rob Henson
The MathWorks, Inc.

someone

Apr 8, 2004, 1:10:21 PM
Hi, here are some thoughts.

>"A hyperplane
> H
> is then computed using LDA using LDA. The PCA dimensionality
> reduced
> sample points are projected on to H.....". How you project PC
values to H?

This projection is done through LDA. See the simple function at the bottom
of this post that does LDA. Notice that your dataset needs to be labeled,
because LDA finds the best linear way to separate the clusters. The
hyperplane will lie in the LDA space.

> Also, on page 8, Figure 2A, the LDA discriminant is shown perpendicular to
> the PCA space. I do not understand this.

I am not sure what you mean here, but it might be related to the fact that
the PCA components are orthogonal to each other (uncorrelated). I would
need to see the context in which this was said...


> Training a data set: I split my dimension-reduced data set into a sample
> (test) set and a training set and ran the LDA program. When I used 10 PCs
> the error was 52%. Then I experimented with different sizes of sample and
> training set with a fixed number of PCs, and then increased the number of
> PCs and reran the program. The error rate dropped as I increased the number
> of PCs; with a sample set of 50 and 150 PCs I got an error rate of 0.45%.
> I am not sure what I am doing is correct. How do we train on a data set and
> find the discriminant function?

Is that error on the training set or on the sample (test) set? If you add
more features, then you could be overfitting the data. You should measure
the error on a test data set that was not used for training.

A simple linear discriminant would be to choose the halfway point between
the class means.
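
As a rough sketch of that idea (my own example code, using two labeled
clusters c1 and c2 like the ones generated below):

m1 = mean(c1); m2 = mean(c2);  % class means (each row of c1, c2 is a sample)
w = (m1 - m2)';                % direction joining the two class means
b = (m1 + m2)/2 * w;           % threshold: projection of the midpoint onto w
xnew = c2(1,:);                % some sample to classify
if xnew*w >= b, cls = 1; else cls = 2; end
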
The function below does full LDA (it is very simple). You can use the
following example as a simple guide on how to use it:

c1=[randn(100,1) randn(100,1)+1 randn(100,1)+6];
c2=[randn(100,1) randn(100,1) randn(100,1)];
scatter3(c1(:,1),c1(:,2),c1(:,3),'filled')
hold on;scatter3(c2(:,1),c2(:,2),c2(:,3),'r','filled')
figure
[dataLDA,A]=lda([c1;c2],[ones(100,1);zeros(100,1)],2,1)

%%%%LDA FUNCTION%%%%%

function [dataLDA,A]=lda(data,label,drec,mode)
%[dataLDA,A]=lda(data,label,drec,mode)
%
%each row of data is a sample
%each column of data corresponds to a feature
%drec is the final (reduced) dimension of your data
%set mode to 1 to plot the result (only works if drec is 2 or 3)
%Linear Discriminant Analysis

if(any(label==0)),
label=label+1;
end

cat=length(unique(label));
[n,f]=size(data);
Sw=zeros(f);
Sb=zeros(f);
m=mean(data);

for i=1:cat,

[r,c]=find(label==i);
mg=mean(data(r,:));
ng=length(r);
Sw=Sw + cov(data(r,:)).*(ng-1);
Sb=Sb + ng*(mg-m)'*(mg-m) ;

end

[v2,d2]=eig(Sw\Sb);
[dummy,order]=sort(-abs(diag(d2))); % order eigenvectors by decreasing eigenvalue
A=v2(:,order(1:drec));
dataLDA=data*A;

if((drec==3 | drec==2) & mode)
gplot(label,dataLDA,drec)
end

%Helper function to plot labeled data
function gplot(label,data,drec)
%2 labels only
[r,c]=find(label==1);g1=data(r,:);
[r,c]=find(label==2);g2=data(r,:);
[r,c]=find(label==3);g3=data(r,:);

if(drec==3),
scatter3(g1(:,1),g1(:,2),g1(:,3),'filled')
hold on
scatter3(g2(:,1),g2(:,2),g2(:,3),'r','filled')
else

if(~isreal(g1(:))),
g1(:,1)=real(g1(:,1))+imag(g1(:,1));
g1(:,2)=real(g1(:,2))+imag(g1(:,2));
g2(:,1)=real(g2(:,1))+imag(g2(:,1));
g2(:,2)=real(g2(:,2))+imag(g2(:,2));
end

scatter(g1(:,1),g1(:,2))
hold on
scatter(g2(:,1),g2(:,2),'r')
end

Lucio

Apr 8, 2004, 3:55:14 PM
Hi there,
I'll try to answer your questions below ...

>
> My first question is about the paper by Lilien et al., "Probabilistic
> Disease Classification of Expression-Dependent Proteomic Data from Mass
> Spectrometry of Human Serum". On page 7, under Methods, it says: "A
> hyperplane H is then computed using LDA. The PCA dimensionality reduced
> sample points are projected onto H...". How do you project the PC values
> onto H?
>

By doing LDA you are projecting your measurement space (in this case
already reduced by PCA) into your decision space; this step is also called
estimation. You can think of the H space as giving you a class membership
value.

> Also, on page 8, Figure 2A, the LDA discriminant is shown perpendicular to
> the PCA space. I do not understand this.
>

No, it is not perpendicular; H contains the subspace that best
discriminates the two groups of samples, which are (for simplicity)
scattered in a two-dimensional plot.

> Training a data set: I split my dimension-reduced data set into a sample
> (test) set and a training set and ran the LDA program. When I used 10 PCs
> the error was 52%. Then I experimented with different sizes of sample and
> training set with a fixed number of PCs, and then increased the number of
> PCs and reran the program. The error rate dropped as I increased the number
> of PCs; with a sample set of 50 and 150 PCs I got an error rate of 0.45%.
> I am not sure what I am doing is correct. How do we train on a data set and
> find the discriminant function? I do not know how to train on a data set.
>

You can follow Rob's post, "classify" in the stats toolbox does LDA.

Anders Björk

Apr 8, 2004, 5:58:03 PM
You can do soft independent modelling of class analogy (SIMCA), with a PCA
model for each class you have (two or more models).

Or you could use discriminant partial least squares (D-PLS) for
classification.

For more information see this link
http://www.acc.umu.se/~tnkjtg/chemometrics/editorial/oct2002.pdf


"caroline" <caroline....@buseco.monash.edu.au> skrev i meddelandet
news:3kuu7sxtp648@legacy...

Anders Björk

Apr 8, 2004, 6:15:49 PM
See my additional comments below

"caroline" <caroline....@buseco.monash.edu.au> skrev i meddelandet
news:3kuu7sxtp648@legacy...
> Dear group,
>
> I have a few questions about these two topics. I used PCA to find the PCs
> and now need to apply those values to LDA. I have found the PCs and written
> an LDA algorithm. The initial data set was 216x15154 (cancer data).

If it is 15154 variables, then you had better do some clever variable
selection. My experience is that above 3000-4000 variables things tend to
get worse and worse. Have you done some cross-validation to determine the
dimension (number of PCs)? Have you looked at the PCA loadings after
splitting the data into separate data sets? Are there patterns that are
similar across the separate sets? If so, model only those variables in a
few new PCA models. Have you calculated the correlation between your class
variable and the 15154 variables? Then, for instance, select only those
with high correlation.
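
As a rough sketch of that correlation idea (my own code, not a
recommendation of a particular threshold; it assumes D holds one spectrum
per column, as in Rob's post, and y is a 0/1 class vector -- both
hypothetical names here):

[p,n] = size(D);                        % p variables, n samples
Dc = D - mean(D,2)*ones(1,n);           % center each variable
yc = y(:) - mean(y);                    % center the class variable
r = (Dc*yc) ./ (sqrt(sum(Dc.^2,2)) * norm(yc)); % correlation of each variable with the class
keep = find(abs(r) > 0.3);              % keep variables with |r| above some threshold
Dsel = D(keep,:);                       % reduced data set to feed into PCA/LDA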

> My first question is about the paper by Lilien et al., "Probabilistic
> Disease Classification of Expression-Dependent Proteomic Data from Mass
> Spectrometry of Human Serum". On page 7, under Methods, it says: "A
> hyperplane H is then computed using LDA. The PCA dimensionality reduced
> sample points are projected onto H...". How do you project the PC values
> onto H?
>
> Also, on page 8, Figure 2A, the LDA discriminant is shown perpendicular to
> the PCA space. I do not understand this.
>
> Training a data set: I split my dimension-reduced data set into a sample
> (test) set and a training set and ran the LDA program. When I used 10 PCs
> the error was 52%. Then I experimented with different sizes of sample and
> training set with a fixed number of PCs, and then increased the number of
> PCs and reran the program. The error rate dropped as I increased the number
> of PCs; with a sample set of 50 and 150 PCs I got an error rate of 0.45%.
> I am not sure what I am doing is correct. How do we train on a data set and
> find the discriminant function? I do not know how to train on a data set.

Yep, the more noise you model, the better the apparent classification
results... Think in terms of what generates the patterns in your data: if
there are 50 separate and independent sources, then use 50 PCs! But that is
unlikely... very unlikely. I would say probably not more than 10.

Good Luck with your modelling!

Greg Heath

Apr 9, 2004, 12:44:04 AM
caroline....@buseco.monash.edu.au (caroline) wrote in message
news:<3kuu7sxtp648@legacy>...

> Dear group,
>
> I have a few questions about these two topics. I used PCA to find the PCs
> and now need to apply those values to LDA. I have found the PCs and written
> an LDA algorithm. The initial data set was 216x15154 (cancer data).

Was the mixture data standardized?
How many PCs were kept?
What % of mixture variance was preserved?
How many classes do you have?

I don't recommend PCA for dimensionality reduction as preprocessing for
classification. The dominant directions of the mixture spread are not
guaranteed to be the dominant directions of class separation. Think of two
closely spaced, parallel, cigar-shaped distributions: the dominant PC
direction is along the length of the cigars, whereas the dominant direction
of separability is along the width of the cigars.
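
A quick toy illustration of this point (my own example, not from the
paper): two parallel, elongated clusters whose first PC runs along their
length while the Fisher/LDA direction runs across them.

n = 200;
g1 = [10*randn(n,1), 0.5*randn(n,1) + 1]; % "cigar" 1: long in x, centered at y = +1
g2 = [10*randn(n,1), 0.5*randn(n,1) - 1]; % "cigar" 2: long in x, centered at y = -1
X = [g1; g2];
coeff = princomp(X);                      % first PC is roughly [1 0] (along the cigars)
w = (cov(g1) + cov(g2)) \ (mean(g1) - mean(g2))'; % Fisher direction (up to scale), roughly [0 1]
disp(coeff(:,1)'), disp(w'/norm(w))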



> My first question is about the paper by Lilien et al., "Probabilistic
> Disease Classification of Expression-Dependent Proteomic Data from Mass
> Spectrometry of Human Serum". On page 7, under Methods, it says: "A
> hyperplane H is then computed using LDA. The PCA dimensionality reduced
> sample points are projected onto H...". How do you project the PC values
> onto H?

If the original n-D data vector is x, the m dominant n-D orthonormal PC
eigenvectors, pci (i = 1,2,...,m), define an m-D hyperplane, P. The
original data are projected onto P to obtain the m-D PCA
dimensionality-reduced sample points, z, with components
(x.pc1, x.pc2, ..., x.pcm) in the orthogonal PC coordinate system
(where "." denotes the dot product).

For c subclasses, there will be at most c-1 dominant m-D eigenvectors,
aj (j = 1,2,...,c-1), with nonzero eigenvalues. The (c-1)-D hyperplane H
is defined by the aj. However, the aj are not orthogonal, so I'm not sure
how z is to be "projected" onto H. The z.aj are the desired linear
discriminants, if that is what they mean.
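
A minimal sketch of that for c = 2 (my own code, under the assumption that
z holds the PCA-reduced samples as rows and y holds labels 1 and 2; these
names are hypothetical):

z1 = z(y==1,:); z2 = z(y==2,:);                       % PCA scores for each class
Sw = cov(z1)*(size(z1,1)-1) + cov(z2)*(size(z2,1)-1); % within-class scatter
a = Sw \ (mean(z1) - mean(z2))';                      % the single dominant LDA direction
d = z*a;                                              % z.a, the linear discriminant values
thr = (mean(z1) + mean(z2))/2 * a;                    % threshold halfway between the class means
yhat = 1 + (d < thr);                                 % predicted class: 1 above the threshold, 2 below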

> Also, on page 8, Figure 2A, the LDA discriminant is shown perpendicular to
> the PCA space. I do not understand this.

Neither do I; I haven't read the paper. You may have misunderstood what
they wrote. However, since you said "the" LDA discriminant, I will assume
that c = 2.



> Training a data set: I split my dimension-reduced data set into a sample
> (test) set and a training set and ran the LDA program. When I used 10 PCs
> the error was 52%.

For the training set or the test (holdout sample) set? Quote both errors.

> Then I experimented with different sizes of sample and training set with a
> fixed number of PCs, and then increased the number of PCs and reran the
> program. The error rate dropped as I increased the number of PCs; with a
> sample set of 50 and 150 PCs I got an error rate of 0.45%. I am not sure
> what I am doing is correct.

One check is to repeat the experiment 10 or more times with the # of PCs
and the train/test ratio fixed. Each trial should use different random
draws for the train and test sets. The standard deviation of the test set
errors will characterize the reliability of your results.

Another check is to chart the training and test set errors as functions
of the # of PCs and the split ratio.
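
For example, a rough sketch of that first check (my own code; run_one_split
is a hypothetical helper that does one random train/test split, PCA, and
LDA as in Rob's post and returns the test-set error):

nTrials = 10;
err = zeros(nTrials,1);
for t = 1:nTrials
    err(t) = run_one_split(D, id, 0.5, 150); % fixed split ratio and # of PCs
end
fprintf('test error: %.3f +/- %.3f\n', mean(err), std(err));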

Hope this helps.

Greg
