
MLP Optimization Problem (generalization problem) -- on OCR data -- Urdu


zaheer ahmad

Nov 28, 2008, 12:45:04 PM
Dear All

I am developing an Urdu OCR but have a 'goal not met' problem.

My network is:

Input = 400 (also reduced and checked with 100 and 144)
Output = 54
Hidden layer = 20 (but checked with 30, 40, 50, 60, 70, 80, 90, 100, up to 250)

The sample sizes I tried for training the net are:
5400 (i.e. 100*54 = 5400), but also checked with
540 (i.e. 10*54 = 540),
1080 (i.e. 20*54 = 1080),
1350 (i.e. 25*54 = 1350) and
2700 (i.e. 50*54 = 2700),
where 54 is the number of characters and 100, 10, 20, 25 and 50 are the samples of each character.

I tried traingdx, trainlm and trainscg (because of an out-of-memory error), with both mse and sse.

I don't know why it doesn't reach the goal (goal = 0.1 for traingdx, or goal = 0.009 for trainlm).
Sometimes it reaches the goal but doesn't recognise the test data.

The code is given below:


clear;clc;

% SET CHARACTERS:
Alphabet =Alpha4Train;%Alphabet =Alphabet(:,1:100);
Target=TargetSet;%Target=Target(1:100);
[S1,Qa] = size(Alphabet);
[S2,Q] =size(Target);

% DEFINING THE NETWORK
% ====================
H1 =120 ;%115=10 120=0 with mc=0.5 120=2...80....200=met for 10 char ...150 120 for 10 alphas

net = newff(minmax(Alphabet),[H1 S2],{'logsig' 'logsig'},'traingdx');%trainrp trainscg
%%%%traingdx traingdm trainlm traincgf, net = newff(minmax(alphabet),[S1 S2],{'logsig' 'logsig'},'traingdx');

net.performFcn = 'sse'; % sse Sum-Squared Error performance function
net.trainParam.goal =0.10;% mean(var(Target))/100; %0.10;% 0.009;% Sum-squared error goal.
net.trainParam.show = 10; % Frequency of progress displays (in epochs).
net.trainParam.epochs = 95000; %5000 Maximum number of epochs to train.
% net.trainParam.mc = 0.95;%0.65;% % Momentum constant. mc=0.65 and s1=100 good memorization
% net.trainParam.mem_reduc =99999;
% net.trainParam.lr=0.01;%Learning rate
% net.trainParam.lr_inc=1.9;
% net.trainParam.lr_dec = 0.5;

% TRAINING THE NETWORK
% ====================

P = [Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet];
T = [Target,Target,Target,Target,Target,Target,Target];

[net,tr] = train(net,P,T);

% TRAINING THE NETWORK WITH NOISE...GET DIRTY FOR GOOD RESULTS AT THE END
% =======================================================================
netn = net;
netn.trainParam.goal =0.01;% mean(var(Target))/100; %0.009;%mean(var(Target))/100; % Mean-squared error goal.
netn.trainParam.epochs = 85000;%500
netn.trainParam.show = 10; %%% Frequency of progress displays (in epochs).

T = [Target,Target,Target,Target,Target,Target,Target];
P = [(Alphabet + randn(S1,Qa)*0.2), Alphabet + randn(S1,Qa)*0.3, Alphabet + randn(S1,Qa)*0.3, Alphabet, Alphabet, ...
     (Alphabet + randn(S1,Qa)*0.2), Alphabet + randn(S1,Qa)*0.3];
[netn,trn] = train(netn,P,T); % train on the noisy set built above, not the clean one

% load netxxx1010; ImProc(netn,net);

save netgdx2115;


%%%%%%%%%%
I have only 100 samples for each character, so I have used

P = [Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet];

to get the inequality

Neq >~ r*Nw, where ~2 < r < ~64, as described by Greg Heath in his posts.
So it goes without saying that I have tried to follow Greg Heath's rule for choosing the hidden layer.
I even tried to overrule it at times, but all in vain.

The comments in the code show the values I have tested; I have kept them for your reading even though they make the code a bit harder to read. I hope no one will mind.

thanks
zaheer ahmad

Greg Heath

Dec 2, 2008, 5:57:13 PM
On Nov 28, 12:45 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> Dear All
>
> I am developing an OCR (Urdu) but having 'goal doesnt meet' problem.
>
> my network is
>
> Input =400 also reduced and checked on 100 and 144
> output=54
> Hidden layer = 20 but checked on 30,40,50,60,70,80,90,100 upto 250
>
> Sample size i tried to train th net are
> 5400 (i.e. 100*54=5400) but also checked on
> 540 (i.e. 10*54=540) and
> 1080 (i.e. 20*54=1080) and
> 1350 (i.e. 25*54=1350) and
> 2700 (i.e. 50*54=2700)
> where 54 are the number of character and 100,10,20,25 and 50 are samples of each character

So you have

size(p) = [400 Ntrn] for a character with 20*20 = 400 pixels
size(t) = [54 Ntrn] for 54 letters, integers and special characters?

> i tried on using traingdx, trainlm and trainscg( because of out of memory error ) with both mse and sse.

Forget sse

> i dont know why it doesnt reach to the gaol the goal=0.1 for traingdx (or goal= 0.009 for tranlm)

Why the difference? How were the goals determined?

> some time it reach to goal but doesnt recognise test data.

How similar are testing and training sets?

Clustering and visualizing the data should help.

> the code is given as below:
>
> clear;clc;
>
> % SET CHARACTERS:
> Alphabet =Alpha4Train;%Alphabet =Alphabet(:,1:100);
> Target=TargetSet;%Target=Target(1:100);

??
5400 not 100

> [S1,Qa] = size(Alphabet);

[400 5400]

> [S2,Q] =size(Target);

[54 5400]

if Q ~= Qa, error, end


> % DEFINING THE NETWORK
> % ====================
> H1 =120 ;%115=10 120=0 with mc=0.5 120=2...80....200=met for 10 char ...150 120 for 10 alphas

I have no idea what the comment is supposed to mean.

Nw = (400+1)*120+(120+1)*54 = 54+(1+400+54)*120 = 54,654
Neq = 5400*54 = 291,600 ~ 5.3*Nw

I would have preferred a higher ratio.
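Greg's counts can be reproduced with a short script; a Python sketch (the helper names are mine, for illustration only):

```python
def num_weights(n_in, n_hidden, n_out):
    """Weights plus biases in a single-hidden-layer feedforward net."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def num_equations(n_train, n_out):
    """One training equation per output node per training sample."""
    return n_train * n_out

Nw = num_weights(400, 120, 54)   # (400+1)*120 + (120+1)*54 = 54654
Neq = num_equations(5400, 54)    # 5400*54 = 291600
ratio = Neq / Nw                 # ~5.3, at the low end of the ~2..~64 range
```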

> net =newff(minmax(Alphabet),[H1 S2],{'logsig' 'logsig'},'traingdx');%trainrp trainscg

Why not standardize inputs and use tansig hidden nodes??

> %%%%traingdx traingdm trainlm traincgf, net =newff(minmax(alphabet),[S1 S2],{'logsig' 'logsig'},'traingdx');


>
> net.performFcn = 'sse'; % sse Sum-Squared Error performance function

Why not use mse??

> net.trainParam.goal =0.10;% mean(var(Target))/100; %0.10;% 0.009;% Sum-squared error goal.

??

c = 54
mean(Target) = [ 1 + (c-1)*0]/c = 1/c = 1/54 = 1.85e-2
mean(var(Target)) = [(1-1/c)^2 + (c-1)*(0-1/c)^2]/(c-1) = 1/c

net.trainParam.goal = 1.85e-4 % MSE
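The 1/c result above can be checked numerically; a Python sketch, with numpy's column-wise variance (ddof=1) mimicking MATLAB's var on a matrix:

```python
import numpy as np

c, n = 54, 100                       # classes, samples per class
T = np.tile(np.eye(c), n)            # one-hot targets, shape (c, c*n)
col_var = np.var(T, axis=0, ddof=1)  # variance of each one-hot column
mv = col_var.mean()                  # equals 1/c exactly
goal = mv / 100                      # suggested MSE goal ~ 1.85e-4
```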

> net.trainParam.show = 10; % Frequency of progress displays (in epochs).
> net.trainParam.epochs = 95000; %5000 Maximum number of epochs to train.
> % net.trainParam.mc = 0.95;%0.65;% % Momentum constant. mc=0.65 and s1=100 good memorization

H = S1?

> % net.trainParam.mem_reduc =99999;
> % net.trainParam.lr=0.01;%Learning rate
> % net.trainParam.lr_inc=1.9;
> % net.trainParam.lr_dec = 0.5;

I use trainlm or trainscg and only specify goal,
show and (rarely) epochs.
So, I can't comment on the other settings.

> % TRAINING THE NETWORK
> % ====================
>
> P = [Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet];
> T = [Target,Target,Target,Target,Target,Target,Target];

This doesn't make sense.

> [net,tr] = train(net,P,T);
>
> % TRAINING THE NETWORK WITH NOISE...GET DIRTY FOR GOOD RESULTS AT THE END

This is called Jittering. Go to Google groups and search on

greg-heath jittering

> % =======================================================================
> netn = net;
> netn.trainParam.goal =0.01;% mean(var(Target))/100; %0.009;%mean(var(Target))/100; % Mean-squared error goal.

Revisit this.

> netn.trainParam.epochs = 85000;%500
> netn.trainParam.show = 10; %%% Frequency of progress displays (in epochs).
>
> T = [Target,Target,Target,Target,Target,Target,Target];
> P = [(Alphabet + randn(S1,Qa)*0.2), Alphabet + randn(S1,Qa)*0.3, Alphabet + randn(S1,Qa)*0.3, Alphabet, Alphabet,
> (Alphabet + randn(S1,Qa)*0.2), Alphabet + randn(S1,Qa)*0.3];
> [netn,trn] = train(netn,Alphabet,Target);

Since Neq/Nw ~ 5, you probably don't need to increase
Ntrn by more than a factor of 2 to 4.

Use only one noise level and scale it to the
standard deviation of Alphabet in order to get
a specified SNR.

> % load netxxx1010; ImProc(netn,net);
>
> save netgdx2115;
>
> %%%%%%%%%%
> i have only 100 samples for each character to i have used
>
> P = [Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet,Alphabet];
>
> to the get inequality
>
> Neq >~ r* Nw where (~2 < r < ~ 64). as described by Greg Heath in posts.
> so it doesnt need to tell that i have tried to follow Greg Heath rule rule for choosing hidden layer.
> even tried to overrule it some times but all in vain.
>
> the comments in the code shows the values i have tested, so i have not omitted the comments for yours
> reading despite it make the code reading a bit difficult, hope no one will mind.

You should overlay plots of misclassified characters
with plots of means of the correct and assigned classes.
Perhaps the classes are not defined well enough and
you may need to use clustering to create well defined
subclasses.

You can also replace forced classification (always
make a classification) with conditional classification
(only make a classification if the posterior estimate
is larger than a threshold). To do this, overlay the
color coded histograms of the output for the classes
that get the most confused.
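As a sketch of conditional classification in Python (the outputs and the 0.7 threshold are invented for illustration, not recommended values):

```python
import numpy as np

def conditional_classify(outputs, threshold):
    """Assign the argmax class only when the top network output (treated as
    a posterior estimate) exceeds the threshold; otherwise reject (-1)."""
    idx = outputs.argmax(axis=0)                     # winning class per column
    conf = outputs[idx, np.arange(outputs.shape[1])] # its output value
    return np.where(conf >= threshold, idx, -1)

# columns are samples: confident '2', confident '0', ambiguous -> rejected
y = np.array([[0.10, 0.90, 0.40],
              [0.10, 0.05, 0.35],
              [0.80, 0.05, 0.25]])
labels = conditional_classify(y, threshold=0.7)      # -> [2, 0, -1]
```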

Go to Google Groups and search on

greg-heath forced-classification
greg-heath conditional-classification

Hope this helps.

Greg

zaheer ahmad

Dec 6, 2008, 3:54:02 PM
Greg Heath <he...@alumni.brown.edu> wrote in message <e9dc5bcb-cf3d-4a6f...@d23g2000yqc.googlegroups.com>...

> On Nov 28, 12:45 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> > Dear All
> >
> > I am developing an OCR (Urdu) but having 'goal doesnt meet' problem.
> >
> > my network is
> >
> > Input =400 also reduced and checked on 100 and 144
> > output=54
> > Hidden layer = 20 but checked on 30,40,50,60,70,80,90,100 upto 250
> >
> > Sample size i tried to train th net are
> > 5400 (i.e. 100*54=5400) but also checked on
> > 540 (i.e. 10*54=540) and
> > 1080 (i.e. 20*54=1080) and
> > 1350 (i.e. 25*54=1350) and
> > 2700 (i.e. 50*54=2700)
> > where 54 are the number of character and 100,10,20,25 and 50 are samples of each character
>
> So you have
>
> size(p) = [400 Ntrn] for a character with 20*20 = 400 pixels
> size(t) = [54 Ntrn] for 54 letters, integers and special characters?

Yes, I have 400 inputs and 54 letters... all Urdu characters; special characters are not considered.


> How similar are testing and training sets?

The testing and training data are sampled from the same population.


> Clustering and visualizing the data should help.

Kindly help with, or give a reference on, clustering...

> if Q ~= Qa, error, end

Yes, Q = Qa.
> > % DEFINING THE NETWORK

> Why not standardize inputs and use tansig hidden nodes??

I have checked both; I have now amended the first to tansig and left the 2nd as it was, as I need that to compare with ASCII.

> H = S1?
Yes, it is.

>
> > netn.trainParam.goal =0.01;% mean(var(Target))/100; %0.009;%mean(var(Target))/100; % Mean-squared error goal.
>
> Revisit this.

I am using this as suggested by you.

> Use only one noise level and scale it to the
> standard deviation of Alphabet in order to get
> a specified SNR.

I don't understand, or rather I don't know how to do this...

> You can also replace forced classification (always
> make a classification) with conditional classification
> (only make a classification if the posterior estimate
> is larger than a threshold). To do this, overlay the
> color coded histograms of the output for the classes
> that get the most confused.

Kindly elaborate on this a little; assume I am a novice.
And also, how do I calculate the error?

This is my 2nd thread on the same question, as I was not receiving a reply for a long time on my thread
http://www.mathworks.com/matlabcentral/newsreader/view_thread/235521#614583
It is better to discuss the problem in a single place, so I will ask my questions from that (old) thread here.
in that thread you suggested that :

>Maybe your classes are not well defined
>and have to be partitioned into subclasses
>via clustering (e.g., k-means).
How do I perform this process (k-means)?

>Overlay the plot of each misclassified character
>(blue) on the plot of the mean of the class to
>which they were assigned (red) and the plot of
>the mean of the correct class (black)clustering.
>This should give some insight into the difficulty.
Kindly help with this too...

I have changed the code and included validation and testing... it converges when I use H=2000, but the validation and testing lines (on the graph) remain well above the goal line, and the results are about 15-20%.
The code now goes as below:

Alphabet =Alpha4Train;
Target=TargetSet;
[S1,Qa] = size(Alphabet); %% S1=315 and Qa=5400, as I have now resized the characters to 21x15
[S2,Q] =size(Target); %% S2=11 and Q=5400, which means Q=Qa


% DEFINING THE NETWORK
% ====================

H1 =2000; %% chosen using trial ...
net = newff(minmax(Alphabet),[H1 S2],{'tansig' 'logsig'},'trainscg');
net.performFcn = 'mse';
net.trainParam.goal = mean(var(Target))/100;%% as suggested by Greg Heath
net.trainParam.show = 10;
net.trainParam.epochs = 500;

% TRAINING THE NETWORK
% ====================

testPercent = 0.25;
validatePercent = 0.25;
[trainSamples,validateSamples,testSamples] = dividevec(Alphabet,Target,testPercent,validatePercent);
[net,tr] = train(net,trainSamples.P,trainSamples.T,[],[],validateSamples,testSamples);

I know the number of questions to be answered at one time is increasing, but... I hope no one will mind.
Thanks in advance for your time.
Zaheer Ahmad

zaheer ahmad

Dec 10, 2008, 4:25:04 PM
Well, how do I train the network using different input matrix sizes? I.e., I want to first train the network on a 21x15 input matrix and then train (re-train) the same network on a 16x10 input matrix. The purpose of the process is to get a network that will be able to recognize different sizes of characters.
I tried to pass the new input matrix directly to 'train', but it produced an error (as I was expecting).
One technique might be to pass it the weights and biases returned from the first training session, but there will be a mismatch in the number of weights and biases because of the change in the number of inputs in the input layer. And I just realized why a network trained by 'train' on a size-1 input matrix can't be trained directly on a size-2 input matrix: it's because of the number of weights too. Am I correct? Then what can I do? How do I retrain a network on differently sized input matrices?
thanks in advance
zaheer ahmad

Greg Heath

Dec 17, 2008, 12:25:01 PM
On Dec 6, 3:54 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> Greg Heath<he...@alumni.brown.edu> wrote in message <e9dc5bcb-cf3d-4a6f-9fdd-d4ea0b8ee...@d23g2000yqc.googlegroups.com>...

> > On Nov 28, 12:45 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> > > Dear All
>
> > > I am developing an OCR (Urdu) but having 'goal doesnt meet' problem.
>
> > > my network is
>
> > > Input =400 also reduced and checked on 100 and 144
> > > output=54
> > > Hidden layer = 20 but checked on 30,40,50,60,70,80,90,100 upto 250
>
> > > Sample size i tried to train th net are
> > > 5400 (i.e. 100*54=5400) but also checked on
> > > 540 (i.e. 10*54=540) and
> > > 1080 (i.e. 20*54=1080) and
> > > 1350 (i.e. 25*54=1350) and
> > > 2700 (i.e. 50*54=2700)
> > > where 54 are the number of character and 100,10,20,25 and 50 are samples of each character
>
> > So you have
>
> > size(p) = [400 Ntrn] for a character with 20*20 = 400 pixels
> > size(t) = [54 Ntrn] for 54 letters, integers and special characters?
>
> yes i have 400 Ntrn and 54 letter...all urdu character special character not considered.
>
> > How similar are testing and training sets?
>
> testing and training data are sample from the same population
> > Clustering and visualizing the data should help.
>
> kindly help or reference on clustering......

Search on the MATLAB website. I would use kmeans.

> > if Q ~= Qa, error, end
> Yes Q=Qa
> > > % DEFINING THE NETWORK
> > Why not standardize inputs and use tansig hidden nodes??
>
> i have checked both and now ammeded the first to tansig and left the 2nd as it was as i need that to compare with ascii.
>
> > H = S1?
>
> yes it is .
>
> > > netn.trainParam.goal =0.01;% mean(var(Target))/100; %0.009;%mean(var(Target))/100; % Mean-squared error goal.
>
> > Revisit this.
>
> i am using this as suggested by you.
>
> > Use only one noise level and scale it to the
> > standard deviation of Alphabet in order to get
> > a specified SNR.
>
> i dont understand or say dont know how to do....

x = x0 + noise
SNR = mean(x0^2)/mean(noise^2)

Assuming

mean(randn(size(x)).^2) ~ 1
mean(x0.*randn(size(x))) ~ 0

choose

x = x0 + sqrt(mean(x0(:).^2)/SNR)*randn(size(x));
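The same scaling in Python terms (numpy standing in for MATLAB; SNR = 10 is an arbitrary example value):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.random((400, 5400))   # stand-in for the Alphabet matrix
SNR = 10.0                     # desired signal-to-noise power ratio
# unit-variance noise scaled so that mean(x0**2)/mean(noise**2) ~ SNR
noise = np.sqrt(np.mean(x0**2) / SNR) * rng.standard_normal(x0.shape)
x = x0 + noise
achieved = np.mean(x0**2) / np.mean(noise**2)
```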

> > You can also replace forced classification (always
> > make a classification) with conditional classification
> > (only make a classification if the posterior estimate
> > is larger than a threshold). To do this, overlay the
> > color coded histograms of the output for the classes
> > that get the most confused.
>
> kindly elaborate this a little, assume i am a novice.
> and also how to calculate error....

Sorry, I don't have the time.

Try searching the Google Group archives for

greg-heath forced-classification
greg-heath conditional-classification

> this is my 2nd thread on the same question as i was not receiving reply for a long time on my thread http://www.mathworks.com/matlabcentral/newsreader/view_thread/235521#...


> so its better to discuss the problem in a single place so i will ask questions on that ( old ) thread here..
> in that thread you suggested that :
>
> >Maybe your classes are not well defined
> >and have to be partitioned into subclasses
> >via clustering (e.g., k-means).
>
> how to perform this process ( k means )?

See the MathWorks website documentation
for the clustering functions.

> >Overlay the plot of each misclassified character
> >(blue) on the plot of the mean of the class to
> >which they were assigned (red) and the plot of
> >the mean of the correct class (black)clustering.
> >This should give some insight into the difficulty.
>
> kindly help on this too.....

The mean of the members of each class should be
a very recognizable template representing the class.
If it is not, you should try to partition the class into
subclasses.
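Partitioning a class into subclasses is what k-means does; a self-contained Python sketch (deterministic initialization chosen for simplicity, not production-grade; MATLAB's kmeans is the equivalent there):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means. X is (n_samples, n_features); returns labels, centers.
    Centers are initialized from evenly spaced samples for determinism."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # distance of every sample to every center, then nearest assignment
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# two well-separated blobs should come out as two subclasses
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
labels, centers = kmeans(X, 2)
```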

> i have changed the code and included validation and testing ...it gets converged when i use H=2000 but the validation and testing line ( on graph ) remain well above the goal line and results are about 15-20 %.

What error rate would you be satisfied with?

> the code now goes as below:
>
> Alphabet =Alpha4Train;
> Target=TargetSet;
> [S1,Qa] = size(Alphabet); %% s=315 and Qa=54000 as now i have resized characters to 21x15
> [S2,Q] =size(Target);%% S2=11 and Q=5400 which means Q=Qa
> % DEFINING THE NETWORK
> % ====================
> H1 =2000; %% chosen using trial ...
> net = newff(minmax(Alphabet),[H1 S2],{'tansig' 'logsig'},'trainscg');
> net.performFcn = 'mse';

> net.trainParam.goal = mean(var(Target))/100;%% as suggested byGreg Heath


> net.trainParam.show = 10;
> net.trainParam.epochs = 500;
> % TRAINING THE NETWORK
> % ====================
> testPercent = 0.25;
> validatePercent = 0.25;

These percentages may be too high.

> [trainSamples,validateSamples,testSamples] = dividevec(Alphabet,Target,testPercent,validatePercent);
> [net,tr] = train(net,trainSamples.P,trainSamples.T,[],[],validateSamples,testSamples);
>
> i know that number of questions are increasing to be answered at one time but ...hope no one will mind....
> thanks in advance for your time...

Try multiple trials of 10-fold cross-validation (XVAL).
Keep track of the standard deviation as well as the mean.

greg-heath XVAL
greg-heath cross-validation
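As an illustration of the suggested procedure, a minimal Python sketch of one shuffled 10-fold split, with a toy stand-in classifier (nearest class mean; all names and data here are invented for the example):

```python
import numpy as np

def kfold_scores(X, y, train_and_score, k=10, seed=0):
    """Return per-fold error rates for one shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[test], y[test]))
    return np.array(scores)

def nearest_mean(Xtr, ytr, Xte, yte):
    """Toy classifier: assign each test sample to the closest class mean."""
    means = np.array([Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)])
    pred = np.argmin(np.linalg.norm(Xte[:, None] - means[None], axis=2), axis=1)
    return np.mean(pred != yte)          # error rate on the held-out fold

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(3, 0.2, (100, 2))])
y = np.repeat([0, 1], 100)
err = kfold_scores(X, y, nearest_mean)
mean_err, std_err = err.mean(), err.std()  # report both, as Greg advises
```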

Hope this helps.

Greg

Greg Heath

Dec 17, 2008, 12:33:05 PM
On Dec 10, 4:25 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> well, how to train the network using different input sized matrix i.e. i want to first  train the network on 21x15 input matrix then want to train ( re-train ) the same network on 16x10 input matrix. The purpose of the process is to get a network which will be able to recognize different size of  characters.....

All images have to be the same size. You can try
embedding the smaller sized image into the larger
frame. However, you may need to use scale-invariant
classification.
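Embedding the smaller image into the larger frame amounts to zero-padding; a Python sketch using the thread's 21x15 and 16x10 sizes (numpy standing in for MATLAB):

```python
import numpy as np

def embed_centered(img, frame_shape):
    """Place a smaller character image at the center of a fixed-size frame,
    padding with zeros (background), so every input has the same length."""
    frame = np.zeros(frame_shape, dtype=img.dtype)
    r0 = (frame_shape[0] - img.shape[0]) // 2
    c0 = (frame_shape[1] - img.shape[1]) // 2
    frame[r0:r0 + img.shape[0], c0:c0 + img.shape[1]] = img
    return frame

small = np.ones((16, 10))              # a 16x10 character
big = embed_centered(small, (21, 15))  # now fits the 21x15 input layer
```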

The only method I know of (and have never used) is
to use Fourier coefficients as inputs.

You need to search for scale-invariant image
classification.

Hope this helps.

Greg

-----SNIP

zaheer ahmad

Dec 21, 2008, 4:13:04 AM
Thanks a lot, Mr. Greg Heath, for your help and time... I got my solution and trained the net.
I was having a problem with my targets and goal...
There were some rows (filled entirely with zeros) in the target array, and the goal I set was not good. Now I have set my goal = 0.0001 and removed the zero rows from the target set. Despite that, it doesn't achieve this goal (0.0001), but the results are very promising and have improved very well. I think it would achieve this goal, but I get bored and frustrated after waiting 4-5 hours of processing and stop the training, or my system restarts (being overloaded). But when I check the results, they remain good. So my problem is solved; thanks again.
Your suggestions and recommendations have helped me a lot to improve my coding and my knowledge of neural networks.
Being a novice in NN, I was unable to understand what the actual problem was or how to review the whole set of neural network routines, but your suggestions posted here, and in response to other questions, which I have thoroughly studied, helped me a lot.
Zaheer Ahmad
Institute of Management Science(IMSciences)
Peshawar Pakistan.


manjula reddy

Feb 10, 2009, 6:33:01 AM