
Neural Network


amanita

Jan 29, 2010, 2:04:05 AM
I am using a neural network to classify a number of inputs into two labels (0 and 1). I have a set for training and validation, and I am trying to find the best topology. This is what I am doing:

for i = 1:10
    % shuffle my input (training and validation set)
    % divide the input into training and validation (for example
    % 70% - 30%; I use the same percentage for all 10 iterations)
    % newff
    validationperformance(i,1) = ...
    validationerrors(i,1) = ...
end

meanvalidationperformance
meanvalidationerrors
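In MATLAB terms the loop looks roughly like this (just a sketch; p is my input matrix with one column per vector, t is the corresponding 1-by-N row of 0/1 targets, and the 8-hidden-node topology is only an example):

Ntrials = 10;
[R, N] = size(p);                     % inputs, one column per vector
Ntrn = round(0.70*N);                 % 70% - 30% split
valperf = zeros(Ntrials,1);
valerr  = zeros(Ntrials,1);
for i = 1:Ntrials
    idx  = randperm(N);               % shuffle the whole set
    itrn = idx(1:Ntrn);
    ival = idx(Ntrn+1:end);
    net  = newff(minmax(p(:,itrn)), [8 1], {'logsig','logsig'}, 'traingda');
    net  = train(net, p(:,itrn), t(itrn));
    yval = sim(net, p(:,ival));
    valperf(i) = mse(yval - t(ival));             % validation MSE
    valerr(i)  = mean(round(yval) ~= t(ival));    % fraction misclassified
end
meanvalidationperformance = mean(valperf)
meanvalidationerrors      = mean(valerr)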

Basically, each time, for one topology, I am computing the mean validation performance and the mean validation errors in order to compare them later. When I find the best topology I'll use it on a test set.
My question is this: shall I shuffle my input inside this loop in order to have different vectors used as the training set in each iteration, or shall I shuffle it once outside the loop and use the exact same training set for all 10 iterations? By shuffling it inside the loop (even if the training set is fixed at 70% for now) I will see what the performance of my net is when different vectors of the training set are used. If I shuffle it outside the loop then the vectors of the training set are fixed (and maybe, for example, I have more 0-labeled vectors than 1-labeled vectors, so my net won't have all the information needed for the 1-labeled vectors). I don't know if you understand me, but it is difficult to explain. Shall I recycle vectors in my training set or shall I use fixed vectors?

Greg Heath

Jan 30, 2010, 5:19:02 AM
On Jan 29, 2:04 am, "amanita " <k_amanit...@hotmail.com> wrote:
> I am using a neural network to classify a number of inputs into
> two lables (0 and 1). I got a set for training and validation

Do you also have an independent test set ?
Exactly how much data do you have?
What is the dimensionality?

> and i am trying to find the best topology. This is what i am doing:
>
> for i=1:10

for i = 1:Ntrials

> shuffling my input (training and validation set)
> dividing the input to training and validation (for example
> 70% - 30%,i use the same percentage for all 10 iterations)

See below to make sure Ntrn and Nval are sufficiently large.

> newff
> validationperformance(i,1)=...

Update summary stats
meanvalidationperformance(i,1)=...
varvalidationperformance(i,1)=...
stdvalidationperformance(i,1)=...

> validationerrors(i,1)=....

Update summary stats
meanvalidationerrors(i,1)=....
varvalidationerrors(i,1)=....
stdvalidationerrors(i,1)=....

Update plots of mean and mean +/- stdv

Check stopping criteria based on stdvs.

> end
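For example, something along these lines (just a sketch; valperf(i) would be your
actual validation result, the rand here is only a stand-in, and the 5% threshold
is arbitrary):

Ntrials = 30;                     % upper limit on the number of trials
valperf = zeros(Ntrials,1);
for i = 1:Ntrials
    % ... design with newff, train, and evaluate on the validation set ...
    valperf(i) = rand;            % stand-in for the validation MSE
    meanvalperf(i,1) = mean(valperf(1:i));
    stdvalperf(i,1)  = std(valperf(1:i));
    % stopping criterion based on the stdv
    if i >= 5 && stdvalperf(i) < 0.05*meanvalperf(i)
        break
    end
end
errorbar(1:i, meanvalperf(1:i), stdvalperf(1:i))   % mean +/- stdv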


>
>
> Basically, each time, for one topology, I am computing the mean
> validation performance and the mean validation errors in order
> to compare them later. When I find the best topology I'll use
> it on a test set.
>
> My question is this: shall I shuffle my input inside this loop
> in order to have different vectors used as the training set in each
> iteration, or shall I shuffle it once outside the loop and use the
> exact same training set for all 10 iterations? By shuffling it inside
> the loop (even if the training set is fixed at 70% for now) I will
> see what the performance of my net is when different vectors of the
> training set are used. If I shuffle it outside the loop then the
> vectors of the training set are fixed (and maybe, for example, I have
> more 0-labeled vectors than 1-labeled vectors, so my net won't have
> all the information needed for the 1-labeled vectors). I don't know
> if you understand me, but it is difficult to explain. Shall I recycle
> vectors in my training set or shall I use fixed vectors?

It depends on your data. Although shuffling inside the loop
is less biased, you may need more iterations to obtain
sufficiently precise estimates (i.e., sufficiently small stdvs).
However, if you keep a running tab on the standard deviations as
suggested above you should be able to use the same split and
rely on the randomness of the newff weight initializations.

The trn/val %split and number of iterations may need to
depend on the size of the data set and the number of weights
to be estimated.

One typical rule of thumb for accurate weight estimation
for an I-H-O MLP is

Neq >> Nw (e.g., Neq >= 10*Nw)

where

I,H,O are the number of input, hidden and output nodes,
respectively

Neq = Ntrn*O is the number of training equations obtained
from Ntrn training vectors and O outputs

Nw = (I+1)*H+(H+1)*O is the number of weights to be
estimated.

Typically, the ratio r = Neq/Nw > ~10 is satisfactory.
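For example (illustrative numbers only):

I = 30; H = 8; O = 1;            % a hypothetical 30-8-1 network
Ntrn = 293;                      % hypothetical number of training vectors
Neq = Ntrn*O                     % 293 training equations
Nw  = (I+1)*H + (H+1)*O          % 257 weights to estimate
r   = Neq/Nw                     % ~1.1, far below 10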

Typical rules for precise performance estimation are

std(MSE) << mean(MSE)
and
std(PCTerr) << mean(PCTerr)

MSE can be assumed to be CHI-SQUARE distributed and
PCTerr can be assumed to be BINOMIALLY distributed
in order to estimate an adequate size for Nval.
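For example, rough relative precisions for a candidate Nval
(illustrative numbers; the chi-square figure assumes roughly
Gaussian errors):

Nval = 125;
p    = 0.10;                            % assumed true misclassification rate
relstd_PCTerr = sqrt((1-p)/(p*Nval))    % binomial std/mean of PCTerr, ~0.27 here
relstd_MSE    = sqrt(2/Nval)            % chi-square std/mean of MSE, ~0.13 here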

So, once you are sure that Ntrn is large enough to
generate accurate weight estimates, and Nval is large
enough to keep precise running estimates of the
performance criteria, you can iterate until the stdvs
are sufficiently small.

I have written jillions of posts re this topic.
Search Google Groups using "greg-heath" and other
keywords. For example,

"greg-heath" Neq Nw
"greg-heath" partition
etc


Hope this helps.

Greg

amanita

Jan 30, 2010, 6:38:02 AM
Greg Heath <he...@alumni.brown.edu> wrote in message <9d4d7641-c78a-439d...@f12g2000yqn.googlegroups.com>...

> Do you also have an independent test set ?
> Exactly how much data do you have?
> What is the dimensionality?

I've got a training set of 419 vectors (30 parameters each) and an independent test set of 150 vectors

> See below to make sure Ntrn and Nval are sufficiently large.

One of my assignments is to find the best percentage split between the training and validation set, so I try different variations (70-30, 60-40, 50-50, etc.). The basic concept of the code is that I load different topologies from a .dat file. When I say topologies I mean different numbers of layers, different numbers of hidden neurons, and different training and learning functions.
I start with one hidden layer, for example: 30 - 8 - 1 / logsig / lf: learngdm / tf: traingda.
After 10 iterations of this topology I find the mean validation performance and mean validation errors. I move on to different lf, tf, hidden neurons, layers, etc. and do the same. At the end I compare the mean validation performance and errors of all these topologies and try to find the best one in order to use it on my test set.
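In MATLAB the construction looks roughly like this (old newff syntax; ptrn and ttrn are just my names for the 30-by-Ntrn training inputs and the 1-by-Ntrn 0/1 targets):

net = newff(minmax(ptrn), [8 1], {'logsig','logsig'}, 'traingda', 'learngdm');
net = train(net, ptrn, ttrn);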


>The trn/val %split and number of iterations may need to
>depend on the size of the data set and the number of weights
>to be estimated.

> So, once you are sure that Ntrn is large enough to
> generate accurate weight estimates, and Nval is large
> enough to keep precise running estimates of the
> performance criteria. You can iterate until the stdvs
> are sufficiently small.

> It depends on your data. Although shuffling inside the loop
> is less biased, you may need more iterations to obtain
> sufficiently precise estimates (i.e., sufficiently small stdvs).
> However, if you keep a running tab on the standard deviations as
> suggested above you should be able to use the same split and
> rely on the randomness of the newff weight initializations.

Yes, I think I now understand what you are saying. Since I don't have the time to run as many iterations as needed until the stdvs are sufficiently small, and since I need to check all of this for about 900 topologies, wouldn't 10 iterations with the same cut for all 900 and the results just from the mean validation errors give me a general picture of things? Then I could exclude the worst and move on in the way you suggested with my best picks.

Greg Heath

Feb 3, 2010, 4:26:17 AM
On Jan 30, 6:40 am, "amanita " <k_amanit...@hotmail.com> wrote:
> Greg Heath <he...@alumni.brown.edu> wrote in message <9d4d7641-c78a-439d-b452-b270af408...@f12g2000yqn.googlegroups.com>...

> > Do you also have an independent test set ?
> > Exactly how much data do you have?
> > What is the dimensionality?
>
> I've got a training set of 419 vectors (30 parameters each) and
> an independent test set of 150 vectors

You have a DESIGN set of 419 vectors (30 VARIABLES each)...

total = design + test
design = training + validation

> > See below to make sure Ntrn and Nval are sufficiently large.
>
> One of my assignments is to find the best percentage split between
> the training and validation set, so I try different variations
> (70-30, 60-40, 50-50, etc.). The basic concept of the code is that
> I load different topologies from a .dat file. When I say topologies
> I mean different numbers of layers, different numbers of hidden neurons,

OK

>different training and learning functions.

No. These are not considered part of the topology.

> I start with one hidden layer, for example: 30 - 8 - 1 / logsig /
> lf: learngdm / tf: traingda.
> After 10 iterations of this topology I find the mean validation
> performance and mean validation errors. I move on to different lf,
> tf, hidden neurons, layers, etc. and do the same. At the end I
> compare the mean validation performance and errors of all these
> topologies and try to find the best one in order to use it on my
> test set.

You have already explained what you are doing. Let me be clear
about my previous response.

1. The ratio Ntrn/Nval is NOT of primary importance.
2. The ratio Neq/Nw is of primary importance
3. The ratios std(MSE)/mean(MSE) and std(PCTerr)/mean(PCTerr)
are of primary importance
4. The arbitrary choice of Ntrials = 10 is, in general, suboptimal.
5. Choose Nval and Ntrials to achieve specified bounds for
the ratios in 3.

Given the above, for a fixed topology the different choices
of learning and training functions will yield different results.
My recommendation is to first use the three best recommended in
the MATLAB documentation (e.g., TRAINLM, TRAINBFG, TRAINRP).
Once those experiments are completely finished (see below) then
consider some or all of the others (which tend to have a higher
probability of taking orders of magnitude longer to converge, not
converging to a sufficiently low local minimum, or not converging at
all).
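For example, to cycle over the three recommended training functions
(a sketch; Ptrn and Ttrn stand for your training inputs and targets):

trainFcns = {'trainlm','trainbfg','trainrp'};
for k = 1:numel(trainFcns)
    net = newff(minmax(Ptrn), [8 1], {'logsig','logsig'}, trainFcns{k});
    net = train(net, Ptrn, Ttrn);
    % ... evaluate on the validation set as before ...
end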

> >The trn/val %split and number of iterations may need to
> >depend on the size of the data set and the number of weights
> >to be estimated.
> > So, once you are sure that Ntrn is large enough to
> > generate accurate weight estimates, and Nval is large
> > enough to keep precise running estimates of the
> > performance criteria. You can iterate until the stdvs
> > are sufficiently small.
> >It depends on your data. Although shuffling inside the loop
> >is less biased, you may need more iterations to obtain
> >sufficiently precise estimates (i.e., sufficiently small stdvs).
> >However, if you keep a running tab on the standard deviations as
> >suggested above you should be able to use the same split and
> >rely on the randomness of the newff weight initializations.
>

> Yes, I think I now understand what you are saying. Since I don't
> have the time to run as many iterations as needed until the stdvs
> are sufficiently small, and since I need to check all of this for
> about 900 topologies,

Please explain how you got that number.

> wouldn't 10 iterations with the same cut for all 900 and the
> results just from the mean validation errors give me a general
> picture of things? Then I could exclude the worst and move on
> in the way you suggested with my best picks.

No.

Unless you are using an APPLE I or COMMODORE, you should
be able to make a sufficient number of runs in a reasonable
amount of time. Moreover, once you have made a number of runs
you might determine that fixing Ntrials at 18 is sufficient.

With newff, one hidden layer is sufficient. A reasonable minimum
bound on Neq/Nw (e.g., 5, 10 or 20) will limit the maximum
allowable number of hidden nodes, H. You can then find, by trial
and error, a reasonable range for the minimum number of hidden
nodes that will yield the optimum performance. I say range because
this is a statistical study that will depend on the distribution of
random initial weights. Therefore, don't always expect to obtain a
single unambiguous answer.
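For example, since Nw = (I+1)*H + (H+1)*O = H*(I+O+1) + O, the
condition Neq/Nw >= r gives an upper bound on H (illustrative
numbers):

I = 30; O = 1; Ntrn = 293; r = 5;       % hypothetical values
Neq  = Ntrn*O;                          % training equations
Hmax = floor((Neq/r - O)/(I + O + 1))   % = 1 here; more data or a smaller r allows a larger H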

The next best types of topologies worth considering are those
with single hidden layers but a reduced number of inputs.
Obviously there are 2^30 - 1 possibilities. I would use
STEPWISEFIT to see what inputs are optimal for linear and
quadratic classifiers (for the latter, use the linear classifier with
squares and cross-products as additional inputs). I would also
consider the 30 scenarios where the values of one input of
the best candidate are randomly shuffled. The ranking of
performances gives additional information as to which inputs
can be deleted without degrading performance.
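For example (a sketch using the Statistics Toolbox; Xdsg stands for an
Ndesign-by-30 matrix with one row per vector and y for the 0/1 labels):

% linear input selection
[b, se, pval, inmodel] = stepwisefit(Xdsg, y, 'display', 'off');
selectedLinear = find(inmodel)            % inputs kept by the linear model

% quadratic case: squares and cross-products as additional inputs
Xq = x2fx(Xdsg, 'quadratic');             % first column is a constant term
[bq, seq, pq, inq] = stepwisefit(Xq(:,2:end), y, 'display', 'off');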

Next, try two layers of hidden nodes. You should be able to
achieve the "optimum" performance achieved above using fewer
weights. However, I can't recommend a good ad hoc rule of thumb
for determining the H1/H2 ratio. My two-hidden-layer classifier
designs have been based on a priori information from pre-training
clustering and PCA. In those cases H1 > H2 worked well. However,
you should also try H1 < H2.
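For example (a sketch; H1 and H2 are illustrative, Ptrn and Ttrn as before):

H1 = 6; H2 = 3;
net = newff(minmax(Ptrn), [H1 H2 1], {'tansig','tansig','logsig'}, 'trainlm');
net = train(net, Ptrn, Ttrn);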

Trying more than 2 hidden layers is a waste of time.

Unfortunately, newff doesn't support skip-layer topologies. I think
you would have to use a custom design for those. Feedback
topology is also not supported by newff. However, I wouldn't
expect feedback to make a significant difference.

Once the above is achieved for the 3 best algorithms you can
try some of the lesser algorithms (I think there are 13 in all).


Hope this helps.

Greg
