Activation Function Output Mappings: thoughts on....

tomhoo

Jun 29, 2009, 3:36:36 PM

Activation Function Mappings, some thoughts:

The 2 traditional mappings are
fullspace: (-1,+1): tanh, symmetric logistic, linear, and
halfspace: (0, +1): logistic, Gaussian, Gaussian complement

But what is the implication of this?

Some hint is given where NeuroShell 2 says that if the output is a
category, map to the halfspace; if it is continuous, map to the
fullspace.

EXAMPLE:

Consider 2 hidden nodes feeding 1 output node where the answer is 0.
With halfspace, you get only one solution: 0 and 0 for the nodes'
output.
For the fullspace, you get an infinity of solutions: x and -x for
their output.
One might say that a hidden node in the fullspace can take on the
characteristic of a nullifier for another characteristic, rather than
a contributor.
Or, put another way: rather than a pair of zero outputs indicating no
contribution from either node, you also get the option of + and -
some meaningless artifact.

EXAMPLE 2:

Consider the same model where the answer is a > 0.
With halfspace, each node's output, x, must be bounded by [0, a].
For the fullspace, there is no such bounding - you could get two huge
numbers whose difference is a, which is negligible and becomes lost as
if it were noise relative to the two large numbers.
Without the constraint of positive outputs, it seems that a lot of
chaos or confusion is introduced and whatever meaning might have been
present in the 2 hidden nodes is now lost.

Intuitively, the constraints imposed by the halfspace seem healthy
compared to the fullspace. The fullspace seems dangerous and in need
of additional constraints.

Greg

Jun 30, 2009, 9:16:38 AM

On Jun 29, 3:36 pm, tomhoo <tom...@gmail.com> wrote:
> Activation Function Mappings, some thoughts:
>
> The 2 traditional mappings are
> fullspace: (-1,+1): tanh, symmetric logistic, linear, and

What is a symmetric logistic? A scaled tanh?
The range of linear is (-inf,+inf)

> halfspace: (0, +1): logistic, Gaussian, Gaussian complement

What is a Gaussian complement? erfc?

> But what is the implication of this?
>
> Some hint is given where NeuralShell 2 says if the output is a
> Category map to the halfspace, if it is continuous, map to the full
> space.

???

You are making no distinction between hidden nodes and output nodes.

> EXAMPLE:
>
> Consider 2 hidden nodes feeding 1 output node where the answer is 0.
> With halfspace, you get only one solution: 0 and 0 for the nodes'
> output.

No.

You are forgetting the bipolar hidden-to-output weights
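A minimal sketch of this point, assuming a linear output node with no bias and hypothetical weights w1 = +1 and w2 = -1 (values chosen only for illustration, not taken from the thread): with signed hidden-to-output weights, logistic hidden nodes can deliver a net input of 0 to the output node for any pair of equal activations, not only for "0 and 0".

```python
import math

def logistic(x):
    """Halfspace activation with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def output_net_input(h1, h2, w1=1.0, w2=-1.0):
    """Weighted sum reaching the output node (no bias; hypothetical weights)."""
    return w1 * h1 + w2 * h2

# Equal hidden activations cancel when the two weights have opposite signs,
# so there are infinitely many hidden-output pairs that give a net input of 0.
for x in (-2.0, 0.0, 3.0):
    h = logistic(x)
    print(f"h1 = h2 = {h:.4f} -> net input = {output_net_input(h, h)}")
```

The same cancellation works for any weight pair with w1 = -w2, so the halfspace does not by itself force a unique solution.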

> For the fullspace, you get an infinity of solutions: x and -x for
> their output.
> One might say that a hidden node in the fullspace can take on the
> characteristic of a nullifier for another characteristic, rather than
> a contributor.
> Or yet still, rather than have a pair of zero outputs indicating no
> contribution from either node, you get the options, how, for + and -
> some meaningless artifact.
>
> EXAMPLE 2:
>
> Consider same model with answer is a > 0.
> With halfspace, each node's output, x, must be bounded by [0, a].
> For the fullspace, there is no such bounding - you could get two huge
> numbers whose difference is y which is negligible and becomes lost as
> if noise relative to the two large numbers.
> Without the constraint of positive outputs, its seems that a lot of
> chaos or confusion is introduced and whatever meaning might have been
> present in the 2 hidden nodes is now lost.
>
> Intuitively, the constraints imposed by the halfspace seem healthy
> compared to the fullspace. The fullspace seems dangerous and in need
> of additional constraints.

I disagree.

The preferred choice of output activations (tanh,
logistic or linear) is chosen w.r.t. the range of
the output space.

The preferred choice of hidden node activations:
a. Gaussian for local feature extraction
b. Tanh for nonlocal feature extraction and
zero-centered inputs.

Using logistic hidden nodes and/or nonzero-centered
inputs and/or nonscaled inputs tends to result in less
stable learning. It is also wise to consider scaled
outputs. See the FAQ.
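One common reading of "zero-centered" and "scaled" inputs is standardization to zero mean and unit standard deviation. The sketch below assumes that reading; the helper name `standardize` is hypothetical, and the FAQ may recommend a different scaling.

```python
import statistics

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        # A constant input carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

raw = [10.0, 12.0, 14.0, 16.0]
z = standardize(raw)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))
```

In practice the mean and standard deviation would be computed on the training data only and then reused for any new inputs.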

Hope this helps.

Greg

tomhoo

Jun 30, 2009, 12:57:48 PM

On Jun 30, 9:16 am, Greg <he...@alumni.brown.edu> wrote:
> On Jun 29, 3:36 pm, tomhoo <tom...@gmail.com> wrote:
>
> > Activation Function Mappings, some thoughts:
>
> > The 2 traditional mappings are
> > fullspace: (-1,+1): tanh, symmetric logistic, linear, and
>
> What is a symmetric logistic? A scaled tanh?

Logistic = L = 1 / ( 1+exp(-x) )
Sym. Log. = 2L - 1

> The range of linear is (-inf,+inf)
>
> > halfspace: (0, +1): logistic, Gaussian, Gaussian complement
>
> What is a Gaussian complement? erfc?

Gaussian = G
Gaus. Comp = 1 - G

NOTE: This is NeuroShell 2 lingo which I guess is not standard.

>
> > But what is the implication of this?
>
> > Some hint is given where NeuralShell 2 says if the output is a
> > Category map to the halfspace, if it is continuous, map to the full
> > space.
>
> ???

??? Me Too!

>
> You are making no distinction between hidden nodes and output nodes.

Not directly, I was simply thinking about the output of any layer with
an activation function.

>
> > EXAMPLE:
>
> > Consider 2 hidden nodes feeding 1 output node where the answer is 0.
> > With halfspace, you get only one solution: 0 and 0 for the nodes'
> > output.
>
> No.
>
> You are forgetting the bipolar hidden-to-output weights
>

You sunk my battleship. Darn. I totally forgot to consider them.
So this pretty much tosses my "theory."

>
>
> > For the fullspace, you get an infinity of solutions: x and -x for
> > their output.
> > One might say that a hidden node in the fullspace can take on the
> > characteristic of a nullifier for another characteristic, rather than
> > a contributor.
> > Or yet still, rather than have a pair of zero outputs indicating no
> > contribution from either node, you get the options, how, for + and -
> > some meaningless artifact.
>
> > EXAMPLE 2:
>
> > Consider same model with answer is a > 0.
> > With halfspace, each node's output, x, must be bounded by [0, a].
> > For the fullspace, there is no such bounding - you could get two huge
> > numbers whose difference is y which is negligible and becomes lost as
> > if noise relative to the two large numbers.
> > Without the constraint of positive outputs, its seems that a lot of
> > chaos or confusion is introduced and whatever meaning might have been
> > present in the 2 hidden nodes is now lost.
>
> > Intuitively, the constraints imposed by the halfspace seem healthy
> > compared to the fullspace. The fullspace seems dangerous and in need
> > of additional constraints.
>
> I disagree.
>
> The preferred choice of output activations (tanh,
> logistic or linear) is chosen w.r.t. the range of
> the output space.

Since tanh and linear both map to (-1,+1)
how do you pick based on output range?
The only thought I have is linear for single output
and tanh for multiple outputs.

>
> The preferred choice of hidden node activations:
> a. Gaussian for local feature extraction
> b. Tanh for nonlocal feature extraction and
> zero-centered inputs.
>

I'd like to know more about this local feature extraction. FAQ?

> Using logistic hidden nodes and/or nonzero-centered
> inputs and/or nonscaled inputs tend to result in less
> stable learning. It is also wise to consider scaled
> outputs. See the FAQ.

>
> Hope this helps.

Yes, thanks much

>
> Greg

PS. An afterthought to this: thinking about how brain cells work and
how NNets work.

I would say that in the brain, everything is (0,1)... for both
weights and node outputs.

If you built the network electronically, the weights would be
resistors and the sigmoids would be an adder with an output threshold.

For a positive DC input, all voltages everywhere would be non-negative.

So has anyone ever tried to build a Nnet where weights and node
outputs are always positive???

Greg

Jul 2, 2009, 5:22:57 PM

On Jun 30, 12:57 pm, tomhoo <tom...@gmail.com> wrote:
> On Jun 30, 9:16 am, Greg <he...@alumni.brown.edu> wrote:
> > On Jun 29, 3:36 pm, tomhoo <tom...@gmail.com> wrote:
>
> > > Activation Function Mappings, some thoughts:
>
> > > The 2 traditional mappings are
> > > fullspace: (-1,+1): tanh, symmetric logistic, linear, and
>
> > What is a symmetric logistic? A scaled tanh?
>
> Logistic = L = 1 / ( 1+exp(-x) )
> Sym. Log. = 2L - 1

= tanh(x/2) % scaled tanh
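The identity follows from 2/(1+exp(-x)) - 1 = (1 - exp(-x))/(1 + exp(-x)) = tanh(x/2). A quick numerical check, as a sketch only (the function names are illustrative, not NeuroShell 2 names):

```python
import math

def logistic(x):
    """L = 1 / (1 + exp(-x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def symmetric_logistic(x):
    """2L - 1, range (-1, +1)."""
    return 2.0 * logistic(x) - 1.0

# Compare 2L - 1 with tanh(x/2) at a few sample points.
for x in (-5.0, -1.0, 0.0, 0.5, 4.0):
    diff = abs(symmetric_logistic(x) - math.tanh(x / 2.0))
    print(f"x = {x:5.1f}  |2L - 1 - tanh(x/2)| = {diff:.2e}")
```

The differences should be at the level of floating-point rounding.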

> > The range of linear is (-inf,+inf)
>
> > > halfspace: (0, +1): logistic, Gaussian, Gaussian complement
>
> > What is a Gaussian complement? erfc?
>
> Gaussian = G
> Gaus. Comp = 1 - G

What good is it?

The range of linear is (-inf,inf)

> The only thought I have is linear for single output
> and tanh for multiple ouputs.

If bounded and unipolar, scale to (0,1) and use
logistic.
If bounded and bipolar, scale to (-1,+1) and use
tanh
Otherwise use linear
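The rule above can be sketched as a small helper. This is only one reading of it: "unipolar" is assumed here to mean a bounded target range that does not change sign, and both function names are hypothetical.

```python
import math

def choose_output_activation(lower, upper):
    """Pick an output activation from the target range [lower, upper]."""
    bounded = math.isfinite(lower) and math.isfinite(upper)
    if not bounded:
        return "linear"
    if lower >= 0.0 or upper <= 0.0:
        return "logistic"  # bounded and unipolar (assumed: single-signed range)
    return "tanh"          # bounded and bipolar

def scale_target(t, lower, upper, activation):
    """Rescale a target to match the chosen activation's range."""
    if activation == "linear":
        return t
    unit = (t - lower) / (upper - lower)  # maps [lower, upper] onto [0, 1]
    if activation == "logistic":
        return unit
    return 2.0 * unit - 1.0               # maps [lower, upper] onto [-1, +1]

print(choose_output_activation(0.0, 10.0))             # logistic
print(choose_output_activation(-3.0, 3.0))             # tanh
print(choose_output_activation(-math.inf, math.inf))   # linear
print(scale_target(5.0, 0.0, 10.0, "logistic"))        # 0.5
```

Targets that sit exactly on the range limits map to the ends of the interval, which logistic and tanh approach but never reach, so some margin on the limits may be worth considering.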

> > The preferred choice of hidden node activations:
> > a. Gaussian for local feature extraction
> > b. Tanh for nonlocal feature extraction and
> > zero-centered inputs.
>
> I'd like to know more about this local feature extraction. FAQ?

No, GEH.

Typically, inputs are measurements; some of which can be
considered features (I can't give an unambiguous definition
of a feature; however, I know one when I see one). I consider
hidden units as feature extractors and hidden node outputs
as features.

The sigmoid has an infinite domain. Therefore I consider it a
global feature extractor.

Although the Gaussian also has an infinite domain, its effective
size is finite: several standard deviations about the center.
Therefore I consider it a local feature extractor. It is very
useful for clutter based classification.
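A small sketch of the local/global distinction, assuming a unit-width Gaussian exp(-x^2/2) centered at 0 (the parameterization is an assumption, not something specified in this thread): the Gaussian response is essentially zero a few standard deviations from its center, while tanh keeps responding at full strength arbitrarily far out.

```python
import math

def gaussian(x, center=0.0, sigma=1.0):
    """Gaussian bump: equals 1 at the center and decays with distance."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

# The Gaussian only "sees" inputs near its center; tanh saturates near +/-1
# and so still produces a strong output for distant inputs.
for x in (0.0, 1.0, 3.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  gaussian = {gaussian(x):.3e}  tanh = {math.tanh(x):.6f}")
```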

-----SNIP

>
> PS. After thought of this is thinking how brain cells work and how
> NNets work.
>
> I would say that in the brain, everything is (0,1)... for both
> weights and node outputs.
>
> If you built the network electronically, the weights would be
> resistors and the sigmoids would be an adder with an output threshold.
>
> For a +ive DC input, all voltages, everywhere would be non-negative.
>
> So has anyone ever tried to build a Nnet where weights and node
> outputs are always positive?

Sure. Biological nets. However, the models use differential
equations and are pulse based. Therefore decreases in
signal strength are achieved via exponential decay and
multiplication by pulses instead of subtraction.

Hope this helps.

Greg

Jul 2, 2009, 5:28:16 PM

On Jul 2, 5:22 pm, Greg <he...@alumni.brown.edu> wrote:
> On Jun 30, 12:57 pm, tomhoo <tom...@gmail.com> wrote:
> > On Jun 30, 9:16 am, Greg <he...@alumni.brown.edu> wrote:
> > > On Jun 29, 3:36 pm, tomhoo <tom...@gmail.com> wrote:
> Typically, inputs are measurements; some of which can be
> considered features (I can't give an unambiguous definition
> of a feature; however, I know one when I see one). I consider
> hidden units as feature extractors and hidden node outputs
> as features.
>
> The sigmoid has an infinite domain. Therefore I consider it a
> global feature extractor.
>
> Although the Gaussian also has an infinite domain, it's effective
> size is finite: several standard deviations about the center.
> Therefore I consider it a local feature extractor. It is very
> useful for clutter based classification.

useful for cluster based classification.

Greg
