
Kangaroos


Bobby Chawla

Apr 25, 1994, 2:17:05 PM
Does anyone have a copy of the 'kangaroo' description of
backpropagation? It was floating around this group around
4-6 months ago.

I thought it was a good analogy. If you have it, please
e-mail me a copy.

Thanks, Bobby

+======================================================================+
| Bobby Chawla /\ /\ bob...@surya.uwaterloo.ca |
| 316 Minota Hagey Res / \/\ / \ /\ tel: +1 519 725 6063 |
| Univ. of Waterloo, Ontario \_/ \ \ /\ /\ / \ |
| Canada / N2L 3G1 \/ _ \ / \_/ \ / \ |
| / ____ __ __o __ ,__o \/ \ |
| / ___ -\<, __ _-\_<, \ |
|____________________________O/_O________(*)/'(*)______________________|
========================================================================

Bobby Chawla

Apr 26, 1994, 5:16:10 PM
In comp.ai.neural-nets I wrote:

>Does anyone have a copy of the 'kangaroo' description of
>backpropagation? It was floating around this group around
>4-6 months ago.

>I thought it was a good analogy. If you have it, please
>e-mail me a copy.

>Thanks, Bobby

Hi everyone,

I think this 'kangaroo' thread is great! So I thought I
would post it for everyone to enjoy...

Thanks to everyone who replied to my post, and thanks to

- sas...@hotellng.unx.sas.com (Warren Sarle)
- t...@cs.toronto.edu (Tony Plate)
- s...@sef-pmax.slisp.cs.cmu.edu (Scott E. Fahlman)
- rrd...@minyos.xx.rmit.OZ.AU (Jonathan O'Donnell)
- prec...@i41s18.ira.uka.de (Lutz Prechelt)

for their contributions :-)

------------------------------------------------------------------------------

From: sas...@hotellng.unx.sas.com (Warren Sarle)
Subject: Kangaroos (Was Re: BackProp without Calculus?)
Originator: sas...@hotellng.unx.sas.com

Training a network is a form of numerical optimization, which can
be likened to a kangaroo searching for the top of Mt. Everest.
Everest is the global optimum, but the top of any other really high
mountain such as K2 would be nearly as good. We're talking about
maximization now, while neural networks are usually discussed in
terms of minimization, but if you multiply everything by -1 it
works out the same.

Initial weights are usually chosen randomly, which means that the
kangaroo may start out anywhere in Asia. If you know something about
the scales of the inputs, you may be able to get the kangaroo to start
near the Himalayas. However, if you make a really stupid choice of
distributions for the random initial weights, or if you have really
bad luck, the kangaroo may start in South America.
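
A minimal sketch of scale-aware random initialization, assuming a plain
feed-forward net; the 10-5-8 layer sizes and the 1/sqrt(fan-in) spread are
illustrative choices, not anything prescribed above:

import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out):
    # Scale the random spread to the number of inputs feeding each unit,
    # so the kangaroo starts somewhere near the mountains rather than
    # in South America.
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

W1 = init_weights(10, 5)   # input layer -> hidden layer
W2 = init_weights(5, 8)    # hidden layer -> output layer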

With Newton-type (2nd order) algorithms, the Himalayas are covered
with a dense fog, and the kangaroo can only see a little way around
his location. Judging from the local terrain, the kangaroo makes a guess
about where the top of the mountain is, and tries to jump all the way
there. In a stabilized Newton algorithm, the kangaroo has an altimeter,
and if the jump takes him to a lower point, he backs up to where he
was and takes a shorter jump. If the algorithm isn't stabilized, the
kangaroo may mistakenly jump to Shanghai and get served for dinner in
a Chinese restaurant. (I never claimed this analogy was realistic.)
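
For readers who want the fog lifted a little, here is a rough Python sketch
of a stabilized Newton search with the backtracking 'altimeter' check; the
mountain, the starting point, and all constants are invented for illustration:

import numpy as np

def altitude(p):
    # A made-up, everywhere-concave "mountain" with its peak at (1, -2).
    x, y = p
    return -((x - 1)**4 + (x - 1)**2 + (y + 2)**4 + (y + 2)**2)

def slope(p):
    # Gradient of the altitude: which way is up, and how steeply.
    x, y = p
    return np.array([-(4*(x - 1)**3 + 2*(x - 1)),
                     -(4*(y + 2)**3 + 2*(y + 2))])

def curvature(p):
    # Hessian of the altitude: the local shape of the terrain.
    x, y = p
    return np.diag([-(12*(x - 1)**2 + 2), -(12*(y + 2)**2 + 2)])

p = np.array([5.0, 5.0])            # the kangaroo starts somewhere in Asia
for _ in range(100):
    g = slope(p)
    if np.linalg.norm(g) < 1e-10:
        break
    jump = np.linalg.solve(curvature(p), -g)   # guess where the top is
    # The altimeter: if the jump would land lower, back up and jump half as far.
    while altitude(p + jump) < altitude(p):
        jump *= 0.5
    p = p + jump
print(p)     # approximately [ 1. -2.]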

In steepest ascent with line search, the fog is _very_ dense, and the
kangaroo can only tell which direction leads up. The kangaroo hops
in this direction until the terrain starts going down again, then
chooses another direction.
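
And a correspondingly rough sketch of steepest ascent with a crude line
search, on the same kind of invented mountain (the hop size and iteration
limits are arbitrary):

import numpy as np

def altitude(p):
    # The same made-up, everywhere-concave mountain, peak at (1, -2).
    x, y = p
    return -((x - 1)**4 + (x - 1)**2 + (y + 2)**4 + (y + 2)**2)

def which_way_is_up(p, h=1e-5):
    # Very dense fog: all the kangaroo gets is the local uphill direction.
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (altitude(p + e) - altitude(p - e)) / (2 * h)
    return g

p = np.array([5.0, 5.0])
hop = 0.1
for _ in range(500):
    g = which_way_is_up(p)
    if np.linalg.norm(g) < 1e-6:
        break
    d = g / np.linalg.norm(g)
    moved = False
    # Hop in this direction until the terrain starts going down again...
    while altitude(p + hop * d) > altitude(p):
        p = p + hop * d
        moved = True
    if not moved:
        hop *= 0.5      # near the top: take smaller hops
    # ...then choose another direction (back to the top of the loop).
print(p)     # close to the peak at (1, -2)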

In standard backprop or stochastic approximation, the kangaroo is
blind and has to feel around on the ground to make a guess about which
way is up. He may be fooled by rough terrain unless you use batch
training. If the kangaroo ever gets near the peak, he may jump back
and forth across the peak without ever landing on the peak. If you use
a decaying step size, the kangaroo gets tired and makes smaller and
smaller hops, so if he ever gets near the peak he has a better chance
of actually landing on it before the Himalayas erode away. In backprop
with momentum, the kangaroo has poor traction and can't make sharp
turns.
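
A small sketch of the blind kangaroo: stochastic gradient ascent with momentum
and a decaying step size, hopping on a simple invented mountain with noisy
slope estimates (all constants are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def feel_around(p):
    # The blind kangaroo: a noisy estimate of the uphill direction on a
    # simple round mountain whose peak is at (1, -2).
    x, y = p
    true_slope = np.array([-2.0 * (x - 1.0), -2.0 * (y + 2.0)])
    return true_slope + rng.normal(0.0, 1.0, size=2)     # rough terrain

p = np.array([5.0, 5.0])
velocity = np.zeros(2)
momentum = 0.9              # poor traction: the hop direction changes only slowly
for t in range(1, 3001):
    step_size = 0.1 / t     # the kangaroo gets tired and hops less far
    velocity = momentum * velocity + step_size * feel_around(p)
    p = p + velocity
print(p)                    # jitters about, but ends up close to (1, -2)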

I have been unable to devise a kangaroo analogy for cascade correlation.
Any ideas, Scott?

Notice that in all the methods discussed so far, the kangaroo can hope
at best to find the top of a mountain close to where he starts. There's
no guarantee that this mountain will be Everest, or even a very high
mountain. Various methods are used to try to find the actual global
optimum.

In simulated annealing, the kangaroo is drunk and hops around randomly
for a long time. However, he gradually sobers up and tends to hop
uphill.
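
A sketch of the drunk kangaroo, i.e. simulated annealing on an invented
two-peak range; the hop size and the sobering-up (cooling) schedule are
arbitrary illustrative choices:

import math
import random

random.seed(0)

def altitude(x, y):
    # A made-up range: a lower peak near (-3, 1), the tallest near (1, -2).
    return (3.0 * math.exp(-((x - 1)**2 + (y + 2)**2))
            + 2.0 * math.exp(-((x + 3)**2 + (y - 1)**2) / 2.0))

x, y = 4.0, 4.0                          # the drunk kangaroo starts here
temperature = 5.0
while temperature > 1e-3:
    nx = x + random.gauss(0.0, 1.0)      # a random, drunken hop
    ny = y + random.gauss(0.0, 1.0)
    gain = altitude(nx, ny) - altitude(x, y)
    # Uphill hops are always kept; downhill hops are kept with a
    # probability that shrinks as the kangaroo sobers up.
    if gain > 0 or random.random() < math.exp(gain / temperature):
        x, y = nx, ny
    temperature *= 0.999                 # the sobering-up (cooling) schedule
print(x, y)    # often, but not always, near the tallest peak at (1, -2)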

In genetic algorithms, there are lots of kangaroos that are parachuted
into the Himalayas (if the pilot didn't get lost) at random places.
These kangaroos do not know that they are supposed to be looking for
the top of Mt. Everest. However, every few years, you shoot the
kangaroos at low altitudes and hope the ones that are left will be
fruitful and multiply.
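
And a bare-bones sketch of the parachuted mob, i.e. a toy genetic algorithm
on the same invented two-peak range; population size, cull rate, crossover,
and mutation scale are all illustrative:

import math
import random

random.seed(0)

def altitude(x, y):
    # The same made-up two-peak range; the tallest peak is near (1, -2).
    return (3.0 * math.exp(-((x - 1)**2 + (y + 2)**2))
            + 2.0 * math.exp(-((x + 3)**2 + (y - 1)**2) / 2.0))

# Parachute a mob of kangaroos into random places.
mob = [(random.uniform(-6, 6), random.uniform(-6, 6)) for _ in range(40)]

for _ in range(60):
    # Shoot the kangaroos at low altitudes...
    mob.sort(key=lambda k: altitude(*k), reverse=True)
    survivors = mob[:10]
    # ...and let the rest be fruitful and multiply, with a little mutation.
    mob = list(survivors)
    while len(mob) < 40:
        ma, pa = random.sample(survivors, 2)
        mob.append(((ma[0] + pa[0]) / 2 + random.gauss(0, 0.3),
                    (ma[1] + pa[1]) / 2 + random.gauss(0, 0.3)))

best = max(mob, key=lambda k: altitude(*k))
print(best)     # with a bit of luck, near the tallest peak at (1, -2)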

--

Warren S. Sarle SAS Institute Inc. The opinions expressed here
sas...@unx.sas.com SAS Campus Drive are mine and not necessarily
(919) 677-8000 Cary, NC 27513 those of SAS Institute.


Article: 11866 of comp.ai.neural-nets
Path: mozart.unx.sas.com!sas!concert!gatech!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!utcsri!relay.cs.toronto.edu!neuron.ai.toronto.edu!ai.toronto.edu!tap
Newsgroups: comp.ai.neural-nets
From: t...@cs.toronto.edu (Tony Plate)
Subject: Re: Kangaroos (Was Re: BackProp without Calculus?)

sas...@hotellng.unx.sas.com (Warren Sarle) writes:

>In steepest ascent with line search, the fog is _very_ dense, and the
>kangaroo can only tell which direction leads up. The kangaroo hops
>in this direction until the terrain starts going down again, then
>chooses another direction.

Nice stories!

I offer one for conjugate gradient search:

The environment for conjugate gradient search is just like that
for steepest ascent with line search -- the fog is dense and the
kangaroo can only tell which direction leads up. The difference
is that the kangaroo has some memory of the directions it has
hopped in before, and the kangaroo assumes that the ridges are
straight (i.e., the surface is quadratic). The kangaroo chooses
a direction to hop in that is upwards, but that does not result
in it going downwards in the previous directions it has hopped in.
That is, it chooses an upward direction along which moving will
not undo the work of previous steps. It hops upwards until
the terrain starts going down again, then chooses another
direction.
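
A rough sketch of this remembering kangaroo: nonlinear conjugate gradient
(Polak-Ribiere flavour) with a crude hop-until-downhill line search, on an
invented narrow ridge; all constants are illustrative:

import numpy as np

def altitude(p):
    # An illustrative long, narrow ridge with its top at (1, -2).
    x, y = p
    return -((x - 1)**2 + 10.0 * (y + 2)**2 + 0.5 * (x - 1) * (y + 2))

def uphill(p, h=1e-6):
    # Dense fog: the kangaroo can only feel which way is up.
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (altitude(p + e) - altitude(p - e)) / (2 * h)
    return g

def hop_until_downhill(p, d):
    # Crude line search: take bigger hops while the terrain keeps rising,
    # shorter ones when it stops, to settle near the best point along d.
    step, best = 1.0, p
    for _ in range(60):
        if altitude(best + step * d) > altitude(best):
            best = best + step * d
            step *= 2.0
        else:
            step *= 0.5
    return best

p = np.array([8.0, 6.0])
g = uphill(p)
d = g.copy()                       # first hop: straight up the slope
for _ in range(10):
    p = hop_until_downhill(p, d)
    g_new = uphill(p)
    if np.linalg.norm(g_new) < 1e-8:
        break
    # The kangaroo's memory: pick a new uphill direction that does not undo
    # the progress already made along the previous directions (Polak-Ribiere).
    beta = max(0.0, g_new @ (g_new - g) / (g @ g))
    d = g_new + beta * d
    g = g_new
print(p)     # the top of the ridge, near (1, -2)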

______________________________________________________________________

Newsgroups: comp.ai.neural-nets
Path: mozart.unx.sas.com!sas!concert!gatech!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!magnesium.club.cc.cmu.edu!honeydew.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!news
From: s...@sef-pmax.slisp.cs.cmu.edu
Subject: Re: Kangaroos (Was Re: BackProp without Calculus?)

From: sas...@hotellng.unx.sas.com (Warren Sarle)

> Training a network is a form of numerical optimization, which can
> be likened to a kangaroo searching for the top of Mt. Everest.

Great post! Of course, if you want the kangaroos to perform well, you
first have to teach them calculus.

> I have been unable to devise a kangaroo analogy for cascade correlation.
> Any ideas, Scott?

Never one to turn down a silly challenge when I should be doing research...

The real story of all these algorithms is that we've got a big continent
with an unknown number of mountains (components of the error). Your task
is to get one kangaroo on top of each mountain, all at the same time. If
you manage that without releasing too many extra kangaroos, you win.
Excess kangaroos tend to find their way to army bases and attack the
generals, leaving the army with poor generalization. (Sorry about that!)

Unlike the Himalayas, these are RUBBER mountains, so when a kangaroo is on
top of one, it gets squashed down flat. So a good solution is one in which
all the mountains are squashed flat at once. The problem is that these are
"hidden unit" kangaroos, and are therefore invisible to one another (and to
the generals). This makes it impossible for them to coordinate their
activities, which is a pity since we want them on DISTINCT mountains. They
can, however, see the terrain from some distance away and see which
mountains are flattened at any given time. These kangaroos want to head
uphill, but they have poor memories, so they tend to respond to whatever
terrain they see at any given instant.

Now, in backprop, you guess how many kangaroos you're going to need and
release them all at once at random places. Each kangaroo looks around,
spots whatever distant mountain looks the biggest, and heads for it. If
that mountain suddenly goes flat, it looks around and finds some other
mountain -- a sort of marsupial, elastic, Alpine version of musical chairs.
A problem is that once a kangaroo is standing on top of a mountain, even if
it is Everest, that mountain goes flat and that kangaroo may go hopping off
to occupy some other mountain instead.

As you might imagine, it takes a long time in such a chaotic situation to
flatten all the mountains at once, even if you guessed right about the
number of kangaroos, which is not an easy task without a detailed map.

In Cascor, you release one kangaroo at a time. He looks around, spots the
highest mountain within view, and heads straight for it. When he reaches
the top, he stops. Then we nail him to the ground so he won't wander away
and release the next kangaroo, who goes off to find some other mountain.
Even though we have given up a certain amount of parallel search, this
orderly process is still faster than the total chaos of multi-kangaroo
backprop. When all the mountains are flat, you stop releasing kangaroos
while the army still has plenty of generals.

Actually, Cascor is a bit more complex than that, because the kangaroos can
stand on one another and thus reach higher and more complicated places than
they could reach otherwise. (The kangaroos hate it when you nail down the
ones standing on their heads.)

There's something about granting tenure to the most successful kangaroos
and killing off others, but I think I'd better stop now.

-- Scott

===========================================================================
Scott E. Fahlman Internet: se...@cs.cmu.edu
Senior Research Scientist Phone: 412 268-2575
School of Computer Science Fax: 412 681-5739
Carnegie Mellon University Latitude: 40:26:33 N
5000 Forbes Avenue Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================
______________________________________________________________________

Article: 12061 of comp.ai.neural-nets
Path: mozart.unx.sas.com!sas!concert!news-feed-2.peachnet.edu!darwin.sura.net!europa.eng.gtefsd.com!library.ucla.edu!network.ucsd.edu!munnari.oz.au!bruce.cs.monash.edu.au!monu6!aggedor.rmit.OZ.AU!minyos.xx.rmit.OZ.AU!rrdjro
From: rrd...@minyos.xx.rmit.OZ.AU (Jonathan O'Donnell)

[... Wonderful descriptions deleted ...]

Presumably, this is why Australia is so flat.

Sorry, I couldn't resist.
Jona...@RMIT.edu.au
______________________________________________________________________

Path: mozart.unx.sas.com!sas!concert!gatech!howland.reston.ans.net!sol.ctr.columbia.edu!xlink.net!ira.uka.de!i41s18!prechelt
From: prec...@i41s18.ira.uka.de (Lutz Prechelt)
Newsgroups: comp.ai.neural-nets

In article <CCp2...@unx.sas.com>, sas...@hotellng.unx.sas.com (Warren Sarle) writes:
|>
|> Training a network is a form of numerical optimization, which can
|> be likened to a kangaroo searching for the top of Mt. Everest.
...
|> Initial weights are usually chosen randomly, which means that the
|> kangaroo may start out anywhere in Asia.

To shed some more light on what this wonderful article means with
respect to the original question (how to understand the backpropagation
algorithm without calculus), here are a few additional remarks:

1. The analogy only explains WHAT the algorithm does, but not HOW it does
the most intricate part: choosing the direction of the next jump.

Basically what the kangaroo does is the following: Wherever it stands,
it carves two ditches whose walls meet in a way so as to form a
V-shape; one ditch in north-south direction and one in east-west
direction. It plates the walls of these ditches first with steel then
with teflon so as to minimize friction (most, but not all variants of
backprop in fact minimize friction to zero) and so that all small
valleys or hills the ditch may have had are averaged out.

Then the kangaroo takes a bowling ball out of its pouch, puts it
into the north-south ditch and measures how far it rolls in a certain
time and in which direction it rolls away along the ditch. This
procedure is repeated for the east-west ditch. Let's assume the ball
rolled 8 centimeters in the north direction in the first ditch and 14
centimeters in the east direction in the second ditch. Then a kangaroo
that uses learning rate 50 will jump to a point that is 4 meters north
and 7 meters east of where it was before.

It is not important for the algorithm whether the kangaroo uses the
same bowling ball over and over again, or throws it away after each
measurement and picks a new one from its pouch next time. This is
because in the backpropagation world, bowling balls bio-degrade in
zero time.
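
In code, the ditch-and-bowling-ball procedure is just a per-axis slope
measurement scaled by the learning rate. A small sketch on an invented
landscape (the height function and all constants are made up; only the
8 cm roll / 4 m jump arithmetic echoes the text):

def height(north, east):
    # A made-up, gently sloping landscape with its top at (400, 700).
    return -0.005 * ((north - 400.0)**2 + (east - 700.0)**2)

def ditch_measurement(north, east, axis, h=1.0):
    # Carve a ditch along one axis and watch which way, and how fast, the
    # bowling ball rolls: a per-axis slope estimate (a partial derivative).
    if axis == "north":
        return (height(north + h, east) - height(north - h, east)) / (2 * h)
    return (height(north, east + h) - height(north, east - h)) / (2 * h)

learning_rate = 50.0
north, east = 0.0, 0.0
for _ in range(100):
    roll_north = ditch_measurement(north, east, "north")
    roll_east = ditch_measurement(north, east, "east")
    # The jump is the learning rate times the measured roll
    # (a roll of 0.08 m with learning rate 50 means a 4 m jump).
    north += learning_rate * roll_north
    east += learning_rate * roll_east
print(north, east)    # close to the top of the slope at (400, 700)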


2. Like all nice simplifications, this one, too, has a slight drawback.
In this case, the limitation is that the analogy only explains the
case of a network with two weights (which is fewer than *any* useful
backpropagation network must have). These two weights are represented
by the two orthogonal search directions of the kangaroo (North-South
and East-West).

In order to generalize the example to, say, a fully connected
network with three layers containing 10 input nodes, 5 hidden
nodes, and 8 output nodes (having 10x5 + 5x8 = 50 + 40 = 90 weights)
you have to imagine the same situation in a world existing in
a 91-dimensional space instead of our 3-dimensional one.

I assure you that to visualize this generalization is just as easy for
a non-math person as it is for any calculus professor.


3. Oh, yes, one more very important question:
Why do the Himalayas look just the way they do?
The answer is: they don't.

The mountains in which the kangaroo jumps around are `induced' by your
training data. Each example suggests certain hills or mountains at
certain points on the surface of the (otherwise absolutely flat)
earth. If the kangaroo performs a `batch' search, the world looks
like the arithmetic average of what the training examples suggest. If
the kangaroo performs an `online' search, the situation is more
complicated: There is one world for each training example; each of
these worlds looks exactly like the one training example it was made
from suggests. The kangaroo takes one jump in the first world
according to the above method and is then magically transferred to the
equivalent point in the next world, that is, to the point with the
same longitude and latitude, er, the same x and y coordinates (since
the worlds have to be rectangular for backprop, instead of spherical).
In each of the worlds the same procedure is applied and then the
kangaroo continues in the first world again.
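
A small sketch of the batch world versus the online worlds, on a toy
least-squares problem (data, model, and learning rates are invented; here
the kangaroo minimizes the error directly rather than flipping signs):

import numpy as np

rng = np.random.default_rng(0)

# A toy training set: each example "suggests" its own error landscape.
inputs = rng.normal(size=(20, 2))
targets = inputs @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=20)

def example_grad(w, x, t):
    # Gradient of this one example's squared error: the shape of its
    # private world.
    return 2.0 * (x @ w - t) * x

# Batch kangaroo: each hop uses the average of all the worlds.
w = np.zeros(2)
for _ in range(200):
    g = np.mean([example_grad(w, x, t) for x, t in zip(inputs, targets)], axis=0)
    w -= 0.1 * g
print("batch :", w)    # close to the underlying weights (2, -1)

# Online kangaroo: one hop per world, then on to the next world.
w = np.zeros(2)
for _ in range(200):
    for x, t in zip(inputs, targets):
        w -= 0.02 * example_grad(w, x, t)
print("online:", w)    # also close to (2, -1), with a little jitter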

Interestingly enough, the magical inter-world transfer is so inspiring
to the kangaroo that it can make one jump in all of the `online'
worlds (no matter how many there are) in about the same time it needs
for only two jumps in the `batch' world. This is the reason why
`online' kangaroos often find the point that provides the best
compromise between the altitudes in all the worlds much faster than `batch'
kangaroos find the top of the highest mountain in their single world.
Sometimes, however, the inter-world transfers are so confusing to the
`online' kangaroo that it never (or only very slowly) finds the
optimal point.

There are lots of heuristics to further improve the speed and/or
precision of the kangaroo's search. Most of them, though, require
a pocket calculator or lots of note paper or both.

From all this we can conclude that the best methods to find Mount
Everest are (in order):

1. to know where it is
2. to have a map on which you can find it
3. to know someone who knows where it is or who has a map
4. to send a kangaroo to search for it

and even if you have to send a kangaroo, it is useful if you know
at least

1. where the mountain range is in which Mount Everest may
be, and
2. how to bring your kangaroo to that mountain range.

Lutz

P.S.: Newest research results in the neural network area indicate
that backprop also works with frogs if you replace the bowling
ball with something appropriate (for instance a solar-powered
electro-mechanical 3-bit steep-O-meter).

--
Lutz Prechelt (email: prec...@ira.uka.de) | Whenever you
Institut fuer Programmstrukturen und Datenorganisation | complicate things,
Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get
(Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple.

______________________________________________________________________

Closing comment: Kangaroos have an advantage over frogs in that they
have a pouch in which to carry their altimeters, bowling balls, etc.
However, the latest research is using fleas, which may be superior
to kangaroos for some massively parallel genetic algorithms.

--

Warren S. Sarle SAS Institute Inc. The opinions expressed here
sas...@unx.sas.com SAS Campus Drive are mine and not necessarily
(919) 677-8000 Cary, NC 27513 those of SAS Institute.
