Neural Network Software


Guilleuma Deeken

Aug 5, 2024, 2:35:16 PM
to dayciatreatex
An artificial neural network (ANN) consists of connected units or nodes called artificial neurons, which loosely model the neurons in a brain. These are connected by edges, which model the synapses in a brain. Each artificial neuron receives signals from connected neurons, then processes them and sends a signal to other connected neurons. The "signal" is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs, called the activation function. The strength of the signal at each connection is determined by a weight, which adjusts during the learning process.
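
As a rough sketch of that computation (NumPy assumed, with a sigmoid standing in for the activation function; the weights here are made up):

    import numpy as np

    def sigmoid(z):
        # non-linear activation function applied to the summed input
        return 1.0 / (1.0 + np.exp(-z))

    def neuron_output(inputs, weights, bias):
        # weighted sum of the incoming signals, then the activation function
        return sigmoid(np.dot(weights, inputs) + bias)

    # example: three incoming signals with hypothetical weights
    x = np.array([0.5, -1.2, 3.0])
    w = np.array([0.4, 0.1, -0.7])
    print(neuron_output(x, w, bias=0.2))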

Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly passing through multiple intermediate layers (hidden layers). A network is typically called a deep neural network if it has at least 2 hidden layers.[3]
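
A minimal forward pass through such a stack of layers might look like the following sketch (layer sizes and random weights are purely illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    # hypothetical layer sizes: 4 inputs -> two hidden layers -> 1 output
    sizes = [4, 8, 8, 1]
    weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]

    def forward(x):
        # signals travel from the input layer through each hidden layer to the output layer
        for W, b in zip(weights, biases):
            x = sigmoid(W @ x + b)
        return x

    print(forward(rng.normal(size=4)))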


Artificial neural networks are used for various tasks, including predictive modeling, adaptive control, and solving problems in artificial intelligence. They can learn from experience, and can derive conclusions from a complex and seemingly unrelated set of information.


Historically, digital computers evolved from the von Neumann model, and operate via the execution of explicit instructions via access to memory by a number of processors. Neural networks, on the other hand, originated from efforts to model information processing in biological systems through the framework of connectionism. Unlike the von Neumann model, connectionist computing does not separate memory and processing.


The simplest kind of feedforward neural network (FNN) is a linear network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated at each node. The mean squared errors between these calculated outputs and the given target values are minimized by adjusting the weights. This technique has been known for over two centuries as the method of least squares or linear regression. It was used as a means of finding a good rough linear fit to a set of points by Legendre (1805) and Gauss (1795) for the prediction of planetary movement.[7][8][9][10][11]
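
A small sketch of that idea, using synthetic data and NumPy's least-squares solver to do the minimization (the data-generating rule is made up for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # synthetic inputs and targets generated from a known linear rule plus noise
    X = rng.normal(size=(100, 3))                  # 100 samples, 3 input features
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)    # given target values

    # minimizing the mean squared error between X @ w and y is exactly least squares
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w)   # recovered weights, close to true_w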


In the late 1940s, D. O. Hebb[14] created a learning hypothesis based on the mechanism of neural plasticity that became known as Hebbian learning. Hebbian learning is considered to be a 'typical' unsupervised learning rule and its later variants were early models for long-term potentiation. These ideas started being applied to computational models in 1948 with Turing's "unorganized machines". Farley and Wesley A. Clark[15] were the first to simulate a Hebbian network in 1954 at MIT. They used computational machines, then called "calculators". Other neural network computational machines were created by Rochester, Holland, Habit, and Duda[16] in 1956. In 1958, psychologist Frank Rosenblatt invented the perceptron, the first implemented artificial neural network,[17][18][19][20] funded by the United States Office of Naval Research.[21]


The invention of the perceptron raised public excitement for research in artificial neural networks, causing the US government to drastically increase funding into deep learning research. This led to "the golden age of AI," fueled by the optimistic claims made by computer scientists regarding the ability of perceptrons to emulate human intelligence.[22] For example, in 1957 Herbert Simon made one such famously optimistic prediction.[22]


However, this early momentum did not last: research stagnated in the United States following the work of Minsky and Papert (1969),[23] who discovered that basic perceptrons were incapable of processing the exclusive-or circuit and that computers lacked sufficient power to train useful neural networks. This, along with other factors such as the 1973 Lighthill report by James Lighthill stating that research in artificial intelligence had not "produced the major impact that was then promised," led to cuts in funding for AI research at all but two universities in the UK and at many major institutions across the world.[24] This ushered in an era called the AI Winter, with reduced research into connectionism due to a decrease in government funding and an increased stress on symbolic artificial intelligence in the United States and other Western countries.[25][24]


During the AI Winter era, however, research outside the United States continued, especially in Eastern Europe. By the time Minsky and Papert's book on Perceptrons came out, methods for training multilayer perceptrons (MLPs) were already known. The first deep learning MLP was published by Alexey Grigorevich Ivakhnenko and Valentin Lapa in 1965, as the Group Method of Data Handling.[26][27][28] The first deep learning MLP trained by stochastic gradient descent[29] was published in 1967 by Shun'ichi Amari.[30][31] In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned useful internal representations to classify non-linearly separable pattern classes.[31]
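
As an aside on what training by stochastic gradient descent means in practice, here is a toy sketch: the weights are nudged against the gradient of the squared error one randomly chosen example at a time (a single linear unit is used for brevity, not Amari's actual network):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5])

    w = np.zeros(3)
    lr = 0.01                               # learning rate
    for epoch in range(20):
        for i in rng.permutation(len(X)):   # one randomly chosen example at a time
            err = X[i] @ w - y[i]           # prediction error on this example
            w -= lr * err * X[i]            # gradient of 0.5 * err**2 w.r.t. w is err * X[i]

    print(w)   # approaches the weights that generated the data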


Self-organizing maps (SOMs) were described by Teuvo Kohonen in 1982.[32][33] SOMs are neurophysiologically inspired[34] neural networks that learn low-dimensional representations of high-dimensional data while preserving the topological structure of the data. They are trained using competitive learning.[32]
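
A compact sketch of that competitive learning rule, assuming a one-dimensional map, a Gaussian neighborhood, and made-up decay schedules:

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.normal(size=(500, 2))    # higher-dimensional input samples
    nodes = rng.normal(size=(10, 2))    # a 1-D map of 10 units ("codebook" vectors)
    positions = np.arange(10)           # positions of the units on the map

    for t, x in enumerate(data):
        winner = np.argmin(np.linalg.norm(nodes - x, axis=1))        # competitive step
        lr = 0.5 * np.exp(-t / 500)                                  # decaying learning rate
        sigma = 3.0 * np.exp(-t / 500)                               # shrinking neighborhood
        h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))  # neighborhood function
        nodes += lr * h[:, None] * (x - nodes)                       # pull winner and neighbors toward x

    print(nodes)   # units trace the structure of the data while map neighbors stay close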


The convolutional neural network (CNN) architecture with convolutional layers and downsampling layers was introduced by Kunihiko Fukushima in 1980.[35] He called it the neocognitron. In 1969, he also introduced the ReLU (rectified linear unit) activation function.[36][10] The rectifier has become the most popular activation function for CNNs and deep neural networks in general.[37] CNNs have become an essential tool for computer vision.
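
For illustration, the rectifier and a crude convolution-plus-downsampling step might look like this sketch in NumPy (the filter and pooling choices are made up, not Fukushima's):

    import numpy as np

    def relu(z):
        # rectified linear unit: pass positive signals, zero out the rest
        return np.maximum(0.0, z)

    def downsample(feature_map, k=2):
        # crude k-by-k average downsampling, in the spirit of the neocognitron's pooling layers
        h, w = feature_map.shape
        return feature_map[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

    rng = np.random.default_rng(4)
    img = rng.normal(size=(8, 8))
    kernel = np.array([1.0, 0.0, -1.0])                    # a hypothetical edge-like filter
    conv = np.array([[(img[i, j:j + 3] * kernel).sum()
                      for j in range(img.shape[1] - 2)]
                     for i in range(img.shape[0])])        # naive "valid" 1-D convolution per row
    print(downsample(relu(conv)).shape)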


In the late 1970s to early 1980s, interest briefly emerged in theoretically investigating the Ising model created by Wilhelm Lenz (1920) and Ernst Ising (1925)[52] in relation to Cayley tree topologies and large neural networks. The Ising model is essentially a non-learning artificial recurrent neural network (RNN) consisting of neuron-like threshold elements.[10] In 1972, Shun'ichi Amari described an adaptive version of this architecture.[53][10] In 1981, the Ising model was solved exactly by Peter Barth for the general case of closed Cayley trees (with loops) with an arbitrary branching ratio[54] and found to exhibit unusual phase transition behavior in its local-apex and long-range site-site correlations.[55][56] John Hopfield popularised this architecture in 1982,[57] and it is now known as a Hopfield network.
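
A minimal Hopfield-style sketch: binary threshold units with symmetric weights set by a Hebbian outer-product rule, and asynchronous updates that settle toward a stored pattern (sizes and patterns are illustrative):

    import numpy as np

    patterns = np.array([[1, -1, 1, -1, 1, -1],
                         [1, 1, 1, -1, -1, -1]])        # stored +/-1 patterns
    n = patterns.shape[1]

    # Hebbian outer-product weights: symmetric, no self-connections
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)

    state = np.array([1, -1, -1, -1, 1, -1])            # a corrupted version of the first pattern
    rng = np.random.default_rng(5)
    for _ in range(30):
        i = rng.integers(n)                             # asynchronous update of one unit
        state[i] = 1 if W[i] @ state >= 0 else -1

    print(state)                                        # typically settles into a stored pattern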


The time delay neural network (TDNN) of Alex Waibel (1987) combined convolutions, weight sharing, and backpropagation.[58][59] In 1988, Wei Zhang et al. applied backpropagation to a CNN (a simplified neocognitron with convolutional interconnections between the image feature layers and the last fully connected layer) for alphabet recognition.[60][61] In 1989, Yann LeCun et al. trained a CNN to recognize handwritten ZIP codes on mail.[62] In 1992, max-pooling for CNNs was introduced by Juyang Weng et al. to help with least-shift invariance and tolerance to deformation, to aid 3D object recognition.[63][64][65] LeNet-5 (1998), a 7-level CNN by Yann LeCun et al. that classifies digits,[66] was applied by several banks to recognize hand-written numbers on checks digitized in 32x32 pixel images.
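
A sketch of 2x2 max-pooling in plain NumPy (non-overlapping windows; this shows the operation itself, not any author's original code):

    import numpy as np

    def max_pool_2x2(feature_map):
        # keep the strongest response in each non-overlapping 2x2 window,
        # which gives a small amount of shift invariance
        h, w = feature_map.shape
        trimmed = feature_map[:h - h % 2, :w - w % 2]
        return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    fm = np.arange(16, dtype=float).reshape(4, 4)
    print(max_pool_2x2(fm))   # [[ 5.  7.] [13. 15.]]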


From 1988 onward,[67][68] the use of neural networks transformed the field of protein structure prediction, in particular when the first cascading networks were trained on profiles (matrices) produced by multiple sequence alignments.[69]


In 1991, Sepp Hochreiter's diploma thesis[70] identified and analyzed the vanishing gradient problem[70][71] and proposed recurrent residual connections to solve it. His thesis was called "one of the most important documents in the history of machine learning" by his supervisor Juergen Schmidhuber.[10]
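
A numeric toy illustrating the vanishing gradient problem (not Hochreiter's actual analysis): backpropagating through many sigmoid steps multiplies the gradient by a factor below 1 at each step, while a residual (skip) connection keeps a direct path whose factor is at least 1:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w, depth, x = 0.9, 50, 1.0

    # plain deep chain: h <- sigmoid(w * h); each step multiplies the gradient by w * sigmoid'(.) < 1
    h, grad_plain = x, 1.0
    for _ in range(depth):
        pre = w * h
        h = sigmoid(pre)
        grad_plain *= w * sigmoid(pre) * (1 - sigmoid(pre))

    # residual chain: h <- h + sigmoid(w * h); each step multiplies the gradient by 1 + w * sigmoid'(.) >= 1
    h, grad_res = x, 1.0
    for _ in range(depth):
        pre = w * h
        h = h + sigmoid(pre)
        grad_res *= 1 + w * sigmoid(pre) * (1 - sigmoid(pre))

    print(grad_plain, grad_res)   # the first is vanishingly small, the second is not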


In 1991, Juergen Schmidhuber published adversarial neural networks that contest with each other in the form of a zero-sum game, where one network's gain is the other network's loss.[72][73][74] The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. This was called "artificial curiosity."
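
A heavily simplified toy of that zero-sum dynamic, under made-up assumptions (a REINFORCE-style update stands in for the generator's learning rule, and the "environment" is an arbitrary function):

    import numpy as np

    rng = np.random.default_rng(6)

    def environment(pattern):
        # fixed environment reaction, unknown to both networks (purely illustrative)
        return np.sin(3.0 * pattern)

    candidates = np.linspace(-1.0, 1.0, 5)   # patterns the generator can emit
    logits = np.zeros(5)                     # generator: a distribution over patterns
    p = np.zeros(2)                          # predictor: reaction ~ p[0] * pattern + p[1]
    lr = 0.1

    for step in range(2000):
        probs = np.exp(logits) / np.exp(logits).sum()
        k = rng.choice(5, p=probs)           # generator samples a pattern
        x = candidates[k]
        err = (p[0] * x + p[1]) - environment(x)
        p -= lr * err * np.array([x, 1.0])   # predictor: gradient descent on squared error
        # generator: zero-sum reward is the predictor's squared error (REINFORCE-style ascent)
        reward = err ** 2
        grad_logits = -probs * reward
        grad_logits[k] += reward
        logits += lr * grad_logits

    print(probs)   # the generator concentrates on patterns the predictor still gets wrong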


In 1992, Juergen Schmidhuber proposed a hierarchy of RNNs pre-trained one level at a time by self-supervised learning.[75] It uses predictive coding to learn internal representations at multiple self-organizing time scales. This can substantially facilitate downstream deep learning. The RNN hierarchy can be collapsed into a single RNN, by distilling a higher level chunker network into a lower level automatizer network.[75][10] In the same year he also published an alternative to RNNs,[76] which is a precursor of a linear Transformer.[77][78][10] It introduces the concept of internal spotlights of attention:[79] a slow feedforward neural network learns by gradient descent to control the fast weights of another neural network through outer products of self-generated activation patterns.
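
A sketch of the fast-weight idea: one network emits activation patterns whose outer products additively program the weight matrix of another network, which is then applied to a later input (the "slow network" below is a random stand-in, and key/value/query are modern labels used only for orientation):

    import numpy as np

    rng = np.random.default_rng(7)
    d = 4
    fast_W = np.zeros((d, d))                 # fast weights of the second network

    def slow_network(x):
        # stand-in: a real slow network would compute these patterns from its input
        key = np.tanh(rng.normal(size=d))
        value = np.tanh(rng.normal(size=d))
        return key, value

    for x in range(5):                        # a short "sequence" of inputs
        key, value = slow_network(x)
        fast_W += np.outer(value, key)        # program the fast weights via an outer product

    query = np.tanh(rng.normal(size=d))
    print(fast_W @ query)                     # the fast network applies its programmed weights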


In 1997, Sepp Hochreiter and Juergen Schmidhuber introduced the deep learning method called long short-term memory (LSTM), published in Neural Computation.[82] LSTM recurrent neural networks can learn "very deep learning" tasks[83] with long credit assignment paths that require memories of events that happened thousands of discrete time steps before. The "vanilla LSTM" with forget gate was introduced in 1999 by Felix Gers, Schmidhuber and Fred Cummins.[84]
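
A minimal LSTM cell with a forget gate, sketched in NumPy (random weights, biases omitted for brevity, shapes illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(8)
    n_in, n_hid = 3, 5
    # one weight matrix per gate, acting on the concatenated [input, previous hidden state]
    Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for _ in range(4))

    def lstm_step(x, h_prev, c_prev):
        z = np.concatenate([x, h_prev])
        f = sigmoid(Wf @ z)            # forget gate: what to keep from the old cell state
        i = sigmoid(Wi @ z)            # input gate: how much new information to write
        o = sigmoid(Wo @ z)            # output gate: how much of the cell state to expose
        c = f * c_prev + i * np.tanh(Wc @ z)
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.normal(size=(10, n_in)):   # run over a short input sequence
        h, c = lstm_step(x, h, c)
    print(h)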
