Hi folks -
Our next meeting will be this Friday, April 10th, at 2:30pm in GDC 3.516.
It seems like there's a good bit of interest in a few recent papers on
optimization and local minima. We'll focus on the following paper:
http://arxiv.org/abs/1405.4604v2
On the saddle point problem for non-convex optimization
Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio
A central challenge to many fields of science and engineering involves
minimizing non-convex error functions over continuous, high
dimensional spaces. Gradient descent or quasi-Newton methods are
almost ubiquitously used to perform such minimizations, and it is
often thought that a main source of difficulty for the ability of
these local methods to find the global minimum is the proliferation of
local minima with much higher error than the global minimum. Here we
argue, based on results from statistical physics, random matrix
theory, and neural network theory, that a deeper and more profound
difficulty originates from the proliferation of saddle points, not
local minima, especially in high dimensional problems of practical
interest. Such saddle points are surrounded by high error plateaus
that can dramatically slow down learning, and give the illusory
impression of the existence of a local minimum. Motivated by these
arguments, we propose a new algorithm, the saddle-free Newton method,
that can rapidly escape high dimensional saddle points, unlike
gradient descent and quasi-Newton methods. We apply this algorithm to
deep neural network training, and provide preliminary numerical
evidence for its superior performance.
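In case it helps to have something concrete in front of us on Friday, here's a quick toy sketch (mine, not the paper's code) of the core idea: a plain Newton step rescales the gradient by the inverse Hessian, which pulls you toward a saddle point along directions of negative curvature, while the saddle-free variant rescales by the inverse of |H| (the Hessian with its eigenvalues replaced by their absolute values), so every eigendirection becomes a descent direction.

    import numpy as np

    # Toy saddle: f(x, y) = x^2 - y^2, with a saddle point at the origin.
    def grad(p):
        x, y = p
        return np.array([2.0 * x, -2.0 * y])

    def hessian(p):
        return np.array([[2.0, 0.0],
                         [0.0, -2.0]])

    def newton_step(p):
        # Plain Newton: p - H^{-1} g. Along the negative-curvature (y)
        # direction this moves toward the saddle, not away from it.
        return p - np.linalg.solve(hessian(p), grad(p))

    def saddle_free_newton_step(p):
        # Saddle-free Newton: rescale by |H| instead of H, i.e. take the
        # absolute values of the Hessian's eigenvalues, so the step
        # descends along every eigendirection.
        H = hessian(p)
        w, V = np.linalg.eigh(H)
        H_abs = V @ np.diag(np.abs(w)) @ V.T
        return p - np.linalg.solve(H_abs, grad(p))

    p0 = np.array([0.5, 0.5])
    print(newton_step(p0))              # [0. 0.]  -- lands exactly on the saddle
    print(saddle_free_newton_step(p0))  # [0. 1.]  -- moves away; f decreases

If I'm remembering right, the paper applies this with a low-rank approximation of the Hessian rather than the exact one, but the |H| rescaling is the key point.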
For this week's meeting, it might also be interesting to look over
another paper from the same lab showing that the per-parameter rescaling
in RMSProp implements a preconditioning technique much like the one above:
http://arxiv.org/abs/1502.04390
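For comparison during the discussion, here's the usual RMSProp update written out (again just my own minimal sketch, not taken from either paper):

    import numpy as np

    def rmsprop_step(theta, g, ms, lr=1e-3, decay=0.9, eps=1e-8):
        # Keep a running average of squared gradients and divide each
        # parameter's step by its root-mean-square. This diagonal,
        # curvature-like rescaling is what the second paper relates to
        # the preconditioning idea above.
        ms = decay * ms + (1.0 - decay) * g ** 2
        theta = theta - lr * g / (np.sqrt(ms) + eps)
        return theta, ms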
lmj
--
http://www.cs.utexas.edu/~leif