Minimizing a sum of absolute values


Nicolas Boumal

Feb 27, 2015, 9:25:54 AM
to manopt...@googlegroups.com
Hello,

Here is a question we get a lot: how can we use Manopt to minimize a nonsmooth cost function on a manifold like this (L1 cost):

h(X) = |g(X)| (absolute value)

where X is a point on a manifold (typically, a matrix) and g, h : manifold -> reals, with g (sufficiently) smooth but h, of course, nonsmooth because of the absolute value?

There is some research going on to develop Riemannian optimization algorithms for nonsmooth costs, but there is nothing for that in Manopt at this stage. (By all means, if you are developing such solvers yourself and you'd like to see them in Manopt, do let us know: there is a huge demand for it.)

Your best bet for now is to use a smooth approximation of h, which means you need to approximate the absolute value function.

I recommend pseudo-Huber smoothing. Basically, you replace the absolute value f(x) = |x| by

f(x) = \sqrt{x^2 + \epsilon^2} - \epsilon.

As long as \epsilon is positive, this is perfectly smooth (C^\infty). For \epsilon = 0, it is exactly |x|.

So to get the gradient of your cost wrt the matrix X, you would use the chain rule for derivatives:

f : reals -> reals
g : matrices -> reals
h : matrices -> reals

h(X) = f(g(X))

Dh(X)[U] = f'(g(X)) Dg(X)[U]

hence the (Euclidean) gradient is given by:

grad h(X) = f'(g(X)) grad g(X)

with

f'(x) = x / \sqrt{x^2 + \epsilon^2}    (note that, as \epsilon -> 0, f'(x) -> sign(x))
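
To make this concrete, here is a minimal Manopt sketch. The choice g(x) = a'*x - b on the sphere is purely illustrative; spherefactory and trustregions are standard Manopt calls.

% Minimize the pseudo-Huber smoothing of |g(x)| over the unit sphere in R^n,
% with the illustrative choice g(x) = a'*x - b.
n = 10;
a = randn(n, 1);
b = 0.5;
epsilon = 1e-3;                               % smoothing parameter

problem.M = spherefactory(n);

g  = @(x) a'*x - b;                           % smooth inner function
f  = @(t) sqrt(t^2 + epsilon^2) - epsilon;    % pseudo-Huber smoothing of |t|
fp = @(t) t / sqrt(t^2 + epsilon^2);          % f'(t)

problem.cost  = @(x) f(g(x));
% Chain rule: grad h(x) = f'(g(x)) * grad g(x), and grad g(x) = a here.
% Manopt converts the Euclidean gradient (egrad) to the Riemannian one.
problem.egrad = @(x) fp(g(x)) * a;

x = trustregions(problem);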

I hope this helps. Don't hesitate to comment in this thread with questions, clarifications, or alternative suggestions.

Best,
Nicolas

Nicolas Boumal

Feb 27, 2015, 9:31:37 AM
to manopt...@googlegroups.com
Actually, there is already an example in the manopt distribution that uses a similar technique to smooth the "max(x1, x2, x3, ...)" function, using the log-sum-exp approximation. The example is packing on the sphere: see packing_on_the_sphere.m in the examples folder of the distribution.
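
For reference, here is a generic sketch of that smoothing (not the exact code of the example; the sharpness parameter s is an illustrative choice, with larger s giving a tighter but stiffer approximation):

% Log-sum-exp smoothing of max(x1, ..., xk):
%   max_i x_i  ~  (1/s) * log(sum_i exp(s*x_i)),   for s > 0.
% Subtracting max(x) before exponentiating avoids overflow.
s = 30;
smoothmax = @(x) max(x) + log(sum(exp(s * (x - max(x))))) / s;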


BM

Feb 28, 2015, 9:56:38 AM
to manopt...@googlegroups.com
Hey Nicolas, 

Thanks for the post. What about a list of popular nonsmooth cost functions and their smooth approximations? You already covered two classes; a slightly different one would be the hinge loss.

Cheers,
BM 

Nicolas Boumal

Feb 28, 2015, 12:16:26 PM
to manopt...@googlegroups.com
Sure! I have no experience with the hinge loss though, so I'm not sure which smoothing works best in general. Can you suggest something here?

Nicolas

BM

Mar 2, 2015, 9:42:35 AM
to manopt...@googlegroups.com
For the scalar version, the hinge loss h : R -> R+ is

h(x) := max(0, 1 - x).

The smooth version is

h_smooth(x) := 0,                        when x >= 1,
               1 - x - \epsilon/2,       when x <= 1 - \epsilon,
               (1 - x)^2 / (2\epsilon),  otherwise.

Here \epsilon > 0 is a parameter; the pieces match in value and first derivative at x = 1 and at x = 1 - \epsilon, so h_smooth is C^1.
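
A small MATLAB sketch of this smoothed hinge and its derivative, vectorized over x (the name huber_hinge is just for illustration; it assumes \epsilon > 0):

% Smoothed hinge loss h_smooth and its derivative, piecewise as above.
function [h, dh] = huber_hinge(x, epsilon)
    h  = zeros(size(x));                   % zone x >= 1: value 0, slope 0
    dh = zeros(size(x));
    quad = (x > 1 - epsilon) & (x < 1);    % quadratic zone
    lin  = (x <= 1 - epsilon);             % linear zone
    h(quad)  = (1 - x(quad)).^2 / (2*epsilon);
    dh(quad) = -(1 - x(quad)) / epsilon;
    h(lin)   = 1 - x(lin) - epsilon/2;
    dh(lin)  = -1;
end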

Nicolas Boumal

Mar 4, 2015, 6:53:18 AM
to manopt...@googlegroups.com
This is once continuously differentiable (C^1), right?

BM

Mar 6, 2015, 10:58:20 AM
to manopt...@googlegroups.com
Yes. The log-sum-exp smoothing you mentioned earlier would give a C^\infty version:

h(x) := max(0, 1 - x)

h_smooth(x) := log(1 + exp(1 - x)).
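
In case it's useful: evaluated naively, exp(1 - x) overflows for very negative x. A standard stable rewrite (my own note, using the identity log(1 + e^t) = max(t, 0) + log(1 + e^(-|t|))):

% Stable evaluation of h_smooth(x) = log(1 + exp(1 - x)) and its derivative.
t  = 1 - x;
h  = max(t, 0) + log1p(exp(-abs(t)));
dh = -1 ./ (1 + exp(-t));     % d/dx log(1 + exp(1-x)) = -sigmoid(1-x)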


Regards,
BM