Minimizing a sum of absolute values


Nicolas Boumal

Feb 27, 2015, 9:25:54 AM
to manopt...@googlegroups.com
Hello,

Here is a question we get a lot: how can we use Manopt to minimize a nonsmooth cost function on a manifold like this (L1 cost):

h(X) = |g(X)| (absolute value)

where X is a point on a manifold (typically, a matrix) and g, h : manifold -> reals, with g (sufficiently) smooth but h, of course, nonsmooth because of the absolute value?

There is some research going on to develop Riemannian optimization algorithms for nonsmooth costs, but there is nothing for that in Manopt at this stage. (By all means, if you are developing such solvers yourself and you'd like to see them in Manopt, do let us know: there is a huge demand for it.)

Your best bet for now is to use a smooth approximation of h, which means you need to approximate the absolute value function.

I recommend pseudo-Huber smoothing. Basically, you replace the absolute value f(x) = |x| by

f(x) = \sqrt{x^2 + \epsilon^2} - \epsilon.

As long as \epsilon is positive, this is perfectly smooth (C^\infty). For \epsilon = 0, it is exactly |x|.

So to get the gradient of your cost wrt the matrix X, you would use the chain rule for derivatives:

f : reals -> reals
g : matrices -> reals
h : matrices -> reals

h(X) = f(g(X))

Dh(X)[U] = f'(g(X)) Dg(X)[U]

hence the (Euclidean) gradient is given by:

grad h(X) = f'(g(X)) grad g(X)

with

f'(x) = x / \sqrt{x^2 + \epsilon^2}    (note that, as \epsilon -> 0, f'(x) -> sign(x))
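
To make this concrete, here is a minimal Manopt sketch. The choice g(x) = a'*x - b on the sphere is purely illustrative; spherefactory and trustregions are standard Manopt calls.

% Minimize the pseudo-Huber smoothing of |g(x)| over the unit sphere in R^n,
% with the illustrative choice g(x) = a'*x - b.
n = 10;
a = randn(n, 1);
b = 0.5;
epsilon = 1e-3;                               % smoothing parameter

problem.M = spherefactory(n);

g  = @(x) a'*x - b;                           % smooth inner function
f  = @(t) sqrt(t^2 + epsilon^2) - epsilon;    % pseudo-Huber smoothing of |t|
fp = @(t) t / sqrt(t^2 + epsilon^2);          % f'(t)

problem.cost  = @(x) f(g(x));
% Chain rule: grad h(x) = f'(g(x)) * grad g(x), and grad g(x) = a here.
% Manopt converts the Euclidean gradient (egrad) to the Riemannian one.
problem.egrad = @(x) fp(g(x)) * a;

x = trustregions(problem);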

I hope this helps. Don't hesitate to comment in this thread with questions, clarifications, or alternative suggestions.

Best,
Nicolas

Nicolas Boumal

Feb 27, 2015, 9:31:37 AM
to manopt...@googlegroups.com
Actually, there is already an example in the manopt distribution that uses a similar technique to smooth the "max(x1, x2, x3, ...)" function, using the log-sum-exp approximation. The example is packing on the sphere: see packing_on_the_sphere.m in the examples folder of the distribution.
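
For reference, here is a generic sketch of that smoothing (not the exact code of the example; the sharpness parameter s is an illustrative choice, with larger s giving a tighter but stiffer approximation):

% Log-sum-exp smoothing of max(x1, ..., xk):
%   max_i x_i  ~  (1/s) * log(sum_i exp(s*x_i)),   for s > 0.
% Subtracting max(x) before exponentiating avoids overflow.
s = 30;
smoothmax = @(x) max(x) + log(sum(exp(s * (x - max(x))))) / s;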


BM

Feb 28, 2015, 9:56:38 AM
to manopt...@googlegroups.com
Hey Nicolas, 

Thanks for the post. What about a list of popular nonsmooth cost functions and their smooth approximations? You already covered two classes; a slightly different one would be the hinge loss.

Cheers,
BM 

Nicolas Boumal

Feb 28, 2015, 12:16:26 PM
to manopt...@googlegroups.com
Sure! I have no experience with the hinge loss though, so I'm not sure which smoothing works best in general. Can you suggest something here?

Nicolas

BM

Mar 2, 2015, 9:42:35 AM
to manopt...@googlegroups.com
For the scalar version, the hinge loss h : R -> R+ is

h(x) := max(0, 1 - x).

The smooth version is

h_smooth(x) := 0,                        when x >= 1,
               1 - x - \epsilon/2,       when x <= 1 - \epsilon,
               (1 - x)^2 / (2\epsilon),  otherwise.

Here \epsilon > 0 is a parameter; the pieces match in value and first derivative at x = 1 and at x = 1 - \epsilon, so h_smooth is C^1.
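
A small MATLAB sketch of this smoothed hinge and its derivative, vectorized over x (the name huber_hinge is just for illustration; it assumes \epsilon > 0):

% Smoothed hinge loss h_smooth and its derivative, piecewise as above.
function [h, dh] = huber_hinge(x, epsilon)
    h  = zeros(size(x));                   % zone x >= 1: value 0, slope 0
    dh = zeros(size(x));
    quad = (x > 1 - epsilon) & (x < 1);    % quadratic zone
    lin  = (x <= 1 - epsilon);             % linear zone
    h(quad)  = (1 - x(quad)).^2 / (2*epsilon);
    dh(quad) = -(1 - x(quad)) / epsilon;
    h(lin)   = 1 - x(lin) - epsilon/2;
    dh(lin)  = -1;
end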

Nicolas Boumal

Mar 4, 2015, 6:53:18 AM
to manopt...@googlegroups.com
This is once continuously differentiable (C^1), right?

BM

Mar 6, 2015, 10:58:20 AM
to manopt...@googlegroups.com
Yes. The log-sum-exp smoothing you mentioned earlier would give a C^\infty version:

h(x) := max(0, 1 - x)

h_smooth(x) := log(1 + exp(1 - x)).
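
In case it's useful: evaluated naively, exp(1 - x) overflows for very negative x. A standard stable rewrite (my own note, using the identity log(1 + e^t) = max(t, 0) + log(1 + e^(-|t|))):

% Stable evaluation of h_smooth(x) = log(1 + exp(1 - x)) and its derivative.
t  = 1 - x;
h  = max(t, 0) + log1p(exp(-abs(t)));
dh = -1 ./ (1 + exp(-t));     % d/dx log(1 + exp(1-x)) = -sigmoid(1-x)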


Regards,
BM