Is Manopt only suitable for manifold-constrained smooth optimization problems?


grandowife

May 6, 2019, 4:22:55 AM
to Manopt
Hi there, 

I have a manifold-constrained optimization problem, and part of its objective is a non-smooth hinge loss such as max(0, 1-z).

I would like to know whether Manopt can solve this problem, and why.


Thank you very much.

Best wishes.

Nicolas Boumal

May 6, 2019, 8:22:43 AM
to Manopt
Hello,

Can you please tell us more about your problem / application?

There are a few ways you can use Manopt to handle nonsmooth cost functions:

1) Smooth the cost function: for example, max(0, 1-x) is approximately mu*log(1 + exp((1-x)/mu)) for small mu > 0. Replace the nonsmooth function with this one, using a value of mu that does not smooth too strongly, and optimize. Then, using that answer as initialization, decrease mu to tighten the approximation and reoptimize. Iterate a few times (see the sketch after this list).

2) Look up MADMM by Kovnatsky and Bronstein.

3) Implement a nonsmooth solver: the core of Manopt is ready for this, as it allows you to specify the subgradient of a cost function (see the tutorial). In my experience, nonsmooth solvers are subtle to implement (we had a couple of attempts, but nothing solid enough for release in the toolbox). But I'm convinced it's possible to write something good here.
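
To make option (1) concrete, here is a minimal sketch of the smooth-and-anneal loop in Manopt. The toy cost sum_i max(0, 1 - a_i'*x) over the unit sphere, the random data, and the annealing schedule for mu are all illustrative assumptions; spherefactory and trustregions are standard Manopt calls.

% Minimal sketch of option (1): replace max(0, 1-z) by the softplus
% mu*log(1 + exp((1-z)/mu)) and anneal mu toward 0, warm-starting each
% solve from the previous answer. Data, manifold and schedule are
% illustrative assumptions.
n = 50; N = 200;
A = randn(N, n);                      % hypothetical data; rows are a_i'

M = spherefactory(n);                 % unit sphere in R^n
problem.M = M;

x = M.rand();                         % random initial point
for mu = [1, 0.3, 0.1, 0.03, 0.01]    % arbitrary annealing schedule
    % Smoothed cost: sum_i mu*log(1 + exp((1 - a_i'*x)/mu)).
    problem.cost  = @(x) sum(mu*log1p(exp((1 - A*x)/mu)));
    % Euclidean gradient; Manopt converts it to the Riemannian
    % gradient on the sphere automatically.
    problem.egrad = @(x) -A' * (1 ./ (1 + exp(-(1 - A*x)/mu)));
    x = trustregions(problem, x);     % warm start
end

Since no Hessian is supplied, trustregions falls back on an internal approximation; steepestdescent or conjugategradient would work here as well.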

Best,
Nicolas

grandowife

May 6, 2019, 9:07:11 AM
to Manopt
Hi,

Thanks for your great answers! 

The objective function of my problem can be generally described as:

U* = argmin_U  0.5 * ||U||_F^2 + sum_i max(0, 1 - u_i * X)

where ||U||_F^2 is the squared Frobenius norm of the matrix U, which contains n m-dimensional vectors u_i, and X is the feature matrix of the training data.
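
For concreteness, applying the smoothing idea (1) from the reply above to this objective might look as follows in Manopt. This is a hedged sketch: the dimensions, the entrywise sum of the hinge, the random data, and the euclideanfactory placeholder manifold are all assumptions, since the actual constraint manifold is not specified.

% Hedged sketch of the smoothed objective
%   0.5*||U||_F^2 + sum_{ij} mu*log(1 + exp((1 - [U*X]_ij)/mu)),
% assuming U is n-by-m (rows u_i), X is m-by-N, and the hinge is summed
% entrywise. euclideanfactory is only a stand-in for whatever manifold
% actually constrains U.
n = 10; m = 5; N = 100; mu = 0.1;
X = randn(m, N);                      % hypothetical feature matrix

problem.M = euclideanfactory(n, m);   % placeholder constraint manifold
problem.cost  = @(U) 0.5*norm(U, 'fro')^2 ...
                     + sum(sum(mu*log1p(exp((1 - U*X)/mu))));
% Euclidean gradient: U minus the entrywise logistic sigma((1-U*X)/mu)
% times X'.
problem.egrad = @(U) U - (1 ./ (1 + exp(-(1 - U*X)/mu))) * X';

U = trustregions(problem);            % Hessian approximated internally

Note that on a manifold that fixes the norms of the vectors in U (e.g., a sphere or oblique manifold), the Frobenius term is constant and could be dropped.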

Also, may I ask another question about the above objective function:

Why do many studies not regard such a function as non-smooth, and instead directly compute its gradient?

Best wishes.




On Monday, May 6, 2019 at 8:22:43 PM UTC+8, Nicolas Boumal wrote:

Nicolas Boumal

May 6, 2019, 9:19:40 AM
to Manopt
Hello,

I don't know what happens in these papers you refer to, but perhaps they just use the fact that the function is smooth almost everywhere, so that the gradient is defined almost everywhere. (That doesn't necessarily make it safe to use, and at a minimum one should adapt the stopping criterion since the gradient may not be zero (or defined) at a minimum.)
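
For example, here is a hedged sketch of adapting the stopping criterion in Manopt via the standard options.stopfun hook. The toy cost (sum of absolute values on the sphere, which is smooth almost everywhere), the stagnation window, and the tolerance are arbitrary illustrations.

% Hedged sketch: a cost that is smooth almost everywhere but nonsmooth
% where some entry of x vanishes; sign(x) is a valid gradient wherever
% no entry is zero (and a subgradient choice elsewhere).
n = 10;
problem.M = spherefactory(n);
problem.cost  = @(x) sum(abs(x));
problem.egrad = @(x) sign(x);

% Near a nonsmooth minimizer the gradient norm need not go to zero, so
% replace the default gradient-norm test with a cost-stagnation test.
% options.stopfun is a standard Manopt hook; the window of 10 iterations
% and the 1e-9 tolerance are arbitrary.
options.tolgradnorm = 0;              % disable the gradient-norm test
options.stopfun = @(problem, x, info, last) last > 10 && ...
    abs(info(last).cost - info(last-10).cost) < 1e-9;

x = steepestdescent(problem, [], options);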
