Scaling in constrained least-squares problem

Nick Jensen

unread,

Jun 25, 2024, 3:21:56 AM6/25/24

to mosek

Hello,

I am solving the following problem

minimize ||Ax - b||

subject to R(x) >= 0

using MOSEK in CVXPY.

The scale of my data is very small. For instance, min(b)=1e-23 and max(b)=1e-20. They are on a similar scale, but very small absolute numbers.

When I solve my problem using numpy.linalg.pinv(A) @ b (unconstrained), the solution I get is reasonable and on the appropriate scale (I have a way to check it). However, when I try to solve the unconstrained case using CVXPY with MOSEK, I get all zeros for the solution. I use code like the following

cvxpy.Problem(cvxpy.Minimize( cp.norm( A @ x - b ,2) ))

My thought is that the solver is having a hard time using such small numbers in the algorithm. If that is right, are there any suggestions of how to fix this?

I would like to be able to solve this using CVXPY with MOSEK so I can eventually include the constraints. But at the moment I cannot get the results I expect. I look forward to hearing if anyone has ideas that can help.

Thank you,

Nick Jensen

Michal Adamaszek

unread,

Jun 25, 2024, 3:36:11 AM6/25/24

to mosek

Numbers of this order are treated as zero for all practical purposes. Note that x=0 is a correct solution (in the solver's sense) to your problem because the objective is norm(b) and it approximates the best possible objective 0 with an extremely high accuracy (better than the say 1e-8 or 1e-10 one typically expects from the solver).

Moreover Mosek truncates all entries below certain level to 0, and 1e-20 is certainly in the truncation range (details depend on parameters settings, which data part you are in etc.).

The best solution is to rescale the problem yourself, either by scaling both sides or changing the units or a combination of both.

https://docs.mosek.com/modeling-cookbook/practical.html#the-huge-and-the-tiny

https://docs.mosek.com/latest/pythonapi/debugging-numerical.html#formulating-problems

Best,

Michal

Erling D. Andersen

unread,

Jun 25, 2024, 3:43:40 AM6/25/24

to mosek

Here is a general comment.

* Unconstrained linear least squares optimization is equivalent to solving a linear system.

* It is harder to scale a general optimization problem, than linear equations.

* Scaling linear equations are hard.

* In other words real bad scaling can only be fixed by the modeler.

Nick Jensen

unread,

Jun 25, 2024, 3:46:23 AM6/25/24

to mosek

Thanks for the reply and the ideas.

A troublesome part of my problem is that the values are unitless, so I can't choose different units to improve scaling.

I tried scaling by doing something simple like the following, but still got zeros as a solution.

bSCL = b/min(b)

cvxpy.Problem(cvxpy.Minimize( cp.norm( A @ x - bSCL ,2) ))

Any further ideas?

Michal Adamaszek

unread,

Jun 25, 2024, 3:48:25 AM6/25/24

to mosek

minimizing norm(Ax-b) and norm(tAx-tb) are equivalent for any real t so there might be a t which puts all the coefficients in a reasonable range. You don't say much about the coefficients in A.

Perhaps a reproducible example would be helpful.

Michal

Michal Adamaszek

unread,

Jun 25, 2024, 3:49:35 AM6/25/24

to mosek

It is also another question what magnitude of solution x you expect.

If you have a fully reproducible example you can attach it here or contact our support email.

Michal

Nick Jensen

unread,

Jun 25, 2024, 11:52:49 PM6/25/24

to mosek

Ok, thanks for the idea. Please see the example attached. I skipped a lot of my problem domain-specific setup and just provided the data. Please let me know if you have any questions or ideas!

support.zip

Michal Adamaszek

unread,

Jun 26, 2024, 3:03:41 AM6/26/24

to mosek

The issue is not as much scaling as the fact that shape(A)=(600,608) the system is undertetermined, so there is no reason why the Mosek solution and your favourite solution should be the same. There is an 8-dim solution space of the linear system Ax=b. There isn't even a reason why x_p[:8] and xTrue should be the same unless you fix x_p[:8]:

problem = cp.Problem(objective=cost, constraints=[xdiag==xTrue/scale])

x_p = np.linalg.pinv(np.vstack([A, np.eye(8,608)])) @ (np.hstack([b/scale, xTrue/scale]))

You probably want to add the R(x) constraints to get a nondegenerate problem and test with that.

In all cases I tried, scaled or not, the solution produced by the solver satisfies np.linalg.norm(A...@x.value-b) == 0 so it is optimal.

Without scaling you want to solve the problem where b is or order 1e-23 and expected x is of similar order. That is a bad idea. You most likely want to use b/scale and then your solution will also be rescaled by the same factor.

Reading x.value after x participated in two different cvxpy problems is suspicious. For safety reasons I would never even let the same cp.Variable() participate in two different cp.Problem()s.

Best,

Michal

Personal

unread,

Jun 27, 2024, 12:04:40 AM6/27/24

to mo...@googlegroups.com

Thanks for the feedback! Say we can modify this to be overdetermined (this is reasonable in my problem setting). For instance, I modified the file that loads the data to be

# %% load the data
data = np.load('data.npz')
Ao = data['A']
b = data['b']
xTrue = data['xTrue']

# update
A = Ao[:,0:-12]

eliminating a few variables.

Now when I run it, I get a non-zero solution from Mosek, but it is still not close to the pinv solution or the true value. Why is that?

In my case, the scaled and unscaled solutions using Mosek are the same.

--
You received this message because you are subscribed to the Google Groups "mosek" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mosek+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mosek/8159e655-2317-42bc-8b04-c51af115abdfn%40googlegroups.com.

Nick Jensen

unread,

Jun 27, 2024, 12:04:44 AM6/27/24

to mosek

Thanks for the feedback! Say we can modify this to be overdetermined (this is reasonable in my problem setting). For instance, I modified the file that loads the data to be

# %% load the data
data = np.load('data.npz')
Ao = data['A']
b = data['b']
xTrue = data['xTrue']

# update
A = Ao[:,0:-12]

eliminating a few variables.

Now when I run it, I get a non-zero solution from Mosek, but it is still not close to the pinv solution or the true value. Why is that?

In my case, the scaled and unscaled solutions using Mosek are the same.

Erling D. Andersen

unread,

Jun 27, 2024, 12:35:24 AM6/27/24

to mosek

Why is pinv solution better than the Mosek one?

Is the objective value associated with pinv solution better than the Mosek one?

If not, you can just conclude that your problem has multiple optimal solutions.

Michal Adamaszek

unread,

Jun 27, 2024, 2:48:53 AM6/27/24

to mosek

You are doing something wrong because when I use A[:,0:-12] then the values of xTrue, x_p[:8] and x.value[:8] are almost identical, whether with or without scaling

When:

b = b/scale
xTrue = xTrue /scale

then:

Truth:
: [1.38561558e+02 8.66009739e-04 4.15684675e+02 2.59802922e-03
3.46403896e+02 2.16502435e-03 2.07842337e+02 1.29901461e-03]
xp:
: [1.38561560e+02 8.66009734e-04 4.15684676e+02 2.59802921e-03
3.46403896e+02 2.16502435e-03 2.07842338e+02 1.29901461e-03]
x.value:
: [1.38561558e+02 8.66009739e-04 4.15684675e+02 2.59802922e-03
3.46403896e+02 2.16502435e-03 2.07842337e+02 1.29901461e-03]

and without scaling

Truth:
: [1.6e-21 1.0e-26 4.8e-21 3.0e-26 4.0e-21 2.5e-26 2.4e-21 1.5e-26]
xp:
: [1.60000002e-21 9.99999994e-27 4.80000001e-21 3.00000000e-26
4.00000001e-21 2.50000000e-26 2.40000001e-21 1.50000000e-26]
x.value:
: [1.6e-21 1.0e-26 4.8e-21 3.0e-26 4.0e-21 2.5e-26 2.4e-21 1.5e-26]

Notwithstanding that it is still a bad problem with bad scaling, and that all-zeros would still be a numerically good solution to the unscaled problem.

Michal Adamaszek

unread,

Jun 27, 2024, 2:51:14 AM6/27/24

to mosek

And if you are still running your original code when you first solve two problems with the same variable x and afterward read the x value then it would explain why they are the same for both problems.

Nick Jensen

unread,

Jun 27, 2024, 3:41:12 AM6/27/24

to mosek

I apologize for the confusion. I have resolved the issues regarding the unconstrained least squares. This was an issue on my part.

My question now is how do I incorporate constraints to have a well-scaled problem?

I am attaching some updated code. When I comment out the constraints for an unconstrained problem, the pinv and Mosek solution match. When I include the constraints (which the unconstrained solution satisfies), the Mosek solution changes and doesn't satisfy the constraints.

Do you know how I can manage this constraint that is at such a small scale?

support2.zip

Personal

unread,

Jun 27, 2024, 3:41:12 AM6/27/24

to mo...@googlegroups.com

"Better" in this context is based on that this is an estimation problem where I know the true values. The pinv solution is better than the Mosek one in the sense that it is closer to the true values, though the Mosek one has a smaller objective value.

Do you have any thoughts about why the solutions are different? In theory, there is a single solution for unconstrained least squares of an overdetermined system. I understand that in practice we have different solution algorithms. But shouldn't they arrive at a similar answer?

Actually, now that I return to Michal's

Reading x.value after x participated in two different cvxpy problems is suspicious. For safety reasons I would never even let the same cp.Variable() participate in two different cp.Problem()s.

this was actually causing an issue. Now I am getting the same solution for pinv and Mosek for unconstrained overdetermined least squares!

The new issue (the one I wanted to get to) is that when I introduce a constraint (which is already satisfied with the unconstrained solution) the solution changes. The constraint is that x_diag>=0. How can I resolve this issue of a constraint with very small numbers (1e-26 scale)?

--

You received this message because you are subscribed to the Google Groups "mosek" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mosek+un...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/mosek/ded458e4-e14d-4f62-9ff2-6a731ac7b787n%40googlegroups.com.

Nick Jensen

unread,

Jun 27, 2024, 3:41:12 AM6/27/24

to mosek

"Better" in this context is based on that this is an estimation problem where I know the true values. The pinv solution is better than the Mosek one in the sense that it is closer to the true values, though the Mosek one has a smaller objective value.

Do you have any thoughts about why the solutions are different? In theory, there is a single solution for unconstrained least squares of an overdetermined system. I understand that in practice we have different solution algorithms. But shouldn't they arrive at a similar answer?

Actually, now that I return to Michal's

Reading x.value after x participated in two different cvxpy problems is suspicious. For safety reasons I would never even let the same cp.Variable() participate in two different cp.Problem()s.

this was actually causing an issue. Now I am getting the same solution for pinv and Mosek for unconstrained overdetermined least squares!

The new issue (the one I wanted to get to) is that when I introduce a constraint (which is already satisfied with the unconstrained solution) the solution changes. The constraint is that x_diag>=0. How can I resolve this issue of a constraint with very small numbers (1e-26 scale)?

Erling D. Andersen

unread,

Jun 28, 2024, 4:57:42 AM6/28/24

to mosek

I am the one who made the presolve and interior-point optimizer in Mosek.

These parts are made using the assumption that the numbers are reasonable in absolute size. 1e-26 is something the software consider 0.

I cannot tell you how to solve your problem, but I doubt you will find any traditional commercial software that can deal with your problem in a way you want.

Maybe you have to locate some optimization software that employs at least 128bit floating numbers for computations to get meaningful results.

Or maybe there is another modeling your problem.

Nick Jensen

unread,

Jul 1, 2024, 3:12:00 AM7/1/24

to mosek

Just an update here. Doing a simple scaling of bScaled = b * 1e20 works well. Since the numbers are all on a similar scale, this scaling has worked to solve and then scale back.

Thanks for all your comments

Reply all

Reply to author

Forward