Linear Regression when X^T X is not invertible


Aras Selvi

May 5, 2023, 12:42:12 AM
to mosek
Dear MOSEK developers,

I am calling MOSEK from Julia via JuMP. I have a simple question. When I try to fit a linear regression model (X and y are my data matrix and response vector), I solve

using JuMP, MosekTools

model = Model()                              # start the model
set_optimizer(model, MosekTools.Optimizer)   # call MOSEK
# define all variables together
@variables(model, begin
    beta[1:size(X, 2)]                       # intercept and coefficients
end)
# define the objective function
@objective(model, Min, sum((y .- X*beta).^2))
optimize!(model)                             # solve the model

Whenever my data includes the one-hot encoding of some categorical variable (so the columns are dependent), I get the error "PosDefException: matrix is not positive definite; Cholesky factorization failed." 

This is natural, as MOSEK is trying to rewrite the problem as an SOCP, and it fails because my data matrix is not of full column rank. However, I would still like to solve the problem and obtain a solution (e.g., like IPOPT does, without explicitly formulating an SOCP). I wanted to know whether there is an easy way out, without introducing a ridge coefficient.
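For concreteness, here is a hypothetical toy version of this failure mode (NumPy rather than Julia, and made-up data, not the actual problem above): an intercept column plus a complete one-hot encoding is linearly dependent, so X^T X is singular and a Cholesky factorization of it must fail.

```python
import numpy as np

# Toy design matrix: intercept column plus a complete one-hot encoding
# of a two-level categorical. The dummy columns sum to the intercept
# column, so X is rank deficient.
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])

print(np.linalg.matrix_rank(X))  # 2, although X has 3 columns

H = X.T @ X  # the Gram matrix a QP reformulation must factorize
try:
    np.linalg.cholesky(H)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)
```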

Thank you for your time!

Erling D. Andersen

May 5, 2023, 3:21:17 AM
to mosek
Solving the problem

min ||y - X*beta||

gives you a better conditioned problem.  Alternatively you can increase the parameter


and Mosek will perturb the problem to make the Hessian positive definite. 
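The effect of such a perturbation can be sketched in a few lines (NumPy on a toy singular Gram matrix, purely for illustration; the MOSEK parameter itself is not quoted above): adding a small multiple of the identity makes the matrix positive definite, so a Cholesky factorization goes through.

```python
import numpy as np

# Singular Gram matrix H = X^T X from a rank-deficient X
# (intercept column + complete one-hot encoding).
H = np.array([[4., 2., 2.],
              [2., 2., 0.],
              [2., 0., 2.]])

try:
    np.linalg.cholesky(H)          # fails: H has a zero eigenvalue
except np.linalg.LinAlgError:
    print("unperturbed H: Cholesky failed")

lam = 1e-8                         # small diagonal perturbation
L = np.linalg.cholesky(H + lam * np.eye(3))       # now succeeds
print(np.allclose(L @ L.T, H + lam * np.eye(3)))  # True
```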

PS. Your problem is just linear equation solving.
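As a sketch of that remark (NumPy, toy data, not the poster's actual problem): an SVD-based least-squares routine such as np.linalg.lstsq handles the rank-deficient case directly, with no ridge term and no factorization of X^T X, and returns the minimum-norm minimizer.

```python
import numpy as np

# Rank-deficient design: intercept + complete one-hot (columns dependent).
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
y = np.array([1., 2., 3., 4.])

# SVD-based least squares: works despite rank(X) < number of columns,
# and returns the minimum-norm solution among all minimizers.
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank)                               # 2

# Optimality check: the residual is orthogonal to the columns of X.
r = y - X @ beta
print(np.abs(X.T @ r).max() < 1e-10)      # True
```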

Aras Selvi

May 8, 2023, 1:06:01 AM
to mosek
Thanks for your answer! About the linear equation solving part: this is an MWE for a bigger issue I have. The `min ||y - X*beta||` advice is great, thank you! 

In my experience, using YALMIP was more stable, and I am having more issues in JuMP (e.g., with YALMIP I never had to add small diagonal matrices to make sure positive definiteness is recognized, but in JuMP I often find myself needing to). Do you have a similar experience? Depending on your answer, I might use the MOSEK API directly instead of JuMP.

Erling D. Andersen

May 8, 2023, 3:44:45 AM
to mosek
Doing the norm of course leads to a conic problem.

The QP formulation requires finding a factorization of a symmetric matrix H.

In your case at least one eigenvalue of H is zero, and that causes problems. How Julia and YALMIP do this I do not know.
If YALMIP employs an eigenvalue decomposition whereas Julia employs a Cholesky factorization,
then that could explain what you observe.
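A minimal NumPy sketch of that difference (toy matrix, assuming the two factorizations discussed): Cholesky needs strictly positive pivots, so a single zero eigenvalue aborts it, while a symmetric eigendecomposition still produces a factor F with H = F F^T.

```python
import numpy as np

# Gram matrix of two identical columns: eigenvalues are 2 and 0.
H = np.array([[1., 1.],
              [1., 1.]])

try:
    np.linalg.cholesky(H)                 # needs strictly positive pivots
except np.linalg.LinAlgError:
    print("Cholesky failed on the zero eigenvalue")

w, V = np.linalg.eigh(H)                  # eigendecomposition still works
F = V @ np.diag(np.sqrt(np.clip(w, 0.0, None)))  # clamp roundoff negatives
print(np.allclose(F @ F.T, H))            # True
```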

Aras Selvi

May 8, 2023, 9:26:01 AM
to mosek
Another interesting point on factorization. Great to know, many thanks!

Michal Adamaszek

May 8, 2023, 9:37:29 AM
to mosek
Here https://docs.mosek.com/latest/faq/_downloads/be0755f07aae2c5caaf7d0d8049efb83/jump1.jl is an example which shows how to use cones directly in JuMP. I would guess that JuMP also has "norm" and maybe "sum_squares" that can be used here to generate the conic form directly.

I think your issues arise from the fact that you have a problem that is naturally SOC, but its formulation in JuMP turns it into a general QP, which JuMP then has to factorize again to feed into the solver. You can see that this is an unnecessarily wasteful process, prone to introducing numerical noise. It is possible that YALMIP detects the structure earlier and does not have to transition through a QP, depending on how the model is written, but I'm not sure. Hopefully the hints in my first paragraph can make something like that work in JuMP too (apologies for being vague, but I lost track of the fast developments in JuMP syntax).

Michal

Aras Selvi

May 17, 2023, 12:56:25 AM
to mosek
Thank you so much, Michal. This is very useful!

Aras Selvi

Jun 9, 2023, 2:23:07 AM
to mosek
Hello,

I have had the chance to experiment with this. My findings are summarized below:

1. Optimizing "sum((y .- X*beta).^2)" directly with a QP solver (not MOSEK) takes 0.05 seconds of solver time, but the JuMP formulation time is >10 seconds.
2. Optimizing "sum((y .- X*beta).^2)" directly with MOSEK gives a "convexity not recognized" error.
3. Minimizing "t" while constraining [t; X*beta] to lie in SecondOrderCone with MOSEK takes 0.5 seconds of solver time, while the JuMP formulation time is <1 second.
4. Trying "3" with the QP solver gives a "this constraint is not supported by the solver" kind of error.

Overall, conic optimization is better considering the whole formulation + solution time. However, I don't understand why, once the problem is formulated, the solver time of a QP solver can be better than that of MOSEK's conic solver.

Thank you,
Aras

Erling D. Andersen

Jun 9, 2023, 3:52:17 AM
to mo...@googlegroups.com
You mean why 1 is better than 3? 




Michal Adamaszek

Jun 9, 2023, 4:24:44 AM
to mosek
I guess in 3. you mean [t; y.-X*beta].

You could also try norm(y .- X*beta), if something like that is understood in JuMP.

Your last question is hard to answer without seeing the full reproducible example. 

Michal

Aras Selvi

Jun 9, 2023, 8:27:08 AM
to mosek
Exactly! Overall, given the formulation time, MOSEK's conic solver is the best. But focusing purely on QP vs. SOCP solve times, I would have imagined the SOCP to be faster; yet the solver time is lower for the QP.

Maybe it is hard to give a generic answer without context, apologies.
