Tips for better performance


Dan Caspi

Mar 31, 2016, 6:52:01 AM
to cvxpy
Hey,
- I am solving fairly large logistic regression problems: about 12 problems, each with 300 variables and 10,000 observations (matrix size = 10k × 300). On top of that, I want to find the best lasso regularization parameter, so I must repeat each solve about 30 times.

- Of course, I already compute in parallel.
- I use the SCS solver.

Are there any tips on how to make this faster? Maybe common pitfalls?
I looked into CVXGEN, but apparently it doesn't support logistic regression yet (no logistic function?).


Respectfully,
Dan
 


Brendan O'Donoghue

Apr 1, 2016, 8:44:50 AM
to cvxpy
There are three things I would suggest:
  1. Run the indirect version of SCS (right now the Python interface to SCS doesn't permit caching the matrix factorization for the direct solver, so indirect is better for your use case).
  2. Compile SCS with OpenMP and use as many cores as you have available; this dramatically speeds up both the indirect linear system solver and the exponential cone projection.
  3. Warm-start SCS from the previous solve. Since the solutions along the regularization path are going to be very similar, you can initialize each solve with the solution from the previous one.
These three together should provide big speed-ups. You can reply to this thread if you run into any issues with any of the steps.

Brendan

Dan Caspi

Apr 1, 2016, 4:28:29 PM
to cvxpy
Wow, thanks, that sounds promising. I will do that and report back.

One more problem I am facing: this time, I have many more variables (3,000). Any other tips for this case?


Thanks again,

Dan

Brendan O'Donoghue

Apr 1, 2016, 6:13:52 PM
to cvxpy
From an optimization point of view it shouldn't really matter much; the same three changes listed above should help. From a statistics / modeling point of view it might be different.

Dan Caspi

Apr 3, 2016, 10:26:04 AM
to cvxpy
Alright, the above helped. I didn't compile with OpenMP just yet, but just the other two gave quite a boost.
Now I have another problem, which is semi-related to CVXPY. I am performing logistic regression, and thus I calculate the logistic loss for large matrices of observations:


nrows = X.shape[0]
# append a column of ones for the intercept term
X = np.hstack((X, np.ones((nrows, 1))))
return sum(logistic(-y[i] * (X[i, :] * beta)) for i in range(nrows)) / nrows

This is called in order to evaluate the loss on the validation set, and ultimately to choose the best gamma (regularization parameter).
Is there an efficient CVXPY way of doing that? This is currently my bottleneck.


thanks again for your previous help,

respectfully,
Dan 

Steven Diamond

Apr 3, 2016, 3:48:14 PM
to cvxpy
You can vectorize the evaluation by writing "sum_entries(logistic(-mul_elemwise(y, X*beta)))". I would expect that to be much faster.
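Since this evaluation is on held-out validation data, where beta is a plain NumPy array rather than a CVXPY variable, you can also skip CVXPY entirely and compute the loss in vectorized NumPy. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def logistic_loss(X, y, beta):
    # Mean logistic loss over all observations.
    # np.logaddexp(0, z) computes log(1 + exp(z)) without
    # overflowing for large z.
    margins = -y * (X @ beta)
    return np.logaddexp(0, margins).mean()
```

As a sanity check, with beta = 0 every margin is 0, so the loss is log(2).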

Brendan O'Donoghue

Apr 4, 2016, 5:43:09 AM
to cvxpy
I would guess that compiling with OpenMP would probably give the largest boost to performance. It's easy to do: just download the SCS source from GitHub, then run:

cd scs/python
python setup.py install --scs --openmp

(If there is a build directory in the python dir, delete it to force a recompile.)

There are two other changes you can make that would likely improve performance:

1) If you have a GPU, you can run SCS on it. To do this, recompile SCS with GPU support (append --gpu to the install args listed above) and make your CVXPY call something like Problem.solve('SCS', gpu=True). You must have the CUDA library installed on your system.
2) You can tweak the scale parameter (a heuristic data re-balancing parameter); for exponential cones it turns out to work a little better with larger scales, so try something like Problem.solve('SCS', scale=5).