Issues with optimize in Julia


Ryan Carey

Feb 15, 2015, 6:29:35 PM2/15/15
to julia...@googlegroups.com
Hi all,
I've just discovered Julia this last month, and have been greatly enjoying using it, especially because of its MATLAB-like linear algebra notation and all-round concise and intuitive syntax.

I've been playing with its optimisation functions, looking to implement gradient descent for logistic regression, but I hit a couple of stumbling blocks and was wondering how you've managed them.

Using Optim, I implemented regularized logistic regression with l_bfgs, and although it sometimes worked, when I stress-tested it with some k-fold validation, I got linesearch errors.

I've got a dataset that's about 600 x 100 (m x n) with weights w and classes y.

my code:
function J(w)
    m, n = size(X)
    return sum(-y' * log(logistic(X*w)) - (1 - y') * log(1 - logistic(X*w))) +
           reg/(2m) * sum(w.^2) # note normalizing bias weight
end

function g!(w, storage)
    storage[:] = X' * (logistic(X*w) - y) + reg/m * w
end

out = optimize(J, g!, w, method = :l_bfgs, show_trace = true)

the error:
Iter     Function value   Gradient norm 
...
    19    -9.034225e+02     2.092807e+02
    20    -9.034225e+02     2.092807e+02
    21    -9.034225e+02     2.092807e+02
    22    -9.034225e+02     2.092807e+02
    23    -9.034225e+02     2.092807e+02
Linesearch failed to converge
while loading In[6], in expression starting on line 2

 in hz_linesearch! at /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:374
 in hz_linesearch! at /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
 in l_bfgs at /home/ryan/.julia/v0.3/Optim/src/l_bfgs.jl:165
 in optimize at /home/ryan/.julia/v0.3/Optim/src/optimize.jl:340
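One guess, which I haven't confirmed: my naive log(1 - logistic(X*w)) form can return -Inf once a point is classified very confidently, and that could plausibly stall the linesearch. A small sketch of what I mean (stable_log1m is just a name I made up here):

```julia
logistic(z) = 1 / (1 + exp(-z))

# Naive form from my J(w): fine for small z, but for a confidently
# classified point (z = 40) the subtraction underflows and log gives -Inf.
naive = log(1 - logistic(40.0))
@assert naive == -Inf

# Algebraically identical but stable:
# log(1 - logistic(z)) = -z - log1p(exp(-z))
stable_log1m(z) = -z - log1p(exp(-z))
@assert isfinite(stable_log1m(40.0))
```

If that's really the issue, rewriting J with the log1p form might help.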

Perhaps I should override its convergence criteria? Or is there a bug in my code? Anyway, I thought I might have more luck with conjugate gradient descent, so I included types.jl and cg.jl from the Optim package and tried to make that work too, defining a DifferentiableFunction type:

function rosenbrock(g, x::Vector)
    d1 = 1.0 - x[1]
    d2 = x[2] - x[1]^2
    if !(g === nothing)
        g[1] = -2.0*d1 - 400.0*d2*x[1]
        g[2] = 200.0*d2
    end
    val = d1^2 + 100.0 * d2^2
    return val
end

function rosenbrock_gradient!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

cg(rosenbrock, [0, 0])

d2 = DifferentiableFunction(rosenbrock, rosenbrock_gradient!)

cg(d2, [0, 0])
ERROR: InexactError()

I tried a few variations on the function 'cg' before coming here for help. I notice that there are a couple of other optimization packages out there, but this one is by JMW and looks good.
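One guess about the InexactError, which I haven't verified against Optim's internals: starting cg at the Int vector [0, 0] could mean the in-place gradient storage ends up as an Int array, and writing a fractional Float64 into an Int array raises InexactError. A small sketch of what I mean:

```julia
# Writing a fractional Float64 into Int storage raises InexactError.
g_int = zeros(Int, 2)
err = try
    g_int[1] = -2.0 * (1.0 - 0.3)   # -1.4 has no exact Int representation
    nothing
catch e
    e
end
@assert err isa InexactError

# A Float64 starting point sidesteps the problem:
g = zeros(2)                 # Float64 storage
g[1] = -2.0 * (1.0 - 0.3)    # fine
```

So cg(d2, [0.0, 0.0]) might be worth a try.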

Obviously, if I just wanted to perform linear regression, I could use a built-in function, but to fit more complex models I need to be able to do gradient descent.

How have others fared with Optim? Any thoughts on what's going wrong? General tips for how to make gradient descent work with Julia?


John Myles White

Feb 15, 2015, 7:05:18 PM2/15/15
to julia...@googlegroups.com
Here's my two, not very thorough, cents:

(1) The odds of a bug in Optim.jl are very high (>90%).
(2) The odds of a bug in your code are very high (>90%).

It's pretty easy to make a decision about (2). Deciding on (1) is a lot harder, since you need a specific optimization problem that Optim should solve, but fails to solve.

For resolving (2), you have a couple of sub-problems:

(a) Is your gradient analytically correct? You can check this by comparing it with finite differencing. If it doesn't produce a close match, be suspicious.
(b) Is your log likelihood + gradient numerically correct? Your stress test is, in theory, an attempt to test this. But if the cause is numerical instability, failures should show up mainly on problems that are likely to be numerically unstable, so you'd want to measure the correlation between the difficulty of the problem and the probability of failure.
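For (a), here's a rough sketch of the kind of check I mean, using the Rosenbrock gradient from your message (finite_diff_grad is a made-up helper for this sketch, not part of Optim):

```julia
# Central finite differences: g[i] ≈ (f(x + h*e_i) - f(x - h*e_i)) / 2h
function finite_diff_grad(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

# Rosenbrock and its analytic gradient, as in your message:
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
function g!(x, storage)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

x = [0.3, 0.7]
ga = zeros(2); g!(x, ga)          # analytic gradient
gn = finite_diff_grad(f, x)       # numerical gradient
@assert maximum(abs.(ga .- gn)) < 1e-4
```

If the analytic and numerical gradients disagree by more than ~1e-4 at a few random points, the gradient code is the first suspect.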

My experience is that the Optim error messages don't make it easy to realize when you've made a mistake in your gradients. This is being worked on at the moment, but I think someone would need to dedicate a week to working on this to get us to a point where the error messages are always clear.

 -- John