linearregression.py


mjl Martin J. Laubach

Oct 10, 2011, 8:15:29 PM
to vienna-stanford-st...@googlegroups.com
  I just had an entertaining evening with the code in the github repo. I think there is a bug in there somewhere, but I have absolutely no idea what it could be. I rechecked the formula and the results by hand and they all seem right, but the code does not converge at all: running the full iterative gradient descent just makes the thetas jump all over the place and crash and burn.

$ python -i linearregression.py 
>>> cost_function(0, 0, training_set)
49541
>>> gradient_descent_step(0.001, 0, 0, training_set)
(0.29625, 482.8045, 276892161805.86005)
>>> cost_function(0.29625, 482.8045, training_set)
276892161805.86005

  The cost should be diminishing, but it quite obviously isn't?

  BTW, a linear regression of the example data in Wolfram Alpha gives theta0 = -42.7833, theta1 = 0.22962, so it's clear the gradient descent step above is totally out of whack.
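  For reference, here is a minimal sketch of what I would expect the two functions to compute, assuming h(x) = theta0 + theta1 * x and a training set of (x, y) pairs; this is just my reading of the lecture formulas, not necessarily what the repo code does:

# Sketch of the lecture formulas for univariate linear regression;
# not the repo code, just what I expect these functions to compute.

def cost_function(theta0, theta1, training_set):
    """J(theta) = 1/(2m) * sum over the training set of (h(x) - y)^2."""
    m = float(len(training_set))
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in training_set) / (2.0 * m)

def gradient_descent_step(alpha, theta0, theta1, training_set):
    """One simultaneous update of both thetas; returns (theta0, theta1, cost)."""
    m = float(len(training_set))
    d0 = sum(theta0 + theta1 * x - y for x, y in training_set) / m
    d1 = sum((theta0 + theta1 * x - y) * x for x, y in training_set) / m
    new_theta0 = theta0 - alpha * d0
    new_theta1 = theta1 - alpha * d1
    return new_theta0, new_theta1, cost_function(new_theta0, new_theta1, training_set)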

  Pilot error by me?

        mjl

Paul Mayer

Oct 10, 2011, 8:42:12 PM
to vienna-stanford-st...@googlegroups.com
Hi & Welcome to the group Martin,

Unfortunately, I have not been able to look through anything yet. I might have a chance to look through the implementation during a longer train ride tomorrow.
Other than that, for all the others who have joined since my last post: I am moving on October 20th and will gladly contribute to discussions, meetings and
everything else afterwards!

Regards,
Paul


mjl Martin J. Laubach

Oct 10, 2011, 9:12:35 PM
to vienna-stanford-st...@googlegroups.com
  Hm, if I choose a very small learning rate, then it works all right, it just takes ages... I'm not sure it should behave that way.

  By "very small" I mean 0.0000008 is the maximum that didn't diverge in my tests (and overflow the floats). To reach a stable state with that learning rate takes a veeeeery long time. You can see it approach the correct values:

iteration(79640000) -42.5691, 0.229487 -> cost 448.221
iteration(79650000) -42.5693, 0.229487 -> cost 448.221
iteration(79660000) -42.5694, 0.229487 -> cost 448.221
iteration(79670000) -42.5695, 0.229487 -> cost 448.221
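
  (For the record, the loop behind that output is roughly the following; it assumes gradient_descent_step returns the new thetas plus the new cost, as in my interactive session above, and that training_set comes from linearregression.py.)

# Rough sketch of the driver loop that printed the output above.
alpha = 0.0000008
theta0 = theta1 = 0.0
i = 0
while i < 80000000:   # tens of millions of steps are needed at this alpha
    theta0, theta1, cost = gradient_descent_step(alpha, theta0, theta1, training_set)
    i += 1
    if i % 10000 == 0:
        print("iteration(%d) %g, %g -> cost %g" % (i, theta0, theta1, cost))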

  So either there is a bug in the implementation or the algorithm sucks big time or I suck.

        mjl

Ernad Halilovic

Oct 11, 2011, 3:38:08 AM
to vienna-stanford-st...@googlegroups.com
Last time Andreas and I talked, we agreed that this implementation is not final because it needs a way of stabilizing alpha. For example, if the cost is higher than in the last iteration, lower alpha. This is not required by the gradient descent algorithm they presented, but it's necessary if you want accurate results. Someone should implement this.
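Something along these lines, as an untested sketch of the idea (assuming gradient_descent_step(alpha, theta0, theta1, training_set) returns the new thetas plus the new cost, as in Martin's session):

# Untested sketch: back off alpha whenever a step makes the cost go up.
def gradient_descent(alpha, theta0, theta1, training_set, iterations=10000):
    last_cost = cost_function(theta0, theta1, training_set)
    for _ in range(iterations):
        new_theta0, new_theta1, cost = gradient_descent_step(alpha, theta0, theta1, training_set)
        if cost > last_cost:
            alpha /= 2.0   # the step overshot: halve alpha and try again
            continue
        theta0, theta1, last_cost = new_theta0, new_theta1, cost
    return theta0, theta1, last_cost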

mjl Martin J. Laubach

Oct 11, 2011, 5:09:37 AM
to vienna-stanford-st...@googlegroups.com
  Ah okay, so the algorithm Ng presented is just "it basically works if you choose your alpha carefully and have lots of time", but it's not really fit for everyday use unless tweaked. Good to know.

  As an aside wrt the code: it's useless to pass the h parameter to gradient_descent_step, because the derivatives are hardcoded and it won't work for any other hypothesis function anyway. The rest of the code is okay since we aren't in a programming class here :-)

        mjl

Andreas Schlapsi

Oct 11, 2011, 6:29:35 AM
to vienna-stanford-st...@googlegroups.com
The code in linearregression.py is far from being production ready.
It's more like a direct translation from the formulas in Andrew Ng's
video to Python. I wrote the code after watching the first ML class
videos to experiment with the content of the videos. And one of the
things I learned while playing with the code is that alpha has to be
chosen very carefully. Otherwise the algorithm is unstable.

You're right, the hypothesis function doesn't need to be passed to
gradient_descent_step.

Andreas
