Non-linear Decision Boundaries (Logistic regression videos)

Jason Amster

Oct 21, 2011, 10:29:04 AM
to NYC Machine Learning Review
Hi All,

I'm watching the third video in the Logistic regression set, and in
the example where the y=1 set surrounds the y=0 set (Video 3,
11:00), Prof. Ng comes up with the hypothesis:

h_θ(x) = g(θ0 + θ1*x1 + θ2*x2 + θ3*x1^2 + θ4*x2^2)

which cleverly turns out to form a decision boundary that is a circle
of radius 1. If you didn't know the shape of the data a priori, how
would you know to add the extra features x1^2 and x2^2?
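
Just so I'm clear on the mechanics, here's a quick Python sketch (my
own, not anything from the lecture) of why that hypothesis traces the
unit circle: with θ = (-1, 0, 0, 1, 1), the model predicts y=1 exactly
when x1^2 + x2^2 >= 1.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x1, x2):
    # Feature vector [1, x1, x2, x1^2, x2^2] matching the hypothesis above
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return sigmoid(theta @ features)

# theta chosen by hand; with these values the 0.5 contour is x1^2 + x2^2 = 1
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
print(h(theta, 0.0, 0.0))  # inside the circle  -> probability well below 0.5
print(h(theta, 2.0, 0.0))  # outside the circle -> probability well above 0.5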

I know that in a similar vein you can choose higher-order polynomials
in linear regression, and you could come up with a few different
h_θ(x)'s and test them, but I can't figure out a strategy that would
sensibly arrive at a circle or one of the other highly odd shapes
he draws.
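
By "come up with a few different h_θ(x)'s" I mean something like
mechanically generating all monomials of x1 and x2 up to some degree
and treating each degree as a candidate hypothesis. A rough Python
sketch of that (scikit-learn's PolynomialFeatures is my choice of
tool here, not something from the course):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.5, -1.2],
              [2.0, 0.3]])  # two samples with features x1 and x2

for degree in (1, 2, 3):
    expanded = PolynomialFeatures(degree).fit_transform(X)
    # degree 2 adds x1^2, x1*x2, x2^2; degree 3 adds the cubic terms, etc.
    print(degree, expanded.shape)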

Could be I'm getting ahead of myself, but I didn't see a clear answer
in the following 2 videos either.

Any insight would be appreciated.

sajit

Oct 22, 2011, 9:06:35 PM
to NYC Machine Learning Review
I could be mistaken, but I was under the impression that the
hypothesis was chosen after seeing the data points. In other words,
it's because the data set was circle-ish that the hypothesis was
chosen that way.

Jason Amster

Oct 23, 2011, 10:22:51 AM
to nyc-machine-l...@googlegroups.com
Right, I agree.  But that's easy when you have only 2 features and can see the data plotted on a graph.  What about when there are more dimensions than can be plotted, or when the shape has no clear formulaic set of features?

Daniel Lamblin

Oct 24, 2011, 10:52:42 AM
to NYC Machine Learning Review
If I recall correctly, one thing to try is to divide your sample data
(the one with 50,000 samples and 200 features) into two randomized
sets, say 35,000 learning samples and 15,000 test samples. Fit
polynomials of degree 1, 2, 3, 4 ... up to some reasonable upper
bound that won't take all day, like 8 (solving directly or with
gradient descent, whichever), on the learning samples. Then use the
learned theta of each fit to measure the squared error on the test
samples, and pick the polynomial degree with the minimal test error.
The learning-set error will get very, very low with each higher
order, but at some point you're actually over-fitting, so the model
no longer applies generally; that's where the test samples help you
retain applicability.

If you later want to double-check your choice of polynomial degree,
run the whole process again on a different random split of the data.
The chosen degrees will probably match up, though the thetas might be
a tad different. Now that you have a degree that's about right, fit
the coefficients on all the sample data you have at hand. A rough
sketch of the whole procedure follows.
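
Here's that sketch in Python, under my own assumptions throughout:
synthetic circular data, scikit-learn for the fitting, and (since
this thread is about logistic regression) misclassification rate on
the test set in place of squared error.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(1000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 >= 1.0).astype(int)  # y=1 outside the unit circle

# One randomized 70/30 split, mirroring the 35,000 / 15,000 example above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_degree, best_error = None, float("inf")
for degree in range(1, 9):  # some reasonable upper bound, here 8
    poly = PolynomialFeatures(degree)
    model = LogisticRegression(max_iter=1000)
    model.fit(poly.fit_transform(X_train), y_train)
    # Misclassification rate on the held-out test samples
    error = 1.0 - model.score(poly.transform(X_test), y_test)
    if error < best_error:
        best_degree, best_error = degree, error

print(best_degree, best_error)
# Having picked the degree, refit on all the data for the final coefficients.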