I had a question about Problem 2.3. Sorry if I'm being really dense, and please correct me at any point if I'm wrong. I'm almost positive I must be reading something incorrectly, but I just can't figure out what it is.
At the start of the question it says that we are dropping the assumption that the samples are linearly separable. So, we are to assume that our input datapoints can be non-separable.
We are asked to show a bound on the number of updates the perceptron makes. However, if the data isn't linearly separable, the perceptron will never converge, since no w correctly classifies all of the datapoints. So if we drop the separability assumption, the bound we must prove has to be a lower bound: the number of updates cannot be bounded from above, because the updates never stop.
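To illustrate what I mean, here is a tiny sketch (the dataset and update rule below are my own toy example, not from the problem set): two identical points with opposite labels are trivially non-separable, so the standard perceptron keeps making updates in every pass over the data.

```python
# Toy non-separable dataset: the same x with opposite labels.
# No w can classify both correctly, so updates never stop.
data = [(1.0, +1), (1.0, -1)]  # (x, y) pairs

w = 0.0
updates = 0
for epoch in range(100):       # "forever", capped for the demo
    for x, y in data:
        if y * w * x <= 0:     # misclassified (or on the boundary)
            w += y * x         # standard perceptron update
            updates += 1

print(updates)  # grows linearly with the number of epochs -- no convergence
```

Every epoch produces exactly two mistakes here (w oscillates between 0 and 1), so the update count grows without bound, which is why I think no finite upper bound can exist in general.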
However, my intuition suggests that this cannot be right, and that we must prove an upper bound (even the form of the bound suggests this). Plugging in the separable setup from the top (i.e., u = w, \rho = \rho) appears to result in a division by zero: d_i = max(0, \rho - y_i <w, x_i>) = 0, since \rho \le y_i <w, x_i> for every i, and so \delta = [\sum_{i=1}^n 0]^{-1/2} = 1/sqrt(0). I assume, though, that you can increase \rho ever so slightly by some \epsilon so that one point has a very small deviation; this changes \rho only infinitesimally, but it makes \delta non-zero. In that case the lower bound would effectively coincide with the upper bound we proved before, which seems problematic: the number of updates is an integer, but the bound has no such restriction, so the update count would be squeezed into this tiny interval. A modified upper bound makes more sense to me: if we don't have the separating w, we could take some arbitrary u and ask how long the perceptron takes to find a separating w. But that doesn't make sense either if the data isn't separable.
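To make the division-by-zero concrete, here is a small sketch with made-up numbers (the dataset, w, and choice of \rho are my own toy example, not from the problem set): with a separating w and \rho set to the attained margin, every d_i = 0, and nudging \rho up by \epsilon gives the tiny non-zero deviations I mean.

```python
# Hypothetical separable toy data and a separating w (my own numbers,
# just to illustrate the d_i = 0 issue described above).
data = [((1.0, 0.0), +1), ((-1.0, 0.0), -1), ((2.0, 1.0), +1)]
w = (1.0, 0.0)

# Attained margins y_i <w, x_i> for each point.
margins = [y * (w[0] * x[0] + w[1] * x[1]) for (x, y) in data]
rho = min(margins)  # the largest rho for which all d_i = 0

# d_i = max(0, rho - y_i <w, x_i>): all zero, so 1/sqrt(sum d_i) blows up.
d = [max(0.0, rho - m) for m in margins]
print(d)  # all zeros

# Nudge rho up by a tiny epsilon: the minimum-margin points pick up
# a very small but non-zero deviation, so delta becomes finite.
eps = 1e-9
d_eps = [max(0.0, (rho + eps) - m) for m in margins]
print(d_eps)  # tiny positive deviations on the minimum-margin points
```

This is what makes me suspect the resulting bound degenerates to (essentially) the separable one as \epsilon shrinks, which is the tension I can't resolve.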
Would someone be able to clarify this for me?
Again, my apologies if I'm bad at reading.
Thanks,
David