Hi Jamal,
You are right: the dual form of the SVM depends on the data only through the inner products <x_i, x_j>. We can therefore move to a high-dimensional feature space by defining a feature map \phi : X -> F, where X is the original input space and F is the feature space, and computing the inner product in this new space as follows:
<\phi(x_i), \phi(x_j)>_F
Note that this inner product is evaluated in F, not in X. To train an SVM in this feature space, we would need to perform these three steps:
1. Define the feature map \phi.
2. For each pair x_i, x_j, compute \phi(x_i) and \phi(x_j).
3. Evaluate the inner product <\phi(x_i), \phi(x_j)>_F.
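To make the three steps concrete, here is a minimal sketch in Python (NumPy assumed) using the explicit degree-2 polynomial feature map for 2-D inputs; the map phi and the sample points are illustrative choices, not part of your original question:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), mapping X = R^2 into F = R^3.
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 1.0])

# Step 2: map both points into F; step 3: take the inner product in F.
ip_F = phi(x_i) @ phi(x_j)
print(ip_F)  # 25.0
```

Note that we had to materialize the feature vectors in F before taking the inner product; that is exactly what becomes infeasible when F is very high-dimensional.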
As you have already noticed, this is still problematic when the feature space is very high-dimensional. The solution is the "kernel trick": we replace the inner product <\phi(x_i), \phi(x_j)>_F with a symmetric function K(x_i, x_j). Mercer's theorem guarantees that K(x_i, x_j) = <\phi(x_i), \phi(x_j)>_F for some feature map \phi if and only if K is symmetric positive semi-definite. In other words, we can evaluate the inner product in the feature space without ever knowing \phi or F explicitly; all we need is to compute K(x_i, x_j). Finally, we replace every inner product in the dual formulation with a kernel evaluation.
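To see the trick in action, here is a small sketch (my own example, assuming NumPy): the degree-2 polynomial kernel K(x, y) = (<x, y>)^2 reproduces the inner product of the explicit degree-2 feature maps, but never builds the feature vectors:

```python
import numpy as np

def K(x, y):
    # Degree-2 polynomial kernel: K(x, y) = (<x, y>)^2.
    # This equals <phi(x), phi(y)>_F for the explicit degree-2 map,
    # but costs O(d) in the input dimension instead of O(dim F).
    return (x @ y) ** 2

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 1.0])
print(K(x_i, x_j))  # 25.0, the same value as <phi(x_i), phi(x_j)>_F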
*For some kernels, such as the Gaussian RBF kernel, the feature space induced by the kernel is infinite-dimensional, so we are implicitly working in an infinite-dimensional feature space.
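For completeness, evaluating the Gaussian RBF kernel is just as cheap, even though its induced feature space is infinite-dimensional (a quick sketch, with gamma chosen arbitrarily):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    # Its feature map is infinite-dimensional, yet evaluating K
    # only costs O(d) in the input dimension.
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])
print(rbf_kernel(x, y))  # 1.0, since the points coincide
```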
Note also that it would be impossible (but please correct me if I am wrong) to solve for w in the primal form, because the solution would itself be infinite-dimensional: the size of the primal problem grows with the dimensionality d of the feature space, whereas the size of the dual problem grows with the sample size n.
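This is also why libraries let you hand the dual solver nothing but the n x n Gram matrix. A hedged sketch, assuming scikit-learn is available (the data set here is a made-up XOR-like toy problem): the solver only ever sees kernel evaluations, never \phi or w.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like labels, not linearly separable

# Precompute the Gram matrix G[i, j] = K(x_i, x_j) with an RBF kernel.
gamma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
G = np.exp(-gamma * sq_dists)

# The dual solver works entirely from the n x n matrix G:
# its size depends on n, not on the dimension of F.
clf = SVC(kernel="precomputed").fit(G, y)
print(clf.score(G, y))
```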
I hope this is somewhat helpful.
Krik