Hi Jamal,
You are right: the dual form of the SVM depends on the data only through the inner products <x_i, x_j>. We can therefore move to a high-dimensional feature space by defining a feature map \phi : X -> F, where X is the original input space and F is the feature space, and computing the inner product in this new space as follows:
<\phi(x_i), \phi(x_j)>_F
Note that this inner product is evaluated in F, not in X. To train an SVM in this feature space, we would need to perform these three steps:
1. Define the feature map \phi.
2. For each pair x_i, x_j, compute \phi(x_i) and \phi(x_j).
3. Evaluate the inner product <\phi(x_i), \phi(x_j)>_F.
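To make the three steps concrete, here is a minimal sketch in Python (NumPy assumed) using the explicit degree-2 polynomial feature map for 2-D inputs; the map phi and the sample points are illustrative choices, not part of your original question:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), mapping X = R^2 into F = R^3.
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 1.0])

# Step 2: map both points into F; step 3: take the inner product in F.
ip_F = phi(x_i) @ phi(x_j)
print(ip_F)  # 25.0
```

Note that we had to materialize the feature vectors in F before taking the inner product; that is exactly what becomes infeasible when F is very high-dimensional.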
As you have already noticed, this is still problematic when the feature space is very high-dimensional. The solution is the "kernel trick": we replace the inner product <\phi(x_i), \phi(x_j)>_F with a symmetric function K(x_i, x_j). Mercer's theorem guarantees that K(x_i, x_j) = <\phi(x_i), \phi(x_j)>_F for some feature map \phi if and only if K is symmetric positive semi-definite. In other words, we can evaluate the inner product in the feature space without ever knowing \phi or F explicitly; all we need is to compute K(x_i, x_j). Finally, we replace every inner product in the dual formulation with a kernel evaluation.
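To see the trick in action, here is a small sketch (my own example, assuming NumPy): the degree-2 polynomial kernel K(x, y) = (<x, y>)^2 reproduces the inner product of the explicit degree-2 feature maps, but never builds the feature vectors:

```python
import numpy as np

def K(x, y):
    # Degree-2 polynomial kernel: K(x, y) = (<x, y>)^2.
    # This equals <phi(x), phi(y)>_F for the explicit degree-2 map,
    # but costs O(d) in the input dimension instead of O(dim F).
    return (x @ y) ** 2

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 1.0])
print(K(x_i, x_j))  # 25.0, the same value as <phi(x_i), phi(x_j)>_F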
*For some kernels, such as the Gaussian RBF kernel, the feature space induced by the kernel is infinite-dimensional, so we are implicitly working in an infinite-dimensional feature space.
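For completeness, evaluating the Gaussian RBF kernel is just as cheap, even though its induced feature space is infinite-dimensional (a quick sketch, with gamma chosen arbitrarily):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    # Its feature map is infinite-dimensional, yet evaluating K
    # only costs O(d) in the input dimension.
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])
print(rbf_kernel(x, y))  # 1.0, since the points coincide
```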
Note also that it would be impossible (but please correct me if I am wrong) to solve for w in the primal form, because the solution would itself be infinite-dimensional: the size of the primal problem grows with the dimensionality d of the feature space, whereas the size of the dual problem grows with the sample size n.
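This is also why libraries let you hand the dual solver nothing but the n x n Gram matrix. A hedged sketch, assuming scikit-learn is available (the data set here is a made-up XOR-like toy problem): the solver only ever sees kernel evaluations, never \phi or w.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like labels, not linearly separable

# Precompute the Gram matrix G[i, j] = K(x_i, x_j) with an RBF kernel.
gamma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
G = np.exp(-gamma * sq_dists)

# The dual solver works entirely from the n x n matrix G:
# its size depends on n, not on the dimension of F.
clf = SVC(kernel="precomputed").fit(G, y)
print(clf.score(G, y))
```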
I hope this is somewhat helpful.
Krik