Here's a possible (handwavy) explanation. Please correct me (anyone) if I'm wrong.
What this amounts to is that the GP treats the function values over the whole input space x as completely determined after seeing the first n points x_{1..n}, since adding another point does not increase the rank of the K matrix. I am not sure if there is an easier mathematical way to see this, but I would bet that if you compute the variance of a point x_{n+1} by conditioning the joint Gaussian on the n training points, you will get 0. As Prof Smola said in the lecture, this means the GP is perfectly confident predicting any point after seeing n training points, which is unrealistic.
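To make this concrete, here is a minimal numerical sketch (not from the lecture; it assumes a linear kernel k(x, x') = x^T x' in d = 2 dimensions, so the Gram matrix of any n > 2 points is rank-deficient, and it uses the pseudo-inverse since K is singular):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: linear kernel k(x, x') = x.T @ x' in d = 2 dims,
# so the Gram matrix of n = 5 points has rank at most 2.
d, n = 2, 5
X = rng.normal(size=(n, d))          # n training inputs
K = X @ X.T                          # n x n Gram matrix, rank <= d
print(np.linalg.matrix_rank(K))      # -> 2, not n

# A new test point x_{n+1}: its kernel row is a linear combination
# of the existing rows, so conditioning leaves no uncertainty.
x_new = rng.normal(size=d)
k_star = X @ x_new                   # covariances with the training points
k_ss = x_new @ x_new                 # prior variance at the test point

# Noiseless GP posterior variance, with pinv since K is singular:
var = k_ss - k_star @ np.linalg.pinv(K) @ k_star
print(var)                           # -> ~0 (up to floating-point error)
```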
Perhaps one way to think about this is the following: adding a row to the Gram matrix (corresponding to the (n+1)th point) does not change the rank, which means this new row is a linear combination of the first n rows. Let's take this to an extreme: add a row which is exactly the same as an already existing row. This happens when the test point is one of the training points. In the case where there is no noise, the GP is certain about this test point, since it already knows the exact output value (see the short derivation below).
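To check the duplicate-row case algebraically (a quick sketch using the standard noiseless GP posterior formulas): if the test point equals a training point x_i, then its covariance vector with the training set is k_* = K e_i, the i-th column of K. The posterior mean is k_*^T K^{-1} y = e_i^T K K^{-1} y = y_i, i.e. exactly the stored training output, and the posterior variance is k(x_i, x_i) - k_*^T K^{-1} k_* = K_{ii} - e_i^T K K^{-1} K e_i = K_{ii} - K_{ii} = 0. (If K is singular, the same cancellation goes through with the pseudo-inverse, since K K^+ K = K.)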
So in order to keep the rank up, we add a small positive value to the diagonal of K (the "jitter", or a noise variance term), which makes K full rank and invertible.
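Continuing the sketch above (reusing the same hypothetical n, K, k_star, and k_ss), with an arbitrarily chosen jitter value:

```python
# Jitter sigma^2 * I restores full rank, so the posterior variance
# at the test point is no longer exactly zero.
sigma2 = 1e-2                         # assumed noise level, chosen arbitrarily
K_jit = K + sigma2 * np.eye(n)
print(np.linalg.matrix_rank(K_jit))   # -> n (full rank, now invertible)

var_jit = k_ss - k_star @ np.linalg.solve(K_jit, k_star)
print(var_jit)                        # -> small but strictly positive
```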
Hope this makes some sense, and is not going to confuse/mislead anyone.