Well, the point is that for each new step (each batch) we sample afresh. If the steps are small enough (which they will be near the minimum, provided the learning rate is small), then we effectively average over many more samples than a single batch contains. This is how stochastic gradient descent can approximate the true gradient.
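A minimal sketch of this effect, assuming a toy one-parameter least-squares problem (the data, learning rate, and batch size are all illustrative choices, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: find w minimizing E[(w*x - y)^2],
# where the data obey y = 3*x + noise.
true_w = 3.0
X = rng.normal(size=10_000)
Y = true_w * X + 0.1 * rng.normal(size=10_000)

w = 0.0
lr = 0.05
batch_size = 32

for step in range(2000):
    # Sample a fresh minibatch at every step.
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], Y[idx]
    # Noisy gradient of the batch loss 0.5*mean((w*x - y)^2) w.r.t. w.
    grad = np.mean((w * xb - yb) * xb)
    w -= lr * grad

# Near the minimum the steps are small, so the noisy updates
# effectively average over many batches and w settles near true_w.
print(w)
```

Each individual gradient here is noisy, yet because consecutive small steps barely move w, their noise largely cancels over many iterations.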
It is the same as in other 'Monte Carlo' approaches: one wants to average some function f(x) of a random variable x, but instead of computing the exact integral ∫ p(x) f(x) dx, one simply draws N random values of x (according to the probability density p(x)) and averages f over those. This can produce a good-enough approximation, with the error typically shrinking like 1/sqrt(N).
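The same idea in a few lines of code, using an illustrative choice of f(x) = x^2 with x drawn from a standard normal (so the exact answer is known):

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[f(x)] = integral of p(x) f(x) dx for f(x) = x^2,
# with x ~ N(0, 1). The exact answer is 1 (the variance).
N = 100_000
samples = rng.normal(size=N)      # draw N values according to p(x)
estimate = np.mean(samples**2)    # average f over the samples

print(estimate)
```

The printed value lands close to 1, and increasing N tightens the estimate at the usual 1/sqrt(N) rate.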