It strikes me that none of the answers given so far has actually addressed your question.
Indeed, a batched optimization will never find the global optimum that you would get by feeding the network the entire dataset at once. Instead, once the generalizing features have been fitted (which usually happens first), the solution will jump around, trying to overfit the peculiarities of each batch in turn, without ever quite converging to any of them.
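To make that picture concrete, here is a minimal toy sketch (my own construction in NumPy, not anything specific to your network): fitting a single parameter with a constant learning rate, full-batch gradient descent settles at the global optimum, while cycling over fixed mini-batches keeps pulling the parameter towards whichever batch it last saw.

```python
import numpy as np

# Toy illustration: fit one parameter w to the data under squared error with a
# constant learning rate. Full-batch descent converges to the global optimum
# (the data mean); cycling over fixed mini-batches keeps hopping towards the
# mean of whichever batch was seen last, never settling exactly anywhere.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=64)
batches = data.reshape(8, 8)          # 8 fixed mini-batches of 8 points each
lr = 0.5

def grad(w, x):
    # gradient of the mean of 0.5 * (w - x)^2 over the points x
    return np.mean(w - x)

w_full, w_mini = 5.0, 5.0
for epoch in range(50):
    w_full -= lr * grad(w_full, data)      # one full-batch step per epoch
    for b in batches:                      # one pass over the mini-batches
        w_mini -= lr * grad(w_mini, b)

print("global optimum (data mean):", data.mean())
print("full-batch result         :", w_full)               # matches the mean
print("mini-batch result         :", w_mini)               # pulled towards the last batch
print("batch means it jumps among:", batches.mean(axis=1))
```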
The question then becomes: is the 'average' of trying to fit many different, more pronounced non-generalizing maxima closer to the generalizing properties of the dataset as a whole?
I've come to this thread asking myself the very same question, and haven't gotten any further with it yet...
I do think batching inevitably places some limit on overfitting: the large jumps in the gradient from batch to batch make it hard for the optimizer to ever get around to obsessing over the much weaker gradients needed to push each individual batch towards its own overfitted maximum, so it acts most effectively on the features of the optimal solution that all batches have in common. However, if the features to be overfit in the different batches tend 'not to compete' with each other, they may still be overfitted eventually; just less efficiently, since the network is busy chasing the gradient jumps from batch to batch.
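As a rough sanity check of that argument (again only a toy linear-regression construction of mine, with the observation noise standing in for the batch peculiarities): far from the shared solution the per-batch gradients agree and their average is large, while at the least-squares solution the average gradient vanishes but the individual batch gradients stay sizeable and conflicting, which is exactly the noise the optimizer ends up chasing.

```python
import numpy as np

# Compare the average gradient over all batches with the typical size of the
# individual batch gradients, at two points: far from the shared solution and
# right at it. Near the shared solution the batch gradients no longer agree.
rng = np.random.default_rng(1)
n, d = 512, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=1.0, size=n)   # noise plays the role of batch peculiarities

def batch_grads(w, batch_size=32):
    grads = []
    for i in range(0, n, batch_size):
        Xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        grads.append(Xb.T @ (Xb @ w - yb) / len(yb))   # per-batch MSE gradient
    return np.array(grads)

points = [("far from shared solution", np.zeros(d)),
          ("at least-squares solution", np.linalg.lstsq(X, y, rcond=None)[0])]
for label, w in points:
    g = batch_grads(w)
    print(label)
    print("  |mean gradient over batches| :", np.linalg.norm(g.mean(axis=0)))
    print("  mean |individual batch grad| :", np.mean(np.linalg.norm(g, axis=1)))
```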
So I think the bottom line is: yes, batching will tend to provide some limit on overfitting, but not in a very controllable or dependable way.
Note that I am shooting from the hip here, though; my experience with limiting overfitting in large optimization problems comes mostly from outside the neural network context.