When you do maximum likelihood estimation, you are minimizing
KL(\hat p(x) || p_\theta(x)), where \hat p(x) is your empirical
distribution and p_\theta(x) is your model, whose parameters \theta
you are trying to optimize.
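Here is a minimal sketch of why this holds, assuming a finite outcome space. KL(\hat p || p_\theta) = \sum_x \hat p(x) log \hat p(x) - \sum_x \hat p(x) log p_\theta(x). The first term does not depend on \theta, and the second is exactly the negative average log-likelihood of the data. So minimizing the KL over \theta is the same as maximizing the likelihood. The one-parameter Bernoulli model, the helper names, and the four-point dataset below are hypothetical, and a grid search stands in for a real optimizer.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists; 0 * log 0 terms are dropped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bernoulli(theta):
    """Hypothetical one-parameter model p_theta over outcomes {0, 1}."""
    return [1.0 - theta, theta]

def empirical(data):
    """Empirical distribution p-hat over {0, 1}."""
    ones = sum(data) / len(data)
    return [1.0 - ones, ones]

def avg_log_likelihood(data, theta):
    """Average log-likelihood (1/n) * sum_i log p_theta(x_i)."""
    p = bernoulli(theta)
    return sum(math.log(p[x]) for x in data) / len(data)

data = [1, 1, 1, 0]                     # made-up sample for illustration
p_hat = empirical(data)                 # [0.25, 0.75]
grid = [i / 100 for i in range(1, 100)] # candidate values of theta in (0, 1)

# Minimize KL(p_hat || p_theta): the model sits in the SECOND argument.
theta_kl = min(grid, key=lambda t: kl(p_hat, bernoulli(t)))
# Maximize the average log-likelihood directly.
theta_mle = max(grid, key=lambda t: avg_log_likelihood(data, t))
print(theta_kl, theta_mle)
```

Both searches land on \theta = 0.75, the fraction of ones in the data, which is the familiar Bernoulli MLE.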
In variational inference, you are minimizing KL(q(x) || p(x)), where
now the first argument q(x) is what you're trying to optimize and the
target p(x) (typically an intractable posterior) stays fixed.
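To make the role reversal concrete, here is a sketch built on made-up assumptions: a bimodal target p(x) on the integers 0 to 20, a hypothetical family of discretized Gaussians q_{\mu,\sigma}, and brute-force grid search in place of a real optimizer. Practical VI usually cannot evaluate the normalized p(x) and instead maximizes the ELBO. Here p is small enough to normalize, so the KL is computed directly. The same grid is fit in both directions for comparison.

```python
import math

SUPPORT = range(21)  # hypothetical discrete outcome space {0, ..., 20}

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_bump(mu, sigma):
    """Discretized Gaussian on SUPPORT; a hypothetical variational family q_{mu, sigma}."""
    return normalize([math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) for k in SUPPORT])

def kl(p, q):
    """KL(p || q) for discrete distributions; 0 * log 0 terms are dropped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up bimodal target p(x): two unit-width bumps centered at 5 and 15.
p = normalize([math.exp(-((k - 5) ** 2) / 2) + math.exp(-((k - 15) ** 2) / 2)
               for k in SUPPORT])

# sigma starts at 1 so that every q(k) stays strictly positive (no division by zero).
candidates = [(mu, sigma) for mu in SUPPORT for sigma in range(1, 9)]

# Variational inference: q is the FIRST argument, so minimize KL(q || p).
mu_vi, sigma_vi = min(candidates, key=lambda c: kl(gaussian_bump(*c), p))

# Maximum-likelihood direction for comparison: the model is the SECOND argument.
mu_ml, sigma_ml = min(candidates, key=lambda c: kl(p, gaussian_bump(*c)))

print("KL(q || p) picks", (mu_vi, sigma_vi))
print("KL(p || q) picks", (mu_ml, sigma_ml))
```

With q in the first argument, the fit settles on a single bump (\mu = 5 or 15, \sigma = 1), because KL(q || p) heavily penalizes q for putting mass where p is nearly zero. With the model in the second argument, as in maximum likelihood, the fit centers at \mu = 10 with a wide \sigma so that it covers both bumps.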
-Percy