Bootstrapping is only as good as the sample's approximation of the population. So if your population has high kurtosis, resampling a modest sample probably won't capture the tails of the true sampling distribution. Likewise, robust corrections only work as well as the information in your sample approximates the true sampling distribution. I'm not sure which would be preferable in your case, but I can definitely say that you should use MLM if you have complete data. MLR is a less stable approximation whose only advantage is that it is available with incomplete data (so it can be combined with FIML).
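To make the point about tails concrete, here is a minimal Python sketch (not lavaan code). The Student-t population with 5 df, the sample size of 30, and the function names are all assumptions chosen purely for illustration. It shows that every bootstrap draw is one of the observed values, so the resampled distribution can never reach beyond the extremes that happened to land in the sample.

```python
import random

def draw_heavy_tailed(rng, df=5):
    """One draw from a Student-t distribution (heavy tails for small df).
    The choice of df=5 is an illustrative assumption."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / (chi2 / df) ** 0.5

def bootstrap_means(sample, n_boot=1000, seed=0):
    """Nonparametric bootstrap distribution of the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choice(sample) for _ in range(n)) / n
            for _ in range(n_boot)]

rng = random.Random(42)
sample = [draw_heavy_tailed(rng) for _ in range(30)]  # a modest sample
boot = bootstrap_means(sample)

# Resampled values all come from the observed sample, so no bootstrap mean
# can fall outside the observed range, however heavy the population tails.
print(min(sample) <= min(boot) and max(boot) <= max(sample))
```

The same limitation applies to any statistic computed from the resamples: whatever the sample missed about the population's tails, the bootstrap cannot recover.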
Yes, but if you read my permutation paper, you'll see that there is no reason to rely on fit indices for testing. Permutation doesn't make them perform better than an LRT (a scaled test in your case), only as well as one. To test invariance, you should use a test statistic. Fit indices are meant to be interpreted as effect sizes that complement test statistics, not as a replacement for them.
In the case of DWLS for binary/ordinal data, a subsequent paper did show that permutation maintains somewhat better Type I error rates than the scaled-and-shifted test. But I have not investigated the case of nonnormal (approximately) continuous data.
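For readers unfamiliar with the general logic of a permutation test, here is a minimal Python sketch. It is not the procedure from the papers mentioned above: the statistic (an absolute difference in group means) is a simple stand-in, the function names and data are hypothetical, and in the invariance setting the statistic would presumably have to come from refitting the model to each permuted data set.

```python
import random

def permutation_p_value(group_a, group_b, statistic, n_perm=2000, seed=0):
    """Permutation p value: share of label shufflings whose statistic is
    at least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group membership
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def abs_mean_diff(a, b):
    """Stand-in statistic (assumption): absolute difference in means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p_same = permutation_p_value(base, [x + 0.5 for x in base], abs_mean_diff)
p_diff = permutation_p_value(base, [x + 10.0 for x in base], abs_mean_diff)
print(p_same, p_diff)
```

The reference distribution is built from the data themselves under exchangeable group labels, which is why the approach does not depend on a distributional assumption for the statistic.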
Best,
Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen