Dan, (1) I'd say that it's always a good idea to try to balance your design, but the imbalance that you describe is definitely not a big problem. Perhaps more important, with n = 4, 4, and 8, is that the sample sizes are small overall, which makes it much more likely that a couple of odd sites will have undue influence on the outcome.
Nonmetric scaling doesn't care how you have defined groups of sample units, since it is just trying to represent the variation among them as points in species space.
(2) By default, MRPP weights each group by its size when calculating the effect size (A) and the p value. Balancing a design is a consideration whether the test is parametric or not. But yes, you can use MRPP with your data.
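To make the size weighting concrete, here is a minimal sketch of MRPP on a Sørensen (Bray-Curtis) distance matrix. It assumes the common weight C_i = n_i / N for group i, every group having at least two sample units, and a permutation p value as a stand-in for the analytic approximation that programs such as PC-ORD use, so the numbers will not match PC-ORD output exactly. The function names and the data are hypothetical.

```python
import numpy as np

def bray_curtis(x):
    """Sorensen (Bray-Curtis) distances between the rows (sample units) of x."""
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return num / den

def mrpp_delta(dist, groups):
    """Weighted mean within-group distance, weighting group i by C_i = n_i / N."""
    n_total = len(groups)
    delta = 0.0
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        iu = np.triu_indices(len(idx), k=1)       # each within-group pair once
        within = dist[np.ix_(idx, idx)][iu]
        delta += (len(idx) / n_total) * within.mean()
    return delta

def mrpp(dist, groups, n_perm=999, seed=0):
    """Return (observed delta, A, permutation p value); every group needs n_i >= 2."""
    groups = np.asarray(groups)
    observed = mrpp_delta(dist, groups)
    # Under random relabeling, each group's expected mean within-group distance is
    # the mean of all pairwise distances, and the weights n_i / N sum to 1.
    expected = dist[np.triu_indices(len(groups), k=1)].mean()
    a_stat = 1.0 - observed / expected             # chance-corrected within-group agreement
    rng = np.random.default_rng(seed)
    perm = np.array([mrpp_delta(dist, rng.permutation(groups)) for _ in range(n_perm)])
    # A small delta means tight groups, so count permutations at least that small.
    p_value = (np.sum(perm <= observed) + 1) / (n_perm + 1)
    return observed, a_stat, p_value

# Hypothetical data: 16 sample units x 10 species, in groups of 4, 4, and 8.
rng = np.random.default_rng(1)
groups = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 8)
data = rng.poisson(5, size=(16, 10)).astype(float)
data[groups == "c", :3] += 10                      # make group c differ on 3 species
delta, a_stat, p_value = mrpp(bray_curtis(data), groups)
print(f"delta={delta:.3f}  A={a_stat:.3f}  p={p_value:.3f}")
```

With n_i / N weights, the group of 8 contributes half of delta and each group of 4 contributes a quarter, which is what "weights groups by their size" means in practice.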
(3) Whatever logic you applied in transforming the data for NMS should also apply to other analyses that are based on a distance matrix. ISA is different in that it uses no distance matrix, but the Hellinger transform will still affect its results, because the transform relativizes by sample unit (which changes the distribution of values for each species) and takes the square root of the values (which changes the relative abundances that ISA calculates). That leaves the question of whether this is a good idea, which I can't answer, because it depends on why you are using Hellinger in the first place.
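A small sketch of how the Hellinger transform reaches ISA, assuming the Dufrêne and Legendre indicator value IV = 100 × RA × RF, where RA is a species' mean abundance in a group divided by the sum of its group means and RF is the proportion of the group's sample units in which it occurs. The randomization test for IV is omitted, and the function names and data are hypothetical. Because the transform keeps zeros at zero and positive values positive, RF is unchanged; only RA moves.

```python
import numpy as np

def hellinger(x):
    """Relativize each sample unit (row) by its total, then take the square root."""
    return np.sqrt(x / x.sum(axis=1, keepdims=True))   # assumes no all-zero rows

def indicator_values(x, groups):
    """Indicator values (0-100), one row per group, one column per species."""
    labels = np.unique(groups)
    mean_abund = np.array([x[groups == g].mean(axis=0) for g in labels])
    rel_abund = mean_abund / mean_abund.sum(axis=0)     # RA; assumes each species occurs somewhere
    rel_freq = np.array([(x[groups == g] > 0).mean(axis=0) for g in labels])  # RF
    return labels, 100 * rel_abund * rel_freq

# Hypothetical data: 4 sample units x 3 species, two groups.
groups = np.array(["a", "a", "b", "b"])
x = np.array([[10, 1, 0],
              [ 8, 2, 1],
              [ 1, 5, 4],
              [ 0, 6, 3]], dtype=float)

labels, iv_raw = indicator_values(x, groups)
_, iv_hel = indicator_values(hellinger(x), groups)
print(labels)
print(np.round(iv_raw, 1))
print(np.round(iv_hel, 1))
```

Comparing the two printed tables shows the indicator values shifting even though no sample unit gained or lost a species.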
Bruce McCune