Both the Levene and B-F tests transform dependent variables for use in an ANOVA test. The only difference between the two tests is in how those transformed variables are constructed. The Levene test uses deviations from group means, which usually results in a highly-skewed set of data; This violates the assumption of normality. The Brown-Forsythe test attempts to correct for this skewness by using deviations from group medians. The result is a test that’s more robust. In other words, the B-F test is less likely than the Levene test to incorrectly declare that the assumption of equal variances has been violated.
The test works as follows:
The test statistic used in a regular ANOVA is an F-statistic. The statistic used in an ANOVA with transformed variables is sometimes called a W-Statistic — but it’s really just an F-Statistic with a different name. It should not be confused with the coefficient of concordance W-statistic, which is used to assess agreement between raters.
CautionsFor the most part, the B-F test is thought to perform as well as or better than other available tests for equal variances. However, Glass and Hopkins (1996 p. 436) state that the Levene and B-F tests are “fatally flawed”; It isn’t clear how robust they are when there is significant differences in variances and unequal sample sizes. Hill et. al (2006) advise repeating the test using a non-parametric method.
Therefore, perhaps I should not be following scipy and rename my function from Levene to BrownForsythe?