post-hoc test of contrasts after MixedLM?

91 views
Skip to first unread message

Brad Buran

unread,
Apr 20, 2016, 2:29:54 PM4/20/16
to pystat...@googlegroups.com
I'm trying to figure out how to do a post-hoc test of contrasts to
test whether the difference between a certain set of contrasts is
significant. However, I can't figure out how to do it in statsmodels.
I know how to do it in R using the glht package. Am I missing
something obvious or has this not been implemented?

josef...@gmail.com

unread,
Apr 20, 2016, 3:14:32 PM4/20/16
to pystatsmodels
Can you be more specific about the setup and which kind of test?

the glht description looks pretty long.


general answer: this is not available, at least not readily (out of
the box) available.


I made several attempts at this in the past. Initially it failed
because we didn't have enough information about the categoricals in
the design matrix. Now, patsy provides that information, but I haven't
gone back to writing the code to interface with it.

We have vectorized t-tests for given contrasts, we have function for
multiple testing p-value corrections. The distribution/pvalues for
Tukey range statistic are available but assume independent samples.
I wrote some prototype functions to automatically create all pairwise
contrast matrices.
(Gretz multivariate cdf distributions normal and t are not very
"fancy" for calculating the generic multiple comparison p-values.)

However, those things are not connected to each other. I don't know if
there is anything special about the Mixed model in this, in cases I
looked at we just use the asymptotic normality of the parameter
estimate.

One of the first links when I searched for glht pointed to a
stackoverflow question where the answer recommended lsmeans as
alternative to glht. lsmean or mean prediction is another area that
doesn't have builtin automatic support yet (although that should be
easy to do with t_test in linear models)


I haven't looked at the R functions for this in a while, but t_test
pairwise is on my high priority target list in analogy to the Stata
pairwise function.


Josef

josef...@gmail.com

unread,
Apr 20, 2016, 8:03:04 PM4/20/16
to pystatsmodels
On Wed, Apr 20, 2016 at 3:14 PM, <josef...@gmail.com> wrote:
> On Wed, Apr 20, 2016 at 2:29 PM, Brad Buran <bbu...@alum.mit.edu> wrote:
>> I'm trying to figure out how to do a post-hoc test of contrasts to
>> test whether the difference between a certain set of contrasts is
>> significant. However, I can't figure out how to do it in statsmodels.
>> I know how to do it in R using the glht package. Am I missing
>> something obvious or has this not been implemented?
>
> Can you be more specific about the setup and which kind of test?
>
> the glht description looks pretty long.
>
>
> general answer: this is not available, at least not readily (out of
> the box) available.
>
>
> I made several attempts at this in the past. Initially it failed
> because we didn't have enough information about the categoricals in
> the design matrix. Now, patsy provides that information, but I haven't
> gone back to writing the code to interface with it.
>
> We have vectorized t-tests for given contrasts, we have function for
> multiple testing p-value corrections. The distribution/pvalues for
> Tukey range statistic are available but assume independent samples.
> I wrote some prototype functions to automatically create all pairwise
> contrast matrices.
> (Gretz multivariate cdf distributions normal and t are not very

The name got mangled in my brain/memory: It's supposed to be Genz and Bretz.
https://cran.r-project.org/web/packages/mvtnorm/mvtnorm.pdf
scipy has one of Genz's Fortran functions for multivariate normal cdf
with a wrapper in the statsmodels sandbox.

Brad Buran

unread,
Apr 21, 2016, 5:48:29 PM4/21/16
to pystat...@googlegroups.com
Hi Josef:

So, the idea behind the contrast test is if I have the following model:

sf = area*phase*stream

Where area can be A1 or PEG, phase can be random or repeating and
stream can be foreground or background.

I can then test the following null hypothesis:

In A1 is the difference between sf in the repeating and random phase
in the foreground stream equal to the difference between the sf in the
repeating and random phase in the background stream. Or, in other
words

sf in area[A1], phase[foreground], phase[repeating] - sf in
area[A1]:phase[foreground]:phase[random] == sf in area[A1],
phase[background], phase[repeating] - sf in
area[A1]:phase[background]:phase[random]

I would then ask the same question, but for area[PEG].

So, if I set up my model in R using lmer:

m = lmer(sf~area*stream*phase+(1|cell/target), data=sf)

I can then test this by using post-hoc contrasts using glht:

contrasts = rbind('A1 : FG(rep-rand)-BG(rep-rand)'=c(0, 0, 0, 0, 0, 0, 1, 0),
'PEG: FG(rep-rand)-BG(rep-rand)'=c(0, 0, 0, 0, 0, 0, 1, 1))
tests = glht(m, linfct=contrasts)

In area[A1], the 7th coefficient, stream[foreground]:phase[repeating]
is the only difference between the two sides of the equation in the
null hypothesis. However, for area[PEG], the 7th and 8th coefficients
(area[PEG]:stream[foreground]:phase[repeating]) define the difference
between the two sides of the equation in the null hypothesis. Hence, I
cannot extract p-values directly from the output of the model. I need
to do a further test to see whether the combination of the 7th and 8th
terms are significant.


Brad

josef...@gmail.com

unread,
Apr 21, 2016, 6:30:59 PM4/21/16
to pystatsmodels
you can test arbitrary constrasts (linear restrictions) on the
parameters with the t_test method of the results instance.
constraints or contrasts can be defined either as string equations
using the parameter names or the constraint matrix like

[[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1]]

two constraints checked independently, returning two pvalues that are
not multiple testing corrected,
the wald_test methods uses the same kind of interface but provides
results for the joint hypothesis.

This should work generically for (almost) all models.

I don't know where we have good examples for this.
http://www.statsmodels.org/dev/examples/notebooks/generated/ols.html#F-test
shows an f_test but t_test works the same way,
wald_test is the same test as f_test but uses the default distribution
which is chisquare for many nonlinear models.
MixedLM also seems to default to using normal and chisquare
distribution instead of t and F.



What's not available is automatic creation of sets of contrasts and
builtin multiple comparison correction.

Josef

Brad Buran

unread,
Apr 21, 2016, 7:25:45 PM4/21/16
to pystat...@googlegroups.com
This looks great. I wasn't aware of this. I will have to play with it
and see how it works.

Thanks!
Brad

josef...@gmail.com

unread,
Apr 21, 2016, 9:55:51 PM4/21/16
to pystatsmodels
On Thu, Apr 21, 2016 at 7:25 PM, Brad Buran <bbu...@alum.mit.edu> wrote:
> This looks great. I wasn't aware of this. I will have to play with it
> and see how it works.

Ask or reply if you have question or comments.
I don't remember at all what the documentation for this is, and I
didn't see a good overview in a brief search.
But I think it works very well.

Josef

Brad Buran

unread,
Apr 22, 2016, 12:35:49 PM4/22/16
to pystat...@googlegroups.com
So this does work pretty well and gives identical results to R. The only caveat is that for the t_test, I can't use the string equation. However, the following worked to give me my contrasts matrix:

from patsy import DesignInfo
LC = DesignInfo(fit.model.data.param_names) \
    .linear_constraint('phase[T.repeating]:stream[T.fg]+area[T.PEG]:phase[T.repeating]:stream[T.fg]')
fit.t_test(LC.coefs[:, :-2])

Cropping out the final two coefficients seems necessary (these are variance components set up using vc_formula).

josef...@gmail.com

unread,
Apr 22, 2016, 12:50:26 PM4/22/16
to pystatsmodels
On Fri, Apr 22, 2016 at 12:35 PM, Brad Buran <bbu...@alum.mit.edu> wrote:
So this does work pretty well and gives identical results to R. The only caveat is that for the t_test, I can't use the string equation. However, the following worked to give me my contrasts matrix:

from patsy import DesignInfo
LC = DesignInfo(fit.model.data.param_names) \
    .linear_constraint('phase[T.repeating]:stream[T.fg]+area[T.PEG]:phase[T.repeating]:stream[T.fg]')
fit.t_test(LC.coefs[:, :-2])

Cropping out the final two coefficients seems necessary (these are variance components set up using vc_formula).


Thanks for the feedback.

Can you post the function call and traceback for the string equation that doesn't work?


Based on a search for related issues, there might still be some problems with handling the extra parameters.

Most likely nobody tried to use the string interface.

Josef

Brad Buran

unread,
Apr 22, 2016, 4:49:15 PM4/22/16
to pystat...@googlegroups.com

josef...@gmail.com

unread,
Apr 22, 2016, 4:57:08 PM4/22/16
to pystatsmodels
On Fri, Apr 22, 2016 at 4:48 PM, Brad Buran <bbu...@alum.mit.edu> wrote:

thanks for checking

BUG/ENH
requires a new issue.

Please open an issue 
(I'm currently on family duties)

Josef
Reply all
Reply to author
Forward
0 new messages