# Linear mixed-effects model for placement of nuclear stress in 10-word turns

21 views

### Christoph Ruehlemann

Jan 10, 2019, 1:15:15 PM1/10/19
Hi all,

As suggested by Stefan, I'm duplicating this post from corpling-with-r to statforling-with-r.

I'm trying to model the placement of nuclear stress in 10-word turns in a linear mixed model but am *very* new to mixed modeling. The model includes these variables:

• `STRSS`, the binary response variable; the 10-word turns have been selected in such a way that only 1 word carries the nuclear stress
• `INFMX`, a binary explanatory variable denoting whether a word carries the maximum informativity (i.e., 'surprisal' given the preceding word)
• `CLASS`, an explanatory variable with three levels: function word, interjection, or content word
• `PSTN`, an explanatory variable denoting whether the nuclear stress occurs early in the turn (words 1-3), in mid-turn position (words 4-6), or late in the turn (words 7-10)
• `STRCT`, an explanatory variable denoting whether the nuclear stress falls on a word inside what is called the turn constructional unit (TCU) or not
• `SPKR`, a random factor referring to speaker IDs, and
• `SEQU`, another random factor referring each word to its place in the sequence of exactly 10 words, considered random because only 10-word turns are examined here, not turns of other lengths

Here's some reproducible data:

``````df <- data.frame(
SPKR = c(rep("A", 10), rep("B", 10), rep("C", 10)),
SEQU = rep(1:10, 3),
STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
PSTN = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)
df
SPKR SEQU STRSS    INFMX CLASS PSTN  STRCT
1     A    1  notS   notMax   fnc earl notTCU
2     A    2  notS   notMax   fnc earl notTCU
3     A    3  notS   notMax   fnc earl    TCU
4     A    4  notS   notMax   itj  mid    TCU
5     A    5  notS   notMax   cnt  mid    TCU
6     A    6  notS   notMax   cnt  mid    TCU
7     A    7  notS   notMax   cnt  lte    TCU
8     A    8  notS   notMax   cnt  lte    TCU
9     A    9     S priorMax   cnt  lte notTCU
10    A   10  notS      Max   cnt  lte notTCU
11    B    1  notS   notMax   fnc earl notTCU
12    B    2  notS   notMax   fnc earl notTCU
13    B    3  notS   notMax   fnc earl    TCU
14    B    4  notS   notMax   itj  mid    TCU
15    B    5  notS   notMax   cnt  mid    TCU
16    B    6  notS   notMax   cnt  mid    TCU
17    B    7  notS   notMax   cnt  lte    TCU
18    B    8  notS   notMax   cnt  lte    TCU
19    B    9     S priorMax   cnt  lte notTCU
20    B   10  notS      Max   cnt  lte notTCU
21    C    1  notS   notMax   fnc earl notTCU
22    C    2  notS   notMax   fnc earl notTCU
23    C    3  notS   notMax   fnc earl    TCU
24    C    4  notS   notMax   itj  mid    TCU
25    C    5  notS   notMax   cnt  mid    TCU
26    C    6  notS   notMax   cnt  mid    TCU
27    C    7  notS   notMax   cnt  lte    TCU
28    C    8  notS   notMax   cnt  lte    TCU
29    C    9     S priorMax   cnt  lte notTCU
30    C   10  notS      Max   cnt  lte notTCU
``````

My hypothesis is that a word will carry nuclear stress (i.e., `df\$STRSS=="S"`) if

• `df\$INFMX=="priorMAX"`, i.e., the word with the greatest informativity immediately follows the word with the nuclear stress
• `df\$CLASS=="cnt"`, i.e., the word is a content word
• `df\$STRCT=="notTCU"`, i.e., the word lies inside the TCU
• `df\$``PSTN``=="lte"`, i.e., the word occurs late in the turn

Given that the response variable is binary, I've tried a generalized mixed model so far, using `library("mlmRev")`:

``````model1 <- glmer(STRSS ~ (INFMX + CLASS + PSTN + STRCT)^2 +
(1 | SPKR) + (1 | SEQU), data = df, family = binomial(link = "logit"), nAGQ = 1)
``````

The problems I'd appreciate help with are the following:

• Is this the right approach? I.e., is this, at least in principle, the right model?
• The model call produces some unpleasant information--what to make of it?

``````fixed-effect model matrix is rank deficient so dropping 19 columns /coefficients
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt\$par, ctrl = control\$checkConv,  :
2: In checkConv(attr(opt, "derivs"), opt\$par, ctrl = control\$checkConv,  :
Hessian is numerically singular: parameters are not uniquely determined
``````
• And finally, how to read the output of the model summary?

``````summary(model1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial  ( logit )
Formula: STRSS ~ (INFMX + CLASS + PSTN + STRCT)^2 + (1 | SPKR) + (1 |      SEQU)
Data: df

AIC      BIC   logLik deviance df.resid
18.0     30.6      0.0      0.0       21

Scaled residuals:
Min        1Q    Median        3Q       Max
-1.49e-08  1.49e-08  1.49e-08  1.49e-08  1.49e-08

Random effects:
Groups Name        Variance Std.Dev.
SEQU   (Intercept) 0.83102  0.9116
SPKR   (Intercept) 0.05073  0.2252
Number of obs: 30, groups:  SEQU, 10; SPKR, 3

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.972e+01  7.249e+07       0        1
INFMXnotMax   -4.107e-01  6.711e+07       0        1
INFMXpriorMax -7.929e+01  5.479e+07       0        1
CLASSfnc       3.565e-05  4.745e+07       0        1
CLASSitj       1.581e-06  4.745e+07       0        1
PSTNlte        1.847e-05  3.875e+07       0        1
STRCTnotTCU   -1.472e-05  4.745e+07       0        1

Correlation of Fixed Effects:
(Intr) INFMXnM INFMXpM CLASSf CLASSt PSTNlt
INFMXnotMax -0.926
INFMXprirMx -0.378  0.408
CLASSfnc     0.218 -0.471   0.000
CLASSitj    -0.218  0.000   0.000   0.333
PSTNlte     -0.535  0.289   0.000   0.408  0.408
STRCTnotTCU -0.655  0.707   0.000  -0.667  0.000  0.000
fit warnings:
fixed-effect model matrix is rank deficient so dropping 19 columns / coefficients
convergence code: 0
Hessian is numerically singular: parameters are not uniquely determined

Warning messages:
1: In vcov.merMod(object, use.hessian = use.hessian) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
2: In vcov.merMod(object, correlation = correlation, sigm = sig) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
``````

I'm quite aware that this post is demanding a lot. Helpful pointers are appreciated all the more!

Best
Chris

--

### Stefan Th. Gries

Jan 10, 2019, 2:51:40 PM1/10/19
to StatForLing with R

- why did you use that package and not lme4? mlmRev seems to have not been developed since 2011 and the new routines and capabilities go write beyond what was possible back then.
- given your expectation that a certain interaction will be significant, most ppl would probably say that you need a more elaborate random-effects structure.
- not sure that SEQU should be random factor.

As for how to read the output, that's something that would require a whole chapter so I'll just refer you to, eg, Gelman & Hill, Zuur et al., and Fields et al.

Cheers,
STG
--
Stefan Th. Gries
----------------------------------
UC Santa Barbara & JLU Giessen
http://tinyurl.com/stgries
----------------------------------

### Christoph Ruehlemann

Jan 10, 2019, 4:28:48 PM1/10/19
Well, being a bloody beginner in mixed-effects modelling I've relied on what (little) input I could get hold of and digest ...
Not sure either that SEQU should be a random factor; could probably be left out or be used as a fixed effect.

--
You received this message because you are subscribed to the Google Groups "StatForLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to statforling-wit...@googlegroups.com.

--

### Christoph Ruehlemann

Jan 10, 2019, 4:35:54 PM1/10/19
and what would that "more elaborate random-effects structure" look like?
--

### Christoph Ruehlemann

Jan 11, 2019, 6:37:18 AM1/11/19
With a little help from another forum, Cross Validated, I seem to have found at least an initial solution:

1. First, I've got rid of SEQU and PSTN, leaving me with this:
`m6 <- glmer(STRSS ~ INFMX + CLASS + STRCT + (1 | SPKR), data = df,```` family = binomial(), nAGQ = 15) ```

2. However, the warnings persisted. As I've learnt, that's due to complete separation, that is, the fact that the frequency of some cells is 0 (that can be checked, for example, by

``````with(df, table(STRSS, INFMX))
``````

To solve this problem, a penalty can be placed on the fixed-effects coefficients available in this package:

``````library("GLMMadaptive")
This model produces no warnings, no errors:m7 <- mixed_model(STRSS ~ INFMX + CLASS + STRCT, random = ~ 1 | SPKR,
data = df, family = binomial(), nAGQ = 15, penalized = TRUE,
initial_values = list(betas = rep(0, 6)))

summary(m7)``````
``Curious to see how it plays out with the real data!``
``Chris``
--