I'm trying to model the placement of nuclear stress in 10-word turns from the BNC in a linear mixed model but am very new to mixed modeling. The model includes these variables:

- STRSS, the binary response variable; the 10-word turns have been selected in such a way that only 1 word carries the nuclear stress
- INFMX, a binary explanatory variable denoting whether a word carries the maximum informativity (i.e., 'surprisal' given the preceding word)
- CLASS, an explanatory variable with three levels: function word, interjection, or content word
- POST, an explanatory variable denoting whether the nuclear stress occurs early in the turn (words 1-3), in mid-turn position (words 4-6), or late in the turn (words 7-10)
- STRCT, an explanatory variable denoting whether the nuclear stress falls on a word inside what is called the turn constructional unit (TCU) or not
- SPKR, a random factor referring to speaker IDs, and
- SEQU, another random factor referring each word to its place in the sequence of exactly 10 words, considered random because only 10-word turns are examined here, not turns of other lengths

Here's some reproducible data:
df <- data.frame(
SPKR = c(rep("A", 10), rep("B", 10), rep("C", 10)),
SEQU = rep(1:10, 3),
STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
POST = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)
df
SPKR SEQU STRSS INFMX CLASS POST STRCT
1 A 1 notS notMax fnc earl notTCU
2 A 2 notS notMax fnc earl notTCU
3 A 3 notS notMax fnc earl TCU
4 A 4 notS notMax itj mid TCU
5 A 5 notS notMax cnt mid TCU
6 A 6 notS notMax cnt mid TCU
7 A 7 notS notMax cnt lte TCU
8 A 8 notS notMax cnt lte TCU
9 A 9 S priorMax cnt lte notTCU
10 A 10 notS Max cnt lte notTCU
11 B 1 notS notMax fnc earl notTCU
12 B 2 notS notMax fnc earl notTCU
13 B 3 notS notMax fnc earl TCU
14 B 4 notS notMax itj mid TCU
15 B 5 notS notMax cnt mid TCU
16 B 6 notS notMax cnt mid TCU
17 B 7 notS notMax cnt lte TCU
18 B 8 notS notMax cnt lte TCU
19 B 9 S priorMax cnt lte notTCU
20 B 10 notS Max cnt lte notTCU
21 C 1 notS notMax fnc earl notTCU
22 C 2 notS notMax fnc earl notTCU
23 C 3 notS notMax fnc earl TCU
24 C 4 notS notMax itj mid TCU
25 C 5 notS notMax cnt mid TCU
26 C 6 notS notMax cnt mid TCU
27 C 7 notS notMax cnt lte TCU
28 C 8 notS notMax cnt lte TCU
29 C 9 S priorMax cnt lte notTCU
30 C 10 notS Max cnt lte notTCU
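As a sanity check on the variable definitions, here is a minimal base-R sketch that confirms two things about the toy data: each turn has exactly one stressed word, and POST agrees with the position bins (words 1-3, 4-6, 7-10). It rebuilds only the columns it needs so that it runs on its own; the helper names `stress_per_turn`, `post_check` and `post_matches` are mine and purely illustrative.

```r
# Recreate the relevant columns of the example data so this runs standalone.
df <- data.frame(
  SPKR  = c(rep("A", 10), rep("B", 10), rep("C", 10)),
  SEQU  = rep(1:10, 3),
  STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
  POST  = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3)
)

# Number of stressed words per speaker (one turn each); should be 1 throughout.
stress_per_turn <- tapply(df$STRSS == "S", df$SPKR, sum)

# Re-derive POST from SEQU: (0,3] = early, (3,6] = mid, (6,10] = late.
post_check <- cut(df$SEQU, breaks = c(0, 3, 6, 10),
                  labels = c("earl", "mid", "lte"))
post_matches <- all(as.character(post_check) == df$POST)

print(stress_per_turn)
print(post_matches)
```

Note that in this toy data every speaker contributes the identical turn, so there is no between-speaker variation at all.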
My hypothesis is that a word will carry nuclear stress (i.e., df$STRSS=="S") if

- df$INFMX=="priorMax", i.e., the word with the greatest informativity immediately follows the word with the nuclear stress
- df$CLASS=="cnt", i.e., the word is a content word
- df$STRCT=="notTCU", i.e., the word lies inside the TCU
- df$POST=="lte", i.e., the word occurs late in the turn

Given that the response variable is binary, I've tried a generalized mixed model so far, using library("mlmRev"):
model1 <- glmer(STRSS ~ (INFMX + CLASS + POST + STRCT)^2 +
(1 | SPKR) + (1 | SEQU), data = df, family = binomial(link = "logit"), nAGQ = 1)
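One thing I am unsure about is how the character columns end up being coded. For a binomial family with a factor response, base R treats the first factor level as "failure" and everything else as "success", and the default level order comes from sorting the labels, which can depend on the locale. A minimal sketch, assuming the aim is to model the probability of "S" and that "notMax" is a sensible reference level for INFMX (both are my assumptions), sets the levels explicitly:

```r
# Recreate the two relevant columns so this runs standalone.
df <- data.frame(
  STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
  INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3)
)

# First level = "failure", so listing "notS" first makes "S" the modeled outcome.
df$STRSS <- factor(df$STRSS, levels = c("notS", "S"))

# Assumed reference level: "notMax" (the most frequent category).
df$INFMX <- factor(df$INFMX, levels = c("notMax", "priorMax", "Max"))

print(levels(df$STRSS))
print(table(df$STRSS))
```

With the levels fixed like this, the sign of each coefficient refers to the log-odds of a word being stressed rather than unstressed.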
The problems I'd appreciate help with are the following:
The model call produces some unpleasant information--what to make of it?
fixed-effect model matrix is rank deficient so dropping 19 columns /coefficients
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Hessian is numerically singular: parameters are not uniquely determined
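To get a feel for the rank-deficiency message, the fixed-effects design matrix can be inspected directly in base R, without fitting anything. The sketch below is only a diagnostic under the assumption that glmer builds its fixed-effects columns from the same formula; it compares the number of columns requested with the rank of the matrix and with the number of distinct predictor combinations in the data.

```r
# Recreate the predictors of the example data so this runs standalone.
df <- data.frame(
  INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
  CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
  POST  = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
  STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)

# Design matrix for all main effects plus all two-way interactions.
X <- model.matrix(~ (INFMX + CLASS + POST + STRCT)^2, data = df)

n_cols     <- ncol(X)          # coefficients the formula asks for
x_rank     <- qr(X)$rank       # linearly independent columns
n_patterns <- nrow(unique(df)) # distinct predictor combinations

print(c(columns = n_cols, rank = x_rank, patterns = n_patterns))
print(n_cols - x_rank)         # columns that cannot be estimated
```

The rank cannot exceed the number of distinct predictor combinations, so a formula with many interaction terms and few distinct rows will necessarily lose columns.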
And finally, how to read the output of the model summary?
summary(model1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: STRSS ~ (INFMX + CLASS + POST + STRCT)^2 + (1 | SPKR) + (1 | SEQU)
Data: df
AIC BIC logLik deviance df.resid
18.0 30.6 0.0 0.0 21
Scaled residuals:
Min 1Q Median 3Q Max
-1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08
Random effects:
Groups Name Variance Std.Dev.
SEQU (Intercept) 0.83102 0.9116
SPKR (Intercept) 0.05073 0.2252
Number of obs: 30, groups: SEQU, 10; SPKR, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.972e+01 7.249e+07 0 1
INFMXnotMax -4.107e-01 6.711e+07 0 1
INFMXpriorMax -7.929e+01 5.479e+07 0 1
CLASSfnc 3.565e-05 4.745e+07 0 1
CLASSitj 1.581e-06 4.745e+07 0 1
POSTlte 1.847e-05 3.875e+07 0 1
STRCTnotTCU -1.472e-05 4.745e+07 0 1
Correlation of Fixed Effects:
(Intr) INFMXnM INFMXpM CLASSf CLASSt POSTlt
INFMXnotMax -0.926
INFMXprirMx -0.378 0.408
CLASSfnc 0.218 -0.471 0.000
CLASSitj -0.218 0.000 0.000 0.333
POSTlte -0.535 0.289 0.000 0.408 0.408
STRCTnotTCU -0.655 0.707 0.000 -0.667 0.000 0.000
fit warnings:
fixed-effect model matrix is rank deficient so dropping 19 columns / coefficients
convergence code: 0
unable to evaluate scaled gradient
Hessian is numerically singular: parameters are not uniquely determined
Warning messages:
1: In vcov.merMod(object, use.hessian = use.hessian) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
2: In vcov.merMod(object, correlation = correlation, sigm = sig) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
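The standard errors on the order of 1e+07, the z values of 0 and the deviance of 0.0 look to me like what complete separation would produce, i.e., a predictor that predicts the response perfectly. I am not certain that this is the whole story, but a quick cross-tabulation in base R shows whether STRSS is fully determined by INFMX in the toy data:

```r
# Recreate the two relevant columns so this runs standalone.
df <- data.frame(
  STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
  INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3)
)

# Rows: response; columns: informativity category.
tab <- table(STRSS = df$STRSS, INFMX = df$INFMX)
print(tab)

# TRUE if every INFMX category contains only one value of STRSS.
separated <- all(apply(tab, 2, function(col) sum(col > 0) == 1))
print(separated)
```

If `separated` is TRUE, the maximum likelihood estimates for the corresponding coefficients are not finite, which would be consistent with the huge estimates and standard errors in the summary.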
I'm quite aware that this post is demanding a lot. Helpful pointers are appreciated all the more!