As suggested by Stefan, I'm duplicating this post from corpling-with-r to statforling-with-r.
I'm trying to model the placement of nuclear stress in 10-word turns in a linear mixed model but am *very* new to mixed modeling. The model includes these variables:
STRSS, the binary response variable; the 10-word turns have been selected in such a way that only 1 word carries the nuclear stress
INFMX, a binary explanatory variable denoting whether a word carries the maximum informativity (i.e., 'surprisal' given the preceding word)
CLASS, an explanatory variable with three levels: function word, interjection, or content word
PSTN, an explanatory variable denoting whether the nuclear stress occurs early in the turn (words 1-3), in mid-turn position (words 4-6), or late in the turn (words 7-10)
STRCT, an explanatory variable denoting whether the nuclear stress falls on a word inside what is called the turn constructional unit (TCU) or not
SPKR, a random factor referring to speaker IDs, and
SEQU, another random factor referring each word to its place in the sequence of exactly 10 words, considered random because only 10-word turns are examined here, not turns of other lengths
Here's some reproducible data:
df <- data.frame(
SPKR = c(rep("A", 10), rep("B", 10), rep("C", 10)),
SEQU = rep(1:10, 3),
STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
PSTN = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)
df
SPKR SEQU STRSS INFMX CLASS PSTN STRCT
1 A 1 notS notMax fnc earl notTCU
2 A 2 notS notMax fnc earl notTCU
3 A 3 notS notMax fnc earl TCU
4 A 4 notS notMax itj mid TCU
5 A 5 notS notMax cnt mid TCU
6 A 6 notS notMax cnt mid TCU
7 A 7 notS notMax cnt lte TCU
8 A 8 notS notMax cnt lte TCU
9 A 9 S priorMax cnt lte notTCU
10 A 10 notS Max cnt lte notTCU
11 B 1 notS notMax fnc earl notTCU
12 B 2 notS notMax fnc earl notTCU
13 B 3 notS notMax fnc earl TCU
14 B 4 notS notMax itj mid TCU
15 B 5 notS notMax cnt mid TCU
16 B 6 notS notMax cnt mid TCU
17 B 7 notS notMax cnt lte TCU
18 B 8 notS notMax cnt lte TCU
19 B 9 S priorMax cnt lte notTCU
20 B 10 notS Max cnt lte notTCU
21 C 1 notS notMax fnc earl notTCU
22 C 2 notS notMax fnc earl notTCU
23 C 3 notS notMax fnc earl TCU
24 C 4 notS notMax itj mid TCU
25 C 5 notS notMax cnt mid TCU
26 C 6 notS notMax cnt mid TCU
27 C 7 notS notMax cnt lte TCU
28 C 8 notS notMax cnt lte TCU
29 C 9 S priorMax cnt lte notTCU
30 C 10 notS Max cnt lte notTCU
My hypothesis is that a word will carry nuclear stress (i.e., df$STRSS=="S") if
df$INFMX=="priorMax", i.e., the word with the greatest informativity immediately follows the word with the nuclear stress
df$CLASS=="cnt", i.e., the word is a content word
df$STRCT=="notTCU", i.e., the word lies outside the TCU
df$PSTN=="lte", i.e., the word occurs late in the turn
Given that the response variable is binary, I've tried a generalized linear mixed model so far, using library("mlmRev"):
model1 <- glmer(STRSS ~ (INFMX + CLASS + PSTN + STRCT)^2 +
(1 | SPKR) + (1 | SEQU), data = df, family = binomial(link = "logit"), nAGQ = 1)
The problems I'd appreciate help with are the following:
The model call produces some unpleasant information--what to make of it?
fixed-effect model matrix is rank deficient so dropping 19 columns / coefficients
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Hessian is numerically singular: parameters are not uniquely determined
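One way to see where the rank-deficiency message comes from is to build the fixed-effects design matrix directly and compare its number of columns with its rank. The sketch below uses only base R (model.matrix() and qr()) on the toy data from above; the names X, n_cols, rank_X and n_combos are made up for illustration, and this is a diagnostic sketch, not a description of what glmer does internally.

```r
# Toy data as posted above
df <- data.frame(
  SPKR  = c(rep("A", 10), rep("B", 10), rep("C", 10)),
  SEQU  = rep(1:10, 3),
  STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
  INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
  CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
  PSTN  = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
  STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)

# Design matrix for the fixed-effects part only:
# all main effects plus all two-way interactions
X <- model.matrix(~ (INFMX + CLASS + PSTN + STRCT)^2, data = df)

n_cols <- ncol(X)       # number of coefficients the formula asks for
rank_X <- qr(X)$rank    # number of coefficients the data can identify

# Columns that cannot be estimated separately
n_cols - rank_X

# Number of distinct predictor combinations actually observed
n_combos <- nrow(unique(df[, c("INFMX", "CLASS", "PSTN", "STRCT")]))
n_combos
```

The rank can never exceed the number of distinct predictor combinations in the data, so with only a handful of combinations most interaction columns are linear combinations of others. The difference n_cols - rank_X should correspond to the number of dropped columns reported in the message.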
And finally, how to read the output of the model summary?
summary(model1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: STRSS ~ (INFMX + CLASS + PSTN + STRCT)^2 + (1 | SPKR) + (1 | SEQU)
Data: df
AIC BIC logLik deviance df.resid
18.0 30.6 0.0 0.0 21
Scaled residuals:
Min 1Q Median 3Q Max
-1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08
Random effects:
Groups Name Variance Std.Dev.
SEQU (Intercept) 0.83102 0.9116
SPKR (Intercept) 0.05073 0.2252
Number of obs: 30, groups: SEQU, 10; SPKR, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.972e+01 7.249e+07 0 1
INFMXnotMax -4.107e-01 6.711e+07 0 1
INFMXpriorMax -7.929e+01 5.479e+07 0 1
CLASSfnc 3.565e-05 4.745e+07 0 1
CLASSitj 1.581e-06 4.745e+07 0 1
PSTNlte 1.847e-05 3.875e+07 0 1
STRCTnotTCU -1.472e-05 4.745e+07 0 1
Correlation of Fixed Effects:
(Intr) INFMXnM INFMXpM CLASSf CLASSt PSTNlt
INFMXnotMax -0.926
INFMXprirMx -0.378 0.408
CLASSfnc 0.218 -0.471 0.000
CLASSitj -0.218 0.000 0.000 0.333
PSTNlte -0.535 0.289 0.000 0.408 0.408
STRCTnotTCU -0.655 0.707 0.000 -0.667 0.000 0.000
fit warnings:
fixed-effect model matrix is rank deficient so dropping 19 columns / coefficients
convergence code: 0
unable to evaluate scaled gradient
Hessian is numerically singular: parameters are not uniquely determined
Warning messages:
1: In vcov.merMod(object, use.hessian = use.hessian) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
2: In vcov.merMod(object, correlation = correlation, sigm = sig) :
variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
I'm quite aware that this post is demanding a lot. Helpful pointers are appreciated all the more!
m6 <- glmer(STRSS ~ INFMX + CLASS + STRCT + (1 | SPKR), data = df,
family = binomial(), nAGQ = 15)
2. However, the warnings persisted. As I've learnt, that's due to complete separation, that is, the fact that the frequency of some cells is 0, which can be checked, for example, with:
with(df, table(STRSS, INFMX))
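The same check can be run for every predictor at once. The following is a minimal base-R sketch on the toy data; the names predictors, tabs and has_zero are illustrative only. A zero cell is a hint of separation rather than proof, so the tables themselves are worth inspecting.

```r
# Toy data as posted earlier in the thread
df <- data.frame(
  SPKR  = c(rep("A", 10), rep("B", 10), rep("C", 10)),
  SEQU  = rep(1:10, 3),
  STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
  INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
  CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
  PSTN  = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
  STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)

predictors <- c("INFMX", "CLASS", "PSTN", "STRCT")

# One response-by-predictor cross-tabulation per predictor
tabs <- lapply(predictors, function(p) table(STRSS = df$STRSS, df[[p]]))
names(tabs) <- predictors

# Flag predictors whose table contains at least one empty cell
has_zero <- sapply(tabs, function(tb) any(tb == 0))

tabs$INFMX   # inspect a single table
has_zero     # overview across all predictors
```

In the toy data, every word with STRSS == "S" has INFMX == "priorMax" and no other word does, so INFMX alone predicts the response perfectly.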
To solve this problem, a penalty can be placed on the fixed-effects coefficients, an option available in this package:
library("GLMMadaptive")
This model produces no warnings, no errors:
m7 <- mixed_model(STRSS ~ INFMX + CLASS + STRCT, random = ~ 1 | SPKR,
data = df, family = binomial(), nAGQ = 15, penalized = TRUE,
initial_values = list(betas = rep(0, 6)))
summary(m7)
Curious to see how it plays out with the real data!
Chris