21 views

Skip to first unread message

Jan 10, 2019, 1:15:15 PM1/10/19

to statforli...@googlegroups.com

Hi all,

As suggested by Stefan, I'm duplicating this post from corpling-with-r to statforling-with-r.

I'm trying to model the placement of nuclear stress in 10-word turns in a linear mixed model but am *very* new to mixed modeling. The model includes these variables:

`STRSS`

, the binary response variable; the 10-word turns have been selected in such a way that only 1 word carries the nuclear stress`INFMX`

, a binary explanatory variable denoting whether a word carries the maximum informativity (i.e., 'surprisal' given the preceding word)`CLASS`

, an explanatory variable with three levels: function word, interjection, or content word`PSTN`

, an explanatory variable denoting whether the nuclear stress occurs early in the turn (words 1-3), in mid-turn position (words 4-6), or late in the turn (words 7-10)`STRCT`

, an explanatory variable denoting whether the nuclear stress falls on a word inside what is called the turn constructional unit (TCU) or not`SPKR`

, a random factor referring to speaker IDs, and`SEQU`

, another random factor referring each word to its place in the sequence of exactly 10 words, considered random because only 10-word turns are examined here, not turns of other lengths

Here's some reproducible data:

```
df <- data.frame(
SPKR = c(rep("A", 10), rep("B", 10), rep("C", 10)),
SEQU = rep(1:10, 3),
STRSS = rep(c(rep("notS", 8), "S", "notS"), 3),
INFMX = rep(c(rep("notMax", 8), "priorMax", "Max"), 3),
CLASS = rep(c(rep("fnc", 3), rep("itj", 1), rep("cnt", 6)), 3),
PSTN = rep(c(rep("earl", 3), rep("mid", 3), rep("lte", 4)), 3),
STRCT = rep(c(rep("notTCU", 2), rep("TCU", 6), rep("notTCU", 2)), 3)
)
df
SPKR SEQU STRSS INFMX CLASS
````PSTN`

STRCT
1 A 1 notS notMax fnc earl notTCU
2 A 2 notS notMax fnc earl notTCU
3 A 3 notS notMax fnc earl TCU
4 A 4 notS notMax itj mid TCU
5 A 5 notS notMax cnt mid TCU
6 A 6 notS notMax cnt mid TCU
7 A 7 notS notMax cnt lte TCU
8 A 8 notS notMax cnt lte TCU
9 A 9 S priorMax cnt lte notTCU
10 A 10 notS Max cnt lte notTCU
11 B 1 notS notMax fnc earl notTCU
12 B 2 notS notMax fnc earl notTCU
13 B 3 notS notMax fnc earl TCU
14 B 4 notS notMax itj mid TCU
15 B 5 notS notMax cnt mid TCU
16 B 6 notS notMax cnt mid TCU
17 B 7 notS notMax cnt lte TCU
18 B 8 notS notMax cnt lte TCU
19 B 9 S priorMax cnt lte notTCU
20 B 10 notS Max cnt lte notTCU
21 C 1 notS notMax fnc earl notTCU
22 C 2 notS notMax fnc earl notTCU
23 C 3 notS notMax fnc earl TCU
24 C 4 notS notMax itj mid TCU
25 C 5 notS notMax cnt mid TCU
26 C 6 notS notMax cnt mid TCU
27 C 7 notS notMax cnt lte TCU
28 C 8 notS notMax cnt lte TCU
29 C 9 S priorMax cnt lte notTCU
30 C 10 notS Max cnt lte notTCU

My hypothesis is that a word will carry nuclear stress (i.e., `df$STRSS=="S"`

) if

`df$INFMX=="priorMAX"`

, i.e., the word with the greatest informativity immediately follows the word with the nuclear stress`df$CLASS=="cnt"`

, i.e., the word is a content word`df$STRCT=="notTCU"`

, i.e., the word lies inside the TCU`df$`

`PSTN`

`=="lte"`

, i.e., the word occurs late in the turn

Given that the response variable is binary, I've tried a generalized mixed model so far, using `library("mlmRev")`

:

`model1 <- glmer(STRSS ~ (INFMX + CLASS + ``PSTN`

+ STRCT)^2 +
(1 | SPKR) + (1 | SEQU), data = df, family = binomial(link = "logit"), nAGQ = 1)

The problems I'd appreciate help with are the following:

- Is this the right approach? I.e., is this, at least in principle, the right model?
The model call produces some unpleasant information--what to make of it?

`fixed-effect model matrix is rank deficient so dropping 19 columns /coefficients Warning messages: 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Hessian is numerically singular: parameters are not uniquely determined`

And finally, how to read the output of the model summary?

`summary(model1) Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: binomial ( logit ) Formula: STRSS ~ (INFMX + CLASS +`

`PSTN`

+ STRCT)^2 + (1 | SPKR) + (1 | SEQU) Data: df AIC BIC logLik deviance df.resid 18.0 30.6 0.0 0.0 21 Scaled residuals: Min 1Q Median 3Q Max -1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 Random effects: Groups Name Variance Std.Dev. SEQU (Intercept) 0.83102 0.9116 SPKR (Intercept) 0.05073 0.2252 Number of obs: 30, groups: SEQU, 10; SPKR, 3 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.972e+01 7.249e+07 0 1 INFMXnotMax -4.107e-01 6.711e+07 0 1 INFMXpriorMax -7.929e+01 5.479e+07 0 1 CLASSfnc 3.565e-05 4.745e+07 0 1 CLASSitj 1.581e-06 4.745e+07 0 1 PSTNlte 1.847e-05 3.875e+07 0 1 STRCTnotTCU -1.472e-05 4.745e+07 0 1 Correlation of Fixed Effects: (Intr) INFMXnM INFMXpM CLASSf CLASSt PSTNlt INFMXnotMax -0.926 INFMXprirMx -0.378 0.408 CLASSfnc 0.218 -0.471 0.000 CLASSitj -0.218 0.000 0.000 0.333 PSTNlte -0.535 0.289 0.000 0.408 0.408 STRCTnotTCU -0.655 0.707 0.000 -0.667 0.000 0.000 fit warnings: fixed-effect model matrix is rank deficient so dropping 19 columns / coefficients convergence code: 0 unable to evaluate scaled gradient Hessian is numerically singular: parameters are not uniquely determined Warning messages: 1: In vcov.merMod(object, use.hessian = use.hessian) : variance-covariance matrix computed from finite-difference Hessian is not positive definite or contains NA values: falling back to var-cov estimated from RX 2: In vcov.merMod(object, correlation = correlation, sigm = sig) : variance-covariance matrix computed from finite-difference Hessian is not positive definite or contains NA values: falling back to var-cov estimated from RX

I'm quite aware that this post is demanding a lot. Helpful pointers are appreciated all the more!

Best

Chris

--

Jan 10, 2019, 2:51:40 PM1/10/19

to StatForLing with R

First small comments/questions:

- why did you use that package and not lme4? mlmRev seems to have not been developed since 2011 and the new routines and capabilities go write beyond what was possible back then.

- given your expectation that a certain interaction will be significant, most ppl would probably say that you need a more elaborate random-effects structure.

- not sure that SEQU should be random factor.

As for how to read the output, that's something that would require a whole chapter so I'll just refer you to, eg, Gelman & Hill, Zuur et al., and Fields et al.

Cheers,

Cheers,

STG

--

Stefan Th. Gries

----------------------------------

UC Santa Barbara & JLU Giessen

http://tinyurl.com/stgries

----------------------------------

--

Stefan Th. Gries

----------------------------------

UC Santa Barbara & JLU Giessen

http://tinyurl.com/stgries

----------------------------------

Jan 10, 2019, 4:28:48 PM1/10/19

to statforli...@googlegroups.com

Well, being a bloody beginner in mixed-effects modelling I've relied on what (little) input I could get hold of and digest ...

Not sure either that SEQU should be a random factor; could probably be left out or be used as a fixed effect.

As to your refs, more complete refs would be helpful ...

--

You received this message because you are subscribed to the Google Groups "StatForLing with R" group.

To unsubscribe from this group and stop receiving emails from it, send an email to statforling-wit...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Jan 10, 2019, 4:35:54 PM1/10/19

to statforli...@googlegroups.com

and what would that "more elaborate random-effects structure" look like?

Jan 11, 2019, 6:37:18 AM1/11/19

to statforli...@googlegroups.com

With a little help from another forum, Cross Validated, I seem to have found at least an initial solution:

1. First, I've got rid of SEQU and PSTN, leaving me with this:

`m6 <- glmer(STRSS ~ INFMX + CLASS + STRCT + (1 | SPKR), data = df,`

```
family = binomial(), nAGQ = 15)
```

2. However, the warnings persisted. As I've learnt, that's due to complete separation, that is, the fact that the frequency of some cells is 0 (that can be checked, for example, by

```
with(df, table(STRSS, INFMX))
```

To solve this problem, a penalty can be placed on the fixed-effects coefficients available in this package:

```
library("GLMMadaptive")
```

`This model produces no warnings, no errors:`

m7 <- mixed_model(STRSS ~ INFMX + CLASS + STRCT, random = ~ 1 | SPKR,
data = df, family = binomial(), nAGQ = 15, penalized = TRUE,
initial_values = list(betas = rep(0, 6)))
summary(m7)

`Curious to see how it plays out with the real data!`

`Chris`

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu