Interaction between latent variable


JB

May 10, 2013, 11:53:17 PM
to lav...@googlegroups.com
Dear All,
I am trying to replicate an Mplus example of the quasi-ML method for estimating an interaction between latent variables (the reference is at the end). The model is very simple: two latent factors, A (defined by X1 and X2) and B (defined by W1 and W2), predict Y (an observed variable). The goal is to estimate the interaction effect between A and B. The Mplus syntax looks like this:
VARIABLE: NAMES ARE X1 X2 W1 W2 Y;
  ANALYSIS: TYPE = RANDOM; ALGORITHM = INTEGRATION;
  MODEL:    A BY X1 X2;
            B BY W1 W2;
            AB | A XWITH B;
            Y ON A B AB;
My first attempt in lavaan was:
Mod <- '
  A =~ X1 + X2
  B =~ W1 + W2
  AB := A*B
  Y ~ A + B + AB
'
Fit <- sem(Mod, data = data, estimator = "MLR")  # MLR only because it was used in the Mplus output
The first error I get is that I can't define AB as the product of A and B. I think this is because only predefined labels can be used in lavaan to define a new term, but I don't know how to label a latent factor (I can only label parameters on the right-hand side of the equations). And I guess I would have other problems even if I could do that, because I am not sure that a method for estimating the interaction between two latent variables is implemented in lavaan, or what the syntax would be. I have not included data, but because the model is very simple, I guess the HolzingerSwineford1939 data set will be OK.
Many Thanks!
JB

Klein, A. G., & Muthén, B. O. (2007). Quasi-maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647-673.

yrosseel

May 11, 2013, 4:32:39 AM
to lav...@googlegroups.com
On 05/11/2013 05:53 AM, JB wrote:
> Dear All,
> I am trying to replicate a Mplus example of Quasi-ML method to estimate
> an interaction between latent variables (the reference is at the end).

I'm afraid this has not been implemented in lavaan (0.5-12) yet. You
could try the Kenny & Judd (1984) approach, where you need to
(manually) create product terms of the factor indicators (those that are
involved in the interaction) in the data.frame, and where you need to
specify a number of constraints in the model syntax.
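For anyone arriving here later, a rough sketch of the product-indicator idea in lavaan syntax (simulated data with made-up names; the full Kenny & Judd model additionally imposes nonlinear constraints on the product-term loadings and variances, which are omitted here):

```r
# Kenny & Judd style sketch: build product indicators by hand, then treat
# them as indicators of a latent interaction factor.
set.seed(123)
n <- 500
a <- rnorm(n); b <- rnorm(n)
dat <- data.frame(X1 = a + rnorm(n), X2 = a + rnorm(n),
                  W1 = b + rnorm(n), W2 = b + rnorm(n))
dat$Y <- a + b + 0.5 * a * b + rnorm(n)

# step 1: product indicators, created manually in the data.frame
dat$X1W1 <- dat$X1 * dat$W1
dat$X2W2 <- dat$X2 * dat$W2

# step 2: a latent interaction factor measured by the products
mod <- '
  A  =~ X1 + X2
  B  =~ W1 + W2
  AB =~ X1W1 + X2W2
  Y  ~ A + B + AB
'
# fit <- sem(mod, data = dat, estimator = "MLR")  # requires lavaan
```

Without the Kenny-Judd constraints this is only an approximation; the centering strategies discussed below are what make the unconstrained version defensible.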

Yves.

Alex Schoemann

May 12, 2013, 2:45:25 PM
to lav...@googlegroups.com
Alternatively, you could use either double mean centering (Lin et al., 2010) or residual centering (Little et al., 2006) to estimate interactions. The indProd() function in semTools can help implement these approaches.

-Alex

Lin, G. C., Wen, Z., Marsh, H. W., & Lin, H. S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17, 374-391.

Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13, 497-519.
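To make the double-mean-centering recipe concrete, here is a base-R sketch of the three steps (indProd() in semTools automates this; the variable names are made up):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                w1 = rnorm(100), w2 = rnorm(100))

# step 1: mean-center the indicators
dc <- as.data.frame(scale(d, center = TRUE, scale = FALSE))

# step 2: form the product indicators from the centered variables
prods <- data.frame(x1.w1 = dc$x1 * dc$w1,
                    x2.w2 = dc$x2 * dc$w2)

# step 3: mean-center the products themselves (the "double" in double
# mean centering), so their sample means are exactly zero
prods <- as.data.frame(scale(prods, center = TRUE, scale = FALSE))

round(colMeans(prods), 12)  # both zero
```

Step 3 is what distinguishes double mean centering from ordinary mean centering of the indicators.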

Markus Brauer

Nov 25, 2013, 10:44:29 AM
to lav...@googlegroups.com

Hi,

Are there any simple procedures in lavaan that allow us to estimate interactive effects of latent variables? It is possible, of course, to use the Kenny-Judd method to specify the latent product variables "by hand", and to estimate the product indicators and the variances of the latent product variables. However, product variables are generally not normally distributed even if their component variables are normally distributed. I recently read that Klein and colleagues suggested approaches that take into account the degree of non-normality implied by the latent product terms. Klein and Moosbrugger (2000) suggested the "latent moderated structural equations method", which uses a form of the expectation-maximization (EM) algorithm. Klein and Muthén (2006) suggested the "quasi-maximum likelihood estimation method", which uses a simpler algorithm but closely approximates results of the former method. Have any of these approaches been incorporated in lavaan?

Is the Lin et al. (2010) approach of orthogonalizing and double-mean-centering the latest state of the art?

Best wishes,

 -- Markus


Alex Schoemann

Nov 25, 2013, 12:10:21 PM
to lav...@googlegroups.com
If you have software that performs the LMS/QML methods proposed by Klein and colleagues (i.e., Mplus), that would be your best bet. However, both double mean centering and orthogonalizing have been shown to perform well when estimating interactions among latent variables (and are much easier to use than the Kenny-Judd/Jöreskog-Yang methods).

Markus Brauer

Dec 3, 2013, 2:51:29 PM
to lav...@googlegroups.com

So just to make sure I understand: (1) The LMS/QML methods proposed by Klein and colleagues have not yet been implemented in lavaan. (2) Given that I want to use lavaan, I should be using either double mean centering or orthogonalizing. Correct?  -- M


yrosseel

Dec 5, 2013, 3:35:52 AM
to lav...@googlegroups.com
On 12/03/2013 08:51 PM, Markus Brauer wrote:
>
> So just to make sure I understand: (1) The LMS/QML methods proposed by
> Klein and colleagues have not yet been implemented in lavaan.

Correct.

(2) Given
> that I want to use lavaan, I should be using either double mean
> centering or orthogonalizing. Correct?

Correct. And although it is a bit more work to set up (but see the 'indProd()'
function in the semTools package), this works really well.

Yves.


Mark Seeto

Dec 10, 2014, 8:34:43 PM
to lav...@googlegroups.com
This isn't a lavaan question, but a question about the Lin et al. (2010) double mean centering method. Near the bottom of p. 378 of the article, why are there only 3 product terms (x1x4, x2x5, x3x6)? Why aren't all 9 possible products of x1,x2,x3 with x4,x5,x6 included?

Thanks,
Mark

Edward Rigdon

Dec 10, 2014, 10:14:39 PM
to lav...@googlegroups.com
Mark--
     I first saw this notion in a paper by Herb Marsh et al. It is called "matching": multiplying the first indicator of A by the first indicator of B, the second by the second, the third by the third, and so on. Formally, it was "the best times the best, the second best times the second best...". The motivation for doing this, rather than multiplying all by all, is to limit the nonnormality in the model. Each product term will be nonnormal even if the original indicators were normal. Marsh et al. demonstrated that little was lost by using only the matched product indicators, and you gain a reduction in nonnormality. By the way, if you have not discovered it yet, check out the indProd function in the semTools package. You can request both double mean centering and matching, and the function does it just like that. Remember to use MLR as your estimator, due to the nonnormality that results from using product indicators.
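A sketch of what the matched-pair model looks like in lavaan syntax (indicator names follow the Lin et al. example; y1-y3 are hypothetical outcome indicators, and the products are assumed to have been created already, e.g. with indProd(..., match = TRUE)):

```r
# matched-pair latent interaction: 3 products instead of all 9,
# which limits the nonnormality introduced by the product terms
mod.match <- '
  A  =~ x1 + x2 + x3
  B  =~ x4 + x5 + x6
  AB =~ x1.x4 + x2.x5 + x3.x6   # matched pairs only
  Y  =~ y1 + y2 + y3
  Y  ~ A + B + AB
'
# fit <- sem(mod.match, data = d, estimator = "MLR")  # requires lavaan
```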
--Ed Rigdon


Mark Seeto

Dec 10, 2014, 10:28:41 PM
to lav...@googlegroups.com
Thank you very much for your helpful reply, Ed. Since I have ordinal variables, I assume it will be OK to use the WLSMV estimator and look at the "Robust" column.

Alex Schoemann

Dec 11, 2014, 10:07:29 AM
to lav...@googlegroups.com
Ed is right in describing the matched-pair strategy, but it should be used with caution. While including only matched pairs limits non-normality in the model, it makes a (fairly strong) assumption of tau equivalence (all indicators of a latent variable have equal factor loadings). As a result, if items are not tau equivalent, different matched-pair groupings can lead to different results (see Foldnes & Hagtvet, 2014, for a much fuller discussion of this topic).

Foldnes, N., & Hagtvet, K. A. (2014). The choice of product indicators in latent variable interaction models: Post hoc analyses. Psychological Methods, 3, 447-457.

Mark Seeto

Dec 11, 2014, 3:41:58 PM
to lav...@googlegroups.com
Thanks Alex, that article was very helpful, and it was nice to see that it used lavaan.

Mark Seeto

Dec 14, 2014, 6:18:22 PM
to lav...@googlegroups.com
I've tried using the "all pairs" approach to estimating an interaction between latent variables. When the true interaction (beta3 in the example below) is 0, the estimate is close to 0, but when the true interaction is non-zero (0.3 in the example), the estimate appears to be biased.

## Example:
library(lavaan)
library(semTools)
library(mvtnorm)

n <- 10000  # sample size

rho <- 0.3  # correlation between exogenous latent variables

beta1 <- 0.7  # F3 ~ F1 coefficient
beta2 <- 0.5  # F3 ~ F2 coefficient
beta3 <- 0.3  # F3 ~ F1F2 coefficient

F <- rmvnorm(n, sigma = matrix(c(1, rho, rho, 1), nrow=2))

F3 <- beta1*F[, 1] + beta2*F[, 2] + beta3*F[, 1]*F[, 2] + rnorm(n, 0, 0.1)

d <- data.frame(x1 = 0.8*F[, 1] + rnorm(n, 0, 0.2),
                x2 = 0.7*F[, 1] + rnorm(n, 0, 0.2),
                x3 = 0.6*F[, 1] + rnorm(n, 0, 0.2),
                x4 = 0.8*F[, 2] + rnorm(n, 0, 0.2),
                x5 = 0.7*F[, 2] + rnorm(n, 0, 0.2),
                x6 = 0.6*F[, 2] + rnorm(n, 0, 0.2),
                x7 = 0.8*F3 + rnorm(n, 0, 0.2),
                x8 = 0.7*F3 + rnorm(n, 0, 0.2),
                x9 = 0.6*F3 + rnorm(n, 0, 0.2))

d <- scale(d, center=TRUE, scale=FALSE)

d <- indProd(d, 1:3, 4:6, match=FALSE)

# Model A: no correlations between residuals of product indicators
modelA <- '
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F1.F2 =~ x1.x4 + x1.x5 + x1.x6 + x2.x4 + x2.x5 + x2.x6 + x3.x4 + x3.x5 + x3.x6
F3 =~ x7 + x8 + x9
F3 ~ F1 + F2 + F1.F2
'

# Model B: correlations between residuals of product indicators having a common component
modelB <- '
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F1.F2 =~ x1.x4 + x1.x5 + x1.x6 + x2.x4 + x2.x5 + x2.x6 + x3.x4 + x3.x5 + x3.x6
F3 =~ x7 + x8 + x9
F3 ~ F1 + F2 + F1.F2
x1.x4 ~~ x1.x5
x1.x4 ~~ x1.x6
x1.x4 ~~ x2.x4
x1.x4 ~~ x3.x4
x1.x5 ~~ x1.x6
x1.x5 ~~ x2.x5
x1.x5 ~~ x3.x5
x1.x6 ~~ x2.x6
x1.x6 ~~ x3.x6
x2.x4 ~~ x2.x5
x2.x4 ~~ x2.x6
x2.x4 ~~ x3.x4
x2.x5 ~~ x2.x6
x2.x5 ~~ x3.x5
x2.x6 ~~ x3.x6
x3.x4 ~~ x3.x5
x3.x4 ~~ x3.x6
x3.x5 ~~ x3.x6
'

semA <- sem(modelA, data = d, estimator = "MLR")
semB <- sem(modelB, data = d, estimator = "MLR")

summary(semA)
## Regressions:
##   F3 ~
##     F1                0.698    0.003  205.030    0.000
##     F2                0.506    0.003  161.722    0.000
##     F1.F2             0.369    0.004  100.064    0.000

summary(semB)
## Regressions:
##   F3 ~
##     F1                0.699    0.003  204.940    0.000
##     F2                0.506    0.003  161.591    0.000
##     F1.F2             0.380    0.004   98.997    0.000

The estimates of beta1 and beta2 appear to be unbiased, but the estimate of beta3 appears to be biased. Is this to be expected, or am I doing something incorrectly?

Thanks,
Mark


Edward Rigdon

Dec 14, 2014, 9:15:26 PM
to lav...@googlegroups.com

Mark—

     Be sure to allow for an intercept for the outcome factor. If I recall correctly, Marsh et al. specified that, even with centering as you have done, the mean of the interaction factor will equal the covariance of the main-effect factors, leading to a nonzero intercept for the outcome variable. Alternatively, try this with F1 and F2 uncorrelated.

--Ed Rigdon  


Mark Seeto

Dec 14, 2014, 9:56:06 PM
to lav...@googlegroups.com
Thanks for your reply, Ed.

Is an intercept included by adding 'F3 ~ 1' to the model specification? When I use

modelB <- '
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F1.F2 =~ x1.x4 + x1.x5 + x1.x6 + x2.x4 + x2.x5 + x2.x6 + x3.x4 + x3.x5 + x3.x6
F3 =~ x7 + x8 + x9
F3 ~ F1 + F2 + F1.F2
x1.x4 ~~ x1.x5
x1.x4 ~~ x1.x6
x1.x4 ~~ x2.x4
x1.x4 ~~ x3.x4
x1.x5 ~~ x1.x6
x1.x5 ~~ x2.x5
x1.x5 ~~ x3.x5
x1.x6 ~~ x2.x6
x1.x6 ~~ x3.x6
x2.x4 ~~ x2.x5
x2.x4 ~~ x2.x6
x2.x4 ~~ x3.x4
x2.x5 ~~ x2.x6
x2.x5 ~~ x3.x5
x2.x6 ~~ x3.x6
x3.x4 ~~ x3.x5
x3.x4 ~~ x3.x6
x3.x5 ~~ x3.x6
F3 ~ 1
'
semB <- sem(modelB, data = d, estimator = "MLR")

I get:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.

When I use the original model specification but with rho being 0 instead of 0.3, the estimate of the interaction effect is about the same as before, i.e. too high.

Thanks for your help.

Mark

Edward Rigdon

Dec 14, 2014, 10:12:54 PM
to lav...@googlegroups.com

Mark—

     OK, the mean structure is already saturated—I guess with free intercepts for all the observed variables—and you have multiple indicators for the outcome factor.  Sorry to waste your time on that wild goose chase.

     If you are willing to risk another one, try the double mean centering option in the indProd function. Factor interactions are an odd duck. In most cases, mean structure and covariance structure operate independently, and you can safely ignore the mean structure while modeling the covariance structure. But the two become entwined in factor interactions. Double mean centering ought to remove this complication.

--Ed Rigdon


Mark Seeto

Dec 14, 2014, 10:25:45 PM
to lav...@googlegroups.com
Apology not necessary, Ed. I really appreciate your help with this.

Do you mean use doubleMC=TRUE in indProd? I thought that was already the default (and I've verified that the sample means of the resulting products are zero).

Are you confident that structural equation modelling should be able to give an unbiased estimate of the interaction (at least when the sample size is large)?

Mark



Edward Rigdon

Dec 14, 2014, 10:49:18 PM
to lav...@googlegroups.com

Mark—

     Yes, I am confident that a correctly specified factor interaction model will yield a consistent estimate of the coefficients.  There are a range of different methods—not all easily available in R—but comparisons across methods focus on statistical efficiency, because all of them yield consistency.

--Ed Rigdon


Njål Foldnes

Dec 15, 2014, 5:59:21 PM
to lav...@googlegroups.com
lavaan fixes the first factor loading of each latent variable to one, so change the first loading of each factor (for x1, x4 and x7) in the data-generating process to one, so that the factors in the fitted model are scaled in line with the data-generating process.

I then get: 

d <- data.frame(x1 = 1*F[, 1] + rnorm(n, 0, 0.2),
                x2 = 0.7*F[, 1] + rnorm(n, 0, 0.2),
                x3 = 0.6*F[, 1] + rnorm(n, 0, 0.2),
                x4 = 1*F[, 2] + rnorm(n, 0, 0.2),
                x5 = 0.7*F[, 2] + rnorm(n, 0, 0.2),
                x6 = 0.6*F[, 2] + rnorm(n, 0, 0.2),
                x7 = 1*F3 + rnorm(n, 0, 0.2),
                x8 = 0.7*F3 + rnorm(n, 0, 0.2),
                x9 = 0.6*F3 + rnorm(n, 0, 0.2))

With this, I get
F3 ~ F1      0.694  0.003  250.157  0  0.689  0.700
F3 ~ F2      0.494  0.003  190.876  0  0.489  0.499
F3 ~ F1.F2   0.301  0.002  129.987  0  0.297  0.306
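A side note on scaling, since this trips people up: lavaan's default "marker" identification (first loading fixed to 1) can be written out explicitly, and the alternative is to free the loadings and fix the factor variance instead (std.lv = TRUE does this globally). The two give the same fit but put the coefficients on different metrics, which is why the estimates only match the data-generating betas when the marker loading matches the generating loading. A sketch:

```r
# default marker method, written out explicitly:
mod.marker <- '
  F1 =~ 1*x1 + x2 + x3    # first loading fixed to 1 (lavaan default)
'
# alternative: free all loadings (NA*) and fix the factor variance to 1
mod.var <- '
  F1 =~ NA*x1 + x2 + x3
  F1 ~~ 1*F1
'
```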

Mark Seeto

Dec 15, 2014, 8:37:13 PM
to lav...@googlegroups.com
Thank you for your help, Njål. I'll have to think about how to correctly interpret the different scaling and standardisation options.

Diana Meter

Aug 26, 2015, 8:30:37 PM
to lavaan
Hello,
I used the indProd function in semTools to help with double mean centering, estimated a model with multiple interaction terms, and probed the interactions. The model converged, but the model fit is not great. I know that RMSEA, CFI, etc. are not appropriate when the LMS approach is used, but what about when double mean centering is used?
I have been reading about different ways of assessing model fit in latent interaction models. Little (2013) describes comparing the model with no interaction terms to one with interaction terms and looking for an insubstantial change in CFI after adjusting the degrees of freedom. In a paper about the LMS approach, Maslowsky and colleagues describe using a log-likelihood ratio test to compare the model that includes the interaction effects to one that does not (Maslowsky, Jager, & Hemken, 2014). Can the log-likelihood ratio test be used with the double mean centering approach? Are there any other suggestions for how best to assess fit in latent interaction models, and any functions in R that are recommended for this?
Thank you very much in advance for your response.
Diana 

Terrence Jorgensen

Aug 27, 2015, 5:39:48 AM
to lavaan
Yes, a log-(likelihood ratio) test is the same as the chi-squared difference test. I assume that because you are using product indicators (which are not normally distributed, and are typically highly kurtotic when made from centered variables), you are using a robust estimator (MLR). You can get the correct calculation of a scaled (robust) chi-squared difference test by default using the lavTestLRT() function:

fit.int <- sem(...)
fit.noint <- sem(...)
lavTestLRT(fit.int, fit.noint)

If the difference test comes out negative, you can use a different formulation that was proposed to prevent this:

lavTestLRT(fit.int, fit.noint, method = "satorra.bentler.2010")


Terry

Diana Meter

Aug 27, 2015, 12:58:14 PM
to lavaan
Thank you very much, Terry!
Diana

Aiden

Mar 12, 2020, 7:22:36 AM
to lavaan
Hello, 

Just to continue this thread: I experimented with the code provided by Mark Seeto and updated it as suggested by Njål Foldnes.
However, for the second model, where all the error terms were allowed to covary, I received a warning from lavaan.

Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: Hessian is not fully symmetric. Max diff = 4.84723343639228e-06

Could someone please help explain what this means? Are the estimated parameters still trustworthy if you receive this warning message?
Also, I have seen several papers where the residual terms for the product indicators are always correlated under residual centering,
so I assume this needs to be done in lavaan when we use residual centering. But do we do the same with double mean centering?

Thank you, 
Aiden

Aiden

Mar 12, 2020, 10:11:18 AM
to lavaan
Hello, 

Is it correct that, for specification reasons, we covary the residuals of the product indicators (e.g. x1z2 ~~ x1z5, etc.) that make up the latent moderator variable when we use all cross-products, and that this isn't necessary when we use the matched-pair strategy?

Thanks, 
Aiden

Yves Rosseel

Mar 12, 2020, 12:37:08 PM
to lav...@googlegroups.com
On 3/12/20 12:22 PM, Aiden wrote:
> However, for the second model where all the error terms were allowed to
> covary, I received a warning from lavaan.
>
> Warning message:
> In lav_model_hessian(lavmodel = lavmodel, lavsamplestats =
> lavsamplestats,  :
>   lavaan WARNING: Hessian is not fully symmetric. Max diff =
> 4.84723343639228e-06
>
> Could someone please help to explain what this mean?

This is a numerical issue that occurs when computing the Hessian of the
free parameters. I recently added a check to verify whether the Hessian is
symmetric (as needed), and this check is currently somewhat too
sensitive. It is usually triggered when some observed variances are much
larger than others. But since in your case the 'difference' is so tiny,
it has no effect. You can safely ignore it.

The 'check' is now more robust in the dev version of lavaan.

Yves.

Aiden

Mar 12, 2020, 1:10:10 PM
to lavaan
Thanks Yves!