std.lv=TRUE


Wil Cunningham

Nov 26, 2012, 3:53:44 PM
to lav...@googlegroups.com
Hello all,
I am learning lavaan for my structural equation modeling course that I am teaching in the Spring (I would like to teach it all in open source software and stop using LISREL). I have been going through all of my slides, and I found my first difference between LISREL and lavaan, and I am wondering whether I am doing something wrong.
In the lecture, I show the difference between setting the latent variable to have a variance of 1 vs. setting a path. I show that it doesn't make a difference for the z-scores of the relationships between the latent variables. But when I do this in lavaan using std.lv = TRUE rather than std.lv = FALSE, I get quite different results: some paths go from being significant to not, etc.
Is there something else that I need to do when standardizing the latent variables to 1 for analysis that I may be missing?
Thanks
Wil

Alex Schoemann

Nov 26, 2012, 10:35:28 PM
to lav...@googlegroups.com
Hi Wil,

Without seeing your code I'm not sure if anything else is going on, but I wouldn't expect z scores to be the same across methods of scale setting. The standard errors of parameters are sensitive to the method of scale setting (Gonzalez & Griffin, 2001). In this situation, the standardized parameter estimates and the likelihood ratio test of a single parameter should be the same regardless of the method of scale setting.

Hope this helps,

Alex

Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every "one" matters. Psychological Methods, 6, 258–269.

yrosseel

Nov 27, 2012, 2:51:41 AM
to lav...@googlegroups.com
On 11/26/2012 09:53 PM, Wil Cunningham wrote:
> Hello all,
> I am learning lavaan for my structural equation modeling course that I
> am teaching in the Spring (I would like to teach all in open source
> software, and stop using LISREL). I have been going through all of my
> slides, and I found my first difference between LISREL and lavaan, and I
> am wondering whether I am doing something wrong.

You should be aware that std.lv=TRUE sets the variances of the latent
variables to unity, but if that latent is a dependent variable, it
actually sets the _residual_ variance to unity. And this often has an
impact on the regression coefficients, standard errors, etc. The
'std.lv=TRUE' flag is very handy for CFA models, but once there is a
structural component, it may be less useful. You can always use the
syntax to be more specific.

But this is similar in LISREL. If you fix the residual variance to
unity, it impacts the regression coefficients.
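For instance, here is a hypothetical sketch of that more explicit syntax (the indicator names x1-x3 and y1-y3 are placeholders, not from any real dataset), spelling out the constraints that std.lv = TRUE imposes:

```r
# Hypothetical sketch: the explicit lavaan syntax that mirrors std.lv = TRUE,
# so each constraint is visible. Indicator names are placeholders.
model <- '
  x =~ NA*x1 + x2 + x3   # free the first loading (the default fixes it to 1)
  y =~ NA*y1 + y2 + y3
  x ~ y
  x ~~ 1*x   # x is endogenous: this fixes its RESIDUAL variance to 1
  y ~~ 1*y   # y is exogenous: this fixes its total variance to 1
'
# fit <- sem(model, data = mydata)   # same fit as sem(..., std.lv = TRUE)
```

Because x is a dependent latent, the `x ~~ 1*x` line constrains the residual variance, which is exactly the behaviour described above.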

Would you be able to send us the LISREL input file you used? I will show
how to translate it to lavaan (to get identical results).

Yves.

Ruben Arslan

Nov 27, 2012, 9:27:22 AM
to lav...@googlegroups.com
Hi,

I think you just casually revealed a deep misunderstanding that I had.

> You should be aware that std.lv=TRUE sets the variances of the latent variables to unity, but if that latent is a dependent variable, it actually sets the _residual_ variance to unity. And this often has an impact on the regression coefficients, standard errors, etc. The 'std.lv=TRUE' flag is very handy for CFA models, but once there is a structural component, it may be less useful. You can always use the syntax to be more specific.

I was using std.lv=T, mostly because I wanted a simple way to extract semi-standardized estimates and their CIs. But if the residual variance is fixed to unity, my regression coefficients' size isn't well-interpretable.
How can I specify, using syntax, that I actually want the variances to be set to unity?
x ~~ 1*x doesn't do the trick.
Maybe it'd be easier to manually calculate CIs for the Std.lv estimate column..?

Best regards,

Ruben

library(lavaan)
y <- rnorm(1000)
x <- 0.3 * y + 0.7 * rnorm(1000)
dta <- data.frame(x1 = x + 0.2 * rnorm(1000), x2 = x + 0.2 * rnorm(1000),
                  x3 = x + 0.2 * rnorm(1000), y1 = y + 0.2 * rnorm(1000),
                  y2 = y + 0.2 * rnorm(1000), y3 = y + 0.2 * rnorm(1000))
model <- "x =~ x1 + x2 + x3
y =~ y1 + y2 + y3
x ~ y
x ~~ 1*x
y ~~ 1*y"
sem.m <- sem(model = model, data = dta, std.lv = TRUE)
summary(sem.m, standardized = TRUE)

##Variances:
## x 1.000 0.845 0.845
## y 1.000 1.000 1.000

yrosseel

Nov 30, 2012, 8:06:56 AM
to lav...@googlegroups.com
> I was using std.lv=T, mostly because I wanted a simple way to extract semi-standardized estimates and their CIs.

One of these days, I will make sure the 'se' column in the output of
standardizedSolution() is filled in, and then you will have CIs based on
standardized parameters.

> How can I specify, using syntax, that I actually want the variances to be set to unity?
> x ~~ 1*x doesn't do the trick.

Yes, it does? The variance of x is 1.0 in the output below, isn't it? But
perhaps I misunderstood the question.

Ruben Arslan

Nov 30, 2012, 8:41:04 AM
to lav...@googlegroups.com
> One of these days, I will make sure the 'se' column in the output of standardizedSolution() is filled in, and then you will have CIs based on standardized parameters.

Great, thank you. :-) 

>> How can I specify, using syntax, that I actually want the variances to be set to unity?
>> x ~~ 1*x doesn't do the trick.

> Yes, it does? The variance of x is 1.0 in the output below, isn't it? But perhaps I misunderstood the question.

Um, yes, stupid me. What worried me was:

library(lavaan); library(psych)
y <- rnorm(1000)
x <- 0.5 * y + 0.5 * rnorm(1000)
dta <- data.frame(x1 = x + 0.2 * rnorm(1000), x2 = x + 0.2 * rnorm(1000),
                  x3 = x + 0.2 * rnorm(1000), y1 = y + 0.2 * rnorm(1000),
                  y2 = y + 0.2 * rnorm(1000), y3 = y + 0.2 * rnorm(1000))
model <- "x =~ x1 + x2 + x3
y =~ y1 + y2 + y3
x ~ y
x ~~ 1*x
y ~~ 1*y"
sem.m <- sem(model = model, data = dta, std.lv = TRUE)
summary(sem.m, standardized = TRUE)
psych::describe(predict(sem.m))

#   var    n mean   sd median trimmed  mad   min  max range  skew kurtosis   se
# x   1 1000    0 1.32  -0.03   -0.01 1.41 -3.91 4.39  8.29  0.07    -0.24 0.04
# y   2 1000    0 0.99   0.03    0.01 1.03 -3.22 2.96  6.18 -0.10    -0.19 0.03
# Regressions:
#   x ~
#     y                 0.896    0.044   20.291    0.000    0.667    0.667

i.e., I thought this was the behaviour you described: the residual variance of x is set to 1, so the total variance of the dependent latent x "grows" the more y explains, and the unstandardised slope grows with it.

Wil Cunningham

Nov 30, 2012, 11:04:54 PM
to lav...@googlegroups.com
Hello all. Thanks for thinking through this. I played around for a bit to see if I could figure it out, and I am still a little stumped. I used to set the variances of my latent variables to 1 so that I could get estimates of all the lambda paths for all the indicators. I thought that I had it figured out here:

library("lavaan")
library("semTools")
data(exLong)
sample.model <- '
# measurement model
x =~ y1t1 + y2t1 + y3t1
y =~ y1t2 + y2t2 + y3t2
x ~ y
'
fit <- sem(sample.model, data=exLong)
summary(fit, fit.measures=TRUE,  standardized=TRUE)

sample.model2 <- '
# measurement model
x =~ y1t1 + y2t1 + y3t1
y =~ y1t2 + y2t2 + y3t2
x ~  y
x ~~ 1*x
y ~~ 1*y
'
fit2 <- sem(sample.model2, data=exLong, std.lv=TRUE)
summary(fit2, fit.measures=TRUE,  standardized=TRUE)

[As a side note, I also tried x =~ NA*y1t1 + y2t1 + y3t1 while removing std.lv=TRUE from the call, and I get the same results as those pasted below.]

Things look OK, except when I look at the z tests for the parameters.

For fit, I get:
Regressions:
  x ~
    y                 0.715    0.072    9.942    0.000    0.763    0.763

Whereas for fit2, I get:
Regressions:
  x ~
    y                 1.181    0.157    7.515    0.000    0.763    0.763

I see that the standardized estimates are the same, but I would assume that the z value should be the same too, no?

Sorry for the confusion. And, thanks for the help
Best,
Wil

Sunthud Pornprasertmanit

Dec 1, 2012, 3:27:20 AM
to lav...@googlegroups.com
In the first example, you use the marker-variable approach to scale identification: one factor loading from each factor is fixed to 1 and all factor variances are freely estimated. If you look at the results of the fit object, the x and y variances are 0.421 and 1.147, respectively. The second script, however, uses the fixed-factor method of scale identification: the factor variances are fixed to 1 and all factor loadings are freely estimated.

The first model has factor variances for x and y of 0.421 and 1.147.
The second model has factor variances of 1 and 1.

Therefore, the regression coefficients from the two models will be different.

As in any multiple regression, if we multiply the dependent variable by a number, the unstandardized regression coefficient changes. The standardized regression coefficient, however, keeps the same value regardless of the scale of the dependent variable.
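The same point can be checked with ordinary regression in base R (simulated data, so the specific numbers are arbitrary):

```r
# Rescaling the dependent variable rescales the unstandardized slope,
# but leaves the standardized slope unchanged.
set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

b1 <- coef(lm(y ~ x))["x"]          # original scale
b2 <- coef(lm(I(2 * y) ~ x))["x"]   # dependent variable multiplied by 2

s1 <- coef(lm(scale(y) ~ scale(x)))[2]      # standardized slope
s2 <- coef(lm(scale(2 * y) ~ scale(x)))[2]  # standardizing cancels the factor 2

# b2 is (up to floating point) exactly twice b1, while s1 and s2 coincide.
```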

Sunthud



Wil Cunningham

Dec 1, 2012, 11:06:18 AM
to lav...@googlegroups.com
Thanks. Sorry if I am being dense, but I don't understand why the z values for the coefficient would change: one latent variable becomes more or less significantly predictive of another depending on how the variance is defined. This never happened in LISREL (setting PS(1,1) to 1 vs. estimating PS(1,1) and setting LY(1,1) to 1). In those cases, the parameters were different, but the z test was the same.
Best,
Wil

Sunthud Pornprasertmanit

Dec 1, 2012, 1:13:11 PM
to lav...@googlegroups.com
Sorry, I forgot that the "factor variance" here is actually the residual variance. However, the basic principle is the same. Consider the standardized regression formula:

std_beta = beta * sqrt(totalvar_Y) / sqrt(totalvar_X)

Note that you have Y as the independent variable and X as the dependent variable.

The total variance of X can be computed as:

totalvar_X = beta^2 * totalvar_Y + residualvar_X

In the first model with marker-variable approach:

totalvar_X = (0.715^2) * 1.147 + 0.421 = 1.007

Thus, the std_beta will be

0.715 * sqrt(1.147) / sqrt(1.007) = 0.763

In the second model with fixed-factor approach:

totalvar_X = (1.181^2) * 1 + 1 = 2.395

Thus, the std_beta will be

1.181 * sqrt(1) / sqrt(2.395) = 0.763

From the math, the standardized regressions are the same. The two models have different factor variances, but the differences in factor variances are offset by differences in factor loadings, which makes the models equivalent. Because the factor variances differ, the unstandardized regression coefficients adjust accordingly; the standardized regression coefficients, however, are the same.
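The arithmetic above can be replayed directly in R, using the estimates from the two fits earlier in the thread:

```r
# Marker-variable fit: beta = 0.715, var(y) = 1.147, residual var(x) = 0.421
b1 <- 0.715
totalvar_x1 <- b1^2 * 1.147 + 0.421            # ~1.007
std1 <- b1 * sqrt(1.147) / sqrt(totalvar_x1)

# Fixed-factor fit: beta = 1.181, var(y) = 1, residual var(x) = 1
b2 <- 1.181
totalvar_x2 <- b2^2 * 1 + 1                    # ~2.395
std2 <- b2 * sqrt(1) / sqrt(totalvar_x2)

round(std1, 3)  # 0.763
round(std2, 3)  # 0.763 -- same standardized coefficient from both fits
```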

Sunthud

Alex Schoemann

Dec 2, 2012, 1:38:48 PM
to lav...@googlegroups.com
To add to what Sunthud said: setting the scale with different methods affects BOTH the parameter estimates and their standard errors, so the z statistic will differ across methods. If you're interested in why the standard errors differ, I highly recommend Gonzalez and Griffin (2001).

-Alex

Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every "one" matters. Psychological Methods, 6, 258–269.

Wil Cunningham

Dec 2, 2012, 7:05:28 PM
to lav...@googlegroups.com
Thanks. Reading that paper was amazing. I'm so glad that I thought I found an "error," as I have learned that I had a larger misunderstanding. Now, to figure out how to estimate latent means and try MACS modeling in lavaan (so, I may be back).
Best,
Wil