ordered observable exogenous variables in path analysis


Peter Paprzycki

Jun 24, 2020, 3:08:37 PM
to lavaan
Hello group members, I am trying to use an age variable as an ordered variable (converted to an ordered factor with the R syntax below) in a multivariate regression. The outcomes are 4 observed continuous factor scores obtained from my CFA, so there are no latent variables.

data$AgeCat.R <- factor(data$AgeCat, ordered = TRUE)

I received the following error message:

Warning in lav_data_full(data = data, group = group, cluster = cluster,  :
  lavaan WARNING: exogenous variable(s) declared as ordered in data: AgeCat.R
Error in lav_samplestats_step1(Y = Data, wt = wt, ov.names = ov.names,  : 
  lavaan ERROR: unknown ov.types:factor

Of course, I can work around it by turning my factor variable into a set of dummy variables, and that works, but can lavaan accommodate ordered exogenous variables?

Peter

Terrence Jorgensen

Jun 25, 2020, 4:37:08 AM
to lavaan
> Of course, I can go around it and turn my factor variable into a sequence of dummy variables, and it works, but can lavaan accommodate ordered exogenous variables?

No, you need to use dummies:  https://lavaan.ugent.be/tutorial/cat.html
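
A minimal sketch of the dummy-coding route, assuming a hypothetical data frame `dat` with a four-level ordered factor `AgeCat` and one continuous outcome `DV1` (the data are simulated, and the names `dat`, `DV1`, `age1`-`age3`, and the labels `b11`-`b13` are illustrative, not taken from the original analysis). The indicators are built in base R; the lavaan call only runs if the package is installed.

```r
# Hypothetical data: four ordered age groups and one continuous outcome
set.seed(1)
lev <- c("18-25", "26-33", "34-45", "46-65")
dat <- data.frame(
  AgeCat = factor(sample(lev, 200, replace = TRUE), levels = lev, ordered = TRUE),
  DV1    = rnorm(200)
)

# One 0/1 indicator per non-reference level (the youngest group is the reference)
for (k in seq_along(lev)[-1]) {
  dat[[paste0("age", k - 1)]] <- as.integer(dat$AgeCat == lev[k])
}

# Labeled slopes: each one is the mean difference from the reference group
mod <- ' DV1 ~ b11*age1 + b12*age2 + b13*age3 '

if (requireNamespace("lavaan", quietly = TRUE)) {
  # Pass only the numeric columns so the ordered factor never reaches lavaan
  fit <- lavaan::sem(mod, data = dat[, c("DV1", "age1", "age2", "age3")])
  print(lavaan::parameterEstimates(fit))
}
```

Labeling the slopes is optional for fitting, but it makes it possible to refer to them later in equality or inequality constraints.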

If you really want to interpret the effect of an underlying normally distributed latent response, you would have to treat it as a single-indicator factor:

library(lavaan)

mod <- ' x1 + x2 ~ AGE
AGE =~ ageyr
ageyr ~~ 0*ageyr
AGE ~~ 1*AGE
'

summary(sem(mod, data = HolzingerSwineford1939, ordered = "ageyr", parameterization = "theta"))

But that is highly dubious for "age".  Certainly age is actually continuous, but it is not normally distributed, and tends to be completely dependent on your sampling design.  So I would recommend treating your few categories as categories.   If it is measured in years, you could always test a linearity constraint using labels:

for (i in 12:16) HolzingerSwineford1939[,paste0("age", i)] <- as.integer(HolzingerSwineford1939$ageyr == i)
mod <- ' x1 ~ b1_12*age12 + b1_13*age13 + b1_14*age14 + b1_15*age15 + b1_16*age16
## year 11 is reference group, so first slope is a 1-year effect.
'

con <- '## constrain other years to be a multiple of it
b1_13 == 2*b1_12
b1_14 == 3*b1_12
b1_15 == 4*b1_12
b1_16 == 5*b1_12
'

fit <- sem(mod, data = HolzingerSwineford1939)
lavTestWald(fit, constraints = con) # fail to reject H0 of linearity


Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
 

Peter Paprzycki

Jun 25, 2020, 11:41:22 AM
to lavaan
Thank you very much, Terrence. In the end, I essentially did what you showed with the dummies. Thank you for the constraint recommendation. If I would have adopted the 'theta' parametrization, of course, I obtain the thresholds, but what would be the procedure for testing group differences?

Peter

Terrence Jorgensen

Jun 26, 2020, 6:15:41 PM
to lavaan
> If I would have adopted the 'theta' parametrization, of course, I obtain the thresholds,

You can (and by default, do) estimate thresholds with the default "delta" parameterization as well.  "theta" is necessary only so that residual variances are model parameters, which is what allows the residual variance of ageyr to be fixed to zero.

> what would be the procedure for testing group differences?

Differences in what?

Peter Paprzycki

Jun 28, 2020, 11:49:24 AM
to lavaan
Oh, thank you Terrence. My situation has four age group categories (ordered factor). I know I can obtain the thresholds and p-values associated with them. I am interested, however, in testing the differences in the four age groups with respect to the endogenous variables. So, this is why the adoption of the dummy variable approach would work (comparisons versus a reference group).

Peter

Patrick (Malone Quantitative)

Jun 28, 2020, 11:52:03 AM
to lav...@googlegroups.com
Peter,

Dummy coding doesn't hold order information, though. If this is a
question of interest, you're probably better off with a multiple-group
model and inequality constraints to keep the parameter estimates
moving in the same direction as the age groups.

Pat



--
Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com

He/Him/His

Peter Paprzycki

Jun 28, 2020, 11:58:56 AM
to lavaan
Yes, this is what I was thinking, Patrick. How to preserve the order of the age factor.

Peter


Peter Paprzycki

Jun 28, 2020, 3:18:42 PM
to lavaan
I am not sure whether the following constraints on the four age groups are OK. For example, for the first outcome (DV1): reference = youngest group (18-25 yrs), b11 = age group 26-33 yrs, b12 = 34-45 yrs, and b13 = 46-65 yrs.

con <-'
b11 < b12 
b12 < b13

b21 < b22 
b22 < b23

b31 < b32 
b32 < b33

b41 < b42 
b42 < b43
'

Terrence Jorgensen

Jun 30, 2020, 6:39:01 PM
to lavaan
> Dummy coding doesn't hold order information
 
No, but it doesn't have to.  It is less restrictive, so you can specify constraints that imply order, then test them against the less constrained model that uses dummy codes (e.g., you could specify a linear contrast among labeled parameters and pass those constraints to the lavTestWald() function).

con <-'
b11 < b12 
b12 < b13

b21 < b22 
b22 < b23

b31 < b32 
b32 < b33

b41 < b42 
b42 < b43
'

Since these are inequality constraints, you might find lavaan's InformativeTesting() function helpful.
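
A rough sketch of how such order constraints might be passed to InformativeTesting(), reusing the HolzingerSwineford1939 age dummies from earlier in the thread rather than the four-group data (which are not available here). The labels `b12`-`b16`, the small number of bootstrap draws `R = 100L`, and the choice to switch off the double bootstrap are illustrative assumptions made to keep the run short, not recommendations.

```r
# Slopes for the age dummies get labels so the constraints can refer to them
mod <- ' x1 ~ b12*age12 + b13*age13 + b14*age14 + b15*age15 + b16*age16 '

# Order hypothesis: the effect does not decrease with age
con <- ' b12 < b13
         b13 < b14
         b14 < b15
         b15 < b16 '

if (requireNamespace("lavaan", quietly = TRUE)) {
  library(lavaan)
  dat <- HolzingerSwineford1939
  # Year 11 is the reference group
  for (i in 12:16) dat[[paste0("age", i)]] <- as.integer(dat$ageyr == i)

  # Bootstrap-based test of the order-constrained hypothesis
  out <- InformativeTesting(model = mod, data = dat, constraints = con,
                            R = 100L, double.bootstrap = "no")
  print(out)
}
```

For a real analysis a larger `R` would normally be used, and the help page `?InformativeTesting` describes which hypotheses the reported tests refer to.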

Patrick (Malone Quantitative)

Jun 30, 2020, 7:19:25 PM
to lav...@googlegroups.com
Terrence,

True, but for what it's worth, I did suggest inequality constraints in
the part you snipped!