NAs in lavPredict for ordered data


MOL

Nov 8, 2015, 3:06:41 PM
to lavaan
Hello,

I am stuck on a problem.
I fit a model to ordered data with

fit_CFA <- cfa(model_CFA, data=Dane, ordered = names(Dane), control=list(iter.max = 1000))

where Dane contains ordered factors. The analysis looks fine: the result is a well-fitting model.
Then I try to predict the latent variables for "new" data with

lavPredict(object = fit_CFA, type = "lv", newdata = Dane[1:10,], method = "EBM")

I get only NAs.

Remark: when I use the raw numeric data (just 1, 2, 3, 4, 5 instead of factors in Dane) and fit the model with ML, prediction works fine. That is why I suppose I am doing something wrong with lavPredict for ordered data.

Thank you for any help,

Michal

Terrence Jorgensen

Nov 10, 2015, 3:23:56 AM
to lavaan
lavaan 0.5-20 is now available on CRAN. Does this happen with the latest version?

Terry

MOL

Nov 10, 2015, 8:40:59 AM
to lavaan
Thank you very much for your interest.

I reinstalled everything and it still does not work :(( Could you find some time to look at my scripts? I am sending you the CFA analysis and a "Predict" file with my attempt to calculate latent variable scores for "new" data.
KSKD.xlsx
CFA1.R
Predict.R

Terrence Jorgensen

Nov 13, 2015, 4:25:21 AM
to lavaan
When you convert the variables in your data.frame to ordered factors, then the "ordered" argument in the lavaan/cfa/sem function is redundant.  Given that you have so many categorical indicators (all with 5 categories), you should expect estimation problems.  I can see you set the max number of iterations to 1000 -- why?  Could you even get your model to converge on a solution?  I couldn't.
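
To illustrate the point about redundancy, here is a minimal base-R sketch of converting 5-point items to ordered factors inside the data.frame. The toy data and the column names `k1` and `k2` are made up for illustration and are not the attached data set.

```r
# Toy 5-point Likert responses (hypothetical data, not KSKD.xlsx)
set.seed(1)
toy <- data.frame(k1 = sample(1:5, 20, replace = TRUE),
                  k2 = sample(1:5, 20, replace = TRUE))

# Option A: leave the columns as integers and name them in cfa(..., ordered = ...)
# Option B: declare them as ordered factors in the data.frame itself,
#           in which case the `ordered` argument adds nothing
toy_ord <- toy
toy_ord[] <- lapply(toy, factor, levels = 1:5, ordered = TRUE)

print(sapply(toy, is.ordered))      # integer columns are not ordered factors
print(sapply(toy_ord, is.ordered))  # converted columns are
```

Fixing `levels = 1:5` explicitly keeps all five categories even when a subset of rows (such as a 10-row `newdata`) happens not to contain every response value.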

You might want to read this study about treating ordinal variables as continuous:


I would advise you to do so, because your model is too big and your sample is too small to expect it to work, even if your complex model were correctly specified.  Judging from the results when treating the indicators as continuous, I expect you have gross misspecification to investigate before you could trust factor scores (which are already indeterminate, a problem exacerbated by having fewer indicators -- 3 of your 2nd-order factors have only 2 first-order "indicators").  And several model-implied factor correlations exceed 1, suggesting you should start with the first-order factor model to identify any divergent validity problems among those 8 factors, before positing that they measure 4 more higher-order factors.  

Anyway, disregarding the problems with your model, I had no problem extracting factor scores from the model that actually converged.  See syntax below.

Terry

library(gdata)
library(lavaan)

Dane <- read.xls("KSKD.xlsx", sheet = "baza")
Dane[,"Plec"] <- as.factor(Dane[,"Plec"])

## missing?
which(sapply(Dane, function(x) any(is.na(x))))
which(is.na(Dane$k23))
Dane$k23[348]

model_CFA <- "
KOr =~ k9 + k14 + k19 + k24 + k29 + k34 + k39 + k49 + k53 + k57 + k61
KOs =~ k3 + k8 + k13 + k18 + k23 + k28 + k33 + k43 + k48
AZ  =~ k1 + k11 + k21 + k31 + k51
RS  =~ k6 + k16 + k26 + k36 + k46 + k59 + k63 + k64 + k65
AS  =~ k5 + k10 + k15 + k20 + k25 + k35 + k38 + k40 + k45
RP  =~ k30 + k54 + k58 + k62
KO  =~ k2 + k12 + k17 + k27 + k52 + k55 + k60
WS  =~ k7 + k22 + k32 + k37 + k47 + k56
KE  =~ AZ + RS
KP  =~ AS + RP
KS  =~ KO + WS
SKD =~ KE + KOr + KOs + KP + KS
"
fit_CFA <- cfa(model_CFA, data = Dane, missing = "fiml")
## pay attention to warning
inspect(fit_CFA, "cor.lv")
## several factor correlations exceed 1.  Measure the same thing?  Missing
## cross-loadings / correlated errors?  2-indicator factors a problem?
summary(fit_CFA, standardized = TRUE, fit.measures = TRUE)

lavPredict(object = fit_CFA, newdata = Dane[1:10, ])



MOL

Nov 13, 2015, 1:58:57 PM
to lavaan
Thank you for your answer. The reference is especially helpful.
I know that this model looks awkward and that my sample is small. I am also using PLS-PM, which requires less data, and in this case its results seem coherent. Moreover, which I did not mention, my data come from a questionnaire with a 5-level Likert scale, which is why I thought that treating the items as ordered factors was correct.

Thank you one more time for help,

M.

PS: the above model converges under all my assumptions in 74 iterations. A summary of the fit is in the attachment.

Przyjęty model hierarchiczny.txt

yrosseel

Nov 21, 2015, 11:08:33 AM
to lav...@googlegroups.com
On 11/08/2015 09:06 PM, MOL wrote:
> lavPredict(object = fit_CFA, type = "lv", newdata = Dane[1:10,], method
> = "EBM")
>
> I get only NA's.

The technical reason is that the variance/covariance matrix of the
latent factors is not positive definite. In this case, you get all NAs
(in 0.5-20).

In dev 0.5-21.922 (or higher), lavPredict() will give an error, and not
even attempt to compute factor scores.
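
A minimal sketch of how one might check positive definiteness by hand. The 2 x 2 matrix below uses made-up numbers and the factor names `F1` and `F2` are hypothetical; with a fitted model the matrix could be taken from `lavInspect(fit_CFA, "cov.lv")` instead.

```r
# Hypothetical latent covariance matrix (illustrative numbers, not from this model).
# With a fitted lavaan object one could use:
#   Sigma_lv <- lavInspect(fit_CFA, "cov.lv")
Sigma_lv <- matrix(c(1.0, 1.2,
                     1.2, 1.0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("F1", "F2"), c("F1", "F2")))

# A symmetric matrix is positive definite only if all eigenvalues are > 0
ev <- eigen(Sigma_lv, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 0)

# The implied correlation exceeds 1, which is one way a matrix fails to be
# positive definite; a negative variance on the diagonal is another
implied_cor <- cov2cor(Sigma_lv)

print(ev)
print(is_pd)
print(implied_cor)
```

If any eigenvalue is zero or negative, the matrix is not positive definite, which matches the situation described above.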

Yves.


yrosseel

Nov 21, 2015, 11:10:18 AM
to lav...@googlegroups.com
On 11/13/2015 07:58 PM, MOL wrote:
> PS: above model converges with all my assumptions in 74 iterations.
> Summary of the fitting in an attachment.

Notice the negative variances!

     Estimate  Std.Err  Z-value  P(>|z|)  Std.lv  Std.all
KOr     0.015    0.005    3.099    0.002   0.034    0.034
KOs     0.006    0.007    0.888    0.374   0.011    0.011
AZ     -0.029    0.009   -3.124    0.002  -0.074   -0.074
RS      0.086    0.012    7.360    0.000   0.145    0.145
AS      0.038    0.008    4.920    0.000   0.069    0.069
RP     -0.018    0.013   -1.384    0.166  -0.029   -0.029
KO      0.079    0.014    5.443    0.000   0.156    0.156
WS      0.117    0.016    7.384    0.000   0.240    0.240
KE      0.018    0.006    3.065    0.002   0.043    0.043
KP      0.009    0.005    1.831    0.067   0.018    0.018
KS      0.047    0.011    4.465    0.000   0.110    0.110
SKD     0.406    0.032   12.530    0.000   1.000    1.000

This will give you a non-positive definite variance/covariance matrix
for the latent variables, and therefore, no factor scores can be computed.
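
As a small sketch, the negative variances can be picked out programmatically. The data.frame below is typed in by hand from the estimates quoted above; with a fitted model, the same rows could presumably be obtained from `parameterEstimates(fit_CFA)` by keeping rows where `op == "~~"` and `lhs == rhs`.

```r
# Variance estimates and standard errors copied from the summary quoted above
lv_var <- data.frame(
  lv  = c("KOr", "KOs", "AZ", "RS", "AS", "RP",
          "KO", "WS", "KE", "KP", "KS", "SKD"),
  est = c(0.015, 0.006, -0.029, 0.086, 0.038, -0.018,
          0.079, 0.117, 0.018, 0.009, 0.047, 0.406),
  se  = c(0.005, 0.007, 0.009, 0.012, 0.008, 0.013,
          0.014, 0.016, 0.006, 0.005, 0.011, 0.032),
  stringsAsFactors = FALSE
)

# Keep only the latent variables whose estimated variance is below zero
negative <- lv_var[lv_var$est < 0, ]
print(negative)
```

In this output the rows for AZ and RP are the ones with negative estimates.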

Yves.

MOL

Nov 24, 2015, 6:25:47 AM
to lavaan
Thank you very much. Now it is clear.
It looks like I have a lot to learn about CFA, but once again, thank you for your help.

With all the best,

MO