Confirmatory Factor Analysis with ordinal data


Hellen Geremias dos Santos

Feb 16, 2014, 5:13:45 PM
to lav...@googlegroups.com
Dear list,

I am trying to fit a confirmatory factor analysis in lavaan with a model that contains only categorical variables. So I used the categorical capabilities of lavaan and provided the full data to cfa(), indicating which variables can be considered as 'ordered'. But I receive the error:

Error in muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types = ov.types,  : 
  lavaan ERROR: some categories of variable `aa1' are empty in group 1; frequencies are [0 0 0 0]

Below is the code that I am using for my dataset.

banco1 <- data.frame(id, aa1, aa2, aa3, aa4, aa5, aa6, aa7, aa8, aa9, aa10,
                     aa11, aa12, aa13, aa14, aa15, aa16, aa17)
# all variables are factors

model1 <- ' dm =~ aa1+aa2+aa3+aa4+aa5
ct1 =~ aa6+aa7+aa8
ct2 =~ aa10+aa11
as =~ aa12+aa13+aa14+aa15+aa16+aa17 '

fitcfa <- cfa(model1, data = banco1, mimic = "Mplus", test = "scaled.shifted",
              orthogonal = FALSE,
              ordered = c("aa1","aa2","aa3","aa4","aa5","aa6","aa7","aa8",
                          "aa10","aa11","aa12","aa13","aa14","aa15","aa16","aa17"),
              start = "Mplus", std.lv = TRUE, std.ov = TRUE)

Any help would be much appreciated.
Hellen

yrosseel

Feb 19, 2014, 5:04:54 AM
to lav...@googlegroups.com
On 02/16/2014 11:13 PM, Hellen Geremias dos Santos wrote:
> Dear list,
>
> I am trying to fit a confirmatory factor analysis in lavaan with a model
> that contains only categorical variables. So I used the categorical
> capabilities of lavaan and provided the full data to cfa(), indicating
> which variables can be considered as 'ordered'. But I receive the error:
>
> Error in muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types =
> ov.types, :
> lavaan ERROR: some categories of variable `aa1' are empty in group 1;
> frequencies are [0 0 0 0]

Hm. Could you look at a frequency table for 'aa1'? For example,

table(Data$aa1)

Do you have many missing values? It looks like you have 4 levels for
'aa1', but (perhaps after listwise deletion), only missing values are
left for this variable...
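
Yves' check generalizes to all indicators at once. A minimal sketch, assuming the data frame and item names from the original post (`banco1`, `aa1`...`aa17`):

```r
## Inspect every ordered indicator before fitting: category frequencies
## and missing-value counts (object/item names follow the original post).
items <- paste0("aa", 1:17)
for (v in items) {
  cat("\n--", v, "-- missing:", sum(is.na(banco1[[v]])), "\n")
  print(table(banco1[[v]], useNA = "ifany"))
}
```

Any variable whose table shows a zero-count category (or whose levels include unobserved categories) will trigger the error above.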

Yves.

Hellen Geremias dos Santos

Feb 19, 2014, 8:08:44 AM
to lav...@googlegroups.com
Dear Yves,

Thanks for your quick reply. 

This is the frequency for the variable 'aa1': 

1   2   3   4 
12  71  359 400 

There are no missing values for any variable (these cases were excluded before the analysis).
All variables have 4 levels.

The error occurs for the first variable that is ordered. For example, if I change
the ordered argument to:
ordered = c("aa2","aa1","aa3","aa4","aa5","aa6","aa7","aa8","aa10","aa11","aa12","aa13","aa14","aa15","aa16","aa17")
the same error occurs for variable 'aa2'.

Hellen

yrosseel

Feb 19, 2014, 8:53:56 AM
to lav...@googlegroups.com
On 02/19/2014 02:08 PM, Hellen Geremias dos Santos wrote:

> The error occurs for the first variable that is ordered.

Strange. Would you be able to send me your full R script and (a snippet
of) the data? If I can reproduce this, it will be easier to track down
the problem.

Yves.
Message has been deleted

yrosseel

Feb 19, 2014, 12:43:01 PM
to lav...@googlegroups.com
On 02/19/2014 06:01 PM, Hellen Geremias dos Santos wrote:
> Hi Yves,
>
> Thank you once again for your quick reply. Attached is my R-script
> and a snippet of my data.

Thanks, I found the problem:

fitcfa <- cfa(model1, data = banco1, mimic = "Mplus", test = "scaled.shifted",
              orthogonal = FALSE,
              ordered = c("aa1","aa2","aa3","aa4","aa5","aa6","aa7","aa8",
                          "aa10","aa11","aa12","aa13","aa14","aa15","aa16","aa17"),
              start = "Mplus", std.lv = TRUE, std.ov = TRUE)

just remove the std.ov=TRUE argument, and it will work fine.

The std.ov=TRUE argument 'standardizes' the observed variables, but this
does not make sense if the variables are ordinal. I will make sure that
this argument is simply ignored for non-numeric variables in the next update.
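
For reference, the fix is simply the original call with `std.ov` dropped (all other arguments unchanged):

```r
## The call from the original post, minus the offending std.ov argument:
fitcfa <- cfa(model1, data = banco1, mimic = "Mplus", test = "scaled.shifted",
              orthogonal = FALSE, start = "Mplus", std.lv = TRUE,
              ordered = c("aa1","aa2","aa3","aa4","aa5","aa6","aa7","aa8",
                          "aa10","aa11","aa12","aa13","aa14","aa15","aa16","aa17"))
```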

Yves.


Hellen Geremias dos Santos

Feb 19, 2014, 1:31:16 PM
to lav...@googlegroups.com
Dear Yves

Many thanks for your advice on this. 

Kind regards, 
Hellen

yrosseel

Mar 18, 2014, 11:36:00 AM
to lav...@googlegroups.com
On 02/19/2014 07:31 PM, Hellen Geremias dos Santos wrote:
> The std.ov=TRUE argument 'standardizes' the observed variables, but
> this
> does not make sense if the variables are ordinal. I will make sure that
> this argument is simply ignored for non-numeric variables in a next
> update.

Now done in dev 0.5-17.

Yves.

Hellen Geremias dos Santos

Mar 22, 2014, 7:35:56 AM
to lav...@googlegroups.com
Thanks!

David Lim

Mar 13, 2015, 1:57:28 PM
to lav...@googlegroups.com
Dear Yves, 

I am also facing the same problem with my ordinal data, but I did NOT set std.ov to TRUE.
My code is below, and I have also attached the CSV file. I would really appreciate any help, since I have spent fruitless hours on this.

Seven.factor <- 'space       =~ X1 + X2 + X3 + X4 + X5
                 personal    =~ X6 + X7 + X8 + X9 + X10 + X11
                 listening   =~ X12 + X13 + X14
                 activities  =~ X15 + X16 + X17 + X18 + X19 + X20 + X22 + X24
                 interaction =~ X25 + X26 + X27 + X28
                 programme   =~ X29 + X30
                 parentstaff =~ X33 + X34 + X35 + X36 + X37 + X38 + X39'

fit <- cfa(Seven.factor, data = sevenfactor,
           ordered = c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10",
                       "X11","X12","X13","X14","X15","X16","X17","X18","X19","X20",
                       "X22","X24","X25","X26","X27","X28","X29","X30",
                       "X33","X34","X35","X36","X37","X38","X39"))

summary(fit, fit.measures=TRUE)  
ITERS-R Data (for R).csv

yrosseel

Mar 13, 2015, 2:28:15 PM
to lav...@googlegroups.com
On 03/13/2015 06:57 PM, David Lim wrote:
> Dear Yves,
>
> I am also faced with the same problem with my ordinal data

So, the error message you get is:

Error in muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types =
ov.types, :
lavaan ERROR: some categories of variable `X2' are empty in group 1;
frequencies are [26 3 78 9 21 1 0]

The problem is that -after listwise deletion- some response categories
simply do not occur any more in the data, and lavaan does not proceed in
this case.

Possible solutions:
- merge some adjacent response categories to avoid almost-empty cells
- use missing = "pairwise"
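
A hedged sketch of both options, using the variable names from David's post (the category codes are illustrative; check the actual labels in your own data first):

```r
## (1) Merge sparse adjacent response categories, e.g. fold category 7
##     into category 6 for an item where 7 is (almost) empty:
sevenfactor$X2[sevenfactor$X2 == 7] <- 6

## (2) Or keep all rows and estimate from pairwise-present data:
fit <- cfa(Seven.factor, data = sevenfactor,
           ordered = paste0("X", c(1:20, 22, 24:30, 33:39)),
           missing = "pairwise")
```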

Yves.

David Lim

Mar 17, 2015, 1:30:08 AM
to lav...@googlegroups.com, yros...@gmail.com
Hi Yves,

I tried posting my thanks but it was not successful, so I thought I'd send a mail directly to you.

Thanks so much for providing the solution of using missing = "pairwise". Much appreciated.
I can now run the program, but there are 2 warnings I'd like to seek your opinion on.

a) I have a lot of warnings like this
 Warning in pc_cor_TS(fit.y1 = FIT[[i]], fit.y2 = FIT[[j]], method = optim.method,  :
  lavaan WARNING: empty cell(s) in bivariate table of X2 x X1

b) lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.

> inspect(fit, "cov.lv")
            space persnl lstnng actvts intrct prgrmm prntst
space       0.271
personal    0.240  0.402
listening   0.247  0.338  0.626
activities  0.269  0.312  0.377  0.361
interaction 0.203  0.314  0.458  0.254  0.368
programme   0.248  0.471  0.438  0.457  0.406  0.581
parentstaff 0.197  0.232  0.218  0.161  0.146  0.223  0.252

I also computed the eigenvalues:
> eigen( inspect(fit, "cov.lv") )$values
[1]  2.291420538  0.270285310  0.195764056  0.124092911  0.054206511 -0.003413806 -0.070878533

Do I have to heed these warnings, or can I ignore them?

c) Last but not least, could you be kind enough to explain your earlier reply of "some response categories simply do not occur any more in the data"?


David Lim 



--
You received this message because you are subscribed to a topic in the Google Groups "lavaan" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lavaan/jt0Y1LP_bZ4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lavaan+unsubscribe@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at http://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Terrence Jorgensen

Mar 17, 2015, 12:52:05 PM
to lav...@googlegroups.com
a) I have a lot of warnings like this
  lavaan WARNING: empty cell(s) in bivariate table of X2 x X1


There's not much you can do about this.  It is almost guaranteed to happen when you have many categorical variables, especially with more than 2 categories.  Even if you have binary data, you have 39 indicators, so there are 2^39 possible response patterns, in which case you would need a minimum sample size of N = 549,755,813,888 to have no empty cells (assuming there is at least 1 observation of each response pattern).  If your CSV file represents all of your data, then your sample size is only about 150, so you can expect lots of empty cells.
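
Terrence's counting argument is easy to verify in R:

```r
## Number of possible response patterns for 39 binary indicators:
2^39   # = 549,755,813,888, so with N around 150 most patterns --
       # and many bivariate cells -- are necessarily empty
```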

 
b) lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.
> inspect(fit, "cov.lv")
            space persnl lstnng actvts intrct prgrmm prntst
space       0.271
personal    0.240  0.402
listening   0.247  0.338  0.626
activities  0.269  0.312  0.377  0.361
interaction 0.203  0.314  0.458  0.254  0.368
programme   0.248  0.471  0.438  0.457  0.406  0.581
parentstaff 0.197  0.232  0.218  0.161  0.146  0.223  0.252

I also computed the eigenvalues:
> eigen( inspect(fit, "cov.lv") )$values
[1]  2.291420538  0.270285310  0.195764056  0.124092911  0.054206511 -0.003413806 -0.070878533

Do I have to heed these warnings, or can I ignore them?

There are negative eigenvalues, so your latent covariance matrix would actually be impossible to observe in practice.  This could be an indication of model misspecification.  There are no negative variances on the diagonal, so that's not the problem.  Now it would be informative to look at the correlation matrix so you can see whether any correlations are out-of-bounds (greater than 1 in absolute value). 

inspect(fit, "cor.lv")

If you find an out-of-bounds correlation, then it could point to an inadequacy of the model.  That is, if a correlation is greater than 1, then the model parameters are trying to make up for the fact that indicators of different factors are more strongly related than the model structure implies, so there might be an unmodeled residual correlation or cross-loading.  Inspect the meaning of your indicators for clues as to which variables might be related for reasons beyond the correlations between their latent factors.

Of course, because your sample size is so small (perhaps too small for reliable results with this estimator?), then an out-of-bounds correlation might occur frequently just due to sampling error.  Read Bollen & Kolenikov (2012) about a method to use bootstrap confidence intervals to test whether an out-of-bounds estimate is out-of-bounds in the population.  Their paper discusses negative variances, but I think the same logic applies to testing correlations (or any parameter with a logical/theoretical boundary):
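
A rough sketch of how one might get bootstrap confidence intervals in lavaan for such a check (the arguments shown are standard lavaan options, but whether bootstrapping is practical with WLSMV at this sample size is another matter):

```r
## Refit with bootstrap standard errors, then look at percentile CIs for
## the factor correlations; an interval that stays below 1 is reassuring.
fit.boot <- cfa(Seven.factor, data = sevenfactor,
                ordered = paste0("X", c(1:20, 22, 24:30, 33:39)),
                se = "bootstrap", bootstrap = 500)
parameterEstimates(fit.boot, boot.ci.type = "perc", standardized = TRUE)
```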


 
c)  Last but not least, could you be kind enough to explain your earlier reply of "some response categories simply do not occur any more in the data" 

If you have ordinal categories [1,2,3], and one of the response categories is seldom observed (e.g., only 3 people in the sample choose category 3), then it may happen that those few rows are deleted due to missing data on other variables, leaving that category empty after listwise deletion.


Terry

David Lim

Mar 18, 2015, 3:03:35 AM
to lav...@googlegroups.com
Dear Terrence, 

Thanks so much for your response. Much appreciated.  
I have understood the situation much better.  

Yes, I only have 151 records, but with 36 items spread over 7 possible categories
(I am excluding 3 items because they have too many missing records), so I fully understand the limitation of my data.
Fortunately, it's low stakes: it's for a project assignment, not for publishing, and it's an academic exercise for me to understand better.

a) Yves mentioned we can possibly ignore the negative eigenvalues if they are really tiny. But I'm not sure -0.07 is considered small lol.

b) I've checked the correlations between the LVs.
> inspect(fit, "cor.lv")
            space persnl lstnng actvts intrct prgrmm prntst
space       1.000
personal    0.777  1.000
listening   0.603  0.697  1.000
activities  0.869  0.834  0.794  1.000
interaction 0.642  0.848  0.954  0.699  1.000
programme   0.650  1.016  0.763  0.975  0.879  1.000
parentstaff 0.762  0.744  0.544  0.527  0.482  0.539  1.000

There is indeed one correlation > 1. I also have 2 more correlations close to 1 (at 0.954 and 0.975).
Perhaps I should combine the affected LVs and make it a 6-factor model rather than a 7-factor one.
I will read up on Bollen & Kolenikov (2012).
  

 



yrosseel

Mar 25, 2015, 4:32:06 AM
to lav...@googlegroups.com
On 03/18/2015 08:03 AM, David Lim wrote:
> a) Yves mentioned we can possibly ignore the negative eigenvalues if
> they are really tiny. But I'm not sure -.07 is considered small lol.

For future reference: 'tiny' negative eigenvalues are values that are
close to machine precision. For example,

-2.220446e-16

(which is -1 * .Machine$double.eps on my machine) is tiny. -0.07 is huge.
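
In R terms (assuming `fit` is the fitted lavaan object):

```r
.Machine$double.eps                    # ~2.2e-16: the scale of 'tiny'
eigen(inspect(fit, "cov.lv"))$values   # negatives near -1e-15 are rounding
                                       # noise; -0.07 is a real problem
```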

> b) I've checked the correlations between the LVs.
> > inspect(fit, "cor.lv")
>             space persnl lstnng actvts intrct prgrmm prntst
> space       1.000
> personal    0.777  1.000
> listening   0.603  0.697  1.000
> activities  0.869  0.834  0.794  1.000
> interaction 0.642  0.848  0.954  0.699  1.000
> programme   0.650  1.016  0.763  0.975  0.879  1.000
> parentstaff 0.762  0.744  0.544  0.527  0.482  0.539  1.000
>
> There is indeed one correlation > 1. I also have 2 more correlations
> close to 1 (at 0.954 and 0.975).

This would suggest that your factors (more or less) measure the same
thing. And if there are any differences among the latent factors, your
sample size is too small to detect them.

Yves.

David Lim

Mar 27, 2015, 3:38:35 AM
to lav...@googlegroups.com
Thanks so much for the insightful comments!! 


David Disabato

Aug 24, 2017, 9:10:06 PM
to lavaan
Hi Yves,

Thanks for helping so many lavaan users on the Google group!

I am running a SEM model with categorical indicators and am running into the same problem as David Lim in 2015:  lavaan ERROR: some categories of variable `GQ1.1' are empty in group 1; frequencies are [1 15 32 82 184 0]. I am assuming the issue is because I have zero frequencies for some of my category cells. For example, variable GQ1.1 has 7 categories and the third category cell is empty (i.e., has a frequency of zero).

I am confused about why lavaan cannot compute polychoric correlations with zeros in some frequency-table cells. Bill Revelle's "polychoric" function from the "psych" package can, by using a correction for continuity (i.e., the "correct" argument). I thought the lavaan argument "zero.add" did the same. What is "zero.add" doing, if not correcting for zero cell frequencies by replacing the zero frequency with a small decimal (e.g., .01)? If it is not possible in lavaan, what are your thoughts on inputting a polychoric correlation matrix estimated with the "polychoric" function from the "psych" package into a lavaan model as the observed covariance matrix?

Thanks for any insight you have,
David


Terrence Jorgensen

Aug 26, 2017, 6:39:33 AM
to lavaan
Bill Revelle's "polychoric" function from the "psych" package is able to by using the correction for continuity (i.e., argument "correct"). I thought that the lavaan argument "zero.add" did the same. What is "zero.add" doing if not correcting for zero cell frequencies by replacing the zero frequency with a small decimal (e.g., .01)?

That addresses zeros in 2-way contingency tables between outcome/indicator variables, the same thing Bill does in the psych package.  You are describing a different problem, which is zeros in a 2-way contingency table between an outcome/indicator and the grouping variable.  In other words, a zero cell for the 7-level variable means you only observed 6 categories, so there is no way for the software to get information about how frequent the 7th category is endorsed.  That means there is no way to estimate the 6th threshold, so the polychoric correlation (in that group) has to be estimated by treating it as a 6-category indicator.  If the numbers of observed response categories differ across groups, there is no logical automatic way for the software to link 5 thresholds from one group to 6 thresholds in another group.  You would have to do that manually, assuming you can give different models to different groups.  Most people, however, solve this problem by collapsing categories so that all groups have the same number of observed categories, and thus estimated thresholds.

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

David Disabato

Aug 26, 2017, 11:27:22 AM
to lavaan
Thanks for your thoughts Terrence. That makes sense; however, I am doing a total sample CFA (i.e., no multi-group analysis). With a "single-group" CFA model, it was my understanding lavaan would just drop any categories with zero *univariate* frequency cells; for example, if one cell has no observed frequencies, then reduce a 7-level category variable to a 6-level categorical variable with only 5 (rather than 6) estimated thresholds. I now understand (thank you Terrence) that the "zero.add" argument in lavaan is for zero *bivariate* frequency cells in a 2-way contingency table.

I played around with the lavaan arguments and found something interesting. If I input my full dataset with missing values into lavaan and specify missing = "listwise", I get the following error: lavaan ERROR: some categories of variable `GQ1.1' are empty in group 1; frequencies are [1 15 32 82 184 0]. But if I create a listwise-deleted dataset on my own (i.e., using the base R function na.omit()) and then input that dataset into lavaan, the model runs. I am not sure if it is something unique to my data or something in the lavaan package code. Here is the syntax I ran:

# Full dataset with missing values
fit <- lavaan(model = conf, data = dat, estimator = "WLSMV", std.lv = TRUE,
              zero.add = c(.001, .001), missing = "listwise",
              ordered = c("GQ1.1","GQ2.1","GQ4.1","GQ5.1",
                          "GQ1.4","GQ2.4","GQ4.4","GQ5.4",
                          "GQ1.5","GQ2.5","GQ4.5r","GQ5.5r"))
Error in lav_samplestats_step1(Y = Data, ov.names = ov.names, ov.types = ov.types,  : 
  lavaan ERROR: some categories of variable `GQ1.1' are empty in group 1; frequencies are [1 15 32 82 184 0]

# Listwise-deleted dataset with no missing values
dat.listwise <- na.omit(dat[, c("GQ1.1","GQ2.1","GQ4.1","GQ5.1",
                                "GQ1.4","GQ2.4","GQ4.4","GQ5.4",
                                "GQ1.5","GQ2.5","GQ4.5r","GQ5.5r")])
fit <- lavaan(model = conf, data = dat.listwise, estimator = "WLSMV", std.lv = TRUE,
              zero.add = c(.001, .001),
              ordered = c("GQ1.1","GQ2.1","GQ4.1","GQ5.1",
                          "GQ1.4","GQ2.4","GQ4.4","GQ5.4",
                          "GQ1.5","GQ2.5","GQ4.5r","GQ5.5r"))

Terrence Jorgensen

Aug 27, 2017, 6:19:02 AM
to lavaan
What is the output of table(dat$GQ1.1)? And of table(dat.listwise$GQ1.1)? And class(dat$GQ1.1)? I don't know whether lavaan checks whether ordered-factor vectors have empty categories in single-group models and then re-formats them to remove the empty levels.

David Disabato

Aug 27, 2017, 2:45:19 PM
to lavaan
My best guess for what is going on is that lavaan is determining the number of ordered categories in each variable BEFORE doing listwise deletion. Then, after doing listwise deletion, some of the variable's categories have zero univariate frequency cells and lavaan doesn't know what to do. If lavaan determined the number of ordered categories AFTER doing listwise deletion, then there shouldn't be a problem.

Either way, here is the output you requested:

table(dat$GQ1.1, exclude = NULL)

   1    2    4    5    6    7 <NA>
   3    1   34   48  167  313    1
# 3 was a potential response option, but was not observed

table(dat.listwise$GQ1.1, exclude = NULL)

   1    4    5    6    7 <NA>
   1   15   32   82  184    0
# 2 was observed, but not after listwise deletion

The variable GQ1.1 is "numeric" so there are no levels. I let lavaan convert the variable to class "ordered factor".

Yves Rosseel

Aug 29, 2017, 5:33:18 AM
to lav...@googlegroups.com
On 08/27/2017 08:45 PM, David Disabato wrote:
> My best guess for what is going on is that lavaan is determining the
> number of ordered categories in each variable BEFORE doing listwise
> deletion.

That is correct.

> Then, after doing listwise deletion, some of the variable's
> categories have zero univariate frequency cells and lavaan doesn't know
> what to do. If lavaan determined the number of ordered categories AFTER
> doing listwise deletion, then there shouldn't be a problem.

Hm. I do not think that would be a good idea (in general). If listwise
deletion renders a response category empty, the user should be alerted.

The user can then choose to either use na.omit() on the data, or
collapse response categories with few observations. All of that should
be done before calling lavaan.
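
The pre-processing Yves describes might look like this (object names follow David's example; `my.items` is a hypothetical helper vector):

```r
## Hypothetical item vector, taken from David's syntax above:
my.items <- c("GQ1.1","GQ2.1","GQ4.1","GQ5.1",
              "GQ1.4","GQ2.4","GQ4.4","GQ5.4",
              "GQ1.5","GQ2.5","GQ4.5r","GQ5.5r")

dat.cc <- na.omit(dat[, my.items])    # listwise deletion *before* lavaan sees the data
table(dat.cc$GQ1.1)                   # check whether any category is now empty
dat.cc$GQ1.1[dat.cc$GQ1.1 == 2] <- 1  # if so, collapse it into a neighbour (example)

fit <- lavaan(model = conf, data = dat.cc, estimator = "WLSMV", std.lv = TRUE,
              ordered = my.items)
```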

Yves.

C C

Apr 15, 2019, 1:33:31 PM
to lavaan
Hey community, 

I have a similar problem and could not solve it by going through your replies, so I'm hoping for help :)

I am doing a CFA with my full dataset and it runs properly (DWLS, using ordered = c(XXX)). I then add group = "Gender" (a variable in my dataset) and get the error:

Error in lav_samplestats_step1(Y = Data, ov.names = ov.names, ov.types = ov.types,  : 
  lavaan ERROR: some categories of variable `IFCB4_3' are empty in group 1; frequencies are [6 37 54 15 0]

I did table(cleandat$IFCB4_3) and get:

  1   2   3   4   5 
  2  12  71 125  40 

missing = "pairwise" doesn't help.

Any advice?

Terrence Jorgensen

Apr 16, 2019, 5:02:02 AM
to lavaan
  lavaan ERROR: some categories of variable `IFCB4_3' are empty in group 1; frequencies are [6 37 54 15 0]

I did 
table(cleandat$IFCB4_3) and get 

  1   2   3   4   5 
  2  12  71 125  40 

Because you are not taking groups into account, which is what the message is telling you to do.

table(cleandat$IFCB4_3, cleandat$Gender)

 any advice?

Although it is possible to fit unique models to each group, which would allow IFCB4_3 to have different observed numbers of categories (and therefore thresholds) in each group, the easiest advice is to collapse extreme categories.  That is probably the best advice too.  When you have so few observations in an extreme category, the threshold estimate is very unstable due to sampling error (imagine trying to estimate the probability of a rare event, say ~1 in 1000, using a sample of 500).
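
For the data in this thread, that could be as simple as (assuming the numeric 1-5 coding shown above):

```r
## Fold the top category (empty for one gender) into the one below it,
## then confirm every remaining category occurs in every group:
cleandat$IFCB4_3[cleandat$IFCB4_3 == 5] <- 4
table(cleandat$IFCB4_3, cleandat$Gender)
```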

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

C C

Apr 16, 2019, 8:12:12 AM
to lavaan
Dear Terrence,

Thank you for the feedback. I now see the point!

But there is still something that confuses me. If I split cleandat into two subsets, with Gender == 1 and Gender == 2, the CFA runs smoothly for each subset (without group = "Gender", since I am using just the subset).
Any idea why that is happening? Is lavaan defining a group as the combination of Gender and Likert-scale category (1-5 in my case) when I use group = "Gender"?

Terrence Jorgensen

Apr 22, 2019, 12:45:42 PM
to lavaan
If I split my cleandat into 2 subsets, specifying Gender== 1 and Gender ==2, the cfa for the subset runs smoothly. 

Because those are single-group models.  If you look closely, you will see they estimate different numbers of parameters, which is the problem with trying to fit a simultaneous multigroup model.

C C

May 22, 2019, 9:02:12 AM
to lavaan
Dear Terrence, 

I am still stuck with the group = "Gender" error.
I ran the CFA and get good results, but when I add group = "Gender", I get the error:
Error in lav_samplestats_icov(COV = cov[[g]], ridge = ridge, x.idx = x.idx[[g]],  : 
  lavaan ERROR: sample covariance matrix is not positive-definite

The cov.lv output prior to adding group = "Gender" is:
      IFCB1 IFCB2 IFCB3 IFCB4 IFCB5 IFCB6 IFCB7 dv   
IFCB1 1.000                                          
IFCB2 0.459 1.000                                    
IFCB3 0.707 0.460 1.000                              
IFCB4 0.485 0.377 0.824 1.000                        
IFCB5 0.580 0.354 0.877 0.856 1.000                  
IFCB6 0.727 0.426 0.909 0.734 0.849 1.000            
IFCB7 0.658 0.350 0.909 0.802 0.878 0.949 1.000      
dv    0.524 0.336 0.456 0.521 0.326 0.512 0.594 1.000

with the eigen() decomposition as follows:

$values
[1] 5.51782169 0.89424701 0.69738648 0.50247324 0.21897399 0.09598559 0.05024369 0.02286831

$vectors
           [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]
[1,] -0.3291314 -0.30734067 -0.03930056  0.72221724  0.49367096  0.07824618  0.06754174  0.14027328
[2,] -0.2243065 -0.67937425  0.61401437 -0.29100512 -0.12719270  0.07229616 -0.01403450  0.06955512
[3,] -0.4047781  0.13782552  0.13238969  0.06792723 -0.03361880 -0.74729732 -0.43445689 -0.21605870
[4,] -0.3681518  0.20936986 -0.02431665 -0.51503491  0.55056220 -0.14753949  0.44894228  0.16834630
[5,] -0.3816420  0.35471246  0.20081417 -0.07702731  0.14458579  0.62444941 -0.40515350 -0.32753240
[6,] -0.4023226  0.09934523  0.01567695  0.20950355 -0.49042771  0.03095828  0.62360120 -0.39213945
[7,] -0.4054682  0.16916579 -0.14151509  0.01646040 -0.41490322  0.10785978 -0.14215237  0.76334481
[8,] -0.2646877 -0.46567506 -0.73669863 -0.27156459 -0.04788494  0.07681404 -0.17805639 -0.23815443

From the lavaan group, I found out that correlations > 0.9 might cause problems, and reduced my model by deleting IFCB3 (a general trouble-maker) and IFCB7 (correlation > 0.9).

The CFA for the new model also runs smoothly, but I get the same error, with
      IFCB1 IFCB2 IFCB4 IFCB5 IFCB6 dv   
IFCB1 1.000                              
IFCB2 0.458 1.000                        
IFCB4 0.483 0.376 1.000                  
IFCB5 0.574 0.351 0.857 1.000            
IFCB6 0.723 0.423 0.734 0.846 1.000      
dv    0.522 0.338 0.520 0.324 0.513 1.000

and 
eigen() decomposition
$values
[1] 3.75046650 0.83866578 0.67227874 0.50099606 0.17437713 0.06321579

$vectors
           [,1]       [,2]        [,3]        [,4]         [,5]        [,6]
[1,] -0.4112717  0.2503323 -0.05702238  0.73778394 -0.466936478 -0.05082260
[2,] -0.3013502  0.6252594  0.65395015 -0.29467637  0.051572544  0.03307641
[3,] -0.4442036 -0.3309664 -0.03924432 -0.47348979 -0.476133085 -0.49062100
[4,] -0.4488044 -0.4696197  0.19328211 -0.02638179 -0.001738558  0.73482987
[5,] -0.4746870 -0.1705254  0.01056323  0.23152156  0.733829104 -0.39163034
[6,] -0.3405489  0.4326940 -0.72807141 -0.30058257  0.118742189  0.24952913

Can you advise me how to proceed?
Is it possible to just do the group = "Gender" comparison via separate subset CFAs?

balal izanloo

May 22, 2019, 9:42:21 AM
to lav...@googlegroups.com
Hi
1) Check your data for missing-data patterns. With a large amount of missingness, pairwise deletion (the default here) and computing the covariance matrix from it can produce a non-positive-definite matrix; since your variables are ordinal, the tetrachoric/polychoric correlations can likewise come out non-positive-definite.
2) Sometimes a non-positive-definite covariance (correlation) matrix arises because one variable is a linear combination of several others; a simple correlation > 0.9 between two variables is just a special case of this.
3) A non-positive-definite matrix produces negative eigenvalues, and I cannot find any negative eigenvalue in your eigen output. Note that some packages resolve the problem automatically by smoothing the matrix and merely report it; I am not sure whether lavaan does. You may have to read its help or ask the maintainer.
4) If the problem persists anyway, you will have to smooth the matrix; there may be a function in lavaan that does this, but I am not sure.
5) If lavaan does not smooth the matrix, you can smooth it with the cor.smooth() function in the psych package.
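
If smoothing turns out to be necessary, a sketch (lavCor() is lavaan's correlation helper; cor.smooth() is from the psych package; `my.items` is a hypothetical vector of the ordinal item names):

```r
library(psych)
## Polychoric correlation matrix of the ordinal items:
R <- lavCor(cleandat[, my.items], ordered = my.items)
R.s <- cor.smooth(R)   # nearest smoothed (positive-definite) correlation matrix
```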

HTH


Terrence Jorgensen

May 23, 2019, 4:29:06 AM
to lavaan

  lavaan ERROR: sample covariance matrix is not positive-definite

Cov.lv output prior to the group= "gender" is

The covariance matrix for the single entire group is irrelevant.  The error is caused when there is a separate covariance matrix for each group.  Furthermore, your error message is about the sample covariance matrix, not the covariance matrix of latent variables (cov.lv).

Daisy C

Jul 17, 2019, 10:57:17 AM
to lavaan
Hi there,

Thank you for the pointers in this thread, very helpful. I get the same error, even when collapsing values of 3 into 2 (ordinal scale 0,1,2,3, now 0,1,2), and I imagine that's because I still have differences leading to zero-count categories, as the table below shows.

table(full_collapsed$rbsrsterotype2, full_collapsed$sex)
   
    Female Male
  0     89  209
  1     14   41
  2      0   10

But is there anything else I can do rather than collapsing responses of 2 into 1? The differences are exactly what I'd like to examine and capture, i.e. they are likely meaningful, so I would rather not collapse them further.

My sample is 363 in total (103 females, 260 males), no missing data, 43 items in the questionnaire.

Many thanks in advance,
Daisy

Terrence Jorgensen

Jul 18, 2019, 12:07:26 AM
to lavaan
is there anything else I can do rather than collapsing responses of 2 to 1 because the differences are exactly what I'd like to examine and capture, i.e. they are likely meaningful so I would rather not collapse them further,

You could define a unique model block for each group, so that the same parameters do not need to be estimated across groups (i.e., you can specify different numbers of thresholds).  The syntax would look like the multilevel syntax:


But you would specify

Group: Male
...
Group: Female
...
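
A heavily hedged sketch of what such group-specific blocks might look like (the `|` operator sets thresholds in lavaan model syntax; the factor name `stereo` and items `item2`/`item3` are placeholders, and whether your installed lavaan version accepts `Group:` blocks with different threshold counts needs to be verified):

```r
model.bygroup <- '
Group: Male
  stereo =~ rbsrsterotype2 + item2 + item3  # item2/item3 are placeholders
  rbsrsterotype2 | t1 + t2                  # 3 observed categories -> 2 thresholds

Group: Female
  stereo =~ rbsrsterotype2 + item2 + item3
  rbsrsterotype2 | t1                       # 2 observed categories -> 1 threshold
'
```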

Daisy C

Jul 23, 2019, 11:16:50 AM
to lavaan
Thank you for the advice.

One follow-up question: the link you shared suggests this kind of syntax only works for continuous data, and I have ordinal data. Is it therefore off limits? Or does it just mean the output would need to be interpreted with caution?

Terrence Jorgensen

Aug 6, 2019, 11:23:07 AM
to lavaan
the link you shared suggests this kind of syntax only works for continuous data and I have ordinal data. 

That is a current software limitation for lavaan.  Multilevel SEMs are currently only available for continuous data.  If you treat ordinal data with few (e.g., < 7) categories as continuous, your relationships are likely to be underestimated and test statistics are unlikely to have nominal error rates.