Mixing continuous and categorical indicators?


Christopher Bratt

Dec 12, 2017, 3:37:56 PM
to lavaan
Am I correct in believing that lavaan does not allow continuous and categorical indicators of factors to be combined? Combining such indicators for a single latent variable, or fitting a model with one latent variable that has continuous indicators and another that has only categorical indicators, gives the same error:

lavaan ERROR: unknown ov.types:labelled

Strangely, I cannot find any mention of this error message in any online document. I was also unable to find an answer about combining continuous and categorical indicators.

Yves Rosseel

Dec 12, 2017, 4:48:53 PM
to lav...@googlegroups.com
> lavaan ERROR: unknown ov.types:labelled

ov.type 'labelled'? That is not standard R. Are you using the haven
package to read in data? Or another package from the RStudio universe?
These packages seem to introduce non-standard data types, like 'labelled'
(instead of 'factor' or 'ordered').

If that is the case, try to convert the 'labelled' columns to 'ordered'.
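
A minimal sketch of such a conversion, using base R only. The vector `item` is a hypothetical stand-in for an imported column (a real haven column may carry a different class name or extra attributes), and the value labels are invented for illustration:

```r
# Hypothetical stand-in for an imported column: numeric codes carrying a
# non-standard 'labelled' class and a 'labels' attribute.
item <- structure(c(1, 3, 2, 3, 1),
                  labels = c(low = 1, mid = 2, high = 3),
                  class = "labelled")

# Keep the value labels, then drop the class and all other attributes.
lab  <- attr(item, "labels")
vals <- as.vector(unclass(item))

# Rebuild the column as a standard ordered factor.
item_ord <- factor(vals, levels = lab, labels = names(lab), ordered = TRUE)

class(item_ord)   # a standard R class that lavaan recognises
```

If the value labels are not needed, `ordered(vals)` alone would also give a standard ordered factor.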

(you can mix categorical with continuous indicators).

Yves.
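
To illustrate the point that indicator types can be mixed, here is a small sketch on simulated data. The variable names, loadings, thresholds, and sample size are all made up for the example; only `cfa()` and its `ordered=` argument come from lavaan itself:

```r
library(lavaan)

set.seed(1)
n <- 500
f <- rnorm(n)  # simulated latent factor scores

dat <- data.frame(
  # two continuous indicators
  y1 = 0.8 * f + rnorm(n, sd = 0.6),
  y2 = 0.7 * f + rnorm(n, sd = 0.7),
  # a three-category and a binary indicator, stored as integer codes
  u1 = cut(0.8 * f + rnorm(n, sd = 0.6),
           breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE),
  u2 = cut(0.7 * f + rnorm(n, sd = 0.7),
           breaks = c(-Inf, 0, Inf), labels = FALSE)
)

model <- 'f =~ y1 + y2 + u1 + u2'

# Only the categorical indicators are declared in ordered=;
# y1 and y2 are treated as continuous.
fit <- cfa(model, data = dat, ordered = c("u1", "u2"))
summary(fit)
```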

Christopher Bratt

Dec 12, 2017, 5:22:49 PM
to lavaan

Thank you for spotting the problem, Yves. You were right, I import data with haven in RStudio.

Converting to "ordered" did not look like the answer, since I had already declared the categorical variables as "ordered"; it was the continuous variables that seemed to cause the problem by being "labelled".
I'm new to R, but this seems to work:

attributes(mydata$myvariable) <- NULL

The model now runs fine.
Please allow me to thank you for your wonderful work! I use Mplus for advanced stuff, but being able to get results in the console and take advantage of some features in R is very nice. And students with little money should be really grateful to you.

Christopher 
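
A short sketch of what the `attributes(...) <- NULL` workaround does, in base R. The vector `myvariable` here is a hypothetical stand-in for an imported continuous variable, not the actual data from this thread:

```r
# Hypothetical stand-in for a continuous variable imported with haven.
myvariable <- structure(c(12.5, 9.0, 15.2),
                        label = "Household income",
                        class = "labelled")

# Removing all attributes leaves a plain numeric vector.
attributes(myvariable) <- NULL
class(myvariable)

# Caution: the same call on a factor also discards its levels,
# leaving only the underlying integer codes.
f <- factor(c("no", "yes", "no"))
attributes(f) <- NULL
f
```

So the workaround looks safe for continuous variables, but it is probably best not applied to columns already stored as factors or ordered factors.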

Yves Rosseel

Dec 13, 2017, 3:03:35 AM
to lav...@googlegroups.com
On 12/12/2017 11:22 PM, Christopher Bratt wrote:
> Thank you for spotting the problem, Yves. You were right, I import data
> with haven in RStudio.

Would it be possible to send me the data (or a snippet of the data),
together with a small R script? I want to explore the possibility to
handle these types of non-standard data.frames (tibbles) within lavaan.

Yves.

Christopher Bratt

Dec 13, 2017, 6:48:46 AM
to lavaan
Sure! 
Is your email available here (I don't see it)?
(I'd rather send the data to you than to the public.)

Christopher Bratt

Mar 7, 2018, 9:48:50 AM
to lavaan
Just a follow up on this. 
Yves has received the data file and the code causing the error. He might be able to fix the problem: lavaan cannot handle a tibble, a more advanced data frame used by the "haven" package when importing data in a foreign format (e.g. Stata, SAS, SPSS).

In an earlier post I wrote that removing attributes from continuous variables seemed to be a workaround. Now I have a more complex CFA model that cannot be estimated with lavaan using a tibble rather than an ordinary data frame, even when all manifest variables have their attributes removed. (The error message is the same, referring to labelling of data, even after attributes are removed.)

So, for now, I would avoid tibbles when using lavaan. I assume Yves will be able to come up with a solution in an update.
If importing from Stata (as I do), try the readstata13 package:

mydata <- readstata13::read.dta13("path_to_datafile.dta")


This works on data files from later versions of Stata too. And the model could be estimated with the new data format (a true data frame). 

Amonet

Mar 8, 2018, 7:02:20 AM
to lavaan
Thanks for the follow-up. Just a heads-up for those who use the tidyverse (dplyr in particular): it also produces a 'tibble'.
I'm not sure if it works in your case, Christopher Bratt, but usually you should be able to convert it into a normal R data frame, e.g.: df <- as.data.frame(a_tibble_dataframe)

Kind regards,

Christopher Bratt

Mar 10, 2018, 5:22:26 AM
to lavaan
Thanks, but this does not seem to work. It seems I have to remove the attributes of individual variables manually, even after converting a tibble to a data frame with data.frame or as.data.frame.

So I end up having two large data frames (data files) in R: a tibble for various data management tasks and tests (such as functions in the wonderful sjmisc package), and a separate data frame for lavaan. But I assume this will be solved in future updates of lavaan.
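
A likely reason, consistent with the explanation given later in this thread, is that `as.data.frame()` changes the class of the container but leaves the class of each column untouched. The sketch below fakes a tibble in base R by tagging a list-built data frame with tibble's class names; the data and the 'labelled' column are hypothetical:

```r
# Hypothetical data: a data frame tagged with tibble's classes, holding one
# column that carries a non-standard 'labelled' class.
dat <- structure(
  list(age  = c(2, 6, 8),
       item = structure(c(1, 2, 3), class = "labelled")),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -3L)
)

plain <- as.data.frame(dat)

class(plain)           # the container is now a plain data.frame
sapply(plain, class)   # but the 'item' column keeps its 'labelled' class
```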

kma...@aol.com

Mar 10, 2018, 11:44:12 PM
to lavaan
Christopher,
  I do not have much experience with tibbles.  However, this is how I would approach the problem.

  Given that data frames came before tibbles, and given that lavaan was designed to work with data frames, it seems only fair to frame this as a tibble problem rather than a lavaan problem.  Lavaan is doing what it was designed to do.  Tibbles have introduced a data structure that fails to maintain compatibility with functions designed to work with data frames.  Accepting tibbles might be a nice feature request for lavaan, but failing to accept tibbles is certainly not a bug in lavaan.  You might request that the maintainers of the tibble package add a method for converting them to data frames that produces backward compatible data frames.  That would make more sense than every package that uses data frames producing its own redundant tibble conversion method.

  If I had data in a tibble and wanted to analyze it using a package that was not designed for tibbles, and as.data.frame() did not work, this is the strategy that I would follow:
1. Save the tibble as a text file (e.g., csv).
2. Read the text file as a data frame.

require(tibble)
xx <- 1:3
yy <- letters[1:3]
zz <- factor(c('duck', 'duck','goose'))

my.tibble <- tibble(xx, yy, zz)
my.tibble

as.data.frame(my.tibble)

write.table(my.tibble, file='tibbledata.csv', sep=',')
my.dataframe <- read.csv(file='tibbledata.csv')
my.dataframe

str(my.tibble)
str(my.dataframe)

  The text file has no capacity to store anything but data values.  So, unless I am missing something, this procedure should give you a clean data frame with no problematic elements inherited from the tibble that might interfere with your analysis.
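
One caveat with the text-file route, sketched below in base R on made-up data: because the file stores only values, any ordering declared on a factor is lost as well, so ordered variables would need to be re-declared afterwards (or passed to lavaan through its ordered= argument). The column names and levels are hypothetical:

```r
# Made-up data with one continuous column and one ordered factor.
dat <- data.frame(
  score = c(1.5, 2.0, 3.5),
  grade = factor(c("low", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

# Round trip through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(dat, path, row.names = FALSE)
back <- read.csv(path)

# The ordering did not survive the round trip.
was_ordered <- is.ordered(back$grade)
was_ordered

# Re-declare the variable as ordered before modelling.
back$grade <- factor(back$grade,
                     levels = c("low", "mid", "high"), ordered = TRUE)
str(back)
```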

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/


Christopher Bratt

Mar 11, 2018, 4:57:04 AM
to lavaan
I don't think I have said the requirement of traditional data frames is a bug? That would make no sense. But it is a problem for users who use tibbles.

I know too little about data frames to understand why tibbles are not dealt with as data frames by lavaan, simply ignoring the labelling. But Yves is looking into it.

Yves Rosseel

Mar 11, 2018, 12:05:59 PM
to lav...@googlegroups.com
lavaan can handle tibbles just fine.

The problem is with the haven package. It creates a tibble/data.frame
(the distinction does not matter) where the variables have a
non-standard type/class. In particular, it uses a 'labelled' type, which
is NOT part of standard R. (The standard types are
integer/numeric/factor/ordered and so on).

If your data is in 'Data', then type

lapply(Data, class)

to see the type/class of each variable.

To fix this using the current version of lavaan (0.5), you need to
manually override the class type to 'numeric' or 'ordered'. For example,
if you have a continuous variable 'x', you can type

class(Data$x) <- "numeric"

while you can use the ordered= argument to tell lavaan which variables
are to be considered as binary/ordered.

In dev 0.6, I have added a check for this, and it should work out of the
box.
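
The manual override described above can be applied to several columns at once. The following is only a sketch in base R: `Data` is a small hypothetical data set, and the check also looks for the class name 'haven_labelled', which is an assumption about what newer haven versions use:

```r
# Hypothetical data: 'x' carries a non-standard 'labelled' class,
# 'y' is an ordinary numeric column.
Data <- structure(
  list(x = structure(c(2.5, 3.1, 4.0, 1.2),
                     label = "score", class = "labelled"),
       y = c(1, 2, 2, 3)),
  class = "data.frame",
  row.names = c(NA, -4L)
)

# Inspect the class of each variable.
lapply(Data, class)

# Find the columns with a non-standard labelled class ...
is_lab <- vapply(Data,
                 function(col) inherits(col, c("labelled", "haven_labelled")),
                 logical(1))

# ... and override their class, as suggested for continuous variables.
for (nm in names(Data)[is_lab]) {
  class(Data[[nm]]) <- "numeric"
}

lapply(Data, class)
```

Columns meant to be binary or ordinal would still be declared through the ordered= argument rather than converted here.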

My advice would be to avoid the haven package altogether. Use the
(standard) foreign package instead. The haven package is part of the
RStudio 'tidyverse' which does not seem to care about non-tidyverse
packages, let alone standard R behaviour.
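
As a sketch of the foreign route, the example below writes a small made-up data set to a temporary Stata file and reads it back, so that it runs without an external file; in practice only the `read.dta()` call with the real path would be needed. Note that `read.dta()` documents support only for older Stata file formats (versions 5 to 12), so files saved by newer Stata versions may need another reader:

```r
library(foreign)

# Made-up data standing in for a real Stata file.
dat <- data.frame(age  = c(2, 6, 8, 10),
                  gend = factor(c("f", "m", "m", "f")))

# Round trip through a temporary .dta file.
path <- tempfile(fileext = ".dta")
write.dta(dat, path)
back <- read.dta(path)

# The columns come back with standard R classes.
sapply(back, class)
```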

Yves.
> --
> You received this message because you are subscribed to the Google
> Groups "lavaan" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to lavaan+un...@googlegroups.com
> <mailto:lavaan+un...@googlegroups.com>.
> To post to this group, send email to lav...@googlegroups.com
> <mailto:lav...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/lavaan.
> For more options, visit https://groups.google.com/d/optout.

kma...@aol.com

Mar 12, 2018, 10:31:08 PM
to lavaan
Thanks for the detailed explanation.  That clarifies why I was unable to reproduce Christopher's problem.

Manolo Cabran

Nov 19, 2019, 11:06:25 AM
to lavaan
Hi, I have a similar problem with a model with both continuous and categorical indicators. My model is

# Model_a
cc_model_a <- '
hp =~ a*rooms + b*edu + c*sol + d*nets
cc => e*age + f*gend + g*disab + h*orph

cc ~~ hp
'

cc_fit_a <- lavaan::cfa(cc_model_a, data = cc, ordered = c("rooms", "edu", "sol", "nets", "gend", "disab", "orph"), conditional.x = FALSE, std.lv = FALSE)

Seven of the 8 variables are categorical and are listed in
ordered = c("rooms", "edu", "sol", "nets", "gend", "disab", "orph")

However, age is continuous (actually an integer) from 0 to 17.

I get the following error:
> cc_fit_a <- lavaan::cfa(cc_model_a, data = cc, ordered = c("rooms", "edu", "sol", "nets", "gend", "disab", "orph"), conditional.x = FALSE, std.lv = FALSE)
Error in lav_data_full(data = data, group = group, cluster = cluster,  : 
  lavaan ERROR: missing observed variables in dataset: cc

As you can see, my model includes all the variables in cc:
> str(cc)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 6059 obs. of  8 variables:
 $ age  : num  2 6 8 10 6 14 5 4 8 5 ...
 $ gend : num  4 2 2 2 1 2 4 5 2 3 ...
 $ disab: num  0 1 0 1 0 0 0 0 0 0 ...
 $ orph : num  0 1 1 1 0 0 1 0 0 0 ...
 $ sol  : num  3 3 3 NA 2 3 3 3 3 3 ...
 $ edu  : num  2 1 0 0 1 NA 0 0 1 0 ...
 $ rooms: num  3 1 4 3 2 3 2 3 3 2 ...
 $ nets : num  1 0 0 0 1 0 0 0 0 0 ...

Could the error message be due to the fact that age is not categorical and that ordered = c() somehow excludes it?

Thanks

Edward Rigdon

Nov 19, 2019, 11:38:55 AM
to lav...@googlegroups.com
Do you have a typo?
cc => e*age + f*gend + g*disab + h*orph

Should the operator be =~, not =>? The package believes cc to be an observed variable and not in your dataset.

Manolo Cabran

Nov 19, 2019, 12:13:55 PM
to lav...@googlegroups.com
Thank you, Edward.
I was focusing so much on the actual variable names that I did not check the rest of the model.
Yes, you are right. There are actually 2 typos: the one that you spotted, and cc is a formative LV, so the operator is <~.
Thanks again.
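
For reference, a sketch of the model syntax with both corrections applied (=~ for the reflective factor hp, <~ for the formative cc). It only parses the syntax with lavaan's `lavaanify()` to show which operators are recognised; it does not fit the model, and whether the model is identified for the actual data is not checked here:

```r
library(lavaan)

# Model syntax from the post above, with the operators corrected.
cc_model_a <- '
hp =~ a*rooms + b*edu + c*sol + d*nets
cc <~ e*age + f*gend + g*disab + h*orph

cc ~~ hp
'

# Parse the syntax into a parameter table and list the operators found.
pt <- lavaanify(cc_model_a)
unique(pt$op)
```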

----------------------------------------------

CABRAN Manolo
