Long and wide tables, and lavaan

Luis

Feb 5, 2017, 7:00:19 PM

to lavaan

Hi everyone,

I am relatively new to lavaan and not yet familiar with some of the statistical terminology, so please be patient. This is my first post here. First of all, thanks to the developers for the excellent tool.

I am dealing with a large dataset that I intend to use in a SEM. The model I have in mind is a mediation model in which a latent variable representing structural neuroimaging data mediates the effect of genetic data (another latent variable) on cognitive performance (a third latent variable).

The dataset was provided by another university in an easy-to-use CSV format. Almost all of it is in wide format and loaded into R without trouble. The exception is the cognitive data, which has 33058 observations across 9 variables in LONG format.

A dummy version of the cognitive dataset would look like:

> mydata
    id assessment  value
1  ana     memory  0.000
2  ana     memory  1.000
3 brad  attention  0.895
4  ana     verbal  0.000
5 brad  attention 15.000
6 matt     memory  3.000
7 matt  attention  5.000

This can be easily converted into wide format, becoming:

> newdata
Source: local data frame [7 x 5]
Groups: id [3]

      id    i1 attention memory verbal
* <fctr> <int>     <dbl>  <dbl>  <dbl>
1    ana     1        NA      0     NA
2    ana     2        NA      1     NA
3    ana     3        NA     NA      0
4   brad     1     0.895     NA     NA
5   brad     2    15.000     NA     NA
6   matt     1        NA      3     NA
7   matt     2     5.000     NA     NA
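For reference, here is a minimal sketch of one way to produce a table like this in base R. The conversion code was not posted, so the approach is an assumption. `i1` is taken to be a running index of observations within each subject, which matches the output shown.

```r
# Dummy long-format data, copied from the example in this post
mydata <- data.frame(
  id         = c("ana", "ana", "brad", "ana", "brad", "matt", "matt"),
  assessment = c("memory", "memory", "attention", "verbal",
                 "attention", "memory", "attention"),
  value      = c(0, 1, 0.895, 0, 15, 3, 5),
  stringsAsFactors = FALSE
)

# Assumption: i1 counts the observations within each subject (1, 2, 3, ...)
mydata$i1 <- ave(seq_len(nrow(mydata)), mydata$id, FUN = seq_along)

# Spread the assessments into columns, one row per id-by-i1 combination
newdata <- reshape(mydata, idvar = c("id", "i1"), timevar = "assessment",
                   direction = "wide")

# reshape() prefixes the new columns with "value."; strip the prefix
names(newdata) <- sub("^value\\.", "", names(newdata))

# Match the row and column order of the printed table
newdata <- newdata[order(newdata$id, newdata$i1),
                   c("id", "i1", "attention", "memory", "verbal")]
print(newdata)
```

Because every row holds a single measurement, each row has a value in only one of the three assessment columns.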

Now there are numerous NAs in the dataset, and there will be even more if I merge this set with the genetic and neuroimaging sets, since each subject is repeated across multiple rows. Two questions:

Will lavaan understand this data orientation?
Or should I reduce the dimensionality of my dataset and perform a principal component analysis for each of the test domains?

Thank you,

Terrence Jorgensen

Feb 6, 2017, 5:12:40 AM

to lavaan

Will lavaan understand this data orientation?

Whatever your sampling unit of interest is (e.g., occasions, individuals), lavaan expects data in wide format (one row per sampling unit). It does not (yet) have features for analyzing multilevel data. The lavaan.survey package can adjust standard errors for clustering, but I'm not sure whether that is relevant for you.
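As an illustration only, here is a minimal base R sketch of "one row per sampling unit" for the dummy data above, assuming the sampling unit is the individual. The column names `occasion` and `measure` and the `memory_1`-style labels are hypothetical. Whether repeated scores on the same assessment should be kept as separate columns or summarized first is a modeling decision this sketch does not settle.

```r
# Dummy long-format data from the original post
mydata <- data.frame(
  id         = c("ana", "ana", "brad", "ana", "brad", "matt", "matt"),
  assessment = c("memory", "memory", "attention", "verbal",
                 "attention", "memory", "attention"),
  value      = c(0, 1, 0.895, 0, 15, 3, 5),
  stringsAsFactors = FALSE
)

# Number the repeated measurements within each subject-by-assessment cell
mydata$occasion <- ave(seq_len(nrow(mydata)), mydata$id, mydata$assessment,
                       FUN = seq_along)

# Build one column label per assessment and occasion, e.g. "memory_1"
mydata$measure <- paste(mydata$assessment, mydata$occasion, sep = "_")

# One row per subject, one column per assessment-by-occasion label
wide <- reshape(mydata[, c("id", "measure", "value")],
                idvar = "id", timevar = "measure", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```

A data frame shaped like `wide` is what would be passed as the `data` argument of `sem()`. It could first be merged by `id` with the genetic and neuroimaging tables, which are already in wide format.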

Or should I reduce the dimensionality of my dataset and perform a principal component analysis for each of the test domains?

I'm not sure what that would accomplish. A structural equation model is meant to represent a theoretical causal model, so the entities in the model should be real, interpretable variables. But this is beyond the scope of a software question. SEMNET is available for general SEM questions, and hopefully someone in that wider audience can provide guidance on appropriate ways to use SEM for neuroimaging data.

www2.gsu.edu/~mkteer/semnet.html

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam