Hi everyone,
I am relatively new to lavaan and not yet used to some of the statistical terms, so please be patient. This is my first time posting here. First of all, thanks to the developers for the excellent tool.
I am working with a large dataset that I intend to use in an SEM. The model I have in mind is a mediation in which a latent variable representing structural neuroimaging data mediates the effect of genetic data (another latent variable) on cognitive performance (a third latent variable).
The dataset was provided by another university, which supplies the data in an easy-to-use format (csv). Almost all of it is in wide format and loaded into R without trouble. The exception is the cognitive data, which has 33058 observations spread across 9 variables in LONG format.
A dummy version of the cognitive dataset looks like this:
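To make the design concrete, here is a rough sketch of the mediation in lavaan syntax. The indicator names (snp1 to snp3, vol1 to vol3) are placeholders I made up, the three cognitive indicators are taken from my dummy example further down, and the data are simulated only so that the snippet runs on its own. It is not my real model or data.

```r
library(lavaan)

# Mediation sketch: genetic -> brain -> cognition, plus a direct path.
# All indicator names are hypothetical placeholders.
model <- '
  # measurement part
  genetic   =~ snp1 + snp2 + snp3
  brain     =~ vol1 + vol2 + vol3
  cognition =~ attention + memory + verbal

  # structural part
  brain     ~ a * genetic
  cognition ~ b * brain + c * genetic

  # defined parameters: indirect and total effect
  indirect := a * b
  total    := c + a * b
'

# Population model with arbitrary values, used only to simulate toy data.
pop <- '
  genetic   =~ 0.8*snp1 + 0.7*snp2 + 0.6*snp3
  brain     =~ 0.8*vol1 + 0.7*vol2 + 0.6*vol3
  cognition =~ 0.8*attention + 0.7*memory + 0.6*verbal
  brain     ~ 0.5*genetic
  cognition ~ 0.4*brain + 0.2*genetic
'
set.seed(1)
toy <- simulateData(pop, sample.nobs = 500)

# Fit the mediation model to the toy data and print the estimates.
fit <- sem(model, data = toy)
summary(fit, standardized = TRUE)
```

The `:=` lines define the indirect effect as the product of the two paths, which is the quantity I am ultimately interested in.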
> mydata
id assessment value
1 ana memory 0.000
2 ana memory 1.000
3 brad attention 0.895
4 ana verbal 0.000
5 brad attention 15.000
6 matt memory 3.000
7 matt attention 5.000
This can be easily converted into wide format, becoming:
> newdata
Source: local data frame [7 x 5]
Groups: id [3]
id i1 attention memory verbal
* <fctr> <int> <dbl> <dbl> <dbl>
1 ana 1 NA 0 NA
2 ana 2 NA 1 NA
3 ana 3 NA NA 0
4 brad 1 0.895 NA NA
5 brad 2 15.000 NA NA
6 matt 1 NA 3 NA
7 matt 2 5.000 NA NA
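For reference, a minimal base R sketch that reproduces the table above from the dummy data. This is only an illustration: the helper column i1 is assumed to be a running index of each subject's rows, and base R's reshape() stands in for whatever conversion was actually used.

```r
# Dummy long-format data, as shown above.
mydata <- data.frame(
  id         = c("ana", "ana", "brad", "ana", "brad", "matt", "matt"),
  assessment = c("memory", "memory", "attention", "verbal",
                 "attention", "memory", "attention"),
  value      = c(0, 1, 0.895, 0, 15, 3, 5),
  stringsAsFactors = FALSE
)

# Running index of each subject's rows (assumed meaning of i1).
mydata$i1 <- ave(seq_along(mydata$id), mydata$id, FUN = seq_along)

# Long -> wide: one column per assessment, one row per (id, i1) pair.
newdata <- reshape(mydata, idvar = c("id", "i1"),
                   timevar = "assessment", direction = "wide")

# Drop the "value." prefix, then order rows and columns as in the printout.
names(newdata) <- sub("^value\\.", "", names(newdata))
newdata <- newdata[order(newdata$id, newdata$i1),
                   c("id", "i1", "attention", "memory", "verbal")]
rownames(newdata) <- NULL
newdata
```

Because every original row keeps its own (id, i1) pair, each wide row has exactly one non-missing assessment, which is where all the NAs come from.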
Now there are numerous NAs in the dataset, and there will be even more once I attach this set to the genetic and neuroimaging sets, as the subjects are repeated multiple times. Two questions:
Will lavaan understand this data orientation?
Or should I reduce the dimensionality of my dataset and perform a principal component analysis for each of the test domains?
Thank you,