Autoregressive SEM model with one observation


Kamil Krawczyk

Apr 26, 2019, 2:53:37 AM
to lavaan
Hi,

A few days ago I started using lavaan in my job as a data scientist.

I think I understand SEM models well. I have already fitted some DSEM AR(p) models with many observations using the lavaan package,
but I don't know how to fit a DSEM AR(p) model for only one observation.

Some articles discuss DSEM AR(p) with only one observation, but I don't understand them very well.

Could anyone write an example model for this, or give me a hint?

Terrence Jorgensen

Apr 26, 2019, 4:02:31 AM
to lavaan
> I don't know how to fit a DSEM AR(p) model for only one observation.

Do you mean one case/subject with several observations? In wide format, one row of data means there are no variables, because every column is a constant. Thus, there is no (co)variance to model. I am not sure about DSEM in lavaan, but at the very least I imagine you would need your data in long format (one row per occasion), where your sample consists of occasions from the population of one person's experiences.
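To make the wide-versus-long point concrete, here is a minimal base-R sketch. It uses the built-in LakeHuron series purely for illustration, and the object names (wide, long, col_vars) are made up for this example.

```r
# One subject's series: the built-in LakeHuron data (one value per year).
y <- as.numeric(LakeHuron)

# Wide format: a single row, one column per occasion.
wide <- as.data.frame(t(y))

# Long format: one row per occasion.
long <- data.frame(occasion = seq_along(y), level = y)

# Each wide column holds a single value, so its sample variance is undefined (NA).
col_vars <- vapply(wide, var, numeric(1))
all(is.na(col_vars))

# In long format the single variable has an ordinary sample variance.
var(long$level)
```

With one row there is nothing for a covariance-based estimator to work with; the long layout at least gives one variable with variation across occasions.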

> Some articles discuss DSEM AR(p) with only one observation, but I don't understand them very well.

Neither do I, but I know Ellen Hamaker is doing a lot of work in this area recently, and it is a fairly new feature of Mplus.  So I would recommend searching the Mplus site for reference materials.  Here is a recent SEM paper:


Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Kamil Krawczyk

Apr 26, 2019, 4:08:59 AM
to lavaan
Put simply, I want to do some simple time series modeling with lavaan where I have only one time series (e.g. the LakeHuron data in R).

I have read some papers about Mplus, but I don't understand its model syntax.

Mauricio Garnier-Villarreal

Apr 27, 2019, 12:56:45 PM
to lavaan

If you have only 1 subject, SEM methods will not be able to work with it. The most generous approach would be the gimme package (https://cran.r-project.org/web/packages/gimme/index.html), which its developers claim can work with as few as 10 subjects. It does require long time series (at least 60 time points). Again, it won't work with just 1 subject.

MH Manuel Haqiqatkhah

Apr 28, 2019, 8:00:51 AM
to lavaan
Take a look at this question of mine on CV (but ignore the last two paragraphs). What I suggested there (building a new vector of observations out of lagged observations and structuring the residual errors by imposing a correlation/AR constraint) does, in my opinion, the job of RDSEM modeling. It can be implemented in lavaan too.
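The lagged-observation idea can be sketched as follows. This is only an illustration under assumptions that are not in the post: it uses the built-in LakeHuron series, an arbitrarily chosen AR order of 2, and made-up column names (y, lag1, lag2). It leaves out the residual-structure constraints mentioned above. The lavaan part runs only if the package is installed.

```r
# Build a data set of lagged observations from a single series.
y <- as.numeric(LakeHuron)
p <- 2  # AR order, chosen for illustration only

# embed() returns columns y_t, y_{t-1}, ..., y_{t-p}; each row is one
# overlapping window of the series.
lagged <- as.data.frame(embed(y, p + 1))
names(lagged) <- c("y", "lag1", "lag2")

# AR(2) written as an ordinary regression on the lagged columns.
ols <- lm(y ~ lag1 + lag2, data = lagged)
coef(ols)

# The same regression written in lavaan model syntax.
if (requireNamespace("lavaan", quietly = TRUE)) {
  ar2_model <- "
    y ~ b1 * lag1 + b2 * lag2
    y ~ 1
  "
  fit <- lavaan::sem(ar2_model, data = lagged)
  print(lavaan::parameterEstimates(fit))
}
```

The rows overlap, so they are not independent cases; the point estimates may be usable, but the standard errors and fit statistics should be read with caution.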

Edward Rigdon

Apr 28, 2019, 9:09:25 AM
to lav...@googlegroups.com
Take a look at Steven Boker's page, where he describes models for dynamic factor analysis:
With intensively longitudinal data, overlapping snips of 1 data series perform roughly as independent cases. Common factors are specified so that they describe the moments of such a series. Boker has created utilities (using OpenMx) that do the hard part, setting up the data. The actual dynamic factor model is then easily specified with lavaan.


Kamil Krawczyk

Apr 30, 2019, 5:09:01 AM
to lavaan
I have already heard about OpenMx through another package, ctsem.

ctsem uses OpenMx functions to create DSEM models. The ctsem manual (PDF) contains some cases of a DSEM model with only one observation, but I don't understand what the model looks like.

An example from the manual:

# generating model: 2 latent processes, 1 manifest variable (my reading: lag 2, one observation)
genm <- ctModel(Tpoints = 200, n.latent = 2, n.manifest = 1,
                LAMBDA = matrix(c(1, 0), nrow = 1, ncol = 2),
                DIFFUSION = matrix(c(0, 0, 0, 1), 2),
                MANIFESTVAR = t(chol(diag(.6, 1))),
                DRIFT = matrix(c(0, -0.1, 1, -0.2), nrow = 2),
                CINT = matrix(c(1, 0), nrow = 2))
# generate one subject with 200 time points from this model (burnin = ???)
data <- ctGenerate(genm, n.subjects = 1, burnin = 200)
# plot the time series
ctIndplot(data, n.subjects = 1, n.manifest = 1, Tpoints = 200)
# second model with the same structure, but with free (named) parameters
model <- ctModel(Tpoints = 200, n.latent = 2, n.manifest = 1,
                 LAMBDA = matrix(c(1, 0), nrow = 1, ncol = 2),
                 DIFFUSION = matrix(c(0, 0, 0, "diffusion"), 2),
                 DRIFT = matrix(c(0, "regulation", 1, "diffusionAR"), nrow = 2))
# fit the second model to the generated data
fit <- ctFit(data, model, stationary = "T0VAR")



Mauricio Garnier-Villarreal

Apr 30, 2019, 3:03:15 PM
to lavaan

The model you presented is a higher-order model; it is more similar to a moving average model than to an AR model.

If you want something similar to an AR model, you can specify it as in the code below.
In this model the DRIFT matrix contains the continuous-time autoregressive parameter. This is the time-independent AR effect, which is hard to interpret on its own, so you can transform it into the AR effect for a given time interval: expm(summary(example1fit)$DRIFT * 2.5) gives the AR effect for a time difference of 2.5 units of time, while summary(example1fit, verbose = TRUE)["discreteDRIFTstd"] gives the AR effect at time lag 1.

I didn't know ctsem could handle a single observation, but apparently it can. I use the AR model here since it is simpler, and I think it is what you are looking for.

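library(ctsem)

data("ctExample1")
head(ctExample1)
ctExample1 <- ctExample1[, c(1, 3, 5, 7, 9, 11, 13:17)] ## select Leisure variables

## univariate model: one latent process measured by one manifest variable
example1model <- ctModel(n.latent = 1, n.manifest = 1, Tpoints = 6,
                         manifestNames = c("LeisureTime"),
                         latentNames = c("LeisureTime"),
                         LAMBDA = diag(1))

example1fit <- ctFit(dat = ctExample1, ctmodelobj = example1model)

summary(example1fit)
## AR effect at time lag 1
summary(example1fit, verbose = TRUE)["discreteDRIFTstd"]

## AR effect for a time difference of 2.5 units
## (expm() is taken from the expm package in case it is not attached)
expm::expm(summary(example1fit)$DRIFT * 2.5)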
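
The conversion described above, from a continuous-time DRIFT matrix to the discrete-time autoregressive matrix for an interval dt, is the matrix exponential of DRIFT * dt. A minimal base-R sketch follows. The helper drift_to_discrete() is hypothetical (it is not a ctsem function) and assumes a diagonalizable drift matrix; the drift values are the ones from the generating model quoted earlier in the thread, used only as an example.

```r
# Hypothetical helper: matrix exponential of drift * dt via eigendecomposition.
# Assumes `drift` is square and diagonalizable.
drift_to_discrete <- function(drift, dt) {
  eig <- eigen(drift)
  V <- eig$vectors
  expo <- V %*% diag(exp(eig$values * dt), nrow = nrow(drift)) %*% solve(V)
  Re(expo)  # imaginary parts are rounding noise when drift is real
}

# Drift matrix taken from the generating model posted earlier in the thread.
drift <- matrix(c(0, -0.1, 1, -0.2), nrow = 2)

ar_lag1 <- drift_to_discrete(drift, 1)      # AR matrix for a time difference of 1
ar_lag2_5 <- drift_to_discrete(drift, 2.5)  # AR matrix for a time difference of 2.5
ar_lag1
ar_lag2_5

# With a single process the same conversion is just exp(drift * dt).
drift_to_discrete(matrix(-0.3), 2.5)
exp(-0.3 * 2.5)
```

The same continuous-time parameter therefore implies a different AR coefficient for every interval length, which is why the discrete-time value has to be quoted together with its time lag.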

Kamil Krawczyk

May 7, 2019, 9:03:38 AM
to lavaan
Mauricio, I know that this model did not look like an AR model. I only wanted to reproduce it in lavaan model syntax.

During the May holidays in my country I had time to think about it.

I am now considering replicating the same time series. That gives me more than one observation, and I add some white noise (standard multivariate normal) to the replicates to get nonzero variances and covariances.

I also wrote a function with many loops that creates a simple DSEM AR model (I think). My model is attached as a txt file. Unfortunately, the number of iterations is very large even for small cases (such as 40 observations in the time series).

A simple case with the LakeHuron dataset:

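library(mvtnorm) # Package for multivariate normal distribution
library(lavaan)

n <- 40                       # number of time points used (and number of replicates)
p <- 2                        # AR order
x <- LakeHuron[1:n]
dataset <- t(replicate(n, x)) # n identical rows, one column per time point

set.seed(12345)

# The first row is left undisturbed so that one replicate is the original series;
# the other n - 1 rows get standard multivariate normal noise
disturbance_matrix <- rbind(rep(0, n), rmvnorm(n - 1, rep(0, n), 1 * diag(n)))

disturbed_dataset <- dataset + disturbance_matrix

# read the model syntax from the attached file
model <- readLines("model.txt")
model <- paste0(model, collapse = "\n")

sem_fitted_model <- sem(model, disturbed_dataset)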

Now I would like to reduce the number of iterations somehow, but I don't know how to do it.
Attachment: model.txt

Mauricio Garnier-Villarreal

May 11, 2019, 3:24:18 PM
to lavaan
 
Kamil

Your model is an AR model. There are several issues here, though. Replicating one time series to "trick" lavaan into running will most likely bias the standard errors, and I am not sure how the added noise affects the AR parameters.

An AR model by itself is not a dynamic model; this would be an AR panel model. Dynamic models are commonly divided into discrete-time and continuous-time models. I believe the only way to fit a proper dynamic model in lavaan would be to manipulate the data using Steve Boker's method: embed the time series and then define the derivatives as latent factors, similar to a growth curve model.

I understand you are comparing with Mplus DSEM. I am not sure exactly which type of model Mplus uses, but I do know it belongs to the discrete-time dynamic family.

The large number of iterations makes sense given the estimation method and equations used in lavaan: inverting large matrices makes it take longer to converge.

In R, the best options for working with dynamic models are ctsem (https://cran.r-project.org/web/packages/ctsem/index.html), gimme (https://cran.r-project.org/web/packages/gimme/index.html), or dynr (https://cran.r-project.org/web/packages/dynr/index.html). These packages are specifically designed for these types of models. lavaan is a great package, but I believe the requirements of dynamic modeling are handled better by these packages.




Kamil Krawczyk

May 14, 2019, 5:14:04 AM
to lavaan
Mauricio

I have checked how it works on some examples, and it does work for me, but the predictions from the estimated model were badly overfitted (especially when I add more replications of the time series) and CFI was equal to 0.

So I added some fixes as a countermeasure:

- The added noise is now multivariate normal with zero mean, and its covariance matrix is the autocovariance matrix of the time series. This preserves the relations between the variables better.

- The coefficients and the intercept are now estimated separately for each equation. This reduces the number of iterations and the overfitting.
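
One way the first fix could look in base R is sketched below. This is a guess at the procedure, not the code actually used: it assumes the sample autocovariances of the first 40 LakeHuron values arranged as a Toeplitz matrix, and it draws the noise through an eigendecomposition instead of mvtnorm::rmvnorm. All object names are made up.

```r
set.seed(12345)
n <- 40
x <- as.numeric(LakeHuron)[1:n]

# Sample autocovariances at lags 0, 1, ..., n - 1.
gamma <- as.numeric(acf(x, lag.max = n - 1, type = "covariance", plot = FALSE)$acf)

# Autocovariance matrix: entry (i, j) is the autocovariance at lag |i - j|.
Sigma <- toeplitz(gamma)

# Matrix square root of Sigma (tiny negative eigenvalues are clipped to zero).
eig <- eigen(Sigma, symmetric = TRUE)
root <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))

# n - 1 noise rows, each drawn from N(0, Sigma); the first replicate stays undisturbed.
z <- matrix(rnorm((n - 1) * n), nrow = n - 1)
noise <- rbind(rep(0, n), z %*% t(root))

disturbed <- t(replicate(n, x)) + noise
dim(disturbed)
```

Because the noise has the same lag structure as the series itself, the replicated rows keep roughly the same pattern of covariances between time points instead of being pushed toward independence.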

With these fixes CFI is now about 0.52, RMSEA is 0.153, and the model needed only 3103 iterations for 98 observations.

The coefficients for the time series are then calculated as the mean of the coefficients over all equations. The MSE for my method is 0.4456673 and the MSE for a simple ARIMA from the forecast package is 0.4788206, so I think this model is a good fit.
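
For reference, an in-sample MSE for a plain AR baseline can be computed in base R as sketched below. The AR order (2) and the use of stats::arima in place of the forecast package are assumptions made here; the post does not state which ARIMA specification or evaluation window was used.

```r
# AR(2) baseline on the full LakeHuron series (order chosen for illustration).
fit_ar <- arima(LakeHuron, order = c(2, 0, 0))

# In-sample mean squared error of the one-step-ahead residuals.
mse_ar <- mean(residuals(fit_ar)^2)
mse_ar
```

An in-sample MSE rewards overfitting, so a comparison on held-out time points would be a fairer test of the two approaches.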

Regarding your second paragraph: I had considered that my model may not be a dynamic model. I only needed a working AR model, and the best information I found was about DSEM, which is why I worded the title this way. I'm sorry if it was misleading.

Best regards,
Kamil