How to get outliers for a selection of variables used in a SEM

Lies Henderickx

Mar 21, 2018, 5:23:21 AM

to lavaan

I need to find out whether outliers in my dataset are affecting my model fit and parameter estimates.

The model uses only a selection of the variables in the dataset.

This is my SEM syntax:

model <- '

extsoc =~ JOBMOTIVATIE_extsoc1 + JOBMOTIVATIE_extsoc2 + JOBMOTIVATIE_extsoc3

extmat =~ JOBMOTIVATIE_extmat1 + JOBMOTIVATIE_extmat2 + JOBMOTIVATIE_extmat3

introj =~ JOBMOTIVATIE_introj1 + JOBMOTIVATIE_introj2 + JOBMOTIVATIE_introj3 + JOBMOTIVATIE_introj4

ident =~ JOBMOTIVATIE_ident1 + JOBMOTIVATIE_ident2 + JOBMOTIVATIE_ident3

intrin =~ JOBMOTIVATIE_intrin1 + JOBMOTIVATIE_intrin2 + JOBMOTIVATIE_intrin3

MEAN_Smartconsumer ~ extsoc

MEAN_Smartconsumer ~ extmat

MEAN_Smartconsumer ~ introj

MEAN_Smartconsumer ~ ident

MEAN_Smartconsumer ~ intrin

MEAN_BEINGABLETO ~ extsoc

MEAN_BEINGABLETO ~ extmat

MEAN_BEINGABLETO ~ introj

MEAN_BEINGABLETO ~ ident

MEAN_BEINGABLETO ~ intrin

MEAN_VALUING ~ extsoc

MEAN_VALUING ~ extmat

MEAN_VALUING ~ introj

MEAN_VALUING ~ ident

MEAN_VALUING ~ intrin

MEAN_DOINGRESEARCH ~ extsoc

MEAN_DOINGRESEARCH ~ extmat

MEAN_DOINGRESEARCH ~ introj

MEAN_DOINGRESEARCH ~ ident

MEAN_DOINGRESEARCH ~ intrin

JOBMOTIVATIE_introj1 ~~ JOBMOTIVATIE_introj4

JOBMOTIVATIE_introj3 ~~ JOBMOTIVATIE_introj4

JOBMOTIVATIE_extsoc2 ~~ JOBMOTIVATIE_extsoc3'

fit <- sem(model, data=dat_SEM, group="fitK3.cluster")

I tried to use the faoutlier package, but I keep getting errors.

Syntax:

(FS <- forward.search(dat_SEM$JOBMOTIVATIE_extsoc2,dat_SEM$JOBMOTIVATIE_extsoc2,dat_SEM$JOBMOTIVATIE_extsoc3,

dat_SEM$JOBMOTIVATIE_extmat1,dat_SEM$JOBMOTIVATIE_extmat2, dat_SEM$JOBMOTIVATIE_extmat3,

dat_SEM$JOBMOTIVATIE_introj1, dat_SEM$JOBMOTIVATIE_introj2, dat_SEM$JOBMOTIVATIE_introj3, dat_SEM$JOBMOTIVATIE_introj4,

dat_SEM$JOBMOTIVATIE_ident1, dat_SEM$JOBMOTIVATIE_ident2,dat_SEM$JOBMOTIVATIE_ident3,

dat_SEM$JOBMOTIVATIE_intrin1, dat_SEM$JOBMOTIVATIE_intrin2, dat_SEM$JOBMOTIVATIE_intrin3,

dat_SEM$MEAN_BEINGABLETO, dat_SEM$MEAN_DOINGRESEARCH, dat_SEM$MEAN_Smartconsumer, dat_SEM$MEAN_VALUING,

model))

Error:

Error in 1:N : argument of length 0

What can I do to prevent this error?

And how can I extract the outliers?
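A likely cause, sketched under assumptions: as far as I know, forward.search() in faoutlier expects a data frame (or matrix) as its first argument and the model as its second. In the call above every column is passed as a separate vector, so the function probably tries to count the rows of a plain vector. Note also that JOBMOTIVATIE_extsoc2 is listed twice and JOBMOTIVATIE_extsoc1 not at all. The base-R snippet below reproduces the message with simulated data; the values and the column `unused_variable` are hypothetical stand-ins for dat_SEM.

```r
# Simulated stand-in for dat_SEM (hypothetical values, two of the real column names).
set.seed(1)
dat_SEM <- data.frame(
  JOBMOTIVATIE_extsoc1 = rnorm(50),
  JOBMOTIVATIE_extsoc2 = rnorm(50),
  unused_variable      = rnorm(50)
)

# A single column extracted with $ is a plain vector, so nrow() returns NULL.
n_vec <- nrow(dat_SEM$JOBMOTIVATIE_extsoc1)

# Building 1:N with N = NULL raises an error like the one in the question.
err_msg <- tryCatch(1:n_vec, error = function(e) conditionMessage(e))

# Selecting columns by name keeps a data frame with a usable row count.
used_vars <- c("JOBMOTIVATIE_extsoc1", "JOBMOTIVATIE_extsoc2")
dat_used  <- dat_SEM[, used_vars]
n_df      <- nrow(dat_used)

# Assumed call shape (not run here; needs faoutlier and the full variable list):
# FS <- forward.search(dat_used, model)
```

If this is indeed the cause, listing all 20 observed variables in `used_vars` and passing the resulting data frame once should get past the error. Whether the group argument from the sem() call can be passed along is something to check in the faoutlier documentation.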

kma...@aol.com

Mar 25, 2018, 10:01:08 AM

to lavaan

Lies,
Again, this is not a lavaan question so much as a general SEMNET question. That may be why it has gone unanswered. Here is the code that I use in my data preparation lecture. Substitute your own data frame for NSKK.df. If I were writing the code today, I would probably use the head() function rather than indexing to print out just the first 20 cases. However, that is more for illustration and to check for errors than it is a functional part of the analysis.
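For the model in the question, the substitution for NSKK.df could be prepared along these lines. This is only a sketch on simulated data: the shortened column names, `respondent_id`, and the object name `my.df` are hypothetical, and the real list would contain all 20 observed variables from the model syntax.

```r
# Simulated stand-in for dat_SEM (hypothetical values; column names shortened).
set.seed(7)
dat_SEM <- data.frame(
  extsoc1 = rnorm(30), extsoc2 = rnorm(30), MEAN_VALUING = rnorm(30),
  respondent_id = 1:30, fitK3.cluster = rep(1:3, 10)
)
dat_SEM$extsoc2[c(4, 11)] <- NA   # two incomplete cases

# Names of the observed variables that appear in the model syntax.
# (With a fitted lavaan object, lavNames(fit, "ov") is one way to obtain them.)
model_vars <- c("extsoc1", "extsoc2", "MEAN_VALUING")

# Keep only those columns, then drop incomplete rows; with default settings
# cov() returns NA entries when any value is missing.
my.df <- dat_SEM[, model_vars]
my.df <- my.df[complete.cases(my.df), ]

# All remaining columns must be numeric for mahalanobis() and cov().
stopifnot(all(vapply(my.df, is.numeric, logical(1))))

# Row names still carry the original row numbers, so flagged cases
# can be traced back to dat_SEM.
dim(my.df)
```

Dropping the ID and grouping columns matters because they would otherwise enter the distance calculation as if they were substantive variables.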

# These packages must be installed before loading.

require(psych)

require(moments)

require(robustbase)

require(MVN)

require(mnormt)

require(mi)

require(rela)

(snip)


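# Multivariate Normality tests from package MVN

# (mardiaTest(), hzTest() and roystonTest() come from older MVN releases;

#  newer releases fold these tests into mvn(), so check your installed version.)

#  Useful to run all three because they are not equally sensitive.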

Mardia.NSKK <- mardiaTest(NSKK.df)

Mardia.NSKK

 
Henze.Zirkler.NSKK <- hzTest(NSKK.df)

Henze.Zirkler.NSKK

 
Royston.NSKK <- roystonTest(NSKK.df)

Royston.NSKK

 
 
# MVN provides univariate tests and plots to follow up

uniPlot(NSKK.df, type='histogram')

uniNorm(NSKK.df, type='SW', desc=TRUE)

 
 
 
# Mahalanobis test (for multivariate outliers)

NSKK.chisq <- mahalanobis(

    x = NSKK.df, 

    center = colMeans(NSKK.df), 

    cov = cov(NSKK.df)

    )

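# df is the number of variables, so take it from the data rather than hard-coding it.

NSKK.pvalue <- pchisq(q=NSKK.chisq, df=ncol(NSKK.df), lower.tail=FALSE)

# Case numbers likewise follow the number of rows.

MahalanobisOut <- cbind(seq_len(nrow(NSKK.df)), NSKK.chisq, NSKK.pvalue)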

 
# Only print first 20 cases.

MahalanobisOut[1:20,]

 
# View outliers only.

MahalanobisOut[MahalanobisOut[,3] < .05,]

 
#plot

par(ask=FALSE, mfrow=c(1,1))

hist(NSKK.chisq)

Critical.chisq <- qchisq(p=.05, df=ncol(NSKK.df), lower.tail=FALSE)

abline(v=Critical.chisq,col='red')

text(x=Critical.chisq, y=400, labels='p < .05', col='red', pos=4)

 
# Robust Mahalanobis Distance

# Compute robust Mahalanobis distances

robMD <- covMcd(NSKK.df)

plot(NSKK.chisq, robMD$mah, pch=16, 

  xlab='Standard Mahalanobis Distance',

  ylab='Robust Mahalanobis Distance')

abline(a=0, b=30, col=gray(.60))

 
# Compare robust and sample means

colMeans(NSKK.df)

robMD$center

 
# Compare SDs

sqrt(diag(cov(NSKK.df)))

sqrt(diag(robMD$cov))

 
# Compare correlations

round(cov2cor(cov(NSKK.df)),2)

round(cov2cor(robMD$cov),2)

 
# The centroid and covariance matrix look very similar

# Nonetheless they make a big difference to the range of

#  Mahalanobis distances.

 
 
 
# Convenient plot & list

quan.NSKK <- mvOutlier(NSKK.df, method='quan')

head(quan.NSKK$outlier, 10)

tail(quan.NSKK$outlier, 10)
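Because the model in the question is a multi-group model (group="fitK3.cluster"), it may make sense to compute the standard Mahalanobis distances within each group rather than across the pooled sample. The sketch below does that in base R on simulated data. The helper name `flag_outliers`, the variable names, and the planted outlier are all hypothetical, and the .05 cutoff simply mirrors the code above.

```r
# Simulated data (hypothetical): 200 cases, 3 variables, 2 groups, one planted outlier.
set.seed(42)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
dat$grp <- rep(c("a", "b"), each = 100)
dat[5, c("x1", "x2", "x3")] <- c(8, -8, 8)

# Squared Mahalanobis distance and chi-square p-value for each case in d.
# df is taken from the number of columns instead of being hard-coded.
flag_outliers <- function(d, alpha = .05) {
  d2 <- mahalanobis(d, center = colMeans(d), cov = cov(d))
  p  <- pchisq(d2, df = ncol(d), lower.tail = FALSE)
  data.frame(case = rownames(d), d2 = d2, p = p, outlier = p < alpha)
}

# Apply the helper separately within each group, then stack the results.
vars     <- c("x1", "x2", "x3")
by_group <- lapply(split(dat[, vars], dat$grp), flag_outliers)
res      <- do.call(rbind, by_group)

# Flagged cases only; the case column holds the original row names.
res[res$outlier, ]
```

With a .05 cutoff, roughly 5% of well-behaved cases are flagged by chance, so the flagged list is a starting point for inspection rather than a list of cases to delete.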

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Terrence Jorgensen

Mar 27, 2018, 7:22:10 AM

to lavaan

> this is not a lavaan question so much as a general SEMNET question

Agreed, but here is an R package that provides estimation that is robust to outliers:

https://cran.r-project.org/web/packages/rsem/index.html

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen
