Outliers


Peter Taylor

May 12, 2014, 9:52:28 AM
to lav...@googlegroups.com
Hello,
 
Is there any way of listing potentially outlying or influential cases, for example via Mahalanobis distances or similar, in lavaan? I'm interested in estimating a CFA model with ordinal indicators, so it would need to be available with WLSMV estimation. Any suggestions are much appreciated. Thanks.
 
Peter

Phil Chalmers

May 12, 2014, 11:56:57 AM
to lav...@googlegroups.com
Hi Peter,

I wrote an R package a while back to detect outliers in factor analysis and SEMs, and it supports lavaan syntax. It's called faoutlier, and it computes statistics such as generalized Cook's distances, likelihood drop, robust Mahalanobis distances, etc. That might be what you are looking for, and it should support the WLSMV estimator without any issues. Cheers.

Phil

Peter Taylor

May 13, 2014, 4:40:26 AM
to lav...@googlegroups.com
Great, will look into this, thanks for your help
 
Peter

Peter Taylor

May 13, 2014, 6:52:08 AM
to lav...@googlegroups.com
Hi,
 
I'm having some problems running faoutlier. Specifically using the following syntax:
 

model1 <- '
  # measurement model
    DS =~ PBEQ4 + PBEQ6R + PBEQ10 + PBEQ13 + PBEQ2 + PBEQ8
    NA =~ PBEQ5 + PBEQ7 + PBEQ1 + PBEQ9
    ES =~ PBEQ12R + PBEQ3'
(FS <- forward.search(edie, model1))
(FS.outlier <- forward.search(edie.outlier, model1))
plot(FS)
plot(FS.outlier)
 
I get the error message "Error in na.omit(data) : object 'edie.outlier' not found"
 
I also had two other queries: 1) Do I need to inform faoutlier that the data are ordinal? (Normally I do this in lavaan in the fitting function, e.g., fit1 <- sem(model1, data = edie, ordered=c ...).) 2) Could you recommend where I could read an overview of the forward search approach to outlier detection, as it is new to me?
 
Help with these points is very much appreciated. Thanks.
 
Peter

Phil Chalmers

May 13, 2014, 10:00:53 AM
to lav...@googlegroups.com
On Tue, May 13, 2014 at 6:52 AM, Peter Taylor <f1r3t...@gmail.com> wrote:
I get the error message "Error in na.omit(data) : object 'edie.outlier' not found"

Is there a data set object in your workspace called 'edie.outlier'? 
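
One quick way to check this from the R console is sketched below. The toy `edie` values are made up purely so the snippet runs on its own; they are not the real data set.

```r
# Toy stand-in for the real data; the values are made up for illustration.
edie <- data.frame(PBEQ1 = c(1, 2, 4), PBEQ2 = c(3, 3, 1))

# exists() reports whether an object with that name can be found.
has_edie    <- exists("edie")
has_outlier <- exists("edie.outlier")

print(c(edie = has_edie, edie.outlier = has_outlier))
```

If the second value is FALSE, forward.search(edie.outlier, model1) has nothing to work on. As far as I recall, the second data set in the package's help-page example (holzinger.outlier) is a separate object shipped with faoutlier, so there is no automatic ".outlier" counterpart for your own data.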
 
 
I also had two other queries: 1) Do I need to inform faoutlier that the data are ordinal (normally I do this in lavaan in the fitting function, e.g., fit1 <- sem(model1, data = edie, ordered=c ...)?

Yes. Any options you pass to the functions will also be passed to lavaan.
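
As a rough sketch of what that could look like, assuming `edie` is a data.frame holding the items: the `items` vector is just a helper name introduced here, and the call only runs when both the data and the package are available.

```r
model1 <- '
  # measurement model
    DS =~ PBEQ4 + PBEQ6R + PBEQ10 + PBEQ13 + PBEQ2 + PBEQ8
    NA =~ PBEQ5 + PBEQ7 + PBEQ1 + PBEQ9
    ES =~ PBEQ12R + PBEQ3'

# Helper (hypothetical name): the 12 items that appear in model1.
items <- c("PBEQ4", "PBEQ6R", "PBEQ10", "PBEQ13", "PBEQ2", "PBEQ8",
           "PBEQ5", "PBEQ7", "PBEQ1", "PBEQ9", "PBEQ12R", "PBEQ3")

# Only runs when the data and the package are actually available.
if (exists("edie") && all(items %in% names(edie)) &&
    requireNamespace("faoutlier", quietly = TRUE)) {
  library(faoutlier)
  # 'ordered' is not a faoutlier argument; it is handed on to lavaan.
  gCDresult <- gCD(edie, model1, ordered = items)
  plot(gCDresult)
}
```

The same pattern should apply to the other model-based functions, since the extra arguments are simply forwarded to the lavaan fit.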
 
2) Could you recommend where I could read an overview of the forward search approach to outlier detection, as it is new to me?

Mavridis, D. & Moustaki, I. (2008). Detecting Outliers in Factor Analysis Using the Forward Search Algorithm. Multivariate Behavioral Research, 43, 453-475.
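
For a feel of the general idea before reading the paper, here is a toy base-R sketch. It is a deliberate simplification resting on my own assumptions (Mahalanobis distances on simulated continuous data, one case added per step). It is not the factor-analytic procedure from the paper and not what forward.search() does internally.

```r
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)   # 30 simulated cases, 2 variables
x[30, ] <- c(10, -10)              # one planted outlier (made up)
n <- nrow(x)

# Start from a small "clean" subset: the 10 cases nearest the medians.
centre <- apply(x, 2, median)
d0 <- rowSums(sweep(x, 2, centre)^2)
inside <- order(d0)[1:10]
entry_order <- inside

# Grow the subset one case at a time, always adding the closest outsider.
while (length(inside) < n) {
  m <- colMeans(x[inside, , drop = FALSE])
  S <- cov(x[inside, , drop = FALSE])
  d <- mahalanobis(x, m, S)                 # squared distances to current fit
  outside <- setdiff(seq_len(n), inside)
  nxt <- outside[which.min(d[outside])]
  inside <- c(inside, nxt)
  entry_order <- c(entry_order, nxt)
}

print(entry_order)  # cases that enter late are candidates for a closer look
```

The point of the approach is the ordering: cases that fit the bulk of the data join early, while unusual cases tend to join near the end, where a monitored statistic would typically jump.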
     
 
 

Peter Taylor

May 13, 2014, 10:34:38 AM
to lav...@googlegroups.com
Ah, I've just been silly with the syntax; the first issue makes sense now. Thanks again.

Peter Taylor

May 14, 2014, 5:50:16 AM
to lav...@googlegroups.com

As I'm new to R I just wanted to check I had the syntax down correctly. Currently I'm using the following:


edie[,c("PBEQ1", "PBEQ2", "PBEQ3", "PBEQ4", "PBEQ5", "PBEQ6R", "PBEQ7", "PBEQ8", "PBEQ9", "PBEQ10", "PBEQ11", "PBEQ12R", "PBEQ13")] <-
lapply(edie[,c("PBEQ1", "PBEQ2", "PBEQ3", "PBEQ4", "PBEQ5", "PBEQ6R", "PBEQ7", "PBEQ8", "PBEQ9", "PBEQ10", "PBEQ11", "PBEQ12R", "PBEQ13")], ordered)

library(lavaan)
model5 <- '


  # measurement model
    DS =~ PBEQ4 + PBEQ6R + PBEQ10 + PBEQ13 + PBEQ2 + PBEQ8
    NA =~ PBEQ5 + PBEQ7 + PBEQ1 + PBEQ9
    ES =~ PBEQ12R + PBEQ3'

(gCDresult <- gCD(edie, model5))
plot(gCDresult)

 

This seems to provide different results depending on whether I set the items to be categorical, so I assume faoutlier is taking this into account.

 

Incidentally, robustMD does not work, as it states that some IQR = 0. I assume this relates to the ordinal nature of the data (4 ordered categories), so that an IQR of 0 might be expected. For obs.resid, I get the error message “Error: is.numeric(x) || is.logical(x) is not TRUE”. Any thoughts? Everything else seems to work fine.

 

Peter

Phil Chalmers

May 14, 2014, 10:58:17 AM
to lav...@googlegroups.com
On Wed, May 14, 2014 at 5:50 AM, Peter Taylor <f1r3t...@gmail.com> wrote:


 

Incidentally, robustMD does not work, as it states that some IQR = 0. I assume this relates to the ordinal nature of the data (4 ordered categories), so that an IQR of 0 might be expected.


That's correct, and in any event Mahalanobis distances are not defined for ordinal data with three or more categories. They apply to dichotomous or continuous variables only.
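
To make both points concrete, here is a small base-R illustration with made-up values: a skewed 4-category item whose IQR is 0, followed by ordinary (non-robust) Mahalanobis distances on simulated continuous data. Neither part uses the real data or reproduces robustMD itself.

```r
# A made-up 4-category item where most people pick the lowest category.
item <- c(rep(1, 8), 2, 4)
iqr_item <- IQR(item)   # both quartiles equal 1 here, so the IQR is 0
print(iqr_item)

# Ordinary (non-robust) Mahalanobis distances on simulated continuous data.
set.seed(123)
x <- matrix(rnorm(200), ncol = 4)          # 50 cases, 4 variables
d2 <- mahalanobis(x, colMeans(x), cov(x))  # squared distances, one per case
print(head(sort(d2, decreasing = TRUE)))   # largest distances first
```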
 

For obs.resid, I get the error message “Error: is.numeric(x) || is.logical(x) is not TRUE”. Any thoughts? Everything else seems to work fine.


That is somewhat peculiar and not something I've run into before. It may have something to do with specifying variables as ordinal. I'll look into it to see if I can replicate it. Cheers.

Phil 

Phil Chalmers

May 15, 2014, 4:21:45 PM
to lav...@googlegroups.com
Hi Peter,

I was not able to reproduce this issue on the dev version of the package on Github. Could you provide more information to help me replicate the problem? Thanks.

Phil

Peter Taylor

May 19, 2014, 5:37:51 AM
to lav...@googlegroups.com
Sure, the data are 13 ordinal items scored 1 to 4. I used both R 3.0.2 and 3.1.0, with the same result. The syntax was:
 

edie[,c("PBEQ1", "PBEQ2", "PBEQ3", "PBEQ4", "PBEQ5", "PBEQ6R", "PBEQ7", "PBEQ8", "PBEQ9", "PBEQ10", "PBEQ11", "PBEQ12R", "PBEQ13")] <-
lapply(edie[,c("PBEQ1", "PBEQ2", "PBEQ3", "PBEQ4", "PBEQ5", "PBEQ6R", "PBEQ7", "PBEQ8", "PBEQ9", "PBEQ10", "PBEQ11", "PBEQ12R", "PBEQ13")], ordered)
 
library(lavaan)
model1 <- '

  # measurement model
    DS =~ PBEQ4 + PBEQ6R + PBEQ10 + PBEQ13 + PBEQ2 + PBEQ8
    NA =~ PBEQ5 + PBEQ7 + PBEQ1 + PBEQ9
    ES =~ PBEQ12R + PBEQ3'

library(faoutlier)

(resid <- obs.resid(edie, model1))
 

Peter Taylor

May 19, 2014, 5:38:12 AM
to lav...@googlegroups.com
Let me know if you need any more info. Thanks.
 
Peter 

Phil Chalmers

May 19, 2014, 11:57:09 AM
to lav...@googlegroups.com
It appears you are passing a list to the function rather than a data.frame (which lavaan requires). Use something like as.data.frame() on your data object to change the object class. It's also possible that the model is not uniquely identified, since the third latent variable has only two indicators (three are generally required, but two are possible with additional constraints).

It could be something else entirely, though; if further issues arise, I don't think I'll be able to diagnose them from the syntax alone (you're welcome to send your dataset to my email if you'd prefer not to post it publicly). Cheers.
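
For the class check, a minimal base-R sketch follows. The two-item `edie` below is a made-up stand-in for the real data, and `items` is a helper name introduced only for this example.

```r
# Made-up stand-in for the real data (two items, four cases).
edie <- data.frame(PBEQ1 = c(1, 2, 4, 3), PBEQ2 = c(2, 2, 1, 4))
items <- c("PBEQ1", "PBEQ2")

# lapply() on its own returns a plain list, not a data.frame.
as_list <- lapply(edie[, items], ordered)
print(class(as_list))

# as.data.frame() turns it back into a data.frame of ordered factors.
edie_df <- as.data.frame(as_list)
print(class(edie_df))
print(is.ordered(edie_df$PBEQ1))

# Assigning the result back into the columns keeps 'edie' a data.frame.
edie[, items] <- lapply(edie[, items], ordered)
print(class(edie))
```

Running class() on the object right before the faoutlier call is a cheap way to confirm which of the two cases applies.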

Phil 



peter....@gmail.com

May 23, 2014, 4:55:41 PM
to lav...@googlegroups.com
Hi Phil,

I am actually trying to fit a model with a latent variable that has only two indicators. What additional constraints do I need to implement to get my model to fit in lavaan? My latent variable with two indicators is part of a much larger model. Thanks in advance for any help!

Best,
Peter

Phil Chalmers

May 23, 2014, 4:58:26 PM
to lav...@googlegroups.com
Constraining the two slopes/loadings to be equal is the common approach to identifying these kinds of models. Of course, that means you assume the variables load equally well on the latent variable, which might be harder to justify theoretically.
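
In lavaan syntax, one way to express this is to give both loadings the same label. The sketch below uses lavaan's built-in HolzingerSwineford1939 data purely as a stand-in, with x3 dropped from the first factor only to create a two-indicator case. The label "a" and the choice of std.lv = TRUE are illustrative assumptions, not the only way to set this up.

```r
model <- '
  visual  =~ a*x1 + a*x2      # two indicators sharing the label "a"
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

if (requireNamespace("lavaan", quietly = TRUE)) {
  library(lavaan)
  # std.lv = TRUE fixes the factor variances to 1, so both loadings are
  # estimated but forced to be equal through the shared label.
  fit <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)
  est <- parameterEstimates(fit)
  print(est[est$lhs == "visual" & est$op == "=~", c("rhs", "label", "est")])
}
```

With ordinal indicators, the `ordered` argument would be added to the cfa() call in the usual way.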

Phil