Recognition of missing data when asking R to read an SPSS file

p007...@brookes.ac.uk

Feb 15, 2017, 3:10:16 AM
to lavaan
I am asking R to read an SPSS file in which discrete missing values are coded as '99' for some scores (student achievement grades). Can R automatically detect that these are missing values, or will I need to recode the affected cells in SPSS so that they are blank? I currently have invariance when running a model and wonder whether this may be the cause. I am using FIML.
Many thanks
Carol

Mikko Rönkkö

Feb 15, 2017, 3:16:39 AM
to lav...@googlegroups.com
Hi,

On 15 Feb 2017, at 10.10, p007...@brookes.ac.uk wrote:

I am asking R to read an SPSS file in which discrete missing values are coded as '99' for some scores (student achievement grades). Can R automatically detect that these are missing values, or will I need to recode the affected cells in SPSS so that they are blank?

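The answer depends on which package you use to load the dataset and on whether 99 is declared as a user-missing value in the SPSS file. For example, the haven package, which RStudio uses when you import a dataset through the menus, will not treat 99 as missing unless it has been declared as missing in SPSS.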
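As a hedged sketch of what the import step might look like with haven: `read_sav()` has a `user_na` argument, and with its default (`FALSE`) values declared as user-missing in the SPSS file are converted to NA, while a 99 that was never declared as missing in SPSS stays 99. The file name below is hypothetical, and the block does nothing if haven or the file is not available.

```r
# Hypothetical path to the SPSS file; replace with your own
sav_path <- "achievement.sav"

if (requireNamespace("haven", quietly = TRUE) && file.exists(sav_path)) {
  # Default (user_na = FALSE): declared user-missing values become NA
  mydata <- haven::read_sav(sav_path)

  # Alternative: keep declared user-missing codes (e.g. 99) visible
  mydata_raw <- haven::read_sav(sav_path, user_na = TRUE)
}
```

Comparing the two imports, if both are made, shows which cells SPSS had flagged as missing.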

The easiest way to find out is to print the dataset to the console. If you see 99s, they have not been recognised as missing data. If you see NAs, the missing data have been coded as missing.

If you want to code missing data, it is simple to do. See the example below:

# Generate an artificial dataset with 99 marking missing data
data <- as.data.frame(matrix(sample(c(1:5,99),100, replace = TRUE),20,5))
print(data)

# Code 99 as missing
data[data == 99] <- NA
print(data)
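A variation on the example above, as a minimal sketch: `data[data == 99] <- NA` recodes every 99 in the data frame, which may be too broad if 99 is a legitimate value in some other variable. The column names below are hypothetical and only illustrate restricting the recode to the grade variables.

```r
# Hypothetical dataset: 99 marks missing grades, but is a real count in n_books
scores <- data.frame(
  grade_maths   = c(3, 99, 5, 2),
  grade_english = c(99, 4, 4, 1),
  n_books       = c(99, 20, 150, 12)
)

# Only these columns use 99 as a missing-data code (an assumption for this sketch)
grade_cols <- c("grade_maths", "grade_english")

# Replace 99 with NA in the selected columns only; %in% is safe if NAs already exist
scores[grade_cols] <- lapply(
  scores[grade_cols],
  function(x) replace(x, x %in% 99, NA)
)

# Count the missing values per column to confirm the recode
print(colSums(is.na(scores)))
```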

Mikko


--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at https://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

p007...@brookes.ac.uk

Feb 15, 2017, 3:30:13 AM
to lavaan
Dear Mikko

Thanks for such a quick response to this. How do I print the dataset to console? I tried view -> show file and selected the relevant file, but it opened the original SPSS file. Many thanks
Carol

Mikko Rönkkö

Feb 15, 2017, 4:12:40 AM
to lav...@googlegroups.com
Hi,

On 15 Feb 2017, at 10.30, p007...@brookes.ac.uk wrote:

Dear Mikko

Thanks for such a quick response to this. How do I print the dataset to console?

print(data)

where "data" is the name of your data object.

I tried view -> show file and selected the relevant file, but it opened the original SPSS file.

I assume that that means that you are using some kind of graphical user interface (e.g. RStudio) and refer to the view menu. Which user interface are you using? RStudio does not have a “show file” option under the “view” menu.

Mikko

Carol Brown

Feb 15, 2017, 4:23:20 AM
to lav...@googlegroups.com
Morning

Yes I am using RStudio. So do I just add print(data) at the bottom?
 
Sorry for the questions! I am just learning R for the first time for my thesis.

Thanks

Carol

Syntax as follows:

library(lavaan)   # sem(), summary(), inspect()
library(semPlot)  # semPaths()

# (b) How do socio-economic status, gender and school sector affect the STV attached to A-level achievement?

rq.4 <- '
# Measurement part
# STV (excluding utility)
intrinsic =~ Q21 + Q30
attainment =~ Q25 + Q31 + Q37

STV =~ intrinsic + attainment

SES =~ momed + daded + ISEImo2 + ISEIfa2 + books + homepossessions

# Structural part
STV ~ SES + girl + school'

# Fit with FIML for missing data and robust (MLR) standard errors
fit <- sem(rq.4, data = mydata, missing = 'fiml', estimator = 'MLR')
summary(fit, fit.measures = TRUE, standardized = TRUE)
inspect(fit, 'r2')
inspect(fit, 'cor.lv')

# Path diagram with standardized estimates
semPaths(fit, what = "std",
         style = "lisrel", edge.color = "black", fade = FALSE,
         edge.width = 0.5, nCharNodes = 7, intercepts = FALSE)



--
Carol Brown

Office hours by appointment - please email

MSc (Oxon), BSc (Lond), DipSw, QTS
Senior Lecturer in Child Development and Education
Oxford Brookes University 
Harcourt Hill Campus
Oxford OX2 9AT 

Mikko Rönkkö

Feb 15, 2017, 4:27:23 AM
to lav...@googlegroups.com
Hi,

On 15 Feb 2017, at 11.23, Carol Brown <carol...@brookes.ac.uk> wrote:

Morning

Yes I am using RStudio. So do I just add print(data) at the bottom?
 
Sorry for the questions! I am just learning R for the first time for my thesis.

No need to be sorry for asking questions. However, try to provide sufficient details so that they can be answered. 

Your data seems to be named “mydata”. Just type

print(mydata) 

into the console to view the dataset. Or double-click on mydata in the Environment pane (top right of the RStudio window by default). That will open the data in the data viewer, where you can check whether you have 99s or NAs.
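For a larger dataset, scanning the printout can be tedious, so a count per column may help. This is a minimal sketch; the small data frame below is a hypothetical stand-in for the imported `mydata`, and the column names are only illustrative.

```r
# Hypothetical stand-in for the imported data; replace with your own object
mydata <- data.frame(
  Q21  = c(4, 99, 2, 5),
  Q30  = c(NA, 3, 3, 1),
  girl = c(1, 0, 1, 1)
)

# Number of values per column that still carry the 99 code
n_coded_99 <- colSums(mydata == 99, na.rm = TRUE)

# Number of values per column that R already treats as missing
n_na <- colSums(is.na(mydata))

print(rbind(n_coded_99, n_na))
```

Any non-zero entry in the `n_coded_99` row points to a column where 99 has not been recognised as missing.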

Mikko

Carol Brown

Feb 15, 2017, 4:57:15 AM
to lav...@googlegroups.com
Dear Mikko

Thank you so much for your help - I very much appreciate it. It has printed and the missing data have been coded as NA, so I know they are accounted for. And I have learnt lots of new things in the process. It has helped the start of my day enormously :) Many many thanks for guiding me through
Carol


kma...@aol.com

Feb 15, 2017, 10:22:18 PM
to lavaan
Carol & Mikko,

Instead of using print(), you can use edit() to view a data set in a spreadsheet type display.  However, if you want to save any changes, then you need to assign the result of edit() to an R object:  newData <- edit(oldData).

The following summary can be helpful regarding working with missing data in R.

http://www.statmethods.net/input/missingdata.html

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/


p007...@brookes.ac.uk

Feb 16, 2017, 4:05:44 AM
to lavaan
This is also very useful and thanks for the link

Carol

Meital Mashash

Jun 25, 2020, 5:28:51 AM
to lavaan
Dear Mikko, 

I came across your response while dealing with the same question as Carol: how to recode missing data when transferring a file from SPSS to R.
I am also using RStudio, so I think I will need to recode them and would like to use the syntax you suggested:

# Generate an artificial dataset with 99 marking missing data
data <- as.data.frame(matrix(sample(c(1:5,99),100, replace = TRUE),20,5))
print(data)

# Code 99 as missing
data[data == 99] <- NA
print(data)

Can you please explain what the numbers in this line mean?
(c(1:5,99),100, replace = TRUE),20,5)

I apologize if this is basic knowledge as I am kind of new to R. 
Thank you so much!

May 

Nickname

Jun 26, 2020, 10:18:40 AM
to lavaan
May,
Better still, let me explain how you can answer such questions in the future.  Here is the line of code in question.


data <- as.data.frame(matrix(sample(c(1:5,99),100, replace = TRUE),20,5))

The first thing to notice is that this is an assignment, so we can ignore the "data <-" and focus on parsing the right hand side which would be a valid line of code evaluating an expression all by itself.

Next, notice that the expression is formed by nesting the following functions:
as.data.frame()
matrix()
sample()
c()

So, the next thing you want to do is to look at the help files for each of these functions to learn their parameters.
?as.data.frame
?matrix
?sample
?c

Now we can work our way through step by step in order.  I will work from the inside out.

c(1:5, 99)

The colon is shorthand for seq(from=1, to=5, by=1).  So, the result is the vector 1,2,3,4,5,99.  To simplify notation and break down the steps, let's call that Ralph.

Ralph <- c(1:5, 99)
sample(Ralph,100, replace = TRUE)

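According to the help file, the first two parameters to sample() are x and size.  So, a sample of size 100 is being drawn from Ralph.  Because Ralph has only 6 elements, the draw has to be with replacement, which is what replace = TRUE requests.  Let's name the parameters and call this Alice.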

Alice <- sample(x = Ralph, size = 100, replace = TRUE)
matrix(Alice,20,5)

The first three parameters to the matrix() function are data, nrow, and ncol.  So, this is taking the data from Alice and arranging it in a matrix with 20 rows and 5 columns, filled column by column by default.  Let's call that Trixie.

Trixie <- matrix(data=Alice, nrow=20, ncol=5)
as.data.frame(Trixie)

The first parameter to as.data.frame is x, the object to be converted to a data frame.

as.data.frame(x = Trixie)
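To check that the step-by-step version really does the same thing as the original one-liner, the two can be run from the same random seed and compared. This is a small sketch; the seed value and the object names `nested` and `stepwise` are arbitrary choices made for illustration.

```r
# Original nested one-liner, run from a fixed seed
set.seed(1)
nested <- as.data.frame(matrix(sample(c(1:5, 99), 100, replace = TRUE), 20, 5))

# The same steps with named intermediate objects, run from the same seed
set.seed(1)
Ralph    <- c(1:5, 99)                                   # values to draw from
Alice    <- sample(x = Ralph, size = 100, replace = TRUE) # 100 draws with replacement
Trixie   <- matrix(data = Alice, nrow = 20, ncol = 5)     # 20 x 5 matrix
stepwise <- as.data.frame(x = Trixie)                     # convert to a data frame

# TRUE if both approaches produce exactly the same data frame
print(identical(nested, stepwise))
```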

HTH,

Stas Kolenikov

Jun 26, 2020, 10:31:08 AM
to lav...@googlegroups.com
See also this -- this is what I would have used:  https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html  
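A minimal sketch of the approach from that vignette, assuming the naniar package is installed: `replace_with_na_all()` takes a condition written as a formula in `.x`. The data frame and its column names are hypothetical, and the block falls back to the base R recode from earlier in the thread if naniar is not available.

```r
# Hypothetical data with 99 marking missing grades
df <- data.frame(grade1 = c(3, 99, 5), grade2 = c(99, 2, 4))

if (requireNamespace("naniar", quietly = TRUE)) {
  # Replace every value equal to 99, in all columns, with NA
  cleaned <- naniar::replace_with_na_all(df, condition = ~ .x == 99)
} else {
  # Base R fallback with the same effect
  cleaned <- df
  cleaned[cleaned == 99] <- NA
}

print(cleaned)
```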

To view your data in RStudio, you can use

# note that it is capitalized
View(mydata)

# This is also very helpful:

library(vtable)
vtable(mydata)

-- Stas Kolenikov, PhD, PStat (ASA, SSC)  @StatStas
-- Principal Scientist, Abt Associates @AbtDataScience
-- Opinions stated in this email are mine only, and do not reflect the position of my employer
-- http://stas.kolenikov.name
 


May

Jun 27, 2020, 5:31:04 AM
to lavaan
Thank you very much!

May

Jun 27, 2020, 5:31:50 AM
to lavaan
Thank you very much!
