Beginner's Problems with missMDA and MCA

Holly Beavon

May 1, 2025, 8:42:14 PM
to FactoMineR users
Hello,

This is my first time working with FactoMineR and missMDA.
I've had success with FactoMineR with and without the FactoMineR Plugin (incredibly helpful!), and now I am trying to get missMDA to work so the NA values are not plotted and analyzed. The tutorial and video worked great!

My data are 100% categorical (except for the case ID and Date columns) and also 100% yes/no binary. They were originally coded as 0s and 1s, but numerous errors led me to recode them. FactoMineR worked, but missMDA is now generating different errors and I'm going in circles.

Here's how my data is currently formatted:
The first categorical variable is the first column, with case ID and Date moved to the last 2 columns.
I have 31 pairs of qualitative variables, with the headers as follows: X1, X1.1, X2, X2.1, etc. The values in the rows are con1 or NA in X1, and in its paired second column labeled X1.1, the values are op1 or NA. Some rows have con1 and op1, so although they are both for code 1, they are applied differently and the values cannot be in the same cell. 8 columns have no values in any of the rows.

I also have 18 qualitative variable columns with labels such as "Children" and their values are Yes and NA, and all columns have at least one row with data in a cell.

I read in the CSV data as follows:
CIData <- read.table("C:/Users/holly/Documents/R Data/CID/ID May1 for MCA.csv", header=TRUE, stringsAsFactors=TRUE, sep=",", na.strings="NA", dec=",", strip.white=TRUE)

Then, I REMOVED THE EMPTY columns from the active dataframe, which I filled with "FALSE" values after getting this error in missMDA:
complete <- imputeMCA(CIData.MCA, ncp=5)
Error in while (continue) { : missing value where TRUE/FALSE needed

The dataframe was altered to remove the empty columns and name the active qualitative variables as follows:
CIData.MCA<-CIData[, c("X1", "X1.1", ...., "Date")]
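Dropping the empty columns can also be done programmatically rather than by listing names. The following is a minimal base R sketch on a toy data frame; the object names `toy`, `all_na`, and `toy_active` and the values are hypothetical stand-ins, not the actual data.

```r
# Toy stand-in for CIData (hypothetical values, not the poster's data)
toy <- data.frame(
  X1       = factor(c("con1", NA, "con1", NA)),
  X1.1     = factor(c("op1", "op1", NA, NA)),
  X2       = factor(c(NA, NA, NA, NA)),   # a column with no values at all
  Children = factor(c("Yes", NA, "Yes", NA))
)

# TRUE for every column in which all values are NA
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one observed value
toy_active <- toy[, !all_na, drop = FALSE]

print(names(toy)[all_na])   # columns that were dropped
print(dim(toy_active))
```

Using `drop = FALSE` keeps the result a data frame even if only one column survives.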

Here's a current error that doesn't make sense to me. It sees the data, as indicated by the dimensions and head(CIData.MCA) working, but does not see it when calling the data() command in missMDA:

dim(CIData.MCA) - seems good
[1] 162  73
> data(CIData.MCA) - mysterious failure
Warning message:
In data(CIData.MCA) : data set ‘CIData.MCA’ not found
??: Is data() a vital part of the missMDA process?
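For context, `data()` loads data sets that ship with installed packages, which is presumably why the tutorial calls it for its example data. An object that already exists in the workspace does not need it. A small base R sketch with a hypothetical object `toy_df`:

```r
# Hypothetical toy object standing in for CIData.MCA
toy_df <- data.frame(X1 = factor(c("con1", NA, "con1")))

# The object is already in the workspace, so it can be used directly
print(exists("toy_df"))
print(dim(toy_df))

# data() looks for a packaged data set; a workspace object with this name
# is not one, so the call only produces a "not found" warning
res <- tryCatch(data(list = "toy_df"),
                warning = function(w) conditionMessage(w))
print(res)
```

The warning is harmless here: the data frame can be passed to the analysis functions as it is.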

Moving on, this has worked a few times and I have 3 attractive graphs, but now it throws this error:

nb <- estim_ncpMCA(CIData.MCA)
Error in apply(tabdisj[, (vec[i] + 1):vec[i + 1]], 1, which.max) :
  dim(X) must have a positive length
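This message comes from base R's `apply()`, which needs a matrix or array. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is a variable with a single category: selecting a one-column range such as `k:k` from a matrix drops the dimensions, and `apply()` then fails. A base R sketch of that mechanism, using a hypothetical matrix `m`:

```r
# Hypothetical 3 x 2 matrix standing in for a block of an indicator table
m <- matrix(1:6, nrow = 3)

# Selecting a single column with a range drops the dimensions
one_col <- m[, 2:2]
print(is.null(dim(one_col)))   # the result is a plain vector, not a matrix

# apply() on a plain vector raises an error instead of returning a result
msg <- tryCatch(apply(one_col, 1, which.max),
                error = function(e) conditionMessage(e))
print(msg)

# With drop = FALSE the one-column block stays a matrix and apply() works
kept <- apply(m[, 2:2, drop = FALSE], 1, which.max)
print(kept)
```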

The next line of code also throws an error:

complete <- imputeMCA(CIData.MCA, ncp=5)
Error in while (continue) { : missing value where TRUE/FALSE needed

The object is not found, probably because it follows the errors and the object it needs was never created:
res.mca <- MCA(comp$completeObs)
Error: object 'comp' not found

The output still shows the tutorial's tab.disj information, as indicated by the column names:
 head(complete$tab.disj)
  Q7.1.1 Q7.1.2 Q7.1.3....Not my data with the X1, X1.1, column names.
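Two things are visible in the calls above: the imputation result is assigned to `complete` while the `MCA()` call refers to `comp`, and a failed assignment leaves any older object of the same name untouched. The sketch below simulates the second point in base R; the list content and the failing call are hypothetical stand-ins.

```r
# Stand-in for a result left over from an earlier tutorial run (hypothetical)
complete <- list(tab.disj = "tutorial result")

# Simulate a call that fails: the error is raised before any assignment happens
attempt <- try(complete <- stop("imputation failed"), silent = TRUE)

print(inherits(attempt, "try-error"))   # the call failed
old_value <- complete$tab.disj
print(old_value)                        # still the old tutorial value

# Removing the stale object first makes a failed run obvious afterwards
rm(complete)
print(exists("complete", inherits = FALSE))
```

Clearing old results before rerunning avoids inspecting output that belongs to a different data set.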

Additionally, the dimdesc() function for describing the categorical variables has always given me an error about not being able to make contrasts when there is only one level to the data.
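R reports that contrasts can be applied only to factors with 2 or more levels, so counting the levels per column shows which variables would trigger this. A minimal base R sketch follows; the data frame `toy` and the `Region` column are hypothetical.

```r
# Toy data: two columns with a single observed category, one with two
toy <- data.frame(
  X1       = factor(c("con1", NA, "con1")),
  Children = factor(c("Yes", NA, "Yes")),
  Region   = factor(c("North", "South", "North"))   # hypothetical column
)

# Number of levels per factor column (NA is not counted as a level)
n_levels <- vapply(toy, nlevels, integer(1))
print(n_levels)

# Columns with fewer than two levels cannot be given contrasts
one_level <- names(n_levels)[n_levels < 2]
print(one_level)
```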

I'm sure this is a very simple problem with the setup of the data itself costing me a week so far, but I'm so new to this that I cannot find the right solution.

Thank you to anyone offering help!!
It's all so quick and easy when it works! Haha

Best wishes in your endeavors and huge thanks to anyone offering help for my own!
Holly

François Husson

May 5, 2025, 4:30:20 AM
to factomin...@googlegroups.com
Hello,

Your first column is the name of the individuals, so it is necessary to add the argument row.names=1 when you read data.
CIData <- read.table("C:/Users/holly/Documents/R Data/CID/ID May1 for MCA.csv", header=TRUE, stringsAsFactors=TRUE, sep=",", na.strings="NA", dec=",", strip.white=TRUE, row.names=1)

If you don't do that, the column is considered a categorical variable with as many categories as individuals, so the MCA cannot be run.
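`row.names=1` assumes the identifiers are in the first column. When they sit elsewhere, `read.table()` also accepts the column name. A minimal sketch using inline text instead of a file; the column name `CaseID` and all values are hypothetical.

```r
# Inline CSV standing in for the real file; CaseID is a hypothetical column name
csv_text <- "X1,Children,CaseID
con1,Yes,A01
NA,NA,A02
con1,NA,A03"

# row.names may be a column number or a column name; the identifier column
# becomes the row names and is no longer treated as a variable
toy <- read.table(text = csv_text, header = TRUE, sep = ",",
                  stringsAsFactors = TRUE, na.strings = "NA",
                  row.names = "CaseID")

print(rownames(toy))
print(names(toy))
```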

Then your variables need to have at least 2 categories each (you cannot have only "yes" and NA in a column).
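One way to get two categories is to turn NA into an explicit level. This is only appropriate under the assumption that NA means "not applied" rather than "unknown"; if that holds, there may be nothing left to impute. A base R sketch with a hypothetical helper `recode_absent()` and a hypothetical label "No":

```r
# Hypothetical helper: replace NA with an explicit category.
# ASSUMPTION: NA means the code was not applied, not that the value is unknown.
recode_absent <- function(col, absent = "No") {
  col <- as.character(col)
  col[is.na(col)] <- absent
  factor(col)
}

# Toy column with one observed category and missing values
children <- factor(c("Yes", NA, "Yes", NA))
children_2 <- recode_absent(children)

print(levels(children_2))
print(table(children_2))
```

If some NA values really are unknown, they should stay NA so that only those are imputed.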

I do not understand your dataset.

FH
--
François Husson
Department Statistics & Computer Science
L'Institut Agro
65 rue de St-Brieuc - 35042 Rennes
Tel: +33 2 23 48 58 86
https://husson.github.io/
https://www.youtube.com/@HussonFrancois/videos

Holly Beavon

May 5, 2025, 2:56:35 PM
to factomin...@googlegroups.com
Thank you, Professor Husson!
My dataset is highly unusual, and challenging as a result. I've inserted an image so you can see what I am talking about. I do not know how to make it two-level data because of the subject matter. I found empirical psychological research spanning 100 years, and all of the studies used the same term to study the same general phenomenon. However, the theoretical basis of these studies changed a lot over time, as did the methods they used to measure it, from memory recall to fMRI. What you see are matching pairs of conceptual definitions which predict using the same codebook definition to measure the phenomenon, or not. Code 1 is the current standard, and matched pairs of con1 (concept code 1) and op1 (operational code 1) represent 60% of the empirical research, but the other 40% need to be analyzed to quantify how the codes are used together and to find the meaningful groups statistically. Code 14, the miscellaneous definitions, was about 30% of the total, so examining it further led me to MCA as an approach, and to your excellent software for conducting the analysis.
To make it a 2-level dataset with matching pairs or not in each cell, it could potentially be coded as 1: Concept only, 2: Operational only, and 3: Both Conceptual and Operational. However, each of these levels may actually correspond to different groups in terms of frequencies with the other codes and the psychological theoretical basis for the definitions used in each of them. Further, that complicates the chi-square analyses upon which this is based.
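The combined coding described above can be sketched in base R as follows. The helper `combine_pair()`, the level labels, and the extra "Neither" level for rows where both cells are NA are assumptions made for illustration.

```r
# Hypothetical helper: merge a concept/operational pair into one variable.
# ASSUMPTION: a row with NA in both columns is labelled "Neither".
combine_pair <- function(con, op) {
  has_con <- !is.na(con)
  has_op  <- !is.na(op)
  status <- ifelse(has_con & has_op, "Both",
            ifelse(has_con, "Concept only",
            ifelse(has_op, "Operational only", "Neither")))
  factor(status,
         levels = c("Neither", "Concept only", "Operational only", "Both"))
}

# Toy pair of columns covering all four combinations
toy <- data.frame(X1   = c("con1", NA,    "con1", NA),
                  X1.1 = c("op1",  "op1", NA,     NA))

toy$code1 <- combine_pair(toy$X1, toy$X1.1)
print(table(toy$code1))
```

Each pair then becomes one variable with up to four categories, at the cost of the interpretive concerns raised above.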

I did see the note you made about optimizing the data for FactoMineR by moving the case ID and other non-variable columns to the last columns, and I followed that. Thanks for this additional coding advice, however!

Below, you can see the first part of my data. After removing empty columns, my dataset has around 75 columns and 162 rows. The missing columns of the con-op pairs are where I removed the blank columns with no frequencies. I was able to get eigenvalues and beautiful graphs from FactoMineR, but missMDA also seemed to look at my data and go, huh?! What is your suggestion?

Thank you so much!
Best,

Holly
Screenshot 2025-05-05 111725.png

