How does the argument "excl" work in MFA()?


Aliette ROUX - PROGEDO-Loire

Jul 20, 2022, 6:01:33 AM
to FactoMineR users

Dear all,

I don't understand how the argument "excl" of the MFA function works.

The manual says : "excl : an argument that may possible to exclude categories of active variables of categorical variable groups. NULL by default, it is a list with indexes of categories that are excluded per group".

I tried several possibilities, but all of them failed. I show some of them below.

Many thanks
(and thank you very much for this very useful package and for the help list),

Aliette

********************************

library(dplyr)
library(FactoMineR)

#- "d" = example data frame. 8 variables split into 3 groups. Categories to exclude are "NR".

d <- data.frame(G1A=sample(c("O","N"),100,replace=T),
                G1B=sample(c("O","N","NR"),100,prob=c(.6,.37,.03),replace=T),
                G1C=sample(c("O","N","NR"),100,prob=c(.6,.35,.05),replace=T),
                G1D=sample(c("O","N"),100,replace=T),
                G2A=sample(c("O","N"),100,replace=T),
                G2B=sample(c("O","N"),100,replace=T),
                G3A=sample(c("O","N"),100,replace=T),
                G3B=sample(c("O","N","NR"),100,prob=c(.6,.37,.03),replace=T)) %>%
  mutate_all(.,as.factor)

#- "tab.disj" = disjunctive data table
tab.disj <- tab.disjonctif.prop(d,seed=NULL,row.w=NULL)

#- 2 useful vectors for follow-up :
#---- "excl.names" = names of categories to exclude
#---- "excl.index.disj" = index of categories to exclude in the disjunctive data table
excl.names <- colnames(tab.disj)[grepl("NR",colnames(tab.disj))]
excl.index.disj <- which(colnames(tab.disj) %in% excl.names)

#- I first tried to use "excl.names" or "excl.index.disj" (as in MCA)

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),excl=excl.names)
#- Error in -excl : invalid argument to unary operator
res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),excl=excl.index.disj)
#- Error in eigen(crossprod(X, X), symmetric = TRUE) : infinite or missing values in 'x'

#- So I tried four other solutions... and they've all failed...

#========================================================
#- ATTEMPT 1 - a list of length 3 (because 3 groups) ; each element of the list contains the indexes of categories to exclude within the variables of the group. These indexes come from the disjunctive data table "tab.disj".
#========================================================


excl.list.index.disj.cont <- vector(mode = "list", length = 3)
excl.list.index.disj.cont[[1]] <- which(colnames(tab.disj) %in% excl.names[grepl("G1",excl.names)])
excl.list.index.disj.cont[[3]] <- which(colnames(tab.disj) %in% excl.names[grepl("G3",excl.names)])

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),
               excl=excl.list.index.disj.cont)

#- Error in eigen(crossprod(X, X), symmetric = TRUE) : infinite or missing values in 'x'

#========================================================
#- ATTEMPT 2 - a list of length 3 (because 3 groups) ; each element of the list contains the indexes of categories to exclude within the variables of the group. Unlike ATTEMPT 1, these indexes come from separate disjunctive data tables (1 disjunctive data table per group).
#========================================================

tab.disj1 <- tab.disjonctif.prop(d[,1:4],seed=NULL,row.w=NULL)
tab.disj3 <- tab.disjonctif.prop(d[,7:8],seed=NULL,row.w=NULL)

excl.list.index.disj.pergroup <- vector(mode = "list", length = 3)
excl.list.index.disj.pergroup[[1]] <- which(colnames(tab.disj1) %in% excl.names[grepl("G1",excl.names)])
excl.list.index.disj.pergroup[[3]] <- which(colnames(tab.disj3) %in% excl.names[grepl("G3",excl.names)])

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),
               excl=excl.list.index.disj.pergroup)

#- Error in f() : Insufficient values in manual scale. 4 needed but only 3 provided.

#========================================================
#- ATTEMPT 3 - a list of length 3 (because 3 groups) ; each element of the list contains the names of categories to exclude within the variables of the group.
#========================================================


excl.list.names <- vector(mode = "list", length = 3)
excl.list.names[[1]] <- excl.names[grepl("G1",excl.names)]
excl.list.names[[3]] <- excl.names[grepl("G3",excl.names)]

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),
               excl=excl.list.names)

#- Error in -excl : invalid argument to unary operator

#========================================================
#- ATTEMPT 4 - a list of length 3 (because 3 groups) ; each element of the list contains the indexes of rows that contain a category to exclude within the variables of the group. It's a priori useless because we try to exclude categories and not rows, but I've tried everything... !
#========================================================

excl.list.index.rows <- vector(mode = "list", length = 3)
excl.list.index.rows[[1]] <- which(d[,c("G1A","G1B","G1C")]=="NR")
excl.list.index.rows[[3]] <- which(d[,c("G3A","G3B")]=="NR")

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),
               excl=excl.list.index.rows)

#- Error in eigen(crossprod(X, X), symmetric = TRUE) : infinite or missing values in 'x'
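
Editorial sketch, not from the thread: the manual text quoted above describes "excl" as a list of category indexes "per group". One way to build such a list without going through tab.disjonctif.prop() is to read the factor levels directly. The helper below, `excl_by_group`, is hypothetical (it is not part of FactoMineR) and assumes that every variable is a factor and that, within a group, the dummy columns follow the variable order and then the level order of each factor. Whether MFA() interprets the indexes this way is not confirmed anywhere in this thread.

```r
# Hypothetical helper, base R only: for each group, return the positions
# (within that group's dummy columns) of the category named `label`.
excl_by_group <- function(df, group, label) {
  stopifnot(sum(group) == ncol(df))
  ends <- cumsum(group)
  starts <- ends - group + 1
  lapply(seq_along(group), function(g) {
    block <- df[, starts[g]:ends[g], drop = FALSE]
    # One dummy column per level, in variable order then level order (assumed)
    lev <- unlist(lapply(block, levels), use.names = FALSE)
    idx <- which(lev == label)
    if (length(idx) == 0) NULL else idx
  })
}

# Small deterministic data set with 3 groups of sizes 2, 1 and 1
d_toy <- data.frame(
  G1A = factor(c("O", "N", "O", "N")),
  G1B = factor(c("O", "N", "NR", "O")),
  G2A = factor(c("N", "O", "N", "O")),
  G3A = factor(c("O", "N", "O", "NR"))
)

excl_toy <- excl_by_group(d_toy, group = c(2, 1, 1), label = "NR")
str(excl_toy)
```

The result has one slot per group, with NULL for groups that have nothing to exclude, which is the same shape as the list built in ATTEMPT 2. Note that the message reported for ATTEMPT 2 ("Insufficient values in manual scale") is a ggplot2 message about manual scales, so that attempt may have failed while drawing the graphs rather than during the analysis itself; calling MFA() with graph=FALSE could be worth trying, but this is a guess and has not been verified here.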

Aliette ROUX - PROGEDO-Loire

Jul 29, 2022, 9:28:44 AM
to FactoMineR users
I haven't found a solution, but I realise I made a mistake in the description of the fourth attempt: in that attempt, each element of the list contains the indexes of cells (not rows). I've tried both; I've also tried with "discontinuous" and "continuous" indexes (see below). The error message is always: "Error in eigen(crossprod(X, X), symmetric = TRUE) : infinite or missing values in 'x'".

Many thanks, and I wish you a good summer,

#========================================================
#- ATTEMPT 5 - a list of length 3 (because 3 groups) ; each element of the list contains the indexes of cells that contain a category to exclude within the variables of the group. Unlike ATTEMPT 4, the numbering of these indexes does not restart for each group, but runs continuously across the groups.
#========================================================

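'%ni%' <- Negate('%in%')
#- "index.cells" = indexes of all cells of "d" that contain "NR" (definition assumed ; it was missing from the original post)
index.cells <- which(d=="NR")
excl.list.index.cells.follow <- vector(mode = "list", length = 3)
excl.list.index.cells.follow[[1]] <- which(d[,c("G1A","G1B","G1C")]=="NR")
excl.list.index.cells.follow[[3]] <- index.cells[index.cells %ni% excl.list.index.cells.follow[[1]]]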

res.afm <- MFA(d,group=c(4,2,2),type=rep("n",3),
               excl=excl.list.index.cells.follow)


#- Error in eigen(crossprod(X, X), symmetric = TRUE) : infinite or missing values in 'x'

Francois Husson

Aug 12, 2022, 9:06:30 AM
to factomin...@googlegroups.com
If you want to use the argument excl, you have to give the index of the category you want to suppress (its position in the matrix made of the dummy variables and the continuous variables).
For instance,

data(wine)
res <- MFA(wine, group=c(2,5,3,10,9,2), type=c("n",rep("s",5)),excl=c(2,5),gr=FALSE)

and then you have:
res$quali.var$coord
               Dim.1       Dim.2      Dim.3       Dim.4       Dim.5
Saumur     0.4894930  0.68019791 -0.3069132  0.54712499 -0.09380437
Chinon    -1.1116957 -0.67358776  1.4066090 -0.45894610  1.01481117
Reference  1.9650440 -0.51886451  0.2344775 -0.54426331 -0.36305315
Env1      -1.0397719 -0.74983394 -1.0981201 -0.02714524  0.35293995
Env2      -1.0234905  0.03468086  1.3340039  1.02113104  0.02057986
Env4      -0.6797263  4.35374245 -0.3122604 -0.55289766 -0.01605343

and the 2nd and the 5th categories are suppressed compared with the results obtained without excl:
                Dim.1      Dim.2       Dim.3      Dim.4       Dim.5
Saumur      0.4976454  0.7412042 -0.08167579 -0.6196604 -0.26684694
Bourgueuil -0.1942220 -1.0460967 -0.82601646  1.0254442 -0.22929092
Chinon     -1.0771918 -0.4691665  1.46363313  0.1658998  1.07776548
Reference   1.9259884 -0.5729796  0.04855158  0.5553865  0.05584063
Env1       -1.0310134 -0.7797017 -0.91377924 -0.4661021  0.27886639
Env2       -0.9902240  0.2027834  1.47904754 -0.3372500 -0.55283504
Env4       -0.6568526  4.2274260 -0.66932207  0.5306294  0.21061302
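
Editorial sketch, not from the thread: to find "the index of the category" in a table that mixes dummy and continuous columns, one can list the columns in order. The helper `dummy_layout` below is hypothetical (not a FactoMineR function) and assumes one column per factor level, in level order, and one column per numeric variable, all in the order of the variables in the data frame. The toy data only mimics the two categorical variables of "wine"; it is not the real data set.

```r
# Hypothetical helper, base R only: list the columns of the assumed
# dummy + continuous matrix together with their positions.
dummy_layout <- function(df) {
  cols <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (is.factor(x)) {
      # One row per category of a categorical variable
      data.frame(variable = v, category = levels(x), stringsAsFactors = FALSE)
    } else {
      # A continuous variable occupies a single column
      data.frame(variable = v, category = NA_character_, stringsAsFactors = FALSE)
    }
  })
  out <- do.call(rbind, cols)
  out$index <- seq_len(nrow(out))
  out
}

toy <- data.frame(
  Label = factor(c("Saumur", "Bourgueuil", "Chinon", "Saumur"),
                 levels = c("Saumur", "Bourgueuil", "Chinon")),
  Soil  = factor(c("Reference", "Env1", "Env2", "Env4"),
                 levels = c("Reference", "Env1", "Env2", "Env4")),
  Score = c(1.5, 2.0, 3.5, 4.0)
)

layout <- dummy_layout(toy)
print(layout)
idx <- layout$index[layout$category %in% c("Bourgueuil", "Env1")]
idx
```

Under these assumptions "Bourgueuil" sits at position 2 and "Env1" at position 5, matching the indexes used in the call above.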

FH

--
François Husson
Department Statistics & Computer science
UMR 6625 IRMAR CNRS
65 rue de Saint-Brieuc, CS 84215, 35042 Rennes Cedex
Tel: +33 2 23 48 58 86
https://husson.github.io
In 2022, Agrocampus Ouest becomes the Institut Agro Rennes-Angers.

Aliette ROUX - PROGEDO-Loire

Aug 26, 2022, 5:24:05 AM
to FactoMineR users
Thank you very much for your answer and this helpful example. I have two questions…

Question 1 – How to exclude several categories (not only one)

In your example (using the data "wine"), I assume that:
  • 2nd category = "Bourgueuil"
  • 5th category = "Env1"
However, only the 2nd category ("Bourgueuil") is suppressed when we use "excl=c(2,5)". The 5th category ("Env1") is not suppressed. It seems that only the first index is taken into account. Indeed, I have to write "excl=5" to suppress the 5th category ("Env1"). How can I suppress both the 2nd and 5th categories ("Bourgueuil" and "Env1")? Is it possible?

Question 2 – How to exclude categories when we’ve got two groups (or more) of categorical variables

In your example (using the data "wine"), there is only one group of categorical variables (which contains the variables "Label" and "Soil"). If I add two other variables (named "AB" and "CD") gathered in a second group of categorical variables, I get this error message: "Error in excl[[g]] : subscript out of bounds" (see below). Is it possible to apply a specific MFA when there is more than one group of categorical variables?

library(dplyr)
data(wine)
wine <- wine %>%
  mutate(AB=c(rep("A",10),rep("B",11)),
         CD=c(rep("C",9),rep("D",2),rep("C",10)))

res <- MFA(wine, group=c(2,5,3,10,9,2,2), type=c("n",rep("s",5),"n"),excl=c(2,5),gr=FALSE)

#- Error in excl[[g]] : subscript out of bounds
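
Editorial sketch, not from the thread: the message "Error in excl[[g]]" suggests that the exclusions are read with `excl[[g]]`, where g is the group number. If that reading is right (it is an inference from the error message, not something confirmed here), a bare vector c(2,5) would give only its first element for group 1 and would fail for any group number above 2. The base R lines below only illustrate this indexing behaviour; they do not call MFA().

```r
# Indexing a bare numeric vector with [[ ]]
excl_vec <- c(2, 5)
first_only <- excl_vec[[1]]   # only the first index is returned for "group 1"
msg <- tryCatch(excl_vec[[7]],
                error = function(e) conditionMessage(e))  # fails for "group 7"

# A list with one slot per group (7 groups in the example above):
# group 1 carries both indexes, the other groups carry NULL.
n_groups <- 7
excl_list <- vector("list", n_groups)
excl_list[[1]] <- c(2, 5)

excl_list[[1]]   # both indexes are available for group 1
excl_list[[7]]   # NULL, and no "subscript out of bounds" error
```

This would be consistent with both observations above (only the 2nd category dropped with excl=c(2,5), and the error once a 7th group is added), and with the manual's wording "a list with indexes of categories that are excluded per group". Whether passing such a list as "excl" gives the expected result remains to be checked against the package.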

Once again, thank you very much,
Aliette