Thanks a lot for helping!
I was able to create my table for CaGalt correctly, I think, with factor columns for the variables and numeric columns for the word frequencies.
When I run the CaGalt() function, I no longer get an error in the console, but the process hangs for a long time and I get no output, even after several minutes (I didn’t wait more than 10 minutes…). I have to interrupt the process on each attempt. My machine (MacBook Pro M2) is usually fast, even for heavy computations like clustering. I have the impression that the process is stuck, but I have no idea what is happening.
My table is composed of 3990 observations × 18 variables and 1073 words.
Here is the beginning of the output of str(myTable):
'data.frame': 3990 obs. of 1091 variables:
$ Speaker : Factor w/ 11 levels "teacher1","teacher10",..: 1 1 1 1 1 1 1 1 1 5 ...
$ To : Factor w/ 3 levels "address to class",..: 1 1 1 1 1 1 1 1 1 1 ...
$ lesson_id : Factor w/ 10 levels "cl01_pr1","cl01_pr2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ lesson_topic : Factor w/ 2 levels "lesson prog1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ programming_type : Factor w/ 2 levels "textual programing",..: 2 2 2 2 2 2 2 2 2 2 ...
$ class_id : Factor w/ 5 levels "class01","class02",..: 1 1 1 1 1 1 1 1 1 1 ...
$ gender : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
$ age_range : Factor w/ 3 levels "age < 23 yo",..: 1 1 1 1 1 1 1 1 1 3 ...
$ professional_role : Factor w/ 2 levels "lower sec teacher",..: 2 2 2 2 2 2 2 2 2 2 ...
$ discipline : Factor w/ 3 levels "discipl maths",..: 3 3 3 3 3 3 3 3 3 3 ...
$ teaching_experience : Factor w/ 4 levels "teaching exp < 3y",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cs_teaching_experience : Factor w/ 3 levels "cs teaching exp 0y",..: 1 1 1 1 1 1 1 1 1 1 ...
$ teaching_qualification : Factor w/ 2 levels "graduated teacher",..: 2 2 2 2 2 2 2 2 2 2 ...
$ degree : Factor w/ 4 levels "Lower sec teacher's Master deg",..: 2 2 2 2 2 2 2 2 2 2 ...
$ cs_education : Factor w/ 3 levels "cs ed inside teacher ed",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Integrated_TPCK_Mastery : Factor w/ 4 levels "Integ TPCK Mastery Fair",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Foundational_Knowledge_Base: Factor w/ 4 levels "Found Knowledge Base Fair",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Cluster : Factor w/ 40 levels "cluster_1","cluster_10",..: 8 34 34 34 19 14 26 15 34 34 ...
$ être : num 0 1 4 2 0 1 2 3 5 0 ...
$ avoir : num 0 1 0 0 0 0 0 1 0 1 ...
$ faire : num 0 1 0 0 0 0 1 1 2 1 ...
$ pouvoir : num 0 0 0 0 0 0 0 0 0 0 ...
$ aller : num 1 2 0 0 0 2 0 2 6 0 ...
$ là : num 0 1 0 0 0 0 0 0 0 0 ...
$ alors : num 1 1 0 0 0 0 0 0 0 0 ...
$ ouais : num 0 0 0 0 1 2 1 0 1 0 ...
$ donc : num 0 0 0 0 0 0 0 2 1 0 ...
…
The function I run is this one:
res.cagalt <- CaGalt(Y=table_for_ca_galt[,19:1091],X=table_for_ca_galt[,1:18],type="n")
If I try to run it with fewer categorical variables, like:
res.cagalt <- CaGalt(Y=table_for_ca_galt[,19:1091],X=table_for_ca_galt[,1:2],type="n")
I have the same problem: 7 minutes and still waiting.
And if I try to limit the number of numeric columns (the word frequencies), like this:
res.cagalt_temp<-CaGalt(Y=table_for_ca_galt[,19:250],X=table_for_ca_galt[,1:2],type="n")
I get the following error:
Error in eigen(crossprod(X, X), symmetric = TRUE) :
infinite or missing values in 'x'
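In case it helps, here is a check I can run on my side. It is only a sketch built on an assumption I have not confirmed: that the eigen() error comes from rows (or columns) whose word counts are all zero once the word columns are restricted, which could lead to divisions by zero. The toy data frame and the names toy, word_cols, Y_ok and X_ok are made up for illustration; they are not my real table.

```r
# Hypothetical toy table standing in for table_for_ca_galt (not my real data)
toy <- data.frame(
  Speaker = factor(c("teacher1", "teacher1", "teacher2", "teacher2")),
  etre    = c(0, 1, 4, 0),
  avoir   = c(0, 1, 0, 0),
  faire   = c(2, 0, 0, 0)
)

# Restrict the word columns, similar in spirit to using 19:250 instead of 19:1091
word_cols <- 2:3
Y <- toy[, word_cols]

# Rows and columns whose total frequency is zero in the restricted table
empty_rows <- which(rowSums(Y) == 0)
empty_cols <- which(colSums(Y) == 0)

# Missing values anywhere in the frequency columns
has_na <- anyNA(Y)

# Keep only the observations that still contain at least one word,
# and drop factor levels that no longer occur
keep <- rowSums(Y) > 0
Y_ok <- Y[keep, , drop = FALSE]
X_ok <- droplevels(toy[keep, "Speaker", drop = FALSE])

print(empty_rows)
print(empty_cols)
print(has_na)
```

On my real table the same checks would use table_for_ca_galt[,19:250] in place of the toy columns. If empty rows show up, I could pass the filtered Y_ok and X_ok to CaGalt() and see whether the error goes away.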
Do you have any idea what I’m doing wrong?
Should I just be more patient and wait longer?
Thanks again for your help,
Gabriel