Hi,
I have an issue with the current FAMD implementation. I used FAMD on a dataset where one feature is categorical with a very large number of unique values (~8000 zip codes).
The problem is that when I apply FAMD to the dataset, this single feature dominates all the other features: with around 20 components (ncp), the cumulative explained variance is only about 0.01%.
I think this happens because internally every unique zip code is one-hot encoded, and the scaling by the square root of each level's proportion no longer keeps this variable balanced against the others.
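To illustrate what I mean, here is a minimal numpy sketch. It assumes the implementation follows the usual FAMD recipe: standardize the numeric columns; one-hot encode the categorical variable, divide each indicator column by the square root of its level proportion, then center. `famd_matrix` and `explained_ratio` are my own hypothetical helpers, not the library's API, and the data are random. Under these assumptions, a categorical variable with K levels contributes a total inertia of K - 1, while each numeric variable contributes 1. That would explain why 20 components cover so little of the total.

```python
import numpy as np


def famd_matrix(numeric, categorical):
    """Hypothetical sketch of FAMD-style preprocessing for one categorical variable."""
    # Standardize numeric columns (ddof=0): each then has variance (inertia) 1.
    z = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)
    # One-hot encode the categorical variable (only observed levels get a column).
    levels, codes = np.unique(categorical, return_inverse=True)
    onehot = np.eye(len(levels))[codes]
    # Divide each indicator by sqrt(level proportion), then center the columns.
    p = onehot.mean(axis=0)
    scaled = onehot / np.sqrt(p)
    scaled -= scaled.mean(axis=0)
    return np.hstack([z, scaled]), len(levels)


def explained_ratio(x):
    """PCA eigenvalues (covariance with ddof=0) as fractions of total inertia."""
    s = np.linalg.svd(x, compute_uv=False)
    eig = s**2 / x.shape[0]
    return eig / eig.sum()


rng = np.random.default_rng(0)
n, n_num = 2000, 5
numeric = rng.normal(size=(n, n_num))
zip_codes = rng.integers(0, 500, size=n)  # hypothetical high-cardinality column

x, k = famd_matrix(numeric, zip_codes)
total_inertia = x.var(axis=0).sum()  # equals n_num + (k - 1)
print(f"levels={k}, total inertia={total_inertia:.1f}")
ratio = explained_ratio(x)
print(f"cumulative share of first 20 components: {ratio[:20].sum():.3f}")
```

Each scaled indicator column has variance 1 - p_k, so the categorical block sums to K - 1. With thousands of zip codes, this block alone makes up almost all of the total inertia.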
One solution would be to add an option to use binary encoding instead of one-hot encoding. However, I lack the skills to implement this myself. What do you think about this proposed solution, or am I completely wrong about this?
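For the binary-encoding option, here is a rough sketch of what I have in mind. `binary_encode` is a hypothetical helper, not an existing option of the library. Each zip code is mapped to an integer code and represented by the bits of that code, so 8000 levels would need only 13 columns instead of 8000.

```python
import numpy as np


def binary_encode(values):
    """Hypothetical helper: encode labels as the bits of their integer level code."""
    levels, codes = np.unique(values, return_inverse=True)
    # Number of bits needed to represent codes 0 .. len(levels) - 1 (at least 1).
    n_bits = max(1, int(np.ceil(np.log2(len(levels)))))
    # Column j holds bit j of each code (least significant bit first).
    bits = (codes[:, None] >> np.arange(n_bits)) & 1
    return bits.astype(float), levels


zips = np.array(["75001", "75002", "10115", "75001", "69001"])
encoded, levels = binary_encode(zips)
print(levels)   # sorted unique zip codes
print(encoded)  # one row per observation, one column per bit
print(binary_encode(np.arange(8000))[0].shape[1])  # columns needed for 8000 levels
```

One caveat: the bit columns no longer correspond to individual levels, so the square-root-of-proportion weighting could not be applied to them as is. Zip codes that happen to share bits would also look artificially similar.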
Best regards