Hi,
I'm trying to understand whether a categorical variable with about 40,000 possible values is affecting a ratio variable (clicks / impressions).
I would also like to see the impact of each category.
I thought of running a logistic regression:
model_1 <- glm(formula = cbind(clicks, impressions - clicks) ~ profile, data = data, family = binomial)
where profile has about 40K different string values,
but it seems to be too heavy to handle.
Eventually I would like to add more predictor variables (country, for example).
Can you think of a better way to solve this problem?
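To make the setup concrete, here is a minimal reproducible sketch with made-up data and far fewer levels. The simulated values, `n_profiles`, and `true_rate` are assumptions for illustration only, not my real data. My guess is that the cost comes from glm building one dummy column per profile level, so the design matrix grows to roughly 40K columns on the real data.

```r
set.seed(1)

# Hypothetical toy data: 50 profiles instead of ~40K, one row per profile
n_profiles <- 50
data <- data.frame(
  profile     = factor(paste0("p", seq_len(n_profiles))),
  impressions = rpois(n_profiles, lambda = 1000) + 1
)

# Simulated click-through rate per profile (assumed, for illustration)
true_rate   <- plogis(rnorm(n_profiles, mean = -3, sd = 0.5))
data$clicks <- rbinom(n_profiles, size = data$impressions, prob = true_rate)

# Same model as above, on the toy data
model_1 <- glm(cbind(clicks, impressions - clicks) ~ profile,
               data = data, family = binomial)

# One column per level (intercept plus n_profiles - 1 dummies)
n_cols <- ncol(model.matrix(model_1))
```

On the toy data this fits instantly; the problem only shows up when the number of levels gets large.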
Thanks.