logistic regression


Adi Milrad

Jan 17, 2018, 08:11:36
to israel-r-...@googlegroups.com
Hi,
I'm trying to understand whether a categorical variable with about 40,000 possible values affects a ratio variable.
I would also like to see the impact of each category.

I thought of running a logistic regression:

model_1 <- glm(cbind(clicks, impressions - clicks) ~ profile, data = data, family = binomial)

where profile has about 40K distinct string values,
but the model seems to be too heavy to fit.

Eventually I would like to add more predictors (country, for example).
Can you think of a better way to solve the problem?

Thanks.


עמרי מנדלס

Jan 22, 2018, 02:19:45
to Israel R User Group
Do you wish to understand whether there is a significant effect, or try to predict the ratio variable?
How many samples do you have?

Essentially, when you convert the categorical variable into binary indicators (one-hot encoding), you get 40,000 different features.
From there you can apply feature selection or other dimensionality reduction techniques. You could also incorporate new features (country, as you mentioned) at this point. This puts you in a better position to predict the ratio variable.
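
As a rough illustration of the one-hot step, here is a minimal sketch that builds the indicator columns as a sparse matrix, so that 40,000 levels do not have to be stored as a dense table. It assumes the Matrix package (normally shipped with R); the toy data frame and its values are made up, and only the column names profile, clicks and impressions come from the question.

```r
library(Matrix)  # recommended package, normally installed with R

# Toy stand-in for the real table; all values are invented for illustration.
toy <- data.frame(
  profile     = factor(c("a", "b", "c", "a", "b", "d")),
  clicks      = c(3, 0, 5, 2, 1, 0),
  impressions = c(100, 40, 120, 80, 60, 10)
)

# One indicator column per profile level, stored sparsely.
# "- 1" drops the intercept so that every level gets its own column.
X <- sparse.model.matrix(~ profile - 1, data = toy)

dim(X)       # rows = observations, columns = distinct profiles
colnames(X)  # "profilea", "profileb", ...
```

A matrix like X could then be passed to a penalized regression routine that accepts sparse input (glmnet is one such package), which is one possible way to do the feature selection mentioned above.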

If you wish to understand the effect, you can cluster categories (profiles) together and look for an effect at a coarser level of granularity.
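
One simple way to group profiles, sketched below under stated assumptions, is to lump every profile with few impressions into a single "other" level and fit the same binomial model on the reduced factor. This is only one possible grouping rule, not necessarily the clustering meant above; the threshold min_impressions, the column profile_group and the toy data are hypothetical.

```r
# Toy stand-in for the real table; all values are invented for illustration.
toy <- data.frame(
  profile     = c("a", "b", "c", "d", "e", "f"),
  clicks      = c(30, 20, 1, 0, 2, 45),
  impressions = c(1000, 400, 20, 5, 30, 1500),
  stringsAsFactors = FALSE
)

min_impressions <- 100  # hypothetical cutoff; tune to the real data

# Total impressions per profile, then mark the rare ones.
totals <- tapply(toy$impressions, toy$profile, sum)
rare   <- names(totals)[totals < min_impressions]

# Replace rare profiles with a single "other" level.
toy$profile_group <- ifelse(toy$profile %in% rare, "other", toy$profile)

# Sum clicks and impressions within each group.
grouped <- aggregate(cbind(clicks, impressions) ~ profile_group,
                     data = toy, FUN = sum)

# Same model form as in the question, now with far fewer levels.
fit <- glm(cbind(clicks, impressions - clicks) ~ profile_group,
           data = grouped, family = binomial)
coef(fit)
```

With the real data, the grouping could instead be based on domain knowledge about the profiles, and further predictors such as country can be added to the formula in the usual way.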

Hope this helps,
Omri