grouping factors of categorical variable


Adi Milrad

Apr 3, 2017, 4:52:32 AM
to Israel R User Group
Hi,
I have a categorical variable, country, that I want to use as a predictor in a logistic regression that predicts Click Through Rate.
There are more than 200 countries. Most of them account for less than a percent of total impressions, and one of them accounts for 86% of impressions.
Can you recommend a way in R to reduce the countries to about 10 categories, grouped either by their Click Through Rate (the value of the predicted variable) or by number of impressions (the country's share)?

Thanks!

Jonathan Rosenblatt

Apr 3, 2017, 5:35:10 AM
to israel-r-user-group
Are you grouping for statistical reasons (power/SNR), or computational reasons?


--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-group+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
--
Jonathan Rosenblatt
Dept. of Industrial Engineering and Management
Ben Gurion University of the Negev

Adi Milrad

Apr 3, 2017, 5:37:19 AM
to Israel R User Group
Statistical reasons.

Jonathan Rosenblatt

Apr 3, 2017, 5:56:49 AM
to israel-r-user-group
Several ways I can see:
- Use a mixed model (lmer, or glmer for a binary outcome such as clicks). This lets you "borrow power" (in the words of Tukey) between countries without forcing an equal coefficient.

- You can manually recode countries, but I guess that is what you want to avoid.

- Cutting along the regression coefficients is pretty much the same as cutting along the within-country click rate. You can do that, but be careful if you want to do inference with these data-driven (circular) country factors.
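
For illustration only, here is a rough sketch of two of these options in R, run on simulated data. Everything in it is an assumption rather than something from the thread: the data frame `d`, its columns `country`, `impressions` and `clicks`, the helper `lump_by_share()`, and the choice of keeping the 9 largest countries plus an "Other" level. Grouping by impression share does not look at the clicks, so it should sidestep the circularity concern above. The mixed-model part needs the lme4 package and is skipped if it is not installed.

```r
# Simulated, aggregated data: one row per country. All numbers are made up.
set.seed(1)
n_countries <- 200
country <- sprintf("C%03d", seq_len(n_countries))

# One dominant country (about 86% of impressions) and a long tail of small ones.
tail_weights <- 1 / seq_len(n_countries - 1)
impressions <- c(86000, round(14000 * tail_weights / sum(tail_weights)))

# Country-specific click probabilities scattered around 2%.
p_click <- plogis(rnorm(n_countries, mean = qlogis(0.02), sd = 0.5))
clicks <- rbinom(n_countries, size = impressions, prob = p_click)

d <- data.frame(country, impressions, clicks, stringsAsFactors = FALSE)

# Hypothetical helper: keep the n_keep levels with the largest total weight
# and lump every other level into a single `other` level.
lump_by_share <- function(x, weight, n_keep = 9, other = "Other") {
  totals <- tapply(weight, x, sum)
  ranked <- names(totals)[order(totals, decreasing = TRUE)]
  keep <- ranked[seq_len(min(n_keep, length(ranked)))]
  factor(ifelse(x %in% keep, as.character(x), other), levels = c(keep, other))
}

# Option 1: about 10 categories, chosen by impression share only.
d$group <- lump_by_share(d$country, d$impressions, n_keep = 9)
top_share <- max(d$impressions) / sum(d$impressions)

# Logistic regression on aggregated counts (successes, failures).
fit <- glm(cbind(clicks, impressions - clicks) ~ group,
           family = binomial, data = d)

# Option 2: keep all countries, but shrink them with a random intercept.
if (requireNamespace("lme4", quietly = TRUE)) {
  mm <- lme4::glmer(cbind(clicks, impressions - clicks) ~ 1 + (1 | country),
                    family = binomial, data = d)
  # Shrunken country effects on the log-odds scale.
  country_effects <- lme4::ranef(mm)$country
}
```

With one row per click or impression instead of aggregated counts, the same formulas would take a 0/1 response in place of the `cbind(...)` term. If the forcats package is available, its `fct_lump_n()` (with the `w` argument for weights) does roughly what the hypothetical helper does.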


