[R] How to effectively remove Outliers from a binary logistic regression in R

Marcus Tullius

Sep 5, 2012, 3:40:56 AM
to r-h...@r-project.org
Hallo there,

greetings from Germany.

I have a simple question for you.

I have run a binary logistic model, but there are lots of outliers distorting the real results.

I have tried to get rid of the outliers using the following commands:

remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
MIGRATION.rebuild <- glm(MIGRATION, subset=remove)
influence(MIGRATION.rebuild)
influence.measures(MIGRATION.rebuild)

BUT it did not work.


My question is:

*Do you know a simple R-command which erases outliers and rebuilds the model without them?*

I am including my model below so that you may have an idea of how I am trying to do it.

Thanks in advance for your help.

Francisco M. da Rocha


______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Lemon

Sep 5, 2012, 6:15:09 AM
to Marcus Tullius, r-h...@r-project.org
On 09/05/2012 05:40 PM, Marcus Tullius wrote:
> Hallo there,
>
> greetings from Germany.
>
> I have a simple question for you.
>
> I have run a binary logistic model, but there are lots of outliers distorting the real results.
>
> I have tried to get rid of the outliers using the following commands:
>
> remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
> MIGRATION.rebuild<- glm(MIGRATION, subset=remove)
> influence(MIGRATION.rebuild)
> influence.measures(MIGRATION.rebuild)
>
> BUT it did not work.
>
>
> My question is:
>
> *Do you know a simple R-command which erases outliers and rebuilds the model without them?*
>
> I am including my model below so that you may have an idea of how I am trying to do it.
>
Hi Francisco,
Your model didn't make it to the help list, but I think that the problem
is in your attempt to use the "subset" argument in glm. The vector is
supposed to include the indices of the values that you _want_ in the
analysis, and it looks like you are trying to remove the values that you
_don't_ want. Say you have 2000 rows in your data frame in the model.
The "subset" argument should look something like this:

glm(MIGRATION,
    subset=!(1:2000 %in% c(56,303,365,391,512,746,859,940,1037,1042,1138,
                           1355)))
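
Since the original model was not posted, here is a minimal self-contained sketch of the same idea on simulated data. The data frame `dat`, the variables `migrated`, `age` and `income`, and the formula are all hypothetical stand-ins, not the poster's model. It assumes the model is fitted with an explicit formula, `family = binomial` and a `data` argument, and it builds the logical "keep" vector from the number of rows instead of hard-coding 2000.

```r
# Simulated stand-in data: all variable names here are hypothetical.
set.seed(1)
n <- 2000
dat <- data.frame(age = rnorm(n, 40, 10), income = rnorm(n, 30, 8))
dat$migrated <- rbinom(n, 1, plogis(-2 + 0.03 * dat$age + 0.02 * dat$income))

# Binary logistic model on all rows.
fit <- glm(migrated ~ age + income, family = binomial, data = dat)

# Row numbers to exclude (taken from the original post).
drop_rows <- c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)

# TRUE for rows to keep, FALSE for rows to drop.
keep <- !(seq_len(nrow(dat)) %in% drop_rows)

# Refit the same model on the retained rows only.
fit_rebuild <- glm(migrated ~ age + income, family = binomial,
                   data = dat, subset = keep)

# Compare the number of observations used in each fit.
nobs(fit)
nobs(fit_rebuild)

# The influence diagnostics can then be recomputed on the refitted model.
infl <- influence.measures(fit_rebuild)
```

If the refit worked, `nobs(fit_rebuild)` should be smaller than `nobs(fit)` by the number of dropped rows. Passing `data = dat[keep, ]` instead of `subset = keep` is an equivalent alternative in this sketch.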

Jim