[R] Winsorizing Multiple Variables

Karl Healey

Jan 16, 2009, 3:50:57 PM
to r-h...@r-project.org
Hi All,

I want to take a matrix (or data frame) and winsorize each variable
so that I can, for example, correlate the winsorized variables.

The code below will winsorize a single vector, but when applied to
several vectors, each ends up sorted independently in ascending order
so that a given observation is no longer on the same row for each
vector.

So I need to winsorize each variable and then return it to its original
order, or else find another solution that takes a data frame, winsorizes
each variable, and returns a new data frame with all the variables in
the original order.

Thanks for any help!

-Karl


#The function I'm working from

win <- function(x, tr = .2, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  y <- sort(x)
  n <- length(x)
  ibot <- floor(tr * n) + 1
  itop <- length(x) - ibot + 1
  xbot <- y[ibot]
  xtop <- y[itop]
  y <- ifelse(y <= xbot, xbot, y)
  y <- ifelse(y >= xtop, xtop, y)
  win <- y
  win
}

#Produces an example data frame; ss is the observation id, and var1-var4
#are the variables I want to winsorize.

ss <- 1:5
var1 <- rnorm(5)
var2 <- rnorm(5)
var3 <- rnorm(5)
var4 <- rnorm(5)
data <- as.data.frame(cbind(ss, var1, var2, var3, var4))
data

#Winsorizes each variable, but sorts them independently so the
#observations no longer line up.

sapply(data,win)


___________________________
M. Karl Healey
Ph.D. Student

Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON
M5S 3G3

ka...@psych.utoronto.ca

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius

Jan 16, 2009, 4:14:16 PM
to Karl Healey, r-h...@r-project.org
Might work better to determine top and bottom for each column with
quantile() using an appropriate quantile option, and then process
each variable "in place" with your ifelse logic.
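
A minimal sketch of that idea, assuming the 20% fraction from the original `win`. The name `winsorize_in_place` and the `type` pass-through are illustrative and do not come from this thread. The cutoffs come from `quantile()`, and `ifelse` replaces values without reordering `x`:

```r
# Hypothetical helper: winsorize a vector without sorting it.
# `type` is handed to quantile() to pick one of its quantile definitions.
winsorize_in_place <- function(x, tr = 0.2, type = 7) {
  cuts <- quantile(x, probs = c(tr, 1 - tr), na.rm = TRUE,
                   type = type, names = FALSE)
  x <- ifelse(x < cuts[1], cuts[1], x)  # raise low values to the lower cutoff
  x <- ifelse(x > cuts[2], cuts[2], x)  # lower high values to the upper cutoff
  x
}

x <- c(10, 1, 7, 3, 100)
w <- winsorize_in_place(x)
w  # same length and order as x; only the extremes are replaced
```

Different `type` values give different cutoffs for small samples, so the result need not match the order-statistic cutoffs used in the original function.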

I did find a somewhat different definition of winsorization with no
sorting in this code copied from a Patrick Burns posting from earlier
this year on R-SIG-Finance:

function(x, winsorize=5) {
  s <- mad(x) * winsorize
  top <- median(x) + s
  bot <- median(x) - s
  x[x > top] <- top
  x[x < bot] <- bot
  x
}

--
David Winsemius

Michael Conklin

Jan 16, 2009, 4:24:36 PM
to Karl Healey, r-h...@r-project.org
Don't sort y. Calculate xbot and xtop using
xtemp<-quantile(y,c(tr,1-tr),na.rm=na.rm)
xbot<-xtemp[1]
xtop<-xtemp[2]
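
Put together with the original function, the change might look like the sketch below. It rests on two assumptions that are my reading rather than anything stated in this reply: the cutoffs are computed from the unsorted `x`, and the step that drops `NA` values is left out so the output keeps the same length as the input. Because `quantile()` interpolates by default, the cutoffs can differ slightly from the order statistics `y[ibot]` and `y[itop]` in the original.

```r
# Sketch of `win` with the sort removed; cutoffs come from quantile().
win <- function(x, tr = 0.2, na.rm = FALSE) {
  xtemp <- quantile(x, c(tr, 1 - tr), na.rm = na.rm)
  xbot <- xtemp[1]
  xtop <- xtemp[2]
  y <- ifelse(x <= xbot, xbot, x)  # x keeps its original order
  y <- ifelse(y >= xtop, xtop, y)
  y
}

set.seed(1)  # arbitrary seed so the example is reproducible
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5),
                   var3 = rnorm(5), var4 = rnorm(5))

# Winsorize only the variables, leaving the id column alone.
vars <- c("var1", "var2", "var3", "var4")
wdata <- data
wdata[vars] <- lapply(data[vars], win)

# Rows still line up, so the winsorized variables can be correlated.
cor(wdata[vars])
```

Using `lapply` on the selected columns, rather than `sapply(data, win)`, keeps the result a data frame and avoids winsorizing the `ss` id column.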

William Revelle

Jan 16, 2009, 7:41:07 PM
to Michael Conklin, Karl Healey, r-h...@r-project.org
Thanks to Michael for giving a nice solution to Karl's question.

This identified a bug in the psych package winsor function, which has
now been fixed in version 1.0.63 (the current development version).
Although my winsor.means function in 1.0.62 (and earlier) worked
correctly, my winsor function gave an incorrect result when applied to
matrices or data.frames.

Bill


--
William Revelle http://personality-project.org/revelle.html
Professor http://personality-project.org/personality.html
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Attend ISSID/ARP:2009 http://issid.org/issid.2009/
