[R] detect if data is normal or skewed (without a boxplot)

Felipe Carrillo

Aug 10, 2008, 4:55:26 PM
to r-h...@stat.math.ethz.ch
Hello all:
Is there a way to detect in R whether a dataset is normally distributed or skewed without inspecting it graphically? I have developed an application in Visual Basic in which Word, Access and Excel "talk" to each other, and I want to integrate R into it to estimate confidence intervals on fish sizes (mm). I basically want to automate the process from Excel: if my data are normally distributed, use t.test; if they are skewed, use wilcox.test. Something like the pseudocode below:

fishlength <- c(35,32,37,39,42,45,37,36,35,34,40,42,41,50)
if fishlength= "normally distributed" then
t.test(fishlength)
else
wilcox.test(fishlength)

I hope this isn't too confusing.

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Ben Bolker

Aug 11, 2008, 12:42:09 PM
to r-h...@stat.math.ethz.ch
Felipe Carrillo <mazatlanmexico <at> yahoo.com> writes:

>
> Hello all:
> Is there a way to detect in R if a dataset is normally distributed or skewed
> without graphically seeing it?
> The reason I want to be able to do this is because I have developed an
> application with Visual Basic where Word, Access and Excel "talk" to each
> other and I want to integrate R to this application to estimate
> confidence intervals on fish sizes (mm). I basically want to automate the
> process from Excel by detecting if my data has a normal distribution then
> use t.test, but if my data is skewed then use wilcox.test.
> Something like the pseudo code below:
>
> fishlength <- c(35,32,37,39,42,45,37,36,35,34,40,42,41,50)
> if fishlength= "normally distributed" then
> t.test(fishlength)
> else
> wilcox.test(fishlength)
>
> I hope this isn't very confusing
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA


There's a whole package (nortest) devoted to tests of normality,
BUT: I would suggest that your procedure is not a good idea.
It's often hard to detect non-normality, and "fail to reject"
shouldn't mean "accept". If you're concerned about non-normality,
you should probably just use the Wilcoxon test all the time
(it has about 95% of the power of the t-test if the data are
normal: http://en.wikipedia.org/wiki/Mann-Whitney_U ), or
use robust statistics (e.g. rlm in the MASS package).

Ben Bolker
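
For reference, a minimal sketch of the "just use the Wilcoxon test all the time" suggestion above, applied to the fish lengths from the original post. The helper name wilcox_ci is hypothetical, and the choice of exact = FALSE (normal approximation, since these data contain ties) is an assumption rather than something stated in the thread. Note that wilcox.test only returns a confidence interval when conf.int = TRUE is requested, and that the interval is for the (pseudo)median rather than the mean.

```r
# Fish lengths (mm) from the original post
fishlength <- c(35, 32, 37, 39, 42, 45, 37, 36, 35, 34, 40, 42, 41, 50)

# Hypothetical helper (the name is illustrative): always use the one-sample
# Wilcoxon signed-rank procedure and return the interval as plain numbers,
# which is easier to pass back to a calling application.
wilcox_ci <- function(x, level = 0.95) {
  # exact = FALSE requests the normal approximation; the data contain ties,
  # so an exact interval is not available anyway
  res <- wilcox.test(x, conf.int = TRUE, conf.level = level, exact = FALSE)
  c(estimate = unname(res$estimate),  # (pseudo)median
    lower    = res$conf.int[1],
    upper    = res$conf.int[2])
}

ci_wilcox <- wilcox_ci(fishlength)
ci_t      <- t.test(fishlength)$conf.int  # t-based interval for the mean, for comparison

print(ci_wilcox)
print(ci_t)
```

Because the same procedure runs on every dataset, there is no preliminary normality test whose "fail to reject" outcome has to be interpreted.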

Felipe Carrillo

Aug 11, 2008, 1:55:38 PM
to Ben Bolker, r-h...@stat.math.ethz.ch
Thanks, Jim and Ben, for your replies. Reading further about normality testing, I found shapiro.test. I understand that if the p-value is smaller than 0.05 then the data are unlikely to be normal; I just don't understand what the "W" means.
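
On the "W" question, a short sketch of where the number comes from. W is the Shapiro-Wilk test statistic: it lies between 0 and 1, values close to 1 are consistent with normality, and the p-value is derived from it. As a rough intuition only (an approximation, not the exact formula shapiro.test uses), W behaves like the squared correlation between the sorted data and the corresponding normal quantiles. The variable names below are illustrative.

```r
# Fish lengths (mm) from the original post
fishlength <- c(35, 32, 37, 39, 42, 45, 37, 36, 35, 34, 40, 42, 41, 50)

sw <- shapiro.test(fishlength)
W  <- unname(sw$statistic)  # the "W" shown in the printed output
p  <- sw$p.value            # p-value derived from W and the sample size

# Rough intuition only: squared correlation between the ordered data and
# the normal quantiles at the same plotting positions (a Q-Q plot in one number)
W_approx <- cor(sort(fishlength), qnorm(ppoints(length(fishlength))))^2

print(c(W = W, W_approx = W_approx, p.value = p))
```

A W close to 1 means the points of a normal Q-Q plot would fall near a straight line; how far below 1 counts as "significant" depends on the sample size, which is why the p-value is reported alongside it.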


Hi Felipe,
Here's one way:

library(nortest)  # provides sf.test(), the Shapiro-Francia normality test
if (sf.test(fishlength)$p.value > 0.05) {
  t.test(fishlength)
} else {
  wilcox.test(fishlength)
}

Jim
