dplyr summarize with sd


Rachael

Aug 26, 2014, 5:56:10 PM
to manip...@googlegroups.com
Hi,

I am trying to make a simple summary table using dplyr with means and standard deviations of a few variables.  Most of my standard deviations return NA (but oddly two calculations return numbers).  I have subsetted my data so that I don't have any NA in the data frame.  My variables are all numeric.  The means turn out fine.  What else do I need to check to get these standard deviations to work?!

hab=my data

a<-hab %>% group_by(year, eco, id) %>% summarize(sst=mean(SST, na.rm=TRUE),sstSD=sd(sst, na.rm=TRUE),
                                                 depth=mean(depth, na.rm=TRUE),depthSD=sd(depth, na.rm=TRUE),
                                                 slope=mean(slope),slopeSD=sd(slope, na.rm=TRUE),
                                                 ssh=mean(ssh, na.rm=TRUE),
                                                 eke=mean(eke, na.rm=TRUE),
                                                 hill=mean(hill, na.rm=TRUE),
                                                 d2ed=mean(disttocenter, na.rm=TRUE),
                                                 wind=mean(windsp, na.rm=TRUE))

I am also trying to take the means from a and do the same thing.  This time all of the standard deviations equal NA.  

b<-a %>% group_by(eco,year) %>% summarize(sst=mean(sst, na.rm=TRUE),sstSD=sd(sst, na.rm=TRUE),
                                                 depth=mean(depth, na.rm=TRUE),depthSD=sd(depth, na.rm=TRUE),
                                                 slope=mean(slope),slopeSD=sd(slope, na.rm=TRUE),
                                                 ssh=mean(ssh, na.rm=TRUE),sshSD=sd(ssh, na.rm=TRUE),
                                                 eke=mean(eke, na.rm=TRUE),ekeSD=sd(eke, na.rm=TRUE),
                                                 hill=mean(hill, na.rm=TRUE),hillSD=sd(hill, na.rm=TRUE),
                                                 d2ed=mean(d2ed, na.rm=TRUE),d2edSD=sd(d2ed, na.rm=TRUE),
                                                 wind=mean(wind, na.rm=TRUE),windSD=sd(wind, na.rm=TRUE))

Thanks!


Hadley Wickham

Aug 26, 2014, 6:05:15 PM
to Rachael, manipulatr
Hi Rachael,

The problem is this:

sst=mean(SST, na.rm=TRUE),
sstSD=sd(sst, na.rm=TRUE)

You're overwriting the original sst variable with its mean, then
trying to compute the standard deviation of one number.
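
A minimal sketch of the overwrite problem and two possible ways around it. The `toy` data frame and its values are made up for illustration and are not the original `hab` data; the sketch assumes a dplyr version in which `summarize()` evaluates its arguments in order, so later expressions see columns created by earlier ones.

```r
library(dplyr)

# Hypothetical toy data standing in for `hab`
toy <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b"),
  depth = c(10, 12, 14, 20, 21, 25)
)

# Problem: `depth` is replaced by its group mean first,
# so sd() then sees a single value per group and returns NA
bad <- toy %>%
  group_by(id) %>%
  summarize(depth = mean(depth), depthSD = sd(depth))

# Fix 1: compute the sd before the column is overwritten
good <- toy %>%
  group_by(id) %>%
  summarize(depthSD = sd(depth), depth = mean(depth))

# Fix 2: give the mean a name that does not clash with the input column
good2 <- toy %>%
  group_by(id) %>%
  summarize(depth_mean = mean(depth), depth_sd = sd(depth))
```

Either fix keeps the raw column available at the point where `sd()` is evaluated.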

You might also want to look at summarise_each which makes this sort of
thing easier:

hab %>%
  group_by(year, eco, id) %>%
  summarize_each(
    funs(mean = mean(., na.rm = TRUE), sd = sd(., na.rm = TRUE)),
    sst, depth, slope, ssh, eke, hill, disttocenter, windsp
  )
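
For readers on a newer dplyr: `summarise_each()` and `funs()` were deprecated in later releases, and `across()` (available from dplyr 1.0.0) covers the same ground. The following is only a sketch of an equivalent call on a made-up `toy` data frame with hypothetical values, not the original `hab` data.

```r
library(dplyr)

# Hypothetical toy data standing in for `hab`
toy <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b"),
  sst   = c(1, 2, 3, 4, 6, 8),
  depth = c(10, 12, 14, 20, 21, 25)
)

# Apply mean and sd to each listed column; results are written to
# new columns named <column>_<function>, so the inputs are never overwritten
res <- toy %>%
  group_by(id) %>%
  summarise(
    across(
      c(sst, depth),
      list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE))
    ),
    .groups = "drop"
  )
```

Because each summary gets its own output name, the overwrite problem described above does not arise.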

Hadley



--
http://had.co.nz/

Rachael

Aug 26, 2014, 6:09:14 PM
to manip...@googlegroups.com, rao...@gmail.com
ah ha!  thanks for the extremely rapid clarification!!