bootstrapped confidence intervals using dplyr


Felipe

Jul 24, 2016, 12:10:19 PM
to ggplot2
Given the code below, which computes t-based 95% confidence intervals for
the mean mpg in each cyl group, could someone give a hint on how to
estimate the confidence intervals by bootstrapping instead?

library(dplyr)

# t-based 95% confidence interval for the mean mpg within each cyl group
mtcars %>%
  group_by(cyl) %>%
  summarise(MEAN = mean(mpg, na.rm = TRUE),
            SD   = sd(mpg, na.rm = TRUE),
            N    = n()) %>%
  mutate(SE    = SD / sqrt(N),
         lower = MEAN - qt(1 - (0.05 / 2), N - 1) * SE,
         upper = MEAN + qt(1 - (0.05 / 2), N - 1) * SE)
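For comparison with the t-based interval above, a minimal base R sketch of a percentile bootstrap per cyl group might look like this. The helper name boot_ci_by_group, the choice of B = 2000 resamples, and the seed are illustrative assumptions, not part of the original code.

```r
# Percentile bootstrap 95% CI for the mean mpg within each cyl group, base R only.
# boot_ci_by_group() is a hypothetical helper written for this sketch.
boot_ci_by_group <- function(y, g, B = 2000, conf = 0.95) {
  alpha <- 1 - conf
  res <- lapply(split(y, g), function(x) {
    x <- x[!is.na(x)]
    n <- length(x)
    # B resamples of the group, drawn with replacement; keep each resample's mean
    means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
    # percentile limits: the alpha/2 and 1 - alpha/2 quantiles of the bootstrap means
    q <- quantile(means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
    data.frame(MEAN = mean(x), N = n, lower = q[1], upper = q[2])
  })
  # one row per group, with the group label as the first column
  data.frame(cyl = names(res), do.call(rbind, res),
             row.names = NULL, stringsAsFactors = FALSE)
}

set.seed(1)  # arbitrary seed so the resampling is reproducible
ci <- boot_ci_by_group(mtcars$mpg, mtcars$cyl, B = 2000)
ci
```

The layout (MEAN, N, lower, upper per group) mirrors the dplyr output above, so the two kinds of interval can be compared side by side; the bootstrap limits will vary slightly with the seed and B.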

Ben Bolker

Jul 24, 2016, 12:31:27 PM
to ggp...@googlegroups.com

This isn't really a ggplot question; it's more appropriate for a more
generic R forum (r-help, StackOverflow, ...). As far as a hint goes, you
could look at the Hmisc::smean.cl.boot function, which *is* actually
provided in the ggplot2 package as mean_cl_boot (but the documentation
is in the Hmisc package).

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)
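To show roughly what a function with this signature computes, here is a hypothetical base R stand-in with the same arguments. It is a percentile bootstrap of the mean written for illustration, not the package's source code; the name boot_mean_ci is an assumption.

```r
# Hypothetical stand-in mimicking the smean.cl.boot() arguments:
# percentile bootstrap interval for the mean of one numeric vector.
boot_mean_ci <- function(x, conf.int = 0.95, B = 1000, na.rm = TRUE, reps = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  xbar <- mean(x)
  if (n < 2) return(c(Mean = xbar, Lower = NA, Upper = NA))
  # B bootstrap means, each computed from n draws with replacement
  z <- vapply(seq_len(B),
              function(i) mean(x[sample.int(n, n, replace = TRUE)]),
              numeric(1))
  # percentile limits of the bootstrap distribution of the mean
  q <- quantile(z, c((1 - conf.int) / 2, (1 + conf.int) / 2), names = FALSE)
  res <- c(Mean = xbar, Lower = q[1], Upper = q[2])
  if (reps) attr(res, "reps") <- z  # optionally keep the bootstrap means
  res
}

set.seed(2016)
boot_mean_ci(mtcars$mpg[mtcars$cyl == 4])
```

It returns a named vector (Mean, Lower, Upper) rather than a single number, so one way to use it per group is to extract each element separately, e.g. boot_mean_ci(mpg)[["Lower"]].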
> --
> --
> You received this message because you are subscribed to the ggplot2
> mailing list.
> Please provide a reproducible example:
> https://github.com/hadley/devtools/wiki/Reproducibility
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>
> ---
> You received this message because you are subscribed to the Google
> Groups "ggplot2" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to ggplot2+u...@googlegroups.com
> <mailto:ggplot2+u...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Felipe

Jul 24, 2016, 1:30:35 PM
to ggplot2
Thanks Ben, I will keep that in mind. I will try to include smean.cl.boot in the mutate() step of my chain.

Dennis Murphy

Jul 24, 2016, 8:11:06 PM
to Felipe, ggplot2
I agree with Ben that your question is not really a ggplot2 one (the
manipulatr list is the analogue of the ggplot2 list for
(d)plyr/tidyr/reshape2 questions). Still, it may interest this group
that the broom package has some really nice features, many of which
dovetail with ggplot2, and this is an opportunity to show off one of
them.

The bootstrap() function in broom resamples the rows of a data frame;
combined with tidy(), which returns a model's results as a tidy data
frame, it can bootstrap a statistical model, so it suits the question
posed here. I used a cell means model (no intercept, one coefficient per
cylinder group) to get bootstrap CIs of the mean mpg by number of
cylinders as follows:

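library(dplyr)
library(broom)

mtcars %>%
  bootstrap(999) %>%                                 # 999 bootstrap resamples of the rows
  do(tidy(lm(mpg ~ factor(cyl) - 1, data = .))) %>%  # fit the cell means model to each resample
  group_by(term) %>%                                 # one group per cylinder coefficient
  summarise(lcl95 = quantile(estimate, 0.025),       # percentile 95% limits
            ucl95 = quantile(estimate, 0.975))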

The bootstrap() and tidy() functions come from broom. It is probably
worth saving the result of the first two steps of the pipeline and
inspecting the returned object, as it helps in understanding the last
few lines.
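The same case-resampling idea can also be written without broom. A minimal base R sketch follows, assuming percentile limits and 999 resamples as above; the object names are illustrative. Because a resample can, by chance, contain no 6-cylinder cars, cyl is converted to a factor up front so that every refit returns all three coefficients (an empty level yields NA, which na.rm = TRUE drops).

```r
# Case-resampling bootstrap of the cell means model, base R only (no broom).
cars <- transform(mtcars, cyl = factor(cyl))  # fixed levels: every refit has 3 terms
set.seed(123)
B <- 999

# Each replicate: resample rows with replacement, refit, keep the coefficients.
# replicate() returns a 3 x B matrix; t() makes it B x 3 (one row per resample).
boot_coefs <- t(replicate(B, {
  idx <- sample.int(nrow(cars), replace = TRUE)
  coef(lm(mpg ~ cyl - 1, data = cars[idx, ]))
}))

# Percentile 95% limits for each term (cyl4, cyl6, cyl8)
boot_ci <- t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE))
colnames(boot_ci) <- c("lcl95", "ucl95")
boot_ci
```

The term names here are cyl4, cyl6 and cyl8 rather than factor(cyl)4 and so on, because the factor conversion happens before the model formula.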

HTH,
Dennis

Felipe

Jul 25, 2016, 2:14:35 AM
to ggplot2
Thanks, really cool. Is the 'estimate' the mean?
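A short base R check (not part of the original thread) bears on this question: in the cell means model lm(mpg ~ factor(cyl) - 1), each coefficient, which tidy() reports in its estimate column, equals the sample mean of mpg for that cyl level. The bootstrap limits in the pipeline are therefore limits for the group means. The object names below are illustrative.

```r
# In a no-intercept model with a single factor, each coefficient is a group mean.
fit <- lm(mpg ~ factor(cyl) - 1, data = mtcars)
cell_means  <- coef(fit)                            # one coefficient per cyl level
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean) # plain group means

all.equal(unname(cell_means), as.vector(group_means))  # TRUE
```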