How to calculate mean for intervals via plyr?

Varvara Ryzhkova

Nov 20, 2018, 4:23:50 PM
to manipulatr
Hi everyone!


I'm a beginner in R, so I need your help, guys!

I have a data set with two columns: cholesterol level and age. I need to calculate the mean and SD of cholesterol for several age intervals,
i.e. first the mean cholesterol for ages 20-25, then the mean for ages 25-30, etc.

How can I do this with plyr, reshape, or maybe base functions?

Thanks a lot!

Corey N

Nov 20, 2018, 6:47:37 PM
to manipulatr
Hi Varvara, 

If your ages column is already binned into the groups/intervals you want, then I would use dplyr or data.table syntax to create the summary you're after like this:

# with dplyr
library(dplyr)

cholesterin_by_age <- your_table %>%
  group_by(ages) %>%
  summarize(mean = mean(cholesterin), sd = sd(cholesterin)) %>%
  ungroup()

# with data.table
library(data.table)
setDT(your_table)

cholesterin_by_age <- your_table[, .(mean = mean(cholesterin), sd = sd(cholesterin)), keyby = ages]
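Since the original question also asked about base functions, here is a minimal base-R sketch of the same already-binned summary using `aggregate()` and `merge()`, with no extra packages. It assumes a data frame named `your_table` with a grouping column `ages` and a numeric column `cholesterin` (the placeholder names used above); the toy values are made up purely for illustration.

```r
# Toy data, made up for illustration: `ages` already holds interval labels
your_table <- data.frame(
  ages        = c("20-25", "20-25", "25-30", "25-30", "25-30"),
  cholesterin = c(4.8, 5.2, 5.0, 5.6, 6.2)
)

# aggregate() computes one statistic per group; merge() joins the two results
# on `ages`, renaming the clashing `cholesterin` columns via `suffixes`
cholesterin_by_age <- merge(
  aggregate(cholesterin ~ ages, data = your_table, FUN = mean),
  aggregate(cholesterin ~ ages, data = your_table, FUN = sd),
  by = "ages", suffixes = c("_mean", "_sd")
)
print(cholesterin_by_age)
```

The result has one row per age group with columns `ages`, `cholesterin_mean` and `cholesterin_sd`.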


If your ages column isn't already binned into the groups/intervals you're interested in, then you can create those using the `cut()` function. The `breaks` argument defines your intervals.

In the example below, I ask it to create 10 equal-width intervals. You can pass a vector of your specific cut points instead. I always need to look at the documentation to figure out how to get the intervals right: https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/cut
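To get intervals like the 20-25, 25-30, ... bins from the original question, you can pass explicit cut points to `breaks` instead of a count. The sketch below uses made-up ages and assumed break points `seq(20, 40, by = 5)`; adjust them to your data's range. By default `cut()` builds right-closed intervals such as (20,25], so an age of exactly 20 falls outside every bin and becomes NA; `right = FALSE` gives left-closed intervals such as [20,25) instead. (Alternatively, `include.lowest = TRUE` keeps the right-closed bins but lets the first one include its lower endpoint.)

```r
# Made-up ages, only to show how cut() assigns values to intervals
ages <- c(20, 24, 25, 29, 30, 37)

# Default right = TRUE: intervals are (20,25], (25,30], ... so 20 becomes NA
right_closed <- cut(ages, breaks = seq(20, 40, by = 5))

# right = FALSE: intervals are [20,25), [25,30), ... so 20 lands in the first bin
left_closed <- cut(ages, breaks = seq(20, 40, by = 5), right = FALSE)

print(data.frame(ages, right_closed, left_closed))
```

Note that the boundary ages 25 and 30 land in different bins depending on `right`, which is usually the detail worth double-checking.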


# dplyr
cholesterin_by_age <- your_table %>%
  mutate(age_group = cut(ages, breaks = 10)) %>%
  group_by(age_group) %>%
  summarize(mean = mean(cholesterin), sd = sd(cholesterin))

# data.table (cut() can be called directly inside keyby)
cholesterin_by_age <- your_table[, .(mean = mean(cholesterin), sd = sd(cholesterin)), keyby = .(age_group = cut(ages, breaks = 10))]
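For completeness, the same bin-then-summarize step can be done in base R, which the original question also asked about. This is a minimal sketch using `tapply()` with a factor produced by `cut()`, so nothing is added to the table itself. The column names `ages` and `cholesterin`, the break points and the toy values are assumptions for illustration only.

```r
# Toy data, made up for illustration
your_table <- data.frame(
  ages        = c(21, 23, 26, 28, 29),
  cholesterin = c(4.8, 5.2, 5.0, 5.6, 6.2)
)

# Bin ages into [20,25) and [25,30); your_table itself is left unchanged
age_group <- cut(your_table$ages, breaks = seq(20, 30, by = 5), right = FALSE)

# tapply() applies a function to cholesterin within each level of age_group
means <- tapply(your_table$cholesterin, age_group, mean)
sds   <- tapply(your_table$cholesterin, age_group, sd)

cholesterin_by_age <- data.frame(
  age_group = names(means),
  mean      = as.numeric(means),
  sd        = as.numeric(sds)
)
print(cholesterin_by_age)
```

If some intervals contain no rows, `tapply()` returns NA for them, so you may want to choose breaks that match your data's age range.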

Varvara Ryzhkova

Nov 20, 2018, 7:48:59 PM
to manipulatr
Corey, 

Thank you for your answer.

I've solved this case with cut().

But if you know, please tell me: can we get results for specific intervals with ddply() or a similar function, without binning into groups/intervals first?


Corey N

Nov 20, 2018, 8:20:25 PM
to manipulatr
No, sorry, I don't know. My understanding is that you need to create the groups first, and I don't know how one would do it otherwise. A linear model would give you a continuous estimate of the mean by age, but that doesn't seem to be what you're after.
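To illustrate the linear-model alternative: `lm()` fits cholesterin as a straight-line function of age, and `predict()` then returns an estimated mean at any age you ask for, rather than one mean per interval. This is a minimal sketch with made-up, deliberately linear numbers; on real data you would need to check that a straight line is a reasonable description of the relationship over your age range.

```r
# Made-up data for illustration only
your_table <- data.frame(
  ages        = c(20, 25, 30, 35, 40),
  cholesterin = c(4.6, 4.9, 5.2, 5.5, 5.8)
)

# Ordinary least squares fit of cholesterin on ages
fit <- lm(cholesterin ~ ages, data = your_table)

# Estimated mean cholesterin at arbitrary ages, not tied to any interval
estimated_mean <- predict(fit, newdata = data.frame(ages = c(22.5, 27.5, 33)))
print(coef(fit))
print(estimated_mean)
```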

And my understanding is that dplyr is meant to be the next iteration of plyr's functions for working with tables. I haven't used ddply() in years...

Best of luck. 