Can I use different aggregate functions within one dcast formula?


Wiltrud Fassbinder

Jan 8, 2012, 9:40:03 PM
to manipulatr
Hi,
with help from this group I figured out how to make a basic dcast
function work (see below). For the dataset I am working on right now I
need two different aggregate functions (mean and median). Is it
possible to include that in a single formula?
Thanks!!
Wiltrud


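library(reshape2)  # provides melt() and dcast()

ap_test <- read.table("./AP_test3.txt", header = TRUE, sep = "\t")
# Melt the two measured columns into long format (variable / value)
ap1 <- melt(ap_test, id.vars = c("subject", "Group", "Complexity", "Length"),
            measure.vars = c("C1CorNR", "S1CorNR"), na.rm = TRUE)
# Cast back to wide format, averaging within each cell
ap2 <- dcast(ap1, subject + Group ~ Length + Complexity + variable,
             fun.aggregate = mean)
print(ap2)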

David Winsemius

Jan 9, 2012, 12:27:22 AM
to Wiltrud Fassbinder, manipulatr

I would have guessed that

fun.aggregate=function(x) { c(mean(x), median(x))}

... might work. Untested in the absence of a working example.

Unfortunately the help page suggests that may not work: "This function
should take a vector of numbers and return a single summary statistic."
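
Given that fun.aggregate is documented as returning a single summary statistic, one workaround that stays within dcast is to cast once per statistic and merge the results. The sketch below is only an illustration under assumptions: the data frame `long` and its values are made up, and the names `wide_mean`, `wide_median` and `both` are hypothetical.

```r
library(reshape2)

# Made-up long-format data standing in for a melted data set
long <- data.frame(
  subject  = rep(c("s1", "s2"), each = 6),
  variable = rep(c("C1CorNR", "S1CorNR"), times = 6),
  value    = c(1, 10, 3, 30, 8, 80, 2, 20, 6, 60, 16, 160)
)

# One cast per aggregate function, each returning a single value per cell
wide_mean   <- dcast(long, subject ~ variable,
                     fun.aggregate = mean, value.var = "value")
wide_median <- dcast(long, subject ~ variable,
                     fun.aggregate = median, value.var = "value")

# Join the two wide tables; the suffixes mark which statistic a column holds
both <- merge(wide_mean, wide_median, by = "subject",
              suffixes = c(".mean", ".median"))
print(both)
```

With real data the `by` argument would list every identifier column on the left-hand side of the cast formula.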

Maybe you could just use aggregate in base R:

ag <- aggregate(len ~ dose, data = ToothGrowth,
                FUN = function(x) c(mn = mean(x), mdn = median(x)))
> ag
  dose len.mn len.mdn
1  0.5 10.605   9.850
2  1.0 19.735  19.250
3  2.0 26.100  25.950
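
One thing to keep in mind with this approach: aggregate() stores the two statistics as a single matrix column named len, so the result has two columns even though three are printed. A minimal sketch of flattening it into ordinary columns follows; the name `flat` is just an illustration.

```r
# Same call as above, using the built-in ToothGrowth data
ag <- aggregate(len ~ dose, data = ToothGrowth,
                FUN = function(x) c(mn = mean(x), mdn = median(x)))

ncol(ag)  # the statistics live in one matrix column called "len"

# Rebuild the data frame so each statistic becomes its own column
flat <- do.call(data.frame, ag)
names(flat)
```

After flattening, the columns can be referenced directly, for example as flat$len.mn.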

> --
> You received this message because you are subscribed to the Google
> Groups "manipulatr" group.
> To post to this group, send email to manip...@googlegroups.com.
> To unsubscribe from this group, send email to manipulatr+...@googlegroups.com
> .
> For more options, visit this group at http://groups.google.com/group/manipulatr?hl=en
> .
>

David Winsemius, MD
West Hartford, CT

Hadley Wickham

Jan 9, 2012, 8:58:53 AM
to Wiltrud Fassbinder, manipulatr
On Sun, Jan 8, 2012 at 8:40 PM, Wiltrud Fassbinder
<wf_g...@wiltrud.com> wrote:
> Hi,
> with help from this group I figured out how to make a basic dcast
> function work (see below). For the dataset I am working on right now I
> need two different aggregate functions (mean and median). Is it
> possible to include that in a single formula?

No - you need to use ddply from the plyr package instead:

> ap2 <- dcast(ap1, subject + Group ~ Length + Complexity + variable,
> fun.aggregate = mean)

ddply(ap1, c("subject", "Group", "Length", "Complexity"), summarise,
      mean = mean(variable),
      median = median(variable))

It's possible this might be easier to do in the tables package
(http://cran.r-project.org/web/packages/tables/index.html), since it
looks like you're interested in creating a table for display, rather
than a dataset for further analysis.
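
For anyone trying the ddply route on melted data: reshape2's melt() puts the measurements in a column named value by default, while variable holds the original column names, so a runnable version would likely summarise value and group by variable. The sketch below assumes that layout; `ap1_toy` and its numbers are made up, and Length and Complexity are left out for brevity.

```r
library(plyr)

# Made-up long-format data standing in for the melted ap1
ap1_toy <- data.frame(
  subject  = rep(c("s1", "s2"), each = 6),
  Group    = rep(c("g1", "g2"), each = 6),
  variable = rep(c("C1CorNR", "S1CorNR"), times = 6),
  value    = c(1, 10, 3, 30, 8, 80, 2, 20, 6, 60, 16, 160)
)

# One row per subject/Group/variable combination, two statistics per row
res <- ddply(ap1_toy, c("subject", "Group", "variable"), summarise,
             mean   = mean(value),
             median = median(value))
print(res)
```

The result is in long format (one row per group), which is convenient for further analysis but differs from the wide layout dcast produces.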

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Pindar Os

Jan 1, 2013, 6:48:43 AM
to manip...@googlegroups.com, Wiltrud Fassbinder, had...@rice.edu
Hi,

is there a built-in way to perform operations such as mean, median, etc. in one statement on multiple variables?
In the above example 'variable' is a concrete column name. What if there is a second one like 'variable2'?
I don't want to write 'mean_v2 = mean(variable2), median_v2 = median(variable2)'.
Should one use an R loop construction instead? What mechanisms exist for name creation?

I think that this is a rather basic question, but since I'm new to R and RStudio I'd be happy for some advice.

Pindar

Dennis Murphy

Jan 1, 2013, 6:12:16 PM
to Pindar Os, manip...@googlegroups.com
Hi:

You should have started a new thread rather than added on to an
existing one, but the short answer to your question is to look into
the colwise() function.

Dennis

Pindar Os

Jan 4, 2013, 9:13:47 AM
to manip...@googlegroups.com, Pindar Os
Hi,

thanks for the hint!
However, up until now I'm not able to generate exactly what I want.
Here is an example:

library("plyr")

x <- rnorm(10)
y <- rnorm(10)*200
group1 <- sample(c("a","b"), 10, replace=TRUE)
group2 <- sample(c("c","d"), 10, replace=TRUE)

df_test <- data.frame(group1,group2,x,y)

#works, but without knowing from the output what was computed.
#should be 'mean_x' and  'mean_y' as default
ddply(df_test, ~ group1 + group2, colwise(mean,is.numeric))
ddply(df_test, ~ group1 + group2, colwise(sd,is.numeric))

#a combination of both is not working as expected, since I did not find a working ddply command
#that generates e.g 2 columns like 'mean' and 'sd' and all the combinations
#in the rows below. 

test01 <- daply(df_test, ~ group1 + group2,summarise,
                each(colwise(mean,is.numeric),colwise(sd,is.numeric)))

test02 <- daply(df_test, ~ group1 + group2,
                each(colwise(mean,is.numeric),colwise(sd,is.numeric))) 

Is there a recommended way to combine both functions on an arbitrary number of variables and 
combinations?

Cheers
Pindar

Dennis Murphy

Jan 4, 2013, 3:52:06 PM
to Pindar Os, manip...@googlegroups.com
The doBy package is a little more convenient for this purpose:

library(doBy)
<loads snipped>

f <- function(x) c(mean = mean(x), sd = sd(x))
summaryBy(x + y ~ group1 + group2, data = df_test, FUN = f)

  group1 group2     x.mean      x.sd    y.mean      y.sd
1      a      c -0.4182937 0.4682402 114.91945  46.42162
2      a      d -0.4551962 0.6727132 -10.72071 141.73288
3      b      d  0.6926873 0.4737253 103.17876 235.32852

Dennis
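
For readers who would rather stay within plyr, a similar table can probably be produced by handing ddply a small helper that applies the summary function to every numeric column and lets unlist() build the column names. This is only a sketch: `summarise_numeric` is a hypothetical helper, not part of plyr, and the seed is arbitrary, so the numbers will not match the output above.

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df_test <- data.frame(
  group1 = sample(c("a", "b"), 10, replace = TRUE),
  group2 = sample(c("c", "d"), 10, replace = TRUE),
  x = rnorm(10),
  y = rnorm(10) * 200
)

f <- function(v) c(mean = mean(v), sd = sd(v))

# Hypothetical helper: apply f to each numeric column of one group and
# return a one-row data frame; unlist() yields names like x.mean, x.sd
summarise_numeric <- function(d) {
  num <- d[sapply(d, is.numeric)]
  stats <- unlist(lapply(num, f))
  as.data.frame(as.list(stats))
}

res <- ddply(df_test, ~ group1 + group2, summarise_numeric)
print(res)
```

Groups with a single observation get NA for sd, since sd() is undefined for one value.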