passing arguments to ddply in a function

Eliot

May 12, 2013, 12:29:42 AM
to manip...@googlegroups.com
Hi all,

There is reproducible code below (the segment that starts at "Begin reproducible code here").

I think this might be a simple question that has been asked in more complicated ways before. I have read through many examples online and on this board, but I am still having trouble with the syntax. I have a large data frame that I want to split by a column, "richness", and then find confidence intervals for "metric" for each richness value. Until now, I have just had it in a function like this:

con.intervals1 <- function(null.output)
{
    confidence <- ddply(null.output, "richness", summarise,
                        iterations = length(metric),
                        average = mean(metric),
                        upper = quantile(metric, 0.975, na.rm = TRUE),
                        lower = quantile(metric, 0.025, na.rm = TRUE))
    return(confidence)
}

Here, metric is the name of a column. However, now I am looking at a data frame that has multiple different "metrics" (they are called different things, e.g. "metric1", "metric2"). I want to modify the function so that, at a bare minimum, I can call it on each metric like this:

con.intervals2 <- function(null.output, specific.metric)
{
    confidence <- ddply(null.output, "richness", summarise,
                        iterations = length(specific.metric),
                        average = mean(specific.metric),
                        upper = quantile(specific.metric, 0.975, na.rm = TRUE),
                        lower = quantile(specific.metric, 0.025, na.rm = TRUE))
    return(confidence)
}

However, that doesn't work, for reasons I have seen discussed online but don't really understand. So far, I haven't been able to figure out what combination of quotes, as.quoted(), or .() will get me the prize.

Ideally I'd like to make it more useful, so it looks more like the code below. There's probably a smarter way to do this with an apply function, but humor me. What do I need to do so that ddply() properly interprets what I'm passing to it? Begin reproducible code here.

library(plyr)

richness <- rep(2:8, 4)

metric <- rnorm(28, 10, 1)

metric2 <- rnorm(28, 1, 1)

metric3 <- rnorm(28, 7.5, 1)

null.output <- data.frame(richness, metric, metric2, metric3)

metricNames <- names(null.output)[2:4]

con.intervals3 <- function(null.output, metricNames)
{
    results <- list()
    for(i in 1:length(metricNames))   
    {
        results[[i]] <- ddply(null.output, "richness", summarise, iterations=length(metricNames[i]), average=mean(metricNames[i]), upper=quantile(metricNames[i], 0.975, na.rm=TRUE), lower=quantile(metricNames[i], 0.025, na.rm=TRUE))
    }
    return(results)
}

PS: If you call con.intervals1 on the null.output data frame defined here, you'll see what I want; I just want it iterated over multiple columns inside a function.

Thanks for any help you can offer!

Dennis Murphy

May 12, 2013, 8:30:07 AM
to Eliot, manipulatr
Hi:

Here are a few ways you could go about it using plyr functions.

library(plyr)

null.output <- data.frame(richness = factor(rep(2:8, 4)),
                          metric = rnorm(28, 10, 1),
                          metric2 = rnorm(28, 1, 1),
                          metric3 = rnorm(28, 7.5, 1))

##### Method 1: Use numcolwise().

# Function to compute summaries from a variable v
f <- function(v) c(iterations = length(v),
                   average = mean(v, na.rm = TRUE),
                   lower = quantile(v, 0.025, na.rm = TRUE),
                   upper = quantile(v, 0.975, na.rm = TRUE))

# Apply f to each numeric variable in null.output, by richness
u <- ddply(null.output, .(richness), numcolwise(f))

# Names didn't get attached, so put these into each set of four rows
summnames <- c("iterations", "average", "lower", "upper")
u$summary <- summnames
head(u)

##### Method 2: Write a function that returns all of the summaries
##### for each input variable as a vector.

summfun <- function(d)
{
    # Create the output variable names
    vars <- names(d)[grepl("metric", names(d))]
    snames <- c("iterations", "average", "lower", "upper")
    vnames <- paste(rep(vars, each = length(snames)),
                    rep(snames, length(vars)), sep = ".")

    # Function to be applied to each variable
    g <- function(x) c(length(x), mean(x, na.rm = TRUE),
                       quantile(x, c(0.025, 0.975), na.rm = TRUE))

    # Convert the numeric part of the data frame to a matrix
    # and then apply the function g to each column
    L <- apply(data.matrix(d[, vars]), 2, g)

    # String the matrix in column order to a vector, attach
    # the names to it, then return it
    v <- as.vector(L)
    names(v) <- vnames
    v
}

w <- ddply(null.output, .(richness), summfun)


# Another way to display the output:

library(reshape2)
dm <- melt(w, id.vars = "richness")
dcast(dm, variable ~ richness, value.var = "value")
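A further option, different from the methods above, keeps the one-row-per-richness layout of con.intervals1 but selects the column by name inside an anonymous function passed to ddply(). This is a minimal sketch, assuming plyr is installed; con.intervals.one is a hypothetical helper name, and set.seed() is added only so the example data are reproducible. Because the anonymous function is created inside con.intervals.one, it can see metric.name directly, and d[[metric.name]] pulls that column out of each richness group.

```r
library(plyr)

# Example data shaped like the thread's null.output; set.seed() is added
# only so the random values are reproducible
set.seed(1)
null.output <- data.frame(richness = rep(2:8, 4),
                          metric = rnorm(28, 10, 1),
                          metric2 = rnorm(28, 1, 1),
                          metric3 = rnorm(28, 7.5, 1))
metricNames <- names(null.output)[2:4]

# Hypothetical helper: summarise one column, named by a string, per richness.
# The anonymous function is a closure, so it sees metric.name directly,
# and d[[metric.name]] extracts that column from each richness group d.
con.intervals.one <- function(null.output, metric.name) {
  ddply(null.output, "richness", function(d) {
    x <- d[[metric.name]]
    data.frame(iterations = length(x),
               average = mean(x, na.rm = TRUE),
               upper = quantile(x, 0.975, na.rm = TRUE, names = FALSE),
               lower = quantile(x, 0.025, na.rm = TRUE, names = FALSE))
  })
}

# Loop over the metric names, returning a list of data frames named by metric
con.intervals3 <- function(null.output, metricNames) {
  results <- lapply(metricNames, function(m) con.intervals.one(null.output, m))
  names(results) <- metricNames
  results
}

res <- con.intervals3(null.output, metricNames)
res$metric2
```

Passing the column name as a string and indexing with [[ ]] sidesteps the non-standard evaluation in summarise() entirely, at the cost of building the output data frame by hand.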

Hopefully one of these appeals to you...

Dennis
> --
> You received this message because you are subscribed to the Google Groups
> "manipulatr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to manipulatr+...@googlegroups.com.
> To post to this group, send email to manip...@googlegroups.com.
> Visit this group at http://groups.google.com/group/manipulatr?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Hadley Wickham

May 12, 2013, 9:21:22 AM
to Eliot, manipulatr
Have a look at ?here
Hadley
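The here() help page in plyr shows the pattern: passing here(summarise) instead of summarise to ddply() lets the expressions that summarise() evaluates see variables defined in the function that calls ddply(). A minimal sketch of con.intervals2 along those lines follows, assuming a plyr version that provides here(); using get() to turn the column name stored in specific.metric into that column's values is an assumption of this sketch, not something the help page shows.

```r
library(plyr)

# Example data shaped like the thread's null.output (seed only for reproducibility)
set.seed(1)
null.output <- data.frame(richness = rep(2:8, 4),
                          metric = rnorm(28, 10, 1),
                          metric2 = rnorm(28, 1, 1),
                          metric3 = rnorm(28, 7.5, 1))

# here(summarise) evaluates the summary expressions with this function's
# environment as their enclosure, so specific.metric (a string) is visible.
# get() then looks up the column of each richness group with that name.
con.intervals2 <- function(null.output, specific.metric) {
  ddply(null.output, "richness", here(summarise),
        iterations = length(get(specific.metric)),
        average = mean(get(specific.metric)),
        upper = quantile(get(specific.metric), 0.975, na.rm = TRUE),
        lower = quantile(get(specific.metric), 0.025, na.rm = TRUE))
}

con.intervals2(null.output, "metric2")
```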

On Sat, May 11, 2013 at 9:29 PM, Eliot <eliot...@gmail.com> wrote:



--
Chief Scientist, RStudio
http://had.co.nz/

Eliot Miller

May 13, 2013, 9:09:59 AM
to manipulatr
Hello Dennis and Hadley,

Thank you very much for your help! I appreciate the input. I am sorting through the different approaches to find the one that works best for me. Dennis, your examples are very helpful. There are some neat tricks in there that I'm still piecing together and understanding. They definitely all work.

I appreciate it!

Best,
Eliot