using ddply within function


SimonDB

Feb 3, 2011, 6:54:07 AM
to manipulatr
Dear all,

I want to use the great tool that is ddply within a function,
something like this:

df <- data.frame(dhp=c(11.5, 12.6, 18.9, 45.9, 43.2, 65.3),
observed=c(0,1,0,0,1,1), fitted=c(0.10,0.20,0.30,0.40,0.50,0.60))

myfunction <- function (data, response, pred, covar, scale) {
  # Please note that "response", "pred" & "covar" are columns of the
  # dataframe "data"... scale is just a number.

  data$covarcl <- round(data[,covar]/scale)*scale
  data.tmp <- data[order(data$covarcl), ]   # simple operations...

  # I create 3 new functions to be executed in ddply
  freq <- function(data.tmp) sum(data.tmp[,response]==1)
  mpred <- function(data.tmp) mean(data.tmp[,pred])
  mresponse <- function(data.tmp) {mean(data.tmp[,response])}

  data.tmp2 <- ddply(data.tmp, .(covarcl), c("freq","mpred","mresponse"))

  return(data.tmp2)
}

myfunction(df, "observed", "fitted", "dhp", 2)


When I run the function, I get the following error:


Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'mpred' of mode 'function' was not found


I suppose it doesn't find the predefined functions, but I can't figure
out why. It works if I run the same steps "manually", outside the function.
None of my colleagues could find the problem either.

Thanks in advance for your help,

Simon Delisle

Kohske Takahashi

Feb 3, 2011, 8:10:38 AM
to SimonDB, manipulatr
Hi,

the easiest way is probably to define the functions in the global environment:

myfunction <- function (data, response, pred, covar, scale) {
  data$covarcl <- round(data[,covar]/scale)*scale
  data.tmp <- data[order(data$covarcl), ]

  freq <<- function(data.tmp) sum(data.tmp[,response]==1)
  mpred <<- function(data.tmp) mean(data.tmp[,pred])
  mresponse <<- function(data.tmp) mean(data.tmp[,response])

  data.tmp2 <- ddply(data.tmp, .(covarcl), c("freq","mpred","mresponse"))
  rm("freq", "mpred", "mresponse", inherits = TRUE)
  return(data.tmp2)
}

myfunction(df, "observed", "fitted", "dhp", 2)

--
Kohske Takahashi <takahash...@gmail.com>

Research Center for Advanced Science and Technology,
The University of  Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html


SimonDB

Feb 3, 2011, 8:22:48 AM
to manipulatr
Hi,

This works like a charm!!

I thought that it might be a problem with how ddply handles functions
that refer to column names.
Thanks for this simple solution!

Simon



Dennis Murphy

Feb 3, 2011, 10:30:33 AM
to SimonDB, manipulatr

Hi:

You don't mention what the result is supposed to look like, but a couple of cosmetic changes to your function produced the following:


myfunction <- function (data, response, pred, covar, scale) {
  # Please note that "response", "pred" & "covar" are columns of the
  # dataframe "data"... scale is just a number.

  require(plyr)

  data$covarcl <- round(data[,covar]/scale)*scale
  data.tmp <- data[order(data$covarcl), ]
  freq <- function(data.tmp) sum(data.tmp[,response]==1)
  mpred <- function(data.tmp) mean(data.tmp[,pred])
  mresponse <- function(data.tmp) {mean(data.tmp[,response])}

  data.tmp2 <- ddply(data.tmp, .(covarcl), transform, c("freq","mpred","mresponse"))

  return(data.tmp2)
}

myfunction(df, "observed", "fitted", "dhp", 2)
   dhp observed fitted covarcl
1 11.5        0    0.1      12
2 12.6        1    0.2      12
3 18.9        0    0.3      18
4 43.2        1    0.5      44
5 45.9        0    0.4      46
6 65.3        1    0.6      66

HTH,
Dennis


Stavros Macrakis

Feb 3, 2011, 10:07:30 AM
to Kohske Takahashi, SimonDB, manipulatr
There appear to be two bugs in plyr.

First of all, it does not interpret function names in the calling environment as it should -- leading to the workaround of defining them in the global environment.

Secondly, the correct, principled way to do this is to pass the functions by value, not to pass their names.  But though ddply accepts individual functions either by name or by value, lists of functions must be specified by names, not by value.  Ugly. See below for example.

          -s

> ddply(data.frame(a=1:2,b=11:12),.(b),"min") # name of func
   b min
1 11   1
2 12   2
> ddply(data.frame(a=1:2,b=11:12),.(b),min) # func
   b V1
1 11  1
2 12  2
> ddply(data.frame(a=1:2,b=11:12),.(b),function(x)min(x)) # anonymous func
   b V1
1 11  1
2 12  2
> ddply(data.frame(a=1:2,b=11:12),.(b),c("min","max")) # name of func
   b min max
1 11   1  11
2 12   2  12
> ddply(data.frame(a=1:2,b=11:12),.(b),c(min,max)) # func
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
  .fun is not a function.
> ddply(data.frame(a=1:2,b=11:12),.(b),c(function(x)min(x),function(x)max(x))) # anonymous func
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
  .fun is not a function.
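
For the original problem, passing a function by value might look like the sketch below. It assumes plyr is installed and reuses `df` and the argument names from the first post; the name `summarise_by_class` is made up for illustration. A single anonymous function returns all three summaries as a one-row data frame, so ddply never has to look up a function by name.

```r
library(plyr)

df <- data.frame(dhp = c(11.5, 12.6, 18.9, 45.9, 43.2, 65.3),
                 observed = c(0, 1, 0, 0, 1, 1),
                 fitted = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60))

# Hypothetical helper name; the arguments follow the original post.
summarise_by_class <- function(data, response, pred, covar, scale) {
  # Bin the covariate into classes of width `scale`
  data$covarcl <- round(data[, covar] / scale) * scale

  # The anonymous function is a closure: it sees `response` and `pred`
  # through ordinary lexical scoping, so nothing is defined globally.
  ddply(data, .(covarcl), function(piece) {
    data.frame(freq      = sum(piece[, response] == 1),
               mpred     = mean(piece[, pred]),
               mresponse = mean(piece[, response]))
  })
}

result <- summarise_by_class(df, "observed", "fitted", "dhp", 2)
print(result)
```

The result has one row per level of `covarcl` and one column per summary, which appears to be the shape the original poster was after.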

SimonDB

Feb 3, 2011, 1:19:19 PM
to manipulatr
Hmm, yes, I forgot to specify what it's supposed to do. It's supposed
to return a new dataframe in which each row is a level of the
variable specified in ddply, in this case "covarcl". Each column holds
the result of one of the 3 functions, for example the mean of "pred" for
each level of "covarcl". This is a classic use of ddply, isn't it?

As for passing the name or the value of the function, the only
way I know to use ddply with a user-defined function is by calling it by
name. I've just started using plyr, so maybe I'm not doing it right...

At least for now, the workaround is working very well. I have been
using it for a couple of hours without any errors. So thanks again!


Simon

Hadley Wickham

Feb 4, 2011, 9:56:50 AM
to Stavros Macrakis, Kohske Takahashi, SimonDB, manipulatr
> First of all, it does not interpret function names in the calling
> environment as it should -- leading to the workaround of defining them in
> the global environment.

Yes, see here https://github.com/hadley/plyr/issues#issue/3

This is turning out to be really tricky to fix in general - it's a
violation of R's usual static scoping rules, and it's very hard to get
right. lapply does everything at the C level and works by manually
creating calls and setting their parent to the correct environment.
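
The scoping issue can be illustrated in base R, without plyr. This is only a sketch of the general mechanism, not plyr's actual code; `lookup_lexical`, `lookup_in_caller`, `caller`, and `local_helper_fn` are hypothetical names. A function that resolves a name starting from its own environment cannot see helpers defined inside its caller, whereas an explicit search starting in `parent.frame()` can.

```r
# Resolves `name` lexically: the search starts in this function's own
# frame and continues through its enclosing environments, which do not
# include the caller's frame.
lookup_lexical <- function(name) {
  here <- environment()
  tryCatch(get(name, mode = "function", envir = here),
           error = function(e) NULL)   # NULL signals "not found"
}

# Resolves `name` starting from the caller's frame instead.
lookup_in_caller <- function(name) {
  caller_env <- parent.frame()
  get(name, mode = "function", envir = caller_env)
}

caller <- function() {
  local_helper_fn <- function(x) x + 1   # visible only inside caller()
  list(lexical   = lookup_lexical("local_helper_fn"),
       in_caller = lookup_in_caller("local_helper_fn"))
}

found <- caller()
is.null(found$lexical)   # TRUE: the lexical lookup did not find the helper
found$in_caller(1)       # 2: the caller-frame lookup found it
```

Passing the function object itself avoids the question entirely, because no name has to be resolved in any environment.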

> Secondly, the correct, principled way to do this is to pass the functions by
> value, not to pass their names.  But though ddply accepts individual
> functions either by name or by value, lists of functions must be specified
> by names, not by value.  Ugly. See below for example.

This was an easy fix and will be present in the next version of plyr.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
