Using quantile() with summarize


Stuart Luppescu

Apr 10, 2014, 8:28:53 AM
to manip...@googlegroups.com
Hello, I'm a beginner with dplyr and am having trouble using quantile() with summarize. I am working with a grouped data set. I want to get the nth, 50th and (100-n)th percentiles of the variable score.diff for each group.

When I pass more than one value to the p= argument of quantile() (e.g., p=c(0.1, 0.5, 0.9)), summarize tells me
Error: expecting a single value
So I did three separate summarize calls like this:

conf.int <- 0.9
intervals <- as.data.frame(summarize(grouped.ts, quantile(score.diff, p=conf.int)))
intervals$med <- summarize(grouped.ts, quantile(score.diff, p=0.5))
intervals$bot <- summarize(grouped.ts, quantile(score.diff, p=1-conf.int))

But the structure of the resulting data frame is very peculiar, so in order to reference intervals$med and intervals$bot I have to use intervals$med[,2]. 

Can someone give me some guidance?
TIA.
--
Stuart Luppescu -- pixbuf .at. gmail.com
What's another word for euphemism?
  -- Karen Ellis: http://planetkaren.girl-wonder.org/index.php?strip_id=608
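
The peculiar structure described above can be reproduced with a small sketch. The data frame `ts` and its values are hypothetical stand-ins for the poster's data, which is not shown in the thread; only the column names `bins` and `score.diff` are taken from the posts. The likely cause is that summarize() returns a whole data frame (grouping column plus result column), so assigning that result to `intervals$med` nests a data frame inside a column. Pulling out the result column before assigning keeps things flat.

```r
library(dplyr)

# Hypothetical stand-in for the grouped data in the question
set.seed(1)
ts <- data.frame(bins = rep(c("a", "b", "c"), each = 50),
                 score.diff = rnorm(150))
grouped.ts <- group_by(ts, bins)
conf.int <- 0.9

# summarize() returns a data frame: one grouping column plus one result column
intervals <- as.data.frame(
  summarize(grouped.ts, top = quantile(score.diff, probs = conf.int, names = FALSE))
)

# Assigning a whole summarize() result to one column nests a data frame in it
intervals$med <- summarize(grouped.ts,
                           med = quantile(score.diff, probs = 0.5, names = FALSE))
nested <- is.data.frame(intervals$med)

# Extracting the result column first gives an ordinary numeric column
intervals$med <- summarize(grouped.ts,
                           med = quantile(score.diff, probs = 0.5, names = FALSE))$med
```

After the last assignment `intervals$med` can be referenced directly, without the `[,2]` indexing.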

Alain Content

Apr 10, 2014, 10:25:36 AM
to Stuart Luppescu, manip...@googlegroups.com
Hi, 


# A dataframe example 

s <- 1:20
d <- data.frame(S=rep(s, each=100), var=rnorm(n=2000, mean=0, sd=1))

# Here is how I would do it with plyr

library(plyr)
ddply(d, .(S), summarize, M=mean(var), Med=median(var), Q=matrix(quantile(var, probs=c(0.25, 0.50, 0.75)), ncol=3))

# However the equivalent does not seem to work with dplyr, unfortunately

library(dplyr)

d %.% group_by(S) %.% 
  summarise(M=mean(var), Med=median(var), Q= matrix(quantile (var, probs=c(0.25, 0.50, 0.75) ), ncol=3) )
# Error: expecting a single value

# Here is a workaround

d %.% group_by(S) %.% 
  summarise(M=mean(var), Med=median(var), Q1=quantile(var, probs=0.25), Q2=quantile(var, probs=0.50), Q3=quantile(var, probs=0.75))

Alain 
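
As a cross-check on the per-group quantiles that does not depend on the dplyr version, the same numbers can be computed in base R. This is only a sketch, not part of the original reply: the seed and the names `probs` and `Q` are assumptions added for illustration, and the data mirrors the simulated example above.

```r
# Same simulated data as in the example above (the seed is an added assumption)
set.seed(123)
d <- data.frame(S = rep(1:20, each = 100),
                var = rnorm(n = 2000, mean = 0, sd = 1))
probs <- c(0.25, 0.50, 0.75)

# split() gives one vector of `var` per group; sapply() applies quantile() to
# each and returns a 3 x 20 matrix, which t() turns into one row per group
Q <- t(sapply(split(d$var, d$S), quantile, probs = probs))
head(Q)
```

Each row of `Q` corresponds to one value of `S`, with one column per requested probability.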



Stuart Luppescu

Apr 10, 2014, 10:47:22 AM
to Alain Content, manip...@googlegroups.com
Thank you, Alain. Making good progress. 
Here's my code:

intervals <- summarize(grouped.ts, obs.mean=mean(Obs.Average, na.rm=T),
                       med=quantile(score.diff, p=.5),
                       top=quantile(score.diff, p=conf.int),
                       bot=quantile(score.diff, p=(1-conf.int)))

The result has this structure (which looks good):

str(intervals)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 11 obs. of  5 variables:
 $ bins    : Factor w/ 10 levels "(1,2]","(2,2.22]",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ obs.mean: num  1.77 2.16 2.37 2.55 2.65 ...
 $ med     : Named num  -0.11 -0.09 -0.11 -0.13 -0.06 ...
  ..- attr(*, "names")= chr "50%"
 $ top     : Named num  0.114 0.213 0.2 0.262 0.26 ...
  ..- attr(*, "names")= chr "90%"
 $ bot     : Named num  -0.34 -0.394 -0.37 -0.5 -0.324 ...
  ..- attr(*, "names")= chr "10%"
 - attr(*, "drop")= logi TRUE

But then when I try to print one of the columns of the data frame:

 intervals$top

 *** caught segfault ***
address 0x20006009, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Uh oh.

Stuart Luppescu

Apr 10, 2014, 10:52:34 AM
to Alain Content, manip...@googlegroups.com
Ah, well, never mind. When I use the data to plot in ggplot it works fine, which is all I really want.
Thanks.

Hadley Wickham

Apr 11, 2014, 6:21:38 PM
to Stuart Luppescu, Alain Content, manipulatr
I'm pretty sure we've fixed the crashing bug in the dev version of dplyr.
Hadley
http://had.co.nz/

Axel Urbiz

Oct 30, 2015, 7:34:04 AM
to manipulatr, pix...@gmail.com
Hello, 


this is an old post, but I'm having exactly the same issue, and it doesn't seem to be fixed. Did you manage to get this done in dplyr instead of plyr?
Thanks
Axel

Hadley Wickham

Oct 30, 2015, 7:36:14 AM
to Axel Urbiz, manipulatr, Stuart Luppescu
Please supply a minimal reproducible example. For future reference,
it's better to start a new thread, rather than trying to revive
something that's over a year old.

Hadley
--
http://had.co.nz/

Alain Content

Oct 30, 2015, 7:41:54 AM
to Axel Urbiz, manipulatr, pix...@gmail.com
Hi Axel, 

I think this should work:

res <- d %>% group_by(S) %>%
  do(q=data.frame(quantile(.$var)))

Alain 

ALAIN CONTENT
Laboratory Cognition Language & Development  – http://crcn.ulb.ac.be/lcld
ULB Neuroscience Institute - Centre for Research in Cognition & Neuroscience
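
The do() call above stores each group's quantiles in a list column. If one ordinary column per quantile is preferred, a variant along the following lines may help. It is a sketch under assumptions: the seed and the name `probs` are added for illustration, and the column names `X25.`, `X50.` and `X75.` are simply what data.frame() produces when it sanitizes the names "25%", "50%" and "75%".

```r
library(dplyr)

# Simulated data shaped like the earlier example (the seed is an added assumption)
set.seed(42)
d <- data.frame(S = rep(1:20, each = 100), var = rnorm(2000))
probs <- c(0.25, 0.50, 0.75)

# An unnamed do() expression must return a data frame; the one-row data frames
# built per group are stacked, with the grouping column S added in front
res <- d %>%
  group_by(S) %>%
  do(data.frame(t(quantile(.$var, probs = probs)))) %>%
  ungroup()

names(res)
```

The result has one row per value of `S`, so columns such as `res$X50.` can be used directly for plotting or further summaries.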