What does the future hold for the `group_by` %>% `do()` method?

Tiernan Martin

Mar 10, 2017, 1:48:41 PM

to manipulatr

I recently asked a question on Stack Overflow about the tidyverse method for splitting a data frame by multiple columns.

In the example I provided, I wanted to split a data frame by two columns and obtain the summary() output for each subset.

My initial instinct was to use purrr::by_slice(), but that has been deprecated. The suggested solution uses group_by() followed by do(), which produces a list-column containing the summaries:

library(tidyverse)
library(magrittr)

mtcars_summary <-
  mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(c(1:2), n(), replace = TRUE)) %>%
  group_by(GRP_A, GRP_B) %>%
  do(SUMMARY = summary(.))

Here's the structure of the output:

mtcars_summary

#> Source: local data frame [4 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 4 × 3
#>   GRP_A GRP_B     SUMMARY 
#> * <chr> <int>      <list>
#> 1     A     1 <S3: table>
#> 2     A     2 <S3: table>
#> 3     B     1 <S3: table>
#> 4     B     2 <S3: table>

... and the summaries themselves:


mtcars_summary[["SUMMARY"]]

#> [[1]] 
#>       mpg             cyl         disp          GRP_A          
#>  Min.   :14.30   Min.   :4   Min.   :120.3   Length:7          
#>  1st Qu.:18.00   1st Qu.:4   1st Qu.:143.8   Class :character  
#>  Median :21.00   Median :6   Median :160.0   Mode  :character 
#>  Mean   :20.64   Mean   :6   Mean   :223.4                     
#>  3rd Qu.:23.60   3rd Qu.:8   3rd Qu.:317.9                     
#>  Max.   :26.00   Max.   :8   Max.   :360.0                     
#>      GRP_B  
#>  Min.   :1  
#>  1st Qu.:1  
#>  Median :1  
#>  Mean   :1  
#>  3rd Qu.:1  
#>  Max.   :1  
#> 
#> [[2]]
#>       mpg             cyl             disp          GRP_A          
#>  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Length:11         
#>  1st Qu.:14.95   1st Qu.:5.000   1st Qu.:143.8   Class :character  
#>  Median :16.40   Median :8.000   Median :275.8   Mode  :character  
#>  Mean   :18.94   Mean   :6.545   Mean   :247.4                     
#>  3rd Qu.:20.35   3rd Qu.:8.000   3rd Qu.:334.0                     
#>  Max.   :33.90   Max.   :8.000   Max.   :460.0                     
#>      GRP_B  
#>  Min.   :2  
#>  1st Qu.:2  
#>  Median :2  
#>  Mean   :2  
#>  3rd Qu.:2  
#>  Max.   :2  
#> 
#> [[3]]
#>       mpg             cyl             disp           GRP_A          
#>  Min.   :15.00   Min.   :4.000   Min.   : 78.70   Length:6          
#>  1st Qu.:19.32   1st Qu.:4.000   1st Qu.: 86.25   Class :character  
#>  Median :21.25   Median :5.000   Median :126.50   Mode  :character  
#>  Mean   :22.73   Mean   :5.667   Mean   :185.28                     
#>  3rd Qu.:26.18   3rd Qu.:7.500   3rd Qu.:262.00                     
#>  Max.   :32.40   Max.   :8.000   Max.   :400.00                     
#>      GRP_B  
#>  Min.   :1  
#>  1st Qu.:1  
#>  Median :1  
#>  Mean   :1  
#>  3rd Qu.:1  
#>  Max.   :1  
#> 
#> [[4]]
#>       mpg             cyl            disp          GRP_A          
#>  Min.   :10.40   Min.   :4.00   Min.   : 95.1   Length:8          
#>  1st Qu.:15.65   1st Qu.:5.50   1st Qu.:150.2   Class :character  
#>  Median :19.55   Median :6.00   Median :241.5   Mode  :character  
#>  Mean   :19.21   Mean   :6.25   Mean   :248.3                     
#>  3rd Qu.:21.40   3rd Qu.:8.00   3rd Qu.:315.8                     
#>  Max.   :30.40   Max.   :8.00   Max.   :472.0                     
#>      GRP_B  
#>  Min.   :2  
#>  1st Qu.:2  
#>  Median :2  
#>  Mean   :2  
#>  3rd Qu.:2  
#>  Max.   :2

Main Question: Is the use of do() in this example the recommended way of working with a grouped/split data frame?

Secondary Question: Is do() going to be deprecated, as was suggested here?

Tiernan Martin

Mar 18, 2017, 6:48:19 PM

to manipulatr

The suggested solution appears to be a combination of "mutate + list-cols + purrr", as seen here and illustrated with the above example here.
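
For anyone landing here later, this is a minimal sketch of what that approach might look like for the example above. It assumes tidyr::nest() is used to build the list-column; the set.seed() call and the name mtcars_nested are my own additions for illustration, not part of the suggested answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the random grouping is repeatable

mtcars_nested <-
  mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(1:2, n(), replace = TRUE)) %>%
  group_by(GRP_A, GRP_B) %>%
  nest() %>%       # one row per group; the subset lands in the `data` list-column
  ungroup() %>%    # drop grouping so the next mutate() sees the whole list-column
  mutate(SUMMARY = map(data, summary))  # summary() of each nested data frame

mtcars_nested[["SUMMARY"]]
```

One difference from the do() version: nest() leaves the grouping columns out of each nested data frame, so these summaries should cover only mpg, cyl and disp rather than also listing GRP_A and GRP_B.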
