ggplot2 plots can include large extraneous objects


CK

Jun 10, 2020, 5:03:16 AM
to ggplot2
There are some use cases for serializing ggplot2 objects to disk (e.g. `saveRDS()`, `qs::qsave()`). However, a plot may bundle up extraneous objects from its environment, resulting in very large serialized files. In the reprex below, a function generates a plot from a data frame `dt` of about 0.2 Mb; however, several references to the 3.2 Gb matrix `mat` end up inside the ggplot object `p`, so that its compressed serialized size is 1.4 Gb.

`mat` is captured in the mapping and layer quosure environments and in `p$plot_env`. The latter can be replaced with `p$plot_env <- rlang::new_environment()`; however, cleaning `mat` out of the quosure environments is more involved.
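The captured environments can be inspected directly. A minimal sketch, assuming ggplot2 3.x (as in this thread) with rlang installed, and a 1000 x 1000 stand-in for `mat` so it runs quickly: `rlang::quo_get_env()` returns the environment a quosure captured, and comparing it with `p$plot_env` checks whether they are the same object, namely the execution environment of `plot_fn()`.

```r
# Assumes ggplot2 3.x and rlang; small stand-in for the 3.2 Gb matrix
mat <- outer(1:1000, 1:1000)

plot_fn <- function(mat) {
  dt <- data.frame(x = seq_len(nrow(mat)), y = rowSums(mat))
  ggplot2::ggplot(dt, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_point(ggplot2::aes(x = x + 1, y = y + 1))
}

p <- plot_fn(mat)

# Environments captured by the plot-level aes() and by the second layer's aes()
env_map   <- rlang::quo_get_env(p$mapping$x)
env_layer <- rlang::quo_get_env(p$layers[[2]]$mapping$x)

identical(env_map, p$plot_env)    # same environment object?
identical(env_layer, p$plot_env)
ls(p$plot_env)                    # lists both `dt` and `mat`
```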

Would appreciate any suggestions on trimming these objects or ideas to avoid this in the first place.

Thanks

``` r
mat <- outer(1:20000,1:20000) # 3.2Gb matrix

plot_fn <- function(mat) {
  dt <- data.frame(x = 1:dim(mat)[1], y = rowSums(mat))
  p <- ggplot2::ggplot(dt, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_point(ggplot2::aes(x = x+1, y = y+1))
}

p <- plot_fn(mat)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
butcher::weigh(p)
#> # A tibble: 12 x 2
#>    object             size
#>    <chr>             <dbl>
#>  1 mapping.x   3201.      
#>  2 mapping.y   3201.      
#>  3 layers2     3201.      
#>  4 plot_env    3201.      
#>  5 layers1        0.367   
#>  6 facet          0.172   
#>  7 data.y         0.160   
#>  8 coordinates    0.118   
#>  9 data.x         0.0800  
#> 10 scales         0.0426  
#> 11 labels.x       0.000112
#> 12 labels.y       0.000112
saveRDS(p, "p.rds")
file.info("p.rds")
#>             size isdir mode               mtime               ctime
#> p.rds 1423868059 FALSE  644 2020-06-10 09:45:49 2020-06-10 09:45:49
```

CK

Jun 10, 2020, 7:43:04 AM
to ggplot2
UPDATE: Since the mapping/layer quosures and `p$plot_env` all reference the same environment, a quick fix is to remove the large object(s) from that environment. The previous fix, `p$plot_env <- rlang::new_environment()`, left the quosure environments intact.

``` r
butcher::weigh(p)
#> # A tibble: 12 x 2
#>    object             size
#>    <chr>             <dbl>
#>  1 mapping.x   3201.     
#>  2 mapping.y   3201.     
#>  3 layers2     3201.     
#>  4 plot_env    3201.     
#>  5 layers1        0.367  
#>  6 facet          0.172  
#>  7 data.y         0.160  
#>  8 coordinates    0.118  
#>  9 data.x         0.0800 
#> 10 scales         0.0426 
#> 11 labels.x       0.000112
#> 12 labels.y       0.000112
p$plot_env$mat <- NULL

butcher::weigh(p)
#> # A tibble: 12 x 2
#>    object          size
#>    <chr>          <dbl>
#>  1 mapping.x   0.855  
#>  2 mapping.y   0.855  
#>  3 layers2     0.855  
#>  4 plot_env    0.855  
#>  5 layers1     0.367  
#>  6 facet       0.172  
#>  7 data.y      0.160  
#>  8 coordinates 0.118  
#>  9 data.x      0.0800 
#> 10 scales      0.0426 
#> 11 labels.x    0.000112
#> 12 labels.y    0.000112
```
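The same fix can be wrapped in a small helper. This is a sketch, not a ggplot2 API: `strip_plot_env()` is a hypothetical name, and it assumes ggplot2 3.x, where `p$plot_env` is the environment `ggplot()` was called from. It differs from `p$plot_env$mat <- NULL` in two ways. First, it uses `rm()`: assigning `NULL` in an environment keeps a binding called `mat` whose value is `NULL` (the matrix is still released), whereas `rm()` deletes the binding. Second, it refuses to modify the global environment, because a plot built at the top level captures `globalenv()`, and removing `mat` there would delete the user's own variable.

```r
# Hypothetical helper, assuming ggplot2 3.x. Environments are modified in
# place, so every quosure sharing `p$plot_env` is cleaned at once.
strip_plot_env <- function(p, names) {
  env <- p$plot_env
  # Removing objects from the global environment would delete the user's
  # own variables, not just the plot's reference to them
  if (identical(env, globalenv())) {
    stop("plot_env is the global environment; refusing to modify it")
  }
  # rm() deletes the bindings outright; names not present are skipped
  rm(list = intersect(names, ls(env, all.names = TRUE)), envir = env)
  invisible(p)
}

# Usage with a small stand-in matrix
mat <- outer(1:1000, 1:1000)
plot_fn <- function(mat) {
  dt <- data.frame(x = seq_len(nrow(mat)), y = rowSums(mat))
  ggplot2::ggplot(dt, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_point(ggplot2::aes(x = x + 1, y = y + 1))
}
p <- plot_fn(mat)

before <- length(serialize(p, NULL))  # uncompressed serialized size, bytes
strip_plot_env(p, "mat")
after <- length(serialize(p, NULL))

exists("mat", envir = p$plot_env, inherits = FALSE)  # FALSE
c(before = before, after = after)
exists("mat")  # TRUE: the global `mat` is untouched
```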

Perhaps a long-term solution would be for ggplot2 to omit unrelated objects from its environments.
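In the meantime, one way to avoid capturing `mat` in the first place is to build the plot in a helper function that receives only the data. A sketch, assuming ggplot2 3.x; `make_plot()` is a hypothetical name. `ggplot()` and `aes()` then capture `make_plot()`'s execution environment, which holds only `dt`. The helper should be defined at the top level or in a package: if it were defined inside `plot_fn()`, its enclosing environment would be `plot_fn()`'s frame and `mat` would be serialized along with it. The global environment itself is written by `serialize()`/`saveRDS()` only as a reference, not its contents.

```r
# Hypothetical helper, assuming ggplot2 3.x. Defined at the top level, so
# its frame's parent is the global environment, which serialize() stores
# only as a reference.
make_plot <- function(dt) {
  ggplot2::ggplot(dt, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_point(ggplot2::aes(x = x + 1, y = y + 1))
}

plot_fn <- function(mat) {
  dt <- data.frame(x = seq_len(nrow(mat)), y = rowSums(mat))
  make_plot(dt)  # only `dt` is passed on, so `mat` never enters the plot
}

mat <- outer(1:1000, 1:1000)  # small stand-in for the 3.2 Gb matrix
p <- plot_fn(mat)

ls(p$plot_env)                # only `dt`
length(serialize(p, NULL))    # compare with the size of `mat` alone
length(serialize(mat, NULL))
```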