There are some use cases for serializing ggplot2 objects to disk (e.g. with `saveRDS()` or `qs::qsave()`). However, a plot can bundle up unrelated objects from the environment it was created in, which makes the serialized object large. In the reprex below, a function generates a plot from a data frame `dt` of about 0.2 MB; however, references to the 3.2 GB matrix `mat` are captured in the ggplot object `p`, so its compressed serialized size is 1.4 GB.
`mat` is reachable from the environments of the mapping and layer quosures and from `p$plot_env`. The latter can be cleaned with `p$plot_env <- rlang::new_environment()`; however, cleaning the large `mat` out of the quosure environments is more involved.
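As a quick check of where the reference lives, here is a minimal sketch on a small matrix. It assumes ggplot2 3.x, where the plot is a plain list and `p$mapping$x` is an rlang quosure; the 500 x 500 size and the helper variable names are only for illustration.

```r
library(ggplot2)

# Same shape as the reprex, but with a small matrix so it runs quickly.
plot_fn <- function(mat) {
  dt <- data.frame(x = seq_len(nrow(mat)), y = rowSums(mat))
  ggplot(dt, aes(x, y)) + geom_point()
}

p <- plot_fn(outer(1:500, 1:500))

# The quosure behind `aes(x, y)` records the environment it was created in:
# the execution environment of plot_fn(), which still holds `mat`.
quo_env <- rlang::quo_get_env(p$mapping$x)
holds_mat <- exists("mat", envir = quo_env, inherits = FALSE)
same_env <- identical(quo_env, p$plot_env)

# Resetting plot_env alone leaves the quosure's reference in place,
# so `mat` is still reachable from `p` and still gets serialized.
size_before <- length(serialize(p, NULL))
p$plot_env <- rlang::new_environment()
size_after <- length(serialize(p, NULL))
still_held <- exists("mat", envir = rlang::quo_get_env(p$mapping$x),
                     inherits = FALSE)
```

Comparing `size_before` and `size_after` should show that replacing `plot_env` by itself does not shrink the serialized plot, because the mapping quosure keeps the function environment alive.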
I would appreciate any suggestions on trimming these objects, or ideas for avoiding this in the first place.
Thanks
``` r
mat <- outer(1:20000, 1:20000) # 3.2 GB matrix

plot_fn <- function(mat) {
  # Only the small summary `dt` is plotted; `mat` itself is never mapped
  dt <- data.frame(x = 1:dim(mat)[1], y = rowSums(mat))
  p <- ggplot2::ggplot(dt, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_point(ggplot2::aes(x = x + 1, y = y + 1))
}
p <- plot_fn(mat)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
butcher::weigh(p)
#> # A tibble: 12 x 2
#>    object             size
#>    <chr>             <dbl>
#>  1 mapping.x   3201.
#>  2 mapping.y   3201.
#>  3 layers2     3201.
#>  4 plot_env    3201.
#>  5 layers1        0.367
#>  6 facet          0.172
#>  7 data.y         0.160
#>  8 coordinates    0.118
#>  9 data.x         0.0800
#> 10 scales         0.0426
#> 11 labels.x       0.000112
#> 12 labels.y       0.000112
saveRDS(p, "p.rds")
file.info("p.rds")
#>             size isdir mode               mtime               ctime
#> p.rds 1423868059 FALSE  644 2020-06-10 09:45:49 2020-06-10 09:45:49
```