hash of a ggplot object


Christophe Ladroue

Mar 16, 2012, 2:49:08 PM
to ggplot2
Dear all,

I'm currently writing the tests (with testthat) for a package which
produces some ggplot objects. I'm using the digest library to compare
the MD5 hash of the objects against what is expected, instead of
eye-balling all the graphs.
Is this a good idea? I've noticed to my surprise that the hash of a
ggplot object changes after it's been printed, for example, so I'm not
sure a hash is stable enough to use for the tests.

-
library("ggplot2")
library("digest")
p <- ggplot(diamonds) + geom_point(aes(x = carat, y = price, colour = cut))
digest(p)   # hash before printing
print(p)
digest(p)   # hash after printing
-
Returns:
[1] "99597b0b7820b1eb45d5229e34a127f1"
[1] "ee4bebea3461109ae349631bcee1668e"

(R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] digest_0.5.1 testthat_0.6 reshape2_1.2.1 plyr_1.7.1 scales_0.2.0 ggplot2_0.9.0

loaded via a namespace (and not attached):
[1] colorspace_1.1-1 DBI_0.2-5 dichromat_1.2-4 evaluate_0.4.1 grid_2.14.2 MASS_7.3-16 memoise_0.1
[8] munsell_0.3 proto_0.3-9.2 RColorBrewer_1.0-5 RMySQL_0.8-0 stringr_0.6 tools_2.14.2 )

thank you,
Christophe

--
GnuPG key: 0x99A37D7E

Winston Chang

Mar 16, 2012, 4:04:19 PM
to Christophe Ladroue, ggplot2
Hi Christophe -

I think it's because a lot happens (like setting up scales) when you print the ggplot object. Another way of forcing all of that to happen is to use ggplot_build:

library("ggplot2")
library("digest")
p <- ggplot(diamonds) + geom_point(aes(x = carat, y = price, colour = cut))
digest(p)
# "66ccf8e3a35e344f5af3d53434c71788"

invisible(ggplot_build(p))  # There's a lot of output. Suppress it with invisible()
digest(p)
# "8486471ae63df5a97766e4acedc62ae8"

I don't know if these MD5 hashes are stable across platforms, though -- you'll notice that my numbers are different from yours, but that may be because I'm using a different version of ggplot2. The hashes will certainly change with different versions of ggplot2, because there are lots of internal changes.
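The effect described above (a hash that changes once the object has been printed) can be reproduced with a toy object in base R. This is only a sketch under a stated assumption: that the plot object keeps some mutable state by reference, which printing then updates. The names `make_toy_plot`, `print.toyplot`, `fingerprint` and the `scales_trained` flag are all hypothetical and are not ggplot2 internals.

```r
# Toy stand-in for a plot object; NOT ggplot2's real internals.
make_toy_plot <- function() {
  state <- new.env(parent = emptyenv())  # mutable state, shared by reference
  state$scales_trained <- FALSE
  structure(list(state = state), class = "toyplot")
}

# Printing "trains the scales", which mutates the shared state in place.
print.toyplot <- function(x, ...) {
  x$state$scales_trained <- TRUE
  cat("<toyplot>\n")
  invisible(x)
}

# Fingerprint an object with base R only: serialize it, write the bytes
# to a temporary file, and take the MD5 checksum of that file.
fingerprint <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(x, NULL), f)
  unname(tools::md5sum(f))
}

p <- make_toy_plot()
before <- fingerprint(p)
print(p)
after <- fingerprint(p)
identical(before, after)  # FALSE: printing changed what gets serialized
```

The variable `p` is never reassigned, yet its fingerprint differs, because the hash covers whatever state the object points to at the time.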



I don't know the purpose of your tests, but I can tell you that I'm working on a visual testing system for ggplot2, so that it's possible to detect and visualize when changes to the code result in changes to the output. It's very much a work in progress, but it will hopefully be ready in the next week.

-Winston



--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Christophe Ladroue

Mar 16, 2012, 4:16:34 PM
to Winston Chang, ggplot2
Thank you Winston, that looks very useful.
I think I'm not going to use the hash; the hashes were different when
I ran the script in a different session! So not stable at all.

(Besides, it would probably be very sensitive to the version of ggplot2 it runs on.)

Thanks!
Christophe

--
GnuPG key: 0x99A37D7E

baptiste auguie

Mar 16, 2012, 4:33:00 PM
to Christophe Ladroue, ggplot2
Hi,


I don't know how hashes work, but I imagine the output of
ggplotGrob(p) should be less volatile than p itself.

HTH,

b.

Christophe Ladroue

Mar 16, 2012, 4:58:30 PM
to baptiste auguie, ggplot2
Unfortunately, I have the same problem with ggplotGrob. I also tried
serializing the object first:
md5<-function(p) digest(serialize(p,NULL),algo='md5')

but the hashes still change in a new session. Even weirder: the
hashes do become identical to the values I saved previously, but only
after the second run of the test.

A doomed idea it seems.

Christophe

--
GnuPG key: 0x99A37D7E

baptiste auguie

Mar 16, 2012, 5:46:51 PM
to Christophe Ladroue, ggplot2
I'm guessing it's got something to do with the random names assigned
to grobs. It would be nice if ggplot2 followed a naming scheme [*]
(also for post-processing, e.g. interactive SVG); I don't know if it would
be easy to implement. A brute-force approach would be to store
counters for each kind of grob, reinitialized for each new plot:
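library("grid")

.rect <- 1  # counter for rect grobs; reset it to 1 before each new plot
rectGrob <- function(..., name = NULL)
{
  # Default to "rect.<n>"; if a name is supplied, use "<name>.<n>"
  name <- if (is.null(name)) paste("rect.", .rect, sep = "") else
    paste(name, ".", .rect, sep = "")
  .rect <<- .rect + 1
  grid::rectGrob(..., name = name)
}

rectGrob()  # named "rect.1"
rectGrob()  # named "rect.2"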


[*]: http://lattice.r-forge.r-project.org/Vignettes/src/naming-scheme/namingScheme.pdf
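The guess about grob names can be checked with the grid package alone. The sketch below shows that unnamed grobs receive auto-generated names that depend on how many grobs were created earlier in the session, whereas explicitly named grobs do not. The name "panel.background" is just an illustrative choice, and this does not show that names are the only source of unstable hashes.

```r
library(grid)

# Unnamed grobs get an auto-generated name from a session-wide counter,
# so the name depends on what was created before.
g1 <- rectGrob()
g2 <- rectGrob()
c(g1$name, g2$name)          # two different auto-generated names
identical(g1$name, g2$name)  # FALSE

# With explicit names, the name no longer depends on session history.
a <- rectGrob(name = "panel.background")
b <- rectGrob(name = "panel.background")
identical(a$name, b$name)    # TRUE
```

Any hash that covers the `name` field will therefore differ between two otherwise equal unnamed grobs.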

b.

Jean-Olivier Irisson

Mar 17, 2012, 7:09:19 AM
to Winston Chang, Christophe Ladroue, ggplot2
On 2012-Mar-16, at 21:04 , Winston Chang wrote:
>
> I don't know the purpose of your tests, but I can tell you that I'm working on a visual testing system for ggplot2, so that it's possible to detect and visualize when changes to the code result in changes to the output. It's very much a work in progress, but it will hopefully be ready in the next week.

In the meantime, while a truly visual diff is being tested, a solution could be to save the plots as images and compare the images. In my case:

library("ggplot2")
p = qplot(1:10, 1:10)
ggsave("foo.png", p, width=5, height=5)

in two different R sessions gave me images with the same MD5:

MD5 (foo.png) = 7a2651157032ac1a34ef6140fae5f32e
MD5 (foo1.png) = 7a2651157032ac1a34ef6140fae5f32e

You could compare them using a system("md5 ***") call or directly a system("diff ***").

Of course, the issue here is that a change to the theme, to the fonts on your system, etc. would result in a change in the image, which is probably not what you want to test. So you may want to control that more thoroughly (e.g. by defining your own theme).
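The file comparison can also be done from within R, without shelling out to md5 or diff, using `tools::md5sum` from base R. A minimal sketch follows; `same_file` is a hypothetical helper, and the two temporary files are stand-ins written with `writeBin` so that the example runs without a graphics device. In practice they would be the reference image and the freshly saved one.

```r
# Compare two files by MD5 checksum using base R only.
same_file <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  sums <- unname(tools::md5sum(c(path_a, path_b)))
  identical(sums[1], sums[2])
}

# Stand-in files; real use would point at images written by ggsave().
ref   <- tempfile(fileext = ".png")
fresh <- tempfile(fileext = ".png")
writeBin(as.raw(1:10), ref)
writeBin(as.raw(1:10), fresh)
same_file(ref, fresh)  # TRUE: same bytes, same checksum

writeBin(as.raw(10:1), fresh)
same_file(ref, fresh)  # FALSE: contents differ
```

Inside a testthat test this could be wrapped as `expect_true(same_file(ref, fresh))`, with the same caveats about themes and fonts.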

Jean-Olivier Irisson
---
Observatoire Océanologique
Station Zoologique, B.P. 28, Chemin du Lazaret
06230 Villefranche-sur-Mer
Tel: +33 04 93 76 38 04
Mob: +33 06 21 05 19 90
http://jo.irisson.com/

Christophe Ladroue

Mar 17, 2012, 7:25:10 AM
to Jean-Olivier Irisson, Winston Chang, ggplot2
Thank you, that's another idea. But as you pointed out, the test will
be sensitive to small changes in theme etc. On top of that, PNG files
don't seem to be reproducible across platforms.
In retrospect, using object comparisons might not be the best idea
because it's too stringent: the slightest difference in naming scheme
or order of operations will fail the test, even if there's no
difference to the naked eye. So I think I'm going to go the visual
route instead.

Thanks all for your input!
Cheers,
Christophe

--
GnuPG key: 0x99A37D7E

Jean-Olivier Irisson

Mar 17, 2012, 9:17:53 AM
to Christophe Ladroue, Winston Chang, ggplot2
On 2012-Mar-17, at 12:25 , Christophe Ladroue wrote:
>
> Thank you, that's another idea. But as you pointed out, the test will
> be sensitive to small changes in theme etc. On top of that, PNG files
> don't seem to be reproducible across platforms.
> In retrospect, using object comparisons might not be the best idea
> because it's too stringent: the slightest difference in naming scheme
> or order of operations will fail the test, even if there's no
> difference to the naked eye. So I think I'm going to go the visual
> route instead.

Just to highlight something possibly trivial but… A visual test will *also* be sensitive to the slightest change in theme, just as the comparison of objects is.

Christophe Ladroue

Mar 17, 2012, 9:33:35 AM
to Jean-Olivier Irisson, Winston Chang, ggplot2
Sure, what I meant is that instead of having a test either pass or
fail (à la testthat or any unit test package), with a visual test
you can decide whether the difference is worth investigating or not.

Christophe

--
GnuPG key: 0x99A37D7E

Hadley Wickham

Mar 18, 2012, 6:46:33 PM
to Christophe Ladroue, ggplot2
In the past, I have tried without success to make stable hashes. It's
surprisingly hard, and at the end of the day it wasn't that helpful
for developing tests.

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
