creating the "best" graph in ggplot2


Joshua Wiley

Sep 23, 2010, 11:37:06 PM
to ggplot2
Hi all,
 
This strays a bit from strictly ggplot2, but I thought it was applicable enough to ask here anyway (besides, as one of my teachers liked to say, "There's no award for being timid").
 
My basic question is: how does one create the "best" or ideal graph to communicate a particular set of data or results in ggplot2?  Assuming there is no simple answer (a la the answer to the universe, 42), what is a good way to go about learning?  I find that most books on statistics offer little, if any, advice on graphical presentation.  I looked for books that seemed to deal with graphics in general and collected this list from the references in the ggplot2 and lattice books and a paper by Cleveland.
 
Does anyone have additional references they think are good, or have a particular recommendation for a neophyte?  Would a list of recommended reading be appropriate for the wiki?
 
------------------------------------------------------------------
 
***William Cleveland. Visualizing Data. Hobart Press, Summit, New Jersey, 1993.
 
***William Cleveland. The Elements of Graphing Data. Wadsworth, Monterey, California, 1985.
 
***John Tukey. Exploratory Data Analysis. Addison-Wesley Publishing Co., Reading, Massachusetts, 1977.
 
**Paul Murrell. R Graphics. Chapman & Hall/CRC, 2005.
 
**John Chambers, William Cleveland, Beat Kleiner, and Paul Tukey. Graphical Methods for Data Analysis. Wadsworth, 1983.
 
**Leland Wilkinson. The Grammar of Graphics. Springer, New York, 1999.
 
**Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 2001.
 
Edward R. Tufte. Envisioning Information. Graphics Press, 1990.
 
Edward R. Tufte. Visual Explanations. Graphics Press, 1997.
 
Edward R. Tufte. Beautiful Evidence. Graphics Press, 2006.
 
Paul Murrell. Investigations in Graphical Statistics. PhD thesis, The University of Auckland, 1998.
 
M. Friendly. Visualizing Categorical Data. SAS Institute, Cary, NC, 2000. ISBN 1-58
 
W. C. Brinton. Graphical Methods for Presenting Facts. Engineering Magazine Co., New York, 1914.
 
C. F. Schmid and S. E. Schmid. Handbook of Graphic Presentation. Wiley, New York, 1979.
 
I. Spence and S. Lewandowsky. Graphical perception. In J. Fox and J. S. Long, editors, Modern Methods of Data Analysis, Beverly Hills, CA, to appear. Sage Publications.
 
H. Wainer and D. Thissen. Graphical data analysis. Ann. Rev. Psychol., 32:191–241, 1981.
 
William Cleveland. A model for studying display methods of statistical graphics. Journal of Computational and Graphical Statistics, 2:323–364, 1993b. URL http://stat.bell-labs.com/doc/93.4.ps.
 
William Cleveland, editor. The Collected Works of John W. Tukey, Volume V: Graphics 1965–1985. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA, 1988.
 
------------------------------------------------------------------
 
Asterisks indicate how many times I saw a particular reference in the three sources I drew on.
 
Thanks and tongue lashings for irrelevance willingly accepted,
 
 
Josh
 
P.S.  I saw a new type of chart in Excel today.  It takes the idea of a bar chart, but makes it 3D.  If you're still not content, you can choose tapered cones instead of rectangles/cubes.  These may also be stacked for a truly confusing experience.
 
 
 
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles

Ben Bolker

Sep 24, 2010, 9:54:23 AM
to Joshua Wiley, ggplot2
Looks good. I look forward to browsing through some of these that I
haven't checked out before.
Re "is this appropriate": It's a wiki, why not go for it? (Or, just
mark me down for +1)

The Tufte books are beautiful and entertaining, but longer on general
philosophy and great examples than on detailed practical case studies or
"how to" information.

Another suggestion for learning would be to follow the various blogs
that comment on data visualization (Flowing Data, Junk Charts, Andrew
Gelman's blog ...) Such a link-farm would also seem appropriate for the
wiki.



Joshua Wiley

Sep 24, 2010, 11:16:46 AM
to Ben Bolker, ggplot2
On Fri, Sep 24, 2010 at 6:54 AM, Ben Bolker <bbo...@gmail.com> wrote:
> Looks good. I look forward to browsing through some of these that I
> haven't checked out before.
> Re "is this appropriate": It's a wiki, why not go for it? (Or, just
> mark me down for +1)
>
> The Tufte books are beautiful and entertaining, but longer on general
> philosophy and great examples than on detailed practical case studies or
> "how to" information.
>
> Another suggestion for learning would be to follow the various blogs
> that comment on data visualization (Flowing Data, Junk Charts, Andrew
> Gelman's blog ...)  Such a link-farm would also seem appropriate for the
> wiki.

This seems to be the general opinion from on- and off-list emails.  I will work on compiling and cleaning up the references and also look into adding links to those blogs.  My only thought on the blogs is that they should be carefully chosen for quality. I am hoping this list will be an alternative to the bar, pie, and Pareto charts that are often introduced.

 

thomas

Sep 24, 2010, 12:07:45 PM
to ggplot2
Read the Cleveland books first; they explain what you should do rather
than being a portfolio list.

Why no mention of Stephen Few?

Joshua Wiley

Sep 26, 2010, 9:05:16 PM
to ggplot2
Hi All,

I have created a wiki page here:
http://github.com/hadley/ggplot2/wiki/Recommended-Reading

If anyone has books/websites they feel should be included, please feel
free to add them or email me. I tried to include full references and
as many links as possible (book website, author, publisher, etc.). I
think it would also help the page if anyone could add reviews.

Thanks for everyone's feedback and suggestions. I really appreciate
all of them.

Sincerely,

Josh

P.S. Thomas, I did not mention Stephen Few because I am unfamiliar
with his works (actually, I am unfamiliar with all of these), and you
were the first one to mention his name. If you have some good
books/articles by him, I'm happy to include them, or you can add them
directly.

goo...@wittongilbert.free-online.co.uk

Sep 27, 2010, 8:50:29 AM
to ggplot2
Hi, this was going to be two questions, so I probably should post them
separately, but I think I found a workaround for the tick marks, so I
decided just to check I hadn't missed something obvious!

I chose to use ggplot because I read it was very controllable but two
things are perplexing me:

I have a plot ( http://php5.chemo.org.uk/plot.png ) which is starting to
look the way I want it to.

However, I'd like to improve a couple of things

1. The Y-axis - I have put a lot of text in there, which I know is wrong.
So I'd either like to put the first line of each label in a bigger font
(but can't see a way to do that) OR put a secondary y axis on the right
with the additional information in it. I can't see an option for that either.

Here are a couple of options:

http://php5.chemo.org.uk/plot-1.png
http://php5.chemo.org.uk/plot-2.png

2. I gather ggplot doesn't understand what minor tick marks are.
(Honestly?) I've come up with a workaround:

# Set the breaks - I want a scale from 0 to 8, with ticks at 0.25 increments
blks = seq(0, 8, by = 0.25)
tks = blks
# Set the tick labels. I only want labels on whole numbers, so check whether
# the number is whole and, if not, set a blank label
tks = ifelse(tks == round(tks, 0), format(tks, nsmall = 1), "")

# Label the x axis and set the breaks
myPlot = myPlot + scale_x_continuous(
  name = "Odds Ratio of Reaction",
  breaks = blks,
  formatter = scientific,
  labels = tks)

That gives me this graph:
http://php5.chemo.org.uk/plot-3.png

Is there a better way to achieve this?

James Howison

Sep 27, 2010, 10:53:04 AM
to ggplot2
Replies inline below.

On Sep 27, 2010, at 08:50, goo...@wittongilbert.free-online.co.uk wrote:

> Hi this was going to be two questions so I probably should post them separately but I think I found a workaround for the tick marks so I decided just to check I hadn't missed something obvious!
>
> I chose to use ggplot because I read it was very controllable but two things are perplexing me:
>
> I have a plot ( http://php5.chemo.org.uk/plot.png ) which I starting to look like I want it to.
>
> However, I'd like to improve a couple of things
>
> 1. The Y-axis - I have put a lot of text in there which I know is wrong. So I'd either like to put the first line of each label in bigger font (but can't see a way to do that) OR put a secondary y axis on the right with the additional information in it. I also can't see that option either.
>
> Here are a couple of options:
>
> http://php5.chemo.org.uk/plot-1.png
> http://php5.chemo.org.uk/plot-2.png

You could add that text in the main graph area using geom_text, perhaps under the line?
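
A minimal sketch of the geom_text idea, assuming a forest-plot style layout like the one in the linked images. The original data were not posted, so the data frame `odds`, its column names, and every value in it are made up for illustration:

```r
library(ggplot2)

# Hypothetical data: none of these names or numbers come from the thread.
odds <- data.frame(
  group  = c("Drug A", "Drug B", "Drug C"),
  or     = c(1.8, 3.2, 0.9),
  lower  = c(1.1, 2.0, 0.5),
  upper  = c(2.9, 5.1, 1.6),
  detail = c("n = 120, 2 centres", "n = 85, 1 centre", "n = 240, 4 centres")
)

p_text <- ggplot(odds, aes(x = or, y = group)) +
  geom_point() +
  # Interval drawn as a horizontal segment from lower to upper
  geom_segment(aes(x = lower, xend = upper, yend = group)) +
  # Extra detail printed just under each point instead of in the axis label;
  # vjust > 1 pushes the text below its anchor position
  geom_text(aes(label = detail), vjust = 2, size = 3) +
  labs(x = "Odds Ratio of Reaction", y = NULL)
```

Keeping the long description in its own column means the y-axis labels can stay short, and the `vjust` and `size` values are only starting points to tune by eye.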

> 2. I gather ggplot doesn't understand what minor tick marks are. (Honestly?) I've come up with a work around
>
> # Set the breaks - I want a scale from 0 to 8, with ticks at 0.25 increments
> blks = seq(0,8, by = 0.25)
> tks = blks
> # Set the tick labels. i only want ticks on whole numbers. So check if number is whole if not set a blank label
> tks = ifelse(tks==round(tks,0),format(tks,nsmall=1),"")
>
> # Label the x axis and set the breaks
> myPlot = myPlot + scale_x_continuous (
> name = "Odds Ratio of Reaction",
> breaks = blks,
> formatter=scientific,
> labels= tks)
>
> That gives me this graph:
> http://php5.chemo.org.uk/plot-3.png

You won't get tick marks, but scale_x_continuous(minor_breaks=seq(0,8,by=0.25)) will at least draw minor grid lines at those positions.
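
For anyone who wants to try the minor_breaks route, here is a small sketch. The data frame is a made-up stand-in for the odds ratios in the original plot, and the `limits` and whole-number `breaks` are assumptions chosen to match the 0 to 8 scale described above:

```r
library(ggplot2)

# Hypothetical stand-in data; not from the original post
odds <- data.frame(group = c("A", "B", "C"), or = c(1.8, 3.2, 0.9))

p_minor <- ggplot(odds, aes(x = or, y = group)) +
  geom_point() +
  scale_x_continuous(
    name         = "Odds Ratio of Reaction",
    limits       = c(0, 8),
    breaks       = 0:8,                  # labelled ticks and major grid lines
    minor_breaks = seq(0, 8, by = 0.25)  # minor grid lines only, no tick marks
  )
```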

> Is there a better way to achieve this?
>
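
One possible tidy-up of the workaround quoted above, for later ggplot2 releases: the `formatter` argument was removed in ggplot2 0.9.0, and `labels` now accepts a function, so the blanking logic can live in one place. This is a sketch under those assumptions; the helper name `whole_number_labels` and the stand-in data are hypothetical:

```r
library(ggplot2)

# Tick marks every 0.25 from 0 to 8
tick_positions <- seq(0, 8, by = 0.25)

# Label a tick only when it falls on a whole number; otherwise leave it blank.
# sprintf() is used so that each label is formatted on its own.
whole_number_labels <- function(x) {
  ifelse(!is.na(x) & x == round(x), sprintf("%.1f", x), "")
}

# Hypothetical stand-in data; not from the original post
odds <- data.frame(group = c("A", "B", "C"), or = c(1.8, 3.2, 0.9))

p_ticks <- ggplot(odds, aes(x = or, y = group)) +
  geom_point() +
  scale_x_continuous(
    name   = "Odds Ratio of Reaction",
    limits = c(0, 8),
    breaks = tick_positions,      # a tick mark at every 0.25
    labels = whole_number_labels  # text only on whole numbers
  )
```

For example, `whole_number_labels(c(0, 0.25, 1, 2.5, 8))` returns `"0.0" "" "1.0" "" "8.0"`. Note that every break also gets a major grid line, so this trades a cleaner tick layout for a denser grid.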

thomas

Oct 3, 2010, 6:23:04 PM
to ggplot2
Hi Joshua,

> P.S. Thomas, I did not mention Stephen Few because I am unfamiliar
> with his works (actually I am with all of these), and you were the
> first one to mention his name.  If you have some good books/articles
> by him, I'm happy to include them or you can directly.

He published a book called "Now You See It". Beautiful examples and
print. I would start with Cleveland, though.

articles: http://www.perceptualedge.com/examples.php

Thomas

Joel Schwartz

Nov 24, 2010, 2:39:48 PM
to Joshua Wiley, ggplot2
Joshua,
 
Here is a list of books on data visualization that I hope to browse through at the library sometime soon. I keep a list of such books as I run across them during reading, searching, etc. I don't know how good any of them are, but they have intriguing titles. If anyone has looked at these books, perhaps they can chime in with recommendations:
  1. Bederson, The craft of information visualization (2003)
  2. Bonneau, Scientific visualization: Visual extraction of knowledge from data (2006)
  3. Chen, Information visualization, Beyond the horizon, 2nd ed. (2006)
  4. Hansen, The Visualization Handbook (2005)
  5. Mazza, Introduction to information visualization (2009)
  6. Soukup, Visual data mining (2002)
  7. Steele, Beautiful visualization: Looking at data through the eyes of experts (2010)
  8. Unwin, Hardle (eds), Handbook of data visualization (2008)
  9. Wright, Introduction to scientific visualization (2007)
HTH,
Joel



Joel Schwartz

Nov 24, 2010, 2:44:35 PM
to Joshua Wiley, ggplot2
Oh, one more thing: ggplot-ers who haven't already seen it might be interested in this web site:
 
Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization
Joel
 
