Dendrograms / tree models


Andrie de Vries

May 12, 2010, 12:49:55 PM
to ggplot2
Hi all

As a newbie on this mailing list, let me first congratulate Hadley and
the ggplot community on creating a truly wonderful piece of graphics
software.

As part of my work I need to create compelling graphic displays of
dendrograms, such as the output produced by tree() and hclust(). Much
to my surprise, a Google search for "ggplot dendrogram" reveals nothing
useful.

So I decided to write some code to plot regression/classification
trees, i.e. the output of tree() in library(tree).

I would be interested to know whether there are better ways of doing
this. In particular, I created a fortify method for objects of class
tree. This creates the data frame for plotting the lines. But I also
need two additional data frames, one for the text labels, and one for
the value labels. Is there a sensible way of merging the data frames
in such a way that a subset gets used by geom_segment, and another
subset by geom_text?

Andrie

# Plots tree object in ggplot2


fortify.tree <- function(model, data, ...){
  require(tree)
  # Uses tree:::treeco to extract data frame of plot locations
  xy <- tree:::treeco(model)
  n <- model$frame$n

  # Lines copied from tree:::treepl
  x <- xy$x
  y <- xy$y
  # Node numbers: the root is 1; node k has children 2k and 2k + 1
  node <- as.numeric(row.names(model$frame))
  parent <- match((node %/% 2), node)
  sibling <- match(ifelse(node %% 2, node - 1L, node + 1L), node)

  # Vertical segment from each node up to its parent's height
  linev <- data.frame(x=x, y=y, xend=x, yend=y[parent], n=n)
  # Horizontal segment from the parent across to the child's x position
  lineh <- data.frame(x=x[parent], y=y[parent], xend=x,
                      yend=y[parent], n=n)

  # Drop the first row of each frame: the root has no parent
  rbind(linev[-1,], lineh[-1,])
}

label.tree <- function(model, ...){
  require(tree)
  # Uses tree:::treeco to extract data frame of plot locations
  xy <- tree:::treeco(model)
  label <- model$frame$var
  # Split labels live in the two-column matrix model$frame$splits
  # (not used further in this function)
  sleft <- model$frame$splits[, "cutleft"]
  sright <- model$frame$splits[, "cutright"]

  # Lines copied from tree:::treepl
  x <- xy$x
  y <- xy$y
  node <- as.numeric(row.names(model$frame))
  parent <- match((node %/% 2), node)
  sibling <- match(ifelse(node %% 2, node - 1L, node + 1L), node)

  # Keep only the internal (split) nodes, labelled by split variable
  data <- data.frame(x=x, y=y, label=label)
  data <- data[data$label != "<leaf>",]
  data
}

label.tree.leaf <- function(model, ...){
  require(tree)
  # Uses tree:::treeco to extract data frame of plot locations
  xy <- tree:::treeco(model)
  label <- model$frame$var
  yval <- model$frame$yval
  # Split labels live in the two-column matrix model$frame$splits
  # (not used further in this function)
  sleft <- model$frame$splits[, "cutleft"]
  sright <- model$frame$splits[, "cutright"]

  # Lines copied from tree:::treepl
  x <- xy$x
  y <- xy$y
  node <- as.numeric(row.names(model$frame))
  parent <- match((node %/% 2), node)
  sibling <- match(ifelse(node %% 2, node - 1L, node + 1L), node)

  # Keep only the leaves, labelled by their fitted value
  data <- data.frame(x, y, label, yval)
  data <- data[data$label == "<leaf>",]
  data$label <- round(data$yval, 2)
  data
}
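
The parent and sibling lookups in the three functions rely on how tree objects number their nodes: the root is node 1 and the children of node k are 2k and 2k + 1. A small base R sketch of that arithmetic, using a made-up vector of node numbers in place of row.names(model$frame):

```r
# Made-up node numbers for a seven-node tree (root 1; children of k are 2k, 2k + 1)
node <- c(1, 2, 4, 5, 3, 6, 7)

# Row index of each node's parent; NA for the root, since node 0 does not exist
parent <- match(node %/% 2, node)

# Row index of each node's sibling: odd nodes pair with node - 1, even with node + 1
sibling <- match(ifelse(node %% 2, node - 1, node + 1), node)

# Translate the row indices back into node numbers for inspection
lookup <- data.frame(node,
                     parent_node = node[parent],
                     sibling_node = node[sibling])
print(lookup)
```

Because match() returns row positions rather than node numbers, y[parent] and x[parent] pick up the parent's plot coordinates directly.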



################
# Example code #
################

library(ggplot2)
library(tree)

data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)


p <- ggplot(data=cpus.ltr)
p <- p + geom_segment(aes(x=x, y=y, xend=xend, yend=yend, size=n),
                      colour="blue", alpha=0.5)
p <- p + scale_size("n", to=c(0, 3))
p <- p + geom_text(data=label.tree(cpus.ltr),
                   aes(x=x, y=y, label=label), vjust=-0.5, size=4)
p <- p + geom_text(data=label.tree.leaf(cpus.ltr),
                   aes(x=x, y=y, label=label), vjust=0.5, size=3)
theme_null <- theme_update(panel.grid.major = theme_blank(),
                           panel.grid.minor = theme_blank(),
                           axis.text.x = theme_blank(),
                           axis.text.y = theme_blank(),
                           axis.ticks = theme_blank(),
                           axis.title.x = theme_blank(),
                           axis.title.y = theme_blank(),
                           legend.position = "none")

p <- p + theme_set(theme_null)
print(p)
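
The plotting calls above use the ggplot2 API as it stood in 2010. As a rough sketch for current releases (assuming ggplot2 3.4.0 or later): theme_blank() became element_blank(), the scale's `to` argument became `range`, and line thickness is mapped with `linewidth` rather than `size`. The `seg_df` and `lab_df` data frames are made-up stand-ins for the output of fortify.tree() and label.tree(), so the block runs without the tree package.

```r
# Made-up stand-ins for fortify.tree() and label.tree() output (hypothetical values)
seg_df <- data.frame(
  x    = c(1, 3, 2, 2),
  y    = c(1, 1, 2, 2),
  xend = c(1, 3, 1, 3),
  yend = c(2, 2, 2, 2),
  n    = c(40, 60, 40, 60)
)
lab_df <- data.frame(x = 2, y = 2, label = "cach")

# Only plot when a recent enough ggplot2 is installed (assumed: >= 3.4.0)
if (requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.4.0") {
  library(ggplot2)
  p <- ggplot(seg_df) +
    geom_segment(aes(x = x, y = y, xend = xend, yend = yend, linewidth = n),
                 colour = "blue", alpha = 0.5) +
    scale_linewidth("n", range = c(0, 3)) +
    geom_text(data = lab_df, aes(x = x, y = y, label = label),
              vjust = -0.5, size = 4) +
    # Add the theme to this plot only, leaving the global theme untouched
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          legend.position = "none")
  print(p)
}
```

Adding theme() to the plot, instead of calling theme_update() and theme_set(), keeps the change local to this one plot.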


Hadley Wickham

May 16, 2010, 1:46:08 PM
to Andrie de Vries, ggplot2
> As a newbie on this mailing list, let me first congratulate Hadley and
> the ggplot community with creating a truly wonderful piece of graphic
> software.

Thanks!

> As part of my work I need to create compelling graphic displays of
> dendrograms, such as the output produced by tree() and hclust().  Much
> to my surprise, a google search for ggplot dendrogram reveals nothing
> useful.
>
> So I decided to write some code to plot regression/classification
> trees, i.e. the output of tree() in library(tree).

I had some code for doing this too, but I think yours does a better
job - thanks! Do you mind if I include it in the next version of
ggplot2?

> I would be interested to know whether there are better ways of doing
> this.  In particular, I created a fortify method for objects of class
> tree.  This creates the data frame for plotting the lines.  But I also
> need two additional data frames, one for the text labels, and one for
> the value labels.  Is there a sensible way of merging the data frames
> in such a way that a subset gets used by geom_segment, and another
> subset by geom_text?

Not that I'm aware of. I'm in the middle of some major rewriting of
ggplot2, but when I come back to smaller problems this summer, I'll
have a think about it.
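
For what it's worth, one possible approach in later ggplot2 releases is to stack everything into a single data frame with a column saying which layer each row belongs to, and then give each layer a function as its `data` argument so it picks out its own rows. The sketch below assumes a ggplot2 version where a layer's `data` may be a function of the plot data; `combine_layers()`, the `type` column, and the sample coordinates are all made up for illustration.

```r
# Stack segment rows and label rows into one data frame, tagged by `type`.
# combine_layers() is a hypothetical helper, not part of ggplot2 or tree.
combine_layers <- function(segments, labels) {
  seg <- data.frame(type = "segment",
                    x = segments$x, y = segments$y,
                    xend = segments$xend, yend = segments$yend,
                    label = NA_character_,
                    stringsAsFactors = FALSE)
  lab <- data.frame(type = "label",
                    x = labels$x, y = labels$y,
                    xend = NA_real_, yend = NA_real_,
                    label = as.character(labels$label),
                    stringsAsFactors = FALSE)
  rbind(seg, lab)
}

# Made-up coordinates standing in for fortify.tree() and label.tree() output
seg_df <- data.frame(x = c(1, 3), y = c(1, 1), xend = c(1, 3), yend = c(2, 2))
lab_df <- data.frame(x = 2, y = 2, label = "cach")
tree_data <- combine_layers(seg_df, lab_df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tree_data, aes(x = x, y = y)) +
    # Each layer receives the plot data and keeps only its own rows
    geom_segment(data = function(d) d[d$type == "segment", ],
                 aes(xend = xend, yend = yend)) +
    geom_text(data = function(d) d[d$type == "label", ],
              aes(label = label), vjust = -0.5)
  print(p)
}
```

With this layout a fortify method could return the combined frame, at the cost of NA padding in the columns a given layer does not use.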

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/