Scatterplot label overlap problem

Bill Harris

Apr 16, 2010, 5:26:41 PM
to ggp...@googlegroups.com
I'm trying to create a scatterplot that describes certain subjects by
four attributes. I'll use the two Cartesian axes for the two key
attributes and color for the third. As I want to use the subject
names as entries in the scatterplot, I'll highlight the few I want to
emphasize (the fourth attribute) with the size attribute.

I could use colored or shaped points for each subject and use a
legend to associate the aesthetics with the subject names, but, with
ca. 24 subjects, that keeps the viewer jumping back and forth between
the legend and the graph.

My challenge is to make the graph readable. At readable sizes for
subject names, the names tend to land on top of each other. If I
shrink the size of the names to avoid overlapping, I need a magnifying
glass to make sense of them. I've tried (perhaps unskillfully)
jitter, dodge and stack, to no avail. Is there a clever way to
eliminate the overlap?

I'm open to colored dots with their names floating nearby. The dots
wouldn't need to be jittered much if at all, and the color could make
the association, but I don't know how to ensure the names don't
exhibit any overlap.

Here's a sample command sequence:

library(ggplot2)

# Size flags the emphasized subjects; colour encodes Type.
q <- ggplot(stats,
            aes(x = abscissa, y = ordinate, label = SubjectName,
                size = factor(FinalAttribute), color = factor(Type))) +
  scale_size_manual(values = c(2, 3)) +
  scale_colour_brewer(type = "seq", palette = "Set1")
q + geom_text(vjust = 0.5, hjust = -0.25) + geom_point() +
  scale_x_continuous(limits = c(0, 5e6))

I have also had problems with label text running off the plot. I know
I could use subsetting and hjust
(http://stackoverflow.com/questions/1939098/repositioning-scatter-plot-labels-in-ggplot2),
but that seems to make the overlap problem more challenging.
Currently I have widened the limits in scale_x_continuous. Is there a
better way?
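
For reference, a minimal sketch of the hjust idea, written as a small
helper rather than by subsetting. The function name edge_hjust and the
0.75 threshold are assumptions made for illustration, not part of
ggplot2; the helper only decides on which side of its point each label
should sit, and does nothing about overlap.

```r
# Hypothetical helper (not part of ggplot2): pick a horizontal
# justification per label so that text near the right edge is drawn to
# the left of its point instead of running off the plot.
edge_hjust <- function(x, limits = range(x), threshold = 0.75) {
  # Position of each point as a fraction of the x range (0 = left, 1 = right).
  frac <- (x - limits[1]) / (limits[2] - limits[1])
  # hjust > 1 places the label left of the point; hjust < 0 places it right.
  ifelse(frac > threshold, 1.25, -0.25)
}

# Three x positions on the 0 to 5e6 scale used in the command above.
x <- c(5e5, 2.5e6, 4.8e6)
h <- edge_hjust(x, limits = c(0, 5e6))
print(h)
```

The resulting vector could presumably be handed to geom_text() in place
of the fixed hjust = -0.25, one value per row of the data.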

Any advice is welcome.

Thanks,

Bill
--
Bill Harris


--
You received this message because you are subscribed to the ggplot2 mailing list.
To post to this group, send email to ggp...@googlegroups.com
To unsubscribe from this group, send email to
ggplot2+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/ggplot2

hadley wickham

Apr 16, 2010, 5:49:04 PM
to Bill Harris, ggp...@googlegroups.com
Hi Bill,

Have you tried any of the techniques in the direct labels package?
http://directlabels.r-forge.r-project.org/

Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Bill Harris

Apr 16, 2010, 7:29:14 PM
to hadley wickham, Bill Harris, ggp...@googlegroups.com
On Fri, April 16, 2010 2:49 pm, hadley wickham wrote:
> Hi Bill,
>
> Have you tried any of the techniques in the direct labels package?
> http://directlabels.r-forge.r-project.org/

Hadley,

No; thanks for the pointer. I've installed it and will likely try it out
Monday. I'm curious to see if it can handle ~24 different labeled points
on a scatterplot.

Thanks,

Bill

Bill

Apr 19, 2010, 2:13:30 PM
to ggplot2
On Apr 16, 4:29 pm, "Bill Harris" <bill_har...@facilitatedsystems.com>
wrote:
> On Fri, April 16, 2010 2:49 pm, hadley wickham wrote:
> > Hi Bill,
>
> > Have you tried any of the techniques in the direct labels package?
> > http://directlabels.r-forge.r-project.org/

Okay: I'm stuck.

Here's some sample data:

n <- 24

alist <- c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
           "golf", "hotel", "india", "juliett", "kilo", "lima",
           "mike", "november", "oscar", "papa", "quebec", "romeo",
           "sierra", "tango", "uniform", "victor", "whiskey", "xray")

testdata <- data.frame(a = runif(n), b = runif(n), c = alist,
                       d = rbinom(n, 1, 0.5), e = rbinom(n, 3, 0.25))

and a try at ggplot:

# Each line ends with "+" so that R continues the expression.
td1 <- ggplot(testdata,
              aes(x = a, y = b, label = c,
                  size = factor(d), colour = factor(e))) +
  scale_size_manual(values = c(2, 3)) +
  scale_colour_brewer(type = "seq", palette = "Set1")

td1a <- td1 + geom_text(vjust = 0.5, hjust = -0.25) + geom_point()

Depending upon the random numbers you get, some of the data will
overlap. I'm trying to figure out a call to direct.label that spreads
out the labels on the points so that one can read all the labels, but
nothing I've tried so far has seemed to work.

Suggestions?

Dennis Murphy

Apr 19, 2010, 8:22:44 PM
to Bill, ggplot2
Hi:

I don't think directlabels will work the way you intend. The examples on its
home page tend to show labels associated with grouping or conditioning variables
rather than labels for individual points. The author also mentions that several of the
functions in his package don't work yet with ggplot2. I'd look at the examples
posted on the package's web page on R-Forge and decide for yourself... I may
well be wrong.

I suspect you'll have to make do with geom_text() unless you or someone else
can take the label spacing algorithm from directlabels and port it to be compatible
with geom_text(); that may not be trivial, as it needs to be able to sense point position,
point size, label size, sample size and undoubtedly several other plot characteristics before it
can properly set the coordinate position in the graphics region for the label.
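
To give a feel for what such a spacing step involves, here is a rough
sketch in base R. It is not the directlabels algorithm; it is a simple
greedy nudge, and it assumes every label has the same width w and
height h, supplied by hand in data units (the hard part described
above, sensing the real label size, is skipped). The name
spread_labels and all the numbers are made up for illustration.

```r
# Hypothetical helper: greedy vertical de-overlap of equally sized labels.
# x, y: label centres; w, h: label width and height in data units.
spread_labels <- function(x, y, w, h, step = h / 4, max_iter = 1000) {
  y_new <- y
  placed <- integer(0)
  # Place labels from bottom to top, never moving one already placed.
  for (i in order(y)) {
    iter <- 0
    repeat {
      # Two equal boxes overlap if their centres are closer than w
      # horizontally and closer than h vertically.
      hit <- any(abs(x[i] - x[placed]) < w &
                 abs(y_new[i] - y_new[placed]) < h)
      if (!hit || iter >= max_iter) break
      y_new[i] <- y_new[i] + step  # nudge upward and try again
      iter <- iter + 1
    }
    placed <- c(placed, i)
  }
  y_new
}

set.seed(42)  # only so that this sketch is repeatable
x <- runif(24)
y <- runif(24)
y_lab <- spread_labels(x, y, w = 0.15, h = 0.05)
```

The adjusted y_lab values could then be used as the y position of the
text layer while the points stay at y. Labels only ever move upward
here, so some may drift a long way from their points or past the top
of the plot.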

Hope this is of some service,
Dennis

Bill Harris

Apr 20, 2010, 12:45:50 AM
to Dennis Murphy, ggplot2
Dennis Murphy <djm...@gmail.com> writes:

> I don't think directlabels will work in the way you intend it. The
> examples on its home page tend to show labels associated with grouping
> or conditioning variables rather than labels for individual
> points. The author also mentions that several of the functions in his
> package don't work yet in ggplot2. I'd look at the examples posted in
> the package's web page on R-forge and decide for yourself... I may
> well be wrong.

Dennis,

Thanks; I was beginning to suspect that.

> I suspect you'll have to make do with geom_text() unless you or
> someone else can take the label spacing algorithm from directlabels
> and port it to be compatible with geom_text(); that may not be
> trivial, as it needs to be able to sense point position, point size,
> label size, sample size and undoubtedly several other plot
> characteristics before it can properly set the coordinate position in
> the graphics region for the label.

I'm not inclined to try modifying the label spacing algorithm; I'll have
to read up on what options geom_text() offers.

Thanks; I only have to come up with something tomorrow. :-)

Bill
--
Bill Harris http://makingsense.facilitatedsystems.com/
Facilitated Systems Everett, WA 98208 USA
http://www.facilitatedsystems.com/ phone: +1 425 374-1845

Bill Harris

Apr 20, 2010, 12:47:25 AM
to Dennis Murphy, ggplot2
Dennis Murphy <djm...@gmail.com> writes:

> I don't think directlabels will work in the way you intend it. The
> examples on its home page tend to show labels associated with grouping
> or conditioning variables rather than labels for individual
> points. The author also mentions that several of the functions in his
> package don't work yet in ggplot2. I'd look at the examples posted in
> the package's web page on R-forge and decide for yourself... I may
> well be wrong.

Supposedly labcurve from Hmisc does something similar. Does that work
with ggplot2, and does it do what I'm trying to do? I tried to answer
that by reading the docs, but I'm still uncertain.

Bill
--
Bill Harris http://makingsense.facilitatedsystems.com/
Facilitated Systems Everett, WA 98208 USA
http://www.facilitatedsystems.com/ phone: +1 425 374-1845

hadley wickham

Apr 21, 2010, 9:55:11 PM
to Bill Harris, Dennis Murphy, ggplot2
> Supposedly labcurve from Hmisc does something similar.  Does that work
> with ggplot2, and does it do what I'm trying to do?  I tried to answer
> that by reading the docs, but I'm still uncertain.

I doubt it will be compatible with ggplot2. I guess I was thinking of
http://directlabels.r-forge.r-project.org/smart.html, but it seems
like that doesn't work with ggplot2 (yet).

Unfortunately I don't have any other suggestions apart from manually
positioning the labels.
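
One way manual positioning might look, as a sketch only: keep
hand-tuned offsets in a small lookup table keyed by label and add them
to the point coordinates before drawing the text. The data frame pts,
the nudges table and the offset values are all invented for
illustration; the column names a, b and c follow the sample data
earlier in the thread.

```r
# Small stand-in for the sample data; "alpha" and "bravo" nearly coincide.
pts <- data.frame(a = c(0.20, 0.22, 0.80),
                  b = c(0.50, 0.51, 0.30),
                  c = c("alpha", "bravo", "charlie"))

# Hand-tuned offsets (hypothetical values), only for labels that collide.
nudges <- data.frame(c = "bravo", dx = 0, dy = 0.04)

# Left join: labels without an entry get NA offsets, which become 0.
lab <- merge(pts, nudges, by = "c", all.x = TRUE)
lab$dx[is.na(lab$dx)] <- 0
lab$dy[is.na(lab$dy)] <- 0

# Label coordinates are the point coordinates plus the offsets.
lab$lx <- lab$a + lab$dx
lab$ly <- lab$b + lab$dy

# Draw points at (a, b) and text at (lx, ly), if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(lab, aes(x = a, y = b)) +
    geom_point() +
    geom_text(aes(x = lx, y = ly, label = c), hjust = -0.25)
}
```

Only the colliding labels need an entry in the table, so it stays
short, but the offsets still have to be revisited whenever the data
change.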

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Bill Harris

Apr 21, 2010, 11:23:11 PM
to hadley wickham, Dennis Murphy, ggplot2
hadley wickham <h.wi...@gmail.com> writes:

> Unfortunately I don't have any other suggestions apart from manually
> positioning the labels.

Thanks, Hadley.

Bill
--
Bill Harris http://makingsense.facilitatedsystems.com/
Facilitated Systems Everett, WA 98208 USA
http://www.facilitatedsystems.com/ phone: +1 425 374-1845

Harlan Harris

Apr 22, 2010, 11:07:08 AM
to ggplot2
I've had pretty good luck saving ggplots to PDFs, then using Inkscape
to move labels and such around manually. Saves a lot of time and
headache versus trying to trick ggplot into doing stuff like that.

-Harlan

On Apr 21, 11:23 pm, Bill Harris <bill_har...@facilitatedsystems.com>
wrote:
> hadley wickham <h.wick...@gmail.com> writes:
> > Unfortunately I don't have any other suggestions apart from manually
> > positioning the labels.
>
> Thanks, Hadley.
>
> Bill

Bill Harris

Apr 22, 2010, 10:17:24 PM
to Harlan Harris, ggplot2
Harlan Harris <harlan...@gmail.com> writes:

> I've had pretty good luck saving ggplots to PDFs, then using Inkscape
> to move labels and such around manually. Saves a lot of time and
> headache versus trying to trick ggplot into doing stuff like that.

Harlan,

Thanks! That's an interesting idea. Not easily automatable,
unfortunately, which is what I need in this case, but useful in other
cases.

Bill
--
Bill Harris http://makingsense.facilitatedsystems.com/
Facilitated Systems Everett, WA 98208 USA
http://www.facilitatedsystems.com/ phone: +1 425 374-1845
