Problems with factors

Stavros Macrakis

May 9, 2009, 6:07:25 PM
to ggplot2
There seem to be various problems with factors in ggplot....

1) ggplot ignores factor ordering

df <- data.frame(
         a=factor(c("r","y","o","l","a"),levels=letters,ordered=T),
         b=factor(c("l","s","q","b","a"),levels=letters,ordered=T))

ggplot(df) + geom_point(aes(x=a,y=b))

gives a diagonal line.

2) ggplot does not understand ranges in factors

ggplot(df) + geom_point(aes(x=a,y=b)) +
geom_rect(aes(x=a,y=b),xmin="y",xmax="l",ymin="s",ymax="b")

Error in x - from[1] : non-numeric argument to binary operator

Though it allows using their scaled positions (not their as.integer values):

ggplot(df) + geom_point(aes(x=a,y=b)) +
geom_rect(aes(x=a,y=b),xmin=2,xmax=4,ymin=2,ymax=4)

3) geom_path doesn't work with factors

ggplot(df) + geom_path(aes(x=a,y=b))

produces the plot background, but nothing in it.
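
A note on the "scaled positions" in point 2: they appear to be ranks among the levels actually present in the data, not the integer codes of the full 26-level factor. A minimal base-R sketch, assuming the discrete scale drops unused levels (the default in current ggplot2; the 2009 release may have differed). The names codes_all and codes_used are illustrative only.

```r
df <- data.frame(
  a = factor(c("r", "y", "o", "l", "a"), levels = letters, ordered = TRUE),
  b = factor(c("l", "s", "q", "b", "a"), levels = letters, ordered = TRUE))

# Integer codes index into all 26 levels of `letters`
codes_all <- as.integer(df$a)
print(codes_all)    # 18 25 15 12 1

# After dropping unused levels, only a, l, o, r, y remain (in that order),
# so each value's code is its rank among the levels in use
codes_used <- as.integer(droplevels(df$a))
print(codes_used)   # 4 5 3 2 1
```

Under that assumption, xmin=2, xmax=4 on the x axis would span "l" to "r".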

hadley wickham

May 9, 2009, 6:15:29 PM
to macr...@alum.mit.edu, ggplot2
On Sat, May 9, 2009 at 5:07 PM, Stavros Macrakis <macr...@gmail.com> wrote:
>
> There seem to be various problems with factors in ggplot....
>
> 1) ggplot ignores factor ordering
>
> df <- data.frame(
>          a=factor(c("r","y","o","l","a"),levels=letters,ordered=T),
>          b=factor(c("l","s","q","b","a"),levels=letters,ordered=T))
>
> ggplot(df) + geom_point(aes(x=a,y=b))
>
> gives a diagonal line.

Works for me - do you have the latest version of ggplot2 installed?

> 2) ggplot does not understand ranges in factors
>
> ggplot(df) +  geom_point(aes(x=a,y=b)) +
> geom_rect(aes(x=a,y=b),xmin="y",xmax="l",ymin="s",ymax="b")

That's a bug, but it's pretty easy to work around:

ggplot(df) + geom_point(aes(x=a,y=b)) +
geom_rect(aes(x=a,y=b,xmin="y",xmax="l",ymin="s",ymax="b"))

> 3) geom_path doesn't work with factors
>
> ggplot(df) +  geom_path(aes(x=a,y=b))
>
> produces the plot background, but nothing in it.

You need to understand the grouping rules:

ggplot(df) + geom_path(aes(x=a,y=b, group = 1))

By default a group is created for each combination of discrete
variables in the plot. 95% of the time this is what you want -
however, when you have discrete position variables you often need to
override this default.

Hadley

--
http://had.co.nz/
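
The grouping rule described above can be checked without drawing anything. A sketch, assuming a reasonably recent ggplot2 in which layer_data() is available; p_default, p_grouped, n_default and n_grouped are illustrative names, not part of the original post.

```r
library(ggplot2)

df <- data.frame(
  a = factor(c("r", "y", "o", "l", "a"), levels = letters, ordered = TRUE),
  b = factor(c("l", "s", "q", "b", "a"), levels = letters, ordered = TRUE))

# Default grouping: one group per combination of the discrete variables
p_default <- ggplot(df) + geom_path(aes(x = a, y = b))

# Explicit grouping: every row is assigned to the same group
p_grouped <- ggplot(df) + geom_path(aes(x = a, y = b, group = 1))

# Count the distinct groups each layer ends up with after the plot is built
n_default <- length(unique(layer_data(p_default, 1)$group))
n_grouped <- length(unique(layer_data(p_grouped, 1)$group))
print(c(default = n_default, grouped = n_grouped))
```

With one observation per group there is nothing for geom_path to connect, which would explain the empty plot.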

hadley wickham

May 10, 2009, 1:52:56 PM
to macr...@alum.mit.edu, ggplot2
>> 2) ggplot does not understand ranges in factors
>>
>> ggplot(df) +  geom_point(aes(x=a,y=b)) +
>> geom_rect(aes(x=a,y=b),xmin="y",xmax="l",ymin="s",ymax="b")
>
> That's a bug, but it's pretty easy to work around:

Actually, that's not a bug - when you set aesthetics they are not
scaled (otherwise colour="green" would not work the way you expect).

The correct specification would be

geom_rect(aes(x=a,y=b,xmin="y",xmax="l",ymin="s",ymax="b"))

Hadley

--
http://had.co.nz/
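
The set-versus-map distinction can be seen with the colour example mentioned above. A sketch, again assuming a recent ggplot2 with layer_data(); p_set, p_map, col_set and col_map are illustrative names.

```r
library(ggplot2)

df <- data.frame(
  a = factor(c("r", "y", "o", "l", "a"), levels = letters, ordered = TRUE),
  b = factor(c("l", "s", "q", "b", "a"), levels = letters, ordered = TRUE))

# Setting: the value is used as-is, so the points are drawn in green
p_set <- ggplot(df) + geom_point(aes(x = a, y = b), colour = "green")

# Mapping: "green" is treated as a data value and passed through the
# colour scale, so the drawn colour comes from the scale's palette
p_map <- ggplot(df) + geom_point(aes(x = a, y = b, colour = "green"))

col_set <- unique(layer_data(p_set, 1)$colour)
col_map <- unique(layer_data(p_map, 1)$colour)
print(col_set)
print(col_map)
```

The same reasoning applies to xmin="y": outside aes() the string is never passed through the x scale, so it cannot be turned into a position.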

Stavros Macrakis

May 10, 2009, 3:22:36 PM
to hadley wickham, ggplot2

Ah, I think I understand the scaling reasoning, thanks for the
clarification; sorry for my confusion.

I have been thinking of the mapping argument as specifying things that
vary with the data, but in addition, mapping arguments are specified
in data space, while setting arguments are specified in aesthetic
space.

This is an example where a clearer error message would have helped me, e.g.

"y" is not a legal value for xmin, which must be a numeric
aesthetic-space value

or something.

-s

Stavros Macrakis

May 10, 2009, 4:10:48 PM
to ggplot2
On Sun, May 10, 2009 at 3:22 PM, Stavros Macrakis <macr...@gmail.com> wrote:
> On Sun, May 10, 2009 at 1:52 PM, hadley wickham <h.wi...@gmail.com> wrote:
>> ...Actually, that's not a bug - when you set aesthetics they are not
>> scaled (otherwise colour="green" would not work the way you expect).

This must be only for discrete scales?

After all, geom_rect(aes(xmin=X)) behaves the same as
geom_rect(aes(...),xmin=X).

         -s

hadley wickham

May 10, 2009, 4:14:48 PM
to Stavros Macrakis, ggplot2
On Sun, May 10, 2009 at 3:10 PM, Stavros Macrakis <macr...@alum.mit.edu> wrote:

> On Sun, May 10, 2009 at 3:22 PM, Stavros Macrakis <macr...@gmail.com> wrote:
>> On Sun, May 10, 2009 at 1:52 PM, hadley wickham <h.wi...@gmail.com> wrote:
>>> ...Actually, that's not a bug - when you set aesthetics they are not
>>> scaled (otherwise colour="green" would not work the way you expect).
>
> This must be only for discrete scales?
>
> After all, geom_rect(aes(xmin=X)) behaves the same as
> geom_rect(aes(...),xmin=X).

Yes, because numeric X is meaningful in the context of the plot (you
could argue that this shouldn't be the case as it depends on where the
final conversion to screen pixel coordinates occurs).

Hadley

PS. A lot of your messages are coming through to the mailing list twice.

--
http://had.co.nz/

Stavros Macrakis

May 10, 2009, 4:28:19 PM
to hadley wickham, ggplot2
>>>> ...Actually, that's not a bug - when you set aesthetics they are not
>>>> scaled (otherwise colour="green" would not work the way you expect).
>> This must be only for discrete scales?
>> After all, geom_rect(aes(xmin=X)) behaves the same as
>> geom_rect(aes(...),xmin=X).
>
> Yes, because numeric X is meaningful in the context of the plot (you
> could argue that this shouldn't be the case as it depends on where the
> final conversion to screen pixel coordinates occurs).

Well, by that argument, "Health" or "Auckland" or whatever other value
of a factor/string is just as meaningful in the context of the plot,
isn't it? And is unambiguous as an x/y coordinate.

The problem comes when the aesthetic space and the data space can be
of the same class. Shape, for example, is specified using a number
and colour is specified using a string. (Why don't they have their own
classes? Presumably because of tradition and convenience....)

Arguably xmin in setting context should be in aesthetic space (whether
that is 0.0 - 1.0 as the edges of the plot space or pixel number 1-400
or 0.0-2.3 inches or whatever), but that doesn't seem terribly useful.

-s

hadley wickham

May 11, 2009, 5:23:51 PM
to macr...@alum.mit.edu, ggplot2
On Sun, May 10, 2009 at 3:28 PM, Stavros Macrakis <macr...@gmail.com> wrote:
>>>>> ...Actually, that's not a bug - when you set aesthetics they are not
>>>>> scaled (otherwise colour="green" would not work the way you expect).
>>> This must be only for discrete scales?
>>> After all, geom_rect(aes(xmin=X)) behaves the same as
>>> geom_rect(aes(...),xmin=X).
>>
>> Yes, because numeric X is meaningful in the context of the plot (you
>> could argue that this shouldn't be the case as it depends on where the
>> final conversion to screen pixel coordinates occurs).
>
> Well, by that argument, "Health" or "Auckland" or whatever other value
> of a factor/string is just as meaningful in the context of the plot,
> isn't it? And is unambiguous as an x/y coordinate.

Yes, you could make that argument. However, it isn't the way ggplot2
works currently and it's unlikely to change. Getting categorical
scales right is hard enough as it is.

> The problem comes when the aesthetic space and the data space can be
> of the same class.  Shape, for example, is specified using a number
> and colour is specified using a string. (Why don't they have their own
> classes? Presumably because of tradition and convenience....)

Yes, this is just tradition.

> Arguably xmin in setting context should be in aesthetic space (whether
> that is 0.0 - 1.0 as the edges of the plot space or pixel number 1-400
> or 0.0-2.3 inches or whatever), but that doesn't seem terribly useful.

Actually it would be useful - because then you could easily pin geoms
to positions relative to the plot border. I'm not really sure why it
doesn't work this way already - I'll look into it.

I'm still not sure what to do about geoms with no data (e.g.
geom_rect(xmin = 0, ymin = 0, xmax = 1, ymax = 1)). I know I looked
into this a couple of weeks ago and decided it was too hard to make it
work, but maybe I just didn't have a compelling enough reason to make
it work.

Hadley


--
http://had.co.nz/
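
For readers on later ggplot2 releases: the "geom with no data" case can be approximated with annotate(), which builds a layer from fixed values rather than from the plot data. This is a sketch under that assumption, not the behaviour of the 2009 version discussed in this thread; p and rect_layer are illustrative names, and the numeric limits are positions on the discrete scale.

```r
library(ggplot2)

df <- data.frame(
  a = factor(c("r", "y", "o", "l", "a"), levels = letters, ordered = TRUE),
  b = factor(c("l", "s", "q", "b", "a"), levels = letters, ordered = TRUE))

# Points come from df; the rectangle is given directly as fixed positions
p <- ggplot(df) +
  geom_point(aes(x = a, y = b)) +
  annotate("rect", xmin = 2, xmax = 4, ymin = 2, ymax = 4, alpha = 0.2)

# The annotation layer contributes a single rectangle
rect_layer <- layer_data(p, 2)
print(rect_layer[, c("xmin", "xmax", "ymin", "ymax")])
```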
