On 7/16/2014 8:58 AM, eipi10 wrote:
> I was trying to answer a ggplot question on Stack Overflow
> <http://stackoverflow.com/questions/24769934/ggplot-bar-chart-with-two-dataframes?noredirect=1>
> and came up with a result I couldn't explain. I'm hoping someone here will
> have the answer.
I've added an answer to the question there, but I wanted to give more of
a discussion of what is happening here.
> The questioner wanted to plot the following two data frames separately, but
> without having the bars overlap:
>
> x <- data.frame(dat=rep(seq(1,4),3),let=rep("X"))
> y <- data.frame(dat=rep(seq(1,4),4),let=rep("y"))
> He tried the following, but the bars are right on top of each other:
>
> ggplot(NULL,aes(dat))+
> geom_bar(data=y,fill="red",width=0.1,position = "dodge")+
> geom_bar(data=x,fill="blue",width=0.1,position = "dodge")
>
> I thought the following code would solve the problem:
>
> ggplot() +
> geom_bar(data=y, aes(dat - 0.05), fill="red", width=0.2) +
> geom_bar(data=x, aes(dat + 0.05), fill="blue", width=0.2)
>
> Although my code results in the bars being separated, they are not
> positioned symmetrically on either side of each major tick mark, but
> are shifted by different amounts at different tick marks. Can anyone
> explain what's going on and whether there is a way to control bar
> placement relative to each tick mark?
Realize that what you are doing here is shifting the data *before* it is
binned. The shift is large enough to move each value into a neighboring
bin, so you are not really putting two bars at the same value next to
each other; you are moving the data into different (adjacent) bins. The
visual effect is about the same, though.
The uneven shifting you see around the breaks arises because the bins
are not what you think. I'm guessing you think they are at 1, 2, 3, 4,
but there are actually 30 bins spread across the range of x values.
Consider your example
ggplot(NULL,aes(dat))+
geom_bar(data=y, fill="red", position = "dodge") +
geom_bar(data=x, fill="blue", position = "dodge")
The bins on this are
xmin xmax
1 0.9 1.0
2 1.0 1.1
3 1.1 1.2
4 1.2 1.3
5 1.3 1.4
6 1.4 1.5
7 1.5 1.6
8 1.6 1.7
9 1.7 1.8
10 1.8 1.9
11 1.9 2.0
12 2.0 2.1
13 2.1 2.2
14 2.2 2.3
15 2.3 2.4
16 2.4 2.5
17 2.5 2.6
18 2.6 2.7
19 2.7 2.8
20 2.8 2.9
21 2.9 3.0
22 3.0 3.1
23 3.1 3.2
24 3.2 3.3
25 3.3 3.4
26 3.4 3.5
27 3.5 3.6
28 3.6 3.7
29 3.7 3.8
30 3.8 3.9
31 3.9 4.0
32 4.0 4.1
The data all fall in the 2nd, 12th, 22nd, and 32nd bins, and the bins
line up nicely on the round values. That is because the data run from 1
to 4, so the range is 3, which divided into 30 bins gives a binwidth of
0.1. If you put a bin boundary at 0 and then take the relevant
boundaries (plus some padding), you get these breakpoints. But when you
shift the data before binning
ggplot() +
geom_bar(data=y, aes(dat - 0.05), fill="red", width=0.2) +
geom_bar(data=x, aes(dat + 0.05), fill="blue", width=0.2)
the bins are then
xmin xmax
1 0.8266667 0.930000
2 0.9300000 1.033333
3 1.0333333 1.136667
4 1.1366667 1.240000
5 1.2400000 1.343333
6 1.3433333 1.446667
7 1.4466667 1.550000
8 1.5500000 1.653333
9 1.6533333 1.756667
10 1.7566667 1.860000
11 1.8600000 1.963333
12 1.9633333 2.066667
13 2.0666667 2.170000
14 2.1700000 2.273333
15 2.2733333 2.376667
16 2.3766667 2.480000
17 2.4800000 2.583333
18 2.5833333 2.686667
19 2.6866667 2.790000
20 2.7900000 2.893333
21 2.8933333 2.996667
22 2.9966667 3.100000
23 3.1000000 3.203333
24 3.2033333 3.306667
25 3.3066667 3.410000
26 3.4100000 3.513333
27 3.5133333 3.616667
28 3.6166667 3.720000
29 3.7200000 3.823333
30 3.8233333 3.926667
31 3.9266667 4.030000
32 4.0300000 4.133333
33 4.1333333 4.236667
This is consistent. The data go from 0.95 to 4.05, a range of 3.1.
Divide that into 30 bins for a binwidth of 0.1033333. All these
breakpoints are multiples of that binwidth, but that does not line them
up nicely on the integers.
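To make the arithmetic concrete, here is a rough sketch in base R of the break calculation described above. The helper name bin_breaks is hypothetical, and the one-bin padding on each side is an assumption chosen to match the tables shown here; it is not taken from the ggplot2 source.

```r
# Hypothetical helper approximating the default 30-bin break calculation:
# binwidth = range / 30, breaks at multiples of the binwidth.
bin_breaks <- function(values, bins = 30) {
  rng <- range(values)
  binwidth <- diff(rng) / bins
  eps <- 1e-8  # small tolerance to guard against floating-point noise
  # Assumption: pad by one bin on each side of the data range.
  lo <- floor(rng[1] / binwidth + eps) - 1
  hi <- ceiling(rng[2] / binwidth - eps) + 1
  list(binwidth = binwidth, breaks = seq(lo, hi) * binwidth)
}

unshifted <- bin_breaks(c(1, 4))        # data as given
shifted   <- bin_breaks(c(0.95, 4.05))  # data after the +/- 0.05 shift

print(unshifted$binwidth)
print(range(unshifted$breaks))
print(shifted$binwidth)
print(range(shifted$breaks))
```

With the unshifted data the breaks land on multiples of 0.1, so the integers are themselves bin boundaries; with the shifted data the breaks are multiples of 3.1/30, which no longer coincide with the integers.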
Note that this also relies on the shift you are giving (0.05) being less
than the computed binwidth (0.1033333), so that some data is shifted
down by no more than one bin and the other data is not shifted out of
its original bin.
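The uneven placement around each tick mark can be checked with a few lines of base R. This is only a sketch: it assumes left-closed bins whose boundaries are multiples of the binwidth and that each bar is centered on its bin's midpoint. The helper bin_mid is a hypothetical name.

```r
binwidth <- (4.05 - 0.95) / 30   # about 0.1033333
ticks <- 1:4

# Index of the bin holding a value, assuming boundaries at multiples of binwidth.
bin_index <- function(v) floor(v / binwidth)
# Midpoint of that bin (assumed to be where the bar is centered).
bin_mid <- function(v) (bin_index(v) + 0.5) * binwidth

red  <- bin_mid(ticks - 0.05)   # data set shifted down by 0.05
blue <- bin_mid(ticks + 0.05)   # data set shifted up by 0.05

# Offset of each bar's center from its tick mark.
offsets <- data.frame(tick = ticks, red = red - ticks, blue = blue - ticks)
print(round(offsets, 3))
```

At every tick the two shifted values land in adjacent bins, so the bars stay separated, but the offsets of the bin midpoints from the integers differ from tick to tick, which is the asymmetry described in the question.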
> I realize the "standard" ggplot solution would be to rbind the two
> data frames and use a fill aesthetic to get the two separate bars
> properly dodged (which is discussed in the SO question). However, if,
> for some reason, you want to maintain two separate data frames, is
> there some way to control bar placement when plotting the two data
> frames with separate calls to geom_bar?
The answer I gave at stackoverflow was
ggplot(mapping=aes(x=dat))+
geom_bar(data=y, aes(x=dat-0.1), fill="red", binwidth=0.1)+
geom_bar(data=x, fill="blue", binwidth=0.1)
This makes specific use of the shifting of data into different bins and
is explicit about the binwidth. Because one set of data is shifted by
exactly one binwidth (0.1), the bars are guaranteed to be separated. For
this to work, the binwidth must be smaller than the separation between
unique values and the values need to be multiples of the binwidth. If
the data is not quasi-discrete, this approach will not work, since there
won't be empty bins to shift the other data set(s) into. It will,
however, generalize to more than 2 sets (with the shifts being multiples
of the binwidth), so long as the product of the number of sets and the
binwidth is still less than the separation between values.
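As a sketch of that generalization, the shifts for several data sets could be computed as below. The function bin_shifts and its arguments are hypothetical names for illustration and not part of ggplot2. The sign of the shift is a free choice; the answer above shifts one set by -0.1 instead.

```r
# Hypothetical helper: one shift per data set, each a multiple of the binwidth.
bin_shifts <- function(n_sets, binwidth, separation) {
  # There must be room for n_sets adjacent bins between neighboring values.
  if (n_sets * binwidth >= separation) {
    stop("n_sets * binwidth must be less than the separation between values")
  }
  # Set k is shifted by (k - 1) binwidths; the first set stays in place.
  (seq_len(n_sets) - 1) * binwidth
}

# Three data sets with integer-spaced values and a binwidth of 0.1.
shifts <- bin_shifts(3, binwidth = 0.1, separation = 1)
print(shifts)
```

Each shift would then be added to (or subtracted from) dat inside the aes() of the corresponding layer, with every layer using the same binwidth.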
> Thanks,
> Joel
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University