See inline in several places.
On 6/21/2014 3:11 AM, Ben Bond-Lamberty wrote:
> Here's a solution, but if anyone understands why my first attempt
> (using stat='identity', see below) doesn't work, I'd appreciate
> knowing. Anyway, hope this works for you.
>
> library(ggplot2)
> library(hexbin)
> df <- data.frame(x=runif(1000),y=runif(1000)) # sample data
>
> # pre-summarise the data
> hexinfo <- hexbin(df$x,df$y,xbins=10)
> df_hex <- data.frame(x=hexinfo@xcm,y=hexinfo@ycm,count=hexinfo@count)
>
> # suppress too-small values and plot
> ggplot(subset(df_hex,count>3),aes(x,y,fill=count))+geom_hex(stat="identity")
>
> # I don't understand why this doesn't work. The hexagons are actually being
> # plotted (try color='red' to see them) but are very tiny; something about the
> # scale is screwed up. geom_hex bug? My mistake?
The problem here is how you extract the x/y coordinates of the hexagons
from hexinfo. xcm and ycm are the center of mass of the data in each
hexagon, not the center of the hexagon, so those points do not lie on a
regular hexagonal grid. When geom_hex plots them, it works out what size
of hexagonal grid would place all of those points on grid positions, and
only a very fine grid does that, hence the tiny hexagons. Here is the
proper way to extract the centers of the hexagons for ggplot to use:
df_hex2 <- data.frame(hcell2xy(hexinfo), count=hexinfo@count)
# now this works
ggplot(df_hex2,aes(x,y,fill=count)) +
  geom_hex(stat="identity")
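To make the difference between the two sets of coordinates concrete, here is a small sketch that compares them directly. It uses only hexbin; the object names (pts, hb, centers, cmp) are mine for illustration and are not from Ben's code.

```r
library(hexbin)

set.seed(1)
pts <- data.frame(x = runif(1000), y = runif(1000))
hb <- hexbin(pts$x, pts$y, xbins = 10)

# hcell2xy() returns the geometric centers of the occupied cells
centers <- hcell2xy(hb)

# Side by side: cell centers vs. centers of mass of the points in each cell
cmp <- data.frame(center_x = centers$x, mass_x = hb@xcm,
                  center_y = centers$y, mass_y = hb@ycm)
head(cmp)

# The centers fall on a handful of distinct rows (a regular grid);
# the centers of mass take an essentially arbitrary y value per cell.
n_rows_centers <- length(unique(round(centers$y, 8)))
n_rows_mass <- length(unique(round(hb@ycm, 8)))
c(centers = n_rows_centers, mass = n_rows_mass)
```

The second count should come out much larger than the first, which is why the xcm/ycm points cannot be matched to a coarse grid.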
> # Different approach - map hex cell counts back to original data
> hexinfo <- hexbin(df$x,df$y,xbins=10,IDs=TRUE)
> df$cell <- hexinfo@cID
> df$hexcount <- hexinfo@count[df$cell]
>
> ggplot(subset(df,hexcount>2),aes(x,y))+geom_hex() # compare these two plots
> ggplot(subset(df,hexcount>4),aes(x,y))+geom_hex()
>
> Ben
>
> On Fri, Jun 20, 2014 at 1:05 AM, Corey N <nilescbn-Re5JQE...@public.gmane.org> wrote:
>> I have plotted some data using geom_hex and geom_bin2d, which are fantastic.
>> In short, I am plotting the locations, by depth and latitude, at which
>> fishing vessels deployed fishing gear.
>>
>> Unfortunately I cannot publically display bins that contain values less than
>> 3 because of legal reasons.
>>
>> I have tried the following to mask the bins with less than 3 records:
>>
>> 1) using scale_fill_continuous to set the low value color to white, yet this
>> masks more bins than necessary
You could use scale_fill_gradientn to make everything below 3 the same
color (setting breaks at 0, 3, and whatever the max is, and mapping those
points to the colors white, white, and blue(?)). However, see the next
point for a better alternative.
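A rough sketch of that gradientn idea, in case it helps. The sample data and binwidth are made up for illustration, and max_count is an assumption (I reuse 17, the upper limit used elsewhere in this message); rescale() comes from the scales package.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed
library(scales)   # for rescale()

set.seed(123)
df <- data.frame(x = runif(1000), y = runif(1000))

# ASSUMPTION: the largest bin count is taken to be 17; adjust this to the
# maximum in your own data. Counts above it would fall outside the limits.
max_count <- 17

# gradientn wants the color stop positions on a 0-1 scale
stops <- rescale(c(0, 3, max_count))

p <- ggplot(df, aes(x, y)) +
  geom_hex(binwidth = c(0.1, 0.1)) +
  scale_fill_gradientn(colours = c("white", "white", "blue"),
                       values = stops,
                       limits = c(0, max_count))
p
```

Note that the low-count hexagons are still drawn, only in white, so on the default gray panel background they remain visible as white shapes. That is one reason I prefer the approach in the next point.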
>> 2) using scale_fill_continuous to limit the values to those greater than 3,
>> yet this appears to turn those cells with fewer than 3 records to a dark
>> gray (which makes them even more visible).
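This is the right approach, but with one additional step. Values outside
the limits are treated as missing and drawn in the scale's na.value
color, which defaults to a dark gray. Set that missing value to clear:
scale_fill_continuous(limits=c(3,17), na.value=NA)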
Unfortunately, if you set the lower limit, you also have to set the
upper limit; there is no way to ask for whatever it would have been by
default.
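One way to avoid typing the upper limit by hand is to build the plot once and read the largest count back out of it. This is only a sketch: the data and binwidth are illustrative, and it assumes the built layer data has a column named count (which is what ..count.. refers to).

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed

set.seed(123)
df <- data.frame(x = runif(1000), y = runif(1000))

# Build the plot once without any limits on the fill scale
p <- ggplot(df, aes(x, y)) +
  geom_hex(binwidth = c(0.1, 0.1))

# ggplot_build() exposes the data computed by the stat; the first layer
# holds one row per hexagon. ASSUMPTION: the count column is named "count".
max_count <- max(ggplot_build(p)$data[[1]]$count)

# Now both limits can be given, with the upper one computed
p + scale_fill_continuous(limits = c(3, max_count), na.value = NA)
```

Newer ggplot2 releases document that an NA in limits means "use the existing minimum or maximum", so limits = c(3, NA) may make this unnecessary there; I have not checked which version introduced that.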
>> 3) using scale_fill_discrete, which produces an error.
Right. Because it is continuous.
>> I see that geom_bin2d has a drop argument for zero values. I'm after the
>> same functionality but I want to drop cells where the count is less than 3.
>>
>> I believe I could use the hexbin package to eliminate cells with fewer than
>> 3 records yet I'd prefer to find a solution in ggplot2.
>>
>> I'm relatively inexperienced in R and ggplot2, so apologies if this question
>> is vague. Suggestions would be very much appreciated. My code is pasted
>> below. I could not easily figure out how to generate a fake dataset to
>> simulate the same circumstances.
>>
>>
>>
>> lat.depth <- ggplot(data, aes(y = AVG_LAT, x = AVG_DEPTH,)
>>
>> lat.depth + geom_hex(bins = 20) +
>>
>> scale_fill_continuous(limits = c(3, 500))
>>
Here is an example using all the approaches:
library("ggplot2")
library("hexbin")
set.seed(123)
# sample data, following Ben
df <- data.frame(x=runif(1000),y=runif(1000))
# The basic plot, using just ggplot. Setting the sizes using binwidth
# rather than bins because it will be easier this way to match
# the other approaches.
# Adding a text annotation so that we can verify exactly what is in
# each hex. Also setting coord_equal since this example is a square
# grid and this will keep proportions the same. Both of these are
# there for showing what is going on (and making sure that the right
# thing is happening) and would not (necessarily) be part of the
# final plot.
ggplot(df, aes(x, y, fill = ..count..)) +
  geom_hex(binwidth=c(0.1,0.1)) +
  geom_text(aes(label=..count..), stat="binhex", binwidth=c(0.1,0.1),
            colour="white") +
  coord_equal()
# pre-summarise the data
hexinfo <- hexbin(df$x, df$y, xbins=10, xbnds=c(0,1), ybnds=c(0,1))
# Extract the _centers_ of the hexagons as the x and y coordinates
df_hex <- data.frame(hcell2xy(hexinfo), count=hexinfo@count)
# This is identical to the previous plot; before using this technique
# to subset, I want to make sure I can reproduce it without the subset.
ggplot(df_hex,aes(x,y,fill=count)) +
  geom_hex(stat="identity") +
  geom_text(aes(label=count), colour="white") +
  coord_equal()
# suppress too-small values (counts below 3) and plot
ggplot(subset(df_hex,count>=3),aes(x,y,fill=count)) +
  geom_hex(stat="identity") +
  geom_text(aes(label=count), colour="white") +
  coord_equal()
# However, we do not need to pre-summarize the data. Use the
# scale to enforce the limit. The downside to this is that
# the upper limit must also be specified (manually). Note
# setting na.value in addition to setting the limits.
ggplot(df, aes(x, y, fill = ..count..)) +
  geom_hex(binwidth=c(0.1,0.1)) +
  geom_text(aes(label=..count..), stat="binhex", binwidth=c(0.1,0.1),
            colour="white") +
  scale_fill_continuous(limits=c(3,17), na.value=NA) +
  coord_equal()
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University