setting minimum value for displaying bins in geom_hex and geom_bin2d


Corey N

Jun 20, 2014, 1:05:17 AM
to ggp...@googlegroups.com

I have plotted some data using geom_hex and geom_bin2d, which are fantastic. In short, I am plotting the locations, by depth and latitude, at which fishing vessels deployed fishing gear.

 

Unfortunately, for legal reasons I cannot publicly display bins that contain fewer than 3 records.

  

I have tried the following to mask the bins with fewer than 3 records:

 

1) using scale_fill_continuous to set the low value color to white, yet this masks more bins than necessary

 

2) using scale_fill_continuous to limit the values to those greater than 3, yet this appears to turn those cells with fewer than 3 records to a dark gray (which makes them even more visible).

 

3) using scale_fill_discrete, which produces an error.

 

I see that geom_bin2d has a drop argument for zero values. I'm after the same functionality but I want to drop cells where the count is less than 3.

 

I believe I could use the hexbin package to eliminate cells with fewer than 3 records yet I'd prefer to find a solution in ggplot2.

 

I'm relatively inexperienced in R and ggplot2, so apologies if this question is vague. Suggestions would be very much appreciated. My code is pasted below. I could not easily figure out how to generate a fake dataset to simulate the same circumstances.

 

lat.depth <- ggplot(data, aes(y = AVG_LAT, x = AVG_DEPTH))

lat.depth + geom_hex(bins = 20) +

   scale_fill_continuous(limits = c(3, 500))

Ben Bond-Lamberty

Jun 21, 2014, 6:11:22 AM
to Corey N, ggplot2
Here's a solution, but if anyone understands why my first attempt
(using stat='identity', see below) doesn't work, I'd appreciate
knowing. Anyway, hope this works for you.

library(ggplot2)
library(hexbin)
df <- data.frame(x=runif(1000),y=runif(1000)) # sample data

# pre-summarise the data
hexinfo <- hexbin(df$x,df$y,xbins=10)
df_hex <- data.frame(x=hexinfo@xcm,y=hexinfo@ycm,count=hexinfo@count)

# suppress too-small values and plot
ggplot(subset(df_hex,count>3),aes(x,y,fill=count))+geom_hex(stat="identity")

# I don't understand why this doesn't work. The hexagons are actually being
# plotted (try color='red' to see them) but are very tiny; something about the
# scale is screwed up. geom_hex bug? My mistake?

# Different approach - map hex cell counts back to original data
hexinfo <- hexbin(df$x,df$y,xbins=10,IDs=TRUE)
df$cell <- hexinfo@cID
df$hexcount <- hexinfo@count[df$cell]

ggplot(subset(df,hexcount>2),aes(x,y))+geom_hex() # compare these two plots
ggplot(subset(df,hexcount>4),aes(x,y))+geom_hex()

Ben

Brian Diggs

Jun 23, 2014, 3:50:01 PM
to Ben Bond-Lamberty, Corey N, ggplot2
See inline in several places.

On 6/21/2014 3:11 AM, Ben Bond-Lamberty wrote:
> Here's a solution, but if anyone understands why my first attempt
> (using stat='identity', see below) doesn't work, I'd appreciate
> knowing. Anyway, hope this works for you.
>
> library(ggplot2)
> library(hexbin)
> df <- data.frame(x=runif(1000),y=runif(1000)) # sample data
>
> # pre-summarise the data
> hexinfo <- hexbin(df$x,df$y,xbins=10)
> df_hex <- data.frame(x=hexinfo@xcm,y=hexinfo@ycm,count=hexinfo@count)
>
> # suppress too-small values and plot
> ggplot(subset(df_hex,count>3),aes(x,y,fill=count))+geom_hex(stat="identity")
>
> # I don't understand why this doesn't work. The hexagons are actually being
> # plotted (try color='red' to see them) but are very tiny; something about the
> # scale is screwed up. geom_hex bug? My mistake?

The problem here is how you extract the x/y coordinates of the hexagons
from hexinfo. xcm and ycm are the center of mass of the data in each
hexagon, not the center of the hexagon, so those points do not lie on a
regular hexagonal grid. When geom_hex goes to plot them, it works out what
size of hex grid would put all of those points on a regular grid, and only
a very fine grid will do that, which is why the hexagons come out tiny.
Here is the proper way to extract the centers of the hexagons for ggplot
to use:

df_hex2 <- data.frame(hcell2xy(hexinfo), count=hexinfo@count)

# now this works

ggplot(df_hex2,aes(x,y,fill=count)) +
  geom_hex(stat="identity")
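
To see the difference concretely, here is a small sketch that compares the center-of-mass slots with the cell centers returned by hcell2xy. It reuses the sample data from above; the names centers and offset are only illustrative.

```r
library(hexbin)

set.seed(123)
df <- data.frame(x = runif(1000), y = runif(1000))  # sample data, as above
hexinfo <- hexbin(df$x, df$y, xbins = 10)

# Cell centers: these lie on a regular hexagonal lattice
centers <- hcell2xy(hexinfo)

# Distance between each cell's center of mass and its geometric center
offset <- sqrt((hexinfo@xcm - centers$x)^2 + (hexinfo@ycm - centers$y)^2)
summary(offset)

# The centers share a small set of distinct y values (one per row of
# hexagons), whereas the centers of mass are scattered
n_rows_centers <- length(unique(round(centers$y, 8)))
n_rows_mass    <- length(unique(round(hexinfo@ycm, 8)))
c(centers = n_rows_centers, mass = n_rows_mass)
```

If the two counts differ widely, that is the irregularity that leaves geom_hex(stat="identity") unable to find a sensible grid.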


> # Different approach - map hex cell counts back to original data
> hexinfo <- hexbin(df$x,df$y,xbins=10,IDs=TRUE)
> df$cell <- hexinfo@cID
> df$hexcount <- hexinfo@count[df$cell]
>
> ggplot(subset(df,hexcount>2),aes(x,y))+geom_hex() # compare these two plots
> ggplot(subset(df,hexcount>4),aes(x,y))+geom_hex()
>
> Ben
>
> On Fri, Jun 20, 2014 at 1:05 AM, Corey N <nilescbn-Re5JQE...@public.gmane.org> wrote:
>> I have plotted some data using geom_hex and geom_bin2d, which are fantastic.
>> In short, I am plotting the locations, by depth and latitude, at which
>> fishing vessels deployed fishing gear.
>>
>> Unfortunately I cannot publicly display bins that contain values less than
>> 3 because of legal reasons.
>>
>> I have tried the following to mask the bins with less than 3 records:
>>
>> 1) using scale_fill_continuous to set the low value color to white, yet this
>> masks more bins than necessary

You could use scale_fill_gradientn to make everything below 3 the same
color (setting break points at 0, 3, and whatever the max is, and mapping
those points to the colors white, white, and blue(?)). However, see the
next point for a better alternative.
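
For reference, a rough sketch of that gradientn idea. In scale_fill_gradientn the color positions go in the values argument, on a 0-1 scale, so rescale() from the scales package is used here. The max_count of 17 is an assumption taken from this sample data and would need adjusting for real data.

```r
library(ggplot2)
library(scales)  # for rescale()

set.seed(123)
df <- data.frame(x = runif(1000), y = runif(1000))  # sample data

max_count <- 17  # assumed maximum bin count; adjust to your data

p <- ggplot(df, aes(x, y)) +
  geom_hex(binwidth = c(0.1, 0.1)) +
  scale_fill_gradientn(
    colours = c("white", "white", "blue"),
    values  = rescale(c(0, 3, max_count)),  # color positions on a 0-1 scale
    limits  = c(0, max_count)
  )
p
```

Note that bins with counts just above 3 come out very pale with this approach, since the gradient starts from white at 3.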

>> 2) using scale_fill_continuous to limit the values to those greater than 3,
>> yet this appears to turn those cells with fewer than 3 records to a dark
>> gray (which makes them even more visible).

This is the right approach, but with one additional step: set the
missing value to clear.

scale_fill_continuous(limits=c(3,17), na.value=NA)

Unfortunately, if you set the lower limit, you also have to set the
upper limit; there is no way to tell the scale to use whatever the upper
limit would have been.
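
One possible workaround is to build the plot once, read off the largest count, and pass that as the upper limit. This is only a sketch, and it assumes that the layer data returned by ggplot_build() carries a count column for geom_hex. The names base and upper are only illustrative.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(x = runif(1000), y = runif(1000))  # sample data

base <- ggplot(df, aes(x, y)) +
  geom_hex(binwidth = c(0.1, 0.1))

# ggplot_build() runs the stat, so the computed bin counts can be read back
upper <- max(ggplot_build(base)$data[[1]]$count)

# Bins with fewer than 3 records fall outside the limits and are drawn
# with na.value, i.e. not at all
p <- base + scale_fill_continuous(limits = c(3, upper), na.value = NA)
p
```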

>> 3) using scale_fill_discrete, which produces an error.

Right, because the count being mapped to fill is continuous and
scale_fill_discrete only accepts discrete values.

>> I see that geom_bin2d has a drop argument for zero values. I'm after the
>> same functionality but I want to drop cells where the count is less than 3.
>>
>> I believe I could use the hexbin package to eliminate cells with fewer than
>> 3 records yet I'd prefer to find a solution in ggplot2.
>>
>> I'm relatively inexperienced in R and ggplot2, so apologies if this question
>> is vague. Suggestions would be very much appreciated. My code is pasted
>> below. I could not easily figure out how to generate a fake dataset to
>> simulate the same circumstances.
>>
>>
>>
>> lat.depth <- ggplot(data, aes(y = AVG_LAT, x = AVG_DEPTH))
>>
>> lat.depth + geom_hex(bins = 20) +
>>
>> scale_fill_continuous(limits = c(3, 500))
>>

Here is an example using all the approaches:

library("ggplot2")
library("hexbin")
set.seed(123)
# sample data, following Ben
df <- data.frame(x=runif(1000),y=runif(1000))

# The basic plot, using just ggplot. Setting the sizes using binwidth
# rather than bins because it will be easier this way to match
# the other approaches.
# Adding a text annotation so that we can verify exactly what is in
# each hex. Also setting coord_equal since this example is a square
# grid and this will keep proportions the same. Both of these are
# there for showing what is going on (and making sure that the right
# thing is happening) and would not (necessarily) be part of the
# final plot.

ggplot(df, aes(x, y, fill = ..count..)) +
  geom_hex(binwidth=c(0.1,0.1)) +
  geom_text(aes(label=..count..), stat="binhex", binwidth=c(0.1,0.1),
            colour="white") +
  coord_equal()

# pre-summarise the data
hexinfo <- hexbin(df$x, df$y, xbins=10, xbnds=c(0,1), ybnds=c(0,1))
# Extract the _centers_ of the hexagons as the x and y coordinates
df_hex <- data.frame(hcell2xy(hexinfo), count=hexinfo@count)

# This is identical to the previous plot; before using this technique
# to subset, I want to make sure I can reproduce it without the subset.
ggplot(df_hex,aes(x,y,fill=count)) +
  geom_hex(stat="identity") +
  geom_text(aes(label=count), colour="white") +
  coord_equal()

# suppress too-small values (counts below 3) and plot
ggplot(subset(df_hex,count>=3),aes(x,y,fill=count)) +
  geom_hex(stat="identity") +
  geom_text(aes(label=count), colour="white") +
  coord_equal()


# However, we do not need to pre-summarize the data. Use the
# scale to enforce the limit. The downside to this is that
# the upper limit must also be specified (manually). Note
# setting na.value in addition to setting the limits.
ggplot(df, aes(x, y, fill = ..count..)) +
  geom_hex(binwidth=c(0.1,0.1)) +
  geom_text(aes(label=..count..), stat="binhex", binwidth=c(0.1,0.1),
            colour="white") +
  scale_fill_continuous(limits=c(3,17), na.value=NA) +
  coord_equal()

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University