How to get a tick at 0 using scale_x_sqrt

Woody Setzer

Jun 26, 2014, 10:27:05 AM
to ggplot2
I am plotting count data that are strongly skewed but include 0s, so I want to use a square root scale on x and y so that the small values are not so crowded together. I would like a tick at 0 to make it clear where the bottom of the range is, but scale_x_sqrt refuses to plot one for me. Here is an example:

  library(ggplot2)

  ## Construct the test data set
  x <- rlnorm(30)
  x <- (x - min(x)) / max(x) * 700
  y <- rlnorm(30)
  y <- (y - min(y)) / max(y) * 700
  tdta <- data.frame(x=x, y=y)

  ## Plot it naively, accepting defaults:
  p <- ggplot(data=tdta, aes(x=x, y=y)) + geom_point() + scale_x_sqrt() + scale_y_sqrt()
  print(p)
  ## No tick at 0, just at 200, 400, 600.

  ## If I explicitly add breaks, there is no change.
  p <- ggplot(data=tdta, aes(x=x, y=y)) + geom_point() +
    scale_x_sqrt(breaks=c(0,200,400,600)) +
    scale_y_sqrt(breaks=c(0,200,400,600))
  print(p)

How can I get a tick-mark at 0?

Thanks!

Woody Setzer

Brandon Hurr

Jun 26, 2014, 11:36:06 AM
to Woody Setzer, ggplot2
No idea how, but I'll add that the smallest whole number it will accept as a break is 2, and I've got it down to 1.674027. I think it may relate to the padding that ggplot adds to the edges of a plot. Get too close and you're in negative numbers, which only my imaginary friends can understand.

That's my hypothesis anyway. 
--
--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: https://github.com/hadley/devtools/wiki/Reproducibility
 
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

---
You received this message because you are subscribed to the Google Groups "ggplot2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ggplot2+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brian Diggs

Jun 26, 2014, 12:27:35 PM
to Woody Setzer, ggplot2
Short answer:

You need to add an expand=c(0,0) argument to both the scale_x_sqrt and
scale_y_sqrt calls.
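
As a minimal sketch of the short answer, reusing the test data from the
original post (the set.seed call is an addition so that the example is
repeatable):

```r
library(ggplot2)

set.seed(1)  # assumption: added only to make the random data repeatable
x <- rlnorm(30)
x <- (x - min(x)) / max(x) * 700
y <- rlnorm(30)
y <- (y - min(y)) / max(y) * 700
tdta <- data.frame(x = x, y = y)

# expand = c(0, 0) turns off the default padding, so the lower limit stays at
# exactly 0 and the 0 break is no longer outside the limits.
p <- ggplot(data = tdta, aes(x = x, y = y)) +
  geom_point() +
  scale_x_sqrt(breaks = c(0, 200, 400, 600), expand = c(0, 0)) +
  scale_y_sqrt(breaks = c(0, 200, 400, 600), expand = c(0, 0))
print(p)
```

The trade-off is that points at the extremes are drawn right on the
panel border, since there is no padding left.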

Long answer:

This was tricky. My first few (standard) answers for this (using
expand_limits(x=0, y=0) or setting the limits in the scale call) didn't
work. I decided it had to do with the scale's sqrt transformation because
it worked fine with aes(x=sqrt(x), y=sqrt(y)).

By this point, I was using a different test data set:

tdta <- data.frame(x = seq(0,2,by=0.1),
                   y = seq(0,2,by=0.1))


I finally figured out what was happening. By default, the scale is
expanded by 5% of the range in each direction to give some padding
between the extreme data/labels and the edge of the plot. On the
transformed scale this expansion gives a negative lower limit. When that
limit is inverted back to the data scale it is squared, so it becomes a
positive number (greater than 0), which puts the 0 break outside the
limits.

library("scales")
# specifying limits of 0 to 2
sqrt(c(0,2))
expand_range(sqrt(c(0,2)), 0.05, 0)
# new limits after expanding and transforming back; now excludes 0
expand_range(sqrt(c(0,2)), 0.05, 0)^2

In the transformed space, negative numbers have no meaning, and they
should not be mapped back to just their square. Mapping them all to
zero solves this problem, but requires creating a new transformation.

library("scales")
mysqrt_trans <- function() {
  trans_new("mysqrt",
            transform = base::sqrt,
            inverse = function(x) ifelse(x<0, 0, x^2),
            domain = c(0, Inf))
}

Then you can specify this transformation be used with

ggplot(data=tdta, aes(x=x, y=y)) +
  geom_point() +
  scale_x_continuous(trans="mysqrt") +
  scale_y_continuous(trans="mysqrt")

which gives a nicer looking graph with the expected padding and breaks
and 0 shows up as a break.
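
A quick check of the inverse defined above, as a sketch. The input
values are made up for illustration, with -1.32 standing in for a padded
lower limit on the transformed scale:

```r
library(scales)

# Same transformation as above, repeated so this snippet runs on its own.
mysqrt_trans <- function() {
  trans_new("mysqrt",
            transform = base::sqrt,
            inverse = function(x) ifelse(x < 0, 0, x^2),
            domain = c(0, Inf))
}

tr <- mysqrt_trans()

# Negative values on the transformed scale now map back to 0 rather than to
# their square, so a padded lower limit no longer lands above the 0 break.
tr$inverse(c(-1.32, 0, 2))  # 0 0 4
```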

> Thanks!
>
> Woody Setzer

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Brian Diggs

Jun 26, 2014, 12:44:24 PM
to Brandon Hurr, Woody Setzer, ggplot2
On 6/26/2014 8:36 AM, Brandon Hurr wrote:
> No idea how but I'll add that the smallest whole number it will accept is 2
> and I've got it down to 1.674027. I think it may relate to the padding that
> ggplot adds to the edges of a plot. You get too close and you're in
> negative numbers which only my imaginary friends can understand.
>
> That's my hypothesis anyway.

You were right. It was interaction between the padding (expand argument)
and the scale transformation.

As to why 1.674027, it depends on the range of the original data (which
can differ since the data was randomly generated). In the example given,
the upper bound can't be more than 700. Were it 700, the derivation goes

limits in data scale: 0, 700
limits in transformed scale: 0, 26.457 (square root of 0, 700)
expanded limits in transformed scale: -1.32287, 27.78038 (extending by
5% of the range each way)
expanded limits in data scale: 1.75, 771.75 (squaring previous numbers)

So any breaks less than 1.75 are "outside" the limits and so aren't drawn.

(I have a workaround in a separate answer, but thought you might be
interested as to why such an odd number was the limit).
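
The derivation above can be reproduced in a few lines of base R. This is
only a sketch: lower_limit_after_padding is a hypothetical helper name,
and the 5% padding is the default described above. Because the result is
always 0.0025 times the upper data limit, a cutoff of 1.674027 would
correspond to a data maximum of about 669.6.

```r
# Hypothetical helper: where the lower limit ends up, in data units, after the
# sqrt scale is padded by `mul` of its range and squared back.
lower_limit_after_padding <- function(upper, mul = 0.05) {
  lim_trans <- sqrt(c(0, upper))        # limits on the transformed scale
  pad <- mul * diff(lim_trans)          # padding added on each side
  expanded <- lim_trans + c(-pad, pad)  # expanded limits; the lower one is < 0
  expanded[1]^2                         # plain squaring loses the sign
}

lower_limit_after_padding(700)       # 1.75
lower_limit_after_padding(669.6108)  # about 1.674
```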

Woody Setzer

Jun 26, 2014, 12:49:12 PM
to Brandon Hurr, ggplot2
Adding geom_hline(yintercept=0) + geom_vline(xintercept=0) does indeed add the lines, so it looks like it is the axis labeling code that does not like the tick at 0. Maybe I don't understand what you meant by padding.

Woody

Brandon Hurr

Jun 26, 2014, 12:53:19 PM
to Woody Setzer, ggplot2
Brian explained why it was doing it and a fix for you. Would be nice to push that as a bug, but I realize there is little further development planned for ggplot2. 

Woody Setzer

Jun 26, 2014, 12:59:39 PM
to Brian Diggs, Woody Setzer, ggplot2
Thanks! That works. I don't understand why only the axis labels, ticks, and grid lines were affected.  I could plot points and lines on the 0 edge. Why weren't they dropped as well?

Woody Setzer



Brian Diggs

Jun 26, 2014, 1:13:14 PM
to Woody Setzer, ggplot2
On 6/26/2014 9:59 AM, Woody Setzer wrote:
> Thanks! That works. I don't understand why only the axis labels, ticks, and
> grid lines were affected. I could plot points and lines on the 0 edge. Why
> weren't they dropped as well?
>
> Woody Setzer

I'm not sure about that either. I _think_ the limits only go through the
final inverse transformation just before the breaks are determined, which
then leaves some breaks out. Deciding which data points to include or
exclude, on the other hand, happens on the transformed scale (where 0 is
above the negative lower bound), so the points at 0 are still displayed.
That was one of the aspects that was confusing me, too.
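
To make that guess concrete, here is a small base R sketch using the
numbers from the 0 to 700 example. It only illustrates the hypothesis
(breaks compared against the back-transformed limits, data compared
against the transformed ones); it is not taken from the ggplot2 source.

```r
# Padded limits on the transformed (sqrt) scale for data spanning 0 to 700.
lim_trans <- sqrt(c(0, 700)) + c(-1, 1) * 0.05 * sqrt(700)
# The same limits squared back to the data scale; the negative bound turns
# positive.
lim_data <- lim_trans^2

# A break at 0 checked against the data-scale limits falls outside them...
break_kept <- 0 >= lim_data[1] && 0 <= lim_data[2]
# ...while a point at 0 checked on the transformed scale is inside.
point_kept <- sqrt(0) >= lim_trans[1] && sqrt(0) <= lim_trans[2]

c(break_kept = break_kept, point_kept = point_kept)
```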

I am writing up a bug report about it now.


Brian Diggs

Jun 26, 2014, 2:23:46 PM
to ggplot2
On 6/26/2014 10:13 AM, Brian Diggs wrote:
> On 6/26/2014 9:59 AM, Woody Setzer wrote:
>> Thanks! That works. I don't understand why only the axis labels,
>> ticks, and
>> grid lines were affected. I could plot points and lines on the 0
>> edge. Why
>> weren't they dropped as well?
>>
>> Woody Setzer
>
> I'm not sure about that either. I _think_ that it only goes through the
> last inverting process prior to break determination, which then leaves
> some breaks out. But that deciding what data points to include or
> exclude happens on the transformed scale (where 0 is above the negative
> lower bound) and so the points at 0 are still displayed. That was one of
> the aspects that was confusing me, too.
>
> I am writing up a bug report about it now.

https://github.com/hadley/ggplot2/issues/980
