[R] Lattice Histogram with Normal Curve - Y axis as percentages

87 views
Skip to first unread message

jimdare

unread,
May 5, 2014, 4:23:23 PM5/5/14
to r-h...@r-project.org
Hello,

This may seem like a simple problem, but it's frustrating me immensely. I'm
trying to overlay a normal curve (dnorm) on top of a histogram using the
code below. This works find when the type = "density", but the person for
whom I'm making the plot wants the y axis in percent of total rather than
density. When I change type to "percent", I get the histogram scale I'm
after, but the dnorm plot is greatly reduced. How could I scale the density
plot to the percent of total axis. Alternatively, perhaps there is a way to
add density to a secondary y axis?

Thanks in advance for your help.

Jimdare


plot<-histogram(~rdf[,j]|Year,nint=20, data=rdf,main = i,strip =
my.strip,xlab = j,
type = "percent",layout=c(2,1),
panel=function(x, ...) {
panel.histogram(x, ...)

panel.mathdensity(dmath=dnorm, col="black",
# Add na.rm = TRUE to mean() and sd()
args=list(mean=mean(x, na.rm = TRUE),
sd=sd(x, na.rm = TRUE)), ...)
})



--
View this message in context: http://r.789695.n4.nabble.com/Lattice-Histogram-with-Normal-Curve-Y-axis-as-percentages-tp4690000.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Duncan Mackay

unread,
May 6, 2014, 1:27:19 AM5/6/14
to R, jimdare
Hi Jim

Without going to the histogram function to study it in detail it appears
when there is density used in the histogram function the data has to be in
the original scale to to send to the panel function.

As you example is not reproducible use the singer data to get the ylimits
for the histogram and density plot separately

For density
d <-
histogram( ~ height | voice.part, data = singer,
xlab = "Height (inches)", type = "density",
panel = function(x, ...) {
# panel.histogram(x, ...)
panel.abline(v= 70)
panel.mathdensity(dmath = dnorm, col = "black",
args = list(mean=mean(x),sd=sd(x)))
} )

d$y.limits ...

and the same for the histogram with type = percent

if you divide the last by the former there is a difference of 192 (ie
scaling factor)

Using the full dataset as it is quicker to demonstrate

histogram( ~ height, data = singer,
type = "percent", border = "transparent", col.line = "grey60",
xlab = "Height (inches)",
ylab = "Density Histogram\n with Normal Fit" )

trellis.focus("panel", 1, 1, clip.off=F, highlight = FALSE)
llines(density(singer$height)$x, density(singer$height)$y*192)
trellis.unfocus()

This will give you the idea that it is possible. y axes labels etc are all
funny now

By using a user defined panel function plotting the histogram and the
density output plotted as panel.lines after scaling within the function you
should be able to get a plot
It may be easier to start off with an xyplot to send the data to the panel
function as you will need 2 types of data as original and that for the
histogram.
Whether you need to use panel.groups is another mater

HTH

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

peter dalgaard

unread,
May 6, 2014, 3:20:18 AM5/6/14
to jimdare, r-h...@r-project.org
This illustrates why you really do not want percents... (I never quite understand why people do want them - I can understand raw counts wheh used in teaching as a precursor to the concept of a density, but percentages is an odd in-between sort of thing.)

Anyways, the scaling factor is the bin width (you figure out whether to multiply or divide, I get it wrong every second time), possibly multiplied by 100%. A pragmatic way out would seem to be switch out dnorm with dnorm_scaled <- function(...) scale*dnorm(...) in the call to panel.mathdensity.

-Peter D
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: PDa...@gmail.com

jimdare

unread,
May 7, 2014, 12:53:06 AM5/7/14
to r-h...@r-project.org
Thanks for your help Duncan and Peter! I ended up using a combination of
your suggestions. I used Duncan's y.limits ration of two plots with
differing types (percent and density), to provide me with a scale variable.
I then used this in Peter's dnorm_scaled function and called it using
panel.mathdensity. See below for my amended code.

Regards,
Jim


x1<-histogram(~rdf[,j]|Year,nint=20, data=rdf,main = i,strip = my.strip,xlab
= j,
type = "density",layout=c(2,1))

x2<-histogram(~rdf[,j]|Year,nint=20, data=rdf,main = i,strip = my.strip,xlab
= j,
type = "percent",layout=c(2,1))

scale <- x2$y.limits/x1$y.limits

dnorm_scaled <- function(...){ scale[1]*dnorm(...)}

histogram(~rdf[,j]|Year,nint=20, data=rdf,main = i,strip = my.strip,xlab =
j,
type = "percent",layout=c(2,1),
panel=function(x, ...) {

panel.histogram(x, ...)

panel.mathdensity(dmath=dnorm_scaled, col="black",
# Add na.rm = TRUE to mean() and sd()
args=list(mean=mean(x, na.rm = TRUE),
sd=sd(x, na.rm = TRUE)), ...)

})



--
View this message in context: http://r.789695.n4.nabble.com/Lattice-Histogram-with-Normal-Curve-Y-axis-as-percentages-tp4690000p4690093.html

Duncan Mackay

unread,
May 9, 2014, 9:19:21 PM5/9/14
to R, jimdare
Just an afterthought if any one really needs to do it again.

The crux of the matter is the different limits in the 2 plot x and y scales.
It may be easier to accomplish this with a prepanel function to get the
limits eg panel.loess

Duncan


-----Original Message-----
From: r-help-...@r-project.org [mailto:r-help-...@r-project.org] On
Behalf Of jimdare
Reply all
Reply to author
Forward
0 new messages