I have a set of nutrient concentrations (NO2_N, NO3_N, NH3_N, orgN),
which are separated into wet and dry season (Season) for each sampling
location (ID) and sampling round (round):
read.csv("http://dl.dropbox.com/u/4236038/nutrients2.csv") -> nutrients2
I melted this data (melt() is from the reshape package) and removed all NAs:
library(reshape)
melt_nutrients2 <- melt(nutrients2, id=c("ID", "round", "Season"), na.rm=TRUE)
And then I calculated the mean for each nutrient by season and ID:
cast(melt_nutrients2, ID+Season ~ variable, mean) -> meansbyseason
I then melted the cast result:
melt_meansbyseason <- melt(meansbyseason, id=c("ID", "Season"))
I then plotted the data as bar charts with facets:
melt_meansbyseason$ID <- factor(melt_meansbyseason$ID)
ggplot(melt_meansbyseason, aes(x=ID, y=value, fill=Season)) +
  geom_bar(stat='identity', position='dodge') +
  facet_grid(variable ~ ., scales = "free_y") +
  scale_x_discrete()
My problem now is that I would like to add error bars to each bar representing one standard deviation. In a first attempt I created a csv file out of melt_meansbyseason and manually added the standard deviation for each season and ID in a new column called STDV.
I then tried to create the bar charts with the following code:
read.csv("http://dl.dropbox.com/u/4236038/nutrientsbyseason_stdv.csv") -> nutrients_stdv
ggplot(nutrients_stdv, aes(x=ID, y=value, fill=Season)) +
geom_bar(stat='identity', position='dodge') +
geom_errorbar(aes(ymin=value-stdv,ymax=value+stdv), width=2) +
facet_grid(variable ~ ., scales = "free_y")
The following error came up:
Error: Non-continuous variable supplied to scale_y_continuous.
In addition: Warning messages:
1: In Ops.factor(value, stdv) : - not meaningful for factors
2: In Ops.factor(value, stdv) : + not meaningful for factors
My questions are:
1. Is there a more effective way to calculate one standard deviation from the original data file "nutrients2.csv" in R, and then just use ggplot to plot the bar charts with +/- 1 stdv?
2. If not, how can I use the modified file with the manually calculated standard deviation, "nutrientsbyseason_stdv.csv", to display the error bars on my bar charts?
Thanks a lot in advance for your suggestions.
Cheers
Martin
On Thu, Feb 04, 2010 at 08:18:52PM -0800, waschbaer wrote:
> ...
>
> My questions would be:
> 1. Is there a more effective way in that you can calculate one
> standard deviation within the original data file "nutrients2.csv" in
> R
> and then just use ggplot to plot the bar charts with +-1 Stdv?
This is the same way I would do it.
> 2. If not how can I use the modified file with the manually
> calculated
> standard deviation "nutrientsbyseason_stdv.csv" to display the error
> bars on my bar charts?
In your nutrients_stdv, stdv is a factor. Simply transform it to a number and it will work.
regards,
stefan
I think plyr is easier to use than multiple rounds of melt + cast:
library(reshape)
library(plyr)
read.csv("http://dl.dropbox.com/u/4236038/nutrients2.csv") -> nutrients
df <- melt(nutrients, id=c("ID", "round", "Season"), na.rm=TRUE)
df <- ddply(df, .(ID, Season, variable),
            summarize, mean = mean(value), sd = sd(value))
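That summary data frame answers question 1 directly: the sd column is computed in R, so no hand-edited csv is needed. A sketch of the full pipeline with dodged error bars follows (the dropbox URL is taken from the thread and may no longer resolve; column names are as posted):

```r
library(reshape)   # melt()
library(plyr)      # ddply(), summarize
library(ggplot2)

nutrients <- read.csv("http://dl.dropbox.com/u/4236038/nutrients2.csv")
df <- melt(nutrients, id = c("ID", "round", "Season"), na.rm = TRUE)

# one row per ID/Season/nutrient, with mean and sd computed in R
stats <- ddply(df, .(ID, Season, variable),
               summarize, mean = mean(value), sd = sd(value))
stats$ID <- factor(stats$ID)

ggplot(stats, aes(x = ID, y = mean, fill = Season)) +
  geom_bar(stat = "identity", position = "dodge") +
  # position_dodge(0.9) matches the default dodged bar width,
  # so each whisker sits over its own bar
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(0.9), width = 0.25) +
  facet_grid(variable ~ ., scales = "free_y")
```

Note that the errorbar width is in x-axis units, so with discrete dodged bars a value well below 1 usually looks right; the width=2 in the original attempt would span more than one category.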
> 2. If not how can I use the modified file with the manually
> calculated
> standard deviation "nutrientsbyseason_stdv.csv" to display the error
> bars on my bar charts?
> In advance thanks a lot for your suggestions.
> Cheers
> Martin
>
> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> For more options, visit this group at
> http://groups.google.com/group/ggplot2
Thanks but what is the code for number in R? I tried numeric but it
didn't work.
regards,
stefan
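The catch is that as.numeric() on a factor returns the internal level codes, not the printed values; you have to go through as.character() first. A sketch on made-up data (the stdv values here are invented for illustration):

```r
# a numeric column that read.csv parsed as a factor,
# as happened with nutrients_stdv$stdv
stdv <- factor(c("0.12", "0.30", "0.05"))

as.numeric(stdv)                # wrong: level codes 2 3 1
as.numeric(as.character(stdv))  # right: 0.12 0.30 0.05
```

A column usually comes in as a factor because some cell in it is not a plain number (a stray space, a "NA" spelled unusually, a thousands separator), so it is worth checking the csv for the offending value as well.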
First of all I would like to thank you for your extended comments to
my problem. I really appreciate that.
I agree that the approach I took, bar charts with error bars on top of them, seems not really appropriate for the data I have. I also read Frank Harrell's comment on "dynamite plots", as bar charts with error bars are sometimes called. Harrell is right when he says that bar charts cover a lot of space in the graph, yet you can't really see what the spread of the data is or how many measurements were taken. They simply can't show the detail of the data set.
All that made me think of using geom_point() for the illustration of
the entire data set (no means).
I added one more piece of information to the original data set (catchment ID) and separated the groundwater sampling locations from the surface water sampling locations to avoid visual distortion of the concentrations, since concentrations were generally higher in the groundwater data set. In this example I am just using the surface water sampling sites (sw). My data set (nutrients_new_sw) now includes: sampling ID ("ID"), sampling round ("round"), season ("Season"; wet and dry), catchment ("catchment"; I have two subcatchments), and nutrient concentrations (NO2-N, ...).
I melted the file, kept "ID", "round", "Season", and "catchment" and
removed all NAs. I then re-ordered the sampling IDs from upstream to
downstream location (left to right on the graphs) to reflect spatial
trends in the subcatchments. The last step was the creation of the
plot showing measured concentrations for nutrients for each sampling
location from upstream to downstream separated by catchments (1,2).
In my opinion this graph now shows the distribution of the data and the spatial and seasonal differences between the sampling locations much more clearly. What do you think?
Furthermore, I have some outliers at sites 18 and 39 which are distorting the graph. My question is: is there a way to make the y-scale for each nutrient not strictly continuous, but stretch the lower ranges to show the variation there, and then with a big step just show the outliers at the top of the scale? For example, PO4-P concentrations are generally very low (usually below 1), but there are a couple of higher concentrations, the highest being 2.34. This is the outlier that is distorting the whole graph for PO4-P.
Thanks a lot for your great help and comments...Martin
Below is the code:
read.csv("http://dl.dropbox.com/u/4236038/new_files/nutrients_new_sw.csv") -> nutrients_sw
melt_nutrients_sw <- melt(nutrients_sw,
  id=c("ID", "round", "Season", "catchment"), na.rm=TRUE)
melt_nutrients_sw$ID <- factor(melt_nutrients_sw$ID,
  levels=c(5,9,10,12,15,14,17,18,19,22,20,1,4,54,50,49,48,43,46,39,38,37,32,33,35,21,23))
p <- ggplot(melt_nutrients_sw, aes(x = ID, y = value, color = Season))
p + geom_point() + facet_grid(variable ~ catchment, scales = "free")
You could try adding + scale_y_log10(). I tried that and it gets a little squashed in the number of labels, but have a look at the bottom of
http://had.co.nz/ggplot2/scale_continuous.html
for some other transforms. Of course a log axis spreads out the points at the lower part of the range, which can actually obscure the range unless you make the log scale abundantly clear...
--J
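Applied to the point plot from the thread, the suggestion is just one extra layer; a sqrt transform (also on the linked page) is a gentler alternative that, unlike log, copes with zero values. A sketch (the dropbox URL is from the thread and may no longer resolve):

```r
library(reshape)   # melt()
library(ggplot2)

nutrients_sw <- read.csv("http://dl.dropbox.com/u/4236038/new_files/nutrients_new_sw.csv")
melt_nutrients_sw <- melt(nutrients_sw,
                          id = c("ID", "round", "Season", "catchment"),
                          na.rm = TRUE)
melt_nutrients_sw$ID <- factor(melt_nutrients_sw$ID)

p <- ggplot(melt_nutrients_sw, aes(x = ID, y = value, color = Season)) +
  geom_point() +
  facet_grid(variable ~ catchment, scales = "free")

p + scale_y_log10()   # compresses the top of the scale; zero values are dropped
p + scale_y_sqrt()    # milder compression, keeps zeros
```

Either way, labelling the axis clearly (or annotating the transform in the caption) matters, since a transformed axis makes the spread at the low end look larger than it is.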