Multiple coloring criteria in barplots


Domenico Simone

Nov 24, 2015, 6:22:22 AM
to ggplot2

Hi,


I have obtained a dataframe from a phyloseq object with the function psmelt(). The dataframe has this structure:

        OTU Sample Abundance individual visit week                            subtype
630  411903   13-1      9816         13     1    0 Collinsella aerofaciens ATCC 25986
627  411903   15-3      6028         15     3    2 Collinsella aerofaciens ATCC 25986
273 1262850   13-2      2940         13     2    1            Collinsella sp. CAG:166
628  411903   15-2      2837         15     2    1 Collinsella aerofaciens ATCC 25986
625  411903   13-2      2257         13     2    1 Collinsella aerofaciens ATCC 25986
282 1262876   13-1      1860         13     1    0            Eggerthella sp. CAG:298
                    species       genus            family            order          class
630 Collinsella aerofaciens Collinsella Coriobacteriaceae Coriobacteriales Coriobacteriia
627 Collinsella aerofaciens Collinsella Coriobacteriaceae Coriobacteriales Coriobacteriia
273 Collinsella sp. CAG:166 Collinsella Coriobacteriaceae Coriobacteriales Coriobacteriia
628 Collinsella aerofaciens Collinsella Coriobacteriaceae Coriobacteriales Coriobacteriia
625 Collinsella aerofaciens Collinsella Coriobacteriaceae Coriobacteriales Coriobacteriia
282 Eggerthella sp. CAG:298 Eggerthella   Eggerthellaceae   Eggerthellales Coriobacteriia
            phylum superkingdom
630 Actinobacteria     Bacteria
627 Actinobacteria     Bacteria
273 Actinobacteria     Bacteria
628 Actinobacteria     Bacteria
625 Actinobacteria     Bacteria
282 Actinobacteria     Bacteria

I want to make barplots of Abundance values for each Sample using different colors for phylum and, since family is a rank lower than phylum, I want to use different shadings of the related phylum color for different families in the same phylum. After ordering the table by phylum (to have all the families from the same phylum stacked together) with 

otu200.dataframe.orderPhylumFamily = otu200.dataframe[order(otu200.dataframe$phylum,otu200.dataframe$family),]

I have done a ton of attempts to get the plot I want, the closest one being this:

ggplot(otu200.dataframe.orderPhylumFamily, 
        aes(x=Sample, 
            y=Abundance/1e+06, 
            fill=phylum, alpha = factor(family))) +
   geom_bar(stat="identity", 
            position="fill")  +
   facet_grid(. ~ individual, 
              scales="free_x",
              space="free_x") +
   labs(title= "Distribution of phyla (top 200 OTUs)",
        x = "Week", 
        y = "Fraction of reads",
        fill = NULL) +
   scale_alpha_discrete(name = "family", range=c(0.5,1)) +
   scale_x_discrete(labels=c('1', '2', '3'))

which gives the following plot (image not preserved here):

At this point I'd like to improve my plot in these ways:


- adding breaks between each shading;

- making an effective legend where families are reported with the true color/shading (and possibly grouped by phylum).


Any suggestion?


Thanks


Domenico


Sam

Nov 24, 2015, 6:44:17 PM
to ggplot2
Hi Domenico,

Are you able to post your data in a way that it is easier for us to help you?

Try dput(), make up a simple example, or use one of R's built-in datasets (iris seems like it might be a good fit here).

Your odds of getting your question answered are much better if someone can immediately load your data into R and start working on the actual problem.
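
As a rough illustration of the dput() suggestion, here is a minimal sketch. The small data frame `example.df` is a hypothetical stand-in built from a few values in the table of the original post; it is not the poster's real object.

```r
# Hypothetical stand-in for a few rows of the psmelt() output;
# the values are copied from the table in the original post.
example.df <- data.frame(
  Sample    = c("13-1", "15-3", "13-2"),
  Abundance = c(9816, 6028, 2940),
  phylum    = c("Actinobacteria", "Actinobacteria", "Actinobacteria"),
  family    = c("Coriobacteriaceae", "Coriobacteriaceae", "Coriobacteriaceae"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that text into a post
dput(example.df)

# Round trip: evaluate the dput() text to rebuild the object
dput.text <- paste(capture.output(dput(example.df)), collapse = "\n")
restored  <- eval(parse(text = dput.text))
```

For a large table, something like dput(head(example.df, 20)) keeps the pasted text short while still giving readers real column types to work with.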

Sam

Crump, Ron

Nov 26, 2015, 8:46:14 AM
to Domenico Simone, ggplot2
Hi Domenico,

First off, can I echo Sam's suggestion that you will get more
help and quicker if you can provide an example for people to
get stuck into.

Having said that, I'm procrastinating over something, so
thought I'd reply.


>I want to make barplots of Abundance values for each
>Sample using different colors for
>phylum and, since family is a rank lower than
>phylum, I want to use different shadings of the related phylum color for
>different families in the same phylum. After ordering the table by
>phylum (to have all the families from the same phylum stacked together)

Without some useful data I'm going to use the diamonds data set
and use color in place of phylum, cut in place of family and
clarity in place of Sample.

>ggplot(otu200.dataframe.orderPhylumFamily,
> aes(x=Sample,
> y=Abundance/1e+06,
> fill=phylum, alpha = factor(family))) +
> geom_bar(stat="identity",
>
> position="fill") +
> facet_grid(. ~ individual,
> scales="free_x",
> space="free_x") +
> labs(title= "Distribution of phyla (top 200 OTUs)",
> x = "Week",
> y = "Fraction of reads",
> fill = NULL) +
> scale_alpha_discrete(name = "family", range=c(0.5,1)) +
>
> scale_x_discrete(labels=c('1', '2', '3'))

My first thought is that you probably don't really want to
adjust the alpha setting, as that is the transparency rather
than the shade of the colour. I've assumed I'm right on this
and have constructed an answer along those lines.
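
To make the transparency versus shade distinction concrete, here is a small sketch using adjustcolor() from grDevices. The base colour and the 0.5 and 0.7 factors are arbitrary choices for illustration only.

```r
base <- "#0072B2"

# Transparency: RGB channels unchanged, alpha channel reduced to about 50%
transparent <- adjustcolor(base, alpha.f = 0.5)

# Shade: alpha stays fully opaque, RGB channels are scaled towards black
darker <- adjustcolor(base, red.f = 0.7, green.f = 0.7, blue.f = 0.7)

# Compare the red, green, blue and alpha channels side by side
channels <- col2rgb(c(base = base, transparent = transparent, darker = darker),
                    alpha = TRUE)
channels
```

A transparent fill lets the panel background and grid lines show through, so its apparent colour depends on what is behind it; a darker shade is a different opaque colour and looks the same everywhere.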

>At this point I'd like to improve my plot in these ways:
>
>- adding breaks between each shading;

Straightforward, see the code below

>- making an effective legend where families are reported with the true
>color/shading (and possibly grouped by phylum).

Basically gets generated automatically once the phylum:family structure
is set up as you want it. However, you might need to tweak it as there
will be lots of levels. See some emails from last week and ?guide_legend
and ?theme .
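
One possible way to tame a long legend, sketched on the diamonds data under the same stand-in mapping (color for phylum, cut for family). The column count, key size and text size are guesses that would need adjusting for real data.

```r
library(ggplot2)

# lex.order = TRUE makes the levels run D:Fair, D:Good, ..., E:Fair, ...
# so that all cuts of one color are adjacent in the legend
diamonds$cc.legend <- interaction(diamonds$color, diamonds$cut,
                                  sep = ":", lex.order = TRUE)

p <- ggplot(diamonds) +
  geom_bar(aes(x = clarity, fill = cc.legend), position = "fill") +
  # 7 columns filled top to bottom: with 5 cuts per color,
  # each legend column then holds one color
  guides(fill = guide_legend(title = "color:cut", ncol = 7)) +
  theme(legend.position = "bottom",
        legend.key.size = grid::unit(0.4, "cm"),
        legend.text = element_text(size = 6))
```

The grouping by column only works out this neatly because every color has the same number of cuts; with unequal group sizes the columns would no longer line up with the groups.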

And here we go...

library(ggplot2)
data(diamonds)
#
# So instead of phylum and family, I'm using color and cut from the
# diamonds data, with clarity on the x-axis instead of sample.
#
# Extract all values of cut and color
ct<-levels(diamonds$cut)
cl<-levels(diamonds$color)
#
# Seven colours to correspond to the seven levels of color in diamonds
my.colours <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2",
"#D55E00", "#CC79A7")
#
# this bit would need some tweaking as you are unlikely to have
# the same number of families in each phylum: maybe count the levels of
# families and then use an apply-type function to loop over families
# within phylum.
#
# Anyway, you want to end up with a vector of phylum:family (colour:cut
# here) and a corresponding vector of colours
all.colours <- data.frame( colour = rep(cl,times=5), cut = rep(ct,each=7),
                           value = c( my.colours,
                                      adjustcolor(my.colours,1,0.9,0.9,0.9),
                                      adjustcolor(my.colours,1,0.8,0.8,0.8),
                                      adjustcolor(my.colours,1,0.7,0.7,0.7),
                                      adjustcolor(my.colours,1,0.6,0.6,0.6) ),
                           stringsAsFactors=FALSE )
#
# Make an interaction column between colour and cut
all.colours$cc<-paste(all.colours$colour,all.colours$cut,sep=":")
# Order by colour
all.colours <- all.colours[order(all.colours$colour),]
#
# Make equivalent interaction column in diamonds data.frame
diamonds$cc<-paste(diamonds$color,diamonds$cut,sep=":")
# and convert to factor with levels in all.colours order
diamonds$cc<-factor(diamonds$cc,levels=all.colours$cc)
#
# now plot
diaplot <- ggplot(diamonds)+
  geom_bar(aes(x=clarity,fill=cc),
           # colour puts a line around the bars and blocks within bar,
           # size sets the width
           colour='black',size=0.1,position='fill')+
  # use the pre-defined colours
  scale_fill_manual(values=all.colours$value)
# then you'll want to play with the legend: as you can see this one is too
# long.
# There was an email exchange on this area last week, I think.
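
The tweak mentioned in the comments above, for phyla with different numbers of families, might be sketched roughly as follows. The `taxa` table, the helper `shade()` and the 1 to 0.6 shading range are all hypothetical, chosen only to illustrate the idea.

```r
# Hypothetical phylum/family table with unequal family counts per phylum
taxa <- data.frame(
  phylum = c("Actinobacteria", "Actinobacteria", "Firmicutes",
             "Firmicutes", "Firmicutes", "Bacteroidetes"),
  family = c("Coriobacteriaceae", "Eggerthellaceae", "Lachnospiraceae",
             "Ruminococcaceae", "Clostridiaceae", "Bacteroidaceae"),
  stringsAsFactors = FALSE
)
# Sort so that families of the same phylum sit in consecutive rows
taxa <- taxa[order(taxa$phylum, taxa$family), ]

# One base colour per phylum, as a named vector
phyla <- unique(taxa$phylum)
base.colours <- setNames(c("#E69F00", "#56B4E9", "#009E73"), phyla)

# For a phylum with n families, scale the RGB channels by n factors
# running from 1 (the base colour) down to 0.6 (the darkest shade)
shade <- function(col, n) {
  f <- if (n == 1) 1 else seq(1, 0.6, length.out = n)
  vapply(f, function(k) adjustcolor(col, red.f = k, green.f = k, blue.f = k),
         character(1))
}

# Count families per phylum, then build the shades phylum by phylum;
# the result lines up with the rows of the sorted table
n.families <- table(taxa$phylum)[phyla]
taxa$value <- unlist(lapply(phyla, function(p) {
  shade(base.colours[[p]], n.families[[p]])
}))

# Interaction label and a named colour vector keyed by phylum:family
taxa$pf <- paste(taxa$phylum, taxa$family, sep = ":")
fill.values <- setNames(taxa$value, taxa$pf)
```

Passing a named vector such as `fill.values` to scale_fill_manual(values = ...) matches colours to factor levels by name, so the result does not depend on the order of the levels.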

So, that was my answer. Now somebody else can tell us how to do it more
elegantly and concisely.


Hope this is helpful.

Ron.
