Colors in PCA/Hierarchical Clustering

shanno...@gmail.com

Aug 28, 2013, 8:01:01 AM
to methylkit_...@googlegroups.com
This is a silly question that I should be able to figure out on my own, but I figure asking will speed up the process, since I've hit a wall trying to work it out.

I've pulled out the sample clustering/PCA code and altered it so that I can change the colors used in the plots. However, is there a simple way to change the plot colors within the package itself?

Thanks!

Altuna Akalin

Aug 28, 2013, 8:09:54 AM
to methylkit_...@googlegroups.com
There is no easy way. Colors are determined by the number of groups in the treatment vector and selected from the rainbow() palette.
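As a rough illustration of the default behaviour described above, here is a base-R sketch. It assumes a 0/1 treatment vector like the ones in the methylKit examples and borrows the rainbow(..., start=1, end=0.6) call from the modified generic later in this thread; the package's exact internal call may differ.

```r
# Hypothetical treatment vector: two treated (1) and two control (0) samples
treatment <- c(1, 1, 0, 0)

# One color per distinct group, drawn from the rainbow() palette
group.cols <- rainbow(length(unique(treatment)), start = 1, end = 0.6)

# Treatment codes 0, 1, ... are shifted to R's 1-based indexing,
# so group 0 gets group.cols[1] and group 1 gets group.cols[2]
sample.cols <- group.cols[treatment + 1]
print(sample.cols)
```

The treatment + 1 indexing only lines up when the codes are consecutive integers starting at 0; a vector such as c(1, 1, 2, 2) would index past the palette and produce NA.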

It shouldn't be too hard to add an additional argument that takes a color palette as input. If you have such a function working robustly, we can add it to the next release and cite your contribution.
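One possible shape for such an argument is sketched below. resolveGroupColors is a hypothetical helper, not part of methylKit. It accepts either a palette function (such as rainbow or heat.colors) or an explicit vector of colors, and it maps samples to groups with match(), so treatment codes need not start at 0.

```r
# Hypothetical helper (not a methylKit function): per-sample colors from a
# treatment vector. `palette` is either a function of n (e.g. rainbow) or a
# character vector with at least one color per treatment group.
resolveGroupColors <- function(treatment, palette = rainbow) {
  groups <- sort(unique(treatment))   # lowest code gets the first color
  n <- length(groups)
  cols <- if (is.function(palette)) palette(n) else palette
  if (length(cols) < n)
    stop("need at least ", n, " colors, got ", length(cols))
  # match() gives each sample its group index, whatever the codes are
  cols[match(treatment, groups)]
}

resolveGroupColors(c(1, 1, 0, 0), c("red", "blue"))
resolveGroupColors(c(1, 1, 2, 2), heat.colors)
```

Accepting a function as the default keeps the current rainbow behaviour unchanged for users who don't pass anything, while a plain color vector covers the common "I just want red and blue" case. For 0/1 codes it assigns colors in the same order as my.cols[treatment + 1].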

Best,
Altuna


--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/groups/opt_out.

shanno...@gmail.com

Aug 29, 2013, 10:02:45 AM
to methylkit_...@googlegroups.com
I don't necessarily have a terribly robust method, but it's working for me. I've simply added my.cols as an argument to the function. The default behaves as it does now (using the rainbow palette), but you can set my.cols to whichever colors or palette you'd like.

So, for 'clusterSamples', I call the function (for example):
clusterSamples(Object, dist="correlation", method="ward", plot=TRUE, my.cols=c("red","blue"))

with the underlying code modified as follows (my changes, highlighted in red in my original message, center on the new my.cols argument):

# cluster function on matrix and return hierarchical plot
# x matrix each column is a sample
# dist.method method to get the distance between samples
# hclust.method the agglomeration method to be used
# plot if TRUE, plot the hierarchical clustering
.cluster=function(x, dist.method="correlation", hclust.method="ward", plot=TRUE,
                  treatment, sample.ids, context, my.cols){
  DIST.METHODS <- c("correlation", "euclidean", "maximum", "manhattan", "canberra", 
        "binary", "minkowski")
  dist.method <- pmatch(dist.method, DIST.METHODS)
  if (is.na(dist.method))
    stop("invalid distance method")

  HCLUST.METHODS <- c("ward", "single", "complete", "average", "mcquitty", 
        "median", "centroid")
  hclust.method <- pmatch(hclust.method, HCLUST.METHODS)
  if (is.na(hclust.method)) 
    stop("invalid clustering method")
  if (hclust.method == -1) 
    stop("ambiguous clustering method")

  if(DIST.METHODS[dist.method] == "correlation")
    d = .dist.cor(t(x))
  else
    d=dist(scale(t(x)), method=DIST.METHODS[dist.method]);
  
  hc=hclust(d, HCLUST.METHODS[hclust.method]);
  
  if(plot){
    #plclust(hc,hang=-1, main=paste("CpG dinucleotide methylation clustering\nDistance method: ",
    #                               DIST.METHODS[dist.method],sep=""), xlab = "Samples");
    # plot
    # one color per sample, chosen by treatment group (codes 0, 1, ...)
    col.list=as.list(my.cols[treatment+1])
    names(col.list)=sample.ids

    colLab <- function(n,col.list)
      {
      if(is.leaf(n))
        {
          a <- attributes(n)

          attr(n, "nodePar") <- c(a$nodePar, list(lab.col =
          col.list[[a$label]], lab.cex=1,
          col=col.list[[a$label]], cex=1, pch=16 ))
        }
      n
      }
    
    dend = as.dendrogram(hc)
    dend_colored <- dendrapply(dend, colLab,col.list)
    
    plot(dend_colored, main = paste(context, "methylation clustering"));
    # end of plot
  }
  return(hc)
}
  
  
  
setGeneric("clusterSamples", function(.Object, dist="correlation", method="ward",
                                      sd.filter=TRUE, sd.threshold=0.5,
                                      filterByQuantile=TRUE, plot=TRUE, my.cols=NULL)
                                        standardGeneric("clusterSamples"))

#' @rdname clusterSamples-methods
#' @aliases clusterSamples,methylBase-method
setMethod("clusterSamples", "methylBase",
  function(.Object, dist, method, sd.filter, sd.threshold,
           filterByQuantile, plot, my.cols)
  {
    # default: one rainbow color per treatment group; computed here because
    # `treatment` is not an argument of the generic
    if(is.null(my.cols))
      my.cols=rainbow(length(unique(.Object@treatment)), start=1, end=0.6)
    mat      =getData(.Object)
    # remove rows containing NA values, they might be introduced at unite step
    mat      =mat[ rowSums(is.na(mat))==0, ] 
    
    meth.mat = mat[, .Object@numCs.index]/
      (mat[,.Object@numCs.index] + mat[,.Object@numTs.index] )                                      
    names(meth.mat)=.Object@sample.ids
    
    # if Std. Dev. filter is on remove rows with low variation
    if(sd.filter){
      if(filterByQuantile){
        sds=rowSds(as.matrix(meth.mat))
        cutoff=quantile(sds,sd.threshold)
        meth.mat=meth.mat[sds>cutoff,]
      }else{
        meth.mat=meth.mat[rowSds(as.matrix(meth.mat))>sd.threshold,]
      }
    }
    
    .cluster(meth.mat, dist.method=dist, hclust.method=method, 
             plot=plot, treatment=.Object@treatment,
             sample.ids=.Object@sample.ids,
             context=.Object@context, my.cols=my.cols)
    
  }
)
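To see the leaf-coloring step in isolation, the following base-R sketch reproduces the colLab/dendrapply approach on a small random matrix. The sample names and random data are made up, and the 1 - Pearson correlation distance is used here in place of methylKit's internal .dist.cor, whose exact definition isn't shown in this thread.

```r
# Toy data: 50 "CpGs" x 4 samples (random values, illustration only)
set.seed(1)
x <- matrix(runif(200), ncol = 4,
            dimnames = list(NULL, c("ctrl1", "ctrl2", "trt1", "trt2")))
treatment <- c(0, 0, 1, 1)
my.cols <- c("red", "blue")

# One color per sample, looked up by sample id at each leaf
col.list <- as.list(my.cols[treatment + 1])
names(col.list) <- colnames(x)

# Set label and point color on leaves; internal nodes pass through
colLab <- function(n, col.list) {
  if (is.leaf(n)) {
    a <- attributes(n)
    attr(n, "nodePar") <- c(a$nodePar,
      list(lab.col = col.list[[a$label]], col = col.list[[a$label]], pch = 16))
  }
  n
}

# Distance between samples (columns): 1 - Pearson correlation
d <- as.dist(1 - cor(x))
hc <- hclust(d, method = "average")
# extra arguments to dendrapply() are forwarded to colLab
dend <- dendrapply(as.dendrogram(hc), colLab, col.list)
plot(dend, main = "toy methylation clustering")
```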

(I'm sure this is overly simplistic and that you could come up with a better solution. No need to cite any contribution, but figured I'd share how I'm doing it.)
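The thread only shows the clustering side, but the same per-sample color vector can be passed to a PCA scatter plot. This is a base-R sketch using prcomp() on made-up data; it is not methylKit's PCASamples code, whose internals aren't shown here.

```r
# Toy CpG x sample matrix (random values, illustration only)
set.seed(2)
meth.mat <- matrix(runif(400), ncol = 4,
                   dimnames = list(NULL, c("ctrl1", "ctrl2", "trt1", "trt2")))
treatment <- c(0, 0, 1, 1)
my.cols <- c("red", "blue")

# prcomp() expects observations in rows, so transpose to samples x CpGs
pc <- prcomp(t(meth.mat), center = TRUE, scale. = TRUE)

# Same group-to-color mapping as my.cols[treatment+1] in .cluster
sample.cols <- my.cols[treatment + 1]
plot(pc$x[, 1], pc$x[, 2], col = sample.cols, pch = 16,
     xlab = "PC1", ylab = "PC2", main = "toy PCA")
text(pc$x[, 1], pc$x[, 2], labels = rownames(pc$x), pos = 3, col = sample.cols)
legend("topright", legend = c("treatment 0", "treatment 1"),
       col = my.cols, pch = 16)
```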

Altuna Akalin

Aug 30, 2013, 5:53:07 AM
to methylkit_...@googlegroups.com
Thanks for sharing, Shannon. It looks OK to me at first glance.

Best,
Altuna
