computeMatrix scale-regions option

Aslihan....@mdc-berlin.de

Jun 18, 2014, 7:22:53 AM
to deep...@googlegroups.com
Dear all,

I have a question regarding the computeMatrix tool. When using computeMatrix with the scale-regions option, are the scores from the bigwig file somehow normalized to the length of the regions? So it's not that some scaled regions will show higher signals just because they were much larger to begin with, right?

Kind regards,
Aslihan Karabacak

Fidel Ramirez

Jun 18, 2014, 8:22:49 AM
to Aslihan....@mdc-berlin.de, deep...@googlegroups.com
Hi Asli,

computeMatrix takes care to avoid any sort of bias when in 'scale-regions' mode. An extreme example is using the sum of the binned values instead of the usual mean or median: in this case, larger bins would produce higher values. That's why, when a region needs to be scaled, computeMatrix always uses bins of equal length for all regions.

Because the target number of bins is the same for all regions, in long regions the bins are separated from each other, and in short regions the bins may overlap.
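To make the bin placement concrete, here is a minimal Python sketch. It is not deepTools code: `place_bins` and `spacing` are hypothetical names, and spreading the bin starts evenly across the region is an assumption made only to illustrate the description above.

```python
def place_bins(region_start, region_end, n_bins, bin_size):
    """Return n_bins (start, end) intervals, each exactly bin_size wide,
    with starts spread evenly across the region (illustrative assumption)."""
    region_len = region_end - region_start
    if region_len < bin_size:
        raise ValueError("region is shorter than the bin size")
    if n_bins == 1:
        return [(region_start, region_start + bin_size)]
    # Distance between consecutive bin starts; the last bin ends at region_end.
    step = (region_len - bin_size) / (n_bins - 1)
    bins = []
    for i in range(n_bins):
        start = region_start + round(i * step)
        bins.append((start, start + bin_size))
    return bins


def spacing(bins):
    """Gap between consecutive bins: positive = separated, negative = overlapping."""
    return [nxt[0] - cur[1] for cur, nxt in zip(bins, bins[1:])]


long_bins = place_bins(0, 1000, n_bins=4, bin_size=10)
short_bins = place_bins(0, 25, n_bins=4, bin_size=10)
print(long_bins, spacing(long_bins))    # bins are far apart
print(short_bins, spacing(short_bins))  # bins overlap
```

Because every bin has the same width, a per-bin mean, median, or even sum is computed over the same number of bases regardless of how long the original region was.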

I prefer to avoid the scale-regions mode when looking at signals at the TSS or TES of genes. It is better to make two or three heat maps (or profiles): one for the vicinity of the TSS, one for the gene body (this one scaled), and another for the vicinity of the TES.

Best,

Fidel


--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

António Miguel de Jesus Domingues

Jun 25, 2014, 5:00:23 AM
to deep...@googlegroups.com, Aslihan....@mdc-berlin.de
Hi Fidel,

also regarding the scale-regions option, I am getting a warning message:

A region that is shorter than then bin size was found:

This happens for several of my regions. I am scaling to 3 kb, and the BED file contains regions from ~200 to 10,000 bp. Should the regions be scaled to a size larger than the largest region?

Best,
António

Fidel Ramirez

Jun 25, 2014, 6:18:25 AM
to António Miguel de Jesus Domingues, deep...@googlegroups.com, Aslihan....@mdc-berlin.de
Hi Antonio,

If computeMatrix finds a region that is smaller than the bin size, it will complain. One solution is to make the bin size smaller (use the --binSize parameter). Another option is to plot the regions in reference-point mode, using --referencePoint center.
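To see in advance which regions would trigger this warning, one could scan the BED file before running computeMatrix. The following is a minimal sketch, not part of deepTools; `short_regions` is a hypothetical helper and the example lines are made up.

```python
def short_regions(bed_lines, bin_size):
    """Return (chrom, start, end) for every BED region shorter than bin_size."""
    too_short = []
    for line in bed_lines:
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if end - start < bin_size:
            too_short.append((chrom, start, end))
    return too_short


# Made-up example regions (tab-separated BED lines).
example = [
    "chr1\t1000\t1200\tregionA",
    "chr1\t5000\t15000\tregionB",
    "chr2\t300\t305\tregionC",
]
print(short_regions(example, bin_size=10))
```

If the list is non-empty, either lower the bin size below the shortest region or drop those regions from the BED file.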


Best,

Fidel

Friederike Dündar

Jun 25, 2014, 6:43:06 AM
to Fidel Ramirez, António Miguel de Jesus Domingues, deep...@googlegroups.com, Aslihan....@mdc-berlin.de
Just to avoid confusion: --binSize is a different parameter from --regionBodyLength.

The bin size refers to the non-overlapping, consecutive regions over which the scores will be averaged. The bin size also determines how coarse or finely resolved your heatmap will look (compare a heatmap with binSize 10 and binSize 100 and you will see what I mean). The bin size should definitely be smaller than the smallest region.

The region body length only determines how much the regions will be squeezed or stretched. Just yesterday I came across a nice summary of what one should pay attention to when creating those metagene plots: http://www.nature.com/nsmb/journal/v21/n2/box/nsmb.2763_BX3.html

If you scale regions with a median size of 200 bp to a region body length of 10,000 bp, this will stretch them out quite a bit and you might see artifacts. The same is true for squeezing 30 kb regions into a 100 bp body length. I prefer to use the median size of my regions for the --regionBodyLength option, but if you have a very wide range of sizes, you might want to consider filtering out outliers, switching to reference-point mode, or making separate plots for very large and very small regions.
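As a rough illustration of the median-based choice and outlier filtering, here is a minimal Python sketch. `suggest_body_length` and its `max_fold` cutoff are hypothetical and not part of deepTools; the fold threshold is an arbitrary assumption that should be tuned to the data.

```python
from statistics import median


def suggest_body_length(lengths, max_fold=5.0):
    """Drop regions more than max_fold times longer or shorter than the
    median, then return (median length of the rest, number dropped)."""
    med = median(lengths)
    kept = [n for n in lengths if med / max_fold <= n <= med * max_fold]
    return int(median(kept)), len(lengths) - len(kept)


# Made-up region lengths in bp, spanning a very wide range.
lengths = [200, 2500, 3000, 3200, 10000, 40000]
body_length, n_dropped = suggest_body_length(lengths)
print(body_length, n_dropped)
```

The returned length could then be passed as the --regionBodyLength value, with the dropped regions plotted separately or in reference-point mode.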

My five cents,

Cheers,

Friederike

António domingues

Jun 25, 2014, 10:46:05 AM
to Friederike Dündar, Fidel Ramirez, deep...@googlegroups.com, Aslihan....@mdc-berlin.de
Thank you Friederike and Fidel for the very detailed reply.


> If you scale regions with a median size of 200 bp to a region body length of 10,000, this will stretch them out quite a bit and you might see artifacts. The same is true for squeezing 30 kb regions into 100 bp body length. I prefer to use the median size of my regions for the --regionBodyLength option, but if you have a very wide range, you might want to consider either to filter out outliers or to go for the referencePoint or to make separate plots for very large and very small regions.

Good point. I did consider that, and 3000 bp is the median size of my regions. I just hadn't thought about removing outliers. Cheers for the tip.

Thanks also for the reference.

Best,
António