Understanding computeMatrix


Amira Kramdi

Jun 15, 2017, 8:16:57 AM
to deepTools
Hi,

I want to check whether I really understand how computeMatrix works in 'scale-regions' mode. Here's what I have understood so far (after browsing the documentation and previously asked questions); please correct me if I am wrong:

- Given a length to which all regions will be scaled (-m option, default 1000bp) and a bin size (--binSize option, default: 10bp), regions will be divided into N equal bins of binSize. With the default options, regions should be divided into 100 bins, right?
- Next, coverage is somehow summarized over each bin, so at this point we have one value per bin.
- Since the number of bins is equal for all regions, the distribution of bins depends on the region length: for long regions the bins are separated from each other, and for small regions the bins may overlap.
- We end up with a matrix of 6+N columns: the first 6 describe the region and the last N contain the values of each bin.
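
The arithmetic in the list above can be sketched as follows. This is only an illustration, not deepTools code: the function and parameter names are made up for this example, and it assumes no upstream/downstream flanking regions and a scaled length that is an exact multiple of the bin size.

```python
def scaled_matrix_columns(region_body_length=1000, bin_size=10,
                          n_descriptor_columns=6):
    """Return (n_bins, n_columns) for one row of a scale-regions matrix.

    Hypothetical helper for illustration only. Assumes no flanking
    regions and that region_body_length is a multiple of bin_size.
    """
    if region_body_length % bin_size != 0:
        raise ValueError("region_body_length must be a multiple of bin_size")
    # Number of bins after scaling: scaled length divided by bin size.
    n_bins = region_body_length // bin_size
    # Each row holds the region descriptor columns followed by one value per bin.
    return n_bins, n_descriptor_columns + n_bins


n_bins, n_columns = scaled_matrix_columns()
print(n_bins, n_columns)  # 100 106
```

With the defaults quoted above (1000bp and 10bp), this gives N = 100 bins and 6 + 100 = 106 columns per region.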

I fully agree that bins should have the same size to avoid biased values due to size differences, but I am wondering what the effect of separated bins is for large regions. How are they separated? Aren't we missing data in the gaps between bins?

Thanks!!

Amira

Devon Ryan

Jun 15, 2017, 8:25:39 AM
to Amira Kramdi, deepTools
There's never space between the bins. In your example with 100 bins, a 1kb region has bins of 10 bases while a 2kb region has bins of 20 bases. The --binSize option here refers to the bin size after scaling the signal. 
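
A minimal sketch of that behaviour, assuming each region is simply cut into the same number of back-to-back bins. It is not the actual deepTools implementation; `contiguous_bins` is a hypothetical helper, and the coordinates are invented for the example.

```python
def contiguous_bins(start, end, n_bins):
    """Split the half-open interval [start, end) into n_bins adjacent bins.

    Hypothetical helper for illustration only. Bin edges are computed with
    integer arithmetic, so each bin ends exactly where the next one starts.
    """
    length = end - start
    edges = [start + (i * length) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


# A 1kb region and a 2kb region, both cut into 100 bins.
for region in [(0, 1000), (5000, 7000)]:
    bins = contiguous_bins(*region, n_bins=100)
    widths = {bin_end - bin_start for bin_start, bin_end in bins}
    print(region, len(bins), widths)
# (0, 1000) 100 {10}
# (5000, 7000) 100 {20}
```

Both regions end up with 100 bins, 10 bases wide in the 1kb region and 20 bases wide in the 2kb region, with no gaps and no overlap.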

--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amira Kramdi

Jun 15, 2017, 9:12:17 AM
to Devon Ryan, deepTools
Thanks for the quick response!

So basically, --binSize is there to define the number of bins N (which is equal for all regions), and the size of the bins depends on the length of the region (size = regionLength/N).

I guess I misunderstood Fidel Ramirez's answer to one of the questions about computeMatrix (https://groups.google.com/forum/#!searchin/deeptools/computematrix$20may$20overlap|sort:relevance/deeptools/7sjw2PqAz1o/41U2fhssaA0J):

"computeMatrix takes care to avoid any sort of bias when in 'scale-regions' mode. An extreme example is when using the sum of the binned values instead of the usual mean or median: in this case, larger bins will produce higher values. That's why when a region needs to be scaled computeMatrix always uses bins of equal length for all regions.

Because the target number of bins is equal for all regions what happens is that for long regions the bins are separated from each other and for small regions the bins may overlap."

It's clear now that "target bin size" means the final set of bins after scaling. But since we have the same number of bins everywhere, I still don't get the last statement about bins being separated or overlapping. What am I missing?

(sorry for the double response)


--

Amira KRAMDI
Bioinformatician, CNRS

Institut de Biologie de l'Ecole Normale Supérieure (IBENS)
Team Colot
46 rue d'Ulm
75230 PARIS CEDEX 05
01.44.32.35.69

Devon Ryan

Jun 15, 2017, 2:04:14 PM
to Amira Kramdi, deepTools

Presumably what Fidel wrote there was correct for an earlier version of deepTools. Since deepTools 2, things are handled the way I described in my reply.

-- 
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com

Amira Kramdi

Jun 16, 2017, 4:14:53 AM
to Devon Ryan, deepTools
Got it, many thanks!