bamcompare log2 calculation


Varun Gupta

Feb 2, 2016, 12:41:57 PM
to deepTools
Hi Fidel/Everyone,
I am using bamCompare to generate a normalized bigwig file. I am using the log2 ratio as my method, since this is ChIP-seq data with control and treatment.

I want to understand: while calculating the log2 ratio, is it log2(read counts of treatment / read counts of control) in that bin region (50 bp), or is it

log2(RPKM of treatment / RPKM of control) in that bin (50 bp)?

Hope to hear from you soon.

Regards
Varun

Devon Ryan

Feb 2, 2016, 1:54:49 PM
to deep...@googlegroups.com
Hi Varun,

It uses the log2 of the ratio of the normalized counts. I should note that the ratio of the counts and the ratio of the RPKMs are identical after scaling (the K and M in RPKM are constants).
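Devon's point can be checked with a short sketch: the bin-length and per-million constants in RPKM cancel in the ratio, so the ratio of depth-scaled counts equals the ratio of RPKMs. The per-bin counts below are made-up illustrative numbers; the library sizes are example totals.

```python
import math

# Illustrative numbers only: per-bin read counts are made up
c_t, c_c = 40.0, 10.0            # reads in one 50 bp bin: treatment, control
m_t, m_c = 9315994, 1490897      # total mapped reads per sample
bin_kb = 0.05                    # 50 bp bin expressed in kilobases

# RPKM = reads / (bin length in kb * mapped reads in millions)
rpkm_t = c_t / (bin_kb * (m_t / 1e6))
rpkm_c = c_c / (bin_kb * (m_c / 1e6))

# Depth scaling: shrink each sample to the depth of the smaller library
sf_t, sf_c = min(m_t, m_c) / m_t, min(m_t, m_c) / m_c
ratio_scaled_counts = (c_t * sf_t) / (c_c * sf_c)
ratio_rpkm = rpkm_t / rpkm_c

# The kb and per-million constants cancel, so the two ratios agree
assert math.isclose(ratio_scaled_counts, ratio_rpkm)
```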

Devon

-- 
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com

Varun Gupta

Feb 2, 2016, 1:59:50 PM
to deepTools
HI Devon,
What do you mean by normalized counts here? I know bamCompare has a scaling step, but the M in RPKM will be different for the control and treatment samples. Does scaling take that into account and give us normalized counts, and are the ratios then calculated from those?

Regards
Varun

Devon Ryan

Feb 2, 2016, 2:30:30 PM
to deep...@googlegroups.com
Hi Varun,

The purpose of scaling is to account for the differences in sequencing depth. After that, the Ms are the same across samples. Those normalized counts are then used for subsequent calculations.

Devon

Varun Gupta

Feb 2, 2016, 2:51:48 PM
to deepTools
Hi Devon,
Thanks for your reply. I have another doubt. In my bedgraph file obtained from bamCompare of control and treatment, at this particular location:

1       1270825 1270850

the bedgraph entry shows its value as:

1       1270825 1270850 -2.00

Now, there are no reads in the treatment and 2 reads in the control. How can we get a negative value of -2.00 when there are no reads for the treatment, since log2 is usually log2(normalized treatment counts / normalized control counts)?
It might be something simple I am missing.

Regards
Varun

Devon Ryan

Feb 2, 2016, 2:58:25 PM
to deep...@googlegroups.com
Note the --pseudocount option :)

This is a pretty common tactic, since otherwise you end up with infinities, which aren't particularly informative in and of themselves.

Devon

Varun Gupta

Feb 2, 2016, 3:21:28 PM
to deepTools

--pseudocount PSEUDOCOUNT
                        small number to avoid x/0. Only useful when ratio =
                        log2 or ratio (default: 1)

So the default is 1, and hence if in my case the treatment contains no counts, it is taken as 1 by default. Right, Devon?

Regards
Varun

Devon Ryan

Feb 2, 2016, 3:27:23 PM
to deep...@googlegroups.com
Right, though it's actually added in all cases (to both the numerator and denominator).

Devon

Varun Gupta

Feb 2, 2016, 3:34:43 PM
to deepTools

Thanks Devon.

One more question: I am still trying to understand how the value for the coordinates I posted above is -2.00.

Total mapped reads for control is 1490897
Total mapped reads for treatment is 9315994

bamCompare reports the scaling like this:

The scaling factors are:
[ 0.16003628  1.        ]

Can you explain how the value for the above coordinates is -2.00?

Thanks for all the help

Regards
Varun
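For what it's worth, the reported factor of 0.16003628 is consistent with scaling each sample down to the depth of the smaller library. This is a sketch assuming the read-count scaling method, using the totals quoted above:

```python
# Sketch assuming the read-count scaling method: each sample is scaled
# down to the depth of the smaller library (totals from the thread above).
control_reads = 1490897
treatment_reads = 9315994

smallest = min(control_reads, treatment_reads)
sf_treatment = smallest / treatment_reads
sf_control = smallest / control_reads

print(round(sf_treatment, 8), sf_control)  # 0.16003628 1.0
```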

Devon Ryan

Feb 2, 2016, 4:13:47 PM
to deep...@googlegroups.com
In essence: log2(((number of reads in treatment/scaling factor) + 1) / ((number of reads in control/scaling factor) + 1)), assuming the -2 is treatment vs. control. Since you have around 6x more reads in the treatment, its counts will be scaled down accordingly. If you really want more details, I recommend just reading the Python code; it's pretty well commented.

Devon

Fidel Ramirez

Feb 2, 2016, 4:34:25 PM
to Devon Ryan, deep...@googlegroups.com
Hi,

a little correction, the equation that is computed is:

log2(((coverage treatment * scaling factor) + pseudocount) / ((coverage control * scaling factor) + pseudocount))

(see attached image)

In your case, since you say that the treatment has no reads over that particular bin, the coverage of the control must be three reads, such that:

log2((0 + 1) / (3 + 1)) = -2

-fidel
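Fidel's equation can be written as a small sketch (an illustration, not deepTools' actual implementation); with the default pseudocount of 1 and scaling factors of 1, Varun's bin gives:

```python
import math

def log2_ratio(cov_treatment, cov_control,
               sf_treatment=1.0, sf_control=1.0, pseudocount=1.0):
    """Illustrative sketch of the equation above: scale each coverage,
    add the pseudocount to numerator and denominator, take log2."""
    num = cov_treatment * sf_treatment + pseudocount
    den = cov_control * sf_control + pseudocount
    return math.log2(num / den)

# Varun's bin: 0 reads in the treatment, 3 (extended) reads in the control
print(log2_ratio(0, 3))  # -2.0
```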

Varun Gupta

Feb 2, 2016, 6:18:28 PM
to deepTools, dpry...@gmail.com
Hi Fidel,
Thanks for the explanation.
Taking the above example of coordinates:

1       1270825 1270850 -2.00

I have only 2 reads in the control (one of the reads is 147 bp long and spans more than one bin, since I set the bin size to 25 bp). I am attaching the IGV snapshot.

Should the scaling factor I use be 0.16?


Total mapped reads for control is 1490897
Total mapped reads for treatment is 9315994

bamCompare reports the scaling like this:

The scaling factors are:
[ 0.16003628  1.        ]

How would you put the values in your equation?

Just want to make sure I understand this thing properly.

Thanks for your time and patience.

Regards
Varun
[attachment: test.png]

Fidel Ramirez

Feb 3, 2016, 1:27:43 AM
to Varun Gupta, deepTools
Hi,

Which version of deepTools are you using? 

The scaling factors are 0.16 for the treatment and 1 for the control.

Fidel  

Varun Gupta

Feb 3, 2016, 12:08:51 PM
to deepTools, gupta5...@gmail.com
Hi Fidel,
This is the version of bamCompare I am using

bamCompare --version
bamCompare 1.5.11


Let me know what the issue might be, since I also gave you the IGV snapshot.
Also, if a read is 150 bp long and the bin size is 25 bp, then we count that read in all 5 bins, right?

Regards
Varun

Fidel Ramirez

Feb 3, 2016, 12:24:20 PM
to Varun Gupta, deepTools
Hi Varun,

By default, deepTools 1.5.11 extends reads to match the fragment size. That is why in your case it counts 3 reads instead of 2. You can turn off this behaviour by setting --doNotExtendReads.

The new deepTools2 changed this behavior, and read extension is off by default.

Best,

Fidel

Varun Gupta

Feb 4, 2016, 12:37:33 PM
to deepTools, gupta5...@gmail.com
Hi Fidel,

deepTools 1.5.11 does not have the --doNotExtendReads option in bamCompare.
So I went through some of the bamCompare options and found that changing --fragmentLength to 0 works. Since my reads are single-end, if I change --fragmentLength to 0 the program won't extend the reads, and the log2 values it then calculates match what I expect.

Also, for completeness: if the genome is divided into 50 bp bins and a read spans 2 bins (the read can be 100 bp or longer), the read is counted in both bins.

Although I have a question: if I specifically give the bin size as 50 bp, why do I sometimes see bins of around 150 bp or 175 bp? Any reason for that?


Thanks a lot for all the help Devon and Fidel.

Regards
Varun
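The binning rule Varun describes, where a read overlapping several bins is counted in each of them, can be sketched as (illustrative helper, not deepTools code):

```python
# Illustrative helper (not deepTools code): a read overlapping several
# fixed-size bins contributes one count to every bin it touches.
def bins_touched(read_start, read_end, bin_size=50):
    """Return indices of all bins a [start, end) read overlaps."""
    first = read_start // bin_size
    last = (read_end - 1) // bin_size
    return list(range(first, last + 1))

# A 150 bp read crossing bin boundaries is counted in each overlapped bin
print(bins_touched(10, 160, bin_size=50))  # [0, 1, 2, 3]
```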

Fidel Ramirez

Feb 5, 2016, 4:02:20 AM
to Varun Gupta, deepTools
Hi,

The option in deeptools-1.5 is called '--doNotExtendPairedEnds'. If you use a fragment length smaller than the read length, then no extension happens and the read length is used, but I recommend using 1 instead of 0 because of some internal checks.

The bins are always 50 bp. However, if a stretch of several bins has the same value, those bins are combined into one. This behaviour can be changed by modifying the code if you need that.

Again, this behavior changed in deeptools2, and extension is not done by default.

Best,

Fidel
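The merging Fidel describes, where adjacent equal-valued bins collapse into one bedGraph interval (which is why nominal 50 bp bins can appear as 150 bp or 175 bp entries), can be sketched as (hypothetical helper, not the actual deepTools code):

```python
# Hypothetical helper (not the actual deepTools code): consecutive
# fixed-width bins with identical values collapse into one bedGraph line,
# which is how a nominal 50 bp bin shows up as a 150 bp interval.
def merge_bins(chrom, start, values, bin_size=50):
    intervals = []
    run_start, run_val = start, values[0]
    pos = start
    for v in values[1:]:
        pos += bin_size
        if v != run_val:
            intervals.append((chrom, run_start, pos, run_val))
            run_start, run_val = pos, v
    intervals.append((chrom, run_start, pos + bin_size, run_val))
    return intervals

print(merge_bins("1", 1270800, [-2.0, -2.0, -2.0, 0.5]))
# [('1', 1270800, 1270950, -2.0), ('1', 1270950, 1271000, 0.5)]
```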
